Rita Cucchiara

Papers: 11

Total Citations: 48

Papers (11)

Talking to DINO: Bridging Self-Supervised Vision Backbones with Language for Open-Vocabulary Segmentation

Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering

Merging and Splitting Diffusion Paths for Semantically Coherent Panoramas

Recurrence-Enhanced Vision-and-Language Transformers for Robust Multimodal Document Retrieval

Diffusion Transformers for Tabular Data Time Series Generation

What Changed? Detecting and Evaluating Instruction-Guided Image Edits with Multimodal Large Language Models

Modeling Human Gaze Behavior with Diffusion Models for Unified Scanpath Prediction

Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation

Zero-Shot Styled Text Image Generation, but Make It Autoregressive

MissRAG: Addressing the Missing Modality Challenge in Multimodal Large Language Models

Hyperbolic Safety-Aware Vision-Language Models