Cordelia Schmid

21
Papers
95
Total Citations

Papers (21)

Learning Correlation Structures for Vision Transformers

CVPR 2024
25
citations

Language-Guided Image Tokenization for Generation

CVPR 2025 (arXiv)
23
citations

DataDream: Few-shot Guided Dataset Generation

ECCV 2024
23
citations

Flexible Frame Selection for Efficient Video Reasoning

CVPR 2025
10
citations

FirePlace: Geometric Refinements of LLM Common Sense Reasoning for 3D Object Placement

CVPR 2025
7
citations

A Generative Approach for Wikipedia-Scale Visual Entity Recognition

CVPR 2024
7
citations

CoVR: Learning Composed Video Retrieval from Web Video Captions

AAAI 2024
0
citations

Streaming Dense Video Captioning

CVPR 2024
0
citations

End-to-End Spatio-Temporal Action Localisation with Video Transformers

CVPR 2024
0
citations

Dense Optical Tracking: Connecting the Dots

CVPR 2024
0
citations

MoReVQA: Exploring Modular Reasoning Models for Video Question Answering

CVPR 2024
0
citations

SUGAR: Pre-training 3D Visual Representations for Robotics

CVPR 2024
0
citations

Pixel-Aligned Language Model

CVPR 2024
0
citations

Time-, Memory- and Parameter-Efficient Visual Adaptation

CVPR 2024
0
citations

InteractVLM: 3D Interaction Reasoning from 2D Foundational Models

CVPR 2025
0
citations

SceneCraft: An LLM Agent for Synthesizing 3D Scenes as Blender Code

ICML 2024
0
citations

Chapter-Llama: Efficient Chaptering in Hour-Long Videos with LLMs

CVPR 2025
0
citations

Visual Lexicon: Rich Image Features in Language Space

CVPR 2025
0
citations

MINERVA: Evaluating Complex Video Reasoning

ICCV 2025
0
citations

Large-scale Pre-training for Grounded Video Caption Generation

ICCV 2025
0
citations

HORT: Monocular Hand-held Objects Reconstruction with Transformers

ICCV 2025
0
citations