Cordelia Schmid
21
Papers
95
Total Citations
Papers (21)
Learning Correlation Structures for Vision Transformers
CVPR 2024
25
citations
Language-Guided Image Tokenization for Generation
CVPR 2025, arXiv
23
citations
DataDream: Few-shot Guided Dataset Generation
ECCV 2024
23
citations
Flexible Frame Selection for Efficient Video Reasoning
CVPR 2025
10
citations
FirePlace: Geometric Refinements of LLM Common Sense Reasoning for 3D Object Placement
CVPR 2025
7
citations
A Generative Approach for Wikipedia-Scale Visual Entity Recognition
CVPR 2024
7
citations
CoVR: Learning Composed Video Retrieval from Web Video Captions
AAAI 2024
0
citations
Streaming Dense Video Captioning
CVPR 2024
0
citations
End-to-End Spatio-Temporal Action Localisation with Video Transformers
CVPR 2024
0
citations
Dense Optical Tracking: Connecting the Dots
CVPR 2024
0
citations
MoReVQA: Exploring Modular Reasoning Models for Video Question Answering
CVPR 2024
0
citations
SUGAR: Pre-training 3D Visual Representations for Robotics
CVPR 2024
0
citations
Pixel-Aligned Language Model
CVPR 2024
0
citations
Time-, Memory- and Parameter-Efficient Visual Adaptation
CVPR 2024
0
citations
InteractVLM: 3D Interaction Reasoning from 2D Foundational Models
CVPR 2025
0
citations
SceneCraft: An LLM Agent for Synthesizing 3D Scenes as Blender Code
ICML 2024
0
citations
Chapter-Llama: Efficient Chaptering in Hour-Long Videos with LLMs
CVPR 2025
0
citations
Visual Lexicon: Rich Image Features in Language Space
CVPR 2025
0
citations
MINERVA: Evaluating Complex Video Reasoning
ICCV 2025
0
citations
Large-scale Pre-training for Grounded Video Caption Generation
ICCV 2025
0
citations
HORT: Monocular Hand-held Objects Reconstruction with Transformers
ICCV 2025
0
citations