Kristen Grauman

57 Papers · 151 Total Citations

Papers (57)

PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding

NeurIPS 2025 · arXiv · 40 citations

Learning Object State Changes in Videos: An Open-World Perspective

CVPR 2024 · arXiv · 33 citations

Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos

ECCV 2024 · arXiv · 19 citations

ExpertAF: Expert Actionable Feedback from Video

CVPR 2025 · arXiv · 11 citations

SoundingActions: Learning How Actions Sound from Narrated Egocentric Videos

CVPR 2024 · arXiv · 11 citations

Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos

CVPR 2024 · arXiv · 8 citations

Progress-Aware Video Frame Captioning

CVPR 2025 · arXiv · 7 citations

Detours for Navigating Instructional Videos

CVPR 2024 · arXiv · 7 citations

Seeing the Arrow of Time in Large Multimodal Models

NeurIPS 2025 · arXiv · 5 citations

When Thinking Drifts: Evidential Grounding for Robust Video Reasoning

NeurIPS 2025 · arXiv · 4 citations

Which Viewpoint Shows it Best? Language for Weakly Supervising View Selection in Multi-view Instructional Videos

CVPR 2025 · arXiv · 3 citations

FIction: 4D Future Interaction Prediction from Video

CVPR 2025 · arXiv · 3 citations

Viewpoint Rosetta Stone: Unlocking Unpaired Ego-Exo Videos for View-invariant Representation Learning

CVPR 2025 · 0 citations

Switch-a-View: View Selection Learned from Unlabeled In-the-wild Videos

ICCV 2025 · arXiv · 0 citations

Learning Skill-Attributes for Transferable Assessment in Video

NeurIPS 2025 · arXiv · 0 citations

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

CVPR 2024 · arXiv · 0 citations

Ego-Topo: Environment Affordances From Egocentric Video

CVPR 2020 · 0 citations

ViBE: Dressing for Diverse Body Shapes

CVPR 2020 · arXiv · 0 citations

You2Me: Inferring Body Pose in Egocentric Video via First and Second Person Interactions

CVPR 2020 · arXiv · 0 citations

Don't Judge an Object by Its Context: Learning to Overcome Contextual Bias

CVPR 2020 · 0 citations

Listen to Look: Action Recognition by Previewing Audio

CVPR 2020 · arXiv · 0 citations

From Paris to Berlin: Discovering Fashion Style Influences Around the World

CVPR 2020 · 0 citations

Ego-Exo: Transferring Visual Representations From Third-Person to First-Person Videos

CVPR 2021 · 0 citations

Semantic Audio-Visual Navigation

CVPR 2021 · 0 citations

Fashion IQ: A New Dataset Towards Retrieving Images by Natural Language Feedback

CVPR 2021 · arXiv · 0 citations

VisualVoice: Audio-Visual Speech Separation With Cross-Modal Consistency

CVPR 2021 · arXiv · 0 citations

PONI: Potential Functions for ObjectGoal Navigation With Interaction-Free Learning

CVPR 2022 · 0 citations

Zero Experience Required: Plug & Play Modular Transfer Learning for Semantic Visual Navigation

CVPR 2022 · 0 citations

Visual Acoustic Matching

CVPR 2022 · arXiv · 0 citations

Ego4D: Around the World in 3,000 Hours of Egocentric Video

CVPR 2022 · 0 citations

Chat2Map: Efficient Scene Mapping From Multi-Ego Conversations

CVPR 2023 · arXiv · 0 citations

Novel-View Acoustic Synthesis

CVPR 2023 · arXiv · 0 citations

NaQ: Leveraging Narrations As Queries To Supervise Episodic Memory

CVPR 2023 · arXiv · 0 citations

HierVL: Learning Hierarchical Video-Language Embeddings

CVPR 2023 · arXiv · 0 citations

Egocentric Video Task Translation

CVPR 2023 · 0 citations

From Culture to Clothing: Discovering the World Events Behind a Century of Fashion Images

ICCV 2021 · arXiv · 0 citations

Move2Hear: Active Audio-Visual Source Separation

ICCV 2021 · 0 citations

Multiview Pseudo-Labeling for Semi-Supervised Learning From Video

ICCV 2021 · arXiv · 0 citations

Audio-Visual Floorplan Reconstruction

ICCV 2021 · 0 citations

Anticipative Video Transformer

ICCV 2021 · arXiv · 0 citations

Occupancy Anticipation for Efficient Exploration and Navigation

ECCV 2020 · 0 citations

SoundSpaces: Audio-Visual Navigation in 3D Environments

ECCV 2020 · 0 citations

VisualEchoes: Spatial Image Representation Learning through Echolocation

ECCV 2020 · 0 citations

Proposal-based Video Completion

ECCV 2020 · 0 citations

Egocentric Activity Recognition and Localization on a 3D Map

ECCV 2022 · 0 citations

Active Audio-Visual Separation of Dynamic Sound Sources

ECCV 2022 · 0 citations

Learning Affordance Landscapes for Interaction Exploration in 3D Environments

NeurIPS 2020 · arXiv · 0 citations

Shaping embodied agent behavior with activity-context priors from egocentric video

NeurIPS 2021 · arXiv · 0 citations

Few-Shot Audio-Visual Learning of Environment Acoustics

NeurIPS 2022 · arXiv · 0 citations

SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning

NeurIPS 2022 · arXiv · 0 citations

Single-Stage Visual Query Localization in Egocentric Videos

NeurIPS 2023 · arXiv · 0 citations

Self-Supervised Visual Acoustic Matching

NeurIPS 2023 · arXiv · 0 citations

EgoDistill: Egocentric Head Motion Distillation for Efficient Video Understanding

NeurIPS 2023 · arXiv · 0 citations

Learning Fine-grained View-Invariant Representations from Unpaired Ego-Exo Videos via Temporal Alignment

NeurIPS 2023 · arXiv · 0 citations

EgoEnv: Human-centric environment representations from egocentric video

NeurIPS 2023 · arXiv · 0 citations

Video-Mined Task Graphs for Keystep Recognition in Instructional Videos

NeurIPS 2023 · arXiv · 0 citations

EgoTracks: A Long-term Egocentric Visual Object Tracking Dataset

NeurIPS 2023 · arXiv · 0 citations