Kristen Grauman

57 Papers

151 Total Citations

Papers (57)

PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding

NeurIPS 2025

Learning Object State Changes in Videos: An Open-World Perspective

Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos

ExpertAF: Expert Actionable Feedback from Video

SoundingActions: Learning How Actions Sound from Narrated Egocentric Videos

Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos

Progress-Aware Video Frame Captioning

Detours for Navigating Instructional Videos

Seeing the Arrow of Time in Large Multimodal Models

NeurIPS 2025

When Thinking Drifts: Evidential Grounding for Robust Video Reasoning

NeurIPS 2025

Which Viewpoint Shows it Best? Language for Weakly Supervising View Selection in Multi-view Instructional Videos

FIction: 4D Future Interaction Prediction from Video

Viewpoint Rosetta Stone: Unlocking Unpaired Ego-Exo Videos for View-invariant Representation Learning

Switch-a-View: View Selection Learned from Unlabeled In-the-wild Videos

Learning Skill-Attributes for Transferable Assessment in Video

NeurIPS 2025

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

Ego-Topo: Environment Affordances From Egocentric Video

ViBE: Dressing for Diverse Body Shapes

You2Me: Inferring Body Pose in Egocentric Video via First and Second Person Interactions

Don't Judge an Object by Its Context: Learning to Overcome Contextual Bias

Listen to Look: Action Recognition by Previewing Audio

From Paris to Berlin: Discovering Fashion Style Influences Around the World

Ego-Exo: Transferring Visual Representations From Third-Person to First-Person Videos

Semantic Audio-Visual Navigation

Fashion IQ: A New Dataset Towards Retrieving Images by Natural Language Feedback

VisualVoice: Audio-Visual Speech Separation With Cross-Modal Consistency

PONI: Potential Functions for ObjectGoal Navigation With Interaction-Free Learning

Zero Experience Required: Plug & Play Modular Transfer Learning for Semantic Visual Navigation

Visual Acoustic Matching

Ego4D: Around the World in 3,000 Hours of Egocentric Video

Chat2Map: Efficient Scene Mapping From Multi-Ego Conversations

Novel-View Acoustic Synthesis

NaQ: Leveraging Narrations As Queries To Supervise Episodic Memory

HierVL: Learning Hierarchical Video-Language Embeddings

Egocentric Video Task Translation

From Culture to Clothing: Discovering the World Events Behind a Century of Fashion Images

Move2Hear: Active Audio-Visual Source Separation

Multiview Pseudo-Labeling for Semi-Supervised Learning From Video

Audio-Visual Floorplan Reconstruction

Anticipative Video Transformer

Occupancy Anticipation for Efficient Exploration and Navigation

SoundSpaces: Audio-Visual Navigation in 3D Environments

VisualEchoes: Spatial Image Representation Learning through Echolocation

Proposal-based Video Completion

Egocentric Activity Recognition and Localization on a 3D Map

Active Audio-Visual Separation of Dynamic Sound Sources

Learning Affordance Landscapes for Interaction Exploration in 3D Environments

NeurIPS 2020

Shaping embodied agent behavior with activity-context priors from egocentric video

NeurIPS 2021

Few-Shot Audio-Visual Learning of Environment Acoustics

NeurIPS 2022

SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning

NeurIPS 2022

Single-Stage Visual Query Localization in Egocentric Videos

NeurIPS 2023

Self-Supervised Visual Acoustic Matching

NeurIPS 2023

EgoDistill: Egocentric Head Motion Distillation for Efficient Video Understanding

NeurIPS 2023

Learning Fine-grained View-Invariant Representations from Unpaired Ego-Exo Videos via Temporal Alignment

NeurIPS 2023

EgoEnv: Human-centric environment representations from egocentric video

NeurIPS 2023

Video-Mined Task Graphs for Keystep Recognition in Instructional Videos

NeurIPS 2023

EgoTracks: A Long-term Egocentric Visual Object Tracking Dataset

NeurIPS 2023