Kristen Grauman
86 Papers
106 Total Citations
Papers (86)
Learning Object State Changes in Videos: An Open-World Perspective
CVPR 2024
33
citations
Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos
ECCV 2024
19
citations
ExpertAF: Expert Actionable Feedback from Video
CVPR 2025
11
citations
SoundingActions: Learning How Actions Sound from Narrated Egocentric Videos
CVPR 2024
11
citations
Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos
CVPR 2024
8
citations
Detours for Navigating Instructional Videos
CVPR 2024
7
citations
Progress-Aware Video Frame Captioning
CVPR 2025
7
citations
When Thinking Drifts: Evidential Grounding for Robust Video Reasoning
NeurIPS 2025
4
citations
Which Viewpoint Shows it Best? Language for Weakly Supervising View Selection in Multi-view Instructional Videos
CVPR 2025
3
citations
FIction: 4D Future Interaction Prediction from Video
CVPR 2025
3
citations
Seeing Invisible Poses: Estimating 3D Body Pose From Egocentric Video
CVPR 2017
0
citations
Detangling People: Individuating Multiple Close People and Their Body Parts via Region Assembly
CVPR 2017
0
citations
Making 360° Video Watchable in 2D: Learning Videography for Click Free Viewing
CVPR 2017
0
citations
Learning to Look Around: Intelligently Exploring Unseen Environments for Unknown Tasks
CVPR 2018
0
citations
Compare and Contrast: Learning Prominent Visual Differences
CVPR 2018
0
citations
VizWiz Grand Challenge: Answering Visual Questions From Blind People
CVPR 2018
0
citations
Im2Flow: Motion Hallucination From Static Images for Action Recognition
CVPR 2018
0
citations
Creating Capsule Wardrobes From Fashion Images
CVPR 2018
0
citations
Learning Compressible 360° Video Isomers
CVPR 2018
0
citations
BlockDrop: Dynamic Inference Paths in Residual Networks
CVPR 2018
0
citations
2.5D Visual Sound
CVPR 2019
0
citations
Thinking Outside the Pool: Active Training Image Creation for Relative Attributes
CVPR 2019
0
citations
Less Is More: Learning Highlight Detection From Video Duration
CVPR 2019
0
citations
Extreme Relative Pose Estimation for RGB-D Scans via Scene Completion
CVPR 2019
0
citations
SpotTune: Transfer Learning Through Adaptive Fine-Tuning
CVPR 2019
0
citations
Kernel Transformer Networks for Compact Spherical Convolution
CVPR 2019
0
citations
Ego-Topo: Environment Affordances From Egocentric Video
CVPR 2020
0
citations
ViBE: Dressing for Diverse Body Shapes
CVPR 2020
0
citations
You2Me: Inferring Body Pose in Egocentric Video via First and Second Person Interactions
CVPR 2020
0
citations
Don't Judge an Object by Its Context: Learning to Overcome Contextual Bias
CVPR 2020
0
citations
Listen to Look: Action Recognition by Previewing Audio
CVPR 2020
0
citations
From Paris to Berlin: Discovering Fashion Style Influences Around the World
CVPR 2020
0
citations
Ego-Exo: Transferring Visual Representations From Third-Person to First-Person Videos
CVPR 2021
0
citations
Semantic Audio-Visual Navigation
CVPR 2021
0
citations
Fashion IQ: A New Dataset Towards Retrieving Images by Natural Language Feedback
CVPR 2021
0
citations
VisualVoice: Audio-Visual Speech Separation With Cross-Modal Consistency
CVPR 2021
0
citations
PONI: Potential Functions for ObjectGoal Navigation With Interaction-Free Learning
CVPR 2022
0
citations
Zero Experience Required: Plug & Play Modular Transfer Learning for Semantic Visual Navigation
CVPR 2022
0
citations
Visual Acoustic Matching
CVPR 2022
0
citations
Ego4D: Around the World in 3,000 Hours of Egocentric Video
CVPR 2022
0
citations
Chat2Map: Efficient Scene Mapping From Multi-Ego Conversations
CVPR 2023
0
citations
Novel-View Acoustic Synthesis
CVPR 2023
0
citations
NaQ: Leveraging Narrations As Queries To Supervise Episodic Memory
CVPR 2023
0
citations
HierVL: Learning Hierarchical Video-Language Embeddings
CVPR 2023
0
citations
Learning Image Representations Tied to Ego-Motion
ICCV 2015
0
citations
Just Noticeable Differences in Visual Attributes
ICCV 2015
0
citations
Fashion Forward: Forecasting Visual Style in Fashion
ICCV 2017
0
citations
On-Demand Learning for Deep Image Restoration
ICCV 2017
0
citations
Learning the Latent "Look": Unsupervised Discovery of a Style-Coherent Embedding From Fashion Images
ICCV 2017
0
citations
Semantic Jitter: Dense Supervision for Visual Comparisons via Synthetic Images
ICCV 2017
0
citations
Co-Separating Sounds of Visual Objects
ICCV 2019
0
citations
Fashion++: Minimal Edits for Outfit Improvement
ICCV 2019
0
citations
Grounded Human-Object Interaction Hotspots From Video
ICCV 2019
0
citations
From Culture to Clothing: Discovering the World Events Behind a Century of Fashion Images
ICCV 2021
0
citations
Move2Hear: Active Audio-Visual Source Separation
ICCV 2021
0
citations
Multiview Pseudo-Labeling for Semi-Supervised Learning From Video
ICCV 2021
0
citations
Audio-Visual Floorplan Reconstruction
ICCV 2021
0
citations
Anticipative Video Transformer
ICCV 2021
0
citations
Occupancy Anticipation for Efficient Exploration and Navigation
ECCV 2020
0
citations
SoundSpaces: Audio-Visual Navigation in 3D Environments
ECCV 2020
0
citations
VisualEchoes: Spatial Image Representation Learning through Echolocation
ECCV 2020
0
citations
Proposal-based Video Completion
ECCV 2020
0
citations
Egocentric Activity Recognition and Localization on a 3D Map
ECCV 2022
0
citations
Active Audio-Visual Separation of Dynamic Sound Sources
ECCV 2022
0
citations
Learning Spherical Convolution for Fast Features from 360° Imagery
NeurIPS 2017
0
citations
Egocentric Video Task Translation
CVPR 2023
0
citations
Viewpoint Rosetta Stone: Unlocking Unpaired Ego-Exo Videos for View-invariant Representation Learning
CVPR 2025
0
citations
Switch-a-View: View Selection Learned from Unlabeled In-the-wild Videos
ICCV 2025
0
citations
Learning Skill-Attributes for Transferable Assessment in Video
NeurIPS 2025
0
citations
Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives
CVPR 2024
0
citations
Pull the Plug? Predicting If Computers or Humans Should Segment Images
CVPR 2016
0
citations
Summary Transfer: Exemplar-Based Subset Selection for Video Summarization
CVPR 2016
0
citations
Active Image Segmentation Propagation
CVPR 2016
0
citations
Slow and Steady Feature Analysis: Higher Order Temporal Coherence in Video
CVPR 2016
0
citations
FusionSeg: Learning to Combine Motion and Appearance for Fully Automatic Segmentation of Generic Objects in Videos
CVPR 2017
0
citations
Learning Affordance Landscapes for Interaction Exploration in 3D Environments
NeurIPS 2020
0
citations
Shaping embodied agent behavior with activity-context priors from egocentric video
NeurIPS 2021
0
citations
Few-Shot Audio-Visual Learning of Environment Acoustics
NeurIPS 2022
0
citations
SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning
NeurIPS 2022
0
citations
Single-Stage Visual Query Localization in Egocentric Videos
NeurIPS 2023
0
citations
Self-Supervised Visual Acoustic Matching
NeurIPS 2023
0
citations
EgoDistill: Egocentric Head Motion Distillation for Efficient Video Understanding
NeurIPS 2023
0
citations
Learning Fine-grained View-Invariant Representations from Unpaired Ego-Exo Videos via Temporal Alignment
NeurIPS 2023
0
citations
EgoEnv: Human-centric environment representations from egocentric video
NeurIPS 2023
0
citations
Video-Mined Task Graphs for Keystep Recognition in Instructional Videos
NeurIPS 2023
0
citations
EgoTracks: A Long-term Egocentric Visual Object Tracking Dataset
NeurIPS 2023
0
citations