Kristen Grauman

86 Papers

106 Total Citations

Papers (86)

Learning Object State Changes in Videos: An Open-World Perspective

Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos

ExpertAF: Expert Actionable Feedback from Video

SoundingActions: Learning How Actions Sound from Narrated Egocentric Videos

Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos

Detours for Navigating Instructional Videos

Progress-Aware Video Frame Captioning

When Thinking Drifts: Evidential Grounding for Robust Video Reasoning

Which Viewpoint Shows it Best? Language for Weakly Supervising View Selection in Multi-view Instructional Videos

FIction: 4D Future Interaction Prediction from Video

Seeing Invisible Poses: Estimating 3D Body Pose From Egocentric Video

Detangling People: Individuating Multiple Close People and Their Body Parts via Region Assembly

Making 360° Video Watchable in 2D: Learning Videography for Click Free Viewing

Learning to Look Around: Intelligently Exploring Unseen Environments for Unknown Tasks

Compare and Contrast: Learning Prominent Visual Differences

VizWiz Grand Challenge: Answering Visual Questions From Blind People

Im2Flow: Motion Hallucination From Static Images for Action Recognition

Creating Capsule Wardrobes From Fashion Images

Learning Compressible 360° Video Isomers

BlockDrop: Dynamic Inference Paths in Residual Networks

2.5D Visual Sound

Thinking Outside the Pool: Active Training Image Creation for Relative Attributes

Less Is More: Learning Highlight Detection From Video Duration

Extreme Relative Pose Estimation for RGB-D Scans via Scene Completion

SpotTune: Transfer Learning Through Adaptive Fine-Tuning

Kernel Transformer Networks for Compact Spherical Convolution

Ego-Topo: Environment Affordances From Egocentric Video

ViBE: Dressing for Diverse Body Shapes

You2Me: Inferring Body Pose in Egocentric Video via First and Second Person Interactions

Don't Judge an Object by Its Context: Learning to Overcome Contextual Bias

Listen to Look: Action Recognition by Previewing Audio

From Paris to Berlin: Discovering Fashion Style Influences Around the World

Ego-Exo: Transferring Visual Representations From Third-Person to First-Person Videos

Semantic Audio-Visual Navigation

Fashion IQ: A New Dataset Towards Retrieving Images by Natural Language Feedback

VisualVoice: Audio-Visual Speech Separation With Cross-Modal Consistency

PONI: Potential Functions for ObjectGoal Navigation With Interaction-Free Learning

Zero Experience Required: Plug & Play Modular Transfer Learning for Semantic Visual Navigation

Visual Acoustic Matching

Ego4D: Around the World in 3,000 Hours of Egocentric Video

Chat2Map: Efficient Scene Mapping From Multi-Ego Conversations

Novel-View Acoustic Synthesis

NaQ: Leveraging Narrations As Queries To Supervise Episodic Memory

HierVL: Learning Hierarchical Video-Language Embeddings

Learning Image Representations Tied to Ego-Motion

Just Noticeable Differences in Visual Attributes

Fashion Forward: Forecasting Visual Style in Fashion

On-Demand Learning for Deep Image Restoration

Learning the Latent "Look": Unsupervised Discovery of a Style-Coherent Embedding From Fashion Images

Semantic Jitter: Dense Supervision for Visual Comparisons via Synthetic Images

Co-Separating Sounds of Visual Objects

Fashion++: Minimal Edits for Outfit Improvement

Grounded Human-Object Interaction Hotspots From Video

From Culture to Clothing: Discovering the World Events Behind a Century of Fashion Images

Move2Hear: Active Audio-Visual Source Separation

Multiview Pseudo-Labeling for Semi-Supervised Learning From Video

Audio-Visual Floorplan Reconstruction

Anticipative Video Transformer

Occupancy Anticipation for Efficient Exploration and Navigation

SoundSpaces: Audio-Visual Navigation in 3D Environments

VisualEchoes: Spatial Image Representation Learning through Echolocation

Proposal-based Video Completion

Egocentric Activity Recognition and Localization on a 3D Map

Active Audio-Visual Separation of Dynamic Sound Sources

Learning Spherical Convolution for Fast Features from 360° Imagery

NeurIPS 2017

Egocentric Video Task Translation

Viewpoint Rosetta Stone: Unlocking Unpaired Ego-Exo Videos for View-invariant Representation Learning

Switch-a-View: View Selection Learned from Unlabeled In-the-wild Videos

Learning Skill-Attributes for Transferable Assessment in Video

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

Pull the Plug? Predicting If Computers or Humans Should Segment Images

Summary Transfer: Exemplar-Based Subset Selection for Video Summarization

Active Image Segmentation Propagation

Slow and Steady Feature Analysis: Higher Order Temporal Coherence in Video

FusionSeg: Learning to Combine Motion and Appearance for Fully Automatic Segmentation of Generic Objects in Videos

Learning Affordance Landscapes for Interaction Exploration in 3D Environments

Shaping embodied agent behavior with activity-context priors from egocentric video

Few-Shot Audio-Visual Learning of Environment Acoustics

SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning

Single-Stage Visual Query Localization in Egocentric Videos

Self-Supervised Visual Acoustic Matching

EgoDistill: Egocentric Head Motion Distillation for Efficient Video Understanding

Learning Fine-grained View-Invariant Representations from Unpaired Ego-Exo Videos via Temporal Alignment

EgoEnv: Human-centric environment representations from egocentric video

Video-Mined Task Graphs for Keystep Recognition in Instructional Videos

EgoTracks: A Long-term Egocentric Visual Object Tracking Dataset