Ruohan Gao

22 Papers

45 Total Citations

Papers (22)

The Audio-Visual Conversational Graph: From an Egocentric-Exocentric Perspective

AURELIA: Test-time Reasoning Distillation in Audio-Visual LLMs

Spherical World-Locking for Audio-Visual Localization in Egocentric Videos

Hearing Anywhere in Any Environment

Learning to Highlight Audio by Watching Movies

GenFlowRL: Shaping Rewards with Generative Object-Centric Flow in Visual Reinforcement Learning

EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception

Differentiable Room Acoustic Rendering with Multi-View Vision Priors

RealImpact: A Dataset of Impact Sound Fields for Real Objects

The ObjectFolder Benchmark: Multisensory Learning With Neural and Real Objects

On-Demand Learning for Deep Image Restoration

Co-Separating Sounds of Visual Objects

VisualEchoes: Spatial Image Representation Learning through Echolocation

2.5D Visual Sound

AVTrustBench: Assessing and Enhancing Reliability and Robustness in Audio-Visual LLMs

Hearing Anything Anywhere

Im2Flow: Motion Hallucination From Static Images for Action Recognition

Listen to Look: Action Recognition by Previewing Audio

VisualVoice: Audio-Visual Speech Separation With Cross-Modal Consistency

ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer

Visual Acoustic Matching

SoundCam: A Dataset for Finding Humans Using Room Acoustics