Yifei Huang

22 Papers

157 Total Citations

Papers (22)

EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World

CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding

Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning

EgoExoBench: A Benchmark for First- and Third-person View Video Understanding in MLLMs

ActionVOS: Actions as Prompts for Video Object Segmentation

Learning Streaming Video Representation via Multitask Training

TextCenGen: Attention-Guided Text-Centric Background Adaptation for Text-to-Image Generation

Interact Before Align: Leveraging Cross-Modal Knowledge for Domain Adaptive Action Recognition

Ego4D: Around the World in 3,000 Hours of Egocentric Video

Structural Multiplane Image: Bridging Neural View Synthesis and 3D Reconstruction

Weakly Supervised Temporal Sentence Grounding With Uncertainty-Guided Self-Training

FACIAL: Synthesizing Dynamic Talking Face With Implicit Attribute Learning

Memory-and-Anticipation Transformer for Online Action Understanding

Learn to Recover Visible Color for Video Surveillance in a Day

Beyond Label Semantics: Language-Guided Action Anatomy for Few-shot Action Recognition

Compound Prototype Matching for Few-Shot Action Recognition

Egocentric Action-aware Inertial Localization in Point Clouds with Vision-Language Guidance

Retrieval-Augmented Egocentric Video Captioning

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

Improving Action Segmentation via Graph-Based Temporal Reasoning

Goal-Oriented Gaze Estimation for Zero-Shot Learning

CLRNet: Cross Layer Refinement Network for Lane Detection