Wei Zhai

Papers: 23

Total Citations: 58

Papers (23)

Improved Video VAE for Latent Video Diffusion Model

Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning

MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling

Bidirectional Progressive Transformer for Interaction Intention Anticipation

MATE: Motion-Augmented Temporal Consistency for Event-based Point Tracking

EMoTive: Event-guided Trajectory Modeling for 3D Motion Estimation

Hypercorrelation Evolution for Video Class-Incremental Learning

LEMON: Learning 3D Human-Object Interaction Relation from 2D Images

Deep Structure-Revealed Network for Texture Recognition

Self-Promoted Prototype Refinement for Few-Shot Class-Incremental Learning

Background Activation Suppression for Weakly Supervised Object Localization

Self-Sustaining Representation Expansion for Non-Exemplar Class-Incremental Learning

Learning Affordance Grounding From Exocentric Images

Leverage Interactive Affinity for Affordance Learning

Uncertainty-Aware Optimal Transport for Semantically Coherent Out-of-Distribution Detection

Deep Multiple-Attribute-Perceived Network for Real-World Texture Recognition

Spatial-Aware Token for Weakly Supervised Object Localization

Grounding 3D Object Affordance from 2D Interactions in Images

Efficient Test-time Adaptive Object Detection via Sensitivity-Guided Pruning

GREAT: Geometry-Intention Collaborative Inference for Open-Vocabulary 3D Object Affordance Grounding

SIGMAN: Scaling 3D Human Gaussian Generation with Millions of Assets

HERO: Human Reaction Generation from Videos

Exploring Figure-Ground Assignment Mechanism in Perceptual Organization