Wenhao Chai

Papers: 14

Total Citations: 649

Papers (14)

MovieChat: From Dense Token to Sparse Memory for Long Video Understanding

AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark

Learning Diffusion Texture Priors for Image Restoration

PAD: Personalized Alignment of LLMs at Decoding-time

RT-Pose: A 4D Radar-Tensor based 3D Human Pose Estimation and Localization Benchmark

Zero-shot 3D Question Answering via Voxel-based Dynamic Token Compression

Global Adaptation Meets Local Generalization: Unsupervised Domain Adaptation for 3D Human Pose Estimation

MonoTAKD: Teaching Assistant Knowledge Distillation for Monocular 3D Object Detection

StableVideo: Text-driven Consistency-aware Diffusion Video Editing

Science-T2I: Addressing Scientific Illusions in Image Synthesis

Bringing RNNs Back to Efficient Open-Ended Video Understanding

AGLLDiff: Guiding Diffusion Models Towards Unsupervised Training-free Real-world Low-light Image Enhancement

PromptHaze: Prompting Real-world Dehazing via Depth Anything Model

UniAP: Towards Universal Animal Perception in Vision via Few-Shot Learning