Zesen Cheng
13
Papers
80
Total Citations
Papers (13)
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM
CVPR 2025
40
citations
The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio
NeurIPS 2025
26
citations
Local Action-Guided Motion Diffusion Model for Text-to-Motion Generation
ECCV 2024
13
citations
Tune-Your-Style: Intensity-tunable 3D Style Transfer with Gaussian Splatting
ICCV 2025
1
citations
ACSeg: Adaptive Conceptualization for Unsupervised Semantic Segmentation
CVPR 2023 (arXiv)
0
citations
Out-of-Candidate Rectification for Weakly Supervised Semantic Segmentation
CVPR 2023 (arXiv)
0
citations
EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding
CVPR 2023 (arXiv)
0
citations
Multi-granularity Interaction Simulation for Unsupervised Interactive Segmentation
ICCV 2023 (arXiv)
0
citations
Breaking the Memory Barrier of Contrastive Loss via Tile-Based Strategy
CVPR 2025
0
citations
DiffusionRet: Generative Text-Video Retrieval with Diffusion Model
ICCV 2023 (arXiv)
0
citations
Temporal-aware Query Routing for Real-time Video Instance Segmentation
ICCV 2025
0
citations
Aligning Instance Brownian Bridge with Texts for Open-Vocabulary Video Instance Segmentation
AAAI 2025
0
citations
GraCo: Granularity-Controllable Interactive Segmentation
CVPR 2024
0
citations