Zeyuan Chen
12
Papers
548
Total Citations
Papers (12)
BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions
AAAI 2024, arXiv
190
citations
HIVE: Harnessing Human Feedback for Instructional Visual Editing
CVPR 2024
164
citations
Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization
ICLR 2024
104
citations
Dolfin: Diffusion Layout Transformers without Autoencoder
ECCV 2024
25
citations
Bayesian Diffusion Models for 3D Shape Reconstruction
CVPR 2024
23
citations
LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer
ECCV 2024, arXiv
14
citations
X-Dyna: Expressive Dynamic Human Image Animation
CVPR 2025
13
citations
X-Dancer: Expressive Music to Human Dance Video Generation
ICCV 2025
9
citations
YOLO-Count: Differentiable Object Counting for Text-to-Image Generation
ICCV 2025, arXiv
6
citations
CADGrasp: Learning Contact and Collision Aware General Dexterous Grasping in Cluttered Scenes
NeurIPS 2025, arXiv
0
citations
DepR: Depth Guided Single-view Scene Reconstruction with Instance-level Diffusion
ICCV 2025
0
citations
Structured Policy Optimization: Enhance Large Vision-Language Model via Self-referenced Dialogue
ICCV 2025
0
citations