Zeyuan Chen

12 Papers

548 Total Citations

Papers (12)

BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions

HIVE: Harnessing Human Feedback for Instructional Visual Editing

Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization

Dolfin: Diffusion Layout Transformers without Autoencoder

Bayesian Diffusion Models for 3D Shape Reconstruction

LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer

X-Dyna: Expressive Dynamic Human Image Animation

X-Dancer: Expressive Music to Human Dance Video Generation

YOLO-Count: Differentiable Object Counting for Text-to-Image Generation

CADGrasp: Learning Contact and Collision Aware General Dexterous Grasping in Cluttered Scenes

NeurIPS 2025, arXiv

DepR: Depth Guided Single-view Scene Reconstruction with Instance-level Diffusion

Structured Policy Optimization: Enhance Large Vision-Language Model via Self-referenced Dialogue