Zhuowen Tu
14
Papers
270
Total Citations
Papers (14)
BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions
AAAI 2024 (arXiv)
190
citations
Dolfin: Diffusion Layout Transformers without Autoencoder
ECCV 2024
25
citations
Bayesian Diffusion Models for 3D Shape Reconstruction
CVPR 2024
23
citations
Enhancing Vision-Language Pre-training with Rich Supervisions
CVPR 2024
15
citations
YOLO-Count: Differentiable Object Counting for Text-to-Image Generation
ICCV 2025
6
citations
Lay-Your-Scene: Natural Scene Layout Generation with Diffusion Transformers
ICCV 2025
5
citations
Ground-V: Teaching VLMs to Ground Complex Instructions in Pixels
CVPR 2025
3
citations
Open-World Dynamic Prompt and Continual Visual Representation Learning
ECCV 2024
3
citations
Restoration by Generation with Constrained Priors
CVPR 2024
0
citations
DepR: Depth Guided Single-view Scene Reconstruction with Instance-level Diffusion
ICCV 2025
0
citations
Non-autoregressive Sequence-to-Sequence Vision-Language Models
CVPR 2024
0
citations
On the Scalability of Diffusion-based Text-to-Image Generation
CVPR 2024
0
citations
TokenCompose: Text-to-Image Diffusion with Token-level Supervision
CVPR 2024
0
citations
HOIDiffusion: Generating Realistic 3D Hand-Object Interaction Data
CVPR 2024
0
citations