Yunze Man

6 Papers

130 Total Citations

Papers (6)

RandAR: Decoder-only Autoregressive Visual Generation in Random Orders

Frozen Transformers in Language Models Are Effective Visual Encoder Layers

Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought

AgMMU: A Comprehensive Agricultural Multimodal Understanding Benchmark

Floating No More: Object-Ground Reconstruction from a Single Image

Situational Awareness Matters in 3D Vision Language Reasoning