Yu-Xiong Wang

19 Papers · 250 Total Citations

Papers (19)

RandAR: Decoder-only Autoregressive Visual Generation in Random Orders · CVPR 2025 · 61 citations
Frozen Transformers in Language Models Are Effective Visual Encoder Layers · ICLR 2024 · 48 citations
Instruct 4D-to-4D: Editing 4D Scenes as Pseudo-3D Scenes Using 2D Diffusion · CVPR 2024 · 25 citations
GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation · CVPR 2025 · 21 citations
Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought · CVPR 2025 · 19 citations
RMem: Restricted Memory Banks Improve Video Object Segmentation · CVPR 2024 · 18 citations
TAMM: TriAdapter Multi-Modal Learning for 3D Shape Understanding · CVPR 2024 · 18 citations
ConsistDreamer: 3D-Consistent 2D Diffusion for High-Fidelity Scene Editing · CVPR 2024 · 15 citations
Region-Based Representations Revisited · CVPR 2024 · 14 citations
InterAct: Advancing Large-Scale Versatile 3D Human-Object Interaction Generation · CVPR 2025 · 7 citations
Refer to Any Segmentation Mask Group With Vision-Language Prompts · ICCV 2025 · 2 citations
AgMMU: A Comprehensive Agricultural Multimodal Understanding Benchmark · NeurIPS 2025 · 2 citations
Situational Awareness Matters in 3D Vision Language Reasoning · CVPR 2024 · 0 citations
Floating No More: Object-Ground Reconstruction from a Single Image · CVPR 2025 · 0 citations
InterMimic: Towards Universal Whole-Body Control for Physics-Based Human-Object Interactions · CVPR 2025 · 0 citations
Language Agent Tree Search Unifies Reasoning, Acting, and Planning in Language Models · ICML 2024 · 0 citations
Offline Imitation from Observation via Primal Wasserstein State Occupancy Matching · ICML 2024 · 0 citations
ATraDiff: Accelerating Online Reinforcement Learning with Imaginary Trajectories · ICML 2024 · 0 citations
ReferEverything: Towards Segmenting Everything We Can Speak of in Videos · ICCV 2025 · 0 citations