Yu-Xiong Wang
19 Papers
250 Total Citations

Papers (19)
RandAR: Decoder-only Autoregressive Visual Generation in Random Orders
CVPR 2025
61 citations
Frozen Transformers in Language Models Are Effective Visual Encoder Layers
ICLR 2024
48 citations
Instruct 4D-to-4D: Editing 4D Scenes as Pseudo-3D Scenes Using 2D Diffusion
CVPR 2024
25 citations
GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation
CVPR 2025
21 citations
Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought
CVPR 2025
19 citations
RMem: Restricted Memory Banks Improve Video Object Segmentation
CVPR 2024
18 citations
TAMM: TriAdapter Multi-Modal Learning for 3D Shape Understanding
CVPR 2024
18 citations
ConsistDreamer: 3D-Consistent 2D Diffusion for High-Fidelity Scene Editing
CVPR 2024
15 citations
Region-Based Representations Revisited
CVPR 2024
14 citations
InterAct: Advancing Large-Scale Versatile 3D Human-Object Interaction Generation
CVPR 2025
7 citations
Refer to Any Segmentation Mask Group With Vision-Language Prompts
ICCV 2025
2 citations
AgMMU: A Comprehensive Agricultural Multimodal Understanding Benchmark
NeurIPS 2025
2 citations
Situational Awareness Matters in 3D Vision Language Reasoning
CVPR 2024
0 citations
Floating No More: Object-Ground Reconstruction from a Single Image
CVPR 2025
0 citations
InterMimic: Towards Universal Whole-Body Control for Physics-Based Human-Object Interactions
CVPR 2025
0 citations
Language Agent Tree Search Unifies Reasoning, Acting, and Planning in Language Models
ICML 2024
0 citations
Offline Imitation from Observation via Primal Wasserstein State Occupancy Matching
ICML 2024
0 citations
ATraDiff: Accelerating Online Reinforcement Learning with Imaginary Trajectories
ICML 2024
0 citations
ReferEverything: Towards Segmenting Everything We Can Speak of in Videos
ICCV 2025
0 citations