Yu-Xiong Wang

49 Papers · 250 Total Citations

Papers (49)

RandAR: Decoder-only Autoregressive Visual Generation in Random Orders

CVPR 2025 · arXiv · 61 citations

Frozen Transformers in Language Models Are Effective Visual Encoder Layers

ICLR 2024 · arXiv · 48 citations

Instruct 4D-to-4D: Editing 4D Scenes as Pseudo-3D Scenes Using 2D Diffusion

CVPR 2024 · arXiv · 25 citations

GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation

CVPR 2025 · arXiv · 21 citations

Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought

CVPR 2025 · arXiv · 19 citations

RMem: Restricted Memory Banks Improve Video Object Segmentation

CVPR 2024 · arXiv · 18 citations

TAMM: TriAdapter Multi-Modal Learning for 3D Shape Understanding

CVPR 2024 · arXiv · 18 citations

ConsistDreamer: 3D-Consistent 2D Diffusion for High-Fidelity Scene Editing

CVPR 2024 · arXiv · 15 citations

Region-Based Representations Revisited

CVPR 2024 · arXiv · 14 citations

InterAct: Advancing Large-Scale Versatile 3D Human-Object Interaction Generation

CVPR 2025 · 7 citations

Refer to Any Segmentation Mask Group With Vision-Language Prompts

ICCV 2025 · arXiv · 2 citations

AgMMU: A Comprehensive Agricultural Multimodal Understanding Benchmark

NeurIPS 2025 · arXiv · 2 citations

InterMimic: Towards Universal Whole-Body Control for Physics-Based Human-Object Interactions

CVPR 2025 · 0 citations

Floating No More: Object-Ground Reconstruction from a Single Image

CVPR 2025 · 0 citations

ReferEverything: Towards Segmenting Everything We Can Speak of in Videos

ICCV 2025 · 0 citations

Situational Awareness Matters in 3D Vision Language Reasoning

CVPR 2024 · 0 citations

Language Agent Tree Search Unifies Reasoning, Acting, and Planning in Language Models

ICML 2024 · 0 citations

Offline Imitation from Observation via Primal Wasserstein State Occupancy Matching

ICML 2024 · 0 citations

ATraDiff: Accelerating Online Reinforcement Learning with Imaginary Trajectories

ICML 2024 · 0 citations

Hallucination Improves Few-Shot Object Detection

CVPR 2021 · arXiv · 0 citations

DAP: Detection-Aware Pre-Training With Weak Supervision

CVPR 2021 · arXiv · 0 citations

Discovering Objects That Can Move

CVPR 2022 · arXiv · 0 citations

Embracing Single Stride 3D Object Detector With Sparse Transformer

CVPR 2022 · arXiv · 0 citations

Long-Tailed Recognition via Weight Balancing

CVPR 2022 · arXiv · 0 citations

DIVeR: Real-Time and Accurate Neural Radiance Fields With Deterministic Integration for Volume Rendering

CVPR 2022 · arXiv · 0 citations

Object Discovery From Motion-Guided Tokens

CVPR 2023 · arXiv · 0 citations

BEV-Guided Multi-Modality Fusion for Driving Perception

CVPR 2023 · 0 citations

Standing Between Past and Future: Spatio-Temporal Modeling for Multi-Camera 3D Multi-Object Tracking

CVPR 2023 · arXiv · 0 citations

NeuralEditor: Editing Neural Radiance Fields via Manipulating Point Clouds

CVPR 2023 · arXiv · 0 citations

Contrastive Mean Teacher for Domain Adaptive Object Detectors

CVPR 2023 · arXiv · 0 citations

On the Importance of Distractors for Few-Shot Classification

ICCV 2021 · arXiv · 0 citations

Learning To Hallucinate Examples From Extrinsic and Intrinsic Supervision

ICCV 2021 · 0 citations

Pixel Contrastive-Consistent Semi-Supervised Semantic Segmentation

ICCV 2021 · arXiv · 0 citations

Contrastive Learning Relies More on Spatial Inductive Bias Than Supervised Learning: An Empirical Study

ICCV 2023 · 0 citations

Video State-Changing Object Segmentation

ICCV 2023 · 0 citations

InterDiff: Generating 3D Human-Object Interactions with Physics-Informed Diffusion

ICCV 2023 · arXiv · 0 citations

Multi-task View Synthesis with Neural Radiance Fields

ICCV 2023 · 0 citations

MV-Map: Offboard HD-Map Generation with Multi-view Consistency

ICCV 2023 · 0 citations

Improving Equivariance in State-of-the-Art Supervised Depth and Normal Predictors

ICCV 2023 · 0 citations

Towards Streaming Perception

ECCV 2020 · 0 citations

PointTree: Transformation-Robust Point Cloud Encoder with Relaxed K-D Trees

ECCV 2022 · 0 citations

Diverse Human Motion Prediction Guided by Multi-level Spatial-Temporal Anchors

ECCV 2022 · 0 citations

CEIP: Combining Explicit and Implicit Priors for Reinforcement Learning with Demonstrations

NeurIPS 2022 · arXiv · 0 citations

Continual Learning with Evolving Class Ontologies

NeurIPS 2022 · arXiv · 0 citations

A Simple Solution for Offline Imitation from Observations and Examples with Possibly Incomplete Trajectories

NeurIPS 2023 · arXiv · 0 citations

Distilling Out-of-Distribution Robustness from Vision-Language Foundation Models

NeurIPS 2023 · arXiv · 0 citations

YouTubePD: A Multimodal Benchmark for Parkinson’s Disease Analysis

NeurIPS 2023 · 0 citations

HASSOD: Hierarchical Adaptive Self-Supervised Object Detection

NeurIPS 2023 · arXiv · 0 citations

ViCA-NeRF: View-Consistency-Aware 3D Editing of Neural Radiance Fields

NeurIPS 2023 · arXiv · 0 citations