Xihui Liu
46 Papers
1,260 Total Citations
Papers (46)
WorldSimBench: Towards Video Generation Models as World Simulators
ICML 2025
806
citations
LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D Capabilities
ICCV 2025
127
citations
Towards Large-scale 3D Representation Learning with Multi-dataset Point Prompt Training
CVPR 2024
77
citations
GameFactory: Creating New Games with Generative Interactive Videos
ICCV 2025
63
citations
GoT: Unleashing Reasoning Capability of MLLM for Visual Generation and Editing
NeurIPS 2025
60
citations
T2ISafety: Benchmark for Assessing Fairness, Toxicity, and Privacy in Image Generation
CVPR 2025
25
citations
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation
ICCV 2025
22
citations
DreamComposer: Controllable 3D Object Generation via Multi-View Conditions
CVPR 2024
19
citations
PUMA: Empowering Unified MLLM with Multi-granular Visual Generation
ICCV 2025
17
citations
Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation
ICCV 2025
17
citations
RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints
ICCV 2025
11
citations
MBQ: Modality-Balanced Quantization for Large Vision-Language Models
CVPR 2025
10
citations
OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding
NeurIPS 2025
6
citations
Object Detection in Videos With Tubelet Proposal Networks
CVPR 2017
0
citations
Improving Referring Expression Grounding With Cross-Modal Attention-Guided Erasing
CVPR 2019
0
citations
HMAR: Efficient Hierarchical Masked Auto-Regressive Image Generation
CVPR 2025
0
citations
Masked Scene Contrast: A Scalable Framework for Unsupervised 3D Representation Learning
CVPR 2023
0
citations
GLeaD: Improving GANs With a Generator-Leading Task
CVPR 2023
0
citations
RIFormer: Keep Your Vision Backbone Effective but Removing Token Mixer
CVPR 2023
0
citations
Back to the Source: Diffusion-Driven Adaptation To Test-Time Corruption
CVPR 2023
0
citations
Learning Transferable Spatiotemporal Representations From Natural Script Knowledge
CVPR 2023
0
citations
HydraPlus-Net: Attentive Deep Features for Pedestrian Analysis
ICCV 2017
0
citations
Orientation Invariant Feature Embedding and Spatial Temporal Regularization for Vehicle Re-Identification
ICCV 2017
0
citations
CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval
ICCV 2019
0
citations
DDP: Diffusion Model for Dense Visual Prediction
ICCV 2023
0
citations
Open-Edit: Open-Domain Image Manipulation with Open-Vocabulary Instructions
ECCV 2020
0
citations
MILES: Visual BERT Pre-training with Injected Language Semantics for Video-Text Retrieval
ECCV 2022
0
citations
Bridging Video-Text Retrieval With Multiple Choice Questions
CVPR 2022
0
citations
Parallelized Autoregressive Visual Generation
CVPR 2025
0
citations
MIDI: Multi-Instance Diffusion for Single Image to 3D Scene Generation
CVPR 2025
0
citations
T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation
CVPR 2025
0
citations
Moto: Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos
ICCV 2025
0
citations
LiT: Delving into a Simple Linear Diffusion Transformer for Image Generation
ICCV 2025
0
citations
V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding
ICCV 2025
0
citations
DreamCube: RGB-D Panorama Generation via Multi-plane Synchronization
ICCV 2025
0
citations
EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI
CVPR 2024
0
citations
Point Transformer V3: Simpler Faster Stronger
CVPR 2024
0
citations
HumanGaussian: Text-Driven 3D Human Generation with Gaussian Splatting
CVPR 2024
0
citations
UniMC: Taming Diffusion Transformer for Unified Keypoint-Guided Multi-Class Image Generation
ICML 2025
0
citations
FiT: Flexible Vision Transformer for Diffusion Model
ICML 2024
0
citations
Learning to Predict Layout-to-image Conditional Convolutions for Semantic Image Synthesis
NeurIPS 2019
0
citations
Point Transformer V2: Grouped Vector Attention and Partition-based Pooling
NeurIPS 2022
0
citations
Seeing is not always believing: Benchmarking Human and Model Perception of AI-Generated Images
NeurIPS 2023
0
citations
CorresNeRF: Image Correspondence Priors for Neural Radiance Fields
NeurIPS 2023
0
citations
OV-PARTS: Towards Open-Vocabulary Part Segmentation
NeurIPS 2023
0
citations
T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation
NeurIPS 2023
0
citations