Xiaojuan Qi

Papers: 23

Total Citations: 600

Papers (23)

SC-GS: Sparse-Controlled Gaussian Splatting for Editable Dynamic Scenes

RegionPLC: Regional Point-Language Contrastive Learning for Open-World 3D Scene Understanding

VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection

V-IRL: Grounding Virtual Intelligence in Real Life

Mixture Compressor for Mixture-of-Experts LLMs Gains More

DiST-4D: Disentangled Spatiotemporal Diffusion with Metric Depth for 4D Driving Scene Generation

Total-Decom: Decomposed 3D Scene Reconstruction with Minimal Interaction

Can OOD Object Detectors Learn from Foundation Models?

ObjectMover: Generative Object Movement with Video Prior

SaCo Loss: Sample-wise Affinity Consistency for Vision-Language Pre-training

Deformable Radial Kernel Splatting

"Principal Components" Enable A New Language of Images

A Data-Centric Revisit of Pre-Trained Vision Models for Robot Learning

Equipping Vision Foundation Model with Mixture of Experts for Out-of-Distribution Detection

How to Make Cross Encoder a Good Teacher for Efficient Image-Text Retrieval?

Learning from Neighbors: Category Extrapolation for Long-Tail Learning

UniScene: Unified Occupancy-centric Driving Scene Generation

Holistic Tokenizer for Autoregressive Image Generation

Aligning Effective Tokens with Video Anomaly in Large Language Models

Mixture-of-Scores: Robust Image-Text Data Valuation via Three Lines of Code

How Far are AI-generated Videos from Simulating the 3D Visual World: A Learned 3D Evaluation Approach

EscherNet: A Generative Model for Scalable View Synthesis

Classes Are Not Equal: An Empirical Study on Image Recognition Fairness