Xiaojuan Qi

84 Papers · 579 Total Citations

Papers (84)

SC-GS: Sparse-Controlled Gaussian Splatting for Editable Dynamic Scenes

CVPR 2024
302
citations

RegionPLC: Regional Point-Language Contrastive Learning for Open-World 3D Scene Understanding

CVPR 2024
103
citations

VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection

CVPR 2025
54
citations

V-IRL: Grounding Virtual Intelligence in Real Life

ECCV 2024 · arXiv
35
citations

Mixture Compressor for Mixture-of-Experts LLMs Gains More

ICLR 2025
23
citations

Can OOD Object Detectors Learn from Foundation Models?

ECCV 2024
12
citations

Total-Decom: Decomposed 3D Scene Reconstruction with Minimal Interaction

CVPR 2024
12
citations

ObjectMover: Generative Object Movement with Video Prior

CVPR 2025
10
citations

SaCo Loss: Sample-wise Affinity Consistency for Vision-Language Pre-training

CVPR 2024
10
citations

Deformable Radial Kernel Splatting

CVPR 2025
8
citations

"Principal Components" Enable A New Language of Images

ICCV 2025
6
citations

A Data-Centric Revisit of Pre-Trained Vision Models for Robot Learning

CVPR 2025
3
citations

Equipping Vision Foundation Model with Mixture of Experts for Out-of-Distribution Detection

ICCV 2025
1
citation

Pyramid Scene Parsing Network

CVPR 2017 · arXiv
0
citations

GeoNet: Geometric Neural Network for Joint Depth and Surface Normal Estimation

CVPR 2018
0
citations

Referring Image Segmentation via Recurrent Refinement Networks

CVPR 2018
0
citations

Semi-Parametric Image Synthesis

CVPR 2018 · arXiv
0
citations

3D Motion Decomposition for RGBD Future Dynamic Scene Synthesis

CVPR 2019
0
citations

Global Texture Enhancement for Fake Face Detection in the Wild

CVPR 2020 · arXiv
0
citations

ManiGAN: Text-Guided Image Manipulation

CVPR 2020 · arXiv
0
citations

Unifying Training and Inference for Panoptic Segmentation

CVPR 2020 · arXiv
0
citations

3D-to-2D Distillation for Indoor Scene Parsing

CVPR 2021 · arXiv
0
citations

PAConv: Position Adaptive Convolution With Dynamic Kernel Assembling on Point Clouds

CVPR 2021 · arXiv
0
citations

ST3D: Self-Training for Unsupervised Domain Adaptation on 3D Object Detection

CVPR 2021 · arXiv
0
citations

Fully Convolutional Networks for Panoptic Segmentation

CVPR 2021 · arXiv
0
citations

One Thing One Click: A Self-Training Approach for Weakly Supervised 3D Semantic Segmentation

CVPR 2021 · arXiv
0
citations

TWIST: Two-Way Inter-Label Self-Training for Semi-Supervised 3D Instance Segmentation

CVPR 2022
0
citations

Voxel Field Fusion for 3D Object Detection

CVPR 2022 · arXiv
0
citations

Towards Implicit Text-Guided 3D Shape Generation

CVPR 2022 · arXiv
0
citations

Slot-VPS: Object-Centric Representation Learning for Video Panoptic Segmentation

CVPR 2022
0
citations

HINT: Hierarchical Neuron Concept Explainer

CVPR 2022 · arXiv
0
citations

Progressive End-to-End Object Detection in Crowded Scenes

CVPR 2022 · arXiv
0
citations

Knowledge Distillation As Efficient Pre-Training: Faster Convergence, Higher Data-Efficiency, and Better Transferability

CVPR 2022
0
citations

Video Demoireing With Relation-Based Temporal Consistency

CVPR 2022 · arXiv
0
citations

Stratified Transformer for 3D Point Cloud Segmentation

CVPR 2022 · arXiv
0
citations

MarS3D: A Plug-and-Play Motion-Aware Model for Semantic Segmentation on Multi-Scan 3D Point Clouds

CVPR 2023
0
citations

PLA: Language-Driven Open-Vocabulary 3D Scene Understanding

CVPR 2023 · arXiv
0
citations

VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking

CVPR 2023 · arXiv
0
citations

Understanding Imbalanced Semantic Segmentation Through Neural Collapse

CVPR 2023 · arXiv
0
citations

LargeKernel3D: Scaling Up Kernels in 3D Sparse CNNs

CVPR 2023 · arXiv
0
citations

Command-Driven Articulated Object Understanding and Manipulation

CVPR 2023
0
citations

Semantic Segmentation With Object Clique Potential

ICCV 2015
0
citations

3D Graph Neural Networks for RGBD Semantic Segmentation

ICCV 2017
0
citations

Improved Techniques for Training Adaptive Deep Networks

ICCV 2019
0
citations

AGSS-VOS: Attention Guided Single-Shot Video Object Segmentation

ICCV 2019
0
citations

Aggregation With Feature Detection

ICCV 2021
0
citations

Re-Distributing Biased Pseudo Labels for Semi-Supervised Semantic Segmentation: A Baseline Investigation

ICCV 2021 · arXiv
0
citations

Texture Generation on 3D Meshes with Point-UV Diffusion

ICCV 2023 · arXiv
0
citations

Learning a Room with the Occ-SDF Hybrid: Signed Distance Function Mingled with Occupancy Aids Scene Representation

ICCV 2023 · arXiv
0
citations

Parametric Classification for Generalized Category Discovery: A Baseline Study

ICCV 2023 · arXiv
0
citations

IST-Net: Prior-Free Category-Level Pose Estimation with Implicit Space Transformation

ICCV 2023
0
citations

Speech2Lip: High-fidelity Speech to Lip Generation by Learning from a Short Video

ICCV 2023 · arXiv
0
citations

Domain-invariant Stereo Matching Networks

ECCV 2020
0
citations

Few-shot Action Recognition with Permutation-invariant Attention

ECCV 2020
0
citations

CN: Channel Normalization For Point Cloud Recognition

ECCV 2020
0
citations

Memory Selection Network for Video Propagation

ECCV 2020
0
citations

Towards Efficient and Scale-Robust Ultra-High-Definition Image Demoiréing

ECCV 2022
0
citations

DODA: Data-Oriented Sim-to-Real Domain Adaptation for 3D Semantic Segmentation

ECCV 2022
0
citations

Multimodal Transformer for Automatic 3D Annotation and Object Detection

ECCV 2022
0
citations

Hybrid Neural Rendering for Large-Scale Scenes With Motion Blur

CVPR 2023 · arXiv
0
citations

Learning from Neighbors: Category Extrapolation for Long-Tail Learning

CVPR 2025
0
citations

UniScene: Unified Occupancy-centric Driving Scene Generation

CVPR 2025
0
citations

Holistic Tokenizer for Autoregressive Image Generation

ICCV 2025
0
citations

Aligning Effective Tokens with Video Anomaly in Large Language Models

ICCV 2025
0
citations

Mixture-of-Scores: Robust Image-Text Data Valuation via Three Lines of Code

ICCV 2025
0
citations

DiST-4D: Disentangled Spatiotemporal Diffusion with Metric Depth for 4D Driving Scene Generation

ICCV 2025
0
citations

How Far are AI-generated Videos from Simulating the 3D Visual World: A Learned 3D Evaluation Approach

ICCV 2025
0
citations

EscherNet: A Generative Model for Scalable View Synthesis

CVPR 2024
0
citations

Classes Are Not Equal: An Empirical Study on Image Recognition Fairness

CVPR 2024
0
citations

How to Make Cross Encoder a Good Teacher for Efficient Image-Text Retrieval?

CVPR 2024
0
citations

DCAN: Deep Contour-Aware Networks for Accurate Gland Segmentation

CVPR 2016
0
citations

Multi-Scale Patch Aggregation (MPA) for Simultaneous Detection and Segmentation

CVPR 2016
0
citations

Image Inpainting via Generative Multi-column Convolutional Neural Networks

NeurIPS 2018
0
citations

Controllable Text-to-Image Generation

NeurIPS 2019
0
citations

Lightweight Generative Adversarial Networks for Text-Guided Image Manipulation

NeurIPS 2020
0
citations

Spatial Pruned Sparse Convolution for Efficient 3D Object Detection

NeurIPS 2022
0
citations

Prototypical VoteNet for Few-Shot 3D Point Cloud Object Detection

NeurIPS 2022
0
citations

Self-Supervised Visual Representation Learning with Semantic Grouping

NeurIPS 2022
0
citations

Unifying Voxel-based Representation with Transformer for 3D Object Detection

NeurIPS 2022
0
citations

Towards Efficient 3D Object Detection with Knowledge Distillation

NeurIPS 2022
0
citations

Rethinking Resolution in the Context of Efficient Video Recognition

NeurIPS 2022
0
citations

Data Pruning via Moving-one-Sample-out

NeurIPS 2023
0
citations

CL-NeRF: Continual Learning of Neural Radiance Fields for Evolving Scene Representation

NeurIPS 2023
0
citations

CoDet: Co-occurrence Guided Region-Word Alignment for Open-Vocabulary Object Detection

NeurIPS 2023
0
citations