Xiaodan Liang
128
Papers
608
Total Citations
Papers (128)
Matching-CNN Meets KNN: Quasi-Parametric Human Parsing
CVPR 2015
168
citations
Tree-Structured Reinforcement Learning for Sequential Object Localization
NeurIPS 2016 (arXiv)
129
citations
Structured Generative Adversarial Networks
NeurIPS 2017 (arXiv)
56
citations
DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection
CVPR 2024
45
citations
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
CVPR 2025
44
citations
Making Large Language Models Better Planners with Reasoning-Decision Alignment
ECCV 2024
35
citations
WISA: World simulator assistant for physics-aware text-to-video generation
NeurIPS 2025
33
citations
MLP Can Be A Good Transformer Learner
CVPR 2024
20
citations
AlignMiF: Geometry-Aligned Multimodal Implicit Field for LiDAR-Camera Joint Synthesis
CVPR 2024
20
citations
FireEdit: Fine-grained Instruction-based Image Editing via Region-aware Vision Language Model
CVPR 2025
15
citations
HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models
CVPR 2025
13
citations
RoomTour3D: Geometry-Aware Video-Instruction Tuning for Embodied Navigation
CVPR 2025
12
citations
RoboTron-Drive: All-in-One Large Multimodal Model for Autonomous Driving
ICCV 2025
11
citations
PTUS: Photo-Realistic Talking Upper-Body Synthesis via 3D-Aware Motion Decomposition
AAAI 2024
3
citations
S2-Track: A Simple yet Strong Approach for End-to-End 3D Multi-Object Tracking
ICML 2025
2
citations
Monocular 3D Hand Mesh Recovery via Dual Noise Estimation
AAAI 2024 (arXiv)
2
citations
Deep Variation-Structured Reinforcement Learning for Visual Relationship and Attribute Detection
CVPR 2017 (arXiv)
0
citations
Look Into Person: Self-Supervised Structure-Sensitive Learning and a New Benchmark for Human Parsing
CVPR 2017 (arXiv)
0
citations
Interpretable Structure-Evolving LSTM
CVPR 2017 (arXiv)
0
citations
Object Region Mining With Adversarial Erasing: A Simple Classification to Semantic Segmentation Approach
CVPR 2017 (arXiv)
0
citations
Dynamic-Structured Semantic Propagation Network
CVPR 2018 (arXiv)
0
citations
Visual Question Reasoning on General Dependency Tree
CVPR 2018 (arXiv)
0
citations
Reinforcement Cutting-Agent Learning for Video Object Segmentation
CVPR 2018
0
citations
Blending-Target Domain Adaptation by Adversarial Meta-Adaptation Networks
CVPR 2019
0
citations
Layout-Graph Reasoning for Fashion Landmark Detection
CVPR 2019
0
citations
Reasoning-RCNN: Unifying Adaptive Global Reasoning Into Large-Scale Object Detection
CVPR 2019
0
citations
Graphonomy: Universal Human Parsing via Graph Transfer Learning
CVPR 2019
0
citations
Learning Personalized Modular Network Guided by Structured Knowledge
CVPR 2019
0
citations
Spatial-Aware Graph Relation Network for Large-Scale Object Detection
CVPR 2019
0
citations
Rethinking Knowledge Graph Propagation for Zero-Shot Learning
CVPR 2019
0
citations
Block-Wisely Supervised Neural Architecture Search With Knowledge Distillation
CVPR 2020
0
citations
Fashion Editing With Adversarial Parsing Learning
CVPR 2020 (arXiv)
0
citations
Bidirectional Graph Reasoning Network for Panoptic Segmentation
CVPR 2020 (arXiv)
0
citations
Vision-Language Navigation With Self-Supervised Auxiliary Reasoning Tasks
CVPR 2020 (arXiv)
0
citations
SP-NAS: Serial-to-Parallel Backbone Search for Object Detection
CVPR 2020
0
citations
Vision-Dialog Navigation by Exploring Cross-Modal Memory
CVPR 2020 (arXiv)
0
citations
TransNAS-Bench-101: Improving Transferability and Generalizability of Cross-Task Neural Architecture Search
CVPR 2021
0
citations
Dynamic Slimmable Network
CVPR 2021 (arXiv)
0
citations
SOON: Scenario Oriented Object Navigation With Graph-Based Exploration
CVPR 2021 (arXiv)
0
citations
Continual Object Detection via Prototypical Task Correlation Guided Gating Mechanism
CVPR 2022 (arXiv)
0
citations
Cross-Modal Clinical Graph Transformer for Ophthalmic Report Generation
CVPR 2022
0
citations
Arch-Graph: Acyclic Architecture Relation Predictor for Task-Transferable Neural Architecture Search
CVPR 2022
0
citations
Dressing in the Wild by Watching Dance Videos
CVPR 2022 (arXiv)
0
citations
Knowledge Distillation via the Target-Aware Transformer
CVPR 2022 (arXiv)
0
citations
Beyond Fixation: Dynamic Window Visual Transformer
CVPR 2022 (arXiv)
0
citations
ADAPT: Vision-Language Navigation With Modality-Aligned Action Prompts
CVPR 2022 (arXiv)
0
citations
Automated Progressive Learning for Efficient Training of Vision Transformers
CVPR 2022 (arXiv)
0
citations
M5Product: Self-Harmonized Contrastive Learning for E-Commercial Multi-Modal Pretraining
CVPR 2022
0
citations
BodyGAN: General-Purpose Controllable Neural Human Body Generation
CVPR 2022
0
citations
Dynamic Graph Enhanced Contrastive Learning for Chest X-Ray Report Generation
CVPR 2023
0
citations
DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-Training via Word-Region Alignment
CVPR 2023 (arXiv)
0
citations
Learning To Segment Every Referring Object Point by Point
CVPR 2023
0
citations
Visual Exemplar Driven Task-Prompting for Unified Perception in Autonomous Driving
CVPR 2023 (arXiv)
0
citations
GP-VTON: Towards General Purpose Virtual Try-On via Collaborative Local-Flow Global-Parsing Learning
CVPR 2023
0
citations
CLIP2: Contrastive Language-Image-Point Pretraining From Real-World Point Cloud Data
CVPR 2023
0
citations
CapDet: Unifying Dense Captioning and Open-World Detection Pretraining
CVPR 2023 (arXiv)
0
citations
Towards Computational Baby Learning: A Weakly-Supervised Approach for Object Detection
ICCV 2015
0
citations
Human Parsing With Contextualized Convolutional Neural Network
ICCV 2015
0
citations
Dual Motion GAN for Future-Flow Embedded Video Prediction
ICCV 2017 (arXiv)
0
citations
Temporal Dynamic Graph LSTM for Action-Driven Video Object Detection
ICCV 2017 (arXiv)
0
citations
Recurrent Topic-Transition GAN for Visual Paragraph Generation
ICCV 2017 (arXiv)
0
citations
Nonparametric Variational Auto-Encoders for Hierarchical Representation Learning
ICCV 2017 (arXiv)
0
citations
FW-GAN: Flow-Navigated Warping GAN for Video Virtual Try-On
ICCV 2019
0
citations
Auto-FPN: Automatic Network Architecture Adaptation for Object Detection Beyond Classification
ICCV 2019
0
citations
Towards Multi-Pose Guided Virtual Try-On Network
ICCV 2019
0
citations
Meta R-CNN: Towards General Solver for Instance-Level Low-Shot Learning
ICCV 2019
0
citations
Pi-NAS: Improving Neural Architecture Search by Reducing Supernet Training Consistency Shift
ICCV 2021
0
citations
M3D-VTON: A Monocular-to-3D Virtual Try-On Network
ICCV 2021
0
citations
UltraPose: Synthesizing Dense Pose With 1 Billion Points by Human-Body Decoupling 3D Model
ICCV 2021
0
citations
Voxel Transformer for 3D Object Detection
ICCV 2021 (arXiv)
0
citations
Product1M: Towards Weakly Supervised Instance-Level Product Retrieval via Cross-Modal Pretraining
ICCV 2021 (arXiv)
0
citations
Self-Motivated Communication Agent for Real-World Vision-Dialog Navigation
ICCV 2021
0
citations
Vision-Language Navigation With Random Environmental Mixup
ICCV 2021 (arXiv)
0
citations
Linguistically Routing Capsule Network for Out-of-Distribution Visual Question Answering
ICCV 2021
0
citations
BossNAS: Exploring Hybrid CNN-Transformers With Block-Wisely Self-Supervised Neural Architecture Search
ICCV 2021 (arXiv)
0
citations
NASOA: Towards Faster Task-Oriented Online Fine-Tuning With a Zoo of Models
ICCV 2021 (arXiv)
0
citations
Exploring Geometry-Aware Contrast and Clustering Harmonization for Self-Supervised 3D Object Detection
ICCV 2021
0
citations
Pyramid R-CNN: Towards Better Performance and Adaptability for 3D Object Detection
ICCV 2021
0
citations
Exploring Inter-Channel Correlation for Diversity-Preserved Knowledge Distillation
ICCV 2021
0
citations
Coordinate Transformer: Achieving Single-stage Multi-person Mesh Recovery from Videos
ICCV 2023 (arXiv)
0
citations
CTP: Towards Vision-Language Continual Pretraining via Compatible Momentum Contrast and Topology Preservation
ICCV 2023 (arXiv)
0
citations
MixReorg: Cross-Modal Mixed Patch Reorganization is a Good Mask Learner for Open-World Semantic Segmentation
ICCV 2023 (arXiv)
0
citations
DiffDis: Empowering Generative Diffusion Model with Cross-Modal Discrimination Capability
ICCV 2023 (arXiv)
0
citations
GrowCLIP: Data-Aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-Training
ICCV 2023 (arXiv)
0
citations
FULLER: Unified Multi-modality Multi-task 3D Perception via Multi-level Gradient Calibration
ICCV 2023 (arXiv)
0
citations
DiffCloth: Diffusion Based Garment Synthesis and Manipulation via Structural Cross-modal Semantic Alignment
ICCV 2023 (arXiv)
0
citations
LAW-Diffusion: Complex Scene Generation by Diffusion with Layouts
ICCV 2023
0
citations
Towards High-Fidelity Text-Guided 3D Face Generation and Manipulation Using only Images
ICCV 2023 (arXiv)
0
citations
CurveLane-NAS: Unifying Lane-Sensitive Architecture Search and Adaptive Point Blending
ECCV 2020
0
citations
CATCH: Context-based Meta Reinforcement Learning for Transferrable Architecture Search
ECCV 2020
0
citations
Open-World Semantic Segmentation via Contrasting and Clustering Vision-Language Embedding
ECCV 2022
0
citations
SiRi: A Simple Selective Retraining Mechanism for Transformer-Based Visual Grounding
ECCV 2022
0
citations
CODA: A Real-World Road Corner Case Dataset for Object Detection in Autonomous Driving
ECCV 2022
0
citations
Perceptual Generative Adversarial Networks for Small Object Detection
CVPR 2017 (arXiv)
0
citations
RoboPearls: Editable Video Simulation for Robot Manipulation
ICCV 2025
0
citations
A₀: An Affordance-Aware Hierarchical Model for General Robotic Manipulation
ICCV 2025
0
citations
RoBridge: A Hierarchical Architecture Bridging Cognition and Execution for General Robotic Manipulation
ICCV 2025
0
citations
DreamFit: Garment-Centric Human Generation via a Lightweight Anything-Dressing Encoder
AAAI 2025
0
citations
MUSE: Mamba Is Efficient Multi-scale Learner for Text-video Retrieval
AAAI 2025
0
citations
BEV-TSR: Text-Scene Retrieval in BEV Space for Autonomous Driving
AAAI 2025
0
citations
Affordances-Oriented Planning Using Foundation Models for Continuous Vision-Language Navigation
AAAI 2025
0
citations
3D Visibility-Aware Generalizable Neural Radiance Fields for Interacting Hands
AAAI 2024
0
citations
Towards Detailed Text-to-Motion Synthesis via Basic-to-Advanced Hierarchical Diffusion Model
AAAI 2024
0
citations
Holistic Autonomous Driving Understanding by Bird’s-Eye-View Injected Multi-Modal Large Models
CVPR 2024
0
citations
Reversible Recursive Instance-Level Object Segmentation
CVPR 2016
0
citations
Deep Structured Scene Parsing by Learning With Image Descriptions
CVPR 2016
0
citations
Semantic Object Parsing With Local-Global Long Short-Term Memory
CVPR 2016
0
citations
Attention-Aware Face Hallucination via Deep Reinforcement Learning
CVPR 2017 (arXiv)
0
citations
Recurrent 3D Pose Sequence Machines
CVPR 2017 (arXiv)
0
citations
Soft-Gated Warping-GAN for Pose-Guided Person Image Synthesis
NeurIPS 2018
0
citations
Hybrid Knowledge Routed Modules for Large-scale Object Detection
NeurIPS 2018
0
citations
Symbolic Graph Reasoning Meets Convolutions
NeurIPS 2018
0
citations
Deep Generative Models with Learnable Knowledge Constraints
NeurIPS 2018
0
citations
Hybrid Retrieval-Generation Reinforced Agent for Medical Image Report Generation
NeurIPS 2018
0
citations
Heterogeneous Graph Learning for Visual Commonsense Reasoning
NeurIPS 2019
0
citations
AutoSync: Learning to Synchronize for Data-Parallel Distributed Deep Learning
NeurIPS 2020
0
citations
Towards Interpretable Natural Language Understanding with Explanations as Latent Variables
NeurIPS 2020
0
citations
Auto-Panoptic: Cooperative Multi-Component Architecture Search for Panoptic Segmentation
NeurIPS 2020
0
citations
Towards Scalable Unpaired Virtual Try-On via Patch-Routed Spatially-Adaptive GAN
NeurIPS 2021
0
citations
DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-training for Open-world Detection
NeurIPS 2022
0
citations
CoupAlign: Coupling Word-Pixel with Sentence-Mask Alignments for Referring Image Segmentation
NeurIPS 2022
0
citations
Structure-Preserving 3D Garment Modeling with Neural Sewing Machines
NeurIPS 2022
0
citations
Effective Adaptation in Multi-Task Co-Training for Unified Autonomous Driving
NeurIPS 2022
0
citations
Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark
NeurIPS 2022
0
citations
Towards Hard-pose Virtual Try-on via 3D-aware Global Correspondence Learning
NeurIPS 2022
0
citations
RIO: A Benchmark for Reasoning Intention-Oriented Objects in Open Environments
NeurIPS 2023
0
citations
Toward Controlled Generation of Text
ICML 2017
0
citations
Multivariate-Information Adversarial Ensemble for Scalable Joint Distribution Matching
ICML 2019
0
citations