Xiaodan Liang

128
Papers
608
Total Citations

Papers (128)

Matching-CNN Meets KNN: Quasi-Parametric Human Parsing

CVPR 2015
168
citations

Tree-Structured Reinforcement Learning for Sequential Object Localization

NeurIPS 2016 (arXiv)
129
citations

Structured Generative Adversarial Networks

NeurIPS 2017 (arXiv)
56
citations

DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection

CVPR 2024
45
citations

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions

CVPR 2025
44
citations

Making Large Language Models Better Planners with Reasoning-Decision Alignment

ECCV 2024
35
citations

WISA: World simulator assistant for physics-aware text-to-video generation

NeurIPS 2025
33
citations

MLP Can Be A Good Transformer Learner

CVPR 2024
20
citations

AlignMiF: Geometry-Aligned Multimodal Implicit Field for LiDAR-Camera Joint Synthesis

CVPR 2024
20
citations

FireEdit: Fine-grained Instruction-based Image Editing via Region-aware Vision Language Model

CVPR 2025
15
citations

HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models

CVPR 2025
13
citations

RoomTour3D: Geometry-Aware Video-Instruction Tuning for Embodied Navigation

CVPR 2025
12
citations

RoboTron-Drive: All-in-One Large Multimodal Model for Autonomous Driving

ICCV 2025
11
citations

PTUS: Photo-Realistic Talking Upper-Body Synthesis via 3D-Aware Motion Decomposition

AAAI 2024
3
citations

S2-Track: A Simple yet Strong Approach for End-to-End 3D Multi-Object Tracking

ICML 2025
2
citations

Monocular 3D Hand Mesh Recovery via Dual Noise Estimation

AAAI 2024 (arXiv)
2
citations

Deep Variation-Structured Reinforcement Learning for Visual Relationship and Attribute Detection

CVPR 2017 (arXiv)
0
citations

Look Into Person: Self-Supervised Structure-Sensitive Learning and a New Benchmark for Human Parsing

CVPR 2017 (arXiv)
0
citations

Interpretable Structure-Evolving LSTM

CVPR 2017 (arXiv)
0
citations

Object Region Mining With Adversarial Erasing: A Simple Classification to Semantic Segmentation Approach

CVPR 2017 (arXiv)
0
citations

Dynamic-Structured Semantic Propagation Network

CVPR 2018 (arXiv)
0
citations

Visual Question Reasoning on General Dependency Tree

CVPR 2018 (arXiv)
0
citations

Reinforcement Cutting-Agent Learning for Video Object Segmentation

CVPR 2018
0
citations

Blending-Target Domain Adaptation by Adversarial Meta-Adaptation Networks

CVPR 2019
0
citations

Layout-Graph Reasoning for Fashion Landmark Detection

CVPR 2019
0
citations

Reasoning-RCNN: Unifying Adaptive Global Reasoning Into Large-Scale Object Detection

CVPR 2019
0
citations

Graphonomy: Universal Human Parsing via Graph Transfer Learning

CVPR 2019
0
citations

Learning Personalized Modular Network Guided by Structured Knowledge

CVPR 2019
0
citations

Spatial-Aware Graph Relation Network for Large-Scale Object Detection

CVPR 2019
0
citations

Rethinking Knowledge Graph Propagation for Zero-Shot Learning

CVPR 2019
0
citations

Block-Wisely Supervised Neural Architecture Search With Knowledge Distillation

CVPR 2020
0
citations

Fashion Editing With Adversarial Parsing Learning

CVPR 2020 (arXiv)
0
citations

Bidirectional Graph Reasoning Network for Panoptic Segmentation

CVPR 2020 (arXiv)
0
citations

Vision-Language Navigation With Self-Supervised Auxiliary Reasoning Tasks

CVPR 2020 (arXiv)
0
citations

SP-NAS: Serial-to-Parallel Backbone Search for Object Detection

CVPR 2020
0
citations

Vision-Dialog Navigation by Exploring Cross-Modal Memory

CVPR 2020 (arXiv)
0
citations

TransNAS-Bench-101: Improving Transferability and Generalizability of Cross-Task Neural Architecture Search

CVPR 2021
0
citations

Dynamic Slimmable Network

CVPR 2021 (arXiv)
0
citations

SOON: Scenario Oriented Object Navigation With Graph-Based Exploration

CVPR 2021 (arXiv)
0
citations

Continual Object Detection via Prototypical Task Correlation Guided Gating Mechanism

CVPR 2022 (arXiv)
0
citations

Cross-Modal Clinical Graph Transformer for Ophthalmic Report Generation

CVPR 2022
0
citations

Arch-Graph: Acyclic Architecture Relation Predictor for Task-Transferable Neural Architecture Search

CVPR 2022
0
citations

Dressing in the Wild by Watching Dance Videos

CVPR 2022 (arXiv)
0
citations

Knowledge Distillation via the Target-Aware Transformer

CVPR 2022 (arXiv)
0
citations

Beyond Fixation: Dynamic Window Visual Transformer

CVPR 2022 (arXiv)
0
citations

ADAPT: Vision-Language Navigation With Modality-Aligned Action Prompts

CVPR 2022 (arXiv)
0
citations

Automated Progressive Learning for Efficient Training of Vision Transformers

CVPR 2022 (arXiv)
0
citations

M5Product: Self-Harmonized Contrastive Learning for E-Commercial Multi-Modal Pretraining

CVPR 2022
0
citations

BodyGAN: General-Purpose Controllable Neural Human Body Generation

CVPR 2022
0
citations

Dynamic Graph Enhanced Contrastive Learning for Chest X-Ray Report Generation

CVPR 2023
0
citations

DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-Training via Word-Region Alignment

CVPR 2023 (arXiv)
0
citations

Learning To Segment Every Referring Object Point by Point

CVPR 2023
0
citations

Visual Exemplar Driven Task-Prompting for Unified Perception in Autonomous Driving

CVPR 2023 (arXiv)
0
citations

GP-VTON: Towards General Purpose Virtual Try-On via Collaborative Local-Flow Global-Parsing Learning

CVPR 2023
0
citations

CLIP2: Contrastive Language-Image-Point Pretraining From Real-World Point Cloud Data

CVPR 2023
0
citations

CapDet: Unifying Dense Captioning and Open-World Detection Pretraining

CVPR 2023 (arXiv)
0
citations

Towards Computational Baby Learning: A Weakly-Supervised Approach for Object Detection

ICCV 2015
0
citations

Human Parsing With Contextualized Convolutional Neural Network

ICCV 2015
0
citations

Dual Motion GAN for Future-Flow Embedded Video Prediction

ICCV 2017 (arXiv)
0
citations

Temporal Dynamic Graph LSTM for Action-Driven Video Object Detection

ICCV 2017 (arXiv)
0
citations

Recurrent Topic-Transition GAN for Visual Paragraph Generation

ICCV 2017 (arXiv)
0
citations

Nonparametric Variational Auto-Encoders for Hierarchical Representation Learning

ICCV 2017 (arXiv)
0
citations

FW-GAN: Flow-Navigated Warping GAN for Video Virtual Try-On

ICCV 2019
0
citations

Auto-FPN: Automatic Network Architecture Adaptation for Object Detection Beyond Classification

ICCV 2019
0
citations

Towards Multi-Pose Guided Virtual Try-On Network

ICCV 2019
0
citations

Meta R-CNN: Towards General Solver for Instance-Level Low-Shot Learning

ICCV 2019
0
citations

Pi-NAS: Improving Neural Architecture Search by Reducing Supernet Training Consistency Shift

ICCV 2021
0
citations

M3D-VTON: A Monocular-to-3D Virtual Try-On Network

ICCV 2021
0
citations

UltraPose: Synthesizing Dense Pose With 1 Billion Points by Human-Body Decoupling 3D Model

ICCV 2021
0
citations

Voxel Transformer for 3D Object Detection

ICCV 2021 (arXiv)
0
citations

Product1M: Towards Weakly Supervised Instance-Level Product Retrieval via Cross-Modal Pretraining

ICCV 2021 (arXiv)
0
citations

Self-Motivated Communication Agent for Real-World Vision-Dialog Navigation

ICCV 2021
0
citations

Vision-Language Navigation With Random Environmental Mixup

ICCV 2021 (arXiv)
0
citations

Linguistically Routing Capsule Network for Out-of-Distribution Visual Question Answering

ICCV 2021
0
citations

BossNAS: Exploring Hybrid CNN-Transformers With Block-Wisely Self-Supervised Neural Architecture Search

ICCV 2021 (arXiv)
0
citations

NASOA: Towards Faster Task-Oriented Online Fine-Tuning With a Zoo of Models

ICCV 2021 (arXiv)
0
citations

Exploring Geometry-Aware Contrast and Clustering Harmonization for Self-Supervised 3D Object Detection

ICCV 2021
0
citations

Pyramid R-CNN: Towards Better Performance and Adaptability for 3D Object Detection

ICCV 2021
0
citations

Exploring Inter-Channel Correlation for Diversity-Preserved Knowledge Distillation

ICCV 2021
0
citations

Coordinate Transformer: Achieving Single-stage Multi-person Mesh Recovery from Videos

ICCV 2023 (arXiv)
0
citations

CTP: Towards Vision-Language Continual Pretraining via Compatible Momentum Contrast and Topology Preservation

ICCV 2023 (arXiv)
0
citations

MixReorg: Cross-Modal Mixed Patch Reorganization is a Good Mask Learner for Open-World Semantic Segmentation

ICCV 2023 (arXiv)
0
citations

DiffDis: Empowering Generative Diffusion Model with Cross-Modal Discrimination Capability

ICCV 2023 (arXiv)
0
citations

GrowCLIP: Data-Aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-Training

ICCV 2023 (arXiv)
0
citations

FULLER: Unified Multi-modality Multi-task 3D Perception via Multi-level Gradient Calibration

ICCV 2023 (arXiv)
0
citations

DiffCloth: Diffusion Based Garment Synthesis and Manipulation via Structural Cross-modal Semantic Alignment

ICCV 2023 (arXiv)
0
citations

LAW-Diffusion: Complex Scene Generation by Diffusion with Layouts

ICCV 2023
0
citations

Towards High-Fidelity Text-Guided 3D Face Generation and Manipulation Using only Images

ICCV 2023 (arXiv)
0
citations

CurveLane-NAS: Unifying Lane-Sensitive Architecture Search and Adaptive Point Blending

ECCV 2020
0
citations

CATCH: Context-based Meta Reinforcement Learning for Transferrable Architecture Search

ECCV 2020
0
citations

Open-World Semantic Segmentation via Contrasting and Clustering Vision-Language Embedding

ECCV 2022
0
citations

SiRi: A Simple Selective Retraining Mechanism for Transformer-Based Visual Grounding

ECCV 2022
0
citations

CODA: A Real-World Road Corner Case Dataset for Object Detection in Autonomous Driving

ECCV 2022
0
citations

Perceptual Generative Adversarial Networks for Small Object Detection

CVPR 2017 (arXiv)
0
citations

RoboPearls: Editable Video Simulation for Robot Manipulation

ICCV 2025
0
citations

A₀: An Affordance-Aware Hierarchical Model for General Robotic Manipulation

ICCV 2025
0
citations

RoBridge: A Hierarchical Architecture Bridging Cognition and Execution for General Robotic Manipulation

ICCV 2025
0
citations

DreamFit: Garment-Centric Human Generation via a Lightweight Anything-Dressing Encoder

AAAI 2025
0
citations

MUSE: Mamba Is Efficient Multi-scale Learner for Text-video Retrieval

AAAI 2025
0
citations

BEV-TSR: Text-Scene Retrieval in BEV Space for Autonomous Driving

AAAI 2025
0
citations

Affordances-Oriented Planning Using Foundation Models for Continuous Vision-Language Navigation

AAAI 2025
0
citations

3D Visibility-Aware Generalizable Neural Radiance Fields for Interacting Hands

AAAI 2024
0
citations

Towards Detailed Text-to-Motion Synthesis via Basic-to-Advanced Hierarchical Diffusion Model

AAAI 2024
0
citations

Holistic Autonomous Driving Understanding by Bird’s-Eye-View Injected Multi-Modal Large Models

CVPR 2024
0
citations

Reversible Recursive Instance-Level Object Segmentation

CVPR 2016
0
citations

Deep Structured Scene Parsing by Learning With Image Descriptions

CVPR 2016
0
citations

Semantic Object Parsing With Local-Global Long Short-Term Memory

CVPR 2016
0
citations

Attention-Aware Face Hallucination via Deep Reinforcement Learning

CVPR 2017 (arXiv)
0
citations

Recurrent 3D Pose Sequence Machines

CVPR 2017 (arXiv)
0
citations

Soft-Gated Warping-GAN for Pose-Guided Person Image Synthesis

NeurIPS 2018
0
citations

Hybrid Knowledge Routed Modules for Large-scale Object Detection

NeurIPS 2018
0
citations

Symbolic Graph Reasoning Meets Convolutions

NeurIPS 2018
0
citations

Deep Generative Models with Learnable Knowledge Constraints

NeurIPS 2018
0
citations

Hybrid Retrieval-Generation Reinforced Agent for Medical Image Report Generation

NeurIPS 2018
0
citations

Heterogeneous Graph Learning for Visual Commonsense Reasoning

NeurIPS 2019
0
citations

AutoSync: Learning to Synchronize for Data-Parallel Distributed Deep Learning

NeurIPS 2020
0
citations

Towards Interpretable Natural Language Understanding with Explanations as Latent Variables

NeurIPS 2020
0
citations

Auto-Panoptic: Cooperative Multi-Component Architecture Search for Panoptic Segmentation

NeurIPS 2020
0
citations

Towards Scalable Unpaired Virtual Try-On via Patch-Routed Spatially-Adaptive GAN

NeurIPS 2021
0
citations

DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-training for Open-world Detection

NeurIPS 2022
0
citations

CoupAlign: Coupling Word-Pixel with Sentence-Mask Alignments for Referring Image Segmentation

NeurIPS 2022
0
citations

Structure-Preserving 3D Garment Modeling with Neural Sewing Machines

NeurIPS 2022
0
citations

Effective Adaptation in Multi-Task Co-Training for Unified Autonomous Driving

NeurIPS 2022
0
citations

Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark

NeurIPS 2022
0
citations

Towards Hard-pose Virtual Try-on via 3D-aware Global Correspondence Learning

NeurIPS 2022
0
citations

RIO: A Benchmark for Reasoning Intention-Oriented Objects in Open Environments

NeurIPS 2023
0
citations

Toward Controlled Generation of Text

ICML 2017
0
citations

Multivariate-Information Adversarial Ensemble for Scalable Joint Distribution Matching

ICML 2019
0
citations