Xiu Li

Papers: 54

Total Citations: 1,000

Papers (54)

Disentangled Non-local Neural Networks

Follow Your Pose: Pose-Guided Text-to-Video Generation Using Pose-Free Videos

Taming Rectified Flow for Inversion and Editing

Lodge: A Coarse to Fine Diffusion Network for Long Dance Generation Guided by the Characteristic Dance Primitives

MultiBooth: Towards Generating All Your Concepts in an Image from Text

Dora: Sampling and Benchmarking for 3D Shape Variational Auto-Encoders

Densely Connected Parameter-Efficient Tuning for Referring Image Segmentation

ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation

MagicArticulate: Make Your 3D Models Articulation-Ready

SkillMimic: Learning Basketball Interaction Skills from Demonstrations

Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation

SAM-R1: Leveraging SAM for Reward Feedback in Multimodal Segmentation via Reinforcement Learning

AToM: Aligning Text-to-Motion Model at Event-Level with GPT-4Vision Reward

GIVEPose: Gradual Intra-class Variation Elimination for RGB-based Category-Level Object Pose Estimation

InterSyn: Interleaved Learning for Dynamic Motion Synthesis in the Wild

ADG: Ambient Diffusion-Guided Dataset Recovery for Corruption-Robust Offline Reinforcement Learning

FAFA: Frequency-Aware Flow-Aided Self-Supervision for Underwater Object Pose Estimation

A Plug-and-Play Physical Motion Restoration Approach for In-the-Wild High-Difficulty Motions

A Self-Boosting Framework for Automated Radiographic Report Generation

FLAG3D: A 3D Fitness Activity Dataset With Language Instruction

Camouflaged Object Detection With Feature Decomposition and Edge Reconstruction

Neighborhood Preserving Hashing for Scalable Video Retrieval

Frequency-Aware Spatiotemporal Transformers for Video Inpainting Detection

Universal and Flexible Optical Aberration Correction Using Deep-Prior Based Deconvolution

Degradation-Resistant Unfolding Network for Heterogeneous Image Fusion

FineDance: A Fine-grained Choreography Dataset for 3D Full Body Dance Generation

BoxSnake: Polygonal Instance Segmentation with Box Supervision

Neural Capture of Animatable 3D Human from Monocular Video

ScalableViT: Rethinking the Context-Oriented Generalization of Vision Transformer

4D Association Graph for Realtime Multi-Person Motion Capture Using Multiple Video Cameras

MVPortrait: Text-Guided Motion and Emotion Control for Multi-view Vivid Portrait Animation

Hunyuan-Portrait: Implicit Condition Control for Enhanced Portrait Animation

MaTe: Images Are All You Need for Material Transfer via Diffusion Transformer

REPARO: Compositional 3D Assets Generation with Differentiable 3D Layout Alignment

SUMO: Search-Based Uncertainty Estimation for Model-Based Offline Reinforcement Learning

Chain of Generation: Multi-Modal Gesture Synthesis via Cascaded Conditional Control

Cross-Modal Match for Language Conditioned 3D Object Grounding

Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model

Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection

Cross-Domain Policy Adaptation by Capturing Representation Mismatch

Exploration and Anti-Exploration with Distributional Random Network Distillation

PEARL: Zero-shot Cross-task Preference Alignment and Robust Reward Learning for Robotic Manipulation

SAM-E: Leveraging Visual Foundation Model with Sequence Imitation for Embodied Manipulation

Joint Training of Cascaded CNN for Face Detection

Scale-Aware Face Detection

Structure From Recurrent Motion: From Rigidity to Recurrency

Self-Supervised Video Hashing via Bidirectional Transformers

Mildly Conservative Q-Learning for Offline Reinforcement Learning

OrdinalCLIP: Learning Rank Prompts for Language-Guided Ordinal Regression

Double Check Your State Before Trusting It: Confidence-Aware Bidirectional Offline Model-Based Imagination

SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation

Weakly-Supervised Concealed Object Segmentation with SAM-based Pseudo Labeling and Multi-scale Feature Grouping

MeGraph: Capturing Long-Range Interactions by Alternating Local and Hierarchical Aggregation on Multi-Scaled Graph Hierarchy

GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction