Xiaokang Yang

Papers: 91

Total Citations: 285

Papers (91)

VidToMe: Video Token Merging for Zero-Shot Video Editing

Discrete Hyper-Graph Matching

Domain-Controlled Prompt Learning

Domain Prompt Learning with Quaternion Networks

Baking Gaussian Splatting into Diffusion Denoiser for Fast and Scalable Single-stage Image-to-3D Generation and Reconstruction

Multimodal Latent Diffusion Model for Complex Sewing Pattern Generation

Monocular Identity-Conditioned Facial Reflectance Reconstruction

PostEdit: Posterior Sampling for Efficient Zero-Shot Image Editing

Partial Label Learning with a Partner

Disentangled World Models: Learning to Transfer Semantic Knowledge from Distracting Videos for Reinforcement Learning

Tendency-driven Mutual Exclusivity for Weakly Supervised Incremental Semantic Segmentation

Disentangled Clothed Avatar Generation with Layered Representation

AniSDF: Fused-Granularity Neural Surfaces with Anisotropic Encoding for High-Fidelity 3D Reconstruction

Degradation-Modeled Multipath Diffusion for Tunable Metalens Photography

Rethinking Classifier Re-Training in Long-Tailed Recognition: Label Over-Smooth Can Balance

Latent Intuitive Physics: Learning to Transfer Hidden Physics from A 3D Video

POMP: Physics-constrainable Motion Generative Model through Phase Manifolds

Perceiving and Acting in First-Person: A Dataset and Benchmark for Egocentric Human-Object-Human Interactions

HyperET: Efficient Training in Hyperbolic Space for Multi-modal Large Language Models

Long-Term Correlation Tracking

Factors in Finetuning Deep Model for Object Detection With Long-Tail Distribution

Progressively Parsing Interactional Objects for Fine Grained Action Detection

Cascaded Interactional Targeting Network for Egocentric Video Analysis

Temporal Action Localization With Pyramid of Score Distribution Features

Video Segmentation via Multiple Granularity Analysis

Recurrent Modeling of Interaction Context for Collective Activity Recognition

Structure Preserving Video Prediction

Multiple Granularity Group Interaction Prediction

Crowd Counting via Adversarial Cross-Scale Consistency Pursuit

Fine-Grained Video Captioning for Sports Narrative

Learning Context Graph for Person Search

Deep Kinematics Analysis for Monocular 3D Human Pose Estimation

IoU Attack: Towards Temporally Coherent Black-Box Adversarial Attack for Visual Object Tracking

PointAugmenting: Cross-Modal Augmentation for 3D Object Detection

Bilevel Online Adaptation for Out-of-Domain Human Mesh Reconstruction

Combinatorial Learning of Graph Edit Distance via Dynamic Embedding

Learning Invisible Markers for Hidden Codes in Offline-to-Online Photography

Exploring Frequency Adversarial Attacks for Face Forgery Detection

Continual Predictive Learning From Videos

Align Representations With Base: A New Approach to Self-Supervised Learning

End-to-End Reconstruction-Classification Learning for Face Forgery Detection

NeRFVS: Neural Radiance Fields for Free View Synthesis via Geometry Scaffolds

Blind Image Quality Assessment via Vision-Language Correspondence: A Multitask Learning Perspective

3D-Aware Face Swapping

Deep Learning of Partial Graph Matching via Differentiable Top-K

Improving Fairness in Facial Albedo Estimation via Visual-Textual Cues

Active Finetuning: Exploiting Annotation Budget in the Pretraining-Finetuning Paradigm

A Matrix Decomposition Perspective to Multiple Graph Matching

Hierarchical Convolutional Features for Visual Tracking

S^3-Face: SSS-Compliant Facial Reflectance Estimation via Diffusion Priors

Variational Few-Shot Learning

Learning Combinatorial Embedding Networks for Deep Graph Matching

Learning To Track Objects From Unlabeled Videos

Self-Supervised Character-to-Character Distillation for Text Recognition

Dual Aggregation Transformer for Image Super-Resolution

ROME: Robustifying Memory-Efficient NAS via Topology Disentanglement and Gradient Accumulation

ActFormer: A GAN-based Transformer towards General Action-Conditioned 3D Human Motion Generation

Layered Neighborhood Expansion for Incremental Multiple Graph Matching

Hierarchical Style-based Networks for Motion Synthesis

Robust Tracking against Adversarial Attacks

Semantic Equivalent Adversarial Data Augmentation for Visual Question Answering

EAutoDet: Efficient Architecture Search for Object Detection

Self-Supervised Learning of Visual Graph Matching

Performance Guaranteed Network Acceleration via High-Order Residual Quantization

OSDFace: One-Step Diffusion Model for Face Restoration

Marten: Visual Question Answering with Mask Generation for Multi-modal Document Understanding

Star with Bilinear Mapping

Domain Generalization in CLIP via Learning with Diverse Text Prompts

PassionSR: Post-Training Quantization with Adaptive Scale in One-Step Diffusion based Image Super-Resolution

Generalized Tensor-based Parameter-Efficient Fine-Tuning via Lie Group Transformations

QuantCache: Adaptive Importance-Guided Quantization with Hierarchical Latent and Layer Caching for Video Generation

A Token-level Text Image Foundation Model for Document Understanding

HAODiff: Human-Aware One-Step Diffusion via Dual-Prompt Guidance

DAWP: A framework for global observation forecasting via Data Assimilation and Weather Prediction in satellite observation space

FATE: Feature-Adapted Parameter Tuning for Vision-Language Models

SAM-PARSER: Fine-Tuning SAM Efficiently by Parameter Space Reconstruction

LERE: Learning-Based Low-Rank Matrix Recovery with Rank Estimation

Inter-X: Towards Versatile Human-Human Interaction Analysis

ReGenNet: Towards Human Action-Reaction Synthesis

CasCast: Skillful High-resolution Precipitation Nowcasting via Cascaded Modelling

Cross-Scene Crowd Counting via Deep Convolutional Neural Networks

Motion Part Regularization: Improving Action Recognition via Trajectory Selection

Video Prediction via Selective Sampling

Graduated Assignment for Joint Multi-Graph Matching and Clustering with Application to Unsupervised Graph Matching Network Learning

A Bi-Level Framework for Learning to Solve Combinatorial Optimization on Graphs

Perceptual Attacks of No-Reference Image Quality Models with Human-in-the-Loop

ZARTS: On Zero-order Optimization for Neural Architecture Search

Iso-Dream: Isolating and Leveraging Noncontrollable Visual Dynamics in World Models

CageNeRF: Cage-based Neural Radiance Field for Generalized 3D Deformation and Animation

Adv-Attribute: Inconspicuous and Transferable Adversarial Attack on Face Recognition

NeRF-IBVS: Visual Servo Based on NeRF for Visual Localization and Navigation