Dacheng Tao

Papers: 217

Total Citations: 1,232

Papers (217)

MUlti-Store Tracker (MUSTer): A Cognitive Psychology Inspired Approach to Object Tracking

R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization

CNNpack: Packing Convolutional Neural Networks in the Frequency Domain

Divide, Conquer and Combine: A Training-Free Framework for High-Resolution Image Perception in Multimodal Large Language Models

Revisiting Backdoor Attacks against Large Vision-Language Models from Domain Shift

Revisiting Plasticity in Visual Reinforcement Learning: Data, Modules and Training Stages

SimDistill: Simulated Multi-Modal Distillation for BEV 3D Object Detection

One More Step: A Versatile Plug-and-Play Module for Rectifying Diffusion Schedule Flaws and Enhancing Low-Frequency Controls

Synergy of Sight and Semantics: Visual Intention Understanding with CLIP

MMReason: An Open-Ended Multi-Modal Multi-Step Reasoning Benchmark for MLLMs Toward AGI

Network Sparsity Unlocks the Scaling Potential of Deep Reinforcement Learning

Free-Form Motion Control: Controlling the 6D Poses of Camera and Objects in Video Generation

Mastering Massive Multi-Task Reinforcement Learning via Mixture-of-Expert Decision Transformer

ICLShield: Exploring and Mitigating In-Context Learning Backdoor Attacks

Learning system dynamics without forgetting

Adaptive Defense against Harmful Fine-Tuning for Large Language Models via Bayesian Data Scheduler

AiDE-Q: Synthetic Labeled Datasets Can Enhance Learning Models for Quantum Property Estimation

LLM Data Selection and Utilization via Dynamic Bi-level Optimization

Sparse Model Inversion: Efficient Inversion of Vision Transformers for Data-Free Applications

Merging Multi-Task Models via Weight-Ensembling Mixture of Experts

Generalization Analysis of Stochastic Weight Averaging with General Sampling

Task Groupings Regularization: Data-Free Meta-Learning with Heterogeneous Pre-trained Models

Representation Surgery for Multi-Task Model Merging

Confronting Reward Overoptimization for Diffusion Models: A Perspective of Inductive and Primacy Biases

Saliency Propagation From Simple to Difficult

FaLRR: A Fast Low Rank Representation Solver

A Maximum Entropy Feature Descriptor for Age Invariant Face Recognition

Occlusion Boundary Detection via Deep Exploration of Context

Part-Stacked CNN for Fine-Grained Visual Categorization

Conditional Graphical Lasso for Multi-Label Image Classification

Multilinear Hyperplane Hashing

Improving Training of Deep Neural Networks via Singular Value Bounding

On Compressing Deep Models by Low Rank and Sparse Decomposition

Geometry-Aware Scene Text Detection With Instance Transformation Network

Deep Ordinal Regression Network for Monocular Depth Estimation

Self-Supervised Adversarial Hashing Networks for Cross-Modal Retrieval

An Efficient and Provable Approach for Mixture Proportion Estimation Using Linear Independence Assumption

LoRA Recycle: Unlocking Tuning-Free Few-Shot Adaptability in Visual Foundation Models by Recycling Pre-Tuned LoRAs

Geometry-Consistent Generative Adversarial Networks for One-Sided Unsupervised Domain Mapping

DistillHash: Unsupervised Deep Hashing by Distilling Data Pairs

On Exploring Undetermined Relationships for Visual Relationship Detection

Deep Modular Co-Attention Networks for Visual Question Answering

World From Blur

Geometry-Aware Symmetric Domain Adaptation for Monocular Depth Estimation

Self-Supervised Representation Learning by Rotation Feature Decoupling

Image-Question-Answer Synergistic Network for Visual Dialog

Fast Spatio-Temporal Residual Network for Video Super-Resolution

GPS-Net: Graph Property Sensing Network for Scene Graph Generation

Recurrent Feature Reasoning for Image Inpainting

On Positive-Unlabeled Classification in GAN

Distilling Knowledge From Graph Convolutional Networks

Learning Oracle Attention for High-Fidelity Face Completion

Syntax-Aware Action Targeting for Video Captioning

Context Aware Graph Convolution for Skeleton-Based Action Recognition

FeatureFlow: Robust Video Interpolation via Structure-to-Texture Generation

Learning Unseen Concepts via Hierarchical Decomposition and Composition

PuppeteerGAN: Arbitrary Portrait Animation With Semantic-Aware Appearance Transformation

AdderSR: Towards Energy Efficient Image Super-Resolution

Online Multiple Object Tracking With Cross-Task Synergy

Scene Essence

HourNAS: Extremely Fast Neural Architecture Search Through an Hourglass Lens

Tree-Like Decision Distillation

Learning Progressive Point Embeddings for 3D Point Cloud Generation

Turning Frequency to Resolution: Video Super-Resolution via Event Cameras

Glance and Gaze: Inferring Action-Aware Points for One-Stage Human-Object Interaction Detection

Where and What? Examining Interpretable Disentangled Representations

Detecting Human-Object Interaction via Fabricated Compositional Learning

Affordance Transfer Learning for Human-Object Interaction Detection

Manifold Regularized Dynamic Network Pruning

Amalgamating Knowledge From Heterogeneous Graph Neural Networks

Contrastive Boundary Learning for Point Cloud Segmentation

Alleviating Semantics Distortion in Unsupervised Low-Level Image-to-Image Translation via Structure Consistency Constraint

BatchFormer: Learning To Explore Sample Relationships for Robust Representation Learning

DearKD: Data-Efficient Early Knowledge Distillation for Vision Transformers

GMFlow: Learning Optical Flow via Global Matching

Recurrent Glimpse-Based Decoder for Detection With Transformer

Learning To Collaborate in Decentralized Learning of Personalized Models

ART-Point: Improving Rotation Robustness of Point Cloud Classifiers via Adversarial Rotation

Source-Free Domain Adaptation via Distribution Estimation

Distillation Using Oracle Queries for Transformer-Based Human-Object Interaction Detection

Defensive Patches for Robust Recognition in the Physical World

HL-Net: Heterophily Learning Network for Scene Graph Generation

Modeling Image Composition for Complex Scene Generation

Learning Affordance Grounding From Exocentric Images

Few-Shot Backdoor Defense Using Shapley Estimation

Patch Slimming for Efficient Vision Transformers

RU-Net: Regularized Unrolling Network for Scene Graph Generation

Continual Learning With Lifelong Vision Transformer

Self-Augmented Unpaired Image Dehazing via Density and Depth Decomposition

FIBA: Frequency-Injection Based Backdoor Attack in Medical Image Analysis

Bridged Transformer for Vision and Point Cloud 3D Object Detection

Fine-Tuning Global Model via Data-Free Knowledge Distillation for Non-IID Federated Learning

Dynamic Focus-Aware Positional Queries for Semantic Segmentation

Leverage Interactive Affinity for Affordance Learning

Upcycling Models Under Domain and Category Shift

Learnable Skeleton-Aware 3D Point Cloud Sampling

CLAMP: Prompt-Based Contrastive Learning for Connecting Language and Animal Pose

Exploring the Relationship Between Architectural Design and Adversarially Robust Generalization

Generating Holistic 3D Human Motion From Speech

Architecture, Dataset and Model-Scale Agnostic Data-Free Meta-Learning

DeepSolo: Let Transformer Decoder With Explicit Points Solo for Text Spotting

Make Landscape Flatter in Differentially Private Federated Learning

From Images to Textual Prompts: Zero-Shot Visual Question Answering With Frozen Large Language Models

Deep Graph Reprogramming

TriDet: Temporal Action Detection With Relative Boundary Modeling

Referring Image Matting

Augmenting Strong Supervision Using Web Data for Fine-Grained Categorization

Multi-Modal Factorized Bilinear Pooling With Co-Attention Learning for Visual Question Answering

Centered Weight Normalization in Accelerating Training of Deep Neural Networks

A Coarse-Fine Network for Keypoint Localization

A Joint Intrinsic-Extrinsic Prior Model for Retinex

Self-Supervised Representation Learning From Multi-Domain Data

Approximated Bilinear Modules for Temporal Modeling

Asymmetric Cross-Guided Attention Network for Actor and Action Video Segmentation From Natural Language Query

Progressive Reconstruction of Visual Structure for Image Inpainting

Deep Reinforcement Active Learning for Human-in-the-Loop Person Re-Identification

Deep Metric Learning With Tuplet Margin Loss

Not All Parts Are Created Equal: 3D Pose Estimation by Modeling Bi-Directional Dependencies of Body Parts

Learning a Mixture of Granularity-Specific Experts for Fine-Grained Categorization

Collect and Select: Semantic Alignment Metric Learning for Few-Shot Learning

Out-of-Boundary View Synthesis Towards Full-Frame Video Stabilization

Meta-Aggregator: Learning To Aggregate for 1-Bit Graph Neural Networks

Adaptive Curriculum Learning

SynFace: Face Recognition With Synthetic Data

Stochastic Partial Swap: Enhanced Model Generalization and Interpretability for Fine-Grained Recognition

TGRNet: A Table Graph Reconstruction Network for Table Structure Recognition

DDS2M: Self-Supervised Denoising Diffusion Spatio-Spectral Model for Hyperspectral Image Restoration

Exploring Temporal Concurrency for Video-Language Representation Learning

Domain Specified Optimization for Deployment Authorization

Sensitivity-Aware Visual Parameter-Efficient Fine-Tuning

Knowledge-Aware Federated Active Learning with Non-IID Data

Class-Aware Patch Embedding Adaptation for Few-Shot Image Classification

Short-Term and Long-Term Context Aggregation Network for Video Inpainting

Hallucinating Visual Instances in Total Absentia

Learning Disentangled Representations with Latent Variation Predictability

Symbiotic Adversarial Learning for Attribute-based Person Search

Visual Compositional Learning for Human-Object Interaction Detection

Spatiotemporal Attacks for Embodied Agents

Polysemy Deciphering Network for Human-Object Interaction Detection

Learning Propagation Rules for Attribution Map Generation

On Dropping Clusters to Regularize Graph Convolutional Neural Networks

Learning Graph Neural Networks for Image Style Transfer

Towards Data-Efficient Detection Transformers

ReAct: Temporal Action Detection with Relational Queries

Online Continual Learning with Contrastive Vision Transformer

VSA: Learning Varied-Size Window Attention in Vision Transformers

Balancing Stability and Plasticity through Advanced Null Space in Continual Learning

Discovering Human-Object Interaction Concepts via Self-Compositional Learning

PolyphonicFormer: Unified Query Learning for Depth-Aware Video Panoptic Segmentation

Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation

RegionCL: Exploring Contrastive Region Pairs for Self-Supervised Representation Learning

BMD: A General Class-Balanced Multicentric Dynamic Prototype Strategy for Source-Free Domain Adaptation

Fashionformer: A Simple, Effective and Unified Baseline for Human Fashion Segmentation and Recognition

Towards Scale-Aware, Robust, and Generalizable Unsupervised Monocular Depth Estimation by Integrating IMU Motion Dynamics

ST-P3: End-to-End Vision-Based Autonomous Driving via Spatial-Temporal Feature Learning

JPerceiver: Joint Perception Network for Depth, Pose and Layout Estimation in Driving Scenes

MirrorGAN: Learning Text-To-Image Generation by Redescription

Neuron: Learning Context-Aware Evolving Representations for Zero-Shot Skeleton Action Recognition

Harnessing Text-to-Image Diffusion Models for Point Cloud Self-Supervised Learning

CopyrightShield: Enhancing Diffusion Model Security Against Copyright Infringement Attacks

Rethink Sparse Signals for Pose-guided Text-to-image Generation

Analytic Energy-Guided Policy Optimization for Offline Reinforcement Learning

Improving Complex Reasoning over Knowledge Graph with Logic-Aware Curriculum Tuning

Modeling All Response Surfaces in One for Conditional Search Spaces

TD²-Net: Toward Denoising and Debiasing for Video Scene Graph Generation

Multi-Step Denoising Scheduled Sampling: Towards Alleviating Exposure Bias for Diffusion Models

Sheared Backpropagation for Fine-tuning Foundation Models

UniMix: Towards Domain Adaptive and Generalizable LiDAR Semantic Segmentation in Adverse Weather

FREE: Faster and Better Data-Free Meta-Learning

Local-consistent Transformation Learning for Rotation-invariant Point Cloud Analysis

Learn from Downstream and Be Yourself in Multimodal Large Language Models Fine-Tuning

HarmoDT: Harmony Multi-Task Decision Transformer for Offline Reinforcement Learning

Q-value Regularized Transformer for Offline Reinforcement Learning

Towards Theoretical Understandings of Self-Consuming Generative Models

Learning Versatile Filters for Efficient Convolutional Neural Networks

Dual Swap Disentangling

LIIR: Learning Individual Intrinsic Reward in Multi-Agent Reinforcement Learning

Theoretical Analysis of Adversarial Learning: A Minimax Approach

Likelihood-Free Overcomplete ICA and Applications In Causal Discovery

NeurIPS 2019

Category Anchor-Guided Unsupervised Domain Adaptation for Semantic Segmentation

Learning from Bad Data via Generation

Positive-Unlabeled Compression on the Cloud

Learn, Imagine and Create: Text-to-Image Generation from Prior Knowledge

Control Batch Size and Learning Rate to Generalize Well: Theoretical and Empirical Evidence

Auto Learning Attention

Searching for Low-Bit Weights in Quantized Neural Networks

Part-dependent Label Noise: Towards Instance-dependent Label Noise

SCOP: Scientific Control for Reliable Neural Network Pruning

Video Frame Interpolation without Temporal Priors

Hard Example Generation by Texture Synthesis for Cross-domain Shape Similarity Learning

Domain Generalization via Entropy Regularization

Class-Disentanglement and Applications in Adversarial Detection and Defense

Gauge Equivariant Transformer

ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias

CGLB: Benchmark Tasks for Continual Graph Learning

APT-36K: A Large-scale Benchmark for Animal Pose Estimation and Tracking

Benefits of Permutation-Equivariance in Auction Mechanisms

Escaping from the Barren Plateau via Gaussian Initializations in Deep Variational Quantum Circuits

Adversarial Auto-Augment with Label Preservation: A Representation Learning Principle Guided Approach

Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach

Inducing Neural Collapse in Imbalanced Learning: Do We Really Need a Learnable Classifier at the End of Deep Neural Network?

ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation

VanillaNet: the Power of Minimalism in Deep Learning

SAMRS: Scaling-up Remote Sensing Segmentation Dataset with Segment Anything Model

Extending the Design Space of Graph Neural Networks by Rethinking Folklore Weisfeiler-Lehman

MAG-GNN: Reinforcement Learning Boosted Graph Neural Network

ConDaFormer: Disassembled Transformer with Local Structure Enhancement for 3D Point Cloud Understanding

Stability and Generalization of the Decentralized Stochastic Gradient Descent Ascent Algorithm

Cocktail: Mixing Multi-Modality Control for Text-Conditional Image Generation

Domain Re-Modulation for Few-Shot Generative Domain Adaptation

Learning Better with Less: Effective Augmentation for Sample-Efficient Visual Reinforcement Learning

All Points Matter: Entropy-Regularized Distribution Alignment for Weakly-supervised 3D Segmentation

Understanding How Consistency Works in Federated Learning via Stage-wise Relaxed Initialization

Discovering Temporal Causal Relations from Subsampled Data

Domain Adaptation with Conditional Transferable Components

Algorithmic Stability and Hypothesis Complexity

Beyond Filters: Compact Feature Map for Portable Deep Model