Dacheng Tao

Papers: 100

Total Citations: 398

Papers (100)

R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization

Divide, Conquer and Combine: A Training-Free Framework for High-Resolution Image Perception in Multimodal Large Language Models

Revisiting Backdoor Attacks against Large Vision-Language Models from Domain Shift

Revisiting Plasticity in Visual Reinforcement Learning: Data, Modules and Training Stages

SimDistill: Simulated Multi-Modal Distillation for BEV 3D Object Detection

One More Step: A Versatile Plug-and-Play Module for Rectifying Diffusion Schedule Flaws and Enhancing Low-Frequency Controls

MMReason: An Open-Ended Multi-Modal Multi-Step Reasoning Benchmark for MLLMs Toward AGI

Synergy of Sight and Semantics: Visual Intention Understanding with CLIP

Network Sparsity Unlocks the Scaling Potential of Deep Reinforcement Learning

Free-Form Motion Control: Controlling the 6D Poses of Camera and Objects in Video Generation

Mastering Massive Multi-Task Reinforcement Learning via Mixture-of-Expert Decision Transformer

Adaptive Defense against Harmful Fine-Tuning for Large Language Models via Bayesian Data Scheduler

NeurIPS 2025 (arXiv)

Learning system dynamics without forgetting

ICLShield: Exploring and Mitigating In-Context Learning Backdoor Attacks

AiDE-Q: Synthetic Labeled Datasets Can Enhance Learning Models for Quantum Property Estimation

NeurIPS 2025 (arXiv)

LLM Data Selection and Utilization via Dynamic Bi-level Optimization

LoRA Recycle: Unlocking Tuning-Free Few-Shot Adaptability in Visual Foundation Models by Recycling Pre-Tuned LoRAs

Neuron: Learning Context-Aware Evolving Representations for Zero-Shot Skeleton Action Recognition

Harnessing Text-to-Image Diffusion Models for Point Cloud Self-Supervised Learning

CopyrightShield: Enhancing Diffusion Model Security Against Copyright Infringement Attacks

Rethink Sparse Signals for Pose-guided Text-to-image Generation

Analytic Energy-Guided Policy Optimization for Offline Reinforcement Learning

NeurIPS 2025 (arXiv)

Improving Complex Reasoning over Knowledge Graph with Logic-Aware Curriculum Tuning

Modeling All Response Surfaces in One for Conditional Search Spaces

TD²-Net: Toward Denoising and Debiasing for Video Scene Graph Generation

Multi-Step Denoising Scheduled Sampling: Towards Alleviating Exposure Bias for Diffusion Models

Sheared Backpropagation for Fine-tuning Foundation Models

UniMix: Towards Domain Adaptive and Generalizable LiDAR Semantic Segmentation in Adverse Weather

FREE: Faster and Better Data-Free Meta-Learning

Local-consistent Transformation Learning for Rotation-invariant Point Cloud Analysis

Learn from Downstream and Be Yourself in Multimodal Large Language Models Fine-Tuning

HarmoDT: Harmony Multi-Task Decision Transformer for Offline Reinforcement Learning

Q-value Regularized Transformer for Offline Reinforcement Learning

Towards Theoretical Understandings of Self-Consuming Generative Models

Sparse Model Inversion: Efficient Inversion of Vision Transformers for Data-Free Applications

Merging Multi-Task Models via Weight-Ensembling Mixture of Experts

Generalization Analysis of Stochastic Weight Averaging with General Sampling

Task Groupings Regularization: Data-Free Meta-Learning with Heterogeneous Pre-trained Models

Representation Surgery for Multi-Task Model Merging

Confronting Reward Overoptimization for Diffusion Models: A Perspective of Inductive and Primacy Biases

GPS-Net: Graph Property Sensing Network for Scene Graph Generation

Recurrent Feature Reasoning for Image Inpainting

On Positive-Unlabeled Classification in GAN

Distilling Knowledge From Graph Convolutional Networks

Learning Oracle Attention for High-Fidelity Face Completion

Syntax-Aware Action Targeting for Video Captioning

Context Aware Graph Convolution for Skeleton-Based Action Recognition

FeatureFlow: Robust Video Interpolation via Structure-to-Texture Generation

Learning Unseen Concepts via Hierarchical Decomposition and Composition

PuppeteerGAN: Arbitrary Portrait Animation With Semantic-Aware Appearance Transformation

AdderSR: Towards Energy Efficient Image Super-Resolution

Online Multiple Object Tracking With Cross-Task Synergy

Scene Essence

HourNAS: Extremely Fast Neural Architecture Search Through an Hourglass Lens

Tree-Like Decision Distillation

Learning Progressive Point Embeddings for 3D Point Cloud Generation

Turning Frequency to Resolution: Video Super-Resolution via Event Cameras

Glance and Gaze: Inferring Action-Aware Points for One-Stage Human-Object Interaction Detection

Where and What? Examining Interpretable Disentangled Representations

Detecting Human-Object Interaction via Fabricated Compositional Learning

Affordance Transfer Learning for Human-Object Interaction Detection

Manifold Regularized Dynamic Network Pruning

Amalgamating Knowledge From Heterogeneous Graph Neural Networks

Contrastive Boundary Learning for Point Cloud Segmentation

Alleviating Semantics Distortion in Unsupervised Low-Level Image-to-Image Translation via Structure Consistency Constraint

BatchFormer: Learning To Explore Sample Relationships for Robust Representation Learning

DearKD: Data-Efficient Early Knowledge Distillation for Vision Transformers

GMFlow: Learning Optical Flow via Global Matching

Recurrent Glimpse-Based Decoder for Detection With Transformer

Learning To Collaborate in Decentralized Learning of Personalized Models

ART-Point: Improving Rotation Robustness of Point Cloud Classifiers via Adversarial Rotation

Source-Free Domain Adaptation via Distribution Estimation

Distillation Using Oracle Queries for Transformer-Based Human-Object Interaction Detection

Defensive Patches for Robust Recognition in the Physical World

HL-Net: Heterophily Learning Network for Scene Graph Generation

Modeling Image Composition for Complex Scene Generation

Learning Affordance Grounding From Exocentric Images

Few-Shot Backdoor Defense Using Shapley Estimation

Patch Slimming for Efficient Vision Transformers

RU-Net: Regularized Unrolling Network for Scene Graph Generation

Continual Learning With Lifelong Vision Transformer

Self-Augmented Unpaired Image Dehazing via Density and Depth Decomposition

FIBA: Frequency-Injection Based Backdoor Attack in Medical Image Analysis

Bridged Transformer for Vision and Point Cloud 3D Object Detection

Fine-Tuning Global Model via Data-Free Knowledge Distillation for Non-IID Federated Learning

Dynamic Focus-Aware Positional Queries for Semantic Segmentation

Leverage Interactive Affinity for Affordance Learning

Upcycling Models Under Domain and Category Shift

Learnable Skeleton-Aware 3D Point Cloud Sampling

CLAMP: Prompt-Based Contrastive Learning for Connecting Language and Animal Pose

Exploring the Relationship Between Architectural Design and Adversarially Robust Generalization

Generating Holistic 3D Human Motion From Speech

Architecture, Dataset and Model-Scale Agnostic Data-Free Meta-Learning

DeepSolo: Let Transformer Decoder With Explicit Points Solo for Text Spotting

Make Landscape Flatter in Differentially Private Federated Learning

From Images to Textual Prompts: Zero-Shot Visual Question Answering With Frozen Large Language Models

Deep Graph Reprogramming

TriDet: Temporal Action Detection With Relative Boundary Modeling

Referring Image Matting

Out-of-Boundary View Synthesis Towards Full-Frame Video Stabilization