Hao Tang

63 Papers
207 Total Citations

Papers (63)

Delving into Multimodal Prompting for Fine-Grained Visual Classification

AAAI 2024 (arXiv)
55
citations

Stable-Hair: Real-World Hair Transfer via Diffusion Model

AAAI 2025
33
citations

G2P-DDM: Generating Sign Pose Sequence from Gloss Sequence with Discrete Diffusion Model

AAAI 2024 (arXiv)
31
citations

HandDiff: 3D Hand Pose Estimation with Diffusion on Image-Point Cloud

CVPR 2024
17
citations

Token Transformation Matters: Towards Faithful Post-hoc Explanation for Vision Transformer

CVPR 2024
16
citations

MambaIC: State Space Models for High-Performance Learned Image Compression

CVPR 2025
14
citations

Distilling ODE Solvers of Diffusion Models into Smaller Steps

CVPR 2024
10
citations

DiffFNO: Diffusion Fourier Neural Operator

CVPR 2025
8
citations

RobustMerge: Parameter-Efficient Model Merging for MLLMs with Direction Robustness

NeurIPS 2025
7
citations

Learning with Unreliability: Fast Few-shot Voxel Radiance Fields with Relative Geometric Consistency

CVPR 2024
5
citations

Towards Robust 3D Pose Transfer with Adversarial Learning

CVPR 2024
5
citations

A Training-free Synthetic Data Selection Method for Semantic Segmentation

AAAI 2025
4
citations

DynImg: Key Frames with Visual Prompts are Good Representation for Multi-Modal Video Understanding

ICCV 2025
1
citations

Boosting Adversarial Transferability with Spatial Adversarial Alignment

NeurIPS 2025
1
citations

Local Class-Specific and Global Image-Level Generative Adversarial Networks for Semantic-Guided Scene Generation

CVPR 2020 (arXiv)
0
citations

DF-GAN: A Simple and Effective Baseline for Text-to-Image Synthesis

CVPR 2022
0
citations

Learning To Restore 3D Face From In-the-Wild Degraded Images

CVPR 2022
0
citations

MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation

CVPR 2022 (arXiv)
0
citations

Topology-Preserving Shape Reconstruction and Registration via Neural Diffeomorphic Flow

CVPR 2022 (arXiv)
0
citations

Physically-Guided Disentangled Implicit Rendering for 3D Face Modeling

CVPR 2022
0
citations

Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-Language Model

CVPR 2022 (arXiv)
0
citations

Graph Transformer GANs for Graph-Constrained House Generation

CVPR 2023 (arXiv)
0
citations

SMAE: Few-Shot Learning for HDR Deghosting With Saturation-Aware Masked Autoencoders

CVPR 2023 (arXiv)
0
citations

Unsupervised Deep Probabilistic Approach for Partial Point Cloud Registration

CVPR 2023 (arXiv)
0
citations

GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis

CVPR 2023 (arXiv)
0
citations

DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network

CVPR 2023 (arXiv)
0
citations

Pruning Parameterization With Bi-Level Optimization for Efficient Semantic Segmentation on the Edge

CVPR 2023
0
citations

Master: Meta Style Transformer for Controllable Zero-Shot and Few-Shot Artistic Style Transfer

CVPR 2023 (arXiv)
0
citations

Intrinsic-Extrinsic Preserved GANs for Unsupervised 3D Pose Transfer

ICCV 2021 (arXiv)
0
citations

Transformer-Based Attention Networks for Continuous Pixel-Wise Prediction

ICCV 2021 (arXiv)
0
citations

Recurrent Mask Refinement for Few-Shot Medical Image Segmentation

ICCV 2021 (arXiv)
0
citations

Learning Concordant Attention via Target-aware Alignment for Visible-Infrared Person Re-identification

ICCV 2023
0
citations

UniTR: A Unified and Efficient Multi-Modal Transformer for Bird's-Eye-View Representation

ICCV 2023
0
citations

XingGAN for Person Image Generation

ECCV 2020
0
citations

PPT: Token-Pruned Pose Transformer for Monocular and Multi-View Human Pose Estimation

ECCV 2022
0
citations

Toward Adaptive Large Language Models Structured Pruning via Hybrid-grained Weight Importance Assessment

AAAI 2025
0
citations

PartRM: Modeling Part-Level Dynamics with Large Cross-State Reconstruction Model

CVPR 2025
0
citations

HOIGPT: Learning Long-Sequence Hand-Object Interaction with Language Models

CVPR 2025
0
citations

Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass

CVPR 2025
0
citations

Similarity Memory Prior is All You Need for Medical Image Segmentation

ICCV 2025
0
citations

MaskSAM: Auto-prompt SAM with Mask Classification for Volumetric Medical Image Segmentation

ICCV 2025
0
citations

ARNet: Self-Supervised FG-SBIR with Unified Sample Feature Alignment and Multi-Scale Token Recycling

AAAI 2025
0
citations

Multi-scale Activation, Refinement, and Aggregation: Exploring Diverse Cues for Fine-Grained Bird Recognition

AAAI 2025
0
citations

SSR-Encoder: Encoding Selective Subject Representation for Subject-Driven Generation

CVPR 2024
0
citations

ICON: Incremental CONfidence for Joint Pose and Radiance Field Optimization

CVPR 2024
0
citations

Versatile Navigation Under Partial Observability via Value-guided Diffusion Policy

CVPR 2024
0
citations

On the Faithfulness of Vision Transformer Explanations

CVPR 2024
0
citations

Structured Attention Guided Convolutional Neural Fields for Monocular Depth Estimation

CVPR 2018 (arXiv)
0
citations

Multi-Channel Attention Selection GAN With Cascaded Semantic Guidance for Cross-View Image Translation

CVPR 2019
0
citations

3D-Aware Semantic-Guided Generative Model for Human Synthesis

ECCV 2022
0
citations

Towards Interpretable Video Super-Resolution via Alternating Optimization

ECCV 2022
0
citations

Compiler-Aware Neural Architecture Search for On-Mobile Real-Time Super-Resolution

ECCV 2022
0
citations

Mining Relations among Cross-Frame Affinities for Video Semantic Segmentation

ECCV 2022
0
citations

EgoTracks: A Long-term Egocentric Visual Object Tracking Dataset

NeurIPS 2023
0
citations

Belief Propagation Neural Networks

NeurIPS 2020
0
citations

Refactoring Policy for Compositional Generalizability using Self-Supervised Object Proposals

NeurIPS 2020
0
citations

Towards Scale-Invariant Graph-related Problem Solving by Iterative Homogeneous GNNs

NeurIPS 2020
0
citations

HotBEV: Hardware-oriented Transformer-based Multi-View 3D Detector for BEV Perception

NeurIPS 2023
0
citations

PackQViT: Faster Sub-8-bit Vision Transformers via Full and Packed Quantization on the Mobile

NeurIPS 2023
0
citations

SPViT: Enabling Faster Vision Transformers via Latency-Aware Soft Token Pruning

ECCV 2022
0
citations

LART: Neural Correspondence Learning with Latent Regularization Transformer for 3D Motion Transfer

NeurIPS 2023
0
citations

Does Graph Distillation See Like Vision Dataset Counterpart?

NeurIPS 2023
0
citations

Object Reprojection Error (ORE): Camera pose benchmarks from lightweight tracking annotations

NeurIPS 2023
0
citations