Hao Tang

63 Papers

207 Total Citations

Papers (63)

Delving into Multimodal Prompting for Fine-Grained Visual Classification

Stable-Hair: Real-World Hair Transfer via Diffusion Model

G2P-DDM: Generating Sign Pose Sequence from Gloss Sequence with Discrete Diffusion Model

HandDiff: 3D Hand Pose Estimation with Diffusion on Image-Point Cloud

Token Transformation Matters: Towards Faithful Post-hoc Explanation for Vision Transformer

MambaIC: State Space Models for High-Performance Learned Image Compression

Distilling ODE Solvers of Diffusion Models into Smaller Steps

DiffFNO: Diffusion Fourier Neural Operator

RobustMerge: Parameter-Efficient Model Merging for MLLMs with Direction Robustness

Learning with Unreliability: Fast Few-shot Voxel Radiance Fields with Relative Geometric Consistency

Towards Robust 3D Pose Transfer with Adversarial Learning

A Training-free Synthetic Data Selection Method for Semantic Segmentation

DynImg: Key Frames with Visual Prompts are Good Representation for Multi-Modal Video Understanding

Boosting Adversarial Transferability with Spatial Adversarial Alignment

Local Class-Specific and Global Image-Level Generative Adversarial Networks for Semantic-Guided Scene Generation

DF-GAN: A Simple and Effective Baseline for Text-to-Image Synthesis

Learning To Restore 3D Face From In-the-Wild Degraded Images

MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation

Topology-Preserving Shape Reconstruction and Registration via Neural Diffeomorphic Flow

Physically-Guided Disentangled Implicit Rendering for 3D Face Modeling

Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-Language Model

Graph Transformer GANs for Graph-Constrained House Generation

SMAE: Few-Shot Learning for HDR Deghosting With Saturation-Aware Masked Autoencoders

Unsupervised Deep Probabilistic Approach for Partial Point Cloud Registration

GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis

DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network

Pruning Parameterization With Bi-Level Optimization for Efficient Semantic Segmentation on the Edge

Master: Meta Style Transformer for Controllable Zero-Shot and Few-Shot Artistic Style Transfer

Intrinsic-Extrinsic Preserved GANs for Unsupervised 3D Pose Transfer

Transformer-Based Attention Networks for Continuous Pixel-Wise Prediction

Recurrent Mask Refinement for Few-Shot Medical Image Segmentation

Learning Concordant Attention via Target-aware Alignment for Visible-Infrared Person Re-identification

UniTR: A Unified and Efficient Multi-Modal Transformer for Bird's-Eye-View Representation

XingGAN for Person Image Generation

PPT: Token-Pruned Pose Transformer for Monocular and Multi-View Human Pose Estimation

Toward Adaptive Large Language Models Structured Pruning via Hybrid-grained Weight Importance Assessment

PartRM: Modeling Part-Level Dynamics with Large Cross-State Reconstruction Model

HOIGPT: Learning Long-Sequence Hand-Object Interaction with Language Models

Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass

Similarity Memory Prior is All You Need for Medical Image Segmentation

MaskSAM: Auto-prompt SAM with Mask Classification for Volumetric Medical Image Segmentation

ARNet: Self-Supervised FG-SBIR with Unified Sample Feature Alignment and Multi-Scale Token Recycling

Multi-scale Activation, Refinement, and Aggregation: Exploring Diverse Cues for Fine-Grained Bird Recognition

SSR-Encoder: Encoding Selective Subject Representation for Subject-Driven Generation

ICON: Incremental CONfidence for Joint Pose and Radiance Field Optimization

Versatile Navigation Under Partial Observability via Value-guided Diffusion Policy

On the Faithfulness of Vision Transformer Explanations

Structured Attention Guided Convolutional Neural Fields for Monocular Depth Estimation

Multi-Channel Attention Selection GAN With Cascaded Semantic Guidance for Cross-View Image Translation

3D-Aware Semantic-Guided Generative Model for Human Synthesis

Towards Interpretable Video Super-Resolution via Alternating Optimization

Compiler-Aware Neural Architecture Search for On-Mobile Real-Time Super-Resolution

Mining Relations among Cross-Frame Affinities for Video Semantic Segmentation

EgoTracks: A Long-term Egocentric Visual Object Tracking Dataset

Belief Propagation Neural Networks

Refactoring Policy for Compositional Generalizability using Self-Supervised Object Proposals

Towards Scale-Invariant Graph-related Problem Solving by Iterative Homogeneous GNNs

HotBEV: Hardware-oriented Transformer-based Multi-View 3D Detector for BEV Perception

PackQViT: Faster Sub-8-bit Vision Transformers via Full and Packed Quantization on the Mobile

SPViT: Enabling Faster Vision Transformers via Latency-Aware Soft Token Pruning

LART: Neural Correspondence Learning with Latent Regularization Transformer for 3D Motion Transfer

Does Graph Distillation See Like Vision Dataset Counterpart?

Object Reprojection Error (ORE): Camera pose benchmarks from lightweight tracking annotations