Hao Chen

Papers: 112

Total Citations: 380

Affiliations (1)

CMU

Papers (112)

VoCo: A Simple-yet-Effective Volume Contrastive Learning Framework for 3D Medical Image Analysis

ImageFolder: Autoregressive Image Generation with Folded Tokens

What Matters When Repurposing Diffusion Models for General Dense Perception Tasks?

SoftVQ-VAE: Efficient 1-Dimensional Continuous Tokenizer

360+x: A Panoptic Multi-modal Scene Understanding Dataset

OSV: One Step is Enough for High-Quality Image to Video Generation

Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration

FreeCompose: Generic Zero-Shot Image Composition with Diffusion Prior

SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories

WeatherGFM: Learning a Weather Generalist Foundation Model via In-context Learning

Chain of Attack: On the Robustness of Vision-Language Models Against Transfer-Based Adversarial Attacks

DiffCalib: Reformulating Monocular Camera Calibration as Diffusion-Based Dense Incident Map Generation

TG-LLaVA: Text Guided LLaVA via Learnable Latent Embeddings

FOCUS: Knowledge-enhanced Adaptive Visual Compression for Few-shot Whole Slide Image Classification

Fast Encoding and Decoding for Implicit Video Representation

PEACE: Empowering Geologic Map Holistic Understanding with MLLMs

SwitchLingua: The First Large-Scale Multilingual and Multi-Ethnic Code-Switching Dataset

Distilled Prompt Learning for Incomplete Multimodal Survival Prediction

Improving Multimodal Learning Balance and Sufficiency through Data Remixing

SDP-CROWN: Efficient Bound Propagation for Neural Network Verification with Tightness of Semidefinite Programming

Point Cloud Upsampling Using Conditional Diffusion Module with Adaptive Noise Suppression

VA-MoE: Variables-Adaptive Mixture of Experts for Incremental Weather Forecasting

Rethinking the Bias of Foundation Model under Long-tailed Distribution

Evaluating Program Semantics Reasoning with Type Inference in System F

Revisiting Open-Set Panoptic Segmentation

A General Framework for Learning from Weak Supervision

Completing Visual Objects via Bridging Generation and Segmentation

Floating Anchor Diffusion Model for Multi-motif Scaffolding

Post-hoc Part-Prototype Networks

CompeteAI: Understanding the Competition Dynamics of Large Language Model-based Agents

Generative Active Learning for Long-tailed Instance Segmentation

Towards a Self-contained Data-driven Global Weather Forecasting Framework

DCAN: Deep Contour-Aware Networks for Accurate Gland Segmentation

Progressively Complementarity-Aware Fusion Network for RGB-D Salient Object Detection

Fast Neural Architecture Search of Compact Semantic Segmentation Models via Auxiliary Cells

Memory-Efficient Hierarchical Neural Architecture Search for Image Denoising

BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation

ABCNet: Real-Time Scene Text Spotting With Adaptive Bezier-Curve Network

NAS-FCOS: Fast Neural Architecture Search for Object Detection

Joint Generative and Contrastive Learning for Unsupervised Person Re-Identification

The Lottery Ticket Hypothesis for Object Recognition

Generic Perceptual Loss for Modeling Structured Output Dependencies

BoxInst: High-Performance Instance Segmentation With Box Annotations

Uni6D: A Unified CNN Framework Without Projection Breakdown for 6D Pose Estimation

What To Look at and Where: Semantic and Spatial Refined Transformer for Detecting Human-Object Interactions

A Voxel Graph CNN for Object Classification With Event Cameras

TubeR: Tubelet Transformer for Video Action Detection

Stochastic Backpropagation: A Memory Efficient Strategy for Training Video Models

Towards Scalable Neural Representation for Diverse Videos

Learning Conditional Attributes for Compositional Zero-Shot Learning

DoNet: Deep De-Overlapping Network for Cytology Instance Segmentation

Sparsely Annotated Semantic Segmentation With Adaptive Gaussian Mixtures

Image Quality-Aware Diagnosis via Meta-Knowledge Co-Embedding

Boosting Transductive Few-Shot Fine-Tuning With Margin-Based Uncertainty Weighting and Probability Regularization

HNeRV: A Hybrid Neural Representation for Videos

Learning To Fuse Monocular and Multi-View Cues for Multi-Frame Depth Estimation in Dynamic Scenes

Square Localization for Efficient and Accurate Object Detection

Explaining Neural Networks Semantically and Quantitatively

EchoTraffic: Enhancing Traffic Anomaly Understanding with Audio-Visual Insights

Detecting 11K Classes: Large Scale Object Detection Without Fine-Grained Bounding Boxes

Selective Feature Compression for Efficient Activity Recognition Inference

VidTr: Video Transformer Without Convolutions

ICE: Inter-Instance Contrastive Encoding for Unsupervised Person Re-Identification

Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image

Cross-Modal Translation and Alignment for Survival Analysis

CTVIS: Consistent Training for Online Video Instance Segmentation

MHCN: A Hyperbolic Neural Network Model for Multi-view Hierarchical Clustering

FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models

Traj-MAE: Masked Autoencoders for Trajectory Prediction

Multimodal Optimal Transport-based Co-Attention Transformer with Global Structure Consistency for Survival Prediction

SegPrompt: Boosting Open-World Segmentation via Category-Level Prompt Learning

Multi-view Self-supervised Disentanglement for General Image Denoising

Conditional Convolutions for Instance Segmentation

3D Human Shape and Pose from a Single Low-Resolution Image with Self-Supervised Learning

Yet Another Intermediate-Level Attack

Unitail: Detecting, Reading, and Matching in Retail Scene

Automatic Check-Out via Prototype-Based Classifier Learning from Single-Product Exemplars

FCOS: Fully Convolutional One-Stage Object Detection

Dual-Interrelated Diffusion Model for Few-Shot Anomaly Image Generation

Monocular and Generalizable Gaussian Talking Head Animation

Satellite Observations Guided Diffusion Model for Accurate Meteorological States at Arbitrary Resolution

POMATO: Marrying Pointmap Matching with Temporal Motions for Dynamic 3D Reconstruction

Scaling Tumor Segmentation: Best Lessons from Real and Synthetic Data

FW-Merging: Scaling Model Merging with Frank-Wolfe Optimization

UnrealZoo: Enriching Photo-realistic Virtual Worlds for Embodied AI

Separation for Better Integration: Disentangling Edge and Motion in Event-based Deblurring

Conditional Visual Autoregressive Modeling for Pathological Image Restoration

SurfaceSplat: Connecting Surface Reconstruction and Gaussian Splatting

Unified Open-World Segmentation with Multi-Modal Prompts

Learning Concept Prerequisite Relation via Global Knowledge Relation Optimization

Know Where You Are From: Event-Based Segmentation via Spatio-Temporal Propagation

MM-Tracker: Motion Mamba for UAV-platform Multiple Object Tracking

ESEG: Event-Based Segmentation Boosted by Explicit Edge-Semantic Guidance

Time Series Supplier Allocation via Deep Black-Litterman Model

Towards Loss-Resilient Image Coding for Unstable Satellite Networks

PromptMRG: Diagnosis-Driven Prompts for Medical Report Generation

Retrieval-Augmented Primitive Representations for Compositional Zero-Shot Learning

A Dynamic GCN with Cross-Representation Distillation for Event-Based Learning

MICA: Towards Explainable Skin Lesion Diagnosis via Multi

DiverGen: Improving Instance Segmentation by Learning Wider Data Distribution with More Diverse Generative Data

Video Frame Interpolation via Direct Synthesis with the Event-based Reference

FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept Composition

Backpropagating Linearly Improves Transferability of Adversarial Examples

Stochastic Gradient Descent in Correlated Settings: A Study on Gaussian Processes

Practical No-box Adversarial Attacks against DNNs

Long Short-Term Transformer for Online Action Detection

NeRV: Neural Representations for Videos

USB: A Unified Semi-supervised Learning Benchmark for Classification

An In-depth Study of Stochastic Backpropagation

Improving Adversarial Transferability via Intermediate-level Perturbation Decay

Towards Evaluating Transfer-based Attacks Systematically, Practically, and Fairly

DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models