Hao Jiang

35 Papers
154 Total Citations

Papers (35)

HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation

ICML 2025
63
citations

SD-MVS: Segmentation-Driven Deformation Multi-View Stereo with Spherical Refinement and EM Optimization

AAAI 2024, arXiv
35
citations

The Audio-Visual Conversational Graph: From an Egocentric-Exocentric Perspective

CVPR 2024
15
citations

Towards Universal Soccer Video Understanding

CVPR 2025
14
citations

PatchDPO: Patch-level DPO for Finetuning-free Personalized Image Generation

CVPR 2025, arXiv
10
citations

All-in-One: Transferring Vision Foundation Models into Stereo Matching

AAAI 2025
9
citations

Reward Penalties on Augmented States for Solving Richly Constrained RL Effectively

AAAI 2024
2
citations

Political Actor Agent: Simulating Legislative System for Roll Call Votes Prediction with Large Language Models

AAAI 2025
2
citations

CursorCore: Assist Programming through Aligning Anything

ICML 2025
2
citations

TMetaNet: Topological Meta-Learning Framework for Dynamic Link Prediction

ICML 2025
1
citation

D^2-DPM: Dual Denoising for Quantized Diffusion Probabilistic Models

AAAI 2025
1
citation

Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback

AAAI 2025
0
citations

VERSE: Verification-based Self-Play for Code Instructions

AAAI 2025
0
citations

Transferable Video Moment Localization by Moment-Guided Query Prompting

AAAI 2024
0
citations

MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis

AAAI 2025
0
citations

Ink Dot-Oriented Differentiable Optimization for Neural Image Halftoning

CVPR 2024
0
citations

Granularity-Adaptive Spatial Evidence Tokenization for Video Question Answering

AAAI 2025
0
citations

Resolving Multi-Condition Confusion for Finetuning-Free Personalized Image Generation

AAAI 2025
0
citations

Boosting MLLM Reasoning with Text-Debiased Hint-GRPO

ICCV 2025
0
citations

Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization

ICML 2024
0
citations

Matching Bags of Regions in RGBD images

CVPR 2015
0
citations

Seeing Invisible Poses: Estimating 3D Body Pose From Egocentric Video

CVPR 2017, arXiv
0
citations

Detangling People: Individuating Multiple Close People and Their Body Parts via Region Assembly

CVPR 2017, arXiv
0
citations

Action4D: Online Action Recognition in the Crowd and Clutter

CVPR 2019
0
citations

Joint Video Summarization and Moment Localization by Cross-Task Sample Transfer

CVPR 2022
0
citations

Ego4D: Around the World in 3,000 Hours of Egocentric Video

CVPR 2022
0
citations

Egocentric Deep Multi-Channel Audio-Visual Active Speaker Localization

CVPR 2022, arXiv
0
citations

Chat2Map: Efficient Scene Mapping From Multi-Ego Conversations

CVPR 2023, arXiv
0
citations

DoNet: Deep De-Overlapping Network for Cytology Instance Segmentation

CVPR 2023, arXiv
0
citations

DATE: Domain Adaptive Product Seeker for E-Commerce

CVPR 2023
0
citations

Egocentric Auditory Attention Localization in Conversations

CVPR 2023, arXiv
0
citations

Egocentric Pose Estimation From Human Vision Span

ICCV 2021, arXiv
0
citations

Conditional Diffusion Process for Inverse Halftoning

NeurIPS 2022
0
citations

BMU-MoCo: Bidirectional Momentum Update for Continual Video-Language Modeling

NeurIPS 2022
0
citations

FairLISA: Fair User Modeling with Limited Sensitive Attributes Information

NeurIPS 2023
0
citations