Hao Jiang
35 Papers
154 Total Citations
Papers (35)
HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation
ICML 2025
63
citations
SD-MVS: Segmentation-Driven Deformation Multi-View Stereo with Spherical Refinement and EM Optimization
AAAI 2024, arXiv
35
citations
The Audio-Visual Conversational Graph: From an Egocentric-Exocentric Perspective
CVPR 2024
15
citations
Towards Universal Soccer Video Understanding
CVPR 2025
14
citations
PatchDPO: Patch-level DPO for Finetuning-free Personalized Image Generation
CVPR 2025, arXiv
10
citations
All-in-One: Transferring Vision Foundation Models into Stereo Matching
AAAI 2025
9
citations
Reward Penalties on Augmented States for Solving Richly Constrained RL Effectively
AAAI 2024
2
citations
Political Actor Agent: Simulating Legislative System for Roll Call Votes Prediction with Large Language Models
AAAI 2025
2
citations
CursorCore: Assist Programming through Aligning Anything
ICML 2025
2
citations
TMetaNet: Topological Meta-Learning Framework for Dynamic Link Prediction
ICML 2025
1
citations
D^2-DPM: Dual Denoising for Quantized Diffusion Probabilistic Models
AAAI 2025
1
citations
Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback
AAAI 2025
0
citations
VERSE: Verification-based Self-Play for Code Instructions
AAAI 2025
0
citations
Transferable Video Moment Localization by Moment-Guided Query Prompting
AAAI 2024
0
citations
MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis
AAAI 2025
0
citations
Ink Dot-Oriented Differentiable Optimization for Neural Image Halftoning
CVPR 2024
0
citations
Granularity-Adaptive Spatial Evidence Tokenization for Video Question Answering
AAAI 2025
0
citations
Resolving Multi-Condition Confusion for Finetuning-Free Personalized Image Generation
AAAI 2025
0
citations
Boosting MLLM Reasoning with Text-Debiased Hint-GRPO
ICCV 2025
0
citations
Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization
ICML 2024
0
citations
Matching Bags of Regions in RGBD images
CVPR 2015
0
citations
Seeing Invisible Poses: Estimating 3D Body Pose From Egocentric Video
CVPR 2017, arXiv
0
citations
Detangling People: Individuating Multiple Close People and Their Body Parts via Region Assembly
CVPR 2017, arXiv
0
citations
Action4D: Online Action Recognition in the Crowd and Clutter
CVPR 2019
0
citations
Joint Video Summarization and Moment Localization by Cross-Task Sample Transfer
CVPR 2022
0
citations
Ego4D: Around the World in 3,000 Hours of Egocentric Video
CVPR 2022
0
citations
Egocentric Deep Multi-Channel Audio-Visual Active Speaker Localization
CVPR 2022, arXiv
0
citations
Chat2Map: Efficient Scene Mapping From Multi-Ego Conversations
CVPR 2023, arXiv
0
citations
DoNet: Deep De-Overlapping Network for Cytology Instance Segmentation
CVPR 2023, arXiv
0
citations
DATE: Domain Adaptive Product Seeker for E-Commerce
CVPR 2023
0
citations
Egocentric Auditory Attention Localization in Conversations
CVPR 2023, arXiv
0
citations
Egocentric Pose Estimation From Human Vision Span
ICCV 2021, arXiv
0
citations
Conditional Diffusion Process for Inverse Halftoning
NeurIPS 2022
0
citations
BMU-MoCo: Bidirectional Momentum Update for Continual Video-Language Modeling
NeurIPS 2022
0
citations
FairLISA: Fair User Modeling with Limited Sensitive Attributes Information
NeurIPS 2023
0
citations