Hao Fei

21
Papers
149
Total Citations

Papers (21)

Towards Semantic Equivalence of Tokenization in Multimodal LLM

ICLR 2025
57
citations

CoMT: A Novel Benchmark for Chain of Multi-modal Thought on Large Vision-Language Models

AAAI 2025
30
citations

CHiP: Cross-modal Hierarchical Direct Preference Optimization for Multimodal LLMs

ICLR 2025
17
citations

Reverse Multi-Choice Dialogue Commonsense Inference with Graph-of-Thought

AAAI 2024, arXiv
10
citations

Multi-Granular Multimodal Clue Fusion for Meme Understanding

AAAI 2025
8
citations

Where, What, Why: Towards Explainable Driver Attention Prediction

ICCV 2025
6
citations

PhysSplat: Efficient Physics Simulation for 3D Scenes via MLLM-Guided Gaussian Splatting

ICCV 2025
6
citations

Learning 4D Panoptic Scene Graph Generation from Rich 2D Visual Scene

CVPR 2025
4
citations

VEGAS: Towards Visually Explainable and Grounded Artificial Social Intelligence

AAAI 2025
4
citations

Divide-Solve-Combine: An Interpretable and Accurate Prompting Framework for Zero-shot Multi-Intent Detection

AAAI 2025
4
citations

Universal Scene Graph Generation

CVPR 2025
3
citations

Combating Multimodal LLM Hallucination via Bottom-Up Holistic Reasoning

AAAI 2025
0
citations

Iris: Breaking GUI Complexity with Adaptive Focus and Self-Refining

ICCV 2025
0
citations

Harnessing Holistic Discourse Features and Triadic Interaction for Sentiment Quadruple Extraction in Dialogues

AAAI 2024
0
citations

Derm1M: A Million-scale Vision-Language Dataset Aligned with Clinical Ontology Knowledge for Dermatology

ICCV 2025
0
citations

Improving Expressive Power of Spectral Graph Neural Networks with Eigenvalue Correction

AAAI 2024, arXiv
0
citations

LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding Reasoning and Planning

CVPR 2024
0
citations

Dysen-VDM: Empowering Dynamics-aware Text-to-Video Diffusion with LLMs

CVPR 2024
0
citations

Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition

ICML 2024
0
citations

NExT-GPT: Any-to-Any Multimodal LLM

ICML 2024
0
citations

Momentor: Advancing Video Large Language Model with Fine-Grained Temporal Reasoning

ICML 2024
0
citations