Hao Fei
21
Papers
149
Total Citations
Papers (21)
Towards Semantic Equivalence of Tokenization in Multimodal LLM
ICLR 2025
57
citations
CoMT: A Novel Benchmark for Chain of Multi-modal Thought on Large Vision-Language Models
AAAI 2025
30
citations
CHiP: Cross-modal Hierarchical Direct Preference Optimization for Multimodal LLMs
ICLR 2025
17
citations
Reverse Multi-Choice Dialogue Commonsense Inference with Graph-of-Thought
AAAI 2024
10
citations
Multi-Granular Multimodal Clue Fusion for Meme Understanding
AAAI 2025
8
citations
Where, What, Why: Towards Explainable Driver Attention Prediction
ICCV 2025
6
citations
PhysSplat: Efficient Physics Simulation for 3D Scenes via MLLM-Guided Gaussian Splatting
ICCV 2025
6
citations
Learning 4D Panoptic Scene Graph Generation from Rich 2D Visual Scene
CVPR 2025
4
citations
VEGAS: Towards Visually Explainable and Grounded Artificial Social Intelligence
AAAI 2025
4
citations
Divide-Solve-Combine: An Interpretable and Accurate Prompting Framework for Zero-shot Multi-Intent Detection
AAAI 2025
4
citations
Universal Scene Graph Generation
CVPR 2025
3
citations
Combating Multimodal LLM Hallucination via Bottom-Up Holistic Reasoning
AAAI 2025
0
citations
Iris: Breaking GUI Complexity with Adaptive Focus and Self-Refining
ICCV 2025
0
citations
Harnessing Holistic Discourse Features and Triadic Interaction for Sentiment Quadruple Extraction in Dialogues
AAAI 2024
0
citations
Derm1M: A Million-scale Vision-Language Dataset Aligned with Clinical Ontology Knowledge for Dermatology
ICCV 2025
0
citations
Improving Expressive Power of Spectral Graph Neural Networks with Eigenvalue Correction
AAAI 2024
0
citations
LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding Reasoning and Planning
CVPR 2024
0
citations
Dysen-VDM: Empowering Dynamics-aware Text-to-Video Diffusion with LLMs
CVPR 2024
0
citations
Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition
ICML 2024
0
citations
NExT-GPT: Any-to-Any Multimodal LLM
ICML 2024
0
citations
Momentor: Advancing Video Large Language Model with Fine-Grained Temporal Reasoning
ICML 2024
0
citations