Conghui He

22
Papers
648
Total Citations

Papers (22)

OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation

CVPR 2024
365
citations

OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?

CVPR 2025
37
citations

LEGION: Learning to Ground and Explain for Synthetic Image Detection

ICCV 2025
32
citations

Cross-view image geo-localization with Panorama-BEV Co-Retrieval Network

ECCV 2024
31
citations

Dataset Distillation with Neural Characteristic Function: A Minmax Perspective

CVPR 2025
28
citations

UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios

AAAI 2025
26
citations

Large Language Models Meet Symbolic Provers for Logical Reasoning Evaluation

ICLR 2025
25
citations

SG-BEV: Satellite-Guided BEV Fusion for Cross-View Semantic Segmentation

CVPR 2024
25
citations

MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models

ICLR 2025
19
citations

Where am I? Cross-View Geo-localization with Natural Language Descriptions

ICCV 2025
16
citations

Image Over Text: Transforming Formula Recognition Evaluation with Character Detection Matching

CVPR 2025
11
citations

Multi-step Visual Reasoning with Visual Tokens Scaling and Verification

NeurIPS 2025
11
citations

Efficient Multi-modal Large Language Models via Progressive Consistency Distillation

NeurIPS 2025
8
citations

VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos

ICCV 2025
8
citations

Utilize the Flow Before Stepping into the Same River Twice: Certainty Represented Knowledge Flow for Refusal-Aware Instruction Tuning

AAAI 2025
6
citations

SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models

ICML 2024
0
citations

OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations

CVPR 2025
0
citations

Leveraging BEV Paradigm for Ground-to-Aerial Image Synthesis

ICCV 2025
0
citations

OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation

ICCV 2025
0
citations

VHM: Versatile and Honest Vision Language Model for Remote Sensing Image Analysis

AAAI 2025
0
citations

3D Building Reconstruction from Monocular Remote Sensing Images with Multi-level Supervisions

CVPR 2024
0
citations

Conical Visual Concentration for Efficient Large Vision-Language Models

CVPR 2025
0
citations