Conghui He
28 Papers
647 Total Citations
Papers (28)
OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
CVPR 2024
365 citations
OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?
CVPR 2025
37 citations
LEGION: Learning to Ground and Explain for Synthetic Image Detection
ICCV 2025
32 citations
Cross-view image geo-localization with Panorama-BEV Co-Retrieval Network
ECCV 2024
31 citations
Dataset Distillation with Neural Characteristic Function: A Minmax Perspective
CVPR 2025
28 citations
UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios
AAAI 2025
26 citations
Large Language Models Meet Symbolic Provers for Logical Reasoning Evaluation
ICLR 2025
25 citations
SG-BEV: Satellite-Guided BEV Fusion for Cross-View Semantic Segmentation
CVPR 2024
25 citations
MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models
ICLR 2025
18 citations
Where am I? Cross-View Geo-localization with Natural Language Descriptions
ICCV 2025
16 citations
Image Over Text: Transforming Formula Recognition Evaluation with Character Detection Matching
CVPR 2025
11 citations
Multi-step Visual Reasoning with Visual Tokens Scaling and Verification
NeurIPS 2025
11 citations
Efficient Multi-modal Large Language Models via Progressive Consistency Distillation
NeurIPS 2025
8 citations
VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos
ICCV 2025
8 citations
Utilize the Flow Before Stepping into the Same River Twice: Certainty Represented Knowledge Flow for Refusal-Aware Instruction Tuning
AAAI 2025
6 citations
PersFormer: 3D Lane Detection via Perspective Transformer and the OpenLane Benchmark
ECCV 2022
0 citations
OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations
CVPR 2025
0 citations
Leveraging BEV Paradigm for Ground-to-Aerial Image Synthesis
ICCV 2025
0 citations
OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation
ICCV 2025
0 citations
VHM: Versatile and Honest Vision Language Model for Remote Sensing Image Analysis
AAAI 2025
0 citations
3D Building Reconstruction from Monocular Remote Sensing Images with Multi-level Supervisions
CVPR 2024
0 citations
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
ICML 2024
0 citations
OmniCity: Omnipotent City Understanding With Multi-Level and Multi-View Images
CVPR 2023
0 citations
Think Twice Before Driving: Towards Scalable Decoders for End-to-End Autonomous Driving
CVPR 2023
0 citations
Influence Selection for Active Learning
ICCV 2021
0 citations
3D Building Reconstruction From Monocular Remote Sensing Images
ICCV 2021
0 citations
V3Det: Vast Vocabulary Visual Detection Dataset
ICCV 2023
0 citations
Conical Visual Concentration for Efficient Large Vision-Language Models
CVPR 2025
0 citations