Hongsheng Li
44 Papers
758 Total Citations
Papers (44)
Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification
ICLR 2024
196
citations
MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency
ICML 2025
88
citations
GoT: Unleashing Reasoning Capability of MLLM for Visual Generation and Editing
NeurIPS 2025
60
citations
Lumina-Image 2.0: A Unified and Efficient Image Generative Framework
ICCV 2025
52
citations
Rectified Diffusion: Straightness Is Not Your Need in Rectified Flow
ICLR 2025
46
citations
SmartRefine: A Scenario-Adaptive Refinement Framework for Efficient Motion Prediction
CVPR 2024
38
citations
EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation
NeurIPS 2025
34
citations
SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding
CVPR 2025
34
citations
From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning
ICCV 2025
28
citations
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions
ICLR 2025
26
citations
Mixture Compressor for Mixture-of-Experts LLMs Gains More
ICLR 2025
23
citations
BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices
CVPR 2025
20
citations
PUMA: Empowering Unified MLLM with Multi-granular Visual Generation
ICCV 2025
17
citations
SOLVE: Synergy of Language-Vision and End-to-End Networks for Autonomous Driving
CVPR 2025
15
citations
Docopilot: Improving Multimodal Models for Document-Level Understanding
CVPR 2025
14
citations
DailyDVS-200: A Comprehensive Benchmark Dataset for Event-Based Action Recognition
ECCV 2024
12
citations
Ponymation: Learning Articulated 3D Animal Motions from Unlabeled Online Videos
ECCV 2024
10
citations
UAV-Flow Colosseo: A Real-World Benchmark for Flying-on-a-Word UAV Imitation Learning
NeurIPS 2025
8
citations
Lumina-T2X: Scalable Flow-based Large Diffusion Transformer for Flexible Resolution Generation
ICLR 2025
8
citations
Language Model Guided Interpretable Video Action Reasoning
CVPR 2024
7
citations
BlinkVision: A Benchmark for Optical Flow, Scene Flow and Point Tracking Estimation using RGB Frames and Events
ECCV 2024
7
citations
Delving Deep into Engagement Prediction of Short Videos
ECCV 2024
5
citations
One Leaf Reveals the Season: Occlusion-Based Contrastive Learning with Semantic-Aware Views for Efficient Visual Representation
ICML 2025
5
citations
Adaptive Markup Language Generation for Contextually-Grounded Visual Document Understanding
CVPR 2025
3
citations
FlexDrive: Toward Trajectory Flexibility in Driving Scene Gaussian Splatting Reconstruction and Rendering
CVPR 2025
2
citations
ConsistentCity: Semantic Flow-guided Occupancy DiT for Temporally Consistent Driving Scene Synthesis
ICCV 2025
0
citations
HPSv3: Towards Wide-Spectrum Human Preference Score
ICCV 2025
0
citations
GenieBlue: Integrating both Linguistic and Multimodal Capabilities for Large Language Models on Mobile Devices
ICCV 2025
0
citations
CameraCtrl II: Dynamic Scene Exploration via Camera-controlled Video Diffusion Models
ICCV 2025
0
citations
M3Net: Multimodal Multi-task Learning for 3D Detection, Segmentation, and Occupancy Prediction in Autonomous Driving
AAAI 2025
0
citations
LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding
AAAI 2025
0
citations
GaussianPainter: Painting Point Cloud into 3D Gaussians with Normal Guidance
AAAI 2025
0
citations
Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft
CVPR 2024
0
citations
GLID: Pre-training a Generalist Encoder-Decoder Vision Model
CVPR 2024
0
citations
Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications
CVPR 2024
0
citations
LMDrive: Closed-Loop End-to-End Driving with Large Language Models
CVPR 2024
0
citations
OPTICAL: Leveraging Optimal Transport for Contribution Allocation in Dataset Distillation
CVPR 2025
0
citations
DirectTriGS: Triplane-based Gaussian Splatting Field Representation for 3D Generation
CVPR 2025
0
citations
DiffInDScene: Diffusion-based High-Quality 3D Indoor Scene Generation
CVPR 2024
0
citations
GS-DiT: Advancing Video Generation with Dynamic 3D Gaussian Fields through Efficient Dense 3D Point Tracking
CVPR 2025
0
citations
Let's Verify and Reinforce Image Generation Step by Step
CVPR 2025
0
citations
FreeSim: Toward Free-viewpoint Camera Simulation in Driving Scenes
CVPR 2025
0
citations
SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models
ICML 2024
0
citations
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
ICML 2024
0
citations