Peng Gao
27
Papers
756
Total Citations
Papers (27)
OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models
ICLR 2024
320
citations
MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency
ICML 2025
88
citations
Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation
AAAI 2024
58
citations
Lumina-Image 2.0: A Unified and Efficient Image Generative Framework
ICCV 2025
52
citations
BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation
ICLR 2024
46
citations
Digital Life Project: Autonomous 3D Characters with Social Intelligence
CVPR 2024
46
citations
EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation
NeurIPS 2025
34
citations
From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning
ICCV 2025
28
citations
No Time to Train: Empowering Non-Parametric Networks for Few-shot 3D Scene Segmentation
CVPR 2024
27
citations
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions
ICLR 2025
26
citations
VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning
ICCV 2025
20
citations
Lumina-T2X: Scalable Flow-based Large Diffusion Transformer for Flexible Resolution Generation
ICLR 2025
8
citations
Adaptive Markup Language Generation for Contextually-Grounded Visual Document Understanding
CVPR 2025
3
citations
Let's Verify and Reinforce Image Generation Step by Step
CVPR 2025
0
citations
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
ICML 2024
0
citations
Spatial Preference Rewarding for MLLMs Spatial Understanding
ICCV 2025
0
citations
TAR3D: Creating High-Quality 3D Assets via Next-Part Prediction
ICCV 2025
0
citations
FontAnimate: High Quality Few-shot Font Generation via Animating Font Transfer Process
ICCV 2025
0
citations
How Do Optical Flow and Textual Prompts Collaborate to Assist in Audio-Visual Semantic Segmentation?
ICCV 2025
0
citations
A Multi-Focus-Driven Multi-Branch Network for Robust Multimodal Sentiment Analysis
AAAI 2025
0
citations
LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding
AAAI 2025
0
citations
OneLLM: One Framework to Align All Modalities with Language
CVPR 2024
0
citations
Masked AutoDecoder is Effective Multi-Task Vision Generalist
CVPR 2024
0
citations
InstructSpeech: Following Speech Editing Instructions via Large Language Models
ICML 2024
0
citations
SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models
ICML 2024
0
citations
MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI
ICML 2024
0
citations
FreeBind: Free Lunch in Unified Multimodal Space via Knowledge Fusion
ICML 2024
0
citations