Rising Stars in Research
Track citation trends and discover the most impactful papers in AI/ML research
Cut Through the Noise
Find papers that are actually being cited, not just published
Spot Emerging Trends
Track citation velocity to find rising research early
Topic Lifecycle Analysis
See which research areas are rising, peaking, or declining
Browse by Conference
CVPR, NeurIPS, ICLR, ICML, ECCV, ICCV
Browse by Topic
Diffusion, Transformers, 3D Vision, LLMs
Browse by Author
Top researchers ranked by citations
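Citation velocity, as used above, can be read as citations accrued per unit time since publication. A minimal sketch of that ranking, with hypothetical paper names and citation counts (not data from this site):

```python
from datetime import date

def citation_velocity(citations: int, published: date, today: date) -> float:
    """Citations accrued per month since publication (simple linear rate)."""
    # Clamp to at least one month so brand-new papers don't divide by ~zero.
    months = max((today - published).days / 30.44, 1.0)
    return citations / months

# Hypothetical examples: (title, total citations, publication date).
papers = [
    ("Paper A", 120, date(2025, 1, 15)),
    ("Paper B", 90, date(2025, 6, 1)),
    ("Paper C", 300, date(2024, 3, 1)),
]

today = date(2025, 9, 1)
ranked = sorted(papers, key=lambda p: citation_velocity(p[1], p[2], today), reverse=True)
for title, cites, pub in ranked:
    print(f"{title}: {citation_velocity(cites, pub, today):.1f} citations/month")
```

Under this metric a recent paper with fewer total citations ("Paper B") can outrank an older, more-cited one ("Paper C"), which is what surfaces rising research early.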
2025 Conference Highlights
ICML 2025 (Latest)
Top 30
WorldSimBench: Towards Video Generation Models as World Simulators
From Crowdsourced Data to High-quality Benchmarks: Arena-Hard and Benchbuilder Pipeline
SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference
Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction
Training Software Engineering Agents and Verifiers with SWE-Gym
AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs
Layer by Layer: Uncovering Hidden Representations in Language Models
Imagine While Reasoning in Space: Multimodal Visualization-of-Thought
Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs
Taming Rectified Flow for Inversion and Editing
A General Framework for Inference-time Scaling and Steering of Diffusion Models
Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders
MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding
OR-Bench: An Over-Refusal Benchmark for Large Language Models
Theoretical guarantees on the best-of-n alignment policy
MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency
Learning Smooth and Expressive Interatomic Potentials for Physical Property Prediction
Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation
Scaling Test-Time Compute Without Verification or RL is Suboptimal
Cradle: Empowering Foundation Agents towards General Computer Control
History-Guided Video Diffusion
GuardAgent: Safeguard LLM Agents via Knowledge-Enabled Reasoning
SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering?
LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models
HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
RE-Bench: Evaluating Frontier AI R&D Capabilities of Language Model Agents against Human Experts
AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML
Sundial: A Family of Highly Capable Time Series Foundation Models
NeurIPS 2025
Top 30
MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models
The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
Video-R1: Reinforcing Video Reasoning in MLLMs
Why Do Multi-Agent LLM Systems Fail?
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
Training Language Models to Reason Efficiently
ToolRL: Reward is All Tool Learning Needs
Mean Flows for One-step Generative Modeling
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines
TTRL: Test-Time Reinforcement Learning
Improving Video Generation with Human Feedback
Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test
Show-o2: Improved Native Unified Multimodal Models
Remarkable Robustness of LLMs: Stages of Inference?
WebDancer: Towards Autonomous Information Seeking Agency
RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval
OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation
LMFusion: Adapting Pretrained Language Models for Multimodal Generation
Audio Flamingo 3: Advancing Audio Intelligence with Fully Open Large Audio Language Models
The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning
General-Reasoner: Advancing LLM Reasoning Across All Domains
Offline Actor-Critic for Average Reward MDPs
Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning
Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension
Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence
SWE-smith: Scaling Data for Software Engineering Agents
ICCV 2025
Top 30
LLaVA-CoT: Let Vision Language Models Reason Step-by-Step
R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization
CoTracker3: Simpler and Better Point Tracking by Pseudo-Labelling Real Videos
LVBench: An Extreme Long Video Understanding Benchmark
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization
LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D Capabilities
DimensionX: Create Any 3D and 4D Scenes from a Single Image with Decoupled Video Diffusion
GUIOdyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices
OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models
Stable Virtual Camera: Generative View Synthesis with Diffusion Models
REPA-E: Unlocking VAE for End-to-End Tuning of Latent Diffusion Transformers
MV-Adapter: Multi-View Consistent Image Generation Made Easy
EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer
MeshAnything V2: Artist-Created Mesh Generation with Adjacent Mesh Tokenization
GameFactory: Creating New Games with Generative Interactive Videos
ORION: A Holistic End-to-End Autonomous Driving Framework by Vision-Language Instructed Action Generation
DriveArena: A Closed-loop Generative Simulation Platform for Autonomous Driving
Long-LRM: Long-sequence Large Reconstruction Model for Wide-coverage Gaussian Splats
Long Context Tuning for Video Generation
EvaGaussians: Event Stream Assisted Gaussian Splatting from Blurry Images
Lumina-Image 2.0: A Unified and Efficient Image Generative Framework
Describe Anything: Detailed Localized Image and Video Captioning
Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generation
Aether: Geometric-Aware Unified World Modeling
DrivingGPT: Unifying Driving World Modeling and Planning with Multi-modal Autoregressive Transformers
MagicDrive-V2: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control
UniPortrait: A Unified Framework for Identity-Preserving Single- and Multi-Human Image Personalization
ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance
CC-OCR: A Comprehensive and Challenging OCR Benchmark for Evaluating Large Multimodal Models in Literacy
Scaling Language-Free Visual Representation Learning
CVPR 2025
Top 30
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces
OmniGen: Unified Image Generation
CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models
Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models
StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text
Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding
GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control
ShowUI: One Vision-Language-Action Model for GUI Visual Agent
WonderWorld: Interactive 3D Scene Generation from a Single Image
From Slow Bidirectional to Fast Autoregressive Video Diffusion Models
FoundationStereo: Zero-Shot Stereo Matching
Transformers without Normalization
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models
DEIM: DETR with Improved Matching for Fast Convergence
FLARE: Feed-forward Geometry, Appearance and Camera Estimation from Uncalibrated Sparse Views
MLVU: Benchmarking Multi-task Long Video Understanding
RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete
DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation
Teaching Large Language Models to Regress Accurate Image Quality Scores Using Score Distribution
MV-DUSt3R+: Single-Stage Scene Reconstruction from Sparse Views In 2 Seconds
AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding
UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics
Adaptive Keyframe Sampling for Long Video Understanding
Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
One-Minute Video Generation with Test-Time Training
SF3D: Stable Fast 3D Mesh Reconstruction with UV-unwrapping and Illumination Disentanglement
RandAR: Decoder-only Autoregressive Visual Generation in Random Orders
Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise
ICLR 2025
Top 30
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer
RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment
WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct
Causal Reasoning and Large Language Models: Opening a New Frontier for Causality
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks
RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation
Generative Verifiers: Reward Modeling as Next-Token Prediction
OpenHands: An Open Platform for AI Software Developers as Generalist Agents
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second
Scaling and evaluating sparse autoencoders
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
Safety Alignment Should be Made More Than Just a Few Tokens Deep
Mixture-of-Agents Enhances Large Language Model Capabilities
Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers
MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion
Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning
LoRA Learns Less and Forgets Less
Generating with Confidence: Uncertainty Quantification for Black-box Large Language Models
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models
Generative Representational Instruction Tuning
Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge
Self-Play Preference Optimization for Language Model Alignment
Inverse Scaling: When Bigger Isn't Better
AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents
Revisiting Feature Prediction for Learning Visual Representations from Video
Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts
G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model
The Unreasonable Ineffectiveness of the Deeper Layers
Diffusion Models Are Real-Time Game Engines
Topic Trends
Research topic lifecycle
Compare Topic Trends
Compare research momentum across topics side-by-side
Most Cited
YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information
Chien-Yao Wang, I-Hau Yeh, Hong-Yuan Mark Liao
DETRs Beat YOLOs on Real-time Object Detection
Yian Zhao, Wenyu Lv, Shangliang Xu et al.
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
Zhe Chen, Jiannan Wu, Wenhai Wang et al.
T2I-Adapter: Learning Adapters to Dig Out More Controllable Ability for Text-to-Image Diffusion
Chong Mou, Xintao Wang, Liangbin Xie et al.
Scaling Autoregressive Models for Content-Rich Text-to-Image Generation
Xin Li, Jing Yu Koh, Alexander Ku et al.
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
Akari Asai, Zeqiu Wu, Yizhong Wang et al.
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer
Zhuoyi Yang, Jiayan Teng, Wendi Zheng et al.
MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models
Chaoyou Fu, Peixian Chen, Yunhang Shen et al.
MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts
Pan Lu, Hritik Bansal, Tony Xia et al.
ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs
Yujia Qin, Shihao Liang, Yining Ye et al.
4D Gaussian Splatting for Real-Time Dynamic Scene Rendering
Guanjun Wu, Taoran Yi, Jiemin Fang et al.
Grounding Multimodal Large Language Models to the World
Zhiliang Peng, Wenhui Wang, Li Dong et al.
VBench: Comprehensive Benchmark Suite for Video Generative Models
Ziqi Huang, Yinan He, Jiashuo Yu et al.
A Generalist Agent
Jackie Kay, Sergio Gómez Colmenarejo, Mahyar Bordbar et al.
MVDream: Multi-view Diffusion for 3D Generation
Yichun Shi, Peng Wang, Jianglong Ye et al.
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark
Kunchang Li, Yali Wang, Yinan He et al.
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
Chaoyou Fu, Yuhan Dai, Yongdong Luo et al.
WorldSimBench: Towards Video Generation Models as World Simulators
Yiran Qin, Zhelun Shi, Jiwen Yu et al.
LISA: Reasoning Segmentation via Large Language Model
Xin Lai, Zhuotao Tian, Yukang Chen et al.
Deformable 3D Gaussians for High-Fidelity Monocular Dynamic Scene Reconstruction
Ziyi Yang, Xinyu Gao, Wen Zhou et al.
Top Authors
Topic trends: 31,945 papers · similarity ≥ 0.4 · year ≥ 2024 · Data sourced from Semantic Scholar
34,180 papers | Abstracts: 21,617 (63.2%) | Citations: 34,180 (100.0%) | arXiv: 2,234 (6.5%)
Built: Jan 31, 2026, 4:15 AM AMS