Rising Stars in Research

Track citation trends and discover the most impactful papers in AI/ML research

Cut Through the Noise

Find papers actually getting cited, not just published

Spot Emerging Trends

Track citation velocity to find rising research early

Topic Lifecycle Analysis

See research areas rising, peaking, or declining
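
The page does not say how citation velocity is computed. One plausible reading is citations gained per month between two snapshots of a paper's citation count. The sketch below works under that assumption; the function name, the 30-day month, and the sample snapshot values are all hypothetical and not taken from this site.

```python
from datetime import date


def citation_velocity(count_then, count_now, day_then, day_now):
    """Citations gained per 30-day month between two snapshots (assumed definition)."""
    days = (day_now - day_then).days
    if days <= 0:
        raise ValueError("snapshots must be in chronological order")
    return (count_now - count_then) / days * 30.0


# Hypothetical snapshots: (citations at start, citations at end).
snapshots = {
    "paper_a": (100, 160),
    "paper_b": (500, 515),
}
start, end = date(2025, 1, 1), date(2025, 3, 2)  # 60 days apart

# Rank by velocity, fastest first. paper_b has more total citations,
# but paper_a is gaining them faster.
ranked = sorted(
    snapshots,
    key=lambda name: citation_velocity(*snapshots[name], start, end),
    reverse=True,
)
print(ranked)
```

Ranking by velocity instead of total count is what lets a newer paper with fewer citations surface ahead of an older, slower one.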


2025 Conference Highlights

Latest

ICML 2025

Top 30
1

WorldSimBench: Towards Video Generation Models as World Simulators

806 citations
2

From Crowdsourced Data to High-quality Benchmarks: Arena-Hard and Benchbuilder Pipeline

329 citations
3

SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference

190 citations
4

Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction

165 citations
5

Training Software Engineering Agents and Verifiers with SWE-Gym

130 citations
6

AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs

123 citations
7

Layer by Layer: Uncovering Hidden Representations in Language Models

118 citations
8

Imagine While Reasoning in Space: Multimodal Visualization-of-Thought

115 citations
9

Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

110 citations
10

Taming Rectified Flow for Inversion and Editing

110 citations
11

A General Framework for Inference-time Scaling and Steering of Diffusion Models

103 citations
12

Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM

103 citations
13

AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders

100 citations
14

MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding

98 citations
15

OR-Bench: An Over-Refusal Benchmark for Large Language Models

97 citations
16

Theoretical guarantees on the best-of-n alignment policy

89 citations
17

MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency

88 citations
18

Learning Smooth and Expressive Interatomic Potentials for Physical Property Prediction

87 citations
19

Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation

72 citations
20

Scaling Test-Time Compute Without Verification or RL is Suboptimal

68 citations
21

Cradle: Empowering Foundation Agents towards General Computer Control

67 citations
22

History-Guided Video Diffusion

66 citations
23

GuardAgent: Safeguard LLM Agents via Knowledge-Enabled Reasoning

66 citations
24

SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering?

64 citations
25

LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models

63 citations
26

HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation

63 citations
27

ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference

56 citations
28

RE-Bench: Evaluating Frontier AI R&D Capabilities of Language Model Agents against Human Experts

56 citations
29

AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML

55 citations
30

Sundial: A Family of Highly Capable Time Series Foundation Models

55 citations

NeurIPS 2025

Top 30
1

MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models

1,227 citations
2

The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity

242 citations
3

Video-R1: Reinforcing Video Reasoning in MLLMs

236 citations
4

Why Do Multi-Agent LLM Systems Fail?

188 citations
5

VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning

169 citations
6

Training Language Models to Reason Efficiently

155 citations
7

ToolRL: Reward is All Tool Learning Needs

152 citations
8

Mean Flows for One-step Generative Modeling

143 citations
9

Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach

134 citations
10

VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

130 citations
11

TTRL: Test-Time Reinforcement Learning

118 citations
12

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

118 citations
13

Improving Video Generation with Human Feedback

106 citations
14

ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models

96 citations
15

EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test

95 citations
16

Show-o2: Improved Native Unified Multimodal Models

90 citations
17

Remarkable Robustness of LLMs: Stages of Inference?

87 citations
18

WebDancer: Towards Autonomous Information Seeking Agency

81 citations
19

RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval

81 citations
20

OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation

78 citations
21

LMFusion: Adapting Pretrained Language Models for Multimodal Generation

77 citations
22

Audio Flamingo 3: Advancing Audio Intelligence with Fully Open Large Audio Language Models

74 citations
23

General-Reasoner: Advancing LLM Reasoning Across All Domains

74 citations
24

Offline Actor-Critic for Average Reward MDPs

73 citations
25

Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning

70 citations
26

Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension

69 citations
27

Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence

67 citations
28

SWE-smith: Scaling Data for Software Engineering Agents

64 citations
29

dKV-Cache: The Cache for Diffusion Language Models

64 citations
30

UMA: A Family of Universal Models for Atoms

62 citations

ICCV 2025

Top 30
1

LLaVA-CoT: Let Vision Language Models Reason Step-by-Step

338 citations
2

R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization

247 citations
3

CoTracker3: Simpler and Better Point Tracking by Pseudo-Labelling Real Videos

211 citations
4

LVBench: An Extreme Long Video Understanding Benchmark

208 citations
5

R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization

206 citations
6

LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D Capabilities

127 citations
7

DimensionX: Create Any 3D and 4D Scenes from a Single Image with Decoupled Video Diffusion

103 citations
8

GUIOdyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices

96 citations
9

OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models

86 citations
10

Stable Virtual Camera: Generative View Synthesis with Diffusion Models

78 citations
11

REPA-E: Unlocking VAE for End-to-End Tuning of Latent Diffusion Transformers

73 citations
12

MV-Adapter: Multi-View Consistent Image Generation Made Easy

69 citations
13

EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer

69 citations
14

MeshAnything V2: Artist-Created Mesh Generation with Adjacent Mesh Tokenization

66 citations
15

GameFactory: Creating New Games with Generative Interactive Videos

63 citations
16

ORION: A Holistic End-to-End Autonomous Driving Framework by Vision-Language Instructed Action Generation

62 citations
17

DriveArena: A Closed-loop Generative Simulation Platform for Autonomous Driving

58 citations
18

Long-LRM: Long-sequence Large Reconstruction Model for Wide-coverage Gaussian Splats

56 citations
19

Long Context Tuning for Video Generation

56 citations
20

EvaGaussians: Event Stream Assisted Gaussian Splatting from Blurry Images

53 citations
21

Lumina-Image 2.0: A Unified and Efficient Image Generative Framework

52 citations
22

Describe Anything: Detailed Localized Image and Video Captioning

49 citations
23

Aether: Geometric-Aware Unified World Modeling

47 citations
24

DrivingGPT: Unifying Driving World Modeling and Planning with Multi-modal Autoregressive Transformers

44 citations
25

MagicDrive-V2: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control

44 citations
26

UniPortrait: A Unified Framework for Identity-Preserving Single- and Multi-Human Image Personalization

43 citations
27

ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance

43 citations
28

Scaling Language-Free Visual Representation Learning

39 citations
29

EVER: Exact Volumetric Ellipsoid Rendering for Real-time View Synthesis

38 citations
30

GaussianOcc: Fully Self-supervised and Efficient 3D Occupancy Estimation with Gaussian Splatting

37 citations

CVPR 2025

Top 30
1

Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

858 citations
2

Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces

342 citations
3

OmniGen: Unified Image Generation

253 citations
4

CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models

203 citations
5

Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models

159 citations
6

StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text

154 citations
7

Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding

142 citations
8

GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control

138 citations
9

ShowUI: One Vision-Language-Action Model for GUI Visual Agent

123 citations
10

WonderWorld: Interactive 3D Scene Generation from a Single Image

120 citations
11

From Slow Bidirectional to Fast Autoregressive Video Diffusion Models

119 citations
12

FoundationStereo: Zero-Shot Stereo Matching

98 citations
13

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models

96 citations
14

DEIM: DETR with Improved Matching for Fast Convergence

93 citations
15

FLARE: Feed-forward Geometry, Appearance and Camera Estimation from Uncalibrated Sparse Views

92 citations
16

MLVU: Benchmarking Multi-task Long Video Understanding

89 citations
17

RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete

89 citations
18

DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation

83 citations
19

Teaching Large Language Models to Regress Accurate Image Quality Scores Using Score Distribution

81 citations
20

MV-DUSt3R+: Single-Stage Scene Reconstruction from Sparse Views In 2 Seconds

80 citations
21

AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers

78 citations
22

MMVU: Measuring Expert-Level Multi-Discipline Video Understanding

70 citations
23

UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics

70 citations
24

Adaptive Keyframe Sampling for Long Video Understanding

68 citations
25

Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training

68 citations
26

One-Minute Video Generation with Test-Time Training

65 citations
27

SF3D: Stable Fast 3D Mesh Reconstruction with UV-unwrapping and Illumination Disentanglement

62 citations
28

RandAR: Decoder-only Autoregressive Visual Generation in Random Orders

61 citations
29

Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise

59 citations
30

DIFIX3D+: Improving 3D Reconstructions with Single-Step Diffusion Models

59 citations

ICLR 2025

Top 30
1

CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer

1,318 citations
2

RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment

642 citations
3

WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct

629 citations
4

Causal Reasoning and Large Language Models: Opening a New Frontier for Causality

386 citations
5

Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks

375 citations
6

RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation

365 citations
7

Generative Verifiers: Reward Modeling as Next-Token Prediction

348 citations
8

OpenHands: An Open Platform for AI Software Developers as Generalist Agents

334 citations
9

Depth Pro: Sharp Monocular Metric Depth in Less Than a Second

299 citations
10

Scaling and evaluating sparse autoencoders

298 citations
11

Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

294 citations
12

Safety Alignment Should be Made More Than Just a Few Tokens Deep

277 citations
13

Mixture-of-Agents Enhances Large Language Model Capabilities

274 citations
14

Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers

272 citations
15

MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion

262 citations
16

Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models

252 citations
17

To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning

236 citations
18

LoRA Learns Less and Forgets Less

233 citations
19

Generating with Confidence: Uncertainty Quantification for Black-box Large Language Models

233 citations
20

mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models

230 citations
21

Generative Representational Instruction Tuning

212 citations
22

Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge

207 citations
23

Self-Play Preference Optimization for Language Model Alignment

207 citations
24

Inverse Scaling: When Bigger Isn't Better

180 citations
25

AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents

180 citations
26

Revisiting Feature Prediction for Learning Visual Representations from Video

178 citations
27

Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts

170 citations
28

G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model

169 citations
29

The Unreasonable Ineffectiveness of the Deeper Layers

158 citations
30

Diffusion Models Are Real-Time Game Engines

156 citations

Topic Trends

Research topic lifecycle
Large Language Models
3194 papers · Language Models
Feb '24 to Jan '26: 3147 papers
Diffusion Models
2550 papers · Generative Models
Feb '24 to Jan '26: 2503 papers
Vision Transformers
2307 papers · Architectures
Feb '24 to Jan '26: 2251 papers
Representation Learning
2278 papers · Representation Learning
Feb '24 to Jan '26: 2218 papers
Graph Neural Networks
1904 papers · Architectures
Feb '24 to Jan '26: 1836 papers
Language Modeling
1745 papers · Language Models
Feb '24 to Jan '26: 1716 papers
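
The rising, peaking, and declining labels mentioned earlier are not defined on this page. A minimal sketch of one way to assign them compares the average monthly paper count in the most recent window with the window before it. The function name, the window length, the 10% tolerance, and the sample series are illustrative assumptions, not the site's method.

```python
def lifecycle_stage(monthly_counts, window=3, tolerance=0.10):
    """Label a topic from its monthly paper counts (assumed heuristic).

    Compares the mean of the last `window` months with the mean of the
    `window` months before that. A change within +/- `tolerance` is
    treated as flat and reported as "peaking".
    """
    if len(monthly_counts) < 2 * window:
        raise ValueError("need at least two full windows of data")
    recent = sum(monthly_counts[-window:]) / window
    previous = sum(monthly_counts[-2 * window:-window]) / window
    if previous == 0:
        return "rising" if recent > 0 else "peaking"
    change = (recent - previous) / previous
    if change > tolerance:
        return "rising"
    if change < -tolerance:
        return "declining"
    return "peaking"


# Hypothetical monthly paper counts for three topics.
print(lifecycle_stage([10, 12, 14, 20, 25, 30]))
print(lifecycle_stage([30, 31, 29, 30, 30, 31]))
print(lifecycle_stage([40, 38, 36, 30, 25, 20]))
```

A two-window comparison is deliberately crude. It ignores seasonality such as conference deadlines, so a real classifier would likely need longer windows or smoothing.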

Compare Topic Trends

Compare research momentum across topics side-by-side


Most Cited

#1

YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information

Chien-Yao Wang, I-Hau Yeh, Hong-Yuan Mark Liao

ECCV 2024
2,952 citations
#2

DETRs Beat YOLOs on Real-time Object Detection

Yian Zhao, Wenyu Lv, Shangliang Xu et al.

CVPR 2024
2,424 citations
#3

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

Zhe Chen, Jiannan Wu, Wenhai Wang et al.

CVPR 2024
2,210 citations
#4

T2I-Adapter: Learning Adapters to Dig Out More Controllable Ability for Text-to-Image Diffusion

Chong Mou, Xintao Wang, Liangbin Xie et al.

AAAI 2024
1,423 citations
#5

Scaling Autoregressive Models for Content-Rich Text-to-Image Generation

Xin Li, Jing Yu Koh, Alexander Ku et al.

ICLR 2024
1,366 citations
#6

Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

Akari Asai, Zeqiu Wu, Yizhong Wang et al.

ICLR 2024
1,356 citations
#7

CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer

Zhuoyi Yang, Jiayan Teng, Wendi Zheng et al.

ICLR 2025
1,318 citations
#8

MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models

Chaoyou Fu, Peixian Chen, Yunhang Shen et al.

NeurIPS 2025
1,227 citations
#9

MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts

Pan Lu, Hritik Bansal, Tony Xia et al.

ICLR 2024
1,171 citations
#10

ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs

Yujia Qin, Shihao Liang, Yining Ye et al.

ICLR 2024
1,128 citations
#11

4D Gaussian Splatting for Real-Time Dynamic Scene Rendering

Guanjun Wu, Taoran Yi, Jiemin Fang et al.

CVPR 2024
1,061 citations
#12

Grounding Multimodal Large Language Models to the World

Zhiliang Peng, Wenhui Wang, Li Dong et al.

ICLR 2024
1,032 citations
#13

VBench: Comprehensive Benchmark Suite for Video Generative Models

Ziqi Huang, Yinan He, Jiashuo Yu et al.

CVPR 2024
996 citations
#14

A Generalist Agent

Jackie Kay, Sergio Gómez Colmenarejo, Mahyar Bordbar et al.

ICLR 2024
978 citations
#15

MVDream: Multi-view Diffusion for 3D Generation

Yichun Shi, Peng Wang, Jianglong Ye et al.

ICLR 2024
871 citations
#16

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark

Kunchang Li, Yali Wang, Yinan He et al.

CVPR 2024
864 citations
#17

Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

Chaoyou Fu, Yuhan Dai, Yongdong Luo et al.

CVPR 2025
858 citations
#18

WorldSimBench: Towards Video Generation Models as World Simulators

Yiran Qin, Zhelun Shi, Jiwen Yu et al.

ICML 2025
806 citations
#19

LISA: Reasoning Segmentation via Large Language Model

Xin Lai, Zhuotao Tian, Yukang Chen et al.

CVPR 2024
721 citations
#20

Deformable 3D Gaussians for High-Fidelity Monocular Dynamic Scene Reconstruction

Ziyi Yang, Xinyu Gao, Wen Zhou et al.

CVPR 2024
686 citations

Top Authors

Topic trends: 31,945 papers · similarity ≥ 0.4 · year ≥ 2024 · Data sourced from Semantic Scholar

34,180 papers | Abstracts: 21,545 (63.0%) | Citations: 34,180 (100.0%) | arXiv: 1,972 (5.8%)
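
The coverage percentages above follow directly from the counts. As a quick check, the sketch below recomputes each share of the 34,180-paper total, rounded to one decimal place; the helper name `coverage` is just for illustration.

```python
def coverage(part, total):
    """Share of `total` covered by `part`, as a percentage rounded to 0.1."""
    return round(100.0 * part / total, 1)


total_papers = 34180
counts = {"Abstracts": 21545, "Citations": 34180, "arXiv": 1972}

for label, count in counts.items():
    print(f"{label}: {count:,} ({coverage(count, total_papers):.1f}%)")
```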

Built: Jan 31, 2026, 1:11 AM AMS