🧬Language Models

Large Language Models

LLMs including GPT, LLaMA, and scaling laws

100 papers27,818 total citations
Compare with other topics
Feb '24 Jan '263147 papers
Also includes: large language models, llm, llms, language models, foundation models, gpt, llama

Top Papers

#1

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

Zhe Chen, Jiannan Wu, Wenhai Wang et al.

CVPR 2024
2,210
citations
#2

MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models

Chaoyou Fu, Peixian Chen, Yunhang Shen et al.

NeurIPS 2025
1,227
citations
#3

MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts

Pan Lu, Hritik Bansal, Tony Xia et al.

ICLR 2024
1,171
citations
#4

ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs

Yujia Qin, Shihao Liang, Yining Ye et al.

ICLR 2024
1,128
citations
#5

A Generalist Agent

Jackie Kay, Sergio Gómez Colmenarejo, Mahyar Bordbar et al.

ICLR 2024
978
citations
#6

LISA: Reasoning Segmentation via Large Language Model

Xin Lai, Zhuotao Tian, Yukang Chen et al.

CVPR 2024
721
citations
#7

WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct

Haipeng Luo, Qingfeng Sun, Can Xu et al.

ICLR 2025
629
citations
#8

MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models

Longhui Yu, Weisen JIANG, Han Shi et al.

ICLR 2024
554
citations
#9

Language Model Beats Diffusion - Tokenizer is key to visual generation

Lijun Yu, José Lezama, Nitesh Bharadwaj Gundavarapu et al.

ICLR 2024
525
citations
#10

Benchmarking Large Language Models in Retrieval-Augmented Generation

Jiawei Chen, Hongyu Lin, Xianpei Han et al.

AAAI 2024arXiv:2309.01431
retrieval-augmented generationlarge language modelsnoise robustnessnegative rejection+4
458
citations
#11

SALMONN: Towards Generic Hearing Abilities for Large Language Models

Changli Tang, Wenyi Yu, Guangzhi Sun et al.

ICLR 2024
447
citations
#12

Catastrophic Jailbreak of Open-source LLMs via Exploiting Generation

Yangsibo Huang, Samyak Gupta, Mengzhou Xia et al.

ICLR 2024
412
citations
#13

YaRN: Efficient Context Window Extension of Large Language Models

Bowen Peng, Jeffrey Quesnelle, Honglu Fan et al.

ICLR 2024
410
citations
#14

Causal Reasoning and Large Language Models: Opening a New Frontier for Causality

Chenhao Tan, Robert Ness, Amit Sharma et al.

ICLR 2025
386
citations
#15

Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models

Zhang Li, Biao Yang, Qiang Liu et al.

CVPR 2024
384
citations
#16

Prometheus: Inducing Fine-Grained Evaluation Capability in Language Models

Seungone Kim, Jamin Shin, yejin cho et al.

ICLR 2024
378
citations
#17

Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs

Suyu Ge, Yunan Zhang, Liyuan Liu et al.

ICLR 2024
372
citations
#18

Large Language Models Are Not Robust Multiple Choice Selectors

Chujie Zheng, Hao Zhou, Fandong Meng et al.

ICLR 2024
370
citations
#19

OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation

Qidong Huang, Xiaoyi Dong, Pan Zhang et al.

CVPR 2024
365
citations
#20

Generative Verifiers: Reward Modeling as Next-Token Prediction

Lunjun Zhang, Arian Hosseini, Hritik Bansal et al.

ICLR 2025
348
citations
#21

OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models

Wenqi Shao, Mengzhao Chen, Zhaoyang Zhang et al.

ICLR 2024
320
citations
#22

OpenChat: Advancing Open-source Language Models with Mixed-Quality Data

Guan Wang, Sijie Cheng, Xianyuan Zhan et al.

ICLR 2024
309
citations
#23

Scaling and evaluating sparse autoencoders

Leo Gao, Tom Dupre la Tour, Henk Tillman et al.

ICLR 2025arXiv:2406.04093
sparse autoencoderslanguage model interpretabilityfeature extractionk-sparse autoencoders+4
298
citations
#24

SliceGPT: Compress Large Language Models by Deleting Rows and Columns

Saleh Ashkboos, Maximilian Croci, Marcelo Gennari do Nascimento et al.

ICLR 2024
289
citations
#25

Safety Alignment Should be Made More Than Just a Few Tokens Deep

Xiangyu Qi, Ashwinee Panda, Kaifeng Lyu et al.

ICLR 2025
277
citations
#26

NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models

Gengze Zhou, Yicong Hong, Qi Wu

AAAI 2024arXiv:2305.16986
vision-and-language navigationlarge language modelsinstruction-following navigationzero-shot action prediction+4
276
citations
#27

DreamLLM: Synergistic Multimodal Comprehension and Creation

Runpei Dong, chunrui han, Yuang Peng et al.

ICLR 2024
275
citations
#28

Mixture-of-Agents Enhances Large Language Model Capabilities

Junlin Wang, Jue Wang, Ben Athiwaratkun et al.

ICLR 2025
274
citations
#29

Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers

Chenglei Si, Diyi Yang, Tatsunori Hashimoto

ICLR 2025
272
citations
#30

Provable Robust Watermarking for AI-Generated Text

Xuandong Zhao, Prabhanjan Ananth, Lei Li et al.

ICLR 2024
271
citations
#31

Understanding the Effects of RLHF on LLM Generalisation and Diversity

Robert Kirk, Ishita Mediratta, Christoforos Nalmpantis et al.

ICLR 2024
267
citations
#32

Large Language Models as Tool Makers

Tianle Cai, Xuezhi Wang, Tengyu Ma et al.

ICLR 2024
262
citations
#33

On Scaling Up a Multilingual Vision and Language Model

Xi Chen, Josip Djolonga, Piotr Padlewski et al.

CVPR 2024
254
citations
#34

Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts

Jian Xie, Kai Zhang, Jiangjie Chen et al.

ICLR 2024
252
citations
#35

MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

Brandon McKinzie, Zhe Gan, Jean-Philippe Fauconnier et al.

ECCV 2024
246
citations
#36

GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting

Kai Zhang, Sai Bi, Hao Tan et al.

ECCV 2024
245
citations
#37

The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity

Parshin Shojaee, Iman Mirzadeh, Keivan Alizadeh vahid et al.

NeurIPS 2025
242
citations
#38

AnomalyGPT: Detecting Industrial Anomalies Using Large Vision-Language Models

Zhaopeng Gu, Bingke Zhu, Guibo Zhu et al.

AAAI 2024arXiv:2308.15366
industrial anomaly detectionvision-language modelsanomaly localizationfew-shot learning+4
240
citations
#39

SaProt: Protein Language Modeling with Structure-aware Vocabulary

Jin Su, Chenchen Han, Yuyang Zhou et al.

ICLR 2024
237
citations
#40

Generating with Confidence: Uncertainty Quantification for Black-box Large Language Models

Jimeng Sun, Shubhendu Trivedi, Zhen Lin

ICLR 2025
233
citations
#41

Language Models Represent Space and Time

Wes Gurnee, Max Tegmark

ICLR 2024
232
citations
#42

Sequential Modeling Enables Scalable Learning for Large Vision Models

Yutong Bai, Xinyang Geng, Karttikeya Mangalam et al.

CVPR 2024
230
citations
#43

mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models

Jiabo Ye, Haiyang Xu, Haowei Liu et al.

ICLR 2025
230
citations
#44

RECOMP: Improving Retrieval-Augmented LMs with Context Compression and Selective Augmentation

Fangyuan Xu, Weijia Shi, Eunsol Choi

ICLR 2024
222
citations
#45

Listen, Think, and Understand

Yuan Gong, Hongyin Luo, Alexander Liu et al.

ICLR 2024
221
citations
#46

Generative Representational Instruction Tuning

Niklas Muennighoff, Hongjin SU, Liang Wang et al.

ICLR 2025
212
citations
#47

R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization

Jingyi Zhang, Jiaxing Huang, Huanjin Yao et al.

ICCV 2025
206
citations
#48

Zhongjing: Enhancing the Chinese Medical Capabilities of Large Language Model through Expert Feedback and Real-World Multi-Turn Dialogue

Songhua Yang, Hanjie Zhao, Senbin Zhu et al.

AAAI 2024arXiv:2308.03549
large language modelschinese medicineexpert feedbackmulti-turn dialogue+4
204
citations
#49

Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification

Aojun Zhou, Ke Wang, Zimu Lu et al.

ICLR 2024
196
citations
#50

Why Do Multi-Agent LLM Systems Fail?

Mert Cemri, Melissa Z Pan, Shuyi Yang et al.

NeurIPS 2025arXiv:2503.13657
multi-agent llm systemsfailure pattern analysissystem failure taxonomyllm-as-a-judge+3
188
citations
#51

Think before you speak: Training Language Models With Pause Tokens

Sachin Goyal, Ziwei Ji, Ankit Singh Rawat et al.

ICLR 2024
187
citations
#52

MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models

Xin Liu, Yichen Zhu, Jindong Gu et al.

ECCV 2024arXiv:2311.17600
multimodal large language modelssafety evaluationimage-based manipulationsadversarial attacks+4
183
citations
#53

Inverse Scaling: When Bigger Isn't Better

Joe Cavanagh, Andrew Gritsevskiy, Najoung Kim et al.

ICLR 2025
180
citations
#54

ReLoRA: High-Rank Training Through Low-Rank Updates

Vladislav Lialin, Sherin Muckatira, Namrata Shivagunde et al.

ICLR 2024
179
citations
#55

On the Reliability of Watermarks for Large Language Models

John Kirchenbauer, Jonas Geiping, Yuxin Wen et al.

ICLR 2024
176
citations
#56

LLaVA-UHD: an LMM Perceiving any Aspect Ratio and High-Resolution Images

Zonghao Guo, Ruyi Xu, Yuan Yao et al.

ECCV 2024
170
citations
#57

G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model

Jiahui Gao, Renjie Pi, Jipeng Zhang et al.

ICLR 2025
169
citations
#58

Can Large Language Models Infer Causation from Correlation?

Zhijing Jin, Jiarui Liu, Zhiheng LYU et al.

ICLR 2024
166
citations
#59

Is Self-Repair a Silver Bullet for Code Generation?

Theo X. Olausson, Jeevana Priya Inala, Chenglong Wang et al.

ICLR 2024
160
citations
#60

Self-contradictory Hallucinations of Large Language Models: Evaluation, Detection and Mitigation

Niels Mündler, Jingxuan He, Slobodan Jenko et al.

ICLR 2024
159
citations
#61

The Unreasonable Ineffectiveness of the Deeper Layers

Andrey Gromov, Kushal Tirumala, Hassan Shapourian et al.

ICLR 2025
158
citations
#62

Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory

Niloofar Mireshghallah, Hyunwoo Kim, Xuhui Zhou et al.

ICLR 2024
158
citations
#63

MUSE: Machine Unlearning Six-Way Evaluation for Language Models

Weijia Shi, Jaechan Lee, Yangsibo Huang et al.

ICLR 2025arXiv:2407.06460
machine unlearninglanguage modelsprivacy leakageverbatim memorization+4
157
citations
#64

DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads

Guangxuan Xiao, Jiaming Tang, Jingwei Zuo et al.

ICLR 2025
156
citations
#65

BooookScore: A systematic exploration of book-length summarization in the era of LLMs

Yapei Chang, Kyle Lo, Tanya Goyal et al.

ICLR 2024
156
citations
#66

Training Language Models to Reason Efficiently

Daman Arora, Andrea Zanette

NeurIPS 2025
155
citations
#67

JudgeBench: A Benchmark for Evaluating LLM-Based Judges

Sijun Tan, Siyuan Zhuang, Kyle Montgomery et al.

ICLR 2025
147
citations
#68

Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for LLM Problem-Solving

Yangzhen Wu, Zhiqing Sun, Shanda Li et al.

ICLR 2025
inference scaling lawscompute-optimal inferencelarge language modelstest-time scaling+4
146
citations
#69

WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild

Bill Yuchen Lin, Yuntian Deng, Khyathi Chandu et al.

ICLR 2025arXiv:2406.04770
large language modelsautomated evaluation frameworkreal-world user queriespairwise comparison metrics+3
142
citations
#70

Physics of Language Models: Part 3.2, Knowledge Manipulation

Zeyuan Allen-Zhu, Yuanzhi Li

ICLR 2025
142
citations
#71

SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal

Tinghao Xie, Xiangyu Qi, Yi Zeng et al.

ICLR 2025arXiv:2406.14598
safety refusal evaluationlarge language modelsfine-grained taxonomieslinguistic augmentations+4
141
citations
#72

Linearity of Relation Decoding in Transformer Language Models

Evan Hernandez, Arnab Sen Sharma, Tal Haklay et al.

ICLR 2024
140
citations
#73

AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models

Junfeng Fang, Houcheng Jiang, Kun Wang et al.

ICLR 2025
138
citations
#74

Scaling Diffusion Language Models via Adaptation from Autoregressive Models

Shansan Gong, Shivam Agarwal, Yizhe Zhang et al.

ICLR 2025
135
citations
#75

Omni-MATH: A Universal Olympiad Level Mathematic Benchmark for Large Language Models

Bofei Gao, Feifan Song, Zhe Yang et al.

ICLR 2025
135
citations
#76

Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach

Jonas Geiping, Sean McLeish, Neel Jain et al.

NeurIPS 2025
134
citations
#77

Task Contamination: Language Models May Not Be Few-Shot Anymore

Changmao Li, Jeffrey Flanigan

AAAI 2024arXiv:2312.16337
task contaminationfew-shot learningzero-shot learninglanguage model evaluation+3
130
citations
#78

LongVLM: Efficient Long Video Understanding via Large Language Models

Yuetian Weng, Mingfei Han, Haoyu He et al.

ECCV 2024arXiv:2404.03384
video understandinglarge language modelshierarchical token merginglong video processing+4
128
citations
#79

SciEval: A Multi-Level Large Language Model Evaluation Benchmark for Scientific Research

Liangtai Sun, Yang Han, Zihan Zhao et al.

AAAI 2024arXiv:2308.13149
large language modelsscientific evaluation benchmarkmulti-disciplinary evaluationbloom's taxonomy+4
127
citations
#80

GSVA: Generalized Segmentation via Multimodal Large Language Models

Zhuofan Xia, Dongchen Han, Yizeng Han et al.

CVPR 2024
127
citations
#81

LHRS-Bot: Empowering Remote Sensing with VGI-Enhanced Large Multimodal Language Model

Dilxat Muhtar, Zhenshi Li, Feng Gu et al.

ECCV 2024
127
citations
#82

Adapting Large Language Models via Reading Comprehension

Daixuan Cheng, Shaohan Huang, Furu Wei

ICLR 2024
126
citations
#83

ST-LLM: Large Language Models Are Effective Temporal Learners

Ruyang Liu, Chen Li, Haoran Tang et al.

ECCV 2024
124
citations
#84

GenSim: Generating Robotic Simulation Tasks via Large Language Models

Lirui Wang, Yiyang Ling, Zhecheng Yuan et al.

ICLR 2024
120
citations
#85

Layer by Layer: Uncovering Hidden Representations in Language Models

Oscar Skean, Md Rifat Arefin, Dan Zhao et al.

ICML 2025
118
citations
#86

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

Xeron Du, Yifan Yao, Kaijing Ma et al.

NeurIPS 2025arXiv:2502.14739
llm evaluationgraduate-level knowledgespecialized disciplinesbenchmark construction+4
118
citations
#87

TTRL: Test-Time Reinforcement Learning

Yuxin Zuo, Kaiyan Zhang, Li Sheng et al.

NeurIPS 2025
118
citations
#88

Teaching Arithmetic to Small Transformers

Nayoung Lee, Kartik Sreenivasan, Jason Lee et al.

ICLR 2024
117
citations
#89

SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks

Rui-Jie Zhu, Qihang Zhao, Jason Eshraghian et al.

ICLR 2025
115
citations
#90

ToolACE: Winning the Points of LLM Function Calling

Weiwen Liu, Xu Huang, Xingshan Zeng et al.

ICLR 2025
114
citations
#91

ShapeLLM: Universal 3D Object Understanding for Embodied Interaction

Zekun Qi, Runpei Dong, Shaochen Zhang et al.

ECCV 2024
113
citations
#92

LLMCarbon: Modeling the End-to-End Carbon Footprint of Large Language Models

Ahmad Faiz, Sotaro Kaneda, Ruhan Wang et al.

ICLR 2024
110
citations
#93

Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

Jan Betley, Daniel Tan, Niels Warncke et al.

ICML 2025
110
citations
#94

WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning

Zehan Qi, Xiao Liu, Iat Long Iong et al.

ICLR 2025
110
citations
#95

VideoLLM-online: Online Video Large Language Model for Streaming Video

Joya Chen, Zhaoyang Lv, Shiwei Wu et al.

CVPR 2024
109
citations
#96

Tamper-Resistant Safeguards for Open-Weight LLMs

Rishub Tamirisa, Bhrugu Bharathi, Long Phan et al.

ICLR 2025arXiv:2408.00761
tamper-resistant safeguardsopen-weight llmsmodel weight tamperingrefusal safeguards+3
108
citations
#97

Mitigating Large Language Model Hallucinations via Autonomous Knowledge Graph-Based Retrofitting

Xinyan Guan, Yanjiang Liu, Hongyu Lin et al.

AAAI 2024arXiv:2311.13314
knowledge graph integrationlarge language model hallucinationfactual knowledge retrievalautonomous knowledge verification+2
108
citations
#98

LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory

Di Wu, Hongwei Wang, Wenhao Yu et al.

ICLR 2025
108
citations
#99

Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference

Han Zhao, Min Zhang, Wei Zhao et al.

AAAI 2025
106
citations
#100

A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models

Haoran Xu, Young Jin Kim, Amr Mohamed Nabil Aly Aly Sharaf et al.

ICLR 2024
105
citations