🧬Language Models

Prompt Engineering

Designing effective prompts for LLMs

100 papers4,740 total citations
Compare with other topics
Feb '24 Jan '26554 papers
Also includes: prompt engineering, prompt tuning, prompt learning, prompt optimization

Top Papers

#1

From Crowdsourced Data to High-quality Benchmarks: Arena-Hard and Benchbuilder Pipeline

Tianle Li, Wei-Lin Chiang, Evan Frick et al.

ICML 2025
329
citations
#2

Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers

Chenglei Si, Diyi Yang, Tatsunori Hashimoto

ICLR 2025
272
citations
#3

Knowledge Graph Prompting for Multi-Document Question Answering

Yu Wang, Nedim Lipka, Ryan A. Rossi et al.

AAAI 2024arXiv:2308.11730
knowledge graph promptingmulti-document question answeringgraph construction modulegraph traversal module+4
231
citations
#4

Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge

Jiayi Ye, Yanbo Wang, Yue Huang et al.

ICLR 2025arXiv:2410.02736
llm-as-a-judgebias quantificationautomated evaluation frameworklanguage model evaluation+2
207
citations
#5

BooookScore: A systematic exploration of book-length summarization in the era of LLMs

Yapei Chang, Kyle Lo, Tanya Goyal et al.

ICLR 2024
156
citations
#6

JudgeBench: A Benchmark for Evaluating LLM-Based Judges

Sijun Tan, Siyuan Zhuang, Kyle Montgomery et al.

ICLR 2025arXiv:2410.12784
llm-based judgesevaluation frameworkpreference labelingobjective correctness+4
150
citations
#7

SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal

Tinghao Xie, Xiangyu Qi, Yi Zeng et al.

ICLR 2025arXiv:2406.14598
safety refusal evaluationlarge language modelsfine-grained taxonomieslinguistic augmentations+4
141
citations
#8

AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents

Maksym Andriushchenko, Alexandra Souly, Mateusz Dziemian et al.

ICLR 2025
127
citations
#9

AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs

Anselm Paulus, Arman Zharmagambetov, Chuan Guo et al.

ICML 2025
123
citations
#10

LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory

Di Wu, Hongwei Wang, Wenhao Yu et al.

ICLR 2025arXiv:2410.10813
long-term memorychat assistant systemsmemory indexingmemory retrieval+4
114
citations
#11

ToolACE: Winning the Points of LLM Function Calling

Weiwen Liu, Xu Huang, Xingshan Zeng et al.

ICLR 2025
114
citations
#12

Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents

Hanrong Zhang, Jingyuan Huang, Kai Mei et al.

ICLR 2025
103
citations
#13

Synapse: Trajectory-as-Exemplar Prompting with Memory for Computer Control

Longtao Zheng, Rundong Wang, Xinrun Wang et al.

ICLR 2024
103
citations
#14

OR-Bench: An Over-Refusal Benchmark for Large Language Models

Jiaxing Cui, Wei-Lin Chiang, Ion Stoica et al.

ICML 2025
97
citations
#15

Curiosity-driven Red-teaming for Large Language Models

Zhang-Wei Hong, Idan Shenfeld, Johnson (Tsun-Hsuan) Wang et al.

ICLR 2024
77
citations
#16

Eliciting Human Preferences with Language Models

Belinda Li, Alex Tamkin, Noah Goodman et al.

ICLR 2025
76
citations
#17

Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis

Ziyue Jiang, Jinglin Liu, Yi Ren et al.

ICLR 2024
74
citations
#18

Enhancing Job Recommendation through LLM-Based Generative Adversarial Networks

Yingpeng Du, Di Luo, Rui Yan et al.

AAAI 2024arXiv:2307.10747
job recommendationlarge language modelsgenerative adversarial networksresume completion+4
72
citations
#19

PromptTTS 2: Describing and Generating Voices with Text Prompt

Yichong Leng, ZHifang Guo, Kai Shen et al.

ICLR 2024
70
citations
#20

Programming Refusal with Conditional Activation Steering

Bruce W. Lee, Inkit Padhi, Karthikeyan Natesan Ramamurthy et al.

ICLR 2025
70
citations
#21

Does Refusal Training in LLMs Generalize to the Past Tense?

Maksym Andriushchenko, Nicolas Flammarion

ICLR 2025
66
citations
#22

AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents

Ke Yang, Yao Liu, Sapana Chaudhary et al.

ICLR 2025arXiv:2410.13825
web agent groundingobservation space alignmentaction space alignmentllm-based agents+4
66
citations
#23

Learning to Plan & Reason for Evaluation with Thinking-LLM-as-a-Judge

Swarnadeep Saha, Xian Li, Marjan Ghazvininejad et al.

ICML 2025
53
citations
#24

Causal Order: The Key to Leveraging Imperfect Experts in Causal Inference

Aniket Vashishtha, Abbavaram Gowtham Reddy, Abhinav Kumar et al.

ICLR 2025arXiv:2310.15117
causal inferencecausal ordercausal graphspairwise prompting+4
48
citations
#25

DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM

Yixuan Wu, Yizhou Wang, Shixiang Tang et al.

ECCV 2024
47
citations
#26

V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning

Hang Hua, Yunlong Tang, Chenliang Xu et al.

AAAI 2025
47
citations
#27

Learning How Hard to Think: Input-Adaptive Allocation of LM Computation

Mehul Damani, Idan Shenfeld, Andi Peng et al.

ICLR 2025
45
citations
#28

ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance

Chunwei Wang, Guansong Lu, Junwei Yang et al.

ICCV 2025
43
citations
#29

How efficient is LLM-generated code? A rigorous & high-standard benchmark

Ruizhong Qiu, Weiliang Zeng, James Ezick et al.

ICLR 2025arXiv:2406.06647
program synthesiscode efficiency evaluationefficiency metric designllm-generated code+4
43
citations
#30

Trust or Escalate: LLM Judges with Provable Guarantees for Human Agreement

Jaehun Jung, Faeze Brahman, Yejin Choi

ICLR 2025
42
citations
#31

MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs

Yusu Qian, Hanrong Ye, Jean-Philippe Fauconnier et al.

ICLR 2025
41
citations
#32

PREFER: Prompt Ensemble Learning via Feedback-Reflect-Refine

Chenrui Zhang, Lin Liu, Chuyuan Wang et al.

AAAI 2024arXiv:2308.12033
prompt ensemble learninglarge language modelsfeedback mechanismboosting algorithms+4
41
citations
#33

HSEvo: Elevating Automatic Heuristic Design with Diversity-Driven Harmony Search and Genetic Algorithm Using LLMs

Pham Vu Tuan Dat, Long Doan, Huynh Thi Thanh Binh

AAAI 2025
39
citations
#34

Agents' Room: Narrative Generation through Multi-step Collaboration

Fantine Huot, Reinald Kim Amplayo, Jennimaria Palomaki et al.

ICLR 2025
38
citations
#35

MathAttack: Attacking Large Language Models towards Math Solving Ability

Zihao Zhou, Qiufeng Wang, Mingyu Jin et al.

AAAI 2024arXiv:2309.01686
adversarial attacksmath word problemslarge language modelslogical entity recognition+4
37
citations
#36

Evaluating the Evaluator: Measuring LLMs’ Adherence to Task Evaluation Instructions

Bhuvanashree Murugadoss, Christian Poelitz, Ian Drosos et al.

AAAI 2025
36
citations
#37

PAD: Personalized Alignment of LLMs at Decoding-time

Ruizhe Chen, Xiaotian Zhang, Meng Luo et al.

ICLR 2025
35
citations
#38

Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance

Yaxi Lu, Shenzhi Yang, Cheng Qian et al.

ICLR 2025
34
citations
#39

SCALM: Detecting Bad Practices in Smart Contracts Through LLMs

Zongwei Li, Xiaoqi Li, Wenkai Li et al.

AAAI 2025
33
citations
#40

Adversarial Prompt Tuning for Vision-Language Models

Jiaming Zhang, Xingjun Ma, Xin Wang et al.

ECCV 2024
33
citations
#41

Preference Optimization for Reasoning with Pseudo Feedback

Fangkai Jiao, Geyang Guo, Xingxing Zhang et al.

ICLR 2025
33
citations
#42

Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation

Yiming Wang, Pei Zhang, Baosong Yang et al.

ICLR 2025
32
citations
#43

Modeling Future Conversation Turns to Teach LLMs to Ask Clarifying Questions

Michael Zhang, W. Bradley Knox, Eunsol Choi

ICLR 2025
32
citations
#44

Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference

Barys Liskavets, Maxim Ushakov, Shuvendu Roy et al.

AAAI 2025
31
citations
#45

Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs

Mantas Mazeika, Xuwang Yin, Rishub Tamirisa et al.

NeurIPS 2025
31
citations
#46

Open-World Human-Object Interaction Detection via Multi-modal Prompts

Jie Yang, Bingliang Li, Ailing Zeng et al.

CVPR 2024
31
citations
#47

Meta-Prompting for Automating Zero-shot Visual Recognition with LLMs

Muhammad Jehanzeb Mirza, Leonid Karlinsky, Wei Lin et al.

ECCV 2024
30
citations
#48

Soft Prompt Generation for Domain Generalization

Shuanghao Bai, Yuedi Zhang, Wanqi Zhou et al.

ECCV 2024arXiv:2404.19286
soft prompt learningdomain generalizationvision language modelsprompt generation+4
30
citations
#49

What Makes Large Language Models Reason in (Multi-Turn) Code Generation?

Kunhao Zheng, Juliette Decugis, Jonas Gehring et al.

ICLR 2025
30
citations
#50

Self-Boosting Large Language Models with Synthetic Preference Data

Qingxiu Dong, Li Dong, Xingxing Zhang et al.

ICLR 2025
29
citations
#51

LAMM: Label Alignment for Multi-Modal Prompt Learning

Jingsheng Gao, Jiacheng Ruan, Suncheng Xiang et al.

AAAI 2024arXiv:2312.08212
prompt tuningvisual-language modelslabel alignmentfew-shot learning+3
28
citations
#52

Style Outweighs Substance: Failure Modes of LLM Judges in Alignment Benchmarking

Benjamin Feuer, Micah Goldblum, Teresa Datta et al.

ICLR 2025
27
citations
#53

CAD-GPT: Synthesising CAD Construction Sequence with Spatial Reasoning-Enhanced Multimodal LLMs

Siyu Wang, Cailian Chen, Xinyi Le et al.

AAAI 2025
26
citations
#54

LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming?

Zihan Zheng, Zerui Cheng, Zeyu Shen et al.

NeurIPS 2025
25
citations
#55

Cascade Prompt Learning for Visual-Language Model Adaptation

Ge Wu, Xin Zhang, Zheng Li et al.

ECCV 2024
24
citations
#56

MMQA: Evaluating LLMs with Multi-Table Multi-Hop Complex Questions

Jian Wu, Linyi Yang, Dongyuan Li et al.

ICLR 2025
tabular data understandingmulti-table question answeringtext-to-sql generationmulti-hop reasoning+4
23
citations
#57

Semantics-Adaptive Activation Intervention for LLMs via Dynamic Steering Vectors

Weixuan Wang, JINGYUAN YANG, Wei Peng

ICLR 2025
23
citations
#58

Unknown Prompt the only Lacuna: Unveiling CLIP's Potential for Open Domain Generalization

Mainak Singha, Ankit Jha, Shirsha Bose et al.

CVPR 2024
23
citations
#59

Truthful Aggregation of LLMs with an Application to Online Advertising

Ermis Soumalias, Michael Curry, Sven Seuken

NeurIPS 2025arXiv:2405.05905
auction mechanism designtruthful reportingpreference aggregationonline advertising+3
22
citations
#60

ClinicalLab: Aligning Agents for Multi-Departmental Clinical Diagnostics in the Real World

Weixiang Yan, Haitian Liu, Tengxiao Wu et al.

NeurIPS 2025
22
citations
#61

Diverse Preference Learning for Capabilities and Alignment

Stewart Slocum, Asher Parker-Sartori, Dylan Hadfield-Menell

ICLR 2025arXiv:2511.08594
preference learningkl divergence regularizerllm output diversityalignment algorithms+3
21
citations
#62

Enhancing Decision-Making for LLM Agents via Step-Level Q-Value Models

Yuanzhao Zhai, Tingkai Yang, Kele Xu et al.

AAAI 2025
21
citations
#63

Investigating Non-Transitivity in LLM-as-a-Judge

Yi Xu, Laura Ruis, Tim Rocktäschel et al.

ICML 2025
19
citations
#64

SCANS: Mitigating the Exaggerated Safety for LLMs via Safety-Conscious Activation Steering

Zouying Cao, Yifei Yang, Hai Zhao

AAAI 2025
19
citations
#65

Efficiently Scaling LLM Reasoning Programs with Certaindex

Yichao Fu, Junda Chen, Siqi Zhu et al.

NeurIPS 2025
19
citations
#66

Unveiling the Ignorance of MLLMs: Seeing Clearly, Answering Incorrectly

Yexin Liu, Zhengyang Liang, Yueze Wang et al.

CVPR 2025
19
citations
#67

Mechanism Design for LLM Fine-tuning with Multiple Reward Models

Haoran Sun, Yurong Chen, Siwei Wang et al.

NeurIPS 2025
19
citations
#68

Reducing Tool Hallucination via Reliability Alignment

Hongshen Xu, Zichen Zhu, Lei Pan et al.

ICML 2025
19
citations
#69

Customizing Language Model Responses with Contrastive In-Context Learning

Xiang Gao, Kamalika Das

AAAI 2024arXiv:2401.17390
contrastive learninglanguage model alignmentin-context learningintent customization+4
19
citations
#70

Boosting of Thoughts: Trial-and-Error Problem Solving with Large Language Models

Sijia Chen, Baochun Li, Di Niu

ICLR 2024
19
citations
#71

Playing the Fool: Jailbreaking LLMs and Multimodal LLMs with Out-of-Distribution Strategy

Joonhyun Jeong, Seyun Bae, Yeonsung Jung et al.

CVPR 2025
18
citations
#72

Generating Novel Leads for Drug Discovery Using LLMs with Logical Feedback

Shreyas Bhat Brahmavar, Ashwin Srinivasan, Tirtharaj Dash et al.

AAAI 2024
18
citations
#73

OVOR: OnePrompt with Virtual Outlier Regularization for Rehearsal-Free Class-Incremental Learning

Wei-Cheng Huang, Chun-Fu Chen, Hsiang Hsu

ICLR 2024
18
citations
#74

One-stage Prompt-based Continual Learning

Youngeun Kim, YUHANG LI, Priyadarshini Panda

ECCV 2024
17
citations
#75

Memory Injection Attacks on LLM Agents via Query-Only Interaction

Shen Dong, Shaochen Xu, Pengfei He et al.

NeurIPS 2025arXiv:2503.03704
memory injection attacksllm agent securityquery-only interactionmalicious reasoning steps+3
16
citations
#76

LLMs Can Plan Only If We Tell Them

Bilgehan Sel, Ruoxi Jia, Ming Jin

ICLR 2025
16
citations
#77

Controllable Navigation Instruction Generation with Chain of Thought Prompting

Xianghao Kong, Jinyu Chen, Wenguan Wang et al.

ECCV 2024
16
citations
#78

MMedPO: Aligning Medical Vision-Language Models with Clinical-Aware Multimodal Preference Optimization

Kangyu Zhu, Peng Xia, Yun Li et al.

ICML 2025
16
citations
#79

BEST-Route: Adaptive LLM Routing with Test-Time Optimal Compute

Dujian Ding, Ankur Mallick, Shaokun Zhang et al.

ICML 2025
16
citations
#80

Security Attacks on LLM-based Code Completion Tools

Wen Cheng, Ke Sun, Xinyu Zhang et al.

AAAI 2025
15
citations
#81

RocketEval: Efficient automated LLM evaluation via grading checklist

Tianjun Wei, Wei Wen, Ruizhi Qiao et al.

ICLR 2025
15
citations
#82

ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing Collaborative AI Systems

Xiangyuan Xue, Zeyu Lu, Di Huang et al.

CVPR 2025
15
citations
#83

Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval

Sheryl Hsu, Omar Khattab, Chelsea Finn et al.

ICLR 2025
15
citations
#84

Get an A in Math: Progressive Rectification Prompting

Zhenyu Wu, Meng Jiang, Chao Shen

AAAI 2024arXiv:2312.06867
chain-of-thought promptingmath word problemsprogressive rectification promptingreasoning path generation+3
15
citations
#85

Learning to Learn Better Visual Prompts

Fengxiang Wang, Wanrong Huang, Shaowu Yang et al.

AAAI 2024
14
citations
#86

Detecting High-Stakes Interactions with Activation Probes

Alex McKenzie, Urja Pawar, Phil Blandfort et al.

NeurIPS 2025
13
citations
#87

EffiBench-X: A Multi-Language Benchmark for Measuring Efficiency of LLM-Generated Code

Yuhao Qing, Boyu Zhu, Mingzhe Du et al.

NeurIPS 2025
13
citations
#88

xFinder: Large Language Models as Automated Evaluators for Reliable Evaluation

Qingchen Yu, Zifan Zheng, Shichao Song et al.

ICLR 2025
13
citations
#89

Compound Text-Guided Prompt Tuning via Image-Adaptive Cues

Hao Tan, Jun Li, Yizhuang Zhou et al.

AAAI 2024arXiv:2312.06401
prompt tuningvision-language modelsfew-shot recognitiondomain generalization+3
13
citations
#90

An Engorgio Prompt Makes Large Language Model Babble on

Jianshuo Dong, Ziyuan Zhang, Qingjie Zhang et al.

ICLR 2025
13
citations
#91

Citations and Trust in LLM Generated Responses

Yifan Ding, Matthew Facciani, Ellen Joyce et al.

AAAI 2025
12
citations
#92

MemSim: A Bayesian Simulator for Evaluating Memory of LLM-based Personal Assistants

Zeyu Zhang, Quanyu Dai, Luyu Chen et al.

NeurIPS 2025
12
citations
#93

Active Evaluation Acquisition for Efficient LLM Benchmarking

Yang Li, Jie Ma, Miguel Ballesteros et al.

ICML 2025
12
citations
#94

Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLMs

Xinyu Fang, Zhijian Chen, Kai Lan et al.

ICCV 2025arXiv:2503.14478
multimodal large language modelscreative intelligence assessmentcontext-aware evaluationimage-based tasks+4
12
citations
#95

Can Watermarked LLMs be Identified by Users via Crafted Prompts?

Aiwei Liu, Sheng Guan, Yiming Liu et al.

ICLR 2025
12
citations
#96

CyberPal.AI: Empowering LLMs with Expert-Driven Cybersecurity Instructions

Matan Levi, Yair Allouche, Daniel Ohayon et al.

AAAI 2025
12
citations
#97

SEC-bench: Automated Benchmarking of LLM Agents on Real-World Software Security Tasks

Hwiwon Lee, Ziqi Zhang, Hanxiao Lu et al.

NeurIPS 2025
11
citations
#98

ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting

Shaofei Cai, Zihao Wang, Kewei Lian et al.

CVPR 2025
11
citations
#99

Context Steering: Controllable Personalization at Inference Time

Zhiyang He, Sashrika Pandey, Mariah Schrum et al.

ICLR 2025
11
citations
#100

Planning in the Dark: LLM-Symbolic Planning Pipeline Without Experts

Sukai Huang, Nir Lipovetzky, Trevor Cohn

AAAI 2025
11
citations