Most Cited COLM 2025 "separable non-linear least squares" Papers

418 papers found • Page 1 of 3

#1

Understanding R1-Zero-Like Training: A Critical Perspective

Zichen Liu, Changyu Chen, Wenjun Li et al.

COLM 2025 paper • arXiv:2503.20783 • 703 citations
#2

Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

Bowen Jin, Hansi Zeng, Zhenrui Yue et al.

COLM 2025 paper • arXiv:2503.09516 • 685 citations
#3

Tulu 3: Pushing Frontiers in Open Language Model Post-Training

Nathan Lambert, Jacob Morrison, Valentina Pyatkin et al.

COLM 2025 paper • arXiv:2411.15124 • 491 citations
#4

SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild

Weihao Zeng, Yuzhen Huang, Qian Liu et al.

COLM 2025 paper • arXiv:2503.18892 • 389 citations
#5

LIMO: Less is More for Reasoning

Yixin Ye, Zhen Huang, Yang Xiao et al.

COLM 2025 paper • arXiv:2502.03387 • 380 citations
#6

Training Large Language Models to Reason in a Continuous Latent Space

Shibo Hao, Sainbayar Sukhbaatar, DiJia Su et al.

COLM 2025 paper • arXiv:2412.06769 • 349 citations
#7

Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs

Kanishk Gandhi, Ayush K Chakravarthy, Anikait Singh et al.

COLM 2025 paper • arXiv:2503.01307 • 314 citations
#8

FacTool: Factuality Detection in Generative AI -- A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios

Ethan Chern, Steffi Chern, Shiqi Chen et al.

COLM 2025 paper • arXiv:2307.13528 • 276 citations
#9

L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning

Pranjal Aggarwal, Sean Welleck

COLM 2025 paper • arXiv:2503.04697 • 247 citations
#10

PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling

Zefan Cai, Yichi Zhang, Bofei Gao et al.

COLM 2025 paper • arXiv:2406.02069 • 202 citations
#11

SmolVLM: Redefining small and efficient multimodal models

Andrés Marafioti, Orr Zohar, Miquel Farré et al.

COLM 2025 paper • arXiv:2504.05299 • 124 citations
#12

Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining

Rosie Zhao, Alexandru Meterez, Sham M. Kakade et al.

COLM 2025 paper • arXiv:2504.07912 • 87 citations
#13

Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving

Yong Lin, Shange Tang, Bohan Lyu et al.

COLM 2025 paper • arXiv:2502.07640 • 82 citations
#14

RWKV-7 "Goose" with Expressive Dynamic State Evolution

Bo Peng, Ruichong Zhang, Daniel Goldstein et al.

COLM 2025 paper • 76 citations
#15

Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents

Saaket Agashe, Kyle Wong, Vincent Tu et al.

COLM 2025 paper • arXiv:2504.00906 • 72 citations
#16

A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility

Andreas Hochlehnert, Hardik Bhatnagar, Vishaal Udandarao et al.

COLM 2025 paper • arXiv:2504.07086 • 70 citations
#17

Why do LLMs attend to the first token?

Federico Barbero, Alvaro Arroyo, Xiangming Gu et al.

COLM 2025 paper • arXiv:2504.02732 • 63 citations
#18

LV-Eval: A Balanced Long-Context Benchmark with 5 Length Levels Up to 256K

Tao Yuan, Xuefei Ning, Dong Zhou et al.

COLM 2025 paper • arXiv:2402.05136 • 62 citations
#19

AIOS: LLM Agent Operating System

Kai Mei, Xi Zhu, Wujiang Xu et al.

COLM 2025 paper • arXiv:2403.16971 • 62 citations
#20

An Illusion of Progress? Assessing the Current State of Web Agents

Tianci Xue, Weijian Qi, Tianneng Shi et al.

COLM 2025 paper • arXiv:2504.01382 • 53 citations
#21

OpenCodeReasoning: Advancing Data Distillation for Competitive Coding

Wasi Uddin Ahmad, Sean Narenthiran, Somshubra Majumdar et al.

COLM 2025 paper • 52 citations
#22

X-Teaming: Multi-Turn Jailbreaks and Defenses with Adaptive Multi-Agents

Salman Rahman, Liwei Jiang, James Shiffer et al.

COLM 2025 paper • arXiv:2504.13203 • 49 citations
#23

Learning Adaptive Parallel Reasoning with Language Models

Jiayi Pan, Xiuyu Li, Long Lian et al.

COLM 2025 paper • arXiv:2504.15466 • 49 citations
#24

SAT: Dynamic Spatial Aptitude Training for Multimodal Language Models

Arijit Ray, Jiafei Duan, Ellis L Brown II et al.

COLM 2025 paper • arXiv:2412.07755 • 48 citations
#25

FineWeb2: One Pipeline to Scale Them All — Adapting Pre-Training Data Processing to Every Language

Guilherme Penedo, Hynek Kydlíček, Vinko Sabolčec et al.

COLM 2025 paper • arXiv:2506.20920 • 48 citations
#26

MLGym: A New Framework and Benchmark for Advancing AI Research Agents

Deepak Nathani, Lovish Madaan, Nicholas Roberts et al.

COLM 2025 paper • 48 citations
#27

Rank1: Test-Time Compute for Reranking in Information Retrieval

Orion Weller, Kathryn Ricci, Eugene Yang et al.

COLM 2025 paper • arXiv:2502.18418 • 47 citations
#28

SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths

Kaixuan Huang, Xudong Guo, Mengdi Wang

COLM 2025 paper • 46 citations
#29

R2E-Gym: Procedural Environment Generation and Hybrid Verifiers for Scaling Open-Weights SWE Agents

Naman Jain, Jaskirat Singh, Manish Shetty et al.

COLM 2025 paper • 46 citations
#30

LLMs as Research Tools: A Large Scale Survey of Researchers’ Usage and Perceptions

Zhehui Liao, Maria Antoniak, Inyoung Cheong et al.

COLM 2025 paper • arXiv:2411.05025 • 45 citations
#31

Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning

Chengqi Lyu, Songyang Gao, Yuzhe Gu et al.

COLM 2025 paper • 44 citations
#32

ReasonIR: Training Retrievers for Reasoning Tasks

Rulin Shao, Rui Qiao, Varsha Kishore et al.

COLM 2025 paper • arXiv:2504.20595 • 44 citations
#33

Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate

Yubo Wang, Xiang Yue, Wenhu Chen

COLM 2025 paper • arXiv:2501.17703 • 42 citations
#34

Arctic-Embed 2.0: Multilingual Retrieval Without Compromise

Puxuan Yu, Luke Merrick, Gaurav Nuti et al.

COLM 2025 paper • arXiv:2412.04506 • 42 citations
#35

Reasoning-SQL: Reinforcement Learning with SQL Tailored Partial Rewards for Reasoning-Enhanced Text-to-SQL

Mohammadreza Pourreza, Shayan Talaei, Ruoxi Sun et al.

COLM 2025 paper • arXiv:2503.23157 • 42 citations
#36

SEAL: Steerable Reasoning Calibration of Large Language Models for Free

Runjin Chen, Zhenyu Zhang, Junyuan Hong et al.

COLM 2025 paper • arXiv:2504.07986 • 41 citations
#37

Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill?

Chenrui Fan, Ming Li, Lichao Sun et al.

COLM 2025 paper • arXiv:2504.06514 • 38 citations
#38

Speculative Thinking: Enhancing Small-Model Reasoning with Large Model Guidance at Inference Time

Van Yang, Xiang Yue, Vipin Chaudhary et al.

COLM 2025 paper • arXiv:2504.12329 • 37 citations
#39

Mixture of Attention Spans: Optimizing LLM Inference Efficiency with Heterogeneous Sliding-Window Lengths

Tianyu Fu, Haofeng Huang, Xuefei Ning et al.

COLM 2025 paper • arXiv:2406.14909 • 37 citations
#40

MALT: Improving Reasoning with Multi-Agent LLM Training

Sumeet Ramesh Motwani, Chandler Smith, Rocktim Jyoti Das et al.

COLM 2025 paper • arXiv:2412.01928 • 37 citations
#41

Know Me, Respond to Me: Benchmarking LLMs for Dynamic User Profiling and Personalized Responses at Scale

Bowen Jiang, Zhuoqun Hao, Young Min Cho et al.

COLM 2025 paper • arXiv:2504.14225 • 36 citations
#42

Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models

Ruikang Liu, Yuxuan Sun, Manyi Zhang et al.

COLM 2025 paper • arXiv:2504.04823 • 35 citations
#43

Scaling Laws of Synthetic Data for Language Models

Zeyu Qin, Qingxiu Dong, Xingxing Zhang et al.

COLM 2025 paper • arXiv:2503.19551 • 35 citations
#44

SuperBPE: Space Travel for Language Models

Alisa Liu, Jonathan Hayase, Valentin Hofmann et al.

COLM 2025 paper • arXiv:2503.13423 • 34 citations
#45

Steering Large Language Model Activations in Sparse Spaces

Reza Bayat, Ali Rahimi-Kalahroudi, Mohammad Pezeshki et al.

COLM 2025 paper • arXiv:2503.00177 • 33 citations
#46

Retrieval-Augmented Generation with Conflicting Evidence

Han Wang, Archiki Prasad, Elias Stengel-Eskin et al.

COLM 2025 paper • arXiv:2504.13079 • 32 citations
#47

Values in the Wild: Discovering and Mapping Values in Real-World Language Model Interactions

Saffron Huang, Esin Durmus, Kunal Handa et al.

COLM 2025 paper • 31 citations
#48

Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers

Shalev Lifshitz, Sheila A. McIlraith, Yilun Du

COLM 2025 paper • arXiv:2502.20379 • 31 citations
#49

Spike No More: Stabilizing the Pre-training of Large Language Models

Sho Takase, Shun Kiyono, Sosuke Kobayashi et al.

COLM 2025 paper • arXiv:2312.16903 • 31 citations
#50

Cascade Reward Sampling for Efficient Decoding-Time Alignment

Bolian Li, Yifan Wang, Anamika Lochab et al.

COLM 2025 paper • arXiv:2406.16306 • 30 citations
#51

MegaMath: Pushing the Limits of Open Math Corpora

Fan Zhou, Zengzhi Wang, Nikhil Ranjan et al.

COLM 2025 paper • arXiv:2504.02807 • 29 citations
#52

NoveltyBench: Evaluating Language Models for Humanlike Diversity

Yiming Zhang, Harshita Diddee, Susan Holm et al.

COLM 2025 paper • arXiv:2504.05228 • 28 citations
#53

Evaluating the Diversity and Quality of LLM Generated Content

Alexander Shypula, Shuo Li, Botong Zhang et al.

COLM 2025 paper • arXiv:2504.12522 • 27 citations
#54

Modifying Large Language Model Post-Training for Diverse Creative Writing

John Joon Young Chung, Vishakh Padmakumar, Melissa Roemmele et al.

COLM 2025 paper • arXiv:2503.17126 • 25 citations
#55

Weight ensembling improves reasoning in language models

Xingyu Dang, Christina Baek, Kaiyue Wen et al.

COLM 2025 paper • arXiv:2504.10478 • 25 citations
#56

Algorithm Discovery With LLMs: Evolutionary Search Meets Reinforcement Learning

Anja Šurina, Amin Mansouri, Lars C.P.M. Quaedvlieg et al.

COLM 2025 paper • arXiv:2504.05108 • 25 citations
#57

What is the Visual Cognition Gap between Humans and Multimodal LLMs?

Xu Cao, Yifan Shen, Bolin Lai et al.

COLM 2025 paper • arXiv:2406.10424 • 24 citations
#58

CITER: Collaborative Inference for Efficient Large Language Model Decoding with Token-Level Routing

Wenhao Zheng, Yixiao Chen, Weitong Zhang et al.

COLM 2025 paper • arXiv:2502.01976 • 24 citations
#59

When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoning

Nishad Singhi, Hritik Bansal, Arian Hosseini et al.

COLM 2025 paper • 24 citations
#60

AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories

Xing Han Lù, Amirhossein Kazemnejad, Nicholas Meade et al.

COLM 2025 paper • arXiv:2504.08942 • 23 citations
#61

ThoughtTerminator: Benchmarking, Calibrating, and Mitigating Overthinking in Reasoning Models

Xiao Pu, Michael Saxon, Wenyue Hua et al.

COLM 2025 paper • arXiv:2504.13367 • 23 citations
#62

M-Prometheus: A Suite of Open Multilingual LLM Judges

José Pombal, Dongkeun Yoon, Patrick Fernandes et al.

COLM 2025 paper • arXiv:2504.04953 • 23 citations
#63

VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception of Geometric Information

Ryo Kamoi, Yusen Zhang, Sarkar Snigdha Sarathi Das et al.

COLM 2025 paper • arXiv:2412.00947 • 22 citations
#64

Multi-Agent Systems Execute Arbitrary Malicious Code

Harold Triedman, Rishi Dev Jha, Vitaly Shmatikov

COLM 2025 paper • arXiv:2503.12188 • 22 citations
#65

Inducing Programmatic Skills for Agentic Tasks

Zora Zhiruo Wang, Apurva Gandhi, Graham Neubig et al.

COLM 2025 paper • arXiv:2504.06821 • 22 citations
#66

How do language models learn facts? Dynamics, curricula and hallucinations

Nicolas Zucchet, Jorg Bornschein, Stephanie C.Y. Chan et al.

COLM 2025 paper • 21 citations
#67

Establishing Task Scaling Laws via Compute-Efficient Model Ladders

Akshita Bhagia, Jiacheng Liu, Alexander Wettig et al.

COLM 2025 paper • arXiv:2412.04403 • 20 citations
#68

Streaming DiLoCo with overlapping communication

Arthur Douillard, Yani Donchev, J Keith Rush et al.

COLM 2025 paper • arXiv:2501.18512 • 20 citations
#69

Learning to Reason for Long-Form Story Generation

Alexander Gurung, Mirella Lapata

COLM 2025 paper • 19 citations
#70

PolyGuard: A Multilingual Safety Moderation Tool for 17 Languages

Priyanshu Kumar, Devansh Jain, Akhila Yerukola et al.

COLM 2025 paper • arXiv:2504.04377 • 19 citations
#71

EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees

Zhiyuan Zeng, Yizhong Wang, Hannaneh Hajishirzi et al.

COLM 2025 paper • arXiv:2503.08893 • 18 citations
#72

A Controlled Study on Long Context Extension and Generalization in LLMs

Yi Lu, Jing Nathan Yan, Songlin Yang et al.

COLM 2025 paper • arXiv:2409.12181 • 18 citations
#73

EuroBERT: Scaling Multilingual Encoders for European Languages

Nicolas Boizard, Hippolyte Gisserot-Boukhlef, Duarte Miguel Alves et al.

COLM 2025 paper • arXiv:2503.05500 • 18 citations
#74

M²IV: Towards Efficient and Fine-grained Multimodal In-Context Learning via Representation Engineering

Yanshu Li, Yi Cao, Hongyang He et al.

COLM 2025 paper • 17 citations
#75

DoomArena: A framework for Testing AI Agents Against Evolving Security Threats

Léo Boisvert, Abhay Puri, Gabriel Huang et al.

COLM 2025 paper • arXiv:2504.14064 • 17 citations
#76

Enhancing LLM Reasoning with Iterative DPO: A Comprehensive Empirical Investigation

Songjun Tu, Jiahao Lin, Xiangyu Tian et al.

COLM 2025 paper • arXiv:2503.12854 • 17 citations
#77

Collab-RAG: Boosting Retrieval-Augmented Generation for Complex Question Answering via White-Box and Black-Box LLM Collaboration

Ran Xu, Wenqi Shi, Yuchen Zhuang et al.

COLM 2025 paper • arXiv:2504.04915 • 17 citations
#78

SAEs Can Improve Unlearning: Dynamic Sparse Autoencoder Guardrails for Precision Unlearning in LLMs

Aashiq Muhamed, Jacopo Bonato, Mona T. Diab et al.

COLM 2025 paper • 17 citations
#79

SlowFast-LLaVA-1.5: A Family of Token-Efficient Video Large Language Models for Long-Form Video Understanding

Mingze Xu, Mingfei Gao, Shiyu Li et al.

COLM 2025 paper • arXiv:2503.18943 • 17 citations
#80

Brains vs. Bytes: Evaluating LLM Proficiency in Olympiad Mathematics

Hamed Mahdavi, Alireza Hashemi, Majid Daliri et al.

COLM 2025 paper • arXiv:2504.01995 • 16 citations
#81

Shared Global and Local Geometry of Language Model Embeddings

Andrew Lee, Melanie Weber, Fernanda Viégas et al.

COLM 2025 paper • arXiv:2503.21073 • 16 citations
#82

Task Vectors in In-Context Learning: Emergence, Formation, and Benefits

Liu Yang, Ziqian Lin, Kangwook Lee et al.

COLM 2025 paper • arXiv:2501.09240 • 16 citations
#83

Law of Vision Representation in MLLMs

Shijia Yang, Bohan Zhai, Quanzeng You et al.

COLM 2025 paper • arXiv:2408.16357 • 16 citations
#84

SciReplicate-Bench: Benchmarking LLMs in Agent-driven Algorithmic Reproduction from Research Papers

Yanzheng Xiang, Hanqi Yan, Shuyin Ouyang et al.

COLM 2025 paper • arXiv:2504.00255 • 16 citations
#85

Base Models Beat Aligned Models at Randomness and Creativity

Peter West, Christopher Potts

COLM 2025 paper • arXiv:2505.00047 • 16 citations
#86

DynaSaur: Large Language Agents Beyond Predefined Actions

Dang Nguyen, Viet Dac Lai, Seunghyun Yoon et al.

COLM 2025 paper • arXiv:2411.01747 • 15 citations
#87

Finding Flawed Fictions: Evaluating Complex Reasoning in Language Models via Plot Hole Detection

Kabir Ahuja, Melanie Sclar, Yulia Tsvetkov

COLM 2025 paper • arXiv:2504.11900 • 15 citations
#88

Boundless Byte Pair Encoding: Breaking the Pre-tokenization Barrier

Craig W Schmidt, Varshini Reddy, Chris Tanner et al.

COLM 2025 paper • arXiv:2504.00178 • 15 citations
#89

LongProc: Benchmarking Long-Context Language Models on Long Procedural Generation

Xi Ye, Fangcong Yin, Yinghui He et al.

COLM 2025 paper • 15 citations
#90

The Dual-Route Model of Induction

Sheridan Feucht, Eric Todd, Byron C Wallace et al.

COLM 2025 paper • arXiv:2504.03022 • 15 citations
#91

AI-Slop to AI-Polish? Aligning Language Models through Edit-Based Writing Rewards and Test-time computation

Tuhin Chakrabarty, Philippe Laban, Chien-Sheng Wu

COLM 2025 paper • arXiv:2504.07532 • 15 citations
#92

ALFA: Aligning LLMs to Ask Good Questions A Case Study in Clinical Reasoning

Shuyue Stella Li, Jimin Mun, Faeze Brahman et al.

COLM 2025 paper • arXiv:2502.14860 • 15 citations
#93

Cats Confuse Reasoning LLM: Query Agnostic Adversarial Triggers for Reasoning Models

Meghana Arakkal Rajeev, Rajkumar Ramamurthy, Prapti Trivedi et al.

COLM 2025 paper • arXiv:2503.01781 • 15 citations
#94

BEARCUBS: A benchmark for computer-using web agents

Yixiao Song, Katherine Thai, Chau Minh Pham et al.

COLM 2025 paper • arXiv:2503.07919 • 14 citations
#95

To Backtrack or Not to Backtrack: When Sequential Search Limits Model Reasoning

Tian Qin, David Alvarez-Melis, Samy Jelassi et al.

COLM 2025 paper • arXiv:2504.07052 • 14 citations
#96

Interpreting the linear structure of vision-language model embedding spaces

Isabel Papadimitriou, Huangyuan Su, Thomas Fel et al.

COLM 2025 paper • arXiv:2504.11695 • 14 citations
#97

Can A Society of Generative Agents Simulate Human Behavior and Inform Public Health Policy? A Case Study on Vaccine Hesitancy

Abe Bohan Hou, Hongru Du, Yichen Wang et al.

COLM 2025 paper • arXiv:2503.09639 • 14 citations
#98

LongCodeBench: Evaluating Coding LLMs at 1M Context Windows

Stefano Rando, Luca Romani, Alessio Sampieri et al.

COLM 2025 paper • arXiv:2505.07897 • 14 citations
#99

Assessing Judging Bias in Large Reasoning Models: An Empirical Study

Qian Wang, Zhanzhi Lou, Zhenheng Tang et al.

COLM 2025 paper • arXiv:2504.09946 • 14 citations
#100

Backdoor Attacks on Dense Retrieval via Public and Unintentional Triggers

Quanyu Long, Yue Deng, Leilei Gan et al.

COLM 2025 paper • arXiv:2402.13532 • 14 citations
#101

Learning to Generate Unit Tests for Automated Debugging

Archiki Prasad, Elias Stengel-Eskin, Justin Chen et al.

COLM 2025 paper • arXiv:2502.01619 • 14 citations
#102

Bayesian scaling laws for in-context learning

Aryaman Arora, Dan Jurafsky, Christopher Potts et al.

COLM 2025 paper • arXiv:2410.16531 • 13 citations
#103

Extragradient Preference Optimization (EGPO): Beyond Last-Iterate Convergence for Nash Learning from Human Feedback

Runlong Zhou, Maryam Fazel, Simon Shaolei Du

COLM 2025 paper • arXiv:2503.08942 • 13 citations
#104

Recycling the Web: A Method to Enhance Pre-training Data Quality and Quantity for Language Models

Thao Nguyen, Yang Li, Olga Golovneva et al.

COLM 2025 paper • arXiv:2506.04689 • 13 citations
#105

Putting the Value Back in RL: Better Test-Time Scaling by Unifying LLM Reasoners With Verifiers

Kusha Sareen, Morgane M Moss, Alessandro Sordoni et al.

COLM 2025 paper • arXiv:2505.04842 • 13 citations
#106

Prompt-Reverse Inconsistency: LLM Self-Inconsistency Beyond Generative Randomness and Prompt Paraphrasing

Jihyun Janice Ahn, Wenpeng Yin

COLM 2025 paper • arXiv:2504.01282 • 13 citations
#107

Understanding Layer Significance in LLM Alignment

Guangyuan Shi, Zexin Lu, Xiaoyu Dong et al.

COLM 2025 paper • arXiv:2410.17875 • 12 citations
#108

Language Model Personalization via Reward Factorization

Idan Shenfeld, Felix Faltings, Pulkit Agrawal et al.

COLM 2025 paper • arXiv:2503.06358 • 12 citations
#109

Hypothesis-Driven Theory-of-Mind Reasoning for Large Language Models

Hyunwoo Kim, Melanie Sclar, Tan Zhi-Xuan et al.

COLM 2025 paper • arXiv:2502.11881 • 12 citations
#110

LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation

Juzheng Zhang, Jiacheng You, Ashwinee Panda et al.

COLM 2025 paper • arXiv:2504.07448 • 12 citations
#111

Understanding the Uncertainty of LLM Explanations: A Perspective Based on Reasoning Topology

Longchao Da, Xiaoou Liu, Jiaxin Dai et al.

COLM 2025 paper • 12 citations
#112

Language Models Fail to Introspect About Their Knowledge of Language

Siyuan Song, Jennifer Hu, Kyle Mahowald

COLM 2025 paper • arXiv:2503.07513 • 12 citations
#113

Adaptive Layer-skipping in Pre-trained LLMs

Xuan Luo, Weizhi Wang, Xifeng Yan

COLM 2025 paper • arXiv:2503.23798 • 12 citations
#114

LLMs Are In-Context Bandit Reinforcement Learners

Giovanni Monea, Antoine Bosselut, Kianté Brantley et al.

COLM 2025 paper • arXiv:2410.05362 • 12 citations
#115

CRUST-Bench: A Comprehensive Benchmark for C-to-safe-Rust Transpilation

Anirudh Khatry, Robert Zhang, Jia Pan et al.

COLM 2025 paper • arXiv:2504.15254 • 12 citations
#116

FalseReject: A Resource for Improving Contextual Safety and Mitigating Over-Refusals in LLMs via Structured Reasoning

Zhehao Zhang, Weijie Xu, Fanyou Wu et al.

COLM 2025 paper • arXiv:2505.08054 • 12 citations
#117

Jigsaw Puzzles: Splitting Harmful Questions to Jailbreak Large Language Models in Multi-turn Interactions

Hao Yang, Lizhen Qu, Ehsan Shareghi et al.

COLM 2025 paper • 11 citations
#118

Weak-for-Strong: Training Weak Meta-Agent to Harness Strong Executors

Fan Nie, Lan Feng, Haotian Ye et al.

COLM 2025 paper • arXiv:2504.04785 • 11 citations
#119

Epistemic Alignment: A Mediating Framework for User-LLM Knowledge Delivery

Nicholas Clark, Hua Shen, Bill Howe et al.

COLM 2025 paper • arXiv:2504.01205 • 11 citations
#120

Advancing Language Multi-Agent Learning with Credit Re-Assignment for Interactive Environment Generalization

Zhitao He, Zijun Liu, Peng Li et al.

COLM 2025 paper • arXiv:2502.14496 • 10 citations
#121

Steering the CensorShip: Uncovering Representation Vectors for LLM "Thought" Control

Hannah Cyberey, David Evans

COLM 2025 paper • 10 citations
#122

Fluid Language Model Benchmarking

Valentin Hofmann, David Heineman, Ian Magnusson et al.

COLM 2025 paper • arXiv:2509.11106 • 10 citations
#123

LLM Can be a Dangerous Persuader: Empirical Study of Persuasion Safety in Large Language Models

Minqian Liu, Zhiyang Xu, Xinyi Zhang et al.

COLM 2025 paper • arXiv:2504.10430 • 10 citations
#124

Unifying Autoregressive and Diffusion-Based Sequence Generation

Nima Fathi, Torsten Scholak, Pierre-Andre Noel

COLM 2025 paper • arXiv:2504.06416 • 10 citations
#125

Breaking the Data Barrier -- Building GUI Agents Through Task Generalization

Junlei Zhang, Zichen Ding, Chang Ma et al.

COLM 2025 paper • arXiv:2504.10127 • 10 citations
#126

Language Model Uncertainty Quantification with Attention Chain

Yinghao Li, Rushi Qiang, Lama Moukheiber et al.

COLM 2025 paper • arXiv:2503.19168 • 10 citations
#127

LLM Unlearning Reveals a Stronger-Than-Expected Coreset Effect in Current Benchmarks

Soumyadeep Pal, Changsheng Wang, James Diffenderfer et al.

COLM 2025 paper • arXiv:2504.10185 • 10 citations
#128

Sample Efficient Preference Alignment in LLMs via Active Exploration

Viraj Mehta, Syrine Belakaria, Vikramjeet Das et al.

COLM 2025 paper • arXiv:2312.00267 • 10 citations
#129

Refusal Tokens: A Simple Way to Calibrate Refusals in Large Language Models

Neel Jain, Aditya Shrivastava, Chenyang Zhu et al.

COLM 2025 paper • arXiv:2412.06748 • 10 citations
#130

EllieSQL: Cost-Efficient Text-to-SQL with Complexity-Aware Routing

Yizhang Zhu, Runzhi Jiang, Boyan Li et al.

COLM 2025 paper • arXiv:2503.22402 • 10 citations
#131

LoRe: Personalizing LLMs via Low-Rank Reward Modeling

Avinandan Bose, Zhihan Xiong, Yuejie Chi et al.

COLM 2025 paper • arXiv:2504.14439 • 10 citations
#132

Pairwise or Pointwise? Evaluating Feedback Protocols for Bias in LLM-Based Evaluation

Tuhina Tripathi, Manya Wadhwa, Greg Durrett et al.

COLM 2025 paper • arXiv:2504.14716 • 9 citations
#133

FineMedLM-o1: Enhancing Medical Knowledge Reasoning Ability of LLM from Supervised Fine-Tuning to Test-Time Training

Hongzhou Yu, Tianhao Cheng, Yingwen Wang et al.

COLM 2025 paper • arXiv:2501.09213 • 9 citations
#134

LongPerceptualThoughts: Distilling System-2 Reasoning for System-1 Perception

Yuan-Hong Liao, Sven Elflein, Liu He et al.

COLM 2025 paper • arXiv:2504.15362 • 9 citations
#135

Layers at Similar Depths Generate Similar Activations Across LLM Architectures

Christopher Wolfram, Aaron Schein

COLM 2025 paper • arXiv:2504.08775 • 9 citations
#136

Efficient Process Reward Model Training via Active Learning

Keyu Duan, Zichen Liu, Xin Mao et al.

COLM 2025 paper • arXiv:2504.10559 • 9 citations
#137

A Survey on Personalized and Pluralistic Preference Alignment in Large Language Models

Zhouhang Xie, Junda Wu, Yiran Shen et al.

COLM 2025 paper • arXiv:2504.07070 • 9 citations
#138

One ruler to measure them all: Benchmarking multilingual long-context language models

Yekyung Kim, Jenna Russell, Marzena Karpinska et al.

COLM 2025 paper • arXiv:2503.01996 • 9 citations
#139

Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models

Lynn Chua, Badih Ghazi, Yangsibo Huang et al.

COLM 2025 paper • arXiv:2406.16135 • 9 citations
#140

SPIN-Bench: How Well Do LLMs Plan Strategically and Reason Socially?

Jianzhu Yao, Kevin Wang, Ryan Hsieh et al.

COLM 2025 paper • arXiv:2503.12349 • 9 citations
#141

Have Large Language Models Learned to Reason? A Characterization via 3-SAT

Rishi Hazra, Gabriele Venturato, Pedro Zuidberg Dos Martires et al.

COLM 2025 paper • arXiv:2504.03930 • 9 citations
#142

Hardware-Efficient Attention for Fast Decoding

Ted Zadouri, Hubert Strauss, Tri Dao

COLM 2025 paper • arXiv:2505.21487 • 8 citations
#143

Texture or Semantics? Vision-Language Models Get Lost in Font Recognition

Zhecheng Li, Guoxian Song, Yujun Cai et al.

COLM 2025 paper • arXiv:2503.23768 • 8 citations
#144

Defending LLM Watermarking Against Spoofing Attacks with Contrastive Representation Learning

Li An, Yujian Liu, Yepeng Liu et al.

COLM 2025 paper • arXiv:2504.06575 • 8 citations
#145

Training Plug-and-Play Knowledge Modules with Deep Context Distillation

Lucas Caccia, Alan Ansell, Edoardo Ponti et al.

COLM 2025 paper • arXiv:2503.08727 • 8 citations
#146

One-shot Optimized Steering Vectors Mediate Safety-relevant Behaviors in LLMs

Jacob Dunefsky, Arman Cohan

COLM 2025 paper • arXiv:2502.18862 • 8 citations
#147

The Delta Learning Hypothesis: Preference Tuning on Weak Data can Yield Strong Gains

Scott Geng, Hamish Ivison, Chun-Liang Li et al.

COLM 2025 paper • arXiv:2507.06187 • 8 citations
#148

Agents Are All You Need for LLM Unlearning

Debdeep Sanyal, Murari Mandal

COLM 2025 paper • arXiv:2502.00406 • 8 citations
#149

Enhancing LLM Reliability via Explicit Knowledge Boundary Modeling

Hang Zheng, Hongshen Xu, Yuncong Liu et al.

COLM 2025 paper • arXiv:2503.02233 • 8 citations
#150

Do LLMs Understand Your Translations? Evaluating Paragraph-level MT with Question Answering

Patrick Fernandes, Sweta Agrawal, Emmanouil Zaranis et al.

COLM 2025 paper • arXiv:2504.07583 • 8 citations
#151

KVSink: Understanding and Enhancing the Preservation of Attention Sinks in KV Cache Quantization for LLMs

Zunhai Su, Kehong Yuan

COLM 2025 paper • arXiv:2508.04257 • 8 citations
#152

EnrichIndex: Using LLMs to Enrich Retrieval Indices Offline

Peter Baile Chen, Tomer Wolfson, Mike Cafarella et al.

COLM 2025 paper • arXiv:2504.03598 • 8 citations
#153

ScholarCopilot: Training Large Language Models for Academic Writing with Accurate Citations

Yubo Wang, Xueguang Ma, Ping Nie et al.

COLM 2025 paper • arXiv:2504.00824 • 8 citations
#154

Towards User-level Private Reinforcement Learning with Human Feedback

Jiaming Zhang, Mingxi Lei, Meng Ding et al.

COLM 2025 paper • arXiv:2502.17515 • 8 citations
#155

Not All Data Are Unlearned Equally

Aravind Krishnan, Siva Reddy, Marius Mosbach

COLM 2025 paper • arXiv:2504.05058 • 8 citations
#156

Contextualize-then-Aggregate: Circuits for In-Context Learning in Gemma-2 2B

Aleksandra Bakalova, Yana Veitsman, Xinting Huang et al.

COLM 2025 paper • arXiv:2504.00132 • 8 citations
#157

Zero-shot Benchmarking: A Framework for Flexible and Scalable Automatic Evaluation of Language Models

José Pombal, Nuno M Guerreiro, Ricardo Rei et al.

COLM 2025 paper • arXiv:2504.01001 • 8 citations
#158

Boosting LLM Reasoning via Spontaneous Self-Correction

Xutong Zhao, Tengyu Xu, Xuewei Wang et al.

COLM 2025 paper • arXiv:2506.06923 • 8 citations
#159

Efficient Self-Improvement in Multimodal Large Language Models: A Model-Level Judge-Free Approach

Shijian Deng, Wentian Zhao, Yu-Jhe Li et al.

COLM 2025 paper • arXiv:2411.17760 • 8 citations
#160

Can Test-Time Scaling Improve World Foundation Model?

Wenyan Cong, Hanqing Zhu, Peihao Wang et al.

COLM 2025 paper • arXiv:2503.24320 • 7 citations
#161

Fast Controlled Generation from Language Models with Adaptive Weighted Rejection Sampling

Ben Lipkin, Benjamin LeBrun, Jacob Hoover Vigly et al.

COLM 2025 paper • arXiv:2504.05410 • 7 citations
#162

AdaptiVocab: Enhancing LLM Efficiency in Focused Domains through Lightweight Vocabulary Adaptation

Itay Nakash, Nitay Calderon, Eyal Ben-David et al.

COLM 2025 paper • arXiv:2503.19693 • 7 citations
#163

Adversarial Training of Reward Models

Alexander Bukharin, Haifeng Qian, Shengyang Sun et al.

COLM 2025 paper • arXiv:2504.06141 • 7 citations
#164

DFRot: Achieving Outlier-Free and Massive Activation-Free for Rotated LLMs with Refined Rotation

Jingyang Xiang, Sai Qian Zhang

COLM 2025 paper • arXiv:2412.00648 • 7 citations
#165

$100K or 100 Days: Trade-offs when Pre-Training with Academic Resources

Apoorv Khandelwal, Tian Yun, Nihal V. Nayak et al.

COLM 2025 paper • arXiv:2410.23261 • 6 citations
#166

Rerouting LLM Routers

Avital Shafran, Roei Schuster, Tom Ristenpart et al.

COLM 2025 paper • arXiv:2501.01818 • 6 citations
#167

CodeARC: Benchmarking Reasoning Capabilities of LLM Agents for Inductive Program Synthesis

Anjiang Wei, Tarun Suresh, Jiannan Cao et al.

COLM 2025 paper • 6 citations
#168

Can LLMs Handle WebShell Detection? Overcoming Detection Challenges with Behavioral Function-Aware Framework

Feijiang Han, Jiaming Zhang, Chuyi Deng et al.

COLM 2025 paper • arXiv:2504.13811 • 6 citations
#169

Self-Steering Language Models

Gabriel Grand, Joshua B. Tenenbaum, Vikash Mansinghka et al.

COLM 2025 paper • arXiv:2504.07081 • 6 citations
#170

Déjà Vu: Multilingual LLM Evaluation through the Lens of Machine Translation Evaluation

Julia Kreutzer, Eleftheria Briakou, Sweta Agrawal et al.

COLM 2025 paper • arXiv:2504.11829 • 6 citations
#171

LeakAgent: RL-based Red-teaming Agent for LLM Privacy Leakage

Yuzhou Nie, Zhun Wang, Ye Yu et al.

COLM 2025 paper • arXiv:2412.05734 • 6 citations
#172

The Blessing and Curse of Dimensionality in Safety Alignment

Rachel S.Y. Teo, Laziz Abdullaev, Tan Minh Nguyen

COLM 2025 paper • arXiv:2507.20333 • 6 citations
#173

Local Mixtures of Experts: Essentially Free Test-Time Training via Model Merging

Ryo Bertolissi, Jonas Hübotter, Ido Hakimi et al.

COLM 2025 paper • arXiv:2505.14136 • 6 citations
#174

Self-Evolving Critique Abilities in Large Language Models

Zhengyang Tang, Ziniu Li, Zhenyang Xiao et al.

COLM 2025 paper • arXiv:2501.05727 • 6 citations
#175

Plancraft: an evaluation dataset for planning with LLM agents

Gautier Dagan, Frank Keller, Alex Lascarides

COLM 2025 paper • arXiv:2412.21033 • 6 citations
#176

Corrupted by Reasoning: Reasoning Language Models Become Free-Riders in Public Goods Games

David Guzman Piedrahita, Yongjin Yang, Mrinmaya Sachan et al.

COLM 2025 paper • 6 citations
#177

ICQuant: Index Coding enables Low-bit LLM Quantization

Xinlin Li, Osama Hanna, Christina Fragouli et al.

COLM 2025 paper • arXiv:2505.00850 • 6 citations
#178

Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources

Weizhi Wang, Yu Tian, Linjie Yang et al.

COLM 2025 paper • arXiv:2504.00595 • 6 citations
#179

RADLADS: Rapid Attention Distillation to Linear Attention Decoders at Scale

Daniel Goldstein, Eric Alcaide, Janna Lu et al.

COLM 2025 paper • arXiv:2505.03005 • 6 citations
#180

Multilingual Contextualization of Large Language Models for Document-Level Machine Translation

Miguel Moura Ramos, Patrick Fernandes, Sweta Agrawal et al.

COLM 2025 paper • arXiv:2504.12140 • 5 citations
#181

Supposedly Equivalent Facts That Aren’t? Entity Frequency in Pre-training Induces Asymmetry in LLMs

Yuan He, Bailan He, Zifeng Ding et al.

COLM 2025 paper • 5 citations
#182

SecurityLingua: Efficient Defense of LLM Jailbreak Attacks via Security-Aware Prompt Compression

Yucheng Li, Surin Ahn, Huiqiang Jiang et al.

COLM 2025 paper • arXiv:2506.12707 • 5 citations
#183

PersonaEval: Are LLM Evaluators Human Enough to Judge Role-Play?

Lingfeng Zhou, Jialing Zhang, Jin Gao et al.

COLM 2025 paper • arXiv:2508.10014 • 5 citations
#184

Sherkala-Chat: Building a State-of-the-Art LLM for Kazakh in a Moderately Resourced Setting

Fajri Koto, Rituraj Joshi, Nurdaulet Mukhituly et al.

COLM 2025 paper • 5 citations
#185

C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing

Zhongyang Li, Ziyue Li, Tianyi Zhou

COLM 2025 paper • 5 citations
#186

How Post-Training Reshapes LLMs: A Mechanistic View on Knowledge, Truthfulness, Refusal, and Confidence

Hongzhe Du, Weikai Li, Min Cai et al.

COLM 2025 paper • arXiv:2504.02904 • 5 citations
#187

Gating is Weighting: Understanding Gated Linear Attention through In-context Learning

Yingcong Li, Davoud Ataee Tarzanagh, Ankit Singh Rawat et al.

COLM 2025 paper • 5 citations
#188

VideoSAVi: Self-Aligned Video Language Models without Human Supervision

Yogesh Kulkarni, Pooyan Fazli

COLM 2025 paper • arXiv:2412.00624 • 5 citations
#189

CrossWordBench: Evaluating the Reasoning Capabilities of LLMs and LVLMs with Controllable Puzzle Generation

Jixuan Leng, Chengsong Huang, Langlin Huang et al.

COLM 2025 paper • arXiv:2504.00043 • 5 citations
#190

Multi-Token Attention

Olga Golovneva, Tianlu Wang, Jason E Weston et al.

COLM 2025 paper • arXiv:2504.00927 • 5 citations
#191

LoX: Low-Rank Extrapolation Robustifies LLM Safety Against Fine-tuning

Gabriel Jacob Perin, Runjin Chen, Xuxi Chen et al.

COLM 2025 paper • arXiv:2506.15606 • 5 citations
#192

BigCharts-R1: Enhanced Chart Reasoning with Visual Reinforcement Finetuning

Ahmed Masry, Abhay Puri, Masoud Hashemi et al.

COLM 2025 paper • arXiv:2508.09804 • 5 citations
#193

SUV: Scalable Large Language Model Copyright Compliance with Regularized Selective Unlearning

Tianyang Xu, Xiaoze Liu, Feijie Wu et al.

COLM 2025 paper • arXiv:2503.22948 • 5 citations
#194

Don’t lie to your friends: Learning what you know from collaborative self-play

Jacob Eisenstein, Reza Aghajani, Adam Fisch et al.

COLM 2025 paper • 5 citations
#195

Scaling Analysis of Interleaved Speech-Text Language Models

Gallil Maimon, Michael Hassid, Amit Roth et al.

COLM 2025 paper • arXiv:2504.02398 • 5 citations
#196

Rethinking Safety in LLM Fine-tuning: An Optimization Perspective

Minseon Kim, Jin Myung Kwak, Lama Alssum et al.

COLM 2025 paper • arXiv:2508.12531 • 5 citations
#197

CultureCLIP: Empowering CLIP with Cultural Awareness through Synthetic Images and Contextualized Captions

Yuchen Huang, Zhiyuan Fan, Zhitao He et al.

COLM 2025 paper • arXiv:2507.06210 • 5 citations
#198

FormaRL: Enhancing Autoformalization with no Labeled Data

Yanxing Huang, Xinling Jin, Sijie Liang et al.

COLM 2025 paper • arXiv:2508.18914 • 5 citations
#199

(Im)possibility of Automated Hallucination Detection in Large Language Models

Amin Karbasi, Omar Montasser, John Sous et al.

COLM 2025 paper • arXiv:2504.17004 • 5 citations
#200

Out-of-Distribution Detection using Synthetic Data Generation

Momin Abbas, Muneeza Azmat, Raya Horesh et al.

COLM 2025 paper • arXiv:2502.03323 • 5 citations