Most Cited COLM Highlight "cryptographic techniques" Papers

418 papers found • Page 1 of 3

#1

Pretraining on the Test Set Is No Longer All You Need: A Debate-Driven Approach to QA Benchmarks

Linbo Cao, Jinman Zhao

COLM 2025 · arXiv:2507.17747
#2

Corrupted by Reasoning: Reasoning Language Models Become Free-Riders in Public Goods Games

David Guzman Piedrahita, Yongjin Yang, Mrinmaya Sachan et al.

COLM 2025
#3

QAPyramid: Fine-grained Evaluation of Content Selection for Text Summarization

Shiyue Zhang, David Wan, Arie Cattan et al.

COLM 2025
#4

Sherkala-Chat: Building a State-of-the-Art LLM for Kazakh in a Moderately Resourced Setting

Fajri Koto, Rituraj Joshi, Nurdaulet Mukhituly et al.

COLM 2025
#5

Layerwise Importance Analysis of Feed-Forward Networks in Transformer-based Language Models

Wataru Ikeda, Kazuki Yano, Ryosuke Takahashi et al.

COLM 2025 · arXiv:2508.17734
#6

Teaching Models to Understand (but not Generate) High-risk Data

Ryan Yixiang Wang, Matthew Finlayson, Luca Soldaini et al.

COLM 2025 · arXiv:2505.03052
#7

PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling

Zefan Cai, Yichi Zhang, Bofei Gao et al.

COLM 2025 · arXiv:2406.02069
#8

Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining

Rosie Zhao, Alexandru Meterez, Sham M. Kakade et al.

COLM 2025 · arXiv:2504.07912
#9

Sample Efficient Preference Alignment in LLMs via Active Exploration

Viraj Mehta, Syrine Belakaria, Vikramjeet Das et al.

COLM 2025 · arXiv:2312.00267
#10

Probing Syntax in Large Language Models: Successes and Remaining Challenges

Pablo J. Diego Simon, Emmanuel Chemla, Jean-Remi King et al.

COLM 2025 · arXiv:2508.03211
#11

HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Interactive AI Agents

Xuhui Zhou, Hyunwoo Kim, Faeze Brahman et al.

COLM 2025
#12

LIMO: Less is More for Reasoning

Yixin Ye, Zhen Huang, Yang Xiao et al.

COLM 2025 · arXiv:2502.03387
#13

Probing then Editing Response Personality of Large Language Models

Tianjie Ju, Zhenyu Shao, Bowen Wang et al.

COLM 2025 · arXiv:2504.10227
#14

A Survey on Personalized and Pluralistic Preference Alignment in Large Language Models

Zhouhang Xie, Junda Wu, Yiran Shen et al.

COLM 2025 · arXiv:2504.07070
#15

Always Tell Me The Odds: Fine-grained Conditional Probability Estimation

Liaoyaqi Wang, Zhengping Jiang, Anqi Liu et al.

COLM 2025 · arXiv:2505.01595
#16

To Backtrack or Not to Backtrack: When Sequential Search Limits Model Reasoning

Tian Qin, David Alvarez-Melis, Samy Jelassi et al.

COLM 2025
#17

One-shot Optimized Steering Vectors Mediate Safety-relevant Behaviors in LLMs

Jacob Dunefsky, Arman Cohan

COLM 2025
#18

Sharpe Ratio-Guided Active Learning for Preference Optimization in RLHF

Syrine Belakaria, Joshua Kazdan, Charles Marx et al.

COLM 2025
#19

SentenceKV: Efficient LLM Inference via Sentence-Level Semantic KV Caching

Yuxuan Zhu, Ali Falahati, David H. Yang et al.

COLM 2025
#20

Defending LLM Watermarking Against Spoofing Attacks with Contrastive Representation Learning

Li An, Yujian Liu, Yepeng Liu et al.

COLM 2025
#21

Do Language Models Agree with Human Perceptions of Suspense in Stories?

Glenn Matlin, Devin Zhang, Rodrigo Barroso Loza et al.

COLM 2025
#22

Learning by Teaching: Engaging Students as Instructors of Large Language Models in Computer Science Education

Xinming Yang, Haasil Pujara, Jun Li

COLM 2025
#23

CALLME: Call Graph Augmentation with Large Language Models for Javascript

Michael Wang, Kexin Pei, Armando Solar-Lezama

COLM 2025
#24

CITER: Collaborative Inference for Efficient Large Language Model Decoding with Token-Level Routing

Wenhao Zheng, Yixiao Chen, Weitong Zhang et al.

COLM 2025
#25

LM Agents May Fail to Act on Their Own Risk Knowledge

Yuzhi Tang, Tianxiao Li, Elizabeth Li et al.

COLM 2025
#26

Approximating Language Model Training Data from Weights

John Xavier Morris, Junjie Oscar Yin, Woojeong Kim et al.

COLM 2025
#27

Steering the CensorShip: Uncovering Representation Vectors for LLM "Thought" Control

Hannah Cyberey, David Evans

COLM 2025
#28

Brains vs. Bytes: Evaluating LLM Proficiency in Olympiad Mathematics

Hamed Mahdavi, Alireza Hashemi, Majid Daliri et al.

COLM 2025
#29

Hardware-Efficient Attention for Fast Decoding

Ted Zadouri, Hubert Strauss, Tri Dao

COLM 2025
#30

Evaluating and Designing Sparse Autoencoders by Approximating Quasi-Orthogonality

Sewoong Lee, Adam Davies, Marc E. Canby et al.

COLM 2025
#31

Exploring Large Language Model Agents for Piloting Social Experiments

Jinghua Piao, Yuwei Yan, Nian Li et al.

COLM 2025
#32

In-context Ranking Preference Optimization

Junda Wu, Rohan Surana, Zhouhang Xie et al.

COLM 2025
#33

Weight ensembling improves reasoning in language models

Xingyu Dang, Christina Baek, Kaiyue Wen et al.

COLM 2025
#34

Arctic-Embed 2.0: Multilingual Retrieval Without Compromise

Puxuan Yu, Luke Merrick, Gaurav Nuti et al.

COLM 2025
#35

Multilingual Contextualization of Large Language Models for Document-Level Machine Translation

Miguel Moura Ramos, Patrick Fernandes, Sweta Agrawal et al.

COLM 2025
#36

SlowFast-LLaVA-1.5: A Family of Token-Efficient Video Large Language Models for Long-Form Video Understanding

Mingze Xu, Mingfei Gao, Shiyu Li et al.

COLM 2025
#37

Do Large Language Models Have a Planning Theory of Mind? Evidence from MindGames: a Multi-Step Persuasion Task

Jared Moore, Ned Cooper, Rasmus Overmark et al.

COLM 2025
#38

DEL: Context-Aware Dynamic Exit Layer for Efficient Self-Speculative Decoding

Hossein Entezari Zarch, Lei Gao, Chaoyi Jiang et al.

COLM 2025
#39

Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources

Weizhi Wang, Yu Tian, Linjie Yang et al.

COLM 2025
#40

LoRe: Personalizing LLMs via Low-Rank Reward Modeling

Avinandan Bose, Zhihan Xiong, Yuejie Chi et al.

COLM 2025
#41

CRABS: A syntactic-semantic pincer strategy for bounding LLM interpretation of Python notebooks

Meng Li, Timothy M. McPhillips, Dingmin Wang et al.

COLM 2025
#42

2 OLMo 2 Furious (COLM’s Version)

Evan Pete Walsh, Luca Soldaini, Dirk Groeneveld et al.

COLM 2025
#43

Beyond Blanket Masking: Examining Granularity for Privacy Protection in Images Captured by Blind and Low Vision Users

Jeffri Murrugarra-Llerena, Haoran Niu, K. Suzanne Barber et al.

COLM 2025
#44

Algorithm Discovery With LLMs: Evolutionary Search Meets Reinforcement Learning

Anja Šurina, Amin Mansouri, Lars C.P.M. Quaedvlieg et al.

COLM 2025
#45

MS-SSM: A Multi-Scale State Space Model for Efficient Sequence Modeling

Mahdi Karami, Ali Behrouz, Peilin Zhong et al.

COLM 2025
#46

IterKey: Iterative Keyword Generation with LLMs for Enhanced Retrieval Augmented Generation

Kazuki Hayashi, Hidetaka Kamigaito, Shinya Kouda et al.

COLM 2025
#47

Evaluating the Diversity and Quality of LLM Generated Content

Alexander Shypula, Shuo Li, Botong Zhang et al.

COLM 2025
#48

QUDsim: Quantifying Discourse Similarities in LLM-Generated Text

Ramya Namuduri, Yating Wu, Anshun Asher Zheng et al.

COLM 2025
#49

A Critical Look At Tokenwise Reward-Guided Text Generation

Ahmad Rashid, Ruotian Wu, Julia Grosse et al.

COLM 2025
#50

When Splitting Makes Stronger: A Theoretical and Empirical Analysis of Divide-and-Conquer Prompting in LLMs

Yizhou Zhang, Defu Cao, Lun Du et al.

COLM 2025
#51

Humans overrely on overconfident language models, across languages

Neil Rathi, Dan Jurafsky, Kaitlyn Zhou

COLM 2025
#52

Vision-Language Models Are Not Pragmatically Competent in Referring Expression Generation

Ziqiao Ma, Jing Ding, Xuejun Zhang et al.

COLM 2025
#53

Values in the Wild: Discovering and Mapping Values in Real-World Language Model Interactions

Saffron Huang, Esin Durmus, Kunal Handa et al.

COLM 2025
#54

URANIA: Differentially Private Insights into AI Use

Daogao Liu, Edith Cohen, Badih Ghazi et al.

COLM 2025
#55

The Zero Body Problem: Probing LLM Use of Sensory Language

Rebecca M. M. Hicke, Sil Hamilton, David Mimno

COLM 2025
#56

RepoST: Scalable Repository-Level Coding Environment Construction with Sandbox Testing

Yiqing Xie, Alex Xie, Divyanshu Sheth et al.

COLM 2025
#57

Weak-for-Strong: Training Weak Meta-Agent to Harness Strong Executors

Fan Nie, Lan Feng, Haotian Ye et al.

COLM 2025
#58

Guided Reasoning in LLM-Driven Penetration Testing Using Structured Attack Trees

Katsuaki Nakano, Reza Fayyazi, Shanchieh Yang et al.

COLM 2025
#59

AdaptMI: Adaptive Skill-based In-context Math Instructions for Small Language Models

Yinghui He, Abhishek Panigrahi, Yong Lin et al.

COLM 2025
#60

Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill?

Chenrui Fan, Ming Li, Lichao Sun et al.

COLM 2025
#61

OpinioRAG: Towards Generating User-Centric Opinion Highlights from Large-scale Online Reviews

Mir Tafseer Nayeem, Davood Rafiei

COLM 2025
#62

Learning to Generate Unit Tests for Automated Debugging

Archiki Prasad, Elias Stengel-Eskin, Justin Chen et al.

COLM 2025
#63

LLM Unlearning Reveals a Stronger-Than-Expected Coreset Effect in Current Benchmarks

Soumyadeep Pal, Changsheng Wang, James Diffenderfer et al.

COLM 2025
#64

Investigating Intersectional Bias in Large Language Models using Confidence Disparities in Coreference Resolution

Falaah Arif Khan, Nivedha Sivakumar, Yinong Oliver Wang et al.

COLM 2025
#65

Prompt-Reverse Inconsistency: LLM Self-Inconsistency Beyond Generative Randomness and Prompt Paraphrasing

Jihyun Janice Ahn, Wenpeng Yin

COLM 2025
#66

Positional Biases Shift as Inputs Approach Context Window Limits

Blerta Veseli, Julian Chibane, Mariya Toneva et al.

COLM 2025
#67

ADAPT: Actively Discovering and Adapting to Preferences for any Task

Maithili Patel, Xavier Puig, Ruta Desai et al.

COLM 2025
#68

SciReplicate-Bench: Benchmarking LLMs in Agent-driven Algorithmic Reproduction from Research Papers

Yanzheng Xiang, Hanqi Yan, Shuyin Ouyang et al.

COLM 2025
#69

True Multimodal In-Context Learning Needs Attention to the Visual Context

Shuo Chen, Jianzhe Liu, Zhen Han et al.

COLM 2025
#70

Post-training for Efficient Communication via Convention Formation

Yilun Hua, Evan Wang, Yoav Artzi

COLM 2025
#71

Streaming DiLoCo with overlapping communication

Arthur Douillard, Yani Donchev, J Keith Rush et al.

COLM 2025
#72

MuSeD: A Multimodal Spanish Dataset for Sexism Detection in Social Media Videos

Laura De Grazia, Pol Pastells, Mauro Vázquez Chas et al.

COLM 2025
#73

EvidenceBench: A Benchmark for Extracting Evidence from Biomedical Papers

Jianyou Wang, Weili Cao, Kaicheng Wang et al.

COLM 2025
#74

Local Mixtures of Experts: Essentially Free Test-Time Training via Model Merging

Ryo Bertolissi, Jonas Hübotter, Ido Hakimi et al.

COLM 2025
#75

An Illusion of Progress? Assessing the Current State of Web Agents

Tianci Xue, Weijian Qi, Tianneng Shi et al.

COLM 2025
#76

Boundless Byte Pair Encoding: Breaking the Pre-tokenization Barrier

Craig W Schmidt, Varshini Reddy, Chris Tanner et al.

COLM 2025
#77

Multi-Agent Systems Execute Arbitrary Malicious Code

Harold Triedman, Rishi Dev Jha, Vitaly Shmatikov

COLM 2025
#78

Privately Learning from Graphs with Applications in Fine-tuning Large Language Models

Haoteng Yin, Rongzhe Wei, Eli Chien et al.

COLM 2025
#79

Evaluating LLMs on Chinese Idiom Translation

Cai Yang, Yao Dou, David Heineman et al.

COLM 2025
#80

Rethinking Safety in LLM Fine-tuning: An Optimization Perspective

Minseon Kim, Jin Myung Kwak, Lama Alssum et al.

COLM 2025
#81

CodeXEmbed: A Generalist Embedding Model Family for Multilingual and Multi-task Code Retrieval

Ye Liu, Rui Meng, Shafiq Joty et al.

COLM 2025
#82

Do LLMs Understand Your Translations? Evaluating Paragraph-level MT with Question Answering

Patrick Fernandes, Sweta Agrawal, Emmanouil Zaranis et al.

COLM 2025
#83

Law of Vision Representation in MLLMs

Shijia Yang, Bohan Zhai, Quanzeng You et al.

COLM 2025
#84

Exploring Sparse Adapters for Scalable Merging of Parameter Efficient Experts

Samin Yeasar Arnob, Zhan Su, Minseon Kim et al.

COLM 2025
#85

Towards Compute-Optimal Many-Shot In-Context Learning

Shahriar Golchin, Yanfei Chen, Rujun Han et al.

COLM 2025
#86

The World According to LLMs: How Geographic Origin Influences LLMs' Entity Deduction Capabilities

Harsh Nishant Lalai, Raj Sanjay Shah, Jiaxin Pei et al.

COLM 2025
#87

LeakAgent: RL-based Red-teaming Agent for LLM Privacy Leakage

Yuzhou Nie, Zhun Wang, Ye Yu et al.

COLM 2025
#88

ReasonIR: Training Retrievers for Reasoning Tasks

Rulin Shao, Rui Qiao, Varsha Kishore et al.

COLM 2025
#89

Déjà Vu: Multilingual LLM Evaluation through the Lens of Machine Translation Evaluation

Julia Kreutzer, Eleftheria Briakou, Sweta Agrawal et al.

COLM 2025
#90

Self-Rewarding PPO: Aligning Large Language Models with Demonstrations Only

Qingru Zhang, Liang Qiu, Ilgee Hong et al.

COLM 2025
#91

BigCharts-R1: Enhanced Chart Reasoning with Visual Reinforcement Finetuning

Ahmed Masry, Abhay Puri, Masoud Hashemi et al.

COLM 2025
#92

Self-Steering Language Models

Gabriel Grand, Joshua B. Tenenbaum, Vikash Mansinghka et al.

COLM 2025
#93

Training Large Language Models to Reason in a Continuous Latent Space

Shibo Hao, Sainbayar Sukhbaatar, DiJia Su et al.

COLM 2025
#94

The Delta Learning Hypothesis: Preference Tuning on Weak Data can Yield Strong Gains

Scott Geng, Hamish Ivison, Chun-Liang Li et al.

COLM 2025
#95

ParaPO: Aligning Language Models to Reduce Verbatim Reproduction of Pre-training Data

Tong Chen, Faeze Brahman, Jiacheng Liu et al.

COLM 2025
#96

Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate

Yubo Wang, Xiang Yue, Wenhu Chen

COLM 2025
#97

Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving

Yong Lin, Shange Tang, Bohan Lyu et al.

COLM 2025
#98

Fast Controlled Generation from Language Models with Adaptive Weighted Rejection Sampling

Ben Lipkin, Benjamin LeBrun, Jacob Hoover Vigly et al.

COLM 2025
#99

On the Effectiveness and Generalization of Race Representations for Debiasing High-Stakes Decisions

Dang Nguyen, Chenhao Tan

COLM 2025
#100

Quantifying Fairness in LLMs Beyond Tokens: A Semantic and Statistical Perspective

Weijie Xu, Yiwen Wang, Chi Xue et al.

COLM 2025
#101

Improving Fisher Information Estimation and Efficiency for LoRA-based LLM Unlearning

Yejin Kim, Eunwon Kim, Buru Chang et al.

COLM 2025
#102

Multi-Token Attention

Olga Golovneva, Tianlu Wang, Jason E Weston et al.

COLM 2025
#103

From Queries to Criteria: Understanding How Astronomers Evaluate LLMs

Alina Hyk, Kiera McCormick, Mian Zhong et al.

COLM 2025
#104

Analyzing Multilingualism in Large Language Models with Sparse Autoencoders

Ikhyun Cho, Julia Hockenmaier

COLM 2025
#105

CrossWordBench: Evaluating the Reasoning Capabilities of LLMs and LVLMs with Controllable Puzzle Generation

Jixuan Leng, Chengsong Huang, Langlin Huang et al.

COLM 2025
#106

Unifying Autoregressive and Diffusion-Based Sequence Generation

Nima Fathi, Torsten Scholak, Pierre-Andre Noel

COLM 2025
#107

Control the Temperature: Selective Sampling for Diverse and High-Quality LLM Outputs

Sergey Troshin, Wafaa Mohammed, Yan Meng et al.

COLM 2025
#108

UTF-8 Plumbing: Byte-level Tokenizers Unavoidably Enable LLMs to Generate Ill-formed UTF-8

Preston Firestone, Shubham Ugare, Gagandeep Singh et al.

COLM 2025
#109

RADLADS: Rapid Attention Distillation to Linear Attention Decoders at Scale

Daniel Goldstein, Eric Alcaide, Janna Lu et al.

COLM 2025
#110

Learning Adaptive Parallel Reasoning with Language Models

Jiayi Pan, Xiuyu Li, Long Lian et al.

COLM 2025
#111

EllieSQL: Cost-Efficient Text-to-SQL with Complexity-Aware Routing

Yizhang Zhu, Runzhi Jiang, Boyan Li et al.

COLM 2025
#112

Why do LLMs attend to the first token?

Federico Barbero, Alvaro Arroyo, Xiangming Gu et al.

COLM 2025
#113

Overfill: Two-Stage Models for Efficient Language Model Decoding

Woojeong Kim, Junxiong Wang, Jing Nathan Yan et al.

COLM 2025
#114

CLIPPER: Compression enables long-context synthetic data generation

Chau Minh Pham, Yapei Chang, Mohit Iyyer

COLM 2025
#115

Society of Mind Meets Real-Time Strategy: A Hierarchical Multi-Agent Framework for Strategic Reasoning

Daechul Ahn, San Kim, Jonghyun Choi

COLM 2025
#116

A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility

Andreas Hochlehnert, Hardik Bhatnagar, Vishaal Udandarao et al.

COLM 2025
#117

SPIN-Bench: How Well Do LLMs Plan Strategically and Reason Socially?

Jianzhu Yao, Kevin Wang, Ryan Hsieh et al.

COLM 2025
#118

Resource-efficient Inference with Foundation Model Programs

Lunyiu Nie, Zhimin Ding, Kevin Yu et al.

COLM 2025
#119

Teach Old SAEs New Domain Tricks with Boosting

Nikita Koriagin, Yaroslav Aksenov, Daniil Laptev et al.

COLM 2025
#120

Improving LLMs’ Generalized Reasoning Abilities by Graph Problems

Qifan Zhang, Nuo Chen, Zehua Li et al.

COLM 2025
#121

Zero-shot Benchmarking: A Framework for Flexible and Scalable Automatic Evaluation of Language Models

José Pombal, Nuno M Guerreiro, Ricardo Rei et al.

COLM 2025
#122

Register Always Matters: Analysis of LLM Pretraining Data Through the Lens of Language Variation

Amanda Myntti, Erik Henriksson, Veronika Laippala et al.

COLM 2025
#123

Scoring Verifiers: Evaluating Synthetic Verification for Code and Reasoning

Aleksander Ficek, Somshubra Majumdar, Vahid Noroozi et al.

COLM 2025
#124

Cascade Reward Sampling for Efficient Decoding-Time Alignment

Bolian Li, Yifan Wang, Anamika Lochab et al.

COLM 2025
#125

Reverse-engineering NLI: A study of the meta-inferential properties of Natural Language Inference

Rasmus Blanck, Bill Noble, Stergios Chatzikyriakidis

COLM 2025
#126

Have Large Language Models Learned to Reason? A Characterization via 3-SAT

Rishi Hazra, Gabriele Venturato, Pedro Zuidberg Dos Martires et al.

COLM 2025
#127

HIPPO-VIDEO: Simulating Watch Histories with Large Language Models for History-Driven Video Highlighting

Jeongeun Lee, Youngjae Yu, Dongha Lee

COLM 2025
#128

Adversarial Training of Reward Models

Alexander Bukharin, Haifeng Qian, Shengyang Sun et al.

COLM 2025
#129

The Unlearning Mirage: A Dynamic Framework for Evaluating LLM Unlearning

Raj Sanjay Shah, Jing Huang, Keerthiram Murugesan et al.

COLM 2025
#130

Planted in Pretraining, Swayed by Finetuning: A Case Study on the Origins of Cognitive Biases in LLMs

Itay Itzhak, Yonatan Belinkov, Gabriel Stanovsky

COLM 2025
#131

ALOPE: Adaptive Layer Optimization for Translation Quality Estimation using Large Language Models

Archchana Sindhujan, Shenbin Qian, Chan Chi Chun Matthew et al.

COLM 2025
#132

The Blessing and Curse of Dimensionality in Safety Alignment

Rachel S.Y. Teo, Laziz Abdullaev, Tan Minh Nguyen

COLM 2025
#133

AdaptiVocab: Enhancing LLM Efficiency in Focused Domains through Lightweight Vocabulary Adaptation

Itay Nakash, Nitay Calderon, Eyal Ben-David et al.

COLM 2025
#134

Enhancing LLM Reliability via Explicit Knowledge Boundary Modeling

Hang Zheng, Hongshen Xu, Yuncong Liu et al.

COLM 2025
#135

Effective Length Extrapolation via Dimension-Wise Positional Embeddings Manipulation

Yi Lu, Wanxu Zhao, Xin Zhou et al.

COLM 2025
#136

A Controlled Study on Long Context Extension and Generalization in LLMs

Yi Lu, Jing Nathan Yan, Songlin Yang et al.

COLM 2025
#137

G1yphD3c0de: Towards Safer Language Models on Visually Perturbed Texts

Yejin Choi, Yejin Yeo, Yejin Son et al.

COLM 2025
#138

Efficient Process Reward Model Training via Active Learning

Keyu Duan, Zichen Liu, Xin Mao et al.

COLM 2025
#139

Detecting and Pruning Prominent but Detrimental Neurons in Large Language Models

Ameen Ali Ali, Shahar Katz, Lior Wolf et al.

COLM 2025
#140

FormaRL: Enhancing Autoformalization with no Labeled Data

Yanxing Huang, Xinling Jin, Sijie Liang et al.

COLM 2025
#141

ICQuant: Index Coding enables Low-bit LLM Quantization

Xinlin Li, Osama Hanna, Christina Fragouli et al.

COLM 2025
#142

MAC: A Live Benchmark for Multimodal Large Language Models in Scientific Understanding

Mohan Jiang, Jin Gao, Jiahao Zhan et al.

COLM 2025
#143

Interpreting the linear structure of vision-language model embedding spaces

Isabel Papadimitriou, Huangyuan Su, Thomas Fel et al.

COLM 2025
#144

LLM Can be a Dangerous Persuader: Empirical Study of Persuasion Safety in Large Language Models

Minqian Liu, Zhiyang Xu, Xinyi Zhang et al.

COLM 2025
#145

Establishing Task Scaling Laws via Compute-Efficient Model Ladders

Akshita Bhagia, Jiacheng Liu, Alexander Wettig et al.

COLM 2025
#146

Can A Society of Generative Agents Simulate Human Behavior and Inform Public Health Policy? A Case Study on Vaccine Hesitancy

Abe Bohan Hou, Hongru Du, Yichen Wang et al.

COLM 2025
#147

LongPerceptualThoughts: Distilling System-2 Reasoning for System-1 Perception

Yuan-Hong Liao, Sven Elflein, Liu He et al.

COLM 2025
#148

RARe: Retrieval Augmented Retrieval with In-Context Examples

Atula Tejaswi, Yoonsang Lee, Sujay Sanghavi et al.

COLM 2025
#149

Reasoning-SQL: Reinforcement Learning with SQL Tailored Partial Rewards for Reasoning-Enhanced Text-to-SQL

Mohammadreza Pourreza, Shayan Talaei, Ruoxi Sun et al.

COLM 2025
#150

PersonaEval: Are LLM Evaluators Human Enough to Judge Role-Play?

Lingfeng Zhou, Jialing Zhang, Jin Gao et al.

COLM 2025
#151

MapIQ: Evaluating Multimodal Large Language Models for Map Question Answering

Varun Srivastava, Fan Lei, Srija Mukhopadhyay et al.

COLM 2025
#152

Bayesian scaling laws for in-context learning

Aryaman Arora, Dan Jurafsky, Christopher Potts et al.

COLM 2025
#153

DoomArena: A framework for Testing AI Agents Against Evolving Security Threats

Léo Boisvert, Abhay Puri, Gabriel Huang et al.

COLM 2025
#154

Texture or Semantics? Vision-Language Models Get Lost in Font Recognition

Zhecheng Li, Guoxian Song, Yujun Cai et al.

COLM 2025
#155

Speculative Thinking: Enhancing Small-Model Reasoning with Large Model Guidance at Inference Time

Van Yang, Xiang Yue, Vipin Chaudhary et al.

COLM 2025
#156

Transformers are Efficient Compilers, Provably

Xiyu Zhai, Runlong Zhou, Liao Zhang et al.

COLM 2025
#157

Scaling Web Agent Training through Automatic Data Generation and Fine-grained Evaluation

Lajanugen Logeswaran, Jaekyeom Kim, Sungryull Sohn et al.

COLM 2025
#158

Pretrained Hybrids with MAD Skills

Nicholas Roberts, Samuel Guo, Zhiqi Gao et al.

COLM 2025
#159

Benchmarking Retrieval-Augmented Generation for Chemistry

Xianrui Zhong, Bowen Jin, Siru Ouyang et al.

COLM 2025
#160

AutoScale: Scale-Aware Data Mixing for Pre-Training LLMs

Feiyang Kang, Yifan Sun, Bingbing Wen et al.

COLM 2025
#161

Meta-Learning for Speeding Up Large Model Inference in Decentralized Environments

Yipeng Du, Zihao Wang, Ahmad Farhan et al.

COLM 2025
#162

Multilingual and Multi-Accent Jailbreaking of Audio LLMs

Jaechul Roh, Virat Shejwalkar, Amir Houmansadr

COLM 2025
#163

X-EcoMLA: Upcycling Pre-Trained Attention into MLA for Efficient and Extreme KV Compression

Guihong Li, Mehdi Rezagholizadeh, Mingyu Yang et al.

COLM 2025
#164

OpenCodeReasoning: Advancing Data Distillation for Competitive Coding

Wasi Uddin Ahmad, Sean Narenthiran, Somshubra Majumdar et al.

COLM 2025
#165

UNVEILING: What Makes Linguistics Olympiad Puzzles Tricky for LLMs?

Mukund Choudhary, KV Aditya Srivatsa, Gaurja Aeron et al.

COLM 2025
#166

Inducing Programmatic Skills for Agentic Tasks

Zora Zhiruo Wang, Apurva Gandhi, Graham Neubig et al.

COLM 2025
#167

Learning to Reason for Long-Form Story Generation

Alexander Gurung, Mirella Lapata

COLM 2025
#168

Can LLM "Self-report"?: Evaluating the Validity of Self-report Scales in Measuring Personality Design in LLM-based Chatbots

Huiqi Zou, Pengda Wang, Zihan Yan et al.

COLM 2025
#169

Visual Representations inside the Language Model

Benlin Liu, Amita Kamath, Madeleine Grunde-McLaughlin et al.

COLM 2025
#170

SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths

Kaixuan Huang, Xudong Guo, Mengdi Wang

COLM 2025
#171

RankAlign: A Ranking View of the Generator-Validator Gap in Large Language Models

Juan Diego Rodriguez, Wenxuan Ding, Katrin Erk et al.

COLM 2025
#172

SAEs Can Improve Unlearning: Dynamic Sparse Autoencoder Guardrails for Precision Unlearning in LLMs

Aashiq Muhamed, Jacopo Bonato, Mona T. Diab et al.

COLM 2025
#173

Energy-Based Reward Models for Robust Language Model Alignment

Anamika Lochab, Ruqi Zhang

COLM 2025
#174

AI-Slop to AI-Polish? Aligning Language Models through Edit-Based Writing Rewards and Test-time computation

Tuhin Chakrabarty, Philippe Laban, Chien-Sheng Wu

COLM 2025
#175

Mitigating Modal Imbalance in Multimodal Reasoning

Chen Henry Wu, Neil Kale, Aditi Raghunathan

COLM 2025
#176

NoveltyBench: Evaluating Language Models for Humanlike Diversity

Yiming Zhang, Harshita Diddee, Susan Holm et al.

COLM 2025
#177

(Im)possibility of Automated Hallucination Detection in Large Language Models

Amin Karbasi, Omar Montasser, John Sous et al.

COLM 2025
#178

RRO: LLM Agent Optimization Through Rising Reward Trajectories

Zilong Wang, Jingfeng Yang, Sreyashi Nag et al.

COLM 2025
#179

Single-Pass Document Scanning for Question Answering

Weili Cao, Jianyou Wang, Youze Zheng et al.

COLM 2025
#180

Knowledge Graph Retrieval-Augmented Generation via GNN-Guided Prompting

Haochen Liu, Song Wang, Jundong Li

COLM 2025
#181

Don’t lie to your friends: Learning what you know from collaborative self-play

Jacob Eisenstein, Reza Aghajani, Adam Fisch et al.

COLM 2025
#182

Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

Bowen Jin, Hansi Zeng, Zhenrui Yue et al.

COLM 2025
#183

AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories

Xing Han Lù, Amirhossein Kazemnejad, Nicholas Meade et al.

COLM 2025
#184

EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees

Zhiyuan Zeng, Yizhong Wang, Hannaneh Hajishirzi et al.

COLM 2025
#185

ThoughtTerminator: Benchmarking, Calibrating, and Mitigating Overthinking in Reasoning Models

Xiao Pu, Michael Saxon, Wenyue Hua et al.

COLM 2025
#186

Scaling Analysis of Interleaved Speech-Text Language Models

Gallil Maimon, Michael Hassid, Amit Roth et al.

COLM 2025
#187

Putting the Value Back in RL: Better Test-Time Scaling by Unifying LLM Reasoners With Verifiers

Kusha Sareen, Morgane M Moss, Alessandro Sordoni et al.

COLM 2025
#188

Refusal Tokens: A Simple Way to Calibrate Refusals in Large Language Models

Neel Jain, Aditya Shrivastava, Chenyang Zhu et al.

COLM 2025
#189

Language Model Personalization via Reward Factorization

Idan Shenfeld, Felix Faltings, Pulkit Agrawal et al.

COLM 2025
#190

Resona: Improving Context Copying in Linear Recurrence Models with Retrieval

Xinyu Wang, Linrui Ma, Jerry Huang et al.

COLM 2025
#191

Model-Agnostic Policy Explanations with Large Language Models

Zhang Xi-Jia, Yue Guo, Shufei Chen et al.

COLM 2025
#192

How Multimodal LLMs Solve Image Tasks: A Lens on Visual Grounding, Task Reasoning, and Answer Decoding

Zhuoran Yu, Yong Jae Lee

COLM 2025
#193

Tulu 3: Pushing Frontiers in Open Language Model Post-Training

Nathan Lambert, Jacob Morrison, Valentina Pyatkin et al.

COLM 2025
#194

Customize Multi-modal RAI Guardrails with Precedent-based predictions

Cheng-Fu Yang, Thanh Tran, Christos Christodoulopoulos et al.

COLM 2025
#195

Can Large Language Models Integrate Spatial Data? Empirical Insights into Reasoning Strengths and Computational Weaknesses

Bin Han, Robert Wolfe, Anat Caspi et al.

COLM 2025
#196

What is the Visual Cognition Gap between Humans and Multimodal LLMs?

Xu Cao, Yifan Shen, Bolin Lai et al.

COLM 2025
#197

L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning

Pranjal Aggarwal, Sean Welleck

COLM 2025
#198

Elucidating the Design Space of Decay in Linear Attention

Zhen Qin, Xuyang Shen, Yiran Zhong

COLM 2025
#199

Noiser: Bounded Input Perturbations for Attributing Large Language Models

Mohammad Reza Ghasemi Madani, Aryo Pradipta Gema, Yu Zhao et al.

COLM 2025
#200

Readability ≠ Learnability: Rethinking the Role of Simplicity in Training Small Language Models

Ivan Lee, Taylor Berg-Kirkpatrick

COLM 2025