Most Cited COLM 2025 "kernel-based initialization" Papers

418 papers found • Page 1 of 3

#1

Pretraining on the Test Set Is No Longer All You Need: A Debate-Driven Approach to QA Benchmarks

Linbo Cao, Jinman Zhao

COLM 2025 • arXiv:2507.17747
#2

Corrupted by Reasoning: Reasoning Language Models Become Free-Riders in Public Goods Games

David Guzman Piedrahita, Yongjin Yang, Mrinmaya Sachan et al.

COLM 2025
#3

QAPyramid: Fine-grained Evaluation of Content Selection for Text Summarization

Shiyue Zhang, David Wan, Arie Cattan et al.

COLM 2025
#4

Sherkala-Chat: Building a State-of-the-Art LLM for Kazakh in a Moderately Resourced Setting

Fajri Koto, Rituraj Joshi, Nurdaulet Mukhituly et al.

COLM 2025
#5

Fluid Language Model Benchmarking

Valentin Hofmann, David Heineman, Ian Magnusson et al.

COLM 2025
#6

Data-Centric Human Preference with Rationales for Direct Preference Alignment

Hoang Anh Just, Ming Jin, Anit Kumar Sahu et al.

COLM 2025
#7

DynaSaur: Large Language Agents Beyond Predefined Actions

Dang Nguyen, Viet Dac Lai, Seunghyun Yoon et al.

COLM 2025
#8

Layerwise Importance Analysis of Feed-Forward Networks in Transformer-based Language Models

Wataru Ikeda, Kazuki Yano, Ryosuke Takahashi et al.

COLM 2025
#9

PolyGuard: A Multilingual Safety Moderation Tool for 17 Languages

Priyanshu Kumar, Devansh Jain, Akhila Yerukola et al.

COLM 2025
#10

Teaching Models to Understand (but not Generate) High-risk Data

Ryan Yixiang Wang, Matthew Finlayson, Luca Soldaini et al.

COLM 2025
#11

Advancing Language Multi-Agent Learning with Credit Re-Assignment for Interactive Environment Generalization

Zhitao He, Zijun Liu, Peng Li et al.

COLM 2025
#12

PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling

Zefan Cai, Yichi Zhang, Bofei Gao et al.

COLM 2025
#13

Partial Perspectives: How LLMs Handle Logically Inconsistent Knowledge in Reasoning Tasks

Zichao Li, Ines Arous, Jackie CK Cheung

COLM 2025
#14

Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining

Rosie Zhao, Alexandru Meterez, Sham M. Kakade et al.

COLM 2025
#15

EvalAgents: Discovering Implicit Evaluation Criteria from the Web

Manya Wadhwa, Zayne Rea Sprague, Chaitanya Malaviya et al.

COLM 2025
#16

Cats Confuse Reasoning LLM: Query Agnostic Adversarial Triggers for Reasoning Models

Meghana Arakkal Rajeev, Rajkumar Ramamurthy, Prapti Trivedi et al.

COLM 2025
#17

ALFA: Aligning LLMs to Ask Good Questions: A Case Study in Clinical Reasoning

Shuyue Stella Li, Jimin Mun, Faeze Brahman et al.

COLM 2025
#18

Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models

Ruikang Liu, Yuxuan Sun, Manyi Zhang et al.

COLM 2025
#19

CRUST-Bench: A Comprehensive Benchmark for C-to-safe-Rust Transpilation

Anirudh Khatry, Robert Zhang, Jia Pan et al.

COLM 2025
#20

On Mechanistic Circuits for Extractive Question-Answering

Samyadeep Basu, Vlad I Morariu, Ryan A. Rossi et al.

COLM 2025
#21

Sample Efficient Preference Alignment in LLMs via Active Exploration

Viraj Mehta, Syrine Belakaria, Vikramjeet Das et al.

COLM 2025
#22

Boosting LLM Reasoning via Spontaneous Self-Correction

Xutong Zhao, Tengyu Xu, Xuewei Wang et al.

COLM 2025
#23

Probing Syntax in Large Language Models: Successes and Remaining Challenges

Pablo J. Diego Simon, Emmanuel Chemla, Jean-Remi King et al.

COLM 2025
#24

REM: Evaluating LLM Embodied Spatial Reasoning through Multi-Frame Trajectories

Jacob Thompson, Emiliano Garcia-Lopez, Yonatan Bisk

COLM 2025
#25

HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Interactive AI Agents

Xuhui Zhou, Hyunwoo Kim, Faeze Brahman et al.

COLM 2025
#26

GenerationPrograms: Fine-grained Attribution with Executable Programs

David Wan, Eran Hirsch, Elias Stengel-Eskin et al.

COLM 2025
#27

Multi-Agent Retrieval-Augmented Framework for Evidence-Based Counterspeech Against Health Misinformation

Anirban Saha Anik, Xiaoying Song, Elliott Wang et al.

COLM 2025
#28

Task-Circuit Quantization: Leveraging Knowledge Localization and Interpretability for Compression

Hanqi Xiao, Yi-Lin Sung, Elias Stengel-Eskin et al.

COLM 2025
#29

Not All Data Are Unlearned Equally

Aravind Krishnan, Siva Reddy, Marius Mosbach

COLM 2025
#30

Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models

Lynn Chua, Badih Ghazi, Yangsibo Huang et al.

COLM 2025
#31

FineWeb2: One Pipeline to Scale Them All — Adapting Pre-Training Data Processing to Every Language

Guilherme Penedo, Hynek Kydlíček, Vinko Sabolčec et al.

COLM 2025
#32

LIMO: Less is More for Reasoning

Yixin Ye, Zhen Huang, Yang Xiao et al.

COLM 2025
#33

Overcoming Vocabulary Constraints with Pixel-level Fallback

Jonas F. Lotz, Hendra Setiawan, Stephan Peitz et al.

COLM 2025
#34

Breaking the Data Barrier -- Building GUI Agents Through Task Generalization

Junlei Zhang, Zichen Ding, Chang Ma et al.

COLM 2025
#35

Spike No More: Stabilizing the Pre-training of Large Language Models

Sho Takase, Shun Kiyono, Sosuke Kobayashi et al.

COLM 2025
#36

CultureCLIP: Empowering CLIP with Cultural Awareness through Synthetic Images and Contextualized Captions

Yuchen Huang, Zhiyuan Fan, Zhitao He et al.

COLM 2025
#37

VisualTrap: A Stealthy Backdoor Attack on GUI Agents via Visual Grounding Manipulation

Ziang Ye, Yang Zhang, Wentao Shi et al.

COLM 2025
#38

Learning Effective Language Representations for Sequential Recommendation via Joint Embedding Predictive Architecture

Nguyen Anh Minh, Dung D. Le

COLM 2025
#39

Reinforcement Learning Enhanced Full-Duplex Spoken Dialogue Language Models for Conversational Interactions

Chen Chen, Ke Hu, Chao-Han Huck Yang et al.

COLM 2025
#40

When Does Metadata Conditioning (NOT) Work for Language Model Pre-Training? A Study with Context-Free Grammars

Rei Higuchi, Ryotaro Kawata, Naoki Nishikawa et al.

COLM 2025
#41

Backdoor Attacks on Dense Retrieval via Public and Unintentional Triggers

Quanyu Long, Yue Deng, Leilei Gan et al.

COLM 2025
#42

Layers at Similar Depths Generate Similar Activations Across LLM Architectures

Christopher Wolfram, Aaron Schein

COLM 2025
#43

Jigsaw Puzzles: Splitting Harmful Questions to Jailbreak Large Language Models in Multi-turn Interactions

Hao Yang, Lizhen Qu, Ehsan Shareghi et al.

COLM 2025
#44

Probing then Editing Response Personality of Large Language Models

Tianjie Ju, Zhenyu Shao, Bowen Wang et al.

COLM 2025
#45

Rerouting LLM Routers

Avital Shafran, Roei Schuster, Tom Ristenpart et al.

COLM 2025
#46

A Survey on Personalized and Pluralistic Preference Alignment in Large Language Models

Zhouhang Xie, Junda Wu, Yiran Shen et al.

COLM 2025
#47

Collab-RAG: Boosting Retrieval-Augmented Generation for Complex Question Answering via White-Box and Black-Box LLM Collaboration

Ran Xu, Wenqi Shi, Yuchen Zhuang et al.

COLM 2025
#48

CoLa: Learning to Interactively Collaborate with Large Language Models

Abhishek Sharma, Dan Goldwasser

COLM 2025
#49

Understanding R1-Zero-Like Training: A Critical Perspective

Zichen Liu, Changyu Chen, Wenjun Li et al.

COLM 2025
#50

SlimMoE: Structured Compression of Large MoE Models via Expert Slimming and Distillation

Zichong Li, Chen Liang, Zixuan Zhang et al.

COLM 2025
#51

SEAM: Semantically Equivalent Across Modalities Benchmark for Vision-Language Models

Zhenwei Tang, Difan Jiao, Blair Yang et al.

COLM 2025
#52

VideoSAVi: Self-Aligned Video Language Models without Human Supervision

Yogesh Kulkarni, Pooyan Fazli

COLM 2025
#53

Always Tell Me The Odds: Fine-grained Conditional Probability Estimation

Liaoyaqi Wang, Zhengping Jiang, Anqi Liu et al.

COLM 2025
#54

Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents

Saaket Agashe, Kyle Wong, Vincent Tu et al.

COLM 2025
#55

How Post-Training Reshapes LLMs: A Mechanistic View on Knowledge, Truthfulness, Refusal, and Confidence

Hongzhe Du, Weikai Li, Min Cai et al.

COLM 2025
#56

Implicit In-Context Learning: Evidence from Artificial Language Experiments

Xiaomeng Ma, Qihui Xu

COLM 2025
#57

To Backtrack or Not to Backtrack: When Sequential Search Limits Model Reasoning

Tian Qin, David Alvarez-Melis, Samy Jelassi et al.

COLM 2025
#58

One-shot Optimized Steering Vectors Mediate Safety-relevant Behaviors in LLMs

Jacob Dunefsky, Arman Cohan

COLM 2025
#59

Sharpe Ratio-Guided Active Learning for Preference Optimization in RLHF

Syrine Belakaria, Joshua Kazdan, Charles Marx et al.

COLM 2025
#60

SentenceKV: Efficient LLM Inference via Sentence-Level Semantic KV Caching

Yuxuan Zhu, Ali Falahati, David H. Yang et al.

COLM 2025
#61

Defending LLM Watermarking Against Spoofing Attacks with Contrastive Representation Learning

Li An, Yujian Liu, Yepeng Liu et al.

COLM 2025
#62

Do Language Models Agree with Human Perceptions of Suspense in Stories?

Glenn Matlin, Devin Zhang, Rodrigo Barroso Loza et al.

COLM 2025
#63

Learning by Teaching: Engaging Students as Instructors of Large Language Models in Computer Science Education

Xinming Yang, Haasil Pujara, Jun Li

COLM 2025
#64

CALLME: Call Graph Augmentation with Large Language Models for Javascript

Michael Wang, Kexin Pei, Armando Solar-Lezama

COLM 2025
#65

CITER: Collaborative Inference for Efficient Large Language Model Decoding with Token-Level Routing

Wenhao Zheng, Yixiao Chen, Weitong Zhang et al.

COLM 2025
#66

LM Agents May Fail to Act on Their Own Risk Knowledge

Yuzhi Tang, Tianxiao Li, Elizabeth Li et al.

COLM 2025
#67

Approximating Language Model Training Data from Weights

John Xavier Morris, Junjie Oscar Yin, Woojeong Kim et al.

COLM 2025
#68

Steering the CensorShip: Uncovering Representation Vectors for LLM "Thought" Control

Hannah Cyberey, David Evans

COLM 2025
#69

Brains vs. Bytes: Evaluating LLM Proficiency in Olympiad Mathematics

Hamed Mahdavi, Alireza Hashemi, Majid Daliri et al.

COLM 2025
#70

Hardware-Efficient Attention for Fast Decoding

Ted Zadouri, Hubert Strauss, Tri Dao

COLM 2025
#71

Evaluating and Designing Sparse Autoencoders by Approximating Quasi-Orthogonality

Sewoong Lee, Adam Davies, Marc E. Canby et al.

COLM 2025
#72

Exploring Large Language Model Agents for Piloting Social Experiments

Jinghua Piao, Yuwei Yan, Nian Li et al.

COLM 2025
#73

In-context Ranking Preference Optimization

Junda Wu, Rohan Surana, Zhouhang Xie et al.

COLM 2025
#74

Weight ensembling improves reasoning in language models

Xingyu Dang, Christina Baek, Kaiyue Wen et al.

COLM 2025
#75

Arctic-Embed 2.0: Multilingual Retrieval Without Compromise

Puxuan Yu, Luke Merrick, Gaurav Nuti et al.

COLM 2025
#76

Multilingual Contextualization of Large Language Models for Document-Level Machine Translation

Miguel Moura Ramos, Patrick Fernandes, Sweta Agrawal et al.

COLM 2025
#77

SlowFast-LLaVA-1.5: A Family of Token-Efficient Video Large Language Models for Long-Form Video Understanding

Mingze Xu, Mingfei Gao, Shiyu Li et al.

COLM 2025
#78

Do Large Language Models Have a Planning Theory of Mind? Evidence from MindGames: a Multi-Step Persuasion Task

Jared Moore, Ned Cooper, Rasmus Overmark et al.

COLM 2025
#79

DEL: Context-Aware Dynamic Exit Layer for Efficient Self-Speculative Decoding

Hossein Entezari Zarch, Lei Gao, Chaoyi Jiang et al.

COLM 2025
#80

Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources

Weizhi Wang, Yu Tian, Linjie Yang et al.

COLM 2025
#81

LoRe: Personalizing LLMs via Low-Rank Reward Modeling

Avinandan Bose, Zhihan Xiong, Yuejie Chi et al.

COLM 2025
#82

CRABS: A syntactic-semantic pincer strategy for bounding LLM interpretation of Python notebooks

Meng Li, Timothy M. McPhillips, Dingmin Wang et al.

COLM 2025
#83

2 OLMo 2 Furious (COLM’s Version)

Evan Pete Walsh, Luca Soldaini, Dirk Groeneveld et al.

COLM 2025
#84

Beyond Blanket Masking: Examining Granularity for Privacy Protection in Images Captured by Blind and Low Vision Users

Jeffri Murrugarra-Llerena, Haoran Niu, K. Suzanne Barber et al.

COLM 2025
#85

Algorithm Discovery With LLMs: Evolutionary Search Meets Reinforcement Learning

Anja Šurina, Amin Mansouri, Lars C.P.M. Quaedvlieg et al.

COLM 2025
#86

MS-SSM: A Multi-Scale State Space Model for Efficient Sequence Modeling

Mahdi Karami, Ali Behrouz, Peilin Zhong et al.

COLM 2025
#87

IterKey: Iterative Keyword Generation with LLMs for Enhanced Retrieval Augmented Generation

Kazuki Hayashi, Hidetaka Kamigaito, Shinya Kouda et al.

COLM 2025
#88

Evaluating the Diversity and Quality of LLM Generated Content

Alexander Shypula, Shuo Li, Botong Zhang et al.

COLM 2025
#89

QUDsim: Quantifying Discourse Similarities in LLM-Generated Text

Ramya Namuduri, Yating Wu, Anshun Asher Zheng et al.

COLM 2025
#90

A Critical Look At Tokenwise Reward-Guided Text Generation

Ahmad Rashid, Ruotian Wu, Julia Grosse et al.

COLM 2025
#91

When Splitting Makes Stronger: A Theoretical and Empirical Analysis of Divide-and-Conquer Prompting in LLMs

Yizhou Zhang, Defu Cao, Lun Du et al.

COLM 2025
#92

Humans overrely on overconfident language models, across languages

Neil Rathi, Dan Jurafsky, Kaitlyn Zhou

COLM 2025
#93

Vision-Language Models Are Not Pragmatically Competent in Referring Expression Generation

Ziqiao Ma, Jing Ding, Xuejun Zhang et al.

COLM 2025
#94

Values in the Wild: Discovering and Mapping Values in Real-World Language Model Interactions

Saffron Huang, Esin Durmus, Kunal Handa et al.

COLM 2025
#95

URANIA: Differentially Private Insights into AI Use

Daogao Liu, Edith Cohen, Badih Ghazi et al.

COLM 2025
#96

The Zero Body Problem: Probing LLM Use of Sensory Language

Rebecca M. M. Hicke, Sil Hamilton, David Mimno

COLM 2025
#97

RepoST: Scalable Repository-Level Coding Environment Construction with Sandbox Testing

Yiqing Xie, Alex Xie, Divyanshu Sheth et al.

COLM 2025
#98

Weak-for-Strong: Training Weak Meta-Agent to Harness Strong Executors

Fan Nie, Lan Feng, Haotian Ye et al.

COLM 2025
#99

Guided Reasoning in LLM-Driven Penetration Testing Using Structured Attack Trees

Katsuaki Nakano, Reza Fayyazi, Shanchieh Yang et al.

COLM 2025
#100

AdaptMI: Adaptive Skill-based In-context Math Instructions for Small Language Models

Yinghui He, Abhishek Panigrahi, Yong Lin et al.

COLM 2025
#101

Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill?

Chenrui Fan, Ming Li, Lichao Sun et al.

COLM 2025
#102

OpinioRAG: Towards Generating User-Centric Opinion Highlights from Large-scale Online Reviews

Mir Tafseer Nayeem, Davood Rafiei

COLM 2025
#103

Learning to Generate Unit Tests for Automated Debugging

Archiki Prasad, Elias Stengel-Eskin, Justin Chen et al.

COLM 2025
#104

LLM Unlearning Reveals a Stronger-Than-Expected Coreset Effect in Current Benchmarks

Soumyadeep Pal, Changsheng Wang, James Diffenderfer et al.

COLM 2025
#105

Investigating Intersectional Bias in Large Language Models using Confidence Disparities in Coreference Resolution

Falaah Arif Khan, Nivedha Sivakumar, Yinong Oliver Wang et al.

COLM 2025
#106

Prompt-Reverse Inconsistency: LLM Self-Inconsistency Beyond Generative Randomness and Prompt Paraphrasing

Jihyun Janice Ahn, Wenpeng Yin

COLM 2025
#107

Positional Biases Shift as Inputs Approach Context Window Limits

Blerta Veseli, Julian Chibane, Mariya Toneva et al.

COLM 2025
#108

ADAPT: Actively Discovering and Adapting to Preferences for any Task

Maithili Patel, Xavier Puig, Ruta Desai et al.

COLM 2025
#109

SciReplicate-Bench: Benchmarking LLMs in Agent-driven Algorithmic Reproduction from Research Papers

Yanzheng Xiang, Hanqi Yan, Shuyin Ouyang et al.

COLM 2025
#110

True Multimodal In-Context Learning Needs Attention to the Visual Context

Shuo Chen, Jianzhe Liu, Zhen Han et al.

COLM 2025
#111

Post-training for Efficient Communication via Convention Formation

Yilun Hua, Evan Wang, Yoav Artzi

COLM 2025
#112

Streaming DiLoCo with overlapping communication

Arthur Douillard, Yani Donchev, J Keith Rush et al.

COLM 2025
#113

MuSeD: A Multimodal Spanish Dataset for Sexism Detection in Social Media Videos

Laura De Grazia, Pol Pastells, Mauro Vázquez Chas et al.

COLM 2025
#114

EvidenceBench: A Benchmark for Extracting Evidence from Biomedical Papers

Jianyou Wang, Weili Cao, Kaicheng Wang et al.

COLM 2025
#115

Local Mixtures of Experts: Essentially Free Test-Time Training via Model Merging

Ryo Bertolissi, Jonas Hübotter, Ido Hakimi et al.

COLM 2025
#116

An Illusion of Progress? Assessing the Current State of Web Agents

Tianci Xue, Weijian Qi, Tianneng Shi et al.

COLM 2025
#117

Boundless Byte Pair Encoding: Breaking the Pre-tokenization Barrier

Craig W Schmidt, Varshini Reddy, Chris Tanner et al.

COLM 2025
#118

Multi-Agent Systems Execute Arbitrary Malicious Code

Harold Triedman, Rishi Dev Jha, Vitaly Shmatikov

COLM 2025
#119

Privately Learning from Graphs with Applications in Fine-tuning Large Language Models

Haoteng Yin, Rongzhe Wei, Eli Chien et al.

COLM 2025
#120

Evaluating LLMs on Chinese Idiom Translation

Cai Yang, Yao Dou, David Heineman et al.

COLM 2025
#121

Rethinking Safety in LLM Fine-tuning: An Optimization Perspective

Minseon Kim, Jin Myung Kwak, Lama Alssum et al.

COLM 2025
#122

CodeXEmbed: A Generalist Embedding Model Family for Multilingual and Multi-task Code Retrieval

Ye Liu, Rui Meng, Shafiq Joty et al.

COLM 2025
#123

Do LLMs Understand Your Translations? Evaluating Paragraph-level MT with Question Answering

Patrick Fernandes, Sweta Agrawal, Emmanouil Zaranis et al.

COLM 2025
#124

Law of Vision Representation in MLLMs

Shijia Yang, Bohan Zhai, Quanzeng You et al.

COLM 2025
#125

Exploring Sparse Adapters for Scalable Merging of Parameter Efficient Experts

Samin Yeasar Arnob, Zhan Su, Minseon Kim et al.

COLM 2025
#126

Towards Compute-Optimal Many-Shot In-Context Learning

Shahriar Golchin, Yanfei Chen, Rujun Han et al.

COLM 2025
#127

The World According to LLMs: How Geographic Origin Influences LLMs' Entity Deduction Capabilities

Harsh Nishant Lalai, Raj Sanjay Shah, Jiaxin Pei et al.

COLM 2025
#128

LeakAgent: RL-based Red-teaming Agent for LLM Privacy Leakage

Yuzhou Nie, Zhun Wang, Ye Yu et al.

COLM 2025
#129

ReasonIR: Training Retrievers for Reasoning Tasks

Rulin Shao, Rui Qiao, Varsha Kishore et al.

COLM 2025
#130

Déjà Vu: Multilingual LLM Evaluation through the Lens of Machine Translation Evaluation

Julia Kreutzer, Eleftheria Briakou, Sweta Agrawal et al.

COLM 2025
#131

Self-Rewarding PPO: Aligning Large Language Models with Demonstrations Only

Qingru Zhang, Liang Qiu, Ilgee Hong et al.

COLM 2025
#132

BigCharts-R1: Enhanced Chart Reasoning with Visual Reinforcement Finetuning

Ahmed Masry, Abhay Puri, Masoud Hashemi et al.

COLM 2025
#133

Self-Steering Language Models

Gabriel Grand, Joshua B. Tenenbaum, Vikash Mansinghka et al.

COLM 2025
#134

Training Large Language Models to Reason in a Continuous Latent Space

Shibo Hao, Sainbayar Sukhbaatar, DiJia Su et al.

COLM 2025
#135

The Delta Learning Hypothesis: Preference Tuning on Weak Data can Yield Strong Gains

Scott Geng, Hamish Ivison, Chun-Liang Li et al.

COLM 2025
#136

ParaPO: Aligning Language Models to Reduce Verbatim Reproduction of Pre-training Data

Tong Chen, Faeze Brahman, Jiacheng Liu et al.

COLM 2025
#137

Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate

Yubo Wang, Xiang Yue, Wenhu Chen

COLM 2025
#138

Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving

Yong Lin, Shange Tang, Bohan Lyu et al.

COLM 2025
#139

Fast Controlled Generation from Language Models with Adaptive Weighted Rejection Sampling

Ben Lipkin, Benjamin LeBrun, Jacob Hoover Vigly et al.

COLM 2025
#140

On the Effectiveness and Generalization of Race Representations for Debiasing High-Stakes Decisions

Dang Nguyen, Chenhao Tan

COLM 2025
#141

Quantifying Fairness in LLMs Beyond Tokens: A Semantic and Statistical Perspective

Weijie Xu, Yiwen Wang, Chi Xue et al.

COLM 2025
#142

Improving Fisher Information Estimation and Efficiency for LoRA-based LLM Unlearning

Yejin Kim, Eunwon Kim, Buru Chang et al.

COLM 2025
#143

Multi-Token Attention

Olga Golovneva, Tianlu Wang, Jason E Weston et al.

COLM 2025
#144

From Queries to Criteria: Understanding How Astronomers Evaluate LLMs

Alina Hyk, Kiera McCormick, Mian Zhong et al.

COLM 2025
#145

Analyzing Multilingualism in Large Language Models with Sparse Autoencoders

Ikhyun Cho, Julia Hockenmaier

COLM 2025
#146

CrossWordBench: Evaluating the Reasoning Capabilities of LLMs and LVLMs with Controllable Puzzle Generation

Jixuan Leng, Chengsong Huang, Langlin Huang et al.

COLM 2025
#147

Unifying Autoregressive and Diffusion-Based Sequence Generation

Nima Fathi, Torsten Scholak, Pierre-Andre Noel

COLM 2025
#148

Control the Temperature: Selective Sampling for Diverse and High-Quality LLM Outputs

Sergey Troshin, Wafaa Mohammed, Yan Meng et al.

COLM 2025
#149

UTF-8 Plumbing: Byte-level Tokenizers Unavoidably Enable LLMs to Generate Ill-formed UTF-8

Preston Firestone, Shubham Ugare, Gagandeep Singh et al.

COLM 2025
#150

RADLADS: Rapid Attention Distillation to Linear Attention Decoders at Scale

Daniel Goldstein, Eric Alcaide, Janna Lu et al.

COLM 2025
#151

Learning Adaptive Parallel Reasoning with Language Models

Jiayi Pan, Xiuyu Li, Long Lian et al.

COLM 2025
#152

EllieSQL: Cost-Efficient Text-to-SQL with Complexity-Aware Routing

Yizhang Zhu, Runzhi Jiang, Boyan Li et al.

COLM 2025
#153

Why do LLMs attend to the first token?

Federico Barbero, Alvaro Arroyo, Xiangming Gu et al.

COLM 2025
#154

Overfill: Two-Stage Models for Efficient Language Model Decoding

Woojeong Kim, Junxiong Wang, Jing Nathan Yan et al.

COLM 2025
#155

CLIPPER: Compression enables long-context synthetic data generation

Chau Minh Pham, Yapei Chang, Mohit Iyyer

COLM 2025
#156

Society of Mind Meets Real-Time Strategy: A Hierarchical Multi-Agent Framework for Strategic Reasoning

Daechul Ahn, San Kim, Jonghyun Choi

COLM 2025
#157

A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility

Andreas Hochlehnert, Hardik Bhatnagar, Vishaal Udandarao et al.

COLM 2025
#158

SPIN-Bench: How Well Do LLMs Plan Strategically and Reason Socially?

Jianzhu Yao, Kevin Wang, Ryan Hsieh et al.

COLM 2025
#159

Resource-efficient Inference with Foundation Model Programs

Lunyiu Nie, Zhimin Ding, Kevin Yu et al.

COLM 2025
#160

Teach Old SAEs New Domain Tricks with Boosting

Nikita Koriagin, Yaroslav Aksenov, Daniil Laptev et al.

COLM 2025
#161

Improving LLMs' Generalized Reasoning Abilities by Graph Problems

Qifan Zhang, Nuo Chen, Zehua Li et al.

COLM 2025
#162

Zero-shot Benchmarking: A Framework for Flexible and Scalable Automatic Evaluation of Language Models

José Pombal, Nuno M Guerreiro, Ricardo Rei et al.

COLM 2025
#163

Register Always Matters: Analysis of LLM Pretraining Data Through the Lens of Language Variation

Amanda Myntti, Erik Henriksson, Veronika Laippala et al.

COLM 2025
#164

Scoring Verifiers: Evaluating Synthetic Verification for Code and Reasoning

Aleksander Ficek, Somshubra Majumdar, Vahid Noroozi et al.

COLM 2025
#165

Cascade Reward Sampling for Efficient Decoding-Time Alignment

Bolian Li, Yifan Wang, Anamika Lochab et al.

COLM 2025
#166

Reverse-engineering NLI: A study of the meta-inferential properties of Natural Language Inference

Rasmus Blanck, Bill Noble, Stergios Chatzikyriakidis

COLM 2025
#167

Have Large Language Models Learned to Reason? A Characterization via 3-SAT

Rishi Hazra, Gabriele Venturato, Pedro Zuidberg Dos Martires et al.

COLM 2025
#168

HIPPO-VIDEO: Simulating Watch Histories with Large Language Models for History-Driven Video Highlighting

Jeongeun Lee, Youngjae Yu, Dongha Lee

COLM 2025
#169

Adversarial Training of Reward Models

Alexander Bukharin, Haifeng Qian, Shengyang Sun et al.

COLM 2025
#170

The Unlearning Mirage: A Dynamic Framework for Evaluating LLM Unlearning

Raj Sanjay Shah, Jing Huang, Keerthiram Murugesan et al.

COLM 2025
#171

Planted in Pretraining, Swayed by Finetuning: A Case Study on the Origins of Cognitive Biases in LLMs

Itay Itzhak, Yonatan Belinkov, Gabriel Stanovsky

COLM 2025
#172

ALOPE: Adaptive Layer Optimization for Translation Quality Estimation using Large Language Models

Archchana Sindhujan, Shenbin Qian, Chan Chi Chun Matthew et al.

COLM 2025
#173

The Blessing and Curse of Dimensionality in Safety Alignment

Rachel S.Y. Teo, Laziz Abdullaev, Tan Minh Nguyen

COLM 2025
#174

AdaptiVocab: Enhancing LLM Efficiency in Focused Domains through Lightweight Vocabulary Adaptation

Itay Nakash, Nitay Calderon, Eyal Ben-David et al.

COLM 2025
#175

Enhancing LLM Reliability via Explicit Knowledge Boundary Modeling

Hang Zheng, Hongshen Xu, Yuncong Liu et al.

COLM 2025
#176

Effective Length Extrapolation via Dimension-Wise Positional Embeddings Manipulation

Yi Lu, Wanxu Zhao, Xin Zhou et al.

COLM 2025
#177

A Controlled Study on Long Context Extension and Generalization in LLMs

Yi Lu, Jing Nathan Yan, Songlin Yang et al.

COLM 2025
#178

G1yphD3c0de: Towards Safer Language Models on Visually Perturbed Texts

Yejin Choi, Yejin Yeo, Yejin Son et al.

COLM 2025
#179

Efficient Process Reward Model Training via Active Learning

Keyu Duan, Zichen Liu, Xin Mao et al.

COLM 2025
#180

Detecting and Pruning Prominent but Detrimental Neurons in Large Language Models

Ameen Ali Ali, Shahar Katz, Lior Wolf et al.

COLM 2025
#181

FormaRL: Enhancing Autoformalization with no Labeled Data

Yanxing Huang, Xinling Jin, Sijie Liang et al.

COLM 2025
#182

ICQuant: Index Coding enables Low-bit LLM Quantization

Xinlin Li, Osama Hanna, Christina Fragouli et al.

COLM 2025
#183

MAC: A Live Benchmark for Multimodal Large Language Models in Scientific Understanding

Mohan Jiang, Jin Gao, Jiahao Zhan et al.

COLM 2025
#184

Interpreting the linear structure of vision-language model embedding spaces

Isabel Papadimitriou, Huangyuan Su, Thomas Fel et al.

COLM 2025
#185

LLM Can be a Dangerous Persuader: Empirical Study of Persuasion Safety in Large Language Models

Minqian Liu, Zhiyang Xu, Xinyi Zhang et al.

COLM 2025
#186

Establishing Task Scaling Laws via Compute-Efficient Model Ladders

Akshita Bhagia, Jiacheng Liu, Alexander Wettig et al.

COLM 2025
#187

Can A Society of Generative Agents Simulate Human Behavior and Inform Public Health Policy? A Case Study on Vaccine Hesitancy

Abe Bohan Hou, Hongru Du, Yichen Wang et al.

COLM 2025
#188

LongPerceptualThoughts: Distilling System-2 Reasoning for System-1 Perception

Yuan-Hong Liao, Sven Elflein, Liu He et al.

COLM 2025
#189

RARe: Retrieval Augmented Retrieval with In-Context Examples

Atula Tejaswi, Yoonsang Lee, Sujay Sanghavi et al.

COLM 2025
#190

Reasoning-SQL: Reinforcement Learning with SQL Tailored Partial Rewards for Reasoning-Enhanced Text-to-SQL

Mohammadreza Pourreza, Shayan Talaei, Ruoxi Sun et al.

COLM 2025
#191

PersonaEval: Are LLM Evaluators Human Enough to Judge Role-Play?

Lingfeng Zhou, Jialing Zhang, Jin Gao et al.

COLM 2025
#192

MapIQ: Evaluating Multimodal Large Language Models for Map Question Answering

Varun Srivastava, Fan Lei, Srija Mukhopadhyay et al.

COLM 2025
#193

Bayesian scaling laws for in-context learning

Aryaman Arora, Dan Jurafsky, Christopher Potts et al.

COLM 2025
#194

DoomArena: A framework for Testing AI Agents Against Evolving Security Threats

Léo Boisvert, Abhay Puri, Gabriel Huang et al.

COLM 2025
#195

Texture or Semantics? Vision-Language Models Get Lost in Font Recognition

Zhecheng Li, Guoxian Song, Yujun Cai et al.

COLM 2025
#196

Speculative Thinking: Enhancing Small-Model Reasoning with Large Model Guidance at Inference Time

Van Yang, Xiang Yue, Vipin Chaudhary et al.

COLM 2025
#197

Transformers are Efficient Compilers, Provably

Xiyu Zhai, Runlong Zhou, Liao Zhang et al.

COLM 2025
#198

Scaling Web Agent Training through Automatic Data Generation and Fine-grained Evaluation

Lajanugen Logeswaran, Jaekyeom Kim, Sungryull Sohn et al.

COLM 2025
#199

Pretrained Hybrids with MAD Skills

Nicholas Roberts, Samuel Guo, Zhiqi Gao et al.

COLM 2025
#200

Readability ≠ Learnability: Rethinking the Role of Simplicity in Training Small Language Models

Ivan Lee, Taylor Berg-Kirkpatrick

COLM 2025