Most Cited COLM Poster "steiner tree problem" Papers

418 papers found • Page 1 of 3

#1

Pretraining on the Test Set Is No Longer All You Need: A Debate-Driven Approach to QA Benchmarks

Linbo Cao, Jinman Zhao

COLM 2025 • arXiv:2507.17747
#2

Corrupted by Reasoning: Reasoning Language Models Become Free-Riders in Public Goods Games

David Guzman Piedrahita, Yongjin Yang, Mrinmaya Sachan et al.

COLM 2025
#3

QAPyramid: Fine-grained Evaluation of Content Selection for Text Summarization

Shiyue Zhang, David Wan, Arie Cattan et al.

COLM 2025
#4

Sherkala-Chat: Building a State-of-the-Art LLM for Kazakh in a Moderately Resourced Setting

Fajri Koto, Rituraj Joshi, Nurdaulet Mukhituly et al.

COLM 2025
#5

Layerwise Importance Analysis of Feed-Forward Networks in Transformer-based Language Models

Wataru Ikeda, Kazuki Yano, Ryosuke Takahashi et al.

COLM 2025 • arXiv:2508.17734
#6

Teaching Models to Understand (but not Generate) High-risk Data

Ryan Yixiang Wang, Matthew Finlayson, Luca Soldaini et al.

COLM 2025 • arXiv:2505.03052
#7

PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling

Zefan Cai, Yichi Zhang, Bofei Gao et al.

COLM 2025 • arXiv:2406.02069
#8

Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining

Rosie Zhao, Alexandru Meterez, Sham M. Kakade et al.

COLM 2025 • arXiv:2504.07912
#9

Sample Efficient Preference Alignment in LLMs via Active Exploration

Viraj Mehta, Syrine Belakaria, Vikramjeet Das et al.

COLM 2025 • arXiv:2312.00267
#10

Probing Syntax in Large Language Models: Successes and Remaining Challenges

Pablo J. Diego Simon, Emmanuel Chemla, Jean-Remi King et al.

COLM 2025 • arXiv:2508.03211
#11

HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Interactive AI Agents

Xuhui Zhou, Hyunwoo Kim, Faeze Brahman et al.

COLM 2025
#12

LIMO: Less is More for Reasoning

Yixin Ye, Zhen Huang, Yang Xiao et al.

COLM 2025 • arXiv:2502.03387
#13

Probing then Editing Response Personality of Large Language Models

Tianjie Ju, Zhenyu Shao, Bowen Wang et al.

COLM 2025 • arXiv:2504.10227
#14

A Survey on Personalized and Pluralistic Preference Alignment in Large Language Models

Zhouhang Xie, Junda Wu, Yiran Shen et al.

COLM 2025 • arXiv:2504.07070
#15

Always Tell Me The Odds: Fine-grained Conditional Probability Estimation

Liaoyaqi Wang, Zhengping Jiang, Anqi Liu et al.

COLM 2025 • arXiv:2505.01595
#16

One-shot Optimized Steering Vectors Mediate Safety-relevant Behaviors in LLMs

Jacob Dunefsky, Arman Cohan

COLM 2025 • arXiv:2502.18862
#17

LM Agents May Fail to Act on Their Own Risk Knowledge

Yuzhi Tang, Tianxiao Li, Elizabeth Li et al.

COLM 2025 • arXiv:2508.13465
#18

Steering the CensorShip: Uncovering Representation Vectors for LLM "Thought" Control

Hannah Cyberey, David Evans

COLM 2025
#19

Exploring Large Language Model Agents for Piloting Social Experiments

Jinghua Piao, Yuwei Yan, Nian Li et al.

COLM 2025 • arXiv:2508.08678
#20

Weight ensembling improves reasoning in language models

Xingyu Dang, Christina Baek, Kaiyue Wen et al.

COLM 2025 • arXiv:2504.10478
#21

Do Large Language Models Have a Planning Theory of Mind? Evidence from MindGames: a Multi-Step Persuasion Task

Jared Moore, Ned Cooper, Rasmus Overmark et al.

COLM 2025 • arXiv:2507.16196
#22

LoRe: Personalizing LLMs via Low-Rank Reward Modeling

Avinandan Bose, Zhihan Xiong, Yuejie Chi et al.

COLM 2025 • arXiv:2504.14439
#23

Beyond Blanket Masking: Examining Granularity for Privacy Protection in Images Captured by Blind and Low Vision Users

Jeffri Murrugarra-Llerena, Haoran Niu, K. Suzanne Barber et al.

COLM 2025 • arXiv:2508.09245
#24

When Splitting Makes Stronger: A Theoretical and Empirical Analysis of Divide-and-Conquer Prompting in LLMs

Yizhou Zhang, Defu Cao, Lun Du et al.

COLM 2025
#25

Humans overrely on overconfident language models, across languages

Neil Rathi, Dan Jurafsky, Kaitlyn Zhou

COLM 2025 • arXiv:2507.06306
#26

Values in the Wild: Discovering and Mapping Values in Real-World Language Model Interactions

Saffron Huang, Esin Durmus, Kunal Handa et al.

COLM 2025
#27

The Zero Body Problem: Probing LLM Use of Sensory Language

Rebecca M. M. Hicke, Sil Hamilton, David Mimno

COLM 2025 • arXiv:2504.06393
#28

Investigating Intersectional Bias in Large Language Models using Confidence Disparities in Coreference Resolution

Falaah Arif Khan, Nivedha Sivakumar, Yinong Oliver Wang et al.

COLM 2025 • arXiv:2508.07111
#29

Positional Biases Shift as Inputs Approach Context Window Limits

Blerta Veseli, Julian Chibane, Mariya Toneva et al.

COLM 2025
#30

ADAPT: Actively Discovering and Adapting to Preferences for any Task

Maithili Patel, Xavier Puig, Ruta Desai et al.

COLM 2025
#31

SciReplicate-Bench: Benchmarking LLMs in Agent-driven Algorithmic Reproduction from Research Papers

Yanzheng Xiang, Hanqi Yan, Shuyin Ouyang et al.

COLM 2025
#32

True Multimodal In-Context Learning Needs Attention to the Visual Context

Shuo Chen, Jianzhe Liu, Zhen Han et al.

COLM 2025
#33

Post-training for Efficient Communication via Convention Formation

Yilun Hua, Evan Wang, Yoav Artzi

COLM 2025
#34

Streaming DiLoCo with overlapping communication

Arthur Douillard, Yani Donchev, J Keith Rush et al.

COLM 2025
#35

MuSeD: A Multimodal Spanish Dataset for Sexism Detection in Social Media Videos

Laura De Grazia, Pol Pastells, Mauro Vázquez Chas et al.

COLM 2025
#36

EvidenceBench: A Benchmark for Extracting Evidence from Biomedical Papers

Jianyou Wang, Weili Cao, Kaicheng Wang et al.

COLM 2025
#37

Local Mixtures of Experts: Essentially Free Test-Time Training via Model Merging

Ryo Bertolissi, Jonas Hübotter, Ido Hakimi et al.

COLM 2025
#38

An Illusion of Progress? Assessing the Current State of Web Agents

Tianci Xue, Weijian Qi, Tianneng Shi et al.

COLM 2025
#39

Boundless Byte Pair Encoding: Breaking the Pre-tokenization Barrier

Craig W Schmidt, Varshini Reddy, Chris Tanner et al.

COLM 2025
#40

Multi-Agent Systems Execute Arbitrary Malicious Code

Harold Triedman, Rishi Dev Jha, Vitaly Shmatikov

COLM 2025
#41

Privately Learning from Graphs with Applications in Fine-tuning Large Language Models

Haoteng Yin, Rongzhe Wei, Eli Chien et al.

COLM 2025
#42

Evaluating LLMs on Chinese Idiom Translation

Cai Yang, Yao Dou, David Heineman et al.

COLM 2025
#43

Rethinking Safety in LLM Fine-tuning: An Optimization Perspective

Minseon Kim, Jin Myung Kwak, Lama Alssum et al.

COLM 2025
#44

CodeXEmbed: A Generalist Embedding Model Family for Multilingual and Multi-task Code Retrieval

Ye Liu, Rui Meng, Shafiq Joty et al.

COLM 2025
#45

Do LLMs Understand Your Translations? Evaluating Paragraph-level MT with Question Answering

Patrick Fernandes, Sweta Agrawal, Emmanouil Zaranis et al.

COLM 2025
#46

Law of Vision Representation in MLLMs

Shijia Yang, Bohan Zhai, Quanzeng You et al.

COLM 2025
#47

Exploring Sparse Adapters for Scalable Merging of Parameter Efficient Experts

Samin Yeasar Arnob, Zhan Su, Minseon Kim et al.

COLM 2025
#48

Towards Compute-Optimal Many-Shot In-Context Learning

Shahriar Golchin, Yanfei Chen, Rujun Han et al.

COLM 2025
#49

The World According to LLMs: How Geographic Origin Influences LLMs' Entity Deduction Capabilities

Harsh Nishant Lalai, Raj Sanjay Shah, Jiaxin Pei et al.

COLM 2025
#50

LeakAgent: RL-based Red-teaming Agent for LLM Privacy Leakage

Yuzhou Nie, Zhun Wang, Ye Yu et al.

COLM 2025
#51

ReasonIR: Training Retrievers for Reasoning Tasks

Rulin Shao, Rui Qiao, Varsha Kishore et al.

COLM 2025
#52

Déjà Vu: Multilingual LLM Evaluation through the Lens of Machine Translation Evaluation

Julia Kreutzer, Eleftheria Briakou, Sweta Agrawal et al.

COLM 2025
#53

Self-Rewarding PPO: Aligning Large Language Models with Demonstrations Only

Qingru Zhang, Liang Qiu, Ilgee Hong et al.

COLM 2025
#54

BigCharts-R1: Enhanced Chart Reasoning with Visual Reinforcement Finetuning

Ahmed Masry, Abhay Puri, Masoud Hashemi et al.

COLM 2025
#55

Self-Steering Language Models

Gabriel Grand, Joshua B. Tenenbaum, Vikash Mansinghka et al.

COLM 2025
#56

Training Large Language Models to Reason in a Continuous Latent Space

Shibo Hao, Sainbayar Sukhbaatar, DiJia Su et al.

COLM 2025
#57

The Delta Learning Hypothesis: Preference Tuning on Weak Data can Yield Strong Gains

Scott Geng, Hamish Ivison, Chun-Liang Li et al.

COLM 2025
#58

ParaPO: Aligning Language Models to Reduce Verbatim Reproduction of Pre-training Data

Tong Chen, Faeze Brahman, Jiacheng Liu et al.

COLM 2025
#59

Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate

Yubo Wang, Xiang Yue, Wenhu Chen

COLM 2025
#60

Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving

Yong Lin, Shange Tang, Bohan Lyu et al.

COLM 2025
#61

Fast Controlled Generation from Language Models with Adaptive Weighted Rejection Sampling

Ben Lipkin, Benjamin LeBrun, Jacob Hoover Vigly et al.

COLM 2025
#62

On the Effectiveness and Generalization of Race Representations for Debiasing High-Stakes Decisions

Dang Nguyen, Chenhao Tan

COLM 2025
#63

Quantifying Fairness in LLMs Beyond Tokens: A Semantic and Statistical Perspective

Weijie Xu, Yiwen Wang, Chi Xue et al.

COLM 2025
#64

Improving Fisher Information Estimation and Efficiency for LoRA-based LLM Unlearning

Yejin Kim, Eunwon Kim, Buru Chang et al.

COLM 2025
#65

Multi-Token Attention

Olga Golovneva, Tianlu Wang, Jason E Weston et al.

COLM 2025
#66

From Queries to Criteria: Understanding How Astronomers Evaluate LLMs

Alina Hyk, Kiera McCormick, Mian Zhong et al.

COLM 2025
#67

Analyzing Multilingualism in Large Language Models with Sparse Autoencoders

Ikhyun Cho, Julia Hockenmaier

COLM 2025
#68

CrossWordBench: Evaluating the Reasoning Capabilities of LLMs and LVLMs with Controllable Puzzle Generation

Jixuan Leng, Chengsong Huang, Langlin Huang et al.

COLM 2025
#69

Unifying Autoregressive and Diffusion-Based Sequence Generation

Nima Fathi, Torsten Scholak, Pierre-Andre Noel

COLM 2025
#70

Control the Temperature: Selective Sampling for Diverse and High-Quality LLM Outputs

Sergey Troshin, Wafaa Mohammed, Yan Meng et al.

COLM 2025
#71

UTF-8 Plumbing: Byte-level Tokenizers Unavoidably Enable LLMs to Generate Ill-formed UTF-8

Preston Firestone, Shubham Ugare, Gagandeep Singh et al.

COLM 2025
#72

RADLADS: Rapid Attention Distillation to Linear Attention Decoders at Scale

Daniel Goldstein, Eric Alcaide, Janna Lu et al.

COLM 2025
#73

Learning Adaptive Parallel Reasoning with Language Models

Jiayi Pan, Xiuyu Li, Long Lian et al.

COLM 2025
#74

EllieSQL: Cost-Efficient Text-to-SQL with Complexity-Aware Routing

Yizhang Zhu, Runzhi Jiang, Boyan Li et al.

COLM 2025
#75

Why do LLMs attend to the first token?

Federico Barbero, Alvaro Arroyo, Xiangming Gu et al.

COLM 2025
#76

Overfill: Two-Stage Models for Efficient Language Model Decoding

Woojeong Kim, Junxiong Wang, Jing Nathan Yan et al.

COLM 2025
#77

CLIPPER: Compression enables long-context synthetic data generation

Chau Minh Pham, Yapei Chang, Mohit Iyyer

COLM 2025
#78

Society of Mind Meets Real-Time Strategy: A Hierarchical Multi-Agent Framework for Strategic Reasoning

Daechul Ahn, San Kim, Jonghyun Choi

COLM 2025
#79

A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility

Andreas Hochlehnert, Hardik Bhatnagar, Vishaal Udandarao et al.

COLM 2025
#80

SPIN-Bench: How Well Do LLMs Plan Strategically and Reason Socially?

Jianzhu Yao, Kevin Wang, Ryan Hsieh et al.

COLM 2025
#81

Resource-efficient Inference with Foundation Model Programs

Lunyiu Nie, Zhimin Ding, Kevin Yu et al.

COLM 2025
#82

Teach Old SAEs New Domain Tricks with Boosting

Nikita Koriagin, Yaroslav Aksenov, Daniil Laptev et al.

COLM 2025
#83

Improving LLMs' Generalized Reasoning Abilities by Graph Problems

Qifan Zhang, Nuo Chen, Zehua Li et al.

COLM 2025
#84

Zero-shot Benchmarking: A Framework for Flexible and Scalable Automatic Evaluation of Language Models

José Pombal, Nuno M Guerreiro, Ricardo Rei et al.

COLM 2025
#85

Register Always Matters: Analysis of LLM Pretraining Data Through the Lens of Language Variation

Amanda Myntti, Erik Henriksson, Veronika Laippala et al.

COLM 2025
#86

Scoring Verifiers: Evaluating Synthetic Verification for Code and Reasoning

Aleksander Ficek, Somshubra Majumdar, Vahid Noroozi et al.

COLM 2025
#87

Cascade Reward Sampling for Efficient Decoding-Time Alignment

Bolian Li, Yifan Wang, Anamika Lochab et al.

COLM 2025
#88

Reverse-engineering NLI: A study of the meta-inferential properties of Natural Language Inference

Rasmus Blanck, Bill Noble, Stergios Chatzikyriakidis

COLM 2025
#89

Have Large Language Models Learned to Reason? A Characterization via 3-SAT

Rishi Hazra, Gabriele Venturato, Pedro Zuidberg Dos Martires et al.

COLM 2025
#90

HIPPO-VIDEO: Simulating Watch Histories with Large Language Models for History-Driven Video Highlighting

Jeongeun Lee, Youngjae Yu, Dongha Lee

COLM 2025
#91

Adversarial Training of Reward Models

Alexander Bukharin, Haifeng Qian, Shengyang Sun et al.

COLM 2025
#92

The Unlearning Mirage: A Dynamic Framework for Evaluating LLM Unlearning

Raj Sanjay Shah, Jing Huang, Keerthiram Murugesan et al.

COLM 2025
#93

Planted in Pretraining, Swayed by Finetuning: A Case Study on the Origins of Cognitive Biases in LLMs

Itay Itzhak, Yonatan Belinkov, Gabriel Stanovsky

COLM 2025
#94

ALOPE: Adaptive Layer Optimization for Translation Quality Estimation using Large Language Models

Archchana Sindhujan, Shenbin Qian, Chan Chi Chun Matthew et al.

COLM 2025
#95

The Blessing and Curse of Dimensionality in Safety Alignment

Rachel S.Y. Teo, Laziz Abdullaev, Tan Minh Nguyen

COLM 2025
#96

AdaptiVocab: Enhancing LLM Efficiency in Focused Domains through Lightweight Vocabulary Adaptation

Itay Nakash, Nitay Calderon, Eyal Ben-David et al.

COLM 2025
#97

Enhancing LLM Reliability via Explicit Knowledge Boundary Modeling

Hang Zheng, Hongshen Xu, Yuncong Liu et al.

COLM 2025
#98

Effective Length Extrapolation via Dimension-Wise Positional Embeddings Manipulation

Yi Lu, Wanxu Zhao, Xin Zhou et al.

COLM 2025
#99

A Controlled Study on Long Context Extension and Generalization in LLMs

Yi Lu, Jing Nathan Yan, Songlin Yang et al.

COLM 2025
#100

G1yphD3c0de: Towards Safer Language Models on Visually Perturbed Texts

Yejin Choi, Yejin Yeo, Yejin Son et al.

COLM 2025
#101

Efficient Process Reward Model Training via Active Learning

Keyu Duan, Zichen Liu, Xin Mao et al.

COLM 2025
#102

Detecting and Pruning Prominent but Detrimental Neurons in Large Language Models

Ameen Ali Ali, Shahar Katz, Lior Wolf et al.

COLM 2025
#103

FormaRL: Enhancing Autoformalization with no Labeled Data

Yanxing Huang, Xinling Jin, Sijie Liang et al.

COLM 2025
#104

ICQuant: Index Coding enables Low-bit LLM Quantization

Xinlin Li, Osama Hanna, Christina Fragouli et al.

COLM 2025
#105

MAC: A Live Benchmark for Multimodal Large Language Models in Scientific Understanding

Mohan Jiang, Jin Gao, Jiahao Zhan et al.

COLM 2025
#106

Interpreting the linear structure of vision-language model embedding spaces

Isabel Papadimitriou, Huangyuan Su, Thomas Fel et al.

COLM 2025
#107

LLM Can be a Dangerous Persuader: Empirical Study of Persuasion Safety in Large Language Models

Minqian Liu, Zhiyang Xu, Xinyi Zhang et al.

COLM 2025
#108

Establishing Task Scaling Laws via Compute-Efficient Model Ladders

Akshita Bhagia, Jiacheng Liu, Alexander Wettig et al.

COLM 2025
#109

Can A Society of Generative Agents Simulate Human Behavior and Inform Public Health Policy? A Case Study on Vaccine Hesitancy

Abe Bohan Hou, Hongru Du, Yichen Wang et al.

COLM 2025
#110

LongPerceptualThoughts: Distilling System-2 Reasoning for System-1 Perception

Yuan-Hong Liao, Sven Elflein, Liu He et al.

COLM 2025
#111

RARe: Retrieval Augmented Retrieval with In-Context Examples

Atula Tejaswi, Yoonsang Lee, Sujay Sanghavi et al.

COLM 2025
#112

Reasoning-SQL: Reinforcement Learning with SQL Tailored Partial Rewards for Reasoning-Enhanced Text-to-SQL

Mohammadreza Pourreza, Shayan Talaei, Ruoxi Sun et al.

COLM 2025
#113

PersonaEval: Are LLM Evaluators Human Enough to Judge Role-Play?

Lingfeng Zhou, Jialing Zhang, Jin Gao et al.

COLM 2025
#114

MapIQ: Evaluating Multimodal Large Language Models for Map Question Answering

Varun Srivastava, Fan Lei, Srija Mukhopadhyay et al.

COLM 2025
#115

Bayesian scaling laws for in-context learning

Aryaman Arora, Dan Jurafsky, Christopher Potts et al.

COLM 2025
#116

DoomArena: A framework for Testing AI Agents Against Evolving Security Threats

Léo Boisvert, Abhay Puri, Gabriel Huang et al.

COLM 2025
#117

Texture or Semantics? Vision-Language Models Get Lost in Font Recognition

Zhecheng Li, Guoxian Song, Yujun Cai et al.

COLM 2025
#118

Speculative Thinking: Enhancing Small-Model Reasoning with Large Model Guidance at Inference Time

Van Yang, Xiang Yue, Vipin Chaudhary et al.

COLM 2025
#119

Transformers are Efficient Compilers, Provably

Xiyu Zhai, Runlong Zhou, Liao Zhang et al.

COLM 2025
#120

Scaling Web Agent Training through Automatic Data Generation and Fine-grained Evaluation

Lajanugen Logeswaran, Jaekyeom Kim, Sungryull Sohn et al.

COLM 2025
#121

Pretrained Hybrids with MAD Skills

Nicholas Roberts, Samuel Guo, Zhiqi Gao et al.

COLM 2025
#122

Benchmarking Retrieval-Augmented Generation for Chemistry

Xianrui Zhong, Bowen Jin, Siru Ouyang et al.

COLM 2025
#123

AutoScale: Scale-Aware Data Mixing for Pre-Training LLMs

Feiyang Kang, Yifan Sun, Bingbing Wen et al.

COLM 2025
#124

Meta-Learning for Speeding Up Large Model Inference in Decentralized Environments

Yipeng Du, Zihao Wang, Ahmad Farhan et al.

COLM 2025
#125

Multilingual and Multi-Accent Jailbreaking of Audio LLMs

Jaechul Roh, Virat Shejwalkar, Amir Houmansadr

COLM 2025
#126

X-EcoMLA: Upcycling Pre-Trained Attention into MLA for Efficient and Extreme KV Compression

Guihong Li, Mehdi Rezagholizadeh, Mingyu Yang et al.

COLM 2025
#127

OpenCodeReasoning: Advancing Data Distillation for Competitive Coding

Wasi Uddin Ahmad, Sean Narenthiran, Somshubra Majumdar et al.

COLM 2025
#128

UNVEILING: What Makes Linguistics Olympiad Puzzles Tricky for LLMs?

Mukund Choudhary, KV Aditya Srivatsa, Gaurja Aeron et al.

COLM 2025
#129

Inducing Programmatic Skills for Agentic Tasks

Zora Zhiruo Wang, Apurva Gandhi, Graham Neubig et al.

COLM 2025
#130

Learning to Reason for Long-Form Story Generation

Alexander Gurung, Mirella Lapata

COLM 2025
#131

Can LLM "Self-report"?: Evaluating the Validity of Self-report Scales in Measuring Personality Design in LLM-based Chatbots

Huiqi Zou, Pengda Wang, Zihan Yan et al.

COLM 2025
#132

Visual Representations inside the Language Model

Benlin Liu, Amita Kamath, Madeleine Grunde-McLaughlin et al.

COLM 2025
#133

SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths

Kaixuan Huang, Xudong Guo, Mengdi Wang

COLM 2025
#134

RankAlign: A Ranking View of the Generator-Validator Gap in Large Language Models

Juan Diego Rodriguez, Wenxuan Ding, Katrin Erk et al.

COLM 2025
#135

SAEs Can Improve Unlearning: Dynamic Sparse Autoencoder Guardrails for Precision Unlearning in LLMs

Aashiq Muhamed, Jacopo Bonato, Mona T. Diab et al.

COLM 2025
#136

Energy-Based Reward Models for Robust Language Model Alignment

Anamika Lochab, Ruqi Zhang

COLM 2025
#137

AI-Slop to AI-Polish? Aligning Language Models through Edit-Based Writing Rewards and Test-time computation

Tuhin Chakrabarty, Philippe Laban, Chien-Sheng Wu

COLM 2025
#138

Mitigating Modal Imbalance in Multimodal Reasoning

Chen Henry Wu, Neil Kale, Aditi Raghunathan

COLM 2025
#139

NoveltyBench: Evaluating Language Models for Humanlike Diversity

Yiming Zhang, Harshita Diddee, Susan Holm et al.

COLM 2025
#140

(Im)possibility of Automated Hallucination Detection in Large Language Models

Amin Karbasi, Omar Montasser, John Sous et al.

COLM 2025
#141

RRO: LLM Agent Optimization Through Rising Reward Trajectories

Zilong Wang, Jingfeng Yang, Sreyashi Nag et al.

COLM 2025
#142

Single-Pass Document Scanning for Question Answering

Weili Cao, Jianyou Wang, Youze Zheng et al.

COLM 2025
#143

Knowledge Graph Retrieval-Augmented Generation via GNN-Guided Prompting

Haochen Liu, Song Wang, Jundong Li

COLM 2025
#144

Don’t lie to your friends: Learning what you know from collaborative self-play

Jacob Eisenstein, Reza Aghajani, Adam Fisch et al.

COLM 2025
#145

Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

Bowen Jin, Hansi Zeng, Zhenrui Yue et al.

COLM 2025
#146

AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories

Xing Han Lù, Amirhossein Kazemnejad, Nicholas Meade et al.

COLM 2025
#147

EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees

Zhiyuan Zeng, Yizhong Wang, Hannaneh Hajishirzi et al.

COLM 2025
#148

ThoughtTerminator: Benchmarking, Calibrating, and Mitigating Overthinking in Reasoning Models

Xiao Pu, Michael Saxon, Wenyue Hua et al.

COLM 2025
#149

Scaling Analysis of Interleaved Speech-Text Language Models

Gallil Maimon, Michael Hassid, Amit Roth et al.

COLM 2025
#150

Putting the Value Back in RL: Better Test-Time Scaling by Unifying LLM Reasoners With Verifiers

Kusha Sareen, Morgane M Moss, Alessandro Sordoni et al.

COLM 2025
#151

Refusal Tokens: A Simple Way to Calibrate Refusals in Large Language Models

Neel Jain, Aditya Shrivastava, Chenyang Zhu et al.

COLM 2025
#152

Language Model Personalization via Reward Factorization

Idan Shenfeld, Felix Faltings, Pulkit Agrawal et al.

COLM 2025
#153

Resona: Improving Context Copying in Linear Recurrence Models with Retrieval

Xinyu Wang, Linrui Ma, Jerry Huang et al.

COLM 2025
#154

Model-Agnostic Policy Explanations with Large Language Models

Zhang Xi-Jia, Yue Guo, Shufei Chen et al.

COLM 2025
#155

How Multimodal LLMs Solve Image Tasks: A Lens on Visual Grounding, Task Reasoning, and Answer Decoding

Zhuoran Yu, Yong Jae Lee

COLM 2025
#156

Tulu 3: Pushing Frontiers in Open Language Model Post-Training

Nathan Lambert, Jacob Morrison, Valentina Pyatkin et al.

COLM 2025
#157

Customize Multi-modal RAI Guardrails with Precedent-based predictions

Cheng-Fu Yang, Thanh Tran, Christos Christodoulopoulos et al.

COLM 2025
#158

Can Large Language Models Integrate Spatial Data? Empirical Insights into Reasoning Strengths and Computational Weaknesses

Bin Han, Robert Wolfe, Anat Caspi et al.

COLM 2025
#159

What is the Visual Cognition Gap between Humans and Multimodal LLMs?

Xu Cao, Yifan Shen, Bolin Lai et al.

COLM 2025
#160

L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning

Pranjal Aggarwal, Sean Welleck

COLM 2025
#161

Elucidating the Design Space of Decay in Linear Attention

Zhen Qin, Xuyang Shen, Yiran Zhong

COLM 2025
#162

Noiser: Bounded Input Perturbations for Attributing Large Language Models

Mohammad Reza Ghasemi Madani, Aryo Pradipta Gema, Yu Zhao et al.

COLM 2025
#163

SmolLM2: When Smol Goes Big — Data-Centric Training of a Fully Open Small Language Model

Loubna Ben Allal, Anton Lozhkov, Elie Bakouch et al.

COLM 2025
#164

LongCodeBench: Evaluating Coding LLMs at 1M Context Windows

Stefano Rando, Luca Romani, Alessio Sampieri et al.

COLM 2025
#165

Agree to Disagree? A Meta-Evaluation of LLM Misgendering

Arjun Subramonian, Vagrant Gautam, Preethi Seshadri et al.

COLM 2025
#166

MALT: Improving Reasoning with Multi-Agent LLM Training

Sumeet Ramesh Motwani, Chandler Smith, Rocktim Jyoti Das et al.

COLM 2025
#167

More is Less: The Pitfalls of Multi-Model Synthetic Preference Data in DPO Safety Alignment

Yifan Wang, Runjin Chen, Bolian Li et al.

COLM 2025
#168

BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity

Chenyang Song, Weilin Zhao, Xu Han et al.

COLM 2025
#169

Adaptive Layer-skipping in Pre-trained LLMs

Xuan Luo, Weizhi Wang, Xifeng Yan

COLM 2025
#170

AIR: A Systematic Analysis of Annotations, Instructions, and Response Pairs in Preference Dataset

Bingxiang He, Wenbin Zhang, Jiaxi Song et al.

COLM 2025
#171

Insights from the Inverse: Reconstructing LLM Training Goals Through Inverse Reinforcement Learning

Jared Joselowitz, Ritam Majumdar, Arjun Jagota et al.

COLM 2025
#172

LLMs Are In-Context Bandit Reinforcement Learners

Giovanni Monea, Antoine Bosselut, Kianté Brantley et al.

COLM 2025
#173

Rethinking Multilingual Continual Pretraining: Data Mixing for Adapting LLMs Across Languages and Resources

Zihao Li, Shaoxiong Ji, Hengyu Luo et al.

COLM 2025
#174

Self-Evolving Critique Abilities in Large Language Models

Zhengyang Tang, Ziniu Li, Zhenyang Xiao et al.

COLM 2025
#175

Scaling Laws of Synthetic Data for Language Model

Zeyu Qin, Qingxiu Dong, Xingxing Zhang et al.

COLM 2025
#176

HyperINF: Unleashing the HyperPower of Schulz's Method for Data Influence Estimation

Xinyu Zhou, Simin Fan, Martin Jaggi

COLM 2025
#177

Contextualize-then-Aggregate: Circuits for In-Context Learning in Gemma-2 2B

Aleksandra Bakalova, Yana Veitsman, Xinting Huang et al.

COLM 2025
#178

CONCAP: Seeing Beyond English with Concepts Retrieval-Augmented Captioning

George Ibrahim, Rita Ramos, Yova Kementchedjhieva

COLM 2025
#179

AIOS: LLM Agent Operating System

Kai Mei, Xi Zhu, Wujiang Xu et al.

COLM 2025
#180

LoX: Low-Rank Extrapolation Robustifies LLM Safety Against Fine-tuning

Gabriel Jacob Perin, Runjin Chen, Xuxi Chen et al.

COLM 2025
#181

Verifying the Verifiers: Unveiling Pitfalls and Potentials in Fact Verifiers

Wooseok Seo, Seungju Han, Jaehun Jung et al.

COLM 2025
#182

Towards User-level Private Reinforcement Learning with Human Feedback

Jiaming Zhang, Mingxi Lei, Meng Ding et al.

COLM 2025
#183

MeMAD: Structured Memory of Debates for Enhanced Multi-Agent Reasoning

Shuai Ling, Lizi Liao, Dongmei Jiang et al.

COLM 2025
#184

VaPR - Vision-language Preference alignment for Reasoning

Rohan Wadhawan, Fabrice Y Harel-Canada, Zi-Yi Dou et al.

COLM 2025
#185

FalseReject: A Resource for Improving Contextual Safety and Mitigating Over-Refusals in LLMs via Structured Reasoning

Zhehao Zhang, Weijie Xu, Fanyou Wu et al.

COLM 2025
#186

SuperBPE: Space Travel for Language Models

Alisa Liu, Jonathan Hayase, Valentin Hofmann et al.

COLM 2025
#187

MegaMath: Pushing the Limits of Open Math Corpora

Fan Zhou, Zengzhi Wang, Nikhil Ranjan et al.

COLM 2025
#188

FacTool: Factuality Detection in Generative AI -- A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios

Ethan Chern, Steffi Chern, Shiqi Chen et al.

COLM 2025
#189

SecurityLingua: Efficient Defense of LLM Jailbreak Attacks via Security-Aware Prompt Compression

Yucheng Li, Surin Ahn, Huiqiang Jiang et al.

COLM 2025
#190

μKE: Matryoshka Unstructured Knowledge Editing of Large Language Models

Zian Su, Ziyang Huang, Kaiyuan Zhang et al.

COLM 2025
#191

Text Speaks Louder than Vision: ASCII Art Reveals Textual Biases in Vision-Language Models

Zhaochen Wang, Bryan Hooi, Yiwei Wang et al.

COLM 2025
#192

Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs

Kanishk Gandhi, Ayush K Chakravarthy, Anikait Singh et al.

COLM 2025
#193

Hawkeye: Model Collaboration for Efficient Reasoning

Jianshu She, Zhuohao Li, Zhemin Huang et al.

COLM 2025
#194

Building Instruction-Tuning Datasets from Human-Written Instructions with Open-Weight Large Language Models

Youmi Ma, Sakae Mizuki, Kazuki Fujii et al.

COLM 2025
#195

Impact-driven Context Filtering For Cross-file Code Completion

Yanzhou Li, Shangqing Liu, Kangjie Chen et al.

COLM 2025
#196

Phased Training for LLM-powered Text Retrieval Models Beyond Data Scaling

Xin Zhang, Yanzhao Zhang, Wen Xie et al.

COLM 2025
#197

JailDAM: Jailbreak Detection with Adaptive Memory for Vision-Language Model

Yi Nian, Shenzhe Zhu, Yuehan Qin et al.

COLM 2025
#198

IMPersona: Evaluating Individual Level LLM Impersonation

Quan Shi, Carlos E Jimenez, Stephen Dong et al.

COLM 2025
#199

ProsodyLM: Uncovering the Emerging Prosody Processing Capabilities in Speech Language Models

Kaizhi Qian, Xulin Fan, Junrui Ni et al.

COLM 2025
#200

Readability ≠ Learnability: Rethinking the Role of Simplicity in Training Small Language Models

Ivan Lee, Taylor Berg-Kirkpatrick

COLM 2025