Most Cited COLM 2025 "image decompression pre-training" Papers

418 papers found • Page 1 of 3

#1

Pretraining on the Test Set Is No Longer All You Need: A Debate-Driven Approach to QA Benchmarks

Linbo Cao, Jinman Zhao

COLM 2025 • paper • arXiv:2507.17747
#2

Corrupted by Reasoning: Reasoning Language Models Become Free-Riders in Public Goods Games

David Guzman Piedrahita, Yongjin Yang, Mrinmaya Sachan et al.

COLM 2025 • paper
#3

QAPyramid: Fine-grained Evaluation of Content Selection for Text Summarization

Shiyue Zhang, David Wan, Arie Cattan et al.

COLM 2025 • paper
#4

Sherkala-Chat: Building a State-of-the-Art LLM for Kazakh in a Moderately Resourced Setting

Fajri Koto, Rituraj Joshi, Nurdaulet Mukhituly et al.

COLM 2025 • paper
#5

Layerwise Importance Analysis of Feed-Forward Networks in Transformer-based Language Models

Wataru Ikeda, Kazuki Yano, Ryosuke Takahashi et al.

COLM 2025 • paper • arXiv:2508.17734
#6

Teaching Models to Understand (but not Generate) High-risk Data

Ryan Yixiang Wang, Matthew Finlayson, Luca Soldaini et al.

COLM 2025 • paper • arXiv:2505.03052
#7

PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling

Zefan Cai, Yichi Zhang, Bofei Gao et al.

COLM 2025 • paper • arXiv:2406.02069
#8

Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining

Rosie Zhao, Alexandru Meterez, Sham M. Kakade et al.

COLM 2025 • paper • arXiv:2504.07912
#9

Sample Efficient Preference Alignment in LLMs via Active Exploration

Viraj Mehta, Syrine Belakaria, Vikramjeet Das et al.

COLM 2025 • paper • arXiv:2312.00267
#10

Probing Syntax in Large Language Models: Successes and Remaining Challenges

Pablo J. Diego Simon, Emmanuel Chemla, Jean-Remi King et al.

COLM 2025 • paper • arXiv:2508.03211
#11

HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Interactive AI Agents

Xuhui Zhou, Hyunwoo Kim, Faeze Brahman et al.

COLM 2025 • paper
#12

LIMO: Less is More for Reasoning

Yixin Ye, Zhen Huang, Yang Xiao et al.

COLM 2025 • paper • arXiv:2502.03387
#13

Probing then Editing Response Personality of Large Language Models

Tianjie Ju, Zhenyu Shao, Bowen Wang et al.

COLM 2025 • paper • arXiv:2504.10227
#14

A Survey on Personalized and Pluralistic Preference Alignment in Large Language Models

Zhouhang Xie, Junda Wu, Yiran Shen et al.

COLM 2025 • paper • arXiv:2504.07070
#15

Always Tell Me The Odds: Fine-grained Conditional Probability Estimation

Liaoyaqi Wang, Zhengping Jiang, Anqi Liu et al.

COLM 2025 • paper • arXiv:2505.01595
#16

One-shot Optimized Steering Vectors Mediate Safety-relevant Behaviors in LLMs

Jacob Dunefsky, Arman Cohan

COLM 2025 • paper • arXiv:2502.18862
#17

LM Agents May Fail to Act on Their Own Risk Knowledge

Yuzhi Tang, Tianxiao Li, Elizabeth Li et al.

COLM 2025 • paper • arXiv:2508.13465
#18

Steering the CensorShip: Uncovering Representation Vectors for LLM "Thought" Control

Hannah Cyberey, David Evans

COLM 2025 • paper
#19

Exploring Large Language Model Agents for Piloting Social Experiments

Jinghua Piao, Yuwei Yan, Nian Li et al.

COLM 2025 • paper • arXiv:2508.08678
#20

Weight ensembling improves reasoning in language models

Xingyu Dang, Christina Baek, Kaiyue Wen et al.

COLM 2025 • paper • arXiv:2504.10478
#21

Do Large Language Models Have a Planning Theory of Mind? Evidence from MindGames: a Multi-Step Persuasion Task

Jared Moore, Ned Cooper, Rasmus Overmark et al.

COLM 2025 • paper • arXiv:2507.16196
#22

LoRe: Personalizing LLMs via Low-Rank Reward Modeling

Avinandan Bose, Zhihan Xiong, Yuejie Chi et al.

COLM 2025 • paper • arXiv:2504.14439
#23

Beyond Blanket Masking: Examining Granularity for Privacy Protection in Images Captured by Blind and Low Vision Users

Jeffri Murrugarra-Llerena, Haoran Niu, K. Suzanne Barber et al.

COLM 2025 • paper • arXiv:2508.09245
#24

When Splitting Makes Stronger: A Theoretical and Empirical Analysis of Divide-and-Conquer Prompting in LLMs

Yizhou Zhang, Defu Cao, Lun Du et al.

COLM 2025 • paper
#25

Humans overrely on overconfident language models, across languages

Neil Rathi, Dan Jurafsky, Kaitlyn Zhou

COLM 2025 • paper • arXiv:2507.06306
#26

Values in the Wild: Discovering and Mapping Values in Real-World Language Model Interactions

Saffron Huang, Esin Durmus, Kunal Handa et al.

COLM 2025 • paper
#27

The Zero Body Problem: Probing LLM Use of Sensory Language

Rebecca M. M. Hicke, Sil Hamilton, David Mimno

COLM 2025 • paper • arXiv:2504.06393
#28

Investigating Intersectional Bias in Large Language Models using Confidence Disparities in Coreference Resolution

Falaah Arif Khan, Nivedha Sivakumar, Yinong Oliver Wang et al.

COLM 2025 • paper • arXiv:2508.07111
#29

SciReplicate-Bench: Benchmarking LLMs in Agent-driven Algorithmic Reproduction from Research Papers

Yanzheng Xiang, Hanqi Yan, Shuyin Ouyang et al.

COLM 2025 • paper • arXiv:2504.00255
#30

True Multimodal In-Context Learning Needs Attention to the Visual Context

Shuo Chen, Jianzhe Liu, Zhen Han et al.

COLM 2025 • paper • arXiv:2507.15807
#31

Post-training for Efficient Communication via Convention Formation

Yilun Hua, Evan Wang, Yoav Artzi

COLM 2025 • paper • arXiv:2508.06482
#32

Local Mixtures of Experts: Essentially Free Test-Time Training via Model Merging

Ryo Bertolissi, Jonas Hübotter, Ido Hakimi et al.

COLM 2025 • paper • arXiv:2505.14136
#33

CodeXEmbed: A Generalist Embedding Model Family for Multilingual and Multi-task Code Retrieval

Ye Liu, Rui Meng, Shafiq Joty et al.

COLM 2025 • paper
#34

The World According to LLMs: How Geographic Origin Influences LLMs' Entity Deduction Capabilities

Harsh Nishant Lalai, Raj Sanjay Shah, Jiaxin Pei et al.

COLM 2025 • paper • arXiv:2508.05525
#35

BigCharts-R1: Enhanced Chart Reasoning with Visual Reinforcement Finetuning

Ahmed Masry, Abhay Puri, Masoud Hashemi et al.

COLM 2025 • paper • arXiv:2508.09804
#36

Training Large Language Models to Reason in a Continuous Latent Space

Shibo Hao, Sainbayar Sukhbaatar, DiJia Su et al.

COLM 2025 • paper • arXiv:2412.06769
#37

EllieSQL: Cost-Efficient Text-to-SQL with Complexity-Aware Routing

Yizhang Zhu, Runzhi Jiang, Boyan Li et al.

COLM 2025 • paper • arXiv:2503.22402
#38

Resource-efficient Inference with Foundation Model Programs

Lunyiu Nie, Zhimin Ding, Kevin Yu et al.

COLM 2025 • paper • arXiv:2504.07247
#39

Have Large Language Models Learned to Reason? A Characterization via 3-SAT

Rishi Hazra, Gabriele Venturato, Pedro Zuidberg Dos Martires et al.

COLM 2025 • paper • arXiv:2504.03930
#40

HIPPO-VIDEO: Simulating Watch Histories with Large Language Models for History-Driven Video Highlighting

Jeongeun Lee, Youngjae Yu, Dongha Lee

COLM 2025 • paper
#41

The Unlearning Mirage: A Dynamic Framework for Evaluating LLM Unlearning

Raj Sanjay Shah, Jing Huang, Keerthiram Murugesan et al.

COLM 2025 • paper
#42

G1yphD3c0de: Towards Safer Language Models on Visually Perturbed Texts

Yejin Choi, Yejin Yeo, Yejin Son et al.

COLM 2025 • paper
#43

Detecting and Pruning Prominent but Detrimental Neurons in Large Language Models

Ameen Ali Ali, Shahar Katz, Lior Wolf et al.

COLM 2025 • paper • arXiv:2507.09185
#44

LongPerceptualThoughts: Distilling System-2 Reasoning for System-1 Perception

Yuan-Hong Liao, Sven Elflein, Liu He et al.

COLM 2025 • paper • arXiv:2504.15362
#45

Reasoning-SQL: Reinforcement Learning with SQL Tailored Partial Rewards for Reasoning-Enhanced Text-to-SQL

Mohammadreza Pourreza, Shayan Talaei, Ruoxi Sun et al.

COLM 2025 • paper • arXiv:2503.23157
#46

Texture or Semantics? Vision-Language Models Get Lost in Font Recognition

Zhecheng Li, Guoxian Song, Yujun Cai et al.

COLM 2025 • paper • arXiv:2503.23768
#47

Scaling Web Agent Training through Automatic Data Generation and Fine-grained Evaluation

Lajanugen Logeswaran, Jaekyeom Kim, Sungryull Sohn et al.

COLM 2025 • paper
#48

Pretrained Hybrids with MAD Skills

Nicholas Roberts, Samuel Guo, Zhiqi Gao et al.

COLM 2025 • paper
#49

Benchmarking Retrieval-Augmented Generation for Chemistry

Xianrui Zhong, Bowen Jin, Siru Ouyang et al.

COLM 2025 • paper
#50

AutoScale: Scale-Aware Data Mixing for Pre-Training LLMs

Feiyang Kang, Yifan Sun, Bingbing Wen et al.

COLM 2025 • paper
#51

Meta-Learning for Speeding Up Large Model Inference in Decentralized Environments

Yipeng Du, Zihao Wang, Ahmad Farhan et al.

COLM 2025 • paper
#52

Multilingual and Multi-Accent Jailbreaking of Audio LLMs

Jaechul Roh, Virat Shejwalkar, Amir Houmansadr

COLM 2025 • paper
#53

X-EcoMLA: Upcycling Pre-Trained Attention into MLA for Efficient and Extreme KV Compression

Guihong Li, Mehdi Rezagholizadeh, Mingyu Yang et al.

COLM 2025 • paper
#54

OpenCodeReasoning: Advancing Data Distillation for Competitive Coding

Wasi Uddin Ahmad, Sean Narenthiran, Somshubra Majumdar et al.

COLM 2025 • paper
#55

UNVEILING: What Makes Linguistics Olympiad Puzzles Tricky for LLMs?

Mukund Choudhary, KV Aditya Srivatsa, Gaurja Aeron et al.

COLM 2025 • paper
#56

Inducing Programmatic Skills for Agentic Tasks

Zora Zhiruo Wang, Apurva Gandhi, Graham Neubig et al.

COLM 2025 • paper
#57

Learning to Reason for Long-Form Story Generation

Alexander Gurung, Mirella Lapata

COLM 2025 • paper
#58

Can LLM "Self-report"?: Evaluating the Validity of Self-report Scales in Measuring Personality Design in LLM-based Chatbots

Huiqi Zou, Pengda Wang, Zihan Yan et al.

COLM 2025 • paper
#59

Visual Representations inside the Language Model

Benlin Liu, Amita Kamath, Madeleine Grunde-McLaughlin et al.

COLM 2025 • paper
#60

SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths

Kaixuan Huang, Xudong Guo, Mengdi Wang

COLM 2025 • paper
#61

RankAlign: A Ranking View of the Generator-Validator Gap in Large Language Models

Juan Diego Rodriguez, Wenxuan Ding, Katrin Erk et al.

COLM 2025 • paper
#62

SAEs Can Improve Unlearning: Dynamic Sparse Autoencoder Guardrails for Precision Unlearning in LLMs

Aashiq Muhamed, Jacopo Bonato, Mona T. Diab et al.

COLM 2025 • paper
#63

Energy-Based Reward Models for Robust Language Model Alignment

Anamika Lochab, Ruqi Zhang

COLM 2025 • paper
#64

AI-Slop to AI-Polish? Aligning Language Models through Edit-Based Writing Rewards and Test-time computation

Tuhin Chakrabarty, Philippe Laban, Chien-Sheng Wu

COLM 2025 • paper
#65

Mitigating Modal Imbalance in Multimodal Reasoning

Chen Henry Wu, Neil Kale, Aditi Raghunathan

COLM 2025 • paper
#66

NoveltyBench: Evaluating Language Models for Humanlike Diversity

Yiming Zhang, Harshita Diddee, Susan Holm et al.

COLM 2025 • paper
#67

(Im)possibility of Automated Hallucination Detection in Large Language Models

Amin Karbasi, Omar Montasser, John Sous et al.

COLM 2025 • paper
#68

RRO: LLM Agent Optimization Through Rising Reward Trajectories

Zilong Wang, Jingfeng Yang, Sreyashi Nag et al.

COLM 2025 • paper
#69

Single-Pass Document Scanning for Question Answering

Weili Cao, Jianyou Wang, Youze Zheng et al.

COLM 2025 • paper
#70

Knowledge Graph Retrieval-Augmented Generation via GNN-Guided Prompting

Haochen Liu, Song Wang, Jundong Li

COLM 2025 • paper
#71

Don’t lie to your friends: Learning what you know from collaborative self-play

Jacob Eisenstein, Reza Aghajani, Adam Fisch et al.

COLM 2025 • paper
#72

Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

Bowen Jin, Hansi Zeng, Zhenrui Yue et al.

COLM 2025 • paper
#73

AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories

Xing Han Lù, Amirhossein Kazemnejad, Nicholas Meade et al.

COLM 2025 • paper
#74

EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees

Zhiyuan Zeng, Yizhong Wang, Hannaneh Hajishirzi et al.

COLM 2025 • paper
#75

ThoughtTerminator: Benchmarking, Calibrating, and Mitigating Overthinking in Reasoning Models

Xiao Pu, Michael Saxon, Wenyue Hua et al.

COLM 2025 • paper
#76

Scaling Analysis of Interleaved Speech-Text Language Models

Gallil Maimon, Michael Hassid, Amit Roth et al.

COLM 2025 • paper
#77

Putting the Value Back in RL: Better Test-Time Scaling by Unifying LLM Reasoners With Verifiers

Kusha Sareen, Morgane M Moss, Alessandro Sordoni et al.

COLM 2025 • paper
#78

Refusal Tokens: A Simple Way to Calibrate Refusals in Large Language Models

Neel Jain, Aditya Shrivastava, Chenyang Zhu et al.

COLM 2025 • paper
#79

Language Model Personalization via Reward Factorization

Idan Shenfeld, Felix Faltings, Pulkit Agrawal et al.

COLM 2025 • paper
#80

Resona: Improving Context Copying in Linear Recurrence Models with Retrieval

Xinyu Wang, Linrui Ma, Jerry Huang et al.

COLM 2025 • paper
#81

Model-Agnostic Policy Explanations with Large Language Models

Zhang Xi-Jia, Yue Guo, Shufei Chen et al.

COLM 2025 • paper
#82

How Multimodal LLMs Solve Image Tasks: A Lens on Visual Grounding, Task Reasoning, and Answer Decoding

Zhuoran Yu, Yong Jae Lee

COLM 2025 • paper
#83

Tulu 3: Pushing Frontiers in Open Language Model Post-Training

Nathan Lambert, Jacob Morrison, Valentina Pyatkin et al.

COLM 2025 • paper
#84

Customize Multi-modal RAI Guardrails with Precedent-based predictions

Cheng-Fu Yang, Thanh Tran, Christos Christodoulopoulos et al.

COLM 2025 • paper
#85

Can Large Language Models Integrate Spatial Data? Empirical Insights into Reasoning Strengths and Computational Weaknesses

Bin Han, Robert Wolfe, Anat Caspi et al.

COLM 2025 • paper
#86

What is the Visual Cognition Gap between Humans and Multimodal LLMs?

Xu Cao, Yifan Shen, Bolin Lai et al.

COLM 2025 • paper
#87

L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning

Pranjal Aggarwal, Sean Welleck

COLM 2025 • paper
#88

Elucidating the Design Space of Decay in Linear Attention

Zhen Qin, Xuyang Shen, Yiran Zhong

COLM 2025 • paper
#89

Noiser: Bounded Input Perturbations for Attributing Large Language Models

Mohammad Reza Ghasemi Madani, Aryo Pradipta Gema, Yu Zhao et al.

COLM 2025 • paper
#90

SmolLM2: When Smol Goes Big — Data-Centric Training of a Fully Open Small Language Model

Loubna Ben Allal, Anton Lozhkov, Elie Bakouch et al.

COLM 2025 • paper
#91

LongCodeBench: Evaluating Coding LLMs at 1M Context Windows

Stefano Rando, Luca Romani, Alessio Sampieri et al.

COLM 2025 • paper
#92

Agree to Disagree? A Meta-Evaluation of LLM Misgendering

Arjun Subramonian, Vagrant Gautam, Preethi Seshadri et al.

COLM 2025 • paper
#93

MALT: Improving Reasoning with Multi-Agent LLM Training

Sumeet Ramesh Motwani, Chandler Smith, Rocktim Jyoti Das et al.

COLM 2025 • paper
#94

More is Less: The Pitfalls of Multi-Model Synthetic Preference Data in DPO Safety Alignment

Yifan Wang, Runjin Chen, Bolian Li et al.

COLM 2025 • paper
#95

BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity

Chenyang Song, Weilin Zhao, Xu Han et al.

COLM 2025 • paper
#96

Adaptive Layer-skipping in Pre-trained LLMs

Xuan Luo, Weizhi Wang, Xifeng Yan

COLM 2025 • paper
#97

AIR: A Systematic Analysis of Annotations, Instructions, and Response Pairs in Preference Dataset

Bingxiang He, Wenbin Zhang, Jiaxi Song et al.

COLM 2025 • paper
#98

Insights from the Inverse: Reconstructing LLM Training Goals Through Inverse Reinforcement Learning

Jared Joselowitz, Ritam Majumdar, Arjun Jagota et al.

COLM 2025 • paper
#99

LLMs Are In-Context Bandit Reinforcement Learners

Giovanni Monea, Antoine Bosselut, Kianté Brantley et al.

COLM 2025 • paper
#100

Rethinking Multilingual Continual Pretraining: Data Mixing for Adapting LLMs Across Languages and Resources

Zihao Li, Shaoxiong Ji, Hengyu Luo et al.

COLM 2025 • paper
#101

Self-Evolving Critique Abilities in Large Language Models

Zhengyang Tang, Ziniu Li, Zhenyang Xiao et al.

COLM 2025 • paper
#102

Scaling Laws of Synthetic Data for Language Models

Zeyu Qin, Qingxiu Dong, Xingxing Zhang et al.

COLM 2025 • paper
#103

HyperINF: Unleashing the HyperPower of Schulz's Method for Data Influence Estimation

Xinyu Zhou, Simin Fan, Martin Jaggi

COLM 2025 • paper
#104

Contextualize-then-Aggregate: Circuits for In-Context Learning in Gemma-2 2B

Aleksandra Bakalova, Yana Veitsman, Xinting Huang et al.

COLM 2025 • paper
#105

CONCAP: Seeing Beyond English with Concepts Retrieval-Augmented Captioning

George Ibrahim, Rita Ramos, Yova Kementchedjhieva

COLM 2025 • paper
#106

AIOS: LLM Agent Operating System

Kai Mei, Xi Zhu, Wujiang Xu et al.

COLM 2025 • paper
#107

LoX: Low-Rank Extrapolation Robustifies LLM Safety Against Fine-tuning

Gabriel Jacob Perin, Runjin Chen, Xuxi Chen et al.

COLM 2025 • paper
#108

Verifying the Verifiers: Unveiling Pitfalls and Potentials in Fact Verifiers

Wooseok Seo, Seungju Han, Jaehun Jung et al.

COLM 2025 • paper
#109

Towards User-level Private Reinforcement Learning with Human Feedback

Jiaming Zhang, Mingxi Lei, Meng Ding et al.

COLM 2025 • paper
#110

MeMAD: Structured Memory of Debates for Enhanced Multi-Agent Reasoning

Shuai Ling, Lizi Liao, Dongmei Jiang et al.

COLM 2025 • paper
#111

VaPR - Vision-language Preference alignment for Reasoning

Rohan Wadhawan, Fabrice Y Harel-Canada, Zi-Yi Dou et al.

COLM 2025 • paper
#112

FalseReject: A Resource for Improving Contextual Safety and Mitigating Over-Refusals in LLMs via Structured Reasoning

Zhehao Zhang, Weijie Xu, Fanyou Wu et al.

COLM 2025 • paper
#113

SuperBPE: Space Travel for Language Models

Alisa Liu, Jonathan Hayase, Valentin Hofmann et al.

COLM 2025 • paper
#114

MegaMath: Pushing the Limits of Open Math Corpora

Fan Zhou, Zengzhi Wang, Nikhil Ranjan et al.

COLM 2025 • paper
#115

FacTool: Factuality Detection in Generative AI -- A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios

Ethan Chern, Steffi Chern, Shiqi Chen et al.

COLM 2025 • paper
#116

SecurityLingua: Efficient Defense of LLM Jailbreak Attacks via Security-Aware Prompt Compression

Yucheng Li, Surin Ahn, Huiqiang Jiang et al.

COLM 2025 • paper
#117

μKE: Matryoshka Unstructured Knowledge Editing of Large Language Models

Zian Su, Ziyang Huang, Kaiyuan Zhang et al.

COLM 2025 • paper
#118

Text Speaks Louder than Vision: ASCII Art Reveals Textual Biases in Vision-Language Models

Zhaochen Wang, Bryan Hooi, Yiwei Wang et al.

COLM 2025 • paper
#119

Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs

Kanishk Gandhi, Ayush K Chakravarthy, Anikait Singh et al.

COLM 2025 • paper
#120

Hawkeye: Model Collaboration for Efficient Reasoning

Jianshu She, Zhuohao Li, Zhemin Huang et al.

COLM 2025 • paper
#121

Building Instruction-Tuning Datasets from Human-Written Instructions with Open-Weight Large Language Models

Youmi Ma, Sakae Mizuki, Kazuki Fujii et al.

COLM 2025 • paper
#122

Impact-driven Context Filtering For Cross-file Code Completion

Yanzhou Li, Shangqing Liu, Kangjie Chen et al.

COLM 2025 • paper
#123

Phased Training for LLM-powered Text Retrieval Models Beyond Data Scaling

Xin Zhang, Yanzhao Zhang, Wen Xie et al.

COLM 2025 • paper
#124

JailDAM: Jailbreak Detection with Adaptive Memory for Vision-Language Model

Yi Nian, Shenzhe Zhu, Yuehan Qin et al.

COLM 2025 • paper
#125

IMPersona: Evaluating Individual Level LLM Impersonation

Quan Shi, Carlos E Jimenez, Stephen Dong et al.

COLM 2025 • paper
#126

ProsodyLM: Uncovering the Emerging Prosody Processing Capabilities in Speech Language Models

Kaizhi Qian, Xulin Fan, Junrui Ni et al.

COLM 2025 • paper
#127

Bootstrapping Visual Assistant Modeling with Situated Interaction Simulation

Yichi Zhang, Run Peng, Yinpei Dai et al.

COLM 2025 • paper
#128

Context-Adaptive Multi-Prompt Embedding with Large Language Models for Vision-Language Alignment

Dahun Kim, Anelia Angelova

COLM 2025 • paper
#129

Understanding Layer Significance in LLM Alignment

Guangyuan Shi, Zexin Lu, Xiaoyu Dong et al.

COLM 2025 • paper
#130

EnrichIndex: Using LLMs to Enrich Retrieval Indices Offline

Peter Baile Chen, Tomer Wolfson, Mike Cafarella et al.

COLM 2025 • paper
#131

Scalable Zeroth-Order Fine-Tuning for Extremely Large Language Models with Limited GPU Memory

Liangyu Wang, Jie Ren, Hang Xu et al.

COLM 2025 • paper
#132

Deep Binding of Language Model Virtual Personas: a Study on Approximating Political Partisan Misperceptions

Minwoo Kang, Suhong Moon, Seung Hyeong Lee et al.

COLM 2025 • paper
#133

SAT: Dynamic Spatial Aptitude Training for Multimodal Language Models

Arijit Ray, Jiafei Duan, Ellis L Brown II et al.

COLM 2025 • paper
#134

DeepRetrieval: Hacking Real Search Engines and Retrievers with Large Language Models via Reinforcement Learning

Pengcheng Jiang, Jiacheng Lin, Lang Cao et al.

COLM 2025 • paper
#135

Exposing and Patching the Flaws of Large Language Models in Social Character Simulation

Yue Huang, Zhengqing Yuan, Yujun Zhou et al.

COLM 2025 • paper
#136

Rank1: Test-Time Compute for Reranking in Information Retrieval

Orion Weller, Kathryn Ricci, Eugene Yang et al.

COLM 2025 • paper
#137

Can Language Models Falsify? Evaluating Algorithmic Reasoning with Counterexample Creation

Shiven Sinha, Shashwat Goel, Ponnurangam Kumaraguru et al.

COLM 2025 • paper
#138

Plato: Plan to Efficient Decode for Large Language Model Inference

Shuowei Jin, Xueshen Liu, Yongji Wu et al.

COLM 2025 • paper
#139

Correctness-Guaranteed Code Generation via Constrained Decoding

Lingxiao Li, Salar Rahili, Yiwei Zhao

COLM 2025 • paper
#140

StagFormer: Time Staggering Decoder only Transformers

Dylan J Cutler, Arun Kandoor, Nishanth Dikkala et al.

COLM 2025 • paper
#141

MLGym: A New Framework and Benchmark for Advancing AI Research Agents

Deepak Nathani, Lovish Madaan, Nicholas Roberts et al.

COLM 2025 • paper
#142

Limitations of refinement methods for weak to strong generalization

Seamus Somerstep, Yaacov Ritov, Mikhail Yurochkin et al.

COLM 2025 • paper
#143

How do language models learn facts? Dynamics, curricula and hallucinations

Nicolas Zucchet, Jorg Bornschein, Stephanie C.Y. Chan et al.

COLM 2025 • paper
#144

DualEdit: Dual Editing for Knowledge Updating in Vision-Language Models

Zhiyi Shi, Binjie Wang, Chongjie Si et al.

COLM 2025 • paper
#145

Improving Table Understanding with LLMs and Entity-Oriented Search

Thi-Nhung Nguyen, Hoang Ngo, Dinh Phung et al.

COLM 2025 • paper
#146

LongProc: Benchmarking Long-Context Language Models on Long Procedural Generation

Xi Ye, Fangcong Yin, Yinghui He et al.

COLM 2025 • paper
#147

ScholarCopilot: Training Large Language Models for Academic Writing with Accurate Citations

Yubo Wang, Xueguang Ma, Ping Nie et al.

COLM 2025 • paper
#148

Short-PHD: Detecting Short LLM-generated Text with Topological Data Analysis After Off-topic Content Insertion

Dongjun Wei, Minjia Mao, Xiao Fang et al.

COLM 2025 • paper
#149

Truth-value judgment in language models: ‘truth directions’ are context sensitive

Stefan F. Schouten, Peter Bloem, Ilia Markov et al.

COLM 2025 • paper
#150

Out-of-Distribution Detection using Synthetic Data Generation

Momin Abbas, Muneeza Azmat, Raya Horesh et al.

COLM 2025 • paper
#151

Cutting the Root of Hallucination: Structural Trimming for Vulnerability Mitigation in Code LLMs

Yage Zhang

COLM 2025 • paper
#152

RWKV-7 "Goose" with Expressive Dynamic State Evolution

Bo Peng, Ruichong Zhang, Daniel Goldstein et al.

COLM 2025 • paper
#153

Ensemble Debiasing Across Class and Sample Levels for Fairer Prompting Accuracy

Ruixi Lin, Ziqiao Wang, Yang You

COLM 2025 • paper
#154

Imagine All The Relevance: Scenario-Profiled Indexing with Knowledge Expansion for Dense Retrieval

Sangam Lee, Ryang Heo, SeongKu Kang et al.

COLM 2025 • paper
#155

You Cannot Feed Two Birds with One Score: the Accuracy-Naturalness Tradeoff in Translation

Gergely Flamich, David Vilar, Jan-Thorsten Peter et al.

COLM 2025 • paper
#156

Mixture of Attention Spans: Optimizing LLM Inference Efficiency with Heterogeneous Sliding-Window Lengths

Tianyu Fu, Haofeng Huang, Xuefei Ning et al.

COLM 2025 • paper
#157

Understanding the Uncertainty of LLM Explanations: A Perspective Based on Reasoning Topology

Longchao Da, Xiaoou Liu, Jiaxin Dai et al.

COLM 2025 • paper
#158

How does Watermarking Affect Visual Language Models in Document Understanding?

Chunxue Xu, Yiwei Wang, Bryan Hooi et al.

COLM 2025 • paper
#159

Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning

Chengqi Lyu, Songyang Gao, Yuzhe Gu et al.

COLM 2025 • paper
#160

When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoning

Nishad Singhi, Hritik Bansal, Arian Hosseini et al.

COLM 2025 • paper
#161

R2E-Gym: Procedural Environment Generation and Hybrid Verifiers for Scaling Open-Weights SWE Agents

Naman Jain, Jaskirat Singh, Manish Shetty et al.

COLM 2025 • paper
#162

Robo-Instruct: Simulator-Augmented Instruction Alignment For Finetuning Code LLMs

Zichao Hu, Junyi Jessy Li, Arjun Guha et al.

COLM 2025 • paper
#163

CodeARC: Benchmarking Reasoning Capabilities of LLM Agents for Inductive Program Synthesis

Anjiang Wei, Tarun Suresh, Jiannan Cao et al.

COLM 2025 • paper
#164

C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing

Zhongyang Li, Ziyue Li, Tianyi Zhou

COLM 2025 • paper
#165

Gating is Weighting: Understanding Gated Linear Attention through In-context Learning

Yingcong Li, Davoud Ataee Tarzanagh, Ankit Singh Rawat et al.

COLM 2025 • paper
#166

$100K or 100 Days: Trade-offs when Pre-Training with Academic Resources

Apoorv Khandelwal, Tian Yun, Nihal V. Nayak et al.

COLM 2025 • paper
#167

Shared Global and Local Geometry of Language Model Embeddings

Andrew Lee, Melanie Weber, Fernanda Viégas et al.

COLM 2025 • paper
#168

D3: A Dataset for Training Code LMs to Act Diff-by-Diff

Ulyana Piterbarg, Kanishk Gandhi, Lerrel Pinto et al.

COLM 2025 • paper
#169

LLM-based Multi-Agents System Attack via Continuous Optimization with Discrete Efficient Search

Weichen Yu, Kai Hu, Tianyu Pang et al.

COLM 2025 • paper
#170

Do Biased Models Have Biased Thoughts?

Swati Rajwal, Shivank Garg, Reem Abdel-Salam et al.

COLM 2025 • paper
#171

BEARCUBS: A benchmark for computer-using web agents

Yixiao Song, Katherine Thai, Chau Minh Pham et al.

COLM 2025 • paper
#172

CUPID: Evaluating Personalized and Contextualized Alignment of LLMs from Interactions

Tae Soo Kim, Yoonjoo Lee, Yoonah Park et al.

COLM 2025 • paper
#173

Supposedly Equivalent Facts That Aren’t? Entity Frequency in Pre-training Induces Asymmetry in LLMs

Yuan He, Bailan He, Zifeng Ding et al.

COLM 2025 • paper
#174

Training Plug-and-Play Knowledge Modules with Deep Context Distillation

Lucas Caccia, Alan Ansell, Edoardo Ponti et al.

COLM 2025 • paper
#175

EuroBERT: Scaling Multilingual Encoders for European Languages

Nicolas Boizard, Hippolyte Gisserot-Boukhlef, Duarte Miguel Alves et al.

COLM 2025 • paper
#176

Style over Substance: Distilled Language Models Reason Via Stylistic Replication

Philip Lippmann, Jie Yang

COLM 2025 • paper
#177

Plancraft: an evaluation dataset for planning with LLM agents

Gautier Dagan, Frank Keller, Alex Lascarides

COLM 2025 • paper
#178

Off-Policy Corrected Reward Modeling for Reinforcement Learning from Human Feedback

Johannes Ackermann, Takashi Ishida, Masashi Sugiyama

COLM 2025 • paper
#179

Efficient Construction of Model Family through Progressive Training Using Model Expansion

Kazuki Yano, Sho Takase, Sosuke Kobayashi et al.

COLM 2025 • paper
#180

Inside-Out: Hidden Factual Knowledge in LLMs

Zorik Gekhman, Eyal Ben-David, Hadas Orgad et al.

COLM 2025 • paper
#181

News is More than a Collection of Facts: Moral Frame Preserving News Summarization

Enrico Liscio, Michela Lorandi, Pradeep K. Murukannaiah

COLM 2025 • paper
#182

LV-Eval: A Balanced Long-Context Benchmark with 5 Length Levels Up to 256K

Tao Yuan, Xuefei Ning, Dong Zhou et al.

COLM 2025 • paper
#183

Base Models Beat Aligned Models at Randomness and Creativity

Peter West, Christopher Potts

COLM 2025 • paper
#184

Enhancing LLM Reasoning with Iterative DPO: A Comprehensive Empirical Investigation

Songjun Tu, Jiahao Lin, Xiangyu Tian et al.

COLM 2025 • paper
#185

Agents Are All You Need for LLM Unlearning

Debdeep Sanyal, Murari Mandal

COLM 2025 • paper
#186

One ruler to measure them all: Benchmarking multilingual long-context language models

Yekyung Kim, Jenna Russell, Marzena Karpinska et al.

COLM 2025 • paper
#187

SEAL: Steerable Reasoning Calibration of Large Language Models for Free

Runjin Chen, Zhenyu Zhang, Junyuan Hong et al.

COLM 2025 • paper
#188

SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild

Weihao Zeng, Yuzhen Huang, Qian Liu et al.

COLM 2025 • paper
#189

SpectR: Dynamically Composing LM Experts with Spectral Routing

William Fleshman, Benjamin Van Durme

COLM 2025 • paper
#190

Both Direct and Indirect Evidence Contribute to Dative Alternation Preferences in Language Models

Qing Yao, Kanishka Misra, Leonie Weissweiler et al.

COLM 2025 • paper
#191

TRELLIS: Learning to Compress Key-Value Memory in Attention Models

Mahdi Karami, Ali Behrouz, Praneeth Kacham et al.

COLM 2025 • paper
#192

Beyond the Reported Cutoff: Where Large Language Models Fall Short on Financial Knowledge

Agam Shah, Liqin Ye, Sebastian Jaskowski et al.

COLM 2025 • paper
#193

LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation

Juzheng Zhang, Jiacheng You, Ashwinee Panda et al.

COLM 2025 • paper
#194

CASCADE Your Datasets for Cross-Mode Knowledge Retrieval of Language Models

Runlong Zhou, Yi Zhang

COLM 2025 • paper
#195

Extragradient Preference Optimization (EGPO): Beyond Last-Iterate Convergence for Nash Learning from Human Feedback

Runlong Zhou, Maryam Fazel, Simon Shaolei Du

COLM 2025 • paper
#196

The Devil is in the EOS: Sequence Training for Detailed Image Captioning

Abdelrahman Mohamed, Yova Kementchedjhieva

COLM 2025 • paper
#197

ReFeed: Multi-dimensional Summarization Refinement with Reflective Reasoning on Feedback

Taewon Yun, Jihwan Oh, Hyangsuk Min et al.

COLM 2025 • paper
#198

Modifying Large Language Model Post-Training for Diverse Creative Writing

John Joon Young Chung, Vishakh Padmakumar, Melissa Roemmele et al.

COLM 2025 • paper
#199

LLMs as Research Tools: A Large Scale Survey of Researchers’ Usage and Perceptions

Zhehui Liao, Maria Antoniak, Inyoung Cheong et al.

COLM 2025 • paper
#200

Readability ≠ Learnability: Rethinking the Role of Simplicity in Training Small Language Models

Ivan Lee, Taylor Berg-Kirkpatrick

COLM 2025 • paper