Most Cited COLM "deformation-based modeling" Papers

418 papers found • Page 2 of 3

#201

FacTool: Factuality Detection in Generative AI -- A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios

Ethan Chern, Steffi Chern, Shiqi Chen et al.

COLM 2025paper
#202

SecurityLingua: Efficient Defense of LLM Jailbreak Attacks via Security-Aware Prompt Compression

Yucheng Li, Surin Ahn, Huiqiang Jiang et al.

COLM 2025paper
#203

μKE: Matryoshka Unstructured Knowledge Editing of Large Language Models

Zian Su, Ziyang Huang, Kaiyuan Zhang et al.

COLM 2025paper
#204

Text Speaks Louder than Vision: ASCII Art Reveals Textual Biases in Vision-Language Models

Zhaochen Wang, Bryan Hooi, Yiwei Wang et al.

COLM 2025paper
#205

Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs

Kanishk Gandhi, Ayush K Chakravarthy, Anikait Singh et al.

COLM 2025paper
#206

Hawkeye: Model Collaboration for Efficient Reasoning

Jianshu She, Zhuohao Li, Zhemin Huang et al.

COLM 2025paper
#207

Building Instruction-Tuning Datasets from Human-Written Instructions with Open-Weight Large Language Models

Youmi Ma, Sakae Mizuki, Kazuki Fujii et al.

COLM 2025paper
#208

Impact-driven Context Filtering For Cross-file Code Completion

Yanzhou Li, Shangqing Liu, Kangjie Chen et al.

COLM 2025paper
#209

Phased Training for LLM-powered Text Retrieval Models Beyond Data Scaling

Xin Zhang, Yanzhao Zhang, Wen Xie et al.

COLM 2025paper
#210

JailDAM: Jailbreak Detection with Adaptive Memory for Vision-Language Model

Yi Nian, Shenzhe Zhu, Yuehan Qin et al.

COLM 2025paper
#211

IMPersona: Evaluating Individual Level LLM Impersonation

Quan Shi, Carlos E Jimenez, Stephen Dong et al.

COLM 2025paper
#212

ProsodyLM: Uncovering the Emerging Prosody Processing Capabilities in Speech Language Models

Kaizhi Qian, Xulin Fan, Junrui Ni et al.

COLM 2025paper
#213

Bootstrapping Visual Assistant Modeling with Situated Interaction Simulation

Yichi Zhang, Run Peng, Yinpei Dai et al.

COLM 2025paper
#214

Context-Adaptive Multi-Prompt Embedding with Large Language Models for Vision-Language Alignment

Dahun Kim, Anelia Angelova

COLM 2025paper
#215

Understanding Layer Significance in LLM Alignment

Guangyuan Shi, Zexin Lu, Xiaoyu Dong et al.

COLM 2025paper
#216

EnrichIndex: Using LLMs to Enrich Retrieval Indices Offline

Peter Baile Chen, Tomer Wolfson, Mike Cafarella et al.

COLM 2025paper
#217

Scalable Zeroth-Order Fine-Tuning for Extremely Large Language Models with Limited GPU Memory

Liangyu Wang, Jie Ren, Hang Xu et al.

COLM 2025paper
#218

Deep Binding of Language Model Virtual Personas: a Study on Approximating Political Partisan Misperceptions

Minwoo Kang, Suhong Moon, Seung Hyeong Lee et al.

COLM 2025paper
#219

SAT: Dynamic Spatial Aptitude Training for Multimodal Language Models

Arijit Ray, Jiafei Duan, Ellis L Brown II et al.

COLM 2025paper
#220

DeepRetrieval: Hacking Real Search Engines and Retrievers with Large Language Models via Reinforcement Learning

Pengcheng Jiang, Jiacheng Lin, Lang Cao et al.

COLM 2025paper
#221

Exposing and Patching the Flaws of Large Language Models in Social Character Simulation

Yue Huang, Zhengqing Yuan, Yujun Zhou et al.

COLM 2025paper
#222

Rank1: Test-Time Compute for Reranking in Information Retrieval

Orion Weller, Kathryn Ricci, Eugene Yang et al.

COLM 2025paper
#223

Can Language Models Falsify? Evaluating Algorithmic Reasoning with Counterexample Creation

Shiven Sinha, Shashwat Goel, Ponnurangam Kumaraguru et al.

COLM 2025paper
#224

Plato: Plan to Efficient Decode for Large Language Model Inference

Shuowei Jin, Xueshen Liu, Yongji Wu et al.

COLM 2025paper
#225

Correctness-Guaranteed Code Generation via Constrained Decoding

Lingxiao Li, Salar Rahili, Yiwei Zhao

COLM 2025paper
#226

StagFormer: Time Staggering Decoder only Transformers

Dylan J Cutler, Arun Kandoor, Nishanth Dikkala et al.

COLM 2025paper
#227

MLGym: A New Framework and Benchmark for Advancing AI Research Agents

Deepak Nathani, Lovish Madaan, Nicholas Roberts et al.

COLM 2025paper
#228

Limitations of refinement methods for weak to strong generalization

Seamus Somerstep, Yaacov Ritov, Mikhail Yurochkin et al.

COLM 2025paper
#229

How do language models learn facts? Dynamics, curricula and hallucinations

Nicolas Zucchet, Jörg Bornschein, Stephanie C.Y. Chan et al.

COLM 2025paper
#230

DualEdit: Dual Editing for Knowledge Updating in Vision-Language Models

Zhiyi Shi, Binjie Wang, Chongjie Si et al.

COLM 2025paper
#231

Improving Table Understanding with LLMs and Entity-Oriented Search

Thi-Nhung Nguyen, Hoang Ngo, Dinh Phung et al.

COLM 2025paper
#232

LongProc: Benchmarking Long-Context Language Models on Long Procedural Generation

Xi Ye, Fangcong Yin, Yinghui He et al.

COLM 2025paper
#233

ScholarCopilot: Training Large Language Models for Academic Writing with Accurate Citations

Yubo Wang, Xueguang Ma, Ping Nie et al.

COLM 2025paper
#234

Short-PHD: Detecting Short LLM-generated Text with Topological Data Analysis After Off-topic Content Insertion

Dongjun Wei, Minjia Mao, Xiao Fang et al.

COLM 2025paper
#235

Truth-value judgment in language models: ‘truth directions’ are context sensitive

Stefan F. Schouten, Peter Bloem, Ilia Markov et al.

COLM 2025paper
#236

Out-of-Distribution Detection using Synthetic Data Generation

Momin Abbas, Muneeza Azmat, Raya Horesh et al.

COLM 2025paper
#237

Cutting the Root of Hallucination: Structural Trimming for Vulnerability Mitigation in Code LLMs

Yage Zhang

COLM 2025paper
#238

RWKV-7 "Goose" with Expressive Dynamic State Evolution

Bo Peng, Ruichong Zhang, Daniel Goldstein et al.

COLM 2025paper
#239

Ensemble Debiasing Across Class and Sample Levels for Fairer Prompting Accuracy

Ruixi Lin, Ziqiao Wang, Yang You

COLM 2025paper
#240

Imagine All The Relevance: Scenario-Profiled Indexing with Knowledge Expansion for Dense Retrieval

Sangam Lee, Ryang Heo, SeongKu Kang et al.

COLM 2025paper
#241

You Cannot Feed Two Birds with One Score: the Accuracy-Naturalness Tradeoff in Translation

Gergely Flamich, David Vilar, Jan-Thorsten Peter et al.

COLM 2025paper
#242

Mixture of Attention Spans: Optimizing LLM Inference Efficiency with Heterogeneous Sliding-Window Lengths

Tianyu Fu, Haofeng Huang, Xuefei Ning et al.

COLM 2025paper
#243

Understanding the Uncertainty of LLM Explanations: A Perspective Based on Reasoning Topology

Longchao Da, Xiaoou Liu, Jiaxin Dai et al.

COLM 2025paper
#244

How does Watermarking Affect Visual Language Models in Document Understanding?

Chunxue Xu, Yiwei Wang, Bryan Hooi et al.

COLM 2025paper
#245

Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning

Chengqi Lyu, Songyang Gao, Yuzhe Gu et al.

COLM 2025paper
#246

When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoning

Nishad Singhi, Hritik Bansal, Arian Hosseini et al.

COLM 2025paper
#247

R2E-Gym: Procedural Environment Generation and Hybrid Verifiers for Scaling Open-Weights SWE Agents

Naman Jain, Jaskirat Singh, Manish Shetty et al.

COLM 2025paper
#248

Robo-Instruct: Simulator-Augmented Instruction Alignment For Finetuning Code LLMs

Zichao Hu, Junyi Jessy Li, Arjun Guha et al.

COLM 2025paper
#249

CodeARC: Benchmarking Reasoning Capabilities of LLM Agents for Inductive Program Synthesis

Anjiang Wei, Tarun Suresh, Jiannan Cao et al.

COLM 2025paper
#250

C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing

Zhongyang Li, Ziyue Li, Tianyi Zhou

COLM 2025paper
#251

Gating is Weighting: Understanding Gated Linear Attention through In-context Learning

Yingcong Li, Davoud Ataee Tarzanagh, Ankit Singh Rawat et al.

COLM 2025paper
#252

$100K or 100 Days: Trade-offs when Pre-Training with Academic Resources

Apoorv Khandelwal, Tian Yun, Nihal V. Nayak et al.

COLM 2025paper
#253

Shared Global and Local Geometry of Language Model Embeddings

Andrew Lee, Melanie Weber, Fernanda Viégas et al.

COLM 2025paper
#254

D3: A Dataset for Training Code LMs to Act Diff-by-Diff

Ulyana Piterbarg, Kanishk Gandhi, Lerrel Pinto et al.

COLM 2025paper
#255

LLM-based Multi-Agents System Attack via Continuous Optimization with Discrete Efficient Search

Weichen Yu, Kai Hu, Tianyu Pang et al.

COLM 2025paper
#256

Do Biased Models Have Biased Thoughts?

Swati Rajwal, Shivank Garg, Reem Abdel-Salam et al.

COLM 2025paper
#257

BEARCUBS: A benchmark for computer-using web agents

Yixiao Song, Katherine Thai, Chau Minh Pham et al.

COLM 2025paper
#258

CUPID: Evaluating Personalized and Contextualized Alignment of LLMs from Interactions

Tae Soo Kim, Yoonjoo Lee, Yoonah Park et al.

COLM 2025paper
#259

Supposedly Equivalent Facts That Aren’t? Entity Frequency in Pre-training Induces Asymmetry in LLMs

Yuan He, Bailan He, Zifeng Ding et al.

COLM 2025paper
#260

Training Plug-and-Play Knowledge Modules with Deep Context Distillation

Lucas Caccia, Alan Ansell, Edoardo Ponti et al.

COLM 2025paper
#261

EuroBERT: Scaling Multilingual Encoders for European Languages

Nicolas Boizard, Hippolyte Gisserot-Boukhlef, Duarte Miguel Alves et al.

COLM 2025paper
#262

Style over Substance: Distilled Language Models Reason Via Stylistic Replication

Philip Lippmann, Jie Yang

COLM 2025paper
#263

Plancraft: an evaluation dataset for planning with LLM agents

Gautier Dagan, Frank Keller, Alex Lascarides

COLM 2025paper
#264

Off-Policy Corrected Reward Modeling for Reinforcement Learning from Human Feedback

Johannes Ackermann, Takashi Ishida, Masashi Sugiyama

COLM 2025paper
#265

Efficient Construction of Model Family through Progressive Training Using Model Expansion

Kazuki Yano, Sho Takase, Sosuke Kobayashi et al.

COLM 2025paper
#266

Inside-Out: Hidden Factual Knowledge in LLMs

Zorik Gekhman, Eyal Ben-David, Hadas Orgad et al.

COLM 2025paper
#267

News is More than a Collection of Facts: Moral Frame Preserving News Summarization

Enrico Liscio, Michela Lorandi, Pradeep K. Murukannaiah

COLM 2025paper
#268

LV-Eval: A Balanced Long-Context Benchmark with 5 Length Levels Up to 256K

Tao Yuan, Xuefei Ning, Dong Zhou et al.

COLM 2025paper
#269

Base Models Beat Aligned Models at Randomness and Creativity

Peter West, Christopher Potts

COLM 2025paper
#270

Enhancing LLM Reasoning with Iterative DPO: A Comprehensive Empirical Investigation

Songjun Tu, Jiahao Lin, Xiangyu Tian et al.

COLM 2025paper
#271

Agents Are All You Need for LLM Unlearning

Debdeep Sanyal, Murari Mandal

COLM 2025paper
#272

One ruler to measure them all: Benchmarking multilingual long-context language models

Yekyung Kim, Jenna Russell, Marzena Karpinska et al.

COLM 2025paper
#273

SEAL: Steerable Reasoning Calibration of Large Language Models for Free

Runjin Chen, Zhenyu Zhang, Junyuan Hong et al.

COLM 2025paper
#274

SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild

Weihao Zeng, Yuzhen Huang, Qian Liu et al.

COLM 2025paper
#275

SpectR: Dynamically Composing LM Experts with Spectral Routing

William Fleshman, Benjamin Van Durme

COLM 2025paper
#276

Both Direct and Indirect Evidence Contribute to Dative Alternation Preferences in Language Models

Qing Yao, Kanishka Misra, Leonie Weissweiler et al.

COLM 2025paper
#277

TRELLIS: Learning to Compress Key-Value Memory in Attention Models

Mahdi Karami, Ali Behrouz, Praneeth Kacham et al.

COLM 2025paper
#278

Beyond the Reported Cutoff: Where Large Language Models Fall Short on Financial Knowledge

Agam Shah, Liqin Ye, Sebastian Jaskowski et al.

COLM 2025paper
#279

LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation

Juzheng Zhang, Jiacheng You, Ashwinee Panda et al.

COLM 2025paper
#280

CASCADE Your Datasets for Cross-Mode Knowledge Retrieval of Language Models

Runlong Zhou, Yi Zhang

COLM 2025paper
#281

Extragradient Preference Optimization (EGPO): Beyond Last-Iterate Convergence for Nash Learning from Human Feedback

Runlong Zhou, Maryam Fazel, Simon Shaolei Du

COLM 2025paper
#282

The Devil is in the EOS: Sequence Training for Detailed Image Captioning

Abdelrahman Mohamed, Yova Kementchedjhieva

COLM 2025paper
#283

ReFeed: Multi-dimensional Summarization Refinement with Reflective Reasoning on Feedback

Taewon Yun, Jihwan Oh, Hyangsuk Min et al.

COLM 2025paper
#284

Modifying Large Language Model Post-Training for Diverse Creative Writing

John Joon Young Chung, Vishakh Padmakumar, Melissa Roemmele et al.

COLM 2025paper
#285

LLMs as Research Tools: A Large Scale Survey of Researchers’ Usage and Perceptions

Zhehui Liao, Maria Antoniak, Inyoung Cheong et al.

COLM 2025paper
#286

FineMedLM-o1: Enhancing Medical Knowledge Reasoning Ability of LLM from Supervised Fine-Tuning to Test-Time Training

Hongzhou Yu, Tianhao Cheng, Yingwen Wang et al.

COLM 2025paper
#287

Can Test-Time Scaling Improve World Foundation Model?

Wenyan Cong, Hanqing Zhu, Peihao Wang et al.

COLM 2025paper
#288

VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception of Geometric Information

Ryo Kamoi, Yusen Zhang, Sarkar Snigdha Sarathi Das et al.

COLM 2025paper
#289

Hypothesis-Driven Theory-of-Mind Reasoning for Large Language Models

Hyunwoo Kim, Melanie Sclar, Tan Zhi-Xuan et al.

COLM 2025paper
#290

DFRot: Achieving Outlier-Free and Massive Activation-Free for Rotated LLMs with Refined Rotation

Jingyang Xiang, Sai Qian Zhang

COLM 2025paper
#291

The Dual-Route Model of Induction

Sheridan Feucht, Eric Todd, Byron C Wallace et al.

COLM 2025paper
#292

Language Models Fail to Introspect About Their Knowledge of Language

Siyuan Song, Jennifer Hu, Kyle Mahowald

COLM 2025paper
#293

SQuat: Subspace-orthogonal KV Cache Quantization

Hao Wang, Ligong Han, Kai Xu et al.

COLM 2025paper
#294

Hidden in plain sight: VLMs overlook their visual representations

Stephanie Fu, Tyler Bonnen, Devin Guillory et al.

COLM 2025paper
#295

Language Model Uncertainty Quantification with Attention Chain

Yinghao Li, Rushi Qiang, Lama Moukheiber et al.

COLM 2025paper
#296

SmolVLM: Redefining small and efficient multimodal models

Andrés Marafioti, Orr Zohar, Miquel Farré et al.

COLM 2025paper
#297

Efficient Self-Improvement in Multimodal Large Language Models: A Model-Level Judge-Free Approach

Shijian Deng, Wentian Zhao, Yu-Jhe Li et al.

COLM 2025paper
#298

Overflow Prevention Enhances Long-Context Recurrent LLMs

Assaf Ben-Kish, Itamar Zimerman, Muhammad Jehanzeb Mirza et al.

COLM 2025paper
#299

KVSink: Understanding and Enhancing the Preservation of Attention Sinks in KV Cache Quantization for LLMs

Zunhai Su, Kehong Yuan

COLM 2025paper
#300

PredGen: Accelerated Inference of Large Language Models through Input-Time Speculation for Real-Time Speech Interaction

Shufan Li, Aditya Grover

COLM 2025paper
#301

Discovering Knowledge Deficiencies of Language Models on Massive Knowledge Base

Linxin Song, Xuwei Ding, Jieyu Zhang et al.

COLM 2025paper
#302

Assessing Judging Bias in Large Reasoning Models: An Empirical Study

Qian Wang, Zhanzhi Lou, Zhenheng Tang et al.

COLM 2025paper
#303

Can a Crow Hatch a Falcon? Lineage Matters in Predicting Large Language Model Performance

Takuya Tamura, Taro Yano, Masafumi Enomoto et al.

COLM 2025paperarXiv:2504.19811
#304

E²-RAG: Towards Editable Efficient RAG by Editing Compressed KV Caches

Tongxu Luo, Wenyu Du, HanWen Hao et al.

COLM 2025paper
#305

Fleurs-SLU: A Massively Multilingual Benchmark for Spoken Language Understanding

Fabian David Schmidt, Ivan Vulić, Goran Glavaš et al.

COLM 2025paperarXiv:2501.06117
#306

NoWag: A Unified Framework for Shape Preserving Compression of Large Language Models

Lawrence Ray Liu, Inesh Chakrabarti, Yixiao Li et al.

COLM 2025paper
#307

Evaluating Large Language Models as Expert Annotators

Yu-Min Tseng, Wei-Lin Chen, Chung-Chi Chen et al.

COLM 2025paperarXiv:2508.07827
#308

Yourbench: Dynamic Evaluation Set Generation with LLMs

Sumuk Shashidhar, Clémentine Fourrier, Alina Lozovskaya et al.

COLM 2025paper
#309

LawFlow: Collecting and Simulating Lawyers’ Thought Processes on Business Formation Case Studies

Debarati Das, Khanh Chi Le, Ritik Sachin Parkar et al.

COLM 2025paper
#310

Traceable and Explainable Multimodal Large Language Models: An Information-Theoretic View

Zihan Huang, Junda Wu, Rohan Surana et al.

COLM 2025paper
#311

Understanding and Improving Noisy Embedding Techniques in Instruction Finetuning

Abhay Yadav

COLM 2025paper
#312

REFA: Reference Free Alignment with Fine-Grained Length Control

Taneesh Gupta, Rahul Madhavan, Xuchao Zhang et al.

COLM 2025paper
#313

Hyperparameter Loss Surfaces Are Simple Near their Optima

Nicholas Lourie, He He, Kyunghyun Cho

COLM 2025paper
#314

From Next-Token to Mathematics: The Learning Dynamics of Mathematical Reasoning in Language Models

Shubhra Mishra, Gabriel Poesia, Noah Goodman

COLM 2025paperarXiv:2407.00900
#315

The Surprising Effectiveness of Membership Inference with Simple N-Gram Coverage

Skyler Hallinan, Jaehun Jung, Melanie Sclar et al.

COLM 2025paperarXiv:2508.09603
#316

Synthetic Data Generation and Multi-Step Reinforcement Learning for Reasoning and Tool Use

Anna Goldie, Azalia Mirhoseini, Hao Zhou et al.

COLM 2025paper
#317

MSRS: Evaluating Multi-Source Retrieval-Augmented Generation

Rohan Phanse, Ej Zhou, Kejian Shi et al.

COLM 2025paperarXiv:2508.20867
#318

Epistemic Alignment: A Mediating Framework for User-LLM Knowledge Delivery

Nicholas Clark, Hua Shen, Bill Howe et al.

COLM 2025paperarXiv:2504.01205
#319

PrefPalette: Personalized Preference Modeling with Latent Attributes

Shuyue Stella Li, Melanie Sclar, Hunter Lang et al.

COLM 2025paperarXiv:2507.13541
#320

X-Teaming: Multi-Turn Jailbreaks and Defenses with Adaptive Multi-Agents

Salman Rahman, Liwei Jiang, James Shiffer et al.

COLM 2025paperarXiv:2504.13203
#321

Know Me, Respond to Me: Benchmarking LLMs for Dynamic User Profiling and Personalized Responses at Scale

Bowen Jiang, Zhuoqun Hao, Young Min Cho et al.

COLM 2025paperarXiv:2504.14225
#322

Language models align with brain regions that represent concepts across modalities

Maria Ryskina, Greta Tuckute, Alexander Fung et al.

COLM 2025paperarXiv:2508.11536
#323

SUV: Scalable Large Language Model Copyright Compliance with Regularized Selective Unlearning

Tianyang Xu, Xiaoze Liu, Feijie Wu et al.

COLM 2025paperarXiv:2503.22948
#324

Can LLMs Handle WebShell Detection? Overcoming Detection Challenges with Behavioral Function-Aware Framework

Feijiang Han, Jiaming Zhang, Chuyi Deng et al.

COLM 2025paperarXiv:2504.13811
#325

LLM Unlearning Without an Expert Curated Dataset

Xiaoyuan Zhu, Muru Zhang, Ollie Liu et al.

COLM 2025paperarXiv:2508.06595
#326

Steering Large Language Model Activations in Sparse Spaces

Reza Bayat, Ali Rahimi-Kalahroudi, Mohammad Pezeshki et al.

COLM 2025paperarXiv:2503.00177
#327

Adaptive Computation Pruning for the Forgetting Transformer

Zhixuan Lin, Johan Obando-Ceron, Xu Owen He et al.

COLM 2025paperarXiv:2504.06949
#328

Pairwise or Pointwise? Evaluating Feedback Protocols for Bias in LLM-Based Evaluation

Tuhina Tripathi, Manya Wadhwa, Greg Durrett et al.

COLM 2025paperarXiv:2504.14716
#329

Estimating Optimal Context Length for Hybrid Retrieval-augmented Multi-document Summarization

Adithya Pratapa, Teruko Mitamura

COLM 2025paperarXiv:2504.12972
#330

Navigating the Rabbit Hole: Emergent Biases in LLM-Generated Attack Narratives Targeting Mental Health Groups

Rijul Magu, Arka Dutta, Sean Kim et al.

COLM 2025paperarXiv:2504.06160
#331

M²IV: Towards Efficient and Fine-grained Multimodal In-Context Learning via Representation Engineering

Yanshu Li, Yi Cao, Hongyang He et al.

COLM 2025paper
#332

BiXSE: Improving Dense Retrieval via Probabilistic Graded Relevance Distillation

Christos Tsirigotis, Vaibhav Adlakha, Joao Monteiro et al.

COLM 2025paperarXiv:2508.06781
#333

Stop-Think-AutoRegress: Language Modeling with Latent Diffusion Planning

Justin Lovelace, Christian K Belardi, Sofian Zalouk et al.

COLM 2025paper
#334

In-Context Occam’s Razor: How Transformers Prefer Simpler Hypotheses on the Fly

Puneesh Deora, Bhavya Vasudeva, Tina Behnia et al.

COLM 2025paper
#335

Reasoning Models Know When They’re Right: Probing Hidden States for Self-Verification

Anqi Zhang, Yulin Chen, Jane Pan et al.

COLM 2025paper
#336

The Negation Bias in Large Language Models: Investigating bias reflected in linguistic markers

Yishan Wang, Pia Sommerauer, Jelke Bloem

COLM 2025paper
#337

Language Agents Mirror Human Causal Reasoning Biases. How Can We Help Them Think Like Scientists?

Anthony GX-Chen, Dongyan Lin, Mandana Samiei et al.

COLM 2025paper
#338

Finding Flawed Fictions: Evaluating Complex Reasoning in Language Models via Plot Hole Detection

Kabir Ahuja, Melanie Sclar, Yulia Tsvetkov

COLM 2025paperarXiv:2504.11900
#339

Hell or High Water: Evaluating Agentic Recovery from External Failures

Andrew Wang, Sophia Hager, Adi Asija et al.

COLM 2025paperarXiv:2508.11027
#340

A Taxonomy of Transcendence

Natalie Abreu, Edwin Zhang, Eran Malach et al.

COLM 2025paperarXiv:2508.17669
#341

Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers

Shalev Lifshitz, Sheila A. McIlraith, Yilun Du

COLM 2025paperarXiv:2502.20379
#342

Retrieval-Augmented Generation with Conflicting Evidence

Han Wang, Archiki Prasad, Elias Stengel-Eskin et al.

COLM 2025paperarXiv:2504.13079
#343

Recycling the Web: A Method to Enhance Pre-training Data Quality and Quantity for Language Models

Thao Nguyen, Yang Li, Olga Golovneva et al.

COLM 2025paperarXiv:2506.04689
#344

Impact of LLM Alignment on Impression Formation in Social Interactions

Ala N. Tak, Anahita Bolourani, Daniel B. Shank et al.

COLM 2025paper
#345

MixAssist: An Audio-Language Dataset for Co-Creative AI Assistance in Music Mixing

Michael Paul Clemens, Ana Marasovic

COLM 2025paperarXiv:2507.06329
#346

Breakpoint: Stress-testing systems-level reasoning in LLM agents

Kaivalya Hariharan, Uzay Girit, Zifan Wang et al.

COLM 2025paper
#347

Rhapsody: A Dataset for Highlight Detection in Podcasts

Younghan Park, Anuj Diwan, David Harwath et al.

COLM 2025paperarXiv:2505.19429
#348

M-Prometheus: A Suite of Open Multilingual LLM Judges

José Pombal, Dongkeun Yoon, Patrick Fernandes et al.

COLM 2025paperarXiv:2504.04953
#349

Task Vectors in In-Context Learning: Emergence, Formation, and Benefits

Liu Yang, Ziqian Lin, Kangwook Lee et al.

COLM 2025paperarXiv:2501.09240
#350

Can Performant LLMs Be Ethical? Quantifying the Impact of Web Crawling Opt-Outs

Dongyang Fan, Vinko Sabolčec, Matin Ansaripour et al.

COLM 2025paper
#351

Rethinking Associative Memory Mechanism in Induction Head

Shuo Wang, Issei Sato

COLM 2025paper
#352

Stuffed Mamba: Oversized States Lead to the Inability to Forget

Yingfa Chen, Xinrong Zhang, Shengding Hu et al.

COLM 2025paper
#353

Fluid Language Model Benchmarking

Valentin Hofmann, David Heineman, Ian Magnusson et al.

COLM 2025paperarXiv:2509.11106
#354

Data-Centric Human Preference with Rationales for Direct Preference Alignment

Hoang Anh Just, Ming Jin, Anit Kumar Sahu et al.

COLM 2025paperarXiv:2407.14477
#355

DynaSaur: Large Language Agents Beyond Predefined Actions

Dang Nguyen, Viet Dac Lai, Seunghyun Yoon et al.

COLM 2025paperarXiv:2411.01747
#356

PolyGuard: A Multilingual Safety Moderation Tool for 17 Languages

Priyanshu Kumar, Devansh Jain, Akhila Yerukola et al.

COLM 2025paperarXiv:2504.04377
#357

Advancing Language Multi-Agent Learning with Credit Re-Assignment for Interactive Environment Generalization

Zhitao He, Zijun Liu, Peng Li et al.

COLM 2025paperarXiv:2502.14496
#358

Partial Perspectives: How LLMs Handle Logically Inconsistent Knowledge in Reasoning Tasks

Zichao Li, Ines Arous, Jackie CK Cheung

COLM 2025paper
#359

EvalAgents: Discovering Implicit Evaluation Criteria from the Web

Manya Wadhwa, Zayne Rea Sprague, Chaitanya Malaviya et al.

COLM 2025paperarXiv:2504.15219
#360

Cats Confuse Reasoning LLM: Query Agnostic Adversarial Triggers for Reasoning Models

Meghana Arakkal Rajeev, Rajkumar Ramamurthy, Prapti Trivedi et al.

COLM 2025paperarXiv:2503.01781
#361

ALFA: Aligning LLMs to Ask Good Questions: A Case Study in Clinical Reasoning

Shuyue Stella Li, Jimin Mun, Faeze Brahman et al.

COLM 2025paperarXiv:2502.14860
#362

Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models

Ruikang Liu, Yuxuan Sun, Manyi Zhang et al.

COLM 2025paperarXiv:2504.04823
#363

CRUST-Bench: A Comprehensive Benchmark for C-to-safe-Rust Transpilation

Anirudh Khatry, Robert Zhang, Jia Pan et al.

COLM 2025paperarXiv:2504.15254
#364

On Mechanistic Circuits for Extractive Question-Answering

Samyadeep Basu, Vlad I Morariu, Ryan A. Rossi et al.

COLM 2025paperarXiv:2502.08059
#365

Boosting LLM Reasoning via Spontaneous Self-Correction

Xutong Zhao, Tengyu Xu, Xuewei Wang et al.

COLM 2025paperarXiv:2506.06923
#366

REM: Evaluating LLM Embodied Spatial Reasoning through Multi-Frame Trajectories

Jacob Thompson, Emiliano Garcia-Lopez, Yonatan Bisk

COLM 2025paperarXiv:2512.00736
#367

GenerationPrograms: Fine-grained Attribution with Executable Programs

David Wan, Eran Hirsch, Elias Stengel-Eskin et al.

COLM 2025paperarXiv:2506.14580
#368

Multi-Agent Retrieval-Augmented Framework for Evidence-Based Counterspeech Against Health Misinformation

Anirban Saha Anik, Xiaoying Song, Elliott Wang et al.

COLM 2025paperarXiv:2507.07307
#369

Task-Circuit Quantization: Leveraging Knowledge Localization and Interpretability for Compression

Hanqi Xiao, Yi-Lin Sung, Elias Stengel-Eskin et al.

COLM 2025paperarXiv:2504.07389
#370

Not All Data Are Unlearned Equally

Aravind Krishnan, Siva Reddy, Marius Mosbach

COLM 2025paperarXiv:2504.05058
#371

Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models

Lynn Chua, Badih Ghazi, Yangsibo Huang et al.

COLM 2025paperarXiv:2406.16135
#372

FineWeb2: One Pipeline to Scale Them All — Adapting Pre-Training Data Processing to Every Language

Guilherme Penedo, Hynek Kydlíček, Vinko Sabolčec et al.

COLM 2025paperarXiv:2506.20920
#373

Overcoming Vocabulary Constraints with Pixel-level Fallback

Jonas F. Lotz, Hendra Setiawan, Stephan Peitz et al.

COLM 2025paperarXiv:2504.02122
#374

Breaking the Data Barrier -- Building GUI Agents Through Task Generalization

Junlei Zhang, Zichen Ding, Chang Ma et al.

COLM 2025paperarXiv:2504.10127
#375

Spike No More: Stabilizing the Pre-training of Large Language Models

Sho Takase, Shun Kiyono, Sosuke Kobayashi et al.

COLM 2025paperarXiv:2312.16903
#376

CultureCLIP: Empowering CLIP with Cultural Awareness through Synthetic Images and Contextualized Captions

Yuchen Huang, Zhiyuan Fan, Zhitao He et al.

COLM 2025paperarXiv:2507.06210
#377

VisualTrap: A Stealthy Backdoor Attack on GUI Agents via Visual Grounding Manipulation

Ziang Ye, Yang Zhang, Wentao Shi et al.

COLM 2025paperarXiv:2507.06899
#378

Learning Effective Language Representations for Sequential Recommendation via Joint Embedding Predictive Architecture

Nguyen Anh Minh, Dung D. Le

COLM 2025paperarXiv:2504.10512
#379

Reinforcement Learning Enhanced Full-Duplex Spoken Dialogue Language Models for Conversational Interactions

Chen Chen, Ke Hu, Chao-Han Huck Yang et al.

COLM 2025paper
#380

When Does Metadata Conditioning (NOT) Work for Language Model Pre-Training? A Study with Context-Free Grammars

Rei Higuchi, Ryotaro Kawata, Naoki Nishikawa et al.

COLM 2025paperarXiv:2504.17562
#381

Backdoor Attacks on Dense Retrieval via Public and Unintentional Triggers

Quanyu Long, Yue Deng, Leilei Gan et al.

COLM 2025paperarXiv:2402.13532
#382

Layers at Similar Depths Generate Similar Activations Across LLM Architectures

Christopher Wolfram, Aaron Schein

COLM 2025paperarXiv:2504.08775
#383

Jigsaw Puzzles: Splitting Harmful Questions to Jailbreak Large Language Models in Multi-turn Interactions

Hao Yang, Lizhen Qu, Ehsan Shareghi et al.

COLM 2025paper
#384

Rerouting LLM Routers

Avital Shafran, Roei Schuster, Tom Ristenpart et al.

COLM 2025paperarXiv:2501.01818
#385

Collab-RAG: Boosting Retrieval-Augmented Generation for Complex Question Answering via White-Box and Black-Box LLM Collaboration

Ran Xu, Wenqi Shi, Yuchen Zhuang et al.

COLM 2025paperarXiv:2504.04915
#386

CoLa: Learning to Interactively Collaborate with Large Language Models

Abhishek Sharma, Dan Goldwasser

COLM 2025paperarXiv:2504.02965
#387

Understanding R1-Zero-Like Training: A Critical Perspective

Zichen Liu, Changyu Chen, Wenjun Li et al.

COLM 2025paperarXiv:2503.20783
#388

SlimMoE: Structured Compression of Large MoE Models via Expert Slimming and Distillation

Zichong Li, Chen Liang, Zixuan Zhang et al.

COLM 2025paperarXiv:2506.18349
#389

SEAM: Semantically Equivalent Across Modalities Benchmark for Vision-Language Models

Zhenwei Tang, Difan Jiao, Blair Yang et al.

COLM 2025paperarXiv:2508.18179
#390

VideoSAVi: Self-Aligned Video Language Models without Human Supervision

Yogesh Kulkarni, Pooyan Fazli

COLM 2025paperarXiv:2412.00624
#391

Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents

Saaket Agashe, Kyle Wong, Vincent Tu et al.

COLM 2025paperarXiv:2504.00906
#392

How Post-Training Reshapes LLMs: A Mechanistic View on Knowledge, Truthfulness, Refusal, and Confidence

Hongzhe Du, Weikai Li, Min Cai et al.

COLM 2025paperarXiv:2504.02904
#393

Implicit In-Context Learning: Evidence from Artificial Language Experiments

Xiaomeng Ma, Qihui Xu

COLM 2025paperarXiv:2503.24190
#394

To Backtrack or Not to Backtrack: When Sequential Search Limits Model Reasoning

Tian Qin, David Alvarez-Melis, Samy Jelassi et al.

COLM 2025paperarXiv:2504.07052
#395

Sharpe Ratio-Guided Active Learning for Preference Optimization in RLHF

Syrine Belakaria, Joshua Kazdan, Charles Marx et al.

COLM 2025paperarXiv:2503.22137
#396

SentenceKV: Efficient LLM Inference via Sentence-Level Semantic KV Caching

Yuxuan Zhu, Ali Falahati, David H. Yang et al.

COLM 2025paperarXiv:2504.00970
#397

Defending LLM Watermarking Against Spoofing Attacks with Contrastive Representation Learning

Li An, Yujian Liu, Yepeng Liu et al.

COLM 2025paperarXiv:2504.06575
#398

Do Language Models Agree with Human Perceptions of Suspense in Stories?

Glenn Matlin, Devin Zhang, Rodrigo Barroso Loza et al.

COLM 2025paperarXiv:2508.15794
#399

Learning by Teaching: Engaging Students as Instructors of Large Language Models in Computer Science Education

Xinming Yang, Haasil Pujara, Jun Li

COLM 2025paperarXiv:2508.05979
#400

CALLME: Call Graph Augmentation with Large Language Models for Javascript

Michael Wang, Kexin Pei, Armando Solar-Lezama

COLM 2025paper