Most Cited COLM "multi-objective decision-making" Papers

418 papers found • Page 2 of 3

#201

Overflow Prevention Enhances Long-Context Recurrent LLMs

Assaf Ben-Kish, Itamar Zimerman, Muhammad Jehanzeb Mirza et al.

COLM 2025paper
#202

KVSink: Understanding and Enhancing the Preservation of Attention Sinks in KV Cache Quantization for LLMs

Zunhai Su, Kehong Yuan

COLM 2025paper
#203

PredGen: Accelerated Inference of Large Language Models through Input-Time Speculation for Real-Time Speech Interaction

Shufan Li, Aditya Grover

COLM 2025paper
#204

Discovering Knowledge Deficiencies of Language Models on Massive Knowledge Base

Linxin Song, Xuwei Ding, Jieyu Zhang et al.

COLM 2025paper
#205

Assessing Judging Bias in Large Reasoning Models: An Empirical Study

Qian Wang, Zhanzhi Lou, Zhenheng Tang et al.

COLM 2025paper
#206

Can a Crow Hatch a Falcon? Lineage Matters in Predicting Large Language Model Performance

Takuya Tamura, Taro Yano, Masafumi Enomoto et al.

COLM 2025paperarXiv:2504.19811
#207

E$^2$-RAG: Towards Editable Efficient RAG by Editing Compressed KV Caches

Tongxu Luo, Wenyu Du, HanWen Hao et al.

COLM 2025paper
#208

Fleurs-SLU: A Massively Multilingual Benchmark for Spoken Language Understanding

Fabian David Schmidt, Ivan Vulić, Goran Glavaš et al.

COLM 2025paperarXiv:2501.06117
#209

NoWag: A Unified Framework for Shape Preserving Com- pression of Large Language Models

Lawrence Ray Liu, Inesh Chakrabarti, Yixiao Li et al.

COLM 2025paper
#210

Evaluating Large Language Models as Expert Annotators

Yu-Min Tseng, Wei-Lin Chen, Chung-Chi Chen et al.

COLM 2025paperarXiv:2508.07827
#211

Yourbench: Dynamic Evaluation Set Generation with LLMs

Sumuk Shashidhar, Clémentine Fourrier, Alina Lozovskaya et al.

COLM 2025paper
#212

LawFlow: Collecting and Simulating Lawyers’ Thought Processes on Business Formation Case Studies

Debarati Das, Khanh Chi Le, Ritik Sachin Parkar et al.

COLM 2025paper
#213

Traceable and Explainable Multimodal Large Language Models: An Information-Theoretic View

Zihan Huang, Junda Wu, Rohan Surana et al.

COLM 2025paper
#214

Understanding and Improving Noisy Embedding Techniques in Instruction Finetuning

Abhay Yadav

COLM 2025paper
#215

REFA: Reference Free Alignment with Fine-Grained Length Control

Taneesh Gupta, Rahul Madhavan, Xuchao Zhang et al.

COLM 2025paper
#216

Hyperparameter Loss Surfaces Are Simple Near their Optima

Nicholas Lourie, He He, Kyunghyun Cho

COLM 2025paper
#217

From Next-Token to Mathematics: The Learning Dynamics of Mathematical Reasoning in Language Models

Shubhra Mishra, Gabriel Poesia, Noah Goodman

COLM 2025paperarXiv:2407.00900
#218

The Surprising Effectiveness of Membership Inference with Simple N-Gram Coverage

Skyler Hallinan, Jaehun Jung, Melanie Sclar et al.

COLM 2025paperarXiv:2508.09603
#219

Synthetic Data Generation and Multi-Step Reinforcement Learning for Reasoning and Tool Use

Anna Goldie, Azalia Mirhoseini, Hao Zhou et al.

COLM 2025paper
#220

MSRS: Evaluating Multi-Source Retrieval-Augmented Generation

Rohan Phanse, Ej Zhou, Kejian Shi et al.

COLM 2025paperarXiv:2508.20867
#221

Epistemic Alignment: A Mediating Framework for User-LLM Knowledge Delivery

Nicholas Clark, Hua Shen, Bill Howe et al.

COLM 2025paperarXiv:2504.01205
#222

PrefPalette: Personalized Preference Modeling with Latent Attributes

Shuyue Stella Li, Melanie Sclar, Hunter Lang et al.

COLM 2025paperarXiv:2507.13541
#223

X-Teaming: Multi-Turn Jailbreaks and Defenses with Adaptive Multi-Agents

Salman Rahman, Liwei Jiang, James Shiffer et al.

COLM 2025paperarXiv:2504.13203
#224

Know Me, Respond to Me: Benchmarking LLMs for Dynamic User Profiling and Personalized Responses at Scale

Bowen Jiang, Zhuoqun Hao, Young Min Cho et al.

COLM 2025paperarXiv:2504.14225
#225

Language models align with brain regions that represent concepts across modalities

Maria Ryskina, Greta Tuckute, Alexander Fung et al.

COLM 2025paperarXiv:2508.11536
#226

SUV: Scalable Large Language Model Copyright Compliance with Regularized Selective Unlearning

Tianyang Xu, Xiaoze Liu, Feijie Wu et al.

COLM 2025paperarXiv:2503.22948
#227

Can LLMs Handle WebShell Detection? Overcoming Detection Challenges with Behavioral Function-Aware Framework

Feijiang Han, Jiaming Zhang, Chuyi Deng et al.

COLM 2025paperarXiv:2504.13811
#228

LLM Unlearning Without an Expert Curated Dataset

Xiaoyuan Zhu, Muru Zhang, Ollie Liu et al.

COLM 2025paperarXiv:2508.06595
#229

Steering Large Language Model Activations in Sparse Spaces

Reza Bayat, Ali Rahimi-Kalahroudi, Mohammad Pezeshki et al.

COLM 2025paperarXiv:2503.00177
#230

Adaptive Computation Pruning for the Forgetting Transformer

Zhixuan Lin, Johan Obando-Ceron, Xu Owen He et al.

COLM 2025paperarXiv:2504.06949
#231

Pairwise or Pointwise? Evaluating Feedback Protocols for Bias in LLM-Based Evaluation

Tuhina Tripathi, Manya Wadhwa, Greg Durrett et al.

COLM 2025paperarXiv:2504.14716
#232

Estimating Optimal Context Length for Hybrid Retrieval-augmented Multi-document Summarization

Adithya Pratapa, Teruko Mitamura

COLM 2025paperarXiv:2504.12972
#233

Navigating the Rabbit Hole: Emergent Biases in LLM-Generated Attack Narratives Targeting Mental Health Groups

Rijul Magu, Arka Dutta, Sean Kim et al.

COLM 2025paperarXiv:2504.06160
#234

M²IV: Towards Efficient and Fine-grained Multimodal In-Context Learning via Representation Engineering

Yanshu Li, Yi Cao, Hongyang He et al.

COLM 2025paper
#235

BiXSE: Improving Dense Retrieval via Probabilistic Graded Relevance Distillation

Christos Tsirigotis, Vaibhav Adlakha, Joao Monteiro et al.

COLM 2025paperarXiv:2508.06781
#236

Stop-Think-AutoRegress: Language Modeling with Latent Diffusion Planning

Justin Lovelace, Christian K Belardi, Sofian Zalouk et al.

COLM 2025paper
#237

In-Context Occam’s Razor: How Transformers Prefer Simpler Hypotheses on the Fly

Puneesh Deora, Bhavya Vasudeva, Tina Behnia et al.

COLM 2025paper
#238

Reasoning Models Know When They’re Right: Probing Hidden States for Self-Verification

Anqi Zhang, Yulin Chen, Jane Pan et al.

COLM 2025paper
#239

The Negation Bias in Large Language Models: Investigating bias reflected in linguistic markers

Yishan Wang, Pia Sommerauer, Jelke Bloem

COLM 2025paper
#240

Language Agents Mirror Human Causal Reasoning Biases. How Can We Help Them Think Like Scientists?

Anthony GX-Chen, Dongyan Lin, Mandana Samiei et al.

COLM 2025paper
#241

Finding Flawed Fictions: Evaluating Complex Reasoning in Language Models via Plot Hole Detection

Kabir Ahuja, Melanie Sclar, Yulia Tsvetkov

COLM 2025paperarXiv:2504.11900
#242

Hell or High Water: Evaluating Agentic Recovery from External Failures

Andrew Wang, Sophia Hager, Adi Asija et al.

COLM 2025paperarXiv:2508.11027
#243

A Taxonomy of Transcendence

Natalie Abreu, Edwin Zhang, Eran Malach et al.

COLM 2025paperarXiv:2508.17669
#244

Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers

Shalev Lifshitz, Sheila A. McIlraith, Yilun Du

COLM 2025paperarXiv:2502.20379
#245

Retrieval-Augmented Generation with Conflicting Evidence

Han Wang, Archiki Prasad, Elias Stengel-Eskin et al.

COLM 2025paperarXiv:2504.13079
#246

Recycling the Web: A Method to Enhance Pre-training Data Quality and Quantity for Language Models

Thao Nguyen, Yang Li, Olga Golovneva et al.

COLM 2025paperarXiv:2506.04689
#247

Impact of LLM Alignment on Impression Formation in Social Interactions

Ala N. Tak, Anahita Bolourani, Daniel B. Shank et al.

COLM 2025paper
#248

MixAssist: An Audio-Language Dataset for Co-Creative AI Assistance in Music Mixing

Michael Paul Clemens, Ana Marasovic

COLM 2025paperarXiv:2507.06329
#249

Breakpoint: Stress-testing systems-level reasoning in LLM agents

Kaivalya Hariharan, Uzay Girit, Zifan Wang et al.

COLM 2025paper
#250

Rhapsody: A Dataset for Highlight Detection in Podcasts

Younghan Park, Anuj Diwan, David Harwath et al.

COLM 2025paperarXiv:2505.19429
#251

M-Prometheus: A Suite of Open Multilingual LLM Judges

José Pombal, Dongkeun Yoon, Patrick Fernandes et al.

COLM 2025paperarXiv:2504.04953
#252

Task Vectors in In-Context Learning: Emergence, Formation, and Benefits

Liu Yang, Ziqian Lin, Kangwook Lee et al.

COLM 2025paperarXiv:2501.09240
#253

Can Performant LLMs Be Ethical? Quantifying the Impact of Web Crawling Opt-Outs

Dongyang Fan, Vinko Sabolčec, Matin Ansaripour et al.

COLM 2025paper
#254

Rethinking Associative Memory Mechanism in Induction Head

Shuo Wang, Issei Sato

COLM 2025paper
#255

Stuffed Mamba: Oversized States Lead to the Inability to Forget

Yingfa Chen, Xinrong Zhang, Shengding Hu et al.

COLM 2025paper
#256

Fluid Language Model Benchmarking

Valentin Hofmann, David Heineman, Ian Magnusson et al.

COLM 2025paperarXiv:2509.11106
#257

Data-Centric Human Preference with Rationales for Direct Preference Alignment

Hoang Anh Just, Ming Jin, Anit Kumar Sahu et al.

COLM 2025paperarXiv:2407.14477
#258

DynaSaur: Large Language Agents Beyond Predefined Actions

Dang Nguyen, Viet Dac Lai, Seunghyun Yoon et al.

COLM 2025paperarXiv:2411.01747
#259

PolyGuard: A Multilingual Safety Moderation Tool for 17 Languages

Priyanshu Kumar, Devansh Jain, Akhila Yerukola et al.

COLM 2025paperarXiv:2504.04377
#260

Advancing Language Multi-Agent Learning with Credit Re-Assignment for Interactive Environment Generalization

Zhitao He, Zijun Liu, Peng Li et al.

COLM 2025paperarXiv:2502.14496
#261

Partial Perspectives: How LLMs Handle Logically Inconsistent Knowledge in Reasoning Tasks

Zichao Li, Ines Arous, Jackie CK Cheung

COLM 2025paper
#262

EvalAgents: Discovering Implicit Evaluation Criteria from the Web

Manya Wadhwa, Zayne Rea Sprague, Chaitanya Malaviya et al.

COLM 2025paperarXiv:2504.15219
#263

Cats Confuse Reasoning LLM: Query Agnostic Adversarial Triggers for Reasoning Models

Meghana Arakkal Rajeev, Rajkumar Ramamurthy, Prapti Trivedi et al.

COLM 2025paperarXiv:2503.01781
#264

ALFA: Aligning LLMs to Ask Good Questions A Case Study in Clinical Reasoning

Shuyue Stella Li, Jimin Mun, Faeze Brahman et al.

COLM 2025paperarXiv:2502.14860
#265

Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models

Ruikang Liu, Yuxuan Sun, Manyi Zhang et al.

COLM 2025paperarXiv:2504.04823
#266

CRUST-Bench: A Comprehensive Benchmark for C-to-safe-Rust Transpilation

Anirudh Khatry, Robert Zhang, Jia Pan et al.

COLM 2025paperarXiv:2504.15254
#267

On Mechanistic Circuits for Extractive Question-Answering

Samyadeep Basu, Vlad I Morariu, Ryan A. Rossi et al.

COLM 2025paperarXiv:2502.08059
#268

Boosting LLM Reasoning via Spontaneous Self-Correction

Xutong Zhao, Tengyu Xu, Xuewei Wang et al.

COLM 2025paperarXiv:2506.06923
#269

REM: Evaluating LLM Embodied Spatial Reasoning through Multi-Frame Trajectories

Jacob Thompson, Emiliano Garcia-Lopez, Yonatan Bisk

COLM 2025paperarXiv:2512.00736
#270

GenerationPrograms: Fine-grained Attribution with Executable Programs

David Wan, Eran Hirsch, Elias Stengel-Eskin et al.

COLM 2025paperarXiv:2506.14580
#271

Multi-Agent Retrieval-Augmented Framework for Evidence-Based Counterspeech Against Health Misinformation

Anirban Saha Anik, Xiaoying Song, Elliott Wang et al.

COLM 2025paperarXiv:2507.07307
#272

Task-Circuit Quantization: Leveraging Knowledge Localization and Interpretability for Compression

Hanqi Xiao, Yi-Lin Sung, Elias Stengel-Eskin et al.

COLM 2025paperarXiv:2504.07389
#273

Not All Data Are Unlearned Equally

Aravind Krishnan, Siva Reddy, Marius Mosbach

COLM 2025paperarXiv:2504.05058
#274

Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models

Lynn Chua, Badih Ghazi, Yangsibo Huang et al.

COLM 2025paperarXiv:2406.16135
#275

FineWeb2: One Pipeline to Scale Them All — Adapting Pre-Training Data Processing to Every Language

Guilherme Penedo, Hynek Kydlíček, Vinko Sabolčec et al.

COLM 2025paperarXiv:2506.20920
#276

Overcoming Vocabulary Constraints with Pixel-level Fallback

Jonas F. Lotz, Hendra Setiawan, Stephan Peitz et al.

COLM 2025paperarXiv:2504.02122
#277

Breaking the Data Barrier -- Building GUI Agents Through Task Generalization

Junlei Zhang, Zichen Ding, Chang Ma et al.

COLM 2025paperarXiv:2504.10127
#278

Spike No More: Stabilizing the Pre-training of Large Language Models

Sho Takase, Shun Kiyono, Sosuke Kobayashi et al.

COLM 2025paperarXiv:2312.16903
#279

CultureCLIP: Empowering CLIP with Cultural Awareness through Synthetic Images and Contextualized Captions

Yuchen Huang, Zhiyuan Fan, Zhitao He et al.

COLM 2025paperarXiv:2507.06210
#280

VisualTrap: A Stealthy Backdoor Attack on GUI Agents via Visual Grounding Manipulation

Ziang Ye, Yang Zhang, Wentao Shi et al.

COLM 2025paperarXiv:2507.06899
#281

Learning Effective Language Representations for Sequential Recommendation via Joint Embedding Predictive Architecture

Nguyen Anh Minh, Dung D. Le

COLM 2025paperarXiv:2504.10512
#282

Reinforcement Learning Enhanced Full-Duplex Spoken Dialogue Language Models for Conversational Interactions

Chen Chen, Ke Hu, Chao-Han Huck Yang et al.

COLM 2025paper
#283

When Does Metadata Conditioning (NOT) Work for Language Model Pre-Training? A Study with Context-Free Grammars

Rei Higuchi, Ryotaro Kawata, Naoki Nishikawa et al.

COLM 2025paperarXiv:2504.17562
#284

Backdoor Attacks on Dense Retrieval via Public and Unintentional Triggers

Quanyu Long, Yue Deng, Leilei Gan et al.

COLM 2025paperarXiv:2402.13532
#285

Layers at Similar Depths Generate Similar Activations Across LLM Architectures

Christopher Wolfram, Aaron Schein

COLM 2025paperarXiv:2504.08775
#286

Jigsaw Puzzles: Splitting Harmful Questions to Jailbreak Large Language Models in Multi-turn Interactions

Hao Yang, Lizhen Qu, Ehsan Shareghi et al.

COLM 2025paper
#287

Rerouting LLM Routers

Avital Shafran, Roei Schuster, Tom Ristenpart et al.

COLM 2025paperarXiv:2501.01818
#288

Collab-RAG: Boosting Retrieval-Augmented Generation for Complex Question Answering via White-Box and Black-Box LLM Collaboration

Ran Xu, Wenqi Shi, Yuchen Zhuang et al.

COLM 2025paperarXiv:2504.04915
#289

CoLa: Learning to Interactively Collaborate with Large Language Models

Abhishek Sharma, Dan Goldwasser

COLM 2025paperarXiv:2504.02965
#290

Understanding R1-Zero-Like Training: A Critical Perspective

Zichen Liu, Changyu Chen, Wenjun Li et al.

COLM 2025paperarXiv:2503.20783
#291

SlimMoE: Structured Compression of Large MoE Models via Expert Slimming and Distillation

Zichong Li, Chen Liang, Zixuan Zhang et al.

COLM 2025paperarXiv:2506.18349
#292

SEAM: Semantically Equivalent Across Modalities Benchmark for Vision-Language Models

Zhenwei Tang, Difan Jiao, Blair Yang et al.

COLM 2025paperarXiv:2508.18179
#293

VideoSAVi: Self-Aligned Video Language Models without Human Supervision

Yogesh Kulkarni, Pooyan Fazli

COLM 2025paperarXiv:2412.00624
#294

Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents

Saaket Agashe, Kyle Wong, Vincent Tu et al.

COLM 2025paperarXiv:2504.00906
#295

How Post-Training Reshapes LLMs: A Mechanistic View on Knowledge, Truthfulness, Refusal, and Confidence

Hongzhe Du, Weikai Li, Min Cai et al.

COLM 2025paperarXiv:2504.02904
#296

Implicit In-Context Learning: Evidence from Artificial Language Experiments

Xiaomeng Ma, Qihui Xu

COLM 2025paperarXiv:2503.24190
#297

To Backtrack or Not to Backtrack: When Sequential Search Limits Model Reasoning

Tian Qin, David Alvarez-Melis, Samy Jelassi et al.

COLM 2025paperarXiv:2504.07052
#298

Sharpe Ratio-Guided Active Learning for Preference Optimization in RLHF

Syrine Belakaria, Joshua Kazdan, Charles Marx et al.

COLM 2025paperarXiv:2503.22137
#299

SentenceKV: Efficient LLM Inference via Sentence-Level Semantic KV Caching

Yuxuan Zhu, Ali Falahati, David H. Yang et al.

COLM 2025paperarXiv:2504.00970
#300

Defending LLM Watermarking Against Spoofing Attacks with Contrastive Representation Learning

Li An, Yujian Liu, Yepeng Liu et al.

COLM 2025paperarXiv:2504.06575
#301

Do Language Models Agree with Human Perceptions of Suspense in Stories?

Glenn Matlin, Devin Zhang, Rodrigo Barroso Loza et al.

COLM 2025paperarXiv:2508.15794
#302

Learning by Teaching: Engaging Students as Instructors of Large Language Models in Computer Science Education

Xinming Yang, Haasil Pujara, Jun Li

COLM 2025paperarXiv:2508.05979
#303

CALLME: Call Graph Augmentation with Large Language Models for Javascript

Michael Wang, Kexin Pei, Armando Solar-Lezama

COLM 2025paper
#304

CITER: Collaborative Inference for Efficient Large Language Model Decoding with Token-Level Routing

Wenhao Zheng, Yixiao Chen, Weitong Zhang et al.

COLM 2025paperarXiv:2502.01976
#305

Approximating Language Model Training Data from Weights

John Xavier Morris, Junjie Oscar Yin, Woojeong Kim et al.

COLM 2025paperarXiv:2506.15553
#306

Brains vs. Bytes: Evaluating LLM Proficiency in Olympiad Mathematics

Hamed Mahdavi, Alireza Hashemi, Majid Daliri et al.

COLM 2025paperarXiv:2504.01995
#307

Hardware-Efficient Attention for Fast Decoding

Ted Zadouri, Hubert Strauss, Tri Dao

COLM 2025paperarXiv:2505.21487
#308

Evaluating and Designing Sparse Autoencoders by Approximating Quasi-Orthogonality

Sewoong Lee, Adam Davies, Marc E. Canby et al.

COLM 2025paperarXiv:2503.24277
#309

In-context Ranking Preference Optimization

Junda Wu, Rohan Surana, Zhouhang Xie et al.

COLM 2025paperarXiv:2504.15477
#310

Arctic-Embed 2.0: Multilingual Retrieval Without Compromise

Puxuan Yu, Luke Merrick, Gaurav Nuti et al.

COLM 2025paperarXiv:2412.04506
#311

Multilingual Contextualization of Large Language Models for Document-Level Machine Translation

Miguel Moura Ramos, Patrick Fernandes, Sweta Agrawal et al.

COLM 2025paperarXiv:2504.12140
#312

SlowFast-LLaVA-1.5: A Family of Token-Efficient Video Large Language Models for Long-Form Video Understanding

Mingze Xu, Mingfei Gao, Shiyu Li et al.

COLM 2025paperarXiv:2503.18943
#313

DEL: Context-Aware Dynamic Exit Layer for Efficient Self-Speculative Decoding

Hossein Entezari Zarch, Lei Gao, Chaoyi Jiang et al.

COLM 2025paperarXiv:2504.05598
#314

Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources

Weizhi Wang, Yu Tian, Linjie Yang et al.

COLM 2025paperarXiv:2504.00595
#315

CRABS: A syntactic-semantic pincer strategy for bounding LLM interpretation of Python notebooks

Meng Li, Timothy M. McPhillips, Dingmin Wang et al.

COLM 2025paperarXiv:2507.11742
#316

2 OLMo 2 Furious (COLM’s Version)

Evan Pete Walsh, Luca Soldaini, Dirk Groeneveld et al.

COLM 2025paper
#317

Algorithm Discovery With LLMs: Evolutionary Search Meets Reinforcement Learning

Anja Šurina, Amin Mansouri, Lars C.P.M. Quaedvlieg et al.

COLM 2025paperarXiv:2504.05108
#318

MS-SSM: A Multi-Scale State Space Model for Efficient Sequence Modeling

Mahdi Karami, Ali Behrouz, Peilin Zhong et al.

COLM 2025paperarXiv:2512.23824
#319

IterKey: Iterative Keyword Generation with LLMs for Enhanced Retrieval Augmented Generation

Kazuki Hayashi, Hidetaka Kamigaito, Shinya Kouda et al.

COLM 2025paperarXiv:2505.08450
#320

Evaluating the Diversity and Quality of LLM Generated Content

Alexander Shypula, Shuo Li, Botong Zhang et al.

COLM 2025paperarXiv:2504.12522
#321

QUDsim: Quantifying Discourse Similarities in LLM-Generated Text

Ramya Namuduri, Yating Wu, Anshun Asher Zheng et al.

COLM 2025paperarXiv:2504.09373
#322

A Critical Look At Tokenwise Reward-Guided Text Generation

Ahmad Rashid, Ruotian Wu, Julia Grosse et al.

COLM 2025paperarXiv:2406.07780
#323

Vision-Language Models Are Not Pragmatically Competent in Referring Expression Generation

Ziqiao Ma, Jing Ding, Xuejun Zhang et al.

COLM 2025paperarXiv:2504.16060
#324

URANIA: Differentially Private Insights into AI Use

Daogao Liu, Edith Cohen, Badih Ghazi et al.

COLM 2025paperarXiv:2506.04681
#325

RepoST: Scalable Repository-Level Coding Environment Construction with Sandbox Testing

Yiqing Xie, Alex Xie, Divyanshu Sheth et al.

COLM 2025paperarXiv:2503.07358
#326

Weak-for-Strong: Training Weak Meta-Agent to Harness Strong Executors

Fan Nie, Lan Feng, Haotian Ye et al.

COLM 2025paperarXiv:2504.04785
#327

Guided Reasoning in LLM-Driven Penetration Testing Using Structured Attack Trees

Katsuaki Nakano, Reza Fayyazi, Shanchieh Yang et al.

COLM 2025paperarXiv:2509.07939
#328

AdaptMI: Adaptive Skill-based In-context Math Instructions for Small Language Models

Yinghui He, Abhishek Panigrahi, Yong Lin et al.

COLM 2025paperarXiv:2505.00147
#329

Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill?

Chenrui Fan, Ming Li, Lichao Sun et al.

COLM 2025paperarXiv:2504.06514
#330

OpinioRAG: Towards Generating User-Centric Opinion Highlights from Large-scale Online Reviews

Mir Tafseer Nayeem, Davood Rafiei

COLM 2025paperarXiv:2509.00285
#331

Learning to Generate Unit Tests for Automated Debugging

Archiki Prasad, Elias Stengel-Eskin, Justin Chen et al.

COLM 2025paperarXiv:2502.01619
#332

LLM Unlearning Reveals a Stronger-Than-Expected Coreset Effect in Current Benchmarks

Soumyadeep Pal, Changsheng Wang, James Diffenderfer et al.

COLM 2025paperarXiv:2504.10185
#333

Prompt-Reverse Inconsistency: LLM Self-Inconsistency Beyond Generative Randomness and Prompt Paraphrasing

Jihyun Janice Ahn, Wenpeng Yin

COLM 2025paperarXiv:2504.01282
#334

Positional Biases Shift as Inputs Approach Context Window Limits

Blerta Veseli, Julian Chibane, Mariya Toneva et al.

COLM 2025paperarXiv:2508.07479
#335

ADAPT: Actively Discovering and Adapting to Preferences for any Task

Maithili Patel, Xavier Puig, Ruta Desai et al.

COLM 2025paperarXiv:2504.04040
#336

Streaming DiLoCo with overlapping communication

Arthur Douillard, Yani Donchev, J Keith Rush et al.

COLM 2025paperarXiv:2501.18512
#337

MuSeD: A Multimodal Spanish Dataset for Sexism Detection in Social Media Videos

Laura De Grazia, Pol Pastells, Mauro Vázquez Chas et al.

COLM 2025paperarXiv:2504.11169
#338

EvidenceBench: A Benchmark for Extracting Evidence from Biomedical Papers

Jianyou Wang, Weili Cao, Kaicheng Wang et al.

COLM 2025paperarXiv:2504.18736
#339

An Illusion of Progress? Assessing the Current State of Web Agents

Tianci Xue, Weijian Qi, Tianneng Shi et al.

COLM 2025paperarXiv:2504.01382
#340

Boundless Byte Pair Encoding: Breaking the Pre-tokenization Barrier

Craig W Schmidt, Varshini Reddy, Chris Tanner et al.

COLM 2025paperarXiv:2504.00178
#341

Multi-Agent Systems Execute Arbitrary Malicious Code

Harold Triedman, Rishi Dev Jha, Vitaly Shmatikov

COLM 2025paperarXiv:2503.12188
#342

Privately Learning from Graphs with Applications in Fine-tuning Large Language Models

Haoteng Yin, Rongzhe Wei, Eli Chien et al.

COLM 2025paperarXiv:2410.08299
#343

Evaluating LLMs on Chinese Idiom Translation

Cai Yang, Yao Dou, David Heineman et al.

COLM 2025paperarXiv:2508.10421
#344

Rethinking Safety in LLM Fine-tuning: An Optimization Perspective

Minseon Kim, Jin Myung Kwak, Lama Alssum et al.

COLM 2025paperarXiv:2508.12531
#345

Do LLMs Understand Your Translations? Evaluating Paragraph-level MT with Question Answering

Patrick Fernandes, Sweta Agrawal, Emmanouil Zaranis et al.

COLM 2025paperarXiv:2504.07583
#346

Law of Vision Representation in MLLMs

Shijia Yang, Bohan Zhai, Quanzeng You et al.

COLM 2025paperarXiv:2408.16357
#347

Exploring Sparse Adapters for Scalable Merging of Parameter Efficient Experts

Samin Yeasar Arnob, Zhan Su, Minseon Kim et al.

COLM 2025paperarXiv:2507.07140
#348

Towards Compute-Optimal Many-Shot In-Context Learning

Shahriar Golchin, Yanfei Chen, Rujun Han et al.

COLM 2025paperarXiv:2507.16217
#349

LeakAgent: RL-based Red-teaming Agent for LLM Privacy Leakage

Yuzhou Nie, Zhun Wang, Ye Yu et al.

COLM 2025paperarXiv:2412.05734
#350

ReasonIR: Training Retrievers for Reasoning Tasks

Rulin Shao, Rui Qiao, Varsha Kishore et al.

COLM 2025paperarXiv:2504.20595
#351

Déjà Vu: Multilingual LLM Evaluation through the Lens of Machine Translation Evaluation

Julia Kreutzer, Eleftheria Briakou, Sweta Agrawal et al.

COLM 2025paperarXiv:2504.11829
#352

Self-Rewarding PPO: Aligning Large Language Models with Demonstrations Only

Qingru Zhang, Liang Qiu, Ilgee Hong et al.

COLM 2025paperarXiv:2510.21090
#353

Self-Steering Language Models

Gabriel Grand, Joshua B. Tenenbaum, Vikash Mansinghka et al.

COLM 2025paperarXiv:2504.07081
#354

The Delta Learning Hypothesis: Preference Tuning on Weak Data can Yield Strong Gains

Scott Geng, Hamish Ivison, Chun-Liang Li et al.

COLM 2025paperarXiv:2507.06187
#355

ParaPO: Aligning Language Models to Reduce Verbatim Reproduction of Pre-training Data

Tong Chen, Faeze Brahman, Jiacheng Liu et al.

COLM 2025paperarXiv:2504.14452
#356

Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate

Yubo Wang, Xiang Yue, Wenhu Chen

COLM 2025paperarXiv:2501.17703
#357

Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving

Yong Lin, Shange Tang, Bohan Lyu et al.

COLM 2025paperarXiv:2502.07640
#358

Fast Controlled Generation from Language Models with Adaptive Weighted Rejection Sampling

Ben Lipkin, Benjamin LeBrun, Jacob Hoover Vigly et al.

COLM 2025paperarXiv:2504.05410
#359

On the Effectiveness and Generalization of Race Representations for Debiasing High-Stakes Decisions

Dang Nguyen, Chenhao Tan

COLM 2025paperarXiv:2504.06303
#360

Quantifying Fairness in LLMs Beyond Tokens: A Semantic and Statistical Perspective

Weijie Xu, Yiwen Wang, Chi Xue et al.

COLM 2025paperarXiv:2506.19028
#361

Improving Fisher Information Estimation and Efficiency for LoRA-based LLM Unlearning

Yejin Kim, Eunwon Kim, Buru Chang et al.

COLM 2025paperarXiv:2508.21300
#362

Multi-Token Attention

Olga Golovneva, Tianlu Wang, Jason E Weston et al.

COLM 2025paperarXiv:2504.00927
#363

From Queries to Criteria: Understanding How Astronomers Evaluate LLMs

Alina Hyk, Kiera McCormick, Mian Zhong et al.

COLM 2025paperarXiv:2507.15715
#364

Analyzing Multilingualism in Large Language Models with Sparse Autoencoders

Ikhyun Cho, Julia Hockenmaier

COLM 2025paper
#365

CrossWordBench: Evaluating the Reasoning Capabilities of LLMs and LVLMs with Controllable Puzzle Generation

Jixuan Leng, Chengsong Huang, Langlin Huang et al.

COLM 2025paperarXiv:2504.00043
#366

Unifying Autoregressive and Diffusion-Based Sequence Generation

Nima Fathi, Torsten Scholak, Pierre-Andre Noel

COLM 2025paperarXiv:2504.06416
#367

Control the Temperature: Selective Sampling for Diverse and High-Quality LLM Outputs

Sergey Troshin, Wafaa Mohammed, Yan Meng et al.

COLM 2025paperarXiv:2510.01218
#368

UTF-8 Plumbing: Byte-level Tokenizers Unavoidably Enable LLMs to Generate Ill-formed UTF-8

Preston Firestone, Shubham Ugare, Gagandeep Singh et al.

COLM 2025paperarXiv:2511.05578
#369

RADLADS: Rapid Attention Distillation to Linear Attention Decoders at Scale

Daniel Goldstein, Eric Alcaide, Janna Lu et al.

COLM 2025paperarXiv:2505.03005
#370

Learning Adaptive Parallel Reasoning with Language Models

Jiayi Pan, Xiuyu Li, Long Lian et al.

COLM 2025paperarXiv:2504.15466
#371

Why do LLMs attend to the first token?

Federico Barbero, Alvaro Arroyo, Xiangming Gu et al.

COLM 2025paperarXiv:2504.02732
#372

Overfill: Two-Stage Models for Efficient Language Model Decoding

Woojeong Kim, Junxiong Wang, Jing Nathan Yan et al.

COLM 2025paperarXiv:2508.08446
#373

CLIPPER: Compression enables long-context synthetic data generation

Chau Minh Pham, Yapei Chang, Mohit Iyyer

COLM 2025paperarXiv:2502.14854
#374

Society of Mind Meets Real-Time Strategy: A Hierarchical Multi-Agent Framework for Strategic Reasoning

Daechul Ahn, San Kim, Jonghyun Choi

COLM 2025paperarXiv:2508.06042
#375

A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility

Andreas Hochlehnert, Hardik Bhatnagar, Vishaal Udandarao et al.

COLM 2025paperarXiv:2504.07086
#376

SPIN-Bench: How Well Do LLMs Plan Strategically and Reason Socially?

Jianzhu Yao, Kevin Wang, Ryan Hsieh et al.

COLM 2025paperarXiv:2503.12349
#377

Teach Old SAEs New Domain Tricks with Boosting

Nikita Koriagin, Yaroslav Aksenov, Daniil Laptev et al.

COLM 2025paperarXiv:2507.12990
#378

Improving LLMs‘ Generalized Reasoning Abilities by Graph Problems

Qifan Zhang, Nuo Chen, Zehua Li et al.

COLM 2025paperarXiv:2507.17168
#379

Zero-shot Benchmarking: A Framework for Flexible and Scalable Automatic Evaluation of Language Models

José Pombal, Nuno M Guerreiro, Ricardo Rei et al.

COLM 2025paperarXiv:2504.01001
#380

Register Always Matters: Analysis of LLM Pretraining Data Through the Lens of Language Variation

Amanda Myntti, Erik Henriksson, Veronika Laippala et al.

COLM 2025paperarXiv:2504.01542
#381

Scoring Verifiers: Evaluating Synthetic Verification for Code and Reasoning

Aleksander Ficek, Somshubra Majumdar, Vahid Noroozi et al.

COLM 2025paperarXiv:2502.13820
#382

Cascade Reward Sampling for Efficient Decoding-Time Alignment

Bolian Li, Yifan Wang, Anamika Lochab et al.

COLM 2025paperarXiv:2406.16306
#383

Reverse-engineering NLI: A study of the meta-inferential properties of Natural Language Inference

Rasmus Blanck, Bill Noble, Stergios Chatzikyriakidis

COLM 2025paperarXiv:2601.05170
#384

Adversarial Training of Reward Models

Alexander Bukharin, Haifeng Qian, Shengyang Sun et al.

COLM 2025paperarXiv:2504.06141
#385

Planted in Pretraining, Swayed by Finetuning: A Case Study on the Origins of Cognitive Biases in LLMs

Itay Itzhak, Yonatan Belinkov, Gabriel Stanovsky

COLM 2025paperarXiv:2507.07186
#386

ALOPE: Adaptive Layer Optimization for Translation Quality Estimation using Large Language Models

Archchana Sindhujan, Shenbin Qian, Chan Chi Chun Matthew et al.

COLM 2025paperarXiv:2508.07484
#387

The Blessing and Curse of Dimensionality in Safety Alignment

Rachel S.Y. Teo, Laziz Abdullaev, Tan Minh Nguyen

COLM 2025paperarXiv:2507.20333
#388

AdaptiVocab: Enhancing LLM Efficiency in Focused Domains through Lightweight Vocabulary Adaptation

Itay Nakash, Nitay Calderon, Eyal Ben-David et al.

COLM 2025paperarXiv:2503.19693
#389

Enhancing LLM Reliability via Explicit Knowledge Boundary Modeling

Hang Zheng, Hongshen Xu, Yuncong Liu et al.

COLM 2025paperarXiv:2503.02233
#390

Effective Length Extrapolation via Dimension-Wise Positional Embeddings Manipulation

Yi Lu, Wanxu Zhao, Xin Zhou et al.

COLM 2025paperarXiv:2504.18857
#391

A Controlled Study on Long Context Extension and Generalization in LLMs

Yi Lu, Jing Nathan Yan, Songlin Yang et al.

COLM 2025paperarXiv:2409.12181
#392

Efficient Process Reward Model Training via Active Learning

Keyu Duan, Zichen Liu, Xin Mao et al.

COLM 2025paperarXiv:2504.10559
#393

FormaRL: Enhancing Autoformalization with no Labeled Data

Yanxing Huang, Xinling Jin, Sijie Liang et al.

COLM 2025paperarXiv:2508.18914
#394

ICQuant: Index Coding enables Low-bit LLM Quantization

Xinlin Li, Osama Hanna, Christina Fragouli et al.

COLM 2025paperarXiv:2505.00850
#395

MAC: A Live Benchmark for Multimodal Large Language Models in Scientific Understanding

Mohan Jiang, Jin Gao, Jiahao Zhan et al.

COLM 2025paperarXiv:2508.15802
#396

Interpreting the linear structure of vision-language model embedding spaces

Isabel Papadimitriou, Huangyuan Su, Thomas Fel et al.

COLM 2025paperarXiv:2504.11695
#397

LLM Can be a Dangerous Persuader: Empirical Study of Persuasion Safety in Large Language Models

Minqian Liu, Zhiyang Xu, Xinyi Zhang et al.

COLM 2025paperarXiv:2504.10430
#398

Establishing Task Scaling Laws via Compute-Efficient Model Ladders

Akshita Bhagia, Jiacheng Liu, Alexander Wettig et al.

COLM 2025paperarXiv:2412.04403
#399

Can A Society of Generative Agents Simulate Human Behavior and Inform Public Health Policy? A Case Study on Vaccine Hesitancy

Abe Bohan Hou, Hongru Du, Yichen Wang et al.

COLM 2025paperarXiv:2503.09639
#400

RARe: Retrieval Augmented Retrieval with In-Context Examples

Atula Tejaswi, Yoonsang Lee, sujay sanghavi et al.

COLM 2025paperarXiv:2410.20088