Most Cited COLM "hierarchical memory structure" Papers
418 papers found • Page 2 of 3
Teach Old SAEs New Domain Tricks with Boosting
Nikita Koriagin, Yaroslav Aksenov, Daniil Laptev et al.
SEAM: Semantically Equivalent Across Modalities Benchmark for Vision-Language Models
Zhenwei Tang, Difan Jiao, Blair Yang et al.
The Negation Bias in Large Language Models: Investigating bias reflected in linguistic markers
Yishan Wang, Pia Sommerauer, Jelke Bloem
Exploring Sparse Adapters for Scalable Merging of Parameter Efficient Experts
Samin Yeasar Arnob, Zhan Su, Minseon Kim et al.
UTF-8 Plumbing: Byte-level Tokenizers Unavoidably Enable LLMs to Generate Ill-formed UTF-8
Preston Firestone, Shubham Ugare, Gagandeep Singh et al.
Can Test-Time Scaling Improve World Foundation Model?
Wenyan Cong, Hanqing Zhu, Peihao Wang et al.
VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception of Geometric Information
Ryo Kamoi, Yusen Zhang, Sarkar Snigdha Sarathi Das et al.
Hypothesis-Driven Theory-of-Mind Reasoning for Large Language Models
Hyunwoo Kim, Melanie Sclar, Tan Zhi-Xuan et al.
DFRot: Achieving Outlier-Free and Massive Activation-Free for Rotated LLMs with Refined Rotation
Jingyang Xiang, Sai Qian Zhang
SmolVLM: Redefining small and efficient multimodal models
Andrés Marafioti, Orr Zohar, Miquel Farré et al.
Efficient Self-Improvement in Multimodal Large Language Models: A Model-Level Judge-Free Approach
Shijian Deng, Wentian Zhao, Yu-Jhe Li et al.
KVSink: Understanding and Enhancing the Preservation of Attention Sinks in KV Cache Quantization for LLMs
Zunhai Su, Kehong Yuan
Assessing Judging Bias in Large Reasoning Models: An Empirical Study
Qian Wang, Zhanzhi Lou, Zhenheng Tang et al.
PredGen: Accelerated Inference of Large Language Models through Input-Time Speculation for Real-Time Speech Interaction
Shufan Li, Aditya Grover
Readability ≠ Learnability: Rethinking the Role of Simplicity in Training Small Language Models
Ivan Lee, Taylor Berg-Kirkpatrick
QAPyramid: Fine-grained Evaluation of Content Selection for Text Summarization
Shiyue Zhang, David Wan, Arie Cattan et al.
Teaching Models to Understand (but not Generate) High-risk Data
Ryan Yixiang Wang, Matthew Finlayson, Luca Soldaini et al.
HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Interactive AI Agents
Xuhui Zhou, Hyunwoo Kim, Faeze Brahman et al.
LM Agents May Fail to Act on Their Own Risk Knowledge
Yuzhi Tang, Tianxiao Li, Elizabeth Li et al.
Beyond Blanket Masking: Examining Granularity for Privacy Protection in Images Captured by Blind and Low Vision Users
Jeffri Murrugarra-Llerena, Haoran Niu, K. Suzanne Barber et al.
When Splitting Makes Stronger: A Theoretical and Empirical Analysis of Divide-and-Conquer Prompting in LLMs
Yizhou Zhang, Defu Cao, Lun Du et al.
CodeXEmbed: A Generalist Embedding Model Family for Multilingual and Multi-task Code Retrieval
Ye Liu, Rui Meng, Shafiq Joty et al.
HIPPO-VIDEO: Simulating Watch Histories with Large Language Models for History-Driven Video Highlighting
Jeongeun Lee, Youngjae Yu, Dongha Lee
The Unlearning Mirage: A Dynamic Framework for Evaluating LLM Unlearning
Raj Sanjay Shah, Jing Huang, Keerthiram Murugesan et al.
G1yphD3c0de: Towards Safer Language Models on Visually Perturbed Texts
Yejin Choi, Yejin Yeo, Yejin Son et al.
NoveltyBench: Evaluating Language Models for Humanlike Diversity
Yiming Zhang, Harshita Diddee, Susan Holm et al.
RRO: LLM Agent Optimization Through Rising Reward Trajectories
Zilong Wang, Jingfeng Yang, Sreyashi Nag et al.
Don’t lie to your friends: Learning what you know from collaborative self-play
Jacob Eisenstein, Reza Aghajani, Adam Fisch et al.
How Multimodal LLMs Solve Image Tasks: A Lens on Visual Grounding, Task Reasoning, and Answer Decoding
Zhuoran Yu, Yong Jae Lee
Tulu 3: Pushing Frontiers in Open Language Model Post-Training
Nathan Lambert, Jacob Morrison, Valentina Pyatkin et al.
Customize Multi-modal RAI Guardrails with Precedent-based predictions
Cheng-Fu Yang, Thanh Tran, Christos Christodoulopoulos et al.
Can Large Language Models Integrate Spatial Data? Empirical Insights into Reasoning Strengths and Computational Weaknesses
Bin Han, Robert Wolfe, Anat Caspi et al.
SmolLM2: When Smol Goes Big — Data-Centric Training of a Fully Open Small Language Model
Loubna Ben Allal, Anton Lozhkov, Elie Bakouch et al.
More is Less: The Pitfalls of Multi-Model Synthetic Preference Data in DPO Safety Alignment
Yifan Wang, Runjin Chen, Bolian Li et al.
AIR: A Systematic Analysis of Annotations, Instructions, and Response Pairs in Preference Dataset
Bingxiang He, Wenbin Zhang, Jiaxi Song et al.
Self-Evolving Critique Abilities in Large Language Models
Zhengyang Tang, Ziniu Li, Zhenyang Xiao et al.
VaPR - Vision-language Preference alignment for Reasoning
Rohan Wadhawan, Fabrice Y Harel-Canada, Zi-Yi Dou et al.
FacTool: Factuality Detection in Generative AI -- A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios
Ethan Chern, Steffi Chern, Shiqi Chen et al.
Phased Training for LLM-powered Text Retrieval Models Beyond Data Scaling
Xin Zhang, Yanzhao Zhang, Wen Xie et al.
EnrichIndex: Using LLMs to Enrich Retrieval Indices Offline
Peter Baile Chen, Tomer Wolfson, Mike Cafarella et al.
Scalable Zeroth-Order Fine-Tuning for Extremely Large Language Models with Limited GPU Memory
Liangyu Wang, Jie Ren, Hang Xu et al.
Deep Binding of Language Model Virtual Personas: a Study on Approximating Political Partisan Misperceptions
Minwoo Kang, Suhong Moon, Seung Hyeong Lee et al.
Correctness-Guaranteed Code Generation via Constrained Decoding
Lingxiao Li, Salar Rahili, Yiwei Zhao
DualEdit: Dual Editing for Knowledge Updating in Vision-Language Models
Zhiyi Shi, Binjie Wang, Chongjie Si et al.
LongProc: Benchmarking Long-Context Language Models on Long Procedural Generation
Xi Ye, Fangcong Yin, Yinghui He et al.
Out-of-Distribution Detection using Synthetic Data Generation
Momin Abbas, Muneeza Azmat, Raya Horesh et al.
Ensemble Debiasing Across Class and Sample Levels for Fairer Prompting Accuracy
Ruixi Lin, Ziqiao Wang, Yang You
R2E-Gym: Procedural Environment Generation and Hybrid Verifiers for Scaling Open-Weights SWE Agents
Naman Jain, Jaskirat Singh, Manish Shetty et al.
C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing
Zhongyang Li, Ziyue Li, Tianyi Zhou
Base Models Beat Aligned Models at Randomness and Creativity
Peter West, Christopher Potts
SEAL: Steerable Reasoning Calibration of Large Language Models for Free
Runjin Chen, Zhenyu Zhang, Junyuan Hong et al.
SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild
Weihao Zeng, Yuzhen Huang, Qian Liu et al.
SpectR: Dynamically Composing LM Experts with Spectral Routing
William Fleshman, Benjamin Van Durme
The Devil is in the EOS: Sequence Training for Detailed Image Captioning
Abdelrahman Mohamed, Yova Kementchedjhieva
LLMs as Research Tools: A Large Scale Survey of Researchers’ Usage and Perceptions
Zhehui Liao, Maria Antoniak, Inyoung Cheong et al.
Language Model Uncertainty Quantification with Attention Chain
Yinghao Li, Rushi Qiang, Lama Moukheiber et al.
Overflow Prevention Enhances Long-Context Recurrent LLMs
Assaf Ben-Kish, Itamar Zimerman, Muhammad Jehanzeb Mirza et al.
E²-RAG: Towards Editable Efficient RAG by Editing Compressed KV Caches
Tongxu Luo, Wenyu Du, HanWen Hao et al.
NoWag: A Unified Framework for Shape Preserving Compression of Large Language Models
Lawrence Ray Liu, Inesh Chakrabarti, Yixiao Li et al.
Evaluating Large Language Models as Expert Annotators
Yu-Min Tseng, Wei-Lin Chen, Chung-Chi Chen et al.
Yourbench: Dynamic Evaluation Set Generation with LLMs
Sumuk Shashidhar, Clémentine Fourrier, Alina Lozovskaya et al.
LawFlow: Collecting and Simulating Lawyers’ Thought Processes on Business Formation Case Studies
Debarati Das, Khanh Chi Le, Ritik Sachin Parkar et al.
Traceable and Explainable Multimodal Large Language Models: An Information-Theoretic View
Zihan Huang, Junda Wu, Rohan Surana et al.
Understanding and Improving Noisy Embedding Techniques in Instruction Finetuning
Abhay Yadav
REFA: Reference Free Alignment with Fine-Grained Length Control
Taneesh Gupta, Rahul Madhavan, Xuchao Zhang et al.
Synthetic Data Generation and Multi-Step Reinforcement Learning for Reasoning and Tool Use
Anna Goldie, Azalia Mirhoseini, Hao Zhou et al.
MSRS: Evaluating Multi-Source Retrieval-Augmented Generation
Rohan Phanse, Ej Zhou, Kejian Shi et al.
Estimating Optimal Context Length for Hybrid Retrieval-augmented Multi-document Summarization
Adithya Pratapa, Teruko Mitamura
Stop-Think-AutoRegress: Language Modeling with Latent Diffusion Planning
Justin Lovelace, Christian K Belardi, Sofian Zalouk et al.
Reasoning Models Know When They’re Right: Probing Hidden States for Self-Verification
Anqi Zhang, Yulin Chen, Jane Pan et al.
Hell or High Water: Evaluating Agentic Recovery from External Failures
Andrew Wang, Sophia Hager, Adi Asija et al.
Impact of LLM Alignment on Impression Formation in Social Interactions
Ala N. Tak, Anahita Bolourani, Daniel B. Shank et al.
Breakpoint: Stress-testing systems-level reasoning in LLM agents
Kaivalya Hariharan, Uzay Girit, Zifan Wang et al.
Rhapsody: A Dataset for Highlight Detection in Podcasts
Younghan Park, Anuj Diwan, David Harwath et al.
Rethinking Associative Memory Mechanism in Induction Head
Shuo Wang, Issei Sato
Overfill: Two-Stage Models for Efficient Language Model Decoding
Woojeong Kim, Junxiong Wang, Jing Nathan Yan et al.
Partial Perspectives: How LLMs Handle Logically Inconsistent Knowledge in Reasoning Tasks
Zichao Li, Ines Arous, Jackie CK Cheung
On Mechanistic Circuits for Extractive Question-Answering
Samyadeep Basu, Vlad I Morariu, Ryan A. Rossi et al.
REM: Evaluating LLM Embodied Spatial Reasoning through Multi-Frame Trajectories
Jacob Thompson, Emiliano Garcia-Lopez, Yonatan Bisk
GenerationPrograms: Fine-grained Attribution with Executable Programs
David Wan, Eran Hirsch, Elias Stengel-Eskin et al.
Reinforcement Learning Enhanced Full-Duplex Spoken Dialogue Language Models for Conversational Interactions
Chen Chen, Ke Hu, Chao-Han Huck Yang et al.
CoLa: Learning to Interactively Collaborate with Large Language Models
Abhishek Sharma, Dan Goldwasser
SlimMoE: Structured Compression of Large MoE Models via Expert Slimming and Distillation
Zichong Li, Chen Liang, Zixuan Zhang et al.
Do Language Models Agree with Human Perceptions of Suspense in Stories?
Glenn Matlin, Devin Zhang, Rodrigo Barroso Loza et al.
Learning by Teaching: Engaging Students as Instructors of Large Language Models in Computer Science Education
Xinming Yang, Haasil Pujara, Jun Li
CALLME: Call Graph Augmentation with Large Language Models for Javascript
Michael Wang, Kexin Pei, Armando Solar-Lezama
CRABS: A syntactic-semantic pincer strategy for bounding LLM interpretation of Python notebooks
Meng Li, Timothy M. McPhillips, Dingmin Wang et al.
2 OLMo 2 Furious (COLM’s Version)
Evan Pete Walsh, Luca Soldaini, Dirk Groeneveld et al.
EvidenceBench: A Benchmark for Extracting Evidence from Biomedical Papers
Jianyou Wang, Weili Cao, Kaicheng Wang et al.
Evaluating LLMs on Chinese Idiom Translation
Cai Yang, Yao Dou, David Heineman et al.
Towards Compute-Optimal Many-Shot In-Context Learning
Shahriar Golchin, Yanfei Chen, Rujun Han et al.
Analyzing Multilingualism in Large Language Models with Sparse Autoencoders
Ikhyun Cho, Julia Hockenmaier
Society of Mind Meets Real-Time Strategy: A Hierarchical Multi-Agent Framework for Strategic Reasoning
Daechul Ahn, San Kim, Jonghyun Choi
Improving LLMs’ Generalized Reasoning Abilities by Graph Problems
Qifan Zhang, Nuo Chen, Zehua Li et al.
Reverse-engineering NLI: A study of the meta-inferential properties of Natural Language Inference
Rasmus Blanck, Bill Noble, Stergios Chatzikyriakidis
Effective Length Extrapolation via Dimension-Wise Positional Embeddings Manipulation
Yi Lu, Wanxu Zhao, Xin Zhou et al.
MapIQ: Evaluating Multimodal Large Language Models for Map Question Answering
Varun Srivastava, Fan Lei, Srija Mukhopadhyay et al.
Transformers are Efficient Compilers, Provably
Xiyu Zhai, Runlong Zhou, Liao Zhang et al.
Scaling Web Agent Training through Automatic Data Generation and Fine-grained Evaluation
Lajanugen Logeswaran, Jaekyeom Kim, Sungryull Sohn et al.
Pretrained Hybrids with MAD Skills
Nicholas Roberts, Samuel Guo, Zhiqi Gao et al.
Benchmarking Retrieval-Augmented Generation for Chemistry
Xianrui Zhong, Bowen Jin, Siru Ouyang et al.
Multilingual and Multi-Accent Jailbreaking of Audio LLMs
Jaechul Roh, Virat Shejwalkar, Amir Houmansadr
UNVEILING: What Makes Linguistics Olympiad Puzzles Tricky for LLMs?
Mukund Choudhary, KV Aditya Srivatsa, Gaurja Aeron et al.
Can LLM "Self-report"?: Evaluating the Validity of Self-report Scales in Measuring Personality Design in LLM-based Chatbots
Huiqi Zou, Pengda Wang, Zihan Yan et al.
Exposing and Patching the Flaws of Large Language Models in Social Character Simulation
Yue Huang, Zhengqing Yuan, Yujun Zhou et al.
Rank1: Test-Time Compute for Reranking in Information Retrieval
Orion Weller, Kathryn Ricci, Eugene Yang et al.
The Dual-Route Model of Induction
Sheridan Feucht, Eric Todd, Byron C Wallace et al.
Hidden in plain sight: VLMs overlook their visual representations
Stephanie Fu, tyler bonnen, Devin Guillory et al.
Discovering Knowledge Deficiencies of Language Models on Massive Knowledge Base
Linxin Song, Xuwei Ding, Jieyu Zhang et al.
SAEs Can Improve Unlearning: Dynamic Sparse Autoencoder Guardrails for Precision Unlearning in LLMs
Aashiq Muhamed, Jacopo Bonato, Mona T. Diab et al.
Energy-Based Reward Models for Robust Language Model Alignment
Anamika Lochab, Ruqi Zhang
AI-Slop to AI-Polish? Aligning Language Models through Edit-Based Writing Rewards and Test-time computation
Tuhin Chakrabarty, Philippe Laban, Chien-Sheng Wu
Mitigating Modal Imbalance in Multimodal Reasoning
Chen Henry Wu, Neil Kale, Aditi Raghunathan
(Im)possibility of Automated Hallucination Detection in Large Language Models
Amin Karbasi, Omar Montasser, John Sous et al.
Single-Pass Document Scanning for Question Answering
Weili Cao, Jianyou Wang, Youze Zheng et al.
Knowledge Graph Retrieval-Augmented Generation via GNN-Guided Prompting
Haochen Liu, Song Wang, Jundong Li
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Bowen Jin, Hansi Zeng, Zhenrui Yue et al.
AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories
Xing Han Lù, Amirhossein Kazemnejad, Nicholas Meade et al.
EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees
Zhiyuan Zeng, Yizhong Wang, Hannaneh Hajishirzi et al.
ThoughtTerminator: Benchmarking, Calibrating, and Mitigating Overthinking in Reasoning Models
Xiao Pu, Michael Saxon, Wenyue Hua et al.
Scaling Analysis of Interleaved Speech-Text Language Models
Gallil Maimon, Michael Hassid, Amit Roth et al.
Putting the Value Back in RL: Better Test-Time Scaling by Unifying LLM Reasoners With Verifiers
Kusha Sareen, Morgane M Moss, Alessandro Sordoni et al.
Refusal Tokens: A Simple Way to Calibrate Refusals in Large Language Models
Neel Jain, Aditya Shrivastava, Chenyang Zhu et al.
Language Model Personalization via Reward Factorization
Idan Shenfeld, Felix Faltings, Pulkit Agrawal et al.
Resona: Improving Context Copying in Linear Recurrence Models with Retrieval
Xinyu Wang, Linrui Ma, Jerry Huang et al.
Model-Agnostic Policy Explanations with Large Language Models
Zhang Xi-Jia, Yue Guo, Shufei Chen et al.
What is the Visual Cognition Gap between Humans and Multimodal LLMs?
Xu Cao, Yifan Shen, Bolin Lai et al.
L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning
Pranjal Aggarwal, Sean Welleck
Elucidating the Design Space of Decay in Linear Attention
Zhen Qin, Xuyang Shen, Yiran Zhong
Noiser: Bounded Input Perturbations for Attributing Large Language Models
Mohammad Reza Ghasemi Madani, Aryo Pradipta Gema, Yu Zhao et al.
LongCodeBench: Evaluating Coding LLMs at 1M Context Windows
Stefano Rando, Luca Romani, Alessio Sampieri et al.
Agree to Disagree? A Meta-Evaluation of LLM Misgendering
Arjun Subramonian, Vagrant Gautam, Preethi Seshadri et al.
MALT: Improving Reasoning with Multi-Agent LLM Training
Sumeet Ramesh Motwani, Chandler Smith, Rocktim Jyoti Das et al.
BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity
Chenyang Song, Weilin Zhao, Xu Han et al.
Adaptive Layer-skipping in Pre-trained LLMs
Xuan Luo, Weizhi Wang, Xifeng Yan
Insights from the Inverse: Reconstructing LLM Training Goals Through Inverse Reinforcement Learning
Jared Joselowitz, Ritam Majumdar, Arjun Jagota et al.
LLMs Are In-Context Bandit Reinforcement Learners
Giovanni Monea, Antoine Bosselut, Kianté Brantley et al.
Rethinking Multilingual Continual Pretraining: Data Mixing for Adapting LLMs Across Languages and Resources
Zihao Li, Shaoxiong Ji, Hengyu Luo et al.
Scaling Laws of Synthetic Data for Language Model
Zeyu Qin, Qingxiu Dong, Xingxing Zhang et al.
HyperINF: Unleashing the HyperPower of Schulz's Method for Data Influence Estimation
Xinyu Zhou, Simin Fan, Martin Jaggi
Contextualize-then-Aggregate: Circuits for In-Context Learning in Gemma-2 2B
Aleksandra Bakalova, Yana Veitsman, Xinting Huang et al.
CONCAP: Seeing Beyond English with Concepts Retrieval-Augmented Captioning
George Ibrahim, Rita Ramos, Yova Kementchedjhieva
AIOS: LLM Agent Operating System
Kai Mei, Xi Zhu, Wujiang Xu et al.
LoX: Low-Rank Extrapolation Robustifies LLM Safety Against Fine-tuning
Gabriel Jacob Perin, Runjin Chen, Xuxi Chen et al.
Verifying the Verifiers: Unveiling Pitfalls and Potentials in Fact Verifiers
Wooseok Seo, Seungju Han, Jaehun Jung et al.
Towards User-level Private Reinforcement Learning with Human Feedback
Jiaming Zhang, Mingxi Lei, Meng Ding et al.
MeMAD: Structured Memory of Debates for Enhanced Multi-Agent Reasoning
Shuai Ling, Lizi Liao, Dongmei Jiang et al.
FalseReject: A Resource for Improving Contextual Safety and Mitigating Over-Refusals in LLMs via Structured Reasoning
Zhehao Zhang, Weijie Xu, Fanyou Wu et al.
SuperBPE: Space Travel for Language Models
Alisa Liu, Jonathan Hayase, Valentin Hofmann et al.
MegaMath: Pushing the Limits of Open Math Corpora
Fan Zhou, Zengzhi Wang, Nikhil Ranjan et al.
SecurityLingua: Efficient Defense of LLM Jailbreak Attacks via Security-Aware Prompt Compression
Yucheng Li, Surin Ahn, Huiqiang Jiang et al.
μKE: Matryoshka Unstructured Knowledge Editing of Large Language Models
Zian Su, Ziyang Huang, Kaiyuan Zhang et al.
Text Speaks Louder than Vision: ASCII Art Reveals Textual Biases in Vision-Language Models
Zhaochen Wang, Bryan Hooi, Yiwei Wang et al.
Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs
Kanishk Gandhi, Ayush K Chakravarthy, Anikait Singh et al.
Hawkeye: Model Collaboration for Efficient Reasoning
Jianshu She, Zhuohao Li, Zhemin Huang et al.
Building Instruction-Tuning Datasets from Human-Written Instructions with Open-Weight Large Language Models
Youmi Ma, Sakae Mizuki, Kazuki Fujii et al.
Impact-driven Context Filtering For Cross-file Code Completion
Yanzhou Li, Shangqing Liu, Kangjie Chen et al.
JailDAM: Jailbreak Detection with Adaptive Memory for Vision-Language Model
Yi Nian, Shenzhe Zhu, Yuehan Qin et al.
IMPersona: Evaluating Individual Level LLM Impersonation
Quan Shi, Carlos E Jimenez, Stephen Dong et al.
ProsodyLM: Uncovering the Emerging Prosody Processing Capabilities in Speech Language Models
Kaizhi Qian, Xulin Fan, Junrui Ni et al.
Bootstrapping Visual Assistant Modeling with Situated Interaction Simulation
Yichi Zhang, Run Peng, Yinpei Dai et al.
Context-Adaptive Multi-Prompt Embedding with Large Language Models for Vision-Language Alignment
Dahun Kim, Anelia Angelova
Understanding Layer Significance in LLM Alignment
Guangyuan Shi, Zexin Lu, Xiaoyu Dong et al.
SAT: Dynamic Spatial Aptitude Training for Multimodal Language Models
Arijit Ray, Jiafei Duan, Ellis L Brown II et al.
DeepRetrieval: Hacking Real Search Engines and Retrievers with Large Language Models via Reinforcement Learning
Pengcheng Jiang, Jiacheng Lin, Lang Cao et al.
Language Models Fail to Introspect About Their Knowledge of Language
Siyuan Song, Jennifer Hu, Kyle Mahowald
SQuat: Subspace-orthogonal KV Cache Quantization
Hao Wang, Ligong Han, Kai Xu et al.
Can Language Models Falsify? Evaluating Algorithmic Reasoning with Counterexample Creation
Shiven Sinha, Shashwat Goel, Ponnurangam Kumaraguru et al.
Plato: Plan to Efficient Decode for Large Language Model Inference
Shuowei Jin, Xueshen Liu, Yongji Wu et al.
StagFormer: Time Staggering Decoder only Transformers
Dylan J Cutler, Arun Kandoor, Nishanth Dikkala et al.
MLGym: A New Framework and Benchmark for Advancing AI Research Agents
Deepak Nathani, Lovish Madaan, Nicholas Roberts et al.
Limitations of refinement methods for weak to strong generalization
Seamus Somerstep, Yaacov Ritov, Mikhail Yurochkin et al.
How do language models learn facts? Dynamics, curricula and hallucinations
Nicolas Zucchet, Jorg Bornschein, Stephanie C.Y. Chan et al.
Improving Table Understanding with LLMs and Entity-Oriented Search
Thi-Nhung Nguyen, Hoang Ngo, Dinh Phung et al.
ScholarCopilot: Training Large Language Models for Academic Writing with Accurate Citations
Yubo Wang, Xueguang Ma, Ping Nie et al.
Short-PHD: Detecting Short LLM-generated Text with Topological Data Analysis After Off-topic Content Insertion
Dongjun Wei, Minjia Mao, Xiao Fang et al.
Truth-value judgment in language models: ‘truth directions’ are context sensitive
Stefan F. Schouten, Peter Bloem, Ilia Markov et al.
Cutting the Root of Hallucination: Structural Trimming for Vulnerability Mitigation in Code LLMs
Yage Zhang
RWKV-7 "Goose" with Expressive Dynamic State Evolution
Bo Peng, Ruichong Zhang, Daniel Goldstein et al.
Imagine All The Relevance: Scenario-Profiled Indexing with Knowledge Expansion for Dense Retrieval
Sangam Lee, Ryang Heo, SeongKu Kang et al.
You Cannot Feed Two Birds with One Score: the Accuracy-Naturalness Tradeoff in Translation
Gergely Flamich, David Vilar, Jan-Thorsten Peter et al.
Mixture of Attention Spans: Optimizing LLM Inference Efficiency with Heterogeneous Sliding-Window Lengths
Tianyu Fu, Haofeng Huang, Xuefei Ning et al.
Understanding the Uncertainty of LLM Explanations: A Perspective Based on Reasoning Topology
Longchao Da, Xiaoou Liu, Jiaxin Dai et al.
How does Watermarking Affect Visual Language Models in Document Understanding?
Chunxue Xu, Yiwei Wang, Bryan Hooi et al.
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning
Chengqi Lyu, Songyang Gao, Yuzhe Gu et al.
When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoning
Nishad Singhi, Hritik Bansal, Arian Hosseini et al.
Robo-Instruct: Simulator-Augmented Instruction Alignment For Finetuning Code LLMs
Zichao Hu, Junyi Jessy Li, Arjun Guha et al.
CodeARC: Benchmarking Reasoning Capabilities of LLM Agents for Inductive Program Synthesis
Anjiang Wei, Tarun Suresh, Jiannan Cao et al.
Gating is Weighting: Understanding Gated Linear Attention through In-context Learning
Yingcong Li, Davoud Ataee Tarzanagh, Ankit Singh Rawat et al.
$100K or 100 Days: Trade-offs when Pre-Training with Academic Resources
Apoorv Khandelwal, Tian Yun, Nihal V. Nayak et al.
Shared Global and Local Geometry of Language Model Embeddings
Andrew Lee, Melanie Weber, Fernanda Viégas et al.
D3: A Dataset for Training Code LMs to Act Diff-by-Diff
Ulyana Piterbarg, Kanishk Gandhi, Lerrel Pinto et al.
LLM-based Multi-Agents System Attack via Continuous Optimization with Discrete Efficient Search
Weichen Yu, Kai Hu, Tianyu Pang et al.
Do Biased Models Have Biased Thoughts?
Swati Rajwal, Shivank Garg, Reem Abdel-Salam et al.
BEARCUBS: A benchmark for computer-using web agents
Yixiao Song, Katherine Thai, Chau Minh Pham et al.
CUPID: Evaluating Personalized and Contextualized Alignment of LLMs from Interactions
Tae Soo Kim, Yoonjoo Lee, Yoonah Park et al.
Supposedly Equivalent Facts That Aren’t? Entity Frequency in Pre-training Induces Asymmetry in LLMs
Yuan He, Bailan He, Zifeng Ding et al.
Training Plug-and-Play Knowledge Modules with Deep Context Distillation
Lucas Caccia, Alan Ansell, Edoardo Ponti et al.
EuroBERT: Scaling Multilingual Encoders for European Languages
Nicolas Boizard, Hippolyte Gisserot-Boukhlef, Duarte Miguel Alves et al.
Style over Substance: Distilled Language Models Reason Via Stylistic Replication
Philip Lippmann, Jie Yang