Most Cited COLM "knowledge insertion" Papers
418 papers found • Page 2 of 3
Supposedly Equivalent Facts That Aren’t? Entity Frequency in Pre-training Induces Asymmetry in LLMs
Yuan He, Bailan He, Zifeng Ding et al.
Efficient Construction of Model Family through Progressive Training Using Model Expansion
Kazuki Yano, Sho Takase, Sosuke Kobayashi et al.
Model-Agnostic Policy Explanations with Large Language Models
Zhang Xi-Jia, Yue Guo, Shufei Chen et al.
Insights from the Inverse: Reconstructing LLM Training Goals Through Inverse Reinforcement Learning
Jared Joselowitz, Ritam Majumdar, Arjun Jagota et al.
From Next-Token to Mathematics: The Learning Dynamics of Mathematical Reasoning in Language Models
Shubhra Mishra, Gabriel Poesia, Noah Goodman
You Cannot Feed Two Birds with One Score: the Accuracy-Naturalness Tradeoff in Translation
Gergely Flamich, David Vilar, Jan-Thorsten Peter et al.
Both Direct and Indirect Evidence Contribute to Dative Alternation Preferences in Language Models
Qing Yao, Kanishka Misra, Leonie Weissweiler et al.
TRELLIS: Learning to Compress Key-Value Memory in Attention Models
Mahdi Karami, Ali Behrouz, Praneeth Kacham et al.
Control the Temperature: Selective Sampling for Diverse and High-Quality LLM Outputs
Sergey Troshin, Wafaa Mohammed, Yan Meng et al.
True Multimodal In-Context Learning Needs Attention to the Visual Context
Shuo Chen, Jianzhe Liu, Zhen Han et al.
EvalAgents: Discovering Implicit Evaluation Criteria from the Web
Manya Wadhwa, Zayne Rea Sprague, Chaitanya Malaviya et al.
QUDsim: Quantifying Discourse Similarities in LLM-Generated Text
Ramya Namuduri, Yating Wu, Anshun Asher Zheng et al.
Data-Centric Human Preference with Rationales for Direct Preference Alignment
Hoang Anh Just, Ming Jin, Anit Kumar Sahu et al.
Quantifying Fairness in LLMs Beyond Tokens: A Semantic and Statistical Perspective
Weijie Xu, Yiwen Wang, Chi Xue et al.
How Multimodal LLMs Solve Image Tasks: A Lens on Visual Grounding, Task Reasoning, and Answer Decoding
Zhuoran Yu, Yong Jae Lee
Can Performant LLMs Be Ethical? Quantifying the Impact of Web Crawling Opt-Outs
Dongyang Fan, Vinko Sabolčec, Matin Ansaripour et al.
Rethinking Multilingual Continual Pretraining: Data Mixing for Adapting LLMs Across Languages and Resources
Zihao Li, Shaoxiong Ji, Hengyu Luo et al.
SentenceKV: Efficient LLM Inference via Sentence-Level Semantic KV Caching
Yuxuan Zhu, Ali Falahati, David H. Yang et al.
Plato: Plan to Efficient Decode for Large Language Model Inference
Shuowei Jin, Xueshen Liu, Yongji Wu et al.
Overcoming Vocabulary Constraints with Pixel-level Fallback
Jonas F. Lotz, Hendra Setiawan, Stephan Peitz et al.
Always Tell Me The Odds: Fine-grained Conditional Probability Estimation
Liaoyaqi Wang, Zhengping Jiang, Anqi Liu et al.
VisualTrap: A Stealthy Backdoor Attack on GUI Agents via Visual Grounding Manipulation
Ziang Ye, Yang Zhang, Wentao Shi et al.
In-context Ranking Preference Optimization
Junda Wu, Rohan Surana, Zhouhang Xie et al.
IterKey: Iterative Keyword Generation with LLMs for Enhanced Retrieval Augmented Generation
Kazuki Hayashi, Hidetaka Kamigaito, Shinya Kouda et al.
AdaptMI: Adaptive Skill-based In-context Math Instructions for Small Language Models
Yinghui He, Abhishek Panigrahi, Yong Lin et al.
Improving Fisher Information Estimation and Efficiency for LoRA-based LLM Unlearning
Yejin Kim, Eunwon Kim, Buru Chang et al.
Post-training for Efficient Communication via Convention Formation
Yilun Hua, Evan Wang, Yoav Artzi
On the Effectiveness and Generalization of Race Representations for Debiasing High-Stakes Decisions
Dang Nguyen, Chenhao Tan
DEL: Context-Aware Dynamic Exit Layer for Efficient Self-Speculative Decoding
Hossein Entezari Zarch, Lei Gao, Chaoyi Jiang et al.
ParaPO: Aligning Language Models to Reduce Verbatim Reproduction of Pre-training Data
Tong Chen, Faeze Brahman, Jiacheng Liu et al.
Register Always Matters: Analysis of LLM Pretraining Data Through the Lens of Language Variation
Amanda Myntti, Erik Henriksson, Veronika Laippala et al.
Vision-Language Models Are Not Pragmatically Competent in Referring Expression Generation
Ziqiao Ma, Jing Ding, Xuejun Zhang et al.
CUPID: Evaluating Personalized and Contextualized Alignment of LLMs from Interactions
Tae Soo Kim, Yoonjoo Lee, Yoonah Park et al.
Probing then Editing Response Personality of Large Language Models
Tianjie Ju, Zhenyu Shao, Bowen Wang et al.
ReFeed: Multi-dimensional Summarization Refinement with Reflective Reasoning on Feedback
Taewon Yun, Jihwan Oh, Hyangsuk Min et al.
Text Speaks Louder than Vision: ASCII Art Reveals Textual Biases in Vision-Language Models
Zhaochen Wang, Bryan Hooi, Yiwei Wang et al.
Pretraining on the Test Set Is No Longer All You Need: A Debate-Driven Approach to QA Benchmarks
Linbo Cao, Jinman Zhao
MAC: A Live Benchmark for Multimodal Large Language Models in Scientific Understanding
Mohan Jiang, Jin Gao, Jiahao Zhan et al.
Resona: Improving Context Copying in Linear Recurrence Models with Retrieval
Xinyu Wang, Linrui Ma, Jerry Huang et al.
Adaptive Computation Pruning for the Forgetting Transformer
Zhixuan Lin, Johan Obando-Ceron, Xu Owen He et al.
MuSeD: A Multimodal Spanish Dataset for Sexism Detection in Social Media Videos
Laura De Grazia, Pol Pastells, Mauro Vázquez Chas et al.
Language models align with brain regions that represent concepts across modalities
Maria Ryskina, Greta Tuckute, Alexander Fung et al.
Beyond the Reported Cutoff: Where Large Language Models Fall Short on Financial Knowledge
Agam Shah, Liqin Ye, Sebastian Jaskowski et al.
Verifying the Verifiers: Unveiling Pitfalls and Potentials in Fact Verifiers
Wooseok Seo, Seungju Han, Jaehun Jung et al.
AutoScale: Scale-Aware Data Mixing for Pre-Training LLMs
Feiyang Kang, Yifan Sun, Bingbing Wen et al.
Energy-Based Reward Models for Robust Language Model Alignment
Anamika Lochab, Ruqi Zhang
Short-PHD: Detecting Short LLM-generated Text with Topological Data Analysis After Off-topic Content Insertion
Dongjun Wei, Minjia Mao, Xiao Fang et al.
Scalable Zeroth-Order Fine-Tuning for Extremely Large Language Models with Limited GPU Memory
Liangyu Wang, Jie Ren, Hang Xu et al.
Planted in Pretraining, Swayed by Finetuning: A Case Study on the Origins of Cognitive Biases in LLMs
Itay Itzhak, Yonatan Belinkov, Gabriel Stanovsky
Stuffed Mamba: Oversized States Lead to the Inability to Forget
Yingfa Chen, Xinrong Zhang, Shengding Hu et al.
SQuat: Subspace-orthogonal KV Cache Quantization
Hao Wang, Ligong Han, Kai Xu et al.
Context-Adaptive Multi-Prompt Embedding with Large Language Models for Vision-Language Alignment
Dahun Kim, Anelia Angelova
Deep Binding of Language Model Virtual Personas: a Study on Approximating Political Partisan Misperceptions
Minwoo Kang, Suhong Moon, Seung Hyeong Lee et al.
Evaluating and Designing Sparse Autoencoders by Approximating Quasi-Orthogonality
Sewoong Lee, Adam Davies, Marc E. Canby et al.
Correctness-Guaranteed Code Generation via Constrained Decoding
Lingxiao Li, Salar Rahili, Yiwei Zhao
Visual Representations inside the Language Model
Benlin Liu, Amita Kamath, Madeleine Grunde-McLaughlin et al.
Approximating Language Model Training Data from Weights
John Xavier Morris, Junjie Oscar Yin, Woojeong Kim et al.
RankAlign: A Ranking View of the Generator-Validator Gap in Large Language Models
Juan Diego Rodriguez, Wenxuan Ding, Katrin Erk et al.
Guided Reasoning in LLM-Driven Penetration Testing Using Structured Attack Trees
Katsuaki Nakano, Reza Fayyazi, Shanchieh Yang et al.
The Zero Body Problem: Probing LLM Use of Sensory Language
Rebecca M. M. Hicke, Sil Hamilton, David Mimno
MixAssist: An Audio-Language Dataset for Co-Creative AI Assistance in Music Mixing
Michael Paul Clemens, Ana Marasovic
Imagine All The Relevance: Scenario-Profiled Indexing with Knowledge Expansion for Dense Retrieval
Sangam Lee, Ryang Heo, SeongKu Kang et al.
A Taxonomy of Transcendence
Natalie Abreu, Edwin Zhang, Eran Malach et al.
Language Agents Mirror Human Causal Reasoning Biases. How Can We Help Them Think Like Scientists?
Anthony GX-Chen, Dongyan Lin, Mandana Samiei et al.
In-Context Occam’s Razor: How Transformers Prefer Simpler Hypotheses on the Fly
Puneesh Deora, Bhavya Vasudeva, Tina Behnia et al.
Sharpe Ratio-Guided Active Learning for Preference Optimization in RLHF
Syrine Belakaria, Joshua Kazdan, Charles Marx et al.
Style over Substance: Distilled Language Models Reason Via Stylistic Replication
Philip Lippmann, Jie Yang
LLM Unlearning Without an Expert Curated Dataset
Xiaoyuan Zhu, Muru Zhang, Ollie Liu et al.
Humans overrely on overconfident language models, across languages
Neil Rathi, Dan Jurafsky, Kaitlyn Zhou
The Surprising Effectiveness of Membership Inference with Simple N-Gram Coverage
Skyler Hallinan, Jaehun Jung, Melanie Sclar et al.
SpectR: Dynamically Composing LM Experts with Spectral Routing
William Fleshman, Benjamin Van Durme
When Does Metadata Conditioning (NOT) Work for Language Model Pre-Training? A Study with Context-Free Grammars
Rei Higuchi, Ryotaro Kawata, Naoki Nishikawa et al.
Noiser: Bounded Input Perturbations for Attributing Large Language Models
Mohammad Reza Ghasemi Madani, Aryo Pradipta Gema, Yu Zhao et al.
From Queries to Criteria: Understanding How Astronomers Evaluate LLMs
Alina Hyk, Kiera McCormick, Mian Zhong et al.
Agree to Disagree? A Meta-Evaluation of LLM Misgendering
Arjun Subramonian, Vagrant Gautam, Preethi Seshadri et al.
MS-SSM: A Multi-Scale State Space Model for Efficient Sequence Modeling
Mahdi Karami, Ali Behrouz, Peilin Zhong et al.
AIR: A Systematic Analysis of Annotations, Instructions, and Response Pairs in Preference Dataset
Bingxiang He, Wenbin Zhang, Jiaxi Song et al.
RARe: Retrieval Augmented Retrieval with In-Context Examples
Atula Tejaswi, Yoonsang Lee, Sujay Sanghavi et al.
Impact-driven Context Filtering For Cross-file Code Completion
Yanzhou Li, Shangqing Liu, Kangjie Chen et al.
CLIPPER: Compression enables long-context synthetic data generation
Chau Minh Pham, Yapei Chang, Mohit Iyyer
ADAPT: Actively Discovering and Adapting to Preferences for any Task
Maithili Patel, Xavier Puig, Ruta Desai et al.
Probing Syntax in Large Language Models: Successes and Remaining Challenges
Pablo J. Diego Simon, Emmanuel Chemla, Jean-Remi King et al.
Implicit In-Context Learning: Evidence from Artificial Language Experiments
Xiaomeng Ma, Qihui Xu
Self-Rewarding PPO: Aligning Large Language Models with Demonstrations Only
Qingru Zhang, Liang Qiu, Ilgee Hong et al.
The World According to LLMs: How Geographic Origin Influences LLMs' Entity Deduction Capabilities
Harsh Nishant Lalai, Raj Sanjay Shah, Jiaxin Pei et al.
Can a Crow Hatch a Falcon? Lineage Matters in Predicting Large Language Model Performance
Takuya Tamura, Taro Yano, Masafumi Enomoto et al.
Detecting and Pruning Prominent but Detrimental Neurons in Large Language Models
Ameen Ali Ali, Shahar Katz, Lior Wolf et al.
URANIA: Differentially Private Insights into AI Use
Daogao Liu, Edith Cohen, Badih Ghazi et al.
The Negation Bias in Large Language Models: Investigating bias reflected in linguistic markers
Yishan Wang, Pia Sommerauer, Jelke Bloem
Do Large Language Models Have a Planning Theory of Mind? Evidence from MindGames: a Multi-Step Persuasion Task
Jared Moore, Ned Cooper, Rasmus Overmark et al.
Layerwise Importance Analysis of Feed-Forward Networks in Transformer-based Language Models
Wataru Ikeda, Kazuki Yano, Ryosuke Takahashi et al.
Learning Effective Language Representations for Sequential Recommendation via Joint Embedding Predictive Architecture
Nguyen Anh Minh, Dung D. Le
Single-Pass Document Scanning for Question Answering
Weili Cao, Jianyou Wang, Youze Zheng et al.
RRO: LLM Agent Optimization Through Rising Reward Trajectories
Zilong Wang, Jingfeng Yang, Sreyashi Nag et al.
UTF-8 Plumbing: Byte-level Tokenizers Unavoidably Enable LLMs to Generate Ill-formed UTF-8
Preston Firestone, Shubham Ugare, Gagandeep Singh et al.
Exploring Sparse Adapters for Scalable Merging of Parameter Efficient Experts
Samin Yeasar Arnob, Zhan Su, Minseon Kim et al.
Exploring Large Language Model Agents for Piloting Social Experiments
Jinghua Piao, Yuwei Yan, Nian Li et al.
Privately Learning from Graphs with Applications in Fine-tuning Large Language Models
Haoteng Yin, Rongzhe Wei, Eli Chien et al.
Investigating Intersectional Bias in Large Language Models using Confidence Disparities in Coreference Resolution
Falaah Arif Khan, Nivedha Sivakumar, Yinong Oliver Wang et al.
Ensemble Debiasing Across Class and Sample Levels for Fairer Prompting Accuracy
Ruixi Lin, Ziqiao Wang, Yang You
Mitigating Modal Imbalance in Multimodal Reasoning
Chen Henry Wu, Neil Kale, Aditi Raghunathan
Teach Old SAEs New Domain Tricks with Boosting
Nikita Koriagin, Yaroslav Aksenov, Daniil Laptev et al.
Task-Circuit Quantization: Leveraging Knowledge Localization and Interpretability for Compression
Hanqi Xiao, Yi-Lin Sung, Elias Stengel-Eskin et al.
Multi-Agent Retrieval-Augmented Framework for Evidence-Based Counterspeech Against Health Misinformation
Anirban Saha Anik, Xiaoying Song, Elliott Wang et al.
Resource-efficient Inference with Foundation Model Programs
Lunyiu Nie, Zhimin Ding, Kevin Yu et al.
Fleurs-SLU: A Massively Multilingual Benchmark for Spoken Language Understanding
Fabian David Schmidt, Ivan Vulić, Goran Glavaš et al.
CONCAP: Seeing Beyond English with Concepts Retrieval-Augmented Captioning
George Ibrahim, Rita Ramos, Yova Kementchedjhieva
OpinioRAG: Towards Generating User-Centric Opinion Highlights from Large-scale Online Reviews
Mir Tafseer Nayeem, Davood Rafiei
DualEdit: Dual Editing for Knowledge Updating in Vision-Language Models
Zhiyi Shi, Binjie Wang, Chongjie Si et al.
Limitations of refinement methods for weak to strong generalization
Seamus Somerstep, Yaacov Ritov, Mikhail Yurochkin et al.
X-EcoMLA: Upcycling Pre-Trained Attention into MLA for Efficient and Extreme KV Compression
Guihong Li, Mehdi Rezagholizadeh, Mingyu Yang et al.
ALOPE: Adaptive Layer Optimization for Translation Quality Estimation using Large Language Models
Archchana Sindhujan, Shenbin Qian, Chan Chi Chun Matthew et al.
Customize Multi-modal RAI Guardrails with Precedent-based predictions
Cheng-Fu Yang, Thanh Tran, Christos Christodoulopoulos et al.
News is More than a Collection of Facts: Moral Frame Preserving News Summarization
Enrico Liscio, Michela Lorandi, Pradeep K. Murukannaiah
CASCADE Your Datasets for Cross-Mode Knowledge Retrieval of Language Models
Runlong Zhou, Yi Zhang
Hyperparameter Loss Surfaces Are Simple Near their Optima
Nicholas Lourie, He He, Kyunghyun Cho
Can Large Language Models Integrate Spatial Data? Empirical Insights into Reasoning Strengths and Computational Weaknesses
Bin HAN, Robert Wolfe, Anat Caspi et al.
SEAM: Semantically Equivalent Across Modalities Benchmark for Vision-Language Models
Zhenwei Tang, Difan Jiao, Blair Yang et al.
Navigating the Rabbit Hole: Emergent Biases in LLM-Generated Attack Narratives Targeting Mental Health Groups
Rijul Magu, Arka Dutta, Sean Kim et al.
Can Language Models Falsify? Evaluating Algorithmic Reasoning with Counterexample Creation
Shiven Sinha, Shashwat Goel, Ponnurangam Kumaraguru et al.
Elucidating the Design Space of Decay in Linear Attention
Zhen Qin, Xuyang Shen, Yiran Zhong
Meta-Learning for Speeding Up Large Model Inference in Decentralized Environments
Yipeng Du, Zihao Wang, Ahmad Farhan et al.
BiXSE: Improving Dense Retrieval via Probabilistic Graded Relevance Distillation
Christos Tsirigotis, Vaibhav Adlakha, Joao Monteiro et al.
CodeXEmbed: A Generalist Embedding Model Family for Multilingual and Multi-task Code Retrieval
Ye Liu, Rui Meng, Shafiq Joty et al.
Overfill: Two-Stage Models for Efficient Language Model Decoding
Woojeong Kim, Junxiong Wang, Jing Nathan Yan et al.
Society of Mind Meets Real-Time Strategy: A Hierarchical Multi-Agent Framework for Strategic Reasoning
Daechul Ahn, San Kim, Jonghyun Choi
EvidenceBench: A Benchmark for Extracting Evidence from Biomedical Papers
Jianyou Wang, Weili Cao, Kaicheng Wang et al.
Improving LLMs’ Generalized Reasoning Abilities by Graph Problems
Qifan Zhang, Nuo Chen, Zehua Li et al.
Reverse-engineering NLI: A study of the meta-inferential properties of Natural Language Inference
Rasmus Blanck, Bill Noble, Stergios Chatzikyriakidis
HIPPO-VIDEO: Simulating Watch Histories with Large Language Models for History-Driven Video Highlighting
Jeongeun Lee, Youngjae Yu, Dongha Lee
The Unlearning Mirage: A Dynamic Framework for Evaluating LLM Unlearning
Raj Sanjay Shah, Jing Huang, Keerthiram Murugesan et al.
Effective Length Extrapolation via Dimension-Wise Positional Embeddings Manipulation
Yi Lu, Wanxu Zhao, Xin Zhou et al.
G1yphD3c0de: Towards Safer Language Models on Visually Perturbed Texts
Yejin Choi, Yejin Yeo, Yejin Son et al.
E$^2$-RAG: Towards Editable Efficient RAG by Editing Compressed KV Caches
Tongxu Luo, Wenyu Du, HanWen Hao et al.
When Splitting Makes Stronger: A Theoretical and Empirical Analysis of Divide-and-Conquer Prompting in LLMs
Yizhou Zhang, Defu Cao, Lun Du et al.
Beyond Blanket Masking: Examining Granularity for Privacy Protection in Images Captured by Blind and Low Vision Users
Jeffri Murrugarra-Llerena, Haoran Niu, K. Suzanne Barber et al.
MapIQ: Evaluating Multimodal Large Language Models for Map Question Answering
Varun Srivastava, Fan Lei, Srija Mukhopadhyay et al.
2 OLMo 2 Furious (COLM’s Version)
Evan Pete Walsh, Luca Soldaini, Dirk Groeneveld et al.
CRABS: A syntactic-semantic pincer strategy for bounding LLM interpretation of Python notebooks
Meng Li, Timothy M. McPhillips, Dingmin Wang et al.
Transformers are Efficient Compilers, Provably
Xiyu Zhai, Runlong Zhou, Liao Zhang et al.
Scaling Web Agent Training through Automatic Data Generation and Fine-grained Evaluation
Lajanugen Logeswaran, Jaekyeom Kim, Sungryull Sohn et al.
Pretrained Hybrids with MAD Skills
Nicholas Roberts, Samuel Guo, Zhiqi Gao et al.
Benchmarking Retrieval-Augmented Generation for Chemistry
Xianrui Zhong, Bowen Jin, Siru Ouyang et al.
Multilingual and Multi-Accent Jailbreaking of Audio LLMs
Jaechul Roh, Virat Shejwalkar, Amir Houmansadr
UNVEILING: What Makes Linguistics Olympiad Puzzles Tricky for LLMs?
Mukund Choudhary, KV Aditya Srivatsa, Gaurja Aeron et al.
Can LLM "Self-report"?: Evaluating the Validity of Self-report Scales in Measuring Personality Design in LLM-based Chatbots
Huiqi Zou, Pengda Wang, Zihan Yan et al.
LM Agents May Fail to Act on Their Own Risk Knowledge
Yuzhi Tang, Tianxiao Li, Elizabeth Li et al.
CALLME: Call Graph Augmentation with Large Language Models for Javascript
Michael Wang, Kexin Pei, Armando Solar-Lezama
Learning by Teaching: Engaging Students as Instructors of Large Language Models in Computer Science Education
Xinming Yang, Haasil Pujara, Jun Li
Hidden in plain sight: VLMs overlook their visual representations
Stephanie Fu, Tyler Bonnen, Devin Guillory et al.
Knowledge Graph Retrieval-Augmented Generation via GNN-Guided Prompting
Haochen Liu, Song Wang, Jundong Li
Language Model Uncertainty Quantification with Attention Chain
Yinghao Li, Rushi Qiang, Lama Moukheiber et al.
Do Language Models Agree with Human Perceptions of Suspense in Stories?
Glenn Matlin, Devin Zhang, Rodrigo Barroso Loza et al.
KVSink: Understanding and Enhancing the Preservation of Attention Sinks in KV Cache Quantization for LLMs
Zunhai Su, Kehong Yuan
SlimMoE: Structured Compression of Large MoE Models via Expert Slimming and Distillation
Zichong Li, Chen Liang, Zixuan Zhang et al.
CoLa: Learning to Interactively Collaborate with Large Language Models
Abhishek Sharma, Dan Goldwasser
SmolLM2: When Smol Goes Big — Data-Centric Training of a Fully Open Small Language Model
Loubna Ben Allal, Anton Lozhkov, Elie Bakouch et al.
Reinforcement Learning Enhanced Full-Duplex Spoken Dialogue Language Models for Conversational Interactions
Chen Chen, Ke Hu, Chao-Han Huck Yang et al.
SmolVLM: Redefining small and efficient multimodal models
Andrés Marafioti, Orr Zohar, Miquel Farré et al.
BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity
Chenyang Song, Weilin Zhao, Xu Han et al.
HyperINF: Unleashing the HyperPower of Schulz's Method for Data Influence Estimation
Xinyu Zhou, Simin Fan, Martin Jaggi
GenerationPrograms: Fine-grained Attribution with Executable Programs
David Wan, Eran Hirsch, Elias Stengel-Eskin et al.
HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Interactive AI Agents
Xuhui Zhou, Hyunwoo Kim, Faeze Brahman et al.
PredGen: Accelerated Inference of Large Language Models through Input-Time Speculation for Real-Time Speech Interaction
Shufan Li, Aditya Grover
REM: Evaluating LLM Embodied Spatial Reasoning through Multi-Frame Trajectories
Jacob Thompson, Emiliano Garcia-Lopez, Yonatan Bisk
MeMAD: Structured Memory of Debates for Enhanced Multi-Agent Reasoning
Shuai Ling, Lizi Liao, Dongmei Jiang et al.
VaPR - Vision-language Preference alignment for Reasoning
Rohan Wadhawan, Fabrice Y Harel-Canada, Zi-Yi Dou et al.
On Mechanistic Circuits for Extractive Question-Answering
Samyadeep Basu, Vlad I Morariu, Ryan A. Rossi et al.
Partial Perspectives: How LLMs Handle Logically Inconsistent Knowledge in Reasoning Tasks
Zichao Li, Ines Arous, Jackie CK Cheung
Efficient Self-Improvement in Multimodal Large Language Models: A Model-Level Judge-Free Approach
Shijian Deng, Wentian Zhao, Yu-Jhe Li et al.
$\mu$KE: Matryoshka Unstructured Knowledge Editing of Large Language Models
Zian Su, Ziyang Huang, Kaiyuan Zhang et al.
Teaching Models to Understand (but not Generate) High-risk Data
Ryan Yixiang Wang, Matthew Finlayson, Luca Soldaini et al.
Hawkeye: Model Collaboration for Efficient Reasoning
Jianshu She, Zhuohao Li, Zhemin Huang et al.
Building Instruction-Tuning Datasets from Human-Written Instructions with Open-Weight Large Language Models
Youmi Ma, Sakae Mizuki, Kazuki Fujii et al.
Phased Training for LLM-powered Text Retrieval Models Beyond Data Scaling
Xin Zhang, Yanzhao Zhang, Wen Xie et al.
JailDAM: Jailbreak Detection with Adaptive Memory for Vision-Language Model
Yi Nian, Shenzhe Zhu, Yuehan Qin et al.
IMPersona: Evaluating Individual Level LLM Impersonation
Quan Shi, Carlos E Jimenez, Stephen Dong et al.
ProsodyLM: Uncovering the Emerging Prosody Processing Capabilities in Speech Language Models
Kaizhi Qian, Xulin Fan, Junrui Ni et al.
Bootstrapping Visual Assistant Modeling with Situated Interaction Simulation
Yichi Zhang, Run Peng, Yinpei Dai et al.
Exposing and Patching the Flaws of Large Language Models in Social Character Simulation
Yue Huang, Zhengqing Yuan, Yujun Zhou et al.
Rethinking Associative Memory Mechanism in Induction Head
Shuo Wang, Issei Sato
StagFormer: Time Staggering Decoder only Transformers
Dylan J Cutler, Arun Kandoor, Nishanth Dikkala et al.
QAPyramid: Fine-grained Evaluation of Content Selection for Text Summarization
Shiyue Zhang, David Wan, Arie Cattan et al.
Improving Table Understanding with LLMs and Entity-Oriented Search
Thi-Nhung Nguyen, Hoang Ngo, Dinh Phung et al.
Rhapsody: A Dataset for Highlight Detection in Podcasts
Younghan Park, Anuj Diwan, David Harwath et al.
Breakpoint: Stress-testing systems-level reasoning in LLM agents
Kaivalya Hariharan, Uzay Girit, Zifan Wang et al.
Truth-value judgment in language models: ‘truth directions’ are context sensitive
Stefan F. Schouten, Peter Bloem, Ilia Markov et al.
Overflow Prevention Enhances Long-Context Recurrent LLMs
Assaf Ben-Kish, Itamar Zimerman, Muhammad Jehanzeb Mirza et al.
Cutting the Root of Hallucination: Structural Trimming for Vulnerability Mitigation in Code LLMs
Yage Zhang
Impact of LLM Alignment on Impression Formation in Social Interactions
Ala N. Tak, Anahita Bolourani, Daniel B. Shank et al.
How does Watermarking Affect Visual Language Models in Document Understanding?
Chunxue Xu, Yiwei Wang, Bryan Hooi et al.
Hell or High Water: Evaluating Agentic Recovery from External Failures
Andrew Wang, Sophia Hager, Adi Asija et al.
Robo-Instruct: Simulator-Augmented Instruction Alignment For Finetuning Code LLMs
Zichao Hu, Junyi Jessy Li, Arjun Guha et al.
Reasoning Models Know When They’re Right: Probing Hidden States for Self-Verification
Anqi Zhang, Yulin Chen, Jane Pan et al.
Discovering Knowledge Deficiencies of Language Models on Massive Knowledge Base
Linxin Song, Xuwei Ding, Jieyu Zhang et al.
Stop-Think-AutoRegress: Language Modeling with Latent Diffusion Planning
Justin Lovelace, Christian K Belardi, Sofian Zalouk et al.
D3: A Dataset for Training Code LMs to Act Diff-by-Diff
Ulyana Piterbarg, Kanishk Gandhi, Lerrel Pinto et al.
LLM-based Multi-Agents System Attack via Continuous Optimization with Discrete Efficient Search
Weichen Yu, Kai Hu, Tianyu Pang et al.
Do Biased Models Have Biased Thoughts?
Swati Rajwal, Shivank Garg, Reem Abdel-Salam et al.
Assessing Judging Bias in Large Reasoning Models: An Empirical Study
Qian Wang, Zhanzhi Lou, Zhenheng Tang et al.