Most Cited COLM "structural bioinformatics" Papers
418 papers found
Text Speaks Louder than Vision: ASCII Art Reveals Textual Biases in Vision-Language Models
Zhaochen Wang, Bryan Hooi, Yiwei Wang et al.
Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs
Kanishk Gandhi, Ayush K Chakravarthy, Anikait Singh et al.
Hawkeye: Model Collaboration for Efficient Reasoning
Jianshu She, Zhuohao Li, Zhemin Huang et al.
Building Instruction-Tuning Datasets from Human-Written Instructions with Open-Weight Large Language Models
Youmi Ma, Sakae Mizuki, Kazuki Fujii et al.
Impact-driven Context Filtering For Cross-file Code Completion
Yanzhou Li, Shangqing Liu, Kangjie Chen et al.
Phased Training for LLM-powered Text Retrieval Models Beyond Data Scaling
Xin Zhang, Yanzhao Zhang, Wen Xie et al.
JailDAM: Jailbreak Detection with Adaptive Memory for Vision-Language Model
Yi Nian, Shenzhe Zhu, Yuehan Qin et al.
IMPersona: Evaluating Individual Level LLM Impersonation
Quan Shi, Carlos E Jimenez, Stephen Dong et al.
ProsodyLM: Uncovering the Emerging Prosody Processing Capabilities in Speech Language Models
Kaizhi Qian, Xulin Fan, Junrui Ni et al.
Bootstrapping Visual Assistant Modeling with Situated Interaction Simulation
Yichi Zhang, Run Peng, Yinpei Dai et al.
Context-Adaptive Multi-Prompt Embedding with Large Language Models for Vision-Language Alignment
Dahun Kim, Anelia Angelova
Understanding Layer Significance in LLM Alignment
Guangyuan SHI, ZEXIN LU, Xiaoyu DONG et al.
EnrichIndex: Using LLMs to Enrich Retrieval Indices Offline
Peter Baile Chen, Tomer Wolfson, Mike Cafarella et al.
Scalable Zeroth-Order Fine-Tuning for Extremely Large Language Models with Limited GPU Memory
Liangyu Wang, Jie Ren, Hang Xu et al.
Deep Binding of Language Model Virtual Personas: a Study on Approximating Political Partisan Misperceptions
Minwoo Kang, Suhong Moon, Seung Hyeong Lee et al.
SAT: Dynamic Spatial Aptitude Training for Multimodal Language Models
Arijit Ray, Jiafei Duan, Ellis L Brown II et al.
DeepRetrieval: Hacking Real Search Engines and Retrievers with Large Language Models via Reinforcement Learning
Pengcheng Jiang, Jiacheng Lin, Lang Cao et al.
Exposing and Patching the Flaws of Large Language Models in Social Character Simulation
Yue Huang, Zhengqing Yuan, Yujun Zhou et al.
Rank1: Test-Time Compute for Reranking in Information Retrieval
Orion Weller, Kathryn Ricci, Eugene Yang et al.
Can Language Models Falsify? Evaluating Algorithmic Reasoning with Counterexample Creation
Shiven Sinha, Shashwat Goel, Ponnurangam Kumaraguru et al.
Plato: Plan to Efficient Decode for Large Language Model Inference
Shuowei Jin, Xueshen Liu, Yongji Wu et al.
Correctness-Guaranteed Code Generation via Constrained Decoding
Lingxiao Li, salar rahili, Yiwei Zhao
StagFormer: Time Staggering Decoder only Transformers
Dylan J Cutler, Arun Kandoor, Nishanth Dikkala et al.
MLGym: A New Framework and Benchmark for Advancing AI Research Agents
Deepak Nathani, Lovish Madaan, Nicholas Roberts et al.
Limitations of refinement methods for weak to strong generalization
Seamus Somerstep, Yaacov Ritov, Mikhail Yurochkin et al.
How do language models learn facts? Dynamics, curricula and hallucinations
Nicolas Zucchet, Jorg Bornschein, Stephanie C.Y. Chan et al.
DualEdit: Dual Editing for Knowledge Updating in Vision-Language Models
Zhiyi Shi, Binjie Wang, Chongjie Si et al.
Improving Table Understanding with LLMs and Entity-Oriented Search
Thi-Nhung Nguyen, Hoang Ngo, Dinh Phung et al.
LongProc: Benchmarking Long-Context Language Models on Long Procedural Generation
Xi Ye, Fangcong Yin, Yinghui He et al.
ScholarCopilot: Training Large Language Models for Academic Writing with Accurate Citations
Yubo Wang, Xueguang Ma, Ping Nie et al.
Short-PHD: Detecting Short LLM-generated Text with Topological Data Analysis After Off-topic Content Insertion
Dongjun Wei, Minjia Mao, Xiao Fang et al.
Truth-value judgment in language models: ‘truth directions’ are context sensitive
Stefan F. Schouten, Peter Bloem, Ilia Markov et al.
Out-of-Distribution Detection using Synthetic Data Generation
Momin Abbas, Muneeza Azmat, Raya Horesh et al.
Cutting the Root of Hallucination: Structural Trimming for Vulnerability Mitigation in Code LLMs
Yage Zhang
RWKV-7 "Goose" with Expressive Dynamic State Evolution
Bo Peng, Ruichong Zhang, Daniel Goldstein et al.
Ensemble Debiasing Across Class and Sample Levels for Fairer Prompting Accuracy
Ruixi Lin, Ziqiao Wang, Yang You
Imagine All The Relevance: Scenario-Profiled Indexing with Knowledge Expansion for Dense Retrieval
Sangam Lee, Ryang Heo, SeongKu Kang et al.
You Cannot Feed Two Birds with One Score: the Accuracy-Naturalness Tradeoff in Translation
Gergely Flamich, David Vilar, Jan-Thorsten Peter et al.
Mixture of Attention Spans: Optimizing LLM Inference Efficiency with Heterogeneous Sliding-Window Lengths
Tianyu Fu, Haofeng Huang, Xuefei Ning et al.
Understanding the Uncertainty of LLM Explanations: A Perspective Based on Reasoning Topology
Longchao Da, Xiaoou Liu, Jiaxin Dai et al.
How does Watermarking Affect Visual Language Models in Document Understanding?
Chunxue Xu, Yiwei Wang, Bryan Hooi et al.
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning
Chengqi Lyu, Songyang Gao, Yuzhe Gu et al.
When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoning
Nishad Singhi, Hritik Bansal, Arian Hosseini et al.
R2E-Gym: Procedural Environment Generation and Hybrid Verifiers for Scaling Open-Weights SWE Agents
Naman Jain, Jaskirat Singh, Manish Shetty et al.
Robo-Instruct: Simulator-Augmented Instruction Alignment For Finetuning Code LLMs
Zichao Hu, Junyi Jessy Li, Arjun Guha et al.
CodeARC: Benchmarking Reasoning Capabilities of LLM Agents for Inductive Program Synthesis
Anjiang Wei, Tarun Suresh, Jiannan Cao et al.
C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing
Zhongyang Li, Ziyue Li, Tianyi Zhou
Gating is Weighting: Understanding Gated Linear Attention through In-context Learning
Yingcong Li, Davoud Ataee Tarzanagh, Ankit Singh Rawat et al.
$100K or 100 Days: Trade-offs when Pre-Training with Academic Resources
Apoorv Khandelwal, Tian Yun, Nihal V. Nayak et al.
Shared Global and Local Geometry of Language Model Embeddings
Andrew Lee, Melanie Weber, Fernanda Viégas et al.
D3: A Dataset for Training Code LMs to Act Diff-by-Diff
Ulyana Piterbarg, Kanishk Gandhi, Lerrel Pinto et al.
LLM-based Multi-Agents System Attack via Continuous Optimization with Discrete Efficient Search
Weichen Yu, Kai Hu, Tianyu Pang et al.
Do Biased Models Have Biased Thoughts?
Swati Rajwal, Shivank Garg, Reem Abdel-Salam et al.
BEARCUBS: A benchmark for computer-using web agents
Yixiao Song, Katherine Thai, Chau Minh Pham et al.
CUPID: Evaluating Personalized and Contextualized Alignment of LLMs from Interactions
Tae Soo Kim, Yoonjoo Lee, Yoonah Park et al.
Supposedly Equivalent Facts That Aren’t? Entity Frequency in Pre-training Induces Asymmetry in LLMs
Yuan He, Bailan He, Zifeng Ding et al.
Training Plug-and-Play Knowledge Modules with Deep Context Distillation
Lucas Caccia, Alan Ansell, Edoardo Ponti et al.
EuroBERT: Scaling Multilingual Encoders for European Languages
Nicolas Boizard, Hippolyte Gisserot-Boukhlef, Duarte Miguel Alves et al.
Style over Substance: Distilled Language Models Reason Via Stylistic Replication
Philip Lippmann, Jie Yang
Plancraft: an evaluation dataset for planning with LLM agents
Gautier Dagan, Frank Keller, Alex Lascarides
Off-Policy Corrected Reward Modeling for Reinforcement Learning from Human Feedback
Johannes Ackermann, Takashi Ishida, Masashi Sugiyama
Efficient Construction of Model Family through Progressive Training Using Model Expansion
Kazuki Yano, Sho Takase, Sosuke Kobayashi et al.
Inside-Out: Hidden Factual Knowledge in LLMs
Zorik Gekhman, Eyal Ben-David, Hadas Orgad et al.
News is More than a Collection of Facts: Moral Frame Preserving News Summarization
Enrico Liscio, Michela Lorandi, Pradeep K. Murukannaiah
LV-Eval: A Balanced Long-Context Benchmark with 5 Length Levels Up to 256K
Tao Yuan, Xuefei Ning, Dong Zhou et al.
Base Models Beat Aligned Models at Randomness and Creativity
Peter West, Christopher Potts
Enhancing LLM Reasoning with Iterative DPO: A Comprehensive Empirical Investigation
Songjun Tu, Jiahao Lin, Xiangyu Tian et al.
Agents Are All You Need for LLM Unlearning
Debdeep Sanyal, Murari Mandal
One ruler to measure them all: Benchmarking multilingual long-context language models
Yekyung Kim, Jenna Russell, Marzena Karpinska et al.
SEAL: Steerable Reasoning Calibration of Large Language Models for Free
Runjin Chen, Zhenyu Zhang, Junyuan Hong et al.
SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild
Weihao Zeng, Yuzhen Huang, Qian Liu et al.
SpectR: Dynamically Composing LM Experts with Spectral Routing
William Fleshman, Benjamin Van Durme
Both Direct and Indirect Evidence Contribute to Dative Alternation Preferences in Language Models
Qing Yao, Kanishka Misra, Leonie Weissweiler et al.
TRELLIS: Learning to Compress Key-Value Memory in Attention Models
Mahdi Karami, Ali Behrouz, Praneeth Kacham et al.
Beyond the Reported Cutoff: Where Large Language Models Fall Short on Financial Knowledge
Agam Shah, Liqin Ye, Sebastian Jaskowski et al.
LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation
Juzheng Zhang, Jiacheng You, Ashwinee Panda et al.
CASCADE Your Datasets for Cross-Mode Knowledge Retrieval of Language Models
Runlong Zhou, Yi Zhang
Extragradient Preference Optimization (EGPO): Beyond Last-Iterate Convergence for Nash Learning from Human Feedback
Runlong Zhou, Maryam Fazel, Simon Shaolei Du
The Devil is in the EOS: Sequence Training for Detailed Image Captioning
Abdelrahman Mohamed, Yova Kementchedjhieva
ReFeed: Multi-dimensional Summarization Refinement with Reflective Reasoning on Feedback
Taewon Yun, Jihwan Oh, Hyangsuk Min et al.
Modifying Large Language Model Post-Training for Diverse Creative Writing
John Joon Young Chung, Vishakh Padmakumar, Melissa Roemmele et al.
LLMs as Research Tools: A Large Scale Survey of Researchers’ Usage and Perceptions
Zhehui Liao, Maria Antoniak, Inyoung Cheong et al.
FineMedLM-o1: Enhancing Medical Knowledge Reasoning Ability of LLM from Supervised Fine-Tuning to Test-Time Training
hongzhou yu, Tianhao Cheng, Yingwen Wang et al.
Can Test-Time Scaling Improve World Foundation Model?
Wenyan Cong, Hanqing Zhu, Peihao Wang et al.
VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception of Geometric Information
Ryo Kamoi, Yusen Zhang, Sarkar Snigdha Sarathi Das et al.
Hypothesis-Driven Theory-of-Mind Reasoning for Large Language Models
Hyunwoo Kim, Melanie Sclar, Tan Zhi-Xuan et al.
DFRot: Achieving Outlier-Free and Massive Activation-Free for Rotated LLMs with Refined Rotation
Jingyang Xiang, Sai Qian Zhang
The Dual-Route Model of Induction
Sheridan Feucht, Eric Todd, Byron C Wallace et al.
Language Models Fail to Introspect About Their Knowledge of Language
Siyuan Song, Jennifer Hu, Kyle Mahowald
SQuat: Subspace-orthogonal KV Cache Quantization
Hao Wang, Ligong Han, Kai Xu et al.
Hidden in plain sight: VLMs overlook their visual representations
Stephanie Fu, tyler bonnen, Devin Guillory et al.
Language Model Uncertainty Quantification with Attention Chain
Yinghao Li, Rushi Qiang, Lama Moukheiber et al.
SmolVLM: Redefining small and efficient multimodal models
Andrés Marafioti, Orr Zohar, Miquel Farré et al.
Efficient Self-Improvement in Multimodal Large Language Models: A Model-Level Judge-Free Approach
Shijian Deng, Wentian Zhao, Yu-Jhe Li et al.
Overflow Prevention Enhances Long-Context Recurrent LLMs
Assaf Ben-Kish, Itamar Zimerman, Muhammad Jehanzeb Mirza et al.
KVSink: Understanding and Enhancing the Preservation of Attention Sinks in KV Cache Quantization for LLMs
Zunhai Su, Kehong Yuan
PredGen: Accelerated Inference of Large Language Models through Input-Time Speculation for Real-Time Speech Interaction
Shufan Li, Aditya Grover
Discovering Knowledge Deficiencies of Language Models on Massive Knowledge Base
Linxin Song, Xuwei Ding, Jieyu Zhang et al.
Assessing Judging Bias in Large Reasoning Models: An Empirical Study
Qian Wang, Zhanzhi Lou, Zhenheng Tang et al.
Can a Crow Hatch a Falcon? Lineage Matters in Predicting Large Language Model Performance
Takuya Tamura, Taro Yano, Masafumi Enomoto et al.
E$^2$-RAG: Towards Editable Efficient RAG by Editing Compressed KV Caches
Tongxu Luo, Wenyu Du, HanWen Hao et al.
Fleurs-SLU: A Massively Multilingual Benchmark for Spoken Language Understanding
Fabian David Schmidt, Ivan Vulić, Goran Glavaš et al.
NoWag: A Unified Framework for Shape Preserving Compression of Large Language Models
Lawrence Ray Liu, Inesh Chakrabarti, Yixiao Li et al.
Evaluating Large Language Models as Expert Annotators
Yu-Min Tseng, Wei-Lin Chen, Chung-Chi Chen et al.
Yourbench: Dynamic Evaluation Set Generation with LLMs
Sumuk Shashidhar, Clémentine Fourrier, Alina Lozovskaya et al.
LawFlow: Collecting and Simulating Lawyers’ Thought Processes on Business Formation Case Studies
Debarati Das, Khanh Chi Le, Ritik Sachin Parkar et al.
Traceable and Explainable Multimodal Large Language Models: An Information-Theoretic View
Zihan Huang, Junda Wu, Rohan Surana et al.
Understanding and Improving Noisy Embedding Techniques in Instruction Finetuning
Abhay Yadav
REFA: Reference Free Alignment with Fine-Grained Length Control
Taneesh Gupta, Rahul Madhavan, Xuchao Zhang et al.
Hyperparameter Loss Surfaces Are Simple Near their Optima
Nicholas Lourie, He He, Kyunghyun Cho
From Next-Token to Mathematics: The Learning Dynamics of Mathematical Reasoning in Language Models
Shubhra Mishra, Gabriel Poesia, Noah Goodman
The Surprising Effectiveness of Membership Inference with Simple N-Gram Coverage
Skyler Hallinan, Jaehun Jung, Melanie Sclar et al.
Synthetic Data Generation and Multi-Step Reinforcement Learning for Reasoning and Tool Use
Anna Goldie, Azalia Mirhoseini, Hao Zhou et al.
MSRS: Evaluating Multi-Source Retrieval-Augmented Generation
Rohan Phanse, Ej Zhou, Kejian Shi et al.
Epistemic Alignment: A Mediating Framework for User-LLM Knowledge Delivery
Nicholas Clark, Hua Shen, Bill Howe et al.
PrefPalette: Personalized Preference Modeling with Latent Attributes
Shuyue Stella Li, Melanie Sclar, Hunter Lang et al.
X-Teaming: Multi-Turn Jailbreaks and Defenses with Adaptive Multi-Agents
Salman Rahman, Liwei Jiang, James Shiffer et al.
Know Me, Respond to Me: Benchmarking LLMs for Dynamic User Profiling and Personalized Responses at Scale
Bowen Jiang, Zhuoqun Hao, Young Min Cho et al.
Language models align with brain regions that represent concepts across modalities
Maria Ryskina, Greta Tuckute, Alexander Fung et al.
SUV: Scalable Large Language Model Copyright Compliance with Regularized Selective Unlearning
Tianyang Xu, Xiaoze Liu, Feijie Wu et al.
Can LLMs Handle WebShell Detection? Overcoming Detection Challenges with Behavioral Function-Aware Framework
Feijiang Han, Jiaming Zhang, Chuyi Deng et al.
LLM Unlearning Without an Expert Curated Dataset
Xiaoyuan Zhu, Muru Zhang, Ollie Liu et al.
Steering Large Language Model Activations in Sparse Spaces
Reza Bayat, Ali Rahimi-Kalahroudi, Mohammad Pezeshki et al.
Adaptive Computation Pruning for the Forgetting Transformer
Zhixuan Lin, Johan Obando-Ceron, Xu Owen He et al.
Pairwise or Pointwise? Evaluating Feedback Protocols for Bias in LLM-Based Evaluation
Tuhina Tripathi, Manya Wadhwa, Greg Durrett et al.
Estimating Optimal Context Length for Hybrid Retrieval-augmented Multi-document Summarization
Adithya Pratapa, Teruko Mitamura
Navigating the Rabbit Hole: Emergent Biases in LLM-Generated Attack Narratives Targeting Mental Health Groups
Rijul Magu, Arka Dutta, Sean Kim et al.
M²IV: Towards Efficient and Fine-grained Multimodal In-Context Learning via Representation Engineering
Yanshu Li, Yi Cao, Hongyang He et al.
BiXSE: Improving Dense Retrieval via Probabilistic Graded Relevance Distillation
Christos Tsirigotis, Vaibhav Adlakha, Joao Monteiro et al.
Stop-Think-AutoRegress: Language Modeling with Latent Diffusion Planning
Justin Lovelace, Christian K Belardi, Sofian Zalouk et al.
In-Context Occam’s Razor: How Transformers Prefer Simpler Hypotheses on the Fly
Puneesh Deora, Bhavya Vasudeva, Tina Behnia et al.
Reasoning Models Know When They’re Right: Probing Hidden States for Self-Verification
Anqi Zhang, Yulin Chen, Jane Pan et al.
The Negation Bias in Large Language Models: Investigating bias reflected in linguistic markers
Yishan Wang, Pia Sommerauer, Jelke Bloem
Language Agents Mirror Human Causal Reasoning Biases. How Can We Help Them Think Like Scientists?
Anthony GX-Chen, Dongyan Lin, Mandana Samiei et al.
Finding Flawed Fictions: Evaluating Complex Reasoning in Language Models via Plot Hole Detection
Kabir Ahuja, Melanie Sclar, Yulia Tsvetkov
Hell or High Water: Evaluating Agentic Recovery from External Failures
Andrew Wang, Sophia Hager, Adi Asija et al.
A Taxonomy of Transcendence
Natalie Abreu, Edwin Zhang, Eran Malach et al.
Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers
Shalev Lifshitz, Sheila A. McIlraith, Yilun Du
Retrieval-Augmented Generation with Conflicting Evidence
Han Wang, Archiki Prasad, Elias Stengel-Eskin et al.
Recycling the Web: A Method to Enhance Pre-training Data Quality and Quantity for Language Models
Thao Nguyen, Yang Li, Olga Golovneva et al.
Impact of LLM Alignment on Impression Formation in Social Interactions
Ala N. Tak, Anahita Bolourani, Daniel B. Shank et al.
MixAssist: An Audio-Language Dataset for Co-Creative AI Assistance in Music Mixing
Michael Paul Clemens, Ana Marasovic
Breakpoint: Stress-testing systems-level reasoning in LLM agents
Kaivalya Hariharan, Uzay Girit, Zifan Wang et al.
Rhapsody: A Dataset for Highlight Detection in Podcasts
Younghan Park, Anuj Diwan, David Harwath et al.
M-Prometheus: A Suite of Open Multilingual LLM Judges
José Pombal, Dongkeun Yoon, Patrick Fernandes et al.
Task Vectors in In-Context Learning: Emergence, Formation, and Benefits
Liu Yang, Ziqian Lin, Kangwook Lee et al.
Can Performant LLMs Be Ethical? Quantifying the Impact of Web Crawling Opt-Outs
Dongyang Fan, Vinko Sabolčec, Matin Ansaripour et al.
Rethinking Associative Memory Mechanism in Induction Head
Shuo Wang, Issei Sato
Stuffed Mamba: Oversized States Lead to the Inability to Forget
Yingfa Chen, Xinrong Zhang, Shengding Hu et al.
Fluid Language Model Benchmarking
Valentin Hofmann, David Heineman, Ian Magnusson et al.
Data-Centric Human Preference with Rationales for Direct Preference Alignment
Hoang Anh Just, Ming Jin, Anit Kumar Sahu et al.
DynaSaur: Large Language Agents Beyond Predefined Actions
Dang Nguyen, Viet Dac Lai, Seunghyun Yoon et al.
PolyGuard: A Multilingual Safety Moderation Tool for 17 Languages
Priyanshu Kumar, Devansh Jain, Akhila Yerukola et al.
Advancing Language Multi-Agent Learning with Credit Re-Assignment for Interactive Environment Generalization
Zhitao He, Zijun Liu, Peng Li et al.
Partial Perspectives: How LLMs Handle Logically Inconsistent Knowledge in Reasoning Tasks
Zichao Li, Ines Arous, Jackie CK Cheung
EvalAgents: Discovering Implicit Evaluation Criteria from the Web
Manya Wadhwa, Zayne Rea Sprague, Chaitanya Malaviya et al.
Cats Confuse Reasoning LLM: Query Agnostic Adversarial Triggers for Reasoning Models
Meghana Arakkal Rajeev, Rajkumar Ramamurthy, Prapti Trivedi et al.
ALFA: Aligning LLMs to Ask Good Questions: A Case Study in Clinical Reasoning
Shuyue Stella Li, Jimin Mun, Faeze Brahman et al.
Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models
Ruikang Liu, Yuxuan Sun, Manyi Zhang et al.
CRUST-Bench: A Comprehensive Benchmark for C-to-safe-Rust Transpilation
Anirudh Khatry, Robert Zhang, Jia Pan et al.
On Mechanistic Circuits for Extractive Question-Answering
Samyadeep Basu, Vlad I Morariu, Ryan A. Rossi et al.
Boosting LLM Reasoning via Spontaneous Self-Correction
Xutong Zhao, Tengyu Xu, Xuewei Wang et al.
REM: Evaluating LLM Embodied Spatial Reasoning through Multi-Frame Trajectories
Jacob Thompson, Emiliano Garcia-Lopez, Yonatan Bisk
GenerationPrograms: Fine-grained Attribution with Executable Programs
David Wan, Eran Hirsch, Elias Stengel-Eskin et al.
Multi-Agent Retrieval-Augmented Framework for Evidence-Based Counterspeech Against Health Misinformation
Anirban Saha Anik, Xiaoying Song, Elliott Wang et al.
Task-Circuit Quantization: Leveraging Knowledge Localization and Interpretability for Compression
Hanqi Xiao, Yi-Lin Sung, Elias Stengel-Eskin et al.
Not All Data Are Unlearned Equally
Aravind Krishnan, Siva Reddy, Marius Mosbach
Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models
Lynn Chua, Badih Ghazi, Yangsibo Huang et al.
FineWeb2: One Pipeline to Scale Them All — Adapting Pre-Training Data Processing to Every Language
Guilherme Penedo, Hynek Kydlíček, Vinko Sabolčec et al.
Overcoming Vocabulary Constraints with Pixel-level Fallback
Jonas F. Lotz, Hendra Setiawan, Stephan Peitz et al.
Breaking the Data Barrier -- Building GUI Agents Through Task Generalization
Junlei Zhang, Zichen Ding, Chang Ma et al.
Spike No More: Stabilizing the Pre-training of Large Language Models
Sho Takase, Shun Kiyono, Sosuke Kobayashi et al.
CultureCLIP: Empowering CLIP with Cultural Awareness through Synthetic Images and Contextualized Captions
Yuchen Huang, Zhiyuan Fan, Zhitao He et al.
VisualTrap: A Stealthy Backdoor Attack on GUI Agents via Visual Grounding Manipulation
Ziang Ye, Yang Zhang, Wentao Shi et al.
Learning Effective Language Representations for Sequential Recommendation via Joint Embedding Predictive Architecture
Nguyen Anh Minh, Dung D. Le
Reinforcement Learning Enhanced Full-Duplex Spoken Dialogue Language Models for Conversational Interactions
Chen Chen, Ke Hu, Chao-Han Huck Yang et al.
When Does Metadata Conditioning (NOT) Work for Language Model Pre-Training? A Study with Context-Free Grammars
Rei Higuchi, Ryotaro Kawata, Naoki Nishikawa et al.
Backdoor Attacks on Dense Retrieval via Public and Unintentional Triggers
Quanyu Long, Yue Deng, Leilei Gan et al.
Layers at Similar Depths Generate Similar Activations Across LLM Architectures
Christopher Wolfram, Aaron Schein
Jigsaw Puzzles: Splitting Harmful Questions to Jailbreak Large Language Models in Multi-turn Interactions
Hao Yang, Lizhen Qu, Ehsan Shareghi et al.
Rerouting LLM Routers
Avital Shafran, Roei Schuster, Tom Ristenpart et al.
Collab-RAG: Boosting Retrieval-Augmented Generation for Complex Question Answering via White-Box and Black-Box LLM Collaboration
Ran Xu, Wenqi Shi, Yuchen Zhuang et al.
CoLa: Learning to Interactively Collaborate with Large Language Models
Abhishek Sharma, Dan Goldwasser
Understanding R1-Zero-Like Training: A Critical Perspective
Zichen Liu, Changyu Chen, Wenjun Li et al.
SlimMoE: Structured Compression of Large MoE Models via Expert Slimming and Distillation
Zichong Li, Chen Liang, Zixuan Zhang et al.
SEAM: Semantically Equivalent Across Modalities Benchmark for Vision-Language Models
Zhenwei Tang, Difan Jiao, Blair Yang et al.
VideoSAVi: Self-Aligned Video Language Models without Human Supervision
Yogesh Kulkarni, Pooyan Fazli
Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents
Saaket Agashe, Kyle Wong, Vincent Tu et al.
How Post-Training Reshapes LLMs: A Mechanistic View on Knowledge, Truthfulness, Refusal, and Confidence
Hongzhe Du, Weikai Li, Min Cai et al.
Implicit In-Context Learning: Evidence from Artificial Language Experiments
Xiaomeng Ma, Qihui Xu
To Backtrack or Not to Backtrack: When Sequential Search Limits Model Reasoning
Tian Qin, David Alvarez-Melis, Samy Jelassi et al.
Sharpe Ratio-Guided Active Learning for Preference Optimization in RLHF
Syrine Belakaria, Joshua Kazdan, Charles Marx et al.
SentenceKV: Efficient LLM Inference via Sentence-Level Semantic KV Caching
Yuxuan Zhu, Ali Falahati, David H. Yang et al.
Defending LLM Watermarking Against Spoofing Attacks with Contrastive Representation Learning
Li An, Yujian Liu, Yepeng Liu et al.
Do Language Models Agree with Human Perceptions of Suspense in Stories?
Glenn Matlin, Devin Zhang, Rodrigo Barroso Loza et al.
Learning by Teaching: Engaging Students as Instructors of Large Language Models in Computer Science Education
Xinming Yang, Haasil Pujara, Jun Li
CALLME: Call Graph Augmentation with Large Language Models for Javascript
Michael Wang, Kexin Pei, Armando Solar-Lezama
CITER: Collaborative Inference for Efficient Large Language Model Decoding with Token-Level Routing
Wenhao Zheng, Yixiao Chen, Weitong Zhang et al.
Approximating Language Model Training Data from Weights
John Xavier Morris, Junjie Oscar Yin, Woojeong Kim et al.
Brains vs. Bytes: Evaluating LLM Proficiency in Olympiad Mathematics
Hamed Mahdavi, Alireza Hashemi, Majid Daliri et al.