Most Cited ICML 2025 "compute efficiency" Papers
3,340 papers found • Page 1 of 17
WorldSimBench: Towards Video Generation Models as World Simulators
Yiran Qin, Zhelun Shi, Jiwen Yu et al.
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Tianzhe Chu, Yuexiang Zhai, Jihan Yang et al.
From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline
Tianle Li, Wei-Lin Chiang, Evan Frick et al.
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
Xinyu Guan, Li Lyna Zhang, Yifei Liu et al.
SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference
Yuan Zhang, Chun-Kai Fan, Junpeng Ma et al.
Learning to (Learn at Test Time): RNNs with Expressive Hidden States
Yu Sun, Xinhao Li, Karan Dalal et al.
LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding
Xiaoqian Shen, Yunyang Xiong, Changsheng Zhao et al.
Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction
Yiheng Xu, Zekun Wang, Junli Wang et al.
MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance
Yuang Zhang, Jiaxi Gu, Li-Wen Wang et al.
Training Software Engineering Agents and Verifiers with SWE-Gym
Jiayi Pan, Xingyao Wang, Graham Neubig et al.
Layer by Layer: Uncovering Hidden Representations in Language Models
Oscar Skean, Md Rifat Arefin, Dan Zhao et al.
AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs
Anselm Paulus, Arman Zharmagambetov, Chuan Guo et al.
Imagine While Reasoning in Space: Multimodal Visualization-of-Thought
Chengzu Li, Wenshan Wu, Huanyu Zhang et al.
DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning
Gaoyue Zhou, Hengkai Pan, Yann LeCun et al.
How Far Is Video Generation from World Model: A Physical Law Perspective
Bingyi Kang, Yang Yue, Rui Lu et al.
Hi Robot: Open-Ended Instruction Following with Hierarchical Vision-Language-Action Models
Lucy Xiaoyang Shi, Brian Ichter, Michael Equi et al.
A General Framework for Inference-time Scaling and Steering of Diffusion Models
Raghav Singhal, Zachary Horvitz, Ryan Teehan et al.
Taming Rectified Flow for Inversion and Editing
Jiangshan Wang, Junfu Pu, Zhongang Qi et al.
AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders
Zhengxuan Wu, Aryaman Arora, Atticus Geiger et al.
Free Process Rewards without Process Labels
Lifan Yuan, Wendi Li, Huayu Chen et al.
ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning
Yuchen Lin, Ronan Le Bras, Kyle Richardson et al.
Agent-as-a-Judge: Evaluate Agents with Agents
Mingchen Zhuge, Changsheng Zhao, Dylan Ashley et al.
Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations
Yucheng Hu, Yanjiang Guo, Pengchao Wang et al.
Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
Xiong Wang, Yangze Li, Chaoyou Fu et al.
SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer
Enze Xie, Junsong Chen, Yuyang Zhao et al.
Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs
Jan Betley, Daniel Tan, Niels Warncke et al.
DPO Meets PPO: Reinforced Token Optimization for RLHF
Han Zhong, Zikang Shan, Guhao Feng et al.
MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding
Yuxin Zuo, Shang Qu, Yifei Li et al.
OR-Bench: An Over-Refusal Benchmark for Large Language Models
Jiaxing Cui, Wei-Lin Chiang, Ion Stoica et al.
Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale
Rogerio Bonatti, Dan Zhao, Francesco Bonacci et al.
Train for the Worst, Plan for the Best: Understanding Token Ordering in Masked Diffusions
Jaeyeon Kim, Kulin Shah, Vasilis Kontonis et al.
Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark
Yunzhuo Hao, Jiawei Gu, Huichen Wang et al.
PaperBench: Evaluating AI’s Ability to Replicate AI Research
Giulio Starace, Oliver Jaffe, Dane Sherburn et al.
EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents
Rui Yang, Hanyang (Jeremy) Chen, Junyu Zhang et al.
Learning Smooth and Expressive Interatomic Potentials for Physical Property Prediction
Xiang Fu, Brandon Wood, Luis Barroso-Luque et al.
Theoretical guarantees on the best-of-n alignment policy
Ahmad Beirami, Alekh Agarwal, Jonathan Berant et al.
Mind Your Step (by Step): Chain-of-Thought can Reduce Performance on Tasks where Thinking Makes Humans Worse
Ryan Liu, Jiayi Geng, Addison J. Wu et al.
Weak-to-Strong Jailbreaking on Large Language Models
Xuandong Zhao, Xianjun Yang, Tianyu Pang et al.
MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency
Dongzhi Jiang, Renrui Zhang, Ziyu Guo et al.
RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement Learning
Jonas Gehring, Kunhao Zheng, Jade Copet et al.
Agent Workflow Memory
Zhiruo Wang, Jiayuan Mao, Daniel Fried et al.
Forest-of-Thought: Scaling Test-Time Compute for Enhancing LLM Reasoning
Zhenni Bi, Kai Han, Chuanjian Liu et al.
Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities
Sreyan Ghosh, Zhifeng Kong, Sonal Kumar et al.
TabICL: A Tabular Foundation Model for In-Context Learning on Large Data
Jingang Qu, David Holzmüller, Gaël Varoquaux et al.
Plan-and-Act: Improving Planning of Agents for Long-Horizon Tasks
Lutfi Erdogan, Hiroki Furuta, Sehoon Kim et al.
Multi-agent Architecture Search via Agentic Supernet
Guibin Zhang, Luyang Niu, Junfeng Fang et al.
Reward-Guided Speculative Decoding for Efficient LLM Reasoning
Baohao Liao, Yuhui Xu, Hanze Dong et al.
Sundial: A Family of Highly Capable Time Series Foundation Models
Yong Liu, Guo Qin, Zhiyuan Shi et al.
Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation
Fanqing Meng, Jiaqi Liao, Xinyu Tan et al.
Cradle: Empowering Foundation Agents towards General Computer Control
Weihao Tan, Wentao Zhang, Xinrun Xu et al.
Diffusion Adversarial Post-Training for One-Step Video Generation
Shanchuan Lin, Xin Xia, Yuxi Ren et al.
History-Guided Video Diffusion
Kiwhan Song, Boyuan Chen, Max Simchowitz et al.
Scaling Test-Time Compute Without Verification or RL is Suboptimal
Amrith Setlur, Nived Rajaraman, Sergey Levine et al.
Training Deep Learning Models with Norm-Constrained LMOs
Thomas Pethick, Wanyun Xie, Kimon Antonakopoulos et al.
XAttention: Block Sparse Attention with Antidiagonal Scoring
Ruyi Xu, Guangxuan Xiao, Haofeng Huang et al.
Moirai-MoE: Empowering Time Series Foundation Models with Sparse Mixture of Experts
Xu Liu, Juncheng Liu, Gerald Woo et al.
HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation
Tianwei Lin, Wenqiao Zhang, Sijing Li et al.
GuardAgent: Safeguard LLM Agents via Knowledge-Enabled Reasoning
Zhen Xiang, Linzhi Zheng, Yanjie Li et al.
What If We Recaption Billions of Web Images with LLaMA-3?
Xianhang Li, Haoqin Tu, Mude Hui et al.
T1: Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling
Zhenyu Hou, Xin Lv, Rui Lu et al.
VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models
Hila Chefer, Uriel Singer, Amit Zohar et al.
LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models
Marwa Abdulhai, Isadora White, Charlie Snell et al.
SageAttention2: Efficient Attention with Thorough Outlier Smoothing and Per-thread INT4 Quantization
Jintao Zhang, Haofeng Huang, Pengle Zhang et al.
SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering?
Samuel Miserendino, Michele Wang, Tejal Patwardhan et al.
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
Hanshi Sun, Li-Wen Chang, Wenlei Bao et al.
RE-Bench: Evaluating Frontier AI R&D Capabilities of Language Model Agents against Human Experts
Hjalmar Wijk, Tao Lin, Joel Becker et al.
Inductive Moment Matching
Linqi (Alex) Zhou, Stefano Ermon, Jiaming Song
Flow Q-Learning
Seohong Park, Qiyang Li, Sergey Levine
Fast Video Generation with Sliding Tile Attention
Peiyuan Zhang, Yongqi Chen, Runlong Su et al.
An analytic theory of creativity in convolutional diffusion models
Mason Kamb, Surya Ganguli
MATH-Perturb: Benchmarking LLMs' Math Reasoning Abilities against Hard Perturbations
Kaixuan Huang, Jiacheng Guo, Zihao Li et al.
KernelBench: Can LLMs Write Efficient GPU Kernels?
Anne Ouyang, Simon Guo, Simran Arora et al.
MM-RLHF: The Next Step Forward in Multimodal LLM Alignment
Yi-Fan Zhang, Tao Yu, Haochen Tian et al.
Automatically Interpreting Millions of Features in Large Language Models
Gonçalo Paulo, Alex Mallen, Caden Juang et al.
AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML
Patara Trirat, Wonyong Jeong, Sung Ju Hwang
Learning to Plan & Reason for Evaluation with Thinking-LLM-as-a-Judge
Swarnadeep Saha, Xian Li, Marjan Ghazvininejad et al.
Normalizing Flows are Capable Generative Models
Shuangfei Zhai, Ruixiang Zhang, Preetum Nakkiran et al.
SAEBench: A Comprehensive Benchmark for Sparse Autoencoders in Language Model Interpretability
Adam Karvonen, Can Rager, Johnny Lin et al.
Learning Multi-Level Features with Matryoshka Sparse Autoencoders
Bart Bussmann, Noa Nabeshima, Adam Karvonen et al.
NoLiMa: Long-Context Evaluation Beyond Literal Matching
Ali Modarressi, Hanieh Deilamsalehy, Franck Dernoncourt et al.
Are Sparse Autoencoders Useful? A Case Study in Sparse Probing
Subhash Kantamneni, Josh Engels, Senthooran Rajamanoharan et al.
VinePPO: Refining Credit Assignment in RL Training of LLMs
Amirhossein Kazemnejad, Milad Aghajohari, Eva Portelance et al.
VisionTS: Visual Masked Autoencoders Are Free-Lunch Zero-Shot Time Series Forecasters
Mouxiang Chen, Lefei Shen, Zhuo Li et al.
Long-Term TalkingFace Generation via Motion-Prior Conditional Diffusion Model
Fei Shen, Cong Wang, Junyao Gao et al.
Organize the Web: Constructing Domains Enhances Pre-Training Data Curation
Alexander Wettig, Kyle Lo, Sewon Min et al.
Masked Autoencoders Are Effective Tokenizers for Diffusion Models
Hao Chen, Yujin Han, Fangyi Chen et al.
Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning
Andy (DiJia) Su, Hanlin Zhu, Yingchen Xu et al.
Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM’s Reasoning Capability
Zicheng Lin, Tian Liang, Jiahao Xu et al.
EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling
Theodoros Kouzelis, Ioannis Kakogeorgiou, Spyros Gidaris et al.
G-Designer: Architecting Multi-agent Communication Topologies via Graph Neural Networks
Guibin Zhang, Yanwei Yue, Xiangguo Sun et al.
FlexTok: Resampling Images into 1D Token Sequences of Flexible Length
Roman Bachmann, Jesse Allardice, David Mizrahi et al.
NETS: A Non-equilibrium Transport Sampler
Michael Albergo, Eric Vanden-Eijnden
Monte Carlo Tree Search for Comprehensive Exploration in LLM-Based Automatic Heuristic Design
Zhi Zheng, Zhuoliang Xie, Zhenkun Wang et al.
All-atom Diffusion Transformers: Unified generative modelling of molecules and materials
Chaitanya Joshi, Xiang Fu, Yi-Lun Liao et al.
ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding
Xingyu Fu, Minqian Liu, Zhengyuan Yang et al.
EraseAnything: Enabling Concept Erasure in Rectified Flow Transformers
Daiheng Gao, Shilin Lu, Wenbo Zhou et al.
Antidote: Post-fine-tuning Safety Alignment for Large Language Models against Harmful Fine-tuning Attack
Tiansheng Huang, Gautam Bhattacharya, Pratik Joshi et al.
STAIR: Improving Safety Alignment with Introspective Reasoning
Yichi Zhang, Siyuan Zhang, Yao Huang et al.
AnyEdit: Edit Any Knowledge Encoded in Language Models
Houcheng Jiang, Junfeng Fang, Ningyu Zhang et al.
FlowAR: Scale-wise Autoregressive Image Generation Meets Flow Matching
Sucheng Ren, Qihang Yu, Ju He et al.
Learn Beneficial Noise as Graph Augmentation
Siqi Huang, Yanchen Xu, Hongyuan Zhang et al.
FlipAttack: Jailbreak LLMs via Flipping
Yue Liu, Xiaoxin He, Miao Xiong et al.
Empirical Design in Reinforcement Learning
Andrew Patterson, Samuel F Neumann, Martha White et al.
Which Agent Causes Task Failures and When? On Automated Failure Attribution of LLM Multi-Agent Systems
Shaokun Zhang, Ming Yin, Jieyu Zhang et al.
Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas
Shiqi Chen, Tongyao Zhu, Ruochen Zhou et al.
UP-VLA: A Unified Understanding and Prediction Model for Embodied Agent
Jianke Zhang, Yanjiang Guo, Yucheng Hu et al.
On the Resilience of LLM-Based Multi-Agent Collaboration with Faulty Agents
Jen-Tse Huang, Jiaxu Zhou, Tailin Jin et al.
The Surprising Effectiveness of Test-Time Training for Few-Shot Learning
Ekin Akyürek, Mehul Damani, Adam Zweiger et al.
Context is Key: A Benchmark for Forecasting with Essential Textual Information
Andrew Williams, Arjun Ashok, Étienne Marcotte et al.
FireFlow: Fast Inversion of Rectified Flow for Image Semantic Editing
Yingying Deng, Xiangyu He, Changwang Mei et al.
One-Step Diffusion Policy: Fast Visuomotor Policies via Diffusion Distillation
Zhendong Wang, Max Li, Ajay Mandlekar et al.
Thinking LLMs: General Instruction Following with Thought Generation
Tianhao Wu, Janice Lan, Weizhe Yuan et al.
AdaWorld: Learning Adaptable World Models with Latent Actions
Shenyuan Gao, Siyuan Zhou, Yilun Du et al.
RIFLEx: A Free Lunch for Length Extrapolation in Video Diffusion Transformers
Min Zhao, Guande He, Yixiao Chen et al.
ShieldAgent: Shielding Agents via Verifiable Safety Policy Reasoning
Zhaorun Chen, Mintong Kang, Bo Li
Orthogonal Subspace Decomposition for Generalizable AI-Generated Image Detection
Zhiyuan Yan, Jiangming Wang, Peng Jin et al.
Is Best-of-N the Best of Them? Coverage, Scaling, and Optimality in Inference-Time Alignment
Audrey Huang, Adam Block, Qinghua Liu et al.
An Architecture Search Framework for Inference-Time Techniques
Jon Saad-Falcon, Adrian Lafuente, Shlok Natarajan et al.
CollabLLM: From Passive Responders to Active Collaborators
Shirley Wu, Michel Galley, Baolin Peng et al.
STP: Self-play LLM Theorem Provers with Iterative Conjecturing and Proving
Kefan Dong, Tengyu Ma
SpargeAttention: Accurate and Training-free Sparse Attention Accelerating Any Model Inference
Jintao Zhang, Chendong Xiang, Haofeng Huang et al.
DiTAR: Diffusion Transformer Autoregressive Modeling for Speech Generation
Dongya Jia, Zhuo Chen, Jiawei Chen et al.
Towards LLM Unlearning Resilient to Relearning Attacks: A Sharpness-Aware Minimization Perspective and Beyond
Chongyu Fan, Jinghan Jia, Yihua Zhang et al.
Adjoint Sampling: Highly Scalable Diffusion Samplers via Adjoint Matching
Aaron Havens, Benjamin Kurt Miller, Bing Yan et al.
MoH: Multi-Head Attention as Mixture-of-Head Attention
Peng Jin, Bo Zhu, Li Yuan et al.
Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search
Maohao Shen, Guangtao Zeng, Zhenting Qi et al.
The Diffusion Duality
Subham Sekhar Sahoo, Justin Deschenaux, Aaron Gokaslan et al.
Feynman-Kac Correctors in Diffusion: Annealing, Guidance, and Product of Experts
Marta Skreta, Tara Akhound-Sadegh, Viktor Ohanesian et al.
Position: Graph Learning Will Lose Relevance Due To Poor Benchmarks
Maya Bechler-Speicher, Ben Finkelshtein, Fabrizio Frasca et al.
CVE-Bench: A Benchmark for AI Agents’ Ability to Exploit Real-World Web Application Vulnerabilities
Yuxuan Zhu, Antony Kellermann, Dylan Bowman et al.
OTTER: A Vision-Language-Action Model with Text-Aware Visual Feature Extraction
Huang Huang, Fangchen Liu, Letian Fu et al.
FlatQuant: Flatness Matters for LLM Quantization
Yuxuan Sun, Ruikang Liu, Haoli Bai et al.
SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models
Wei Huang, Haotong Qin, Yangdong Liu et al.
Improving the Diffusability of Autoencoders
Ivan Skorokhodov, Sharath Girish, Benran Hu et al.
SafeArena: Evaluating the Safety of Autonomous Web Agents
Ada Tur, Nicholas Meade, Xing Han Lù et al.
Graph-constrained Reasoning: Faithful Reasoning on Knowledge Graphs with Large Language Models
Linhao Luo, Zicheng Zhao, Reza Haffari et al.
FG-CLIP: Fine-Grained Visual and Textual Alignment
Chunyu Xie, Bin Wang, Fanjing Kong et al.
WMAdapter: Adding WaterMark Control to Latent Diffusion Models
Hai Ci, Yiren Song, Pei Yang et al.
SAeUron: Interpretable Concept Unlearning in Diffusion Models with Sparse Autoencoders
Bartosz Cywiński, Kamil Deja
Which Attention Heads Matter for In-Context Learning?
Kayo Yin, Jacob Steinhardt
Collapse or Thrive: Perils and Promises of Synthetic Data in a Self-Generating World
Joshua Kazdan, Rylan Schaeffer, Apratim Dey et al.
Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal Large Language Models
Xin Zou, Yizhou Wang, Yibo Yan et al.
Learning to Route LLMs with Confidence Tokens
Yu-Neng Chuang, Prathusha Sarma, Parikshit Gopalan et al.
AutoEval Done Right: Using Synthetic Data for Model Evaluation
Pierre Boyeau, Anastasios Angelopoulos, Tianle Li et al.
Time-VLM: Exploring Multimodal Vision-Language Models for Augmented Time Series Forecasting
Siru Zhong, Weilin Ruan, Ming Jin et al.
Transolver++: An Accurate Neural Solver for PDEs on Million-Scale Geometries
Huakun Luo, Haixu Wu, Hang Zhou et al.
VideoRoPE: What Makes for Good Video Rotary Position Embedding?
Xilin Wei, Xiaoran Liu, Yuhang Zang et al.
RepoAudit: An Autonomous LLM-Agent for Repository-Level Code Auditing
Jinyao Guo, Chengpeng Wang, Xiangzhe Xu et al.
Detecting Strategic Deception with Linear Probes
Nicholas Goldowsky-Dill, Bilal Chughtai, Stefan Heimersheim et al.
SAM2Act: Integrating Visual Foundation Model with A Memory Architecture for Robotic Manipulation
Haoquan Fang, Markus Grotz, Wilbert Pumacay et al.
PepTune: De Novo Generation of Therapeutic Peptides with Multi-Objective-Guided Discrete Diffusion
Sophia Tang, Yinuo Zhang, Pranam Chatterjee
DeFoG: Discrete Flow Matching for Graph Generation
Yiming Qin, Manuel Madeira, Dorina Thanou et al.
Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback
Yafu Li, Xuyang Hu, Xiaoye Qu et al.
Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive?
Rylan Schaeffer, Hailey Schoelkopf, Brando Miranda et al.
Modular Duality in Deep Learning
Jeremy Bernstein, Laker Newhouse
Massive Values in Self-Attention Modules are the Key to Contextual Knowledge Understanding
Mingyu Jin, Kai Mei, Wujiang Xu et al.
UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction
Perampalli Shravan Nayak, Xiangru Jian, Kevin Qinghong Lin et al.
An Analysis of Quantile Temporal-Difference Learning
Mark Rowland, Remi Munos, Mohammad Gheshlaghi Azar et al.
Robust Autonomy Emerges from Self-Play
Marco Cusumano-Towner, David Hafner, Alexander Hertzberg et al.
AffectGPT: A New Dataset, Model, and Benchmark for Emotion Understanding with Multimodal Large Language Models
Zheng Lian, Haoyu Chen, Lan Chen et al.
No Task Left Behind: Isotropic Model Merging with Common and Task-Specific Subspaces
Daniel Marczak, Simone Magistri, Sebastian Cygert et al.
On the Emergence of Position Bias in Transformers
Xinyi Wu, Yifei Wang, Stefanie Jegelka et al.
High-Dimensional Prediction for Sequential Decision Making
Georgy Noarov, Ramya Ramalingam, Aaron Roth et al.
DEFAME: Dynamic Evidence-based FAct-checking with Multimodal Experts
Tobias Braun, Mark Rothermel, Marcus Rohrbach et al.
Optimizing Large Language Model Training Using FP4 Quantization
Ruizhe Wang, Yeyun Gong, Xiao Liu et al.
The dark side of the forces: assessing non-conservative force models for atomistic machine learning
Filippo Bigi, Marcel Langer, Michele Ceriotti
Diverging Preferences: When do Annotators Disagree and do Models Know?
Michael Zhang, Zhilin Wang, Jena Hwang et al.
Proposer-Agent-Evaluator (PAE): Autonomous Skill Discovery For Foundation Model Internet Agents
Yifei Zhou, Qianlan Yang, Kaixiang Lin et al.
Archetypal SAE: Adaptive and Stable Dictionary Learning for Concept Extraction in Large Vision Models
Thomas Fel, Ekdeep Singh Lubana, Jacob Prince et al.
Automated Red Teaming with GOAT: the Generative Offensive Agent Tester
Maya Pavlova, Erik Brinkman, Krithika Iyer et al.
The Geometry of Refusal in Large Language Models: Concept Cones and Representational Independence
Tom Wollschläger, Jannes Elstner, Simon Geisler et al.
Steer LLM Latents for Hallucination Detection
Seongheon Park, Xuefeng Du, Min-Hsuan Yeh et al.
SepLLM: Accelerate Large Language Models by Compressing One Segment into One Separator
Guoxuan Chen, Han Shi, Jiawei Li et al.
Understanding Chain-of-Thought in LLMs through Information Theory
Jean-Francois Ton, Muhammad Faaiz Taufiq, Yang Liu
Sample, Scrutinize and Scale: Effective Inference-Time Search by Scaling Verification
Eric Zhao, Pranjal Awasthi, Sreenivas Gollapudi
UGPhysics: A Comprehensive Benchmark for Undergraduate Physics Reasoning with Large Language Models
Xin Xu, Qiyun Xu, Tong Xiao et al.
Efficient Online Reinforcement Learning for Diffusion Policy
Haitong Ma, Tianyi Chen, Kai Wang et al.
FourierMamba: Fourier Learning Integration with State Space Models for Image Deraining
Dong Li, Yidi Liu, Xueyang Fu et al.
Overtrained Language Models Are Harder to Fine-Tune
Jacob Mitchell Springer, Sachin Goyal, Kaiyue Wen et al.
GeoPixel: Pixel Grounding Large Multimodal Model in Remote Sensing
Akashah Shabbir, Ilmuz Zaman Mohammed Zumri, Mohammed Bennamoun et al.
KABB: Knowledge-Aware Bayesian Bandits for Dynamic Expert Coordination in Multi-Agent Systems
Jusheng Zhang, Zimeng Huang, Yijia Fan et al.
Locate-then-edit for Multi-hop Factual Recall under Knowledge Editing
Zhuoran Zhang, Yongxiang Li, Zijian Kan et al.
Distillation Scaling Laws
Dan Busbridge, Amitis Shidani, Floris Weers et al.
How to set AdamW's weight decay as you scale model and dataset size
Xi Wang, Laurence Aitchison
Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling
Hongzhi Huang, Defa Zhu, Banggu Wu et al.
Diving into Self-Evolving Training for Multimodal Reasoning
Wei Liu, Junlong Li, Xiwen Zhang et al.
AdvAgent: Controllable Blackbox Red-teaming on Web Agents
Chejian Xu, Mintong Kang, Jiawei Zhang et al.
MARS: Unleashing the Power of Variance Reduction for Training Large Models
Huizhuo Yuan, Yifeng Liu, Shuang Wu et al.
LMAct: A Benchmark for In-Context Imitation Learning with Long Multimodal Demonstrations
Anian Ruoss, Fabio Pardo, Harris Chan et al.
Training Dynamics of In-Context Learning in Linear Attention
Yedi Zhang, Aaditya Singh, Peter Latham et al.
Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models
Samira Abnar, Harshay Shah, Dan Busbridge et al.
Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models
Zehan Wang, Ziang Zhang, Tianyu Pang et al.
RUN: Reversible Unfolding Network for Concealed Object Segmentation
Chunming He, Rihan Zhang, Fengyang Xiao et al.
Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale
Fan Zhou, Zengzhi Wang, Qian Liu et al.
From Mechanistic Interpretability to Mechanistic Biology: Training, Evaluating, and Interpreting Sparse Autoencoders on Protein Language Models
Etowah Adams, Liam Bai, Minji Lee et al.
Subspace Optimization for Large Language Models with Convergence Guarantees
Yutong He, Pengrui Li, Yipeng Hu et al.
Orthus: Autoregressive Interleaved Image-Text Generation with Modality-Specific Heads
Siqi Kou, Jiachun Jin, Zhihong Liu et al.
AI for Global Climate Cooperation: Modeling Global Climate Negotiations, Agreements, and Long-Term Cooperation in RICE-N
Tianyu Zhang, Andrew Williams, Phillip Wozny et al.
Star Attention: Efficient LLM Inference over Long Sequences
Shantanu Acharya, Fei Jia, Boris Ginsburg
LLM-SRBench: A New Benchmark for Scientific Equation Discovery with Large Language Models
Parshin Shojaee, Ngoc Hieu Nguyen, Kazem Meidani et al.