Most-Cited ICML Papers Matching "physical constraint integration"
5,975 papers found • Page 1 of 30
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
Patrick Esser, Sumith Kulal, Andreas Blattmann et al.
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Lianghui Zhu, Bencheng Liao, Qian Zhang et al.
Improving Factuality and Reasoning in Language Models through Multiagent Debate
Yilun Du, Shuang Li, Antonio Torralba et al.
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
Tri Dao, Albert Gu
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities
Weihao Yu, Zhengyuan Yang, Linjie Li et al.
Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference
Wei-Lin Chiang, Lianmin Zheng, Ying Sheng et al.
Model Alignment as Prospect Theoretic Optimization
Kawin Ethayarajh, Winnie Xu, Niklas Muennighoff et al.
WorldSimBench: Towards Video Generation Models as World Simulators
Yiran Qin, Zhelun Shi, Jiwen Yu et al.
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
Mantas Mazeika, Long Phan, Xuwang Yin et al.
NExT-GPT: Any-to-Any Multimodal LLM
Shengqiong Wu, Hao Fei, Leigang Qu et al.
DoRA: Weight-Decomposed Low-Rank Adaptation
Shih-Yang Liu, Chien-Yi Wang, Hongxu Yin et al.
MusicRL: Aligning Music Generation to Human Preferences
Geoffrey Cideron, Sertan Girgin, Mauro Verzetti et al.
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
Tianle Cai, Yuhong Li, Zhengyang Geng et al.
Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch
Le Yu, Bowen Yu, Haiyang Yu et al.
RLAIF vs. RLHF: Scaling Reinforcement Learning from Human Feedback with AI Feedback
Harrison Lee, Samrat Phatale, Hassan Mansoor et al.
Self-Rewarding Language Models
Weizhe Yuan, Richard Yuanzhe Pang, Kyunghyun Cho et al.
A decoder-only foundation model for time-series forecasting
Abhimanyu Das, Weihao Kong, Rajat Sen et al.
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
Zixiang Chen, Yihe Deng, Huizhuo Yuan et al.
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Tianzhe Chu, Yuexiang Zhai, Jihan Yang et al.
Unified Training of Universal Time Series Forecasting Transformers
Gerald Woo, Chenghao Liu, Akshat Kumar et al.
GPT-4V(ision) is a Generalist Web Agent, if Grounded
Boyuan Zheng, Boyu Gou, Jihyung Kil et al.
VideoPoet: A Large Language Model for Zero-Shot Video Generation
Dan Kondratyuk, Lijun Yu, Xiuye Gu et al.
Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation
Haoran Xu, Amr Sharaf, Yunmo Chen et al.
Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision
Collin Burns, Pavel Izmailov, Jan Kirchner et al.
LESS: Selecting Influential Data for Targeted Instruction Tuning
Mengzhou Xia, Sadhika Malladi, Suchin Gururangan et al.
Genie: Generative Interactive Environments
Jake Bruce, Michael Dennis, Ashley Edwards et al.
Q-Align: Teaching LMMs for Visual Scoring via Discrete Text-Defined Levels
Haoning Wu, Zicheng Zhang, Weixia Zhang et al.
How Language Model Hallucinations Can Snowball
Muru Zhang, Ofir Press, William Merrill et al.
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Jiawei Zhao, Zhenyu Zhang, Beidi Chen et al.
KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
Zirui Liu, Jiayi Yuan, Hongye Jin et al.
Promptbreeder: Self-Referential Self-Improvement via Prompt Evolution
Chrisantha Fernando, Dylan Banarse, Henryk Michalewski et al.
The Linear Representation Hypothesis and the Geometry of Large Language Models
Kiho Park, Yo Joong Choe, Victor Veitch
From Crowdsourced Data to High-quality Benchmarks: Arena-Hard and Benchbuilder Pipeline
Tianle Li, Wei-Lin Chiang, Evan Frick et al.
MOMENT: A Family of Open Time-series Foundation Models
Mononito Goswami, Konrad Szafer, Arjun Choudhry et al.
Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution
Aaron Lou, Chenlin Meng, Stefano Ermon
Executable Code Actions Elicit Better LLM Agents
Xingyao Wang, Yangyi Chen, Lifan Yuan et al.
LoRA+: Efficient Low Rank Adaptation of Large Models
Soufiane Hayou, Nikhil Ghosh, Bin Yu
EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
Yuhui Li, Fangyun Wei, Chao Zhang et al.
The WMDP Benchmark: Measuring and Reducing Malicious Use with Unlearning
Nathaniel Li, Alexander Pan, Anjali Gopal et al.
Gated Linear Attention Transformers with Hardware-Efficient Training
Songlin Yang, Bailin Wang, Yikang Shen et al.
TravelPlanner: A Benchmark for Real-World Planning with Language Agents
Jian Xie, Kai Zhang, Jiangjie Chen et al.
Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-constraint
Wei Xiong, Hanze Dong, Chenlu Ye et al.
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
Zeqian Ju, Yuancheng Wang, Kai Shen et al.
An Embodied Generalist Agent in 3D World
Jiangyong Huang, Silong Yong, Xiaojian Ma et al.
AlphaZero-Like Tree-Search can Guide Large Language Model Decoding and Training
Ziyu Wan, Xidong Feng, Muning Wen et al.
MLLM-as-a-Judge: Assessing Multimodal LLM-as-a-Judge with Vision-Language Benchmark
Dongping Chen, Ruoxi Chen, Shilin Zhang et al.
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
Yiran Ding, Li Lyna Zhang, Chengruidong Zhang et al.
SqueezeLLM: Dense-and-Sparse Quantization
Sehoon Kim, Coleman Hooper, Amir Gholaminejad et al.
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
Xinyu Guan, Li Lyna Zhang, Yifei Liu et al.
Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models
Siddharth Karamcheti, Suraj Nair, Ashwin Balakrishna et al.
Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
Yichao Fu, Peter Bailis, Ion Stoica et al.
Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study
Shusheng Xu, Wei Fu, Jiaxuan Gao et al.
QUEST: Query-Aware Sparsity for Efficient Long-Context LLM Inference
Jiaming Tang, Yilong Zhao, Kan Zhu et al.
Physics of Language Models: Part 3.1, Knowledge Storage and Extraction
Zeyuan Allen-Zhu, Yuanzhi Li
QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks
Albert Tseng, Jerry Chee, Qingyao Sun et al.
Generative Flows on Discrete State-Spaces: Enabling Multimodal Flows with Applications to Protein Co-Design
Andrew Campbell, Jason Yim, Regina Barzilay et al.
3D-VLA: A 3D Vision-Language-Action Generative World Model
Haoyu Zhen, Xiaowen Qiu, Peihao Chen et al.
Better & Faster Large Language Models via Multi-token Prediction
Fabian Gloeckle, Badr Youbi Idrissi, Baptiste Roziere et al.
Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text
Abhimanyu Hans, Avi Schwarzschild, Valeriia Cherepanova et al.
CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution
Alex Gu, Baptiste Roziere, Hugh Leather et al.
In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering
Sheng Liu, Haotian Ye, Lei Xing et al.
GaussianPro: 3D Gaussian Splatting with Progressive Propagation
Kai Cheng, Xiaoxiao Long, Kaizhi Yang et al.
SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference
Yuan Zhang, Chun-Kai Fan, Junpeng Ma et al.
ULTRAFEEDBACK: Boosting Language Models with Scaled AI Feedback
Ganqu Cui, Lifan Yuan, Ning Ding et al.
Debating with More Persuasive LLMs Leads to More Truthful Answers
Akbir Khan, John Hughes, Dan Valentine et al.
Magicoder: Empowering Code Generation with OSS-Instruct
Yuxiang Wei, Zhe Wang, Jiawei Liu et al.
Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs
Ling Yang, Zhaochen Yu, Chenlin Meng et al.
Learning to (Learn at Test Time): RNNs with Expressive Hidden States
Yu Sun, Xinhao Li, Karan Dalal et al.
Fast Timing-Conditioned Latent Audio Diffusion
Zach Evans, CJ Carr, Josiah Taylor et al.
Evolution of Heuristics: Towards Efficient Automatic Algorithm Design Using Large Language Model
Fei Liu, Tong Xialiang, Mingxuan Yuan et al.
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
Zechun Liu, Changsheng Zhao, Forrest Iandola et al.
Nash Learning from Human Feedback
Rémi Munos, Michal Valko, Daniele Calandriello et al.
AlphaFold Meets Flow Matching for Generating Protein Ensembles
Bowen Jing, Bonnie Berger, Tommi Jaakkola
RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation
Yufei Wang, Zhou Xian, Feng Chen et al.
PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs
Soroush Nasiriany, Fei Xia, Wenhao Yu et al.
Data Engineering for Scaling Language Models to 128K Context
Yao Fu, Rameswar Panda, Xinyao Niu et al.
Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
Boyi Wei, Kaixuan Huang, Yangsibo Huang et al.
LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding
Xiaoqian Shen, Yunyang Xiong, Changsheng Zhao et al.
Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews
Weixin Liang, Zachary Izzo, Yaohui Zhang et al.
Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction
Yiheng Xu, Zekun Wang, Junli Wang et al.
SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models
Xiaoxuan Wang, Ziniu Hu, Pan Lu et al.
Transolver: A Fast Transformer Solver for PDEs on General Geometries
Haixu Wu, Huakun Luo, Haowen Wang et al.
Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data
Fahim Tajwar, Anikait Singh, Archit Sharma et al.
tinyBenchmarks: evaluating LLMs with fewer examples
Felipe Maia Polo, Lucas Weber, Leshem Choshen et al.
Fundamental Limitations of Alignment in Large Language Models
Yotam Wolf, Noam Wies, Oshri Avnery et al.
Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models
Asma Ghandeharioun, Avi Caciularu, Adam Pearce et al.
Caduceus: Bi-Directional Equivariant Long-Range DNA Sequence Modeling
Yair Schiff, Chia Hsiang Kao, Aaron Gokaslan et al.
MLAgentBench: Evaluating Language Agents on Machine Learning Experimentation
Qian Huang, Jian Vora, Percy Liang et al.
Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities
Zhifeng Kong, Arushi Goel, Rohan Badlani et al.
A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity
Andrew Lee, Xiaoyan Bai, Itamar Pres et al.
MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI
Kaining Ying, Fanqing Meng, Jin Wang et al.
Repeat After Me: Transformers are Better than State Space Models at Copying
Samy Jelassi, David Brandfonbrener, Sham Kakade et al.
MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance
Yuang Zhang, Jiaxi Gu, Li-Wen Wang et al.
Extreme Compression of Large Language Models via Additive Quantization
Vage Egiazarian, Andrei Panferov, Denis Kuznedelev et al.
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
Fuzhao Xue, Zian Zheng, Yao Fu et al.
Training Software Engineering Agents and Verifiers with SWE-Gym
Jiayi Pan, Xingyao Wang, Graham Neubig et al.
COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability
Xingang Guo, Fangxu Yu, Huan Zhang et al.
Score identity Distillation: Exponentially Fast Distillation of Pretrained Diffusion Models for One-Step Generation
Mingyuan Zhou, Huangjie Zheng, Zhendong Wang et al.
Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity
Lu Yin, You Wu, Zhenyu Zhang et al.
Generalized Preference Optimization: A Unified Approach to Offline Alignment
Yunhao Tang, Zhaohan Guo, Zeyu Zheng et al.
Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations
Jiaqi Zhai, Yunxing Liao, Xing Liu et al.
Timer: Generative Pre-trained Transformers Are Large Time Series Models
Yong Liu, Haoran Zhang, Chenyu Li et al.
LLaGA: Large Language and Graph Assistant
Runjin Chen, Tong Zhao, Ajay Jaiswal et al.
ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models
Ziniu Li, Tian Xu, Yushun Zhang et al.
Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition
Hao Fei, Shengqiong Wu, Wei Ji et al.
MathScale: Scaling Instruction Tuning for Mathematical Reasoning
Zhengyang Tang, Xingxing Zhang, Benyou Wang et al.
Layer by Layer: Uncovering Hidden Representations in Language Models
Oscar Skean, Md Rifat Arefin, Dan Zhao et al.
Stealing part of a production language model
Nicholas Carlini, Daniel Paleka, Krishnamurthy Dvijotham et al.
BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
Wei Huang, Yangdong Liu, Haotong Qin et al.
Image Hijacks: Adversarial Images can Control Generative Models at Runtime
Luke Bailey, Euan Ong, Stuart Russell et al.
HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding
Zhaorun Chen, Zhuokai Zhao, Hongyin Luo et al.
WorkArena: How Capable are Web Agents at Solving Common Knowledge Work Tasks?
Alexandre Drouin, Maxime Gasse, Massimo Caccia et al.
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
Dongyang Liu, Renrui Zhang, Longtian Qiu et al.
BetterV: Controlled Verilog Generation with Discriminative Guidance
Zehua Pei, Huiling Zhen, Mingxuan Yuan et al.
Simple linear attention language models balance the recall-throughput tradeoff
Simran Arora, Sabri Eyuboglu, Michael Zhang et al.
The Pitfalls of Next-Token Prediction
Gregor Bachmann, Vaishnavh Nagarajan
A Minimaximalist Approach to Reinforcement Learning from Human Feedback
Gokul Swamy, Christoph Dann, Rahul Kidambi et al.
Language Agents with Reinforcement Learning for Strategic Play in the Werewolf Game
Zelai Xu, Chao Yu, Fei Fang et al.
SparseTSF: Modeling Long-term Time Series Forecasting with 1k Parameters
Shengsheng Lin, Weiwei Lin, Wentai Wu et al.
ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL
Yifei Zhou, Andrea Zanette, Jiayi Pan et al.
Chain of Code: Reasoning with a Language Model-Augmented Code Emulator
Chengshu Li, Jacky Liang, Andy Zeng et al.
AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs
Anselm Paulus, Arman Zharmagambetov, Chuan Guo et al.
Prompting4Debugging: Red-Teaming Text-to-Image Diffusion Models by Finding Problematic Prompts
Zhi-Yi Chin, Chieh Ming Jiang, Ching-Chun Huang et al.
Imagine While Reasoning in Space: Multimodal Visualization-of-Thought
Chengzu Li, Wenshan Wu, Huanyu Zhang et al.
Behavior Generation with Latent Actions
Seungjae Lee, Yibin Wang, Haritheja Etukuru et al.
QuRating: Selecting High-Quality Data for Training Language Models
Alexander Wettig, Aatmik Gupta, Saumya Malik et al.
The Illusion of State in State-Space Models
William Merrill, Jackson Petty, Ashish Sabharwal
TSLANet: Rethinking Transformers for Time Series Representation Learning
Emadeldeen Eldele, Mohamed Ragab, Zhenghua Chen et al.
DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning
Gaoyue Zhou, Hengkai Pan, Yann LeCun et al.
How Far Is Video Generation from World Model: A Physical Law Perspective
Bingyi Kang, Yang Yue, Rui Lu et al.
Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment
Rui Yang, Xiaoman Pan, Feng Luo et al.
An LLM Compiler for Parallel Function Calling
Sehoon Kim, Suhong Moon, Ryan Tabrizi et al.
Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models
Yongshuo Zong, Ondrej Bohdal, Tingyang Yu et al.
Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws
Nikhil Sardana, Jacob Portes, Alexandre (Sasha) Doubov et al.
WebLINX: Real-World Website Navigation with Multi-Turn Dialogue
Xing Han Lù, Zdeněk Kasner, Siva Reddy
Token-level Direct Preference Optimization
Yongcheng Zeng, Guoqing Liu, Weiyu Ma et al.
Hi Robot: Open-Ended Instruction Following with Hierarchical Vision-Language-Action Models
Lucy Xiaoyang Shi, Brian Ichter, Michael Equi et al.
Scaling Laws for Fine-Grained Mixture of Experts
Jan Ludziejewski, Jakub Krajewski, Kamil Adamczewski et al.
A General Framework for Inference-time Scaling and Steering of Diffusion Models
Raghav Singhal, Zachary Horvitz, Ryan Teehan et al.
RL-VLM-F: Reinforcement Learning from Vision Language Foundation Model Feedback
Yufei Wang, Zhanyi Sun, Jesse Zhang et al.
Taming Rectified Flow for Inversion and Editing
Jiangshan Wang, Junfu Pu, Zhongang Qi et al.
AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders
Zhengxuan Wu, Aryaman Arora, Atticus Geiger et al.
Controlled Decoding from Language Models
Sidharth Mudgal, Jong Lee, Harish Ganapathy et al.
Free Process Rewards without Process Labels
Lifan Yuan, Wendi Li, Huayu Chen et al.
Challenges in Training PINNs: A Loss Landscape Perspective
Pratik Rathore, Weimu Lei, Zachary Frangella et al.
ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning
Yuchen Lin, Ronan Le Bras, Kyle Richardson et al.
Agent-as-a-Judge: Evaluate Agents with Agents
Mingchen Zhuge, Changsheng Zhao, Dylan Ashley et al.
Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations
Yucheng Hu, Yanjiang Guo, Pengchao Wang et al.
Prodigy: An Expeditiously Adaptive Parameter-Free Learner
Konstantin Mishchenko, Aaron Defazio
MagicPose: Realistic Human Poses and Facial Expressions Retargeting with Identity-aware Diffusion
Di Chang, Yichun Shi, Quankai Gao et al.
Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
Xiong Wang, Yangze Li, Chaoyou Fu et al.
SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer
Enze Xie, Junsong Chen, Yuyang Zhao et al.
HumanTOMATO: Text-aligned Whole-body Motion Generation
Shunlin Lu, Ling-Hao Chen, Ailing Zeng et al.
A Tale of Tails: Model Collapse as a Change of Scaling Laws
Elvis Dohmatob, Yunzhen Feng, Pu Yang et al.
AI Control: Improving Safety Despite Intentional Subversion
Ryan Greenblatt, Buck Shlegeris, Kshitij Sachan et al.
ODIN: Disentangled Reward Mitigates Hacking in RLHF
Lichang Chen, Chen Zhu, Jiuhai Chen et al.
Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs
Jan Betley, Daniel Tan, Niels Warncke et al.
RoboDreamer: Learning Compositional World Models for Robot Imagination
Siyuan Zhou, Yilun Du, Jiaben Chen et al.
Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark
Yihua Zhang, Pingzhi Li, Junyuan Hong et al.
Can Mamba Learn How To Learn? A Comparative Study on In-Context Learning Tasks
Jong Ho Park, Jaden Park, Zheyang Xiong et al.
Stop Regressing: Training Value Functions via Classification for Scalable Deep RL
Jesse Farebrother, Jordi Orbay, Quan Vuong et al.
In-context Convergence of Transformers
Yu Huang, Yuan Cheng, Yingbin Liang
On Prompt-Driven Safeguarding for Large Language Models
Chujie Zheng, Fan Yin, Hao Zhou et al.
DPO Meets PPO: Reinforced Token Optimization for RLHF
Han Zhong, Zikang Shan, Guhao Feng et al.
MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding
Yuxin Zuo, Shang Qu, Yifei Li et al.
OR-Bench: An Over-Refusal Benchmark for Large Language Models
Jiaxing Cui, Wei-Lin Chiang, Ion Stoica et al.
Momentor: Advancing Video Large Language Model with Fine-Grained Temporal Reasoning
Long Qian, Juncheng Li, Yu Wu et al.
Provably Robust DPO: Aligning Language Models with Noisy Feedback
Sayak Ray Chowdhury, Anush Kini, Nagarajan Natarajan
Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast
Xiangming Gu, Xiaosen Zheng, Tianyu Pang et al.
How Transformers Learn Causal Structure with Gradient Descent
Eshaan Nichani, Alex Damian, Jason Lee
GLoRe: When, Where, and How to Improve LLM Reasoning via Global and Local Refinements
Alexander Havrilla, Sharath Chandra Raparthy, Christoforos Nalmpantis et al.
Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale
Rogerio Bonatti, Dan Zhao, Francesco Bonacci et al.
Train for the Worst, Plan for the Best: Understanding Token Ordering in Masked Diffusions
Jaeyeon Kim, Kulin Shah, Vasilis Kontonis et al.
Decomposing Uncertainty for Large Language Models through Input Clarification Ensembling
Bairu Hou, Yujian Liu, Kaizhi Qian et al.
Proactive Detection of Voice Cloning with Localized Watermarking
Robin San Roman, Pierre Fernandez, Hady Elsahar et al.
Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark
Yunzhuo Hao, Jiawei Gu, Huichen Wang et al.
Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models
Bilgehan Sel, Ahmad Al-Tawaha, Vanshaj Khattar et al.
PaperBench: Evaluating AI’s Ability to Replicate AI Research
Giulio Starace, Oliver Jaffe, Dane Sherburn et al.
InfiAgent-DABench: Evaluating Agents on Data Analysis Tasks
Xueyu Hu, Ziyu Zhao, Shuang Wei et al.
EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents
Rui Yang, Hanyang (Jeremy) Chen, Junyu Zhang et al.
Iterated Denoising Energy Matching for Sampling from Boltzmann Densities
Tara Akhound-Sadegh, Jarrid Rector-Brooks, Joey Bose et al.
MindEye2: Shared-Subject Models Enable fMRI-To-Image With 1 Hour of Data
Paul Scotti, Mihir Tripathy, Cesar Kadir Torrico Villanueva et al.
Learning Smooth and Expressive Interatomic Potentials for Physical Property Prediction
Xiang Fu, Brandon Wood, Luis Barroso-Luque et al.
Flora: Low-Rank Adapters Are Secretly Gradient Compressors
Yongchang Hao, Yanshuai Cao, Lili Mou
GALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting
Xiaoyu Zhou, Xingjian Ran, Yajiao Xiong et al.
Weak-to-Strong Jailbreaking on Large Language Models
Xuandong Zhao, Xianjun Yang, Tianyu Pang et al.
Mind Your Step (by Step): Chain-of-Thought can Reduce Performance on Tasks where Thinking Makes Humans Worse
Ryan Liu, Jiayi Geng, Addison J. Wu et al.
Diffusion Language Models Are Versatile Protein Learners
Xinyou Wang, Zaixiang Zheng, Fei Ye et al.
Theoretical guarantees on the best-of-n alignment policy
Ahmad Beirami, Alekh Agarwal, Jonathan Berant et al.
Dynamic Memory Compression: Retrofitting LLMs for Accelerated Inference
Piotr Nawrot, Adrian Łańcucki, Marcin Chochowski et al.
MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency
Dongzhi Jiang, Renrui Zhang, Ziyu Guo et al.
DsDm: Model-Aware Dataset Selection with Datamodels
Logan Engstrom et al.
Position: Levels of AGI for Operationalizing Progress on the Path to AGI
Meredith Morris, Jascha Sohl-Dickstein, Noah Fiedel et al.
Large Language Models are Geographically Biased
Rohin Manvi, Samar Khanna, Marshall Burke et al.
Online Speculative Decoding
Xiaoxuan Liu, Lanxiang Hu, Peter Bailis et al.
DPOT: Auto-Regressive Denoising Operator Transformer for Large-Scale PDE Pre-Training
Zhongkai Hao, Chang Su, Songming Liu et al.
AttnLRP: Attention-Aware Layer-Wise Relevance Propagation for Transformers
Reduan Achtibat, Sayed Mohammad Vakilzadeh Hatefi, Maximilian Dreyer et al.
Localizing Task Information for Improved Model Merging and Compression
Ke Wang, Nikolaos Dimitriadis, Guillermo Ortiz-Jimenez et al.
RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement Learning
Jonas Gehring, Kunhao Zheng, Jade Copet et al.
SparQ Attention: Bandwidth-Efficient LLM Inference
Luka Ribar, Ivan Chelombiev, Luke Hudlass-Galley et al.