Most Cited ICLR 2025 "coco-20 dataset" Papers
3,827 papers found • Page 1 of 20
SAM 2: Segment Anything in Images and Videos
Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu et al.
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
Clemencia Siro, Guy Gur-Ari, Gaurav Mishra et al.
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer
Zhuoyi Yang, Jiayan Teng, Wendi Zheng et al.
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code
Naman Jain, King Han, Alex Gu et al.
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
Javier Rando, Tony Wang, Stewart Slocum et al.
WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct
Haipeng Luo, Qingfeng Sun, Can Xu et al.
RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment
Jipeng Zhang, Hanze Dong, Tong Zhang et al.
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation
Jinheng Xie, Weijia Mao, Zechen Bai et al.
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions
Terry Yue Zhuo, Minh Chien Vu, Jenny Chim et al.
Causal Reasoning and Large Language Models: Opening a New Frontier for Causality
Chenhao Tan, Robert Ness, Amit Sharma et al.
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks
Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion
RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation
Songming Liu, Lingxuan Wu, Bangguo Li et al.
OpenHands: An Open Platform for AI Software Developers as Generalist Agents
Xingyao Wang, Boxuan Li, Yufan Song et al.
Generative Verifiers: Reward Modeling as Next-Token Prediction
Lunjun Zhang, Arian Hosseini, Hritik Bansal et al.
Training Language Models to Self-Correct via Reinforcement Learning
Aviral Kumar, Vincent Zhuang, Rishabh Agarwal et al.
Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
Sihyun Yu, Sangkyung Kwak, Huiwon Jang et al.
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second
Alexey Bochkovskiy, Amaël Delaunoy, Hugo Germain et al.
Scaling and evaluating sparse autoencoders
Leo Gao, Tom Dupre la Tour, Henk Tillman et al.
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
Chunting Zhou, Lili Yu, Arun Babu et al.
Safety Alignment Should be Made More Than Just a Few Tokens Deep
Xiangyu Qi, Ashwinee Panda, Kaifeng Lyu et al.
Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers
Chenglei Si, Diyi Yang, Tatsunori Hashimoto
Mixture-of-Agents Enhances Large Language Model Capabilities
Junlin Wang, Jue Wang, Ben Athiwaratkun et al.
MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion
Junyi Zhang, Charles Herrmann, Junhwa Hur et al.
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing
Zhangchen Xu, Fengqing Jiang, Luyao Niu et al.
JudgeLM: Fine-tuned Large Language Models are Scalable Judges
Lianghui Zhu, Xinggang Wang, Xinlong Wang
Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models
Samuel Marks, Can Rager, Eric Michaud et al.
SpinQuant: LLM Quantization with Learned Rotations
Zechun Liu, Changsheng Zhao, Igor Fedorov et al.
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models
Jiabo Ye, Haiyang Xu, Haowei Liu et al.
LoRA Learns Less and Forgets Less
Jonathan Frankle, Jose Javier Gonzalez Ortiz, Cody Blakeney et al.
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning
Zayne Sprague, Fangcong Yin, Juan Rodriguez et al.
Generating with Confidence: Uncertainty Quantification for Black-box Large Language Models
Jimeng Sun, Shubhendu Trivedi, Zhen Lin
Pyramidal Flow Matching for Efficient Video Generative Modeling
Yang Jin, Zhicheng Sun, Ningyuan Li et al.
Generative Representational Instruction Tuning
Niklas Muennighoff, Hongjin Su, Liang Wang et al.
Self-Play Preference Optimization for Language Model Alignment
Yue Wu, Zhiqing Sun, Rina Hughes et al.
Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge
Jiayi Ye, Yanbo Wang, Yue Huang et al.
OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation
Kepan Nan, Rui Xie, Penghao Zhou et al.
Your Absorbing Discrete Diffusion Secretly Models the Conditional Distributions of Clean Data
Jingyang Ou, Shen Nie, Kaiwen Xue et al.
AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents
Chris Rawles, Sarah Clinckemaillie, Yifan Chang et al.
One Step Diffusion via Shortcut Models
Kevin Frans, Danijar Hafner, Sergey Levine et al.
Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts
Xiaoming Shi, Shiyu Wang, Yuqi Nie et al.
Inverse Scaling: When Bigger Isn't Better
Joe Cavanagh, Andrew Gritsevskiy, Najoung Kim et al.
Advancing LLM Reasoning Generalists with Preference Trees
Lifan Yuan, Ganqu Cui, Hanbin Wang et al.
Revisiting Feature Prediction for Learning Visual Representations from Video
Quentin Garrido, Yann LeCun, Michael Rabbat et al.
G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model
Jiahui Gao, Renjie Pi, Jipeng Zhang et al.
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
Marianne Arriola, Aaron Gokaslan, Justin Chiu et al.
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
Guangxuan Xiao, Jiaming Tang, Jingwei Zuo et al.
Gated Delta Networks: Improving Mamba2 with Delta Rule
Songlin Yang, Jan Kautz, Ali Hatamizadeh
The Unreasonable Ineffectiveness of the Deeper Layers
Andrey Gromov, Kushal Tirumala, Hassan Shapourian et al.
MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer
Yuancheng Wang, Haoyue Zhan, Liwei Liu et al.
MUSE: Machine Unlearning Six-Way Evaluation for Language Models
Weijia Shi, Jaechan Lee, Yangsibo Huang et al.
Diffusion Models Are Real-Time Game Engines
Dani Valevski, Yaniv Leviathan, Moab Arar et al.
JudgeBench: A Benchmark for Evaluating LLM-Based Judges
Sijun Tan, Siyuan Zhuang, Kyle Montgomery et al.
Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for LLM Problem-Solving
Yangzhen Wu, Zhiqing Sun, Shanda Li et al.
Retrieval Head Mechanistically Explains Long-Context Factuality
Wenhao Wu, Yizhong Wang, Guangxuan Xiao et al.
AFlow: Automating Agentic Workflow Generation
Jiayi Zhang, Jinyu Xiang, Zhaoyang Yu et al.
World Model on Million-Length Video And Language With Blockwise RingAttention
Hao Liu, Wilson Yan, Matei Zaharia et al.
WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild
Bill Yuchen Lin, Yuntian Deng, Khyathi Chandu et al.
Physics of Language Models: Part 3.2, Knowledge Manipulation
Zeyuan Allen-Zhu, Yuanzhi Li
DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search
Huajian Xin, Z.Z. Ren, Junxiao Song et al.
Omni-MATH: A Universal Olympiad Level Mathematic Benchmark for Large Language Models
Bofei Gao, Feifan Song, Zhe Yang et al.
SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal
Tinghao Xie, Xiangyu Qi, Yi Zeng et al.
AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models
Junfeng Fang, Houcheng Jiang, Kun Wang et al.
Diffusion Policy Policy Optimization
Allen Ren, Justin Lidard, Lars Ankile et al.
Scaling Diffusion Language Models via Adaptation from Autoregressive Models
Shansan Gong, Shivam Agarwal, Yizhe Zhang et al.
LLaMA-Omni: Seamless Speech Interaction with Large Language Models
Qingkai Fang, Shoutao Guo, Yan Zhou et al.
MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering
Jun Shern Chan, Neil Chowdhury, Oliver Jaffe et al.
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solver
Zhenting Qi, Mingyuan Ma, Jiahang Xu et al.
AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents
Maksym Andriushchenko, Alexandra Souly, Mateusz Dziemian et al.
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling
Shengpeng Ji, Ziyue Jiang, Wen Wang et al.
Interpreting Emergent Planning in Model-Free Reinforcement Learning
Thomas Bush, Stephen Chung, Usman Anwar et al.
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations
Hadas Orgad, Michael Toker, Zorik Gekhman et al.
IRIS: LLM-Assisted Static Analysis for Detecting Security Vulnerabilities
Ziyang Li, Saikat Dutta, Mayur Naik
VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents
Shi Yu, Chaoyue Tang, Bokai Xu et al.
Automated Design of Agentic Systems
Shengran Hu, Cong Lu, Jeff Clune
Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction
Jing He, Haodong Li, Wei Yin et al.
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
Liliang Ren, Yang Liu, Yadong Lu et al.
Scaling up Masked Diffusion Models on Text
Shen Nie, Fengqi Zhu, Chao Du et al.
WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning
Zehan Qi, Xiao Liu, Iat Long Iong et al.
Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens
Lijie Fan, Tianhong Li, Siyang Qin et al.
CHASE-SQL: Multi-Path Reasoning and Preference Optimized Candidate Selection in Text-to-SQL
Mohammadreza Pourreza, Hailong Li, Ruoxi Sun et al.
Adjoint Matching: Fine-tuning Flow and Diffusion Generative Models with Memoryless Stochastic Optimal Control
Carles Domingo i Enrich, Michal Drozdzal, Brian Karrer et al.
Data Scaling Laws in Imitation Learning for Robotic Manipulation
Fanqi Lin, Yingdong Hu, Pingyue Sheng et al.
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
Min Shi, Fuxiao Liu, Shihao Wang et al.
SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks
Rui-Jie Zhu, Qihang Zhao, Jason Eshraghian et al.
Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws
Zeyuan Allen-Zhu, Yuanzhi Li
ToolACE: Winning the Points of LLM Function Calling
Weiwen Liu, Xu Huang, Xingshan Zeng et al.
LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory
Di Wu, Hongwei Wang, Wenhao Yu et al.
VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control
Sherwin Bahmani, Ivan Skorokhodov, Aliaksandr Siarohin et al.
MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding
Fei Wang, Xingyu Fu, James Y. Huang et al.
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token
Shaolei Zhang, Qingkai Fang, Zhe Yang et al.
AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs
Xiaogeng Liu, Peiran Li, G. Edward Suh et al.
HelpSteer2-Preference: Complementing Ratings with Preferences
Zhilin Wang, Alexander Bukharin, Olivier Delalleau et al.
Tamper-Resistant Safeguards for Open-Weight LLMs
Rishub Tamirisa, Bhrugu Bharathi, Long Phan et al.
EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage
Zeyi Liao, Lingbo Mo, Chejian Xu et al.
SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains?
John Yang, Carlos E Jimenez, Alex Zhang et al.
MS-Diffusion: Multi-subject Zero-shot Image Personalization with Layout Guidance
Xierui Wang, Siming Fu, Qihan Huang et al.
Masked Diffusion Models are Secretly Time-Agnostic Masked Models and Exploit Inaccurate Categorical Sampling
Kaiwen Zheng, Yongxin Chen, Hanzi Mao et al.
OmniRe: Omni Urban Scene Reconstruction
Ziyu Chen, Jiawei Yang, Jiahui Huang et al.
Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents
Hanrong Zhang, Jingyuan Huang, Kai Mei et al.
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
Wenhao Chai, Enxin Song, Yilun Du et al.
VideoPhy: Evaluating Physical Commonsense for Video Generation
Hritik Bansal, Zongyu Lin, Tianyi Xie et al.
Agent S: An Open Agentic Framework that Uses Computers Like a Human
Saaket Agashe, Jiuzhou Han, Shuyu Gan et al.
MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers
Yiwen Chen, Tong He, Di Huang et al.
Autoregressive Video Generation without Vector Quantization
Haoge Deng, Ting Pan, Haiwen Diao et al.
On the self-verification limitations of large language models on reasoning and planning tasks
Kaya Stechly, Karthik Valmeekam, Subbarao Kambhampati
RegMix: Data Mixture as Regression for Language Model Pre-training
Qian Liu, Xiaosen Zheng, Niklas Muennighoff et al.
LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs
Yushi Bai, Jiajie Zhang, Xin Lv et al.
Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models
Gen Luo, Yiyi Zhou, Yuxin Zhang et al.
RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style
Yantao Liu, Zijun Yao, Rui Min et al.
Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations
Litu Rout, Yujia Chen, Nataniel Ruiz et al.
Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process
Tian Ye, Zicheng Xu, Yuanzhi Li et al.
CBraMod: A Criss-Cross Brain Foundation Model for EEG Decoding
Jiquan Wang, Sha Zhao, Zhiling Luo et al.
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation
Yuang Peng, Yuxin Cui, Haomiao Tang et al.
ColPali: Efficient Document Retrieval with Vision Language Models
Manuel Faysse, Hugues Sibille, Tony Wu et al.
Deconstructing Denoising Diffusion Models for Self-Supervised Learning
Xinlei Chen, Zhuang Liu, Saining Xie et al.
LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias
Haian Jin, Hanwen Jiang, Hao Tan et al.
When Attention Sink Emerges in Language Models: An Empirical View
Xiangming Gu, Tianyu Pang, Chao Du et al.
SVDQuant: Absorbing Outliers by Low-Rank Component for 4-Bit Diffusion Models
Muyang Li, Yujun Lin, Zhekai Zhang et al.
Not All Language Model Features Are One-Dimensionally Linear
Josh Engels, Eric Michaud, Isaac Liao et al.
OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision
Cong Wei, Zheyang Xiong, Weiming Ren et al.
Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency
Jianwen Jiang, Chao Liang, Jiaqi Yang et al.
Kolmogorov-Arnold Transformer
Xingyi Yang, Xinchao Wang
LiveBench: A Challenging, Contamination-Limited LLM Benchmark
Colin White, Samuel Dooley, Manley Roberts et al.
AdaIR: Adaptive All-in-One Image Restoration via Frequency Mining and Modulation
Yuning Cui, Syed Waqas Zamir, Salman Khan et al.
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want
Weifeng Lin, Xinyu Wei, Ruichuan An et al.
Making Text Embedders Few-Shot Learners
Chaofan Li, Minghao Qin, Shitao Xiao et al.
TimeMixer++: A General Time Series Pattern Machine for Universal Predictive Analysis
Shiyu Wang, Jiawei Li, Xiaoming Shi et al.
Vision-LSTM: xLSTM as Generic Vision Backbone
Benedikt Alkin, Maximilian Beck, Korbinian Pöppel et al.
Predictive Inverse Dynamics Models are Scalable Learners for Robotic Manipulation
Yang Tian, Sizhe Yang, Jia Zeng et al.
Training-free Camera Control for Video Generation
Chen Hou, Zhibo Chen
Robust Watermarking Using Generative Priors Against Image Editing: From Benchmarking to Advances
Shilin Lu, Zihan Zhou, Jiayou Lu et al.
AHA: A Vision-Language-Model for Detecting and Reasoning Over Failures in Robotic Manipulation
Jiafei Duan, Wilbert Pumacay, Nishanth Kumar et al.
DynaMath: A Dynamic Visual Benchmark for Evaluating Mathematical Reasoning Robustness of Vision Language Models
Chengke Zou, Xingang Guo, Rui Yang et al.
Soft Merging of Experts with Adaptive Routing
Haokun Liu, Muqeeth Mohammed, Colin Raffel
Turning Up the Heat: Min-p Sampling for Creative and Coherent LLM Outputs
Minh Nguyen, Andrew Baker, Clement Neo et al.
Unlocking Guidance for Discrete State-Space Diffusion and Flow Models
Hunter Nisonoff, Junhao Xiong, Stephan Allenspach et al.
Consistency Models Made Easy
Zhengyang Geng, Ashwini Pokle, Weijian Luo et al.
MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs
Jiarui Zhang, Mahyar Khayatkhoei, Prateek Chhikara et al.
CFG++: Manifold-constrained Classifier Free Guidance for Diffusion Models
Hyungjin Chung, Jeongsol Kim, Geon Yeong Park et al.
Real-Time Video Generation with Pyramid Attention Broadcast
Xuanlei Zhao, Xiaolong Jin, Kai Wang et al.
Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting
Zilong (Ryan) Wang, Zifeng Wang, Long Le et al.
RoboCat: A Self-Improving Generalist Agent for Robotic Manipulation
Sergio Gómez Colmenarejo, Jost Springenberg, Jose Enrique Chen et al.
MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs
Sheng-Chieh Lin, Chankyu Lee, Mohammad Shoeybi et al.
GraphRouter: A Graph-based Router for LLM Selections
Tao Feng, Yanzhen Shen, Jiaxuan You
Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models
Javier Ferrando, Oscar Obeso, Senthooran Rajamanoharan et al.
Language models scale reliably with over-training and on downstream tasks
Samir Yitzhak Gadre, Georgios Smyrnis, Vaishaal Shankar et al.
Generalization v.s. Memorization: Tracing Language Models’ Capabilities Back to Pretraining Data
Xinyi Wang, Antonis Antoniades, Yanai Elazar et al.
Eliciting Human Preferences with Language Models
Belinda Li, Alex Tamkin, Noah Goodman et al.
Dissecting Adversarial Robustness of Multimodal LM Agents
Chen Wu, Rishi Shah, Jing Yu Koh et al.
FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models
Zhipei Xu, Xuanyu Zhang, Runyi Li et al.
Beyond Autoregression: Discrete Diffusion for Complex Reasoning and Planning
Jiacheng Ye, Jiahui Gao, Shansan Gong et al.
Diffusion-Based Planning for Autonomous Driving with Flexible Guidance
Yinan Zheng, Ruiming Liang, Kexin Zheng et al.
OGBench: Benchmarking Offline Goal-Conditioned RL
Seohong Park, Kevin Frans, Benjamin Eysenbach et al.
InstructRAG: Instructing Retrieval-Augmented Generation via Self-Synthesized Rationales
Zhepei Wei, Wei-Lin Chen, Yu Meng
MAVIS: Mathematical Visual Instruction Tuning with an Automatic Data Engine
Renrui Zhang, Xinyu Wei, Dongzhi Jiang et al.
Improved Techniques for Optimization-Based Jailbreaking on Large Language Models
Xiaojun Jia, Tianyu Pang, Chao Du et al.
MMTEB: Massive Multilingual Text Embedding Benchmark
Kenneth Enevoldsen, Isaac Chung, Imene Kerboua et al.
MaskBit: Embedding-free Image Generation via Bit Tokens
Mark Weber, Lijun Yu, Qihang Yu et al.
Language Models Learn to Mislead Humans via RLHF
Jiaxin Wen, Ruiqi Zhong, Akbir Khan et al.
MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine
Yunfei Xie, Ce Zhou, Lang Gao et al.
Planning in Natural Language Improves LLM Search for Code Generation
Evan Wang, Federico Cassano, Catherine Wu et al.
BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games
Davide Paglieri, Bartłomiej Cupiał, Samuel Coward et al.
Programming Refusal with Conditional Activation Steering
Bruce W. Lee, Inkit Padhi, Karthikeyan Natesan Ramamurthy et al.
Weak to Strong Generalization for Large Language Models with Multi-capabilities
Yucheng Zhou, Jianbing Shen, Yu Cheng
Fine-tuning can cripple your foundation model; preserving features may be the solution
Philip Torr, Puneet Dokania, Jishnu Mukhoti et al.
Accelerating Diffusion Transformers with Token-wise Feature Caching
Chang Zou, Xuyang Liu, Ting Liu et al.
Improving Text-to-Image Consistency via Automatic Prompt Optimization
Melissa Hall, Michal Drozdzal, Oscar Mañas et al.
CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Models
Zheng Chong, Xiao Dong, Haoxiang Li et al.
VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents
Xiao Liu, Tianjie Zhang, Yu Gu et al.
HAMSTER: Hierarchical Action Models for Open-World Robot Manipulation
Yi Li, Yuquan Deng, Jesse Zhang et al.
DriveTransformer: Unified Transformer for Scalable End-to-End Autonomous Driving
Xiaosong Jia, Junqi You, Zhiyuan Zhang et al.
AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents
Ke Yang, Yao Liu, Sapana Chaudhary et al.
Scaling Laws for Precision
Tanishq Kumar, Zachary Ankner, Benjamin Spector et al.
Does Refusal Training in LLMs Generalize to the Past Tense?
Maksym Andriushchenko, Nicolas Flammarion
CycleResearcher: Improving Automated Research via Automated Review
Yixuan Weng, Minjun Zhu, Guangsheng Bao et al.
ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation
Cheng Yang, Chufan Shi, Yaxin Liu et al.
Self-Introspective Decoding: Alleviating Hallucinations for Large Vision-Language Models
Fushuo Huo, Wenchao Xu, Zhong Zhang et al.
Reasoning with Latent Thoughts: On the Power of Looped Transformers
Nikunj Saunshi, Nishanth Dikkala, Zhiyuan Li et al.
MagicPIG: LSH Sampling for Efficient LLM Generation
Zhuoming Chen, Ranajoy Sadhukhan, Zihao Ye et al.
MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequences
Canyu Zhao, Mingyu Liu, Wen Wang et al.
FreDF: Learning to Forecast in the Frequency Domain
Hao Wang, Lichen Pan, Yuan Shen et al.
Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling
Hritik Bansal, Arian Hosseini, Rishabh Agarwal et al.
Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control
Aleksandar Makelov, Georg Lange, Neel Nanda
ImageFolder: Autoregressive Image Generation with Folded Tokens
Xiang Li, Kai Qiu, Hao Chen et al.
Arithmetic Without Algorithms: Language Models Solve Math with a Bag of Heuristics
Yaniv Nikankin, Anja Reusch, Aaron Mueller et al.
Simple Guidance Mechanisms for Discrete Diffusion Models
Yair Schiff, Subham Sahoo, Hao Phung et al.
DSBench: How Far Are Data Science Agents from Becoming Data Science Experts?
Liqiang Jing, Zhehui Huang, Xiaoyang Wang et al.
Learning Dynamics of LLM Finetuning
Yi Ren, Danica Sutherland
MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding
Ranajoy Sadhukhan, Jian Chen, Zhuoming Chen et al.
Image and Video Tokenization with Binary Spherical Quantization
Yue Zhao, Yuanjun Xiong, Philipp Krähenbühl
Animate-X: Universal Character Image Animation with Enhanced Motion Representation
Shuai Tan, Biao Gong, Xiang Wang et al.
SafeDiffuser: Safe Planning with Diffusion Probabilistic Models
Wei Xiao, Johnson (Tsun-Hsuan) Wang, Chuang Gan et al.
Booster: Tackling Harmful Fine-tuning for Large Language Models via Attenuating Harmful Perturbation
Tiansheng Huang, Sihao Hu, Fatih Ilhan et al.
RazorAttention: Efficient KV Cache Compression Through Retrieval Heads
Hanlin Tang, Yang Lin, Jing Lin et al.
See What You Are Told: Visual Attention Sink in Large Multimodal Models
Seil Kang, Jinyeong Kim, Junhyeok Kim et al.
CATCH: Channel-Aware Multivariate Time Series Anomaly Detection via Frequency Patching
Xingjian Wu, Xiangfei Qiu, Zhengyu Li et al.
Repetition Improves Language Model Embeddings
Jacob Springer, Suhas Kotha, Daniel Fried et al.
Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation
Hyungjoo Chae, Namyoung Kim, Kai Ong et al.
AI Sandbagging: Language Models can Strategically Underperform on Evaluations
Teun van der Weij, Felix Hofstätter, Oliver Jaffe et al.
Matryoshka Multimodal Models
Mu Cai, Jianwei Yang, Jianfeng Gao et al.