Most Cited ICLR 2025 "generalization to novel classes" Papers
3,827 papers found • Page 1 of 20
Conference
SAM 2: Segment Anything in Images and Videos
Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu et al.
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
Clemencia Siro, Guy Gur-Ari, Gaurav Mishra et al.
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer
Zhuoyi Yang, Jiayan Teng, Wendi Zheng et al.
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code
Naman Jain, Han, Alex Gu et al.
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
Javier Rando, Tony Wang, Stewart Slocum et al.
WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct
Haipeng Luo, Qingfeng Sun, Can Xu et al.
RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment
Jipeng Zhang, Hanze Dong, Tong Zhang et al.
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation
Jinheng Xie, Weijia Mao, Zechen Bai et al.
GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models
Iman Mirzadeh, Keivan Alizadeh-Vahid, Hooman Shahrokhi et al.
NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models
Chankyu Lee, Rajarshi Roy, Mengyao Xu et al.
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions
Terry Yue Zhuo, Minh Chien Vu, Jenny Chim et al.
RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation
Songming Liu, Lingxuan Wu, Bangguo Li et al.
Causal Reasoning and Large Language Models: Opening a New Frontier for Causality
Chenhao Tan, Robert Ness, Amit Sharma et al.
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks
Maksym Andriushchenko, francesco croce, Nicolas Flammarion
OpenHands: An Open Platform for AI Software Developers as Generalist Agents
Xingyao Wang, Boxuan Li, Yufan Song et al.
Generative Verifiers: Reward Modeling as Next-Token Prediction
Lunjun Zhang, Arian Hosseini, Hritik Bansal et al.
Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
Sihyun Yu, Sangkyung Kwak, Huiwon Jang et al.
Scaling and evaluating sparse autoencoders
Leo Gao, Tom Dupre la Tour, Henk Tillman et al.
Training Language Models to Self-Correct via Reinforcement Learning
Aviral Kumar, Vincent Zhuang, Rishabh Agarwal et al.
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
Chunting Zhou, Lili Yu, Arun Babu et al.
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second
Alexey Bochkovskiy, Amaël Delaunoy, Hugo Germain et al.
Safety Alignment Should be Made More Than Just a Few Tokens Deep
Xiangyu Qi, Ashwinee Panda, Kaifeng Lyu et al.
Mixture-of-Agents Enhances Large Language Model Capabilities
Junlin Wang, Jue Wang, Ben Athiwaratkun et al.
Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers
Chenglei Si, Diyi Yang, Tatsunori Hashimoto
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing
Zhangchen Xu, Fengqing Jiang, Luyao Niu et al.
MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion
Junyi Zhang, Charles Herrmann, Junhwa Hur et al.
SpinQuant: LLM Quantization with Learned Rotations
Zechun Liu, Changsheng Zhao, Igor Fedorov et al.
JudgeLM: Fine-tuned Large Language Models are Scalable Judges
Lianghui Zhu, Xinggang Wang, Xinlong Wang
Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models
Samuel Marks, Can Rager, Eric Michaud et al.
Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents
Boyu Gou, Demi Ruohan Wang, Boyuan Zheng et al.
LoRA Learns Less and Forgets Less
Jonathan Frankle, Jose Javier Gonzalez Ortiz, Cody Blakeney et al.
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning
Zayne Sprague, Fangcong Yin, Juan Rodriguez et al.
Generating with Confidence: Uncertainty Quantification for Black-box Large Language Models
Jimeng Sun, Shubhendu Trivedi, Zhen Lin
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models
Jiabo Ye, Haiyang Xu, Haowei Liu et al.
Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge
Jiayi Ye, Yanbo Wang, Yue Huang et al.
Pyramidal Flow Matching for Efficient Video Generative Modeling
Yang Jin, Zhicheng Sun, Ningyuan Li et al.
Generative Representational Instruction Tuning
Niklas Muennighoff, Hongjin SU, Liang Wang et al.
Self-Play Preference Optimization for Language Model Alignment
Yue Wu, Zhiqing Sun, Rina Hughes et al.
LongVILA: Scaling Long-Context Visual Language Models for Long Videos
Yukang Chen, Fuzhao Xue, Dacheng Li et al.
VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation
Yecheng Wu, Zhuoyang Zhang, Junyu Chen et al.
AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents
Chris Rawles, Sarah Clinckemaillie, Yifan Chang et al.
OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation
Kepan Nan, Rui Xie, Penghao Zhou et al.
Your Absorbing Discrete Diffusion Secretly Models the Conditional Distributions of Clean Data
Jingyang Ou, Shen Nie, Kaiwen Xue et al.
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
Marianne Arriola, Aaron Gokaslan, Justin Chiu et al.
Revisiting Feature Prediction for Learning Visual Representations from Video
Quentin Garrido, Yann LeCun, Michael Rabbat et al.
One Step Diffusion via Shortcut Models
Kevin Frans, Danijar Hafner, Sergey Levine et al.
Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts
Xiaoming Shi, Shiyu Wang, Yuqi Nie et al.
Inverse Scaling: When Bigger Isn't Better
Joe Cavanagh, Andrew Gritsevskiy, Najoung Kim et al.
Advancing LLM Reasoning Generalists with Preference Trees
Lifan Yuan, Ganqu Cui, Hanbin Wang et al.
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
Guangxuan Xiao, Jiaming Tang, Jingwei Zuo et al.
Gated Delta Networks: Improving Mamba2 with Delta Rule
Songlin Yang, Jan Kautz, Ali Hatamizadeh
MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark
Sakshi, Utkarsh Tyagi, Sonal Kumar et al.
G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model
Jiahui Gao, Renjie Pi, Jipeng Zhang et al.
The Unreasonable Ineffectiveness of the Deeper Layers
Andrey Gromov, Kushal Tirumala, Hassan Shapourian et al.
Diffusion Models Are Real-Time Game Engines
Dani Valevski, Yaniv Leviathan, Moab Arar et al.
OLMoE: Open Mixture-of-Experts Language Models
Niklas Muennighoff, Luca Soldaini, Dirk Groeneveld et al.
Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning
Amrith Setlur, Chirag Nagpal, Adam Fisch et al.
SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression
Xin Wang, Yu Zheng, Zhongwei Wan et al.
MUSE: Machine Unlearning Six-Way Evaluation for Language Models
Weijia Shi, Jaechan Lee, Yangsibo Huang et al.
Latent Action Pretraining from Videos
Seonghyeon Ye, Joel Jang, Byeongguk Jeon et al.
JudgeBench: A Benchmark for Evaluating LLM-Based Judges
Sijun Tan, Siyuan Zhuang, Kyle Montgomery et al.
MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer
Yuancheng Wang, Haoyue Zhan, Liwei Liu et al.
Scaling Diffusion Language Models via Adaptation from Autoregressive Models
Shansan Gong, Shivam Agarwal, Yizhe Zhang et al.
Simplifying, Stabilizing and Scaling Continuous-time Consistency Models
Cheng Lu, Yang Song
AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models
Junfeng Fang, Houcheng Jiang, Kun Wang et al.
AFlow: Automating Agentic Workflow Generation
Jiayi Zhang, Jinyu Xiang, Zhaoyang Yu et al.
TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies
Ruijie Zheng, Yongyuan Liang, Shuaiyi Huang et al.
SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal
Tinghao Xie, Xiangyu Qi, Yi Zeng et al.
WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild
Bill Yuchen Lin, Yuntian Deng, Khyathi Chandu et al.
Retrieval Head Mechanistically Explains Long-Context Factuality
Wenhao Wu, Yizhong Wang, Guangxuan Xiao et al.
World Model on Million-Length Video And Language With Blockwise RingAttention
Hao Liu, Wilson Yan, Matei Zaharia et al.
Omni-MATH: A Universal Olympiad Level Mathematic Benchmark for Large Language Models
Bofei Gao, Feifan Song, Zhe Yang et al.
MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?
YiFan Zhang, Huanyu Zhang, Haochen Tian et al.
HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing
MUDE HUI, Siwei Yang, Bingchen Zhao et al.
Scaling Large Language Model-based Multi-Agent Collaboration
Chen Qian, Zihao Xie, YiFei Wang et al.
Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for LLM Problem-Solving
Yangzhen Wu, Zhiqing Sun, Shanda Li et al.
Diffusion Policy Policy Optimization
Allen Ren, Justin Lidard, Lars Ankile et al.
AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents
Maksym Andriushchenko, Alexandra Souly, Mateusz Dziemian et al.
Physics of Language Models: Part 3.2, Knowledge Manipulation
Zeyuan Allen-Zhu, Yuanzhi Li
DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search
Huajian Xin, Z.Z. Ren, Junxiao Song et al.
MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering
Jun Shern Chan, Neil Chowdhury, Oliver Jaffe et al.
Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows
Fangyu Lei, Jixuan Chen, Yuxiao Ye et al.
OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data
Shubham Toshniwal, Wei Du, Ivan Moshkov et al.
Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models
Junyu Chen, Han Cai, Junsong Chen et al.
Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution
Zuyan Liu, Yuhao Dong, Ziwei Liu et al.
LLaMA-Omni: Seamless Speech Interaction with Large Language Models
Qingkai Fang, Shoutao Guo, Yan Zhou et al.
IRIS: LLM-Assisted Static Analysis for Detecting Security Vulnerabilities
Ziyang Li, Saikat Dutta, Mayur Naik
Automated Design of Agentic Systems
Shengran Hu, Cong Lu, Jeff Clune
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling
Shengpeng Ji, Ziyue Jiang, Wen Wang et al.
Adjoint Matching: Fine-tuning Flow and Diffusion Generative Models with Memoryless Stochastic Optimal Control
Carles Domingo i Enrich, Michal Drozdzal, Brian Karrer et al.
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations
Hadas Orgad, Michael Toker, Zorik Gekhman et al.
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solver
Zhenting Qi, Mingyuan MA, Jiahang Xu et al.
LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory
Di Wu, Hongwei Wang, Wenhao Yu et al.
Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction
Jing He, Haodong Li, Wei Yin et al.
VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents
Shi Yu, Chaoyue Tang, Bokai Xu et al.
Interpreting Emergent Planning in Model-Free Reinforcement Learning
Thomas Bush, Stephen Chung, Usman Anwar et al.
Scaling up Masked Diffusion Models on Text
Shen Nie, Fengqi Zhu, Chao Du et al.
ToolACE: Winning the Points of LLM Function Calling
Weiwen Liu, Xu Huang, Xingshan Zeng et al.
Data Scaling Laws in Imitation Learning for Robotic Manipulation
Fanqi Lin, Yingdong Hu, Pingyue Sheng et al.
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
Liliang Ren, Yang Liu, Yadong Lu et al.
CHASE-SQL: Multi-Path Reasoning and Preference Optimized Candidate Selection in Text-to-SQL
Mohammadreza Pourreza, Hailong Li, Ruoxi Sun et al.
WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning
Zehan Qi, Xiao Liu, Iat Long Iong et al.
SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks
Rui-Jie Zhu, Qihang Zhao, Jason Eshraghian et al.
MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding
Fei Wang, XINGYU FU, James Y. Huang et al.
BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval
Hongjin SU, Howard Yen, Mengzhou Xia et al.
No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images
Botao Ye, Sifei Liu, Haofei Xu et al.
Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens
Lijie Fan, Tianhong Li, Siyang Qin et al.
Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance
Jiasheng Ye, Peiju Liu, Tianxiang Sun et al.
VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control
Sherwin Bahmani, Ivan Skorokhodov, Aliaksandr Siarohin et al.
VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks
Ziyan Jiang, Rui Meng, Xinyi Yang et al.
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token
Shaolei Zhang, Qingkai Fang, Yang et al.
Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws
Zeyuan Allen-Zhu, Yuanzhi Li
Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents
Hanrong Zhang, Jingyuan Huang, Kai Mei et al.
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
Min Shi, Fuxiao Liu, Shihao Wang et al.
Masked Diffusion Models are Secretly Time-Agnostic Masked Models and Exploit Inaccurate Categorical Sampling
Kaiwen Zheng, Yongxin Chen, Hanzi Mao et al.
AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs
Xiaogeng Liu, Peiran Li, G. Edward Suh et al.
Tamper-Resistant Safeguards for Open-Weight LLMs
Rishub Tamirisa, Bhrugu Bharathi, Long Phan et al.
A Sanity Check for AI-generated Image Detection
Shilin Yan, Ouxiang Li, Jiayin Cai et al.
HelpSteer2-Preference: Complementing Ratings with Preferences
Zhilin Wang, Alexander Bukharin, Olivier Delalleau et al.
EIA: ENVIRONMENTAL INJECTION ATTACK ON GENERALIST WEB AGENTS FOR PRIVACY LEAKAGE
Zeyi Liao, Lingbo Mo, Chejian Xu et al.
Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
Yuchen Duan, Weiyun Wang, Zhe Chen et al.
RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style
Yantao Liu, Zijun Yao, Rui Min et al.
Autoregressive Video Generation without Vector Quantization
Haoge Deng, Ting Pan, Haiwen Diao et al.
On the self-verification limitations of large language models on reasoning and planning tasks
Kaya Stechly, Karthik Valmeekam, Subbarao Kambhampati
SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains?
John Yang, Carlos E Jimenez, Alex Zhang et al.
SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency
Yiming Xie, Chun-Han Yao, Vikram Voleti et al.
OmniRe: Omni Urban Scene Reconstruction
Ziyu Chen, Jiawei Yang, Jiahui Huang et al.
Agent S: An Open Agentic Framework that Uses Computers Like a Human
Saaket Agashe, Jiuzhou Han, Shuyu Gan et al.
Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations
Litu Rout, Yujia Chen, Nataniel Ruiz et al.
MS-Diffusion: Multi-subject Zero-shot Image Personalization with Layout Guidance
Xierui Wang, Siming Fu, Qihan Huang et al.
VideoPhy: Evaluating Physical Commonsense for Video Generation
Hritik Bansal, Zongyu Lin, Tianyi Xie et al.
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
Wenhao Chai, Enxin Song, Yilun Du et al.
RegMix: Data Mixture as Regression for Language Model Pre-training
Qian Liu, Xiaosen Zheng, Niklas Muennighoff et al.
Long-Context LLMs Meet RAG: Overcoming Challenges for Long Inputs in RAG
Bowen Jin, Jinsung Yoon, Jiawei Han et al.
LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs
Yushi Bai, Jiajie Zhang, Xin Lv et al.
Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models
Gen Luo, Yiyi Zhou, Yuxin Zhang et al.
MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers
Yiwen Chen, Tong He, Di Huang et al.
Not All Language Model Features Are One-Dimensionally Linear
Josh Engels, Eric Michaud, Isaac Liao et al.
Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process
Tian Ye, Zicheng Xu, Yuanzhi Li et al.
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer
Haotian Tang, Yecheng Wu, Shang Yang et al.
Consistency Models Made Easy
Zhengyang Geng, Ashwini Pokle, Weijian Luo et al.
Adam-mini: Use Fewer Learning Rates To Gain More
Yushun Zhang, Congliang Chen, Ziniu Li et al.
SVDQuant: Absorbing Outliers by Low-Rank Component for 4-Bit Diffusion Models
Muyang Li, Yujun Lin, Zhekai Zhang et al.
When Attention Sink Emerges in Language Models: An Empirical View
Xiangming Gu, Tianyu Pang, Chao Du et al.
LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias
Haian Jin, Hanwen Jiang, Hao Tan et al.
Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency
Jianwen Jiang, Chao Liang, Jiaqi Yang et al.
Deconstructing Denoising Diffusion Models for Self-Supervised Learning
Xinlei Chen, Zhuang Liu, Saining Xie et al.
CBraMod: A Criss-Cross Brain Foundation Model for EEG Decoding
Jiquan Wang, Sha Zhao, Zhiling Luo et al.
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration
Jintao Zhang, Jia wei, Pengle Zhang et al.
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation
Yuang Peng, Yuxin Cui, Haomiao Tang et al.
ColPali: Efficient Document Retrieval with Vision Language Models
Manuel Faysse, Hugues Sibille, Tony Wu et al.
AdaIR: Adaptive All-in-One Image Restoration via Frequency Mining and Modulation
Yuning Cui, Syed Waqas Zamir, Salman Khan et al.
Turning Up the Heat: Min-p Sampling for Creative and Coherent LLM Outputs
Minh Nguyen, Andrew Baker, Clement Neo et al.
Predictive Inverse Dynamics Models are Scalable Learners for Robotic Manipulation
Yang Tian, Sizhe Yang, Jia Zeng et al.
TimeMixer++: A General Time Series Pattern Machine for Universal Predictive Analysis
Shiyu Wang, Jiawei LI, Xiaoming Shi et al.
Kolmogorov-Arnold Transformer
Xingyi Yang, Xinchao Wang
DynaMath: A Dynamic Visual Benchmark for Evaluating Mathematical Reasoning Robustness of Vision Language Models
Chengke Zou, Xingang Guo, Rui Yang et al.
OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision
Cong Wei, Zheyang Xiong, Weiming Ren et al.
OGBench: Benchmarking Offline Goal-Conditioned RL
Seohong Park, Kevin Frans, Benjamin Eysenbach et al.
Making Text Embedders Few-Shot Learners
Chaofan Li, Minghao Qin, Shitao Xiao et al.
Unlocking Guidance for Discrete State-Space Diffusion and Flow Models
Hunter Nisonoff, Junhao Xiong, Stephan Allenspach et al.
MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs
jiarui zhang, Mahyar Khayatkhoei, Prateek Chhikara et al.
LiveBench: A Challenging, Contamination-Limited LLM Benchmark
Colin White, Samuel Dooley, Manley Roberts et al.
GraphRouter: A Graph-based Router for LLM Selections
Tao Feng, Yanzhen Shen, Jiaxuan You
Vision-LSTM: xLSTM as Generic Vision Backbone
Benedikt Alkin, Maximilian Beck, Korbinian Pöppel et al.
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want
Weifeng Lin, Xinyu Wei, Ruichuan An et al.
MotionClone: Training-Free Motion Cloning for Controllable Video Generation
Pengyang Ling, Jiazi Bu, Pan Zhang et al.
MM-EMBED: UNIVERSAL MULTIMODAL RETRIEVAL WITH MULTIMODAL LLMS
Sheng-Chieh Lin, Chankyu Lee, Mohammad Shoeybi et al.
Robust Watermarking Using Generative Priors Against Image Editing: From Benchmarking to Advances
Shilin Lu, Zihan Zhou, Jiayou Lu et al.
Safety Layers in Aligned Large Language Models: The Key to LLM Security
Shen Li, Liuyi Yao, Lan Zhang et al.
Programming Refusal with Conditional Activation Steering
Bruce W. Lee, Inkit Padhi, Karthikeyan Natesan Ramamurthy et al.
Improved Techniques for Optimization-Based Jailbreaking on Large Language Models
Xiaojun Jia, Tianyu Pang, Chao Du et al.
AHA: A Vision-Language-Model for Detecting and Reasoning Over Failures in Robotic Manipulation
Jiafei Duan, Wilbert Pumacay, Nishanth Kumar et al.
Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models
Javier Ferrando, Oscar Obeso, Senthooran Rajamanoharan et al.
Beyond Autoregression: Discrete Diffusion for Complex Reasoning and Planning
Jiacheng Ye, Jiahui Gao, Shansan Gong et al.
Training-free Camera Control for Video Generation
Chen Hou, Zhibo Chen
MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models
Peng Xia, Kangyu Zhu, Haoran Li et al.
CFG++: Manifold-constrained Classifier Free Guidance for Diffusion Models
Hyungjin Chung, Jeongsol Kim, Geon Yeong Park et al.
Diffusion-Based Planning for Autonomous Driving with Flexible Guidance
Yinan Zheng, Ruiming Liang, Kexin ZHENG et al.
Soft Merging of Experts with Adaptive Routing
Haokun Liu, Muqeeth Mohammed, Colin Raffel
Real-Time Video Generation with Pyramid Attention Broadcast
Xuanlei Zhao, Xiaolong Jin, Kai Wang et al.
D-FINE: Redefine Regression Task of DETRs as Fine-grained Distribution Refinement
Yansong Peng, Hebei Li, Peixi Wu et al.
Dissecting Adversarial Robustness of Multimodal LM Agents
Chen Wu, Rishi Shah, Jing Yu Koh et al.
Improving Instruction-Following in Language Models through Activation Steering
Alessandro Stolfo, Vidhisha Balachandran, Safoora Yousefi et al.
Generalization v.s. Memorization: Tracing Language Models’ Capabilities Back to Pretraining Data
Xinyi Wang, Antonis Antoniades, Yanai Elazar et al.
Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation
Jiahao Cui, Hui Li, Yao Yao et al.
MMTEB: Massive Multilingual Text Embedding Benchmark
Kenneth Enevoldsen, Isaac Chung, Imene Kerboua et al.
Eliciting Human Preferences with Language Models
Belinda Li, Alex Tamkin, Noah Goodman et al.
Language models scale reliably with over-training and on downstream tasks
Samir Yitzhak Gadre, Georgios Smyrnis, Vaishaal Shankar et al.
Reasoning with Latent Thoughts: On the Power of Looped Transformers
Nikunj Saunshi, Nishanth Dikkala, Zhiyuan Li et al.
MAVIS: Mathematical Visual Instruction Tuning with an Automatic Data Engine
Renrui Zhang, Xinyu Wei, Dongzhi Jiang et al.
Exploratory Preference Optimization: Harnessing Implicit Q*-Approximation for Sample-Efficient RLHF
Tengyang Xie, Dylan Foster, Akshay Krishnamurthy et al.
RoboCat: A Self-Improving Generalist Agent for Robotic Manipulation
Sergio Gómez Colmenarejo, Jost Springenberg, Jose Enrique Chen et al.
Language Models Learn to Mislead Humans via RLHF
Jiaxin Wen, Ruiqi Zhong, Akbir Khan et al.
Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting
Zilong (Ryan) Wang, Zifeng Wang, Long Le et al.
Enhancing End-to-End Autonomous Driving with Latent World Model
Yingyan Li, Lue Fan, Jiawei He et al.
FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models
Zhipei Xu, Xuanyu Zhang, Runyi Li et al.
Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning
Bahare Fatemi, Seyed Mehran Kazemi, Anton Tsitsulin et al.
Round and Round We Go! What makes Rotary Positional Encodings useful?
Federico Barbero, Alex Vitvitskyi, Christos Perivolaropoulos et al.
Fine-tuning can cripple your foundation model; preserving features may be the solution
Philip Torr, Puneet Dokania, Jishnu Mukhoti et al.