Most Cited 2025 "adversarial explanations" Papers
22,274 papers found • Page 3 of 112
Conference
Cut the Crap: An Economical Communication Pipeline for LLM-based Multi-Agent Systems
Guibin Zhang, Yanwei Yue, Zhixun Li et al.
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
Hanshi Sun, Li-Wen Chang, Wenlei Bao et al.
Apollo: An Exploration of Video Understanding in Large Multimodal Models
Orr Zohar, Xiaohan Wang, Yann Dubois et al.
Hymba: A Hybrid-head Architecture for Small Language Models
Xin Dong, Yonggan Fu, Shizhe Diao et al.
Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent
Yangning Li, Yinghui Li, Xinyu Wang et al.
Lumina-Image 2.0: A Unified and Efficient Image Generative Framework
Qi Qin, Le Zhuo, Yi Xin et al.
Phantom: Subject-Consistent Video Generation via Cross-Modal Alignment
Lijie Liu, Tianxiang Ma, Bingchuan Li et al.
Self-Improvement in Language Models: The Sharpening Mechanism
Audrey Huang, Adam Block, Dylan Foster et al.
AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML
Patara Trirat, Wonyong Jeong, Sung Ju Hwang
LLM-SR: Scientific Equation Discovery via Programming with Large Language Models
Parshin Shojaee, Kazem Meidani, Shashank Gupta et al.
RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics
Enshen Zhou, Jingkun An, Cheng Chi et al.
Simplifying Deep Temporal Difference Learning
Matteo Gallici, Mattie Fellows, Benjamin Ellis et al.
Sundial: A Family of Highly Capable Time Series Foundation Models
Yong Liu, Guo Qin, Zhiyuan Shi et al.
SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints
Jianhong Bai, Menghan Xia, Xintao WANG et al.
AgentSquare: Automatic LLM Agent Search in Modular Design Space
Yu Shang, Yu Li, Keyu Zhao et al.
Stable Flow: Vital Layers for Training-Free Image Editing
Omri Avrahami, Or Patashnik, Ohad Fried et al.
Controlling Space and Time with Diffusion Models
Daniel Watson, Saurabh Saxena, Lala Li et al.
GoalFlow: Goal-Driven Flow Matching for Multimodal Trajectories Generation in End-to-End Autonomous Driving
Zebin Xing, Xingyu Zhang, Yang Hu et al.
KOR-Bench: Benchmarking Language Models on Knowledge-Orthogonal Reasoning Tasks
Kaijing Ma, Xeron Du, Yunran Wang et al.
Q-Insight: Understanding Image Quality via Visual Reinforcement Learning
Weiqi Li, Xuanyu Zhang, Shijie Zhao et al.
Not All Heads Matter: A Head-Level KV Cache Compression Method with Integrated Retrieval and Reasoning
Yu Fu, Zefan Cai, Abedelkadir Asi et al.
ReconDreamer: Crafting World Models for Driving Scene Reconstruction via Online Restoration
Chaojun Ni, Guosheng Zhao, Xiaofeng Wang et al.
VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection
Songhao Han, Wei Huang, Hairong Shi et al.
Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers
Lei Chen, Yuan Meng, Chen Tang et al.
Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving
Daoguang Zan, Zhirong Huang, Wei Liu et al.
3DGUT: Enabling Distorted Cameras and Secondary Rays in Gaussian Splatting
Qi Wu, Janick Martinez Esturo, Ashkan Mirzaei et al.
How to Evaluate Reward Models for RLHF
Evan Frick, Tianle Li, Connor Chen et al.
Inductive Moment Matching
Linqi (Alex) Zhou, Stefano Ermon, Jiaming Song
SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree
Shuangrui Ding, Rui Qian, Xiaoyi Dong et al.
Wonderland: Navigating 3D Scenes from a Single Image
Hanwen Liang, Junli Cao, Vidit Goel et al.
EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation
Rang Meng, Xingyu Zhang, Yuming Li et al.
FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality
Zhengyao Lyu, Chenyang Si, Junhao Song et al.
Learning to Plan & Reason for Evaluation with Thinking-LLM-as-a-Judge
Swarnadeep Saha, Xian Li, Marjan Ghazvininejad et al.
ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning
Chi-Pin Huang, Yueh-Hua Wu, Min-Hung Chen et al.
EvaGaussians: Event Stream Assisted Gaussian Splatting from Blurry Images
Wangbo Yu, Chaoran Feng, Jianing Li et al.
VHM: Versatile and Honest Vision Language Model for Remote Sensing Image Analysis
Chao Pang, Xingxing Weng, Jiang Wu et al.
Task Singular Vectors: Reducing Task Interference in Model Merging
Antonio Andrea Gargiulo, Donato Crisostomi, Maria Sofia Bucarelli et al.
TorchTitan: One-stop PyTorch native solution for production ready LLM pretraining
Wanchao Liang, Tianyu Liu, Less Wright et al.
CLoSD: Closing the Loop between Simulation and Diffusion for multi-task character control
Guy Tevet, Sigal Raab, Setareh Cohan et al.
An Illusion of Progress? Assessing the Current State of Web Agents
Tianci Xue, Weijian Qi, Tianneng Shi et al.
Goku: Flow Based Video Generative Foundation Models
Shoufa Chen, Chongjian GE, Yuqi Zhang et al.
Tell me about yourself: LLMs are aware of their learned behaviors
Jan Betley, Xuchan Bao, Martín Soto et al.
Multiple Object Tracking as ID Prediction
Ruopeng Gao, Ji Qi, Limin Wang
End-to-End Driving with Online Trajectory Evaluation via BEV World Model
Yingyan Li, Yuqi Wang, Yang Liu et al.
WiLoR: End-to-end 3D Hand Localization and Reconstruction in-the-wild
Rolandos Alexandros Potamias, Jinglei Zhang, Jiankang Deng et al.
MindSearch: Mimicking Human Minds Elicits Deep AI Searcher
Zehui Chen, Kuikun Liu, Qiuchen Wang et al.
Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis
Tianbao Xie, Jiaqi Deng, Xiaochuan Li et al.
Proteina: Scaling Flow-based Protein Structure Generative Models
Tomas Geffner, Kieran Didi, Zuobai Zhang et al.
RepoGraph: Enhancing AI Software Engineering with Repository-level Code Graph
Siru Ouyang, Wenhao Yu, Kaixin Ma et al.
Dinomaly: The Less Is More Philosophy in Multi-Class Unsupervised Anomaly Detection
Jia Guo, Shuai Lu, Weihang Zhang et al.
Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models
Seyedmorteza Sadat, Otmar Hilliges, Romann Weber
Fit and Prune: Fast and Training-free Visual Token Pruning for Multi-modal Large Language Models
Weihao Ye, Qiong Wu, Wenhao Lin et al.
BOND: Aligning LLMs with Best-of-N Distillation
Pier Giuseppe Sessa, Robert Dadashi, Léonard Hussenot-Desenonges et al.
Towards Realistic UAV Vision-Language Navigation: Platform, Benchmark, and Methodology
Xiangyu Wang, Donglin Yang, ziqin wang et al.
Physics-Informed Diffusion Models
Jan-Hendrik Bastek, WaiChing Sun, Dennis Kochmann
A Decade's Battle on Dataset Bias: Are We There Yet?
Zhuang Liu, Kaiming He
MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix
Ziyang Ma, Yinghao Ma, Yanqiao Zhu et al.
Flow Q-Learning
Seohong Park, Qiyang Li, Sergey Levine
OpenCodeReasoning: Advancing Data Distillation for Competitive Coding
Wasi Uddin Ahmad, Sean Narenthiran, Somshubra Majumdar et al.
Does Spatial Cognition Emerge in Frontier Models?
Santhosh Kumar Ramakrishnan, Erik Wijmans, Philipp Krähenbühl et al.
NoLiMa: Long-Context Evaluation Beyond Literal Matching
Ali Modarressi, Hanieh Deilamsalehy, Franck Dernoncourt et al.
SAEBench: A Comprehensive Benchmark for Sparse Autoencoders in Language Model Interpretability
Adam Karvonen, Can Rager, Johnny Lin et al.
Inference Scaling for Long-Context Retrieval Augmented Generation
Zhenrui Yue, Honglei Zhuang, Aijun Bai et al.
FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference
Xunhao Lai, Jianqiao Lu, Yao Luo et al.
OmniBench: Towards The Future of Universal Omni-Language Models
Yizhi Li, Ge Zhang, Yinghao Ma et al.
Unlearning Concepts in Diffusion Model via Concept Domain Correction and Concept Preserving Gradient
Yongliang Wu, Shiji Zhou, Mingzhuo Yang et al.
TranSplat: Generalizable 3D Gaussian Splatting from Sparse Multi-View Images with Transformers
Chuanrui Zhang, Yingshuang Zou, Zhuoling Li et al.
AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials
Yiheng Xu, Dunjie Lu, Zhennan Shen et al.
MENTOR: Multi-level Self-supervised Learning for Multimodal Recommendation
Jinfeng Xu, Zheyu Chen, Shuo Yang et al.
Model merging with SVD to tie the Knots
George Stoica, Pratik Ramesh, Boglarka Ecsedi et al.
Calibrating Large Language Models with Sample Consistency
Qing Lyu, Kumar Shridhar, Chaitanya Malaviya et al.
WorldMem: Long-term Consistent World Simulation with Memory
Zeqi Xiao, Yushi LAN, Yifan Zhou et al.
NVS-Solver: Video Diffusion Model as Zero-Shot Novel View Synthesizer
Meng YOU, Zhiyu Zhu, Hui LIU et al.
Energy-Based Diffusion Language Models for Text Generation
Minkai Xu, Tomas Geffner, Karsten Kreis et al.
Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization
Noam Razin, Sadhika Malladi, Adithya Bhaskar et al.
X-Teaming: Multi-Turn Jailbreaks and Defenses with Adaptive Multi-Agents
Salman Rahman, Liwei Jiang, James Shiffer et al.
You See it, You Got it: Learning 3D Creation on Pose-Free Videos at Scale
Baorui Ma, Huachen Gao, Haoge Deng et al.
S-GRPO: Early Exit via Reinforcement Learning in Reasoning Models
Muzhi Dai, Chenxu Yang, Qingyi Si
Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generation
Sucheng Ren, Qihang Yu, Ju He et al.
MMFakeBench: A Mixed-Source Multimodal Misinformation Detection Benchmark for LVLMs
Xuannan Liu, Zekun Li, Pei Li et al.
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
Qingyun Li, Zhe Chen, Weiyun Wang et al.
Learning Adaptive Parallel Reasoning with Language Models
Jiayi Pan, Xiuyu Li, Long Lian et al.
Describe Anything: Detailed Localized Image and Video Captioning
Long Lian, Yifan Ding, Yunhao Ge et al.
Easi3R: Estimating Disentangled Motion from DUSt3R Without Training
Xingyu Chen, Yue Chen, Yuliang Xiu et al.
Causal Order: The Key to Leveraging Imperfect Experts in Causal Inference
Aniket Vashishtha, Abbavaram Gowtham Reddy, Abhinav Kumar et al.
Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems
Tian Ye, Zicheng Xu, Yuanzhi Li et al.
V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning
Hang Hua, Yunlong Tang, Chenliang Xu et al.
FineWeb2: One Pipeline to Scale Them All — Adapting Pre-Training Data Processing to Every Language
Guilherme Penedo, Hynek Kydlíček, Vinko Sabolčec et al.
BioDiscoveryAgent: An AI Agent for Designing Genetic Perturbation Experiments
Yusuf Roohani, Andrew Lee, Qian Huang et al.
Rectified Diffusion: Straightness Is Not Your Need in Rectified Flow
Fu-Yun Wang, Ling Yang, Zhaoyang Huang et al.
GRIT: Teaching MLLMs to Think with Images
Yue Fan, Xuehai He, Diji Yang et al.
Antidote: Post-fine-tuning Safety Alignment for Large Language Models against Harmful Fine-tuning Attack
Tiansheng Huang, Gautam Bhattacharya, Pratik Joshi et al.
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
Fanqing Meng, Jin Wang, Chuanhao Li et al.
VinePPO: Refining Credit Assignment in RL Training of LLMs
Amirhossein Kazemnejad, Milad Aghajohari, Eva Portelance et al.
DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models
Saeed Ranjbar Alvar, Gursimran Singh, Mohammad Akbari et al.
An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels
Duy-Kien Nguyen, Mahmoud Assran, Unnat Jain et al.
RNNs are not Transformers (Yet): The Key Bottleneck on In-Context Retrieval
Kaiyue Wen, Xingyu Dang, Kaifeng Lyu
Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model
Han Lin, Jaemin Cho, Abhay Zala et al.
How Two-Layer Neural Networks Learn, One (Giant) Step at a Time
Yatin Dandi, Florent Krzakala, Bruno Loureiro et al.
ALLaM: Large Language Models for Arabic and English
M Saiful Bari, Yazeed Alnumay, Norah Alzahrani et al.
Accelerated Sampling from Masked Diffusion Models via Entropy Bounded Unmasking
Heli Ben-Hamu, Itai Gat, Daniel Severo et al.
Language Model Can Listen While Speaking
Ziyang Ma, Yakun Song, Chenpeng Du et al.
Towards Interpreting Visual Information Processing in Vision-Language Models
Clement Neo, Luke Ong, Philip Torr et al.
Aether: Geometric-Aware Unified World Modeling
Haoyi Zhu, Yifan Wang, Jianjun Zhou et al.
LoRA-Pro: Are Low-Rank Adapters Properly Optimized?
Zhengbo Wang, Jian Liang, Ran He et al.
Organize the Web: Constructing Domains Enhances Pre-Training Data Curation
Alexander Wettig, Kyle Lo, Sewon Min et al.
TabM: Advancing tabular deep learning with parameter-efficient ensembling
Yury Gorishniy, Akim Kotelnikov, Artem Babenko
Eliminating Position Bias of Language Models: A Mechanistic Approach
Ziqi Wang, Hanlin Zhang, Xiner Li et al.
Learn Your Reference Model for Real Good Alignment
Alexey Gorbatovski, Boris Shaposhnikov, Alexey Malakhov et al.
Inference-Aware Fine-Tuning for Best-of-N Sampling in Large Language Models
Yinlam Chow, Guy Tennenholtz, Izzeddin Gur et al.
Learning How Hard to Think: Input-Adaptive Allocation of LM Computation
Mehul Damani, Idan Shenfeld, Andi Peng et al.
Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models
Orion Weller, Ben Van Durme, Dawn Lawrie et al.
CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models
Zhihang Lin, Mingbao Lin, Yuan Xie et al.
VGGT-SLAM: Dense RGB SLAM Optimized on the SL(4) Manifold
Dominic Maggio, Hyungtae Lim, Luca Carlone
Follow My Instruction and Spill the Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems
Zhenting Qi, Hanlin Zhang, Eric P Xing et al.
EraseAnything: Enabling Concept Erasure in Rectified Flow Transformers
Daiheng Gao, Shilin Lu, Wenbo Zhou et al.
Addressing Representation Collapse in Vector Quantized Models with One Linear Layer
Yongxin Zhu, Bocheng Li, Yifei Xin et al.
Image Conductor: Precision Control for Interactive Video Synthesis
Yaowei Li, Xintao Wang, Zhaoyang Zhang et al.
Dora: Sampling and Benchmarking for 3D Shape Variational Auto-Encoders
Rui Chen, Jianfeng Zhang, Yixun Liang et al.
SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning
Peixian Ma, Xialie Zhuang, Chengjin Xu et al.
Visual Agents as Fast and Slow Thinkers
Guangyan Sun, Mingyu Jin, Zhenting Wang et al.
NETS: A Non-equilibrium Transport Sampler
Michael Albergo, Eric Vanden-Eijnden
The Matrix: Infinite-Horizon World Generation with Real-Time Moving Control
Ruili Feng, Han Zhang, Zhilei Shu et al.
PARTNR: A Benchmark for Planning and Reasoning in Embodied Multi-agent Tasks
Matthew Chang, Gunjan Chhablani, Alexander Clegg et al.
MultiBooth: Towards Generating All Your Concepts in an Image from Text
Chenyang Zhu, Kai Li, Yue Ma et al.
StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization
Zhuoqun Li, Xuanang Chen, Haiyang Yu et al.
Vision Language Models are In-Context Value Learners
Yecheng Jason Ma, Joey Hejna, Chuyuan Fu et al.
SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths
Kaixuan Huang, Xudong Guo, Mengdi Wang
LayoutVLM: Differentiable Optimization of 3D Layout via Vision-Language Models
Fan-Yun Sun, Weiyu Liu, Siyi Gu et al.
MagicDrive-V2: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control
Ruiyuan Gao, Kai Chen, Bo Xiao et al.
NaturalReasoning: Reasoning in the Wild with 2.8M Challenging Questions
Weizhe Yuan, Jane Yu, Song Jiang et al.
M-LLM Based Video Frame Selection for Efficient Video Understanding
Kai Hu, Feng Gao, Xiaohan Nie et al.
WASP: Benchmarking Web Agent Security Against Prompt Injection Attacks
Ivan Evtimov, Arman Zharmagambetov, Aaron Grattafiori et al.
Shot2Story: A New Benchmark for Comprehensive Understanding of Multi-shot Videos
Mingfei Han, Linjie Yang, Xiaojun Chang et al.
Knowledge Insulating Vision-Language-Action Models: Train Fast, Run Fast, Generalize Better
Danny Driess, Jost Springenberg, Brian Ichter et al.
WorldScore: Unified Evaluation Benchmark for World Generation
Haoyi Duan, Hong-Xing Yu, Sirui Chen et al.
VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis
Enric Corona, Andrei Zanfir, Eduard Gabriel Bazavan et al.
MOOSE-Chem: Large Language Models for Rediscovering Unseen Chemistry Scientific Hypotheses
Zonglin Yang, Wanhao Liu, Ben Gao et al.
Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning
Andy (DiJia) Su, Hanlin Zhu, Yingchen Xu et al.
GaussianWorld: Gaussian World Model for Streaming 3D Occupancy Prediction
Sicheng Zuo, Wenzhao Zheng, Yuanhui Huang et al.
OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations
Linke Ouyang, Yuan Qu, Hongbin Zhou et al.
Pixel-level and Semantic-level Adjustable Super-resolution: A Dual-LoRA Approach
Lingchen Sun, Rongyuan Wu, Zhiyuan Ma et al.
OMNI-EPIC: Open-endedness via Models of human Notions of Interestingness with Environments Programmed in Code
Maxence Faldor, Jenny Zhang, Antoine Cully et al.
PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding
Jang Hyun Cho, Andrea Madotto, Effrosyni Mavroudi et al.
Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing
Junfei Wu, Jian Guan, Kaituo Feng et al.
FaithEval: Can Your Language Model Stay Faithful to Context, Even If "The Moon is Made of Marshmallows"
Yifei Ming, Senthil Purushwalkam, Shrey Pandit et al.
Generator Matching: Generative modeling with arbitrary Markov processes
Peter Holderrieth, Marton Havasi, Jason Yim et al.
TEOChat: A Large Vision-Language Assistant for Temporal Earth Observation Data
Jeremy Irvin, Emily Liu, Joyce Chen et al.
SimLingo: Vision-Only Closed-Loop Autonomous Driving with Language-Action Alignment
Katrin Renz, Long Chen, Elahe Arani et al.
DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation
Minghong Cai, Xiaodong Cun, Xiaoyu Li et al.
OmniManip: Towards General Robotic Manipulation via Object-Centric Interaction Primitives as Spatial Constraints
Mingjie Pan, Jiyao Zhang, Tianshu Wu et al.
Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models
Keisuke Kamahori, Tian Tang, Yile Gu et al.
Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?
Egor Zverev, Sahar Abdelnabi, Soroush Tabesh et al.
EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling
Theodoros Kouzelis, Ioannis Kakogeorgiou, Spyros Gidaris et al.
Preble: Efficient Distributed Prompt Scheduling for LLM Serving
Vikranth Srivatsa, Zijian He, Reyna Abhyankar et al.
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
Kai Chen, Yunhao Gou, Runhui Huang et al.
Depth Any Video with Scalable Synthetic Data
Honghui Yang, Di Huang, Wei Yin et al.
Atom of Thoughts for Markov LLM Test-Time Scaling
Fengwei Teng, Quan Shi, Zhaoyang Yu et al.
DrivingGPT: Unifying Driving World Modeling and Planning with Multi-modal Autoregressive Transformers
Yuntao Chen, Yuqi Wang, Zhaoxiang Zhang
Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation
Haotong Lin, Sida Peng, Jingxiao Chen et al.
ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding
Xingyu Fu, Minqian Liu, Zhengyuan Yang et al.
Data Shapley in One Training Run
Jiachen (Tianhao) Wang, Prateek Mittal, Dawn Song et al.
Learning Diverse Attacks on Large Language Models for Robust Red-Teaming and Safety Tuning
Seanie Lee, Minsu Kim, Lynn Cherif et al.
On the Optimization and Generalization of Multi-head Attention
Christos Thrampoulidis, Rouzbeh Ghaderi, Hossein Taheri et al.
Articulate-Anything: Automatic Modeling of Articulated Objects via a Vision-Language Foundation Model
Long Le, Jason Xie, William Liang et al.
End-to-End Autonomous Driving Through V2X Cooperation
Haibao Yu, Wenxian Yang, Jiaru Zhong et al.
ReasonIR: Training Retrievers for Reasoning Tasks
Rulin Shao, Rui Qiao, Varsha Kishore et al.
RRM: Robust Reward Model Training Mitigates Reward Hacking
Tianqi Liu, Wei Xiong, Jie Ren et al.
FlexTok: Resampling Images into 1D Token Sequences of Flexible Length
Roman Bachmann, Jesse Allardice, David Mizrahi et al.
Beyond Text-Visual Attention: Exploiting Visual Cues for Effective Token Pruning in VLMs
Qizhe Zhang, Aosong Cheng, Ming Lu et al.
Learning 4D Embodied World Models
Haoyu Zhen, Qiao Sun, Hongxin Zhang et al.
RMB: Comprehensively benchmarking reward models in LLM alignment
Enyu Zhou, Guodong Zheng, Binghai Wang et al.
RAD: Training an End-to-End Driving Policy via Large-Scale 3DGS-based Reinforcement Learning
Hao Gao, Shaoyu Chen, Bo Jiang et al.
Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms
Zhangheng LI, Keen You, Haotian Zhang et al.
MET3R: Measuring Multi-View Consistency in Generated Images
Mohammad Asim, Christopher Wewer, Thomas Wimmer et al.
Catastrophic Failure of LLM Unlearning via Quantization
Zhiwei Zhang, Fali Wang, Xiaomin Li et al.
ACE: All-round Creator and Editor Following Instructions via Diffusion Transformer
Zhen Han, Zeyinzi Jiang, Yulin Pan et al.
UniPortrait: A Unified Framework for Identity-Preserving Single- and Multi-Human Image Personalization
Junjie He, Yifeng Geng, Liefeng Bo
Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis
Jinbin Bai, Tian Ye, Wei Chow et al.
ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance
Chunwei Wang, Guansong Lu, Junwei Yang et al.
Selective Aggregation for Low-Rank Adaptation in Federated Learning
Pengxin Guo, Shuang Zeng, Yanran Wang et al.
SeeGround: See and Ground for Zero-Shot Open-Vocabulary 3D Visual Grounding
Rong Li, Shijie Li, Lingdong Kong et al.
How efficient is LLM-generated code? A rigorous & high-standard benchmark
Ruizhong Qiu, Weiliang Zeng, James Ezick et al.
LLM Generated Persona is a Promise with a Catch
Leon Li, Haozhe Chen, Hongseok Namkoong et al.
Detecting Data Deviations in Electronic Health Records
Kaiping Zheng, Horng-Ruey Chua, Beng Chin Ooi
DLF: Disentangled-Language-Focused Multimodal Sentiment Analysis
Pan Wang, Qiang Zhou, Yawen Wu et al.
CC-OCR: A Comprehensive and Challenging OCR Benchmark for Evaluating Large Multimodal Models in Literacy
Zhibo Yang, Jun Tang, Zhaohai Li et al.
ENCODER: Entity Mining and Modification Relation Binding for Composed Image Retrieval
Zixu Li, Zhiwei Chen, Haokun Wen et al.
DINOv2 Meets Text: A Unified Framework for Image- and Pixel-Level Vision-Language Alignment
Dahyun Kang, Piotr Bojanowski, Huy V. Vo et al.
GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control
Mariam Hassan, Sebastian Stapf, Ahmad Rahimi et al.
Correcting the Mythos of KL-Regularization: Direct Alignment without Overoptimization via Chi-Squared Preference Optimization
Audrey Huang, Wenhao Zhan, Tengyang Xie et al.
Robust LLM safeguarding via refusal feature adversarial training
Lei Yu, Virginie Do, Karen Hambardzumyan et al.
Arctic-Embed 2.0: Multilingual Retrieval Without Compromise
Puxuan Yu, Luke Merrick, Gaurav Nuti et al.
Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding
Yao Teng, Han Shi, Xian Liu et al.
Trust or Escalate: LLM Judges with Provable Guarantees for Human Agreement
Jaehun Jung, Faeze Brahman, Yejin Choi
Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning
Yibin Wang, li zhimin, Yuhang Zang et al.
Universal Actions for Enhanced Embodied Foundation Models
Jinliang Zheng, Jianxiong Li, Dongxiu Liu et al.
Fine-Tuning Discrete Diffusion Models via Reward Optimization with Applications to DNA and Protein Design
Chenyu Wang, Masatoshi Uehara, Yichun He et al.
Context is Key: A Benchmark for Forecasting with Essential Textual Information
Andrew Williams, Arjun Ashok, Étienne Marcotte et al.
Towards Realistic Data Generation for Real-World Super-Resolution
Long Peng, Wenbo Li, Renjing Pei et al.