Most Cited 2025 "stochastic multi-agent systems" Papers
22,274 papers found • Page 6 of 112
Conference
BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding
Shuming Liu, Chen Zhao, Tianqi Xu et al.
AutoPresent: Designing Structured Visuals from Scratch
Jiaxin Ge, Zora Zhiruo Wang, Xuhui Zhou et al.
Steering Large Language Models between Code Execution and Textual Reasoning
Yongchao Chen, Harsh Jhamtani, Srinagesh Sharma et al.
LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming?
Zihan Zheng, Zerui Cheng, Zeyu Shen et al.
Can LLMs Solve Longer Math Word Problems Better?
Xin Xu, Tong Xiao, Zitong Chao et al.
Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos
Qirui Chen, Shangzhe Di, Weidi Xie
PerturboLLaVA: Reducing Multimodal Hallucinations with Perturbative Visual Training
Cong Chen, Mingyu Liu, Chenchen Jing et al.
VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents
Ryota Tanaka, Taichi Iki, Taku Hasegawa et al.
Every SAM Drop Counts: Embracing Semantic Priors for Multi-Modality Image Fusion and Beyond
Guanyao Wu, Haoyu Liu, Hongming Fu et al.
Frequency Dynamic Convolution for Dense Image Prediction
Linwei Chen, Lin Gu, Liang Li et al.
Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO
Chengzhuo Tong, Ziyu Guo, Renrui Zhang et al.
AffordDP: Generalizable Diffusion Policy with Transferable Affordance
Shijie Wu, Yihang Zhu, Yunao Huang et al.
EfficientViM: Efficient Vision Mamba with Hidden State Mixer based State Space Duality
Sanghyeok Lee, Joonmyung Choi, Hyunwoo J. Kim
Grounded Reinforcement Learning for Visual Reasoning
Gabriel Sarch, Snigdha Saha, Naitik Khandelwal et al.
T2ISafety: Benchmark for Assessing Fairness, Toxicity, and Privacy in Image Generation
Lijun Li, Zhelun Shi, Xuhao Hu et al.
Open Models, Closed Minds? On Agents Capabilities in Mimicking Human Personalities through Open Large Language Models
Lucio La Cava, Andrea Tagarelli
T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching
Zizheng Pan, Bohan Zhuang, De-An Huang et al.
CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos
Xinhao Liu, Jintong Li, Yicheng Jiang et al.
LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences
Hongyan Zhi, Peihao Chen, Junyan Li et al.
Beyond Autoregression: Fast LLMs via Self-Distillation Through Time
Justin Deschenaux, Caglar Gulcehre
XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation
Bowen Chen, Brynn zhao, Haomiao Sun et al.
Understanding Factual Recall in Transformers via Associative Memories
Eshaan Nichani, Jason Lee, Alberto Bietti
Interleaved-Modal Chain-of-Thought
Jun Gao, Yongqi Li, Ziqiang Cao et al.
FreeSim: Toward Free-viewpoint Camera Simulation in Driving Scenes
Lue Fan, Hao ZHANG, Qitai Wang et al.
A Formal Framework for Understanding Length Generalization in Transformers
Xinting Huang, Andy Yang, Satwik Bhattamishra et al.
MagicQuill: An Intelligent Interactive Image Editing System
Zichen Liu, Yue Yu, Hao Ouyang et al.
FineVQ: Fine-Grained User Generated Content Video Quality Assessment
Huiyu Duan, Qiang Hu, Wang Jiarui et al.
Adversarial Search Engine Optimization for Large Language Models
Fredrik Nestaas, Edoardo Debenedetti, Florian Tramer
VideoWebArena: Evaluating Long Context Multimodal Agents with Video Understanding Web Tasks
Lawrence Jang, Yinheng Li, Dan Zhao et al.
STORM: Spatio-TempOral Reconstruction Model For Large-Scale Outdoor Scenes
Jiawei Yang, Jiahui Huang, Boris Ivanovic et al.
Multi-Agent Collaboration via Evolving Orchestration
Yufan Dang, Chen Qian, Xueheng Luo et al.
CORE4D: A 4D Human-Object-Human Interaction Dataset for Collaborative Object REarrangement
Yun Liu, Chengwen Zhang, Ruofan Xing et al.
Moral Alignment for LLM Agents
Elizaveta Tennant, Stephen Hailes, Mirco Musolesi
GoodDrag: Towards Good Practices for Drag Editing with Diffusion Models
Zewei Zhang, Huan Liu, Jun Chen et al.
Stationary Kernels and Gaussian Processes on Lie Groups and their Homogeneous Spaces I: the compact case
Iskander Azangulov, Andrei Smolensky, Alexander Terenin et al.
UGPhysics: A Comprehensive Benchmark for Undergraduate Physics Reasoning with Large Language Models
Xin Xu, Qiyun Xu, Tong Xiao et al.
OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation
Shenghai Yuan, Xianyi He, Yufan Deng et al.
KGGen: Extracting Knowledge Graphs from Plain Text with Language Models
Belinda Mo, Kyssen Yu, Joshua Kazdan et al.
ResearchTown: Simulator of Human Research Community
Haofei Yu, Zhaochen Hong, Zirui Cheng et al.
Interactive Medical Image Segmentation: A Benchmark Dataset and Baseline
Junlong Cheng, Bin Fu, Jin Ye et al.
EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Videos Generation
Xiaofeng Wang, Kang Zhao, Feng Liu et al.
An Intelligent Agentic System for Complex Image Restoration Problems
Kaiwen Zhu, Jinjin Gu, Zhiyuan You et al.
NLSR: Neuron-Level Safety Realignment of Large Language Models Against Harmful Fine-Tuning
Xin Yi, Shunfan Zheng, Linlin Wang et al.
Vid2Sim: Realistic and Interactive Simulation from Video for Urban Navigation
Ziyang Xie, Zhizheng Liu, Zhenghao Peng et al.
DropGaussian: Structural Regularization for Sparse-view Gaussian Splatting
Hyunwoo Park, Gun Ryu, Wonjun Kim
Adversarial Diffusion Compression for Real-World Image Super-Resolution
Bin Chen, Gehui Li, Rongyuan Wu et al.
Revisiting Backdoor Attacks against Large Vision-Language Models from Domain Shift
Siyuan Liang, Jiawei Liang, Tianyu Pang et al.
How to build a consistency model: Learning flow maps via self-distillation
Nicholas Boffi, Michael Albergo, Eric Vanden-Eijnden
TANGO: Co-Speech Gesture Video Reenactment with Hierarchical Audio Motion Embedding and Diffusion Interpolation
haiyang liu, Xingchao Yang, Tomoya Akiyama et al.
Divide and Translate: Compositional First-Order Logic Translation and Verification for Complex Logical Reasoning
Hyun Ryu, Gyeongman Kim, Hyemin S. Lee et al.
Large Language Models Meet Symbolic Provers for Logical Reasoning Evaluation
Chengwen Qi, Ren Ma, Bowen Li et al.
Señorita-2M: A High-Quality Instruction-based Dataset for General Video Editing by Video Specialists
Bojia Zi, Penghui Ruan, Marco Chen et al.
CityNav: A Large-Scale Dataset for Real-World Aerial Navigation
Jungdae Lee, Taiki Miyanishi, Shuhei Kurita et al.
MUSE-VL: Modeling Unified VLM through Semantic Discrete Encoding
Rongchang Xie, Chen Du, Ping Song et al.
LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis
Hanlin Wang, Hao Ouyang, Qiuyu Wang et al.
Meta Flow Matching: Integrating Vector Fields on the Wasserstein Manifold
Lazar Atanackovic, Xi (Nicole) Zhang, Brandon Amos et al.
FastLGS: Speeding Up Language Embedded Gaussians with Feature Grid Mapping
Yuzhou Ji, He Zhu, Junshu Tang et al.
AID: Adapting Image2Video Diffusion Models for Instruction-guided Video Prediction
Zhen Xing, Qi Dai, Zejia Weng et al.
Exploring Unbiased Deepfake Detection via Token-Level Shuffling and Mixing
Xinghe Fu, Zhiyuan Yan, Taiping Yao et al.
Your ViT is Secretly an Image Segmentation Model
Tommie Kerssies, Niccolò Cavagnero, Alexander Hermans et al.
SparX: A Sparse Cross-Layer Connection Mechanism for Hierarchical Vision Mamba and Transformer Networks
Meng Lou, Yunxiang Fu, Yizhou Yu
SCSegamba: Lightweight Structure-Aware Vision Mamba for Crack Segmentation in Structures
Hui Liu, Chen Jia, Fan Shi et al.
Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors
Duo Zheng, shijia Huang, Yanyang Li et al.
XLRS-Bench: Could Your Multimodal LLMs Understand Extremely Large Ultra-High-Resolution Remote Sensing Imagery?
Fengxiang Wang, hongzhen wang, Zonghao Guo et al.
FLIP: Flow-Centric Generative Planning as General-Purpose Manipulation World Model
Chongkai Gao, Haozhuo Zhang, Zhixuan Xu et al.
The Superposition of Diffusion Models Using the Itô Density Estimator
Marta Skreta, Lazar Atanackovic, Joey Bose et al.
Dream to Manipulate: Compositional World Models Empowering Robot Imitation Learning with Imagination
Leonardo Barcellona, Andrii Zadaianchuk, Davide Allegro et al.
PUMA: Empowering Unified MLLM with Multi-granular Visual Generation
Rongyao Fang, Chengqi Duan, Kun Wang et al.
ADBM: Adversarial Diffusion Bridge Model for Reliable Adversarial Purification
Xiao Li, Wenxuan Sun, Huanran Chen et al.
Energy-Weighted Flow Matching for Offline Reinforcement Learning
Shiyuan Zhang, Weitong Zhang, Quanquan Gu
Model Poisoning Attacks to Federated Learning via Multi-Round Consistency
Yueqi Xie, Minghong Fang, Neil Zhenqiang Gong
Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models
Laura Ruis, Maximilian Mozes, Juhan Bae et al.
Calibrated Multi-Preference Optimization for Aligning Diffusion Models
Kyungmin Lee, Xiaohang Li, Qifei Wang et al.
Towards Understanding Safety Alignment: A Mechanistic Perspective from Safety Neurons
Jianhui Chen, Xiaozhi Wang, Zijun Yao et al.
Efficient Online Reinforcement Learning for Diffusion Policy
Haitong Ma, Tianyi Chen, Kai Wang et al.
SPA: 3D Spatial-Awareness Enables Effective Embodied Representation
Haoyi Zhu, Honghui Yang, Yating Wang et al.
Rethinking Vision-Language Model in Face Forensics: Multi-Modal Interpretable Forged Face Detector
Xiao Guo, Xiufeng Song, Yue Zhang et al.
Results of the Big ANN: NeurIPS’23 competition
Harsha Vardhan simhadri, Martin Aumüller, Matthijs Douze et al.
Answer, Assemble, Ace: Understanding How LMs Answer Multiple Choice Questions
Sarah Wiegreffe, Oyvind Tafjord, Yonatan Belinkov et al.
Diffusion Beats Autoregressive in Data-Constrained Settings
Mihir Prabhudesai, Mengning Wu, Amir Zadeh et al.
Surgical, Cheap, and Flexible: Mitigating False Refusal in Language Models via Single Vector Ablation
Xinpeng Wang, Chengzhi (Martin) Hu, Paul Röttger et al.
Min-K%++: Improved Baseline for Pre-Training Data Detection from Large Language Models
Jingyang Zhang, Jingwei Sun, Eric Yeats et al.
Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment
Harrish Thasarathan, Julian Forsyth, Thomas Fel et al.
AnimateAnything: Consistent and Controllable Animation for Video Generation
guojun lei, Chi Wang, Rong Zhang et al.
A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training
Kai Wang, Mingjia Shi, YuKun Zhou et al.
Generating CAD Code with Vision-Language Models for 3D Designs
Kamel Alrashedy, Pradyumna Tambwekar, Zulfiqar Haider Zaidi et al.
AIGV-Assessor: Benchmarking and Evaluating the Perceptual Quality of Text-to-Video Generation with LMM
Wang Jiarui, Huiyu Duan, Guangtao Zhai et al.
Inverse Constitutional AI: Compressing Preferences into Principles
Arduin Findeis, Timo Kaufmann, Eyke Hüllermeier et al.
VSSD: Vision Mamba with Non-Causal State Space Duality
Yuheng Shi, Mingjia Li, Minjing Dong et al.
Decoupled Spatio-Temporal Consistency Learning for Self-Supervised Tracking
Yaozong Zheng, Bineng Zhong, Qihua Liang et al.
Empowering LLMs with Pseudo-Untrimmed Videos for Audio-Visual Temporal Understanding
Yunlong Tang, Daiki Shimada, Jing Bi et al.
Faster Diffusion Sampling with Randomized Midpoints: Sequential and Parallel
Shivam Gupta, Linda Cai, Sitan Chen
GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks
Muhammad Danish, Muhammad Akhtar Munir, Syed Shah et al.
What Makes a Good Diffusion Planner for Decision Making?
Haofei Lu, Dongqi Han, Yifei Shen et al.
Move to Understand a 3D Scene: Bridging Visual Grounding and Exploration for Efficient and Versatile Embodied Navigation
ZIYU ZHU, Xilin Wang, Yixuan Li et al.
RouteLLM: Learning to Route LLMs from Preference Data
Isaac Ong, Amjad Almahairi, Vincent Wu et al.
KVLink: Accelerating Large Language Models via Efficient KV Cache Reuse
Jingbo Yang, Bairu Hou, Wei Wei et al.
LeGrad: An Explainability Method for Vision Transformers via Feature Formation Sensitivity
Walid Bousselham, Angie Boggust, Sofian Chaybouti et al.
ScaMo: Exploring the Scaling Law in Autoregressive Motion Generation Model
Shunlin Lu, Jingbo Wang, Zeyu Lu et al.
Specialized Foundation Models Struggle to Beat Supervised Baselines
Zongzhe Xu, Ritvik Gupta, Wenduo Cheng et al.
NovelQA: Benchmarking Question Answering on Documents Exceeding 200K Tokens
Cunxiang Wang, Ruoxi Ning, Boqi Pan et al.
Checklists Are Better Than Reward Models For Aligning Language Models
Vijay Viswanathan, Yanchao Sun, Xiang Kong et al.
OphCLIP: Hierarchical Retrieval-Augmented Learning for Ophthalmic Surgical Video-Language Pretraining
Ming Hu, Kun yuan, Yaling Shen et al.
Facilitating Multi-turn Function Calling for LLMs via Compositional Instruction Tuning
Mingyang Chen, sunhaoze, Tianpeng Li et al.
Latent Reward: LLM-Empowered Credit Assignment in Episodic Reinforcement Learning
Yun Qu, Yuhang Jiang, Boyuan Wang et al.
Language-Guided Image Tokenization for Generation
Kaiwen Zha, Lijun Yu, Alireza Fathi et al.
DeltaProduct: Improving State-Tracking in Linear RNNs via Householder Products
Julien Siems, Timur Carstensen, Arber Zela et al.
Asynchronous Federated Reinforcement Learning with Policy Gradient Updates: Algorithm Design and Convergence Analysis
Guangchen (Eric) Lan, Dong-Jun Han, Abolfazl Hashemi et al.
ParetoQ: Improving Scaling Laws in Extremely Low-bit LLM Quantization
Zechun Liu, Changsheng Zhao, Hanxian Huang et al.
Mani-GS: Gaussian Splatting Manipulation with Triangular Mesh
Xiangjun Gao, Xiaoyu Li, Yiyu Zhuang et al.
NatureLM-audio: an Audio-Language Foundation Model for Bioacoustics
David Robinson, Marius Miron, Masato Hagiwara et al.
Differentiation and Specialization of Attention Heads via the Refined Local Learning Coefficient
George Wang, Jesse Hoogland, Stan van Wingerden et al.
CATANet: Efficient Content-Aware Token Aggregation for Lightweight Image Super-Resolution
Xin Liu, Jie Liu, Jie Tang et al.
Flow to the Mode: Mode-Seeking Diffusion Autoencoders for State-of-the-Art Image Tokenization
Kyle Sargent, Kyle Hsu, Justin Johnson et al.
SelfSplat: Pose-Free and 3D Prior-Free Generalizable 3D Gaussian Splatting
Gyeongjin Kang, Jisang Yoo, Jihyeon Park et al.
SimPER: A Minimalist Approach to Preference Alignment without Hyperparameters
Teng Xiao, Yige Yuan, Zhengyu Chen et al.
DisPose: Disentangling Pose Guidance for Controllable Human Image Animation
Hongxiang Li, Yaowei Li, Yuhang Yang et al.
Diffusion Generative Modeling for Spatially Resolved Gene Expression Inference from Histology Images
Sichen Zhu, Yuchen Zhu, Molei Tao et al.
LLaVA-KD: A Framework of Distilling Multimodal Large Language Models
Yuxuan Cai, Jiangning Zhang, Haoyang He et al.
Text-to-Image Rectified Flow as Plug-and-Play Priors
Xiaofeng Yang, Cheng Chen, xulei yang et al.
Epona: Autoregressive Diffusion World Model for Autonomous Driving
Kaiwen Zhang, Zhenyu Tang, Xiaotao Hu et al.
Addressing Misspecification in Simulation-based Inference through Data-driven Calibration
Antoine Wehenkel, Juan L. Gamella, Ozan Sener et al.
MMQA: Evaluating LLMs with Multi-Table Multi-Hop Complex Questions
Jian Wu, Linyi Yang, Dongyuan Li et al.
Efficient Visual State Space Model for Image Deblurring
Lingshun Kong, Jiangxin Dong, Jinhui Tang et al.
RadGPT: Constructing 3D Image-Text Tumor Datasets
Pedro Bassi, Mehmet Yavuz, Ibrahim Ethem Hamamci et al.
Open-Vocabulary Functional 3D Scene Graphs for Real-World Indoor Spaces
Chenyangguang Zhang, Alexandros Delitzas, Fangjinhua Wang et al.
A Percolation Model of Emergence: Analyzing Transformers Trained on a Formal Language
Ekdeep Singh Lubana, Kyogo Kawaguchi, Robert Dick et al.
Nonconvex Stochastic Optimization under Heavy-Tailed Noises: Optimal Convergence without Gradient Clipping
Zijian Liu, Zhengyuan Zhou
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
Weihao Zeng, Yuzhen Huang, Lulu Zhao et al.
miniCTX: Neural Theorem Proving with (Long-)Contexts
Jiewen Hu, Thomas Zhu, Sean Welleck
OSCAR: Operating System Control via State-Aware Reasoning and Re-Planning
Xiaoqiang Wang, Bang Liu
MindTuner: Cross-Subject Visual Decoding with Visual Fingerprint and Semantic Correction
Zixuan Gong, Qi Zhang, Guangyin Bao et al.
Bi-Factorial Preference Optimization: Balancing Safety-Helpfulness in Language Models
Wenxuan Zhang, Philip Torr, Mohamed Elhoseiny et al.
Teaching Language Models to Critique via Reinforcement Learning
Zhihui Xie, Jie chen, Liyu Chen et al.
Navigation-Guided Sparse Scene Representation for End-to-End Autonomous Driving
Peidong Li, Dixiao Cui
Language Imbalance Driven Rewarding for Multilingual Self-improving
Wen Yang, Junhong Wu, Chen Wang et al.
JetFormer: An autoregressive generative model of raw images and text
Michael Tschannen, André Susano Pinto, Alexander Kolesnikov
The AdEMAMix Optimizer: Better, Faster, Older
Matteo Pagliardini, Pierre Ablin, David Grangier
KRIS-Bench: Benchmarking Next-Level Intelligent Image Editing Models
Yongliang Wu, Zonghui Li, Xinting Hu et al.
A Stitch in Time Saves Nine: Small VLM is a Precise Guidance for Accelerating Large VLMs
Wangbo Zhao, Yizeng Han, Jiasheng Tang et al.
From Multimodal LLMs to Generalist Embodied Agents: Methods and Lessons
Andrew Szot, Bogdan Mazoure, Omar Attia et al.
Reward Guided Latent Consistency Distillation
William Wang, Jiachen Li, Weixi Feng et al.
ICLR: In-Context Learning of Representations
Core Francisco Park, Andrew Lee, Ekdeep Singh Lubana et al.
Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles
Jiangjie Chen, Qianyu He, Siyu Yuan et al.
Fantastic Copyrighted Beasts and How (Not) to Generate Them
Luxi He, Yangsibo Huang, Weijia Shi et al.
CAD-Llama: Leveraging Large Language Models for Computer-Aided Design Parametric 3D Model Generation
Jiahao Li, Weijian Ma, Xueyang Li et al.
LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging
Ke Wang, Nikos Dimitriadis, Alessandro Favero et al.
Limits to scalable evaluation at the frontier: LLM as judge won’t beat twice the data
Florian Eddie Dorner, Vivian Nastl, Moritz Hardt
Language Representations Can be What Recommenders Need: Findings and Potentials
Leheng Sheng, An Zhang, Yi Zhang et al.
Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Datasets
Guangqi Jiang, Yifei Sun, Tao Huang et al.
Steering Masked Discrete Diffusion Models via Discrete Denoising Posterior Prediction
Jarrid Rector-Brooks, Mohsin Hasan, Zhangzhi Peng et al.
Q-Frame: Query-aware Frame Selection and Multi-Resolution Adaptation for Video-LLMs
Shaojie Zhang, Jiahui Yang, Jianqin Yin et al.
Towards a Mechanistic Explanation of Diffusion Model Generalization
Matthew Niedoba, Berend Zwartsenberg, Kevin Murphy et al.
GUI-World: A Video Benchmark and Dataset for Multimodal GUI-oriented Understanding
Dongping Chen, Yue Huang, Siyuan Wu et al.
EditAR: Unified Conditional Generation with Autoregressive Models
Jiteng Mu, Nuno Vasconcelos, Xiaolong Wang
Semantics-Adaptive Activation Intervention for LLMs via Dynamic Steering Vectors
Weixuan Wang, JINGYUAN YANG, Wei Peng
Instant Policy: In-Context Imitation Learning via Graph Diffusion
Vitalis Vosylius, Edward Johns
HELMET: How to Evaluate Long-context Models Effectively and Thoroughly
Howard Yen, Tianyu Gao, Minmin Hou et al.
POSTA: A Go-to Framework for Customized Artistic Poster Generation
Haoyu Chen, Xiaojie Xu, Wenbo Li et al.
Self-Consistency Preference Optimization
Archiki Prasad, Weizhe Yuan, Richard Yuanzhe Pang et al.
DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO
Jinyoung Park, Jeehye Na, Jinyoung Kim et al.
Seeing Far and Clearly: Mitigating Hallucinations in MLLMs with Attention Causal Decoding
feilong tang, Chengzhi Liu, Zhongxing Xu et al.
OSV: One Step is Enough for High-Quality Image to Video Generation
Xiaofeng Mao, Zhengkai Jiang, Fu-Yun Wang et al.
Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers
Wei Pang, Kevin Qinghong Lin, Xiangru Jian et al.
SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents
Ibragim Badertdinov, Alexander Golubev, Maksim Nekrashevich et al.
Numerical Pruning for Efficient Autoregressive Models
Xuan Shen, Zhao Song, Yufa Zhou et al.
Material Anything: Generating Materials for Any 3D Object via Diffusion
Xin Huang, Tengfei Wang, Ziwei Liu et al.
Towards Scalable Exact Machine Unlearning Using Parameter-Efficient Fine-Tuning
Somnath Basu Roy Chowdhury, Krzysztof Choromanski, Arijit Sehanobish et al.
Token Cleaning: Fine-Grained Data Selection for LLM Supervised Fine-Tuning
Jinlong Pang, Na Di, Zhaowei Zhu et al.
DreamActor-M1: Holistic, Expressive and Robust Human Image Animation with Hybrid Guidance
Yuxuan Luo, Zhengkun Rong, Lizhen Wang et al.
Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs
Shane Bergsma, Nolan Dey, Gurpreet Gosal et al.
Adaptive Guidance: Training-free Acceleration of Conditional Diffusion Models
Angela Castillo, Jonas Kohler, Juan C. Pérez et al.
CPath-Omni: A Unified Multimodal Foundation Model for Patch and Whole Slide Image Analysis in Computational Pathology
Yuxuan Sun, Yixuan Si, Chenglu Zhu et al.
Recite, Reconstruct, Recollect: Memorization in LMs as a Multifaceted Phenomenon
USVSN Sai Prashanth, Alvin Deng, Kyle O'Brien et al.
From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks
Clementine Domine, Nicolas Anguita, Alexandra M Proca et al.
Towards Robust Knowledge Unlearning: An Adversarial Framework for Assessing and Improving Unlearning Robustness in Large Language Models
Hongbang Yuan, Zhuoran Jin, Pengfei Cao et al.
Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements
Jingyu Zhang, Ahmed Elgohary Ghoneim, Ahmed Magooda et al.
Understanding and Mitigating Hallucination in Large Vision-Language Models via Modular Attribution and Intervention
Tianyun Yang, Ziniu Li, Juan Cao et al.
Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates
Xiaosen Zheng, Tianyu Pang, Chao Du et al.
Mini-Monkey: Alleviating the Semantic Sawtooth Effect for Lightweight MLLMs via Complementary Image Pyramid
Mingxin Huang, Yuliang Liu, Dingkang Liang et al.
CleanDIFT: Diffusion Features without Noise
Nick Stracke, Stefan Andreas Baumann, Kolja Bauer et al.
SONICS: Synthetic Or Not - Identifying Counterfeit Songs
Awsaf Rahman, Zaber Ibn Abdul Hakim, Najibul Haque Sarker et al.
Mixture Compressor for Mixture-of-Experts LLMs Gains More
Wei Huang, Yue Liao, Jianhui Liu et al.
Towards Foundation Models for Mixed Integer Linear Programming
Sirui Li, Janardhan Kulkarni, Ishai Menache et al.
Bayesian Prompt Flow Learning for Zero-Shot Anomaly Detection
Zhen Qu, Xian Tao, Xinyi Gong et al.
MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning
Xinyan Chen, Renrui Zhang, Dongzhi JIANG et al.
Residual Connections and Normalization Can Provably Prevent Oversmoothing in GNNs
Michael Scholkemper, Xinyi Wu, Ali Jadbabaie et al.
Hypothetical Minds: Scaffolding Theory of Mind for Multi-Agent Tasks with Large Language Models
Logan Cross, Violet Xiang, Agam Bhatia et al.
Artificial Kuramoto Oscillatory Neurons
Takeru Miyato, Sindy Löwe, Andreas Geiger et al.
Do LLMs ``know'' internally when they follow instructions?
Juyeon Heo, Christina Heinze-Deml, Oussama Elachqar et al.
Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN
Pengxiang Li, Lu Yin, Shiwei Liu
Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation
Yudi Shi, Shangzhe Di, Qirui Chen et al.
Failures to Find Transferable Image Jailbreaks Between Vision-Language Models
Rylan Schaeffer, Dan Valentine, Luke Bailey et al.
$\text{D}_{2}\text{O}$: Dynamic Discriminative Operations for Efficient Long-Context Inference of Large Language Models
Zhongwei Wan, Xinjian Wu, Yu Zhang et al.
LICO: Large Language Models for In-Context Molecular Optimization
Tung Nguyen, Aditya Grover
Syntactic and Semantic Control of Large Language Models via Sequential Monte Carlo
João Loula, Benjamin LeBrun, Li Du et al.
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation
Tianwei Xiong, Jun Hao Liew, Zilong Huang et al.
Audio Large Language Models Can Be Descriptive Speech Quality Evaluators
CHEN CHEN, Yuchen Hu, Siyin Wang et al.
Towards General-Purpose Model-Free Reinforcement Learning
Scott Fujimoto, Pierluca D'Oro, Amy Zhang et al.
UniVAD: A Training-free Unified Model for Few-shot Visual Anomaly Detection
Zhaopeng Gu, Bingke Zhu, Guibo Zhu et al.