Most Cited NEURIPS "hybrid event-based sensor" Papers
5,858 papers found • Page 4 of 30
Conference
Continual Multimodal Contrastive Learning
Xiaohao Liu, Xiaobo Xia, See-Kiong Ng et al.
MM-Agent: LLM as Agents for Real-world Mathematical Modeling Problem
Fan LIU, Zherui Yang, Cancheng Liu et al.
Locality in Image Diffusion Models Emerges from Data Statistics
Artem Lukoianov, Chenyang Yuan, Justin Solomon et al.
Vision-centric Token Compression in Large Language Model
Ling Xing, Alex Jinpeng Wang, Rui Yan et al.
What Do Latent Action Models Actually Learn?
Chuheng Zhang, Tim Pearce, Pushi Zhang et al.
Whole-Body Conditioned Egocentric Video Prediction
Yutong Bai, Danny Tran, Amir Bar et al.
Hierarchical Fine-grained Preference Optimization for Physically Plausible Video Generation
Harold Haodong Chen, Haojian Huang, Qifeng Chen et al.
Aux-Think: Exploring Reasoning Strategies for Data-Efficient Vision-Language Navigation
Shuo Wang, Yongcai Wang, Wanting Li et al.
CausalPFN: Amortized Causal Effect Estimation via In-Context Learning
Vahid Balazadeh, Hamidreza Kamkari, Valentin Thomas et al.
Unifying Appearance Codes and Bilateral Grids for Driving Scene Gaussian Splatting
Nan Wang, Lixing Xiao, Yuantao Chen et al.
Escaping Collapse: The Strength of Weak Data for Large Language Model Training
Kareem Amin, Sara Babakniya, Alex Bie et al.
GUI-Reflection: Empowering Multimodal GUI Models with Self-Reflection Behavior
Penghao Wu, Shengnan Ma, Bo Wang et al.
Neither Valid nor Reliable? Investigating the Use of LLMs as Judges
Khaoula Chehbouni, Mohammed Haddou, Jackie CK Cheung et al.
The Lighthouse of Language: Enhancing LLM Agents via Critique-Guided Improvement
Ruihan Yang, Fanghua Ye, Jian Li et al.
Learning Robust Spectral Dynamics for Temporal Domain Generalization
En Yu, Jie Lu, Xiaoyu Yang et al.
CSBrain: A Cross-scale Spatiotemporal Brain Foundation Model for EEG Decoding
Yuchen Zhou, Jiamin Wu, Zichen Ren et al.
DOTA: Distributional Test-time Adaptation of Vision-Language Models
Zongbo Han, Jialong Yang, Guangyu Wang et al.
Seeing Sound, Hearing Sight: Uncovering Modality Bias and Conflict of AI models in Sound Localization
Yanhao Jia, Ji Xie, S Jivaganesh et al.
DOVE: Efficient One-Step Diffusion Model for Real-World Video Super-Resolution
Zheng Chen, Zichen Zou, Kewei Zhang et al.
NeuralPLexer3: Accurate Biomolecular Complex Structure Prediction with Flow Models
Jarren Zhuoran Qiao, Feizhi Ding, Thomas Dresselhaus et al.
Reward-Instruct: A Reward-Centric Approach to Fast Photo-Realistic Image Generation
Yihong Luo, Tianyang Hu, Weijian Luo et al.
Reviving DSP for Advanced Theorem Proving in the Era of Reasoning Models
Chenrui Cao, Liangcheng Song, Zenan Li et al.
UAV-Flow Colosseo: A Real-World Benchmark for Flying-on-a-Word UAV Imitation Learning
Xiangyu Wang, Donglin Yang, Yue Liao et al.
PhysCtrl: Generative Physics for Controllable and Physics-Grounded Video Generation
Chen Wang, Chuhao Chen, Yiming Huang et al.
VideoUFO: A Million-Scale User-Focused Dataset for Text-to-Video Generation
Wenhao Wang, Yi Yang
SAM-R1: Leveraging SAM for Reward Feedback in Multimodal Segmentation via Reinforcement Learning
Jiaqi Huang, Zunnan Xu, Jun Zhou et al.
Distortion of AI Alignment: Does Preference Optimization Optimize for Preferences?
Paul Gölz, Nika Haghtalab, Kunhe Yang
Bayesian Concept Bottleneck Models with LLM Priors
Jean Feng, Avni Kothari, Lucas Zier et al.
PoE-World: Compositional World Modeling with Products of Programmatic Experts
Top Piriyakulkij, Yichao Liang, Hao Tang et al.
MergeBench: A Benchmark for Merging Domain-Specialized LLMs
Yifei He, Siqi Zeng, Yuzheng Hu et al.
RoMA: Scaling up Mamba-based Foundation Models for Remote Sensing
Fengxiang Wang, Yulin Wang, Mingshuo Chen et al.
ForgerySleuth: Empowering Multimodal Large Language Models for Image Manipulation Detection
Zhihao Sun, Haoran Jiang, Haoran Chen et al.
HO-Cap: A Capture System and Dataset for 3D Reconstruction and Pose Tracking of Hand-Object Interaction
Jikai Wang, Qifan Zhang, Yu-Wei Chao et al.
Efficient Randomized Experiments Using Foundation Models
Piersilvio De Bartolomeis, Javier Abad, Guanbo Wang et al.
EndoBench: A Comprehensive Evaluation of Multi-Modal Large Language Models for Endoscopy Analysis
Shengyuan Liu, Boyun Zheng, Wenting Chen et al.
ConfTuner: Training Large Language Models to Express Their Confidence Verbally
Yibo Li, Miao Xiong, Jiaying Wu et al.
Pre-Trained Policy Discriminators are General Reward Models
Shihan Dou, Shichun Liu, Yuming Yang et al.
SAFEPATH: Preventing Harmful Reasoning in Chain-of-Thought via Early Alignment
Wonje Jeung, Yoon Sangyeon, Minsuk Kahng et al.
DynamicVL: Benchmarking Multimodal Large Language Models for Dynamic City Understanding
Weihao Xuan, Junjue Wang, Heli Qi et al.
From Experts to a Generalist: Toward General Whole-Body Control for Humanoid Robots
Yuxuan Wang, Ming Yang, Gang Ding et al.
Two Experts Are All You Need for Steering Thinking: Reinforcing Cognitive Effort in MoE Reasoning Models Without Additional Training
Mengru Wang, Xingyu Chen, Yue Wang et al.
The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability?
Denis Sutter, Julian Minder, Thomas Hofmann et al.
Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling
Tsung-Han (Patrick) Wu, Heekyung Lee, Jiaxin Ge et al.
Video-SafetyBench: A Benchmark for Safety Evaluation of Video LVLMs
Xuannan Liu, Zekun Li, Zheqi He et al.
Self-Improving Embodied Foundation Models
Seyed Kamyar Seyed Ghasemipour, Ayzaan Wahid, Jonathan Tompson et al.
Fundamental Limitations in Pointwise Defences of LLM Finetuning APIs
Xander Davies, Eric Winsor, Alexandra Souly et al.
SuffixDecoding: Extreme Speculative Decoding for Emerging AI Applications
Gabriele Oliaro, Zhihao Jia, Daniel Campos et al.
SciArena: An Open Evaluation Platform for Non-Verifiable Scientific Literature-Grounded Tasks
Yilun Zhao, Kaiyan Zhang, Tiansheng Hu et al.
InfoChartQA: A Benchmark for Multimodal Question Answering on Infographic Charts
Tianchi Xie, Minzhi Lin, Mengchen Liu et al.
FreshStack: Building Realistic Benchmarks for Evaluating Retrieval on Technical Documents
Nandan Thakur, Jimmy Lin, Samuel Havens et al.
Quality-Driven Curation of Remote Sensing Vision-Language Data via Learned Scoring Models
Dilxat Muhtar, Enzhuo Zhang, Zhenshi Li et al.
MagCache: Fast Video Generation with Magnitude-Aware Cache
Zehong Ma, Longhui Wei, Feng Wang et al.
Beyond Verifiable Rewards: Scaling Reinforcement Learning in Language Models to Unverifiable Data
Yunhao Tang, Sid Wang, Lovish Madaan et al.
CAD-Coder: Text-to-CAD Generation with Chain-of-Thought and Geometric Reward
Yandong Guan, Xilin Wang, XiMing Xing et al.
Robot-R1: Reinforcement Learning for Enhanced Embodied Reasoning in Robotics
Dongyoung Kim, Huiwon Jang, Sumin Park et al.
Distributive Fairness in Large Language Models: Evaluating Alignment with Human Values
Hadi Hosseini, Samarth Khanna
Quartet: Native FP4 Training Can Be Optimal for Large Language Models
Roberto Castro, Andrei Panferov, Rush Tabesh et al.
Harmony in Divergence: Towards Fast, Accurate, and Memory-efficient Zeroth-order LLM Fine-tuning
Qitao Tan, Jun Liu, Zheng Zhan et al.
First SFT, Second RL, Third UPT: Continual Improving Multi-Modal LLM Reasoning via Unsupervised Post-Training
Lai Wei, Yuting Li, Chen Wang et al.
Diffusion Transformers as Open-World Spatiotemporal Foundation Models
Yuan Yuan, Chonghua Han, Jingtao Ding et al.
Momentum-SAM: Sharpness Aware Minimization without Computational Overhead
Marlon Becker, Frederick Altrock, Benjamin Risse
Direct Alignment with Heterogeneous Preferences
Ali Shirali, Arash Nasr-Esfahany, Abdullah Alomar et al.
SnapMoGen: Human Motion Generation from Expressive Texts
chuan guo, Inwoo Hwang, Jian Wang et al.
Data-Driven Performance Guarantees for Classical and Learned Optimizers
Rajiv Sambharya, Bartolomeo Stellato
Antidistillation Sampling
Yash Savani, Asher Trockman, Zhili Feng et al.
ChartMuseum: Testing Visual Reasoning Capabilities of Large Vision-Language Models
Liyan Tang, Grace Kim, Xinyu Zhao et al.
Measuring the Faithfulness of Thinking Drafts in Large Reasoning Models
Zidi Xiong, Shan Chen, Zhenting Qi et al.
Lorentz Local Canonicalization: How to make any Network Lorentz-Equivariant
Jonas Spinner, Luigi Favaro, Peter Lippmann et al.
Learning World Models for Interactive Video Generation
Taiye Chen, Xun Hu, Zihan Ding et al.
Robust LLM Alignment via Distributionally Robust Direct Preference Optimization
Zaiyan Xu, Sushil Vemuri, Kishan Panaganti et al.
Learning Pattern-Specific Experts for Time Series Forecasting Under Patch-level Distribution Shift
Yanru Sun, Zongxia Xie, Emadeldeen Eldele et al.
Advancing Expert Specialization for Better MoE
Hongcan Guo, Haolang Lu, Guoshun Nan et al.
FPSAttention: Training-Aware FP8 and Sparsity Co-Design for Fast Video Diffusion
Akide Liu, Zeyu Zhang, Zhexin Li et al.
MIP against Agent: Malicious Image Patches Hijacking Multimodal OS Agents
Lukas Aichberger, Alasdair Paren, Guohao Li et al.
Unleashing Hour-Scale Video Training for Long Video-Language Understanding
Jingyang Lin, Jialian Wu, Ximeng Sun et al.
LLMs Encode Harmfulness and Refusal Separately
Jiachen Zhao, Jing Huang, Zhengxuan Wu et al.
Stable Gradients for Stable Learning at Scale in Deep Reinforcement Learning
Roger Creus Castanyer, Johan Obando Ceron, Lu Li et al.
MLLM-For3D: Adapting Multimodal Large Language Model for 3D Reasoning Segmentation
Jiaxin Huang, Runnan Chen, Ziwen Li et al.
Purifying Shampoo: Investigating Shampoo's Heuristics by Decomposing its Preconditioner
Runa Eschenhagen, Aaron Defazio, Tsung-Hsien Lee et al.
Gemstones: A Model Suite for Multi-Faceted Scaling Laws
Sean McLeish, John Kirchenbauer, David Miller et al.
Continuous Thought Machines
Luke Darlow, Ciaran Regan, Sebastian Risi et al.
Sample complexity of data-driven tuning of model hyperparameters in neural networks with structured parameter-dependent dual function
Maria-Florina Balcan, Anh Nguyen, Dravyansh Sharma
IntrinsiX: High-Quality PBR Generation using Image Priors
Peter Kocsis, Lukas Höllein, Matthias Niessner
Temporal Chain of Thought: Long-Video Understanding by Thinking in Frames
Anurag Arnab, Ahmet Iscen, Mathilde Caron et al.
MLR-Bench: Evaluating AI Agents on Open-Ended Machine Learning Research
Hui Chen, Miao Xiong, Yujie Lu et al.
PMQ-VE: Progressive Multi-Frame Quantization for Video Enhancement
ZhanFeng Feng, Long Peng, Xin Di et al.
APOLLO: Automated LLM and Lean Collaboration for Advanced Formal Reasoning
Azim Ospanov, Farzan Farnia, Roozbeh Yousefzadeh
Advancing Wasserstein Convergence Analysis of Score-Based Models: Insights from Discretization and Second-Order Acceleration
Yifeng Yu, Lu Yu
RBench-V: A Primary Assessment for Visual Reasoning Models with Multimodal Outputs
Meng-Hao Guo, Xuanyu Chu, Qianrui Yang et al.
BLEUBERI: BLEU is a surprisingly effective reward for instruction following
Yapei Chang, Yekyung Kim, Michael Krumdick et al.
We Should Chart an Atlas of All the World's Models
Eliahu Horwitz, Nitzan Kurer, Jonathan Kahana et al.
HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization
Zhijian Zhuo, Yutao Zeng, Ya Wang et al.
Preference-Guided Diffusion for Multi-Objective Offline Optimization
Yashas Annadani, Syrine Belakaria, Stefano Ermon et al.
A Generalist Intracortical Motor Decoder
Joel Ye, Fabio Rizzoglio, Xuan Ma et al.
HELM: Hyperbolic Large Language Models via Mixture-of-Curvature Experts
Neil He, Rishabh Anand, Hiren Madhu et al.
MLE-STAR: Machine Learning Engineering Agent via Search and Targeted Refinement
Jaehyun Nam, Jinsung Yoon, Jiefeng Chen et al.
Safe RLHF-V: Safe Reinforcement Learning from Multi-modal Human Feedback
Jiaming Ji, Xinyu Chen, Rui Pan et al.
Genesis: Multimodal Driving Scene Generation with Spatio-Temporal and Cross-Modal Consistency
Xiangyu Guo, Zhanqian Wu, Kaixin Xiong et al.
VIKI‑R: Coordinating Embodied Multi-Agent Cooperation via Reinforcement Learning
Li Kang, Xiufeng Song, Heng Zhou et al.
Information-Driven Design of Imaging Systems
Henry Pinkard, Leyla Kabuli, Eric Markley et al.
DreamPRM: Domain-reweighted Process Reward Model for Multimodal Reasoning
Qi Cao, Ruiyi Wang, Ruiyi Zhang et al.
OpenAD: Open-World Autonomous Driving Benchmark for 3D Object Detection
Zhongyu Xia, Jishuo Li, Zhiwei Lin et al.
Markov Persuasion Processes: Learning to Persuade From Scratch
Francesco Bacchiocchi, Francesco Emanuele Stradi, Matteo Castiglioni et al.
Deep Nonlinear Sufficient Dimension Reduction
Yinfeng Chen, Yuling Jiao, Rui Qiu et al.
EgoBlind: Towards Egocentric Visual Assistance for the Blind
Junbin Xiao, Nanxin Huang, Hao Qiu et al.
Do LLMs Really Forget? Evaluating Unlearning with Knowledge Correlation and Confidence Awareness
Rongzhe Wei, Peizhi Niu, Hans Hao-Hsun Hsu et al.
EAP-GP: Mitigating Saturation Effect in Gradient-based Automated Circuit Identification
Lin Zhang, Wenshuo Dong, Zhuoran Zhang et al.
Broken Tokens? Your Language Model can Secretly Handle Non-Canonical Tokenizations
Brian Zheng, Alisa Liu, Orevaoghene Ahia et al.
UltraHR-100K: Enhancing UHR Image Synthesis with A Large-Scale High-Quality Dataset
Chen Zhao, En Ci, Yunzhe Xu et al.
Model Provenance Testing for Large Language Models
Ivica Nikolic, Teodora Baluta, Prateek Saxena
The Emergence of Abstract Thought in Large Language Models Beyond Any Language
Yuxin Chen, Yiran Zhao, Yang Zhang et al.
SPARTAN: A Sparse Transformer World Model Attending to What Matters
Anson Lei, Bernhard Schölkopf, Ingmar Posner
RoboCerebra: A Large-scale Benchmark for Long-horizon Robotic Manipulation Evaluation
Songhao Han, Boxiang Qiu, Yue Liao et al.
Towards General Continuous Memory for Vision-Language Models
Wenyi WU, Zixuan Song, Kun Zhou et al.
Grounding Language with Vision: A Conditional Mutual Information Calibrated Decoding Strategy for Reducing Hallucinations in LVLMs
Hao Fang, Changle Zhou, Jiawei Kong et al.
Trust Region Constrained Measure Transport in Path Space for Stochastic Optimal Control and Inference
Denis Blessing, Julius Berner, Lorenz Richter et al.
Emergence and Evolution of Interpretable Concepts in Diffusion Models
Berk Tinaz, Zalan Fabian, Mahdi Soltanolkotabi
From Specificity to Generality: Revisiting Generalizable Artifacts in Detecting Face Deepfakes
Long Ma, Zhiyuan Yan, Jin Xu et al.
Learning quadratic neural networks in high dimensions: SGD dynamics and scaling laws
Gerard Ben Arous, Murat Erdogdu, Nuri Mert Vural et al.
Better NTK Conditioning: A Free Lunch from (ReLU) Nonlinear Activation in Wide Neural Networks
Chaoyue Liu, Han Bi, Like Hui et al.
Amortized Sampling with Transferable Normalizing Flows
Charlie Tan, Majdi Hassan, Leon Klein et al.
U-REPA: Aligning Diffusion U-Nets to ViTs
Yuchuan Tian, Hanting Chen, Mengyu Zheng et al.
RobustMerge: Parameter-Efficient Model Merging for MLLMs with Direction Robustness
Fanhu Zeng, Haiyang Guo, Fei Zhu et al.
Memory-Efficient Visual Autoregressive Modeling with Scale-Aware KV Cache Compression
Kunjun Li, Zigeng Chen, Cheng-Yen Yang et al.
BrainOmni: A Brain Foundation Model for Unified EEG and MEG Signals
Qinfan Xiao, Ziyun Cui, Chi Zhang et al.
DIPO: Dual-State Images Controlled Articulated Object Generation Powered by Diverse Data
Ruiqi Wu, Xinjie wang, Liu.Liu et al.
TTS-VAR: A Test-Time Scaling Framework for Visual Auto-Regressive Generation
Zhekai Chen, Ruihang Chu, Yukang Chen et al.
Breaking AR’s Sampling Bottleneck: Provable Acceleration via Diffusion Language Models
Gen Li, Changxiao Cai
Combining Cost Constrained Runtime Monitors for AI Safety
Tim Hua, James Baskerville, Henri Lemoine et al.
SALMONN-omni: A Standalone Speech LLM without Codec Injection for Full-duplex Conversation
Wenyi Yu, Siyin Wang, Xiaoyu Yang et al.
RepLDM: Reprogramming Pretrained Latent Diffusion Models for High-Quality, High-Efficiency, High-Resolution Image Generation
Boyuan Cao, Jiaxin Ye, Yujie Wei et al.
HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation
Ling Yang, Xinchen Zhang, Ye Tian et al.
Measuring what Matters: Construct Validity in Large Language Model Benchmarks
Andrew M. Bean, Ryan Othniel Kearns, Angelika Romanou et al.
Don't Just Chase “Highlighted Tokens” in MLLMs: Revisiting Visual Holistic Context Retention
Xin Zou, Di Lu, Yizhou Wang et al.
Efficient Multi-modal Large Language Models via Progressive Consistency Distillation
Zichen Wen, Shaobo Wang, Yufa Zhou et al.
Scaling Off-Policy Reinforcement Learning with Batch and Weight Normalization
Daniel Palenicek, Florian Vogt, Joe Watson et al.
Shortcuts and Identifiability in Concept-based Models from a Neuro-Symbolic Lens
Samuele Bortolotti, Emanuele Marconato, Paolo Morettin et al.
ALE-Bench: A Benchmark for Long-Horizon Objective-Driven Algorithm Engineering
Yuki Imajuku, Kohki Horie, Yoichi Iwata et al.
Vanish into Thin Air: Cross-prompt Universal Adversarial Attacks for SAM2
Ziqi Zhou, Yifan Hu, Yufei Song et al.
Aligning Text to Image in Diffusion Models is Easier Than You Think
Jaa-Yeon Lee, ByungHee Cha, Jeongsol Kim et al.
Superposition Yields Robust Neural Scaling
Yizhou Liu, Ziming Liu, Jeff Gore
Auditing Meta-Cognitive Hallucinations in Reasoning Large Language Models
Haolang Lu, Yilian Liu, Jingxin Xu et al.
Uncovering a Universal Abstract Algorithm for Modular Addition in Neural Networks
Gavin McCracken, Gabriela Moisescu-Pareja, Vincent Létourneau et al.
PatientSim: A Persona-Driven Simulator for Realistic Doctor-Patient Interactions
Daeun Kyung, Hyunseung Chung, Seongsu Bae et al.
StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant
Haibo Wang, Bo Feng, Zhengfeng Lai et al.
KVFlow: Efficient Prefix Caching for Accelerating LLM-Based Multi-Agent Workflows
Zaifeng Pan, AJJKUMAR DAHYALAL PATEL, Yipeng Shen et al.
PAROAttention: Pattern-Aware ReOrdering for Efficient Sparse and Quantized Attention in Visual Generation Models
Tianchen Zhao, Ke Hong, Xinhao Yang et al.
Bigger, Regularized, Categorical: High-Capacity Value Functions are Efficient Multi-Task Learners
Michal Nauman, Marek Cygan, Carmelo Sferrazza et al.
AlphaZero Neural Scaling and Zipf's Law: a Tale of Board Games and Power Laws
Oren Neumann, Claudius Gros
Rendering-Aware Reinforcement Learning for Vector Graphics Generation
Juan Rodriguez, Haotian Zhang, Abhay Puri et al.
ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs
Xiyao Wang, Zhengyuan Yang, Chao Feng et al.
LLM-PySC2: Starcraft II learning environment for Large Language Models
Zongyuan Li, Yanan Ni, Runnan Qi et al.
Scent of Knowledge: Optimizing Search-Enhanced Reasoning with Information Foraging
Hongjin Qian, Zheng Liu
Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Generation
Zheng Anlin, Xin Wen, Xuanyang Zhang et al.
AgentTTS: Large Language Model Agent for Test-time Compute-optimal Scaling Strategy in Complex Tasks
Fali Wang, Hui Liu, Zhenwei Dai et al.
SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Constrained Learning
Borong Zhang, Yuhao Zhang, Jiaming Ji et al.
AudSemThinker: Enhancing Audio-Language Models Through Reasoning over Semantics of Sound
Gijs Wijngaard, Elia Formisano, Michele Esposito et al.
LODGE: Level-of-Detail Large-Scale Gaussian Splatting with Efficient Rendering
Jonas Kulhanek, Marie-Julie Rakotosaona, Fabian Manhardt et al.
Multimodal Tabular Reasoning with Privileged Structured Information
Jun-Peng Jiang, Yu Xia, Hai-Long Sun et al.
GUARDIAN: Safeguarding LLM Multi-Agent Collaborations with Temporal Graph Modeling
Jialong Zhou, Lichao Wang, Xiao Yang
Online Experimental Design With Estimation-Regret Trade-off Under Network Interference
Zhiheng Zhang, Zichen Wang
Activation-Informed Merging of Large Language Models
Amin Heyrani Nobari, Kaveh Alimohammadi, Ali ArjomandBigdeli et al.
SongBloom: Coherent Song Generation via Interleaved Autoregressive Sketching and Diffusion Refinement
Chenyu Yang, Shuai Wang, Hangting Chen et al.
MME-VideoOCR: Evaluating OCR-Based Capabilities of Multimodal LLMs in Video Scenarios
Yang Shi, Huanqian Wang, Xie et al.
System-1.5 Reasoning: Traversal in Language and Latent Spaces with Dynamic Shortcuts
Xiaoqiang Wang, Suyuchen Wang, Yun Zhu et al.
Iterative Tool Usage Exploration for Multimodal Agents via Step-wise Preference Tuning
Pengxiang Li, Zhi Gao, Bofei Zhang et al.
OpenGU: A Comprehensive Benchmark for Graph Unlearning
Bowen Fan, Yuming Ai, Xunkai Li et al.
Lost in Latent Space: An Empirical Study of Latent Diffusion Models for Physics Emulation
François Rozet, Ruben Ohana, Michael McCabe et al.
MLZero: A Multi-Agent System for End-to-end Machine Learning Automation
Haoyang Fang, Boran Han, Nick Erickson et al.
Visual Jenga: Discovering Object Dependencies via Counterfactual Inpainting
Anand Bhattad, Konpat Preechakul, Alexei Efros
Root Cause Analysis of Outliers with Missing Structural Knowledge
William Roy Orchard, Nastaran Okati, Sergio Garrido Mejia et al.
Learning normalized image densities via dual score matching
Florentin Guth, Zahra Kadkhodaie, Eero Simoncelli
Tiled Flash Linear Attention: More Efficient Linear RNN and xLSTM Kernels
Maximilian Beck, Korbinian Pöppel, Phillip Lippe et al.
DriveDPO: Policy Learning via Safety DPO For End-to-End Autonomous Driving
Shuyao Shang, Yuntao Chen, Yuqi Wang et al.
IR3D-Bench: Evaluating Vision-Language Model Scene Understanding as Agentic Inverse Rendering
Hengyu Liu, Chenxin Li, Zhengxin Li et al.
FlowMo: Variance-Based Flow Guidance for Coherent Motion in Video Generation
Ariel Shaulov, Itay Hazan, Lior Wolf et al.
Stable Port-Hamiltonian Neural Networks
Fabian J. Roth, Dominik K. Klein, Maximilian Kannapinn et al.
MPS-Prover: Advancing Stepwise Theorem Proving by Multi-Perspective Search and Data Curation
Zhenwen Liang, Linfeng Song, Yang Li et al.
From Judgment to Interference: Early Stopping LLM Harmful Outputs via Streaming Content Monitoring
Yang Li, Qiang Sheng, Yehan Yang et al.
One-Step Diffusion for Detail-Rich and Temporally Consistent Video Super-Resolution
Yujing Sun, Lingchen Sun, Shuaizheng Liu et al.
Non-equilibrium Annealed Adjoint Sampler
Jaemoo Choi, Yongxin Chen, Molei Tao et al.
Jailbreak-AudioBench: In-Depth Evaluation and Analysis of Jailbreak Threats for Large Audio Language Models
Hao Cheng, Erjia Xiao, Jing Shao et al.
Causally Reliable Concept Bottleneck Models
Giovanni De Felice, Arianna Casanova Flores, Francesco De Santis et al.
Overcoming Challenges of Long-Horizon Prediction in Driving World Models
Arian Mousakhan, Sudhanshu Mittal, Silvio Galesso et al.
RoME: Domain-Robust Mixture-of-Experts for MILP Solution Prediction across Domains
Tianle Pu, Zijie Geng, Haoyang Liu et al.
Token Embeddings Violate the Manifold Hypothesis
Michael Robinson, Sourya Dey, Tony Chiang
Why Masking Diffusion Works: Condition on the Jump Schedule for Improved Discrete Diffusion
Alan Amin, Nate Gruver, Andrew Wilson
MimeQA: Towards Socially-Intelligent Nonverbal Foundation Models
Hengzhi Li, Megan Tjandrasuwita, Yi R. (May) Fung et al.
Training-Free Safe Denoisers for Safe Use of Diffusion Models
Mingyu Kim, Dongjun Kim, Amman Yusuf et al.
CPathAgent: An Agent-based Foundation Model for Interpretable High-Resolution Pathology Image Analysis Mimicking Pathologists' Diagnostic Logic
YUXUAN SUN, Yixuan Si, Chenglu Zhu et al.
Towards Physical Understanding in Video Generation: A 3D Point Regularization Approach
Yunuo Chen, Junli Cao, Vidit Goel et al.
HiFlow: Training-free High-Resolution Image Generation with Flow-Aligned Guidance
Jiazi Bu, Pengyang Ling, Yujie Zhou et al.
Scalable Fingerprinting of Large Language Models
Anshul Nasery, Jonathan Hayase, Creston Brooks et al.
REPA Works Until It Doesn’t: Early-Stopped, Holistic Alignment Supercharges Diffusion Training
Ziqiao Wang, Wangbo Zhao, Yuhao Zhou et al.
OSCAR: One-Step Diffusion Codec Across Multiple Bit-rates
Jinpei Guo, Yifei Ji, Zheng Chen et al.
TimE: A Multi-level Benchmark for Temporal Reasoning of LLMs in Real-World Scenarios
Shaohang Wei, Wei Li, Feifan Song et al.
Foundations of Top-$k$ Decoding for Language Models
Georgy Noarov, Soham Mallick, Tao Wang et al.
Same Task, Different Circuits: Disentangling Modality-Specific Mechanisms in VLMs
Yaniv Nikankin, Dana Arad, Yossi Gandelsman et al.
Composite Flow Matching for Reinforcement Learning with Shifted-Dynamics Data
Lingkai Kong, Haichuan Wang, Tonghan Wang et al.
COCONut-PanCap: Joint Panoptic Segmentation and Grounded Captions for Fine-Grained Understanding and Generation
Xueqing Deng, Linjie Yang, Qihang Yu et al.