Most Cited NEURIPS "gradient descent trajectory" Papers
Rewind-to-Delete: Certified Machine Unlearning for Nonconvex Functions
Siqiao Mu, Diego Klabjan
Momentum-SAM: Sharpness Aware Minimization without Computational Overhead
Marlon Becker, Frederick Altrock, Benjamin Risse
VideoUFO: A Million-Scale User-Focused Dataset for Text-to-Video Generation
Wenhao Wang, Yi Yang
Equivariance Everywhere All At Once: A Recipe for Graph Foundation Models
Ben Finkelshtein, Ismail Ilkan Ceylan, Michael Bronstein et al.
First SFT, Second RL, Third UPT: Continual Improving Multi-Modal LLM Reasoning via Unsupervised Post-Training
Lai Wei, Yuting Li, Chen Wang et al.
Emergent Temporal Correspondences from Video Diffusion Transformers
Jisu Nam, Soowon Son, Dahyun Chung et al.
70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float (DFloat11)
Tianyi Zhang, Mohsen Hariri, Shaochen (Henry) Zhong et al.
MDNS: Masked Diffusion Neural Sampler via Stochastic Optimal Control
Yuchen Zhu, Wei Guo, Jaemoo Choi et al.
AdvPrefix: An Objective for Nuanced LLM Jailbreaks
Sicheng Zhu, Brandon Amos, Yuandong Tian et al.
Enhancing Multilingual LLM Pretraining with Model-Based Data Selection
Bettina Messmer, Vinko Sabolčec, Martin Jaggi
Scaling Laws for Optimal Data Mixtures
Mustafa Shukor, Louis Bethune, Dan Busbridge et al.
RealMath: A Continuous Benchmark for Evaluating Language Models on Research-Level Mathematics
Jie Zhang, Cezara Petrui, Kristina Nikolić et al.
Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning of Vision Language Models
Huajie Tan, Yuheng Ji, Xiaoshuai Hao et al.
The ML.ENERGY Benchmark: Toward Automated Inference Energy Measurement and Optimization
Jae-Won Chung, Jeff J. Ma, Ruofan Wu et al.
Bayesian Concept Bottleneck Models with LLM Priors
Jean Feng, Avni Kothari, Lucas Zier et al.
DRIFT: Dynamic Rule-Based Defense with Injection Isolation for Securing LLM Agents
Hao Li, Xiaogeng Liu, CHIU Chun et al.
Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling
Tsung-Han (Patrick) Wu, Heekyung Lee, Jiaxin Ge et al.
$Q\sharp$: Provably Optimal Distributional RL for LLM Post-Training
Jin Zhou, Kaiwen Wang, Jonathan Chang et al.
Gemstones: A Model Suite for Multi-Faceted Scaling Laws
Sean McLeish, John Kirchenbauer, David Miller et al.
The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text
Nikhil Kandpal, Brian Lester, Colin Raffel et al.
CLEVER: A Curated Benchmark for Formally Verified Code Generation
Amitayush Thakur, Jasper Lee, George Tsoukalas et al.
DOTA: Distributional Test-time Adaptation of Vision-Language Models
Zongbo Han, Jialong Yang, Guangyu Wang et al.
MagCache: Fast Video Generation with Magnitude-Aware Cache
Zehong Ma, Longhui Wei, Feng Wang et al.
MM-Agent: LLM as Agents for Real-world Mathematical Modeling Problem
Fan Liu, Zherui Yang, Cancheng Liu et al.
R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing
Tianyu Fu, Yi Ge, Yichen You et al.
Deep Nonlinear Sufficient Dimension Reduction
Yinfeng Chen, Yuling Jiao, Rui Qiu et al.
Markov Persuasion Processes: Learning to Persuade From Scratch
Francesco Bacchiocchi, Francesco Emanuele Stradi, Matteo Castiglioni et al.
GUARDIAN: Safeguarding LLM Multi-Agent Collaborations with Temporal Graph Modeling
Jialong Zhou, Lichao Wang, Xiao Yang
RSafe: Incentivizing proactive reasoning to build robust and adaptive LLM safeguards
Jingnan Zheng, Xiangtian Ji, Yijun Lu et al.
Quality-Driven Curation of Remote Sensing Vision-Language Data via Learned Scoring Models
Dilxat Muhtar, Enzhuo Zhang, Zhenshi Li et al.
Universal Video Temporal Grounding with Generative Multi-modal Large Language Models
Zeqian Li, Shangzhe Di, Zhonghua Zhai et al.
LLM-PySC2: Starcraft II learning environment for Large Language Models
Zongyuan Li, Yanan Ni, Runnan Qi et al.
On Reasoning Strength Planning in Large Reasoning Models
Leheng Sheng, An Zhang, Zijian Wu et al.
Guided Diffusion Sampling on Function Spaces with Applications to PDEs
Jiachen Yao, Abbas Mammadov, Julius Berner et al.
HO-Cap: A Capture System and Dataset for 3D Reconstruction and Pose Tracking of Hand-Object Interaction
Jikai Wang, Qifan Zhang, Yu-Wei Chao et al.
Theoretically Grounded Framework for LLM Watermarking: A Distribution-Adaptive Approach
Haiyun He, Yepeng Liu, Ziqiao Wang et al.
ChartMuseum: Testing Visual Reasoning Capabilities of Large Vision-Language Models
Liyan Tang, Grace Kim, Xinyu Zhao et al.
Repo2Run: Automated Building Executable Environment for Code Repository at Scale
Ruida Hu, Chao Peng, Xinchen Wang et al.
BountyBench: Dollar Impact of AI Agent Attackers and Defenders on Real-World Cybersecurity Systems
Andy Zhang, Joey Ji, Celeste Menders et al.
Learning Pattern-Specific Experts for Time Series Forecasting Under Patch-level Distribution Shift
Yanru Sun, Zongxia Xie, Emadeldeen Eldele et al.
Robot-R1: Reinforcement Learning for Enhanced Embodied Reasoning in Robotics
Dongyoung Kim, Huiwon Jang, Sumin Park et al.
RepLDM: Reprogramming Pretrained Latent Diffusion Models for High-Quality, High-Efficiency, High-Resolution Image Generation
Boyuan Cao, Jiaxin Ye, Yujie Wei et al.
The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability?
Denis Sutter, Julian Minder, Thomas Hofmann et al.
Advancing Expert Specialization for Better MoE
Hongcan Guo, Haolang Lu, Guoshun Nan et al.
Router-R1: Teaching LLMs Multi-Round Routing and Aggregation via Reinforcement Learning
Haozhen Zhang, Tao Feng, Jiaxuan You
Mol-LLaMA: Towards General Understanding of Molecules in Large Molecular Language Model
Dongki Kim, Wonbin Lee, Sung Ju Hwang
PMQ-VE: Progressive Multi-Frame Quantization for Video Enhancement
ZhanFeng Feng, Long Peng, Xin Di et al.
EAP-GP: Mitigating Saturation Effect in Gradient-based Automated Circuit Identification
Lin Zhang, Wenshuo Dong, Zhuoran Zhang et al.
Reasoning as an Adaptive Defense for Safety
Taeyoun Kim, Fahim Tajwar, Aditi Raghunathan et al.
AgentTTS: Large Language Model Agent for Test-time Compute-optimal Scaling Strategy in Complex Tasks
Fali Wang, Hui Liu, Zhenwei Dai et al.
FocalCodec: Low-Bitrate Speech Coding via Focal Modulation Networks
Luca Della Libera, Francesco Paissan, Cem Subakan et al.
Distributive Fairness in Large Language Models: Evaluating Alignment with Human Values
Hadi Hosseini, Samarth Khanna
Shortcuts and Identifiability in Concept-based Models from a Neuro-Symbolic Lens
Samuele Bortolotti, Emanuele Marconato, Paolo Morettin et al.
Diffusion Tree Sampling: Scalable inference-time alignment of diffusion models
Vineet Jain, Kusha Sareen, Mohammad Pedramfar et al.
Solving Inequality Proofs with Large Language Models
Jiayi Sheng, Luna Lyu, Jikai Jin et al.
EmergentTTS-Eval: Evaluating TTS Models on Complex Prosodic, Expressiveness, and Linguistic Challenges Using Model-as-a-Judge
Ruskin Raj Manku, Yuzhi Tang, Xingjian Shi et al.
JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent
Yunlong Lin, Zixu Lin, Kunjie Lin et al.
Enhancing Time Series Forecasting through Selective Representation Spaces: A Patch Perspective
Xingjian Wu, Xiangfei Qiu, Hanyin Cheng et al.
Aux-Think: Exploring Reasoning Strategies for Data-Efficient Vision-Language Navigation
Shuo Wang, Yongcai Wang, Wanting Li et al.
RBench-V: A Primary Assessment for Visual Reasoning Models with Multimodal Outputs
Meng-Hao Guo, Xuanyu Chu, Qianrui Yang et al.
OLinear: A Linear Model for Time Series Forecasting in Orthogonally Transformed Domain
Wenzhen Yue, Yong Liu, Hao Wang et al.
EndoBench: A Comprehensive Evaluation of Multi-Modal Large Language Models for Endoscopy Analysis
Shengyuan Liu, Boyun Zheng, Wenting Chen et al.
Diffusion Transformers as Open-World Spatiotemporal Foundation Models
Yuan Yuan, Chonghua Han, Jingtao Ding et al.
DOVE: Efficient One-Step Diffusion Model for Real-World Video Super-Resolution
Zheng Chen, Zichen Zou, Kewei Zhang et al.
AlphaZero Neural Scaling and Zipf's Law: a Tale of Board Games and Power Laws
Oren Neumann, Claudius Gros
CAD-Coder: Text-to-CAD Generation with Chain-of-Thought and Geometric Reward
Yandong Guan, Xilin Wang, XiMing Xing et al.
StateSpaceDiffuser: Bringing Long Context to Diffusion World Models
Nedko Savov, Naser Kazemi, Deheng Zhang et al.
SWE-SQL: Illuminating LLM Pathways to Solve User SQL Issues in Real-World Applications
Jinyang Li, Xiaolong Li, Ge Qu et al.
Incomplete Multi-view Deep Clustering with Data Imputation and Alignment
Jiyuan Liu, Xinwang Liu, Xinhang Wan et al.
Temporal Chain of Thought: Long-Video Understanding by Thinking in Frames
Anurag Arnab, Ahmet Iscen, Mathilde Caron et al.
Towards Physical Understanding in Video Generation: A 3D Point Regularization Approach
Yunuo Chen, Junli Cao, Vidit Goel et al.
Can DPO Learn Diverse Human Values? A Theoretical Scaling Law
Shawn Im, Sharon Li
Safe RLHF-V: Safe Reinforcement Learning from Multi-modal Human Feedback
Jiaming Ji, Xinyu Chen, Rui Pan et al.
INST-IT: Boosting Instance Understanding via Explicit Visual Prompt Instruction Tuning
Wujian Peng, Lingchen Meng, Yitong Chen et al.
Scalable Fingerprinting of Large Language Models
Anshul Nasery, Jonathan Hayase, Creston Brooks et al.
PanTS: The Pancreatic Tumor Segmentation Dataset
Wenxuan Li, Xinze Zhou, Qi Chen et al.
VideoMAR: Autoregressive Video Generation with Continuous Tokens
Hu Yu, Biao Gong, Hangjie Yuan et al.
Do-PFN: In-Context Learning for Causal Effect Estimation
Jake Robertson, Arik Reuter, Siyuan Guo et al.
MergeBench: A Benchmark for Merging Domain-Specialized LLMs
Yifei He, Siqi Zeng, Yuzheng Hu et al.
Continual Multimodal Contrastive Learning
Xiaohao Liu, Xiaobo Xia, See-Kiong Ng et al.
Learning World Models for Interactive Video Generation
Taiye Chen, Xun Hu, Zihan Ding et al.
Physics of Language Models: Part 4.1, Architecture Design and the Magic of Canon Layers
Zeyuan Allen-Zhu
GaussianFusion: Gaussian-Based Multi-Sensor Fusion for End-to-End Autonomous Driving
Shuai Liu, Quanmin Liang, Zefeng Li et al.
PhysX-3D: Physical-Grounded 3D Asset Generation
Ziang Cao, Zhaoxi Chen, Liang Pan et al.
MLLM-For3D: Adapting Multimodal Large Language Model for 3D Reasoning Segmentation
Jiaxin Huang, Runnan Chen, Ziwen Li et al.
Web-Shepherd: Advancing PRMs for Reinforcing Web Agents
Hyungjoo Chae, Seonghwan Kim, Junhee Cho et al.
Sample complexity of data-driven tuning of model hyperparameters in neural networks with structured parameter-dependent dual function
Maria-Florina Balcan, Anh Nguyen, Dravyansh Sharma
Efficient Multi-modal Large Language Models via Progressive Consistency Distillation
Zichen Wen, Shaobo Wang, Yufa Zhou et al.
Scaling Off-Policy Reinforcement Learning with Batch and Weight Normalization
Daniel Palenicek, Florian Vogt, Joe Watson et al.
GC4NC: A Benchmark Framework for Graph Condensation on Node Classification with New Insights
Shengbo Gong, Juntong Ni, Noveen Sachdeva et al.
Escaping Collapse: The Strength of Weak Data for Large Language Model Training
Kareem Amin, Sara Babakniya, Alex Bie et al.
How do Transformers Learn Implicit Reasoning?
Jiaran Ye, Zijun Yao, Zhidian Huang et al.
From Experts to a Generalist: Toward General Whole-Body Control for Humanoid Robots
Yuxuan Wang, Ming Yang, Gang Ding et al.
Genesis: Multimodal Driving Scene Generation with Spatio-Temporal and Cross-Modal Consistency
Xiangyu Guo, Zhanqian Wu, Kaixin Xiong et al.
VLA-OS: Structuring and Dissecting Planning Representations and Paradigms in Vision-Language-Action Models
Chongkai Gao, Zixuan Liu, Zhenghao Chi et al.
EgoBlind: Towards Egocentric Visual Assistance for the Blind
Junbin Xiao, Nanxin Huang, Hao Qiu et al.
VAGEN: Reinforcing World Model Reasoning for Multi-Turn VLM Agents
Kangrui Wang, Pingyue Zhang, Zihan Wang et al.
Flow-Based Policy for Online Reinforcement Learning
Lei Lv, Yunfei Li, Yu Luo et al.
CPathAgent: An Agent-based Foundation Model for Interpretable High-Resolution Pathology Image Analysis Mimicking Pathologists' Diagnostic Logic
Yuxuan Sun, Yixuan Si, Chenglu Zhu et al.
Face-Human-Bench: A Comprehensive Benchmark of Face and Human Understanding for Multi-modal Assistants
Lixiong Qin, Shilong Ou, Miaoxuan Zhang et al.
SAM-R1: Leveraging SAM for Reward Feedback in Multimodal Segmentation via Reinforcement Learning
Jiaqi Huang, Zunnan Xu, Jun Zhou et al.
From Specificity to Generality: Revisiting Generalizable Artifacts in Detecting Face Deepfakes
Long Ma, Zhiyuan Yan, Jin Xu et al.
HELM: Hyperbolic Large Language Models via Mixture-of-Curvature Experts
Neil He, Rishabh Anand, Hiren Madhu et al.
KVFlow: Efficient Prefix Caching for Accelerating LLM-Based Multi-Agent Workflows
Zaifeng Pan, Ajjkumar Dahyalal Patel, Yipeng Shen et al.
OpenAD: Open-World Autonomous Driving Benchmark for 3D Object Detection
Zhongyu Xia, Jishuo Li, Zhiwei Lin et al.
Model Provenance Testing for Large Language Models
Ivica Nikolic, Teodora Baluta, Prateek Saxena
Visual Jenga: Discovering Object Dependencies via Counterfactual Inpainting
Anand Bhattad, Konpat Preechakul, Alexei Efros
Implicit Bias of Spectral Descent and Muon on Multiclass Separable Data
Chen Fan, Mark Schmidt, Christos Thrampoulidis
DIPO: Dual-State Images Controlled Articulated Object Generation Powered by Diverse Data
Ruiqi Wu, Xinjie Wang, Liu Liu et al.
Two Experts Are All You Need for Steering Thinking: Reinforcing Cognitive Effort in MoE Reasoning Models Without Additional Training
Mengru Wang, Xingyu Chen, Yue Wang et al.
Direct Alignment with Heterogeneous Preferences
Ali Shirali, Arash Nasr-Esfahany, Abdullah Alomar et al.
Why Masking Diffusion Works: Condition on the Jump Schedule for Improved Discrete Diffusion
Alan Amin, Nate Gruver, Andrew Wilson
COCONut-PanCap: Joint Panoptic Segmentation and Grounded Captions for Fine-Grained Understanding and Generation
Xueqing Deng, Linjie Yang, Qihang Yu et al.
LLMs Encode Harmfulness and Refusal Separately
Jiachen Zhao, Jing Huang, Zhengxuan Wu et al.
Energy Matching: Unifying Flow Matching and Energy-Based Models for Generative Modeling
Michal Balcerak, Tamaz Amiranashvili, Antonio Terpin et al.
UAV-Flow Colosseo: A Real-World Benchmark for Flying-on-a-Word UAV Imitation Learning
Xiangyu Wang, Donglin Yang, Yue Liao et al.
Information-Driven Design of Imaging Systems
Henry Pinkard, Leyla Kabuli, Eric Markley et al.
Learning normalized image densities via dual score matching
Florentin Guth, Zahra Kadkhodaie, Eero Simoncelli
FPSAttention: Training-Aware FP8 and Sparsity Co-Design for Fast Video Diffusion
Akide Liu, Zeyu Zhang, Zhexin Li et al.
SceneWeaver: All-in-One 3D Scene Synthesis with an Extensible and Self-Reflective Agent
Yandan Yang, Baoxiong Jia, Shujie Zhang et al.
VIKI-R: Coordinating Embodied Multi-Agent Cooperation via Reinforcement Learning
Li Kang, Xiufeng Song, Heng Zhou et al.
Measuring what Matters: Construct Validity in Large Language Model Benchmarks
Andrew M. Bean, Ryan Othniel Kearns, Angelika Romanou et al.
FAVOR-Bench: A Comprehensive Benchmark for Fine-Grained Video Motion Understanding
Chongjun Tu, Lin Zhang, Pengtao Chen et al.
Learning Robust Spectral Dynamics for Temporal Domain Generalization
En Yu, Jie Lu, Xiaoyu Yang et al.
Foundations of Top-$k$ Decoding for Language Models
Georgy Noarov, Soham Mallick, Tao Wang et al.
REPA Works Until It Doesn’t: Early-Stopped, Holistic Alignment Supercharges Diffusion Training
Ziqiao Wang, Wangbo Zhao, Yuhao Zhou et al.
Scaling Embedding Layers in Language Models
Da Yu, Edith Cohen, Badih Ghazi et al.
APOLLO: Automated LLM and Lean Collaboration for Advanced Formal Reasoning
Azim Ospanov, Farzan Farnia, Roozbeh Yousefzadeh
Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Generation
Zheng Anlin, Xin Wen, Xuanyang Zhang et al.
Tiled Flash Linear Attention: More Efficient Linear RNN and xLSTM Kernels
Maximilian Beck, Korbinian Pöppel, Phillip Lippe et al.
DisCO: Reinforcing Large Reasoning Models with Discriminative Constrained Optimization
Gang Li, Ming Lin, Tomer Galanti et al.
H3D-DGS: Exploring Heterogeneous 3D Motion Representation for Deformable 3D Gaussian Splatting
Bing He, Yunuo Chen, Guo Lu et al.
Combining Cost Constrained Runtime Monitors for AI Safety
Tim Hua, James Baskerville, Henri Lemoine et al.
Neither Valid nor Reliable? Investigating the Use of LLMs as Judges
Khaoula Chehbouni, Mohammed Haddou, Jackie CK Cheung et al.
PurpCode: Reasoning for Safer Code Generation
Jiawei Liu, Nirav Diwan, Zhe Wang et al.
Beyond Verifiable Rewards: Scaling Reinforcement Learning in Language Models to Unverifiable Data
Yunhao Tang, Sid Wang, Lovish Madaan et al.
What Makes Math Problems Hard for Reinforcement Learning: A Case Study
Ali Shehper, Anibal Medina-Mardones, Lucas Fagan et al.
MimeQA: Towards Socially-Intelligent Nonverbal Foundation Models
Hengzhi Li, Megan Tjandrasuwita, Yi R. (May) Fung et al.
Generative Pre-trained Autoregressive Diffusion Transformer
Yuan Zhang, Jiacheng Jiang, Guoqing Ma et al.
Latent Chain-of-Thought for Visual Reasoning
Guohao Sun, Hang Hua, Jian Wang et al.
From Judgment to Interference: Early Stopping LLM Harmful Outputs via Streaming Content Monitoring
Yang Li, Qiang Sheng, Yehan Yang et al.
Vision-centric Token Compression in Large Language Model
Ling Xing, Alex Jinpeng Wang, Rui Yan et al.
Among Us: A Sandbox for Measuring and Detecting Agentic Deception
Satvik Golechha, Adrià Garriga-Alonso
Geometry Aware Operator Transformer as an efficient and accurate neural surrogate for PDEs on arbitrary domains
Shizheng Wen, Arsh Kumbhat, Levi Lingsch et al.
Neighborhood Self-Dissimilarity Attention for Medical Image Segmentation
Junren Chen, Rui Chen, Wei Wang et al.
JAFAR: Jack up Any Feature at Any Resolution
Paul Couairon, Loïck Chambon, Louis Serrano et al.
Rethinking Verification for LLM Code Generation: From Generation to Testing
Zihan Ma, Taolin Zhang, Maosong Cao et al.
LLM Strategic Reasoning: Agentic Study through Behavioral Game Theory
Jingru Jia, Zehua Yuan, Junhao Pan et al.
Don't Just Chase “Highlighted Tokens” in MLLMs: Revisiting Visual Holistic Context Retention
Xin Zou, Di Lu, Yizhou Wang et al.
Generative RLHF-V: Learning Principles from Multi-modal Human Preference
Jiayi Zhou, Jiaming Ji, Boyuan Chen et al.
A Generalist Intracortical Motor Decoder
Joel Ye, Fabio Rizzoglio, Xuan Ma et al.
What Do Latent Action Models Actually Learn?
Chuheng Zhang, Tim Pearce, Pushi Zhang et al.
Vanish into Thin Air: Cross-prompt Universal Adversarial Attacks for SAM2
Ziqi Zhou, Yifan Hu, Yufei Song et al.
Privacy amplification by random allocation
Moshe Shenfeld, Vitaly Feldman
Finite-Sample Analysis of Policy Evaluation for Robust Average Reward Reinforcement Learning
Yang Xu, Washim Mondal, Vaneet Aggarwal
FlexSelect: Flexible Token Selection for Efficient Long Video Understanding
Yunzhu Zhang, Yu Lu, Tianyi Wang et al.
GSRF: Complex-Valued 3D Gaussian Splatting for Efficient Radio-Frequency Data Synthesis
Kang Yang, Gaofeng Dong, Sijie Ji et al.
Training-Free Constrained Generation With Stable Diffusion Models
Stefano Zampini, Jacob K Christopher, Luca Oneto et al.
FlowMo: Variance-Based Flow Guidance for Coherent Motion in Video Generation
Ariel Shaulov, Itay Hazan, Lior Wolf et al.
SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Constrained Learning
Borong Zhang, Yuhao Zhang, Jiaming Ji et al.
Video-SafetyBench: A Benchmark for Safety Evaluation of Video LVLMs
Xuannan Liu, Zekun Li, Zheqi He et al.
Online Experimental Design With Estimation-Regret Trade-off Under Network Interference
Zhiheng Zhang, Zichen Wang
AdaVideoRAG: Omni-Contextual Adaptive Retrieval-Augmented Efficient Long Video Understanding
Xue Zhucun, Jiangning Zhang, Xie Xurong et al.
Open CaptchaWorld: A Comprehensive Web-based Platform for Testing and Benchmarking Multimodal LLM Agents
Yaxin Luo, Zhaoyi Li, Jiacheng Liu et al.
EchoShot: Multi-Shot Portrait Video Generation
Jiahao Wang, Hualian Sheng, Sijia Cai et al.
Solving Inverse Problems with FLAIR
Julius Erbach, Dominik Narnhofer, Andreas Dombos et al.
MLZero: A Multi-Agent System for End-to-end Machine Learning Automation
Haoyang Fang, Boran Han, Nick Erickson et al.
Amortized Sampling with Transferable Normalizing Flows
Charlie Tan, Majdi Hassan, Leon Klein et al.
SnapMoGen: Human Motion Generation from Expressive Texts
Chuan Guo, Inwoo Hwang, Jian Wang et al.
LangSplatV2: High-dimensional 3D Language Gaussian Splatting with 450+ FPS
Wanhua Li, Yujie Zhao, Minghan Qin et al.
Depth-Bounds for Neural Networks via the Braid Arrangement
Moritz Grillo, Christoph Hertrich, Georg Loho
Lost in Latent Space: An Empirical Study of Latent Diffusion Models for Physics Emulation
François Rozet, Ruben Ohana, Michael McCabe et al.
Enhancing 3D Reconstruction for Dynamic Scenes
Jisang Han, Honggyu An, Jaewoo Jung et al.
CoreGuard: Safeguarding Foundational Capabilities of LLMs Against Model Stealing in Edge Deployment
Qinfeng Li, Tianyue Luo, Xuhong Zhang et al.
Spatial Understanding from Videos: Structured Prompts Meet Simulation Data
Haoyu Zhang, Meng Liu, Zaijing Li et al.
Value-Guided Search for Efficient Chain-of-Thought Reasoning
Kaiwen Wang, Jin Zhou, Jonathan Chang et al.
Stable Port-Hamiltonian Neural Networks
Fabian J. Roth, Dominik K. Klein, Maximilian Kannapinn et al.
MLE-STAR: Machine Learning Engineering Agent via Search and Targeted Refinement
Jaehyun Nam, Jinsung Yoon, Jiefeng Chen et al.
Activation-Informed Merging of Large Language Models
Amin Heyrani Nobari, Kaveh Alimohammadi, Ali ArjomandBigdeli et al.
Mesh-RFT: Enhancing Mesh Generation via Fine-grained Reinforcement Fine-Tuning
Jian Liu, Jing Xu, Song Guo et al.
Agentic Plan Caching: Test-Time Memory for Fast and Cost-Efficient LLM Agents
Qizheng Zhang, Michael Wornow, Kunle Olukotun
Locality in Image Diffusion Models Emerges from Data Statistics
Artem Lukoianov, Chenyang Yuan, Justin Solomon et al.
Kinetics: Rethinking Test-Time Scaling Law
Ranajoy Sadhukhan, Zhuoming Chen, Haizhong Zheng et al.
Object-centric binding in Contrastive Language-Image Pretraining
Rim Assouel, Pietro Astolfi, Florian Bordes et al.
Emergence and Evolution of Interpretable Concepts in Diffusion Models
Berk Tinaz, Zalan Fabian, Mahdi Soltanolkotabi
OSCAR: One-Step Diffusion Codec Across Multiple Bit-rates
Jinpei Guo, Yifei Ji, Zheng Chen et al.
Sculpting Features from Noise: Reward-Guided Hierarchical Diffusion for Task-Optimal Feature Transformation
Nanxu Gong, Zijun Li, Sixun Dong et al.
Variational Regularized Unbalanced Optimal Transport: Single Network, Least Action
Yuhao Sun, Zhenyi Zhang, Zihan Wang et al.
FlashMD: long-stride, universal prediction of molecular dynamics
Filippo Bigi, Sanggyu Chong, Agustinus Kristiadi et al.
Extrapolation by Association: Length Generalization Transfer In Transformers
Ziyang Cai, Nayoung Lee, Avi Schwarzschild et al.
Scaling Laws for Gradient Descent and Sign Descent for Linear Bigram Models under Zipf’s Law
Frederik Kunstner, Francis Bach
Hyperbolic Dataset Distillation
Wenyuan Li, Guang Li, Keisuke Maeda et al.
ConTextTab: A Semantics-Aware Tabular In-Context Learner
Marco Spinaci, Marek Polewczyk, Maximilian Schambach et al.
DynamicVL: Benchmarking Multimodal Large Language Models for Dynamic City Understanding
Weihao Xuan, Junjue Wang, Heli Qi et al.
RobustMerge: Parameter-Efficient Model Merging for MLLMs with Direction Robustness
Fanhu Zeng, Haiyang Guo, Fei Zhu et al.
ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models
Hongbo Liu, Jingwen He, Yi Jin et al.
Mitra: Mixed Synthetic Priors for Enhancing Tabular Foundation Models
Xiyuan Zhang, Danielle Maddix Robinson, Junming Yin et al.
GraphLand: Evaluating Graph Machine Learning Models on Diverse Industrial Data
Gleb Bazhenov, Oleg Platonov, Liudmila Prokhorenkova
Generalized Gradient Norm Clipping & Non-Euclidean $(L_0,L_1)$-Smoothness
Thomas Pethick, Wanyun Xie, Mete Erdogan et al.
DriveDPO: Policy Learning via Safety DPO For End-to-End Autonomous Driving
Shuyao Shang, Yuntao Chen, Yuqi Wang et al.