Most Cited NeurIPS "video-level data synthesis" Papers
Thought Communication in Multiagent Collaboration
Yujia Zheng, Zhuokai Zhao, Zijian Li et al.
Constant Bit-size Transformers Are Turing Complete
Qian Li, Yuyi Wang
Pareto-Optimal Energy Alignment for Designing Nature-Like Antibodies
Yibo Wen, Chenwei Xu, Jerry Yao-Chieh Hu et al.
GenPO: Generative Diffusion Models Meet On-Policy Reinforcement Learning
Shutong Ding, Ke Hu, Shan Zhong et al.
Distributionally Robust Learning for Multi-source Unsupervised Domain Adaptation
Zhenyu Wang, Peter Bühlmann, Zijian Guo
SEEA-R1: Tree-Structured Reinforcement Fine-Tuning for Self-Evolving Embodied Agents
Wanxin Tian, Shijie Zhang, Kevin Zhang et al.
GoRA: Gradient-driven Adaptive Low Rank Adaptation
Haonan He, Peng Ye, Yuchen Ren et al.
LASeR: Learning to Adaptively Select Reward Models with Multi-Arm Bandits
Duy Nguyen, Archiki Prasad, Elias Stengel-Eskin et al.
Understanding Representation Dynamics of Diffusion Models via Low-Dimensional Modeling
Xiao Li, Zekai Zhang, Xiang Li et al.
Mitigating Hallucination Through Theory-Consistent Symmetric Multimodal Preference Optimization
Wenqi Liu, Xuemeng Song, Jiaxi Li et al.
CAT: Content-Adaptive Image Tokenization
Junhong Shen, Kushal Tirumala, Michihiro Yasunaga et al.
Unifying Re-Identification, Attribute Inference, and Data Reconstruction Risks in Differential Privacy
Bogdan Kulynych, Juan Gomez, Georgios Kaissis et al.
MeshCoder: LLM-Powered Structured Mesh Code Generation from Point Clouds
Bingquan Dai, Luo Li, Qihong Tang et al.
Towards foundational LiDAR world models with efficient latent flow matching
Tianran Liu, Shengwen Zhao, Nicholas Rhinehart
ComfyMind: Toward General-Purpose Generation via Tree-Based Planning and Reactive Feedback
Litao Guo, Xinli Xu, Luozhou Wang et al.
Towards Doctor-Like Reasoning: Medical RAG Fusing Knowledge with Patient Analogy through Textual Gradients
Yuxing Lu, Gecheng Fu, Wei Wu et al.
Token Perturbation Guidance for Diffusion Models
Javad Rajabi, Soroush Mehraban, Seyedmorteza Sadat et al.
Short-length Adversarial Training Helps LLMs Defend Long-length Jailbreak Attacks: Theoretical and Empirical Evidence
Shaopeng Fu, Liang Ding, Jingfeng Zhang et al.
Video Perception Models for 3D Scene Synthesis
Rui Huang, Guangyao Zhai, Zuria Bauer et al.
EgoBridge: Domain Adaptation for Generalizable Imitation from Egocentric Human Data
Ryan Punamiya, Dhruv Patel, Patcharapong Aphiwetsa et al.
FAIR Universe HiggsML Uncertainty Dataset and Competition
Wahid Bhimji, Ragansu Chakkappai, Po-Wen Chang et al.
Encoder-Decoder Diffusion Language Models for Efficient Training and Inference
Marianne Arriola, Yair Schiff, Hao Phung et al.
Secure and Confidential Certificates of Online Fairness
Olive Franzese, Ali Shahin Shamsabadi, Carter Luck et al.
From Replication to Redesign: Exploring Pairwise Comparisons for LLM-Based Peer Review
Yaohui Zhang, Haijing Zhang, Wenlong Ji et al.
1000+ FPS 4D Gaussian Splatting for Dynamic Scene Rendering
Yuheng Yuan, Qiuhong Shen, Xingyi Yang et al.
MetaMind: Modeling Human Social Thoughts with Metacognitive Multi-Agent Systems
Xuanming Zhang, Yuxuan Chen, Samuel (Min-Hsuan) Yeh et al.
RIGNO: A Graph-based Framework For Robust And Accurate Operator Learning For PDEs On Arbitrary Domains
Sepehr Mousavi, Shizheng Wen, Levi Lingsch et al.
Towards Unified and Lossless Latent Space for 3D Molecular Latent Diffusion Modeling
Yanchen Luo, Zhiyuan Liu, Yi Zhao et al.
$\Psi$-Sampler: Initial Particle Sampling for SMC-Based Inference-Time Reward Alignment in Score Models
Taehoon Yoon, Yunhong Min, Kyeongmin Yeo et al.
TRACE: Grounding Time Series in Context for Multimodal Embedding and Retrieval
Jialin Chen, Ziyu Zhao, Gaukhar Nurbek et al.
Fast Last-Iterate Convergence of SGD in the Smooth Interpolation Regime
Amit Attia, Matan Schliserman, Uri Sherman et al.
Structured Reinforcement Learning for Combinatorial Decision-Making
Heiko Hoppe, Léo Baty, Louis Bouvier et al.
Scaling Offline RL via Efficient and Expressive Shortcut Models
Nicolas Espinosa-Dice, Yiyi Zhang, Yiding Chen et al.
SceneSplat++: A Large Dataset and Comprehensive Benchmark for Language Gaussian Splatting
Mengjiao Ma, Qi Ma, Yue Li et al.
Attractive Metadata Attack: Inducing LLM Agents to Invoke Malicious Tools
Kanghua Mo, Li Hu, Yucheng Long et al.
NavBench: Probing Multimodal Large Language Models for Embodied Navigation
Yanyuan Qiao, Haodong Hong, Wenqi Lyu et al.
Absorb and Converge: Provable Convergence Guarantee for Absorbing Discrete Diffusion Models
Yuchen Liang, Renxiang Huang, Lifeng Lai et al.
Optimal Spectral Transitions in High-Dimensional Multi-Index Models
Leonardo Defilippis, Yatin Dandi, Pierre Mergny et al.
When Semantics Mislead Vision: Mitigating Large Multimodal Models Hallucinations in Scene Text Spotting and Understanding
Yan Shu, Hangui Lin, Yexin Liu et al.
Curly Flow Matching for Learning Non-gradient Field Dynamics
Katarina Petrović, Lazar Atanackovic, Viggo Moro et al.
On the Loss of Context Awareness in General Instruction Fine-tuning
Yihan Wang, Andrew Bai, Nanyun Peng et al.
Test3R: Learning to Reconstruct 3D at Test Time
Yuheng Yuan, Qiuhong Shen, Shizun Wang et al.
Decompile-Bench: Million-Scale Binary-Source Function Pairs for Real-World Binary Decompilation
Hanzhuo Tan, Xiaolong Tian, Hanrui Qi et al.
Dynamical Low-Rank Compression of Neural Networks with Robustness under Adversarial Attacks
Steffen Schotthöfer, Lexie Yang, Stefan Schnake
Deep Continuous-Time State-Space Models for Marked Event Sequences
Yuxin Chang, Alex Boyd, Cao (Danica) Xiao et al.
Mask Image Watermarking
Runyi Hu, Jie Zhang, Shiqian Zhao et al.
Týr-the-Pruner: Structural Pruning LLMs via Global Sparsity Distribution Optimization
Guanchen Li, Yixing Xu, Zeping Li et al.
KLASS: KL-Guided Fast Inference in Masked Diffusion Models
Seo Hyun Kim, Sunwoo Hong, Hojung Jung et al.
Matryoshka Pilot: Learning to Drive Black-Box LLMs with LLMs
ChangHao Li, Yuchen Zhuang, Rushi Qiang et al.
Vgent: Graph-based Retrieval-Reasoning-Augmented Generation For Long Video Understanding
Xiaoqian Shen, Wenxuan Zhang, Jun Chen et al.
On scalable and efficient training of diffusion samplers
Minkyu Kim, Kiyoung Seong, Dongyeop Woo et al.
T2V-OptJail: Discrete Prompt Optimization for Text-to-Video Jailbreak Attacks
Jiayang Liu, Siyuan Liang, Shiqian Zhao et al.
StreamForest: Efficient Online Video Understanding with Persistent Event Memory
Xiangyu Zeng, Kefan Qiu, Qingyu Zhang et al.
Doubly Robust Alignment for Large Language Models
Erhan Xu, Kai Ye, Hongyi Zhou et al.
Think-RM: Enabling Long-Horizon Reasoning in Generative Reward Models
Ilgee Hong, Changlong Yu, Liang Qiu et al.
Critical Batch Size Revisited: A Simple Empirical Approach to Large-Batch Language Model Training
Will Merrill, Shane Arora, Dirk Groeneveld et al.
SQLens: An End-to-End Framework for Error Detection and Correction in Text-to-SQL
Yue Gong, Chuan Lei, Xiao Qin et al.
Temporal Representation Alignment: Successor Features Enable Emergent Compositionality in Robot Instruction Following
Vivek Myers, Bill Zheng, Anca Dragan et al.
EF-3DGS: Event-Aided Free-Trajectory 3D Gaussian Splatting
Bohao Liao, Wei Zhai, Zengyu Wan et al.
EOC-Bench: Can MLLMs Identify, Recall, and Forecast Objects in an Egocentric World?
Yuqian Yuan, Ronghao Dang, Long Li et al.
DynamicRAG: Leveraging Outputs of Large Language Model as Feedback for Dynamic Reranking in Retrieval-Augmented Generation
Jiashuo Sun, Xianrui Zhong, Sizhe Zhou et al.
Fused View-Time Attention and Feedforward Reconstruction for 4D Scene Generation
Chaoyang Wang, Ashkan Mirzaei, Vidit Goel et al.
Ineq-Comp: Benchmarking Human-Intuitive Compositional Reasoning in Automated Theorem Proving of Inequalities
Haoyu Zhao, Yihan Geng, Shange Tang et al.
ROSE: Remove Objects with Side Effects in Videos
Chenxuan Miao, Yutong Feng, Jianshu Zeng et al.
Self-Generated In-Context Examples Improve LLM Agents for Sequential Decision-Making Tasks
Vishnu Sarukkai, Zhiqiang Xie, Kayvon Fatahalian
Towards Better Dental AI: A Multimodal Benchmark and Instruction Dataset for Panoramic X-ray Analysis
Jing Hao, Yuxuan Fan, Yanpeng Sun et al.
Seg4Diff: Unveiling Open-Vocabulary Semantic Segmentation in Text-to-Image Diffusion Transformers
Chaehyun Kim, Heeseong Shin, Eunbeen Hong et al.
Asymptotics of SGD in Sequence-Single Index Models and Single-Layer Attention Networks
Luca Arnaboldi, Bruno Loureiro, Ludovic Stephan et al.
LoRAShop: Training-Free Multi-Concept Image Generation and Editing with Rectified Flow Transformers
Yusuf Dalva, Hidir Yesiltepe, Pinar Yanardag
Demystifying Language Model Forgetting with Low-rank Example Associations
Xisen Jin, Xiang Ren
Geometric Learning with Positively Decomposable Kernels
Nathael Da Costa, Cyrus Mostajeran, Juan-Pablo Ortega et al.
Time Series Generation Under Data Scarcity: A Unified Generative Modeling Approach
Tal Gonen, Itai Pemper, Ilan Naiman et al.
Improved Representation Steering for Language Models
Zhengxuan Wu, Qinan Yu, Aryaman Arora et al.
PrefixKV: Adaptive Prefix KV Cache is What Vision Instruction-Following Models Need for Efficient Generation
Ao Wang, Hui Chen, Jianchao Tan et al.
FlyLoRA: Boosting Task Decoupling and Parameter Efficiency via Implicit Rank-Wise Mixture-of-Experts
Heming Zou, Yunliang Zang, Wutong Xu et al.
Time Travel is Cheating: Going Live with DeepFund for Real-Time Fund Investment Benchmarking
Changlun Li, Yao Shi, Chen Wang et al.
A Tale of Two Symmetries: Exploring the Loss Landscape of Equivariant Models
YuQing Xie, Tess Smidt
Behavior Injection: Preparing Language Models for Reinforcement Learning
Zhepeng Cen, Yihang Yao, William Han et al.
ZPressor: Bottleneck-Aware Compression for Scalable Feed-Forward 3DGS
Weijie Wang, Donny Y. Chen, Zeyu Zhang et al.
High-Dimensional Calibration from Swap Regret
Maxwell Fishelson, Noah Golowich, Mehryar Mohri et al.
Probabilistic Stability Guarantees for Feature Attributions
Helen Jin, Anton Xue, Weiqiu You et al.
Reverse Diffusion Sequential Monte Carlo Samplers
Luhuan Wu, Yi Han, Christian Andersson Naesseth et al.
Scaling Speculative Decoding with Lookahead Reasoning
Yichao Fu, Rui Ge, Zelei Shao et al.
Beyond Scalars: Concept-Based Alignment Analysis in Vision Transformers
Johanna Vielhaben, Dilyara Bareeva, Jim Berend et al.
AgentBreeder: Mitigating the AI Safety Risks of Multi-Agent Scaffolds via Self-Improvement
J Rosser, Jakob Foerster
Parameter Efficient Fine-tuning via Explained Variance Adaptation
Fabian Paischer, Lukas Hauzenberger, Thomas Schmied et al.
State Entropy Regularization for Robust Reinforcement Learning
Yonatan Ashlag, Uri Koren, Mirco Mutti et al.
IGD: Token Decisiveness Modeling via Information Gain in LLMs for Personalized Recommendation
Zijie Lin, Yang Zhang, Xiaoyan Zhao et al.
Seeing the Arrow of Time in Large Multimodal Models
Zihui (Sherry) Xue, Romy Luo, Kristen Grauman
Execution Guided Line-by-Line Code Generation
Boaz Lavon, Shahar Katz, Lior Wolf
Truth over Tricks: Measuring and Mitigating Shortcut Learning in Misinformation Detection
Herun Wan, Jiaying Wu, Minnan Luo et al.
The Hawthorne Effect in Reasoning Models: Evaluating and Steering Test Awareness
Sahar Abdelnabi, Ahmed Salem
Thinking in Character: Advancing Role-Playing Agents with Role-Aware Reasoning
Yihong Tang, Kehai Chen, Muyun Yang et al.
Normalized Attention Guidance: Universal Negative Guidance for Diffusion Models
Dar-Yen Chen, Hmrishav Bandyopadhyay, Kai Zou et al.
Large Language Models Think Too Fast To Explore Effectively
Lan Pan, Hanbo Xie, Robert Wilson
DexGarmentLab: Dexterous Garment Manipulation Environment with Generalizable Policy
Yuran Wang, Ruihai Wu, Yue Chen et al.
Overcoming Sparsity Artifacts in Crosscoders to Interpret Chat-Tuning
Julian Minder, Clément Dumas, Caden Juang et al.
One-Step Offline Distillation of Diffusion-based Models via Koopman Modeling
Nimrod Berman, Ilan Naiman, Moshe Eliasof et al.
FlexOLMo: Open Language Models for Flexible Data Use
Weijia Shi, Akshita Bhagia, Kevin Farhat et al.
Uniform Generalization Bounds on Data-Dependent Hypothesis Sets via PAC-Bayesian Theory on Random Sets
Benjamin Dupuis, Paul Viallard, George Deligiannidis et al.
Generalizable, real-time neural decoding with hybrid state-space models
Avery Hee-Woon Ryoo, Nanda H Krishna, Ximeng Mao et al.
Signal and Noise: A Framework for Reducing Uncertainty in Language Model Evaluation
David Heineman, Valentin Hofmann, Ian Magnusson et al.
New Perspectives on the Polyak Stepsize: Surrogate Functions and Negative Results
Francesco Orabona, Ryan D'Orazio
Fast Inference for Augmented Large Language Models
Rana Shahout, Cong Liang, Shiji Xin et al.
SymRTLO: Enhancing RTL Code Optimization with LLMs and Neuron-Inspired Symbolic Reasoning
Yiting Wang, Wanghao Ye, Ping Guo et al.
Estimating Model Performance Under Covariate Shift Without Labels
Jakub Białek, Juhani Kivimäki, Wojciech Kuberski et al.
DyMU: Dynamic Merging and Virtual Unmerging for Efficient Variable-Length VLMs
Zhenhailong Wang, Senthil Purushwalkam, Caiming Xiong et al.
GeoSVR: Taming Sparse Voxels for Geometrically Accurate Surface Reconstruction
Jiahe Li, Jiawei Zhang, Youmin Zhang et al.
Efficient Hybrid Language Model Compression through Group-Aware SSM Pruning
Ali Taghibakhshi, Sharath Turuvekere Sreenivas, Saurav Muralidharan et al.
Walking the Tightrope: Autonomous Disentangling Beneficial and Detrimental Drifts in Non-Stationary Custom-Tuning
Xiaoyu Yang, Jie Lu, En Yu
Scaffolding Dexterous Manipulation with Vision-Language Models
Vincent de Bakker, Joey Hejna, Tyler Lum et al.
Efficient Prompt Compression with Evaluator Heads for Long-Context Transformer Inference
Weizhi Fei, Xueyan Niu, Xie Guoqing et al.
PhysDrive: A Multimodal Remote Physiological Measurement Dataset for In-vehicle Driver Monitoring
Wang, Xiao Yang, Qingyong Hu et al.
SAFEx: Analyzing Vulnerabilities of MoE-Based LLMs via Stable Safety-critical Expert Identification
Zhenglin Lai, Mengyao Liao, Bingzhe Wu et al.
Audio-Sync Video Generation with Multi-Stream Temporal Control
Shuchen Weng, Haojie Zheng, Zheng Chang et al.
Shape it Up! Restoring LLM Safety during Finetuning
ShengYun Peng, Pin-Yu Chen, Jianfeng Chi et al.
Towards Resilient Safety-driven Unlearning for Diffusion Models against Downstream Fine-tuning
Boheng Li, Renjie Gu, Junjie Wang et al.
Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents
Han Lin, Jaemin Cho, Amir Zadeh et al.
Search and Refine During Think: Facilitating Knowledge Refinement for Improved Retrieval-Augmented Reasoning
Yaorui Shi, Sihang Li, Chang Wu et al.
HALO: Hadamard-Assisted Lower-Precision Optimization for LLMs
Saleh Ashkboos, Mahdi Nikdan, Rush Tabesh et al.
A Stable Whitening Optimizer for Efficient Neural Network Training
Kevin Frans, Sergey Levine, Pieter Abbeel
4D-LRM: Large Space-Time Reconstruction Model From and To Any View at Any Time
Ziqiao Ma, Xuweiyi Chen, Shoubin Yu et al.
Iterative Self-Incentivization Empowers Large Language Models as Agentic Searchers
Zhengliang Shi, Lingyong Yan, Dawei Yin et al.
Prediction-Powered Causal Inferences
Riccardo Cadei, Ilker Demirel, Piersilvio De Bartolomeis et al.
Understanding LLM Behaviors via Compression: Data Generation, Knowledge Acquisition and Scaling Laws
Zhixuan Pan, Shaowen Wang, Liao Pengfei et al.
Attention Mechanism, Max-Affine Partition, and Universal Approximation
Hude Liu, Jerry Yao-Chieh Hu, Zhao Song et al.
Alternating Gradient Flows: A Theory of Feature Learning in Two-layer Neural Networks
Daniel Kunin, Giovanni Luca Marchetti, Feng Chen et al.
Decoder-Hybrid-Decoder Architecture for Efficient Reasoning with Long Generation
Liliang Ren, Congcong Chen, Haoran Xu et al.
Strategyproof Reinforcement Learning from Human Feedback
Thomas Kleine Buening, Jiarui Gan, Debmalya Mandal et al.
Win Fast or Lose Slow: Balancing Speed and Accuracy in Latency-Sensitive Decisions of LLMs
Hao Kang, Qingru Zhang, Han Cai et al.
HumanoidGen: Data Generation for Bimanual Dexterous Manipulation via LLM Reasoning
Zhi Jing, Siyuan Yang, Jicong Ao et al.
AOR: Anatomical Ontology-Guided Reasoning for Medical Large Multimodal Model in Chest X-Ray Interpretation
Qingqiu Li, Zihang Cui, Seongsu Bae et al.
Audio Super-Resolution with Latent Bridge Models
Chang Li, Zehua Chen, Liyuan Wang et al.
Learning to Think: Information-Theoretic Reinforcement Fine-Tuning for LLMs
Jingyao Wang, Wenwen Qiang, Zeen Song et al.
BiggerGait: Unlocking Gait Recognition with Layer-wise Representations from Large Vision Models
Dingqiang Ye, Chao Fan, Zhanbo Huang et al.
OpenHOI: Open-World Hand-Object Interaction Synthesis with Multimodal Large Language Model
Zhenhao Zhang, Ye Shi, Lingxiao Yang et al.
Language Models Can Predict Their Own Behavior
Dhananjay Ashok, Jonathan May
Co-Reinforcement Learning for Unified Multimodal Understanding and Generation
Jingjing Jiang, Chongjie Si, Jun Luo et al.
VideoVLA: Video Generators Can Be Generalizable Robot Manipulators
Yichao Shen, Fangyun Wei, Zhiying Du et al.
Orthogonal Survival Learners for Estimating Heterogeneous Treatment Effects from Time-to-Event Data
Dennis Frauen, Maresa Schröder, Konstantin Hess et al.
AI-Generated Video Detection via Perceptual Straightening
Christian Internò, Robert Geirhos, Markus Olhofer et al.
Better Language Model Inversion by Compactly Representing Next-Token Distributions
Murtaza Nazir, Matthew Finlayson, John Morris et al.
CellVerse: Do Large Language Models Really Understand Cell Biology?
Fan Zhang, Tianyu Liu, Zhihong Zhu et al.
Modeling Microenvironment Trajectories on Spatial Transcriptomics with NicheFlow
Kristiyan Sakalyan, Alessandro Palma, Filippo Guerranti et al.
Chain-of-Action: Trajectory Autoregressive Modeling for Robotic Manipulation
Wenbo Zhang, Tianrun Hu, Hanbo Zhang et al.
Fine-Grained Preference Optimization Improves Spatial Reasoning in VLMs
Yifan Shen, Yuanzhe Liu, Jingyuan Zhu et al.
Entropic Time Schedulers for Generative Diffusion Models
Dejan Stancevic, Florian Handke, Luca Ambrogioni
Transition Matching: Scalable and Flexible Generative Modeling
Neta Shaul, Uriel Singer, Itai Gat et al.
Scaling Up Liquid-Resistance Liquid-Capacitance Networks for Efficient Sequence Modeling
Mónika Farsang, Radu Grosu
Capturing Individual Human Preferences with Reward Features
Andre Barreto, Vincent Dumoulin, Yiran Mao et al.
AlgoTune: Can Language Models Speed Up General-Purpose Numerical Programs?
Ori Press, Brandon Amos, Haoyu Zhao et al.
A Theoretical Study on Bridging Internal Probability and Self-Consistency for LLM Reasoning
Zhi Zhou, Tan Yuhao, Zenan Li et al.
Efficient Data Selection at Scale via Influence Distillation
Mahdi Nikdan, Vincent Cohen-Addad, Dan Alistarh et al.
ExPO: Unlocking Hard Reasoning with Self-Explanation-Guided Reinforcement Learning
Ruiyang Zhou, Shuozhe Li, Amy Zhang et al.
Learning Diffusion Models with Flexible Representation Guidance
Chenyu Wang, Cai Zhou, Sharut Gupta et al.
On the Edge of Memorization in Diffusion Models
Sam Buchanan, Druv Pai, Yi Ma et al.
InstructSAM: A Training-free Framework for Instruction-Oriented Remote Sensing Object Recognition
Yijie Zheng, Weijie Wu, Qingyun Li et al.
Flow Matching-Based Autonomous Driving Planning with Advanced Interactive Behavior Modeling
Tianyi Tan, Yinan Zheng, Ruiming Liang et al.
Head Pursuit: Probing Attention Specialization in Multimodal Transformers
Lorenzo Basile, Valentino Maiorca, Diego Doimo et al.
Glocal Information Bottleneck for Time Series Imputation
Jie Yang, Kexin Zhang, Guibin Zhang et al.
OSVI-WM: One-Shot Visual Imitation for Unseen Tasks using World-Model-Guided Trajectory Generation
Raktim Goswami, Prashanth Krishnamurthy, Yann LeCun et al.
Revisiting End-to-End Learning with Slide-level Supervision in Computational Pathology
Wenhao Tang, Rong Qin, Heng Fang et al.
BEDLAM2.0: Synthetic humans and cameras in motion
Joachim Tesch, Giorgio Becherini, Prerana Achar et al.
NeurIPT: Foundation Model for Neural Interfaces
Zitao Fang, Chenxuan Li, Hongting Zhou et al.
Systematic Reward Gap Optimization for Mitigating VLM Hallucinations
Lehan He, Zeren Chen, Zhelun Shi et al.
Multiplayer Federated Learning: Reaching Equilibrium with Less Communication
TaeHo Yoon, Sayantan Choudhury, Nicolas Loizou
MAGNET: A Multi-agent Framework for Finding Audio-Visual Needles by Reasoning over Multi-Video Haystacks
Sanjoy Chowdhury, Mohamed Elmoghany, Yohan Abeysinghe et al.
Exploring Diffusion Transformer Designs via Grafting
Keshigeyan Chandrasegaran, Michael Poli, Dan Fu et al.
Refusal Direction is Universal Across Safety-Aligned Languages
Xinpeng Wang, Mingyang Wang, Yihong Liu et al.
Vid-SME: Membership Inference Attacks against Large Video Understanding Models
Qi Li, Runpeng Yu, Xinchao Wang
MEGADance: Mixture-of-Experts Architecture for Genre-Aware 3D Dance Generation
Kaixing Yang, Xulong Tang, Ziqiao Peng et al.
DrVD-Bench: Do Vision-Language Models Reason Like Human Doctors in Medical Image Diagnosis?
Tianhong Zhou, Xu Yin, Yingtao Zhu et al.
Learning Orthogonal Multi-Index Models: A Fine-Grained Information Exponent Analysis
Yunwei Ren, Jason Lee
Functional Scaling Laws in Kernel Regression: Loss Dynamics and Learning Rate Schedules
Binghui Li, Fengling Chen, Zixun Huang et al.
ChartSketcher: Reasoning with Multimodal Feedback and Reflection for Chart Understanding
Muye Huang, Lingling Zhang, Jie Ma et al.
Treatment Effect Estimation for Optimal Decision-Making
Dennis Frauen, Valentyn Melnychuk, Jonas Schweisthal et al.
Block-Biased Mamba for Long-Range Sequence Processing
Annan Yu, N. Benjamin Erichson
Brain-like Variational Inference
Hadi Vafaii, Dekel Galor, Jacob Yates
Graph Data Selection for Domain Adaptation: A Model-Free Approach
Ting-Wei Li, Ruizhong Qiu, Hanghang Tong
MedMax: Mixed-Modal Instruction Tuning for Training Biomedical Assistants
Hritik Bansal, Daniel Israel, Siyan Zhao et al.
Tight Lower Bounds and Improved Convergence in Performative Prediction
Pedram Khorsandi, Rushil Gupta, Mehrnaz Mofakhami et al.
The Structural Complexity of Matrix-Vector Multiplication
Emile Anand, Jan van den Brand, Rose McCarty
FALCON: An ML Framework for Fully Automated Layout-Constrained Analog Circuit Design
Asal Mehradfar, Xuzhe Zhao, Yilun Huang et al.
Rethinking Neural Combinatorial Optimization for Vehicle Routing Problems with Different Constraint Tightness Degrees
Fu Luo, Yaoxin Wu, Zhi Zheng et al.
A solvable model of learning generative diffusion: theory and insights
Hugo Cui, Cengiz Pehlevan, Yue Lu
Linear Mixture Distributionally Robust Markov Decision Processes
Zhishuai Liu, Pan Xu
Mean-Field Sampling for Cooperative Multi-Agent Reinforcement Learning
Emile Anand, Ishani Karmarkar, Guannan Qu
Learning to Integrate Diffusion ODEs by Averaging the Derivatives
Wenze Liu, Xiangyu Yue
The Nuclear Route: Sharp Asymptotics of ERM in Overparameterized Quadratic Networks
Vittorio Erba, Emanuele Troiani, Lenka Zdeborová et al.
Towards Understanding the Mechanisms of Classifier-Free Guidance
Xiang Li, Rongrong Wang, Qing Qu
Watermarking Autoregressive Image Generation
Nikola Jovanović, Ismail Labiad, Tomas Soucek et al.
MIRA: Medical Time Series Foundation Model for Real-World Health Data
Hao Li, Bowen Deng, Chang Xu et al.
Flexible MOF Generation with Torsion-Aware Flow Matching
Nayoung Kim, Seongsu Kim, Sungsoo Ahn
Neurosymbolic Diffusion Models
Emile van Krieken, Pasquale Minervini, Edoardo Maria Ponti et al.
Native-Resolution Image Synthesis
ZiDong Wang, Lei Bai, Xiangyu Yue et al.
Why Do Some Language Models Fake Alignment While Others Don't?
Abhay Sheshadri, John Hughes, Julian Michael et al.
Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings
Qiong Wu, Wenhao Lin, Yiyi Zhou et al.
Poison as Cure: Visual Noise for Mitigating Object Hallucinations in LVMs
Kejia Zhang, Keda Tao, Jiasheng Tang et al.
Transformers Provably Learn Chain-of-Thought Reasoning with Length Generalization
Yu Huang, Zixin Wen, Aarti Singh et al.
Predicting Empirical AI Research Outcomes with Language Models
Jiaxin Wen, Chenglei Si, Yueh-Han Chen et al.