Most Cited 2025 "diffusion bridge mechanism" Papers
Conference
MambaIC: State Space Models for High-Performance Learned Image Compression
Fanhu Zeng, Hao Tang, Yihua Shao et al.
Text2midi: Generating Symbolic Music from Captions
Keshav Bhandari, Abhinaba Roy, Kyra Wang et al.
RefactorBench: Evaluating Stateful Reasoning in Language Agents Through Code
Dhruv Gautam, Spandan Garg, Jinu Jang et al.
Reasoning Limitations of Multimodal Large Language Models. A case study of Bongard Problems
Mikołaj Małkiński, Szymon Pawlonka, Jacek Mańdziuk
EmbodiedOcc: Embodied 3D Occupancy Prediction for Vision-based Online Scene Understanding
Yuqi Wu, Wenzhao Zheng, Sicheng Zuo et al.
Tighter Privacy Auditing of DP-SGD in the Hidden State Threat Model
Tudor Cebere, Aurélien Bellet, Nicolas Papernot
Inference-Time Hyper-Scaling with KV Cache Compression
Adrian Łańcucki, Konrad Staniszewski, Piotr Nawrot et al.
TimePFN: Effective Multivariate Time Series Forecasting with Synthetic Data
Ege Onur Taga, Muhammed Emrullah Ildiz, Samet Oymak
CAD-Assistant: Tool-Augmented VLLMs as Generic CAD Task Solvers
Dimitrios Mallis, Ahmet Karadeniz, Sebastian Cavada et al.
A CLIP-Powered Framework for Robust and Generalizable Data Selection
Suorong Yang, Peng Ye, Wanli Ouyang et al.
Learning Precise Affordances from Egocentric Videos for Robotic Manipulation
Li, Nikolaos Tsagkas, Jifei Song et al.
EnvGS: Modeling View-Dependent Appearance with Environment Gaussian
Tao Xie, Xi Chen, Zhen Xu et al.
Debate or Vote: Which Yields Better Decisions in Multi-Agent Large Language Models?
Hyeong Kyu Choi, Jerry Zhu, Sharon Li
AutoBencher: Towards Declarative Benchmark Construction
Xiang Li, Farzaan Kaiyom, Evan Liu et al.
Optimal Transport for Time Series Imputation
Hao Wang, Zhengnan Li, Haoxuan Li et al.
(How) Do Language Models Track State?
Belinda Li, Carl Guo, Jacob Andreas
DeLLMa: Decision Making Under Uncertainty with Large Language Models
Ollie Liu, Deqing Fu, Dani Yogatama et al.
Personalized Federated Collaborative Filtering: A Variational AutoEncoder Approach
Zhiwei Li, Guodong Long, Tianyi Zhou et al.
Adam Exploits $\ell_\infty$-geometry of Loss Landscape via Coordinate-wise Adaptivity
Shuo Xie, Mohamad Amin Mohamadi, Zhiyuan Li
CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion
Shoubin Yu, Jaehong Yoon, Mohit Bansal
Enhancing LLM Reasoning with Iterative DPO: A Comprehensive Empirical Investigation
Songjun Tu, Jiahao Lin, Xiangyu Tian et al.
Idiosyncrasies in Large Language Models
Mingjie Sun, Yida Yin, Zhiqiu (Oscar) Xu et al.
UniRestore: Unified Perceptual and Task-Oriented Image Restoration Model Using Diffusion Prior
I-Hsiang Chen, Wei-Ting Chen, Yu-Wei Liu et al.
U-Nets as Belief Propagation: Efficient Classification, Denoising, and Diffusion in Generative Hierarchical Models
Song Mei
DBLoss: Decomposition-based Loss Function for Time Series Forecasting
Xiangfei Qiu, Xingjian Wu, Hanyin Cheng et al.
Emergence and scaling laws in SGD learning of shallow neural networks
Yunwei Ren, Eshaan Nichani, Denny Wu et al.
An Engorgio Prompt Makes Large Language Model Babble on
Jianshuo Dong, Ziyuan Zhang, Qingjie Zhang et al.
Do Vision-Language Models Really Understand Visual Language?
Yifan Hou, Buse Giledereli, Yilei Tu et al.
JEN-1 Composer: A Unified Framework for High-Fidelity Multi-Track Music Generation
Yao Yao, Peike Li, Boyu Chen et al.
Scalable Discrete Diffusion Samplers: Combinatorial Optimization and Statistical Physics
Sebastian Sanokowski, Wilhelm Berghammer, Haoyu Wang et al.
EpiCoder: Encompassing Diversity and Complexity in Code Generation
Yaoxiang Wang, Haoling Li, Xin Zhang et al.
MPQ-DM: Mixed Precision Quantization for Extremely Low Bit Diffusion Models
Weilun Feng, Haotong Qin, Chuanguang Yang et al.
V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding
Junqi Ge, Ziyi Chen, Jintao Lin et al.
PRAGA: Prototype-aware Graph Adaptive Aggregation for Spatial Multi-modal Omics Analysis
Xinlei Huang, Zhiqi Ma, Dian Meng et al.
Learning Interpretable Hierarchical Dynamical Systems Models from Time Series Data
Manuel Brenner, Elias Weber, Georgia Koppe et al.
Can We Talk Models Into Seeing the World Differently?
Paul Gavrikov, Jovita Lukasik, Steffen Jung et al.
The VLLM Safety Paradox: Dual Ease in Jailbreak Attack and Defense
Yangyang Guo, Fangkai Jiao, Liqiang Nie et al.
A Probabilistic Perspective on Unlearning and Alignment for Large Language Models
Yan Scholten, Stephan Günnemann, Leo Schwinn
Magic Insert: Style-Aware Drag-and-Drop
Nataniel Ruiz, Yuanzhen Li, Neal Wadhwa et al.
Painting with Words: Elevating Detailed Image Captioning with Benchmark and Alignment Learning
Qinghao Ye, Xianhan Zeng, Fu Li et al.
Trajectory Balance with Asynchrony: Decoupling Exploration and Learning for Fast, Scalable LLM Post-Training
Brian Bartoldson, Siddarth Venkatraman, James Diffenderfer et al.
TULIP: Token-length Upgraded CLIP
Ivona Najdenkoska, Mohammad Mahdi Derakhshani, Yuki Asano et al.
Force Prompting: Video Generation Models Can Learn And Generalize Physics-based Control Signals
Nate Gillman, Charles Herrmann, Michael Freeman et al.
Physics-Constrained Flow Matching: Sampling Generative Models with Hard Constraints
Utkarsh Utkarsh, Pengfei Cai, Alan Edelman et al.
Dynamic Multimodal Evaluation with Flexible Complexity by Vision-Language Bootstrapping
Yue Yang, Shuibo Zhang, Kaipeng Zhang et al.
Quamba: A Post-Training Quantization Recipe for Selective State Space Models
Hung-Yueh Chiang, Chi-Chih Chang, Natalia Frumkin et al.
PrEditor3D: Fast and Precise 3D Shape Editing
Ziya Erkoc, Can Gümeli, Chaoyang Wang et al.
Omni-ID: Holistic Identity Representation Designed for Generative Tasks
Guocheng Qian, Kuan-Chieh Wang, Or Patashnik et al.
Power Lines: Scaling laws for weight decay and batch size in LLM pre-training
Shane Bergsma, Nolan Dey, Gurpreet Gosal et al.
SAEs Can Improve Unlearning: Dynamic Sparse Autoencoder Guardrails for Precision Unlearning in LLMs
Aashiq Muhamed, Jacopo Bonato, Mona T. Diab et al.
Visual Description Grounding Reduces Hallucinations and Boosts Reasoning in LVLMs
Sreyan Ghosh, Chandra Kiran Evuru, Sonal Kumar et al.
Long-Form Speech Generation with Spoken Language Models
Se Jin Park, Julian Salazar, Aren Jansen et al.
Large Images Are Gaussians: High-Quality Large Image Representation with Levels of 2D Gaussian Splatting
Lingting Zhu, Guying Lin, Jinnan Chen et al.
Optimizing Temperature for Language Models with Multi-Sample Inference
Weihua Du, Yiming Yang, Sean Welleck
xLSTM-Mixer: Multivariate Time Series Forecasting by Mixing via Scalar Memories
Maurice Kraus, Felix Divo, Devendra Singh Dhami et al.
A Simple Data Augmentation for Feature Distribution Skewed Federated Learning
Yunlu Yan, Huazhu Fu, Yuexiang Li et al.
Cross-Entropy Is All You Need To Invert the Data Generating Process
Patrik Reizinger, Alice Bizeul, Attila Juhos et al.
Boosting Generative Image Modeling via Joint Image-Feature Synthesis
Theodoros Kouzelis, Efstathios Karypidis, Ioannis Kakogeorgiou et al.
A Multi-Power Law for Loss Curve Prediction Across Learning Rate Schedules
Kairong Luo, Haodong Wen, Shengding Hu et al.
GraphMoRE: Mitigating Topological Heterogeneity via Mixture of Riemannian Experts
Zihao Guo, Qingyun Sun, Haonan Yuan et al.
AllTracker: Efficient Dense Point Tracking at High Resolution
Adam Harley, Yang You, Yang Zheng et al.
Emergence in non-neural models: grokking modular arithmetic via average gradient outer product
Neil Mallinar, Daniel Beaglehole, Libin Zhu et al.
Mitigating Object Hallucination in Large Vision-Language Models via Image-Grounded Guidance
Linxi Zhao, Yihe Deng, Weitong Zhang et al.
Closed-Form Merging of Parameter-Efficient Modules for Federated Continual Learning
Riccardo Salami, Pietro Buzzega, Matteo Mosconi et al.
AlignMamba: Enhancing Multimodal Mamba with Local and Global Cross-modal Alignment
Yan Li, Yifei Xing, Xiangyuan Lan et al.
DexVLG: Dexterous Vision-Language-Grasp Model at Scale
Jiawei He, Danshi Li, Xinqiang Yu et al.
UniCombine: Unified Multi-Conditional Combination with Diffusion Transformer
Haoxuan Wang, Jinlong Peng, Qingdong He et al.
Restoring Calibration for Aligned Large Language Models: A Calibration-Aware Fine-Tuning Approach
Jiancong Xiao, Bojian Hou, Zhanliang Wang et al.
Swift4D: Adaptive divide-and-conquer Gaussian Splatting for compact and efficient reconstruction of dynamic scene
Jiahao Wu, Rui Peng, Zhiyan Wang et al.
VITA-Audio: Fast Interleaved Audio-Text Token Generation for Efficient Large Speech-Language Model
Zuwei Long, Yunhang Shen, Chaoyou Fu et al.
Spatial457: A Diagnostic Benchmark for 6D Spatial Reasoning of Large Multimodal Models
Xingrui Wang, Wufei Ma, Tiezheng Zhang et al.
Unleashing the Potential of Multi-modal Foundation Models and Video Diffusion for 4D Dynamic Physical Scene Simulation
Zhuoman Liu, Weicai Ye, Yan Luximon et al.
FOLDER: Accelerating Multi-Modal Large Language Models with Enhanced Performance
Haicheng Wang, Zhemeng Yu, Gabriele Spadaro et al.
ThinkBot: Embodied Instruction Following with Thought Chain Reasoning
Guanxing Lu, Ziwei Wang, Changliu Liu et al.
Learning 3D Persistent Embodied World Models
Siyuan Zhou, Yilun Du, Yuncong Yang et al.
Learning Clustering-based Prototypes for Compositional Zero-Shot Learning
Hongyu Qu, Jianan Wei, Xiangbo Shu et al.
Copilot Arena: A Platform for Code LLM Evaluation in the Wild
Wayne Chi, Valerie Chen, Anastasios Angelopoulos et al.
Accelerated Diffusion Models via Speculative Sampling
Valentin De Bortoli, Alexandre Galashov, Arthur Gretton et al.
Block-Attention for Efficient Prefilling
Dongyang Ma, Yan Wang, Tian Lan
CaDA: Cross-Problem Routing Solver with Constraint-Aware Dual-Attention
Han Li, Fei Liu, Zhi Zheng et al.
Cross-modulated Attention Transformer for RGBT Tracking
Yun Xiao, Jiacong Zhao, Andong Lu et al.
VideoDirector: Precise Video Editing via Text-to-Video Models
Yukun Wang, Longguang Wang, Zhiyuan Ma et al.
SlowFast-LLaVA-1.5: A Family of Token-Efficient Video Large Language Models for Long-Form Video Understanding
Mingze Xu, Mingfei Gao, Shiyu Li et al.
Controllable Context Sensitivity and the Knob Behind It
Julian Minder, Kevin Du, Niklas Stoehr et al.
OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation
Junyuan Zhang, Qintong Zhang, Bin Wang et al.
Prismatic Synthesis: Gradient-based Data Diversification Boosts Generalization in LLM Reasoning
Jaehun Jung, Seungju Han, Ximing Lu et al.
Collab-RAG: Boosting Retrieval-Augmented Generation for Complex Question Answering via White-Box and Black-Box LLM Collaboration
Ran Xu, Wenqi Shi, Yuchen Zhuang et al.
Visual Thoughts: A Unified Perspective of Understanding Multimodal Chain-of-Thought
Zihui Cheng, Qiguang Chen, Xiao Xu et al.
GEVRM: Goal-Expressive Video Generation Model For Robust Visual Manipulation
Hongyin Zhang, Pengxiang Ding, Shangke Lyu et al.
Learning Efficient Positional Encodings with Graph Neural Networks
Charilaos Kanatsoulis, Evelyn Choi, Stefanie Jegelka et al.
From Sparse Dependence to Sparse Attention: Unveiling How Chain-of-Thought Enhances Transformer Sample Efficiency
Kaiyue Wen, Huaqing Zhang, Hongzhou Lin et al.
Training Neural Networks as Recognizers of Formal Languages
Alexandra Butoi, Ghazal Khalighinejad, Anej Svete et al.
CBGBench: Fill in the Blank of Protein-Molecule Complex Binding Graph
Haitao Lin, Guojiang Zhao, Odin Zhang et al.
Spiking Vision Transformer with Saccadic Attention
Shuai Wang, Malu Zhang, Dehao Zhang et al.
GraphMaster: Automated Graph Synthesis via LLM Agents in Data-Limited Environments
Enjun Du, Xunkai Li, Tian Jin et al.
Asymmetric REINFORCE for off-Policy Reinforcement Learning: Balancing positive and negative rewards
Charles Arnal, Gaëtan Narozniak, Vivien Cabannes et al.
Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention
Dejia Xu, Yifan Jiang, Chen Huang et al.
DoomArena: A framework for Testing AI Agents Against Evolving Security Threats
Léo Boisvert, Abhay Puri, Gabriel Huang et al.
DreamOmni: Unified Image Generation and Editing
Bin Xia, Yuechen Zhang, Jingyao Li et al.
CNC: Cross-modal Normality Constraint for Unsupervised Multi-class Anomaly Detection
Xiaolei Wang, Xiaoyang Wang, Huihui Bai et al.
SiReRAG: Indexing Similar and Related Information for Multihop Reasoning
Nan Zhang, Prafulla Kumar Choubey, Alexander Fabbri et al.
DynaSaur: Large Language Agents Beyond Predefined Actions
Dang Nguyen, Viet Dac Lai, Seunghyun Yoon et al.
LeVo: High-Quality Song Generation with Multi-Preference Alignment
Shun Lei, Yaoxun Xu, Zhiwei Lin et al.
Task Vectors in In-Context Learning: Emergence, Formation, and Benefits
Liu Yang, Ziqian Lin, Kangwook Lee et al.
AffordDexGrasp: Open-set Language-guided Dexterous Grasp with Generalizable-Instructive Affordance
Yilin Wei, Mu Lin, Yuhao Lin et al.
ET-SEED: Efficient Trajectory-Level SE(3) Equivariant Diffusion Policy
Chenrui Tie, Yue Chen, Ruihai Wu et al.
HEROS-GAN: Honed-Energy Regularized and Optimal Supervised GAN for Enhancing Accuracy and Range of Low-Cost Accelerometers
Yifeng Wang, Yi Zhao
Logically Consistent Language Models via Neuro-Symbolic Integration
Diego Calanzone, Stefano Teso, Antonio Vergari
MLLM-as-a-Judge for Image Safety without Human Labeling
Zhenting Wang, Shuming Hu, Shiyu Zhao et al.
A Many-Objective Problem Where Crossover Is Provably Indispensable
Andre Opris
COME: Test-time Adaption by Conservatively Minimizing Entropy
Qingyang Zhang, Yatao Bian, Xinke Kong et al.
FrugalNeRF: Fast Convergence for Extreme Few-shot Novel View Synthesis without Learned Priors
Chin-Yang Lin, Chung-Ho Wu, Changhan Yeh et al.
MonoInstance: Enhancing Monocular Priors via Multi-view Instance Alignment for Neural Rendering and Reconstruction
Wenyuan Zhang, Yixiao Yang, Han Huang et al.
SynthLight: Portrait Relighting with Diffusion Model by Learning to Re-render Synthetic Faces
Sumit Chaturvedi, Mengwei Ren, Yannick Hold-Geoffroy et al.
AutoRedTeamer: Autonomous Red Teaming with Lifelong Attack Integration
Andy Zhou, Kevin Wu, Francesco Pinto et al.
Accelerating Multimodal Large Language Models by Searching Optimal Vision Token Reduction
Shiyu Zhao, Zhenting Wang, Felix Juefei-Xu et al.
Prior-guided Hierarchical Harmonization Network for Efficient Image Dehazing
Xiongfei Su, Siyuan Li, Yuning Cui et al.
An Empirical Analysis of Uncertainty in Large Language Model Evaluations
Qiujie Xie, Qingqiu Li, Zhuohao Yu et al.
Speeding Up the NSGA-II with a Simple Tie-Breaking Rule
Benjamin Doerr, Tudor Ivan, Martin S. Krejca
SEC-bench: Automated Benchmarking of LLM Agents on Real-World Software Security Tasks
Hwiwon Lee, Ziqi Zhang, Hanxiao Lu et al.
LLMs Can Plan Only If We Tell Them
Bilgehan Sel, Ruoxi Jia, Ming Jin
Erwin: A Tree-based Hierarchical Transformer for Large-scale Physical Systems
Maksim Zhdanov, Max Welling, Jan-Willem van de Meent
IteRPrimE: Zero-shot Referring Image Segmentation with Iterative Grad-CAM Refinement and Primary Word Emphasis
Yuji Wang, Jingchen Ni, Yong Liu et al.
Improved Bounds for Online Facility Location with Predictions
Dimitris Fotakis, Evangelia Gergatsouli, Themistoklis Gouleakis et al.
Black-Box Detection of Language Model Watermarks
Thibaud Gloaguen, Nikola Jovanović, Robin Staab et al.
Dynamic Gaussians Mesh: Consistent Mesh Reconstruction from Dynamic Scenes
Isabella Liu, Hao Su, Xiaolong Wang
Visual Autoregressive Modeling for Image Super-Resolution
Yunpeng Qu, Kun Yuan, Jinhua Hao et al.
Can In-context Learning Really Generalize to Out-of-distribution Tasks?
Qixun Wang, Yifei Wang, Xianghua Ying et al.
SCott: Accelerating Diffusion Models with Stochastic Consistency Distillation
Hongjian Liu, Qingsong Xie, Tianxiang Ye et al.
RSAR: Restricted State Angle Resolver and Rotated SAR Benchmark
Xin Zhang, Xue Yang, Yuxuan Li et al.
Is Factuality Enhancement a Free Lunch For LLMs? Better Factuality Can Lead to Worse Context-Faithfulness
Baolong Bi, Shenghua Liu, Yiwei Wang et al.
LeanQuant: Accurate and Scalable Large Language Model Quantization with Loss-error-aware Grid
Tianyi Zhang, Anshumali Shrivastava
Improving LLM General Preference Alignment via Optimistic Online Mirror Descent
Yuheng Zhang, Dian Yu, Tao Ge et al.
InfiniPot-V: Memory-Constrained KV Cache Compression for Streaming Video Understanding
Minsoo Kim, Kyuhong Shim, Jungwook Choi et al.
ThinkSound: Chain-of-Thought Reasoning in Multimodal LLMs for Audio Generation and Editing
Huadai Liu, Kaicheng Luo, Jialei Wang et al.
Action-Minimization Meets Generative Modeling: Efficient Transition Path Sampling with the Onsager-Machlup Functional
Sanjeev Raja, Martin Šípka, Michael Psenka et al.
Diff2Flow: Training Flow Matching Models via Diffusion Model Alignment
Johannes Schusterbauer, Ming Gui, Frank Fundel et al.
EgoExo-Gen: Ego-centric Video Prediction by Watching Exo-centric Videos
Jilan Xu, Yifei Huang, Baoqi Pei et al.
Patch-level Sounding Object Tracking for Audio-Visual Question Answering
Zhangbin Li, Jinxing Zhou, Jing Zhang et al.
Quantization without Tears
Minghao Fu, Hao Yu, Jie Shao et al.
Track-On: Transformer-based Online Point Tracking with Memory
Görkay Aydemir, Xiongyi Cai, Weidi Xie et al.
Structured Packing in LLM Training Improves Long Context Utilization
Konrad Staniszewski, Szymon Tworkowski, Sebastian Jaszczur et al.
DyMO: Training-Free Diffusion Model Alignment with Dynamic Multi-Objective Scheduling
Xin Xie, Dong Gong
Where am I? Cross-View Geo-localization with Natural Language Descriptions
Junyan Ye, Honglin Lin, Leyan Ou et al.
Expressive Power of Graph Neural Networks for (Mixed-Integer) Quadratic Programs
Ziang Chen, Xiaohan Chen, Jialin Liu et al.
Uncertainty-guided Perturbation for Image Super-Resolution Diffusion Model
Leheng Zhang, Weiyi You, Kexuan Shi et al.
3D-AffordanceLLM: Harnessing Large Language Models for Open-Vocabulary Affordance Detection in 3D Worlds
Hengshuo Chu, Xiang Deng, Qi Lv et al.
Reversible Decoupling Network for Single Image Reflection Removal
Hao Zhao, Mingjia Li, Qiming Hu et al.
Refine Knowledge of Large Language Models via Adaptive Contrastive Learning
Yinghui Li, Haojing Huang, Jiayi Kuang et al.
On Calibration of LLM-based Guard Models for Reliable Content Moderation
Hongfu Liu, Hengguan Huang, Xiangming Gu et al.
BIRD: A Trustworthy Bayesian Inference Framework for Large Language Models
Yu Feng, Ben Zhou, Weidong Lin et al.
LLM Unlearning via Neural Activation Redirection
William Shen, Xinchi Qiu, Meghdad Kurmanji et al.
GOPlan: Goal-conditioned Offline Reinforcement Learning by Planning with Learned Models
Mianchu Wang, Rui Yang, Xi Chen et al.
Rare-to-Frequent: Unlocking Compositional Generation Power of Diffusion Models on Rare Concepts with LLM Guidance
Dongmin Park, Sebin Kim, Taehong Moon et al.
WonderPlay: Dynamic 3D Scene Generation from a Single Image and Actions
Zizhang Li, Hong-Xing Yu, Wei Liu et al.
Movie Weaver: Tuning-Free Multi-Concept Video Personalization with Anchored Prompts
Feng Liang, Haoyu Ma, Zecheng He et al.
EmotiveTalk: Expressive Talking Head Generation through Audio Information Decoupling and Emotional Video Diffusion
Haotian Wang, Yuzhe Weng, Yueyan Li et al.
MACPO: Weak-to-Strong Alignment via Multi-Agent Contrastive Preference Optimization
Yougang Lyu, Lingyong Yan, Zihan Wang et al.
GestureLSM: Latent Shortcut based Co-Speech Gesture Generation with Spatial-Temporal Modeling
Pinxin Liu, Luchuan Song, Junhua Huang et al.
The Same but Different: Structural Similarities and Differences in Multilingual Language Modeling
Ruochen Zhang, Qinan Yu, Matianyu Zang et al.
Adapting Multi-modal Large Language Model to Concept Drift From Pre-training Onwards
Xiaoyu Yang, Jie Lu, En Yu
Presto! Distilling Steps and Layers for Accelerating Music Generation
Zachary Novack, Ge Zhu, Jonah Casebeer et al.
Memory Injection Attacks on LLM Agents via Query-Only Interaction
Shen Dong, Shaochen Xu, Pengfei He et al.
EMOS: Embodiment-aware Heterogeneous Multi-robot Operating System with LLM Agents
Junting Chen, Checheng Yu, Xunzhe Zhou et al.
DRoC: Elevating Large Language Models for Complex Vehicle Routing via Decomposed Retrieval of Constraints
Xia Jiang, Yaoxin Wu, Chenhao Zhang et al.
DreamPhysics: Learning Physics-Based 3D Dynamics with Video Diffusion Priors
Tianyu Huang, Haoze Zhang, Yihan Zeng et al.
FlowDec: A flow-based full-band general audio codec with high perceptual quality
Simon Welker, Matthew Le, Ricky T. Q. Chen et al.
Security Attacks on LLM-based Code Completion Tools
Wen Cheng, Ke Sun, Xinyu Zhang et al.
MonoDGP: Monocular 3D Object Detection with Decoupled-Query and Geometry-Error Priors
Fanqi Pu, Yifan Wang, Jiru Deng et al.
MrT5: Dynamic Token Merging for Efficient Byte-level Language Models
Julie Kallini, Shikhar Murty, Christopher Manning et al.
VideoChat-R1.5: Visual Test-Time Scaling to Reinforce Multimodal Reasoning by Iterative Perception
Ziang Yan, Yinan He, Xinhao Li et al.
OmniPaint: Mastering Object-Oriented Editing via Disentangled Insertion-Removal Inpainting
Yongsheng Yu, Ziyun Zeng, Haitian Zheng et al.
LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization
Jui-Nan Yen, Si Si, Zhao Meng et al.
SwS: Self-aware Weakness-driven Problem Synthesis in Reinforcement Learning for LLM Reasoning
Xiao Liang, Zhong-Zhi Li, Yeyun Gong et al.
Unified Parameter-Efficient Unlearning for LLMs
Chenlu Ding, Jiancan Wu, Yancheng Yuan et al.
How Much is a Noisy Image Worth? Data Scaling Laws for Ambient Diffusion
Giannis Daras, Yeshwanth Cherapanamjeri, Constantinos C Daskalakis
A Unified Comparative Study with Generalized Conformity Scores for Multi-Output Conformal Regression
Victor Dheur, Matteo Fontana, Yorick Estievenart et al.
Signature Kernel Conditional Independence Tests in Causal Discovery for Stochastic Processes
Georg Manten, Cecilia Casolo, Emilio Ferrucci et al.
Federated Unlearning with Gradient Descent and Conflict Mitigation
Zibin Pan, Zhichao Wang, Chi Li et al.
Knowledge Entropy Decay during Language Model Pretraining Hinders New Knowledge Acquisition
Jiyeon Kim, Hyunji Lee, Hyowon Cho et al.
MOS: Model Surgery for Pre-Trained Model-Based Class-Incremental Learning
Hai-Long Sun, Da-Wei Zhou, Hanbin Zhao et al.
Language Models Are Capable of Metacognitive Monitoring and Control of Their Internal Activations
Ji-An Li, Huadong Xiong, Robert Wilson et al.
Let LRMs Break Free from Overthinking via Self-Braking Tuning
Haoran Zhao, Yuchen Yan, Yongliang Shen et al.
Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion
Jiuhai Chen, Jianwei Yang, Haiping Wu et al.
HGSFusion: Radar-Camera Fusion with Hybrid Generation and Synchronization for 3D Object Detection
Zijian Gu, Jianwei Ma, Yan Huang et al.
UniGraspTransformer: Simplified Policy Distillation for Scalable Dexterous Robotic Grasping
Wenbo Wang, Fangyun Wei, Lei Zhou et al.
GGS: Generalizable Gaussian Splatting for Lane Switching in Autonomous Driving
Huasong Han, Kaixuan Zhou, Xiaoxiao Long et al.
Breaking the Batch Barrier (B3) of Contrastive Learning via Smart Batch Mining
Raghuveer Thirukovalluru, Rui Meng, Ye Liu et al.
CARES: Comprehensive Evaluation of Safety and Adversarial Robustness in Medical LLMs
Sijia Chen, Xiaomin Li, Mengxue Zhang et al.
Synthetic-to-Real Self-supervised Robust Depth Estimation via Learning with Motion and Structure Priors
Weilong Yan, Ming Li, Li Haipeng et al.
Provably Accurate Shapley Value Estimation via Leverage Score Sampling
Christopher Musco, R. Teal Witter
ContextGNN: Beyond Two-Tower Recommendation Systems
Yiwen Yuan, Zecheng Zhang, Xinwei He et al.
ROS-SAM: High-Quality Interactive Segmentation for Remote Sensing Moving Object
Zhe Shan, Yang Liu, Lei Zhou et al.
Langevin Soft Actor-Critic: Efficient Exploration through Uncertainty-Driven Critic Learning
Haque Ishfaq, Guangyuan Wang, Sami Islam et al.
Probabilistic Language-Image Pre-Training
Sanghyuk Chun, Wonjae Kim, Song Park et al.
VidComposition: Can MLLMs Analyze Compositions in Compiled Videos?
Yunlong Tang, JunJia Guo, Hang Hua et al.
Re-Thinking Inverse Graphics With Large Language Models
Haiwen Feng, Michael J Black, Weiyang Liu et al.
Motion Prior Knowledge Learning with Homogeneous Language Descriptions for Moving Infrared Small Target Detection
Shengjia Chen, Luping Ji, Weiwei Duan et al.
Mimir: Improving Video Diffusion Models for Precise Text Understanding
Shuai Tan, Biao Gong, Yutong Feng et al.
Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality Inversion
Marco Mistretta, Alberto Baldrati, Lorenzo Agnolucci et al.