Most Cited 2025 "stochastic multi-agent systems" Papers

22,274 papers found • Page 16 of 112

#3001

Provable Convergence and Limitations of Geometric Tempering for Langevin Dynamics

Omar Chehab, Anna Korba, Austin Stromme et al.

ICLR 2025arXiv:2410.09697
9
citations
#3002

On Temperature Scaling and Conformal Prediction of Deep Classifiers

Lahav Dabah, Tom Tirer

ICML 2025arXiv:2402.05806
9
citations
#3003

On the Optimization and Generalization of Two-layer Transformers with Sign Gradient Descent

Bingrui Li, Wei Huang, Andi Han et al.

ICLR 2025arXiv:2410.04870
9
citations
#3004

SceneTAP: Scene-Coherent Typographic Adversarial Planner against Vision-Language Models in Real-World Environments

Yue Cao, Yun Xing, Jie Zhang et al.

CVPR 2025arXiv:2412.00114
9
citations
#3005

FisherTune: Fisher-Guided Robust Tuning of Vision Foundation Models for Domain Generalized Segmentation

Dong Zhao, Jinlong Li, Shuang Wang et al.

CVPR 2025arXiv:2503.17940
9
citations
#3006

WeatherGFM: Learning a Weather Generalist Foundation Model via In-context Learning

Xiangyu Zhao, Zhiwang Zhou, Wenlong Zhang et al.

ICLR 2025oralarXiv:2411.05420
9
citations
#3007

Pre-Training Graph Neural Networks on Molecules by Using Subgraph-Conditioned Graph Information Bottleneck

Van Thuy Hoang, O-Joun Lee

AAAI 2025paperarXiv:2412.15589
9
citations
#3008

Self-attention-based Diffusion Model for Time-series Imputation in Partial Blackout Scenarios

Mohammad Rafid Ul Islam, Prasad Tadepalli, Alan Fern

AAAI 2025paperarXiv:2503.01737
9
citations
#3009

Scaling Laws for Differentially Private Language Models

Ryan McKenna, Yangsibo Huang, Amer Sinha et al.

ICML 2025arXiv:2501.18914
9
citations
#3010

Monet: Mixture of Monosemantic Experts for Transformers

Jungwoo Park, Young Jin Ahn, Kee-Eung Kim et al.

ICLR 2025arXiv:2412.04139
9
citations
#3011

(Mis)Fitting Scaling Laws: A Survey of Scaling Law Fitting Techniques in Deep Learning

Margaret Li, Sneha Kudugunta, Luke Zettlemoyer

ICLR 2025
9
citations
#3012

$\gamma-$MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models

Yaxin Luo, Gen Luo, Jiayi Ji et al.

ICLR 2025
9
citations
#3013

I Can Hear You: Selective Robust Training for Deepfake Audio Detection

Zirui Zhang, Wei Hao, Aroon Sankoh et al.

ICLR 2025arXiv:2411.00121
9
citations
#3014

Video-ColBERT: Contextualized Late Interaction for Text-to-Video Retrieval

Arun Reddy, Alexander Martin, Eugene Yang et al.

CVPR 2025arXiv:2503.19009
9
citations
#3015

Repo2Run: Automated Building Executable Environment for Code Repository at Scale

Ruida Hu, Chao Peng, XinchenWang et al.

NEURIPS 2025spotlightarXiv:2502.13681
9
citations
#3016

OS-ATLAS: Foundation Action Model for Generalist GUI Agents

Zhiyong Wu, Zhenyu Wu, Fangzhi Xu et al.

ICLR 2025
9
citations
#3017

Perplexity Trap: PLM-Based Retrievers Overrate Low Perplexity Documents

Haoyu Wang, Sunhao Dai, Haiyuan Zhao et al.

ICLR 2025arXiv:2503.08684
9
citations
#3018

EmergentTTS-Eval: Evaluating TTS Models on Complex Prosodic, Expressiveness, and Linguistic Challenges Using Model-as-a-Judge

Ruskin Raj Manku, Yuzhi Tang, Xingjian Shi et al.

NEURIPS 2025arXiv:2505.23009
9
citations
#3019

Post-pre-training for Modality Alignment in Vision-Language Foundation Models

Shin'ya Yamaguchi, Dewei Feng, Sekitoshi Kanai et al.

CVPR 2025arXiv:2504.12717
9
citations
#3020

Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks

Miran Heo, Min-Hung Chen, De-An Huang et al.

CVPR 2025arXiv:2501.08326
9
citations
#3021

Flow-Based Policy for Online Reinforcement Learning

Lei Lv, Yunfei Li, Yu Luo et al.

NEURIPS 2025arXiv:2506.12811
9
citations
#3022

Breaking Free from MMI: A New Frontier in Rationalization by Probing Input Utilization

Wei Liu, Zhiying Deng, Zhongyu Niu et al.

ICLR 2025arXiv:2503.06202
9
citations
#3023

Random-Set Neural Networks

Shireen Kudukkil Manchingal, Muhammad Mubashar, Kaizheng Wang et al.

ICLR 2025arXiv:2307.05772
9
citations
#3024

Group Distributionally Robust Dataset Distillation with Risk Minimization

Saeed Vahidian, Mingyu Wang, Jianyang Gu et al.

ICLR 2025arXiv:2402.04676
9
citations
#3025

Constrained Fair and Efficient Allocations

Benjamin Cookson, Soroush Ebadian, Nisarg Shah

AAAI 2025paperarXiv:2411.00133
9
citations
#3026

Synthesizing Privacy-Preserving Text Data via Finetuning *without* Finetuning Billion-Scale LLMs

Bowen Tan, Zheng Xu, Eric Xing et al.

ICML 2025arXiv:2503.12347
9
citations
#3027

SuperPC: A Single Diffusion Model for Point Cloud Completion, Upsampling, Denoising, and Colorization

Yi Du, Zhipeng Zhao, Shaoshu Su et al.

CVPR 2025arXiv:2503.14558
9
citations
#3028

From Poses to Identity: Training-Free Person Re-Identification via Feature Centralization

Chao Yuan, Guiwei Zhang, Changxiao Ma et al.

CVPR 2025arXiv:2503.00938
9
citations
#3029

Accurate and Regret-Aware Numerical Problem Solver for Tabular Question Answering

Yuxiang Wang, Jianzhong Qi, Junhao Gan

AAAI 2025paperarXiv:2410.12846
9
citations
#3030

Lumina-T2X: Scalable Flow-based Large Diffusion Transformer for Flexible Resolution Generation

Gao Peng, Le Zhuo, Dongyang Liu et al.

ICLR 2025oral
9
citations
#3031

K-Sort Arena: Efficient and Reliable Benchmarking for Generative Models via K-wise Human Preferences

Zhikai Li, Xuewen Liu, Dongrong Joe Fu et al.

CVPR 2025arXiv:2408.14468
9
citations
#3032

ScribbleLight: Single Image Indoor Relighting with Scribbles

Jun Myeong Choi, Annie N. Wang, Pieter Peers et al.

CVPR 2025arXiv:2411.17696
9
citations
#3033

MUST: The First Dataset and Unified Framework for Multispectral UAV Single Object Tracking

Haolin Qin, Tingfa Xu, Tianhao Li et al.

CVPR 2025arXiv:2503.17699
9
citations
#3034

BrainUICL: An Unsupervised Individual Continual Learning Framework for EEG Applications

Yangxuan Zhou, Sha Zhao, Jiquan Wang et al.

ICLR 2025
9
citations
#3035

Generalizing from SIMPLE to HARD Visual Reasoning: Can We Mitigate Modality Imbalance in VLMs?

Simon Park, Abhishek Panigrahi, Yun Cheng et al.

ICML 2025arXiv:2501.02669
9
citations
#3036

PlanarSplatting: Accurate Planar Surface Reconstruction in 3 Minutes

Bin Tan, Rui Yu, Yujun Shen et al.

CVPR 2025highlightarXiv:2412.03451
9
citations
#3037

When Do LLMs Help With Node Classification? A Comprehensive Analysis

Xixi Wu, Yifei Shen, Fangzhou Ge et al.

ICML 2025arXiv:2502.00829
9
citations
#3038

Perspective-Invariant 3D Object Detection

Alan Liang, Lingdong Kong, Dongyue Lu et al.

ICCV 2025arXiv:2507.17665
9
citations
#3039

PreciseCam: Precise Camera Control for Text-to-Image Generation

Edurne Bernal-Berdun, Ana Serrano, Belen Masia et al.

CVPR 2025arXiv:2501.12910
9
citations
#3040

Measuring what Matters: Construct Validity in Large Language Model Benchmarks

Andrew M. Bean, Ryan Othniel Kearns, Angelika Romanou et al.

NEURIPS 2025arXiv:2511.04703
9
citations
#3041

Online Video Understanding: OVBench and VideoChat-Online

Zhenpeng Huang, Xinhao Li, Jiaqi Li et al.

CVPR 2025arXiv:2501.00584
9
citations
#3042

Realistic Evaluation of Deep Partial-Label Learning Algorithms

Wei Wang, Dong-Dong Wu, Jindong Wang et al.

ICLR 2025arXiv:2502.10184
9
citations
#3043

Can Classic GNNs Be Strong Baselines for Graph-level Tasks? Simple Architectures Meet Excellence

Yuankai Luo, Lei Shi, Xiao-Ming Wu

ICML 2025arXiv:2502.09263
9
citations
#3044

All-in-One: Transferring Vision Foundation Models into Stereo Matching

Jingyi Zhou, Haoyu Zhang, Jiakang Yuan et al.

AAAI 2025paperarXiv:2412.09912
9
citations
#3045

HyperGLM: HyperGraph for Video Scene Graph Generation and Anticipation

Trong-Thuan Nguyen, Pha Nguyen, Jackson Cothren et al.

CVPR 2025arXiv:2411.18042
9
citations
#3046

Objective drives the consistency of representational similarity across datasets

Laure Ciernik, Lorenz Linhardt, Marco Morik et al.

ICML 2025arXiv:2411.05561
9
citations
#3047

TraF-Align: Trajectory-aware Feature Alignment for Asynchronous Multi-agent Perception

Zhiying Song, Lei Yang, Fuxi Wen et al.

CVPR 2025arXiv:2503.19391
9
citations
#3048

Fast Training of Sinusoidal Neural Fields via Scaling Initialization

Taesun Yeom, Sangyoon Lee, Jaeho Lee

ICLR 2025arXiv:2410.04779
9
citations
#3049

HyperGS: Hyperspectral 3D Gaussian Splatting

Christopher Thirgood, Oscar Mendez, Erin Chao Ling et al.

CVPR 2025arXiv:2412.12849
9
citations
#3050

HybridGS: High-Efficiency Gaussian Splatting Data Compression using Dual-Channel Sparse Representation and Point Cloud Encoder

Qi Yang, Le Yang, Geert Van der Auwera et al.

ICML 2025arXiv:2505.01938
9
citations
#3051

KeyFace: Expressive Audio-Driven Facial Animation for Long Sequences via KeyFrame Interpolation

Antoni Bigata Casademunt, Michał Stypułkowski, Rodrigo Mira et al.

CVPR 2025arXiv:2503.01715
9
citations
#3052

You Only Prune Once: Designing Calibration-Free Model Compression With Policy Learning

Ayan Sengupta, Siddhant Chaudhary, Tanmoy Chakraborty

ICLR 2025arXiv:2501.15296
9
citations
#3053

GIFT: Unlocking Full Potential of Labels in Distilled Dataset at Near-zero Cost

Xinyi Shang, Peng Sun, Tao Lin

ICLR 2025arXiv:2405.14736
9
citations
#3054

Cafe-Talk: Generating 3D Talking Face Animation with Multimodal Coarse- and Fine-grained Control

Hejia Chen, Haoxian Zhang, Shoulong Zhang et al.

ICLR 2025oralarXiv:2503.14517
9
citations
#3055

PDE-Transformer: Efficient and Versatile Transformers for Physics Simulations

Benjamin Holzschuh, Qiang Liu, Georg Kohl et al.

ICML 2025oralarXiv:2505.24717
9
citations
#3056

AlphaZero Neural Scaling and Zipf's Law: a Tale of Board Games and Power Laws

Oren Neumann, Claudius Gros

NEURIPS 2025spotlightarXiv:2412.11979
9
citations
#3057

LeFusion: Controllable Pathology Synthesis via Lesion-Focused Diffusion Models

Hantao Zhang, Yuhe Liu, Jiancheng Yang et al.

ICLR 2025arXiv:2403.14066
9
citations
#3058

IDEATOR: Jailbreaking and Benchmarking Large Vision-Language Models Using Themselves

Ruofan Wang, Juncheng Li, Yixu Wang et al.

ICCV 2025arXiv:2411.00827
9
citations
#3059

HiMoR: Monocular Deformable Gaussian Reconstruction with Hierarchical Motion Representation

Yiming Liang, Tianhan Xu, Yuta Kikuchi

CVPR 2025arXiv:2504.06210
9
citations
#3060

Learning Robust Spectral Dynamics for Temporal Domain Generalization

En Yu, Jie Lu, Xiaoyu Yang et al.

NEURIPS 2025oralarXiv:2505.12585
9
citations
#3061

ND-SDF: Learning Normal Deflection Fields for High-Fidelity Indoor Reconstruction

Ziyu Tang, Weicai Ye, Yifan Wang et al.

ICLR 2025arXiv:2408.12598
9
citations
#3062

Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Generation

Zheng Anlin, Xin Wen, Xuanyang Zhang et al.

NEURIPS 2025
9
citations
#3063

Zero-Shot Styled Text Image Generation, but Make It Autoregressive

Vittorio Pippi, Fabio Quattrini, Silvia Cascianelli et al.

CVPR 2025arXiv:2503.17074
9
citations
#3064

Multi-modal brain encoding models for multi-modal stimuli

SUBBA REDDY OOTA, Khushbu Pahwa, mounika marreddy et al.

ICLR 2025arXiv:2505.20027
9
citations
#3065

Multimodal Latent Diffusion Model for Complex Sewing Pattern Generation

Shengqi Liu, Yuhao Cheng, Zhuo Chen et al.

ICCV 2025arXiv:2412.14453
9
citations
#3066

MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models

Chejian Xu, Jiawei Zhang, Zhaorun Chen et al.

ICLR 2025arXiv:2503.14827
9
citations
#3067

The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability?

Denis Sutter, Julian Minder, Thomas Hofmann et al.

NEURIPS 2025spotlightarXiv:2507.08802
9
citations
#3068

MegActor-Sigma: Unlocking Flexible Mixed-Modal Control in Portrait Animation with Diffusion Transformer

Shurong Yang, Huadong Li, Juhao Wu et al.

AAAI 2025paper
9
citations
#3069

Scaling Collapse Reveals Universal Dynamics in Compute-Optimally Trained Neural Networks

Shikai Qiu, Lechao Xiao, Andrew Wilson et al.

ICML 2025oralarXiv:2507.02119
9
citations
#3070

FixTalk: Taming Identity Leakage for High-Quality Talking Head Generation in Extreme Cases

Shuai Tan, Bill Gong, Bin Ji et al.

ICCV 2025arXiv:2507.01390
9
citations
#3071

UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents

Han Xiao, Guozhi Wang, Yuxiang Chai et al.

NEURIPS 2025arXiv:2505.21496
9
citations
#3072

LITA-GS: Illumination-Agnostic Novel View Synthesis via Reference-Free 3D Gaussian Splatting and Physical Priors

Han Zhou, Wei Dong, Jun Chen

CVPR 2025arXiv:2504.00219
9
citations
#3073

Pareto Low-Rank Adapters: Efficient Multi-Task Learning with Preferences

Nikos Dimitriadis, Pascal Frossard, François Fleuret

ICLR 2025arXiv:2407.08056
9
citations
#3074

Attention-Driven GUI Grounding: Leveraging Pretrained Multimodal Large Language Models Without Fine-Tuning

Hai-Ming Xu, Qi Chen, Lei Wang et al.

AAAI 2025paperarXiv:2412.10840
9
citations
#3075

Do Large Language Models Truly Understand Geometric Structures?

Xiaofeng Wang, Yiming Wang, Wenhong Zhu et al.

ICLR 2025arXiv:2501.13773
9
citations
#3076

Knowledge-Aligned Counterfactual-Enhancement Diffusion Perception for Unsupervised Cross-Domain Visual Emotion Recognition

Wen Yin, Yong Wang, Guiduo Duan et al.

CVPR 2025arXiv:2505.19694
9
citations
#3077

StreamMind: Unlocking Full Frame Rate Streaming Video Dialogue through Event-Gated Cognition

Xin Ding, Hao Wu, Yifan Yang et al.

ICCV 2025arXiv:2503.06220
9
citations
#3078

DreamDistribution: Learning Prompt Distribution for Diverse In-distribution Generation

Brian Nlong Zhao, Yuhang Xiao, Jiashu Xu et al.

ICLR 2025arXiv:2312.14216
9
citations
#3079

Quantization Error Propagation: Revisiting Layer-Wise Post-Training Quantization

Yamato Arai, Yuma Ichikawa

NEURIPS 2025arXiv:2504.09629
9
citations
#3080

Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel

Zun Wang, Jialu Li, Yicong Hong et al.

ICLR 2025arXiv:2412.08467
9
citations
#3081

LIBA: Language Instructed Multi-granularity Bridge Assistant for 3D Visual Grounding

Yuan Wang, Ya-Li Li, W U Eastman Z Y et al.

AAAI 2025paper
9
citations
#3082

QuaDiM: A Conditional Diffusion Model For Quantum State Property Estimation

Yehui Tang, Mabiao Long, Junchi Yan

ICLR 2025
9
citations
#3083

ADIFF: Explaining audio difference using natural language

Soham Deshmukh, Shuo Han, Rita Singh et al.

ICLR 2025arXiv:2502.04476
9
citations
#3084

ICM-Assistant: Instruction-tuning Multimodal Large Language Models for Rule-based Explainable Image Content Moderation

Mengyang Wu, Yuzhi Zhao, Jialun Cao et al.

AAAI 2025paperarXiv:2412.18216
9
citations
#3085

SILO: Solving Inverse Problems with Latent Operators

Ron Raphaeli, Sean Man, Michael Elad

ICCV 2025arXiv:2501.11746
9
citations
#3086

Counterfactual Generative Modeling with Variational Causal Inference

Yulun Wu, Louis McConnell, Claudia Iriondo

ICLR 2025arXiv:2410.12730
9
citations
#3087

Gap-Dependent Bounds for Q-Learning using Reference-Advantage Decomposition

Zhong Zheng, Haochen Zhang, Lingzhou Xue

ICLR 2025arXiv:2410.07574
9
citations
#3088

SAMRefiner: Taming Segment Anything Model for Universal Mask Refinement

Yuqi Lin, Hengjia Li, Wenqi Shao et al.

ICLR 2025arXiv:2502.06756
9
citations
#3089

Denoising as Adaptation: Noise-Space Domain Adaptation for Image Restoration

Kang Liao, Zongsheng Yue, Zhouxia Wang et al.

ICLR 2025arXiv:2406.18516
9
citations
#3090

UnCommon Objects in 3D

Xingchen Liu, Piyush Tayal, Jianyuan Wang et al.

CVPR 2025arXiv:2501.07574
9
citations
#3091

Revealing and Reducing Gender Biases in Vision and Language Assistants (VLAs)

Leander Girrbach, Stephan Alaniz, Yiran Huang et al.

ICLR 2025arXiv:2410.19314
9
citations
#3092

SMARTIES: Spectrum-Aware Multi-Sensor Auto-Encoder for Remote Sensing Images

Gencer Sumbul, Chang Xu, Emanuele Dalsasso et al.

ICCV 2025arXiv:2506.19585
9
citations
#3093

Physical Plausibility-aware Trajectory Prediction via Locomotion Embodiment

Hiromu Taketsugu, Takeru Oba, Takahiro Maeda et al.

CVPR 2025arXiv:2503.17267
9
citations
#3094

Wide Neural Networks Trained with Weight Decay Provably Exhibit Neural Collapse

Arthur Jacot, Peter Súkeník, Zihan Wang et al.

ICLR 2025arXiv:2410.04887
9
citations
#3095

OLinear: A Linear Model for Time Series Forecasting in Orthogonally Transformed Domain

Wenzhen Yue, Yong Liu, Hao Wang et al.

NEURIPS 2025oralarXiv:2505.08550
9
citations
#3096

Enhancing Foundation Models for Time Series Forecasting via Wavelet-based Tokenization

Luca Masserano, Abdul Fatir Ansari, Boran Han et al.

ICML 2025oralarXiv:2412.05244
9
citations
#3097

Diff3DS: Generating View-Consistent 3D Sketch via Differentiable Curve Rendering

Yibo Zhang, Lihong Wang, Changqing Zou et al.

ICLR 2025arXiv:2405.15305
9
citations
#3098

Anyattack: Towards Large-scale Self-supervised Adversarial Attacks on Vision-language Models

Jiaming Zhang, Junhong Ye, Xingjun Ma et al.

CVPR 2025arXiv:2410.05346
9
citations
#3099

GUARDIAN: Safeguarding LLM Multi-Agent Collaborations with Temporal Graph Modeling

Jialong Zhou, Lichao Wang, Xiao Yang

NEURIPS 2025oralarXiv:2505.19234
9
citations
#3100

Relieving Universal Label Noise for Unsupervised Visible-Infrared Person Re-Identification by Inferring from Neighbors

Xiao Teng, Long Lan, Dingyao Chen et al.

AAAI 2025paperarXiv:2412.12220
9
citations
#3101

Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment

Yifan Zhang, Ge Zhang, Yue Wu et al.

ICML 2025arXiv:2410.02197
9
citations
#3102

RHanDS: Refining Malformed Hands for Generated Images with Decoupled Structure and Style Guidance

Chengrui Wang, Pengfei Liu, Min Zhou et al.

AAAI 2025paperarXiv:2404.13984
9
citations
#3103

Dual Prompting Image Restoration with Diffusion Transformers

Dehong Kong, Fan Li, Zhixin Wang et al.

CVPR 2025arXiv:2504.17825
9
citations
#3104

Transition Path Sampling with Improved Off-Policy Training of Diffusion Path Samplers

Kiyoung Seong, Seonghyun Park, Seonghwan Kim et al.

ICLR 2025arXiv:2405.19961
9
citations
#3105

RSafe: Incentivizing proactive reasoning to build robust and adaptive LLM safeguards

jingnan zheng, Xiangtian Ji, Yijun Lu et al.

NEURIPS 2025arXiv:2506.07736
9
citations
#3106

Deep Nonlinear Sufficient Dimension Reduction

Yinfeng Chen, Yuling Jiao, Rui Qiu et al.

NEURIPS 2025
9
citations
#3107

AutoOcc: Automatic Open-Ended Semantic Occupancy Annotation via Vision-Language Guided Gaussian Splatting

Xiaoyu Zhou, Jingqi Wang, Yongtao Wang et al.

ICCV 2025highlightarXiv:2502.04981
9
citations
#3108

How Transformers Learn Structured Data: Insights From Hierarchical Filtering

Jerome Garnier-Brun, Marc Mezard, Emanuele Moscato et al.

ICML 2025arXiv:2408.15138
9
citations
#3109

SAM2Object: Consolidating View Consistency via SAM2 for Zero-Shot 3D Instance Segmentation

Jihuai Zhao, Junbao Zhuo, Jiansheng Chen et al.

CVPR 2025
9
citations
#3110

Fast Summation of Radial Kernels via QMC Slicing

Johannes Hertrich, Tim Jahn, Michael Quellmalz

ICLR 2025arXiv:2410.01316
9
citations
#3111

Diffusion models for Gaussian distributions: Exact solutions and Wasserstein errors

Emile Pierret, Bruno Galerne

ICML 2025arXiv:2405.14250
9
citations
#3112

VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos

Jiashuo Yu, Yue Wu, Meng Chu et al.

ICCV 2025arXiv:2506.10857
9
citations
#3113

HiMTok: Learning Hierarchical Mask Tokens for Image Segmentation with Large Multimodal Model

Tao Wang, Changxu Cheng, Lingfeng Wang et al.

ICCV 2025arXiv:2503.13026
9
citations
#3114

SoundCTM: Unifying Score-based and Consistency Models for Full-band Text-to-Sound Generation

Koichi Saito, Dongjun Kim, Takashi Shibuya et al.

ICLR 2025arXiv:2405.18503
9
citations
#3115

Test-time Adaptation for Cross-modal Retrieval with Query Shift

Haobin Li, Peng Hu, Qianjun Zhang et al.

ICLR 2025arXiv:2410.15624
9
citations
#3116

MLLM-For3D: Adapting Multimodal Large Language Model for 3D Reasoning Segmentation

Jiaxin Huang, Runnan Chen, Ziwen Li et al.

NEURIPS 2025arXiv:2503.18135
9
citations
#3117

ChartMuseum: Testing Visual Reasoning Capabilities of Large Vision-Language Models

Liyan Tang, Grace Kim, Xinyu Zhao et al.

NEURIPS 2025arXiv:2505.13444
9
citations
#3118

Vanish into Thin Air: Cross-prompt Universal Adversarial Attacks for SAM2

Ziqi Zhou, Yifan Hu, Yufei Song et al.

NEURIPS 2025spotlightarXiv:2510.24195
9
citations
#3119

DepthCues: Evaluating Monocular Depth Perception in Large Vision Models

Duolikun Danier, Mehmet Aygun, Changjian Li et al.

CVPR 2025arXiv:2411.17385
9
citations
#3120

UAVScenes: A Multi-Modal Dataset for UAVs

Sijie Wang, Siqi Li, Yawei Zhang et al.

ICCV 2025arXiv:2507.22412
9
citations
#3121

Make Me Happier: Evoking Emotions Through Image Diffusion Models

Qing Lin, Jingfeng Zhang, YEW-SOON ONG et al.

ICCV 2025arXiv:2403.08255
9
citations
#3122

Multi-Focus Image Fusion via Explicit Defocus Blur Modelling

Yuhui Quan, Xi Wan, Zitao Tang et al.

AAAI 2025paper
9
citations
#3123

Beyond the convexity assumption: Realistic tabular data generation under quantifier-free real linear constraints

Mihaela Stoian, Eleonora Giunchiglia

ICLR 2025arXiv:2502.18237
9
citations
#3124

HELM: Hierarchical Encoding for mRNA Language Modeling

Mehdi Yazdani-Jahromi, Mangal Prakash, Tommaso Mansi et al.

ICLR 2025arXiv:2410.12459
9
citations
#3125

Markov Persuasion Processes: Learning to Persuade From Scratch

Francesco Bacchiocchi, Francesco Emanuele Stradi, Matteo Castiglioni et al.

NEURIPS 2025arXiv:2402.03077
9
citations
#3126

MUDDFormer: Breaking Residual Bottlenecks in Transformers via Multiway Dynamic Dense Connections

Da Xiao, Qingye Meng, Shengping Li et al.

ICML 2025arXiv:2502.12170
9
citations
#3127

Mimic In-Context Learning for Multimodal Tasks

Yuchu Jiang, Jiale Fu, chenduo hao et al.

CVPR 2025arXiv:2504.08851
9
citations
#3128

Rapidly Adapting Policies to the Real-World via Simulation-Guided Fine-Tuning

Patrick Yin, Tyler Westenbroek, Ching-An Cheng et al.

ICLR 2025arXiv:2502.02705
9
citations
#3129

MagCache: Fast Video Generation with Magnitude-Aware Cache

Zehong Ma, Longhui Wei, Feng Wang et al.

NEURIPS 2025arXiv:2506.09045
9
citations
#3130

DICE: End-to-end Deformation Capture of Hand-Face Interactions from a Single Image

Qingxuan Wu, Zhiyang Dou, Sirui Xu et al.

ICLR 2025arXiv:2406.17988
9
citations
#3131

Quadratic Gaussian Splatting: High Quality Surface Reconstruction with Second-order Geometric Primitives

ziyu zhang, Binbin Huang, Hanqing Jiang et al.

ICCV 2025arXiv:2411.16392
9
citations
#3132

Quality-Driven Curation of Remote Sensing Vision-Language Data via Learned Scoring Models

Dilxat Muhtar, Enzhuo Zhang, Zhenshi Li et al.

NEURIPS 2025arXiv:2503.00743
9
citations
#3133

VERA: Explainable Video Anomaly Detection via Verbalized Learning of Vision-Language Models

Muchao Ye, Weiyang Liu, Pan He

CVPR 2025arXiv:2412.01095
9
citations
#3134

PMQ-VE: Progressive Multi-Frame Quantization for Video Enhancement

ZhanFeng Feng, Long Peng, Xin Di et al.

NEURIPS 2025oralarXiv:2505.12266
9
citations
#3135

Seurat: From Moving Points to Depth

Seokju Cho, Gabriel Huang, Seungryong Kim et al.

CVPR 2025highlightarXiv:2504.14687
9
citations
#3136

MIB: A Mechanistic Interpretability Benchmark

Aaron Mueller, Atticus Geiger, Sarah Wiegreffe et al.

ICML 2025arXiv:2504.13151
9
citations
#3137

ReQFlow: Rectified Quaternion Flow for Efficient and High-Quality Protein Backbone Generation

Angxiao Yue, Zichong Wang, Hongteng Xu

ICML 2025arXiv:2502.14637
9
citations
#3138

ReVisionLLM: Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos

Tanveer Hannan, Md Mohaiminul Islam, Jindong Gu et al.

CVPR 2025arXiv:2411.14901
9
citations
#3139

MP-SfM: Monocular Surface Priors for Robust Structure-from-Motion

Zador Pataki, Paul-Edouard Sarlin, Johannes Schönberger et al.

CVPR 2025arXiv:2504.20040
9
citations
#3140

PENCIL: Long Thoughts with Short Memory

Chenxiao Yang, Nati Srebro, David McAllester et al.

ICML 2025arXiv:2503.14337
9
citations
#3141

Genesis: Multimodal Driving Scene Generation with Spatio-Temporal and Cross-Modal Consistency

Xiangyu Guo, Zhanqian Wu, Kaixin Xiong et al.

NEURIPS 2025oralarXiv:2506.07497
9
citations
#3142

Self-Improving Language Models for Evolutionary Program Synthesis: A Case Study on ARC-AGI

Julien Pourcel, Cédric Colas, Pierre-Yves Oudeyer

ICML 2025arXiv:2507.14172
9
citations
#3143

BountyBench: Dollar Impact of AI Agent Attackers and Defenders on Real-World Cybersecurity Systems

Andy Zhang, Joey Ji, Celeste Menders et al.

NEURIPS 2025arXiv:2505.15216
9
citations
#3144

PAL: Sample-Efficient Personalized Reward Modeling for Pluralistic Alignment

Daiwei Chen, Yi Chen, Aniket Rege et al.

ICLR 2025
9
citations
#3145

Advancing Myopia To Holism: Fully Contrastive Language-Image Pre-training

Haicheng Wang, Chen Ju, Weixiong Lin et al.

CVPR 2025arXiv:2412.00440
9
citations
#3146

Controllable Generation via Locally Constrained Resampling

Kareem Ahmed, Kai-Wei Chang, Guy Van den Broeck

ICLR 2025arXiv:2410.13111
9
citations
#3147

Accelerating LLM Inference with Lossless Speculative Decoding Algorithms for Heterogeneous Vocabularies

Nadav Timor, Jonathan Mamou, Daniel Korat et al.

ICML 2025oralarXiv:2502.05202
9
citations
#3148

SurFhead: Affine Rig Blending for Geometrically Accurate 2D Gaussian Surfel Head Avatars

Jaeseong Lee, Taewoong Kang, Marcel Buehler et al.

ICLR 2025arXiv:2410.11682
9
citations
#3149

Bayesian Test-Time Adaptation for Vision-Language Models

Lihua Zhou, Mao Ye, Shuaifeng Li et al.

CVPR 2025arXiv:2503.09248
9
citations
#3150

Adversarially Robust Out-of-Distribution Detection Using Lyapunov-Stabilized Embeddings

Hossein Mirzaei Sadeghlou, Mackenzie Mathis

ICLR 2025
9
citations
#3151

Coarse Correspondences Boost Spatial-Temporal Reasoning in Multimodal Language Model

Benlin Liu, Yuhao Dong, Yiqin Wang et al.

CVPR 2025arXiv:2408.00754
9
citations
#3152

On the Training Convergence of Transformers for In-Context Classification of Gaussian Mixtures

Wei Shen, Ruida Zhou, Jing Yang et al.

ICML 2025arXiv:2410.11778
9
citations
#3153

GenHancer: Imperfect Generative Models are Secretly Strong Vision-Centric Enhancers

Shijie Ma, Yuying Ge, Teng Wang et al.

ICCV 2025arXiv:2503.19480
9
citations
#3154

Fourier Sliced-Wasserstein Embedding for Multisets and Measures

Tal Amir, Nadav Dym

ICLR 2025arXiv:2405.16519
9
citations
#3155

SuperMat: Physically Consistent PBR Material Estimation at Interactive Rates

Yijia Hong, Yuan-Chen Guo, Ran Yi et al.

ICCV 2025arXiv:2411.17515
9
citations
#3156

Solving Inequality Proofs with Large Language Models

Jiayi Sheng, Luna Lyu, Jikai Jin et al.

NEURIPS 2025spotlightarXiv:2506.07927
9
citations
#3157

Model Provenance Testing for Large Language Models

Ivica Nikolic, Teodora Baluta, Prateek Saxena

NEURIPS 2025arXiv:2502.00706
9
citations
#3158

DriveEditor: A Unified 3D Information-Guided Framework for Controllable Object Editing in Driving Scenes

Yiyuan Liang, Zhiying Yan, Liqun Chen et al.

AAAI 2025paperarXiv:2412.19458
9
citations
#3159

Learning World Models for Interactive Video Generation

Taiye Chen, Xun Hu, Zihan Ding et al.

NEURIPS 2025oralarXiv:2505.21996
9
citations
#3160

MultiADS: Defect-aware Supervision for Multi-type Anomaly Detection and Segmentation in Zero-Shot Learning

Ylli Sadikaj, Hongkuan Zhou, Lavdim Halilaj et al.

ICCV 2025arXiv:2504.06740
9
citations
#3161

MOVIS: Enhancing Multi-Object Novel View Synthesis for Indoor Scenes

Ruijie Lu, Yixin Chen, Junfeng Ni et al.

CVPR 2025arXiv:2412.11457
9
citations
#3162

ZoRI: Towards Discriminative Zero-Shot Remote Sensing Instance Segmentation

Shiqi Huang, Shuting He, Bihan Wen

AAAI 2025paperarXiv:2412.12798
9
citations
#3163

Universal Video Temporal Grounding with Generative Multi-modal Large Language Models

Zeqian Li, Shangzhe Di, Zhonghua Zhai et al.

NEURIPS 2025oralarXiv:2506.18883
9
citations
#3164

Discrete Codebook World Models for Continuous Control

Aidan Scannell, Mohammadreza Nakhaeinezhadfard, Kalle Kujanpää et al.

ICLR 2025arXiv:2503.00653
9
citations
#3165

DiffCalib: Reformulating Monocular Camera Calibration as Diffusion-Based Dense Incident Map Generation

Xiankang He, Guangkai Xu, Bo Zhang et al.

AAAI 2025paperarXiv:2405.15619
9
citations
#3166

WildFake: A Large-Scale and Hierarchical Dataset for AI-Generated Images Detection

Yan Hong, Jianming Feng, Haoxing Chen et al.

AAAI 2025paper
9
citations
#3167

Aligned Better, Listen Better for Audio-Visual Large Language Models

Yuxin Guo, Shuailei Ma, Shijie Ma et al.

ICLR 2025oralarXiv:2504.02061
9
citations
#3168

Theoretically Grounded Framework for LLM Watermarking: A Distribution-Adaptive Approach

Haiyun He, Yepeng Liu, Ziqiao Wang et al.

NEURIPS 2025arXiv:2410.02890
9
citations
#3169

Energy-based Backdoor Defense Against Federated Graph Learning

Guancheng Wan, Zitong Shi, Wenke Huang et al.

ICLR 2025
9
citations
#3170

Multi-Pair Temporal Sentence Grounding via Multi-Thread Knowledge Transfer Network

Xiang Fang, Wanlong Fang, Changshuo Wang et al.

AAAI 2025paperarXiv:2412.15678
9
citations
#3171

Evaluating Large Language Models through Role-Guide and Self-Reflection: A Comparative Study

Lili Zhao, Yang Wang, Qi Liu et al.

ICLR 2025
9
citations
#3172

TASAR: Transfer-based Attack on Skeletal Action Recognition

Yunfeng Diao, Baiqi Wu, Ruixuan Zhang et al.

ICLR 2025oralarXiv:2409.02483
9
citations
#3173

AgentTTS: Large Language Model Agent for Test-time Compute-optimal Scaling Strategy in Complex Tasks

Fali Wang, Hui Liu, Zhenwei Dai et al.

NEURIPS 2025arXiv:2508.00890
9
citations
#3174

Learning Pattern-Specific Experts for Time Series Forecasting Under Patch-level Distribution Shift

Yanru Sun, Zongxia Xie, Emadeldeen Eldele et al.

NEURIPS 2025oralarXiv:2410.09836
9
citations
#3175

MM-Agent: LLM as Agents for Real-world Mathematical Modeling Problem

Fan LIU, Zherui Yang, Cancheng Liu et al.

NEURIPS 2025arXiv:2505.14148
9
citations
#3176

Solving Video Inverse Problems Using Image Diffusion Models

Taesung Kwon, Jong Chul YE

ICLR 2025oralarXiv:2409.02574
9
citations
#3177

Efficient Alternating Minimization with Applications to Weighted Low Rank Approximation

Zhao Song, Mingquan Ye, Junze Yin et al.

ICLR 2025arXiv:2306.04169
9
citations
#3178

LATINO-PRO: LAtent consisTency INverse sOlver with PRompt Optimization

Alessio Spagnoletti, Jean Prost, Andres Almansa et al.

ICCV 2025arXiv:2503.12615
9
citations
#3179

Queryable Prototype Multiple Instance Learning with Vision-Language Models for Incremental Whole Slide Image Classification

Jiaxiang Gou, Luping Ji, Pei Liu et al.

AAAI 2025paperarXiv:2410.10573
9
citations
#3180

Rethinking Evaluation of Sparse Autoencoders through the Representation of Polysemous Words

Gouki Gouki, Hiroki Furuta, Yusuke Iwasawa et al.

ICLR 2025arXiv:2501.06254
9
citations
#3181

MP-GUI: Modality Perception with MLLMs for GUI Understanding

Ziwei Wang, Weizhi Chen, Leyang Yang et al.

CVPR 2025arXiv:2503.14021
9
citations
#3182

Graph Generative Pre-trained Transformer

Xiaohui Chen, Yinkai Wang, JIAXING HE et al.

ICML 2025arXiv:2501.01073
9
citations
#3183

GIM: A Million-scale Benchmark for Generative Image Manipulation Detection and Localization

Yirui Chen, Xudong Huang, Quan Zhang et al.

AAAI 2025paperarXiv:2406.16531
9
citations
#3184

Correcting Deviations from Normality: A Reformulated Diffusion Model for Multi-Class Unsupervised Anomaly Detection

Farzad Beizaee, Gregory A. Lodygensky, Christian Desrosiers et al.

CVPR 2025arXiv:2503.19357
9
citations
#3185

BadToken: Token-level Backdoor Attacks to Multi-modal Large Language Models

Zenghui Yuan, Jiawen Shi, Pan Zhou et al.

CVPR 2025arXiv:2503.16023
9
citations
#3186

STAA-SNN: Spatial-Temporal Attention Aggregator for Spiking Neural Networks

Tianqing Zhang, Kairong Yu, Xian Zhong et al.

CVPR 2025arXiv:2503.02689
9
citations
#3187

Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models

Zhengfeng Lai, Vasileios Saveris, Chen Chen et al.

ICLR 2025arXiv:2410.02740
9
citations
#3188

Distilling Spectral Graph for Object-Context Aware Open-Vocabulary Semantic Segmentation

Chanyoung Kim, Dayun Ju, Woojung Han et al.

CVPR 2025arXiv:2411.17150
9
citations
#3189

PassionSR: Post-Training Quantization with Adaptive Scale in One-Step Diffusion based Image Super-Resolution

Zhu Li Bo, Jianze Li, Haotong Qin et al.

CVPR 2025arXiv:2411.17106
9
citations
#3190

Deformable Radial Kernel Splatting

Yihua Huang, Mingxian Lin, Yangtian Sun et al.

CVPR 2025arXiv:2412.11752
9
citations
#3191

RMP-SAM: Towards Real-Time Multi-Purpose Segment Anything

Shilin Xu, Haobo Yuan, Qingyu Shi et al.

ICLR 2025arXiv:2401.10228
9
citations
#3192

Guided Diffusion Sampling on Function Spaces with Applications to PDEs

Jiachen Yao, Abbas Mammadov, Julius Berner et al.

NEURIPS 2025arXiv:2505.17004
9
citations
#3193

Fine-Tuning Visual Autogressive Models for Subject-Driven Generation

Jiwoo Chung, Sangeek Hyun, Hyunjun Kim et al.

ICCV 2025
9
citations
#3194

DiffGAD: A Diffusion-based Unsupervised Graph Anomaly Detector

Jinghan Li, Yuan Gao, Jinda Lu et al.

ICLR 2025arXiv:2410.06549
9
citations
#3195

InverseCoder: Self-improving Instruction-Tuned Code LLMs with Inverse-Instruct

Yutong Wu, Di Huang, Wenxuan Shi et al.

AAAI 2025paperarXiv:2407.05700
9
citations
#3196

InPO: Inversion Preference Optimization with Reparametrized DDIM for Efficient Diffusion Model Alignment

Yunhong Lu, Qichao Wang, Hengyuan Cao et al.

CVPR 2025highlightarXiv:2503.18454
9
citations
#3197

Diffusion Tree Sampling: Scalable inference‑time alignment of diffusion models

Vineet Jain, Kusha Sareen, Mohammad Pedramfar et al.

NEURIPS 2025arXiv:2506.20701
9
citations
#3198

Advancing Expert Specialization for Better MoE

Hongcan Guo, Haolang Lu, Guoshun Nan et al.

NEURIPS 2025oralarXiv:2505.22323
9
citations
#3199

Distilling Monocular Foundation Model for Fine-grained Depth Completion

Yingping Liang, Yutao Hu, Wenqi Shao et al.

CVPR 2025arXiv:2503.16970
9
citations
#3200

Multimodal Quantitative Language for Generative Recommendation

Jianyang Zhai, Zi-Feng Mai, Chang-Dong Wang et al.

ICLR 2025arXiv:2504.05314
9
citations