Most Cited 2025 "multiparameter bandit models" Papers

22,274 papers found • Page 22 of 112

#4201

Flat-LoRA: Low-Rank Adaptation over a Flat Loss Landscape

Tao Li, Zhengbao He, Yujun Li et al.

ICML 2025 • arXiv:2409.14396
11 citations
#4202

FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers

Renshan Zhang, Rui Shao, Gongwei Chen et al.

ICCV 2025 • arXiv:2501.16297
11 citations
#4203

Data Synthesis with Diverse Styles for Face Recognition via 3DMM-Guided Diffusion

Yuxi Mi, Zhizhou Zhong, Yuge Huang et al.

CVPR 2025 • arXiv:2504.00430
11 citations
#4204

QLASS: Boosting Language Agent Inference via Q-Guided Stepwise Search

Zongyu Lin, Yao Tang, Xingcheng Yao et al.

ICML 2025 • arXiv:2502.02584
11 citations
#4205

ResCLIP: Residual Attention for Training-free Dense Vision-language Inference

Jinhong Deng, Yuhang Yang, Wen Li et al.

CVPR 2025 • arXiv:2411.15851
11 citations
#4206

Mosaic of Modalities: A Comprehensive Benchmark for Multimodal Graph Learning

Jing Zhu, Yuhang Zhou, Shengyi Qian et al.

CVPR 2025 • arXiv:2406.16321
11 citations
#4207

MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design

Haojie Duanmu, Xiuhong Li, Zhihang Yuan et al.

ICML 2025 • arXiv:2505.05799
11 citations
#4208

FSFM: A Generalizable Face Security Foundation Model via Self-Supervised Facial Representation Learning

Gaojian Wang, Feng Lin, Tong Wu et al.

CVPR 2025 • arXiv:2412.12032
11 citations
#4209

metabench - A Sparse Benchmark of Reasoning and Knowledge in Large Language Models

Alex Kipnis, Konstantinos Voudouris, Luca Schulze Buschoff et al.

ICLR 2025 • arXiv:2407.12844
11 citations
#4210

UrbanLLaVA: A Multi-modal Large Language Model for Urban Intelligence

Jie Feng, Shengyuan Wang, Tianhui Liu et al.

ICCV 2025 • arXiv:2506.23219
11 citations
#4211

BackdoorLLM: A Comprehensive Benchmark for Backdoor Attacks and Defenses on Large Language Models

Yige Li, Hanxun Huang, Yunhan Zhao et al.

NEURIPS 2025 • arXiv:2408.12798
11 citations
#4212

ROADWork: A Dataset and Benchmark for Learning to Recognize, Observe, Analyze and Drive Through Work Zones

Anurag Ghosh, Shen Zheng, Robert Tamburo et al.

ICCV 2025 • arXiv:2406.07661
11 citations
#4213

Detecting Backdoor Samples in Contrastive Language Image Pretraining

Hanxun Huang, Sarah Erfani, Yige Li et al.

ICLR 2025 • arXiv:2502.01385
11 citations
#4214

Dual Process Learning: Controlling Use of In-Context vs. In-Weights Strategies with Weight Forgetting

Suraj Anand, Michael Lepori, Jack Merullo et al.

ICLR 2025 • arXiv:2406.00053
11 citations
#4215

Temporal Fair Division

Benjamin Cookson, Soroush Ebadian, Nisarg Shah

AAAI 2025 • paper • arXiv:2410.23416
11 citations
#4216

The Utility and Complexity of In- and Out-of-Distribution Machine Unlearning

Youssef Allouah, Joshua Kazdan, Rachid Guerraoui et al.

ICLR 2025 • arXiv:2412.09119
11 citations
#4217

Differentiable Optimization of Similarity Scores Between Models and Brains

Nathan Cloos, Moufan Li, Markus Siegel et al.

ICLR 2025 • arXiv:2407.07059
11 citations
#4218

RelitLRM: Generative Relightable Radiance for Large Reconstruction Models

Tianyuan Zhang, Zhengfei Kuang, Haian Jin et al.

ICLR 2025 • arXiv:2410.06231
11 citations
#4219

Understanding Virtual Nodes: Oversquashing and Node Heterogeneity

Joshua Southern, Francesco Di Giovanni, Michael Bronstein et al.

ICLR 2025 • arXiv:2405.13526
11 citations
#4220

HyperFace: Generating Synthetic Face Recognition Datasets by Exploring Face Embedding Hypersphere

Hatef Otroshi Shahreza, Sébastien Marcel

ICLR 2025 • arXiv:2411.08470
11 citations
#4221

Multi-view Reconstruction via SfM-guided Monocular Depth Estimation

Haoyu Guo, He Zhu, Sida Peng et al.

CVPR 2025 • arXiv:2503.14483
11 citations
#4222

LoRA3D: Low-Rank Self-Calibration of 3D Geometric Foundation models

Ziqi Lu, Heng Yang, Danfei Xu et al.

ICLR 2025 • arXiv:2412.07746
11 citations
#4223

Accurate and Regret-Aware Numerical Problem Solver for Tabular Question Answering

Yuxiang Wang, Jianzhong Qi, Junhao Gan

AAAI 2025 • paper • arXiv:2410.12846
11 citations
#4224

DOVE: Efficient One-Step Diffusion Model for Real-World Video Super-Resolution

Zheng Chen, Zichen Zou, Kewei Zhang et al.

NEURIPS 2025 • arXiv:2505.16239
11 citations
#4225

VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step

Hanyang Wang, Fangfu Liu, Jiawei Chi et al.

CVPR 2025 • highlight • arXiv:2504.01956
11 citations
#4226

Why and How LLMs Hallucinate: Connecting the Dots with Subsequence Associations

Yiyou Sun, Yu Gai, Lijie Chen et al.

NEURIPS 2025 • arXiv:2504.12691
11 citations
#4227

Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering

Federico Cocchi, Nicholas Moratelli, Marcella Cornia et al.

CVPR 2025 • arXiv:2411.16863
11 citations
#4228

Adversarial Robustness in Two-Stage Learning-to-Defer: Algorithms and Guarantees

Yannis Montreuil, Axel Carlier, Lai Xing Ng et al.

ICML 2025 • arXiv:2502.01027
11 citations
#4229

Understanding Model Calibration - A gentle introduction and visual exploration of calibration and the expected calibration error (ECE)

Maja Pavlovic

ICLR 2025 • arXiv:2501.19047
11 citations
#4230

MUDDFormer: Breaking Residual Bottlenecks in Transformers via Multiway Dynamic Dense Connections

Da Xiao, Qingye Meng, Shengping Li et al.

ICML 2025 • arXiv:2502.12170
11 citations
#4231

Reviving DSP for Advanced Theorem Proving in the Era of Reasoning Models

Chenrui Cao, Liangcheng Song, Zenan Li et al.

NEURIPS 2025 • arXiv:2506.11487
11 citations
#4232

R-Sparse: Rank-Aware Activation Sparsity for Efficient LLM Inference

Zhenyu Zhang, Zechun Liu, Yuandong Tian et al.

ICLR 2025 • arXiv:2504.19449
11 citations
#4233

Have the VLMs Lost Confidence? A Study of Sycophancy in VLMs

Shuo Li, Tao Ji, Xiaoran Fan et al.

ICLR 2025 • arXiv:2410.11302
11 citations
#4234

Learning Robust Stereo Matching in the Wild with Selective Mixture-of-Experts

Yun Wang, Longguang Wang, Chenghao Zhang et al.

ICCV 2025 • highlight • arXiv:2507.04631
11 citations
#4235

RGBAvatar: Reduced Gaussian Blendshapes for Online Modeling of Head Avatars

Linzhou Li, Yumeng Li, Yanlin Weng et al.

CVPR 2025 • highlight • arXiv:2503.12886
11 citations
#4236

Bridging the User-side Knowledge Gap in Knowledge-aware Recommendations with Large Language Models

Zheng Hu, Zhe Li, Ziyun Jiao et al.

AAAI 2025 • paper • arXiv:2412.13544
11 citations
#4237

Flash-VStream: Efficient Real-Time Understanding for Long Video Streams

Haoji Zhang, Yiqin Wang, Yansong Tang et al.

ICCV 2025 • arXiv:2506.23825
11 citations
#4238

Strategist: Self-improvement of LLM Decision Making via Bi-Level Tree Search

Jonathan Light, Min Cai, Weiqin Chen et al.

ICLR 2025 • arXiv:2408.10635
11 citations
#4239

MPDrive: Improving Spatial Understanding with Marker-Based Prompt Learning for Autonomous Driving

Zhi-Yuan Zhang, Xiaofan Li, Zhihao Xu et al.

CVPR 2025 • highlight • arXiv:2504.00379
11 citations
#4240

Mask-Adapter: The Devil is in the Masks for Open-Vocabulary Segmentation

Yongkang Li, Tianheng Cheng, Bin Feng et al.

CVPR 2025 • arXiv:2412.04533
11 citations
#4241

Hierarchical Mixture of Experts: Generalizable Learning for High-Level Synthesis

Weikai Li, Ding Wang, Zijian Ding et al.

AAAI 2025 • paper • arXiv:2410.19225
11 citations
#4242

BlobGEN-Vid: Compositional Text-to-Video Generation with Blob Video Representations

Weixi Feng, Chao Liu, Sifei Liu et al.

CVPR 2025 • arXiv:2501.07647
11 citations
#4243

Memory Mosaics

Jianyu Zhang, Niklas Nolte, Ranajoy Sadhukhan et al.

ICLR 2025 • arXiv:2405.06394
11 citations
#4244

Generation from Noisy Examples

Ananth Raman, Vinod Raman

ICML 2025 • arXiv:2501.04179
11 citations
#4245

How Expressive are Knowledge Graph Foundation Models?

Xingyue Huang, Pablo Barcelo, Michael Bronstein et al.

ICML 2025 • arXiv:2502.13339
11 citations
#4246

Reasoning as an Adaptive Defense for Safety

Taeyoun Kim, Fahim Tajwar, Aditi Raghunathan et al.

NEURIPS 2025 • arXiv:2507.00971
11 citations
#4247

SePer: Measure Retrieval Utility Through The Lens Of Semantic Perplexity Reduction

Lu Dai, Yijie Xu, Jinhui Ye et al.

ICLR 2025 • arXiv:2503.01478
11 citations
#4248

3DLLM-Mem: Long-Term Spatial-Temporal Memory for Embodied 3D Large Language Model

Wenbo Hu, Yining Hong, Yanjun Wang et al.

NEURIPS 2025 • oral • arXiv:2505.22657
11 citations
#4249

How do Transformers Learn Implicit Reasoning?

Jiaran Ye, Zijun Yao, Zhidian Huang et al.

NEURIPS 2025 • oral • arXiv:2505.23653
11 citations
#4250

X-Fi: A Modality-Invariant Foundation Model for Multimodal Human Sensing

Xinyan Chen, Jianfei Yang

ICLR 2025 • arXiv:2410.10167
11 citations
#4251

Lifting Motion to the 3D World via 2D Diffusion

Jiaman Li, Karen Liu, Jiajun Wu

CVPR 2025 • highlight • arXiv:2411.18808
11 citations
#4252

Efficiently Parameterized Neural Metriplectic Systems

Anthony Gruber, Kookjin Lee, Haksoo Lim et al.

ICLR 2025 • arXiv:2405.16305
11 citations
#4253

On Conformal Isometry of Grid Cells: Learning Distance-Preserving Position Embedding

Dehong Xu, Ruiqi Gao, Wenhao Zhang et al.

ICLR 2025 • arXiv:2405.16865
11 citations
#4254

Jailbreaking as a Reward Misspecification Problem

Zhihui Xie, Jiahui Gao, Lei Li et al.

ICLR 2025 • arXiv:2406.14393
11 citations
#4255

Aux-Think: Exploring Reasoning Strategies for Data-Efficient Vision-Language Navigation

Shuo Wang, Yongcai Wang, Wanting Li et al.

NEURIPS 2025 • arXiv:2505.11886
11 citations
#4256

TIGER: Time-frequency Interleaved Gain Extraction and Reconstruction for Efficient Speech Separation

Mohan Xu, Kai Li, Guo Chen et al.

ICLR 2025 • oral • arXiv:2410.01469
11 citations
#4257

Probing the Latent Hierarchical Structure of Data via Diffusion Models

Antonio Sclocchi, Alessandro Favero, Noam Levi et al.

ICLR 2025 • arXiv:2410.13770
11 citations
#4258

Video Summarization with Large Language Models

Min Jung Lee, Dayoung Gong, Minsu Cho

CVPR 2025 • arXiv:2504.11199
11 citations
#4259

TEASER: Token Enhanced Spatial Modeling for Expressions Reconstruction

Yunfei Liu, Lei Zhu, Lijian Lin et al.

ICLR 2025 • arXiv:2502.10982
11 citations
#4260

The Lighthouse of Language: Enhancing LLM Agents via Critique-Guided Improvement

Ruihan Yang, Fanghua Ye, Jian Li et al.

NEURIPS 2025 • arXiv:2503.16024
11 citations
#4261

SOLAMI: Social Vision-Language-Action Modeling for Immersive Interaction with 3D Autonomous Characters

Jianping Jiang, Weiye Xiao, Zhengyu Lin et al.

CVPR 2025 • arXiv:2412.00174
11 citations
#4262

LiDAR-RT: Gaussian-based Ray Tracing for Dynamic LiDAR Re-simulation

Chenxu Zhou, Lvchang Fu, Sida Peng et al.

CVPR 2025 • arXiv:2412.15199
11 citations
#4263

SILO: Solving Inverse Problems with Latent Operators

Ron Raphaeli, Sean Man, Michael Elad

ICCV 2025 • arXiv:2501.11746
11 citations
#4264

Taming Video Diffusion Prior with Scene-Grounding Guidance for 3D Gaussian Splatting from Sparse Inputs

Yingji Zhong, Zhihao Li, Dave Zhenyu Chen et al.

CVPR 2025 • highlight • arXiv:2503.05082
11 citations
#4265

Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis

Hongkang Li, Songtao Lu, Pin-Yu Chen et al.

ICLR 2025 • arXiv:2410.02167
11 citations
#4266

Optimized Multi-Token Joint Decoding With Auxiliary Model for LLM Inference

Zongyue Qin, Ziniu Hu, Zifan He et al.

ICLR 2025 • arXiv:2407.09722
11 citations
#4267

Revolve: Optimizing AI Systems by Tracking Response Evolution in Textual Optimization

Peiyan Zhang, Haibo Jin, Leyang Hu et al.

ICML 2025 • arXiv:2412.03092
11 citations
#4268

CASE-Bench: Context-Aware SafEty Benchmark for Large Language Models

Guangzhi Sun, Xiao Zhan, Shutong Feng et al.

ICML 2025 • arXiv:2501.14940
11 citations
#4269

Hierarchical Fine-grained Preference Optimization for Physically Plausible Video Generation

Harold Haodong Chen, Haojian Huang, Qifeng Chen et al.

NEURIPS 2025 • oral • arXiv:2508.10858
11 citations
#4270

Open-World Amodal Appearance Completion

Jiayang Ao, Yanbei Jiang, Qiuhong Ke et al.

CVPR 2025 • arXiv:2411.13019
11 citations
#4271

The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve?

Zhenheng Tang, Xiang Liu, Qian Wang et al.

ICLR 2025 • arXiv:2502.17535
11 citations
#4272

PhysCtrl: Generative Physics for Controllable and Physics-Grounded Video Generation

Chen Wang, Chuhao Chen, Yiming Huang et al.

NEURIPS 2025 • oral • arXiv:2509.20358
11 citations
#4273

Dense Policy: Bidirectional Autoregressive Learning of Actions

Yue Su, Xinyu Zhan, Hongjie Fang et al.

ICCV 2025 • arXiv:2503.13217
11 citations
#4274

Stochastic Polyak Step-sizes and Momentum: Convergence Guarantees and Practical Performance

Dimitris Oikonomou, Nicolas Loizou

ICLR 2025 • arXiv:2406.04142
11 citations
#4275

How Much Can We Forget about Data Contamination?

Sebastian Bordt, Suraj Srinivas, Valentyn Boreiko et al.

ICML 2025 • arXiv:2410.03249
11 citations
#4276

PhyMPGN: Physics-encoded Message Passing Graph Network for spatiotemporal PDE systems

Bocheng Zeng, Qi Wang, Mengtao Yan et al.

ICLR 2025 • oral • arXiv:2410.01337
11 citations
#4277

SMamba: Sparse Mamba for Event-based Object Detection

Nan Yang, Yang Wang, Zhanwen Liu et al.

AAAI 2025 • paper • arXiv:2501.11971
11 citations
#4278

Anyprefer: An Agentic Framework for Preference Data Synthesis

Yiyang Zhou, Zhaoyang Wang, Tianle Wang et al.

ICLR 2025 • arXiv:2504.19276
11 citations
#4279

Learning Flow Fields in Attention for Controllable Person Image Generation

Zijian Zhou, Shikun Liu, Xiao Han et al.

CVPR 2025 • arXiv:2412.08486
11 citations
#4280

Glad: A Streaming Scene Generator for Autonomous Driving

Bin Xie, Yingfei Liu, Tiancai Wang et al.

ICLR 2025 • oral • arXiv:2503.00045
11 citations
#4281

Analysis of Linear Mode Connectivity via Permutation-Based Weight Matching: With Insights into Other Permutation Search Methods

Akira Ito, Masanori Yamada, Atsutoshi Kumagai

ICLR 2025 • arXiv:2402.04051
11 citations
#4282

Is this Generated Person Existed in Real-world? Fine-grained Detecting and Calibrating Abnormal Human-body

Zeqing Wang, Qingyang Ma, Wentao Wan et al.

CVPR 2025 • highlight • arXiv:2411.14205
11 citations
#4283

Beyond Graphs: Can Large Language Models Comprehend Hypergraphs?

Yifan Feng, Chengwu Yang, Xingliang Hou et al.

ICLR 2025 • arXiv:2410.10083
11 citations
#4284

Bayesian Test-Time Adaptation for Vision-Language Models

Lihua Zhou, Mao Ye, Shuaifeng Li et al.

CVPR 2025 • arXiv:2503.09248
11 citations
#4285

Rethinking Invariance in In-context Learning

Lizhe Fang, Yifei Wang, Khashayar Gatmiry et al.

ICLR 2025 • arXiv:2505.04994
11 citations
#4286

Improving Your Model Ranking on Chatbot Arena by Vote Rigging

Rui Min, Tianyu Pang, Chao Du et al.

ICML 2025 • arXiv:2501.17858
11 citations
#4287

Sharpness-Aware Minimization Efficiently Selects Flatter Minima Late In Training

Zhanpeng Zhou, Mingze Wang, Yuchen Mao et al.

ICLR 2025 • arXiv:2410.10373
11 citations
#4288

nnWNet: Rethinking the Use of Transformers in Biomedical Image Segmentation and Calling for a Unified Evaluation Benchmark

Yanfeng Zhou, Lingrui Li, Le Lu et al.

CVPR 2025
11 citations
#4289

Contrastive Flow Matching

George Stoica, Vivek Ramanujan, Xiang Fan et al.

ICCV 2025 • arXiv:2506.05350
11 citations
#4290

The Computational Complexity of Circuit Discovery for Inner Interpretability

Federico Adolfi, Martina G. Vilas, Todd Wareham

ICLR 2025 • arXiv:2410.08025
11 citations
#4291

Attention-Driven GUI Grounding: Leveraging Pretrained Multimodal Large Language Models Without Fine-Tuning

Hai-Ming Xu, Qi Chen, Lei Wang et al.

AAAI 2025 • paper • arXiv:2412.10840
11 citations
#4292

COB-GS: Clear Object Boundaries in 3DGS Segmentation Based on Boundary-Adaptive Gaussian Splitting

Jiaxin Zhang, Junjun Jiang, Youyu Chen et al.

CVPR 2025 • arXiv:2503.19443
11 citations
#4293

VCR: A Task for Pixel-Level Complex Reasoning in Vision Language Models via Restoring Occluded Text

Tianyu Zhang, Suyuchen Wang, Lu Li et al.

ICLR 2025 • arXiv:2406.06462
11 citations
#4294

I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models

Zhenxing Mi, Kuan-Chieh Wang, Guocheng Qian et al.

ICML 2025 • arXiv:2502.10458
11 citations
#4295

CellFlux: Simulating Cellular Morphology Changes via Flow Matching

Yuhui Zhang, Yuchang Su, Chenyu Wang et al.

ICML 2025 • arXiv:2502.09775
11 citations
#4296

6DGS: Enhanced Direction-Aware Gaussian Splatting for Volumetric Rendering

Zhongpai Gao, Benjamin Planche, Meng Zheng et al.

ICLR 2025 • arXiv:2410.04974
11 citations
#4297

Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning

Jiwon Song, Dongwon Jo, Yulhwa Kim et al.

NEURIPS 2025 • arXiv:2505.13866
11 citations
#4298

ViSAGe: Video-to-Spatial Audio Generation

Jaeyeon Kim, Heeseung Yun, Gunhee Kim

ICLR 2025 • oral • arXiv:2506.12199
11 citations
#4299

Revisiting Energy Based Models as Policies: Ranking Noise Contrastive Estimation and Interpolating Energy Models

Sumeet Singh, Vikas Sindhwani, Stephen Tu

ICLR 2025 • arXiv:2309.05803
11 citations
#4300

CoRe: Context-Regularized Text Embedding Learning for Text-to-Image Personalization

Feize Wu, Yun Pang, Junyi Zhang et al.

AAAI 2025 • paper • arXiv:2408.15914
11 citations
#4301

Layered Image Vectorization via Semantic Simplification

Zhenyu Wang, Jianxi Huang, Zhida Sun et al.

CVPR 2025 • arXiv:2406.05404
11 citations
#4302

Task-Agnostic Guided Feature Expansion for Class-Incremental Learning

Bowen Zheng, Da-Wei Zhou, Han-Jia Ye et al.

CVPR 2025 • arXiv:2503.00823
11 citations
#4303

Identifying and Mitigating Position Bias of Multi-image Vision-Language Models

Xinyu Tian, Shu Zou, Zhaoyuan Yang et al.

CVPR 2025 • arXiv:2503.13792
11 citations
#4304

Neural Sampling from Boltzmann Densities: Fisher-Rao Curves in the Wasserstein Geometry

Jannis Chemseddine, Christian Wald, Richard Duong et al.

ICLR 2025 • arXiv:2410.03282
11 citations
#4305

GraphGPT: Generative Pre-trained Graph Eulerian Transformer

Qifang Zhao, Weidong Ren, Tianyu Li et al.

ICML 2025 • arXiv:2401.00529
11 citations
#4306

Improving the Scaling Laws of Synthetic Data with Deliberate Practice

Reyhane Askari Hemmat, Mohammad Pezeshki, Elvis Dohmatob et al.

ICML 2025 • oral • arXiv:2502.15588
11 citations
#4307

DisEnvisioner: Disentangled and Enriched Visual Prompt for Customized Image Generation

Jing He, Haodong Li, huyongzhe et al.

ICLR 2025 • arXiv:2410.02067
11 citations
#4308

Rectifying Magnitude Neglect in Linear Attention

Qihang Fan, Huaibo Huang, Yuang Ai et al.

ICCV 2025 • highlight • arXiv:2507.00698
11 citations
#4309

Rectified Diffusion Guidance for Conditional Generation

Mengfei Xia, Nan Xue, Yujun Shen et al.

CVPR 2025 • arXiv:2410.18737
11 citations
#4310

Scaling Trends in Language Model Robustness

Nikolaus Howe, Ian McKenzie, Oskar Hollinsworth et al.

ICML 2025 • spotlight • arXiv:2407.18213
11 citations
#4311

Can Transformers Reason Logically? A Study in SAT Solving

Leyan Pan, Vijay Ganesh, Jacob Abernethy et al.

ICML 2025 • arXiv:2410.07432
11 citations
#4312

UAV-Flow Colosseo: A Real-World Benchmark for Flying-on-a-Word UAV Imitation Learning

Xiangyu Wang, Donglin Yang, Yue Liao et al.

NEURIPS 2025 • arXiv:2505.15725
11 citations
#4313

XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning

Alexander Nikulin, Ilya Zisman, Alexey Zemtsov et al.

ICLR 2025 • arXiv:2406.08973
11 citations
#4314

RFWave: Multi-band Rectified Flow for Audio Waveform Reconstruction

Peng Liu, Dongyang Dai, Zhiyong Wu

ICLR 2025 • arXiv:2403.05010
11 citations
#4315

TopoTune: A Framework for Generalized Combinatorial Complex Neural Networks

Mathilde Papillon, Guillermo Bernardez, Claudio Battiloro et al.

ICML 2025 • arXiv:2410.06530
11 citations
#4316

GUI-Reflection: Empowering Multimodal GUI Models with Self-Reflection Behavior

Penghao Wu, Shengnan Ma, Bo Wang et al.

NEURIPS 2025 • arXiv:2506.08012
11 citations
#4317

RiOSWorld: Benchmarking the Risk of Multimodal Computer-Use Agents

Jingyi Yang, Shuai Shao, Dongrui Liu et al.

NEURIPS 2025 • arXiv:2506.00618
11 citations
#4318

Towards Precise Scaling Laws for Video Diffusion Transformers

Yuanyang Yin, Yaqi Zhao, Mingwu Zheng et al.

CVPR 2025 • arXiv:2411.17470
11 citations
#4319

TweedieMix: Improving Multi-Concept Fusion for Diffusion-based Image/Video Generation

Gihyun Kwon, Jong Chul Ye

ICLR 2025 • arXiv:2410.05591
11 citations
#4320

Reward-Instruct: A Reward-Centric Approach to Fast Photo-Realistic Image Generation

Yihong Luo, Tianyang Hu, Weijian Luo et al.

NEURIPS 2025 • arXiv:2503.13070
11 citations
#4321

SceneWeaver: All-in-One 3D Scene Synthesis with an Extensible and Self-Reflective Agent

Yandan Yang, Baoxiong Jia, Shujie Zhang et al.

NEURIPS 2025 • arXiv:2509.20414
11 citations
#4322

Point2RBox-v2: Rethinking Point-supervised Oriented Object Detection with Spatial Layout Among Instances

Yi Yu, Botao Ren, Peiyuan Zhang et al.

CVPR 2025 • arXiv:2502.04268
11 citations
#4323

Reasoning-as-Logic-Units: Scaling Test-Time Reasoning in Large Language Models Through Logic Unit Alignment

Cheryl Li, Tianyuan Xu, Yiwen Guo

ICML 2025 • arXiv:2502.07803
11 citations
#4324

VideoUFO: A Million-Scale User-Focused Dataset for Text-to-Video Generation

Wenhao Wang, Yi Yang

NEURIPS 2025 • arXiv:2503.01739
11 citations
#4325

SF2T: Self-supervised Fragment Finetuning of Video-LLMs for Fine-Grained Understanding

Yangliu Hu, Zikai Song, Na Feng et al.

CVPR 2025 • arXiv:2504.07745
11 citations
#4326

MP-GUI: Modality Perception with MLLMs for GUI Understanding

Ziwei Wang, Weizhi Chen, Leyang Yang et al.

CVPR 2025 • arXiv:2503.14021
11 citations
#4327

Hierarchical Equivariant Policy via Frame Transfer

Haibo Zhao, Dian Wang, Yizhe Zhu et al.

ICML 2025 • arXiv:2502.05728
11 citations
#4328

Random-Set Neural Networks

Shireen Kudukkil Manchingal, Muhammad Mubashar, Kaizheng Wang et al.

ICLR 2025 • arXiv:2307.05772
11 citations
#4329

Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis

Bingda Tang, Sayak Paul, Boyang Zheng et al.

CVPR 2025 • arXiv:2505.10046
11 citations
#4330

Adaptive Self-improvement LLM Agentic System for ML Library Development

Genghan Zhang, Weixin Liang, Olivia Hsu et al.

ICML 2025 • arXiv:2502.02534
11 citations
#4331

QERA: an Analytical Framework for Quantization Error Reconstruction

Cheng Zhang, Jeffrey T. H. Wong, Can Xiao et al.

ICLR 2025 • arXiv:2410.06040
11 citations
#4332

Lossy Compression with Pretrained Diffusion Models

Jeremy Vonderfecht, Feng Liu

ICLR 2025 • arXiv:2501.09815
11 citations
#4333

DSP: Dynamic Sequence Parallelism for Multi-Dimensional Transformers

Xuanlei Zhao, Shenggan Cheng, Chang Chen et al.

ICML 2025 • arXiv:2403.10266
11 citations
#4334

Don’t Think Longer, Think Wisely: Optimizing Thinking Dynamics for Large Reasoning Models

Sohyun An, Ruochen Wang, Tianyi Zhou et al.

NEURIPS 2025
11 citations
#4335

ASAP: Advancing Semantic Alignment Promotes Multi-Modal Manipulation Detecting and Grounding

Zhenxing Zhang, Yaxiong Wang, Lechao Cheng et al.

CVPR 2025 • arXiv:2412.12718
11 citations
#4336

Using Diffusion Priors for Video Amodal Segmentation

Kaihua Chen, Deva Ramanan, Tarasha Khurana

CVPR 2025 • arXiv:2412.04623
11 citations
#4337

Puzzle: Distillation-Based NAS for Inference-Optimized LLMs

Akhiad Bercovich, Tomer Ronen, Talor Abramovich et al.

ICML 2025 • arXiv:2411.19146
11 citations
#4338

Retrieval-Augmented Perception: High-resolution Image Perception Meets Visual RAG

Wenbin Wang, Yongcheng Jing, Liang Ding et al.

ICML 2025 • oral • arXiv:2503.01222
11 citations
#4339

CodeSteer: Symbolic-Augmented Language Models via Code/Text Guidance

Yongchao Chen, Yilun Hao, Yueying Liu et al.

ICML 2025 • arXiv:2502.04350
11 citations
#4340

MAPLE: Many-Shot Adaptive Pseudo-Labeling for In-Context Learning

Zihan Chen, Song Wang, Zhen Tan et al.

ICML 2025 • arXiv:2505.16225
11 citations
#4341

Exact Byte-Level Probabilities from Tokenized Language Models for FIM-Tasks and Model Ensembles

Buu Phan, Brandon Amos, Itai Gat et al.

ICLR 2025 • arXiv:2410.09303
11 citations
#4342

Dual Consolidation for Pre-Trained Model-Based Domain-Incremental Learning

Da-Wei Zhou, Zi-Wen Cai, Han-Jia Ye et al.

CVPR 2025 • arXiv:2410.00911
11 citations
#4343

DOTA: Distributional Test-time Adaptation of Vision-Language Models

Zongbo Han, Jialong Yang, Guangyu Wang et al.

NEURIPS 2025 • arXiv:2409.19375
11 citations
#4344

TOP-ERL: Transformer-based Off-Policy Episodic Reinforcement Learning

Ge Li, Dong Tian, Hongyi Zhou et al.

ICLR 2025 • oral • arXiv:2410.09536
11 citations
#4345

ConVis: Contrastive Decoding with Hallucination Visualization for Mitigating Hallucinations in Multimodal Large Language Models

Yeji Park, Deokyeong Lee, Junsuk Choe et al.

AAAI 2025 • paper • arXiv:2408.13906
11 citations
#4346

HEIE: MLLM-Based Hierarchical Explainable AIGC Image Implausibility Evaluator

Fan Yang, Ru Zhen, Jianing Wang et al.

CVPR 2025 • arXiv:2411.17261
11 citations
#4347

MM-CamObj: A Comprehensive Multimodal Dataset for Camouflaged Object Scenarios

Jiacheng Ruan, Wenzhen Yuan, Zehao Lin et al.

AAAI 2025 • paper • arXiv:2409.16084
11 citations
#4348

EmergentTTS-Eval: Evaluating TTS Models on Complex Prosodic, Expressiveness, and Linguistic Challenges Using Model-as-a-Judge

Ruskin Raj Manku, Yuzhi Tang, Xingjian Shi et al.

NEURIPS 2025 • arXiv:2505.23009
11 citations
#4349

xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference

Maximilian Beck, Korbinian Pöppel, Phillip Lippe et al.

ICML 2025 • arXiv:2503.13427
11 citations
#4350

TaoAvatar: Real-Time Lifelike Full-Body Talking Avatars for Augmented Reality via 3D Gaussian Splatting

Jianchuan Chen, Jingchuan Hu, Gaige Wang et al.

CVPR 2025 • highlight • arXiv:2503.17032
11 citations
#4351

BlenderGym: Benchmarking Foundational Model Systems for Graphics Editing

Yunqi Gu, Ian Huang, Jihyeon Je et al.

CVPR 2025 • highlight • arXiv:2504.01786
11 citations
#4352

FEAT: Free energy Estimators with Adaptive Transport

Yuanqi Du, Jiajun He, Francisco Vargas et al.

NEURIPS 2025 • arXiv:2504.11516
11 citations
#4353

3DGraphLLM: Combining Semantic Graphs and Large Language Models for 3D Scene Understanding

Tatiana Zemskova, Dmitry Yudin

ICCV 2025 • arXiv:2412.18450
11 citations
#4354

Addax: Utilizing Zeroth-Order Gradients to Improve Memory Efficiency and Performance of SGD for Fine-Tuning Language Models

Zeman Li, Xinwei Zhang, Peilin Zhong et al.

ICLR 2025 • arXiv:2410.06441
11 citations
#4355

Science-T2I: Addressing Scientific Illusions in Image Synthesis

Jialuo Li, Wenhao Chai, Xingyu Fu et al.

CVPR 2025 • arXiv:2504.13129
11 citations
#4356

Privacy-Preserving Low-Rank Adaptation Against Membership Inference Attacks for Latent Diffusion Models

Zihao Luo, Xilie Xu, Feng Liu et al.

AAAI 2025 • paper • arXiv:2402.11989
11 citations
#4357

MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models

Chejian Xu, Jiawei Zhang, Zhaorun Chen et al.

ICLR 2025 • arXiv:2503.14827
11 citations
#4358

Sports-Traj: A Unified Trajectory Generation Model for Multi-Agent Movement in Sports

Yi Xu, Yun Fu

ICLR 2025 • oral • arXiv:2405.17680
11 citations
#4359

Open-World Reinforcement Learning over Long Short-Term Imagination

Jiajian Li, Qi Wang, Yunbo Wang et al.

ICLR 2025 • arXiv:2410.03618
11 citations
#4360

Diff-Shadow: Global-guided Diffusion Model for Shadow Removal

Jinting Luo, Ru Li, Chengzhi Jiang et al.

AAAI 2025 • paper • arXiv:2407.16214
11 citations
#4361

Topological Blindspots: Understanding and Extending Topological Deep Learning Through the Lens of Expressivity

Yam Eitan, Yoav Gelberg, Guy Bar-Shalom et al.

ICLR 2025 • arXiv:2408.05486
11 citations
#4362

Fast training and sampling of Restricted Boltzmann Machines

Nicolas Bereux, Aurélien Decelle, Cyril Furtlehner et al.

ICLR 2025 • arXiv:2405.15376
11 citations
#4363

ExpertAF: Expert Actionable Feedback from Video

Kumar Ashutosh, Tushar Nagarajan, Georgios Pavlakos et al.

CVPR 2025 • arXiv:2408.00672
11 citations
#4364

A Large Recurrent Action Model: xLSTM enables Fast Inference for Robotics Tasks

Thomas Schmied, Thomas Adler, Vihang Patil et al.

ICML 2025 • arXiv:2410.22391
11 citations
#4365

One Node One Model: Featuring the Missing-Half for Graph Clustering

Xuanting Xie, Bingheng Li, Erlin Pan et al.

AAAI 2025 • paper • arXiv:2412.09902
11 citations
#4366

VladVA: Discriminative Fine-tuning of LVLMs

Yassine Ouali, Adrian Bulat, Alexandros Xenos et al.

CVPR 2025 • arXiv:2412.04378
11 citations
#4367

Pareto Low-Rank Adapters: Efficient Multi-Task Learning with Preferences

Nikos Dimitriadis, Pascal Frossard, François Fleuret

ICLR 2025 • arXiv:2407.08056
11 citations
#4368

PLeaS - Merging Models with Permutations and Least Squares

Anshul Nasery, Jonathan Hayase, Pang Wei Koh et al.

CVPR 2025 • arXiv:2407.02447
11 citations
#4369

ConTextTab: A Semantics-Aware Tabular In-Context Learner

Marco Spinaci, Marek Polewczyk, Maximilian Schambach et al.

NEURIPS 2025 • spotlight • arXiv:2506.10707
11 citations
#4370

SRSA: Skill Retrieval and Adaptation for Robotic Assembly Tasks

Yijie Guo, Bingjie Tang, Iretiayo Akinola et al.

ICLR 2025 • arXiv:2503.04538
11 citations
#4371

ArtFormer: Controllable Generation of Diverse 3D Articulated Objects

Jiayi Su, Youhe Feng, Zheng Li et al.

CVPR 2025 • arXiv:2412.07237
11 citations
#4372

Whole-Body Conditioned Egocentric Video Prediction

Yutong Bai, Danny Tran, Amir Bar et al.

NEURIPS 2025 • arXiv:2506.21552
11 citations
#4373

IgGM: A Generative Model for Functional Antibody and Nanobody Design

Rubo Wang, Fandi Wu, Xingyu Gao et al.

ICLR 2025
11 citations
#4374

Locality in Image Diffusion Models Emerges from Data Statistics

Artem Lukoianov, Chenyang Yuan, Justin Solomon et al.

NEURIPS 2025 • spotlight • arXiv:2509.09672
11 citations
#4375

Scaling Embedding Layers in Language Models

Da Yu, Edith Cohen, Badih Ghazi et al.

NEURIPS 2025 • arXiv:2502.01637
11 citations
#4376

MambaPEFT: Exploring Parameter-Efficient Fine-Tuning for Mamba

Masakazu Yoshimura, Teruaki Hayashi, Yota Maeda

ICLR 2025 • arXiv:2411.03855
11 citations
#4377

TopoCellGen: Generating Histopathology Cell Topology with a Diffusion Model

Meilong Xu, Saumya Gupta, Xiaoling Hu et al.

CVPR 2025 • arXiv:2412.06011
11 citations
#4378

Does Generation Require Memorization? Creative Diffusion Models using Ambient Diffusion

Kulin Shah, Alkis Kalavasis, Adam Klivans et al.

ICML 2025 • arXiv:2502.21278
11 citations
#4379

Visual-Instructed Degradation Diffusion for All-in-One Image Restoration

Haina Qin, Wenyang Luo, Zewen Chen et al.

CVPR 2025 • arXiv:2506.16960
11 citations
#4380

Enhancing Foundation Models for Time Series Forecasting via Wavelet-based Tokenization

Luca Masserano, Abdul Fatir Ansari, Boran Han et al.

ICML 2025 • oral • arXiv:2412.05244
11 citations
#4381

Activation Gradient based Poisoned Sample Detection Against Backdoor Attacks

Danni Yuan, Mingda Zhang, Shaokui Wei et al.

ICLR 2025 • arXiv:2312.06230
11 citations
#4382

On the Crucial Role of Initialization for Matrix Factorization

Bingcong Li, Liang Zhang, Aryan Mokhtari et al.

ICLR 2025 • arXiv:2410.18965
11 citations
#4383

MLPs Learn In-Context on Regression and Classification Tasks

William Tong, Cengiz Pehlevan

ICLR 2025 • arXiv:2405.15618
11 citations
#4384

Harnessing Massive Satellite Imagery with Efficient Masked Image Modeling

Fengxiang Wang, Hongzhen Wang, Di Wang et al.

ICCV 2025 • arXiv:2406.11933
11 citations
#4385

Attention layers provably solve single-location regression

Pierre Marion, Raphaël Berthier, Gérard Biau et al.

ICLR 2025 • arXiv:2410.01537
11 citations
#4386

SpatialCLIP: Learning 3D-aware Image Representations from Spatially Discriminative Language

Zehan Wang, Sashuai Zhou, Shaoxuan He et al.

CVPR 2025
11 citations
#4387

Statistical Advantages of Perturbing Cosine Router in Mixture of Experts

Huy Nguyen, Pedram Akbarian Saravi, Trang Pham et al.

ICLR 2025 • arXiv:2405.14131
11 citations
#4388

Locality Alignment Improves Vision-Language Models

Ian Covert, Tony Sun, James Y Zou et al.

ICLR 2025 • arXiv:2410.11087
11 citations
#4389

Semantic and Sequential Alignment for Referring Video Object Segmentation

Feiyu Pan, Hao Fang, Fangkai Li et al.

CVPR 2025
11 citations
#4390

StyO: Stylize Your Face in Only One-Shot

Bonan Li, Zicheng Zhang, Xuecheng Nie et al.

AAAI 2025 • paper • arXiv:2303.03231
11 citations
#4391

Evaluating Image Hallucination in Text-to-Image Generation with Question-Answering

Youngsun Lim, Hojun Choi, Hyunjung Shim

AAAI 2025 • paper • arXiv:2409.12784
11 citations
#4392

EventGPT: Event Stream Understanding with Multimodal Large Language Models

Shaoyu Liu, Jianing Li, Guanghui Zhao et al.

CVPR 2025 • arXiv:2412.00832
11 citations
#4393

Tri-Ergon: Fine-Grained Video-to-Audio Generation with Multi-Modal Conditions and LUFS Control

Bingliang Li, Fengyu Yang, Yuxin Mao et al.

AAAI 2025 • paper • arXiv:2412.20378
11 citations
#4394

DELT: A Simple Diversity-driven EarlyLate Training for Dataset Distillation

Zhiqiang Shen, Ammar Sherif, Zeyuan Yin et al.

CVPR 2025 • arXiv:2411.19946
11 citations
#4395

CSBrain: A Cross-scale Spatiotemporal Brain Foundation Model for EEG Decoding

Yuchen Zhou, Jiamin Wu, Zichen Ren et al.

NEURIPS 2025 • oral • arXiv:2506.23075
11 citations
#4396

SuperDec: 3D Scene Decomposition with Superquadrics Primitives

Elisabetta Fedele, Boyang Sun, Francis Engelmann et al.

ICCV 2025 • arXiv:2504.00992
11 citations
#4397

Learning Robust Spectral Dynamics for Temporal Domain Generalization

En Yu, Jie Lu, Xiaoyu Yang et al.

NEURIPS 2025 • oral • arXiv:2505.12585
11 citations
#4398

h4rm3l: A Language for Composable Jailbreak Attack Synthesis

Moussa Koulako Bala Doumbouya, Ananjan Nandi, Gabriel Poesia et al.

ICLR 2025 • arXiv:2408.04811
11 citations
#4399

KnowPO: Knowledge-Aware Preference Optimization for Controllable Knowledge Selection in Retrieval-Augmented Language Models

Ruizhe Zhang, Yongxin Xu, Yuzhen Xiao et al.

AAAI 2025 • paper • arXiv:2408.03297
11 citations
#4400

MoDiTalker: Motion-Disentangled Diffusion Model for High-Fidelity Talking Head Generation

Seyeon Kim, Siyoon Jin, Jihye Park et al.

AAAI 2025 • paper • arXiv:2403.19144
11 citations