Most Cited 2025 "cascading phenomena" Papers

22,274 papers found • Page 22 of 112

#4201

Flat-LoRA: Low-Rank Adaptation over a Flat Loss Landscape

Tao Li, Zhengbao He, Yujun Li et al.

ICML 2025 • arXiv:2409.14396

#4202

FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers

Renshan Zhang, Rui Shao, Gongwei Chen et al.

ICCV 2025 • arXiv:2501.16297

#4203

Data Synthesis with Diverse Styles for Face Recognition via 3DMM-Guided Diffusion

Yuxi Mi, Zhizhou Zhong, Yuge Huang et al.

CVPR 2025 • arXiv:2504.00430

#4204

QLASS: Boosting Language Agent Inference via Q-Guided Stepwise Search

Zongyu Lin, Yao Tang, Xingcheng Yao et al.

ICML 2025 • arXiv:2502.02584

#4205

ResCLIP: Residual Attention for Training-free Dense Vision-language Inference

Jinhong Deng, Yuhang Yang, Wen Li et al.

CVPR 2025 • arXiv:2411.15851

#4206

Mosaic of Modalities: A Comprehensive Benchmark for Multimodal Graph Learning

Jing Zhu, Yuhang Zhou, Shengyi Qian et al.

CVPR 2025 • arXiv:2406.16321

#4207

MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design

Haojie Duanmu, Xiuhong Li, Zhihang Yuan et al.

ICML 2025 • arXiv:2505.05799

#4208

FSFM: A Generalizable Face Security Foundation Model via Self-Supervised Facial Representation Learning

Gaojian Wang, Feng Lin, Tong Wu et al.

CVPR 2025 • arXiv:2412.12032

#4209

metabench - A Sparse Benchmark of Reasoning and Knowledge in Large Language Models

Alex Kipnis, Konstantinos Voudouris, Luca Schulze Buschoff et al.

ICLR 2025 • arXiv:2407.12844

#4210

UrbanLLaVA: A Multi-modal Large Language Model for Urban Intelligence

Jie Feng, Shengyuan Wang, Tianhui Liu et al.

ICCV 2025 • arXiv:2506.23219

#4211

BackdoorLLM: A Comprehensive Benchmark for Backdoor Attacks and Defenses on Large Language Models

Yige Li, Hanxun Huang, Yunhan Zhao et al.

NEURIPS 2025 • arXiv:2408.12798

#4212

ROADWork: A Dataset and Benchmark for Learning to Recognize, Observe, Analyze and Drive Through Work Zones

Anurag Ghosh, Shen Zheng, Robert Tamburo et al.

ICCV 2025 • arXiv:2406.07661

#4213

Detecting Backdoor Samples in Contrastive Language Image Pretraining

Hanxun Huang, Sarah Erfani, Yige Li et al.

ICLR 2025 • arXiv:2502.01385

#4214

Dual Process Learning: Controlling Use of In-Context vs. In-Weights Strategies with Weight Forgetting

Suraj Anand, Michael Lepori, Jack Merullo et al.

ICLR 2025 • arXiv:2406.00053

#4215

Temporal Fair Division

Benjamin Cookson, Soroush Ebadian, Nisarg Shah

AAAI 2025 • paper • arXiv:2410.23416

#4216

The Utility and Complexity of In- and Out-of-Distribution Machine Unlearning

Youssef Allouah, Joshua Kazdan, Rachid Guerraoui et al.

ICLR 2025 • arXiv:2412.09119

#4217

Differentiable Optimization of Similarity Scores Between Models and Brains

Nathan Cloos, Moufan Li, Markus Siegel et al.

ICLR 2025 • arXiv:2407.07059

#4218

RelitLRM: Generative Relightable Radiance for Large Reconstruction Models

Tianyuan Zhang, Zhengfei Kuang, Haian Jin et al.

ICLR 2025 • arXiv:2410.06231

#4219

Understanding Virtual Nodes: Oversquashing and Node Heterogeneity

Joshua Southern, Francesco Di Giovanni, Michael Bronstein et al.

ICLR 2025 • arXiv:2405.13526

#4220

HyperFace: Generating Synthetic Face Recognition Datasets by Exploring Face Embedding Hypersphere

Hatef Otroshi Shahreza, Sébastien Marcel

ICLR 2025 • arXiv:2411.08470

#4221

Multi-view Reconstruction via SfM-guided Monocular Depth Estimation

Haoyu Guo, He Zhu, Sida Peng et al.

CVPR 2025 • arXiv:2503.14483

#4222

LoRA3D: Low-Rank Self-Calibration of 3D Geometric Foundation models

Ziqi Lu, Heng Yang, Danfei Xu et al.

ICLR 2025 • arXiv:2412.07746

#4223

Accurate and Regret-Aware Numerical Problem Solver for Tabular Question Answering

Yuxiang Wang, Jianzhong Qi, Junhao Gan

AAAI 2025 • paper • arXiv:2410.12846

#4224

DOVE: Efficient One-Step Diffusion Model for Real-World Video Super-Resolution

Zheng Chen, Zichen Zou, Kewei Zhang et al.

NEURIPS 2025 • arXiv:2505.16239

#4225

VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step

Hanyang Wang, Fangfu Liu, Jiawei Chi et al.

CVPR 2025 • highlight • arXiv:2504.01956

#4226

Why and How LLMs Hallucinate: Connecting the Dots with Subsequence Associations

Yiyou Sun, Yu Gai, Lijie Chen et al.

NEURIPS 2025 • arXiv:2504.12691

#4227

Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering

Federico Cocchi, Nicholas Moratelli, Marcella Cornia et al.

CVPR 2025 • arXiv:2411.16863

#4228

Adversarial Robustness in Two-Stage Learning-to-Defer: Algorithms and Guarantees

Yannis Montreuil, Axel Carlier, Lai Xing Ng et al.

ICML 2025 • arXiv:2502.01027

#4229

Understanding Model Calibration - A gentle introduction and visual exploration of calibration and the expected calibration error (ECE)

Maja Pavlovic

ICLR 2025 • arXiv:2501.19047

#4230

MUDDFormer: Breaking Residual Bottlenecks in Transformers via Multiway Dynamic Dense Connections

Da Xiao, Qingye Meng, Shengping Li et al.

ICML 2025 • arXiv:2502.12170

#4231

Reviving DSP for Advanced Theorem Proving in the Era of Reasoning Models

Chenrui Cao, Liangcheng Song, Zenan Li et al.

NEURIPS 2025 • arXiv:2506.11487

#4232

R-Sparse: Rank-Aware Activation Sparsity for Efficient LLM Inference

Zhenyu Zhang, Zechun Liu, Yuandong Tian et al.

ICLR 2025 • arXiv:2504.19449

#4233

Have the VLMs Lost Confidence? A Study of Sycophancy in VLMs

Shuo Li, Tao Ji, Xiaoran Fan et al.

ICLR 2025 • arXiv:2410.11302

#4234

Learning Robust Stereo Matching in the Wild with Selective Mixture-of-Experts

Yun Wang, Longguang Wang, Chenghao Zhang et al.

ICCV 2025 • highlight • arXiv:2507.04631

#4235

RGBAvatar: Reduced Gaussian Blendshapes for Online Modeling of Head Avatars

Linzhou Li, Yumeng Li, Yanlin Weng et al.

CVPR 2025 • highlight • arXiv:2503.12886

#4236

Bridging the User-side Knowledge Gap in Knowledge-aware Recommendations with Large Language Models

Zheng Hu, Zhe Li, Ziyun Jiao et al.

AAAI 2025 • paper • arXiv:2412.13544

#4237

Flash-VStream: Efficient Real-Time Understanding for Long Video Streams

Haoji Zhang, Yiqin Wang, Yansong Tang et al.

ICCV 2025 • arXiv:2506.23825

#4238

Strategist: Self-improvement of LLM Decision Making via Bi-Level Tree Search

Jonathan Light, Min Cai, Weiqin Chen et al.

ICLR 2025 • arXiv:2408.10635

#4239

MPDrive: Improving Spatial Understanding with Marker-Based Prompt Learning for Autonomous Driving

Zhi-Yuan Zhang, Xiaofan Li, Zhihao Xu et al.

CVPR 2025 • highlight • arXiv:2504.00379

#4240

Mask-Adapter: The Devil is in the Masks for Open-Vocabulary Segmentation

Yongkang Li, Tianheng Cheng, Bin Feng et al.

CVPR 2025 • arXiv:2412.04533

#4241

Hierarchical Mixture of Experts: Generalizable Learning for High-Level Synthesis

Weikai Li, Ding Wang, Zijian Ding et al.

AAAI 2025 • paper • arXiv:2410.19225

#4242

BlobGEN-Vid: Compositional Text-to-Video Generation with Blob Video Representations

Weixi Feng, Chao Liu, Sifei Liu et al.

CVPR 2025 • arXiv:2501.07647

#4243

Memory Mosaics

Jianyu Zhang, Niklas Nolte, Ranajoy Sadhukhan et al.

ICLR 2025 • arXiv:2405.06394

#4244

Generation from Noisy Examples

Ananth Raman, Vinod Raman

ICML 2025 • arXiv:2501.04179

#4245

How Expressive are Knowledge Graph Foundation Models?

Xingyue Huang, Pablo Barcelo, Michael Bronstein et al.

ICML 2025 • arXiv:2502.13339

#4246

Reasoning as an Adaptive Defense for Safety

Taeyoun Kim, Fahim Tajwar, Aditi Raghunathan et al.

NEURIPS 2025 • arXiv:2507.00971

#4247

SePer: Measure Retrieval Utility Through The Lens Of Semantic Perplexity Reduction

Lu Dai, Yijie Xu, Jinhui Ye et al.

ICLR 2025 • arXiv:2503.01478

#4248

3DLLM-Mem: Long-Term Spatial-Temporal Memory for Embodied 3D Large Language Model

Wenbo Hu, Yining Hong, Yanjun Wang et al.

NEURIPS 2025 • oral • arXiv:2505.22657

#4249

How do Transformers Learn Implicit Reasoning?

Jiaran Ye, Zijun Yao, Zhidian Huang et al.

NEURIPS 2025 • oral • arXiv:2505.23653

#4250

X-Fi: A Modality-Invariant Foundation Model for Multimodal Human Sensing

Xinyan Chen, Jianfei Yang

ICLR 2025 • arXiv:2410.10167

#4251

Lifting Motion to the 3D World via 2D Diffusion

Jiaman Li, Karen Liu, Jiajun Wu

CVPR 2025 • highlight • arXiv:2411.18808

#4252

Efficiently Parameterized Neural Metriplectic Systems

Anthony Gruber, Kookjin Lee, Haksoo Lim et al.

ICLR 2025 • arXiv:2405.16305

#4253

On Conformal Isometry of Grid Cells: Learning Distance-Preserving Position Embedding

Dehong Xu, Ruiqi Gao, Wenhao Zhang et al.

ICLR 2025 • arXiv:2405.16865

#4254

Jailbreaking as a Reward Misspecification Problem

Zhihui Xie, Jiahui Gao, Lei Li et al.

ICLR 2025 • arXiv:2406.14393

#4255

Aux-Think: Exploring Reasoning Strategies for Data-Efficient Vision-Language Navigation

Shuo Wang, Yongcai Wang, Wanting Li et al.

NEURIPS 2025 • arXiv:2505.11886

#4256

TIGER: Time-frequency Interleaved Gain Extraction and Reconstruction for Efficient Speech Separation

Mohan Xu, Kai Li, Guo Chen et al.

ICLR 2025 • oral • arXiv:2410.01469

#4257

Probing the Latent Hierarchical Structure of Data via Diffusion Models

Antonio Sclocchi, Alessandro Favero, Noam Levi et al.

ICLR 2025 • arXiv:2410.13770

#4258

Video Summarization with Large Language Models

Min Jung Lee, Dayoung Gong, Minsu Cho

CVPR 2025 • arXiv:2504.11199

#4259

TEASER: Token Enhanced Spatial Modeling for Expressions Reconstruction

Yunfei Liu, Lei Zhu, Lijian Lin et al.

ICLR 2025 • arXiv:2502.10982

#4260

The Lighthouse of Language: Enhancing LLM Agents via Critique-Guided Improvement

Ruihan Yang, Fanghua Ye, Jian Li et al.

NEURIPS 2025 • arXiv:2503.16024

#4261

SOLAMI: Social Vision-Language-Action Modeling for Immersive Interaction with 3D Autonomous Characters

Jianping Jiang, Weiye Xiao, Zhengyu Lin et al.

CVPR 2025 • arXiv:2412.00174

#4262

LiDAR-RT: Gaussian-based Ray Tracing for Dynamic LiDAR Re-simulation

Chenxu Zhou, Lvchang Fu, Sida Peng et al.

CVPR 2025 • arXiv:2412.15199

#4263

SILO: Solving Inverse Problems with Latent Operators

Ron Raphaeli, Sean Man, Michael Elad

ICCV 2025 • arXiv:2501.11746

#4264

Taming Video Diffusion Prior with Scene-Grounding Guidance for 3D Gaussian Splatting from Sparse Inputs

Yingji Zhong, Zhihao Li, Dave Zhenyu Chen et al.

CVPR 2025 • highlight • arXiv:2503.05082

#4265

Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis

Hongkang Li, Songtao Lu, Pin-Yu Chen et al.

ICLR 2025 • arXiv:2410.02167

#4266

Optimized Multi-Token Joint Decoding With Auxiliary Model for LLM Inference

Zongyue Qin, Ziniu Hu, Zifan He et al.

ICLR 2025 • arXiv:2407.09722

#4267

Revolve: Optimizing AI Systems by Tracking Response Evolution in Textual Optimization

Peiyan Zhang, Haibo Jin, Leyang Hu et al.

ICML 2025 • arXiv:2412.03092

#4268

CASE-Bench: Context-Aware SafEty Benchmark for Large Language Models

Guangzhi Sun, Xiao Zhan, Shutong Feng et al.

ICML 2025 • arXiv:2501.14940

#4269

Hierarchical Fine-grained Preference Optimization for Physically Plausible Video Generation

Harold Haodong Chen, Haojian Huang, Qifeng Chen et al.

NEURIPS 2025 • oral • arXiv:2508.10858

#4270

Open-World Amodal Appearance Completion

Jiayang Ao, Yanbei Jiang, Qiuhong Ke et al.

CVPR 2025 • arXiv:2411.13019

#4271

The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve?

Zhenheng Tang, Xiang Liu, Qian Wang et al.

ICLR 2025 • arXiv:2502.17535

#4272

PhysCtrl: Generative Physics for Controllable and Physics-Grounded Video Generation

Chen Wang, Chuhao Chen, Yiming Huang et al.

NEURIPS 2025 • oral • arXiv:2509.20358

#4273

Dense Policy: Bidirectional Autoregressive Learning of Actions

Yue Su, Xinyu Zhan, Hongjie Fang et al.

ICCV 2025 • arXiv:2503.13217

#4274

Stochastic Polyak Step-sizes and Momentum: Convergence Guarantees and Practical Performance

Dimitris Oikonomou, Nicolas Loizou

ICLR 2025 • arXiv:2406.04142

#4275

How Much Can We Forget about Data Contamination?

Sebastian Bordt, Suraj Srinivas, Valentyn Boreiko et al.

ICML 2025 • arXiv:2410.03249

#4276

PhyMPGN: Physics-encoded Message Passing Graph Network for spatiotemporal PDE systems

Bocheng Zeng, Qi Wang, Mengtao Yan et al.

ICLR 2025 • oral • arXiv:2410.01337

#4277

SMamba: Sparse Mamba for Event-based Object Detection

Nan Yang, Yang Wang, Zhanwen Liu et al.

AAAI 2025 • paper • arXiv:2501.11971

#4278

Anyprefer: An Agentic Framework for Preference Data Synthesis

Yiyang Zhou, Zhaoyang Wang, Tianle Wang et al.

ICLR 2025 • arXiv:2504.19276

#4279

Learning Flow Fields in Attention for Controllable Person Image Generation

Zijian Zhou, Shikun Liu, Xiao Han et al.

CVPR 2025 • arXiv:2412.08486

#4280

Glad: A Streaming Scene Generator for Autonomous Driving

Bin Xie, Yingfei Liu, Tiancai Wang et al.

ICLR 2025 • oral • arXiv:2503.00045

#4281

Analysis of Linear Mode Connectivity via Permutation-Based Weight Matching: With Insights into Other Permutation Search Methods

Akira Ito, Masanori Yamada, Atsutoshi Kumagai

ICLR 2025 • arXiv:2402.04051

#4282

Is this Generated Person Existed in Real-world? Fine-grained Detecting and Calibrating Abnormal Human-body

Zeqing Wang, Qingyang Ma, Wentao Wan et al.

CVPR 2025 • highlight • arXiv:2411.14205

#4283

Beyond Graphs: Can Large Language Models Comprehend Hypergraphs?

Yifan Feng, Chengwu Yang, Xingliang Hou et al.

ICLR 2025 • arXiv:2410.10083

#4284

Bayesian Test-Time Adaptation for Vision-Language Models

Lihua Zhou, Mao Ye, Shuaifeng Li et al.

CVPR 2025 • arXiv:2503.09248

#4285

Rethinking Invariance in In-context Learning

Lizhe Fang, Yifei Wang, Khashayar Gatmiry et al.

ICLR 2025 • arXiv:2505.04994

#4286

Improving Your Model Ranking on Chatbot Arena by Vote Rigging

Rui Min, Tianyu Pang, Chao Du et al.

ICML 2025 • arXiv:2501.17858

#4287

Sharpness-Aware Minimization Efficiently Selects Flatter Minima Late In Training

Zhanpeng Zhou, Mingze Wang, Yuchen Mao et al.

ICLR 2025 • arXiv:2410.10373

#4288

nnWNet: Rethinking the Use of Transformers in Biomedical Image Segmentation and Calling for a Unified Evaluation Benchmark

Yanfeng Zhou, Lingrui Li, Le Lu et al.

CVPR 2025

#4289

Contrastive Flow Matching

George Stoica, Vivek Ramanujan, Xiang Fan et al.

ICCV 2025 • arXiv:2506.05350

#4290

The Computational Complexity of Circuit Discovery for Inner Interpretability

Federico Adolfi, Martina G. Vilas, Todd Wareham

ICLR 2025 • arXiv:2410.08025

#4291

Attention-Driven GUI Grounding: Leveraging Pretrained Multimodal Large Language Models Without Fine-Tuning

Hai-Ming Xu, Qi Chen, Lei Wang et al.

AAAI 2025 • paper • arXiv:2412.10840

#4292

COB-GS: Clear Object Boundaries in 3DGS Segmentation Based on Boundary-Adaptive Gaussian Splitting

Jiaxin Zhang, Junjun Jiang, Youyu Chen et al.

CVPR 2025 • arXiv:2503.19443

#4293

VCR: A Task for Pixel-Level Complex Reasoning in Vision Language Models via Restoring Occluded Text

Tianyu Zhang, Suyuchen Wang, Lu Li et al.

ICLR 2025 • arXiv:2406.06462

#4294

I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models

Zhenxing Mi, Kuan-Chieh Wang, Guocheng Qian et al.

ICML 2025 • arXiv:2502.10458

#4295

CellFlux: Simulating Cellular Morphology Changes via Flow Matching

Yuhui Zhang, Yuchang Su, Chenyu Wang et al.

ICML 2025 • arXiv:2502.09775

#4296

6DGS: Enhanced Direction-Aware Gaussian Splatting for Volumetric Rendering

Zhongpai Gao, Benjamin Planche, Meng Zheng et al.

ICLR 2025 • arXiv:2410.04974

#4297

Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning

Jiwon Song, Dongwon Jo, Yulhwa Kim et al.

NEURIPS 2025 • arXiv:2505.13866

#4298

ViSAGe: Video-to-Spatial Audio Generation

Jaeyeon Kim, Heeseung Yun, Gunhee Kim

ICLR 2025 • oral • arXiv:2506.12199

#4299

Revisiting Energy Based Models as Policies: Ranking Noise Contrastive Estimation and Interpolating Energy Models

Sumeet Singh, Vikas Sindhwani, Stephen Tu

ICLR 2025 • arXiv:2309.05803

#4300

CoRe: Context-Regularized Text Embedding Learning for Text-to-Image Personalization

Feize Wu, Yun Pang, Junyi Zhang et al.

AAAI 2025 • paper • arXiv:2408.15914

#4301

Layered Image Vectorization via Semantic Simplification

Zhenyu Wang, Jianxi Huang, Zhida Sun et al.

CVPR 2025 • arXiv:2406.05404

#4302

Task-Agnostic Guided Feature Expansion for Class-Incremental Learning

Bowen Zheng, Da-Wei Zhou, Han-Jia Ye et al.

CVPR 2025 • arXiv:2503.00823

#4303

Identifying and Mitigating Position Bias of Multi-image Vision-Language Models

Xinyu Tian, Shu Zou, Zhaoyuan Yang et al.

CVPR 2025 • arXiv:2503.13792

#4304

Neural Sampling from Boltzmann Densities: Fisher-Rao Curves in the Wasserstein Geometry

Jannis Chemseddine, Christian Wald, Richard Duong et al.

ICLR 2025 • arXiv:2410.03282

#4305

GraphGPT: Generative Pre-trained Graph Eulerian Transformer

Qifang Zhao, Weidong Ren, Tianyu Li et al.

ICML 2025 • arXiv:2401.00529

#4306

Improving the Scaling Laws of Synthetic Data with Deliberate Practice

Reyhane Askari Hemmat, Mohammad Pezeshki, Elvis Dohmatob et al.

ICML 2025 • oral • arXiv:2502.15588

#4307

DisEnvisioner: Disentangled and Enriched Visual Prompt for Customized Image Generation

Jing He, Haodong Li, huyongzhe et al.

ICLR 2025 • arXiv:2410.02067

#4308

Rectifying Magnitude Neglect in Linear Attention

Qihang Fan, Huaibo Huang, Yuang Ai et al.

ICCV 2025 • highlight • arXiv:2507.00698

#4309

Rectified Diffusion Guidance for Conditional Generation

Mengfei Xia, Nan Xue, Yujun Shen et al.

CVPR 2025 • arXiv:2410.18737

#4310

Scaling Trends in Language Model Robustness

Nikolaus Howe, Ian McKenzie, Oskar Hollinsworth et al.

ICML 2025 • spotlight • arXiv:2407.18213

#4311

Can Transformers Reason Logically? A Study in SAT Solving

Leyan Pan, Vijay Ganesh, Jacob Abernethy et al.

ICML 2025 • arXiv:2410.07432

#4312

UAV-Flow Colosseo: A Real-World Benchmark for Flying-on-a-Word UAV Imitation Learning

Xiangyu Wang, Donglin Yang, Yue Liao et al.

NEURIPS 2025 • arXiv:2505.15725

#4313

XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning

Alexander Nikulin, Ilya Zisman, Alexey Zemtsov et al.

ICLR 2025 • arXiv:2406.08973

#4314

RFWave: Multi-band Rectified Flow for Audio Waveform Reconstruction

Peng Liu, Dongyang Dai, Zhiyong Wu

ICLR 2025 • arXiv:2403.05010

#4315

TopoTune: A Framework for Generalized Combinatorial Complex Neural Networks

Mathilde Papillon, Guillermo Bernardez, Claudio Battiloro et al.

ICML 2025 • arXiv:2410.06530

#4316

GUI-Reflection: Empowering Multimodal GUI Models with Self-Reflection Behavior

Penghao Wu, Shengnan Ma, Bo Wang et al.

NEURIPS 2025 • arXiv:2506.08012

#4317

RiOSWorld: Benchmarking the Risk of Multimodal Computer-Use Agents

Jingyi Yang, Shuai Shao, Dongrui Liu et al.

NEURIPS 2025 • arXiv:2506.00618

#4318

Towards Precise Scaling Laws for Video Diffusion Transformers

Yuanyang Yin, Yaqi Zhao, Mingwu Zheng et al.

CVPR 2025 • arXiv:2411.17470

#4319

TweedieMix: Improving Multi-Concept Fusion for Diffusion-based Image/Video Generation

Gihyun Kwon, Jong Chul Ye

ICLR 2025 • arXiv:2410.05591

#4320

Reward-Instruct: A Reward-Centric Approach to Fast Photo-Realistic Image Generation

Yihong Luo, Tianyang Hu, Weijian Luo et al.

NEURIPS 2025 • arXiv:2503.13070

#4321

SceneWeaver: All-in-One 3D Scene Synthesis with an Extensible and Self-Reflective Agent

Yandan Yang, Baoxiong Jia, Shujie Zhang et al.

NEURIPS 2025 • arXiv:2509.20414

#4322

Point2RBox-v2: Rethinking Point-supervised Oriented Object Detection with Spatial Layout Among Instances

Yi Yu, Botao Ren, Peiyuan Zhang et al.

CVPR 2025 • arXiv:2502.04268

#4323

Reasoning-as-Logic-Units: Scaling Test-Time Reasoning in Large Language Models Through Logic Unit Alignment

Cheryl Li, Tianyuan Xu, Yiwen Guo

ICML 2025 • arXiv:2502.07803

#4324

VideoUFO: A Million-Scale User-Focused Dataset for Text-to-Video Generation

Wenhao Wang, Yi Yang

NEURIPS 2025 • arXiv:2503.01739

#4325

SF2T: Self-supervised Fragment Finetuning of Video-LLMs for Fine-Grained Understanding

Yangliu Hu, Zikai Song, Na Feng et al.

CVPR 2025 • arXiv:2504.07745

#4326

MP-GUI: Modality Perception with MLLMs for GUI Understanding

Ziwei Wang, Weizhi Chen, Leyang Yang et al.

CVPR 2025 • arXiv:2503.14021

#4327

Hierarchical Equivariant Policy via Frame Transfer

Haibo Zhao, Dian Wang, Yizhe Zhu et al.

ICML 2025 • arXiv:2502.05728

#4328

Random-Set Neural Networks

Shireen Kudukkil Manchingal, Muhammad Mubashar, Kaizheng Wang et al.

ICLR 2025 • arXiv:2307.05772

#4329

Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis

Bingda Tang, Sayak Paul, Boyang Zheng et al.

CVPR 2025 • arXiv:2505.10046

#4330

Adaptive Self-improvement LLM Agentic System for ML Library Development

Genghan Zhang, Weixin Liang, Olivia Hsu et al.

ICML 2025 • arXiv:2502.02534

#4331

QERA: an Analytical Framework for Quantization Error Reconstruction

Cheng Zhang, Jeffrey T. H. Wong, Can Xiao et al.

ICLR 2025 • arXiv:2410.06040

#4332

Lossy Compression with Pretrained Diffusion Models

Jeremy Vonderfecht, Feng Liu

ICLR 2025 • arXiv:2501.09815

#4333

DSP: Dynamic Sequence Parallelism for Multi-Dimensional Transformers

Xuanlei Zhao, Shenggan Cheng, Chang Chen et al.

ICML 2025 • arXiv:2403.10266

#4334

Don’t Think Longer, Think Wisely: Optimizing Thinking Dynamics for Large Reasoning Models

Sohyun An, Ruochen Wang, Tianyi Zhou et al.

NEURIPS 2025

#4335

ASAP: Advancing Semantic Alignment Promotes Multi-Modal Manipulation Detecting and Grounding

Zhenxing Zhang, Yaxiong Wang, Lechao Cheng et al.

CVPR 2025 • arXiv:2412.12718

#4336

Using Diffusion Priors for Video Amodal Segmentation

Kaihua Chen, Deva Ramanan, Tarasha Khurana

CVPR 2025 • arXiv:2412.04623

#4337

Puzzle: Distillation-Based NAS for Inference-Optimized LLMs

Akhiad Bercovich, Tomer Ronen, Talor Abramovich et al.

ICML 2025 • arXiv:2411.19146

#4338

Retrieval-Augmented Perception: High-resolution Image Perception Meets Visual RAG

Wenbin Wang, Yongcheng Jing, Liang Ding et al.

ICML 2025 • oral • arXiv:2503.01222

#4339

CodeSteer: Symbolic-Augmented Language Models via Code/Text Guidance

Yongchao Chen, Yilun Hao, Yueying Liu et al.

ICML 2025 • arXiv:2502.04350

#4340

MAPLE: Many-Shot Adaptive Pseudo-Labeling for In-Context Learning

Zihan Chen, Song Wang, Zhen Tan et al.

ICML 2025 • arXiv:2505.16225

#4341

Exact Byte-Level Probabilities from Tokenized Language Models for FIM-Tasks and Model Ensembles

Buu Phan, Brandon Amos, Itai Gat et al.

ICLR 2025 • arXiv:2410.09303

#4342

Dual Consolidation for Pre-Trained Model-Based Domain-Incremental Learning

Da-Wei Zhou, Zi-Wen Cai, Han-Jia Ye et al.

CVPR 2025 • arXiv:2410.00911

#4343

DOTA: Distributional Test-time Adaptation of Vision-Language Models

Zongbo Han, Jialong Yang, Guangyu Wang et al.

NEURIPS 2025 • arXiv:2409.19375

#4344

TOP-ERL: Transformer-based Off-Policy Episodic Reinforcement Learning

Ge Li, Dong Tian, Hongyi Zhou et al.

ICLR 2025 • oral • arXiv:2410.09536

#4345

ConVis: Contrastive Decoding with Hallucination Visualization for Mitigating Hallucinations in Multimodal Large Language Models

Yeji Park, Deokyeong Lee, Junsuk Choe et al.

AAAI 2025 • paper • arXiv:2408.13906

#4346

HEIE: MLLM-Based Hierarchical Explainable AIGC Image Implausibility Evaluator

Fan Yang, Ru Zhen, Jianing Wang et al.

CVPR 2025 • arXiv:2411.17261

#4347

MM-CamObj: A Comprehensive Multimodal Dataset for Camouflaged Object Scenarios

Jiacheng Ruan, Wenzhen Yuan, Zehao Lin et al.

AAAI 2025 • paper • arXiv:2409.16084

#4348

EmergentTTS-Eval: Evaluating TTS Models on Complex Prosodic, Expressiveness, and Linguistic Challenges Using Model-as-a-Judge

Ruskin Raj Manku, Yuzhi Tang, Xingjian Shi et al.

NEURIPS 2025 • arXiv:2505.23009

#4349

xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference

Maximilian Beck, Korbinian Pöppel, Phillip Lippe et al.

ICML 2025 • arXiv:2503.13427

#4350

TaoAvatar: Real-Time Lifelike Full-Body Talking Avatars for Augmented Reality via 3D Gaussian Splatting

Jianchuan Chen, Jingchuan Hu, Gaige Wang et al.

CVPR 2025 • highlight • arXiv:2503.17032

#4351

BlenderGym: Benchmarking Foundational Model Systems for Graphics Editing

Yunqi Gu, Ian Huang, Jihyeon Je et al.

CVPR 2025 • highlight • arXiv:2504.01786

#4352

FEAT: Free energy Estimators with Adaptive Transport

Yuanqi Du, Jiajun He, Francisco Vargas et al.

NEURIPS 2025 • arXiv:2504.11516

#4353

3DGraphLLM: Combining Semantic Graphs and Large Language Models for 3D Scene Understanding

Tatiana Zemskova, Dmitry Yudin

ICCV 2025 • arXiv:2412.18450

#4354

Addax: Utilizing Zeroth-Order Gradients to Improve Memory Efficiency and Performance of SGD for Fine-Tuning Language Models

Zeman Li, Xinwei Zhang, Peilin Zhong et al.

ICLR 2025 • arXiv:2410.06441

#4355

Science-T2I: Addressing Scientific Illusions in Image Synthesis

Jialuo Li, Wenhao Chai, Xingyu Fu et al.

CVPR 2025 • arXiv:2504.13129

#4356

Privacy-Preserving Low-Rank Adaptation Against Membership Inference Attacks for Latent Diffusion Models

Zihao Luo, Xilie Xu, Feng Liu et al.

AAAI 2025 • paper • arXiv:2402.11989

#4357

MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models

Chejian Xu, Jiawei Zhang, Zhaorun Chen et al.

ICLR 2025 • arXiv:2503.14827

#4358

Sports-Traj: A Unified Trajectory Generation Model for Multi-Agent Movement in Sports

Yi Xu, Yun Fu

ICLR 2025 • oral • arXiv:2405.17680

#4359

Open-World Reinforcement Learning over Long Short-Term Imagination

Jiajian Li, Qi Wang, Yunbo Wang et al.

ICLR 2025 • arXiv:2410.03618

#4360

Diff-Shadow: Global-guided Diffusion Model for Shadow Removal

Jinting Luo, Ru Li, Chengzhi Jiang et al.

AAAI 2025 • paper • arXiv:2407.16214

#4361

Topological Blindspots: Understanding and Extending Topological Deep Learning Through the Lens of Expressivity

Yam Eitan, Yoav Gelberg, Guy Bar-Shalom et al.

ICLR 2025 • arXiv:2408.05486

#4362

Fast training and sampling of Restricted Boltzmann Machines

Nicolas Bereux, Aurélien Decelle, Cyril Furtlehner et al.

ICLR 2025 • arXiv:2405.15376

#4363

ExpertAF: Expert Actionable Feedback from Video

Kumar Ashutosh, Tushar Nagarajan, Georgios Pavlakos et al.

CVPR 2025 • arXiv:2408.00672

#4364

A Large Recurrent Action Model: xLSTM enables Fast Inference for Robotics Tasks

Thomas Schmied, Thomas Adler, Vihang Patil et al.

ICML 2025 • arXiv:2410.22391

#4365

One Node One Model: Featuring the Missing-Half for Graph Clustering

Xuanting Xie, Bingheng Li, Erlin Pan et al.

AAAI 2025 • paper • arXiv:2412.09902

#4366

VladVA: Discriminative Fine-tuning of LVLMs

Yassine Ouali, Adrian Bulat, Alexandros Xenos et al.

CVPR 2025 • arXiv:2412.04378

#4367

Pareto Low-Rank Adapters: Efficient Multi-Task Learning with Preferences

Nikos Dimitriadis, Pascal Frossard, François Fleuret

ICLR 2025 • arXiv:2407.08056

#4368

PLeaS - Merging Models with Permutations and Least Squares

Anshul Nasery, Jonathan Hayase, Pang Wei Koh et al.

CVPR 2025 • arXiv:2407.02447

#4369

ConTextTab: A Semantics-Aware Tabular In-Context Learner

Marco Spinaci, Marek Polewczyk, Maximilian Schambach et al.

NEURIPS 2025 • spotlight • arXiv:2506.10707

#4370

SRSA: Skill Retrieval and Adaptation for Robotic Assembly Tasks

Yijie Guo, Bingjie Tang, Iretiayo Akinola et al.

ICLR 2025 • arXiv:2503.04538

#4371

ArtFormer: Controllable Generation of Diverse 3D Articulated Objects

Jiayi Su, Youhe Feng, Zheng Li et al.

CVPR 2025 • arXiv:2412.07237

#4372

Whole-Body Conditioned Egocentric Video Prediction

Yutong Bai, Danny Tran, Amir Bar et al.

NEURIPS 2025 • arXiv:2506.21552

#4373

IgGM: A Generative Model for Functional Antibody and Nanobody Design

Rubo Wang, Fandi Wu, Xingyu Gao et al.

ICLR 2025

#4374

Locality in Image Diffusion Models Emerges from Data Statistics

Artem Lukoianov, Chenyang Yuan, Justin Solomon et al.

NEURIPS 2025 • spotlight • arXiv:2509.09672

#4375

Scaling Embedding Layers in Language Models

Da Yu, Edith Cohen, Badih Ghazi et al.

NEURIPS 2025 • arXiv:2502.01637

#4376

MambaPEFT: Exploring Parameter-Efficient Fine-Tuning for Mamba

Masakazu Yoshimura, Teruaki Hayashi, Yota Maeda

ICLR 2025 • arXiv:2411.03855

#4377

TopoCellGen: Generating Histopathology Cell Topology with a Diffusion Model

Meilong Xu, Saumya Gupta, Xiaoling Hu et al.

CVPR 2025 • arXiv:2412.06011

#4378

Does Generation Require Memorization? Creative Diffusion Models using Ambient Diffusion

Kulin Shah, Alkis Kalavasis, Adam Klivans et al.

ICML 2025 • arXiv:2502.21278

#4379

Visual-Instructed Degradation Diffusion for All-in-One Image Restoration

Haina Qin, Wenyang Luo, Zewen Chen et al.

CVPR 2025 • arXiv:2506.16960

#4380

Enhancing Foundation Models for Time Series Forecasting via Wavelet-based Tokenization

Luca Masserano, Abdul Fatir Ansari, Boran Han et al.

ICML 2025 • oral • arXiv:2412.05244

#4381

Activation Gradient based Poisoned Sample Detection Against Backdoor Attacks

Danni Yuan, Mingda Zhang, Shaokui Wei et al.

ICLR 2025 • arXiv:2312.06230

#4382

On the Crucial Role of Initialization for Matrix Factorization

Bingcong Li, Liang Zhang, Aryan Mokhtari et al.

ICLR 2025 • arXiv:2410.18965

#4383

MLPs Learn In-Context on Regression and Classification Tasks

William Tong, Cengiz Pehlevan

ICLR 2025 • arXiv:2405.15618

#4384

Harnessing Massive Satellite Imagery with Efficient Masked Image Modeling

Fengxiang Wang, Hongzhen Wang, Di Wang et al.

ICCV 2025 • arXiv:2406.11933

#4385

Attention layers provably solve single-location regression

Pierre Marion, Raphaël Berthier, Gérard Biau et al.

ICLR 2025 • arXiv:2410.01537

#4386

SpatialCLIP: Learning 3D-aware Image Representations from Spatially Discriminative Language

Zehan Wang, Sashuai Zhou, Shaoxuan He et al.

CVPR 2025

#4387

Statistical Advantages of Perturbing Cosine Router in Mixture of Experts

Huy Nguyen, Pedram Akbarian Saravi, Trang Pham et al.

ICLR 2025 • arXiv:2405.14131

#4388

Locality Alignment Improves Vision-Language Models

Ian Covert, Tony Sun, James Y Zou et al.

ICLR 2025 • arXiv:2410.11087

#4389

Semantic and Sequential Alignment for Referring Video Object Segmentation

Feiyu Pan, Hao Fang, Fangkai Li et al.

CVPR 2025

#4390

StyO: Stylize Your Face in Only One-Shot

Bonan Li, Zicheng Zhang, Xuecheng Nie et al.

AAAI 2025 • paper • arXiv:2303.03231

#4391

Evaluating Image Hallucination in Text-to-Image Generation with Question-Answering

Youngsun Lim, Hojun Choi, Hyunjung Shim

AAAI 2025 • paper • arXiv:2409.12784

#4392

EventGPT: Event Stream Understanding with Multimodal Large Language Models

Shaoyu Liu, Jianing Li, Guanghui Zhao et al.

CVPR 2025 • arXiv:2412.00832

#4393

Tri-Ergon: Fine-Grained Video-to-Audio Generation with Multi-Modal Conditions and LUFS Control

Bingliang Li, Fengyu Yang, Yuxin Mao et al.

AAAI 2025 • paper • arXiv:2412.20378

#4394

DELT: A Simple Diversity-driven EarlyLate Training for Dataset Distillation

Zhiqiang Shen, Ammar Sherif, Zeyuan Yin et al.

CVPR 2025 • arXiv:2411.19946

#4395

CSBrain: A Cross-scale Spatiotemporal Brain Foundation Model for EEG Decoding

Yuchen Zhou, Jiamin Wu, Zichen Ren et al.

NEURIPS 2025 • oral • arXiv:2506.23075

#4396

SuperDec: 3D Scene Decomposition with Superquadrics Primitives

Elisabetta Fedele, Boyang Sun, Francis Engelmann et al.

ICCV 2025 • arXiv:2504.00992

#4397

Learning Robust Spectral Dynamics for Temporal Domain Generalization

En Yu, Jie Lu, Xiaoyu Yang et al.

NEURIPS 2025 • oral • arXiv:2505.12585

#4398

h4rm3l: A Language for Composable Jailbreak Attack Synthesis

Moussa Koulako Bala Doumbouya, Ananjan Nandi, Gabriel Poesia et al.

ICLR 2025 • arXiv:2408.04811

#4399

KnowPO: Knowledge-Aware Preference Optimization for Controllable Knowledge Selection in Retrieval-Augmented Language Models

Ruizhe Zhang, Yongxin Xu, Yuzhen Xiao et al.

AAAI 2025 • paper • arXiv:2408.03297

#4400

MoDiTalker: Motion-Disentangled Diffusion Model for High-Fidelity Talking Head Generation

Seyeon Kim, Siyoon Jin, Jihye Park et al.

AAAI 2025 • paper • arXiv:2403.19144
