Most Cited 2025 "stereo image super-resolution" Papers

22,274 papers found • Page 12 of 112

#2201

Learning Distributions of Complex Fluid Simulations with Diffusion Graph Networks

Mario Lino, Tobias Pfaff, Nils Thuerey

ICLR 2025arXiv:2504.02843
21
citations
#2202

Argumentative Large Language Models for Explainable and Contestable Claim Verification

Gabriel Freedman, Adam Dejl, Deniz Gorur et al.

AAAI 2025paperarXiv:2405.02079
21
citations
#2203

REFINE: Inversion-Free Backdoor Defense via Model Reprogramming

Yukun Chen, Shuo Shao, Enhao Huang et al.

ICLR 2025arXiv:2502.18508
21
citations
#2204

MMAD: Multi-label Micro-Action Detection in Videos

Kun Li, Pengyu Liu, Dan Guo et al.

ICCV 2025arXiv:2407.05311
21
citations
#2205

Model Merging in Pre-training of Large Language Models

Yunshui Li, Yiyuan Ma, Shen Yan et al.

NEURIPS 2025arXiv:2505.12082
21
citations
#2206

Light-A-Video: Training-free Video Relighting via Progressive Light Fusion

Yujie Zhou, Jiazi Bu, Pengyang Ling et al.

ICCV 2025arXiv:2502.08590
21
citations
#2207

SnapGen-V: Generating a Five-Second Video within Five Seconds on a Mobile Device

Yushu Wu, Zhixing Zhang, Yanyu Li et al.

CVPR 2025arXiv:2412.10494
21
citations
#2208

3DMambaIPF: A State Space Model for Iterative Point Cloud Filtering via Differentiable Rendering

Qingyuan Zhou, Weidong Yang, Ben Fei et al.

AAAI 2025paperarXiv:2404.05522
21
citations
#2209

Generative Trajectory Stitching through Diffusion Composition

Yunhao Luo, Utkarsh Mishra, Yilun Du et al.

NEURIPS 2025spotlightarXiv:2503.05153
21
citations
#2210

MoVE-KD: Knowledge Distillation for VLMs with Mixture of Visual Encoders

Jiajun Cao, Yuan Zhang, Tao Huang et al.

CVPR 2025arXiv:2501.01709
21
citations
#2211

DataGen: Unified Synthetic Dataset Generation via Large Language Models

Yue Huang, Siyuan Wu, Chujie Gao et al.

ICLR 2025arXiv:2406.18966
21
citations
#2212

DrivingSphere: Building a High-fidelity 4D World for Closed-loop Simulation

Tianyi Yan, Dongming Wu, Wencheng Han et al.

CVPR 2025arXiv:2411.11252
21
citations
#2213

SplineGS: Robust Motion-Adaptive Spline for Real-Time Dynamic 3D Gaussians from Monocular Video

Jongmin Park, Minh-Quan Viet Bui, Juan Luis Gonzalez Bello et al.

CVPR 2025arXiv:2412.09982
21
citations
#2214

TimeBridge: Non-Stationarity Matters for Long-term Time Series Forecasting

Peiyuan Liu, Beiliang Wu, Yifan Hu et al.

ICML 2025arXiv:2410.04442
21
citations
#2215

VQ-VLA: Improving Vision-Language-Action Models via Scaling Vector-Quantized Action Tokenizers

Yating Wang, Haoyi Zhu, Mingyu Liu et al.

ICCV 2025arXiv:2507.01016
21
citations
#2216

Mixture of In-Context Prompters for Tabular PFNs

Derek Xu, Olcay Cirit, Reza Asadi et al.

ICLR 2025arXiv:2405.16156
21
citations
#2217

OccMamba: Semantic Occupancy Prediction with State Space Models

Heng Li, Yuenan Hou, Xiaohan Xing et al.

CVPR 2025arXiv:2408.09859
21
citations
#2218

Basis Sharing: Cross-Layer Parameter Sharing for Large Language Model Compression

Jingcun Wang, Yu-Guang Chen, Ing-Chao Lin et al.

ICLR 2025arXiv:2410.03765
21
citations
#2219

Towards Open-Vocabulary Remote Sensing Image Semantic Segmentation

Chengyang Ye, Yunzhi Zhuge, Pingping Zhang

AAAI 2025paperarXiv:2412.19492
21
citations
#2220

TetSphere Splatting: Representing High-Quality Geometry with Lagrangian Volumetric Meshes

Minghao Guo, Bohan Wang, Kaiming He et al.

ICLR 2025arXiv:2405.20283
21
citations
#2221

Generative Image Layer Decomposition with Visual Effects

Jinrui Yang, Qing Liu, Yijun Li et al.

CVPR 2025arXiv:2411.17864
21
citations
#2222

V2XPnP: Vehicle-to-Everything Spatio-Temporal Fusion for Multi-Agent Perception and Prediction

Zewei Zhou, Hao Xiang, Zhaoliang Zheng et al.

ICCV 2025arXiv:2412.01812
21
citations
#2223

StoryGPT-V: Large Language Models as Consistent Story Visualizers

Xiaoqian Shen, Mohamed Elhoseiny

CVPR 2025arXiv:2312.02252
21
citations
#2224

CHiP: Cross-modal Hierarchical Direct Preference Optimization for Multimodal LLMs

Jinlan Fu, Shenzhen Huangfu, Hao Fei et al.

ICLR 2025arXiv:2501.16629
21
citations
#2225

Improving LLM Safety Alignment with Dual-Objective Optimization

Xuandong Zhao, Will Cai, Tianneng Shi et al.

ICML 2025arXiv:2503.03710
21
citations
#2226

Mixture of Noise for Pre-Trained Model-Based Class-Incremental Learning

Kai Jiang, Zhengyan Shi, Dell Zhang et al.

NEURIPS 2025arXiv:2509.16738
21
citations
#2227

SafeWatch: An Efficient Safety-Policy Following Video Guardrail Model with Transparent Explanations

Zhaorun Chen, Francesco Pinto, Minzhou Pan et al.

ICLR 2025arXiv:2412.06878
21
citations
#2228

Kinetix: Investigating the Training of General Agents through Open-Ended Physics-Based Control Tasks

Michael Matthews, Michael Beukman, Chris Lu et al.

ICLR 2025arXiv:2410.23208
21
citations
#2229

LSNet: See Large, Focus Small

Ao Wang, Hui Chen, Zijia Lin et al.

CVPR 2025arXiv:2503.23135
21
citations
#2230

Compress then Serve: Serving Thousands of LoRA Adapters with Little Overhead

Rickard Gabrielsson, Jiacheng Zhu, Onkar Bhardwaj et al.

ICML 2025arXiv:2407.00066
21
citations
#2231

OmniSAM: Omnidirectional Segment Anything Model for UDA in Panoramic Semantic Segmentation

Ding Zhong, Xu Zheng, Chenfei Liao et al.

ICCV 2025highlightarXiv:2503.07098
21
citations
#2232

Noise-Consistent Siamese-Diffusion for Medical Image Synthesis and Segmentation

Kunpeng Qiu, Zhiqiang Gao, Zhiying Zhou et al.

CVPR 2025arXiv:2505.06068
21
citations
#2233

Decision Theoretic Foundations for Conformal Prediction: Optimal Uncertainty Quantification for Risk-Averse Agents

Shayan Kiyani, George Pappas, Aaron Roth et al.

ICML 2025spotlightarXiv:2502.02561
21
citations
#2234

Heuristic-Induced Multimodal Risk Distribution Jailbreak Attack for Multimodal Large Language Models

Ma Teng, Xiaojun Jia, Ranjie Duan et al.

ICCV 2025arXiv:2412.05934
21
citations
#2235

Do Language Models Use Their Depth Efficiently?

Róbert Csordás, Christopher D Manning, Chris Potts

NEURIPS 2025arXiv:2505.13898
21
citations
#2236

The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer

Weixian Lei, Jiacong Wang, Haochen Wang et al.

ICCV 2025highlightarXiv:2504.10462
21
citations
#2237

MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D

Wei Cheng, Juncheng Mu, Xianfang Zeng et al.

CVPR 2025arXiv:2411.02336
21
citations
#2238

Synergistic Multi-Agent Framework with Trajectory Learning for Knowledge-Intensive Tasks

Shengbin Yue, Siyuan Wang, Wei Chen et al.

AAAI 2025paperarXiv:2407.09893
21
citations
#2239

In Search of Forgotten Domain Generalization

Prasanna Mayilvahanan, Roland Zimmermann, Thaddäus Wiedemer et al.

ICLR 2025arXiv:2410.08258
21
citations
#2240

Reflective Gaussian Splatting

Yuxuan Yao, Zixuan Zeng, Chun Gu et al.

ICLR 2025arXiv:2412.19282
21
citations
#2241

Feather the Throttle: Revisiting Visual Token Pruning for Vision-Language Model Acceleration

Mark Endo, Xiaohan Wang, Serena Yeung-Levy

ICCV 2025arXiv:2412.13180
21
citations
#2242

Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning

Weitai Kang, Haifeng Huang, Yuzhang Shang et al.

ICCV 2025arXiv:2410.00255
21
citations
#2243

Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient

Zigeng Chen, Xinyin Ma, Gongfan Fang et al.

CVPR 2025arXiv:2411.17787
21
citations
#2244

On the Benefits of Memory for Modeling Time-Dependent PDEs

Ricardo Buitrago Ruiz, Tanya Marwah, Albert Gu et al.

ICLR 2025arXiv:2409.02313
21
citations
#2245

Is Sarcasm Detection a Step-by-Step Reasoning Process in Large Language Models?

Ben Yao, Yazhou Zhang, Qiuchi Li et al.

AAAI 2025paperarXiv:2407.12725
21
citations
#2246

ResMaster: Mastering High-Resolution Image Generation via Structural and Fine-Grained Guidance

Shuwei Shi, Wenbo Li, Yuechen Zhang et al.

AAAI 2025paperarXiv:2406.16476
21
citations
#2247

Self-Correcting Decoding with Generative Feedback for Mitigating Hallucinations in Large Vision-Language Models

Ce Zhang, Zifu Wan, Zhehan Kan et al.

ICLR 2025arXiv:2502.06130
21
citations
#2248

Self-Challenging Language Model Agents

Yifei Zhou, Sergey Levine, Jason Weston et al.

NEURIPS 2025arXiv:2506.01716
21
citations
#2249

ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning

Tonghe Zhang, Chao Yu, Sichang Su et al.

NEURIPS 2025arXiv:2505.22094
21
citations
#2250

SplatFlow: Multi-View Rectified Flow Model for 3D Gaussian Splatting Synthesis

Hyojun Go, Byeongjun Park, Jiho Jang et al.

CVPR 2025arXiv:2411.16443
21
citations
#2251

Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation

Xinyu Yang, Yuwei An, Hongyi Liu et al.

NEURIPS 2025spotlightarXiv:2506.09991
21
citations
#2252

EffoVPR: Effective Foundation Model Utilization for Visual Place Recognition

Issar Tzachor, Boaz Lerner, Matan Levy et al.

ICLR 2025arXiv:2405.18065
21
citations
#2253

Training on the Benchmark Is Not All You Need

Shiwen Ni, Xiangtao Kong, Chengming Li et al.

AAAI 2025paperarXiv:2409.01790
21
citations
#2254

Scaling Video-Language Models to 10K Frames via Hierarchical Differential Distillation

Chuanqi Cheng, Jian Guan, Wei Wu et al.

ICML 2025oralarXiv:2504.02438
21
citations
#2255

MELON: Provable Defense Against Indirect Prompt Injection Attacks in AI Agents

Kaijie Zhu, Xianjun Yang, Jindong Wang et al.

ICML 2025arXiv:2502.05174
21
citations
#2256

OpenRCA: Can Large Language Models Locate the Root Cause of Software Failures?

Junjielong Xu, Qinan Zhang, Zhiqing Zhong et al.

ICLR 2025
21
citations
#2257

BaxBench: Can LLMs Generate Correct and Secure Backends?

Mark Vero, Niels Mündler, Viktor Chibotaru et al.

ICML 2025spotlightarXiv:2502.11844
21
citations
#2258

VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning

Zhong-Yu Li, Ruoyi Du, Juncheng Yan et al.

ICCV 2025arXiv:2504.07960
21
citations
#2259

How do language models learn facts? Dynamics, curricula and hallucinations

Nicolas Zucchet, Jorg Bornschein, Stephanie C.Y. Chan et al.

COLM 2025paper
21
citations
#2260

Causal Concept Graph Models: Beyond Causal Opacity in Deep Learning

Gabriele Dominici, Pietro Barbiero, Mateo Espinosa Zarlenga et al.

ICLR 2025arXiv:2405.16507
21
citations
#2261

ReVISE: Learning to Refine at Test-Time via Intrinsic Self-Verification

Hyunseok Lee, Seunghyuk Oh, Jaehyung Kim et al.

ICML 2025arXiv:2502.14565
21
citations
#2262

Monte Carlo Tree Diffusion for System 2 Planning

Jaesik Yoon, Hyeonseo Cho, Doojin Baek et al.

ICML 2025spotlightarXiv:2502.07202
21
citations
#2263

Personality Alignment of Large Language Models

Minjun Zhu, Yixuan Weng, Linyi Yang et al.

ICLR 2025oralarXiv:2408.11779
21
citations
#2264

BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices

Xudong Lu, Yinghao Chen, Chencheng Chen et al.

CVPR 2025arXiv:2411.10640
21
citations
#2265

Rethinking External Slow-Thinking: From Snowball Errors to Probability of Correct Reasoning

Zeyu Gan, Yun Liao, Yong Liu

ICML 2025arXiv:2501.15602
21
citations
#2266

Pruning Large Language Models with Semi-Structural Adaptive Sparse Training

Weiyu Huang, Yuezhou Hu, Guohao Jian et al.

AAAI 2025paperarXiv:2407.20584
21
citations
#2267

Learning to Keep a Promise: Scaling Language Model Decoding Parallelism with Learned Asynchronous Decoding

Tian Jin, Ellie Cheng, Zachary Ankner et al.

ICML 2025arXiv:2502.11517
21
citations
#2268

Omni-Scene: Omni-Gaussian Representation for Ego-Centric Sparse-View Scene Reconstruction

Dongxu Wei, Zhiqi Li, Peidong Liu

CVPR 2025arXiv:2412.06273
21
citations
#2269

Standard Gaussian Process is All You Need for High-Dimensional Bayesian Optimization

Zhitong Xu, Haitao Wang, Jeff Phillips et al.

ICLR 2025arXiv:2402.02746
21
citations
#2270

The Loss Landscape of Deep Linear Neural Networks: a Second-order Analysis

El Mehdi Achour, Francois Malgouyres, Sebastien Gerchinovitz

ICLR 2025arXiv:2107.13289
21
citations
#2271

Prometheus: 3D-Aware Latent Diffusion Models for Feed-Forward Text-to-3D Scene Generation

Yuanbo Yang, Jiahao Shao, Xinyang Li et al.

CVPR 2025arXiv:2412.21117
21
citations
#2272

Benchmarking Agentic Workflow Generation

Shuofei Qiao, Runnan Fang, Zhisong Qiu et al.

ICLR 2025arXiv:2410.07869
21
citations
#2273

UltraVideo: High-Quality UHD Video Dataset with Comprehensive Captions

Xue Zhucun, Jiangning Zhang, Teng Hu et al.

NEURIPS 2025arXiv:2506.13691
21
citations
#2274

PACT: Pruning and Clustering-Based Token Reduction for Faster Visual Language Models

Dhouib Mohamed, Davide Buscaldi, Vanier Sonia et al.

CVPR 2025arXiv:2504.08966
21
citations
#2275

Does SGD really happen in tiny subspaces?

Minhak Song, Kwangjun Ahn, Chulhee Yun

ICLR 2025arXiv:2405.16002
21
citations
#2276

Unraveling the Smoothness Properties of Diffusion Models: A Gaussian Mixture Perspective

Yingyu Liang, Zhizhou Sha, Zhenmei Shi et al.

ICCV 2025arXiv:2405.16418
21
citations
#2277

InfAlign: Inference-aware language model alignment

Ananth Balashankar, Ziteng Sun, Jonathan Berant et al.

ICML 2025arXiv:2412.19792
21
citations
#2278

Reducing Tool Hallucination via Reliability Alignment

Hongshen Xu, Zichen Zhu, Lei Pan et al.

ICML 2025arXiv:2412.04141
21
citations
#2279

From Low Rank Gradient Subspace Stabilization to Low-Rank Weights: Observations, Theories, and Applications

Ajay Jaiswal, Yifan Wang, Lu Yin et al.

ICML 2025arXiv:2407.11239
20
citations
#2280

AlphaVerus: Bootstrapping Formally Verified Code Generation through Self-Improving Translation and Treefinement

Pranjal Aggarwal, Bryan Parno, Sean Welleck

ICML 2025arXiv:2412.06176
20
citations
#2281

REAL: Benchmarking Autonomous Agents on Deterministic Simulations of Real Websites

Div Garg, Diego Caples, Andis Draguns et al.

NEURIPS 2025arXiv:2504.11543
20
citations
#2282

CRANE: Reasoning with constrained LLM generation

Debangshu Banerjee, Tarun Suresh, Shubham Ugare et al.

ICML 2025arXiv:2502.09061
20
citations
#2283

Do Vision & Language Decoders use Images and Text equally? How Self-consistent are their Explanations?

Letitia Parcalabescu, Anette Frank

ICLR 2025arXiv:2404.18624
20
citations
#2284

What Kind of Visual Tokens Do We Need? Training-Free Visual Token Pruning for Multi-Modal Large Language Models from the Perspective of Graph

Yutao Jiang, Qiong Wu, Wenhao Lin et al.

AAAI 2025paperarXiv:2501.02268
20
citations
#2285

Rethinking Aleatoric and Epistemic Uncertainty

Freddie Bickford Smith, Jannik Kossen, Eleanor Trollope et al.

ICML 2025arXiv:2412.20892
20
citations
#2286

P(all-atom) Is Unlocking New Path For Protein Design

Wei Qu, Jiawei Guan, Rui Ma et al.

ICML 2025spotlight
20
citations
#2287

Position: Uncertainty Quantification Needs Reassessment for Large Language Model Agents

Michael Kirchhof, Gjergji Kasneci, Enkelejda Kasneci

ICML 2025arXiv:2505.22655
20
citations
#2288

SpikingSSMs: Learning Long Sequences with Sparse and Parallel Spiking State Space Models

Shuaijie Shen, Chao Wang, Renzhuo Huang et al.

AAAI 2025paperarXiv:2408.14909
20
citations
#2289

Streaming DiLoCo with overlapping communication

Arthur Douillard, Yani Donchev, J Keith Rush et al.

COLM 2025paperarXiv:2501.18512
20
citations
#2290

Revelio: Interpreting and leveraging semantic information in diffusion models

Dahye Kim, Xavier Thomas, Deepti Ghadiyaram

ICCV 2025arXiv:2411.16725
20
citations
#2291

LaRA: Benchmarking Retrieval-Augmented Generation and Long-Context LLMs – No Silver Bullet for LC or RAG Routing

Kuan Li, Liwen Zhang, Yong Jiang et al.

ICML 2025arXiv:2502.09977
20
citations
#2292

The Devil is in Temporal Token: High Quality Video Reasoning Segmentation

Sitong Gong, Yunzhi Zhuge, Lu Zhang et al.

CVPR 2025arXiv:2501.08549
20
citations
#2293

DistiLLM-2: A Contrastive Approach Boosts the Distillation of LLMs

Jongwoo Ko, Tianyi Chen, Sungnyun Kim et al.

ICML 2025oralarXiv:2503.07067
20
citations
#2294

MagicMotion: Controllable Video Generation with Dense-to-Sparse Trajectory Guidance

Quanhao Li, Zhen Xing, Rui Wang et al.

ICCV 2025arXiv:2503.16421
20
citations
#2295

Influence Functions for Scalable Data Attribution in Diffusion Models

Bruno Mlodozeniec, Runa Eschenhagen, Juhan Bae et al.

ICLR 2025arXiv:2410.13850
20
citations
#2296

CoRNStack: High-Quality Contrastive Data for Better Code Retrieval and Reranking

Tarun Suresh, Revanth Gangi Reddy, Yifei Xu et al.

ICLR 2025arXiv:2412.01007
20
citations
#2297

You Only Sample Once: Taming One-Step Text-to-Image Synthesis by Self-Cooperative Diffusion GANs

Yihong Luo, Xiaolong Chen, Xinghua Qu et al.

ICLR 2025arXiv:2403.12931
20
citations
#2298

GaussianSR: High Fidelity 2D Gaussian Splatting for Arbitrary-Scale Image Super-Resolution

Jintong Hu, Bin Xia, Bin Chen et al.

AAAI 2025paperarXiv:2407.18046
20
citations
#2299

Wavelet-Assisted Multi-Frequency Attention Network for Pansharpening

Jie Huang, Rui Huang, Jinghao Xu et al.

AAAI 2025paperarXiv:2502.04903
20
citations
#2300

Language Models Need Inductive Biases to Count Inductively

Yingshan Chang, Yonatan Bisk

ICLR 2025arXiv:2405.20131
20
citations
#2301

VMBench: A Benchmark for Perception-Aligned Video Motion Generation

Xinran Ling, Chen Zhu, Meiqi Wu et al.

ICCV 2025arXiv:2503.10076
20
citations
#2302

On Oversquashing in Graph Neural Networks Through the Lens of Dynamical Systems

Alessio Gravina, Moshe Eliasof, Claudio Gallicchio et al.

AAAI 2025paperarXiv:2405.01009
20
citations
#2303

Playing the Fool: Jailbreaking LLMs and Multimodal LLMs with Out-of-Distribution Strategy

Joonhyun Jeong, Seyun Bae, Yeonsung Jung et al.

CVPR 2025arXiv:2503.20823
20
citations
#2304

FlexVAR: Flexible Visual Autoregressive Modeling without Residual Prediction

Siyu Jiao, Gengwei Zhang, Yinlong Qian et al.

NEURIPS 2025arXiv:2502.20313
20
citations
#2305

LEAPS: A discrete neural sampler via locally equivariant networks

Peter Holderrieth, Michael Albergo, Tommi Jaakkola

ICML 2025arXiv:2502.10843
20
citations
#2306

Mamba as a Bridge: Where Vision Foundation Models Meet Vision Language Models for Domain-Generalized Semantic Segmentation

Xin Zhang, Robby T. Tan

CVPR 2025highlightarXiv:2504.03193
20
citations
#2307

CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up

Songhua Liu, Zhenxiong Tan, Xinchao Wang

NEURIPS 2025arXiv:2412.16112
20
citations
#2308

Can LLMs Really Learn to Translate a Low-Resource Language from One Grammar Book?

Seth Aycock, David Stap, Di Wu et al.

ICLR 2025arXiv:2409.19151
20
citations
#2309

The Illusion of Empathy: How AI Chatbots Shape Conversation Perception

Tingting Liu, Salvatore Giorgi, Ankit Aich et al.

AAAI 2025paperarXiv:2411.12877
20
citations
#2310

Wyckoff Transformer: Generation of Symmetric Crystals

Nikita Kazeev, Wei Nong, Ignat Romanov et al.

ICML 2025arXiv:2503.02407
20
citations
#2311

Reading Your Heart: Learning ECG Words and Sentences via Pre-training ECG Language Model

Jiarui Jin, Haoyu Wang, Hongyan Li et al.

ICLR 2025arXiv:2502.10707
20
citations
#2312

Great Models Think Alike and this Undermines AI Oversight

Shashwat Goel, Joschka Strüber, Ilze Amanda Auzina et al.

ICML 2025spotlightarXiv:2502.04313
20
citations
#2313

Rope to Nope and Back Again: A New Hybrid Attention Strategy

Bowen Yang, Bharat Venkitesh, Dwaraknath Gnaneshwar Talupuru et al.

NEURIPS 2025arXiv:2501.18795
20
citations
#2314

Video Motion Transfer with Diffusion Transformers

Alexander Pondaven, Aliaksandr Siarohin, Sergey Tulyakov et al.

CVPR 2025arXiv:2412.07776
20
citations
#2315

SPARTUN3D: Situated Spatial Understanding of 3D World in Large Language Model

Yue Zhang, Zhiyang Xu, Ying Shen et al.

ICLR 2025arXiv:2410.03878
20
citations
#2316

Scaling Laws in Patchification: An Image Is Worth 50,176 Tokens And More

Feng Wang, Yaodong Yu, Wei Shao et al.

ICML 2025arXiv:2502.03738
20
citations
#2317

OSDFace: One-Step Diffusion Model for Face Restoration

Jingkai Wang, Jue Gong, Lin Zhang et al.

CVPR 2025arXiv:2411.17163
20
citations
#2318

OptMATH: A Scalable Bidirectional Data Synthesis Framework for Optimization Modeling

Hongliang Lu, Zhonglin Xie, Yaoyu Wu et al.

ICML 2025arXiv:2502.11102
20
citations
#2319

Encryption-Friendly LLM Architecture

Donghwan Rho, Taeseong Kim, Minje Park et al.

ICLR 2025arXiv:2410.02486
20
citations
#2320

Cut Your Losses in Large-Vocabulary Language Models

Erik Wijmans, Brody Huval, Alexander Hertzberg et al.

ICLR 2025arXiv:2411.09009
20
citations
#2321

Towards a Unified Copernicus Foundation Model for Earth Vision

Yi Wang, Zhitong Xiong, Chenying Liu et al.

ICCV 2025arXiv:2503.11849
20
citations
#2322

BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks

Yunhan Zhao, Xiang Zheng, Lin Luo et al.

ICLR 2025arXiv:2410.20971
20
citations
#2323

MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO

Yicheng Xiao, Lin Song, Yukang Chen et al.

NEURIPS 2025arXiv:2505.13031
20
citations
#2324

How Do Large Language Models Understand Graph Patterns? A Benchmark for Graph Pattern Comprehension

Xinnan Dai, Haohao Qu, Yifei Shen et al.

ICLR 2025arXiv:2410.05298
20
citations
#2325

Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models

Mateusz Pach, Shyamgopal Karthik, Quentin Bouniot et al.

NEURIPS 2025arXiv:2504.02821
20
citations
#2326

Dense Audio-Visual Event Localization Under Cross-Modal Consistency and Multi-Temporal Granularity Collaboration

Ziheng Zhou, Jinxing Zhou, Wei Qian et al.

AAAI 2025paperarXiv:2412.12628
20
citations
#2327

Vid2Avatar-Pro: Authentic Avatar from Videos in the Wild via Universal Prior

Chen Guo, Junxuan Li, Yash Kant et al.

CVPR 2025arXiv:2503.01610
20
citations
#2328

InsViE-1M: Effective Instruction-based Video Editing with Elaborate Dataset Construction

Yuhui Wu, Liyi Chen, Ruibin Li et al.

ICCV 2025arXiv:2503.20287
20
citations
#2329

EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM

Zhuofan Zong, Dongzhi Jiang, Bingqi Ma et al.

ICML 2025arXiv:2412.09618
20
citations
#2330

Euler Characteristic Tools for Topological Data Analysis

Olympio Hacquard, Vadim Lebovici

ICLR 2025arXiv:2303.14040
20
citations
#2331

SketchAgent: Language-Driven Sequential Sketch Generation

Yael Vinker, Tamar Rott Shaham, Kristine Zheng et al.

CVPR 2025arXiv:2411.17673
20
citations
#2332

Training-Free Diffusion Model Alignment with Sampling Demons

Po-Hung Yeh, Kuang-Huei Lee, Jun-Cheng Chen

ICLR 2025arXiv:2410.05760
20
citations
#2333

SynCity: Training-Free Generation of 3D Cities

Paul Engstler, Aleksandar Shtedritski, Iro Laina et al.

ICCV 2025
20
citations
#2334

Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization

Wenkai Yang, Shiqi Shen, Guangyao Shen et al.

ICLR 2025arXiv:2406.11431
20
citations
#2335

Instant Gaussian Stream: Fast and Generalizable Streaming of Dynamic Scene Reconstruction via Gaussian Splatting

Jinbo Yan, Rui Peng, Zhiyan Wang et al.

CVPR 2025highlightarXiv:2503.16979
20
citations
#2336

Adaptive teachers for amortized samplers

Minsu Kim, Sanghyeok Choi, Taeyoung Yun et al.

ICLR 2025arXiv:2410.01432
20
citations
#2337

Autoregressive Pretraining with Mamba in Vision

Sucheng Ren, Xianhang Li, Haoqin Tu et al.

ICLR 2025arXiv:2406.07537
20
citations
#2338

MediConfusion: Can you trust your AI radiologist? Probing the reliability of multimodal medical foundation models

Mohammad Shahab Sepehri, Zalan Fabian, Maryam Soltanolkotabi et al.

ICLR 2025arXiv:2409.15477
20
citations
#2339

Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models

Uladzislau Sobal, Wancong Zhang, Kyunghyun Cho et al.

NEURIPS 2025arXiv:2502.14819
20
citations
#2340

ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation

Yifan Pu, Yiming Zhao, Zhicong Tang et al.

CVPR 2025arXiv:2502.18364
20
citations
#2341

E(n) Equivariant Topological Neural Networks

Claudio Battiloro, Ege Karaismailoglu, Mauricio Tec et al.

ICLR 2025arXiv:2405.15429
20
citations
#2342

Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning

Fan Lu, Wei Wu, Kecheng Zheng et al.

CVPR 2025arXiv:2412.08614
20
citations
#2343

DexGrasp Anything: Towards Universal Robotic Dexterous Grasping with Physics Awareness

Yiming Zhong, Qi Jiang, Jingyi Yu et al.

CVPR 2025highlightarXiv:2503.08257
20
citations
#2344

WonderTurbo: Generating Interactive 3D World in 0.72 Seconds

Chaojun Ni, Xiaofeng Wang, Zheng Zhu et al.

ICCV 2025arXiv:2504.02261
20
citations
#2345

SAFIRE: Segment Any Forged Image Region

Myung-Joon Kwon, Wonjun Lee, Seung-Hun Nam et al.

AAAI 2025paperarXiv:2412.08197
20
citations
#2346

Learning Long Range Dependencies on Graphs via Random Walks

Dexiong Chen, Till Schulz, Karsten Borgwardt

ICLR 2025arXiv:2406.03386
20
citations
#2347

Dynamic Negative Guidance of Diffusion Models

Felix Koulischer, Johannes Deleu, Gabriel Raya et al.

ICLR 2025arXiv:2410.14398
20
citations
#2348

Flow Matching with Gaussian Process Priors for Probabilistic Time Series Forecasting

Marcel Kollovieh, Marten Lienen, David Lüdke et al.

ICLR 2025oralarXiv:2410.03024
20
citations
#2349

Diffusion²: Dynamic 3D Content Generation via Score Composition of Video and Multi-view Diffusion Models

Zeyu Yang, Zijie Pan, Chun Gu et al.

ICLR 2025oralarXiv:2404.02148
20
citations
#2350

Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens

Kaihang Pan, Wang Lin, Zhongqi Yue et al.

CVPR 2025arXiv:2504.14666
20
citations
#2351

VersaPRM: Multi-Domain Process Reward Model via Synthetic Reasoning Data

Thomas Zeng, Shuibai Zhang, Shutong Wu et al.

ICML 2025oralarXiv:2502.06737
20
citations
#2352

Video-T1: Test-time Scaling for Video Generation

Fangfu Liu, Hanyang Wang, Yimo Cai et al.

ICCV 2025arXiv:2503.18942
20
citations
#2353

KiVA: Kid-inspired Visual Analogies for Testing Large Multimodal Models

Eunice Yiu, Maan Qraitem, Anisa Majhi et al.

ICLR 2025arXiv:2407.17773
20
citations
#2354

τ-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains

Shunyu Yao, Noah Shinn, Pedram Razavi et al.

ICLR 2025
20
citations
#2355

Toward Generalized Image Quality Assessment: Relaxing the Perfect Reference Quality Assumption

Du Chen, Tianhe Wu, Kede Ma et al.

CVPR 2025arXiv:2503.11221
20
citations
#2356

Inference-Time Alignment of Diffusion Models with Direct Noise Optimization

Zhiwei Tang, Jiangweizhi Peng, Jiasheng Tang et al.

ICML 2025arXiv:2405.18881
20
citations
#2357

Revealing Key Details to See Differences: A Novel Prototypical Perspective for Skeleton-based Action Recognition

Hongda Liu, Yunfan Liu, Min Ren et al.

CVPR 2025highlightarXiv:2411.18941
20
citations
#2358

Sekai: A Video Dataset towards World Exploration

Zhen Li, Chuanhao Li, Xiaofeng Mao et al.

NEURIPS 2025arXiv:2506.15675
20
citations
#2359

BioCLIP 2: Emergent Properties from Scaling Hierarchical Contrastive Learning

Jianyang Gu, Sam Stevens, Elizabeth Campolongo et al.

NEURIPS 2025spotlightarXiv:2505.23883
20
citations
#2360

KAN-AD: Time Series Anomaly Detection with Kolmogorov–Arnold Networks

Quan Zhou, Changhua Pei, Fei Sun et al.

ICML 2025arXiv:2411.00278
20
citations
#2361

D³: Scaling Up Deepfake Detection by Learning from Discrepancy

Yongqi Yang, Zhihao Qian, Ye Zhu et al.

CVPR 2025arXiv:2404.04584
20
citations
#2362

OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation

Pengfei Zhou, Xiaopeng Peng, Jiajun Song et al.

CVPR 2025arXiv:2411.18499
20
citations
#2363

Chat2SVG: Vector Graphics Generation with Large Language Models and Image Diffusion Models

Ronghuan Wu, Wanchao Su, Jing Liao

CVPR 2025arXiv:2411.16602
20
citations
#2364

Learning Occlusion-Robust Vision Transformers for Real-Time UAV Tracking

You Wu, Xucheng Wang, Xiangyang Yang et al.

CVPR 2025arXiv:2504.09228
20
citations
#2365

Cross-Embodiment Dexterous Grasping with Reinforcement Learning

Haoqi Yuan, Bohan Zhou, Yuhui Fu et al.

ICLR 2025arXiv:2410.02479
20
citations
#2366

X-NeMo: Expressive Neural Motion Reenactment via Disentangled Latent Attention

Xiaochen Zhao, Hongyi Xu, Guoxian Song et al.

ICLR 2025arXiv:2507.23143
20
citations
#2367

Visual-O1: Understanding Ambiguous Instructions via Multi-modal Multi-turn Chain-of-thoughts Reasoning

Minheng Ni, Yutao Fan, Lei Zhang et al.

ICLR 2025arXiv:2410.03321
20
citations
#2368

Gradient-Free Generation for Hard-Constrained Systems

Chaoran Cheng, Boran Han, Danielle Maddix et al.

ICLR 2025arXiv:2412.01786
20
citations
#2369

EditSplat: Multi-View Fusion and Attention-Guided Optimization for View-Consistent 3D Scene Editing with 3D Gaussian Splatting

Dong In Lee, Hyeongcheol Park, Jiyoung Seo et al.

CVPR 2025arXiv:2412.11520
20
citations
#2370

DreamDPO: Aligning Text-to-3D Generation with Human Preferences via Direct Preference Optimization

Zhenglin Zhou, Xiaobo Xia, Fan Ma et al.

ICML 2025arXiv:2502.04370
20
citations
#2371

Parrot: Multilingual Visual Instruction Tuning

Hai-Long Sun, Da-Wei Zhou, Yang Li et al.

ICML 2025arXiv:2406.02539
20
citations
#2372

CrossMPT: Cross-attention Message-passing Transformer for Error Correcting Codes

Seong-Joon Park, Hee-Youl Kwak, Sang-Hyo Kim et al.

ICLR 2025arXiv:2405.01033
20
citations
#2373

Rethinking Light Decoder-based Solvers for Vehicle Routing Problems

Ziwei Huang, Jianan Zhou, Zhiguang Cao et al.

ICLR 2025arXiv:2503.00753
20
citations
#2374

Occlusion-Embedded Hybrid Transformer for Light Field Super-Resolution

Zeyu Xiao, Zhuoyuan Li, Wei Jia

AAAI 2025paper
20
citations
#2375

MoFlow: One-Step Flow Matching for Human Trajectory Forecasting via Implicit Maximum Likelihood Estimation based Distillation

Yuxiang Fu, Qi Yan, Ke Li et al.

CVPR 2025arXiv:2503.09950
20
citations
#2376

BREAD: Branched Rollouts from Expert Anchors Bridge SFT & RL for Reasoning

Xuechen Zhang, Zijian Huang, Yingcong Li et al.

NEURIPS 2025arXiv:2506.17211
20
citations
#2377

Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation

Yuseung Lee, Jihyeon Je, Chanho Park et al.

ICCV 2025arXiv:2504.17207
20
citations
#2378

CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation

Nikolai Kalischek, Michael Oechsle, Fabian Manhardt et al.

ICLR 2025arXiv:2501.17162
20
citations
#2379

PathFinder: A Multi-Modal Multi-Agent System for Medical Diagnostic Decision-Making Applied to Histopathology

Fatemeh Ghezloo, Saygin Seyfioglu, Rustin Soraki et al.

ICCV 2025arXiv:2502.08916
20
citations
#2380

Intervening Anchor Token: Decoding Strategy in Alleviating Hallucinations for MLLMs

Barrett Tang, Zile Huang, Chengzhi Liu et al.

ICLR 2025
20
citations
#2381

AdaDiff: Adaptive Step Selection for Fast Diffusion Models

Hui Zhang, Zuxuan Wu, Zhen Xing et al.

AAAI 2025paperarXiv:2311.14768
20
citations
#2382

Grokking at the Edge of Numerical Stability

Lucas Prieto, Melih Barsbey, Pedro Mediano et al.

ICLR 2025arXiv:2501.04697
20
citations
#2383

Improving Data Efficiency for LLM Reinforcement Fine-tuning Through Difficulty-targeted Online Data Selection and Rollout Replay

Yifan Sun, Jingyan Shen, Yibin Wang et al.

NEURIPS 2025arXiv:2506.05316
20
citations
#2384

Text2PDE: Latent Diffusion Models for Accessible Physics Simulation

Anthony Zhou, Zijie Li, Michael Schneier et al.

ICLR 2025oralarXiv:2410.01153
20
citations
#2385

RocketEval: Efficient automated LLM evaluation via grading checklist

Tianjun Wei, Wei Wen, Ruizhi Qiao et al.

ICLR 2025arXiv:2503.05142
20
citations
#2386

Learning General-purpose Biomedical Volume Representations using Randomized Synthesis

Neel Dey, Benjamin Billot, Hallee Wong et al.

ICLR 2025arXiv:2411.02372
20
citations
#2387

DFormerv2: Geometry Self-Attention for RGBD Semantic Segmentation

Bo-Wen Yin, Jiao-Long Cao, Ming-Ming Cheng et al.

CVPR 2025arXiv:2504.04701
20
citations
#2388

Controlling Large Language Models Through Concept Activation Vectors

Hanyu Zhang, Xiting Wang, Chengao Li et al.

AAAI 2025paperarXiv:2501.05764
20
citations
#2389

CAPTURE: Evaluating Spatial Reasoning in Vision Language Models via Occluded Object Counting

Atin Pothiraj, Jaemin Cho, Elias Stengel-Eskin et al.

ICCV 2025arXiv:2504.15485
20
citations
#2390

Enhancing Chain of Thought Prompting in Large Language Models via Reasoning Patterns

Yufeng Zhang, Xuepeng Wang, Lingxiang Wu et al.

AAAI 2025paperarXiv:2404.14812
20
citations
#2391

Sort-free Gaussian Splatting via Weighted Sum Rendering

Qiqi Hou, Randall Rauwendaal, Zifeng Li et al.

ICLR 2025arXiv:2410.18931
20
citations
#2392

ImagineNav: Prompting Vision-Language Models as Embodied Navigator through Scene Imagination

Xinxin Zhao, Wenzhe Cai, Likun Tang et al.

ICLR 2025arXiv:2410.09874
20
citations
#2393

Perturbation-Restrained Sequential Model Editing

Jun-Yu Ma, Hong Wang, Hao-Xiang Xu et al.

ICLR 2025arXiv:2405.16821
20
citations
#2394

Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders

Fiona Ryan, Ajay Bati, Sangmin Lee et al.

CVPR 2025highlightarXiv:2412.09586
20
citations
#2395

EmoEdit: Evoking Emotions through Image Manipulation

Jingyuan Yang, Jiawei Feng, Weibin Luo et al.

CVPR 2025arXiv:2405.12661
20
citations
#2396

Multi-Sensor Object Anomaly Detection: Unifying Appearance, Geometry, and Internal Properties

Wenqiao Li, Bozhong Zheng, Xiaohao Xu et al.

CVPR 2025arXiv:2412.14592
20
citations
#2397

Modeling Complex System Dynamics with Flow Matching Across Time and Conditions

Martin Rohbeck, Edward De Brouwer, Charlotte Bunne et al.

ICLR 2025oral
20
citations
#2398

Investigating Non-Transitivity in LLM-as-a-Judge

Yi Xu, Laura Ruis, Tim Rocktäschel et al.

ICML 2025spotlightarXiv:2502.14074
20
citations
#2399

LaVin-DiT: Large Vision Diffusion Transformer

Zhaoqing Wang, Xiaobo Xia, Runnan Chen et al.

CVPR 2025arXiv:2411.11505
20
citations
#2400

In Search of Adam’s Secret Sauce

Antonio Orvieto, Robert Gower

NEURIPS 2025oralarXiv:2505.21829
20
citations