Most Cited 2025 "stereo image super-resolution" Papers

22,274 papers found • Page 12 of 112

Filters:Most Cited 2025 stereo image super-resolution Clear all

Conference

AAAI 2025 (3,028)COLM 2025 (418)CVPR 2025 (2,873)ICCV 2025 (2,701)ICLR 2025 (3,827)ICML 2025 (3,340)ISMAR 2025 (229)NEURIPS 2025 (5,858)AAAI 2024 (2,289)CVPR 2024 (2,716)ECCV 2024 (2,387)ICLR 2024 (2,297)ICML 2024 (2,635)

Paper Type

poster (24,624)paper (8,558)oral (1,594)spotlight (1,421)highlight (975)

#2201

Learning Distributions of Complex Fluid Simulations with Diffusion Graph Networks

Mario Lino, Tobias Pfaff, Nils Thuerey

ICLR 2025arXiv:2504.02843

citations

#2202

Argumentative Large Language Models for Explainable and Contestable Claim Verification

Gabriel Freedman, Adam Dejl, Deniz Gorur et al.

AAAI 2025paperarXiv:2405.02079

citations

#2203

REFINE: Inversion-Free Backdoor Defense via Model Reprogramming

Yukun Chen, Shuo Shao, Enhao Huang et al.

ICLR 2025arXiv:2502.18508

citations

#2204

MMAD: Multi-label Micro-Action Detection in Videos

Kun Li, pengyu Liu, Dan Guo et al.

ICCV 2025arXiv:2407.05311

citations

#2205

Model Merging in Pre-training of Large Language Models

Yunshui Li, Yiyuan Ma, Shen Yan et al.

NEURIPS 2025arXiv:2505.12082

citations

#2206

Light-A-Video: Training-free Video Relighting via Progressive Light Fusion

Yujie Zhou, Jiazi Bu, Pengyang Ling et al.

ICCV 2025arXiv:2502.08590

citations

#2207

SnapGen-V: Generating a Five-Second Video within Five Seconds on a Mobile Device

Yushu Wu, Zhixing Zhang, Yanyu Li et al.

CVPR 2025arXiv:2412.10494

citations

#2208

3DMambaIPF: A State Space Model for Iterative Point Cloud Filtering via Differentiable Rendering

Qingyuan Zhou, Weidong Yang, Ben Fei et al.

AAAI 2025paperarXiv:2404.05522

citations

#2209

Generative Trajectory Stitching through Diffusion Composition

Yunhao Luo, Utkarsh Mishra, Yilun Du et al.

NEURIPS 2025spotlightarXiv:2503.05153

citations

#2210

MoVE-KD: Knowledge Distillation for VLMs with Mixture of Visual Encoders

jiajun cao, Yuan Zhang, Tao Huang et al.

CVPR 2025arXiv:2501.01709

citations

#2211

DataGen: Unified Synthetic Dataset Generation via Large Language Models

Yue Huang, Siyuan Wu, Chujie Gao et al.

ICLR 2025arXiv:2406.18966

citations

#2212

DrivingSphere: Building a High-fidelity 4D World for Closed-loop Simulation

Tianyi Yan, Dongming Wu, Wencheng Han et al.

CVPR 2025arXiv:2411.11252

citations

#2213

SplineGS: Robust Motion-Adaptive Spline for Real-Time Dynamic 3D Gaussians from Monocular Video

Jongmin Park, Minh-Quan Viet Bui, Juan Luis Gonzalez Bello et al.

CVPR 2025arXiv:2412.09982

citations

#2214

TimeBridge: Non-Stationarity Matters for Long-term Time Series Forecasting

Peiyuan Liu, Beiliang Wu, Yifan Hu et al.

ICML 2025arXiv:2410.04442

citations

#2215

VQ-VLA: Improving Vision-Language-Action Models via Scaling Vector-Quantized Action Tokenizers

Yating Wang, Haoyi Zhu, Mingyu Liu et al.

ICCV 2025arXiv:2507.01016

citations

#2216

Mixture of In-Context Prompters for Tabular PFNs

Derek Xu, Olcay Cirit, Reza Asadi et al.

ICLR 2025arXiv:2405.16156

citations

#2217

OccMamba: Semantic Occupancy Prediction with State Space Models

Heng Li, Yuenan Hou, Xiaohan Xing et al.

CVPR 2025arXiv:2408.09859

citations

#2218

Basis Sharing: Cross-Layer Parameter Sharing for Large Language Model Compression

Jingcun Wang, Yu-Guang Chen, Ing-Chao Lin et al.

ICLR 2025arXiv:2410.03765

citations

#2219

Towards Open-Vocabulary Remote Sensing Image Semantic Segmentation

Chengyang Ye, Yunzhi Zhuge, Pingping Zhang

AAAI 2025paperarXiv:2412.19492

citations

#2220

TetSphere Splatting: Representing High-Quality Geometry with Lagrangian Volumetric Meshes

Minghao Guo, Bohan Wang, Kaiming He et al.

ICLR 2025arXiv:2405.20283

citations

#2221

Generative Image Layer Decomposition with Visual Effects

Jinrui Yang, Qing Liu, Yijun Li et al.

CVPR 2025arXiv:2411.17864

citations

#2222

V2XPnP: Vehicle-to-Everything Spatio-Temporal Fusion for Multi-Agent Perception and Prediction

Zewei Zhou, Hao Xiang, Zhaoliang Zheng et al.

ICCV 2025arXiv:2412.01812

citations

#2223

StoryGPT-V: Large Language Models as Consistent Story Visualizers

Xiaoqian Shen, Mohamed Elhoseiny

CVPR 2025arXiv:2312.02252

citations

#2224

CHiP: Cross-modal Hierarchical Direct Preference Optimization for Multimodal LLMs

Jinlan Fu, Shenzhen Huangfu, Hao Fei et al.

ICLR 2025arXiv:2501.16629

citations

#2225

Improving LLM Safety Alignment with Dual-Objective Optimization

Xuandong Zhao, Will Cai, Tianneng Shi et al.

ICML 2025arXiv:2503.03710

citations

#2226

Mixture of Noise for Pre-Trained Model-Based Class-Incremental Learning

Kai Jiang, Zhengyan Shi, Dell Zhang et al.

NEURIPS 2025arXiv:2509.16738

citations

#2227

SafeWatch: An Efficient Safety-Policy Following Video Guardrail Model with Transparent Explanations

Zhaorun Chen, Francesco Pinto, Minzhou Pan et al.

ICLR 2025arXiv:2412.06878

citations

#2228

Kinetix: Investigating the Training of General Agents through Open-Ended Physics-Based Control Tasks

Michael Matthews, Michael Beukman, Chris Lu et al.

ICLR 2025arXiv:2410.23208

citations

#2229

LSNet: See Large, Focus Small

Ao Wang, Hui Chen, Zijia Lin et al.

CVPR 2025arXiv:2503.23135

citations

#2230

Compress then Serve: Serving Thousands of LoRA Adapters with Little Overhead

Rickard Gabrielsson, Jiacheng Zhu, Onkar Bhardwaj et al.

ICML 2025arXiv:2407.00066

citations

#2231

OmniSAM: Omnidirectional Segment Anything Model for UDA in Panoramic Semantic Segmentation

Ding Zhong, Xu Zheng, Chenfei Liao et al.

ICCV 2025highlightarXiv:2503.07098

citations

#2232

Noise-Consistent Siamese-Diffusion for Medical Image Synthesis and Segmentation

Kunpeng Qiu, Zhiqiang Gao, Zhiying Zhou et al.

CVPR 2025arXiv:2505.06068

citations

#2233

Decision Theoretic Foundations for Conformal Prediction: Optimal Uncertainty Quantification for Risk-Averse Agents

Shayan Kiyani, George Pappas, Aaron Roth et al.

ICML 2025spotlightarXiv:2502.02561

citations

#2234

Heuristic-Induced Multimodal Risk Distribution Jailbreak Attack for Multimodal Large Language Models

Ma Teng, Xiaojun Jia, Ranjie Duan et al.

ICCV 2025arXiv:2412.05934

citations

#2235

Do Language Models Use Their Depth Efficiently?

Róbert Csordás, Christopher D Manning, Chris Potts

NEURIPS 2025arXiv:2505.13898

citations

#2236

The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer

Weixian Lei, Jiacong Wang, Haochen Wang et al.

ICCV 2025highlightarXiv:2504.10462

citations

#2237

MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D

Wei Cheng, Juncheng Mu, Xianfang Zeng et al.

CVPR 2025arXiv:2411.02336

citations

#2238

Synergistic Multi-Agent Framework with Trajectory Learning for Knowledge-Intensive Tasks

Shengbin Yue, Siyuan Wang, Wei Chen et al.

AAAI 2025paperarXiv:2407.09893

citations

#2239

In Search of Forgotten Domain Generalization

Prasanna Mayilvahanan, Roland Zimmermann, Thaddäus Wiedemer et al.

ICLR 2025arXiv:2410.08258

citations

#2240

Reflective Gaussian Splatting

Yuxuan Yao, Zixuan Zeng, Chun Gu et al.

ICLR 2025arXiv:2412.19282

citations

#2241

Feather the Throttle: Revisiting Visual Token Pruning for Vision-Language Model Acceleration

Mark Endo, Xiaohan Wang, Serena Yeung-Levy

ICCV 2025arXiv:2412.13180

citations

#2242

Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning

Weitai Kang, Haifeng Huang, Yuzhang Shang et al.

ICCV 2025arXiv:2410.00255

citations

#2243

Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient

Zigeng Chen, Xinyin Ma, Gongfan Fang et al.

CVPR 2025arXiv:2411.17787

citations

#2244

On the Benefits of Memory for Modeling Time-Dependent PDEs

Ricardo Buitrago Ruiz, Tanya Marwah, Albert Gu et al.

ICLR 2025arXiv:2409.02313

citations

#2245

Is Sarcasm Detection a Step-by-Step Reasoning Process in Large Language Models?

Ben Yao, Yazhou Zhang, Qiuchi Li et al.

AAAI 2025paperarXiv:2407.12725

citations

#2246

ResMaster: Mastering High-Resolution Image Generation via Structural and Fine-Grained Guidance

Shuwei Shi, Wenbo Li, Yuechen Zhang et al.

AAAI 2025paperarXiv:2406.16476

citations

#2247

Self-Correcting Decoding with Generative Feedback for Mitigating Hallucinations in Large Vision-Language Models

Ce Zhang, Zifu Wan, Zhehan Kan et al.

ICLR 2025arXiv:2502.06130

citations

#2248

Self-Challenging Language Model Agents

Yifei Zhou, Sergey Levine, Jason Weston et al.

NEURIPS 2025arXiv:2506.01716

citations

#2249

ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning

Tonghe Zhang, Chao Yu, Sichang Su et al.

NEURIPS 2025arXiv:2505.22094

citations

#2250

SplatFlow: Multi-View Rectified Flow Model for 3D Gaussian Splatting Synthesis

Hyojun Go, byeongjun park, Jiho Jang et al.

CVPR 2025arXiv:2411.16443

citations

#2251

Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation

Xinyu Yang, Yuwei An, Hongyi Liu et al.

NEURIPS 2025spotlightarXiv:2506.09991

citations

#2252

EffoVPR: Effective Foundation Model Utilization for Visual Place Recognition

Issar Tzachor, Boaz Lerner, Matan Levy et al.

ICLR 2025arXiv:2405.18065

citations

#2253

Training on the Benchmark Is Not All You Need

Shiwen Ni, Xiangtao Kong, Chengming Li et al.

AAAI 2025paperarXiv:2409.01790

citations

#2254

Scaling Video-Language Models to 10K Frames via Hierarchical Differential Distillation

CHUANQI CHENG, Jian Guan, Wei Wu et al.

ICML 2025oralarXiv:2504.02438

citations

#2255

MELON: Provable Defense Against Indirect Prompt Injection Attacks in AI Agents

Kaijie Zhu, Xianjun Yang, Jindong Wang et al.

ICML 2025arXiv:2502.05174

citations

#2256

OpenRCA: Can Large Language Models Locate the Root Cause of Software Failures?

Junjielong Xu, Qinan Zhang, Zhiqing Zhong et al.

ICLR 2025

citations

#2257

BaxBench: Can LLMs Generate Correct and Secure Backends?

Mark Vero, Niels Mündler, Viktor Chibotaru et al.

ICML 2025spotlightarXiv:2502.11844

citations

#2258

VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning

Zhong-Yu Li, Ruoyi Du, Juncheng Yan et al.

ICCV 2025arXiv:2504.07960

citations

#2259

How do language models learn facts? Dynamics, curricula and hallucinations

Nicolas Zucchet, Jorg Bornschein, Stephanie C.Y. Chan et al.

COLM 2025paper

citations

#2260

Causal Concept Graph Models: Beyond Causal Opacity in Deep Learning

Gabriele Dominici, Pietro Barbiero, Mateo Espinosa Zarlenga et al.

ICLR 2025arXiv:2405.16507

citations

#2261

ReVISE: Learning to Refine at Test-Time via Intrinsic Self-Verification

Hyunseok Lee, Seunghyuk Oh, Jaehyung Kim et al.

ICML 2025arXiv:2502.14565

citations

#2262

Monte Carlo Tree Diffusion for System 2 Planning

Jaesik Yoon, Hyeonseo Cho, Doojin Baek et al.

ICML 2025spotlightarXiv:2502.07202

citations

#2263

Personality Alignment of Large Language Models

Minjun Zhu, Yixuan Weng, Linyi Yang et al.

ICLR 2025oralarXiv:2408.11779

citations

#2264

BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices

Xudong LU, Yinghao Chen, chencheng Chen et al.

CVPR 2025arXiv:2411.10640

citations

#2265

Rethinking External Slow-Thinking: From Snowball Errors to Probability of Correct Reasoning

Zeyu Gan, Yun Liao, Yong Liu

ICML 2025arXiv:2501.15602

citations

#2266

Pruning Large Language Models with Semi-Structural Adaptive Sparse Training

Weiyu Huang, Yuezhou Hu, Guohao Jian et al.

AAAI 2025paperarXiv:2407.20584

citations

#2267

Learning to Keep a Promise: Scaling Language Model Decoding Parallelism with Learned Asynchronous Decoding

Tian Jin, Ellie Cheng, Zachary Ankner et al.

ICML 2025arXiv:2502.11517

citations

#2268

Omni-Scene: Omni-Gaussian Representation for Ego-Centric Sparse-View Scene Reconstruction

Dongxu Wei, Zhiqi Li, Peidong Liu

CVPR 2025arXiv:2412.06273

citations

#2269

Standard Gaussian Process is All You Need for High-Dimensional Bayesian Optimization

Zhitong Xu, Haitao Wang, Jeff Phillips et al.

ICLR 2025arXiv:2402.02746

citations

#2270

The Loss Landscape of Deep Linear Neural Networks: a Second-order Analysis

El Mehdi Achour, Francois Malgouyres, Sebastien Gerchinovitz

ICLR 2025arXiv:2107.13289

citations

#2271

Prometheus: 3D-Aware Latent Diffusion Models for Feed-Forward Text-to-3D Scene Generation

Yuanbo Yang, Jiahao Shao, Xinyang Li et al.

CVPR 2025arXiv:2412.21117

citations

#2272

Benchmarking Agentic Workflow Generation

Shuofei Qiao, Runnan Fang, Zhisong Qiu et al.

ICLR 2025arXiv:2410.07869

citations

#2273

UltraVideo: High-Quality UHD Video Dataset with Comprehensive Captions

Xue zhucun, Jiangning Zhang, Teng Hu et al.

NEURIPS 2025arXiv:2506.13691

citations

#2274

PACT: Pruning and Clustering-Based Token Reduction for Faster Visual Language Models

Dhouib Mohamed, Davide Buscaldi, Vanier Sonia et al.

CVPR 2025arXiv:2504.08966

citations

#2275

Does SGD really happen in tiny subspaces?

Minhak Song, Kwangjun Ahn, Chulhee Yun

ICLR 2025arXiv:2405.16002

citations

#2276

Unraveling the Smoothness Properties of Diffusion Models: A Gaussian Mixture Perspective

Yingyu Liang, Zhizhou Sha, Zhenmei Shi et al.

ICCV 2025arXiv:2405.16418

citations

#2277

InfAlign: Inference-aware language model alignment

Ananth Balashankar, Ziteng Sun, Jonathan Berant et al.

ICML 2025arXiv:2412.19792

citations

#2278

Reducing Tool Hallucination via Reliability Alignment

Hongshen Xu, Zichen Zhu, Lei Pan et al.

ICML 2025arXiv:2412.04141

citations

#2279

From Low Rank Gradient Subspace Stabilization to Low-Rank Weights: Observations, Theories, and Applications

Ajay Jaiswal, Yifan Wang, Lu Yin et al.

ICML 2025arXiv:2407.11239

citations

#2280

AlphaVerus: Bootstrapping Formally Verified Code Generation through Self-Improving Translation and Treefinement

Pranjal Aggarwal, Bryan Parno, Sean Welleck

ICML 2025arXiv:2412.06176

citations

#2281

REAL: Benchmarking Autonomous Agents on Deterministic Simulations of Real Websites

Div Garg, Diego Caples, Andis Draguns et al.

NEURIPS 2025arXiv:2504.11543

citations

#2282

CRANE: Reasoning with constrained LLM generation

Debangshu Banerjee, Tarun Suresh, Shubham Ugare et al.

ICML 2025arXiv:2502.09061

citations

#2283

Do Vision & Language Decoders use Images and Text equally? How Self-consistent are their Explanations?

Letitia Parcalabescu, Anette Frank

ICLR 2025arXiv:2404.18624

citations

#2284

What Kind of Visual Tokens Do We Need? Training-Free Visual Token Pruning for Multi-Modal Large Language Models from the Perspective of Graph

Yutao Jiang, Qiong Wu, Wenhao Lin et al.

AAAI 2025paperarXiv:2501.02268

citations

#2285

Rethinking Aleatoric and Epistemic Uncertainty

Freddie Bickford Smith, Jannik Kossen, Eleanor Trollope et al.

ICML 2025arXiv:2412.20892

citations

#2286

P(all-atom) Is Unlocking New Path For Protein Design

Wei Qu, Jiawei Guan, Rui Ma et al.

ICML 2025spotlight

citations

#2287

Position: Uncertainty Quantification Needs Reassessment for Large Language Model Agents

Michael Kirchhof, Gjergji Kasneci, Enkelejda Kasneci

ICML 2025arXiv:2505.22655

citations

#2288

SpikingSSMs: Learning Long Sequences with Sparse and Parallel Spiking State Space Models

Shuaijie Shen, Chao Wang, Renzhuo Huang et al.

AAAI 2025paperarXiv:2408.14909

citations

#2289

Streaming DiLoCo with overlapping communication

Arthur Douillard, Yani Donchev, J Keith Rush et al.

COLM 2025paperarXiv:2501.18512

citations

#2290

Revelio: Interpreting and leveraging semantic information in diffusion models

Dahye Kim, Xavier Thomas, Deepti Ghadiyaram

ICCV 2025arXiv:2411.16725

citations

#2291

LaRA: Benchmarking Retrieval-Augmented Generation and Long-Context LLMs – No Silver Bullet for LC or RAG Routing

Kuan Li, Liwen Zhang, Yong Jiang et al.

ICML 2025arXiv:2502.09977

citations

#2292

The Devil is in Temporal Token: High Quality Video Reasoning Segmentation

Sitong Gong, Yunzhi Zhuge, Lu Zhang et al.

CVPR 2025arXiv:2501.08549

citations

#2293

DistiLLM-2: A Contrastive Approach Boosts the Distillation of LLMs

Jongwoo Ko, Tianyi Chen, Sungnyun Kim et al.

ICML 2025oralarXiv:2503.07067

citations

#2294

MagicMotion: Controllable Video Generation with Dense-to-Sparse Trajectory Guidance

Quanhao Li, Zhen Xing, Rui Wang et al.

ICCV 2025arXiv:2503.16421

citations

#2295

Influence Functions for Scalable Data Attribution in Diffusion Models

Bruno Mlodozeniec, Runa Eschenhagen, Juhan Bae et al.

ICLR 2025arXiv:2410.13850

citations

#2296

CoRNStack: High-Quality Contrastive Data for Better Code Retrieval and Reranking

Tarun Suresh, Revanth Gangi Reddy, Yifei Xu et al.

ICLR 2025arXiv:2412.01007

citations

#2297

You Only Sample Once: Taming One-Step Text-to-Image Synthesis by Self-Cooperative Diffusion GANs

Yihong Luo, Xiaolong Chen, Xinghua Qu et al.

ICLR 2025arXiv:2403.12931

citations

#2298

GaussianSR: High Fidelity 2D Gaussian Splatting for Arbitrary-Scale Image Super-Resolution

Jintong Hu, Bin Xia, Bin Chen et al.

AAAI 2025paperarXiv:2407.18046

citations

#2299

Wavelet-Assisted Multi-Frequency Attention Network for Pansharpening

Jie Huang, Rui Huang, Jinghao Xu et al.

AAAI 2025paperarXiv:2502.04903

citations

#2300

Language Models Need Inductive Biases to Count Inductively

Yingshan Chang, Yonatan Bisk

ICLR 2025arXiv:2405.20131

citations

#2301

VMBench: A Benchmark for Perception-Aligned Video Motion Generation

Xinran Ling, Chen Zhu, Meiqi Wu et al.

ICCV 2025arXiv:2503.10076

citations

#2302

On Oversquashing in Graph Neural Networks Through the Lens of Dynamical Systems

Alessio Gravina, Moshe Eliasof, Claudio Gallicchio et al.

AAAI 2025paperarXiv:2405.01009

citations

#2303

Playing the Fool: Jailbreaking LLMs and Multimodal LLMs with Out-of-Distribution Strategy

Joonhyun Jeong, Seyun Bae, Yeonsung Jung et al.

CVPR 2025arXiv:2503.20823

citations

#2304

FlexVAR: Flexible Visual Autoregressive Modeling without Residual Prediction

Siyu Jiao, Gengwei Zhang, Yinlong Qian et al.

NEURIPS 2025arXiv:2502.20313

citations

#2305

LEAPS: A discrete neural sampler via locally equivariant networks

Peter Holderrieth, Michael Albergo, Tommi Jaakkola

ICML 2025arXiv:2502.10843

citations

#2306

Mamba as a Bridge: Where Vision Foundation Models Meet Vision Language Models for Domain-Generalized Semantic Segmentation

Xin Zhang, Robby T. Tan

CVPR 2025highlightarXiv:2504.03193

citations

#2307

CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up

Songhua Liu, Zhenxiong Tan, Xinchao Wang

NEURIPS 2025arXiv:2412.16112

citations

#2308

Can LLMs Really Learn to Translate a Low-Resource Language from One Grammar Book?

Seth Aycock, David Stap, Di Wu et al.

ICLR 2025arXiv:2409.19151

citations

#2309

The Illusion of Empathy: How AI Chatbots Shape Conversation Perception

Tingting Liu, Salvatore Giorgi, Ankit Aich et al.

AAAI 2025paperarXiv:2411.12877

citations

#2310

Wyckoff Transformer: Generation of Symmetric Crystals

Nikita Kazeev, Wei Nong, Ignat Romanov et al.

ICML 2025arXiv:2503.02407

citations

#2311

Reading Your Heart: Learning ECG Words and Sentences via Pre-training ECG Language Model

Jiarui Jin, Haoyu Wang, Hongyan Li et al.

ICLR 2025arXiv:2502.10707

citations

#2312

Great Models Think Alike and this Undermines AI Oversight

Shashwat Goel, Joschka Strüber, Ilze Amanda Auzina et al.

ICML 2025spotlightarXiv:2502.04313

citations

#2313

Rope to Nope and Back Again: A New Hybrid Attention Strategy

Bowen Yang, Bharat Venkitesh, Dwaraknath Gnaneshwar Talupuru et al.

NEURIPS 2025arXiv:2501.18795

citations

#2314

Video Motion Transfer with Diffusion Transformers

Alexander Pondaven, Aliaksandr Siarohin, Sergey Tulyakov et al.

CVPR 2025arXiv:2412.07776

citations

#2315

SPARTUN3D: Situated Spatial Understanding of 3D World in Large Language Model

Yue Zhang, Zhiyang Xu, Ying Shen et al.

ICLR 2025arXiv:2410.03878

citations

#2316

Scaling Laws in Patchification: An Image Is Worth 50,176 Tokens And More

Feng Wang, Yaodong Yu, Wei Shao et al.

ICML 2025arXiv:2502.03738

citations

#2317

OSDFace: One-Step Diffusion Model for Face Restoration

Jingkai Wang, Jue Gong, Lin Zhang et al.

CVPR 2025arXiv:2411.17163

citations

#2318

OptMATH: A Scalable Bidirectional Data Synthesis Framework for Optimization Modeling

Hongliang Lu, Zhonglin Xie, Yaoyu Wu et al.

ICML 2025arXiv:2502.11102

citations

#2319

Encryption-Friendly LLM Architecture

Donghwan Rho, Taeseong Kim, Minje Park et al.

ICLR 2025arXiv:2410.02486

citations

#2320

Cut Your Losses in Large-Vocabulary Language Models

Erik Wijmans, Brody Huval, Alexander Hertzberg et al.

ICLR 2025arXiv:2411.09009

citations

#2321

Towards a Unified Copernicus Foundation Model for Earth Vision

Yi Wang, Zhitong Xiong, Chenying Liu et al.

ICCV 2025arXiv:2503.11849

citations

#2322

BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks

Yunhan Zhao, Xiang Zheng, Lin Luo et al.

ICLR 2025arXiv:2410.20971

citations

#2323

MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO

Yicheng Xiao, Lin Song, Yukang Chen et al.

NEURIPS 2025arXiv:2505.13031

citations

#2324

How Do Large Language Models Understand Graph Patterns? A Benchmark for Graph Pattern Comprehension

Xinnan Dai, Haohao QU, Yifei Shen et al.

ICLR 2025arXiv:2410.05298

citations

#2325

Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models

Mateusz Pach, Shyamgopal Karthik, Quentin Bouniot et al.

NEURIPS 2025arXiv:2504.02821

citations

#2326

Dense Audio-Visual Event Localization Under Cross-Modal Consistency and Multi-Temporal Granularity Collaboration

Ziheng Zhou, Jinxing Zhou, Wei Qian et al.

AAAI 2025paperarXiv:2412.12628

citations

#2327

Vid2Avatar-Pro: Authentic Avatar from Videos in the Wild via Universal Prior

Chen Guo, Junxuan Li, Yash Kant et al.

CVPR 2025arXiv:2503.01610

citations

#2328

InsViE-1M: Effective Instruction-based Video Editing with Elaborate Dataset Construction

Yuhui WU, Liyi Chen, Ruibin Li et al.

ICCV 2025arXiv:2503.20287

citations

#2329

EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM

Zhuofan Zong, Dongzhi Jiang, Bingqi Ma et al.

ICML 2025arXiv:2412.09618

citations

#2330

Euler Characteristic Tools for Topological Data Analysis

Olympio Hacquard, Vadim Lebovici

ICLR 2025arXiv:2303.14040

citations

#2331

SketchAgent: Language-Driven Sequential Sketch Generation

Yael Vinker, Tamar Rott Shaham, Kristine Zheng et al.

CVPR 2025arXiv:2411.17673

citations

#2332

Training-Free Diffusion Model Alignment with Sampling Demons

Po-Hung Yeh, Kuang-Huei Lee, Jun-Cheng Chen

ICLR 2025arXiv:2410.05760

citations

#2333

SynCity: Training-Free Generation of 3D Cities

Paul Engstler, Aleksandar Shtedritski, Iro Laina et al.

ICCV 2025

citations

#2334

Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization

Wenkai Yang, Shiqi Shen, Guangyao Shen et al.

ICLR 2025arXiv:2406.11431

citations

#2335

Instant Gaussian Stream: Fast and Generalizable Streaming of Dynamic Scene Reconstruction via Gaussian Splatting

Jinbo Yan, Rui Peng, Zhiyan Wang et al.

CVPR 2025highlightarXiv:2503.16979

citations

#2336

Adaptive teachers for amortized samplers

Minsu Kim, Sanghyeok Choi, Taeyoung Yun et al.

ICLR 2025arXiv:2410.01432

citations

#2337

Autoregressive Pretraining with Mamba in Vision

Sucheng Ren, Xianhang Li, Haoqin Tu et al.

ICLR 2025arXiv:2406.07537

citations

#2338

MediConfusion: Can you trust your AI radiologist? Probing the reliability of multimodal medical foundation models

Mohammad Shahab Sepehri, Zalan Fabian, Maryam Soltanolkotabi et al.

ICLR 2025arXiv:2409.15477

citations

#2339

Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models

Uladzislau Sobal, Wancong Zhang, Kyunghyun Cho et al.

NEURIPS 2025arXiv:2502.14819

citations

#2340

ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation

Yifan Pu, Yiming Zhao, Zhicong Tang et al.

CVPR 2025arXiv:2502.18364

citations

#2341

E(n) Equivariant Topological Neural Networks

Claudio Battiloro, Ege Karaismailoglu, Mauricio Tec et al.

ICLR 2025arXiv:2405.15429

citations

#2342

Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning

Fan Lu, Wei Wu, Kecheng Zheng et al.

CVPR 2025arXiv:2412.08614

citations

#2343

DexGrasp Anything: Towards Universal Robotic Dexterous Grasping with Physics Awareness

Yiming Zhong, Qi Jiang, Jingyi Yu et al.

CVPR 2025highlightarXiv:2503.08257

citations

#2344

WonderTurbo: Generating Interactive 3D World in 0.72 Seconds

Chaojun Ni, Xiaofeng Wang, Zheng Zhu et al.

ICCV 2025arXiv:2504.02261

citations

#2345

SAFIRE: Segment Any Forged Image Region

Myung-Joon Kwon, Wonjun Lee, Seung-Hun Nam et al.

AAAI 2025paperarXiv:2412.08197

citations

#2346

Learning Long Range Dependencies on Graphs via Random Walks

Dexiong Chen, Till Schulz, Karsten Borgwardt

ICLR 2025arXiv:2406.03386

citations

#2347

Dynamic Negative Guidance of Diffusion Models

Felix Koulischer, Johannes Deleu, Gabriel Raya et al.

ICLR 2025arXiv:2410.14398

citations

#2348

Flow Matching with Gaussian Process Priors for Probabilistic Time Series Forecasting

Marcel Kollovieh, Marten Lienen, David Lüdke et al.

ICLR 2025oralarXiv:2410.03024

citations

#2349

Diffusion$^2$: Dynamic 3D Content Generation via Score Composition of Video and Multi-view Diffusion Models

Zeyu Yang, Zijie Pan, Chun Gu et al.

ICLR 2025oralarXiv:2404.02148

citations

#2350

Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens

Kaihang Pan, Wang Lin, Zhongqi Yue et al.

CVPR 2025arXiv:2504.14666

citations

#2351

VersaPRM: Multi-Domain Process Reward Model via Synthetic Reasoning Data

Thomas Zeng, Shuibai Zhang, Shutong Wu et al.

ICML 2025oralarXiv:2502.06737

citations

#2352

Video-T1: Test-time Scaling for Video Generation

Fangfu Liu, Hanyang Wang, Yimo Cai et al.

ICCV 2025arXiv:2503.18942

citations

#2353

KiVA: Kid-inspired Visual Analogies for Testing Large Multimodal Models

Eunice Yiu, Maan Qraitem, Anisa Majhi et al.

ICLR 2025arXiv:2407.17773

citations

#2354

{$\tau$}-bench: A Benchmark for \underline{T}ool-\underline{A}gent-\underline{U}ser Interaction in Real-World Domains

Shunyu Yao, Noah Shinn, Pedram Razavi et al.

ICLR 2025

citations

#2355

Toward Generalized Image Quality Assessment: Relaxing the Perfect Reference Quality Assumption

Du CHEN, Tianhe Wu, Kede Ma et al.

CVPR 2025arXiv:2503.11221

citations

#2356

Inference-Time Alignment of Diffusion Models with Direct Noise Optimization

Zhiwei Tang, Jiangweizhi Peng, Jiasheng Tang et al.

ICML 2025arXiv:2405.18881

citations

#2357

Revealing Key Details to See Differences: A Novel Prototypical Perspective for Skeleton-based Action Recognition

Hongda Liu, Yunfan Liu, Min Ren et al.

CVPR 2025highlightarXiv:2411.18941

citations

#2358

Sekai: A Video Dataset towards World Exploration

Zhen Li, Chuanhao Li, Xiaofeng Mao et al.

NEURIPS 2025arXiv:2506.15675

citations

#2359

BioCLIP 2: Emergent Properties from Scaling Hierarchical Contrastive Learning

Jianyang Gu, Sam Stevens, Elizabeth Campolongo et al.

NEURIPS 2025spotlightarXiv:2505.23883

citations

#2360

KAN-AD: Time Series Anomaly Detection with Kolmogorov–Arnold Networks

Quan Zhou, Changhua Pei, Fei Sun et al.

ICML 2025arXiv:2411.00278

citations

#2361

D^3: Scaling Up Deepfake Detection by Learning from Discrepancy

Yongqi Yang, Zhihao Qian, Ye Zhu et al.

CVPR 2025arXiv:2404.04584

citations

#2362

OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation

Pengfei Zhou, Xiaopeng Peng, Jiajun Song et al.

CVPR 2025arXiv:2411.18499

citations

#2363

Chat2SVG: Vector Graphics Generation with Large Language Models and Image Diffusion Models

Ronghuan Wu, Wanchao Su, Jing Liao

CVPR 2025arXiv:2411.16602

citations

#2364

Learning Occlusion-Robust Vision Transformers for Real-Time UAV Tracking

You Wu, Xucheng Wang, Xiangyang Yang et al.

CVPR 2025arXiv:2504.09228

citations

#2365

Cross-Embodiment Dexterous Grasping with Reinforcement Learning

Haoqi Yuan, Bohan Zhou, Yuhui Fu et al.

ICLR 2025arXiv:2410.02479

citations

#2366

X-NeMo: Expressive Neural Motion Reenactment via Disentangled Latent Attention

XiaoChen Zhao, Hongyi Xu, Guoxian Song et al.

ICLR 2025arXiv:2507.23143

citations

#2367

Visual-O1: Understanding Ambiguous Instructions via Multi-modal Multi-turn Chain-of-thoughts Reasoning

Minheng Ni, YuTao Fan, Lei Zhang et al.

ICLR 2025arXiv:2410.03321

citations

#2368

Gradient-Free Generation for Hard-Constrained Systems

Chaoran Cheng, Boran Han, Danielle Maddix et al.

ICLR 2025arXiv:2412.01786

citations

#2369

EditSplat: Multi-View Fusion and Attention-Guided Optimization for View-Consistent 3D Scene Editing with 3D Gaussian Splatting

Dong In Lee, Hyeongcheol Park, Jiyoung Seo et al.

CVPR 2025arXiv:2412.11520

citations

#2370

DreamDPO: Aligning Text-to-3D Generation with Human Preferences via Direct Preference Optimization

Zhenglin Zhou, Xiaobo Xia, Fan Ma et al.

ICML 2025arXiv:2502.04370

citations

#2371

Parrot: Multilingual Visual Instruction Tuning

Hai-Long Sun, Da-Wei Zhou, Yang Li et al.

ICML 2025arXiv:2406.02539

citations

#2372

CrossMPT: Cross-attention Message-passing Transformer for Error Correcting Codes

Seong-Joon Park, Hee-Youl Kwak, Sang-Hyo Kim et al.

ICLR 2025arXiv:2405.01033

citations

#2373

Rethinking Light Decoder-based Solvers for Vehicle Routing Problems

Ziwei Huang, Jianan Zhou, Zhiguang Cao et al.

ICLR 2025arXiv:2503.00753

citations

#2374

Occlusion-Embedded Hybrid Transformer for Light Field Super-Resolution

Zeyu Xiao, Zhuoyuan Li, Wei Jia

AAAI 2025paper

citations

#2375

MoFlow: One-Step Flow Matching for Human Trajectory Forecasting via Implicit Maximum Likelihood Estimation based Distillation

Yuxiang Fu, Qi Yan, Ke Li et al.

CVPR 2025arXiv:2503.09950

citations

#2376

BREAD: Branched Rollouts from Expert Anchors Bridge SFT & RL for Reasoning

Xuechen Zhang, Zijian Huang, Yingcong Li et al.

NEURIPS 2025arXiv:2506.17211

citations

#2377

Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation

Yuseung Lee, Jihyeon Je, Chanho Park et al.

ICCV 2025arXiv:2504.17207

citations

#2378

CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation

Nikolai Kalischek, Michael Oechsle, Fabian Manhardt et al.

ICLR 2025arXiv:2501.17162

citations

#2379

PathFinder: A Multi-Modal Multi-Agent System for Medical Diagnostic Decision-Making Applied to Histopathology

Fatemeh Ghezloo, Saygin Seyfioglu, Rustin Soraki et al.

ICCV 2025arXiv:2502.08916

citations

#2380

Intervening Anchor Token: Decoding Strategy in Alleviating Hallucinations for MLLMs

Barrett Tang, Zile Huang, Chengzhi Liu et al.

ICLR 2025

citations

#2381

AdaDiff: Adaptive Step Selection for Fast Diffusion Models

Hui Zhang, Zuxuan Wu, Zhen Xing et al.

AAAI 2025paperarXiv:2311.14768

citations

#2382

Grokking at the Edge of Numerical Stability

Lucas Prieto, Melih Barsbey, Pedro Mediano et al.

ICLR 2025arXiv:2501.04697

citations

#2383

Improving Data Efficiency for LLM Reinforcement Fine-tuning Through Difficulty-targeted Online Data Selection and Rollout Replay

Yifan Sun, Jingyan Shen, Yibin Wang et al.

NEURIPS 2025arXiv:2506.05316

citations

#2384

Text2PDE: Latent Diffusion Models for Accessible Physics Simulation

Anthony Zhou, Zijie Li, Michael Schneier et al.

ICLR 2025oralarXiv:2410.01153

citations

#2385

RocketEval: Efficient automated LLM evaluation via grading checklist

Tianjun Wei, Wei Wen, Ruizhi Qiao et al.

ICLR 2025arXiv:2503.05142

citations

#2386

Learning General-purpose Biomedical Volume Representations using Randomized Synthesis

Neel Dey, Benjamin Billot, Hallee Wong et al.

ICLR 2025arXiv:2411.02372

citations

#2387

DFormerv2: Geometry Self-Attention for RGBD Semantic Segmentation

Bo-Wen Yin, Jiao-Long Cao, Ming-Ming Cheng et al.

CVPR 2025arXiv:2504.04701

citations

#2388

Controlling Large Language Models Through Concept Activation Vectors

Hanyu Zhang, Xiting Wang, Chengao Li et al.

AAAI 2025paperarXiv:2501.05764

citations

#2389

CAPTURE: Evaluating Spatial Reasoning in Vision Language Models via Occluded Object Counting

Atin Pothiraj, Jaemin Cho, Elias Stengel-Eskin et al.

ICCV 2025arXiv:2504.15485

citations

#2390

Enhancing Chain of Thought Prompting in Large Language Models via Reasoning Patterns

Yufeng Zhang, Xuepeng Wang, Lingxiang Wu et al.

AAAI 2025paperarXiv:2404.14812

citations

#2391

Sort-free Gaussian Splatting via Weighted Sum Rendering

Qiqi Hou, Randall Rauwendaal, Zifeng Li et al.

ICLR 2025arXiv:2410.18931

citations

#2392

ImagineNav: Prompting Vision-Language Models as Embodied Navigator through Scene Imagination

Xinxin Zhao, Wenzhe Cai, Likun Tang et al.

ICLR 2025arXiv:2410.09874

citations

#2393

Perturbation-Restrained Sequential Model Editing

Jun-Yu Ma, Hong Wang, Hao-Xiang Xu et al.

ICLR 2025arXiv:2405.16821

citations

#2394

Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders

Fiona Ryan, Ajay Bati, Sangmin Lee et al.

CVPR 2025highlightarXiv:2412.09586

citations

#2395

EmoEdit: Evoking Emotions through Image Manipulation

Jingyuan Yang, Jiawei Feng, Weibin Luo et al.

CVPR 2025arXiv:2405.12661

citations

#2396

Multi-Sensor Object Anomaly Detection: Unifying Appearance, Geometry, and Internal Properties

wenqiao Li, BoZhong Zheng, Xiaohao Xu et al.

CVPR 2025arXiv:2412.14592

citations

#2397

Modeling Complex System Dynamics with Flow Matching Across Time and Conditions

Martin Rohbeck, Edward De Brouwer, Charlotte Bunne et al.

ICLR 2025oral

citations

#2398

Investigating Non-Transitivity in LLM-as-a-Judge

Yi Xu, Laura Ruis, Tim Rocktäschel et al.

ICML 2025spotlightarXiv:2502.14074

citations

#2399

LaVin-DiT: Large Vision Diffusion Transformer

Zhaoqing Wang, Xiaobo Xia, Runnan Chen et al.

CVPR 2025arXiv:2411.11505

citations

#2400

In Search of Adam’s Secret Sauce

Antonio Orvieto, Robert Gower

NEURIPS 2025oralarXiv:2505.21829

citations

← Previous

1...10 11 12 13 14...112