Most Cited 2025 "linear rewards" Papers

22,274 papers found • Page 33 of 112

#6401

SymDPO: Boosting In-Context Learning of Large Multimodal Models with Symbol Demonstration Direct Preference Optimization

Hongrui Jia, Chaoya Jiang, Haiyang Xu et al.

CVPR 2025 • arXiv:2411.11909
7
citations
#6402

MI-DETR: An Object Detection Model with Multi-time Inquiries Mechanism

Zhixiong Nan, Xianghong Li, Tao Xiang et al.

CVPR 2025 • arXiv:2503.01463
7
citations
#6403

Robust Federated Finetuning of LLMs via Alternating Optimization of LoRA

Shuangyi Chen, Yuanxin Guo, Yue Ju et al.

NEURIPS 2025 • arXiv:2502.01755
7
citations
#6404

APHQ-ViT: Post-Training Quantization with Average Perturbation Hessian Based Reconstruction for Vision Transformers

Zhuguanyu Wu, Jiayi Zhang, Jiaxin Chen et al.

CVPR 2025 • arXiv:2504.02508
7
citations
#6405

Building a Mind Palace: Structuring Environment-Grounded Semantic Graphs for Effective Long Video Analysis with LLMs

Zeyi Huang, Yuyang Ji, Xiaofang Wang et al.

CVPR 2025 • arXiv:2501.04336
7
citations
#6406

Spatial Understanding from Videos: Structured Prompts Meet Simulation Data

Haoyu Zhang, Meng Liu, Zaijing Li et al.

NEURIPS 2025 • spotlight • arXiv:2506.03642
7
citations
#6407

Cycle Consistency as Reward: Learning Image-Text Alignment without Human Preferences

Hyojin Bahng, Caroline Chan, Fredo Durand et al.

ICCV 2025 • arXiv:2506.02095
7
citations
#6408

Integral Imprecise Probability Metrics

Siu Lun (Alan) Chau, Michele Caprio, Krikamol Muandet

NEURIPS 2025 • arXiv:2505.16156
7
citations
#6409

Afterburner: Reinforcement Learning Facilitates Self-Improving Code Efficiency Optimization

Mingzhe Du, Anh Tuan Luu, Yue Liu et al.

NEURIPS 2025 • arXiv:2505.23387
7
citations
#6410

SubTrack++: Gradient Subspace Tracking for Scalable LLM Training

Sahar Rajabi, Nayeema Nonta, Sirisha Rambhatla

NEURIPS 2025 • arXiv:2502.01586
7
citations
#6411

Co-op: Correspondence-based Novel Object Pose Estimation

Sungphill Moon, Hyeontae Son, Dongcheol Hur et al.

CVPR 2025 • arXiv:2503.17731
7
citations
#6412

Language Driven Occupancy Prediction

Zhu Yu, Bowen Pang, Lizhe Liu et al.

ICCV 2025 • arXiv:2411.16072
7
citations
#6413

TRCE: Towards Reliable Malicious Concept Erasure in Text-to-Image Diffusion Models

Ruidong Chen, Honglin Guo, Lanjun Wang et al.

ICCV 2025 • arXiv:2503.07389
7
citations
#6414

LazyMAR: Accelerating Masked Autoregressive Models via Feature Caching

Feihong Yan, Qingyan Wei, Jiayi Tang et al.

ICCV 2025 • arXiv:2503.12450
7
citations
#6415

MotionAgent: Fine-grained Controllable Video Generation via Motion Field Agent

Xinyao Liao, Xianfang Zeng, Liao Wang et al.

ICCV 2025 • arXiv:2502.03207
7
citations
#6416

OpenRSD: Towards Open-prompts for Object Detection in Remote Sensing Images

Ziyue Huang, Yongchao Feng, Ziqi Liu et al.

ICCV 2025 • arXiv:2503.06146
7
citations
#6417

VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models

Byung-Kwan Lee, Ryo Hachiuma, Yu-Chiang Frank Wang et al.

CVPR 2025 • arXiv:2412.01822
7
citations
#6418

AerialVG: A Challenging Benchmark for Aerial Visual Grounding by Exploring Positional Relations

Junli Liu, Qizhi Chen, Zhigang Wang et al.

ICCV 2025 • arXiv:2504.07836
7
citations
#6419

CWNet: Causal Wavelet Network for Low-Light Image Enhancement

Tongshun Zhang, Pingping Liu, Yubing Lu et al.

ICCV 2025 • arXiv:2507.10689
7
citations
#6420

Rethinking the Embodied Gap in Vision-and-Language Navigation: A Holistic Study of Physical and Visual Disparities

Liuyi Wang, Xinyuan Xia, Hui Zhao et al.

ICCV 2025 • arXiv:2507.13019
7
citations
#6421

GARF: Learning Generalizable 3D Reassembly for Real-World Fractures

Sihang Li, Zeyu Jiang, Grace Chen et al.

ICCV 2025 • arXiv:2504.05400
7
citations
#6422

ODDR: Outlier Detection & Dimension Reduction Based Defense Against Adversarial Patches

Nandish Chattopadhyay, Amira Guesmi, Muhammad Abdullah Hanif et al.

ICCV 2025 • arXiv:2311.12084
7
citations
#6423

StreamGS: Online Generalizable Gaussian Splatting Reconstruction for Unposed Image Streams

Yang Li, Jinglu Wang, Lei Chu et al.

ICCV 2025 • arXiv:2503.06235
7
citations
#6424

ObjectGS: Object-aware Scene Reconstruction and Scene Understanding via Gaussian Splatting

Ruijie Zhu, Mulin Yu, Linning Xu et al.

ICCV 2025 • arXiv:2507.15454
7
citations
#6425

TruthPrInt: Mitigating Large Vision-Language Models Object Hallucination Via Latent Truthful-Guided Pre-Intervention

Jinhao Duan, Fei Kong, Hao Cheng et al.

ICCV 2025
7
citations
#6426

RoBridge: A Hierarchical Architecture Bridging Cognition and Execution for General Robotic Manipulation

Kaidong Zhang, Rongtao Xu, Ren Pengzhen et al.

ICCV 2025 • arXiv:2505.01709
7
citations
#6427

Can Generative Video Models Help Pose Estimation?

Ruojin Cai, Jason Y. Zhang, Philipp Henzler et al.

CVPR 2025 • highlight • arXiv:2412.16155
7
citations
#6428

RealEdit: Reddit Edits As a Large-scale Empirical Dataset for Image Transformations

Peter Sushko, Ayana Bharadwaj, Zhi Yang Lim et al.

CVPR 2025 • arXiv:2502.03629
7
citations
#6429

Enhancing 3D Reconstruction for Dynamic Scenes

Jisang Han, Honggyu An, Jaewoo Jung et al.

NEURIPS 2025 • oral • arXiv:2504.06264
7
citations
#6430

Object-centric binding in Contrastive Language-Image Pretraining

Rim Assouel, Pietro Astolfi, Florian Bordes et al.

NEURIPS 2025 • arXiv:2502.14113
7
citations
#6431

Multimodal LLMs as Customized Reward Models for Text-to-Image Generation

Shijie Zhou, Ruiyi Zhang, Huaisheng Zhu et al.

ICCV 2025 • arXiv:2507.21391
7
citations
#6432

AgroBench: Vision-Language Model Benchmark in Agriculture

Risa Shinoda, Nakamasa Inoue, Hirokatsu Kataoka et al.

ICCV 2025 • arXiv:2507.20519
7
citations
#6433

DiffLocks: Generating 3D Hair from a Single Image using Diffusion Models

Radu Alexandru Rosu, Keyu Wu, Yao Feng et al.

CVPR 2025 • arXiv:2505.06166
7
citations
#6434

FOCUS: Knowledge-enhanced Adaptive Visual Compression for Few-shot Whole Slide Image Classification

Zhengrui Guo, Conghao Xiong, Jiabo Ma et al.

CVPR 2025 • arXiv:2411.14743
7
citations
#6435

Salvaging the Overlooked: Leveraging Class-Aware Contrastive Learning for Multi-Class Anomaly Detection

Lei Fan, Junjie Huang, Donglin Di et al.

ICCV 2025 • arXiv:2412.04769
7
citations
#6436

GS-LIVM: Real-Time Photo-Realistic LiDAR-Inertial-Visual Mapping with Gaussian Splatting

Yusen Xie, Zhenmin Huang, Jin Wu et al.

ICCV 2025 • arXiv:2410.17084
7
citations
#6437

What Makes for Text to 360-degree Panorama Generation with Stable Diffusion?

Jinhong Ni, Chang-Bin Zhang, Qiang Zhang et al.

ICCV 2025 • arXiv:2505.22129
7
citations
#6438

The emergence of sparse attention: impact of data distribution and benefits of repetition

Nicolas Zucchet, Francesco D'Angelo, Andrew Lampinen et al.

NEURIPS 2025 • oral • arXiv:2505.17863
7
citations
#6439

MobileIE: An Extremely Lightweight and Effective ConvNet for Real-Time Image Enhancement on Mobile Devices

Hailong Yan, Ao Li, Xiangtao Zhang et al.

ICCV 2025 • arXiv:2507.01838
7
citations
#6440

Video2BEV: Transforming Drone Videos to BEVs for Video-based Geo-localization

Hao Ju, Shaofei Huang, Si Liu et al.

ICCV 2025 • arXiv:2411.13610
7
citations
#6441

ARIG: Autoregressive Interactive Head Generation for Real-time Conversations

Ying Guo, Xi Liu, Cheng Zhen et al.

ICCV 2025 • arXiv:2507.00472
7
citations
#6442

Keyframe-oriented Vision Token Pruning: Enhancing Efficiency of Large Vision Language Models on Long-Form Video Processing

Yudong Liu, Jingwei Sun, Yueqian Lin et al.

ICCV 2025 • arXiv:2503.10742
7
citations
#6443

Vision-Language Models Can't See the Obvious

Yasser Abdelaziz Dahou Djilali, Ngoc Huynh, Phúc Lê Khắc et al.

ICCV 2025 • arXiv:2507.04741
7
citations
#6444

DisCoRD: Discrete Tokens to Continuous Motion via Rectified Flow Decoding

Jungbin Cho, Junwan Kim, Jisoo Kim et al.

ICCV 2025 • highlight • arXiv:2411.19527
7
citations
#6445

Semantic Watermarking Reinvented: Enhancing Robustness and Generation Quality with Fourier Integrity

Sung Ju Lee, Nam Ik Cho

ICCV 2025 • arXiv:2509.07647
7
citations
#6446

Beyond the Frame: Generating 360° Panoramic Videos from Perspective Videos

Rundong Luo, Matthew Wallingford, Ali Farhadi et al.

ICCV 2025 • arXiv:2504.07940
7
citations
#6447

REDUCIO! Generating 1K Video within 16 Seconds using Extremely Compressed Motion Latents

Rui Tian, Qi Dai, Jianmin Bao et al.

ICCV 2025 • arXiv:2411.13552
7
citations
#6448

Parameter-Efficient Adaptation of Geospatial Foundation Models through Embedding Deflection

Romain Thoreau, Valerio Marsocci, Dawa Derksen

ICCV 2025 • arXiv:2503.09493
7
citations
#6449

Riemannian-Geometric Fingerprints of Generative Models

Hae Jin Song, Laurent Itti

ICCV 2025 • highlight • arXiv:2506.22802
7
citations
#6450

SITE: towards Spatial Intelligence Thorough Evaluation

Wenqi Wang, Reuben Tan, Pengyue Zhu et al.

ICCV 2025 • arXiv:2505.05456
7
citations
#6451

TR-PTS: Task-Relevant Parameter and Token Selection for Efficient Tuning

Siqi Luo, Haoran Yang, Yi Xin et al.

ICCV 2025 • arXiv:2507.22872
7
citations
#6452

SkySense V2: A Unified Foundation Model for Multi-modal Remote Sensing

Yingying Zhang, Lixiang Ru, Kang Wu et al.

ICCV 2025 • arXiv:2507.13812
7
citations
#6453

Federated Continual Instruction Tuning

Haiyang Guo, Fanhu Zeng, Fei Zhu et al.

ICCV 2025 • arXiv:2503.12897
7
citations
#6454

Event-based Tiny Object Detection: A Benchmark Dataset and Baselines

Nuo Chen, Chao Xiao, Yimian Dai et al.

ICCV 2025 • arXiv:2506.23575
7
citations
#6455

Accelerating Diffusion Transformer via Gradient-Optimized Cache

Junxiang Qiu, Lin Liu, Shuo Wang et al.

ICCV 2025 • arXiv:2503.05156
7
citations
#6456

Object-centric Video Question Answering with Visual Grounding and Referring

Haochen Wang, Qirui Chen, Cilin Yan et al.

ICCV 2025 • arXiv:2507.19599
7
citations
#6457

StableCodec: Taming One-Step Diffusion for Extreme Image Compression

Tianyu Zhang, Xin Luo, Li Li et al.

ICCV 2025 • arXiv:2506.21977
7
citations
#6458

PlanGen: Towards Unified Layout Planning and Image Generation in Auto-Regressive Vision Language Models

Runze He, Bo Cheng, Yuhang Ma et al.

ICCV 2025 • arXiv:2503.10127
7
citations
#6459

GenDoP: Auto-regressive Camera Trajectory Generation as a Director of Photography

Mengchen Zhang, Tong Wu, Jing Tan et al.

ICCV 2025 • arXiv:2504.07083
7
citations
#6460

SynFER: Towards Boosting Facial Expression Recognition with Synthetic Data

Xilin He, Cheng Luo, Xiaole Xian et al.

ICCV 2025 • arXiv:2410.09865
7
citations
#6461

Dynamic Typography: Bringing Text to Life via Video Diffusion Prior

Zichen Liu, Yihao Meng, Hao Ouyang et al.

ICCV 2025 • arXiv:2404.11614
7
citations
#6462

FlexGen: Flexible Multi-View Generation from Text and Image Inputs

Xinli Xu, Wenhang Ge, Jiantao Lin et al.

ICCV 2025 • arXiv:2410.10745
7
citations
#6463

Importance-Based Token Merging for Efficient Image and Video Generation

Haoyu Wu, Jingyi Xu, Hieu Le et al.

ICCV 2025 • arXiv:2411.16720
7
citations
#6464

Towards Real Unsupervised Anomaly Detection Via Confident Meta-Learning

Muhammad Aqeel, Shakiba Sharifi, Marco Cristani et al.

ICCV 2025 • arXiv:2508.02293
7
citations
#6465

CCMNet: Leveraging Calibrated Color Correction Matrices for Cross-Camera Color Constancy

Dongyoung Kim, Mahmoud Afifi, Dongyun Kim et al.

ICCV 2025 • arXiv:2504.07959
7
citations
#6466

FlowR: Flowing from Sparse to Dense 3D Reconstructions

Tobias Fischer, Samuel Rota Bulò, Yung-Hsu Yang et al.

ICCV 2025 • highlight • arXiv:2504.01647
7
citations
#6467

Corvid: Improving Multimodal Large Language Models Towards Chain-of-Thought Reasoning

Jingjing Jiang, Chao Ma, Xurui Song et al.

ICCV 2025 • highlight • arXiv:2507.07424
7
citations
#6468

RigGS: Rigging of 3D Gaussians for Modeling Articulated Objects in Videos

Yuxin Yao, Zhi Deng, Junhui Hou

CVPR 2025 • arXiv:2503.16822
7
citations
#6469

MLRC-Bench: Can Language Agents Solve Machine Learning Research Challenges?

Yunxiang Zhang, Muhammad Khalifa, Shitanshu Bhushan et al.

NEURIPS 2025 • arXiv:2504.09702
7
citations
#6470

Object-Shot Enhanced Grounding Network for Egocentric Video

Yisen Feng, Haoyu Zhang, Meng Liu et al.

CVPR 2025 • arXiv:2505.04270
7
citations
#6471

T-SHIRT: Token-Selective Hierarchical Data Selection for Instruction Tuning

Yanjun Fu, Faisal Hamman, Sanghamitra Dutta

NEURIPS 2025 • arXiv:2506.01317
7
citations
#6472

Robust Multimodal Survival Prediction with Conditional Latent Differentiation Variational AutoEncoder

Junjie Zhou, Jiao Tang, Yingli Zuo et al.

CVPR 2025
7
citations
#6473

Sim-to-Real Causal Transfer: A Metric Learning Approach to Causally-Aware Interaction Representations

Ahmad Rahimi, Po-Chien Luan, Yuejiang Liu et al.

CVPR 2025 • arXiv:2312.04540
7
citations
#6474

COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training

Sanghwan Kim, Rui Xiao, Iuliana Georgescu et al.

CVPR 2025 • arXiv:2412.01814
7
citations
#6475

PICD: Versatile Perceptual Image Compression with Diffusion Rendering

Tongda Xu, Jiahao Li, Bin Li et al.

CVPR 2025 • arXiv:2505.05853
7
citations
#6476

LayerCraft: Enhancing Text-to-Image Generation with CoT Reasoning and Layered Object Integration

Yuyao Zhang, Jinghao Li, Yu-Wing Tai

NEURIPS 2025 • arXiv:2504.00010
7
citations
#6477

Compiler-R1: Towards Agentic Compiler Auto-tuning with Reinforcement Learning

Haolin Pan, Hongyu Lin, Haoran Luo et al.

NEURIPS 2025 • arXiv:2506.15701
7
citations
#6478

Synergizing Motion and Appearance: Multi-Scale Compensatory Codebooks for Talking Head Video Generation

Shuling Zhao, Fa-Ting Hong, Xiaoshui Huang et al.

CVPR 2025 • arXiv:2412.00719
7
citations
#6479

On the Relation between Rectified Flows and Optimal Transport

Johannes Hertrich, Antonin Chambolle, Julie Delon

NEURIPS 2025 • arXiv:2505.19712
7
citations
#6480

Straight-Line Diffusion Model for Efficient 3D Molecular Generation

Yuyan Ni, Shikun Feng, Haohan Chi et al.

NEURIPS 2025 • arXiv:2503.02918
7
citations
#6481

Temporal Alignment-Free Video Matching for Few-shot Action Recognition

SuBeen Lee, WonJun Moon, Hyun Seok Seong et al.

CVPR 2025 • arXiv:2504.05956
7
citations
#6482

Noise Injection Reveals Hidden Capabilities of Sandbagging Language Models

Cameron Tice, Philipp Kreer, Nathan Helm-Burger et al.

NEURIPS 2025 • arXiv:2412.01784
7
citations
#6483

FreqDebias: Towards Generalizable Deepfake Detection via Consistency-Driven Frequency Debiasing

Hossein Kashiani, Niloufar Alipour Talemi, Fatemeh Afghah

CVPR 2025 • arXiv:2509.22412
7
citations
#6484

MMTL-UniAD: A Unified Framework for Multimodal and Multi-Task Learning in Assistive Driving Perception

Wenzhuo Liu, Wenshuo Wang, Yicheng Qiao et al.

CVPR 2025 • arXiv:2504.02264
7
citations
#6485

M3amba: Memory Mamba is All You Need for Whole Slide Image Classification

Tingting Zheng, Kui Jiang, Yi Xiao et al.

CVPR 2025
7
citations
#6486

Multitwine: Multi-Object Compositing with Text and Layout Control

Gemma Canet Tarrés, Zhe Lin, Zhifei Zhang et al.

CVPR 2025 • highlight • arXiv:2502.05165
6
citations
#6487

PanDA: Towards Panoramic Depth Anything with Unlabeled Panoramas and Mobius Spatial Augmentation

Zidong Cao, Jinjing Zhu, Weiming Zhang et al.

CVPR 2025 • arXiv:2406.13378
6
citations
#6488

TCFG: Tangential Damping Classifier-free Guidance

Mingi Kwon, Shin seong Kim, Jaeseok Jeong et al.

CVPR 2025 • arXiv:2503.18137
6
citations
#6489

Navigating Image Restoration with VAR’s Distribution Alignment Prior

Siyang Wang, Naishan Zheng, Jie Huang et al.

CVPR 2025 • arXiv:2412.21063
6
citations
#6490

Real-IAD D³: A Real-World 2D/Pseudo-3D/3D Dataset for Industrial Anomaly Detection

Wenbing Zhu, Lidong Wang, Ziqing Zhou et al.

CVPR 2025
6
citations
#6491

DVHGNN: Multi-Scale Dilated Vision HGNN for Efficient Vision Recognition

Caoshuo Li, Tanzhe Li, Xiaobin Hu et al.

CVPR 2025 • arXiv:2503.14867
6
citations
#6492

ShowHowTo: Generating Scene-Conditioned Step-by-Step Visual Instructions

Tomas Soucek, Prajwal Gatti, Michael Wray et al.

CVPR 2025 • arXiv:2412.01987
6
citations
#6493

Shape it Up! Restoring LLM Safety during Finetuning

ShengYun Peng, Pin-Yu Chen, Jianfeng Chi et al.

NEURIPS 2025 • arXiv:2505.17196
6
citations
#6494

FlashDepth: Real-time Streaming Video Depth Estimation at 2K Resolution

Gene Chou, Wenqi Xian, Guandao Yang et al.

ICCV 2025 • highlight • arXiv:2504.07093
6
citations
#6495

Scaffolding Dexterous Manipulation with Vision-Language Models

Vincent de Bakker, Joey Hejna, Tyler Lum et al.

NEURIPS 2025 • arXiv:2506.19212
6
citations
#6496

Beyond Local Sharpness: Communication-Efficient Global Sharpness-aware Minimization for Federated Learning

Debora Caldarola, Pietro Cagnasso, Barbara Caputo et al.

CVPR 2025 • arXiv:2412.03752
6
citations
#6497

CLIP Under the Microscope: A Fine-Grained Analysis of Multi-Object Representation

Reza Abbasi, Ali Nazari, Aminreza Sefid et al.

CVPR 2025 • arXiv:2502.19842
6
citations
#6498

Detect-and-Guide: Self-regulation of Diffusion Models for Safe Text-to-Image Generation via Guideline Token Optimization

Feifei Li, Mi Zhang, Yiming Sun et al.

CVPR 2025 • arXiv:2503.15197
6
citations
#6499

Narrating the Video: Boosting Text-Video Retrieval via Comprehensive Utilization of Frame-Level Captions

Chan Hur, Jeong-hun Hong, Dong-hun Lee et al.

CVPR 2025 • arXiv:2503.05186
6
citations
#6500

Seeking Consistent Flat Minima for Better Domain Generalization via Refining Loss Landscapes

Aodi Li, Liansheng Zhuang, Xiao Long et al.

CVPR 2025 • arXiv:2412.13573
6
citations
#6501

Efficient Hybrid Language Model Compression through Group-Aware SSM Pruning

Ali Taghibakhshi, Sharath Turuvekere Sreenivas, Saurav Muralidharan et al.

NEURIPS 2025 • arXiv:2504.11409
6
citations
#6502

FlexOLMo: Open Language Models for Flexible Data Use

Weijia Shi, Akshita Bhagia, Kevin Farhat et al.

NEURIPS 2025 • spotlight • arXiv:2507.07024
6
citations
#6503

Robust 3D Shape Reconstruction in Zero-Shot from a Single Image in the Wild

Junhyeong Cho, Kim Youwang, Hunmin Yang et al.

CVPR 2025 • arXiv:2403.14539
6
citations
#6504

Generalizable, real-time neural decoding with hybrid state-space models

Avery Hee-Woon Ryoo, Nanda H Krishna, Ximeng Mao et al.

NEURIPS 2025 • arXiv:2506.05320
6
citations
#6505

SDGOCC: Semantic and Depth-Guided Bird's-Eye View Transformation for 3D Multimodal Occupancy Prediction

ZaiPeng Duan, Xuzhong Hu, Pei An et al.

CVPR 2025 • arXiv:2507.17083
6
citations
#6506

Walking the Tightrope: Autonomous Disentangling Beneficial and Detrimental Drifts in Non-Stationary Custom-Tuning

Xiaoyu Yang, Jie Lu, En Yu

NEURIPS 2025 • oral
6
citations
#6507

Semantic Library Adaptation: LoRA Retrieval and Fusion for Open-Vocabulary Semantic Segmentation

Reza Qorbani, Gianluca Villani, Theodoros Panagiotakopoulos et al.

CVPR 2025 • arXiv:2503.21780
6
citations
#6508

Audio-Sync Video Generation with Multi-Stream Temporal Control

Shuchen Weng, Haojie Zheng, Zheng Chang et al.

NEURIPS 2025 • oral • arXiv:2506.08003
6
citations
#6509

DLF: Extreme Image Compression with Dual-generative Latent Fusion

Naifu Xue, Zhaoyang Jia, Jiahao Li et al.

ICCV 2025 • highlight • arXiv:2503.01428
6
citations
#6510

Pareto-Optimal Energy Alignment for Designing Nature-Like Antibodies

Yibo Wen, Chenwei Xu, Jerry Yao-Chieh Hu et al.

NEURIPS 2025 • arXiv:2412.20984
6
citations
#6511

A Stable Whitening Optimizer for Efficient Neural Network Training

Kevin Frans, Sergey Levine, Pieter Abbeel

NEURIPS 2025 • arXiv:2506.07254
6
citations
#6512

Understanding LLM Behaviors via Compression: Data Generation, Knowledge Acquisition and Scaling Laws

Zhixuan Pan, Shaowen Wang, Liao Pengfei et al.

NEURIPS 2025 • spotlight • arXiv:2504.09597
6
citations
#6513

GeoSVR: Taming Sparse Voxels for Geometrically Accurate Surface Reconstruction

Jiahe Li, Jiawei Zhang, Youmin Zhang et al.

NEURIPS 2025 • spotlight • arXiv:2509.18090
6
citations
#6514

Estimating Model Performance Under Covariate Shift Without Labels

Jakub Białek, Juhani Kivimäki, Wojciech Kuberski et al.

NEURIPS 2025 • arXiv:2401.08348
6
citations
#6515

Hearing Anywhere in Any Environment

Xiulong Liu, Anurag Kumar, Paul Calamia et al.

CVPR 2025 • arXiv:2504.10746
6
citations
#6516

One-Step Offline Distillation of Diffusion-based Models via Koopman Modeling

Nimrod Berman, Ilan Naiman, Moshe Eliasof et al.

NEURIPS 2025 • arXiv:2505.13358
6
citations
#6517

Time-IMM: A Dataset and Benchmark for Irregular Multimodal Multivariate Time Series

Ching Chang, Jeehyun Hwang, Yidan Shi et al.

NEURIPS 2025 • arXiv:2506.10412
6
citations
#6518

A Tale of Two Symmetries: Exploring the Loss Landscape of Equivariant Models

YuQing Xie, Tess Smidt

NEURIPS 2025 • arXiv:2506.02269
6
citations
#6519

Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers

Zhengyao Lyu, Tianlin Pan, Chenyang Si et al.

ICCV 2025 • arXiv:2506.07986
6
citations
#6520

DynFocus: Dynamic Cooperative Network Empowers LLMs with Video Understanding

Yudong Han, Qingpei Guo, Liyuan Pan et al.

CVPR 2025 • arXiv:2411.12355
6
citations
#6521

GuardSplat: Efficient and Robust Watermarking for 3D Gaussian Splatting

Zixuan Chen, Guangcong Wang, Jiahao Zhu et al.

CVPR 2025 • arXiv:2411.19895
6
citations
#6522

Normalized Attention Guidance: Universal Negative Guidance for Diffusion Models

Dar-Yen Chen, Hmrishav Bandyopadhyay, Kai Zou et al.

NEURIPS 2025 • arXiv:2505.21179
6
citations
#6523

Seeing the Arrow of Time in Large Multimodal Models

Zihui (Sherry) Xue, Romy Luo, Kristen Grauman

NEURIPS 2025 • oral • arXiv:2506.03340
6
citations
#6524

Recurrence-Enhanced Vision-and-Language Transformers for Robust Multimodal Document Retrieval

Davide Caffagni, Sara Sarto, Marcella Cornia et al.

CVPR 2025 • arXiv:2503.01980
6
citations
#6525

DualPM: Dual Posed-Canonical Point Maps for 3D Shape and Pose Reconstruction

Ben Kaye, Tomas Jakab, Shangzhe Wu et al.

CVPR 2025 • highlight • arXiv:2412.04464
6
citations
#6526

Keyframe-Guided Creative Video Inpainting

Yuwei Guo, Ceyuan Yang, Anyi Rao et al.

CVPR 2025
6
citations
#6527

Absorb and Converge: Provable Convergence Guarantee for Absorbing Discrete Diffusion Models

Yuchen Liang, Renxiang Huang, Lifeng Lai et al.

NEURIPS 2025 • arXiv:2506.02318
6
citations
#6528

CountLLM: Towards Generalizable Repetitive Action Counting via Large Language Model

Ziyu Yao, Xuxin Cheng, Zhiqi Huang et al.

CVPR 2025 • arXiv:2503.17690
6
citations
#6529

Parametric Point Cloud Completion for Polygonal Surface Reconstruction

Zhaiyu Chen, Yuqing Wang, Liangliang Nan et al.

CVPR 2025 • arXiv:2503.08363
6
citations
#6530

AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization

Yiyang Du, Xiaochen Wang, Chi Chen et al.

CVPR 2025 • arXiv:2503.23733
6
citations
#6531

POT: Prototypical Optimal Transport for Weakly Supervised Semantic Segmentation

Jian Wang, Tianhong Dai, Bingfeng Zhang et al.

CVPR 2025
6
citations
#6532

3D-MVP: 3D Multiview Pretraining for Manipulation

Shengyi Qian, Kaichun Mo, Valts Blukis et al.

CVPR 2025
6
citations
#6533

Diagnosing and Addressing Pitfalls in KG-RAG Datasets: Toward More Reliable Benchmarking

Liangliang Zhang, Zhuorui Jiang, Hongliang Chi et al.

NEURIPS 2025 • arXiv:2505.23495
6
citations
#6534

3D Occupancy Prediction with Low-Resolution Queries via Prototype-aware View Transformation

Gyeongrok Oh, Sung June Kim, Heeju Ko et al.

CVPR 2025 • arXiv:2503.15185
6
citations
#6535

Ineq-Comp: Benchmarking Human-Intuitive Compositional Reasoning in Automated Theorem Proving of Inequalities

Haoyu Zhao, Yihan Geng, Shange Tang et al.

NEURIPS 2025 • arXiv:2505.12680
6
citations
#6536

LaTexBlend: Scaling Multi-concept Customized Generation with Latent Textual Blending

Jian Jin, Zhenbo Yu, Yang Shen et al.

CVPR 2025 • highlight • arXiv:2503.06956
6
citations
#6537

AURELIA: Test-time Reasoning Distillation in Audio-Visual LLMs

Sanjoy Chowdhury, Hanan Gani, Nishit Anand et al.

ICCV 2025 • arXiv:2503.23219
6
citations
#6538

KAC: Kolmogorov-Arnold Classifier for Continual Learning

Yusong Hu, Zichen Liang, Fei Yang et al.

CVPR 2025 • highlight • arXiv:2503.21076
6
citations
#6539

Hyperbolic Safety-Aware Vision-Language Models

Tobia Poppi, Tejaswi Kasarla, Pascal Mettes et al.

CVPR 2025 • highlight • arXiv:2503.12127
6
citations
#6540

Masked Point-Entity Contrast for Open-Vocabulary 3D Scene Understanding

Yan Wang, Baoxiong Jia, Ziyu Zhu et al.

CVPR 2025 • arXiv:2504.19500
6
citations
#6541

Point Clouds Meets Physics: Dynamic Acoustic Field Fitting Network for Point Cloud Understanding

Changshuo Wang, Shuting He, Xiang Fang et al.

CVPR 2025
6
citations
#6542

High-Dimensional Calibration from Swap Regret

Maxwell Fishelson, Noah Golowich, Mehryar Mohri et al.

NEURIPS 2025 • oral • arXiv:2505.21460
6
citations
#6543

Learning with Calibration: Exploring Test-Time Computing of Spatio-Temporal Forecasting

Wei Chen, Yuxuan Liang

NEURIPS 2025 • oral • arXiv:2506.00635
6
citations
#6544

Visual Persona: Foundation Model for Full-Body Human Customization

Jisu Nam, Soowon Son, Zhan Xu et al.

CVPR 2025 • arXiv:2503.15406
6
citations
#6545

COME: Adding Scene-Centric Forecasting Control to Occupancy World Model

Yining Shi, Kun Jiang, Qiang Meng et al.

NEURIPS 2025 • oral • arXiv:2506.13260
6
citations
#6546

Improved Representation Steering for Language Models

Zhengxuan Wu, Qinan Yu, Aryaman Arora et al.

NEURIPS 2025 • spotlight • arXiv:2505.20809
6
citations
#6547

Improving the Transferability of Adversarial Attacks on Face Recognition with Diverse Parameters Augmentation

Fengfan Zhou, Bangjie Yin, Hefei Ling et al.

CVPR 2025 • arXiv:2411.15555
6
citations
#6548

A Comprehensive Study of Decoder-Only LLMs for Text-to-Image Generation

Andrew Z Wang, Songwei Ge, Tero Karras et al.

CVPR 2025 • arXiv:2506.08210
6
citations
#6549

HQ-CLIP: Leveraging Large Vision-Language Models to Create High-Quality Image-Text Datasets and CLIP Models

Zhixiang Wei, Guangting Wang, Xiaoxiao Ma et al.

ICCV 2025 • arXiv:2507.22431
6
citations
#6550

Fast Inference for Augmented Large Language Models

Rana Shahout, Cong Liang, Shiji Xin et al.

NEURIPS 2025 • arXiv:2410.18248
6
citations
#6551

Týr-the-Pruner: Structural Pruning LLMs via Global Sparsity Distribution Optimization

Guanchen Li, Yixing Xu, Zeping Li et al.

NEURIPS 2025 • arXiv:2503.09657
6
citations
#6552

R-LiViT: A LiDAR-Visual-Thermal Dataset Enabling Vulnerable Road User Focused Roadside Perception

Jonas Mirlach, Lei Wan, Andreas Wiedholz et al.

ICCV 2025 • arXiv:2503.17122
6
citations
#6553

LoRAShop: Training-Free Multi-Concept Image Generation and Editing with Rectified Flow Transformers

Yusuf Dalva, Hidir Yesiltepe, Pinar Yanardag

NEURIPS 2025 • spotlight • arXiv:2505.23758
6
citations
#6554

MIRE: Matched Implicit Neural Representations

Dhananjaya Jayasundara, Heng Zhao, Demetrio Labate et al.

CVPR 2025
6
citations
#6555

U-Know-DiffPAN: An Uncertainty-aware Knowledge Distillation Diffusion Framework with Details Enhancement for PAN-Sharpening

Sungpyo Kim, Jeonghyeok Do, Jaehyup Lee et al.

CVPR 2025 • arXiv:2412.06243
6
citations
#6556

Seg4Diff: Unveiling Open-Vocabulary Semantic Segmentation in Text-to-Image Diffusion Transformers

Chaehyun Kim, Heeseong Shin, Eunbeen Hong et al.

NEURIPS 2025
6
citations
#6557

Realistic Test-Time Adaptation of Vision-Language Models

Maxime Zanella, Clément Fuchs, Christophe De Vleeschouwer et al.

CVPR 2025 • highlight • arXiv:2501.03729
6
citations
#6558

Self-Generated In-Context Examples Improve LLM Agents for Sequential Decision-Making Tasks

Vishnu Sarukkai, Zhiqiang Xie, Kayvon Fatahalian

NEURIPS 2025 • arXiv:2505.00234
6
citations
#6559

MATCHA: Towards Matching Anything

Fei Xue, Sven Elflein, Laura Leal-Taixe et al.

CVPR 2025 • highlight
6
citations
#6560

Overcoming Sparsity Artifacts in Crosscoders to Interpret Chat-Tuning

Julian Minder, Clément Dumas, Caden Juang et al.

NEURIPS 2025 • arXiv:2504.02922
6
citations
#6561

Critical Batch Size Revisited: A Simple Empirical Approach to Large-Batch Language Model Training

Will Merrill, Shane Arora, Dirk Groeneveld et al.

NEURIPS 2025 • spotlight • arXiv:2505.23971
6
citations
#6562

On scalable and efficient training of diffusion samplers

Minkyu Kim, Kiyoung Seong, Dongyeop Woo et al.

NEURIPS 2025 • arXiv:2505.19552
6
citations
#6563

T2V-OptJail: Discrete Prompt Optimization for Text-to-Video Jailbreak Attacks

Jiayang Liu, Siyuan Liang, Shiqian Zhao et al.

NEURIPS 2025 • arXiv:2505.06679
6
citations
#6564

Zero-Shot Image Restoration Using Few-Step Guidance of Consistency Models (and Beyond)

Tomer Garber, Tom Tirer

CVPR 2025 • arXiv:2412.20596
6
citations
#6565

CATSplat: Context-Aware Transformer with Spatial Guidance for Generalizable 3D Gaussian Splatting from A Single-View Image

Wonseok Roh, Hwanhee Jung, JongWook Kim et al.

ICCV 2025 • arXiv:2412.12906
6
citations
#6566

HumanoidGen: Data Generation for Bimanual Dexterous Manipulation via LLM Reasoning

Zhi Jing, Siyuan Yang, Jicong Ao et al.

NEURIPS 2025 • arXiv:2507.00833
6
citations
#6567

GaussianVideo: Efficient Video Representation via Hierarchical Gaussian Splatting

Andrew Bond, Jui-Hsien Wang, Long Mai et al.

ICCV 2025 • arXiv:2501.04782
6
citations
#6568

MetaMind: Modeling Human Social Thoughts with Metacognitive Multi-Agent Systems

Xuanming Zhang, Yuxuan Chen, Samuel (Min-Hsuan) Yeh et al.

NEURIPS 2025 • oral • arXiv:2505.18943
6
citations
#6569

PEER Pressure: Model-to-Model Regularization for Single Source Domain Generalization

Dongkyu Cho, Inwoo Hwang, Sanghack Lee

CVPR 2025 • arXiv:2505.12745
6
citations
#6570

ZeroStereo: Zero-shot Stereo Matching from Single Images

Xianqi Wang, Hao Yang, Gangwei Xu et al.

ICCV 2025 • arXiv:2501.08654
6
citations
#6571

Unified Dense Prediction of Video Diffusion

Lehan Yang, Lu Qi, Xiangtai Li et al.

CVPR 2025 • arXiv:2503.09344
6
citations
#6572

ARKit LabelMaker: A New Scale for Indoor 3D Scene Understanding

Guangda Ji, Silvan Weder, Francis Engelmann et al.

CVPR 2025 • arXiv:2410.13924
6
citations
#6573

PartRM: Modeling Part-Level Dynamics with Large Cross-State Reconstruction Model

Mingju Gao, Yike Pan, Huan-ang Gao et al.

CVPR 2025 • arXiv:2503.19913
6
citations
#6574

Satellite Observations Guided Diffusion Model for Accurate Meteorological States at Arbitrary Resolution

Siwei Tu, Ben Fei, Weidong Yang et al.

CVPR 2025 • highlight • arXiv:2502.07814
6
citations
#6575

Optimal Spectral Transitions in High-Dimensional Multi-Index Models

Leonardo Defilippis, Yatin Dandi, Pierre Mergny et al.

NEURIPS 2025 • arXiv:2502.02545
6
citations
#6576

Driving View Synthesis on Free-form Trajectories with Generative Prior

Zeyu Yang, Zijie Pan, Yuankun Yang et al.

ICCV 2025 • arXiv:2412.01717
6
citations
#6577

Uniform Generalization Bounds on Data-Dependent Hypothesis Sets via PAC-Bayesian Theory on Random Sets

Benjamin Dupuis, Paul Viallard, George Deligiannidis et al.

NEURIPS 2025 • arXiv:2404.17442
6
citations
#6578

Nonisotropic Gaussian Diffusion for Realistic 3D Human Motion Prediction

Cecilia Curreli, Dominik Muhle, Abhishek Saroha et al.

CVPR 2025 • arXiv:2501.06035
6
citations
#6579

Chat-based Person Retrieval via Dialogue-Refined Cross-Modal Alignment

Yang Bai, Yucheng Ji, Min Cao et al.

CVPR 2025
6
citations
#6580

AlphaPre: Amplitude-Phase Disentanglement Model for Precipitation Nowcasting

Kenghong Lin, Baoquan Zhang, Demin Yu et al.

CVPR 2025
6
citations
#6581

Distilling Parallel Gradients for Fast ODE Solvers of Diffusion Models

Beier Zhu, Ruoyu Wang, Tong Zhao et al.

ICCV 2025 • arXiv:2507.14797
6
citations
#6582

D^2iT: Dynamic Diffusion Transformer for Accurate Image Generation

Weinan Jia, Mengqi Huang, Nan Chen et al.

CVPR 2025
6
citations
#6583

Consistently Simulating Human Personas with Multi-Turn Reinforcement Learning

Marwa Abdulhai, Ryan Cheng, Donovan Clay et al.

NEURIPS 2025 • arXiv:2511.00222
6
citations
#6584

Learning to Think: Information-Theoretic Reinforcement Fine-Tuning for LLMs

Jingyao Wang, Wenwen Qiang, Zeen Song et al.

NEURIPS 2025 • arXiv:2505.10425
6
citations
#6585

Win Fast or Lose Slow: Balancing Speed and Accuracy in Latency-Sensitive Decisions of LLMs

Hao Kang, Qingru Zhang, Han Cai et al.

NEURIPS 2025 • spotlight • arXiv:2505.19481
6
citations
#6586

MIEB: Massive Image Embedding Benchmark

Chenghao Xiao, Isaac Chung, Imene Kerboua et al.

ICCV 2025 • arXiv:2504.10471
6
citations
#6587

AnyCalib: On-Manifold Learning for Model-Agnostic Single-View Camera Calibration

Javier Tirado-Garín, Javier Civera

ICCV 2025 • arXiv:2503.12701
6
citations
#6588

Time Series Generation Under Data Scarcity: A Unified Generative Modeling Approach

Tal Gonen, Itai Pemper, Ilan Naiman et al.

NEURIPS 2025 • oral • arXiv:2505.20446
6
citations
#6589

Time Travel is Cheating: Going Live with DeepFund for Real-Time Fund Investment Benchmarking

Changlun Li, Yao Shi, Chen Wang et al.

NEURIPS 2025 • arXiv:2505.11065
6
citations
#6590

Token Perturbation Guidance for Diffusion Models

Javad Rajabi, Soroush Mehraban, Seyedmorteza Sadat et al.

NEURIPS 2025 • arXiv:2506.10036
6
citations
#6591

Decoder-Hybrid-Decoder Architecture for Efficient Reasoning with Long Generation

Liliang Ren, Congcong Chen, Haoran Xu et al.

NEURIPS 2025 • arXiv:2507.06607
6
citations
#6592

MambaVLT: Time-Evolving Multimodal State Space Model for Vision-Language Tracking

Xinqi Liu, Li Zhou, Zikun Zhou et al.

CVPR 2025 • highlight • arXiv:2411.15459
6
citations
#6593

The Hawthorne Effect in Reasoning Models: Evaluating and Steering Test Awareness

Sahar Abdelnabi, Ahmed Salem

NEURIPS 2025 • spotlight • arXiv:2505.14617
6
citations
#6594

Eval3D: Interpretable and Fine-grained Evaluation for 3D Generation

Shivam Duggal, Yushi Hu, Oscar Michel et al.

CVPR 2025 • arXiv:2504.18509
6
citations
#6595

Manipulating Feature Visualizations with Gradient Slingshots

Dilyara Bareeva, Marina Höhne, Alexander Warnecke et al.

NEURIPS 2025 • arXiv:2401.06122
6
citations
#6596

Effective Cloud Removal for Remote Sensing Images by an Improved Mean-Reverting Denoising Model with Elucidated Design Space

Yi Liu, Wengen Li, Jihong Guan et al.

CVPR 2025 • arXiv:2503.23717
6
citations
#6597

Improving Gaussian Splatting with Localized Points Management

Haosen Yang, Chenhao Zhang, Wenqing Wang et al.

CVPR 2025 • highlight • arXiv:2406.04251
6
citations
#6598

DefectFill: Realistic Defect Generation with Inpainting Diffusion Model for Visual Inspection

Jaewoo Song, Daemin Park, Kanghyun Baek et al.

CVPR 2025 • highlight • arXiv:2503.13985
6
citations
#6599

InteractionMap: Improving Online Vectorized HDMap Construction with Interaction

Kuang Wu, Chuan Yang, Zhanbin Li

CVPR 2025 • arXiv:2503.21659
6
citations
#6600

RoboSense: Large-scale Dataset and Benchmark for Egocentric Robot Perception and Navigation in Crowded and Unstructured Environments

Haisheng Su, Feixiang Song, Cong Ma et al.

CVPR 2025 • arXiv:2408.15503
6
citations