Most Cited 2025 "multi-view image generation" Papers

NEURIPS 2025 spotlight arXiv:2506.04587

#10602

Set Smoothness Unlocks Clarke Hyper-stationarity in Bilevel Optimization

He Chen, Jiajin Li, Anthony Man-Cho So

ICCV 2025 highlight arXiv:2507.02398

#10603

Beyond Spatial Frequency: Pixel-wise Temporal Frequency-based Deepfake Video Detection

Taehoon Kim, Jongwook Choi, Yonghyun Jeong et al.

NEURIPS 2025 arXiv:2510.01619

#10604

MPMAvatar: Learning 3D Gaussian Avatars with Accurate and Robust Physics-Based Dynamics

Changmin Lee, Jihyun Lee, Tae-Kyun Kim

NEURIPS 2025 arXiv:2507.08956

#10605

Beyond Scores: Proximal Diffusion Models

Zhenghan Fang, Mateo Diaz, Sam Buchanan et al.

NEURIPS 2025 arXiv:2505.21785

#10606

Born a Transformer -- Always a Transformer? On the Effect of Pretraining on Architectural Abilities

Mayank Jobanputra, Yana Veitsman, Yash Sarrof et al.

ICCV 2025 arXiv:2412.02503

#10607

VA-MoE: Variables-Adaptive Mixture of Experts for Incremental Weather Forecasting

Hao Chen, Tao Han, Song Guo et al.

NEURIPS 2025 oral arXiv:2506.06981

#10608

Deep RL Needs Deep Behavior Analysis: Exploring Implicit Planning by Model-Free Agents in Open-Ended Environments

Riley Simmons-Edler, Ryan Badman, Felix Berg et al.

NEURIPS 2025 arXiv:2510.12119

#10609

ImageSentinel: Protecting Visual Datasets from Unauthorized Retrieval-Augmented Image Generation

Ziyuan Luo, Yangyi Zhao, Ka Chun Cheung et al.

ICCV 2025 arXiv:2507.12083

#10610

Foresight in Motion: Reinforcing Trajectory Prediction with Reward Heuristics

Muleilan Pei, Shaoshuai Shi, Xuesong Chen et al.

NEURIPS 2025 arXiv:2504.21582

#10611

MF-LLM: Simulating Population Decision Dynamics via a Mean-Field Large Language Model Framework

Qirui Mi, Mengyue Yang, Xiangning Yu et al.

ICCV 2025 highlight arXiv:2505.22911

#10612

Hierarchical Material Recognition from Local Appearance

Matthew Beveridge, Shree Nayar

NEURIPS 2025 arXiv:2504.06020

#10613

Information-Theoretic Reward Decomposition for Generalizable RLHF

Liyuan Mao, Haoran Xu, Amy Zhang et al.

NEURIPS 2025 arXiv:2501.19107

#10614

Brain network science modelling of sparse neural networks enables Transformers and LLMs to perform as fully connected

Yingtao Zhang, Diego Cerretti, Jialin Zhao et al.

ICCV 2025 arXiv:2503.14774

#10615

Revisiting Image Fusion for Multi-Illuminant White-Balance Correction

David Serrano, Aditya Arora, Luis Herranz et al.

ICLR 2025 oral arXiv:2410.07659

#10616

MotionAura: Generating High-Quality and Motion Consistent Videos using Discrete Diffusion

Onkar Susladkar, Jishu Sen Gupta, Chirag Sehgal et al.

ICCV 2025 arXiv:2303.01803

#10617

Uncertainty-Aware Gradient Stabilization for Small Object Detection

Huixin Sun, Yanjing Li, Linlin Yang et al.

NEURIPS 2025 arXiv:2506.12790

#10618

PDEfuncta: Spectrally-Aware Neural Representation for PDE Solution Modeling

Minju Jo, Woojin Cho, Uvini Balasuriya Mudiyanselage et al.

NEURIPS 2025 arXiv:2503.22215

#10619

Learning to Instruct for Visual Instruction Tuning

Zhihan Zhou, Feng Hong, Jiaan Luo et al.

NEURIPS 2025 arXiv:2411.18624

#10620

GeneMAN: Generalizable Single-Image 3D Human Reconstruction from Multi-Source Human Data

Wentao Wang, Hang Ye, Fangzhou Hong et al.

#10621

One-shot 3D Object Canonicalization based on Geometric and Semantic Consistency

Li Jin, Yujie Wang, Wenzheng Chen et al.

ICCV 2025 highlight arXiv:2412.13155

#10622

F-Bench: Rethinking Human Preference Evaluation Metrics for Benchmarking Face Generation, Customization, and Restoration

Lu Liu, Huiyu Duan, Qiang Hu et al.

CVPR 2025 arXiv:2504.02775

#10623

TailedCore: Few-Shot Sampling for Unsupervised Long-Tail Noisy Anomaly Detection

Yoon Gyo Jung, Jaewoo Park, Jaeho Yoon et al.

NEURIPS 2025 arXiv:2505.20749

#10624

Can Agent Fix Agent Issues?

Alfin Wijaya Rahardja, Junwei Liu, Weitong Chen et al.

ICLR 2025 oral arXiv:2408.13885

#10625

Neural Spacetimes for DAG Representation Learning

Haitz Sáez de Ocáriz Borde, Anastasis Kratsios, Marc T Law et al.

ICLR 2025 arXiv:2503.15699

#10626

Representational Similarity via Interpretable Visual Concepts

Neehar Kondapaneni, Oisin Mac Aodha, Pietro Perona

NEURIPS 2025 arXiv:2510.09012

#10627

Towards Better & Faster Autoregressive Image Generation: From the Perspective of Entropy

Xiaoxiao Ma, Feng Zhao, Pengyang Ling et al.

ICCV 2025 arXiv:2506.01923

#10628

TaxaDiffusion: Progressively Trained Diffusion Model for Fine-Grained Species Generation

Amin Karimi Monsefi, Mridul Khurana, Rajiv Ramnath et al.

NEURIPS 2025 oral arXiv:2503.04981

#10629

Topology-Aware Conformal Prediction for Stream Networks

Jifan Zhang, Fangxin Wang, Zihe Song et al.

NEURIPS 2025 arXiv:2505.11254

#10630

Delta Attention: Fast and Accurate Sparse Attention Inference by Delta Correction

Jeffrey Willette, Heejun Lee, Sung Ju Hwang

ICCV 2025 arXiv:2509.07596

#10631

Bias in Gender Bias Benchmarks: How Spurious Features Distort Evaluation

Yusuke Hirota, Ryo Hachiuma, Boyi Li et al.

ICCV 2025 arXiv:2504.10117

#10632

AGO: Adaptive Grounding for Open World 3D Occupancy Prediction

Peizheng Li, Shuxiao Ding, You Zhou et al.

NEURIPS 2025 oral arXiv:2507.10449

#10633

CoralVQA: A Large-Scale Visual Question Answering Dataset for Coral Reef Image Understanding

Hongyong Han, Wei Wang, Gaowei Zhang et al.

NEURIPS 2025 arXiv:2505.15315

#10634

Local-Global Associative Frames for Symmetry-Preserving Crystal Structure Modeling

Haowei Hua, Wanyu Lin

NEURIPS 2025 arXiv:2506.05497

#10635

Conformal Prediction Beyond the Seen: A Missing Mass Perspective for Uncertainty Quantification in Generative Models

Sima Noorani, Shayan Kiyani, George J. Pappas et al.

#10636

Register and [CLS] tokens induce a decoupling of local and global features in large ViTs

Alexander Lappe, Martin Giese

NEURIPS 2025

ICLR 2025 arXiv:2410.01786

#10637

Learning to Solve Differential Equation Constrained Optimization Problems

Vincenzo Di Vito Francesco, Mostafa Mohammadian, Kyri Baker et al.

NEURIPS 2025 arXiv:2502.12627

#10638

DAMamba: Vision State Space Model with Dynamic Adaptive Scan

Tanzhe Li, Caoshuo Li, Jiayi Lyu et al.

NEURIPS 2025 oral arXiv:2507.13328

#10639

Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It

Yulu Qin, Dheeraj Varghese, Adam Dahlgren Lindström et al.

NEURIPS 2025 arXiv:2505.12398

#10640

Traversal Verification for Speculative Tree Decoding

Yepeng Weng, Qiao Hu, Xujie Chen et al.

NEURIPS 2025 arXiv:2506.14951

#10641

Flat Channels to Infinity in Neural Loss Landscapes

Flavio Martinelli, Alexander van Meegen, Berfin Simsek et al.

CVPR 2025 highlight arXiv:2503.20779

#10642

PGC: Physics-Based Gaussian Cloth from a Single Pose

Michelle Guo, Matt Jen-Yuan Chiang, Igor Santesteban et al.

NEURIPS 2025 arXiv:2505.13499

#10643

Optimal Control for Transformer Architectures: Enhancing Generalization, Robustness and Efficiency

Kelvin Kan, Xingjian Li, Benjamin Zhang et al.

NEURIPS 2025 arXiv:2505.01386

#10644

CATransformers: Carbon Aware Transformers Through Joint Model-Hardware Optimization

Irene Wang, Mostafa Elhoushi, H Ekin Sumbul et al.

NEURIPS 2025 arXiv:2505.18000

#10645

Anytime-valid, Bayes-assisted, Prediction-Powered Inference

Valentin Kilian, Stefano Cortinovis, Francois Caron

NEURIPS 2025 arXiv:2409.03811

#10646

PARCO: Parallel AutoRegressive Models for Multi-Agent Combinatorial Optimization

Federico Berto, Chuanbo Hua, Laurin Luttmann et al.

NEURIPS 2025 arXiv:2505.16690

#10647

Your Pre-trained LLM is Secretly an Unsupervised Confidence Calibrator

Beier Luo, Shuoyuan Wang, Sharon Li et al.

CVPR 2025 arXiv:2505.20941

#10648

PMA: Towards Parameter-Efficient Point Cloud Understanding via Point Mamba Adapter

Yaohua Zha, Yanzi Wang, Hang Guo et al.

NEURIPS 2025 arXiv:2505.15138

#10649

Global Convergence for Average Reward Constrained MDPs with Primal-Dual Actor Critic Algorithm

Yang Xu, Swetha Ganesh, Washim Mondal et al.

ICCV 2025 arXiv:2504.02747

#10650

GEOPARD: Geometric Pretraining for Articulation Prediction in 3D Shapes

Pradyumn Goyal, Dmitrii Petrov, Sheldon Andrews et al.

ICCV 2025 arXiv:2507.04587

#10651

CVFusion: Cross-View Fusion of 4D Radar and Camera for 3D Object Detection

Hanzhi Zhong, Zhiyu Xiang, Ruoyu Xu et al.

ICCV 2025 highlight arXiv:2507.12714

#10652

NeuraLeaf: Neural Parametric Leaf Models with Shape and Deformation Disentanglement

Yang Yang, Dongni Mao, Hiroaki Santo et al.

NEURIPS 2025 oral arXiv:2505.18882

#10653

Personalized Safety in LLMs: A Benchmark and A Planning-Based Agent Approach

Yuchen Wu, Edward Sun, Kaijie Zhu et al.

ICCV 2025 arXiv:2507.02363

#10654

LocalDyGS: Multi-view Global Dynamic Scene Modeling via Adaptive Local Implicit Feature Decoupling

Jiahao Wu, Rui Peng, Jianbo Jiao et al.

NEURIPS 2025 oral arXiv:2503.14411

#10655

Unifying Text Semantics and Graph Structures for Temporal Text-attributed Graphs with Large Language Models

Siwei Zhang, Yun Xiong, Yateng Tang et al.

ICCV 2025 arXiv:2507.04984

#10656

TLB-VFI: Temporal-Aware Latent Brownian Bridge Diffusion for Video Frame Interpolation

Zonglin Lyu, Chen Chen

NEURIPS 2025 arXiv:2505.03176

#10657

seq-JEPA: Autoregressive Predictive Learning of Invariant-Equivariant World Models

Hafez Ghaemi, Eilif B. Muller, Shahab Bakhtiari

ICCV 2025 arXiv:2411.13317

#10658

Teaching VLMs to Localize Specific Objects from In-context Examples

Sivan Doveh, Nimrod Shabtay, Eli Schwartz et al.

ICCV 2025 arXiv:2507.23134

#10659

Details Matter for Indoor Open-vocabulary 3D Instance Segmentation

Sanghun Jung, Jingjing Zheng, Ke Zhang et al.

NEURIPS 2025 arXiv:2506.00568

#10660

CReFT-CAD: Boosting Orthographic Projection Reasoning for CAD via Reinforcement Fine-Tuning

Ke Niu, Zhuofan Chen, Haiyang Yu et al.

CVPR 2025 arXiv:2406.02659

#10661

Reanimating Images using Neural Representations of Dynamic Stimuli

Jacob Yeung, Andrew Luo, Gabriel Sarch et al.

#10662

ATA: Adaptive Transformation Agent for Text-Guided Subject-Position Variable Background Inpainting

Yizhe Tang, Zhimin Sun, Yuzhen Du et al.

NEURIPS 2025 spotlight arXiv:2505.19645

#10663

MoESD: Unveil Speculative Decoding's Potential for Accelerating Sparse MoE

Zongle Huang, Lei Zhu, ZongYuan Zhan et al.

NEURIPS 2025 arXiv:2510.23285

#10664

Adaptive Stochastic Coefficients for Accelerating Diffusion Sampling

Ruoyu Wang, Beier Zhu, Junzhi Li et al.

ICCV 2025 arXiv:2411.07747

#10665

Constraint-Aware Feature Learning for Parametric Point Cloud

Xi Cheng, Ruiqi Lei, Di Huang et al.

NEURIPS 2025 arXiv:2410.06324

#10666

Differentiation Through Black-Box Quadratic Programming Solvers

Connor Magoon, Fengyu Yang, Noam Aigerman et al.

#10667

$\boldsymbol{\lambda}$-Orthogonality Regularization for Compatible Representation Learning

Simone Ricci, Niccolò Biondi, Federico Pernici et al.

NEURIPS 2025

ICCV 2025 arXiv:2408.11817

#10668

GRAB: A Challenging GRaph Analysis Benchmark for Large Multimodal Models

Jonathan Roberts, Kai Han, Samuel Albanie

NEURIPS 2025 arXiv:2510.08632

#10669

Next Semantic Scale Prediction via Hierarchical Diffusion Language Models

Cai Zhou, Chenyu Wang, Dinghuai Zhang et al.

NEURIPS 2025 arXiv:2507.11344

#10670

Guiding LLM Decision-Making with Fairness Reward Models

Zara Hall, Melanie Subbiah, Thomas Zollo et al.

#10671

VIGFace: Virtual Identity Generation for Privacy-Free Face Recognition Dataset

Minsoo Kim, Min-Cheol Sagong, Gi Pyo Nam et al.

ICCV 2025

ICCV 2025 arXiv:2405.20336

#10672

RapVerse: Coherent Vocals and Whole-Body Motion Generation from Text

Jiaben Chen, Xin Yan, Yihang Chen et al.

NEURIPS 2025 arXiv:2505.20772

#10673

MetaSlot: Break Through the Fixed Number of Slots in Object-Centric Learning

Hongjia Liu, Rongzhen Zhao, Haohan Chen et al.

ICCV 2025 arXiv:2505.24222

#10674

Unleashing High-Quality Image Generation in Diffusion Sampling Using Second-Order Levenberg-Marquardt-Langevin

Fangyikang Wang, Hubery Yin, Lei Qian et al.

NEURIPS 2025 arXiv:2506.08004

#10675

Dynamic View Synthesis as an Inverse Problem

Hidir Yesiltepe, Pinar Yanardag

NEURIPS 2025 arXiv:2505.21665

#10676

Convergent Functions, Divergent Forms

Hyeonseong Jeon, Ainaz Eftekhar, Aaron Walsman et al.

NEURIPS 2025 arXiv:2410.02992

#10677

Learning to Better Search with Language Models via Guided Reinforced Self-Training

Seungyong Moon, Bumsoo Park, Hyun Oh Song

ICLR 2025 arXiv:2501.16471

#10678

SIM: Surface-based fMRI Analysis for Inter-Subject Multimodal Decoding from Movie-Watching Experiments

Simon Dahan, Gabriel Bénédict, Logan Williams et al.

ICCV 2025 arXiv:2507.19924

#10679

HumanSAM: Classifying Human-centric Forgery Videos in Human Spatial, Appearance, and Motion Anomaly

Chang Liu, Yunfan Ye, Fan Zhang et al.

ICLR 2025 arXiv:2411.16600

#10680

Approximation algorithms for combinatorial optimization with predictions

Antonios Antoniadis, Marek Elias, Adam Polak et al.

ICLR 2025 arXiv:2510.22897

#10681

Charting the Design Space of Neural Graph Representations for Subgraph Matching

Vaibhav Raj, Indradyumna Roy, Ashwin Ramachandran et al.

ICCV 2025 arXiv:2411.16815

#10682

FREE-Merging: Fourier Transform for Efficient Model Merging

Shenghe Zheng, Hongzhi Wang

ICLR 2025 arXiv:2503.05239

#10683

Robust Conformal Prediction with a Single Binary Certificate

Soroush H. Zargarbashi, Aleksandar Bojchevski

ICLR 2025 arXiv:2509.18469

#10684

Probabilistic Geometric Principal Component Analysis with application to neural data

Han-Lin Hsieh, Maryam Shanechi

ICCV 2025 arXiv:2507.15037

#10685

OmniVTON: Training-Free Universal Virtual Try-On

Zhaotong Yang, Yuhui Li, Shengfeng He et al.

CVPR 2025 arXiv:2210.13533

#10686

Sufficient Invariant Learning for Distribution Shift

Taero Kim, Subeen Park, Sungjun Lim et al.

ICLR 2025 arXiv:2410.15184

#10687

Action abstractions for amortized sampling

Oussama Boussif, Léna Ezzine, Joseph Viviano et al.

#10688

Decouple Distortion from Perception: Region Adaptive Diffusion for Extreme-low Bitrate Perception Image Compression

Jinchang Xu, Shaokang Wang, Jintao Chen et al.

#10689

Anatomical Consistency and Adaptive Prior-informed Transformation for Multi-contrast MR Image Synthesis via Diffusion Model

Yejee Shin, Yeeun Lee, Hanbyol Jang et al.

ICCV 2025 arXiv:2507.19071

#10690

Cross-Subject Mind Decoding from Inaccurate Representations

Yangyang Xu, Bangzhen Liu, Wenqi Shao et al.

ICLR 2025 oral arXiv:2502.21186

#10691

Scalable Decision-Making in Stochastic Environments through Learned Temporal Abstraction

Baiting Luo, Ava Pettet, Aron Laszka et al.

ICLR 2025 arXiv:2410.10166

#10692

Automated Filtering of Human Feedback Data for Aligning Text-to-Image Diffusion Models

Yongjin Yang, Sihyeon Kim, Hojung Jung et al.

ICCV 2025 arXiv:2507.00980

#10693

RTMap: Real-Time Recursive Mapping with Change Detection and Localization

Yuheng Du, Sheng Yang, Lingxuan Wang et al.

CVPR 2025 arXiv:2504.02862

#10694

Towards Understanding How Knowledge Evolves in Large Vision-Language Models

Sudong Wang, Yunjian Zhang, Yao Zhu et al.

#10695

$\phi$-Update: A Class of Policy Update Methods with Policy Convergence Guarantee

Wenye Li, Jiacai Liu, Ke Wei

ICCV 2025 arXiv:2507.15911

#10696

Local Dense Logit Relations for Enhanced Knowledge Distillation

Liuchi Xu, Kang Liu, Jinshuai Liu et al.

ICCV 2025 arXiv:2503.12649

#10697

FW-Merging: Scaling Model Merging with Frank-Wolfe Optimization

Hao Chen, Shell Xu Hu, Wayne Luk et al.

#10698

Shading Meets Motion: Self-supervised Indoor 3D Reconstruction Via Simultaneous Shape-from-Shading and Structure-from-Motion

Guoyu Lu

#10699

OW-OVD: Unified Open World and Open Vocabulary Object Detection

Xing Xi, Yangyang Huang, Ronghua Luo et al.

CVPR 2025 arXiv:2409.09318

#10700

ODE: Open-Set Evaluation of Hallucinations in Multimodal Large Language Models

Yahan Tu, Rui Hu, Jitao Sang

CVPR 2025 arXiv:2306.11339

#10701

Masking meets Supervision: A Strong Learning Alliance

Byeongho Heo, Taekyung Kim, Sangdoo Yun et al.

CVPR 2025 arXiv:2505.10679

#10702

Are Spatial-Temporal Graph Convolution Networks for Human Action Recognition Over-Parameterized?

Jianyang Xie, Yitian Zhao, Yanda Meng et al.

ICLR 2025 arXiv:2403.00282

#10703

Conflict-Averse Gradient Aggregation for Constrained Multi-Objective Reinforcement Learning

Dohyeong Kim, Mineui Hong, Jeongho Park et al.

#10704

Selective Unlearning via Representation Erasure Using Domain Adversarial Training

Nazanin Sepahvand, Eleni Triantafillou, Hugo Larochelle et al.

#10705

BWFormer: Building Wireframe Reconstruction from Airborne LiDAR Point Cloud with Transformer

Yuzhou Liu, Lingjie Zhu, Hanqiao Ye et al.

CVPR 2025 arXiv:2506.08964

#10706

ORIDa: Object-centric Real-world Image Composition Dataset

Jinwoo Kim, Sangmin Han, Jinho Jeong et al.

#10707

Annotation Ambiguity Aware Semi-Supervised Medical Image Segmentation

Suruchi Kumari, Pravendra Singh

ICCV 2025 arXiv:2510.13253

#10708

End-to-End Multi-Modal Diffusion Mamba

Chunhao Lu, Qiang Lu, Meichen Dong et al.

ICCV 2025 arXiv:2412.07608

#10709

Faster and Better 3D Splatting via Group Training

Chengbo Wang, Guozheng Ma, Yizhen Lao et al.

#10710

FedCALM: Conflict-aware Layer-wise Mitigation for Selective Aggregation in Deeper Personalized Federated Learning

Hao Zheng, Zhigang Hu, Boyu Wang et al.

ICCV 2025 arXiv:2507.05601

#10711

Rethinking Layered Graphic Design Generation with a Top-Down Approach

Jingye Chen, Zhaowen Wang, Nanxuan Zhao et al.

ICCV 2025 arXiv:2404.14745

#10712

You Think, You ACT: The New Task of Arbitrary Text to Motion Generation

Runqi Wang, Caoyuan Ma, Guopeng Li et al.

CVPR 2025 arXiv:2503.16630

#10713

TriTex: Learning Texture from a Single Mesh via Triplane Semantic Features

Dana Cohen-Bar, Daniel Cohen-Or, Gal Chechik et al.

ICCV 2025 arXiv:2506.23479

#10714

Instant GaussianImage: A Generalizable and Self-Adaptive Image Representation via 2D Gaussian Splatting

Zhaojie Zeng, Yuesong Wang, Chao Yang et al.

CVPR 2025 arXiv:2503.00068

#10715

PI-HMR: Towards Robust In-bed Temporal Human Shape Reconstruction with Contact Pressure Sensing

Ziyu Wu, Yufan Xiong, Mengting Niu et al.

ICLR 2025 arXiv:2409.07398

#10716

The Complexity of Two-Team Polymatrix Games with Independent Adversaries

Alexandros Hollender, Gilbert Maystre, Sai Ganesh Nagarajan

ICLR 2025 arXiv:2502.10587

#10717

Towards Self-Supervised Covariance Estimation in Deep Heteroscedastic Regression

Megh Shukla, Aziz Shameem, Mathieu Salzmann et al.

ICCV 2025 arXiv:2503.14944

#10718

MMAIF: Multi-task and Multi-degradation All-in-One for Image Fusion with Language Guidance

Zihan Cao, Yu Zhong, Ziqi Wang et al.

#10719

Improved Sampling Of Diffusion Models In Fluid Dynamics With Tweedie's Formula

Youssef Shehata, Benjamin Holzschuh, Nils Thuerey

ICLR 2025 oral

#10720

GPAvatar: High-fidelity Head Avatars by Learning Efficient Gaussian Projections

Weiqi Feng, Dong Han, Zekang Zhou et al.

#10721

Machine Unlearning via Simulated Oracle Matching

Kristian G Georgiev, Roy Rinberg, Sam Park et al.

ICCV 2025 arXiv:2508.01699

#10722

TimeExpert: An Expert-Guided Video LLM for Video Temporal Grounding

Zuhao Yang, Yingchen Yu, Yunqing Zhao et al.

ICCV 2025 arXiv:2405.18937

#10723

Kestrel: 3D Multimodal LLM for Part-Aware Grounded Description

Mahmoud Ahmed, Junjie Fei, Jian Ding et al.

ICLR 2025 arXiv:2505.07351

#10724

From Search to Sampling: Generative Models for Robust Algorithmic Recourse

Prateek Garg, Lokesh Nagalapatti, Sunita Sarawagi

CVPR 2025 arXiv:2504.01466

#10725

Mesh Mamba: A Unified State Space Model for Saliency Prediction in Non-Textured and Textured Meshes

Kaiwei Zhang, Dandan Zhu, Xiongkuo Min et al.

#10726

Wavelet and Prototype Augmented Query-based Transformer for Pixel-level Surface Defect Detection

Feng Yan, Xiaoheng Jiang, Yang Lu et al.

CVPR 2025 arXiv:2510.08791

#10727

Alignment, Mining and Fusion: Representation Alignment with Hard Negative Mining and Selective Knowledge Fusion for Medical Visual Question Answering

Yuanhao Zou, Zhaozheng Yin

ICCV 2025 arXiv:2507.05056

#10728

INTER: Mitigating Hallucination in Large Vision-Language Models by Interaction Guidance Sampling

Xin Dong, Shichao Dong, Jin Wang et al.

#10729

ICP: Immediate Compensation Pruning for Mid-to-high Sparsity

Xin Luo, Fu Xueming, Zihang Jiang et al.

ICLR 2025 arXiv:2410.17610

#10730

ImDy: Human Inverse Dynamics from Imitated Observations

Xinpeng Liu, Junxuan Liang, Zili Lin et al.

#10731

VODiff: Controlling Object Visibility Order in Text-to-Image Generation

Dong Liang, Jinyuan Jia, Yuhao Liu et al.

ICCV 2025 arXiv:2411.06106

#10732

Towards a Universal 3D Medical Multi-modality Generalization via Learning Personalized Invariant Representation

Zhaorui Tan, Xi Yang, Tan Pan et al.

#10733

Leveraging SD Map to Augment HD Map-based Trajectory Prediction

Zhiwei Dong, Ran Ding, Wei Li et al.

ICLR 2025 arXiv:2504.02067

#10734

A Truncated Newton Method for Optimal Transport

Mete Kemertas, Amir-massoud Farahmand, Allan Jepson

#10735

DriveScape: High-Resolution Driving Video Generation by Multi-View Feature Fusion

Wei Wu, Xi Guo, Weixuan Tang et al.

#10736

BrepGiff: Lightweight Generation of Complex B-rep with 3D GAT Diffusion

Hao Guo, Xiaoshui Huang, Hao Jiacheng et al.

#10737

Reproducible Vision-Language Models Meet Concepts Out of Pre-Training

Ziliang Chen, Xin Huang, Xiaoxuan Fan et al.

ICCV 2025 arXiv:2507.06710

#10738

Spatial-Temporal Aware Visuomotor Diffusion Policy Learning

Zhenyang Liu, Yikai Wang, Kuanning Wang et al.

#10739

Less Attention is More: Prompt Transformer for Generalized Category Discovery

Wei Zhang, Baopeng Zhang, Zhu Teng et al.

#10740

Sensitivity-Aware Efficient Fine-Tuning via Compact Dynamic-Rank Adaptation

Tianran Chen, Jiarui Chen, Baoquan Zhang et al.

ICCV 2025 arXiv:2509.05297

#10741

FlowSeek: Optical Flow Made Easier with Depth Foundation Models and Motion Bases

Matteo Poggi, Fabio Tosi

CVPR 2025 arXiv:2506.01591

#10742

Silence is Golden: Leveraging Adversarial Examples to Nullify Audio Control in LDM-based Talking-Head Generation

Yuan Gan, Jiaxu Miao, Yunze Wang et al.

ICLR 2025 arXiv:2505.03172

#10743

Null Counterfactual Factor Interactions for Goal-Conditioned Reinforcement Learning

Caleb Chuck, Fan Feng, Carl Qi et al.

ICLR 2025 arXiv:2504.13368

#10744

An Optimal Discriminator Weighted Imitation Perspective for Reinforcement Learning

Haoran Xu, Shuozhe Li, Harshit Sikchi et al.

#10745

Identity-Clothing Similarity Modeling for Unsupervised Clothing Change Person Re-Identification

Zhiqi Pang, Junjie Wang, Lingling Zhao et al.

ICCV 2025 arXiv:2409.17564

#10746

General Compression Framework for Efficient Transformer Object Tracking

Lingyi Hong, Jinglun Li, Xinyu Zhou et al.

CVPR 2025 arXiv:2407.03314

#10747

BACON: Improving Clarity of Image Captions via Bag-of-Concept Graphs

Zhantao Yang, Ruili Feng, Keyu Yan et al.

ICLR 2025 arXiv:2410.10870

#10748

PortLLM: Personalizing Evolving Large Language Models with Training-Free and Portable Model Patches

Rana Muhammad Shahroz Khan, Pingzhi Li, Sukwon Yun et al.

CVPR 2025 arXiv:2503.19824

#10749

AudCast: Audio-Driven Human Video Generation by Cascaded Diffusion Transformers

Jiazhi Guan, Kaisiyuan Wang, Zhiliang Xu et al.

#10750

DUALFormer: Dual Graph Transformer

Zhuo Jiaming, Yuwei Liu, Yintong Lu et al.

CVPR 2025 arXiv:2501.12216

#10751

RL-RC-DoT: A Block-level RL agent for Task-Aware Video Compression

Uri Gadot, Shie Mannor, Assaf Shocher et al.

ICLR 2025 arXiv:2410.12025

#10752

Geometric Inductive Biases of Deep Networks: The Role of Data and Architecture

Sajad Movahedi, Antonio Orvieto, Seyed-Mohsen Moosavi-Dezfooli

CVPR 2025 arXiv:2503.08387

#10753

Recognition-Synergistic Scene Text Editing

Zhengyao Fang, Pengyuan Lyu, Jingjing Wu et al.

#10754

Towards Consistent Multi-Task Learning: Unlocking the Potential of Task-Specific Parameters

Xiaohan Qin, Xiaoxing Wang, Junchi Yan

#10755

Supervising Sound Localization by In-the-wild Egomotion

Anna Min, Ziyang Chen, Hang Zhao et al.

CVPR 2025 arXiv:2411.19292

#10756

UrbanCAD: Towards Highly Controllable and Photorealistic 3D Vehicles for Urban Scene Simulation

Yichong Lu, Yichi Cai, Shangzhan Zhang et al.

ICCV 2025 arXiv:2507.06224

#10757

EC-Flow: Enabling Versatile Robotic Manipulation from Action-Unlabeled Videos via Embodiment-Centric Flow

Yixiang Chen, Peiyan Li, Yan Huang et al.

#10758

Adaptive backtracking for faster optimization

Joao V. Cavalcanti, Laurent Lessard, Ashia Wilson

CVPR 2025 arXiv:2503.00746

#10759

DoF-Gaussian: Controllable Depth-of-Field for 3D Gaussian Splatting

Liao Shen, Tianqi Liu, Huiqiang Sun et al.

ICLR 2025 arXiv:2503.03703

#10760

SoftMatcha: A Soft and Fast Pattern Matcher for Billion-Scale Corpus Searches

Hiroyuki Deguchi, Go Kamoda, Yusuke Matsushita et al.

ICLR 2025 arXiv:2410.10024

#10761

Sharper Guarantees for Learning Neural Network Classifiers with Gradient Methods

Hossein Taheri, Christos Thrampoulidis, Arya Mazumdar

ICCV 2025 arXiv:2507.15365

#10762

DAViD: Data-efficient and Accurate Vision Models from Synthetic Data

Fatemeh Saleh, Sadegh Aliakbarian, Charlie Hewitt et al.

CVPR 2025 arXiv:2506.19488

#10763

SceneCrafter: Controllable Multi-View Driving Scene Editing

Zehao Zhu, Yuliang Zou, Chiyu “Max” Jiang et al.

ICCV 2025 arXiv:2507.04699

#10764

A Visual Leap in CLIP Compositionality Reasoning through Generation of Counterfactual Sets

Zexi Jia, Chuanwei Huang, Yeshuang Zhu et al.

#10765

I2VGuard: Safeguarding Images against Misuse in Diffusion-based Image-to-Video Models

Dongnan Gui, Xun Guo, Wengang Zhou et al.

CVPR 2025 arXiv:2501.12390

#10766

GPS as a Control Signal for Image Generation

Chao Feng, Ziyang Chen, Aleksander Holynski et al.

ICCV 2025 arXiv:2410.06848

#10767

Forgetting Through Transforming: Enabling Federated Unlearning via Class-Aware Representation Transformation

Qi Guo, Zhen Tian, Minghao Yao et al.

#10768

Enhanced Visual-Semantic Interaction with Tailored Prompts for Pedestrian Attribute Recognition

Junyi Wu, Yan Huang, Min Gao et al.

ICCV 2025 arXiv:2503.17491

#10769

Splat-LOAM: Gaussian Splatting LiDAR Odometry and Mapping

Emanuele Giacomini, Luca Di Giammarino, Lorenzo De Rebotti et al.

#10770

From Laboratory to Real World: A New Benchmark Towards Privacy-Preserved Visible-Infrared Person Re-Identification

Yan Jiang, Hao Yu, Xu Cheng et al.

ICCV 2025 arXiv:2508.03256

#10771

Beyond Isolated Words: Diffusion Brush for Handwritten Text-Line Generation

Gang Dai, Yifan Zhang, Yutao Qin et al.

CVPR 2025 arXiv:2411.19041

#10772

TAMT: Temporal-Aware Model Tuning for Cross-Domain Few-Shot Action Recognition

Yilong Wang, Zilin Gao, Qilong Wang et al.

ICLR 2025 arXiv:2310.17531

#10773

Learning Regularized Graphon Mean-Field Games with Unknown Graphons

Fengzhuo Zhang, Vincent Tan, Zhaoran Wang et al.

CVPR 2025 arXiv:2411.18025

#10774

Pixel-aligned RGB-NIR Stereo Imaging and Dataset for Robot Vision

Jinneyong Kim, Seung-Hwan Baek

CVPR 2025 arXiv:2503.14564

#10775

Effortless Active Labeling for Long-Term Test-Time Adaptation

Guowei Wang, Changxing Ding

CVPR 2025 arXiv:2503.08147

#10776

FilmComposer: LLM-Driven Music Production for Silent Film Clips

Zhifeng Xie, Qile He, Youjia Zhu et al.

ICLR 2025 arXiv:2503.05306

#10777

Adversarial Policy Optimization for Offline Preference-based Reinforcement Learning

Hyungkyu Kang, Min-hwan Oh

ICCV 2025 arXiv:2503.16832

#10778

Joint Self-Supervised Video Alignment and Action Segmentation

Ali Shah Ali, Syed Ahmed Mahmood, Mubin Saeed et al.

CVPR 2025 highlight arXiv:2411.08753

#10779

Which Viewpoint Shows it Best? Language for Weakly Supervising View Selection in Multi-view Instructional Videos

Sagnik Majumder, Tushar Nagarajan, Ziad Al-Halah et al.

ICLR 2025 arXiv:2410.04988

#10780

Efficient Model-Based Reinforcement Learning Through Optimistic Thompson Sampling

Jasmine Bayrooti, Carl Ek, Amanda Prorok

ICCV 2025 arXiv:2507.22061

#10781

MOVE: Motion-Guided Few-Shot Video Object Segmentation

Kaining Ying, Hengrui Hu, Henghui Ding

ICCV 2025 arXiv:2503.16867

#10782

ETVA: Evaluation of Text-to-Video Alignment via Fine-grained Question Generation and Answering

Kaisi Guan, Zhengfeng Lai, Yuchong Sun et al.

#10783

Making Old Film Great Again: Degradation-aware State Space Model for Old Film Restoration

Yudong Mao, Hao Luo, Zhiwei Zhong et al.

CVPR 2025 arXiv:2410.06664

#10784

Decouple-Then-Merge: Finetune Diffusion Models as Multi-Task Learning

Qianli Ma, Xuefei Ning, Dongrui Liu et al.

CVPR 2025 arXiv:2503.09402

#10785

VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary

Kevin Qinghong Lin, Mike Zheng Shou

#10786

Understanding the Stability-based Generalization of Personalized Federated Learning

Yingqi Liu, Qinglun Li, Jie Tan et al.

ICCV 2025 arXiv:2410.23287

#10787

ReferEverything: Towards Segmenting Everything We Can Speak of in Videos

Anurag Bagchi, Zhipeng Bao, Yu-Xiong Wang et al.

ICCV 2025 arXiv:2508.04642

#10788

RoboTron-Sim: Improving Real-World Driving via Simulated Hard-Case

Baihui Xiao, Chengjian Feng, Zhijian Huang et al.

CVPR 2025 arXiv:2503.23024

#10789

Empowering Large Language Models with 3D Situation Awareness

Zhihao Yuan, Yibo Peng, Jinke Ren et al.

CVPR 2025 arXiv:2505.21591

#10790

Pioneering 4-Bit FP Quantization for Diffusion Models: Mixup-Sign Quantization and Timestep-Aware Fine-Tuning

Maosen Zhao, Pengtao Chen, Chong Yu et al.

ICCV 2025 arXiv:2503.10225

#10791

Unveiling the Invisible: Reasoning Complex Occlusions Amodally with AURA

Zhixuan Li, Hyunse Yoon, Sanghoon Lee et al.

#10792

Causal Graph Transformer for Treatment Effect Estimation Under Unknown Interference

Anpeng Wu, Haiyi Qiu, Zhengming Chen et al.

ICLR 2025 arXiv:2410.03529

#10793

No Need to Talk: Asynchronous Mixture of Language Models

Anastasiia Filippova, Angelos Katharopoulos, David Grangier et al.

ICCV 2025 arXiv:2411.14137

#10794

VAGUE: Visual Contexts Clarify Ambiguous Expressions

Heejeong Nam, Jinwoo Ahn, Keummin Ka et al.

CVPR 2025 arXiv:2406.09126

#10795

3D-AVS: LiDAR-based 3D Auto-Vocabulary Segmentation

Weijie Wei, Osman Ülger, Fatemeh Karimi Nejadasl et al.

#10796

DOF-GS: Adjustable Depth-of-Field 3D Gaussian Splatting for Post-Capture Refocusing, Defocus Rendering and Blur Removal

Yujie Wang, Praneeth Chakravarthula, Baoquan Chen

ICCV 2025 arXiv:2507.16403

#10797

ReasonVQA: A Multi-hop Reasoning Benchmark with Structural Knowledge for Visual Question Answering

Duong T. Tran, Trung-Kien Tran, Manfred Hauswirth et al.

ICLR 2025 arXiv:2307.05793

#10798

Grid Cell-Inspired Fragmentation and Recall for Efficient Map Building

Jaedong Hwang, Zhang-Wei Hong, Eric Chen et al.

CVPR 2025 arXiv:2412.02171

#10799

Can't Slow Me Down: Learning Robust and Hardware-Adaptive Object Detectors against Latency Attacks for Edge Devices

Tianyi Wang, Zichen Wang, Cong Wang et al.

ICLR 2025 arXiv:2504.06003

#10800

econSG: Efficient and Multi-view Consistent Open-Vocabulary 3D Semantic Gaussians

Can Zhang, Gim H Lee