Most Cited 2025 "transformer-based approaches" Papers

22,274 papers found • Page 112 of 112

#22201

Backdoor Mitigation by Distance-Driven Detoxification

Shaokui Wei, Jiayin Liu, Hongyuan Zha

ICCV 2025 • highlight • arXiv:2411.09585
#22202

Democratizing High-Fidelity Co-Speech Gesture Video Generation

Xu Yang, Shaoli Huang, Shenbo Xie et al.

ICCV 2025 • poster • arXiv:2507.06812
#22203

UnrealZoo: Enriching Photo-realistic Virtual Worlds for Embodied AI

Fangwei Zhong, Kui Wu, Churan Wang et al.

ICCV 2025 • highlight • arXiv:2412.20977
#22204

HFD-Teacher: High-Frequency Depth Distillation from Depth Foundation Models for Enhanced Depth Completion

Zhiyuan Yang, Anqi Cheng, Haiyue Zhu et al.

ICCV 2025 • poster
#22205

QK-Edit: Revisiting Attention-based Injection in MM-DiT for Image and Video Editing

Tiancheng SHEN, Jun Hao Liew, Zilong Huang et al.

ICCV 2025 • poster
#22206

FastVAR: Linear Visual Autoregressive Modeling via Cached Token Pruning

Hang Guo, Yawei Li, Taolin Zhang et al.

ICCV 2025 • poster • arXiv:2503.23367
#22207

Separation for Better Integration: Disentangling Edge and Motion in Event-based Deblurring

Yufei Zhu, Hao Chen, Yongjian Deng et al.

ICCV 2025 • poster
#22208

DiT4SR: Taming Diffusion Transformer for Real-World Image Super-Resolution

Zheng-Peng Duan, jiawei zhang, Xin Jin et al.

ICCV 2025 • poster • arXiv:2503.23580
#22209

Teleportraits: Training-Free People Insertion into Any Scene

Jialu Gao, Joseph K J, Fernando De la Torre

ICCV 2025 • poster • arXiv:2510.05660
#22210

Diversity-Enhanced Distribution Alignment for Dataset Distillation

Hongcheng Li, Yucan Zhou, Xiaoyan Gu et al.

ICCV 2025 • poster
#22211

Height-Fidelity Dense Global Fusion for Multi-modal 3D Object Detection

Hanshi Wang, Jin Gao, Weiming Hu et al.

ICCV 2025 • highlight • arXiv:2507.04369
#22212

SMSTracker: Tri-path Score Mask Sigma Fusion for Multi-Modal Tracking

Sixian Chan, Zedong Li, Xiaoqin Zhang et al.

ICCV 2025 • highlight
#22213

Two Losses, One Goal: Balancing Conflict Gradients for Semi-supervised Semantic Segmentation

Rui Sun, Huayu Mai, Wangkai Li et al.

ICCV 2025 • highlight
#22214

Region-based Cluster Discrimination for Visual Representation Learning

Yin Xie, Kaicheng Yang, Xiang An et al.

ICCV 2025 • highlight • arXiv:2507.20025
#22215

CMB-ML: A Cosmic Microwave Background Dataset for the Oldest Possible Computer Vision Task

James Amato, Yunan Xie, Leonel Medina-Varela et al.

ICCV 2025 • poster
#22216

Adapt Foundational Segmentation Models with Heterogeneous Searching Space

Li Yi, Jie Hu, Songan Zhang et al.

ICCV 2025 • poster
#22217

Think Twice: Test-Time Reasoning for Robust CLIP Zero-Shot Classification

Shenyu Lu, Zhaoying Pan, Xiaoqian Wang

ICCV 2025 • poster
#22218

Memory-Integrated Reconfigurable Adapters: A Unified Framework for Settings with Multiple Tasks

Susmit Agrawal, Krishn Vishwas Kher, Saksham Mittal et al.

NEURIPS 2025 • poster • arXiv:2512.00940
#22219

LOCATEdit: Graph Laplacian Optimized Cross Attention for Localized Text-Guided Image Editing

Achint Soni, Meet Soni, Sirisha Rambhatla

ICCV 2025 • poster • arXiv:2503.21541
#22220

Sample and Map from a Single Convex Potential: Generation using Conjugate Moment Measures

Nina Vesseron, Louis Bethune, Marco Cuturi

NEURIPS 2025 • poster • arXiv:2503.10576
#22221

Shape of Motion: 4D Reconstruction from a Single Video

Qianqian Wang, Vickie Ye, Hang Gao et al.

ICCV 2025 • highlight • arXiv:2407.13764
#22222

FlexGen: Flexible Multi-View Generation from Text and Image Inputs

Xinli Xu, Wenhang Ge, Jiantao Lin et al.

ICCV 2025 • poster • arXiv:2410.10745
#22223

EditCLIP: Representation Learning for Image Editing

Qian Wang, Aleksandar Cvejic, Abdelrahman Eldesokey et al.

ICCV 2025 • poster • arXiv:2503.20318
#22224

Counting Stacked Objects

Corentin Dumery, Noa Ette, Aoxiang Fan et al.

ICCV 2025 • poster • arXiv:2411.19149
#22225

Less-to-More Generalization: Unlocking More Controllability by In-Context Generation

shaojin wu, Mengqi Huang, wenxu wu et al.

ICCV 2025 • poster • arXiv:2504.02160
#22226

Dropout Regularization Versus l2-Penalization in the Linear Model

Gabriel Clara, Sophie Langer, Johannes Schmidt-Hieber

NEURIPS 2025 • poster
#22227

Allowing Oscillation Quantization: Overcoming Solution Space Limitation in Low Bit-Width Quantization

Weiying Xie, Zihan Meng, Jitao Ma et al.

ICCV 2025 • poster
#22228

MOVE: Motion-Guided Few-Shot Video Object Segmentation

Kaining Ying, Hengrui Hu, Henghui Ding

ICCV 2025 • poster • arXiv:2507.22061
#22229

CorrCLIP: Reconstructing Patch Correlations in CLIP for Open-Vocabulary Semantic Segmentation

Dengke Zhang, Fagui Liu, Quan Tang

ICCV 2025 • poster • arXiv:2411.10086
#22230

mmCooper: A Multi-agent Multi-stage Communication-efficient and Collaboration-robust Cooperative Perception Framework

Bingyi Liu, Jian Teng, Hongfei Xue et al.

ICCV 2025 • poster • arXiv:2501.12263
#22231

FreqPDE: Rethinking Positional Depth Embedding for Multi-View 3D Object Detection Transformers

Junjie Zhang, Haisheng Su, Feixiang Song et al.

ICCV 2025 • poster • arXiv:2510.15385
#22232

GestureLSM: Latent Shortcut based Co-Speech Gesture Generation with Spatial-Temporal Modeling

Pinxin Liu, Luchuan Song, Junhua Huang et al.

ICCV 2025 • poster • arXiv:2501.18898
#22233

SDFormer: Vision-based 3D Semantic Scene Completion via SAM-assisted Dual-channel Voxel Transformer

Yujie Xue, Huilong Pi, Jiapeng Zhang et al.

ICCV 2025 • poster
#22234

Dyn-HaMR: Recovering 4D Interacting Hand Motion from a Dynamic Camera

Zhengdi Yu, Stefanos Zafeiriou, Tolga Birdal

CVPR 2025 • highlight • arXiv:2412.12861
#22235

TopoTTA: Topology-Enhanced Test-Time Adaptation for Tubular Structure Segmentation

Jiale Zhou, Wenhan Wang, Shikun Li et al.

ICCV 2025 • poster • arXiv:2508.00442
#22236

RealCam-I2V: Real-World Image-to-Video Generation with Interactive Complex Camera Control

Teng Li, Guangcong Zheng, Rui Jiang et al.

ICCV 2025 • poster • arXiv:2502.10059
#22237

Gain-MLP: Improving HDR Gain Map Encoding via a Lightweight MLP

Trevor Canham, SaiKiran Tedla, Michael Murdoch et al.

ICCV 2025 • poster • arXiv:2503.11883
#22238

MagShield: Towards Better Robustness in Sparse Inertial Motion Capture Under Magnetic Disturbances

Yunzhe Shao, Xinyu Yi, Lu Yin et al.

ICCV 2025 • poster • arXiv:2506.22907
#22239

CoMatch: Dynamic Covisibility-Aware Transformer for Bilateral Subpixel-Level Semi-Dense Image Matching

Zizhuo Li, Yifan Lu, Linfeng Tang et al.

ICCV 2025 • highlight • arXiv:2503.23925
#22240

Semantic Discrepancy-aware Detector for Image Forgery Identification

Wang Ziye, Minghang Yu, Chunyan Xu et al.

ICCV 2025 • poster • arXiv:2508.12341
#22241

DeFSS: Image-to-Mask Denoising Learning for Few-shot Segmentation

Zishu Qin, Junhao Xu, Weifeng Ge

ICCV 2025 • poster
#22242

UniGlyph: Unified Segmentation-Conditioned Diffusion for Precise Visual Text Synthesis

Yuanrui Wang, Cong Han, Yafei Li et al.

ICCV 2025 • poster • arXiv:2507.00992
#22243

UniversalBooth: Model-Agnostic Personalized Text-to-Image Generation

Songhua Liu, Ruonan Yu, Xinchao Wang

ICCV 2025 • poster
#22244

TAD-E2E: A Large-scale End-to-end Autonomous Driving Dataset

Chang Liu, mingxuzhu mingxuzhu, Zheyuan Zhang et al.

ICCV 2025 • poster
#22245

ShapeWords: Guiding Text-to-Image Synthesis with 3D Shape-Aware Prompts

Dmitrii M Petrov, Pradyumn Goyal, Divyansh Shivashok et al.

CVPR 2025 • poster • arXiv:2412.02912
#22246

Put CASH on Bandits: A Max K-Armed Problem for Automated Machine Learning

Amir Rezaei Balef, Claire Vernade, Katharina Eggensperger

NEURIPS 2025 • poster • arXiv:2505.05226
#22247

VAGUE: Visual Contexts Clarify Ambiguous Expressions

Heejeong Nam, Jinwoo Ahn, Keummin Ka et al.

ICCV 2025 • poster • arXiv:2411.14137
#22248

Photolithography Overlay Map Generation with Implicit Knowledge Distillation Diffusion Transformer

YuanFu Yang, Hsiu-Hui Hsiao

ICCV 2025 • poster
#22249

SA-LUT: Spatial Adaptive 4D Look-Up Table for Photorealistic Style Transfer

Zerui Gong, Zhonghua Wu, Qingyi Tao et al.

ICCV 2025 • poster • arXiv:2506.13465
#22250

What's Making That Sound Right Now? Video-centric Audio-Visual Localization

hahyeon choi, Junhoo Lee, Nojun Kwak

ICCV 2025 • poster • arXiv:2507.04667
#22251

ProAPO: Progressively Automatic Prompt Optimization for Visual Classification

Xiangyan Qu, Gaopeng Gou, Jiamin Zhuang et al.

CVPR 2025 • poster • arXiv:2502.19844
#22252

Accelerating Diffusion Sampling via Exploiting Local Transition Coherence

shangwen zhu, Han Zhang, Zhantao Yang et al.

ICCV 2025 • poster • arXiv:2503.09675
#22253

VehicleMAE: View-asymmetry Mutual Learning for Vehicle Re-identification Pre-training via Masked AutoEncoders

Qi Wang, Zeyu Zhang, Dong Wang et al.

ICCV 2025 • poster
#22254

EEGMirror: Leveraging EEG data in the wild via Montage-Agnostic Self-Supervision for EEG to Video Decoding

Xuan-Hao Liu, Bao-liang Lu, Wei-Long Zheng

ICCV 2025 • poster
#22255

MagicCity: Geometry-Aware 3D City Generation from Satellite Imagery with Multi-View Consistency

Xingbo YAO, xuanmin Wang, Hao WU et al.

ICCV 2025 • poster
#22256

RARE: Refine Any Registration of Pairwise Point Clouds via Zero-Shot Learning

Chengyu Zheng, Honghua Chen, Jin Huang et al.

ICCV 2025 • poster • arXiv:2507.19950
#22257

Multi-scenario Overlapping Text Segmentation with Depth Awareness

Yang Liu, Xudong Xie, Yuliang Liu et al.

ICCV 2025 • poster
#22258

Zero-Shot Vision Encoder Grafting via LLM Surrogates

Kaiyu Yue, Vasu Singla, Menglin Jia et al.

ICCV 2025 • poster • arXiv:2505.22664
#22259

OV-SCAN: Semantically Consistent Alignment for Novel Object Discovery in Open-Vocabulary 3D Object Detection

Adrian Chow, Evelien Riddell, Yimu Wang et al.

ICCV 2025 • poster • arXiv:2503.06435
#22260

FullDiT: Video Generative Foundation Models with Multimodal Control via Full Attention

Xuan Ju, Weicai Ye, Quande Liu et al.

ICCV 2025 • poster
#22261

SC-Lane: Slope-aware and Consistent Road Height Estimation Framework for 3D Lane Detection

Chaesong Park, Eunbin Seo, Jihyeon Hwang et al.

ICCV 2025 • poster • arXiv:2508.10411
#22262

Exploring the Visual Feature Space for Multimodal Neural Decoding

Weihao Xia, Cengiz Oztireli

ICCV 2025 • poster • arXiv:2505.15755
#22263

Parametric Shadow Control for Portrait Generation in Text-to-Image Diffusion Models

Haoming Cai, Tsung-Wei Huang, Shiv Gehlot et al.

ICCV 2025 • poster • arXiv:2503.21943
#22264

ConceptSplit: Decoupled Multi-Concept Personalization of Diffusion Models via Token-wise Adaptation and Attention Disentanglement

Habin Lim, Youngseob Won, Juwon Seo et al.

ICCV 2025 • poster • arXiv:2510.04668
#22265

Task-Specific Gradient Adaptation for Few-Shot One-Class Classification

Yunlong Li, Xiabi Liu, Liyuan Pan et al.

CVPR 2025 • poster
#22266

Backdoor Defense via Enhanced Splitting and Trap Isolation

Hongrui Yu, Lu Qi, Wanyu Lin et al.

ICCV 2025 • poster
#22267

Learning Hierarchical Line Buffer for Image Processing

Jiacheng Li, Feiran Li, Daisuke Iso

ICCV 2025 • poster
#22268

ATAS: Any-to-Any Self-Distillation for Enhanced Open-Vocabulary Dense Prediction

Soonwoo Cha, Jiwoo Song, Juan Yeo et al.

ICCV 2025 • poster • arXiv:2506.08678
#22269

Anchor Token Matching: Implicit Structure Locking for Training-free AR Image Editing

Taihang Hu, Linxuan Li, Kai Wang et al.

ICCV 2025 • poster • arXiv:2504.10434
#22270

ADMN: A Layer-Wise Adaptive Multimodal Network for Dynamic Input Noise and Compute Resources

Jason Wu, Yuyang Yuan, Kang Yang et al.

NEURIPS 2025 • poster • arXiv:2502.07862
#22271

Humans as Checkerboards: Calibrating Camera Motion Scale for World-Coordinate Human Mesh Recovery

Fengyuan Yang, Kerui Gu, Ha Linh Nguyen et al.

ICCV 2025 • poster • arXiv:2407.00574
#22272

D3: Training-Free AI-Generated Video Detection Using Second-Order Features

Chende Zheng, Ruiqi suo, Chenhao Lin et al.

ICCV 2025 • poster • arXiv:2508.00701
#22273

Overcoming Dual Drift for Continual Long-Tailed Visual Question Answering

Feifei Zhang, Zhihao Wang, Xi Zhang et al.

ICCV 2025 • poster
#22274

Preserve Anything: Controllable Image Synthesis with Object Preservation

Prasen Kumar Sharma, Neeraj Matiyali, Siddharth Srivastava et al.

ICCV 2025 • poster • arXiv:2506.22531