Most Cited CVPR "rating indeterminacy" Papers

5,589 papers found • Page 8 of 28

#1401

CAP4D: Creating Animatable 4D Portrait Avatars with Morphable Multi-View Diffusion Models

Felix Taubner, Ruihang Zhang, Mathieu Tuli et al.

CVPR 2025arXiv:2412.12093
25
citations
#1402

Multi-criteria Token Fusion with One-step-ahead Attention for Efficient Vision Transformers

Sanghyeok Lee, Joonmyung Choi, Hyunwoo J. Kim

CVPR 2024arXiv:2403.10030
25
citations
#1403

Generate Subgoal Images before Act: Unlocking the Chain-of-Thought Reasoning in Diffusion Model for Robot Manipulation with Multimodal Prompts

Fei Ni, Jianye Hao, Shiguang Wu et al.

CVPR 2024
25
citations
#1404

Improving Single Domain-Generalized Object Detection: A Focus on Diversification and Alignment

Muhammad Sohail Danish, Muhammad Haris Khan, Muhammad Akhtar Munir et al.

CVPR 2024arXiv:2405.14497
25
citations
#1405

Small Scale Data-Free Knowledge Distillation

He Liu, Yikai Wang, Huaping Liu et al.

CVPR 2024arXiv:2406.07876
25
citations
#1406

Image-to-Image Matching via Foundation Models: A New Perspective for Open-Vocabulary Semantic Segmentation

Yuan Wang, Rui Sun, Naisong Luo et al.

CVPR 2024arXiv:2404.00262
25
citations
#1407

Perception-Oriented Video Frame Interpolation via Asymmetric Blending

Guangyang Wu, Xin Tao, Changlin Li et al.

CVPR 2024arXiv:2404.06692
25
citations
#1408

Permutation Equivariance of Transformers and Its Applications

Hengyuan Xu, Liyao Xiang, Hangyu Ye et al.

CVPR 2024arXiv:2304.07735
25
citations
#1409

FreeSim: Toward Free-viewpoint Camera Simulation in Driving Scenes

Lue Fan, Hao ZHANG, Qitai Wang et al.

CVPR 2025arXiv:2412.03566
25
citations
#1410

Instruct 4D-to-4D: Editing 4D Scenes as Pseudo-3D Scenes Using 2D Diffusion

Linzhan Mou, Jun-Kun Chen, Yu-Xiong Wang

CVPR 2024arXiv:2406.09402
25
citations
#1411

Federated Generalized Category Discovery

Nan Pu, Wenjing Li, Xinyuan Ji et al.

CVPR 2024arXiv:2305.14107
25
citations
#1412

Learning without Exact Guidance: Updating Large-scale High-resolution Land Cover Maps from Low-resolution Historical Labels

Zhuohong Li, Wei He, Jiepan Li et al.

CVPR 2024highlightarXiv:2403.02746
25
citations
#1413

360+x: A Panoptic Multi-modal Scene Understanding Dataset

Hao Chen, Yuqi Hou, Chenyuan Qu et al.

CVPR 2024arXiv:2404.00989
25
citations
#1414

MoDE: CLIP Data Experts via Clustering

Jiawei Ma, Po-Yao Huang, Saining Xie et al.

CVPR 2024arXiv:2404.16030
25
citations
#1415

DiffusionPoser: Real-time Human Motion Reconstruction From Arbitrary Sparse Sensors Using Autoregressive Diffusion

Tom Van Wouwe, Seunghwan Lee, Antoine Falisse et al.

CVPR 2024arXiv:2308.16682
25
citations
#1416

Towards Open-Vocabulary Audio-Visual Event Localization

Jinxing Zhou, Dan Guo, Ruohao Guo et al.

CVPR 2025arXiv:2411.11278
25
citations
#1417

Tailored Visions: Enhancing Text-to-Image Generation with Personalized Prompt Rewriting

Zijie Chen, Lichao Zhang, Fangsheng Weng et al.

CVPR 2024arXiv:2310.08129
25
citations
#1418

FastMAC: Stochastic Spectral Sampling of Correspondence Graph

Yifei Zhang, Hao Zhao, Hongyang Li et al.

CVPR 2024arXiv:2403.08770
25
citations
#1419

Doubly Abductive Counterfactual Inference for Text-based Image Editing

Xue Song, Jiequan Cui, Hanwang Zhang et al.

CVPR 2024arXiv:2403.02981
25
citations
#1420

SCSegamba: Lightweight Structure-Aware Vision Mamba for Crack Segmentation in Structures

Hui Liu, Chen Jia, Fan Shi et al.

CVPR 2025arXiv:2503.01113
25
citations
#1421

CPP-Net: Embracing Multi-Scale Feature Fusion into Deep Unfolding CP-PPA Network for Compressive Sensing

Zhen Guo, Hongping Gan

CVPR 2024
25
citations
#1422

CAD-Llama: Leveraging Large Language Models for Computer-Aided Design Parametric 3D Model Generation

Jiahao Li, Weijian Ma, Xueyang Li et al.

CVPR 2025arXiv:2505.04481
25
citations
#1423

AIGV-Assessor: Benchmarking and Evaluating the Perceptual Quality of Text-to-Video Generation with LMM

Wang Jiarui, Huiyu Duan, Guangtao Zhai et al.

CVPR 2025arXiv:2411.17221
25
citations
#1424

MMRL: Multi-Modal Representation Learning for Vision-Language Models

Yuncheng Guo, Xiaodong Gu

CVPR 2025arXiv:2503.08497
25
citations
#1425

PhD: A ChatGPT-Prompted Visual Hallucination Evaluation Dataset

Jiazhen Liu, Yuhan Fu, Ruobing Xie et al.

CVPR 2025highlightarXiv:2403.11116
25
citations
#1426

Promptable Behaviors: Personalizing Multi-Objective Rewards from Human Preferences

Minyoung Hwang, Luca Weihs, Chanwoo Park et al.

CVPR 2024arXiv:2312.09337
25
citations
#1427

SCoFT: Self-Contrastive Fine-Tuning for Equitable Image Generation

Zhixuan Liu, Peter Schaldenbrand, Beverley-Claire Okogwu et al.

CVPR 2024arXiv:2401.08053
25
citations
#1428

Psychometry: An Omnifit Model for Image Reconstruction from Human Brain Activity

Ruijie Quan, Wenguan Wang, Zhibo Tian et al.

CVPR 2024arXiv:2403.20022
25
citations
#1429

Dual-Interrelated Diffusion Model for Few-Shot Anomaly Image Generation

Ying Jin, Jinlong Peng, Qingdong He et al.

CVPR 2025arXiv:2408.13509
25
citations
#1430

MANUS: Markerless Grasp Capture using Articulated 3D Gaussians

Chandradeep Pokhariya, Ishaan Shah, Angela Xing et al.

CVPR 2024arXiv:2312.02137
25
citations
#1431

Diffusion Time-step Curriculum for One Image to 3D Generation

YI Xuanyu, Zike Wu, Qingshan Xu et al.

CVPR 2024arXiv:2404.04562
25
citations
#1432

Learning Equi-angular Representations for Online Continual Learning

Minhyuk Seo, Hyunseo Koh, Wonje Jeung et al.

CVPR 2024arXiv:2404.01628
24
citations
#1433

TopV: Compatible Token Pruning with Inference Time Optimization for Fast and Low-Memory Multimodal Vision Language Model

Cheng Yang, Yang Sui, Jinqi Xiao et al.

CVPR 2025arXiv:2503.18278
24
citations
#1434

ID-Blau: Image Deblurring by Implicit Diffusion-based reBLurring AUgmentation

Jia-Hao Wu, Fu-Jen Tsai, Yan-Tsung Peng et al.

CVPR 2024arXiv:2312.10998
24
citations
#1435

VkD: Improving Knowledge Distillation using Orthogonal Projections

Roy Miles, Ismail Elezi, Jiankang Deng

CVPR 2024
24
citations
#1436

BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models

Fengyuan Shi, Jiaxi Gu, Hang Xu et al.

CVPR 2024arXiv:2312.02813
24
citations
#1437

Dr. Splat: Directly Referring 3D Gaussian Splatting via Direct Language Embedding Registration

JUNSEONG KIM, GeonU Kim, Kim Yu-Ji et al.

CVPR 2025highlightarXiv:2502.16652
24
citations
#1438

Scaling Up Video Summarization Pretraining with Large Language Models

Dawit Argaw Argaw, Seunghyun Yoon, Fabian Caba Heilbron et al.

CVPR 2024arXiv:2404.03398
24
citations
#1439

Open-Vocabulary Object 6D Pose Estimation

Jaime Corsetti, Davide Boscaini, Changjae Oh et al.

CVPR 2024highlightarXiv:2312.00690
24
citations
#1440

ODCR: Orthogonal Decoupling Contrastive Regularization for Unpaired Image Dehazing

Zhongze Wang, Haitao Zhao, Jingchao Peng et al.

CVPR 2024arXiv:2404.17825
24
citations
#1441

Face2Diffusion for Fast and Editable Face Personalization

Kaede Shiohara, Toshihiko Yamasaki

CVPR 2024arXiv:2403.05094
24
citations
#1442

CleanDIFT: Diffusion Features without Noise

Nick Stracke, Stefan Andreas Baumann, Kolja Bauer et al.

CVPR 2025arXiv:2412.03439
24
citations
#1443

EditAR: Unified Conditional Generation with Autoregressive Models

Jiteng Mu, Nuno Vasconcelos, Xiaolong Wang

CVPR 2025arXiv:2501.04699
24
citations
#1444

PosterMaker: Towards High-Quality Product Poster Generation with Accurate Text Rendering

Yifan Gao, Zihang Lin, Chuanbin Liu et al.

CVPR 2025arXiv:2504.06632
24
citations
#1445

GAF: Gaussian Avatar Reconstruction from Monocular Videos via Multi-view Diffusion

Jiapeng Tang, Davide Davoli, Tobias Kirschstein et al.

CVPR 2025arXiv:2412.10209
24
citations
#1446

AMU-Tuning: Effective Logit Bias for CLIP-based Few-shot Learning

Yuwei Tang, ZhenYi Lin, Qilong Wang et al.

CVPR 2024arXiv:2404.08958
24
citations
#1447

Towards More General Video-based Deepfake Detection through Facial Component Guided Adaptation for Foundation Model

Yue-Hua Han, Tai-Ming Huang, Kailung Hua et al.

CVPR 2025arXiv:2404.05583
24
citations
#1448

Tyche: Stochastic In-Context Learning for Medical Image Segmentation

Marianne Rakic, Hallee Wong, Jose Javier Gonzalez Ortiz et al.

CVPR 2024highlightarXiv:2401.13650
24
citations
#1449

ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations

Maitreya Patel, Changhoon Kim, Sheng Cheng et al.

CVPR 2024arXiv:2312.04655
24
citations
#1450

Black-Box Forgery Attacks on Semantic Watermarks for Diffusion Models

Andreas Müller, Denis Lukovnikov, Jonas Thietke et al.

CVPR 2025arXiv:2412.03283
24
citations
#1451

Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion

Hao Wen, Zehuan Huang, Yaohui Wang et al.

CVPR 2025arXiv:2406.03184
24
citations
#1452

Test-Time Adaptation for Depth Completion

Hyoungseob Park, Anjali W Gupta, Alex Wong

CVPR 2024arXiv:2402.03312
24
citations
#1453

Unknown Prompt the only Lacuna: Unveiling CLIP's Potential for Open Domain Generalization

Mainak Singha, Ankit Jha, Shirsha Bose et al.

CVPR 2024arXiv:2404.00710
24
citations
#1454

SANeRF-HQ: Segment Anything for NeRF in High Quality

Yichen Liu, Benran Hu, Chi-Keung Tang et al.

CVPR 2024arXiv:2312.01531
24
citations
#1455

Garment Recovery with Shape and Deformation Priors

Ren Li, Corentin Dumery, Benoît Guillard et al.

CVPR 2024arXiv:2311.10356
24
citations
#1456

SpatialLLM: A Compound 3D-Informed Design towards Spatially-Intelligent Large Multimodal Models

Wufei Ma, Luoxin Ye, Nessa McWeeney et al.

CVPR 2025highlightarXiv:2505.00788
24
citations
#1457

NAYER: Noisy Layer Data Generation for Efficient and Effective Data-free Knowledge Distillation

Minh-Tuan Tran, Trung Le, Xuan-May Le et al.

CVPR 2024arXiv:2310.00258
24
citations
#1458

MonoDiff: Monocular 3D Object Detection and Pose Estimation with Diffusion Models

Yasiru Ranasinghe, Deepti Hegde, Vishal M. Patel

CVPR 2024
24
citations
#1459

From Multimodal LLMs to Generalist Embodied Agents: Methods and Lessons

Andrew Szot, Bogdan Mazoure, Omar Attia et al.

CVPR 2025arXiv:2412.08442
24
citations
#1460

HoloVIC: Large-scale Dataset and Benchmark for Multi-Sensor Holographic Intersection and Vehicle-Infrastructure Cooperative

CONG MA, Qiao Lei, Chengkai Zhu et al.

CVPR 2024arXiv:2403.02640
24
citations
#1461

Part-aware Unified Representation of Language and Skeleton for Zero-shot Action Recognition

Anqi Zhu, Qiuhong Ke, Mingming Gong et al.

CVPR 2024arXiv:2406.13327
24
citations
#1462

Model Poisoning Attacks to Federated Learning via Multi-Round Consistency

Yueqi Xie, Minghong Fang, Neil Zhenqiang Gong

CVPR 2025arXiv:2404.15611
24
citations
#1463

POSTA: A Go-to Framework for Customized Artistic Poster Generation

Haoyu Chen, Xiaojie Xu, Wenbo Li et al.

CVPR 2025arXiv:2503.14908
24
citations
#1464

Deep Equilibrium Diffusion Restoration with Parallel Sampling

Jiezhang Cao, Yue Shi, Kai Zhang et al.

CVPR 2024arXiv:2311.11600
24
citations
#1465

Contrastive Learning for DeepFake Classification and Localization via Multi-Label Ranking

Cheng-Yao Hong, Yen-Chi Hsu, Tyng-Luh Liu

CVPR 2024
24
citations
#1466

EgoGen: An Egocentric Synthetic Data Generator

Gen Li, Kaifeng Zhao, Siwei Zhang et al.

CVPR 2024arXiv:2401.08739
24
citations
#1467

Diffusion-driven GAN Inversion for Multi-Modal Face Image Generation

Jihyun Kim, Changjae Oh, Hoseok Do et al.

CVPR 2024arXiv:2405.04356
24
citations
#1468

ALGM: Adaptive Local-then-Global Token Merging for Efficient Semantic Segmentation with Plain Vision Transformers

Narges Norouzi, Svetlana Orlova, Daan de Geus et al.

CVPR 2024arXiv:2406.09936
24
citations
#1469

Gaussian Splashing: Unified Particles for Versatile Motion Synthesis and Rendering

Yutao Feng, Xiang Feng, Yintong Shang et al.

CVPR 2025arXiv:2401.15318
24
citations
#1470

LoSh: Long-Short Text Joint Prediction Network for Referring Video Object Segmentation

Linfeng Yuan, Miaojing Shi, Zijie Yue et al.

CVPR 2024arXiv:2306.08736
24
citations
#1471

AlignSAM: Aligning Segment Anything Model to Open Context via Reinforcement Learning

Duojun Huang, Xinyu Xiong, Jie Ma et al.

CVPR 2024arXiv:2406.00480
24
citations
#1472

MC^2: Multi-concept Guidance for Customized Multi-concept Generation

Jiaxiu Jiang, Yabo Zhang, Kailai Feng et al.

CVPR 2025arXiv:2404.05268
24
citations
#1473

GAFusion: Adaptive Fusing LiDAR and Camera with Multiple Guidance for 3D Object Detection

Xiaotian Li, Baojie Fan, Jiandong Tian et al.

CVPR 2024arXiv:2411.00340
24
citations
#1474

Curriculum Direct Preference Optimization for Diffusion and Consistency Models

Florinel Croitoru, Vlad Hondru, Radu Tudor Ionescu et al.

CVPR 2025arXiv:2405.13637
24
citations
#1475

Training-Free Pretrained Model Merging

Zhengqi Xu, Ke Yuan, Huiqiong Wang et al.

CVPR 2024arXiv:2403.01753
24
citations
#1476

Doodle Your 3D: From Abstract Freehand Sketches to Precise 3D Shapes

Hmrishav Bandyopadhyay, Subhadeep Koley, Ayan Das et al.

CVPR 2024arXiv:2312.04043
24
citations
#1477

AnimateAnything: Consistent and Controllable Animation for Video Generation

guojun lei, Chi Wang, Rong Zhang et al.

CVPR 2025arXiv:2411.10836
24
citations
#1478

Rethinking Multi-view Representation Learning via Distilled Disentangling

Guanzhou Ke, Bo Wang, Xiao-Li Wang et al.

CVPR 2024arXiv:2403.10897
23
citations
#1479

HyperFree: A Channel-adaptive and Tuning-free Foundation Model for Hyperspectral Remote Sensing Imagery

Jingtao Li, Yingyi Liu, XINYU WANG et al.

CVPR 2025arXiv:2503.21841
23
citations
#1480

APISR: Anime Production Inspired Real-World Anime Super-Resolution

Boyang Wang, Fengyu Yang, Xihang Yu et al.

CVPR 2024arXiv:2403.01598
23
citations
#1481

Spherical Mask: Coarse-to-Fine 3D Point Cloud Instance Segmentation with Spherical Representation

Sangyun Shin, Kaichen Zhou, Madhu Vankadari et al.

CVPR 2024arXiv:2312.11269
23
citations
#1482

PhysAnimator: Physics-Guided Generative Cartoon Animation

Tianyi Xie, Yiwei Zhao, Ying Jiang et al.

CVPR 2025arXiv:2501.16550
23
citations
#1483

PI3D: Efficient Text-to-3D Generation with Pseudo-Image Diffusion

Ying-Tian Liu, Yuan-Chen Guo, Guan Luo et al.

CVPR 2024arXiv:2312.09069
23
citations
#1484

Multi-agent Long-term 3D Human Pose Forecasting via Interaction-aware Trajectory Conditioning

Jaewoo Jeong, Daehee Park, Kuk-Jin Yoon

CVPR 2024highlightarXiv:2404.05218
23
citations
#1485

Large Language Models are Good Prompt Learners for Low-Shot Image Classification

Zhaoheng Zheng, Jingmin Wei, Xuefeng Hu et al.

CVPR 2024arXiv:2312.04076
23
citations
#1486

Concept Weaver: Enabling Multi-Concept Fusion in Text-to-Image Models

Gihyun Kwon, Simon Jenni, Ding Li et al.

CVPR 2024arXiv:2404.03913
23
citations
#1487

Back to 3D: Few-Shot 3D Keypoint Detection with Back-Projected 2D Features

Thomas Wimmer, Peter Wonka, Maks Ovsjanikov

CVPR 2024arXiv:2311.18113
23
citations
#1488

Suppress and Rebalance: Towards Generalized Multi-Modal Face Anti-Spoofing

Xun Lin, Shuai Wang, RIZHAO CAI et al.

CVPR 2024highlightarXiv:2402.19298
23
citations
#1489

MeshArt: Generating Articulated Meshes with Structure-Guided Transformers

Daoyi Gao, Mohd Yawar Nihal Siddiqui, Lei Li et al.

CVPR 2025arXiv:2412.11596
23
citations
#1490

PanoContext-Former: Panoramic Total Scene Understanding with a Transformer

Yuan Dong, Chuan Fang, Liefeng Bo et al.

CVPR 2024arXiv:2305.12497
23
citations
#1491

Orchestrate Latent Expertise: Advancing Online Continual Learning with Multi-Level Supervision and Reverse Self-Distillation

Hongwei Yan, Liyuan Wang, Kaisheng Ma et al.

CVPR 2024arXiv:2404.00417
23
citations
#1492

PromptHMR: Promptable Human Mesh Recovery

Yufu Wang, Yu Sun, Priyanka Patel et al.

CVPR 2025arXiv:2504.06397
23
citations
#1493

Masked and Shuffled Blind Spot Denoising for Real-World Images

Hamadi Chihaoui, Paolo Favaro

CVPR 2024arXiv:2404.09389
23
citations
#1494

MoManipVLA: Transferring Vision-language-action Models for General Mobile Manipulation

Zhenyu Wu, Yuheng Zhou, Xiuwei Xu et al.

CVPR 2025arXiv:2503.13446
23
citations
#1495

Enhanced Contrastive Learning with Multi-view Longitudinal Data for Chest X-ray Report Generation

Kang Liu, Zhuoqi Ma, Xiaolu Kang et al.

CVPR 2025arXiv:2502.20056
23
citations
#1496

KeyPoint Relative Position Encoding for Face Recognition

Minchul Kim, Feng Liu, Yiyang Su et al.

CVPR 2024arXiv:2403.14852
23
citations
#1497

Bayesian Diffusion Models for 3D Shape Reconstruction

Haiyang Xu, Yu lei, Zeyuan Chen et al.

CVPR 2024arXiv:2403.06973
23
citations
#1498

PACER+: On-Demand Pedestrian Animation Controller in Driving Scenarios

Jingbo Wang, Zhengyi Luo, Ye Yuan et al.

CVPR 2024arXiv:2404.19722
23
citations
#1499

ModaVerse: Efficiently Transforming Modalities with LLMs

Xinyu Wang, Bohan Zhuang, Qi Wu

CVPR 2024arXiv:2401.06395
23
citations
#1500

RankED: Addressing Imbalance and Uncertainty in Edge Detection Using Ranking-based Losses

bedrettin cetinkaya, Sinan Kalkan, Emre Akbas

CVPR 2024arXiv:2403.01795
23
citations
#1501

Flatten Long-Range Loss Landscapes for Cross-Domain Few-Shot Learning

Yixiong Zou, Yicong Liu, Yiman Hu et al.

CVPR 2024arXiv:2403.00567
23
citations
#1502

SAR3D: Autoregressive 3D Object Generation and Understanding via Multi-scale 3D VQVAE

YONGWEI CHEN, Yushi Lan, Shangchen Zhou et al.

CVPR 2025arXiv:2411.16856
23
citations
#1503

Domain Prompt Learning with Quaternion Networks

Qinglong Cao, Zhengqin Xu, Yuntian Chen et al.

CVPR 2024highlightarXiv:2312.08878
23
citations
#1504

FLAIR: VLM with Fine-grained Language-informed Image Representations

Rui Xiao, Sanghwan Kim, Iuliana Georgescu et al.

CVPR 2025arXiv:2412.03561
23
citations
#1505

VAREN: Very Accurate and Realistic Equine Network

Silvia Zuffi, Ylva Mellbin, Ci Li et al.

CVPR 2024
23
citations
#1506

Q-Eval-100K: Evaluating Visual Quality and Alignment Level for Text-to-Vision Content

Zicheng Zhang, Tengchuan Kou, Chunyi Li et al.

CVPR 2025arXiv:2503.02357
23
citations
#1507

EventEgo3D: 3D Human Motion Capture from Egocentric Event Streams

Christen Millerdurai, Hiroyasu Akada, Jian Wang et al.

CVPR 2024arXiv:2404.08640
23
citations
#1508

AniGS: Animatable Gaussian Avatar from a Single Image with Inconsistent Gaussian Reconstruction

Lingteng Qiu, Shenhao Zhu, Qi Zuo et al.

CVPR 2025arXiv:2412.02684
23
citations
#1509

OSV: One Step is Enough for High-Quality Image to Video Generation

Xiaofeng Mao, Zhengkai Jiang, Fu-Yun Wang et al.

CVPR 2025arXiv:2409.11367
23
citations
#1510

FloVD: Optical Flow Meets Video Diffusion Model for Enhanced Camera-Controlled Video Synthesis

Wonjoon Jin, Qi Dai, Chong Luo et al.

CVPR 2025arXiv:2502.08244
23
citations
#1511

Steering Away from Harm: An Adaptive Approach to Defending Vision Language Model Against Jailbreaks

Han Wang, Gang Wang, Huan Zhang

CVPR 2025arXiv:2411.16721
23
citations
#1512

Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation

Yuhui Zhang, Yuchang Su, Yiming Liu et al.

CVPR 2025arXiv:2501.03225
23
citations
#1513

Composing Object Relations and Attributes for Image-Text Matching

Khoi Pham, Chuong Huynh, Ser-Nam Lim et al.

CVPR 2024arXiv:2406.11820
23
citations
#1514

Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation

Yudi Shi, Shangzhe Di, Qirui Chen et al.

CVPR 2025arXiv:2412.01694
23
citations
#1515

Dynamic LiDAR Re-simulation using Compositional Neural Fields

Hanfeng Wu, Xingxing Zuo, Stefan Leutenegger et al.

CVPR 2024highlightarXiv:2312.05247
23
citations
#1516

2DMamba: Efficient State Space Model for Image Representation with Applications on Giga-Pixel Whole Slide Image Classification

Jingwei Zhang, Anh Tien Nguyen, Xi Han et al.

CVPR 2025arXiv:2412.00678
23
citations
#1517

Towards Robust 3D Object Detection with LiDAR and 4D Radar Fusion in Various Weather Conditions

Yujeong Chae, Hyeonseong Kim, Kuk-Jin Yoon

CVPR 2024
23
citations
#1518

The Manga Whisperer: Automatically Generating Transcriptions for Comics

Ragav Sachdeva, Andrew Zisserman

CVPR 2024arXiv:2401.10224
23
citations
#1519

CLIP-BEVFormer: Enhancing Multi-View Image-Based BEV Detector with Ground Truth Flow

Chenbin Pan, Burhan Yaman, Senem Velipasalar et al.

CVPR 2024arXiv:2403.08919
23
citations
#1520

Overcoming Generic Knowledge Loss with Selective Parameter Update

Wenxuan Zhang, Paul Janson, Rahaf Aljundi et al.

CVPR 2024arXiv:2308.12462
23
citations
#1521

UniVAD: A Training-free Unified Model for Few-shot Visual Anomaly Detection

Zhaopeng Gu, Bingke Zhu, Guibo Zhu et al.

CVPR 2025arXiv:2412.03342
23
citations
#1522

FSRT: Facial Scene Representation Transformer for Face Reenactment from Factorized Appearance Head-pose and Facial Expression Features

Andre Rochow, Max Schwarz, Sven Behnke

CVPR 2024arXiv:2404.09736
23
citations
#1523

Continual Forgetting for Pre-trained Vision Models

Hongbo Zhao, Bolin Ni, Junsong Fan et al.

CVPR 2024arXiv:2403.11530
23
citations
#1524

Solving the Catastrophic Forgetting Problem in Generalized Category Discovery

Xinzi Cao, Xiawu Zheng, Guanhong Wang et al.

CVPR 2024arXiv:2501.05272
23
citations
#1525

Learning Disentangled Identifiers for Action-Customized Text-to-Image Generation

Siteng Huang, Biao Gong, Yutong Feng et al.

CVPR 2024arXiv:2311.15841
23
citations
#1526

Desigen: A Pipeline for Controllable Design Template Generation

Haohan Weng, Danqing Huang, YU QIAO et al.

CVPR 2024arXiv:2403.09093
23
citations
#1527

Taming Self-Training for Open-Vocabulary Object Detection

Shiyu Zhao, Samuel Schulter, Long Zhao et al.

CVPR 2024arXiv:2308.06412
23
citations
#1528

DiSR-NeRF: Diffusion-Guided View-Consistent Super-Resolution NeRF

Jie Long Lee, Chen Li, Gim Hee Lee

CVPR 2024arXiv:2404.00874
23
citations
#1529

TinyFusion: Diffusion Transformers Learned Shallow

Gongfan Fang, Kunjun Li, Xinyin Ma et al.

CVPR 2025highlightarXiv:2412.01199
23
citations
#1530

POPDG: Popular 3D Dance Generation with PopDanceSet

Zhenye Luo, Min Ren, Xuecai Hu et al.

CVPR 2024arXiv:2405.03178
23
citations
#1531

DiffusionGAN3D: Boosting Text-guided 3D Generation and Domain Adaptation by Combining 3D GANs and Diffusion Priors

Biwen Lei, Kai Yu, Mengyang Feng et al.

CVPR 2024arXiv:2312.16837
23
citations
#1532

DesignDiffusion: High-Quality Text-to-Design Image Generation with Diffusion Models

Zhendong Wang, Jianmin Bao, Shuyang Gu et al.

CVPR 2025arXiv:2503.01645
23
citations
#1533

GoodSAM: Bridging Domain and Capacity Gaps via Segment Anything Model for Distortion-aware Panoramic Semantic Segmentation

WEIMING ZHANG, Yexin Liu, Xu Zheng et al.

CVPR 2024arXiv:2403.16370
23
citations
#1534

CPath-Omni: A Unified Multimodal Foundation Model for Patch and Whole Slide Image Analysis in Computational Pathology

Yuxuan Sun, Yixuan Si, Chenglu Zhu et al.

CVPR 2025arXiv:2412.12077
23
citations
#1535

Dual-View Visual Contextualization for Web Navigation

Jihyung Kil, Chan Hee Song, Boyuan Zheng et al.

CVPR 2024arXiv:2402.04476
23
citations
#1536

PLACE: Adaptive Layout-Semantic Fusion for Semantic Image Synthesis

Zhengyao Lv, Yuxiang Wei, Wangmeng Zuo et al.

CVPR 2024highlightarXiv:2403.01852
23
citations
#1537

Latent Modulated Function for Computational Optimal Continuous Image Representation

Zongyao He, Zhi Jin

CVPR 2024highlightarXiv:2404.16451
22
citations
#1538

Language-Driven Anchors for Zero-Shot Adversarial Robustness

Xiao Li, Wei Zhang, Yining Liu et al.

CVPR 2024arXiv:2301.13096
22
citations
#1539

HMD-Poser: On-Device Real-time Human Motion Tracking from Scalable Sparse Observations

Peng Dai, Yang Zhang, Tao Liu et al.

CVPR 2024arXiv:2403.03561
22
citations
#1540

Bridging Past and Future: End-to-End Autonomous Driving with Historical Prediction and Planning

Bozhou Zhang, Nan Song, Xin Jin et al.

CVPR 2025arXiv:2503.14182
22
citations
#1541

Spatio-Temporal Turbulence Mitigation: A Translational Perspective

Xingguang Zhang, Nicholas M Chimitt, Yiheng Chi et al.

CVPR 2024arXiv:2401.04244
22
citations
#1542

Visual Delta Generator with Large Multi-modal Models for Semi-supervised Composed Image Retrieval

Young Kyun Jang, Donghyun Kim, Zihang Meng et al.

CVPR 2024arXiv:2404.15516
22
citations
#1543

Unveil Inversion and Invariance in Flow Transformer for Versatile Image Editing

Pengcheng Xu, Boyuan Jiang, Xiaobin Hu et al.

CVPR 2025arXiv:2411.15843
22
citations
#1544

3D-SceneDreamer: Text-Driven 3D-Consistent Scene Generation

Songchun Zhang, Yibo Zhang, Quan Zheng et al.

CVPR 2024arXiv:2403.09439
22
citations
#1545

Targeted Representation Alignment for Open-World Semi-Supervised Learning

Ruixuan Xiao, Lei Feng, Kai Tang et al.

CVPR 2024
22
citations
#1546

Optimus-2: Multimodal Minecraft Agent with Goal-Observation-Action Conditioned Policy

Zaijing Li, Yuquan Xie, Rui Shao et al.

CVPR 2025arXiv:2502.19902
22
citations
#1547

Semantic-aware SAM for Point-Prompted Instance Segmentation

Zhaoyang Wei, Pengfei Chen, Xuehui Yu et al.

CVPR 2024highlightarXiv:2312.15895
22
citations
#1548

F-LMM: Grounding Frozen Large Multimodal Models

Size Wu, Sheng Jin, Wenwei Zhang et al.

CVPR 2025arXiv:2406.05821
22
citations
#1549

Low-Rank Rescaled Vision Transformer Fine-Tuning: A Residual Design Approach

Wei Dong, Xing Zhang, Bihui Chen et al.

CVPR 2024arXiv:2403.19067
22
citations
#1550

LiveHPS: LiDAR-based Scene-level Human Pose and Shape Estimation in Free Environment

yiming ren, xiao han, Chengfeng Zhao et al.

CVPR 2024highlightarXiv:2402.17171
22
citations
#1551

Mind Marginal Non-Crack Regions: Clustering-Inspired Representation Learning for Crack Segmentation

zhuangzhuang chen, Zhuonan Lai, Jie Chen et al.

CVPR 2024
22
citations
#1552

VideoRF: Rendering Dynamic Radiance Fields as 2D Feature Video Streams

Liao Wang, Kaixin Yao, Chengcheng Guo et al.

CVPR 2024arXiv:2312.01407
22
citations
#1553

ASAM: Boosting Segment Anything Model with Adversarial Tuning

Bo Li, Haoke Xiao, Lv Tang

CVPR 2024arXiv:2405.00256
22
citations
#1554

CSTA: CNN-based Spatiotemporal Attention for Video Summarization

Jaewon Son, Jaehun Park, Kwangsu Kim

CVPR 2024arXiv:2405.11905
22
citations
#1555

Detecting Out-of-Distribution Through the Lens of Neural Collapse

Litian Liu, Yao Qin

CVPR 2025arXiv:2311.01479
22
citations
#1556

Rethinking Few-shot 3D Point Cloud Semantic Segmentation

Zhaochong An, Guolei Sun, Yun Liu et al.

CVPR 2024arXiv:2403.00592
22
citations
#1557

PracticalDG: Perturbation Distillation on Vision-Language Models for Hybrid Domain Generalization

Zining Chen, Weiqiu Wang, Zhicheng Zhao et al.

CVPR 2024arXiv:2404.09011
22
citations
#1558

Generative Video Propagation

Shaoteng Liu, Tianyu Wang, Jui-Hsien Wang et al.

CVPR 2025arXiv:2412.19761
22
citations
#1559

Category-Level Multi-Part Multi-Joint 3D Shape Assembly

Yichen Li, Kaichun Mo, Yueqi Duan et al.

CVPR 2024arXiv:2303.06163
22
citations
#1560

End-to-End Spatio-Temporal Action Localisation with Video Transformers

Alexey Gritsenko, Xuehan Xiong, Josip Djolonga et al.

CVPR 2024arXiv:2304.12160
22
citations
#1561

Carve3D: Improving Multi-view Reconstruction Consistency for Diffusion Models with RL Finetuning

Desai Xie, Jiahao Li, Hao Tan et al.

CVPR 2024arXiv:2312.13980
22
citations
#1562

GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation

Lang Lin, Xueyang Yu, Ziqi Pang et al.

CVPR 2025arXiv:2504.07962
22
citations
#1563

NRDF: Neural Riemannian Distance Fields for Learning Articulated Pose Priors

Yannan He, Garvita Tiwari, Tolga Birdal et al.

CVPR 2024highlightarXiv:2403.03122
22
citations
#1564

Time- Memory- and Parameter-Efficient Visual Adaptation

Otniel-Bogdan Mercea, Alexey Gritsenko, Cordelia Schmid et al.

CVPR 2024highlightarXiv:2402.02887
22
citations
#1565

COCONut: Modernizing COCO Segmentation

Xueqing Deng, Qihang Yu, Peng Wang et al.

CVPR 2024arXiv:2404.08639
22
citations
#1566

BooW-VTON: Boosting In-the-Wild Virtual Try-On via Mask-Free Pseudo Data Training

Xuanpu Zhang, Dan Song, pengxin zhan et al.

CVPR 2025arXiv:2408.06047
22
citations
#1567

Material Anything: Generating Materials for Any 3D Object via Diffusion

Xin Huang, Tengfei Wang, Ziwei Liu et al.

CVPR 2025highlightarXiv:2411.15138
22
citations
#1568

CrowdDiff: Multi-hypothesis Crowd Density Estimation using Diffusion Models

Yasiru Ranasinghe, Nithin Gopalakrishnan Nair, Wele Gedara Chaminda Bandara et al.

CVPR 2024arXiv:2303.12790
22
citations
#1569

Retrieval-Augmented Open-Vocabulary Object Detection

Jooyeon Kim, Eulrang Cho, Sehyung Kim et al.

CVPR 2024arXiv:2404.05687
22
citations
#1570

Utility-Fairness Trade-Offs and How to Find Them

Sepehr Dehdashtian, Bashir Sadeghi, Vishnu Naresh Boddeti

CVPR 2024arXiv:2404.09454
22
citations
#1571

Mani-GS: Gaussian Splatting Manipulation with Triangular Mesh

Xiangjun Gao, Xiaoyu Li, Yiyu Zhuang et al.

CVPR 2025arXiv:2405.17811
22
citations
#1572

Language-guided Image Reflection Separation

Haofeng Zhong, Yuchen Hong, Shuchen Weng et al.

CVPR 2024arXiv:2402.11874
22
citations
#1573

HumanRef: Single Image to 3D Human Generation via Reference-Guided Diffusion

Jingbo Zhang, Xiaoyu Li, Qi Zhang et al.

CVPR 2024arXiv:2311.16961
22
citations
#1574

VecFusion: Vector Font Generation with Diffusion

Vikas Thamizharasan, Difan Liu, Shantanu Agarwal et al.

CVPR 2024highlightarXiv:2312.10540
22
citations
#1575

Depth Prompting for Sensor-Agnostic Depth Estimation

Jin-Hwi Park, Chanhwi Jeong, Junoh Lee et al.

CVPR 2024arXiv:2405.11867
22
citations
#1576

Narrative Action Evaluation with Prompt-Guided Multimodal Interaction

Shiyi Zhang, Sule Bai, Guangyi Chen et al.

CVPR 2024arXiv:2404.14471
22
citations
#1577

Distraction is All You Need for Multimodal Large Language Model Jailbreaking

Zuopeng Yang, Jiluan Fan, Anli Yan et al.

CVPR 2025highlightarXiv:2502.10794
22
citations
#1578

Summarize the Past to Predict the Future: Natural Language Descriptions of Context Boost Multimodal Object Interaction Anticipation

Razvan Pasca, Alexey Gavryushin, Muhammad Hamza et al.

CVPR 2024arXiv:2301.09209
22
citations
#1579

Adaptive Slot Attention: Object Discovery with Dynamic Slot Number

Ke Fan, Zechen Bai, Tianjun Xiao et al.

CVPR 2024arXiv:2406.09196
22
citations
#1580

One Model for ALL: Low-Level Task Interaction Is a Key to Task-Agnostic Image Fusion

Chunyang Cheng, Tianyang Xu, Zhenhua Feng et al.

CVPR 2025arXiv:2502.19854
22
citations
#1581

Matrix3D: Large Photogrammetry Model All-in-One

Yuanxun Lu, Jingyang Zhang, Tian Fang et al.

CVPR 2025highlightarXiv:2502.07685
22
citations
#1582

TASTE-Rob: Advancing Video Generation of Task-Oriented Hand-Object Interaction for Generalizable Robotic Manipulation

Hongxiang Zhao, Xingchen Liu, Mutian Xu et al.

CVPR 2025arXiv:2503.11423
22
citations
#1583

Adv-CPG: A Customized Portrait Generation Framework with Facial Adversarial Attacks

Junying Wang, Hongyuan Zhang, Yuan Yuan

CVPR 2025arXiv:2503.08269
22
citations
#1584

T-VSL: Text-Guided Visual Sound Source Localization in Mixtures

Tanvir Mahmud, Yapeng Tian, Diana Marculescu

CVPR 2024arXiv:2404.01751
22
citations
#1585

Implicit Event-RGBD Neural SLAM

Delin Qu, Chi Yan, Dong Wang et al.

CVPR 2024highlightarXiv:2311.11013
22
citations
#1586

Extend Your Own Correspondences: Unsupervised Distant Point Cloud Registration by Progressive Distance Extension

Quan Liu, Hongzi Zhu, Zhenxi Wang et al.

CVPR 2024arXiv:2403.03532
22
citations
#1587

Mind the Time: Temporally-Controlled Multi-Event Video Generation

Ziyi Wu, Aliaksandr Siarohin, Willi Menapace et al.

CVPR 2025arXiv:2412.05263
22
citations
#1588

Community Forensics: Using Thousands of Generators to Train Fake Image Detectors

Jeongsoo Park, Andrew Owens

CVPR 2025arXiv:2411.04125
22
citations
#1589

Active Prompt Learning in Vision Language Models

Jihwan Bang, Sumyeong Ahn, Jae-Gil Lee

CVPR 2024arXiv:2311.11178
22
citations
#1590

LeftRefill: Filling Right Canvas based on Left Reference through Generalized Text-to-Image Diffusion Model

Chenjie Cao, Yunuo Cai, Qiaole Dong et al.

CVPR 2024arXiv:2305.11577
22
citations
#1591

MonoHair: High-Fidelity Hair Modeling from a Monocular Video

Keyu Wu, LINGCHEN YANG, Zhiyi Kuang et al.

CVPR 2024arXiv:2403.18356
22
citations
#1592

Any-Resolution AI-Generated Image Detection by Spectral Learning

Dimitrios Karageorgiou, Symeon Papadopoulos, Ioannis Kompatsiaris et al.

CVPR 2025arXiv:2411.19417
22
citations
#1593

TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models

Haomiao Ni, Bernhard Egger, Suhas Lohit et al.

CVPR 2024arXiv:2404.16306
22
citations
#1594

Neural Spline Fields for Burst Image Fusion and Layer Separation

Ilya Chugunov, David Shustin, Ruyu Yan et al.

CVPR 2024arXiv:2312.14235
21
citations
#1595

An Upload-Efficient Scheme for Transferring Knowledge From a Server-Side Pre-trained Generator to Clients in Heterogeneous Federated Learning

Jianqing Zhang, Yang Liu, Yang Hua et al.

CVPR 2024arXiv:2403.15760
21
citations
#1596

ZePT: Zero-Shot Pan-Tumor Segmentation via Query-Disentangling and Self-Prompting

Yankai Jiang, Zhongzhen Huang, Rongzhao Zhang et al.

CVPR 2024arXiv:2312.04964
21
citations
#1597

Real-time 3D-aware Portrait Video Relighting

Ziqi Cai, Kaiwen Jiang, Shu-Yu Chen et al.

CVPR 2024highlightarXiv:2410.18355
21
citations
#1598

DSL-FIQA: Assessing Facial Image Quality via Dual-Set Degradation Learning and Landmark-Guided Transformer

Wei-Ting Chen, Gurunandan Krishnan, Qiang Gao et al.

CVPR 2024arXiv:2406.09622
21
citations
#1599

Collaborative Learning of Anomalies with Privacy (CLAP) for Unsupervised Video Anomaly Detection: A New Baseline

Anas Al-lahham, Muhammad Zaigham Zaheer, Nurbek Tastan et al.

CVPR 2024arXiv:2404.00847
21
citations
#1600

Visual Prompting for Generalized Few-shot Segmentation: A Multi-scale Approach

Mir Rayat Imtiaz Hossain, Mennatullah Siam, Leonid Sigal et al.

CVPR 2024arXiv:2404.11732
21
citations