Most Cited CVPR "false positive rate control" Papers

5,589 papers found • Page 8 of 28

Filters:Most Cited CVPR false positive rate control Clear all

Conference

AAAI 2025 (3,028)COLM 2025 (418)CVPR 2025 (2,873)ICCV 2025 (2,701)ICLR 2025 (3,827)ICML 2025 (3,340)ISMAR 2025 (229)NEURIPS 2025 (5,858)AAAI 2024 (2,289)CVPR 2024 (2,716)ECCV 2024 (2,387)ICLR 2024 (2,297)ICML 2024 (2,635)

Paper Type

poster (24,624)paper (8,558)oral (1,594)spotlight (1,421)highlight (975)

#1401

CAP4D: Creating Animatable 4D Portrait Avatars with Morphable Multi-View Diffusion Models

Felix Taubner, Ruihang Zhang, Mathieu Tuli et al.

CVPR 2025arXiv:2412.12093

citations

#1402

Multi-criteria Token Fusion with One-step-ahead Attention for Efficient Vision Transformers

Sanghyeok Lee, Joonmyung Choi, Hyunwoo J. Kim

CVPR 2024arXiv:2403.10030

citations

#1403

Generate Subgoal Images before Act: Unlocking the Chain-of-Thought Reasoning in Diffusion Model for Robot Manipulation with Multimodal Prompts

Fei Ni, Jianye Hao, Shiguang Wu et al.

CVPR 2024

citations

#1404

Improving Single Domain-Generalized Object Detection: A Focus on Diversification and Alignment

Muhammad Sohail Danish, Muhammad Haris Khan, Muhammad Akhtar Munir et al.

CVPR 2024arXiv:2405.14497

citations

#1405

Small Scale Data-Free Knowledge Distillation

He Liu, Yikai Wang, Huaping Liu et al.

CVPR 2024arXiv:2406.07876

citations

#1406

Image-to-Image Matching via Foundation Models: A New Perspective for Open-Vocabulary Semantic Segmentation

Yuan Wang, Rui Sun, Naisong Luo et al.

CVPR 2024arXiv:2404.00262

citations

#1407

Perception-Oriented Video Frame Interpolation via Asymmetric Blending

Guangyang Wu, Xin Tao, Changlin Li et al.

CVPR 2024arXiv:2404.06692

citations

#1408

Permutation Equivariance of Transformers and Its Applications

Hengyuan Xu, Liyao Xiang, Hangyu Ye et al.

CVPR 2024arXiv:2304.07735

citations

#1409

FreeSim: Toward Free-viewpoint Camera Simulation in Driving Scenes

Lue Fan, Hao ZHANG, Qitai Wang et al.

CVPR 2025arXiv:2412.03566

citations

#1410

Instruct 4D-to-4D: Editing 4D Scenes as Pseudo-3D Scenes Using 2D Diffusion

Linzhan Mou, Jun-Kun Chen, Yu-Xiong Wang

CVPR 2024arXiv:2406.09402

citations

#1411

Federated Generalized Category Discovery

Nan Pu, Wenjing Li, Xinyuan Ji et al.

CVPR 2024arXiv:2305.14107

citations

#1412

Learning without Exact Guidance: Updating Large-scale High-resolution Land Cover Maps from Low-resolution Historical Labels

Zhuohong Li, Wei He, Jiepan Li et al.

CVPR 2024highlightarXiv:2403.02746

citations

#1413

360+x: A Panoptic Multi-modal Scene Understanding Dataset

Hao Chen, Yuqi Hou, Chenyuan Qu et al.

CVPR 2024arXiv:2404.00989

citations

#1414

MoDE: CLIP Data Experts via Clustering

Jiawei Ma, Po-Yao Huang, Saining Xie et al.

CVPR 2024arXiv:2404.16030

citations

#1415

DiffusionPoser: Real-time Human Motion Reconstruction From Arbitrary Sparse Sensors Using Autoregressive Diffusion

Tom Van Wouwe, Seunghwan Lee, Antoine Falisse et al.

CVPR 2024arXiv:2308.16682

citations

#1416

Towards Open-Vocabulary Audio-Visual Event Localization

Jinxing Zhou, Dan Guo, Ruohao Guo et al.

CVPR 2025arXiv:2411.11278

citations

#1417

Tailored Visions: Enhancing Text-to-Image Generation with Personalized Prompt Rewriting

Zijie Chen, Lichao Zhang, Fangsheng Weng et al.

CVPR 2024arXiv:2310.08129

citations

#1418

FastMAC: Stochastic Spectral Sampling of Correspondence Graph

Yifei Zhang, Hao Zhao, Hongyang Li et al.

CVPR 2024arXiv:2403.08770

citations

#1419

Doubly Abductive Counterfactual Inference for Text-based Image Editing

Xue Song, Jiequan Cui, Hanwang Zhang et al.

CVPR 2024arXiv:2403.02981

citations

#1420

SCSegamba: Lightweight Structure-Aware Vision Mamba for Crack Segmentation in Structures

Hui Liu, Chen Jia, Fan Shi et al.

CVPR 2025arXiv:2503.01113

citations

#1421

CPP-Net: Embracing Multi-Scale Feature Fusion into Deep Unfolding CP-PPA Network for Compressive Sensing

Zhen Guo, Hongping Gan

CVPR 2024

citations

#1422

CAD-Llama: Leveraging Large Language Models for Computer-Aided Design Parametric 3D Model Generation

Jiahao Li, Weijian Ma, Xueyang Li et al.

CVPR 2025arXiv:2505.04481

citations

#1423

AIGV-Assessor: Benchmarking and Evaluating the Perceptual Quality of Text-to-Video Generation with LMM

Wang Jiarui, Huiyu Duan, Guangtao Zhai et al.

CVPR 2025arXiv:2411.17221

citations

#1424

MMRL: Multi-Modal Representation Learning for Vision-Language Models

Yuncheng Guo, Xiaodong Gu

CVPR 2025arXiv:2503.08497

citations

#1425

PhD: A ChatGPT-Prompted Visual Hallucination Evaluation Dataset

Jiazhen Liu, Yuhan Fu, Ruobing Xie et al.

CVPR 2025highlightarXiv:2403.11116

citations

#1426

Promptable Behaviors: Personalizing Multi-Objective Rewards from Human Preferences

Minyoung Hwang, Luca Weihs, Chanwoo Park et al.

CVPR 2024arXiv:2312.09337

citations

#1427

SCoFT: Self-Contrastive Fine-Tuning for Equitable Image Generation

Zhixuan Liu, Peter Schaldenbrand, Beverley-Claire Okogwu et al.

CVPR 2024arXiv:2401.08053

citations

#1428

Psychometry: An Omnifit Model for Image Reconstruction from Human Brain Activity

Ruijie Quan, Wenguan Wang, Zhibo Tian et al.

CVPR 2024arXiv:2403.20022

citations

#1429

Dual-Interrelated Diffusion Model for Few-Shot Anomaly Image Generation

Ying Jin, Jinlong Peng, Qingdong He et al.

CVPR 2025arXiv:2408.13509

citations

#1430

MANUS: Markerless Grasp Capture using Articulated 3D Gaussians

Chandradeep Pokhariya, Ishaan Shah, Angela Xing et al.

CVPR 2024arXiv:2312.02137

citations

#1431

Diffusion Time-step Curriculum for One Image to 3D Generation

YI Xuanyu, Zike Wu, Qingshan Xu et al.

CVPR 2024arXiv:2404.04562

citations

#1432

Learning Equi-angular Representations for Online Continual Learning

Minhyuk Seo, Hyunseo Koh, Wonje Jeung et al.

CVPR 2024arXiv:2404.01628

citations

#1433

TopV: Compatible Token Pruning with Inference Time Optimization for Fast and Low-Memory Multimodal Vision Language Model

Cheng Yang, Yang Sui, Jinqi Xiao et al.

CVPR 2025arXiv:2503.18278

citations

#1434

ID-Blau: Image Deblurring by Implicit Diffusion-based reBLurring AUgmentation

Jia-Hao Wu, Fu-Jen Tsai, Yan-Tsung Peng et al.

CVPR 2024arXiv:2312.10998

citations

#1435

VkD: Improving Knowledge Distillation using Orthogonal Projections

Roy Miles, Ismail Elezi, Jiankang Deng

CVPR 2024

citations

#1436

BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models

Fengyuan Shi, Jiaxi Gu, Hang Xu et al.

CVPR 2024arXiv:2312.02813

citations

#1437

Dr. Splat: Directly Referring 3D Gaussian Splatting via Direct Language Embedding Registration

JUNSEONG KIM, GeonU Kim, Kim Yu-Ji et al.

CVPR 2025highlightarXiv:2502.16652

citations

#1438

Scaling Up Video Summarization Pretraining with Large Language Models

Dawit Argaw Argaw, Seunghyun Yoon, Fabian Caba Heilbron et al.

CVPR 2024arXiv:2404.03398

citations

#1439

Open-Vocabulary Object 6D Pose Estimation

Jaime Corsetti, Davide Boscaini, Changjae Oh et al.

CVPR 2024highlightarXiv:2312.00690

citations

#1440

ODCR: Orthogonal Decoupling Contrastive Regularization for Unpaired Image Dehazing

Zhongze Wang, Haitao Zhao, Jingchao Peng et al.

CVPR 2024arXiv:2404.17825

citations

#1441

Face2Diffusion for Fast and Editable Face Personalization

Kaede Shiohara, Toshihiko Yamasaki

CVPR 2024arXiv:2403.05094

citations

#1442

CleanDIFT: Diffusion Features without Noise

Nick Stracke, Stefan Andreas Baumann, Kolja Bauer et al.

CVPR 2025arXiv:2412.03439

citations

#1443

EditAR: Unified Conditional Generation with Autoregressive Models

Jiteng Mu, Nuno Vasconcelos, Xiaolong Wang

CVPR 2025arXiv:2501.04699

citations

#1444

PosterMaker: Towards High-Quality Product Poster Generation with Accurate Text Rendering

Yifan Gao, Zihang Lin, Chuanbin Liu et al.

CVPR 2025arXiv:2504.06632

citations

#1445

GAF: Gaussian Avatar Reconstruction from Monocular Videos via Multi-view Diffusion

Jiapeng Tang, Davide Davoli, Tobias Kirschstein et al.

CVPR 2025arXiv:2412.10209

citations

#1446

AMU-Tuning: Effective Logit Bias for CLIP-based Few-shot Learning

Yuwei Tang, ZhenYi Lin, Qilong Wang et al.

CVPR 2024arXiv:2404.08958

citations

#1447

Towards More General Video-based Deepfake Detection through Facial Component Guided Adaptation for Foundation Model

Yue-Hua Han, Tai-Ming Huang, Kailung Hua et al.

CVPR 2025arXiv:2404.05583

citations

#1448

Tyche: Stochastic In-Context Learning for Medical Image Segmentation

Marianne Rakic, Hallee Wong, Jose Javier Gonzalez Ortiz et al.

CVPR 2024highlightarXiv:2401.13650

citations

#1449

ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations

Maitreya Patel, Changhoon Kim, Sheng Cheng et al.

CVPR 2024arXiv:2312.04655

citations

#1450

Black-Box Forgery Attacks on Semantic Watermarks for Diffusion Models

Andreas Müller, Denis Lukovnikov, Jonas Thietke et al.

CVPR 2025arXiv:2412.03283

citations

#1451

Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion

Hao Wen, Zehuan Huang, Yaohui Wang et al.

CVPR 2025arXiv:2406.03184

citations

#1452

Test-Time Adaptation for Depth Completion

Hyoungseob Park, Anjali W Gupta, Alex Wong

CVPR 2024arXiv:2402.03312

citations

#1453

Unknown Prompt the only Lacuna: Unveiling CLIP's Potential for Open Domain Generalization

Mainak Singha, Ankit Jha, Shirsha Bose et al.

CVPR 2024arXiv:2404.00710

citations

#1454

SANeRF-HQ: Segment Anything for NeRF in High Quality

Yichen Liu, Benran Hu, Chi-Keung Tang et al.

CVPR 2024arXiv:2312.01531

citations

#1455

Garment Recovery with Shape and Deformation Priors

Ren Li, Corentin Dumery, Benoît Guillard et al.

CVPR 2024arXiv:2311.10356

citations

#1456

SpatialLLM: A Compound 3D-Informed Design towards Spatially-Intelligent Large Multimodal Models

Wufei Ma, Luoxin Ye, Nessa McWeeney et al.

CVPR 2025highlightarXiv:2505.00788

citations

#1457

NAYER: Noisy Layer Data Generation for Efficient and Effective Data-free Knowledge Distillation

Minh-Tuan Tran, Trung Le, Xuan-May Le et al.

CVPR 2024arXiv:2310.00258

citations

#1458

MonoDiff: Monocular 3D Object Detection and Pose Estimation with Diffusion Models

Yasiru Ranasinghe, Deepti Hegde, Vishal M. Patel

CVPR 2024

citations

#1459

From Multimodal LLMs to Generalist Embodied Agents: Methods and Lessons

Andrew Szot, Bogdan Mazoure, Omar Attia et al.

CVPR 2025arXiv:2412.08442

citations

#1460

HoloVIC: Large-scale Dataset and Benchmark for Multi-Sensor Holographic Intersection and Vehicle-Infrastructure Cooperative

CONG MA, Qiao Lei, Chengkai Zhu et al.

CVPR 2024arXiv:2403.02640

citations

#1461

Part-aware Unified Representation of Language and Skeleton for Zero-shot Action Recognition

Anqi Zhu, Qiuhong Ke, Mingming Gong et al.

CVPR 2024arXiv:2406.13327

citations

#1462

Model Poisoning Attacks to Federated Learning via Multi-Round Consistency

Yueqi Xie, Minghong Fang, Neil Zhenqiang Gong

CVPR 2025arXiv:2404.15611

citations

#1463

POSTA: A Go-to Framework for Customized Artistic Poster Generation

Haoyu Chen, Xiaojie Xu, Wenbo Li et al.

CVPR 2025arXiv:2503.14908

citations

#1464

Deep Equilibrium Diffusion Restoration with Parallel Sampling

Jiezhang Cao, Yue Shi, Kai Zhang et al.

CVPR 2024arXiv:2311.11600

citations

#1465

Contrastive Learning for DeepFake Classification and Localization via Multi-Label Ranking

Cheng-Yao Hong, Yen-Chi Hsu, Tyng-Luh Liu

CVPR 2024

citations

#1466

EgoGen: An Egocentric Synthetic Data Generator

Gen Li, Kaifeng Zhao, Siwei Zhang et al.

CVPR 2024arXiv:2401.08739

citations

#1467

Diffusion-driven GAN Inversion for Multi-Modal Face Image Generation

Jihyun Kim, Changjae Oh, Hoseok Do et al.

CVPR 2024arXiv:2405.04356

citations

#1468

ALGM: Adaptive Local-then-Global Token Merging for Efficient Semantic Segmentation with Plain Vision Transformers

Narges Norouzi, Svetlana Orlova, Daan de Geus et al.

CVPR 2024arXiv:2406.09936

citations

#1469

Gaussian Splashing: Unified Particles for Versatile Motion Synthesis and Rendering

Yutao Feng, Xiang Feng, Yintong Shang et al.

CVPR 2025arXiv:2401.15318

citations

#1470

LoSh: Long-Short Text Joint Prediction Network for Referring Video Object Segmentation

Linfeng Yuan, Miaojing Shi, Zijie Yue et al.

CVPR 2024arXiv:2306.08736

citations

#1471

AlignSAM: Aligning Segment Anything Model to Open Context via Reinforcement Learning

Duojun Huang, Xinyu Xiong, Jie Ma et al.

CVPR 2024arXiv:2406.00480

citations

#1472

MC^2: Multi-concept Guidance for Customized Multi-concept Generation

Jiaxiu Jiang, Yabo Zhang, Kailai Feng et al.

CVPR 2025arXiv:2404.05268

citations

#1473

GAFusion: Adaptive Fusing LiDAR and Camera with Multiple Guidance for 3D Object Detection

Xiaotian Li, Baojie Fan, Jiandong Tian et al.

CVPR 2024arXiv:2411.00340

citations

#1474

Curriculum Direct Preference Optimization for Diffusion and Consistency Models

Florinel Croitoru, Vlad Hondru, Radu Tudor Ionescu et al.

CVPR 2025arXiv:2405.13637

citations

#1475

Training-Free Pretrained Model Merging

Zhengqi Xu, Ke Yuan, Huiqiong Wang et al.

CVPR 2024arXiv:2403.01753

citations

#1476

Doodle Your 3D: From Abstract Freehand Sketches to Precise 3D Shapes

Hmrishav Bandyopadhyay, Subhadeep Koley, Ayan Das et al.

CVPR 2024arXiv:2312.04043

citations

#1477

AnimateAnything: Consistent and Controllable Animation for Video Generation

guojun lei, Chi Wang, Rong Zhang et al.

CVPR 2025arXiv:2411.10836

citations

#1478

Rethinking Multi-view Representation Learning via Distilled Disentangling

Guanzhou Ke, Bo Wang, Xiao-Li Wang et al.

CVPR 2024arXiv:2403.10897

citations

#1479

HyperFree: A Channel-adaptive and Tuning-free Foundation Model for Hyperspectral Remote Sensing Imagery

Jingtao Li, Yingyi Liu, XINYU WANG et al.

CVPR 2025arXiv:2503.21841

citations

#1480

APISR: Anime Production Inspired Real-World Anime Super-Resolution

Boyang Wang, Fengyu Yang, Xihang Yu et al.

CVPR 2024arXiv:2403.01598

citations

#1481

Spherical Mask: Coarse-to-Fine 3D Point Cloud Instance Segmentation with Spherical Representation

Sangyun Shin, Kaichen Zhou, Madhu Vankadari et al.

CVPR 2024arXiv:2312.11269

citations

#1482

PhysAnimator: Physics-Guided Generative Cartoon Animation

Tianyi Xie, Yiwei Zhao, Ying Jiang et al.

CVPR 2025arXiv:2501.16550

citations

#1483

PI3D: Efficient Text-to-3D Generation with Pseudo-Image Diffusion

Ying-Tian Liu, Yuan-Chen Guo, Guan Luo et al.

CVPR 2024arXiv:2312.09069

citations

#1484

Multi-agent Long-term 3D Human Pose Forecasting via Interaction-aware Trajectory Conditioning

Jaewoo Jeong, Daehee Park, Kuk-Jin Yoon

CVPR 2024highlightarXiv:2404.05218

citations

#1485

Large Language Models are Good Prompt Learners for Low-Shot Image Classification

Zhaoheng Zheng, Jingmin Wei, Xuefeng Hu et al.

CVPR 2024arXiv:2312.04076

citations

#1486

Concept Weaver: Enabling Multi-Concept Fusion in Text-to-Image Models

Gihyun Kwon, Simon Jenni, Ding Li et al.

CVPR 2024arXiv:2404.03913

citations

#1487

Back to 3D: Few-Shot 3D Keypoint Detection with Back-Projected 2D Features

Thomas Wimmer, Peter Wonka, Maks Ovsjanikov

CVPR 2024arXiv:2311.18113

citations

#1488

Suppress and Rebalance: Towards Generalized Multi-Modal Face Anti-Spoofing

Xun Lin, Shuai Wang, RIZHAO CAI et al.

CVPR 2024highlightarXiv:2402.19298

citations

#1489

MeshArt: Generating Articulated Meshes with Structure-Guided Transformers

Daoyi Gao, Mohd Yawar Nihal Siddiqui, Lei Li et al.

CVPR 2025arXiv:2412.11596

citations

#1490

PanoContext-Former: Panoramic Total Scene Understanding with a Transformer

Yuan Dong, Chuan Fang, Liefeng Bo et al.

CVPR 2024arXiv:2305.12497

citations

#1491

Orchestrate Latent Expertise: Advancing Online Continual Learning with Multi-Level Supervision and Reverse Self-Distillation

Hongwei Yan, Liyuan Wang, Kaisheng Ma et al.

CVPR 2024arXiv:2404.00417

citations

#1492

PromptHMR: Promptable Human Mesh Recovery

Yufu Wang, Yu Sun, Priyanka Patel et al.

CVPR 2025arXiv:2504.06397

citations

#1493

Masked and Shuffled Blind Spot Denoising for Real-World Images

Hamadi Chihaoui, Paolo Favaro

CVPR 2024arXiv:2404.09389

citations

#1494

MoManipVLA: Transferring Vision-language-action Models for General Mobile Manipulation

Zhenyu Wu, Yuheng Zhou, Xiuwei Xu et al.

CVPR 2025arXiv:2503.13446

citations

#1495

Enhanced Contrastive Learning with Multi-view Longitudinal Data for Chest X-ray Report Generation

Kang Liu, Zhuoqi Ma, Xiaolu Kang et al.

CVPR 2025arXiv:2502.20056

citations

#1496

KeyPoint Relative Position Encoding for Face Recognition

Minchul Kim, Feng Liu, Yiyang Su et al.

CVPR 2024arXiv:2403.14852

citations

#1497

Bayesian Diffusion Models for 3D Shape Reconstruction

Haiyang Xu, Yu lei, Zeyuan Chen et al.

CVPR 2024arXiv:2403.06973

citations

#1498

PACER+: On-Demand Pedestrian Animation Controller in Driving Scenarios

Jingbo Wang, Zhengyi Luo, Ye Yuan et al.

CVPR 2024arXiv:2404.19722

citations

#1499

ModaVerse: Efficiently Transforming Modalities with LLMs

Xinyu Wang, Bohan Zhuang, Qi Wu

CVPR 2024arXiv:2401.06395

citations

#1500

RankED: Addressing Imbalance and Uncertainty in Edge Detection Using Ranking-based Losses

bedrettin cetinkaya, Sinan Kalkan, Emre Akbas

CVPR 2024arXiv:2403.01795

citations

#1501

Flatten Long-Range Loss Landscapes for Cross-Domain Few-Shot Learning

Yixiong Zou, Yicong Liu, Yiman Hu et al.

CVPR 2024arXiv:2403.00567

citations

#1502

SAR3D: Autoregressive 3D Object Generation and Understanding via Multi-scale 3D VQVAE

YONGWEI CHEN, Yushi Lan, Shangchen Zhou et al.

CVPR 2025arXiv:2411.16856

citations

#1503

Domain Prompt Learning with Quaternion Networks

Qinglong Cao, Zhengqin Xu, Yuntian Chen et al.

CVPR 2024highlightarXiv:2312.08878

citations

#1504

FLAIR: VLM with Fine-grained Language-informed Image Representations

Rui Xiao, Sanghwan Kim, Iuliana Georgescu et al.

CVPR 2025arXiv:2412.03561

citations

#1505

VAREN: Very Accurate and Realistic Equine Network

Silvia Zuffi, Ylva Mellbin, Ci Li et al.

CVPR 2024

citations

#1506

Q-Eval-100K: Evaluating Visual Quality and Alignment Level for Text-to-Vision Content

Zicheng Zhang, Tengchuan Kou, Chunyi Li et al.

CVPR 2025arXiv:2503.02357

citations

#1507

EventEgo3D: 3D Human Motion Capture from Egocentric Event Streams

Christen Millerdurai, Hiroyasu Akada, Jian Wang et al.

CVPR 2024arXiv:2404.08640

citations

#1508

AniGS: Animatable Gaussian Avatar from a Single Image with Inconsistent Gaussian Reconstruction

Lingteng Qiu, Shenhao Zhu, Qi Zuo et al.

CVPR 2025arXiv:2412.02684

citations

#1509

OSV: One Step is Enough for High-Quality Image to Video Generation

Xiaofeng Mao, Zhengkai Jiang, Fu-Yun Wang et al.

CVPR 2025arXiv:2409.11367

citations

#1510

FloVD: Optical Flow Meets Video Diffusion Model for Enhanced Camera-Controlled Video Synthesis

Wonjoon Jin, Qi Dai, Chong Luo et al.

CVPR 2025arXiv:2502.08244

citations

#1511

Steering Away from Harm: An Adaptive Approach to Defending Vision Language Model Against Jailbreaks

Han Wang, Gang Wang, Huan Zhang

CVPR 2025arXiv:2411.16721

citations

#1512

Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation

Yuhui Zhang, Yuchang Su, Yiming Liu et al.

CVPR 2025arXiv:2501.03225

citations

#1513

Composing Object Relations and Attributes for Image-Text Matching

Khoi Pham, Chuong Huynh, Ser-Nam Lim et al.

CVPR 2024arXiv:2406.11820

citations

#1514

Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation

Yudi Shi, Shangzhe Di, Qirui Chen et al.

CVPR 2025arXiv:2412.01694

citations

#1515

Dynamic LiDAR Re-simulation using Compositional Neural Fields

Hanfeng Wu, Xingxing Zuo, Stefan Leutenegger et al.

CVPR 2024highlightarXiv:2312.05247

citations

#1516

2DMamba: Efficient State Space Model for Image Representation with Applications on Giga-Pixel Whole Slide Image Classification

Jingwei Zhang, Anh Tien Nguyen, Xi Han et al.

CVPR 2025arXiv:2412.00678

citations

#1517

Towards Robust 3D Object Detection with LiDAR and 4D Radar Fusion in Various Weather Conditions

Yujeong Chae, Hyeonseong Kim, Kuk-Jin Yoon

CVPR 2024

citations

#1518

The Manga Whisperer: Automatically Generating Transcriptions for Comics

Ragav Sachdeva, Andrew Zisserman

CVPR 2024arXiv:2401.10224

citations

#1519

CLIP-BEVFormer: Enhancing Multi-View Image-Based BEV Detector with Ground Truth Flow

Chenbin Pan, Burhan Yaman, Senem Velipasalar et al.

CVPR 2024arXiv:2403.08919

citations

#1520

Overcoming Generic Knowledge Loss with Selective Parameter Update

Wenxuan Zhang, Paul Janson, Rahaf Aljundi et al.

CVPR 2024arXiv:2308.12462

citations

#1521

UniVAD: A Training-free Unified Model for Few-shot Visual Anomaly Detection

Zhaopeng Gu, Bingke Zhu, Guibo Zhu et al.

CVPR 2025arXiv:2412.03342

citations

#1522

FSRT: Facial Scene Representation Transformer for Face Reenactment from Factorized Appearance Head-pose and Facial Expression Features

Andre Rochow, Max Schwarz, Sven Behnke

CVPR 2024arXiv:2404.09736

citations

#1523

Continual Forgetting for Pre-trained Vision Models

Hongbo Zhao, Bolin Ni, Junsong Fan et al.

CVPR 2024arXiv:2403.11530

citations

#1524

Solving the Catastrophic Forgetting Problem in Generalized Category Discovery

Xinzi Cao, Xiawu Zheng, Guanhong Wang et al.

CVPR 2024arXiv:2501.05272

citations

#1525

Learning Disentangled Identifiers for Action-Customized Text-to-Image Generation

Siteng Huang, Biao Gong, Yutong Feng et al.

CVPR 2024arXiv:2311.15841

citations

#1526

Desigen: A Pipeline for Controllable Design Template Generation

Haohan Weng, Danqing Huang, YU QIAO et al.

CVPR 2024arXiv:2403.09093

citations

#1527

Taming Self-Training for Open-Vocabulary Object Detection

Shiyu Zhao, Samuel Schulter, Long Zhao et al.

CVPR 2024arXiv:2308.06412

citations

#1528

DiSR-NeRF: Diffusion-Guided View-Consistent Super-Resolution NeRF

Jie Long Lee, Chen Li, Gim Hee Lee

CVPR 2024arXiv:2404.00874

citations

#1529

TinyFusion: Diffusion Transformers Learned Shallow

Gongfan Fang, Kunjun Li, Xinyin Ma et al.

CVPR 2025highlightarXiv:2412.01199

citations

#1530

POPDG: Popular 3D Dance Generation with PopDanceSet

Zhenye Luo, Min Ren, Xuecai Hu et al.

CVPR 2024arXiv:2405.03178

citations

#1531

DiffusionGAN3D: Boosting Text-guided 3D Generation and Domain Adaptation by Combining 3D GANs and Diffusion Priors

Biwen Lei, Kai Yu, Mengyang Feng et al.

CVPR 2024arXiv:2312.16837

citations

#1532

DesignDiffusion: High-Quality Text-to-Design Image Generation with Diffusion Models

Zhendong Wang, Jianmin Bao, Shuyang Gu et al.

CVPR 2025arXiv:2503.01645

citations

#1533

GoodSAM: Bridging Domain and Capacity Gaps via Segment Anything Model for Distortion-aware Panoramic Semantic Segmentation

WEIMING ZHANG, Yexin Liu, Xu Zheng et al.

CVPR 2024arXiv:2403.16370

citations

#1534

CPath-Omni: A Unified Multimodal Foundation Model for Patch and Whole Slide Image Analysis in Computational Pathology

Yuxuan Sun, Yixuan Si, Chenglu Zhu et al.

CVPR 2025arXiv:2412.12077

citations

#1535

Dual-View Visual Contextualization for Web Navigation

Jihyung Kil, Chan Hee Song, Boyuan Zheng et al.

CVPR 2024arXiv:2402.04476

citations

#1536

PLACE: Adaptive Layout-Semantic Fusion for Semantic Image Synthesis

Zhengyao Lv, Yuxiang Wei, Wangmeng Zuo et al.

CVPR 2024highlightarXiv:2403.01852

citations

#1537

Latent Modulated Function for Computational Optimal Continuous Image Representation

Zongyao He, Zhi Jin

CVPR 2024highlightarXiv:2404.16451

citations

#1538

Language-Driven Anchors for Zero-Shot Adversarial Robustness

Xiao Li, Wei Zhang, Yining Liu et al.

CVPR 2024arXiv:2301.13096

citations

#1539

HMD-Poser: On-Device Real-time Human Motion Tracking from Scalable Sparse Observations

Peng Dai, Yang Zhang, Tao Liu et al.

CVPR 2024arXiv:2403.03561

citations

#1540

Bridging Past and Future: End-to-End Autonomous Driving with Historical Prediction and Planning

Bozhou Zhang, Nan Song, Xin Jin et al.

CVPR 2025arXiv:2503.14182

citations

#1541

Spatio-Temporal Turbulence Mitigation: A Translational Perspective

Xingguang Zhang, Nicholas M Chimitt, Yiheng Chi et al.

CVPR 2024arXiv:2401.04244

citations

#1542

Visual Delta Generator with Large Multi-modal Models for Semi-supervised Composed Image Retrieval

Young Kyun Jang, Donghyun Kim, Zihang Meng et al.

CVPR 2024arXiv:2404.15516

citations

#1543

Unveil Inversion and Invariance in Flow Transformer for Versatile Image Editing

Pengcheng Xu, Boyuan Jiang, Xiaobin Hu et al.

CVPR 2025arXiv:2411.15843

citations

#1544

3D-SceneDreamer: Text-Driven 3D-Consistent Scene Generation

Songchun Zhang, Yibo Zhang, Quan Zheng et al.

CVPR 2024arXiv:2403.09439

citations

#1545

Targeted Representation Alignment for Open-World Semi-Supervised Learning

Ruixuan Xiao, Lei Feng, Kai Tang et al.

CVPR 2024

citations

#1546

Optimus-2: Multimodal Minecraft Agent with Goal-Observation-Action Conditioned Policy

Zaijing Li, Yuquan Xie, Rui Shao et al.

CVPR 2025arXiv:2502.19902

citations

#1547

Semantic-aware SAM for Point-Prompted Instance Segmentation

Zhaoyang Wei, Pengfei Chen, Xuehui Yu et al.

CVPR 2024highlightarXiv:2312.15895

citations

#1548

F-LMM: Grounding Frozen Large Multimodal Models

Size Wu, Sheng Jin, Wenwei Zhang et al.

CVPR 2025arXiv:2406.05821

citations

#1549

Low-Rank Rescaled Vision Transformer Fine-Tuning: A Residual Design Approach

Wei Dong, Xing Zhang, Bihui Chen et al.

CVPR 2024arXiv:2403.19067

citations

#1550

LiveHPS: LiDAR-based Scene-level Human Pose and Shape Estimation in Free Environment

yiming ren, xiao han, Chengfeng Zhao et al.

CVPR 2024highlightarXiv:2402.17171

citations

#1551

Mind Marginal Non-Crack Regions: Clustering-Inspired Representation Learning for Crack Segmentation

zhuangzhuang chen, Zhuonan Lai, Jie Chen et al.

CVPR 2024

citations

#1552

VideoRF: Rendering Dynamic Radiance Fields as 2D Feature Video Streams

Liao Wang, Kaixin Yao, Chengcheng Guo et al.

CVPR 2024arXiv:2312.01407

citations

#1553

ASAM: Boosting Segment Anything Model with Adversarial Tuning

Bo Li, Haoke Xiao, Lv Tang

CVPR 2024arXiv:2405.00256

citations

#1554

CSTA: CNN-based Spatiotemporal Attention for Video Summarization

Jaewon Son, Jaehun Park, Kwangsu Kim

CVPR 2024arXiv:2405.11905

citations

#1555

Detecting Out-of-Distribution Through the Lens of Neural Collapse

Litian Liu, Yao Qin

CVPR 2025arXiv:2311.01479

citations

#1556

Rethinking Few-shot 3D Point Cloud Semantic Segmentation

Zhaochong An, Guolei Sun, Yun Liu et al.

CVPR 2024arXiv:2403.00592

citations

#1557

PracticalDG: Perturbation Distillation on Vision-Language Models for Hybrid Domain Generalization

Zining Chen, Weiqiu Wang, Zhicheng Zhao et al.

CVPR 2024arXiv:2404.09011

citations

#1558

Generative Video Propagation

Shaoteng Liu, Tianyu Wang, Jui-Hsien Wang et al.

CVPR 2025arXiv:2412.19761

citations

#1559

Category-Level Multi-Part Multi-Joint 3D Shape Assembly

Yichen Li, Kaichun Mo, Yueqi Duan et al.

CVPR 2024arXiv:2303.06163

citations

#1560

End-to-End Spatio-Temporal Action Localisation with Video Transformers

Alexey Gritsenko, Xuehan Xiong, Josip Djolonga et al.

CVPR 2024arXiv:2304.12160

citations

#1561

Carve3D: Improving Multi-view Reconstruction Consistency for Diffusion Models with RL Finetuning

Desai Xie, Jiahao Li, Hao Tan et al.

CVPR 2024arXiv:2312.13980

citations

#1562

GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation

Lang Lin, Xueyang Yu, Ziqi Pang et al.

CVPR 2025arXiv:2504.07962

citations

#1563

NRDF: Neural Riemannian Distance Fields for Learning Articulated Pose Priors

Yannan He, Garvita Tiwari, Tolga Birdal et al.

CVPR 2024highlightarXiv:2403.03122

citations

#1564

Time- Memory- and Parameter-Efficient Visual Adaptation

Otniel-Bogdan Mercea, Alexey Gritsenko, Cordelia Schmid et al.

CVPR 2024highlightarXiv:2402.02887

citations

#1565

COCONut: Modernizing COCO Segmentation

Xueqing Deng, Qihang Yu, Peng Wang et al.

CVPR 2024arXiv:2404.08639

citations

#1566

BooW-VTON: Boosting In-the-Wild Virtual Try-On via Mask-Free Pseudo Data Training

Xuanpu Zhang, Dan Song, pengxin zhan et al.

CVPR 2025arXiv:2408.06047

citations

#1567

Material Anything: Generating Materials for Any 3D Object via Diffusion

Xin Huang, Tengfei Wang, Ziwei Liu et al.

CVPR 2025highlightarXiv:2411.15138

citations

#1568

CrowdDiff: Multi-hypothesis Crowd Density Estimation using Diffusion Models

Yasiru Ranasinghe, Nithin Gopalakrishnan Nair, Wele Gedara Chaminda Bandara et al.

CVPR 2024arXiv:2303.12790

citations

#1569

Retrieval-Augmented Open-Vocabulary Object Detection

Jooyeon Kim, Eulrang Cho, Sehyung Kim et al.

CVPR 2024arXiv:2404.05687

citations

#1570

Utility-Fairness Trade-Offs and How to Find Them

Sepehr Dehdashtian, Bashir Sadeghi, Vishnu Naresh Boddeti

CVPR 2024arXiv:2404.09454

citations

#1571

Mani-GS: Gaussian Splatting Manipulation with Triangular Mesh

Xiangjun Gao, Xiaoyu Li, Yiyu Zhuang et al.

CVPR 2025arXiv:2405.17811

citations

#1572

Language-guided Image Reflection Separation

Haofeng Zhong, Yuchen Hong, Shuchen Weng et al.

CVPR 2024arXiv:2402.11874

citations

#1573

HumanRef: Single Image to 3D Human Generation via Reference-Guided Diffusion

Jingbo Zhang, Xiaoyu Li, Qi Zhang et al.

CVPR 2024arXiv:2311.16961

citations

#1574

VecFusion: Vector Font Generation with Diffusion

Vikas Thamizharasan, Difan Liu, Shantanu Agarwal et al.

CVPR 2024highlightarXiv:2312.10540

citations

#1575

Depth Prompting for Sensor-Agnostic Depth Estimation

Jin-Hwi Park, Chanhwi Jeong, Junoh Lee et al.

CVPR 2024arXiv:2405.11867

citations

#1576

Narrative Action Evaluation with Prompt-Guided Multimodal Interaction

Shiyi Zhang, Sule Bai, Guangyi Chen et al.

CVPR 2024arXiv:2404.14471

citations

#1577

Distraction is All You Need for Multimodal Large Language Model Jailbreaking

Zuopeng Yang, Jiluan Fan, Anli Yan et al.

CVPR 2025highlightarXiv:2502.10794

citations

#1578

Summarize the Past to Predict the Future: Natural Language Descriptions of Context Boost Multimodal Object Interaction Anticipation

Razvan Pasca, Alexey Gavryushin, Muhammad Hamza et al.

CVPR 2024arXiv:2301.09209

citations

#1579

Adaptive Slot Attention: Object Discovery with Dynamic Slot Number

Ke Fan, Zechen Bai, Tianjun Xiao et al.

CVPR 2024arXiv:2406.09196

citations

#1580

One Model for ALL: Low-Level Task Interaction Is a Key to Task-Agnostic Image Fusion

Chunyang Cheng, Tianyang Xu, Zhenhua Feng et al.

CVPR 2025arXiv:2502.19854

citations

#1581

Matrix3D: Large Photogrammetry Model All-in-One

Yuanxun Lu, Jingyang Zhang, Tian Fang et al.

CVPR 2025highlightarXiv:2502.07685

citations

#1582

TASTE-Rob: Advancing Video Generation of Task-Oriented Hand-Object Interaction for Generalizable Robotic Manipulation

Hongxiang Zhao, Xingchen Liu, Mutian Xu et al.

CVPR 2025arXiv:2503.11423

citations

#1583

Adv-CPG: A Customized Portrait Generation Framework with Facial Adversarial Attacks

Junying Wang, Hongyuan Zhang, Yuan Yuan

CVPR 2025arXiv:2503.08269

citations

#1584

T-VSL: Text-Guided Visual Sound Source Localization in Mixtures

Tanvir Mahmud, Yapeng Tian, Diana Marculescu

CVPR 2024arXiv:2404.01751

citations

#1585

Implicit Event-RGBD Neural SLAM

Delin Qu, Chi Yan, Dong Wang et al.

CVPR 2024highlightarXiv:2311.11013

citations

#1586

Extend Your Own Correspondences: Unsupervised Distant Point Cloud Registration by Progressive Distance Extension

Quan Liu, Hongzi Zhu, Zhenxi Wang et al.

CVPR 2024arXiv:2403.03532

citations

#1587

Mind the Time: Temporally-Controlled Multi-Event Video Generation

Ziyi Wu, Aliaksandr Siarohin, Willi Menapace et al.

CVPR 2025arXiv:2412.05263

citations

#1588

Community Forensics: Using Thousands of Generators to Train Fake Image Detectors

Jeongsoo Park, Andrew Owens

CVPR 2025arXiv:2411.04125

citations

#1589

Active Prompt Learning in Vision Language Models

Jihwan Bang, Sumyeong Ahn, Jae-Gil Lee

CVPR 2024arXiv:2311.11178

citations

#1590

LeftRefill: Filling Right Canvas based on Left Reference through Generalized Text-to-Image Diffusion Model

Chenjie Cao, Yunuo Cai, Qiaole Dong et al.

CVPR 2024arXiv:2305.11577

citations

#1591

MonoHair: High-Fidelity Hair Modeling from a Monocular Video

Keyu Wu, LINGCHEN YANG, Zhiyi Kuang et al.

CVPR 2024arXiv:2403.18356

citations

#1592

Any-Resolution AI-Generated Image Detection by Spectral Learning

Dimitrios Karageorgiou, Symeon Papadopoulos, Ioannis Kompatsiaris et al.

CVPR 2025arXiv:2411.19417

citations

#1593

TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models

Haomiao Ni, Bernhard Egger, Suhas Lohit et al.

CVPR 2024arXiv:2404.16306

citations

#1594

Neural Spline Fields for Burst Image Fusion and Layer Separation

Ilya Chugunov, David Shustin, Ruyu Yan et al.

CVPR 2024arXiv:2312.14235

citations

#1595

An Upload-Efficient Scheme for Transferring Knowledge From a Server-Side Pre-trained Generator to Clients in Heterogeneous Federated Learning

Jianqing Zhang, Yang Liu, Yang Hua et al.

CVPR 2024arXiv:2403.15760

citations

#1596

ZePT: Zero-Shot Pan-Tumor Segmentation via Query-Disentangling and Self-Prompting

Yankai Jiang, Zhongzhen Huang, Rongzhao Zhang et al.

CVPR 2024arXiv:2312.04964

citations

#1597

Real-time 3D-aware Portrait Video Relighting

Ziqi Cai, Kaiwen Jiang, Shu-Yu Chen et al.

CVPR 2024highlightarXiv:2410.18355

citations

#1598

DSL-FIQA: Assessing Facial Image Quality via Dual-Set Degradation Learning and Landmark-Guided Transformer

Wei-Ting Chen, Gurunandan Krishnan, Qiang Gao et al.

CVPR 2024arXiv:2406.09622

citations

#1599

Collaborative Learning of Anomalies with Privacy (CLAP) for Unsupervised Video Anomaly Detection: A New Baseline

Anas Al-lahham, Muhammad Zaigham Zaheer, Nurbek Tastan et al.

CVPR 2024arXiv:2404.00847

citations

#1600

Visual Prompting for Generalized Few-shot Segmentation: A Multi-scale Approach

Mir Rayat Imtiaz Hossain, Mennatullah Siam, Leonid Sigal et al.

CVPR 2024arXiv:2404.11732

citations

← Previous

1...6 7 8 9 10...28