Most Cited CVPR "3d pose synthesis" Papers

5,589 papers found • Page 4 of 28

#601

Wonderland: Navigating 3D Scenes from a Single Image

Hanwen Liang, Junli Cao, Vidit Goel et al.

CVPR 2025arXiv:2412.12091
55
citations
#602

EGTR: Extracting Graph from Transformer for Scene Graph Generation

Jinbae Im, JeongYeon Nam, Nokyung Park et al.

CVPR 2024arXiv:2404.02072
55
citations
#603

RTMO: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation

Peng Lu, Tao Jiang, Yining Li et al.

CVPR 2024arXiv:2312.07526
55
citations
#604

EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation

Rang Meng, Xingyu Zhang, Yuming Li et al.

CVPR 2025arXiv:2411.10061
55
citations
#605

Accelerating Diffusion Sampling with Optimized Time Steps

Shuchen Xue, Zhaoqiang Liu, Fei Chen et al.

CVPR 2024arXiv:2402.17376
55
citations
#606

Apollo: An Exploration of Video Understanding in Large Multimodal Models

Orr Zohar, Xiaohan Wang, Yann Dubois et al.

CVPR 2025arXiv:2412.10360
55
citations
#607

GoalFlow: Goal-Driven Flow Matching for Multimodal Trajectories Generation in End-to-End Autonomous Driving

Zebin Xing, Xingyu Zhang, Yang Hu et al.

CVPR 2025arXiv:2503.05689
55
citations
#608

Multiple Object Tracking as ID Prediction

Ruopeng Gao, Ji Qi, Limin Wang

CVPR 2025arXiv:2403.16848
55
citations
#609

FakeInversion: Learning to Detect Images from Unseen Text-to-Image Models by Inverting Stable Diffusion

George Cazenavette, Avneesh Sud, Thomas Leung et al.

CVPR 2024arXiv:2406.08603
55
citations
#610

A Recipe for Scaling up Text-to-Video Generation with Text-free Videos

Xiang Wang, Shiwei Zhang, Hangjie Yuan et al.

CVPR 2024arXiv:2312.15770
55
citations
#611

Lookahead Exploration with Neural Radiance Representation for Continuous Vision-Language Navigation

Zihan Wang, Xiangyang Li, Jiahao Yang et al.

CVPR 2024highlightarXiv:2404.01943
54
citations
#612

Relightful Harmonization: Lighting-aware Portrait Background Replacement

Mengwei Ren, Wei Xiong, Jae Shin Yoon et al.

CVPR 2024arXiv:2312.06886
54
citations
#613

Towards Text-guided 3D Scene Composition

Qihang Zhang, Chaoyang Wang, Aliaksandr Siarohin et al.

CVPR 2024arXiv:2312.08885
54
citations
#614

Open-Vocabulary Segmentation with Semantic-Assisted Calibration

Yong Liu, Sule Bai, Guanbin Li et al.

CVPR 2024arXiv:2312.04089
54
citations
#615

Shadows Don't Lie and Lines Can't Bend! Generative Models don't know Projective Geometry...for now

Ayush Sarkar, Hanlin Mai, Amitabh Mahapatra et al.

CVPR 2024arXiv:2311.17138
54
citations
#616

SecondPose: SE(3)-Consistent Dual-Stream Feature Fusion for Category-Level Pose Estimation

Yamei Chen, Yan Di, Guangyao Zhai et al.

CVPR 2024arXiv:2311.11125
54
citations
#617

Visual In-Context Prompting

Feng Li, Qing Jiang, Hao Zhang et al.

CVPR 2024arXiv:2311.13601
54
citations
#618

Goku: Flow Based Video Generative Foundation Models

Shoufa Chen, Chongjian GE, Yuqi Zhang et al.

CVPR 2025highlightarXiv:2502.04896
54
citations
#619

GoMAvatar: Efficient Animatable Human Modeling from Monocular Video Using Gaussians-on-Mesh

Jing Wen, Xiaoming Zhao, Jason Ren et al.

CVPR 2024arXiv:2404.07991
54
citations
#620

A Semi-supervised Nighttime Dehazing Baseline with Spatial-Frequency Aware and Realistic Brightness Constraint

Xiaofeng Cong, Jie Gui, Jing Zhang et al.

CVPR 2024arXiv:2403.18548
54
citations
#621

Dense Optical Tracking: Connecting the Dots

Guillaume Le Moing, Jean Ponce, Cordelia Schmid

CVPR 2024highlightarXiv:2312.00786
54
citations
#622

SPOC: Imitating Shortest Paths in Simulation Enables Effective Navigation and Manipulation in the Real World

Kiana Ehsani, Tanmay Gupta, Rose Hendrix et al.

CVPR 2024arXiv:2312.02976
54
citations
#623

End-to-End Temporal Action Detection with 1B Parameters Across 1000 Frames

Shuming Liu, Chenlin Zhang, Chen Zhao et al.

CVPR 2024arXiv:2311.17241
54
citations
#624

Domain-Agnostic Mutual Prompting for Unsupervised Domain Adaptation

Zhekai Du, Xinyao Li, Fengling Li et al.

CVPR 2024arXiv:2403.02899
53
citations
#625

Direct2.5: Diverse Text-to-3D Generation via Multi-view 2.5D Diffusion

Yuanxun Lu, Jingyang Zhang, Shiwei Li et al.

CVPR 2024arXiv:2311.15980
53
citations
#626

MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling

Yifang Men, Yuan Yao, Miaomiao Cui et al.

CVPR 2025arXiv:2409.16160
53
citations
#627

WOUAF: Weight Modulation for User Attribution and Fingerprinting in Text-to-Image Diffusion Models

Changhoon Kim, Kyle Min, Maitreya Patel et al.

CVPR 2024arXiv:2306.04744
53
citations
#628

MultiDiff: Consistent Novel View Synthesis from a Single Image

Norman Müller, Katja Schwarz, Barbara Roessle et al.

CVPR 2024arXiv:2406.18524
53
citations
#629

Training Diffusion Models Towards Diverse Image Generation with Reinforcement Learning

Zichen Miao, Jiang Wang, Ze Wang et al.

CVPR 2024
53
citations
#630

Arbitrary-steps Image Super-resolution via Diffusion Inversion

Zongsheng Yue, Kang Liao, Chen Change Loy

CVPR 2025arXiv:2412.09013
53
citations
#631

Few-Shot Object Detection with Foundation Models

Guangxing Han, Ser-Nam Lim

CVPR 2024
53
citations
#632

RADIOv2.5: Improved Baselines for Agglomerative Vision Foundation Models

Greg Heinrich, Mike Ranzinger, Danny Yin et al.

CVPR 2025arXiv:2412.07679
53
citations
#633

Text-Image Alignment for Diffusion-Based Perception

Neehar Kondapaneni, Markus Marks, Manuel Knott et al.

CVPR 2024arXiv:2310.00031
53
citations
#634

PointOBB: Learning Oriented Object Detection via Single Point Supervision

Junwei Luo, Xue Yang, Yi Yu et al.

CVPR 2024arXiv:2311.14757
53
citations
#635

FaceTalk: Audio-Driven Motion Diffusion for Neural Parametric Head Models

Shivangi Aneja, Justus Thies, Angela Dai et al.

CVPR 2024arXiv:2312.08459
53
citations
#636

ECoDepth: Effective Conditioning of Diffusion Models for Monocular Depth Estimation

Suraj Patni, Aradhye Agarwal, Chetan Arora

CVPR 2024arXiv:2403.18807
53
citations
#637

FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept Composition

Ganggui Ding, Canyu Zhao, Wen Wang et al.

CVPR 2024arXiv:2405.13870
53
citations
#638

LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant

Yikun Liu, Yajie Zhang, jiayin cai et al.

CVPR 2025arXiv:2412.01720
53
citations
#639

Towards Scalable 3D Anomaly Detection and Localization: A Benchmark via 3D Anomaly Synthesis and A Self-Supervised Learning Network

wenqiao Li, Xiaohao Xu, Yao Gu et al.

CVPR 2024arXiv:2311.14897
53
citations
#640

Consistent Prompting for Rehearsal-Free Continual Learning

Zhanxin Gao, Jun Cen, Xiaobin Chang

CVPR 2024arXiv:2403.08568
53
citations
#641

DiffusionLight: Light Probes for Free by Painting a Chrome Ball

Pakkapon Phongthawee, Worameth Chinchuthakun, Nontaphat Sinsunthithet et al.

CVPR 2024arXiv:2312.09168
52
citations
#642

Enhancing Multimodal Cooperation via Sample-level Modality Valuation

Yake Wei, Ruoxuan Feng, Zihe Wang et al.

CVPR 2024arXiv:2309.06255
52
citations
#643

Describing Differences in Image Sets with Natural Language

Lisa Dunlap, Yuhui Zhang, Xiaohan Wang et al.

CVPR 2024arXiv:2312.02974
52
citations
#644

Drag Your Noise: Interactive Point-based Editing via Diffusion Semantic Propagation

Haofeng Liu, Chenshu Xu, Yifei Yang et al.

CVPR 2024arXiv:2404.01050
52
citations
#645

Dual Diffusion for Unified Image Generation and Understanding

Zijie Li, Henry Li, Yichun Shi et al.

CVPR 2025arXiv:2501.00289
52
citations
#646

Consistent3D: Towards Consistent High-Fidelity Text-to-3D Generation with Deterministic Sampling Prior

Zike Wu, Pan Zhou, YI Xuanyu et al.

CVPR 2024arXiv:2401.09050
52
citations
#647

MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World

Yining Hong, Zishuo Zheng, Peihao Chen et al.

CVPR 2024arXiv:2401.08577
52
citations
#648

SubT-MRS Dataset: Pushing SLAM Towards All-weather Environments

Shibo Zhao, Yuanjun Gao, Tianhao Wu et al.

CVPR 2024arXiv:2307.07607
52
citations
#649

CFPL-FAS: Class Free Prompt Learning for Generalizable Face Anti-spoofing

Ajian Liu, Shuai Xue, Gan Jianwen et al.

CVPR 2024highlightarXiv:2403.14333
52
citations
#650

Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs

shiyu xuan, Qingpei Guo, Ming Yang et al.

CVPR 2024arXiv:2310.00582
52
citations
#651

AvatarGPT: All-in-One Framework for Motion Understanding Planning Generation and Beyond

Zixiang Zhou, Yu Wan, Baoyuan Wang

CVPR 2024
52
citations
#652

BigGait: Learning Gait Representation You Want by Large Vision Models

Dingqiang Ye, Chao Fan, Jingzhe Ma et al.

CVPR 2024arXiv:2402.19122
52
citations
#653

ArtAdapter: Text-to-Image Style Transfer using Multi-Level Style Encoder and Explicit Adaptation

Dar-Yen Chen, Hamish Tennent, Ching-Wen Hsu

CVPR 2024arXiv:2312.02109
52
citations
#654

SatSynth: Augmenting Image-Mask Pairs through Diffusion Models for Aerial Semantic Segmentation

Aysim Toker, Marvin Eisenberger, Daniel Cremers et al.

CVPR 2024arXiv:2403.16605
52
citations
#655

Physical 3D Adversarial Attacks against Monocular Depth Estimation in Autonomous Driving

Junhao Zheng, Chenhao Lin, Jiahao Sun et al.

CVPR 2024arXiv:2403.17301
52
citations
#656

PIE-NeRF: Physics-based Interactive Elastodynamics with NeRF

Yutao Feng, Yintong Shang, Xuan Li et al.

CVPR 2024arXiv:2311.13099
51
citations
#657

Language-driven All-in-one Adverse Weather Removal

Hao Yang, Liyuan Pan, Yan Yang et al.

CVPR 2024arXiv:2312.01381
51
citations
#658

Tri-Perspective View Decomposition for Geometry-Aware Depth Completion

Zhiqiang Yan, Yuankai Lin, Kun Wang et al.

CVPR 2024arXiv:2403.15008
51
citations
#659

Neural Implicit Representation for Building Digital Twins of Unknown Articulated Objects

Yijia Weng, Bowen Wen, Jonathan Tremblay et al.

CVPR 2024arXiv:2404.01440
51
citations
#660

PIGEON: Predicting Image Geolocations

Lukas Haas, Michal Skreta, Silas Alberti et al.

CVPR 2024highlightarXiv:2307.05845
51
citations
#661

Dual Memory Networks: A Versatile Adaptation Approach for Vision-Language Models

Yabin Zhang, Wenjie Zhu, Hui Tang et al.

CVPR 2024arXiv:2403.17589
51
citations
#662

Jack of All Tasks Master of Many: Designing General-Purpose Coarse-to-Fine Vision-Language Model

Shraman Pramanick, Guangxing Han, Rui Hou et al.

CVPR 2024highlightarXiv:2312.12423
51
citations
#663

JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized Text-to-Image Generation

Yu Zeng, Vishal M. Patel, Haochen Wang et al.

CVPR 2024arXiv:2407.06187
51
citations
#664

ViTamin: Designing Scalable Vision Models in the Vision-Language Era

Jieneng Chen, Qihang Yu, Xiaohui Shen et al.

CVPR 2024arXiv:2404.02132
51
citations
#665

Retrieval-Augmented Egocentric Video Captioning

Jilan Xu, Yifei Huang, Junlin Hou et al.

CVPR 2024arXiv:2401.00789
51
citations
#666

UFineBench: Towards Text-based Person Retrieval with Ultra-fine Granularity

Jialong Zuo, Hanyu Zhou, Ying Nie et al.

CVPR 2024arXiv:2312.03441
51
citations
#667

MVHumanNet: A Large-scale Dataset of Multi-view Daily Dressing Human Captures

Zhangyang Xiong, Chenghong Li, Kenkun Liu et al.

CVPR 2024arXiv:2312.02963
51
citations
#668

Sonic: Shifting Focus to Global Audio Perception in Portrait Animation

Xiaozhong Ji, Xiaobin Hu, Zhihong Xu et al.

CVPR 2025arXiv:2411.16331
51
citations
#669

SimLingo: Vision-Only Closed-Loop Autonomous Driving with Language-Action Alignment

Katrin Renz, Long Chen, Elahe Arani et al.

CVPR 2025highlightarXiv:2503.09594
51
citations
#670

FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation

Shuai Yang, Yifan Zhou, Ziwei Liu et al.

CVPR 2024arXiv:2403.12962
51
citations
#671

ArGue: Attribute-Guided Prompt Tuning for Vision-Language Models

Xinyu Tian, Shu Zou, Zhaoyuan Yang et al.

CVPR 2024arXiv:2311.16494
51
citations
#672

CONFORM: Contrast is All You Need for High-Fidelity Text-to-Image Diffusion Models

Tuna Han Salih Meral, Enis Simsar, Federico Tombari et al.

CVPR 2024arXiv:2312.06059
51
citations
#673

Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models

Jiayi Guo, Xingqian Xu, Yifan Pu et al.

CVPR 2024arXiv:2312.04410
51
citations
#674

On the Test-Time Zero-Shot Generalization of Vision-Language Models: Do We Really Need Prompt Learning?

Maxime Zanella, Ismail Ben Ayed

CVPR 2024arXiv:2405.02266
51
citations
#675

DAP: A Dynamic Adversarial Patch for Evading Person Detectors

Amira Guesmi, Ruitian Ding, Muhammad Abdullah Hanif et al.

CVPR 2024arXiv:2305.11618
50
citations
#676

Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications

Junyi Ma, Xieyuanli Chen, Jiawei Huang et al.

CVPR 2024arXiv:2311.17663
50
citations
#677

SDDGR: Stable Diffusion-based Deep Generative Replay for Class Incremental Object Detection

JUNSU KIM, Hoseong Cho, Jihyeon Kim et al.

CVPR 2024highlightarXiv:2402.17323
50
citations
#678

Neural Markov Random Field for Stereo Matching

Tongfan Guan, Chen Wang, Yun-Hui Liu

CVPR 2024arXiv:2403.11193
50
citations
#679

Diff-Plugin: Revitalizing Details for Diffusion-based Low-level Tasks

Yuhao Liu, Zhanghan Ke, Fang Liu et al.

CVPR 2024arXiv:2403.00644
50
citations
#680

EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Language Models

Sijie Cheng, Zhicheng Guo, Jingwen Wu et al.

CVPR 2024highlightarXiv:2311.15596
50
citations
#681

MaskClustering: View Consensus based Mask Graph Clustering for Open-Vocabulary 3D Instance Segmentation

Mi Yan, Jiazhao Zhang, Yan Zhu et al.

CVPR 2024arXiv:2401.07745
50
citations
#682

Matching Anything by Segmenting Anything

Siyuan Li, Lei Ke, Martin Danelljan et al.

CVPR 2024highlightarXiv:2406.04221
50
citations
#683

One-Shot Open Affordance Learning with Foundation Models

Gen Li, Deqing Sun, Laura Sevilla-Lara et al.

CVPR 2024arXiv:2311.17776
50
citations
#684

Towards Memorization-Free Diffusion Models

Chen Chen, Daochang Liu, Chang Xu

CVPR 2024arXiv:2404.00922
50
citations
#685

Correlation-aware Coarse-to-fine MLPs for Deformable Medical Image Registration

Mingyuan Meng, Dagan Feng, Lei Bi et al.

CVPR 2024arXiv:2406.00123
50
citations
#686

MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning

Chaoyi Zhang, Kevin Lin, Zhengyuan Yang et al.

CVPR 2024highlightarXiv:2311.17435
50
citations
#687

Discovering and Mitigating Visual Biases through Keyword Explanation

Younghyun Kim, Sangwoo Mo, Minkyu Kim et al.

CVPR 2024highlightarXiv:2301.11104
50
citations
#688

MatSynth: A Modern PBR Materials Dataset

Giuseppe Vecchio, Valentin Deschaintre

CVPR 2024arXiv:2401.06056
50
citations
#689

MatFuse: Controllable Material Generation with Diffusion Models

Giuseppe Vecchio, Renato Sortino, Simone Palazzo et al.

CVPR 2024arXiv:2308.11408
50
citations
#690

LayoutVLM: Differentiable Optimization of 3D Layout via Vision-Language Models

Fan-Yun Sun, Weiyu Liu, Siyi Gu et al.

CVPR 2025arXiv:2412.02193
50
citations
#691

TextureDreamer: Image-Guided Texture Synthesis Through Geometry-Aware Diffusion

Yu-Ying Yeh, Jia-Bin Huang, Changil Kim et al.

CVPR 2024arXiv:2401.09416
50
citations
#692

Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation

Haotong Lin, Sida Peng, Jingxiao Chen et al.

CVPR 2025arXiv:2412.14015
49
citations
#693

Knowledge-Enhanced Dual-stream Zero-shot Composed Image Retrieval

Yucheng Suo, Fan Ma, Linchao Zhu et al.

CVPR 2024arXiv:2403.16005
49
citations
#694

Dora: Sampling and Benchmarking for 3D Shape Variational Auto-Encoders

Rui Chen, Jianfeng Zhang, Yixun Liang et al.

CVPR 2025arXiv:2412.17808
49
citations
#695

Communication-Efficient Collaborative Perception via Information Filling with Codebook

Yue Hu, Juntong Peng, Sifei Liu et al.

CVPR 2024arXiv:2405.04966
49
citations
#696

Accurate Spatial Gene Expression Prediction by Integrating Multi-Resolution Features

Youngmin Chung, Ji Hun Ha, Kyeong Chan Im et al.

CVPR 2024arXiv:2403.07592
49
citations
#697

MS-DETR: Efficient DETR Training with Mixed Supervision

Chuyang Zhao, Yifan Sun, Wenhao Wang et al.

CVPR 2024arXiv:2401.03989
49
citations
#698

Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification

Pingping Zhang, Yuhao Wang, Yang Liu et al.

CVPR 2024arXiv:2403.10254
49
citations
#699

Pixel-level and Semantic-level Adjustable Super-resolution: A Dual-LoRA Approach

Lingchen Sun, Rongyuan Wu, Zhiyuan Ma et al.

CVPR 2025arXiv:2412.03017
49
citations
#700

VCoder: Versatile Vision Encoders for Multimodal Large Language Models

Jitesh Jain, Jianwei Yang, Humphrey Shi

CVPR 2024arXiv:2312.14233
49
citations
#701

Dynamic Adapter Meets Prompt Tuning: Parameter-Efficient Transfer Learning for Point Cloud Analysis

Xin Zhou, Dingkang Liang, Wei Xu et al.

CVPR 2024arXiv:2403.01439
49
citations
#702

Contrastive Denoising Score for Text-guided Latent Diffusion Image Editing

Hyelin Nam, Gihyun Kwon, Geon Yeong Park et al.

CVPR 2024arXiv:2311.18608
49
citations
#703

Generative Latent Coding for Ultra-Low Bitrate Image Compression

Zhaoyang Jia, Jiahao Li, Bin Li et al.

CVPR 2024arXiv:2512.20194
49
citations
#704

You See it, You Got it: Learning 3D Creation on Pose-Free Videos at Scale

Baorui Ma, Huachen Gao, Haoge Deng et al.

CVPR 2025highlightarXiv:2412.06699
49
citations
#705

OMG: Towards Open-vocabulary Motion Generation via Mixture of Controllers

Han Liang, Jiacheng Bao, Ruichi Zhang et al.

CVPR 2024arXiv:2312.08985
48
citations
#706

LightIt: Illumination Modeling and Control for Diffusion Models

Peter Kocsis, Kalyan Sunkavalli, Julien Philip et al.

CVPR 2024arXiv:2403.10615
48
citations
#707

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions

Kai Chen, Yunhao Gou, Runhui Huang et al.

CVPR 2025arXiv:2409.18042
48
citations
#708

Leveraging Vision-Language Models for Improving Domain Generalization in Image Classification

Sravanti Addepalli, Ashish Asokan, Lakshay Sharma et al.

CVPR 2024arXiv:2310.08255
48
citations
#709

Generating Human Motion in 3D Scenes from Text Descriptions

Zhi Cen, Huaijin Pi, Sida Peng et al.

CVPR 2024arXiv:2405.07784
48
citations
#710

Pre-trained Vision and Language Transformers Are Few-Shot Incremental Learners

Keon Hee Park, Kyungwoo Song, Gyeong-Moon Park

CVPR 2024arXiv:2404.02117
48
citations
#711

Scaling Mesh Generation via Compressive Tokenization

Haohan Weng, Zibo Zhao, Biwen Lei et al.

CVPR 2025arXiv:2411.07025
48
citations
#712

Mosaic-SDF for 3D Generative Models

Lior Yariv, Omri Puny, Oran Gafni et al.

CVPR 2024arXiv:2312.09222
48
citations
#713

Grounded Question-Answering in Long Egocentric Videos

Shangzhe Di, Weidi Xie

CVPR 2024arXiv:2312.06505
48
citations
#714

MeaCap: Memory-Augmented Zero-shot Image Captioning

Zequn Zeng, Yan Xie, Hao Zhang et al.

CVPR 2024arXiv:2403.03715
48
citations
#715

Not All Voxels Are Equal: Hardness-Aware Semantic Scene Completion with Self-Distillation

Song Wang, Jiawei Yu, Wentong Li et al.

CVPR 2024arXiv:2404.11958
48
citations
#716

SocialCounterfactuals: Probing and Mitigating Intersectional Social Biases in Vision-Language Models with Counterfactual Examples

Phillip Howard, Avinash Madasu, Tiep Le et al.

CVPR 2024arXiv:2312.00825
48
citations
#717

DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection

Lewei Yao, Renjie Pi, Jianhua Han et al.

CVPR 2024arXiv:2404.09216
48
citations
#718

Breathing Life Into Sketches Using Text-to-Video Priors

Rinon Gal, Yael Vinker, Yuval Alaluf et al.

CVPR 2024highlightarXiv:2311.13608
48
citations
#719

PatchFusion: An End-to-End Tile-Based Framework for High-Resolution Monocular Metric Depth Estimation

Zhenyu Li, Shariq Bhat, Peter Wonka

CVPR 2024arXiv:2312.02284
47
citations
#720

Polos: Multimodal Metric Learning from Human Feedback for Image Captioning

Yuiga Wada, Kanta Kaneda, Daichi Saito et al.

CVPR 2024highlightarXiv:2402.18091
47
citations
#721

Pre-trained Model Guided Fine-Tuning for Zero-Shot Adversarial Robustness

Sibo Wang, Jie Zhang, Zheng Yuan et al.

CVPR 2024arXiv:2401.04350
47
citations
#722

Boosting Diffusion Models with Moving Average Sampling in Frequency Domain

Yurui Qian, Qi Cai, Yingwei Pan et al.

CVPR 2024arXiv:2403.17870
47
citations
#723

OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations

Linke Ouyang, Yuan Qu, Hongbin Zhou et al.

CVPR 2025arXiv:2412.07626
47
citations
#724

OmniManip: Towards General Robotic Manipulation via Object-Centric Interaction Primitives as Spatial Constraints

Mingjie Pan, Jiyao Zhang, Tianshu Wu et al.

CVPR 2025highlightarXiv:2501.03841
47
citations
#725

One-Prompt to Segment All Medical Images

Wu, Min Xu

CVPR 2024arXiv:2305.10300
47
citations
#726

Reloc3r: Large-Scale Training of Relative Camera Pose Regression for Generalizable, Fast, and Accurate Visual Localization

Siyan Dong, Shuzhe Wang, Shaohui Liu et al.

CVPR 2025arXiv:2412.08376
47
citations
#727

Instruct-ReID: A Multi-purpose Person Re-identification Task with Instructions

Weizhen He, Yiheng Deng, SHIXIANG TANG et al.

CVPR 2024arXiv:2306.07520
47
citations
#728

S2MAE: A Spatial-Spectral Pretraining Foundation Model for Spectral Remote Sensing Data

Xuyang Li, Danfeng Hong, Jocelyn Chanussot

CVPR 2024
47
citations
#729

HIMap: HybrId Representation Learning for End-to-end Vectorized HD Map Construction

Yi ZHOU, Hui Zhang, Jiaqian Yu et al.

CVPR 2024arXiv:2403.08639
47
citations
#730

MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer

Jianjian Cao, Peng Ye, Shengze Li et al.

CVPR 2024arXiv:2403.02991
47
citations
#731

CLOVA: A Closed-LOop Visual Assistant with Tool Usage and Update

Zhi Gao, Yuntao Du., Xintong Zhang et al.

CVPR 2024arXiv:2312.10908
47
citations
#732

Digital Life Project: Autonomous 3D Characters with Social Intelligence

Zhongang Cai, Jianping Jiang, Zhongfei Qing et al.

CVPR 2024arXiv:2312.04547
47
citations
#733

Multimodal Industrial Anomaly Detection by Crossmodal Feature Mapping

Alex Costanzino, Pierluigi Zama Ramirez, Giuseppe Lisanti et al.

CVPR 2024arXiv:2312.04521
46
citations
#734

Abductive Ego-View Accident Video Understanding for Safe Driving Perception

Jianwu Fang, Lei-lei Li, Junfei Zhou et al.

CVPR 2024highlightarXiv:2403.00436
46
citations
#735

DAVE - A Detect-and-Verify Paradigm for Low-Shot Counting

Jer Pelhan, Alan Lukezic, Vitjan Zavrtanik et al.

CVPR 2024arXiv:2404.16622
46
citations
#736

GaussianWorld: Gaussian World Model for Streaming 3D Occupancy Prediction

Sicheng Zuo, Wenzhao Zheng, Yuanhui Huang et al.

CVPR 2025arXiv:2412.10373
46
citations
#737

VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis

Enric Corona, Andrei Zanfir, Eduard Gabriel Bazavan et al.

CVPR 2025arXiv:2403.08764
46
citations
#738

DINOv2 Meets Text: A Unified Framework for Image- and Pixel-Level Vision-Language Alignment

Dahyun Kang, Piotr Bojanowski, Huy V. Vo et al.

CVPR 2025arXiv:2412.16334
46
citations
#739

Dysen-VDM: Empowering Dynamics-aware Text-to-Video Diffusion with LLMs

Hao Fei, Shengqiong Wu, Wei Ji et al.

CVPR 2024arXiv:2308.13812
46
citations
#740

SocialCircle: Learning the Angle-based Social Interaction Representation for Pedestrian Trajectory Prediction

Conghao Wong, Beihao Xia, Ziqian Zou et al.

CVPR 2024arXiv:2310.05370
46
citations
#741

Universal Actions for Enhanced Embodied Foundation Models

Jinliang Zheng, Jianxiong Li, Dongxiu Liu et al.

CVPR 2025arXiv:2501.10105
46
citations
#742

MGMap: Mask-Guided Learning for Online Vectorized HD Map Construction

Xiaolu Liu, Song Wang, Wentong Li et al.

CVPR 2024arXiv:2404.00876
46
citations
#743

LEAD: Learning Decomposition for Source-free Universal Domain Adaptation

Sanqing Qu, Tianpei Zou, Lianghua He et al.

CVPR 2024arXiv:2403.03421
46
citations
#744

LEAP-VO: Long-term Effective Any Point Tracking for Visual Odometry

Weirong Chen, Le Chen, Rui Wang et al.

CVPR 2024arXiv:2401.01887
46
citations
#745

Point Segment and Count: A Generalized Framework for Object Counting

Zhizhong Huang, Mingliang Dai, Yi Zhang et al.

CVPR 2024arXiv:2311.12386
46
citations
#746

Global and Local Prompts Cooperation via Optimal Transport for Federated Learning

Hongxia Li, Wei Huang, Jingya Wang et al.

CVPR 2024arXiv:2403.00041
46
citations
#747

DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation

Minghong Cai, Xiaodong Cun, Xiaoyu Li et al.

CVPR 2025arXiv:2412.18597
46
citations
#748

RadarDistill: Boosting Radar-based Object Detection Performance via Knowledge Distillation from LiDAR Features

Geonho Bang, Kwangjin Choi, Jisong Kim et al.

CVPR 2024arXiv:2403.05061
46
citations
#749

Improving the Generalization of Segmentation Foundation Model under Distribution Shift via Weakly Supervised Adaptation

Haojie Zhang, Yongyi Su, Xun Xu et al.

CVPR 2024arXiv:2312.03502
46
citations
#750

M-LLM Based Video Frame Selection for Efficient Video Understanding

Kai Hu, Feng Gao, Xiaohan Nie et al.

CVPR 2025arXiv:2502.19680
46
citations
#751

Improving Image Restoration through Removing Degradations in Textual Representations

Jingbo Lin, Zhilu Zhang, Yuxiang Wei et al.

CVPR 2024arXiv:2312.17334
46
citations
#752

Boosting Object Detection with Zero-Shot Day-Night Domain Adaptation

Zhipeng Du, Miaojing Shi, Jiankang Deng

CVPR 2024arXiv:2312.01220
46
citations
#753

Gradient-based Parameter Selection for Efficient Fine-Tuning

Zhi Zhang, Qizhe Zhang, Zijun Gao et al.

CVPR 2024arXiv:2312.10136
45
citations
#754

Neural Sign Actors: A Diffusion Model for 3D Sign Language Production from Text

Vasileios Baltatzis, Rolandos Alexandros Potamias, Evangelos Ververas et al.

CVPR 2024arXiv:2312.02702
45
citations
#755

Towards Surveillance Video-and-Language Understanding: New Dataset Baselines and Challenges

Tongtong Yuan, Xuange Zhang, Kun Liu et al.

CVPR 2024arXiv:2309.13925
45
citations
#756

Learning Transferable Negative Prompts for Out-of-Distribution Detection

Tianqi Li, Guansong Pang, wenjun miao et al.

CVPR 2024arXiv:2404.03248
45
citations
#757

Towards Practical Real-Time Neural Video Compression

Zhaoyang Jia, Bin Li, Jiahao Li et al.

CVPR 2025arXiv:2502.20762
45
citations
#758

One Prompt Word is Enough to Boost Adversarial Robustness for Pre-trained Vision-Language Models

Lin Li, Haoyan Guan, Jianing Qiu et al.

CVPR 2024arXiv:2403.01849
45
citations
#759

SingularTrajectory: Universal Trajectory Predictor Using Diffusion Model

Inhwan Bae, Young-Jae Park, Hae-Gon Jeon

CVPR 2024arXiv:2403.18452
45
citations
#760

Efficient Multi-scale Network with Learnable Discrete Wavelet Transform for Blind Motion Deblurring

Xin Gao, Tianheng Qiu, Xinyu Zhang et al.

CVPR 2024arXiv:2401.00027
45
citations
#761

Collaborative Semantic Occupancy Prediction with Hybrid Feature Fusion in Connected Automated Vehicles

Rui Song, Chenwei Liang, Hu Cao et al.

CVPR 2024arXiv:2402.07635
45
citations
#762

EVCap: Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension

Jiaxuan Li, Duc Minh Vo, Akihiro Sugimoto et al.

CVPR 2024arXiv:2311.15879
45
citations
#763

ReGenNet: Towards Human Action-Reaction Synthesis

Liang Xu, Yizhou Zhou, Yichao Yan et al.

CVPR 2024arXiv:2403.11882
45
citations
#764

Vision-and-Language Navigation via Causal Learning

Liuyi Wang, Zongtao He, Ronghao Dang et al.

CVPR 2024arXiv:2404.10241
45
citations
#765

Bridging Remote Sensors with Multisensor Geospatial Foundation Models

Boran Han, Shuai Zhang, Xingjian Shi et al.

CVPR 2024arXiv:2404.01260
45
citations
#766

AllSpark: Reborn Labeled Features from Unlabeled in Transformer for Semi-Supervised Semantic Segmentation

Haonan Wang, Qixiang ZHANG, Yi Li et al.

CVPR 2024arXiv:2403.01818
45
citations
#767

CAD-SIGNet: CAD Language Inference from Point Clouds using Layer-wise Sketch Instance Guided Attention

Mohammad Sadil Khan, Elona Dupont, Sk Aziz Ali et al.

CVPR 2024highlightarXiv:2402.17678
45
citations
#768

From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models

Rongjie Li, Songyang Zhang, Dahua Lin et al.

CVPR 2024arXiv:2404.00906
45
citations
#769

Exploiting Diffusion Prior for Generalizable Dense Prediction

Hsin-Ying Lee, Hung-Yu Tseng, Hsin-Ying Lee et al.

CVPR 2024arXiv:2311.18832
45
citations
#770

Amodal Ground Truth and Completion in the Wild

Guanqi Zhan, Chuanxia Zheng, Weidi Xie et al.

CVPR 2024arXiv:2312.17247
45
citations
#771

MobileMamba: Lightweight Multi-Receptive Visual Mamba Network

Haoyang He, Jiangning Zhang, Yuxuan Cai et al.

CVPR 2025arXiv:2411.15941
45
citations
#772

Point2RBox: Combine Knowledge from Synthetic Visual Patterns for End-to-end Oriented Object Detection with Single Point Supervision

Yi Yu, Xue Yang, Qingyun Li et al.

CVPR 2024arXiv:2311.14758
45
citations
#773

Multimodal Prompt Perceiver: Empower Adaptiveness Generalizability and Fidelity for All-in-One Image Restoration

Yuang Ai, Huaibo Huang, Xiaoqiang Zhou et al.

CVPR 2024arXiv:2312.02918
45
citations
#774

Don't Shake the Wheel: Momentum-Aware Planning in End-to-End Autonomous Driving

Ziying Song, Caiyan Jia, Lin Liu et al.

CVPR 2025arXiv:2503.03125
44
citations
#775

Posterior Distillation Sampling

Juil Koo, Chanho Park, Minhyuk Sung

CVPR 2024arXiv:2311.13831
44
citations
#776

KVQ: Kwai Video Quality Assessment for Short-form Videos

Yiting Lu, Xin Li, Yajing Pei et al.

CVPR 2024arXiv:2402.07220
44
citations
#777

Retrieval-Augmented Layout Transformer for Content-Aware Layout Generation

Daichi Horita, Naoto Inoue, Kotaro Kikuchi et al.

CVPR 2024arXiv:2311.13602
44
citations
#778

BEVNeXt: Reviving Dense BEV Frameworks for 3D Object Detection

Zhenxin Li, Shiyi Lan, Jose M. Alvarez et al.

CVPR 2024arXiv:2312.01696
44
citations
#779

InterMimic: Towards Universal Whole-Body Control for Physics-Based Human-Object Interactions

Sirui Xu, Hung Yu Ling, Yu-Xiong Wang et al.

CVPR 2025highlightarXiv:2502.20390
44
citations
#780

AA-CLIP: Enhancing Zero-Shot Anomaly Detection via Anomaly-Aware CLIP

wenxin ma, Xu Zhang, Qingsong Yao et al.

CVPR 2025arXiv:2503.06661
44
citations
#781

A Dual-Augmentor Framework for Domain Generalization in 3D Human Pose Estimation

Qucheng Peng, Ce Zheng, Chen Chen

CVPR 2024arXiv:2403.11310
44
citations
#782

Sonata: Self-Supervised Learning of Reliable Point Representations

Xiaoyang Wu, Daniel DeTone, Duncan Frost et al.

CVPR 2025highlightarXiv:2503.16429
44
citations
#783

4D-DRESS: A 4D Dataset of Real-World Human Clothing With Semantic Annotations

Wenbo Wang, Hsuan-I Ho, Chen Guo et al.

CVPR 2024highlightarXiv:2404.18630
44
citations
#784

LLMs are Good Action Recognizers

Haoxuan Qu, Yujun Cai, Jun Liu

CVPR 2024arXiv:2404.00532
44
citations
#785

Uncovering What Why and How: A Comprehensive Benchmark for Causation Understanding of Video Anomaly

Hang Du, Sicheng Zhang, Binzhu Xie et al.

CVPR 2024arXiv:2405.00181
44
citations
#786

All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages

Ashmal Vayani, Dinura Dissanayake, Hasindri Watawana et al.

CVPR 2025highlightarXiv:2411.16508
44
citations
#787

UniBind: LLM-Augmented Unified and Balanced Representation Space to Bind Them All

Yuanhuiyi Lyu, Xu Zheng, Jiazhou Zhou et al.

CVPR 2024arXiv:2403.12532
44
citations
#788

FreeKD: Knowledge Distillation via Semantic Frequency Prompt

Yuan Zhang, Tao Huang, Jiaming Liu et al.

CVPR 2024arXiv:2311.12079
44
citations
#789

MET3R: Measuring Multi-View Consistency in Generated Images

Mohammad Asim, Christopher Wewer, Thomas Wimmer et al.

CVPR 2025arXiv:2501.06336
44
citations
#790

Map-Relative Pose Regression for Visual Re-Localization

Shuai Chen, Tommaso Cavallari, Victor Adrian Prisacariu et al.

CVPR 2024highlightarXiv:2404.09884
43
citations
#791

Mask Grounding for Referring Image Segmentation

Yong Xien Chng, Henry Zheng, Yizeng Han et al.

CVPR 2024arXiv:2312.12198
43
citations
#792

Scene Adaptive Sparse Transformer for Event-based Object Detection

Yansong Peng, Li Hebei, Yueyi Zhang et al.

CVPR 2024arXiv:2404.01882
43
citations
#793

5%>100%: Breaking Performance Shackles of Full Fine-Tuning on Visual Recognition Tasks

Dongshuo Yin, Leiyi Hu, Bin Li et al.

CVPR 2025arXiv:2408.08345
43
citations
#794

EmoVIT: Revolutionizing Emotion Insights with Visual Instruction Tuning

Hongxia Xie, Chu-Jun Peng, Yu-Wen Tseng et al.

CVPR 2024arXiv:2404.16670
43
citations
#795

LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding

Hongyu Li, Jinyu Chen, Ziyu Wei et al.

CVPR 2025arXiv:2501.08282
43
citations
#796

WorDepth: Variational Language Prior for Monocular Depth Estimation

Ziyao Zeng, Hyoungseob Park, Fengyu Yang et al.

CVPR 2024arXiv:2404.03635
43
citations
#797

Fair Federated Learning under Domain Skew with Local Consistency and Domain Diversity

Yuhang Chen, Wenke Huang, Mang Ye

CVPR 2024arXiv:2405.16585
43
citations
#798

SemCity: Semantic Scene Generation with Triplane Diffusion

Jumin Lee, Sebin Lee, Changho Jo et al.

CVPR 2024arXiv:2403.07773
43
citations
#799

VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM

Yuqian Yuan, Hang Zhang, Wentong Li et al.

CVPR 2025arXiv:2501.00599
43
citations
#800

On Exact Inversion of DPM-Solvers

Seongmin Hong, Kyeonghyun Lee, Suh Yoon Jeon et al.

CVPR 2024arXiv:2311.18387
43
citations