Most Cited CVPR 2024 "extrapolated view synthesis" Papers

2,716 papers found • Page 1 of 14

#1

DETRs Beat YOLOs on Real-time Object Detection

Yian Zhao, Wenyu Lv, Shangliang Xu et al.

CVPR 2024posterarXiv:2304.08069
2424
citations
#2

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

Zhe Chen, Jiannan Wu, Wenhai Wang et al.

CVPR 2024posterarXiv:2312.14238
2210
citations
#3

4D Gaussian Splatting for Real-Time Dynamic Scene Rendering

Guanjun Wu, Taoran Yi, Jiemin Fang et al.

CVPR 2024posterarXiv:2310.08528
1061
citations
#4

VBench: Comprehensive Benchmark Suite for Video Generative Models

Ziqi Huang, Yinan He, Jiashuo Yu et al.

CVPR 2024highlightarXiv:2311.17982
996
citations
#5

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark

Kunchang Li, Yali Wang, Yinan He et al.

CVPR 2024highlightarXiv:2311.17005
864
citations
#6

LISA: Reasoning Segmentation via Large Language Model

Xin Lai, Zhuotao Tian, Yukang Chen et al.

CVPR 2024posterarXiv:2308.00692
721
citations
#7

Deformable 3D Gaussians for High-Fidelity Monocular Dynamic Scene Reconstruction

Ziyi Yang, Xinyu Gao, Wen Zhou et al.

CVPR 2024posterarXiv:2309.13101
686
citations
#8

VILA: On Pre-training for Visual Language Models

Ji Lin, Danny Yin, Wei Ping et al.

CVPR 2024posterarXiv:2312.07533
685
citations
#9

mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration

Qinghao Ye, Haiyang Xu, Jiabo Ye et al.

CVPR 2024highlightarXiv:2311.04257
601
citations
#10

Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering

Tao Lu, Mulin Yu, Linning Xu et al.

CVPR 2024highlightarXiv:2312.00109
589
citations
#11

Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs

Shengbang Tong, Zhuang Liu, Yuexiang Zhai et al.

CVPR 2024posterarXiv:2401.06209
570
citations
#12

SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities

Boyuan Chen, Zhuo Xu, Sean Kirmani et al.

CVPR 2024posterarXiv:2401.12168
550
citations
#13

One-step Diffusion with Distribution Matching Distillation

Tianwei Yin, Michaël Gharbi, Richard Zhang et al.

CVPR 2024posterarXiv:2311.18828
543
citations
#14

pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction

David Charatan, Sizhe Lester Li, Andrea Tagliasacchi et al.

CVPR 2024posterarXiv:2312.12337
496
citations
#15

SplaTAM: Splat Track & Map 3D Gaussians for Dense RGB-D SLAM

Nikhil Keetha, Jay Karhade, Krishna Murthy Jatavallabhula et al.

CVPR 2024posterarXiv:2312.02126
477
citations
#16

MovieChat: From Dense Token to Sparse Memory for Long Video Understanding

Enxin Song, Wenhao Chai, Guanhong Wang et al.

CVPR 2024posterarXiv:2307.16449
457
citations
#17

Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding

Sicong Leng, Hang Zhang, Guanzheng Chen et al.

CVPR 2024highlightarXiv:2311.16922
449
citations
#18

Generative Multimodal Models are In-Context Learners

Quan Sun, Yufeng Cui, Xiaosong Zhang et al.

CVPR 2024posterarXiv:2312.13286
422
citations
#19

FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects

Bowen Wen, Wei Yang, Jan Kautz et al.

CVPR 2024highlightarXiv:2312.08344
412
citations
#20

Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models

Zhang Li, Biao Yang, Qiang Liu et al.

CVPR 2024highlightarXiv:2311.06607
384
citations
#21

InstantBooth: Personalized Text-to-Image Generation without Test-Time Finetuning

Jing Shi, Wei Xiong, Zhe Lin et al.

CVPR 2024posterarXiv:2304.03411
369
citations
#22

OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation

Qidong Huang, Xiaoyi Dong, Pan Zhang et al.

CVPR 2024highlightarXiv:2311.17911
365
citations
#23

GS-SLAM: Dense Visual SLAM with 3D Gaussian Splatting

Chi Yan, Delin Qu, Dong Wang et al.

CVPR 2024highlightarXiv:2311.11700
359
citations
#24

TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding

Shuhuai Ren, Linli Yao, Shicheng Li et al.

CVPR 2024posterarXiv:2312.02051
356
citations
#25

Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding

Peng Jin, Ryuichi Takanobu, Cai Zhang et al.

CVPR 2024highlightarXiv:2311.08046
354
citations
#26

HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models

Tianrui Guan, Fuxiao Liu, Xiyang Wu et al.

CVPR 2024posterarXiv:2310.14566
354
citations
#27

Compact 3D Gaussian Representation for Radiance Field

Joo Chan Lee, Daniel Rho, Xiangyu Sun et al.

CVPR 2024highlightarXiv:2311.13681
348
citations
#28

RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback

Tianyu Yu, Yuan Yao, Haoye Zhang et al.

CVPR 2024posterarXiv:2312.00849
344
citations
#29

Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers

Tsai-Shien Chen, Aliaksandr Siarohin, Willi Menapace et al.

CVPR 2024posterarXiv:2402.19479
341
citations
#30

Text-to-3D using Gaussian Splatting

Zilong Chen, Feng Wang, Yikai Wang et al.

CVPR 2024posterarXiv:2309.16585
330
citations
#31

Feature 3DGS: Supercharging 3D Gaussian Splatting to Enable Distilled Feature Fields

Shijie Zhou, Haoran Chang, Sicheng Jiang et al.

CVPR 2024highlightarXiv:2312.03203
327
citations
#32

V?: Guided Visual Search as a Core Mechanism in Multimodal LLMs

Penghao Wu, Saining Xie

CVPR 2024posterarXiv:2312.14135
327
citations
#33

GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting

Yiwen Chen, Zilong Chen, Chi Zhang et al.

CVPR 2024posterarXiv:2311.14521
321
citations
#34

MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model

Zhongcong Xu, Jianfeng Zhang, Jun Hao Liew et al.

CVPR 2024posterarXiv:2311.16498
318
citations
#35

Splatter Image: Ultra-Fast Single-View 3D Reconstruction

Stanislaw Szymanowicz, Christian Rupprecht, Andrea Vedaldi

CVPR 2024posterarXiv:2312.13150
316
citations
#36

Video-P2P: Video Editing with Cross-attention Control

Shaoteng Liu, Yuechen Zhang, Wenbo Li et al.

CVPR 2024posterarXiv:2303.04761
309
citations
#37

DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing

Yujun Shi, Chuhui Xue, Jun Hao Liew et al.

CVPR 2024highlightarXiv:2306.14435
308
citations
#38

SC-GS: Sparse-Controlled Gaussian Splatting for Editable Dynamic Scenes

Yihua Huang, Yangtian Sun, Ziyi Yang et al.

CVPR 2024posterarXiv:2312.14937
302
citations
#39

ReconFusion: 3D Reconstruction with Diffusion Priors

Rundi Wu, Ben Mildenhall, Philipp Henzler et al.

CVPR 2024posterarXiv:2312.02981
283
citations
#40

Triplane Meets Gaussian Splatting: Fast and Generalizable Single-View 3D Reconstruction with Transformers

Zi-Xin Zou, Zhipeng Yu, Yuan-Chen Guo et al.

CVPR 2024posterarXiv:2312.09147
266
citations
#41

DL3DV-10K: A Large-Scale Scene Dataset for Deep Learning-based 3D Vision

Lu Ling, Yichen Sheng, Zhi Tu et al.

CVPR 2024posterarXiv:2312.16256
266
citations
#42

DeepCache: Accelerating Diffusion Models for Free

Xinyin Ma, Gongfan Fang, Xinchao Wang

CVPR 2024posterarXiv:2312.00858
265
citations
#43

SeeSR: Towards Semantics-Aware Real-World Image Super-Resolution

Rongyuan Wu, Tao Yang, Lingchen Sun et al.

CVPR 2024posterarXiv:2311.16518
256
citations
#44

On Scaling Up a Multilingual Vision and Language Model

Xi Chen, Josip Djolonga, Piotr Padlewski et al.

CVPR 2024posterarXiv:2305.18565
254
citations
#45

VTimeLLM: Empower LLM to Grasp Video Moments

Bin Huang, Xin Wang, Hong Chen et al.

CVPR 2024highlightarXiv:2311.18445
244
citations
#46

GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models

Taoran Yi, Jiemin Fang, Junjie Wang et al.

CVPR 2024posterarXiv:2310.08529
241
citations
#47

EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything

Yunyang Xiong, Balakrishnan Varadarajan, Lemeng Wu et al.

CVPR 2024highlightarXiv:2312.00863
241
citations
#48

Emu Edit: Precise Image Editing via Recognition and Generation Tasks

Shelly Sheynin, Adam Polyak, Uriel Singer et al.

CVPR 2024highlightarXiv:2311.10089
238
citations
#49

RoMa: Robust Dense Feature Matching

Johan Edstedt, Qiyu Sun, Georg Bökman et al.

CVPR 2024posterarXiv:2305.15404
238
citations
#50

EvalCrafter: Benchmarking and Evaluating Large Video Generation Models

Yaofang Liu, Xiaodong Cun, Xuebo Liu et al.

CVPR 2024posterarXiv:2310.11440
237
citations
#51

SkySense: A Multi-Modal Remote Sensing Foundation Model Towards Universal Interpretation for Earth Observation Imagery

Xin Guo, Jiangwei Lao, Bo Dang et al.

CVPR 2024posterarXiv:2312.10115
236
citations
#52

DNGaussian: Optimizing Sparse-View 3D Gaussian Radiance Fields with Global-Local Depth Normalization

Jiahe Li, Jiawei Zhang, Xiao Bai et al.

CVPR 2024posterarXiv:2403.06912
232
citations
#53

Sequential Modeling Enables Scalable Learning for Large Vision Models

Yutong Bai, Xinyang Geng, Karttikeya Mangalam et al.

CVPR 2024posterarXiv:2312.00785
230
citations
#54

GaussianShader: 3D Gaussian Splatting with Shading Functions for Reflective Surfaces

Yingwenqi Jiang, Jiadong Tu, Yuan Liu et al.

CVPR 2024posterarXiv:2311.17977
228
citations
#55

SinSR: Diffusion-Based Image Super-Resolution in a Single Step

Yufei Wang, Wenhan Yang, Xinyuan Chen et al.

CVPR 2024posterarXiv:2311.14760
214
citations
#56

Style Injection in Diffusion: A Training-free Approach for Adapting Large-scale Diffusion Models for Style Transfer

Jiwoo Chung, Sangeek Hyun, Jae-Pil Heo

CVPR 2024highlightarXiv:2312.09008
211
citations
#57

ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding

Le Xue, Ning Yu, Shu Zhang et al.

CVPR 2024posterarXiv:2305.08275
192
citations
#58

Putting the Object Back into Video Object Segmentation

Ho Kei Cheng, Seoung Wug Oh, Brian Price et al.

CVPR 2024highlightarXiv:2310.12982
182
citations
#59

StableVITON: Learning Semantic Correspondence with Latent Diffusion Model for Virtual Try-On

Jeongho Kim, Gyojung Gu, Minho Park et al.

CVPR 2024posterarXiv:2312.01725
176
citations
#60

RealNet: A Feature Selection Network with Realistic Synthetic Anomaly for Anomaly Detection

Ximiao Zhang, Min Xu, Xiuzhuang Zhou

CVPR 2024posterarXiv:2403.05897
171
citations
#61

Language Embedded 3D Gaussians for Open-Vocabulary Scene Understanding

Jin-Chuan Shi, Miao Wang, Haobin Duan et al.

CVPR 2024posterarXiv:2311.18482
171
citations
#62

Is Ego Status All You Need for Open-Loop End-to-End Autonomous Driving?

Zhiqi Li, Zhiding Yu, Shiyi Lan et al.

CVPR 2024posterarXiv:2312.03031
169
citations
#63

4D-fy: Text-to-4D Generation Using Hybrid Score Distillation Sampling

Sherwin Bahmani, Ivan Skorokhodov, Victor Rong et al.

CVPR 2024posterarXiv:2311.17984
168
citations
#64

Compositional Chain-of-Thought Prompting for Large Multimodal Models

Chancharik Mitra, Brandon Huang, Trevor Darrell et al.

CVPR 2024posterarXiv:2311.17076
167
citations
#65

SplattingAvatar: Realistic Real-Time Human Avatars with Mesh-Embedded Gaussian Splatting

Zhijing Shao, Wang Zhaolong, Zhuang Li et al.

CVPR 2024posterarXiv:2403.05087
165
citations
#66

BioCLIP: A Vision Foundation Model for the Tree of Life

Samuel Stevens, Jiaman Wu, Matthew Thompson et al.

CVPR 2024posterarXiv:2311.18803
165
citations
#67

GaussianEditor: Editing 3D Gaussians Delicately with Text Instructions

Junjie Wang, Jiemin Fang, Xiaopeng Zhang et al.

CVPR 2024posterarXiv:2311.16037
164
citations
#68

HIVE: Harnessing Human Feedback for Instructional Visual Editing

Shu Zhang, Xinyi Yang, Yihao Feng et al.

CVPR 2024posterarXiv:2303.09618
164
citations
#69

GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis

Shunyuan Zheng, Boyao ZHOU, Ruizhi Shao et al.

CVPR 2024highlightarXiv:2312.02155
160
citations
#70

Grounded Text-to-Image Synthesis with Attention Refocusing

Quynh Phung, Songwei Ge, Jia-Bin Huang

CVPR 2024posterarXiv:2306.05427
157
citations
#71

Multi-Scale 3D Gaussian Splatting for Anti-Aliased Rendering

Zhiwen Yan, Weng Fei Low, Yu Chen et al.

CVPR 2024posterarXiv:2311.17089
155
citations
#72

ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts

Mu Cai, Haotian Liu, Siva Mustikovela et al.

CVPR 2024posterarXiv:2312.00784
153
citations
#73

Gaussian Head Avatar: Ultra High-fidelity Head Avatar via Dynamic Gaussians

Yuelang Xu, Benwang Chen, Zhe Li et al.

CVPR 2024poster
147
citations
#74

Osprey: Pixel Understanding with Visual Instruction Tuning

Yuqian Yuan, Wentong Li, Jian liu et al.

CVPR 2024posterarXiv:2312.10032
147
citations
#75

Efficient LoFTR: Semi-Dense Local Feature Matching with Sparse-Like Speed

Yifan Wang, Xingyi He, Sida Peng et al.

CVPR 2024highlightarXiv:2403.04765
142
citations
#76

MMA-Diffusion: MultiModal Attack on Diffusion Models

Yijun Yang, Ruiyuan Gao, Xiaosen Wang et al.

CVPR 2024posterarXiv:2311.17516
141
citations
#77

SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models

Yuzhou Huang, Liangbin Xie, Xintao Wang et al.

CVPR 2024highlightarXiv:2312.06739
139
citations
#78

Optimal Transport Aggregation for Visual Place Recognition

Sergio Izquierdo, Javier Civera

CVPR 2024posterarXiv:2311.15937
138
citations
#79

Transcending Forgery Specificity with Latent Space Augmentation for Generalizable Deepfake Detection

Zhiyuan Yan, Yuhao Luo, Siwei Lyu et al.

CVPR 2024posterarXiv:2311.11278
133
citations
#80

ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions

Chunlong Xia, Xinliang Wang, Feng Lv et al.

CVPR 2024highlightarXiv:2403.07392
131
citations
#81

Probing the 3D Awareness of Visual Foundation Models

Mohamed El Banani, Amit Raj, Kevis-kokitsi Maninis et al.

CVPR 2024posterarXiv:2404.08636
130
citations
#82

GART: Gaussian Articulated Template Models

Jiahui Lei, Yufu Wang, Georgios Pavlakos et al.

CVPR 2024highlightarXiv:2311.16099
129
citations
#83

VLP: Vision Language Planning for Autonomous Driving

Chenbin Pan, Burhan Yaman, Tommaso Nesti et al.

CVPR 2024posterarXiv:2401.05577
127
citations
#84

XCube: Large-Scale 3D Generative Modeling using Sparse Voxel Hierarchies

Xuanchi Ren, Jiahui Huang, Xiaohui Zeng et al.

CVPR 2024highlightarXiv:2312.03806
127
citations
#85

GSVA: Generalized Segmentation via Multimodal Large Language Models

Zhuofan Xia, Dongchen Han, Yizeng Han et al.

CVPR 2024posterarXiv:2312.10103
127
citations
#86

Relightable Gaussian Codec Avatars

Shunsuke Saito, Gabriel Schwartz, Tomas Simon et al.

CVPR 2024posterarXiv:2312.03704
127
citations
#87

NeuRAD: Neural Rendering for Autonomous Driving

Adam Tonderski, Carl Lindström, Georg Hess et al.

CVPR 2024highlightarXiv:2311.15260
126
citations
#88

Text-IF: Leveraging Semantic Text Guidance for Degradation-Aware and Interactive Image Fusion

Xunpeng Yi, Han Xu, HAO ZHANG et al.

CVPR 2024posterarXiv:2403.16387
123
citations
#89

Generalized Predictive Model for Autonomous Driving

Jiazhi Yang, Shenyuan Gao, Yihang Qiu et al.

CVPR 2024highlightarXiv:2403.09630
122
citations
#90

Real-IAD: A Real-World Multi-View Dataset for Benchmarking Versatile Industrial Anomaly Detection

Chengjie Wang, wenbing zhu, Bin-Bin Gao et al.

CVPR 2024posterarXiv:2403.12580
120
citations
#91

Autoregressive Queries for Adaptive Tracking with Spatio-Temporal Transformers

Jinxia Xie, Bineng Zhong, Zhiyi Mo et al.

CVPR 2024poster
118
citations
#92

VideoBooth: Diffusion-based Video Generation with Image Prompts

Yuming Jiang, Tianxing Wu, Shuai Yang et al.

CVPR 2024posterarXiv:2312.00777
118
citations
#93

OneTracker: Unifying Visual Object Tracking with Foundation Models and Efficient Tuning

Lingyi Hong, Shilin Yan, Renrui Zhang et al.

CVPR 2024highlightarXiv:2403.09634
118
citations
#94

EditGuard: Versatile Image Watermarking for Tamper Localization and Copyright Protection

Xuanyu Zhang, Runyi Li, Jiwen Yu et al.

CVPR 2024posterarXiv:2312.08883
118
citations
#95

InfLoRA: Interference-Free Low-Rank Adaptation for Continual Learning

Yan-Shuo Liang, Wu-Jun Li

CVPR 2024posterarXiv:2404.00228
117
citations
#96

Towards Learning a Generalist Model for Embodied Navigation

Duo Zheng, Shijia Huang, Lin Zhao et al.

CVPR 2024highlightarXiv:2312.02010
117
citations
#97

Hallucination Augmented Contrastive Learning for Multimodal Large Language Model

Chaoya Jiang, Haiyang Xu, Mengfan Dong et al.

CVPR 2024posterarXiv:2312.06968
116
citations
#98

Human Gaussian Splatting: Real-time Rendering of Animatable Avatars

Arthur Moreau, Jifei Song, Helisa Dhamo et al.

CVPR 2024posterarXiv:2311.17113
113
citations
#99

One-dimensional Adapter to Rule Them All: Concepts Diffusion Models and Erasing Applications

Mengyao Lyu, Yuhong Yang, Haiwen Hong et al.

CVPR 2024highlightarXiv:2312.16145
112
citations
#100

Can I Trust Your Answer? Visually Grounded Video Question Answering

Junbin Xiao, Angela Yao, Yicong Li et al.

CVPR 2024highlightarXiv:2309.01327
109
citations
#101

Binding Touch to Everything: Learning Unified Multimodal Tactile Representations

Fengyu Yang, Chao Feng, Ziyang Chen et al.

CVPR 2024posterarXiv:2401.18084
109
citations
#102

Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners

Yazhou Xing, Yingqing He, Zeyue Tian et al.

CVPR 2024posterarXiv:2402.17723
109
citations
#103

MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis

Dewei Zhou, You Li, Fan Ma et al.

CVPR 2024highlightarXiv:2402.05408
109
citations
#104

VideoLLM-online: Online Video Large Language Model for Streaming Video

Joya Chen, Zhaoyang Lv, Shiwei Wu et al.

CVPR 2024posterarXiv:2406.11816
109
citations
#105

Efficient Test-Time Adaptation of Vision-Language Models

Adilbek Karmanov, Dayan Guan, Shijian Lu et al.

CVPR 2024posterarXiv:2403.18293
109
citations
#106

Paint3D: Paint Anything 3D with Lighting-Less Texture Diffusion Models

Xianfang Zeng, Xin Chen, Zhongqi Qi et al.

CVPR 2024posterarXiv:2312.13913
108
citations
#107

SimDA: Simple Diffusion Adapter for Efficient Video Generation

Zhen Xing, Qi Dai, Han Hu et al.

CVPR 2024posterarXiv:2308.09710
106
citations
#108

FreGS: 3D Gaussian Splatting with Progressive Frequency Regularization

Jiahui Zhang, Fangneng Zhan, MUYU XU et al.

CVPR 2024posterarXiv:2403.06908
106
citations
#109

OMG-Seg: Is One Model Good Enough For All Segmentation?

Xiangtai Li, Haobo Yuan, Wei Li et al.

CVPR 2024posterarXiv:2401.10229
106
citations
#110

GigaPose: Fast and Robust Novel Object Pose Estimation via One Correspondence

Van Nguyen Nguyen, Thibault Groueix, Mathieu Salzmann et al.

CVPR 2024posterarXiv:2311.14155
105
citations
#111

PromptAD: Learning Prompts with only Normal Samples for Few-Shot Anomaly Detection

Xiaofan Li, Zhizhong Zhang, Xin Tan et al.

CVPR 2024posterarXiv:2404.05231
104
citations
#112

RegionPLC: Regional Point-Language Contrastive Learning for Open-World 3D Scene Understanding

Jihan Yang, Runyu Ding, Weipeng DENG et al.

CVPR 2024posterarXiv:2304.00962
103
citations
#113

Zero-Reference Low-Light Enhancement via Physical Quadruple Priors

Wenjing Wang, Huan Yang, Jianlong Fu et al.

CVPR 2024posterarXiv:2403.12933
101
citations
#114

PLGSLAM: Progressive Neural Scene Represenation with Local to Global Bundle Adjustment

Tianchen Deng, Guole Shen, Tong Qin et al.

CVPR 2024posterarXiv:2312.09866
101
citations
#115

Stronger Fewer & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation

ZHIXIANG WEI, Lin Chen, Xiaoxiao Ma et al.

CVPR 2024posterarXiv:2312.04265
100
citations
#116

3D Geometry-Aware Deformable Gaussian Splatting for Dynamic View Synthesis

Zhicheng Lu, xiang guo, Le Hui et al.

CVPR 2024posterarXiv:2404.06270
99
citations
#117

LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding

Chuwei Luo, Yufan Shen, Zhaoqing Zhu et al.

CVPR 2024posterarXiv:2404.05225
98
citations
#118

SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design

Seokju Yun, Youngmin Ro

CVPR 2024posterarXiv:2401.16456
97
citations
#119

GARField: Group Anything with Radiance Fields

Chung Min Kim, Mingxuan Wu, Justin Kerr et al.

CVPR 2024posterarXiv:2401.09419
96
citations
#120

Single-Model and Any-Modality for Video Object Tracking

Zongwei Wu, Jilai Zheng, Xiangxuan Ren et al.

CVPR 2024posterarXiv:2311.15851
96
citations
#121

HIPTrack: Visual Tracking with Historical Prompts

Wenrui Cai, Qingjie Liu, Yunhong Wang

CVPR 2024posterarXiv:2311.02072
96
citations
#122

Space-Time Diffusion Features for Zero-Shot Text-Driven Motion Transfer

Rafail Fridman, Danah Yatim, Omer Bar-Tal et al.

CVPR 2024posterarXiv:2311.17009
96
citations
#123

Self-correcting LLM-controlled Diffusion Models

Tsung-Han Wu, Long Lian, Joseph Gonzalez et al.

CVPR 2024posterarXiv:2311.16090
95
citations
#124

Generative Image Dynamics

Zhengqi Li, Richard Tucker, Noah Snavely et al.

CVPR 2024posterarXiv:2309.07906
93
citations
#125

SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation

Bin Xie, Jiale Cao, Jin Xie et al.

CVPR 2024posterarXiv:2311.15537
90
citations
#126

Bayes' Rays: Uncertainty Quantification for Neural Radiance Fields

Leili Goli, Cody Reading, Silvia Sellán et al.

CVPR 2024highlightarXiv:2309.03185
89
citations
#127

DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing

Chong Mou, Xintao Wang, Jiechong Song et al.

CVPR 2024posterarXiv:2402.02583
89
citations
#128

VidToMe: Video Token Merging for Zero-Shot Video Editing

Xirui Li, Chao Ma, Xiaokang Yang et al.

CVPR 2024posterarXiv:2312.10656
89
citations
#129

Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation

Sihan liu, Yiwei Ma, Xiaoqing Zhang et al.

CVPR 2024posterarXiv:2312.12470
89
citations
#130

GES : Generalized Exponential Splatting for Efficient Radiance Field Rendering

Abdullah J Hamdi, Luke Melas-Kyriazi, Jinjie Mai et al.

CVPR 2024posterarXiv:2402.10128
88
citations
#131

Boosting Adversarial Transferability by Block Shuffle and Rotation

Kunyu Wang, he xuanran, Wenxuan Wang et al.

CVPR 2024posterarXiv:2308.10299
88
citations
#132

DiffuseMix: Label-Preserving Data Augmentation with Diffusion Models

Khawar Islam, Muhammad Zaigham Zaheer, Arif Mahmood et al.

CVPR 2024posterarXiv:2405.14881
85
citations
#133

ZeroNVS: Zero-Shot 360-Degree View Synthesis from a Single Image

Kyle Sargent, Zizhang Li, Tanmay Shah et al.

CVPR 2024posterarXiv:2310.17994
85
citations
#134

AEROBLADE: Training-Free Detection of Latent Diffusion Images Using Autoencoder Reconstruction Error

Jonas Ricker, Denis Lukovnikov, Asja Fischer

CVPR 2024posterarXiv:2401.17879
85
citations
#135

MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training

Pavan Kumar Anasosalu Vasu, Hadi Pouransari, Fartash Faghri et al.

CVPR 2024posterarXiv:2311.17049
84
citations
#136

Infinigen Indoors: Photorealistic Indoor Scenes using Procedural Generation

Alexander Raistrick, Lingjie Mei, Karhan Kayan et al.

CVPR 2024posterarXiv:2406.11824
84
citations
#137

EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World

Yifei Huang, Guo Chen, Jilan Xu et al.

CVPR 2024posterarXiv:2403.16182
84
citations
#138

TUMTraf V2X Cooperative Perception Dataset

Walter Zimmer, Gerhard Arya Wardana, Suren Sritharan et al.

CVPR 2024posterarXiv:2403.01316
83
citations
#139

Bidirectional Multi-Scale Implicit Neural Representations for Image Deraining

Xiang Chen, Jinshan Pan, Jiangxin Dong

CVPR 2024posterarXiv:2404.01547
83
citations
#140

Event Stream-based Visual Object Tracking: A High-Resolution Benchmark Dataset and A Novel Baseline

Xiao Wang, Shiao Wang, Chuanming Tang et al.

CVPR 2024posterarXiv:2309.14611
82
citations
#141

Video ReCap: Recursive Captioning of Hour-Long Videos

Md Mohaiminul Islam, Vu Bao Ngan Ho, Xitong Yang et al.

CVPR 2024posterarXiv:2402.13250
82
citations
#142

IS-Fusion: Instance-Scene Collaborative Fusion for Multimodal 3D Object Detection

Junbo Yin, Wenguan Wang, Runnan Chen et al.

CVPR 2024highlightarXiv:2403.15241
81
citations
#143

InitNO: Boosting Text-to-Image Diffusion Models via Initial Noise Optimization

Xiefan Guo, Jinlin Liu, Miaomiao Cui et al.

CVPR 2024posterarXiv:2404.04650
81
citations
#144

HiFi4G: High-Fidelity Human Performance Rendering via Compact Gaussian Splatting

Yuheng Jiang, Zhehao Shen, Penghao Wang et al.

CVPR 2024posterarXiv:2312.03461
80
citations
#145

On the Robustness of Large Multimodal Models Against Image Adversarial Attacks

Xuanming Cui, Alejandro Aparcedo, Young Kyun Jang et al.

CVPR 2024posterarXiv:2312.03777
80
citations
#146

NIFTY: Neural Object Interaction Fields for Guided Human Motion Synthesis

Nilesh Kulkarni, Davis Rempe, Kyle Genova et al.

CVPR 2024posterarXiv:2307.07511
80
citations
#147

InstructVideo: Instructing Video Diffusion Models with Human Feedback

Hangjie Yuan, Shiwei Zhang, Xiang Wang et al.

CVPR 2024posterarXiv:2312.12490
80
citations
#148

General Object Foundation Model for Images and Videos at Scale

Junfeng Wu, Yi Jiang, Qihao Liu et al.

CVPR 2024highlightarXiv:2312.09158
79
citations
#149

FlashAvatar: High-fidelity Head Avatar with Efficient Gaussian Embedding

Jun Xiang, Xuan Gao, Yudong Guo et al.

CVPR 2024posterarXiv:2312.02214
78
citations
#150

EMAGE: Towards Unified Holistic Co-Speech Gesture Generation via Expressive Masked Audio Gesture Modeling

Haiyang Liu, Zihao Zhu, Giorgio Becherini et al.

CVPR 2024posterarXiv:2401.00374
78
citations
#151

CLIP-KD: An Empirical Study of CLIP Model Distillation

Chuanguang Yang, Zhulin An, Libo Huang et al.

CVPR 2024posterarXiv:2307.12732
78
citations
#152

Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery

Mubashir Noman, Muzammal Naseer, Hisham Cholakkal et al.

CVPR 2024posterarXiv:2403.05419
78
citations
#153

Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement

Xiuquan Hou, Meiqin Liu, Senlin Zhang et al.

CVPR 2024posterarXiv:2403.16131
78
citations
#154

Ranni: Taming Text-to-Image Diffusion for Accurate Instruction Following

Yutong Feng, Biao Gong, Di Chen et al.

CVPR 2024posterarXiv:2311.17002
78
citations
#155

Move as You Say Interact as You Can: Language-guided Human Motion Generation with Scene Affordance

Zan Wang, Yixin Chen, Baoxiong Jia et al.

CVPR 2024highlightarXiv:2403.18036
78
citations
#156

Towards Large-scale 3D Representation Learning with Multi-dataset Point Prompt Training

Xiaoyang Wu, Zhuotao Tian, Xin Wen et al.

CVPR 2024posterarXiv:2308.09718
77
citations
#157

CCEdit: Creative and Controllable Video Editing via Diffusion Models

Ruoyu Feng, Wenming Weng, Yanhui Wang et al.

CVPR 2024posterarXiv:2309.16496
77
citations
#158

MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception

Yiran Qin, Enshen Zhou, Qichang Liu et al.

CVPR 2024posterarXiv:2312.07472
76
citations
#159

Transcending the Limit of Local Window: Advanced Super-Resolution Transformer with Adaptive Token Dictionary

Leheng Zhang, Yawei Li, Xingyu Zhou et al.

CVPR 2024posterarXiv:2401.08209
76
citations
#160

Learning Multi-Dimensional Human Preference for Text-to-Image Generation

Sixian Zhang, Bohan Wang, Junqiang Wu et al.

CVPR 2024posterarXiv:2405.14705
76
citations
#161

SAI3D: Segment Any Instance in 3D Scenes

Yingda Yin, Yuzheng Liu, Yang Xiao et al.

CVPR 2024posterarXiv:2312.11557
76
citations
#162

GROUNDHOG: Grounding Large Language Models to Holistic Segmentation

Yichi Zhang, Ziqiao Ma, Xiaofeng Gao et al.

CVPR 2024posterarXiv:2402.16846
75
citations
#163

Structure-Aware Sparse-View X-ray 3D Reconstruction

Yuanhao Cai, Jiahao Wang, Alan L. Yuille et al.

CVPR 2024posterarXiv:2311.10959
75
citations
#164

Dynamic Graph Representation with Knowledge-aware Attention for Histopathology Whole Slide Image Analysis

Jiawen Li, Yuxuan Chen, Hongbo Chu et al.

CVPR 2024posterarXiv:2403.07719
75
citations
#165

Paint-it: Text-to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering

Kim Youwang, Tae-Hyun Oh, Gerard Pons-Moll

CVPR 2024posterarXiv:2312.11360
75
citations
#166

CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition

Feng Lu, Xiangyuan Lan, Lijun Zhang et al.

CVPR 2024posterarXiv:2402.19231
75
citations
#167

Grounding Everything: Emerging Localization Properties in Vision-Language Transformers

Walid Bousselham, Felix Petersen, Vittorio Ferrari et al.

CVPR 2024posterarXiv:2312.00878
74
citations
#168

Morphological Prototyping for Unsupervised Slide Representation Learning in Computational Pathology

Andrew Song, Richard J. Chen, Tong Ding et al.

CVPR 2024posterarXiv:2405.11643
74
citations
#169

LLaFS: When Large Language Models Meet Few-Shot Segmentation

Lanyun Zhu, Tianrun Chen, Deyi Ji et al.

CVPR 2024posterarXiv:2311.16926
73
citations
#170

COTR: Compact Occupancy TRansformer for Vision-based 3D Occupancy Prediction

Qihang Ma, Xin Tan, Yanyun Qu et al.

CVPR 2024posterarXiv:2312.01919
73
citations
#171

MoCha-Stereo: Motif Channel Attention Network for Stereo Matching

Ziyang Chen, Wei Long, He Yao et al.

CVPR 2024posterarXiv:2404.06842
72
citations
#172

ConsistNet: Enforcing 3D Consistency for Multi-view Images Diffusion

Jiayu Yang, Ziang Cheng, Yunfei Duan et al.

CVPR 2024posterarXiv:2310.10343
72
citations
#173

Depth Information Assisted Collaborative Mutual Promotion Network for Single Image Dehazing

Yafei Zhang, Shen Zhou, Huafeng Li

CVPR 2024posterarXiv:2403.01105
71
citations
#174

From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations

Evonne Ng, Javier Romero, Timur Bagautdinov et al.

CVPR 2024posterarXiv:2401.01885
71
citations
#175

CoSeR: Bridging Image and Language for Cognitive Super-Resolution

Haoze Sun, Wenbo Li, Jianzhuang Liu et al.

CVPR 2024posterarXiv:2311.16512
71
citations
#176

MCD: Diverse Large-Scale Multi-Campus Dataset for Robot Perception

Thien-Minh Nguyen, Shenghai Yuan, Thien Nguyen et al.

CVPR 2024highlightarXiv:2403.11496
70
citations
#177

Text Prompt with Normality Guidance for Weakly Supervised Video Anomaly Detection

Zhiwei Yang, Jing Liu, Peng Wu

CVPR 2024posterarXiv:2404.08531
70
citations
#178

VoCo: A Simple-yet-Effective Volume Contrastive Learning Framework for 3D Medical Image Analysis

Linshan Wu, Jia-Xin Zhuang, Hao Chen

CVPR 2024posterarXiv:2402.17300
70
citations
#179

OpenBias: Open-set Bias Detection in Text-to-Image Generative Models

Moreno D&#x27, Incà, Elia Peruzzo et al.

CVPR 2024highlight
69
citations
#180

GAvatar: Animatable 3D Gaussian Avatars with Implicit Mesh Learning

Ye Yuan, Xueting Li, Yangyi Huang et al.

CVPR 2024highlightarXiv:2312.11461
69
citations
#181

SiTH: Single-view Textured Human Reconstruction with Image-Conditioned Diffusion

Hsuan-I Ho, Jie Song, Otmar Hilliges

CVPR 2024posterarXiv:2311.15855
69
citations
#182

OmniGlue: Generalizable Feature Matching with Foundation Model Guidance

Hanwen Jiang, Arjun Karpur, Bingyi Cao et al.

CVPR 2024posterarXiv:2405.12979
68
citations
#183

Optimizing Diffusion Noise Can Serve As Universal Motion Priors

Korrawe Karunratanakul, Konpat Preechakul, Emre Aksan et al.

CVPR 2024posterarXiv:2312.11994
68
citations
#184

BadCLIP: Trigger-Aware Prompt Learning for Backdoor Attacks on CLIP

Jiawang Bai, Kuofeng Gao, Shaobo Min et al.

CVPR 2024posterarXiv:2311.16194
68
citations
#185

Free3D: Consistent Novel View Synthesis without 3D Representation

Chuanxia Zheng, Andrea Vedaldi

CVPR 2024posterarXiv:2312.04551
68
citations
#186

Scaling Laws for Data Filtering— Data Curation cannot be Compute Agnostic

Sachin Goyal, Pratyush Maini, Zachary Lipton et al.

CVPR 2024posterarXiv:2404.07177
67
citations
#187

OmniSeg3D: Omniversal 3D Segmentation via Hierarchical Contrastive Learning

Haiyang Ying, Yixuan Yin, Jinzhi Zhang et al.

CVPR 2024posterarXiv:2311.11666
66
citations
#188

DiffMorpher: Unleashing the Capability of Diffusion Models for Image Morphing

Kaiwen Zhang, Yifan Zhou, Xudong XU et al.

CVPR 2024posterarXiv:2312.07409
66
citations
#189

FINER: Flexible Spectral-bias Tuning in Implicit NEural Representation by Variable-periodic Activation Functions

Zhen Liu, Hao Zhu, Qi Zhang et al.

CVPR 2024posterarXiv:2312.02434
66
citations
#190

Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation

Shanshan Zhong, Zhongzhan Huang, Shanghua Gao et al.

CVPR 2024posterarXiv:2312.02439
65
citations
#191

Open-Vocabulary Video Anomaly Detection

Peng Wu, Xuerong Zhou, Guansong Pang et al.

CVPR 2024posterarXiv:2311.07042
64
citations
#192

NeRF On-the-go: Exploiting Uncertainty for Distractor-free NeRFs in the Wild

Weining Ren, Zihan Zhu, Boyang Sun et al.

CVPR 2024posterarXiv:2405.18715
64
citations
#193

MonoCD: Monocular 3D Object Detection with Complementary Depths

Longfei Yan, Pei Yan, Shengzhou Xiong et al.

CVPR 2024posterarXiv:2404.03181
64
citations
#194

SkillDiffuser: Interpretable Hierarchical Planning via Skill Abstractions in Diffusion-Based Task Execution

Zhixuan Liang, Yao Mu, Hengbo Ma et al.

CVPR 2024posterarXiv:2312.11598
64
citations
#195

Decoupling Static and Hierarchical Motion Perception for Referring Video Segmentation

Shuting He, Henghui Ding

CVPR 2024posterarXiv:2404.03645
64
citations
#196

Improved Zero-Shot Classification by Adapting VLMs with Text Descriptions

Oindrila Saha, Grant Horn, Subhransu Maji

CVPR 2024posterarXiv:2401.02460
64
citations
#197

IMPRINT: Generative Object Compositing by Learning Identity-Preserving Representation

Yizhi Song, Zhifei Zhang, Zhe Lin et al.

CVPR 2024posterarXiv:2403.10701
63
citations
#198

VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence

Yuchao Gu, Yipin Zhou, Bichen Wu et al.

CVPR 2024posterarXiv:2312.02087
63
citations
#199

Focus on Your Instruction: Fine-grained and Multi-instruction Image Editing by Attention Modulation

guo, Tianwei Lin

CVPR 2024posterarXiv:2312.10113
63
citations
#200

Text Is MASS: Modeling as Stochastic Embedding for Text-Video Retrieval

Jiamian Wang, Guohao Sun, Pichao Wang et al.

CVPR 2024highlightarXiv:2403.17998
63
citations
PreviousNext