Most Cited CVPR "multivariate prompt distributions" Papers

5,589 papers found • Page 7 of 28

#1201

Beyond Text: Frozen Large Language Models in Visual Signal Comprehension

Lei Zhu, Fangyun Wei, Yanye Lu

CVPR 2024 • arXiv:2403.07874 • 30 citations
#1202

3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination

Jianing "Jed" Yang, Xuweiyi Chen, Nikhil Madaan et al.

CVPR 2025 • arXiv:2406.05132 • 30 citations
#1203

Attention-Driven Training-Free Efficiency Enhancement of Diffusion Models

Hongjie Wang, Difan Liu, Yan Kang et al.

CVPR 2024 • arXiv:2405.05252 • 30 citations
#1204

iKUN: Speak to Trackers without Retraining

Yunhao Du, Cheng Lei, Zhicheng Zhao et al.

CVPR 2024 • arXiv:2312.16245 • 30 citations
#1205

EmoGen: Emotional Image Content Generation with Text-to-Image Diffusion Models

Jingyuan Yang, Jiawei Feng, Hui Huang

CVPR 2024 • arXiv:2401.04608 • 30 citations
#1206

Motion-adaptive Separable Collaborative Filters for Blind Motion Deblurring

Chengxu Liu, Xuan Wang, Xiangyu Xu et al.

CVPR 2024 • arXiv:2404.13153 • 30 citations
#1207

Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation

Wenhao Li, Mengyuan Liu, Hong Liu et al.

CVPR 2024 • highlight • arXiv:2311.12028 • 30 citations
#1208

3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features

Chenfeng Xu, Huan Ling, Sanja Fidler et al.

CVPR 2024 • arXiv:2311.04391 • 30 citations
#1209

THRONE: An Object-based Hallucination Benchmark for the Free-form Generations of Large Vision-Language Models

Prannay Kaul, Zhizhong Li, Hao Yang et al.

CVPR 2024 • arXiv:2405.05256 • 30 citations
#1210

Adversarial Diffusion Compression for Real-World Image Super-Resolution

Bin Chen, Gehui Li, Rongyuan Wu et al.

CVPR 2025 • arXiv:2411.13383 • 30 citations
#1211

Holo-Relighting: Controllable Volumetric Portrait Relighting from a Single Image

Yiqun Mei, Yu Zeng, He Zhang et al.

CVPR 2024 • arXiv:2403.09632 • 30 citations
#1212

A Conditional Denoising Diffusion Probabilistic Model for Point Cloud Upsampling

Wentao Qu, Yuantian Shao, Lingwu Meng et al.

CVPR 2024 • arXiv:2312.02719 • 30 citations
#1213

LEOD: Label-Efficient Object Detection for Event Cameras

Ziyi Wu, Mathias Gehrig, Qing Lyu et al.

CVPR 2024 • arXiv:2311.17286 • 30 citations
#1214

MINIMA: Modality Invariant Image Matching

Jiangwei Ren, Xingyu Jiang, Zizhuo Li et al.

CVPR 2025 • arXiv:2412.19412 • 30 citations
#1215

Adapting to Length Shift: FlexiLength Network for Trajectory Prediction

Yi Xu, Yun Fu

CVPR 2024 • arXiv:2404.00742 • 29 citations
#1216

UDiFF: Generating Conditional Unsigned Distance Fields with Optimal Wavelet Diffusion

Junsheng Zhou, Weiqi Zhang, Baorui Ma et al.

CVPR 2024 • arXiv:2404.06851 • 29 citations
#1217

SlideChat: A Large Vision-Language Assistant for Whole-Slide Pathology Image Understanding

Ying Chen, Guoan Wang, Yuanfeng Ji et al.

CVPR 2025 • arXiv:2410.11761 • 29 citations
#1218

LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences

Hongyan Zhi, Peihao Chen, Junyan Li et al.

CVPR 2025 • arXiv:2412.01292 • 29 citations
#1219

Decomposing Disease Descriptions for Enhanced Pathology Detection: A Multi-Aspect Vision-Language Pre-training Framework

Vu Minh Hieu Phan, Yutong Xie, Yuankai Qi et al.

CVPR 2024 • arXiv:2403.07636 • 29 citations
#1220

The Devil is in the Fine-Grained Details: Evaluating Open-Vocabulary Object Detectors for Fine-Grained Understanding

Lorenzo Bianchi, Fabio Carrara, Nicola Messina et al.

CVPR 2024 • highlight • arXiv:2311.17518 • 29 citations
#1221

Paint by Inpaint: Learning to Add Image Objects by Removing Them First

Navve Wasserman, Noam Rotstein, Roy Ganz et al.

CVPR 2025 • arXiv:2404.18212 • 29 citations
#1222

Adapt Before Comparison: A New Perspective on Cross-Domain Few-Shot Segmentation

Jonas Herzog

CVPR 2024 • arXiv:2402.17614 • 29 citations
#1223

Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models

Pablo Marcos-Manchón, Roberto Alcover-Couso, Juan SanMiguel et al.

CVPR 2024 • arXiv:2403.14291 • 29 citations
#1224

WANDR: Intention-guided Human Motion Generation

Markos Diomataris, Nikos Athanasiou, Omid Taheri et al.

CVPR 2024 • arXiv:2404.15383 • 29 citations
#1225

Sparse Semi-DETR: Sparse Learnable Queries for Semi-Supervised Object Detection

Tahira Shehzadi, Khurram Azeem Hashmi, Didier Stricker et al.

CVPR 2024 • arXiv:2404.01819 • 29 citations
#1226

Elite360D: Towards Efficient 360 Depth Estimation via Semantic- and Distance-Aware Bi-Projection Fusion

Hao Ai, Addison, Lin Wang

CVPR 2024 • arXiv:2403.16376 • 29 citations
#1227

Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Compositional Understanding

Le Zhang, Rabiul Awal, Aishwarya Agrawal

CVPR 2024 • arXiv:2306.08832 • 29 citations
#1228

Boosting Neural Representations for Videos with a Conditional Decoder

Xinjie Zhang, Ren Yang, Dailan He et al.

CVPR 2024 • highlight • arXiv:2402.18152 • 29 citations
#1229

Visual-Augmented Dynamic Semantic Prototype for Generative Zero-Shot Learning

Wenjin Hou, Shiming Chen, Shuhuang Chen et al.

CVPR 2024 • arXiv:2404.14808 • 29 citations
#1230

NARUTO: Neural Active Reconstruction from Uncertain Target Observations

Ziyue Feng, Huangying Zhan, Zheng Chen et al.

CVPR 2024 • arXiv:2402.18771 • 29 citations
#1231

LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity

Hongjie Wang, Chih-Yao Ma, Yen-Cheng Liu et al.

CVPR 2025 • arXiv:2412.09856 • 29 citations
#1232

Distilling Multi-modal Large Language Models for Autonomous Driving

Deepti Hegde, Rajeev Yasarla, Hong Cai et al.

CVPR 2025 • arXiv:2501.09757 • 29 citations
#1233

A Simple Baseline for Efficient Hand Mesh Reconstruction

Zhishan Zhou, Shihao Zhou, Zhi Lv et al.

CVPR 2024 • arXiv:2403.01813 • 29 citations
#1234

Auto-Train-Once: Controller Network Guided Automatic Network Pruning from Scratch

Xidong Wu, Shangqian Gao, Zeyu Zhang et al.

CVPR 2024 • arXiv:2403.14729 • 29 citations
#1235

Domain-Rectifying Adapter for Cross-Domain Few-Shot Segmentation

Jiapeng Su, Qi Fan, Wenjie Pei et al.

CVPR 2024 • arXiv:2404.10322 • 29 citations
#1236

DyFo: A Training-Free Dynamic Focus Visual Search for Enhancing LMMs in Fine-Grained Visual Understanding

Geng Li, Jinglin Xu, Yunzhen Zhao et al.

CVPR 2025 • highlight • arXiv:2504.14920 • 29 citations
#1237

DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks

Jiaxin Zhang, Dezhi Peng, Chongyu Liu et al.

CVPR 2024 • arXiv:2405.04408 • 29 citations
#1238

A Bias-Free Training Paradigm for More General AI-generated Image Detection

Fabrizio Guillaro, Giada Zingarini, Ben Usman et al.

CVPR 2025 • arXiv:2412.17671 • 29 citations
#1239

I'M HOI: Inertia-aware Monocular Capture of 3D Human-Object Interactions

Chengfeng Zhao, Juze Zhang, Jiashen Du et al.

CVPR 2024 • arXiv:2312.08869 • 29 citations
#1240

Task-Driven Exploration: Decoupling and Inter-Task Feedback for Joint Moment Retrieval and Highlight Detection

Jin Yang, Ping Wei, Huan Li et al.

CVPR 2024 • arXiv:2404.09263 • 29 citations
#1241

Prompt-Enhanced Multiple Instance Learning for Weakly Supervised Video Anomaly Detection

Junxi Chen, Liang Li, Li Su et al.

CVPR 2024 • 29 citations
#1242

All in One Framework for Multimodal Re-identification in the Wild

He Li, Mang Ye, Ming Zhang et al.

CVPR 2024 • arXiv:2405.04741 • 29 citations
#1243

Interleaved-Modal Chain-of-Thought

Jun Gao, Yongqi Li, Ziqiang Cao et al.

CVPR 2025 • arXiv:2411.19488 • 29 citations
#1244

Neural Refinement for Absolute Pose Regression with Feature Synthesis

Shuai Chen, Yash Bhalgat, Xinghui Li et al.

CVPR 2024 • arXiv:2303.10087 • 29 citations
#1245

Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion Model

Xu He, Qiaochu Huang, Zhensong Zhang et al.

CVPR 2024 • arXiv:2404.01862 • 29 citations
#1246

Insect-Foundation: A Foundation Model and Large-scale 1M Dataset for Visual Insect Understanding

Hoang-Quan Nguyen, Thanh-Dat Truong, Xuan-Bac Nguyen et al.

CVPR 2024 • highlight • arXiv:2311.15206 • 29 citations
#1247

Arbitrary-Scale Image Generation and Upsampling using Latent Diffusion Model and Implicit Neural Decoder

Jinseok Kim, Tae-Kyun Kim

CVPR 2024 • arXiv:2403.10255 • 29 citations
#1248

Self-Supervised Facial Representation Learning with Facial Region Awareness

Zheng Gao, Ioannis Patras

CVPR 2024 • arXiv:2403.02138 • 29 citations
#1249

Revisiting Non-Autoregressive Transformers for Efficient Image Synthesis

Zanlin Ni, Yulin Wang, Renping Zhou et al.

CVPR 2024 • arXiv:2406.05478 • 28 citations
#1250

MVIP-NeRF: Multi-view 3D Inpainting on NeRF Scenes via Diffusion Prior

Honghua Chen, Chen Change Loy, Xingang Pan

CVPR 2024 • arXiv:2405.02859 • 28 citations
#1251

Text-to-3D Generation with Bidirectional Diffusion using both 2D and 3D priors

Lihe Ding, Shaocong Dong, Zhanpeng Huang et al.

CVPR 2024 • arXiv:2312.04963 • 28 citations
#1252

DiffAssemble: A Unified Graph-Diffusion Model for 2D and 3D Reassembly

Gianluca Scarpellini, Stefano Fiorini, Francesco Giuliari et al.

CVPR 2024 • arXiv:2402.19302 • 28 citations
#1253

ECLIPSE: Efficient Continual Learning in Panoptic Segmentation with Visual Prompt Tuning

Beomyoung Kim, Joonsang Yu, Sung Ju Hwang

CVPR 2024 • arXiv:2403.20126 • 28 citations
#1254

Retrieval-Augmented Embodied Agents

Yichen Zhu, Zhicai Ou, Xiaofeng Mou et al.

CVPR 2024 • arXiv:2404.11699 • 28 citations
#1255

MatchU: Matching Unseen Objects for 6D Pose Estimation from RGB-D Images

Junwen Huang, Hao Yu, Kuan-Ting Yu et al.

CVPR 2024 • arXiv:2403.01517 • 28 citations
#1256

DropGaussian: Structural Regularization for Sparse-view Gaussian Splatting

Hyunwoo Park, Gun Ryu, Wonjun Kim

CVPR 2025 • arXiv:2504.00773 • 28 citations
#1257

SPIN: Simultaneous Perception, Interaction and Navigation

Shagun Uppal, Ananye Agarwal, Haoyu Xiong et al.

CVPR 2024 • arXiv:2405.07991 • 28 citations
#1258

What’s in the Image? A Deep-Dive into the Vision of Vision Language Models

Omri Kaduri, Shai Bagon, Tali Dekel

CVPR 2025 • arXiv:2411.17491 • 28 citations
#1259

Nearest is Not Dearest: Towards Practical Defense against Quantization-conditioned Backdoor Attacks

Boheng Li, Yishuo Cai, Haowei Li et al.

CVPR 2024 • arXiv:2405.12725 • 28 citations
#1260

HIG: Hierarchical Interlacement Graph Approach to Scene Graph Generation in Video Understanding

Trong-Thuan Nguyen, Pha Nguyen, Khoa Luu

CVPR 2024 • arXiv:2312.03050 • 28 citations
#1261

Unifying Correspondence, Pose and NeRF for Generalized Pose-Free Novel View Synthesis

Sunghwan Hong, Jaewoo Jung, Heeseong Shin et al.

CVPR 2024 • highlight • 28 citations
#1262

Sat2Scene: 3D Urban Scene Generation from Satellite Images with Diffusion

Zuoyue Li, Zhenqiang Li, Zhaopeng Cui et al.

CVPR 2024 • highlight • arXiv:2401.10786 • 28 citations
#1263

TextCraftor: Your Text Encoder Can be Image Quality Controller

Yanyu Li, Xian Liu, Anil Kag et al.

CVPR 2024 • arXiv:2403.18978 • 28 citations
#1264

Vid2Sim: Realistic and Interactive Simulation from Video for Urban Navigation

Ziyang Xie, Zhizheng Liu, Zhenghao Peng et al.

CVPR 2025 • arXiv:2501.06693 • 28 citations
#1265

Sculpt3D: Multi-View Consistent Text-to-3D Generation with Sparse 3D Prior

Chen Cheng, Xiaofeng Yang, Fan Yang et al.

CVPR 2024 • arXiv:2403.09140 • 28 citations
#1266

BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding

Shuming Liu, Chen Zhao, Tianqi Xu et al.

CVPR 2025 • arXiv:2503.21483 • 28 citations
#1267

GP-NeRF: Generalized Perception NeRF for Context-Aware 3D Scene Understanding

Hao Li, Dingwen Zhang, Yalun Dai et al.

CVPR 2024 • highlight • arXiv:2311.11863 • 28 citations
#1268

TokenHSI: Unified Synthesis of Physical Human-Scene Interactions through Task Tokenization

Liang Pan, Zeshi Yang, Zhiyang Dou et al.

CVPR 2025 • arXiv:2503.19901 • 28 citations
#1269

G3Flow: Generative 3D Semantic Flow for Pose-aware and Generalizable Object Manipulation

Tianxing Chen, Yao Mu, Zhixuan Liang et al.

CVPR 2025 • arXiv:2411.18369 • 28 citations
#1270

DiPrompT: Disentangled Prompt Tuning for Multiple Latent Domain Generalization in Federated Learning

Sikai Bai, Jie Zhang, Song Guo et al.

CVPR 2024 • arXiv:2403.08506 • 28 citations
#1271

Exploring the Potential of Large Foundation Models for Open-Vocabulary HOI Detection

Ting Lei, Shaofeng Yin, Yang Liu

CVPR 2024 • arXiv:2404.06194 • 28 citations
#1272

3D Paintbrush: Local Stylization of 3D Shapes with Cascaded Score Distillation

Dale Decatur, Itai Lang, Kfir Aberman et al.

CVPR 2024 • arXiv:2311.09571 • 28 citations
#1273

MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation

Yanhui Wang, Jianmin Bao, Wenming Weng et al.

CVPR 2024 • highlight • arXiv:2311.18829 • 28 citations
#1274

Each Test Image Deserves A Specific Prompt: Continual Test-Time Adaptation for 2D Medical Image Segmentation

Ziyang Chen, Yongsheng Pan, Yiwen Ye et al.

CVPR 2024 • arXiv:2311.18363 • 28 citations
#1275

TEA: Test-time Energy Adaptation

Yige Yuan, Bingbing Xu, Liang Hou et al.

CVPR 2024 • arXiv:2311.14402 • 28 citations
#1276

LaneCPP: Continuous 3D Lane Detection using Physical Priors

Maximilian Pittner, Joel Janai, Alexandru Paul Condurache

CVPR 2024 • arXiv:2406.08381 • 28 citations
#1277

LoCoNet: Long-Short Context Network for Active Speaker Detection

Xizi Wang, Feng Cheng, Gedas Bertasius

CVPR 2024 • arXiv:2301.08237 • 28 citations
#1278

Erasing Undesirable Influence in Diffusion Models

Jing Wu, Trung Le, Munawar Hayat et al.

CVPR 2025 • arXiv:2401.05779 • 28 citations
#1279

MeGA: Hybrid Mesh-Gaussian Head Avatar for High-Fidelity Rendering and Head Editing

Cong Wang, Di Kang, Heyi Sun et al.

CVPR 2025 • arXiv:2404.19026 • 28 citations
#1280

Skeleton-in-Context: Unified Skeleton Sequence Modeling with In-Context Learning

Xinshun Wang, Zhongbin Fang, Xia Li et al.

CVPR 2024 • arXiv:2312.03703 • 28 citations
#1281

Label Propagation for Zero-shot Classification with Vision-Language Models

Vladan Stojnić, Yannis Kalantidis, Giorgos Tolias

CVPR 2024 • arXiv:2404.04072 • 28 citations
#1282

AutoPresent: Designing Structured Visuals from Scratch

Jiaxin Ge, Zora Zhiruo Wang, Xuhui Zhou et al.

CVPR 2025 • arXiv:2501.00912 • 28 citations
#1283

CyberDemo: Augmenting Simulated Human Demonstration for Real-World Dexterous Manipulation

Jun Wang, Yuzhe Qin, Kaiming Kuang et al.

CVPR 2024 • arXiv:2402.14795 • 28 citations
#1284

FMA-Net: Flow-Guided Dynamic Filtering and Iterative Feature Refinement with Multi-Attention for Joint Video Super-Resolution and Deblurring

Geunhyuk Youk, Jihyong Oh, Munchurl Kim

CVPR 2024 • arXiv:2401.03707 • 28 citations
#1285

HOIDiffusion: Generating Realistic 3D Hand-Object Interaction Data

Mengqi Zhang, Yang Fu, Zheng Ding et al.

CVPR 2024 • arXiv:2403.12011 • 28 citations
#1286

EfficientViM: Efficient Vision Mamba with Hidden State Mixer based State Space Duality

Sanghyeok Lee, Joonmyung Choi, Hyunwoo J. Kim

CVPR 2025 • arXiv:2411.15241 • 27 citations
#1287

M&M VTO: Multi-Garment Virtual Try-On and Editing

Luyang Zhu, Yingwei Li, Nan Liu et al.

CVPR 2024 • highlight • arXiv:2406.04542 • 27 citations
#1288

Strong Transferable Adversarial Attacks via Ensembled Asymptotically Normal Distribution Learning

Zhengwei Fang, Rui Wang, Tao Huang et al.

CVPR 2024 • highlight • arXiv:2209.11964 • 27 citations
#1289

Adaptive Multi-Modal Cross-Entropy Loss for Stereo Matching

Peng Xu, Zhiyu Xiang, Chengyu Qiao et al.

CVPR 2024 • arXiv:2306.15612 • 27 citations
#1290

Multi-modal Learning for Geospatial Vegetation Forecasting

Vitus Benson, Claire Robin, Christian Requena-Mesa et al.

CVPR 2024 • arXiv:2303.16198 • 27 citations
#1291

Diffusion-4K: Ultra-High-Resolution Image Synthesis with Latent Diffusion Models

Jinjin Zhang, Qiuyu Huang, Junjie Liu et al.

CVPR 2025 • arXiv:2503.18352 • 27 citations
#1292

CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos

Xinhao Liu, Jintong Li, Yicheng Jiang et al.

CVPR 2025 • arXiv:2411.17820 • 27 citations
#1293

Rethinking Interactive Image Segmentation with Low Latency, High Quality, and Diverse Prompts

Qin Liu, Jaemin Cho, Mohit Bansal et al.

CVPR 2024 • arXiv:2404.00741 • 27 citations
#1294

Distribution-aware Knowledge Prototyping for Non-exemplar Lifelong Person Re-identification

Kunlun Xu, Xu Zou, Yuxin Peng et al.

CVPR 2024 • 27 citations
#1295

Blind Image Quality Assessment Based on Geometric Order Learning

Nyeong-Ho Shin, Seon-Ho Lee, Chang-Su Kim

CVPR 2024 • 27 citations
#1296

A Stitch in Time Saves Nine: Small VLM is a Precise Guidance for Accelerating Large VLMs

Wangbo Zhao, Yizeng Han, Jiasheng Tang et al.

CVPR 2025 • arXiv:2412.03324 • 27 citations
#1297

Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation

Jiaming Liu, Ran Xu, Senqiao Yang et al.

CVPR 2024 • arXiv:2312.12480 • 27 citations
#1298

Sparse Global Matching for Video Frame Interpolation with Large Motion

Chunxu Liu, Guozhen Zhang, Rui Zhao et al.

CVPR 2024 • arXiv:2404.06913 • 27 citations
#1299

SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation

Claudia Cuttano, Gabriele Trivigno, Gabriele Rosi et al.

CVPR 2025 • highlight • arXiv:2411.17646 • 27 citations
#1300

Generative Region-Language Pretraining for Open-Ended Object Detection

Chuang Lin, Yi Jiang, Lizhen Qu et al.

CVPR 2024 • arXiv:2403.10191 • 27 citations
#1301

UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory

Haiwen Diao, Bo Wan, Ying Zhang et al.

CVPR 2024 • arXiv:2308.14316 • 27 citations
#1302

OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows

Shufan Li, Konstantinos Kallidromitis, Akash Gokul et al.

CVPR 2025 • arXiv:2412.01169 • 27 citations
#1303

ProMark: Proactive Diffusion Watermarking for Causal Attribution

Vishal Asnani, John Collomosse, Tu Bui et al.

CVPR 2024 • arXiv:2403.09914 • 27 citations
#1304

Prompt-Driven Dynamic Object-Centric Learning for Single Domain Generalization

Deng Li, Aming Wu, Yaowei Wang et al.

CVPR 2024 • arXiv:2402.18447 • 27 citations
#1305

AR-Diffusion: Asynchronous Video Generation with Auto-Regressive Diffusion

Mingzhen Sun, Weining Wang, Li et al.

CVPR 2025 • arXiv:2503.07418 • 27 citations
#1306

Towards Zero-Shot Anomaly Detection and Reasoning with Multimodal Large Language Models

Jiacong Xu, Shao-Yuan Lo, Bardia Safaei et al.

CVPR 2025 • highlight • arXiv:2502.07601 • 27 citations
#1307

MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models

Wenyi Hong, Yean Cheng, Zhuoyi Yang et al.

CVPR 2025 • arXiv:2501.02955 • 27 citations
#1308

Text-Conditioned Generative Model of 3D Strand-based Human Hairstyles

Vanessa Sklyarova, Egor Zakharov, Otmar Hilliges et al.

CVPR 2024 • arXiv:2312.11666 • 27 citations
#1309

Audio-Visual Segmentation via Unlabeled Frame Exploitation

Jinxiang Liu, Yikun Liu, Ferenas et al.

CVPR 2024 • arXiv:2403.11074 • 27 citations
#1310

VideoGigaGAN: Towards Detail-rich Video Super-Resolution

Yiran Xu, Taesung Park, Richard Zhang et al.

CVPR 2025 • arXiv:2404.12388 • 27 citations
#1311

No Time to Train: Empowering Non-Parametric Networks for Few-shot 3D Scene Segmentation

Xiangyang Zhu, Renrui Zhang, Bowei He et al.

CVPR 2024 • highlight • arXiv:2404.04050 • 27 citations
#1312

Efficient Visual State Space Model for Image Deblurring

Lingshun Kong, Jiangxin Dong, Jinhui Tang et al.

CVPR 2025 • arXiv:2405.14343 • 27 citations
#1313

KPConvX: Modernizing Kernel Point Convolution with Kernel Attention

Hugues Thomas, Yao-Hung Hubert Tsai, Timothy Barfoot et al.

CVPR 2024 • arXiv:2405.13194 • 27 citations
#1314

Temporally Consistent Unbalanced Optimal Transport for Unsupervised Action Segmentation

Ming Xu, Stephen Gould

CVPR 2024 • arXiv:2404.01518 • 27 citations
#1315

Synthesize Step-by-Step: Tools, Templates and LLMs as Data Generators for Reasoning-Based Chart VQA

Zhuowan Li, Bhavan Jasani, Peng Tang et al.

CVPR 2024 • arXiv:2403.16385 • 27 citations
#1316

DiffPortrait3D: Controllable Diffusion for Zero-Shot Portrait View Synthesis

Yuming Gu, Hongyi Xu, You Xie et al.

CVPR 2024 • highlight • arXiv:2312.13016 • 27 citations
#1317

Estimating Body and Hand Motion in an Ego-sensed World

Brent Yi, Vickie Ye, Maya Zheng et al.

CVPR 2025 • highlight • arXiv:2410.03665 • 27 citations
#1318

VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents

Ryota Tanaka, Taichi Iki, Taku Hasegawa et al.

CVPR 2025 • arXiv:2504.09795 • 27 citations
#1319

T2ISafety: Benchmark for Assessing Fairness, Toxicity, and Privacy in Image Generation

Lijun Li, Zhelun Shi, Xuhao Hu et al.

CVPR 2025 • arXiv:2501.12612 • 27 citations
#1320

TIM: A Time Interval Machine for Audio-Visual Action Recognition

Jacob Chalk, Jaesung Huh, Evangelos Kazakos et al.

CVPR 2024 • arXiv:2404.05559 • 27 citations
#1321

Dispel Darkness for Better Fusion: A Controllable Visual Enhancer based on Cross-modal Conditional Adversarial Learning

Hao Zhang, Linfeng Tang, Xinyu Xiang et al.

CVPR 2024 • 27 citations
#1322

Sparse Voxels Rasterization: Real-time High-fidelity Radiance Field Rendering

Cheng Sun, Jaesung Choe, Charles Loop et al.

CVPR 2025 • arXiv:2412.04459 • 27 citations
#1323

Learning Correlation Structures for Vision Transformers

Manjin Kim, Paul Hongsuck Seo, Cordelia Schmid et al.

CVPR 2024 • arXiv:2404.03924 • 27 citations
#1324

Frequency Dynamic Convolution for Dense Image Prediction

Linwei Chen, Lin Gu, Liang Li et al.

CVPR 2025 • arXiv:2503.18783 • 27 citations
#1325

Interactive Medical Image Segmentation: A Benchmark Dataset and Baseline

Junlong Cheng, Bin Fu, Jin Ye et al.

CVPR 2025 • arXiv:2411.12814 • 27 citations
#1326

Motion2VecSets: 4D Latent Vector Set Diffusion for Non-rigid Shape Reconstruction and Tracking

Wei Cao, Chang Luo, Biao Zhang et al.

CVPR 2024 • arXiv:2401.06614 • 27 citations
#1327

Your ViT is Secretly an Image Segmentation Model

Tommie Kerssies, Niccolò Cavagnero, Alexander Hermans et al.

CVPR 2025 • highlight • arXiv:2503.19108 • 26 citations
#1328

MULAN: A Multi Layer Annotated Dataset for Controllable Text-to-Image Generation

Petru-Daniel Tudosiu, Yongxin Yang, Shifeng Zhang et al.

CVPR 2024 • arXiv:2404.02790 • 26 citations
#1329

Rethinking Prior Information Generation with CLIP for Few-Shot Segmentation

Jin Wang, Bingfeng Zhang, Jian Pang et al.

CVPR 2024 • arXiv:2405.08458 • 26 citations
#1330

GALA: Generating Animatable Layered Assets from a Single Scan

Taeksoo Kim, Byungjun Kim, Shunsuke Saito et al.

CVPR 2024 • arXiv:2401.12979 • 26 citations
#1331

MagicQuill: An Intelligent Interactive Image Editing System

Zichen Liu, Yue Yu, Hao Ouyang et al.

CVPR 2025 • arXiv:2411.09703 • 26 citations
#1332

XLRS-Bench: Could Your Multimodal LLMs Understand Extremely Large Ultra-High-Resolution Remote Sensing Imagery?

Fengxiang Wang, Hongzhen Wang, Zonghao Guo et al.

CVPR 2025 • highlight • arXiv:2503.23771 • 26 citations
#1333

Bayesian Prompt Flow Learning for Zero-Shot Anomaly Detection

Zhen Qu, Xian Tao, Xinyi Gong et al.

CVPR 2025 • arXiv:2503.10080 • 26 citations
#1334

Generative Multi-modal Models are Good Class Incremental Learners

Xusheng Cao, Haori Lu, Linlan Huang et al.

CVPR 2024 • arXiv:2403.18383 • 26 citations
#1335

HiKER-SGG: Hierarchical Knowledge Enhanced Robust Scene Graph Generation

Ce Zhang, Simon Stepputtis, Joseph Campbell et al.

CVPR 2024 • arXiv:2403.12033 • 26 citations
#1336

FineVQ: Fine-Grained User Generated Content Video Quality Assessment

Huiyu Duan, Qiang Hu, Wang Jiarui et al.

CVPR 2025 • highlight • arXiv:2412.19238 • 26 citations
#1337

DS-NeRV: Implicit Neural Video Representation with Decomposed Static and Dynamic Codes

Hao Yan, Zhihui Ke, Xiaobo Zhou et al.

CVPR 2024 • arXiv:2403.15679 • 26 citations
#1338

ScaMo: Exploring the Scaling Law in Autoregressive Motion Generation Model

Shunlin Lu, Jingbo Wang, Zeyu Lu et al.

CVPR 2025 • arXiv:2412.14559 • 26 citations
#1339

Link-Context Learning for Multimodal LLMs

Yan Tai, Weichen Fan, Zhao Zhang et al.

CVPR 2024 • arXiv:2308.07891 • 26 citations
#1340

Consistency and Uncertainty: Identifying Unreliable Responses From Black-Box Vision-Language Models for Selective Visual Question Answering

Zaid Khan, Yun Fu

CVPR 2024 • arXiv:2404.10193 • 26 citations
#1341

CATANet: Efficient Content-Aware Token Aggregation for Lightweight Image Super-Resolution

Xin Liu, Jie Liu, Jie Tang et al.

CVPR 2025 • arXiv:2503.06896 • 26 citations
#1342

Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation

Qi Yang, Xing Nie, Tong Li et al.

CVPR 2024 • highlight • arXiv:2312.06462 • 26 citations
#1343

Mirasol3B: A Multimodal Autoregressive Model for Time-Aligned and Contextual Modalities

AJ Piergiovanni, Isaac Noble, Dahun Kim et al.

CVPR 2024 • arXiv:2311.05698 • 26 citations
#1344

6D-Diff: A Keypoint Diffusion Framework for 6D Object Pose Estimation

Li Xu, Haoxuan Qu, Yujun Cai et al.

CVPR 2024 • arXiv:2401.00029 • 26 citations
#1345

Rethinking Vision-Language Model in Face Forensics: Multi-Modal Interpretable Forged Face Detector

Xiao Guo, Xiufeng Song, Yue Zhang et al.

CVPR 2025 • arXiv:2503.20188 • 26 citations
#1346

SchurVINS: Schur Complement-Based Lightweight Visual Inertial Navigation System

Yunfei Fan, Tianyu Zhao, Guidong Wang

CVPR 2024 • arXiv:2312.01616 • 26 citations
#1347

Supervised Anomaly Detection for Complex Industrial Images

Aimira Baitieva, David Hurych, Victor Besnier et al.

CVPR 2024 • arXiv:2405.04953 • 26 citations
#1348

Learning CNN on ViT: A Hybrid Model to Explicitly Class-specific Boundaries for Domain Adaptation

Ba Hung Ngo, Nhat-Tuong Do-Tran, Tuan-Ngoc Nguyen et al.

CVPR 2024 • arXiv:2403.18360 • 26 citations
#1349

ZeroRF: Fast Sparse View 360° Reconstruction with Zero Pretraining

Ruoxi Shi, Xinyue Wei, Cheng Wang et al.

CVPR 2024 • arXiv:2312.09249 • 26 citations
#1350

Light3R-SfM: Towards Feed-forward Structure-from-Motion

Sven Elflein, Qunjie Zhou, Laura Leal-Taixe

CVPR 2025 • highlight • arXiv:2501.14914 • 26 citations
#1351

UniGS: Unified Representation for Image Generation and Segmentation

Lu Qi, Lehan Yang, Weidong Guo et al.

CVPR 2024 • arXiv:2312.01985 • 26 citations
#1352

PointBeV: A Sparse Approach for BeV Predictions

Loick Chambon, Éloi Zablocki, Mickaël Chen et al.

CVPR 2024 • arXiv:2312.00703 • 26 citations
#1353

3D Human Pose Perception from Egocentric Stereo Videos

Hiroyasu Akada, Jian Wang, Vladislav Golyanik et al.

CVPR 2024 • highlight • arXiv:2401.00889 • 26 citations
#1354

Delving into the Trajectory Long-tail Distribution for Muti-object Tracking

Sijia Chen, En Yu, Jinyang Li et al.

CVPR 2024 • arXiv:2403.04700 • 26 citations
#1355

StegoGAN: Leveraging Steganography for Non-Bijective Image-to-Image Translation

Sidi Wu, Yizi Chen, Loic Landrieu et al.

CVPR 2024 • arXiv:2403.20142 • 26 citations
#1356

MiKASA: Multi-Key-Anchor & Scene-Aware Transformer for 3D Visual Grounding

Chun-Peng Chang, Shaoxiang Wang, Alain Pagani et al.

CVPR 2024 • arXiv:2403.03077 • 26 citations
#1357

Rethinking Boundary Discontinuity Problem for Oriented Object Detection

Hang Xu, Xinyuan Liu, Haonan Xu et al.

CVPR 2024 • arXiv:2305.10061 • 26 citations
#1358

Nullu: Mitigating Object Hallucinations in Large Vision-Language Models via HalluSpace Projection

Le Yang, Ziwei Zheng, Boxu Chen et al.

CVPR 2025 • arXiv:2412.13817 • 26 citations
#1359

Calibrated Multi-Preference Optimization for Aligning Diffusion Models

Kyungmin Lee, Xiaohang Li, Qifei Wang et al.

CVPR 2025 • arXiv:2502.02588 • 26 citations
#1360

DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing

Jia-Wei Liu, Yan-Pei Cao, Jay Zhangjie Wu et al.

CVPR 2024 • arXiv:2310.10624 • 26 citations
#1361

Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget

Vikash Sehwag, Xianghao Kong, Jingtao Li et al.

CVPR 2025 • arXiv:2407.15811 • 26 citations
#1362

FIRE: Robust Detection of Diffusion-Generated Images via Frequency-Guided Reconstruction Error

Beilin Chu, Xuan Xu, Xin Wang et al.

CVPR 2025 • arXiv:2412.07140 • 26 citations
#1363

Every SAM Drop Counts: Embracing Semantic Priors for Multi-Modality Image Fusion and Beyond

Guanyao Wu, Haoyu Liu, Hongming Fu et al.

CVPR 2025 • arXiv:2503.01210 • 26 citations
#1364

Boosting Spike Camera Image Reconstruction from a Perspective of Dealing with Spike Fluctuations

Rui Zhao, Ruiqin Xiong, Jing Zhao et al.

CVPR 2024 • 26 citations
#1365

BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature

Alejandro Lozano, Min Woo Sun, James Burgess et al.

CVPR 2025 • arXiv:2501.07171 • 26 citations
#1366

MAPSeg: Unified Unsupervised Domain Adaptation for Heterogeneous Medical Image Segmentation Based on 3D Masked Autoencoding and Pseudo-Labeling

Xuzhe Zhang, Yuhao Wu, Elsa Angelini et al.

CVPR 2024 • arXiv:2303.09373 • 26 citations
#1367

IntrinsicAvatar: Physically Based Inverse Rendering of Dynamic Humans from Monocular Videos via Explicit Ray Tracing

Shaofei Wang, Bozidar Antic, Andreas Geiger et al.

CVPR 2024 • arXiv:2312.05210 • 26 citations
#1368

PartDistill: 3D Shape Part Segmentation by Vision-Language Model Distillation

Ardian Umam, Cheng-Kun Yang, Min-Hung Chen et al.

CVPR 2024 • arXiv:2312.04016 • 26 citations
#1369

Seeing the World through Your Eyes

Hadi Alzayer, Kevin Zhang, Brandon Y. Feng et al.

CVPR 2024 • arXiv:2306.09348 • 26 citations
#1370

CORE4D: A 4D Human-Object-Human Interaction Dataset for Collaborative Object REarrangement

Yun Liu, Chengwen Zhang, Ruofan Xing et al.

CVPR 2025 • arXiv:2406.19353 • 26 citations
#1371

AffordDP: Generalizable Diffusion Policy with Transferable Affordance

Shijie Wu, Yihang Zhu, Yunao Huang et al.

CVPR 2025 • arXiv:2412.03142 • 26 citations
#1372

GroupContrast: Semantic-aware Self-supervised Representation Learning for 3D Understanding

Chengyao Wang, Li Jiang, Xiaoyang Wu et al.

CVPR 2024 • arXiv:2403.09639 • 26 citations
#1373

Revisiting Backdoor Attacks against Large Vision-Language Models from Domain Shift

Siyuan Liang, Jiawei Liang, Tianyu Pang et al.

CVPR 2025 • arXiv:2406.18844 • 26 citations
#1374

MimicDiffusion: Purifying Adversarial Perturbation via Mimicking Clean Diffusion Model

Kaiyu Song, Hanjiang Lai, Yan Pan et al.

CVPR 2024 • arXiv:2312.04802 • 26 citations
#1375

MULDE: Multiscale Log-Density Estimation via Denoising Score Matching for Video Anomaly Detection

Jakub Micorek, Horst Possegger, Dominik Narnhofer et al.

CVPR 2024 • arXiv:2403.14497 • 26 citations
#1376

LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis

Hanlin Wang, Hao Ouyang, Qiuyu Wang et al.

CVPR 2025 • highlight • arXiv:2412.15214 • 26 citations
#1377

CPR: Retrieval Augmented Generation for Copyright Protection

Aditya Golatkar, Alessandro Achille, Luca Zancato et al.

CVPR 2024 • arXiv:2403.18920 • 26 citations
#1378

Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models

Shengqu Cai, Duygu Ceylan, Matheus Gadelha et al.

CVPR 2024 • arXiv:2312.01409 • 26 citations
#1379

360+x: A Panoptic Multi-modal Scene Understanding Dataset

Hao Chen, Yuqi Hou, Chenyuan Qu et al.

CVPR 2024 • arXiv:2404.00989 • 25 citations
#1380

Improving Single Domain-Generalized Object Detection: A Focus on Diversification and Alignment

Muhammad Sohail Danish, Muhammad Haris Khan, Muhammad Akhtar Munir et al.

CVPR 2024 • arXiv:2405.14497 • 25 citations
#1381

Open-Vocabulary Functional 3D Scene Graphs for Real-World Indoor Spaces

Chenyangguang Zhang, Alexandros Delitzas, Fangjinhua Wang et al.

CVPR 2025 • highlight • arXiv:2503.19199 • 25 citations
#1382

Seeing Far and Clearly: Mitigating Hallucinations in MLLMs with Attention Causal Decoding

Feilong Tang, Chengzhi Liu, Zhongxing Xu et al.

CVPR 2025 • arXiv:2505.16652 • 25 citations
#1383

MANUS: Markerless Grasp Capture using Articulated 3D Gaussians

Chandradeep Pokhariya, Ishaan Shah, Angela Xing et al.

CVPR 2024 • arXiv:2312.02137 • 25 citations
#1384

Diffusion Time-step Curriculum for One Image to 3D Generation

Yi Xuanyu, Zike Wu, Qingshan Xu et al.

CVPR 2024 • arXiv:2404.04562 • 25 citations
#1385

Permutation Equivariance of Transformers and Its Applications

Hengyuan Xu, Liyao Xiang, Hangyu Ye et al.

CVPR 2024 • arXiv:2304.07735 • 25 citations
#1386

Context-based and Diversity-driven Specificity in Compositional Zero-Shot Learning

Yun Li, Zhe Liu, Hang Chen et al.

CVPR 2024 • arXiv:2402.17251 • 25 citations
#1387

Unifying Top-down and Bottom-up Scanpath Prediction Using Transformers

Zhibo Yang, Sounak Mondal, Seoyoung Ahn et al.

CVPR 2024 • arXiv:2303.09383 • 25 citations
#1388

SelfSplat: Pose-Free and 3D Prior-Free Generalizable 3D Gaussian Splatting

Gyeongjin Kang, Jisang Yoo, Jihyeon Park et al.

CVPR 2025 • arXiv:2411.17190 • 25 citations
#1389

Perception-Oriented Video Frame Interpolation via Asymmetric Blending

Guangyang Wu, Xin Tao, Changlin Li et al.

CVPR 2024 • arXiv:2404.06692 • 25 citations
#1390

Curriculum Point Prompting for Weakly-Supervised Referring Image Segmentation

Qiyuan Dai, Sibei Yang

CVPR 2024 • arXiv:2404.11998 • 25 citations
#1391

Multi-Modal Proxy Learning Towards Personalized Visual Multiple Clustering

Jiawei Yao, Qi Qian, Juhua Hu

CVPR 2024 • arXiv:2404.15655 • 25 citations
#1392

Attention Distillation: A Unified Approach to Visual Characteristics Transfer

Yang Zhou, Xu Gao, Zichong Chen et al.

CVPR 2025 • arXiv:2502.20235 • 25 citations
#1393

SegMAN: Omni-scale Context Modeling with State Space Models and Local Attention for Semantic Segmentation

Yunxiang Fu, Meng Lou, Yizhou Yu

CVPR 2025 • arXiv:2412.11890 • 25 citations
#1394

FastMAC: Stochastic Spectral Sampling of Correspondence Graph

Yifei Zhang, Hao Zhao, Hongyang Li et al.

CVPR 2024 • arXiv:2403.08770 • 25 citations
#1395

Image-to-Image Matching via Foundation Models: A New Perspective for Open-Vocabulary Semantic Segmentation

Yuan Wang, Rui Sun, Naisong Luo et al.

CVPR 2024 • arXiv:2404.00262 • 25 citations
#1396

CAP4D: Creating Animatable 4D Portrait Avatars with Morphable Multi-View Diffusion Models

Felix Taubner, Ruihang Zhang, Mathieu Tuli et al.

CVPR 2025 • arXiv:2412.12093 • 25 citations
#1397

Towards Open-Vocabulary Audio-Visual Event Localization

Jinxing Zhou, Dan Guo, Ruohao Guo et al.

CVPR 2025 • arXiv:2411.11278 • 25 citations
#1398

Federated Generalized Category Discovery

Nan Pu, Wenjing Li, Xinyuan Ji et al.

CVPR 2024 • arXiv:2305.14107 • 25 citations
#1399

Dual DETRs for Multi-Label Temporal Action Detection

Yuhan Zhu, Guozhen Zhang, Jing Tan et al.

CVPR 2024 • arXiv:2404.00653 • 25 citations
#1400

Multi-Object Tracking in the Dark

Xinzhe Wang, Kang Ma, Qiankun Liu et al.

CVPR 2024 • arXiv:2405.06600 • 25 citations