Most Cited CVPR "glycemic response modeling" Papers

5,589 papers found • Page 5 of 28

#801

VISTA3D: A Unified Segmentation Foundation Model For 3D Medical Imaging

Yufan He, Pengfei Guo, Yucheng Tang et al.

CVPR 2025arXiv:2406.05285
43
citations
#802

LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding

Hongyu Li, Jinyu Chen, Ziyu Wei et al.

CVPR 2025arXiv:2501.08282
43
citations
#803

SemCity: Semantic Scene Generation with Triplane Diffusion

Jumin Lee, Sebin Lee, Changho Jo et al.

CVPR 2024arXiv:2403.07773
43
citations
#804

Mask Grounding for Referring Image Segmentation

Yong Xien Chng, Henry Zheng, Yizeng Han et al.

CVPR 2024arXiv:2312.12198
43
citations
#805

Fair Federated Learning under Domain Skew with Local Consistency and Domain Diversity

Yuhang Chen, Wenke Huang, Mang Ye

CVPR 2024arXiv:2405.16585
43
citations
#806

5%>100%: Breaking Performance Shackles of Full Fine-Tuning on Visual Recognition Tasks

Dongshuo Yin, Leiyi Hu, Bin Li et al.

CVPR 2025arXiv:2408.08345
43
citations
#807

LEMON: Learning 3D Human-Object Interaction Relation from 2D Images

Yuhang Yang, Wei Zhai, Hongchen Luo et al.

CVPR 2024arXiv:2312.08963
43
citations
#808

Readout Guidance: Learning Control from Diffusion Features

Grace Luo, Trevor Darrell, Oliver Wang et al.

CVPR 2024highlightarXiv:2312.02150
43
citations
#809

VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM

Yuqian Yuan, Hang Zhang, Wentong Li et al.

CVPR 2025arXiv:2501.00599
43
citations
#810

EmoVIT: Revolutionizing Emotion Insights with Visual Instruction Tuning

Hongxia Xie, Chu-Jun Peng, Yu-Wen Tseng et al.

CVPR 2024arXiv:2404.16670
43
citations
#811

OpenHumanVid: A Large-Scale High-Quality Dataset for Enhancing Human-Centric Video Generation

Hui Li, Mingwang Xu, Qingkun Su et al.

CVPR 2025highlightarXiv:2412.00115
43
citations
#812

Perception Tokens Enhance Visual Reasoning in Multimodal Language Models

Mahtab Bigverdi, Zelun Luo, Cheng-Yu Hsieh et al.

CVPR 2025arXiv:2412.03548
43
citations
#813

360DVD: Controllable Panorama Video Generation with 360-Degree Video Diffusion Model

Qian Wang, Weiqi Li, Chong Mou et al.

CVPR 2024arXiv:2401.06578
43
citations
#814

SeeGround: See and Ground for Zero-Shot Open-Vocabulary 3D Visual Grounding

Rong Li, Shijie Li, Lingdong Kong et al.

CVPR 2025arXiv:2412.04383
43
citations
#815

Towards Robust Event-guided Low-Light Image Enhancement: A Large-Scale Real-World Event-Image Dataset and Novel Approach

Guoqiang Liang, Kanghao Chen, Hangyu Li et al.

CVPR 2024arXiv:2404.00834
43
citations
#816

A Distractor-Aware Memory for Visual Object Tracking with SAM2

Alan Lukezic, Jovana Videnović, Matej Kristan

CVPR 2025arXiv:2411.17576
42
citations
#817

FlowVQTalker: High-Quality Emotional Talking Face Generation through Normalizing Flow and Quantization

Shuai Tan, Bin Ji, Ye Pan

CVPR 2024arXiv:2403.06375
42
citations
#818

Language Models as Black-Box Optimizers for Vision-Language Models

Shihong Liu, Samuel Yu, Zhiqiu Lin et al.

CVPR 2024arXiv:2309.05950
42
citations
#819

3D Face Reconstruction with the Geometric Guidance of Facial Part Segmentation

Zidu Wang, Xiangyu Zhu, Tianshuo Zhang et al.

CVPR 2024highlightarXiv:2312.00311
42
citations
#820

OrCo: Towards Better Generalization via Orthogonality and Contrast for Few-Shot Class-Incremental Learning

Noor Ahmed, Anna Kukleva, Bernt Schiele

CVPR 2024highlightarXiv:2403.18550
42
citations
#821

GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control

Mariam Hassan, Sebastian Stapf, Ahmad Rahimi et al.

CVPR 2025arXiv:2412.11198
42
citations
#822

vid-TLDR: Training Free Token Merging for Light-weight Video Transformer

Joonmyung Choi, Sanghyeok Lee, Jaewon Chu et al.

CVPR 2024arXiv:2403.13347
42
citations
#823

HanDiffuser: Text-to-Image Generation With Realistic Hand Appearances

Supreeth Narasimhaswamy, Uttaran Bhattacharya, Xiang Chen et al.

CVPR 2024arXiv:2403.01693
42
citations
#824

Balancing Act: Distribution-Guided Debiasing in Diffusion Models

Rishubh Parihar, Abhijnya Bhat, Abhipsa Basu et al.

CVPR 2024arXiv:2402.18206
42
citations
#825

PhyT2V: LLM-Guided Iterative Self-Refinement for Physics-Grounded Text-to-Video Generation

Qiyao Xue, Xiangyu Yin, Boyuan Yang et al.

CVPR 2025arXiv:2412.00596
42
citations
#826

Emotional Speech-driven 3D Body Animation via Disentangled Latent Diffusion

Kiran Chhatre, Radek Danecek, Nikos Athanasiou et al.

CVPR 2024arXiv:2312.04466
42
citations
#827

SeD: Semantic-Aware Discriminator for Image Super-Resolution

Bingchen Li, Xin Li, Hanxin Zhu et al.

CVPR 2024arXiv:2402.19387
42
citations
#828

A Vision Check-up for Language Models

Pratyusha Sharma, Tamar Rott Shaham, Manel Baradad et al.

CVPR 2024highlightarXiv:2401.01862
42
citations
#829

Aesthetic Post-Training Diffusion Models from Generic Preferences with Step-by-step Preference Optimization

Zhanhao Liang, Yuhui Yuan, Shuyang Gu et al.

CVPR 2025arXiv:2406.04314
42
citations
#830

TSD-SR: One-Step Diffusion with Target Score Distillation for Real-World Image Super-Resolution

linwei dong, Qingnan Fan, Yihong Guo et al.

CVPR 2025arXiv:2411.18263
42
citations
#831

Investigating and Mitigating the Side Effects of Noisy Views for Self-Supervised Clustering Algorithms in Practical Multi-View Scenarios

Jie Xu, Yazhou Ren, Xiaolong Wang et al.

CVPR 2024arXiv:2303.17245
42
citations
#832

CAGE: Controllable Articulation GEneration

Jiayi Liu, Hou In Ivan Tam, Ali Mahdavi Amiri et al.

CVPR 2024arXiv:2312.09570
42
citations
#833

LP++: A Surprisingly Strong Linear Probe for Few-Shot CLIP

Yunshi HUANG, Fereshteh Shakeri, Jose Dolz et al.

CVPR 2024arXiv:2404.02285
42
citations
#834

Parallelized Autoregressive Visual Generation

Yuqing Wang, Shuhuai Ren, Zhijie Lin et al.

CVPR 2025highlightarXiv:2412.15119
42
citations
#835

Diffusion 3D Features (Diff3F): Decorating Untextured Shapes with Distilled Semantic Features

Niladri Shekhar Dutt, Sanjeev Muralikrishnan, Niloy J. Mitra

CVPR 2024arXiv:2311.17024
42
citations
#836

SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing

Zeyinzi Jiang, Chaojie Mao, Yulin Pan et al.

CVPR 2024highlightarXiv:2312.11392
42
citations
#837

ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning

David Junhao Zhang, Roni Paiss, Shiran Zada et al.

CVPR 2025arXiv:2411.05003
42
citations
#838

Prompt Learning via Meta-Regularization

Jinyoung Park, Juyeon Ko, Hyunwoo J. Kim

CVPR 2024arXiv:2404.00851
42
citations
#839

DiffusionAvatars: Deferred Diffusion for High-fidelity 3D Head Avatars

Tobias Kirschstein, Simon Giebenhain, Matthias Nießner

CVPR 2024arXiv:2311.18635
42
citations
#840

SD-DiT: Unleashing the Power of Self-supervised Discrimination in Diffusion Transformer

Rui Zhu, Yingwei Pan, Yehao Li et al.

CVPR 2024arXiv:2403.17004
42
citations
#841

Linguistic-Aware Patch Slimming Framework for Fine-grained Cross-Modal Alignment

Zheren Fu, Lei Zhang, Hou Xia et al.

CVPR 2024
41
citations
#842

Diffusion Self-Distillation for Zero-Shot Customized Image Generation

Shengqu Cai, Eric Ryan Chan, Yunzhi Zhang et al.

CVPR 2025arXiv:2411.18616
41
citations
#843

PAD: Patch-Agnostic Defense against Adversarial Patch Attacks

Lihua Jing, Rui Wang, Wenqi Ren et al.

CVPR 2024arXiv:2404.16452
41
citations
#844

Towards Long-Horizon Vision-Language Navigation: Platform, Benchmark and Method

Xinshuai Song, weixing chen, Yang Liu et al.

CVPR 2025arXiv:2412.09082
41
citations
#845

OverLoCK: An Overview-first-Look-Closely-next ConvNet with Context-Mixing Dynamic Kernels

Meng Lou, Yizhou Yu

CVPR 2025arXiv:2502.20087
41
citations
#846

Transfer CLIP for Generalizable Image Denoising

Jun Cheng, Dong Liang, Shan Tan

CVPR 2024arXiv:2403.15132
41
citations
#847

Generative Proxemics: A Prior for 3D Social Interaction from Images

Vickie Ye, Vickie Ye, Georgios Pavlakos et al.

CVPR 2024arXiv:2306.09337
41
citations
#848

UniGoal: Towards Universal Zero-shot Goal-oriented Navigation

Hang Yin, Xiuwei Xu, Linqing Zhao et al.

CVPR 2025arXiv:2503.10630
41
citations
#849

Towards Efficient Replay in Federated Incremental Learning

Yichen Li, Qunwei Li, Haozhao Wang et al.

CVPR 2024arXiv:2403.05890
41
citations
#850

Generalizing Deepfake Video Detection with Plug-and-Play: Video-Level Blending and Spatiotemporal Adapter Tuning

Zhiyuan Yan, Yandan Zhao, Shen Chen et al.

CVPR 2025arXiv:2408.17065
41
citations
#851

Correlation-Decoupled Knowledge Distillation for Multimodal Sentiment Analysis with Incomplete Modalities

Mingcheng Li, Dingkang Yang, Xiao Zhao et al.

CVPR 2024arXiv:2404.16456
41
citations
#852

Test-Time Domain Generalization for Face Anti-Spoofing

Qianyu Zhou, Ke-Yue Zhang, Taiping Yao et al.

CVPR 2024arXiv:2403.19334
41
citations
#853

MIDI: Multi-Instance Diffusion for Single Image to 3D Scene Generation

Zehuan Huang, Yuanchen Guo, Xingqiao An et al.

CVPR 2025arXiv:2412.03558
41
citations
#854

Context-Aware Integration of Language and Visual References for Natural Language Tracking

Yanyan Shao, Shuting He, Qi Ye et al.

CVPR 2024arXiv:2403.19975
41
citations
#855

Re-thinking Temporal Search for Long-Form Video Understanding

Jinhui Ye, Zihan Wang, Haosen Sun et al.

CVPR 2025arXiv:2504.02259
41
citations
#856

DiffAvatar: Simulation-Ready Garment Optimization with Differentiable Simulation

Yifei Li, Hsiaoyu Chen, Egor Larionov et al.

CVPR 2024arXiv:2311.12194
41
citations
#857

FastVLM: Efficient Vision Encoding for Vision Language Models

Pavan Kumar Anasosalu Vasu, Fartash Faghri, Chun-Liang Li et al.

CVPR 2025arXiv:2412.13303
41
citations
#858

A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames

Pinelopi Papalampidi, Skanda Koppula, Shreya Pathak et al.

CVPR 2024arXiv:2312.07395
41
citations
#859

Image Sculpting: Precise Object Editing with 3D Geometry Control

Jiraphon Yenphraphai, Xichen Pan, Sainan Liu et al.

CVPR 2024arXiv:2401.01702
41
citations
#860

SD4Match: Learning to Prompt Stable Diffusion Model for Semantic Matching

Xinghui Li, Jingyi Lu, Kai Han et al.

CVPR 2024arXiv:2310.17569
41
citations
#861

Context-Guided Spatio-Temporal Video Grounding

Xin Gu, Heng Fan, Yan Huang et al.

CVPR 2024arXiv:2401.01578
41
citations
#862

ID-like Prompt Learning for Few-Shot Out-of-Distribution Detection

Yichen Bai, Zongbo Han, Bing Cao et al.

CVPR 2024arXiv:2311.15243
41
citations
#863

MuseChat: A Conversational Music Recommendation System for Videos

Zhikang Dong, Bin Chen, Xiulong Liu et al.

CVPR 2024highlightarXiv:2310.06282
41
citations
#864

GaussianFormer-2: Probabilistic Gaussian Superposition for Efficient 3D Occupancy Prediction

Yuanhui Huang, Amonnut Thammatadatrakoon, Wenzhao Zheng et al.

CVPR 2025arXiv:2412.04384
40
citations
#865

HD-EPIC: A Highly-Detailed Egocentric Video Dataset

Toby Perrett, Ahmad Darkhalil, Saptarshi Sinha et al.

CVPR 2025arXiv:2502.04144
40
citations
#866

3D-LLaVA: Towards Generalist 3D LMMs with Omni Superpoint Transformer

Jiajun Deng, Tianyu He, Li Jiang et al.

CVPR 2025arXiv:2501.01163
40
citations
#867

SceneTex: High-Quality Texture Synthesis for Indoor Scenes via Diffusion Priors

Dave Zhenyu Chen, Haoxuan Li, Hsin-Ying Lee et al.

CVPR 2024highlightarXiv:2311.17261
40
citations
#868

Do You Remember? Dense Video Captioning with Cross-Modal Memory Retrieval

Minkuk Kim, Hyeon Bae Kim, Jinyoung Moon et al.

CVPR 2024arXiv:2404.07610
40
citations
#869

OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?

Junbo Niu, Yifei Li, Ziyang Miao et al.

CVPR 2025arXiv:2501.05510
40
citations
#870

Rethinking Diffusion for Text-Driven Human Motion Generation: Redundant Representations, Evaluation, and Masked Autoregression

Zichong Meng, Yiming Xie, Xiaogang Peng et al.

CVPR 2025arXiv:2411.16575
40
citations
#871

Taming Mode Collapse in Score Distillation for Text-to-3D Generation

Peihao Wang, Dejia Xu, Zhiwen Fan et al.

CVPR 2024arXiv:2401.00909
40
citations
#872

NOPE: Novel Object Pose Estimation from a Single Image

Van Nguyen Nguyen, Thibault Groueix, Georgy Ponimatkin et al.

CVPR 2024arXiv:2303.13612
40
citations
#873

Revisiting Single Image Reflection Removal In the Wild

Yurui Zhu, Bo Li, Xueyang Fu et al.

CVPR 2024arXiv:2311.17320
40
citations
#874

Multi-subject Open-set Personalization in Video Generation

Tsai-Shien Chen, Aliaksandr Siarohin, Willi Menapace et al.

CVPR 2025arXiv:2501.06187
40
citations
#875

AiOS: All-in-One-Stage Expressive Human Pose and Shape Estimation

Qingping SUN, Yanjun Wang, Ailing Zeng et al.

CVPR 2024arXiv:2403.17934
40
citations
#876

DEFOM-Stereo: Depth Foundation Model Based Stereo Matching

Hualie Jiang, Zhiqiang Lou, Laiyan Ding et al.

CVPR 2025arXiv:2501.09466
40
citations
#877

Texture-Preserving Diffusion Models for High-Fidelity Virtual Try-On

Xu Yang, Changxing Ding, Zhibin Hong et al.

CVPR 2024arXiv:2404.01089
40
citations
#878

FinePOSE: Fine-Grained Prompt-Driven 3D Human Pose Estimation via Diffusion Models

Jinglin Xu, Yijie Guo, Yuxin Peng

CVPR 2024highlightarXiv:2405.05216
40
citations
#879

Friendly Sharpness-Aware Minimization

Tao Li, Pan Zhou, Zhengbao He et al.

CVPR 2024arXiv:2403.12350
40
citations
#880

PUP 3D-GS: Principled Uncertainty Pruning for 3D Gaussian Splatting

Alex Hanson, Allen Tu, Vasu Singla et al.

CVPR 2025arXiv:2406.10219
40
citations
#881

NoiseCLR: A Contrastive Learning Approach for Unsupervised Discovery of Interpretable Directions in Diffusion Models

Yusuf Dalva, Pinar Yanardag

CVPR 2024arXiv:2312.05390
40
citations
#882

UnScene3D: Unsupervised 3D Instance Segmentation for Indoor Scenes

David Rozenberszki, Or Litany, Angela Dai

CVPR 2024arXiv:2303.14541
40
citations
#883

Gradient Reweighting: Towards Imbalanced Class-Incremental Learning

Jiangpeng He

CVPR 2024arXiv:2402.18528
40
citations
#884

Holmes-VAU: Towards Long-term Video Anomaly Understanding at Any Granularity

Huaxin Zhang, Xiaohao Xu, Xiang Wang et al.

CVPR 2025highlightarXiv:2412.06171
40
citations
#885

Video-Guided Foley Sound Generation with Multimodal Controls

Ziyang Chen, Prem Seetharaman, Bryan Russell et al.

CVPR 2025arXiv:2411.17698
40
citations
#886

ViLa-MIL: Dual-scale Vision-Language Multiple Instance Learning for Whole Slide Image Classification

Jiangbo Shi, Chen Li, Tieliang Gong et al.

CVPR 2024arXiv:2502.08391
40
citations
#887

BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning

Ruyang Liu, Chen Li, Yixiao Ge et al.

CVPR 2024arXiv:2309.15785
39
citations
#888

ZeroShape: Regression-based Zero-shot Shape Reconstruction

Zixuan Huang, Stefan Stojanov, Anh Thai et al.

CVPR 2024arXiv:2312.14198
39
citations
#889

Unleashing Unlabeled Data: A Paradigm for Cross-View Geo-Localization

Guopeng Li, Ming Qian, Gui-Song Xia

CVPR 2024arXiv:2403.14198
39
citations
#890

DiG: Scalable and Efficient Diffusion Models with Gated Linear Attention

Lianghui Zhu, Zilong Huang, Bencheng Liao et al.

CVPR 2025arXiv:2405.18428
39
citations
#891

OmniVec2 - A Novel Transformer based Network for Large Scale Multimodal and Multitask Learning

Siddharth Srivastava, Gaurav Sharma

CVPR 2024arXiv:2507.13364
39
citations
#892

Visual Anagrams: Generating Multi-View Optical Illusions with Diffusion Models

Daniel Geng, Inbum Park, Andrew Owens

CVPR 2024arXiv:2311.17919
39
citations
#893

Portrait4D: Learning One-Shot 4D Head Avatar Synthesis using Synthetic Data

Yu Deng, Duomin Wang, Xiaohang Ren et al.

CVPR 2024arXiv:2311.18729
39
citations
#894

Multi-modal Instruction Tuned LLMs with Fine-grained Visual Perception

Junwen He, Yifan Wang, Lijun Wang et al.

CVPR 2024highlightarXiv:2403.02969
39
citations
#895

Can Protective Perturbation Safeguard Personal Data from Being Exploited by Stable Diffusion?

Zhengyue Zhao, Jinhao Duan, Kaidi Xu et al.

CVPR 2024arXiv:2312.00084
39
citations
#896

Number it: Temporal Grounding Videos like Flipping Manga

Yongliang Wu, Xinting Hu, Yuyang Sun et al.

CVPR 2025arXiv:2411.10332
39
citations
#897

Depth-aware Test-Time Training for Zero-shot Video Object Segmentation

Weihuang Liu, Xi Shen, Haolun Li et al.

CVPR 2024arXiv:2403.04258
39
citations
#898

DuPL: Dual Student with Trustworthy Progressive Learning for Robust Weakly Supervised Semantic Segmentation

Yuanchen Wu, Xichen Ye, KequanYang et al.

CVPR 2024arXiv:2403.11184
39
citations
#899

Text-Driven Image Editing via Learnable Regions

Yuanze Lin, Yi-Wen Chen, Yi-Hsuan Tsai et al.

CVPR 2024arXiv:2311.16432
39
citations
#900

DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception

Yibo Wang, Ruiyuan Gao, Kai Chen et al.

CVPR 2024arXiv:2403.13304
39
citations
#901

Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention

Wenbin An, Feng Tian, Sicong Leng et al.

CVPR 2025arXiv:2406.12718
39
citations
#902

GLACE: Global Local Accelerated Coordinate Encoding

Fangjinhua Wang, Xudong Jiang, Silvano Galliani et al.

CVPR 2024arXiv:2406.04340
39
citations
#903

RealCustom: Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization

Mengqi Huang, Zhendong Mao, Mingcong Liu et al.

CVPR 2024arXiv:2403.00483
39
citations
#904

DrVideo: Document Retrieval Based Long Video Understanding

Ziyu Ma, Chenhui Gou, Hengcan Shi et al.

CVPR 2025arXiv:2406.12846
39
citations
#905

SmartRefine: A Scenario-Adaptive Refinement Framework for Efficient Motion Prediction

Yang Zhou, Hao Shao, Letian Wang et al.

CVPR 2024arXiv:2403.11492
39
citations
#906

Communication-Efficient Federated Learning with Accelerated Client Gradient

Geeho Kim, Jinkyu Kim, Bohyung Han

CVPR 2024arXiv:2201.03172
39
citations
#907

HomoFormer: Homogenized Transformer for Image Shadow Removal

Jie Xiao, Xueyang Fu, Yurui Zhu et al.

CVPR 2024
39
citations
#908

Learning Occupancy for Monocular 3D Object Detection

Liang Peng, Junkai Xu, Haoran Cheng et al.

CVPR 2024arXiv:2305.15694
39
citations
#909

MaIR: A Locality- and Continuity-Preserving Mamba for Image Restoration

Boyun Li, Haiyu Zhao, Wenxin Wang et al.

CVPR 2025arXiv:2412.20066
39
citations
#910

Incremental Residual Concept Bottleneck Models

Chenming Shang, Shiji Zhou, Hengyuan Zhang et al.

CVPR 2024arXiv:2404.08978
39
citations
#911

Learning Diffusion Texture Priors for Image Restoration

Tian Ye, Sixiang Chen, Wenhao Chai et al.

CVPR 2024highlight
39
citations
#912

HIR-Diff: Unsupervised Hyperspectral Image Restoration Via Improved Diffusion Models

Li Pang, Xiangyu Rui, Long Cui et al.

CVPR 2024arXiv:2402.15865
39
citations
#913

Unbiased Faster R-CNN for Single-source Domain Generalized Object Detection

Yajing Liu, Shijun Zhou, Xiyao Liu et al.

CVPR 2024highlightarXiv:2405.15225
39
citations
#914

Score-Guided Diffusion for 3D Human Recovery

Anastasis Stathopoulos, Ligong Han, Dimitris N. Metaxas

CVPR 2024arXiv:2403.09623
39
citations
#915

Enhancing Video Super-Resolution via Implicit Resampling-based Alignment

Kai Xu, Ziwei Yu, Xin Wang et al.

CVPR 2024highlightarXiv:2305.00163
39
citations
#916

DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models

Keda Tao, Can Qin, Haoxuan You et al.

CVPR 2025arXiv:2411.15024
39
citations
#917

Mitigating Motion Blur in Neural Radiance Fields with Events and Frames

Marco Cannici, Davide Scaramuzza

CVPR 2024arXiv:2403.19780
38
citations
#918

UnO: Unsupervised Occupancy Fields for Perception and Forecasting

Ben Agro, Quinlan Sykora, Sergio Casas et al.

CVPR 2024arXiv:2406.08691
38
citations
#919

Anomaly Heterogeneity Learning for Open-set Supervised Anomaly Detection

Jiawen Zhu, Choubo Ding, Yu Tian et al.

CVPR 2024arXiv:2310.12790
38
citations
#920

AdaRevD: Adaptive Patch Exiting Reversible Decoder Pushes the Limit of Image Deblurring

Xintian Mao, Xiwen Gao, Yan Wang

CVPR 2024arXiv:2406.09135
38
citations
#921

DreamControl: Control-Based Text-to-3D Generation with 3D Self-Prior

Tianyu Huang, Yihan Zeng, Zhilu Zhang et al.

CVPR 2024arXiv:2312.06439
38
citations
#922

Speedy-Splat: Fast 3D Gaussian Splatting with Sparse Pixels and Sparse Primitives

Alex Hanson, Allen Tu, Geng Lin et al.

CVPR 2025arXiv:2412.00578
38
citations
#923

Rethinking the Spatial Inconsistency in Classifier-Free Diffusion Guidance

Dazhong Shen, Guanglu Song, Zeyue Xue et al.

CVPR 2024arXiv:2404.05384
38
citations
#924

SplatAD: Real-Time Lidar and Camera Rendering with 3D Gaussian Splatting for Autonomous Driving

Georg Hess, Carl Lindström, Maryam Fatemi et al.

CVPR 2025arXiv:2411.16816
38
citations
#925

Vision-Language Models Do Not Understand Negation

Kumail Alhamoud, Shaden Alshammari, Yonglong Tian et al.

CVPR 2025arXiv:2501.09425
38
citations
#926

Prompt Highlighter: Interactive Control for Multi-Modal LLMs

Yuechen Zhang, Shengju Qian, Bohao Peng et al.

CVPR 2024arXiv:2312.04302
38
citations
#927

CARZero: Cross-Attention Alignment for Radiology Zero-Shot Classification

Haoran Lai, Qingsong Yao, Zihang Jiang et al.

CVPR 2024arXiv:2402.17417
38
citations
#928

VicTR: Video-conditioned Text Representations for Activity Recognition

Kumara Kahatapitiya, Anurag Arnab, Arsha Nagrani et al.

CVPR 2024arXiv:2304.02560
38
citations
#929

Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key

Zhihe Yang, Xufang Luo, Dongqi Han et al.

CVPR 2025arXiv:2501.09695
38
citations
#930

Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving

JunDa Cheng, Wei Yin, Kaixuan Wang et al.

CVPR 2024arXiv:2403.07535
38
citations
#931

Sherpa3D: Boosting High-Fidelity Text-to-3D Generation via Coarse 3D Prior

Fangfu Liu, Diankun Wu, Yi Wei et al.

CVPR 2024arXiv:2312.06655
38
citations
#932

PaSCo: Urban 3D Panoptic Scene Completion with Uncertainty Awareness

Anh-Quan Cao, Angela Dai, Raoul de Charette

CVPR 2024arXiv:2312.02158
38
citations
#933

Multi-view Aggregation Network for Dichotomous Image Segmentation

Qian Yu, Xiaoqi Zhao, Youwei Pang et al.

CVPR 2024highlightarXiv:2404.07445
38
citations
#934

Scaling Diffusion Models to Real-World 3D LiDAR Scene Completion

Lucas Nunes, Rodrigo Marcuzzi, Benedikt Mersch et al.

CVPR 2024arXiv:2403.13470
38
citations
#935

ATP-LLaVA: Adaptive Token Pruning for Large Vision Language Models

Xubing Ye, Yukang Gan, Yixiao Ge et al.

CVPR 2025arXiv:2412.00447
38
citations
#936

Dynamic Prompt Optimizing for Text-to-Image Generation

Wenyi Mo, Tianyu Zhang, Yalong Bai et al.

CVPR 2024arXiv:2404.04095
38
citations
#937

Content-Adaptive Non-Local Convolution for Remote Sensing Pansharpening

Yule Duan, Xiao Wu, Haoyu Deng et al.

CVPR 2024arXiv:2404.07543
38
citations
#938

How to Configure Good In-Context Sequence for Visual Question Answering

Li Li, Jiawei Peng, huiyi chen et al.

CVPR 2024arXiv:2312.01571
38
citations
#939

DiffAM: Diffusion-based Adversarial Makeup Transfer for Facial Privacy Protection

Yuhao Sun, Lingyun Yu, Hongtao Xie et al.

CVPR 2024arXiv:2405.09882
38
citations
#940

Unsupervised Universal Image Segmentation

XuDong Wang, Dantong Niu, Xinyang Han et al.

CVPR 2024arXiv:2312.17243
38
citations
#941

Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model

Zhicai Wang, Longhui Wei, Tan Wang et al.

CVPR 2024arXiv:2403.19600
38
citations
#942

Enhancing Visual Document Understanding with Contrastive Learning in Large Visual-Language Models

Xin Li, Yunfei Wu, Xinghua Jiang et al.

CVPR 2024arXiv:2402.19014
38
citations
#943

TRIP: Temporal Residual Learning with Image Noise Prior for Image-to-Video Diffusion Models

Zhongwei Zhang, Fuchen Long, Yingwei Pan et al.

CVPR 2024arXiv:2403.17005
38
citations
#944

SeedVR: Seeding Infinity in Diffusion Transformer Towards Generic Video Restoration

Jianyi Wang, Zhijie Lin, Meng Wei et al.

CVPR 2025highlightarXiv:2501.01320
38
citations
#945

Disentangled Prompt Representation for Domain Generalization

De Cheng, Zhipeng Xu, XINYANG JIANG et al.

CVPR 2024
37
citations
#946

DifFlow3D: Toward Robust Uncertainty-Aware Scene Flow Estimation with Iterative Diffusion-Based Refinement

Jiuming Liu, Guangming Wang, Weicai Ye et al.

CVPR 2024
37
citations
#947

MoPE-CLIP: Structured Pruning for Efficient Vision-Language Models with Module-wise Pruning Error Metric

Haokun Lin, Haoli Bai, Zhili Liu et al.

CVPR 2024arXiv:2403.07839
37
citations
#948

SimAC: A Simple Anti-Customization Method for Protecting Face Privacy against Text-to-Image Synthesis of Diffusion Models

Feifei Wang, Zhentao Tan, Tianyi Wei et al.

CVPR 2024arXiv:2312.07865
37
citations
#949

SUGAR: Pre-training 3D Visual Representations for Robotics

Shizhe Chen, Ricardo Garcia Pinel, Ivan Laptev et al.

CVPR 2024arXiv:2404.01491
37
citations
#950

EarthDial: Turning Multi-sensory Earth Observations to Interactive Dialogues

Sagar Soni, Akshay Dudhane, Hiyam Debary et al.

CVPR 2025arXiv:2412.15190
37
citations
#951

Question Aware Vision Transformer for Multimodal Reasoning

Roy Ganz, Yair Kittenplon, Aviad Aberdam et al.

CVPR 2024highlightarXiv:2402.05472
37
citations
#952

Action Scene Graphs for Long-Form Understanding of Egocentric Videos

Ivan Rodin, Antonino Furnari, Kyle Min et al.

CVPR 2024arXiv:2312.03391
37
citations
#953

Convolutional Prompting meets Language Models for Continual Learning

Anurag Roy, Riddhiman Moulick, Vinay Verma et al.

CVPR 2024arXiv:2403.20317
37
citations
#954

OpenESS: Event-based Semantic Scene Understanding with Open Vocabularies

Lingdong Kong, Youquan Liu, Lai Xing Ng et al.

CVPR 2024highlightarXiv:2405.05259
37
citations
#955

MVD-Fusion: Single-view 3D via Depth-consistent Multi-view Generation

Hanzhe Hu, Zhizhuo Zhou, Varun Jampani et al.

CVPR 2024arXiv:2404.03656
37
citations
#956

Towards Accurate Post-training Quantization for Diffusion Models

Changyuan Wang, Ziwei Wang, Xiuwei Xu et al.

CVPR 2024highlightarXiv:2305.18723
37
citations
#957

Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator

Chaehun Shin, Jooyoung Choi, Heeseung Kim et al.

CVPR 2025arXiv:2411.15466
37
citations
#958

FedSelect: Personalized Federated Learning with Customized Selection of Parameters for Fine-Tuning

Rishub Tamirisa, Chulin Xie, Wenxuan Bao et al.

CVPR 2024arXiv:2404.02478
37
citations
#959

Your Large Vision-Language Model Only Needs A Few Attention Heads For Visual Grounding

seil kang, Jinyeong Kim, Junhyeok Kim et al.

CVPR 2025highlightarXiv:2503.06287
37
citations
#960

FairRAG: Fair Human Generation via Fair Retrieval Augmentation

Robik Shrestha, Yang Zou, Qiuyu Chen et al.

CVPR 2024arXiv:2403.19964
37
citations
#961

RobustSAM: Segment Anything Robustly on Degraded Images

Wei-Ting Chen, Yu Jiet Vong, Sy-Yen Kuo et al.

CVPR 2024highlightarXiv:2406.09627
37
citations
#962

MetaCloak: Preventing Unauthorized Subject-driven Text-to-image Diffusion-based Synthesis via Meta-learning

Yixin Liu, Chenrui Fan, Yutong Dai et al.

CVPR 2024arXiv:2311.13127
37
citations
#963

VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation

XuDong Wang, Ishan Misra, Ziyun Zeng et al.

CVPR 2024arXiv:2308.14710
37
citations
#964

ElasticDiffusion: Training-free Arbitrary Size Image Generation through Global-Local Content Separation

Moayed Haji Ali, Guha Balakrishnan, Vicente Ordonez

CVPR 2024arXiv:2311.18822
37
citations
#965

ADA-Track: End-to-End Multi-Camera 3D Multi-Object Tracking with Alternating Detection and Association

Shuxiao Ding, Lukas Schneider, Marius Cordts et al.

CVPR 2024arXiv:2405.08909
37
citations
#966

GenZI: Zero-Shot 3D Human-Scene Interaction Generation

Lei Li, Angela Dai

CVPR 2024arXiv:2311.17737
37
citations
#967

GoMVS: Geometrically Consistent Cost Aggregation for Multi-View Stereo

Jiang Wu, Rui Li, Haofei Xu et al.

CVPR 2024arXiv:2404.07992
37
citations
#968

Towards General Visual-Linguistic Face Forgery Detection

Ke Sun, Shen Chen, Taiping Yao et al.

CVPR 2025arXiv:2307.16545
37
citations
#969

Contrastive Mean-Shift Learning for Generalized Category Discovery

Sua Choi, Dahyun Kang, Minsu Cho

CVPR 2024arXiv:2404.09451
37
citations
#970

Progressive Semantic-Guided Vision Transformer for Zero-Shot Learning

Shiming Chen, Wenjin Hou, Salman Khan et al.

CVPR 2024arXiv:2404.07713
36
citations
#971

AeroGen: Enhancing Remote Sensing Object Detection with Diffusion-Driven Data Generation

Datao Tang, Xiangyong Cao, Xuan Wu et al.

CVPR 2025arXiv:2411.15497
36
citations
#972

Tactile-Augmented Radiance Fields

Yiming Dou, Fengyu Yang, Yi Liu et al.

CVPR 2024arXiv:2405.04534
36
citations
#973

Amodal Completion via Progressive Mixed Context Diffusion

Katherine Xu, Lingzhi Zhang, Jianbo Shi

CVPR 2024highlightarXiv:2312.15540
36
citations
#974

Gradient Alignment for Cross-Domain Face Anti-Spoofing

MINH BINH LE, Simon Woo

CVPR 2024arXiv:2402.18817
36
citations
#975

Situational Awareness Matters in 3D Vision Language Reasoning

Yunze Man, Liang-Yan Gui, Yu-Xiong Wang

CVPR 2024arXiv:2406.07544
36
citations
#976

Control4D: Efficient 4D Portrait Editing with Text

Ruizhi Shao, Jingxiang Sun, Cheng Peng et al.

CVPR 2024arXiv:2305.20082
36
citations
#977

IDOL: Instant Photorealistic 3D Human Creation from a Single Image

Yiyu Zhuang, Jiaxi Lv, Hao Wen et al.

CVPR 2025arXiv:2412.14963
36
citations
#978

Hide in Thicket: Generating Imperceptible and Rational Adversarial Perturbations on 3D Point Clouds

Tianrui Lou, Xiaojun Jia, Jindong Gu et al.

CVPR 2024arXiv:2403.05247
36
citations
#979

ParaHome: Parameterizing Everyday Home Activities Towards 3D Generative Modeling of Human-Object Interactions

Jeonghwan Kim, Jisoo Kim, Jeonghyeon Na et al.

CVPR 2025arXiv:2401.10232
36
citations
#980

Prompting Hard or Hardly Prompting: Prompt Inversion for Text-to-Image Diffusion Models

Shweta Mahajan, Tanzila Rahman, Kwang Moo Yi et al.

CVPR 2024arXiv:2312.12416
36
citations
#981

Robust Emotion Recognition in Context Debiasing

Dingkang Yang, Kun Yang, Mingcheng Li et al.

CVPR 2024arXiv:2403.05963
36
citations
#982

GenFlow: Generalizable Recurrent Flow for 6D Pose Refinement of Novel Objects

Sungphill Moon, Hyeontae Son, Dongcheol Hur et al.

CVPR 2024arXiv:2403.11510
36
citations
#983

Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction

Rui Qian, Shuangrui Ding, Xiaoyi Dong et al.

CVPR 2025arXiv:2501.03218
36
citations
#984

VILA-M3: Enhancing Vision-Language Models with Medical Expert Knowledge

Vishwesh Nath, Wenqi Li, Dong Yang et al.

CVPR 2025highlightarXiv:2411.12915
36
citations
#985

ManipTrans: Efficient Dexterous Bimanual Manipulation Transfer via Residual Learning

Kailin Li, Puhao Li, Tengyu Liu et al.

CVPR 2025arXiv:2503.21860
36
citations
#986

Distilling Semantic Priors from SAM to Efficient Image Restoration Models

Quan Zhang, Xiaoyu Liu, Wei Li et al.

CVPR 2024arXiv:2403.16368
36
citations
#987

StreetCrafter: Street View Synthesis with Controllable Video Diffusion Models

Yunzhi Yan, Zhen Xu, Haotong Lin et al.

CVPR 2025arXiv:2412.13188
36
citations
#988

Fairy: Fast Parallelized Instruction-Guided Video-to-Video Synthesis

Bichen Wu, Ching-Yao Chuang, Xiaoyan Wang et al.

CVPR 2024arXiv:2312.13834
36
citations
#989

View-decoupled Transformer for Person Re-identification under Aerial-ground Camera Network

Quan Zhang, Lei Wang, Vishal M. Patel et al.

CVPR 2024arXiv:2403.14513
36
citations
#990

CosmicMan: A Text-to-Image Foundation Model for Humans

Shikai Li, Jianglin Fu, Kaiyuan Liu et al.

CVPR 2024highlightarXiv:2404.01294
36
citations
#991

Neural Redshift: Random Networks are not Random Functions

Damien Teney, Armand Nicolicioiu, Valentin Hartmann et al.

CVPR 2024arXiv:2403.02241
36
citations
#992

Interactive Continual Learning: Fast and Slow Thinking

Biqing Qi, Xinquan Chen, Junqi Gao et al.

CVPR 2024arXiv:2403.02628
36
citations
#993

Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark

Ziyang Chen, Israel D. Gebru, Christian Richardt et al.

CVPR 2024highlightarXiv:2403.18821
36
citations
#994

Equivariant Plug-and-Play Image Reconstruction

Matthieu Terris, Thomas Moreau, Nelly Pustelnik et al.

CVPR 2024arXiv:2312.01831
36
citations
#995

Training Generative Image Super-Resolution Models by Wavelet-Domain Losses Enables Better Control of Artifacts

Cansu Korkmaz, Ahmet Murat Tekalp, Zafer Dogan

CVPR 2024arXiv:2402.19215
36
citations
#996

SoftVQ-VAE: Efficient 1-Dimensional Continuous Tokenizer

Hao Chen, Ze Wang, Xiang Li et al.

CVPR 2025arXiv:2412.10958
36
citations
#997

Producing and Leveraging Online Map Uncertainty in Trajectory Prediction

Xunjiang Gu, Guanyu Song, Igor Gilitschenski et al.

CVPR 2024arXiv:2403.16439
36
citations
#998

LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of Large Language Models

Shenghao Fu, Qize Yang, Qijie Mo et al.

CVPR 2025highlightarXiv:2501.18954
35
citations
#999

Pow3R: Empowering Unconstrained 3D Reconstruction with Camera and Scene Priors

Wonbong Jang, Philippe Weinzaepfel, Vincent Leroy et al.

CVPR 2025arXiv:2503.17316
35
citations
#1000

ICP-Flow: LiDAR Scene Flow Estimation with ICP

Yancong Lin, Holger Caesar

CVPR 2024arXiv:2402.17351
35
citations