Most Cited CVPR "video caption understanding" Papers

5,589 papers found • Page 11 of 28

#2001

Diversified and Personalized Multi-rater Medical Image Segmentation

Yicheng Wu, Xiangde Luo, Zhe Xu et al.

CVPR 2024 • highlight • arXiv:2403.13417

citations

#2002

Multi-Layer Visual Feature Fusion in Multimodal LLMs: Methods, Analysis, and Best Practices

Junyan Lin, Haoran Chen, Yue Fan et al.

CVPR 2025 • arXiv:2503.06063

citations

#2003

Make Me a BNN: A Simple Strategy for Estimating Bayesian Uncertainty from Pre-trained Models

Gianni Franchi, Olivier Laurent, Maxence Leguéry et al.

CVPR 2024 • arXiv:2312.15297

citations

#2004

Unlocking Pre-trained Image Backbones for Semantic Image Synthesis

Tariq Berrada, Jakob Verbeek, Camille Couprie et al.

CVPR 2024 • arXiv:2312.13314

citations

#2005

DiffSCI: Zero-Shot Snapshot Compressive Imaging via Iterative Spectral Diffusion Model

Zhenghao Pan, Haijin Zeng, Jiezhang Cao et al.

CVPR 2024 • arXiv:2311.11417

citations

#2006

MCPNet: An Interpretable Classifier via Multi-Level Concept Prototypes

Bor Shiun Wang, Chien-Yi Wang, Wei-Chen Chiu

CVPR 2024 • arXiv:2404.08968

citations

#2007

VidComposition: Can MLLMs Analyze Compositions in Compiled Videos?

Yunlong Tang, JunJia Guo, Hang Hua et al.

CVPR 2025 • arXiv:2411.10979

citations

#2008

KD-DETR: Knowledge Distillation for Detection Transformer with Consistent Distillation Points Sampling

Yu Wang, Xin Li, Shengzhao Wen et al.

CVPR 2024 • arXiv:2211.08071

citations

#2009

InNeRF360: Text-Guided 3D-Consistent Object Inpainting on 360-degree Neural Radiance Fields

Dongqing Wang, Tong Zhang, Alaa Abboud et al.

CVPR 2024 • arXiv:2305.15094

citations

#2010

Generating Content for HDR Deghosting from Frequency View

Tao Hu, Qingsen Yan, Yuankai Qi et al.

CVPR 2024 • arXiv:2404.00849

citations

#2011

Scaling Vision Pre-Training to 4K Resolution

Baifeng Shi, Boyi Li, Han Cai et al.

CVPR 2025 • highlight • arXiv:2503.19903

citations

#2012

AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation

Jeongsoo Choi, Se Jin Park, Minsu Kim et al.

CVPR 2024 • highlight • arXiv:2312.02512

citations

#2013

Binarized Low-light Raw Video Enhancement

Gengchen Zhang, Yulun Zhang, Xin Yuan et al.

CVPR 2024 • arXiv:2403.19944

citations

#2014

How to Merge Your Multimodal Models Over Time?

Sebastian Dziadzio, Vishaal Udandarao, Karsten Roth et al.

CVPR 2025 • arXiv:2412.06712

citations

#2015

ECBench: Can Multi-modal Foundation Models Understand the Egocentric World? A Holistic Embodied Cognition Benchmark

Ronghao Dang, Yuqian Yuan, Wenqi Zhang et al.

CVPR 2025 • arXiv:2501.05031

citations

#2016

SkillMimic: Learning Basketball Interaction Skills from Demonstrations

Yinhuai Wang, Qihan Zhao, Runyi Yu et al.

CVPR 2025 • highlight • arXiv:2408.15270

citations

#2017

Exploring Vision Transformers for 3D Human Motion-Language Models with Motion Patches

Qing Yu, Mikihiro Tanaka, Kent Fujiwara

CVPR 2024 • arXiv:2405.04771

citations

#2018

Enhancing Vision-Language Pre-training with Rich Supervisions

Yuan Gao, Kunyu Shi, Pengkai Zhu et al.

CVPR 2024 • highlight • arXiv:2403.03346

citations

#2019

Adapters Strike Back

Jan-Martin Steitz, Stefan Roth

CVPR 2024 • arXiv:2406.06820

citations

#2020

ScanFormer: Referring Expression Comprehension by Iteratively Scanning

Wei Su, Peihan Miao, Huanzhang Dou et al.

CVPR 2024 • arXiv:2406.18048

citations

#2021

Inverse Rendering of Glossy Objects via the Neural Plenoptic Function and Radiance Fields

Haoyuan Wang, Wenbo Hu, Lei Zhu et al.

CVPR 2024 • arXiv:2403.16224

citations

#2022

FireEdit: Fine-grained Instruction-based Image Editing via Region-aware Vision Language Model

Jun Zhou, Jiahao Li, Zunnan Xu et al.

CVPR 2025 • arXiv:2503.19839

citations

#2023

DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery

Yixuan Zhu, Ao Li, Yansong Tang et al.

CVPR 2024 • arXiv:2404.01424

citations

#2024

Plug-and-Play Diffusion Distillation

Yi-Ting Hsiao, Siavash Khodadadeh, Kevin Duarte et al.

CVPR 2024 • arXiv:2406.01954

citations

#2025

Multiview Aerial Visual RECognition (MAVREC): Can Multi-view Improve Aerial Visual Perception?

Aritra Dutta, Srijan Das, Jacob Nielsen et al.

CVPR 2024 • arXiv:2312.04548

citations

#2026

How to Handle Sketch-Abstraction in Sketch-Based Image Retrieval?

Subhadeep Koley, Ayan Kumar Bhunia, Aneeshan Sain et al.

CVPR 2024 • arXiv:2403.07203

citations

#2027

Multiple View Geometry Transformers for 3D Human Pose Estimation

Ziwei Liao, Jialiang Zhu, Chunyu Wang et al.

CVPR 2024 • arXiv:2311.10983

citations

#2028

NeRF Director: Revisiting View Selection in Neural Volume Rendering

Wenhui Xiao, Rodrigo Santa Cruz, David Ahmedt-Aristizabal et al.

CVPR 2024 • arXiv:2406.08839

citations

#2029

ViewFusion: Towards Multi-View Consistency via Interpolated Denoising

Xianghui Yang, Gil Avraham, Yan Zuo et al.

CVPR 2024 • arXiv:2402.18842

citations

#2030

Octopus: Alleviating Hallucination via Dynamic Contrastive Decoding

Wei Suo, Lijun Zhang, Mengyang Sun et al.

CVPR 2025 • highlight • arXiv:2503.00361

citations

#2031

Causal Mode Multiplexer: A Novel Framework for Unbiased Multispectral Pedestrian Detection

Taeheon Kim, Sebin Shin, Youngjoon Yu et al.

CVPR 2024 • arXiv:2403.01300

citations

#2032

Degradation-Aware Feature Perturbation for All-in-One Image Restoration

Xiangpeng Tian, Xiangyu Liao, Xiao Liu et al.

CVPR 2025 • arXiv:2505.12630

citations

#2033

SHINOBI: Shape and Illumination using Neural Object Decomposition via BRDF Optimization In-the-wild

Andreas Engelhardt, Amit Raj, Mark Boss et al.

CVPR 2024 • arXiv:2401.10171

citations

#2034

ConTex-Human: Free-View Rendering of Human from a Single Image with Texture-Consistent Synthesis

Xiangjun Gao, Xiaoyu Li, Chaopeng Zhang et al.

CVPR 2024 • arXiv:2311.17123

citations

#2035

Wired Perspectives: Multi-View Wire Art Embraces Generative AI

Zhiyu Qu, Lan Yang, Honggang Zhang et al.

CVPR 2024 • arXiv:2311.15421

citations

#2036

Adversarial Score Distillation: When score distillation meets GAN

Min Wei, Jingkai Zhou, Junyao Sun et al.

CVPR 2024 • arXiv:2312.00739

citations

#2037

UniGraspTransformer: Simplified Policy Distillation for Scalable Dexterous Robotic Grasping

Wenbo Wang, Fangyun Wei, Lei Zhou et al.

CVPR 2025 • arXiv:2412.02699

citations

#2038

EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing

Gaoxiang Cong, Jiadong Pan, Liang Li et al.

CVPR 2025 • highlight • arXiv:2412.08988

citations

#2039

Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion

Jiuhai Chen, Jianwei Yang, Haiping Wu et al.

CVPR 2025 • arXiv:2412.04424

citations

#2040

Programmable Motion Generation for Open-Set Motion Control Tasks

Hanchao Liu, Xiaohang Zhan, Shaoli Huang et al.

CVPR 2024 • highlight • arXiv:2405.19283

citations

#2041

MaGGIe: Masked Guided Gradual Human Instance Matting

Chuong Huynh, Seoung Wug Oh, Abhinav Shrivastava et al.

CVPR 2024 • arXiv:2404.16035

citations

#2042

TULIP: Transformer for Upsampling of LiDAR Point Clouds

Bin Yang, Patrick Pfreundschuh, Roland Siegwart et al.

CVPR 2024 • arXiv:2312.06733

citations

#2043

Generative Unlearning for Any Identity

Juwon Seo, Sung-Hoon Lee, Tae-Young Lee et al.

CVPR 2024 • arXiv:2405.09879

citations

#2044

Comparing the Decision-Making Mechanisms by Transformers and CNNs via Explanation Methods

Mingqi Jiang, Saeed Khorram, Li Fuxin

CVPR 2024 • arXiv:2212.06872

citations

#2045

Continual Segmentation with Disentangled Objectness Learning and Class Recognition

Yizheng Gong, Siyue Yu, Xiaoyang Wang et al.

CVPR 2024 • arXiv:2403.03477

citations

#2046

A Noisy Elephant in the Room: Is Your Out-of-Distribution Detector Robust to Label Noise?

Galadrielle Humblot-Renaux, Sergio Escalera, Thomas B. Moeslund

CVPR 2024 • arXiv:2404.01775

citations

#2047

Tri-Modal Motion Retrieval by Learning a Joint Embedding Space

Kangning Yin, Shihao Zou, Yuxuan Ge et al.

CVPR 2024 • highlight • arXiv:2403.00691

citations

#2048

Kandinsky Conformal Prediction: Efficient Calibration of Image Segmentation Algorithms

Joren Brunekreef, Eric Marcus, Ray Sheombarsing et al.

CVPR 2024 • arXiv:2311.11837

citations

#2049

Few-Shot Recognition via Stage-Wise Retrieval-Augmented Finetuning

Tian Liu, Huixin Zhang, Shubham Parashar et al.

CVPR 2025 • arXiv:2406.11148

citations

#2050

HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models

Runhui Huang, Xinpeng Ding, Chunwei Wang et al.

CVPR 2025 • arXiv:2407.08706

citations

#2051

Bootstrapping Chest CT Image Understanding by Distilling Knowledge from X-ray Expert Models

Weiwei Cao, Jianpeng Zhang, Yingda Xia et al.

CVPR 2024 • arXiv:2404.04936

citations

#2052

VisionArena: 230k Real World User-VLM Conversations with Preference Labels

Christopher Chou, Lisa Dunlap, Wei-Lin Chiang et al.

CVPR 2025 • arXiv:2412.08687

citations

#2053

Scaling Properties of Diffusion Models For Perceptual Tasks

Rahul Ravishankar, Zeeshan Patel, Jathushan Rajasegaran et al.

CVPR 2025 • arXiv:2411.08034

citations

#2054

MRC-Net: 6-DoF Pose Estimation with MultiScale Residual Correlation

Yuelong Li, Yafei Mao, Raja Bala et al.

CVPR 2024 • arXiv:2403.08019

citations

#2055

Learning to Visually Localize Sound Sources from Mixtures without Prior Source Knowledge

Dongjin Kim, Sung Jin Um, Sangmin Lee et al.

CVPR 2024 • arXiv:2403.17420

citations

#2056

OmniMotionGPT: Animal Motion Generation with Limited Data

Zhangsihao Yang, Mingyuan Zhou, Mengyi Shan et al.

CVPR 2024 • arXiv:2311.18303

citations

#2057

MVSAnywhere: Zero-Shot Multi-View Stereo

Sergio Izquierdo, Mohamed Sayed, Michael Firman et al.

CVPR 2025 • arXiv:2503.22430

citations

#2058

PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs

Michael Dorkenwald, Nimrod Barazani, Cees G. M. Snoek et al.

CVPR 2024 • arXiv:2402.08657

citations

#2059

Deformable One-shot Face Stylization via DINO Semantic Guidance

Yang Zhou, Zichong Chen, Hui Huang

CVPR 2024 • arXiv:2403.00459

citations

#2060

Dynamic Cues-Assisted Transformer for Robust Point Cloud Registration

Hong Chen, Pei Yan, Sihe Xiang et al.

CVPR 2024 • highlight

citations

#2061

R-TPT: Improving Adversarial Robustness of Vision-Language Models through Test-Time Prompt Tuning

Lijun Sheng, Jian Liang, Zilei Wang et al.

CVPR 2025 • arXiv:2504.11195

citations

#2062

Partial-to-Partial Shape Matching with Geometric Consistency

Viktoria Ehm, Maolin Gao, Paul Roetzer et al.

CVPR 2024 • arXiv:2404.12209

citations

#2063

Multiway Point Cloud Mosaicking with Diffusion and Global Optimization

Shengze Jin, Iro Armeni, Marc Pollefeys et al.

CVPR 2024 • arXiv:2404.00429

citations

#2064

Missing Target-Relevant Information Prediction with World Model for Accurate Zero-Shot Composed Image Retrieval

Yuanmin Tang, Jing Yu, Keke Gai et al.

CVPR 2025 • arXiv:2503.17109

citations

#2065

A Unified Framework for Microscopy Defocus Deblur with Multi-Pyramid Transformer and Contrastive Learning

Yuelin Zhang, Pengyu Zheng, Wanquan Yan et al.

CVPR 2024 • arXiv:2403.02611

citations

#2066

AniDoc: Animation Creation Made Easier

Yihao Meng, Hao Ouyang, Hanlin Wang et al.

CVPR 2025 • arXiv:2412.14173

citations

#2067

ManiFPT: Defining and Analyzing Fingerprints of Generative Models

Hae Jin Song, Mahyar Khayatkhoei, Wael AbdAlmageed

CVPR 2024 • arXiv:2402.10401

citations

#2068

Efficient Vision-Language Pre-training by Cluster Masking

Zihao Wei, Zixuan Pan, Andrew Owens

CVPR 2024 • arXiv:2405.08815

citations

#2069

HiLo: Detailed and Robust 3D Clothed Human Reconstruction with High- and Low-Frequency Information of Parametric Models

Yifan Yang, Dong Liu, Shuhai Zhang et al.

CVPR 2024 • arXiv:2404.04876

citations

#2070

CoSDH: Communication-Efficient Collaborative Perception via Supply-Demand Awareness and Intermediate-Late Hybridization

Junhao Xu, Yanan Zhang, Zhi Cai et al.

CVPR 2025 • arXiv:2503.03430

citations

#2071

ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing Collaborative AI Systems

Xiangyuan Xue, Zeyu Lu, Di Huang et al.

CVPR 2025 • arXiv:2409.01392

citations

#2072

Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models

Qirui Jiao, Daoyuan Chen, Yilun Huang et al.

CVPR 2025 • arXiv:2408.04594

citations

#2073

MS-MANO: Enabling Hand Pose Tracking with Biomechanical Constraints

Pengfei Xie, Wenqiang Xu, Tutian Tang et al.

CVPR 2024 • arXiv:2404.10227

citations

#2074

MBQ: Modality-Balanced Quantization for Large Vision-Language Models

Shiyao Li, Yingchun Hu, Xuefei Ning et al.

CVPR 2025 • arXiv:2412.19509

citations

#2075

Unlearning through Knowledge Overwriting: Reversible Federated Unlearning via Selective Sparse Adapter

Zhengyi Zhong, Weidong Bao, Ji Wang et al.

CVPR 2025 • arXiv:2502.20709

citations

#2076

Personalized Preference Fine-tuning of Diffusion Models

Meihua Dang, Anikait Singh, Linqi Zhou et al.

CVPR 2025 • arXiv:2501.06655

citations

#2077

Active Data Curation Effectively Distills Large-Scale Multimodal Models

Vishaal Udandarao, Nikhil Parthasarathy, Muhammad Ferjad Naeem et al.

CVPR 2025 • arXiv:2411.18674

citations

#2078

NIVeL: Neural Implicit Vector Layers for Text-to-Vector Generation

Vikas Thamizharasan, Difan Liu, Matthew Fisher et al.

CVPR 2024 • arXiv:2405.15217

citations

#2079

Bidirectional Autoregressive Diffusion Model for Dance Generation

Canyu Zhang, Youbao Tang, Ning Zhang et al.

CVPR 2024

citations

#2080

FineLIP: Extending CLIP’s Reach via Fine-Grained Alignment with Longer Text Inputs

Mothilal Asokan, Kebin Wu, Fatima Albreiki

CVPR 2025 • arXiv:2504.01916

citations

#2081

IDProtector: An Adversarial Noise Encoder to Protect Against ID-Preserving Image Generation

Yiren Song, Pei Yang, Hai Ci et al.

CVPR 2025 • arXiv:2412.11638

citations

#2082

One-Shot Structure-Aware Stylized Image Synthesis

Hansam Cho, Jonghyun Lee, Seunggyu Chang et al.

CVPR 2024 • arXiv:2402.17275

citations

#2083

Assessing and Learning Alignment of Unimodal Vision and Language Models

Le Zhang, Qian Yang, Aishwarya Agrawal

CVPR 2025 • highlight • arXiv:2412.04616

citations

#2084

X-Dyna: Expressive Dynamic Human Image Animation

Di Chang, Hongyi Xu, You Xie et al.

CVPR 2025 • highlight • arXiv:2501.10021

citations

#2085

DRAWER: Digital Reconstruction and Articulation With Environment Realism

Hongchi Xia, Entong Su, Marius Memmel et al.

CVPR 2025 • arXiv:2504.15278

citations

#2086

SOAC: Spatio-Temporal Overlap-Aware Multi-Sensor Calibration using Neural Radiance Fields

Quentin Herau, Nathan Piasco, Moussab Bennehar et al.

CVPR 2024 • arXiv:2311.15803

citations

#2087

DynamicScaler: Seamless and Scalable Video Generation for Panoramic Scenes

Jinxiu Liu, Shaoheng Lin, Yinxiao Li et al.

CVPR 2025 • arXiv:2412.11100

citations

#2088

ECVC: Exploiting Non-Local Correlations in Multiple Frames for Contextual Video Compression

Wei Jiang, Junru Li, Kai Zhang et al.

CVPR 2025 • arXiv:2410.09706

citations

#2089

LidaRF: Delving into Lidar for Neural Radiance Field on Street Scenes

Shanlin Sun, Bingbing Zhuang, Ziyu Jiang et al.

CVPR 2024 • highlight • arXiv:2405.00900

citations

#2090

Anyattack: Towards Large-scale Self-supervised Adversarial Attacks on Vision-language Models

Jiaming Zhang, Junhong Ye, Xingjun Ma et al.

CVPR 2025 • arXiv:2410.05346

citations

#2091

VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation

Ziyang Luo, Haoning Wu, Dongxu Li et al.

CVPR 2025 • arXiv:2411.13281

citations

#2092

Versatile Medical Image Segmentation Learned from Multi-Source Datasets via Model Self-Disambiguation

Xiaoyang Chen, Hao Zheng, Yuemeng Li et al.

CVPR 2024 • arXiv:2311.10696

citations

#2093

S2Gaussian: Sparse-View Super-Resolution 3D Gaussian Splatting

Yecong Wan, Mingwen Shao, Yuanshuo Cheng et al.

CVPR 2025 • arXiv:2503.04314

citations

#2094

On the Road to Portability: Compressing End-to-End Motion Planner for Autonomous Driving

Kaituo Feng, Changsheng Li, Dongchun Ren et al.

CVPR 2024 • arXiv:2403.01238

citations

#2095

TAPT: Test-Time Adversarial Prompt Tuning for Robust Inference in Vision-Language Models

Xin Wang, Kai Chen, Jiaming Zhang et al.

CVPR 2025 • arXiv:2411.13136

citations

#2096

MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research

James Burgess, Jeffrey J Nirschl, Laura Bravo-Sánchez et al.

CVPR 2025 • arXiv:2503.13399

citations

#2097

Dissecting and Mitigating Diffusion Bias via Mechanistic Interpretability

Yingdong Shi, Changming Li, Yifan Wang et al.

CVPR 2025 • arXiv:2503.20483

citations

#2098

Docopilot: Improving Multimodal Models for Document-Level Understanding

Yuchen Duan, Zhe Chen, Yusong Hu et al.

CVPR 2025 • arXiv:2507.14675

citations

#2099

Omnia de EgoTempo: Benchmarking Temporal Understanding of Multi-Modal LLMs in Egocentric Videos

Chiara Plizzari, Alessio Tonioni, Yongqin Xian et al.

CVPR 2025 • arXiv:2503.13646

citations

#2100

GET: Unlocking the Multi-modal Potential of CLIP for Generalized Category Discovery

Enguang Wang, Zhimao Peng, Zhengyuan Xie et al.

CVPR 2025 • arXiv:2403.09974

citations

#2101

Customization Assistant for Text-to-Image Generation

Yufan Zhou, Ruiyi Zhang, Jiuxiang Gu et al.

CVPR 2024 • arXiv:2312.03045

citations

#2102

GenN2N: Generative NeRF2NeRF Translation

Xiangyue Liu, Han Xue, Kunming Luo et al.

CVPR 2024 • arXiv:2404.02788

citations

#2103

Exploring Sparse MoE in GANs for Text-conditioned Image Synthesis

Jiapeng Zhu, Ceyuan Yang, Kecheng Zheng et al.

CVPR 2025 • arXiv:2309.03904

citations

#2104

DGC-GNN: Leveraging Geometry and Color Cues for Visual Descriptor-Free 2D-3D Matching

Shuzhe Wang, Juho Kannala, Daniel Barath

CVPR 2024 • arXiv:2306.12547

citations

#2105

Instance-Aware Group Quantization for Vision Transformers

Jaehyeon Moon, Dohyung Kim, Jun Yong Cheon et al.

CVPR 2024 • arXiv:2404.00928

citations

#2106

Leveraging Predicate and Triplet Learning for Scene Graph Generation

Jiankai Li, Yunhong Wang, Xiefan Guo et al.

CVPR 2024 • arXiv:2406.02038

citations

#2107

Neural Clustering based Visual Representation Learning

Guikun Chen, Xia Li, Yi Yang et al.

CVPR 2024 • arXiv:2403.17409

citations

#2108

TeMO: Towards Text-Driven 3D Stylization for Multi-Object Meshes

Xuying Zhang, Bo-Wen Yin, Yuming Chen et al.

CVPR 2024 • arXiv:2312.04248

citations

#2109

BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation

Yunhao Ge, Yihe Tang, Jiashu Xu et al.

CVPR 2024 • highlight • arXiv:2405.09546

citations

#2110

DiffHuman: Probabilistic Photorealistic 3D Reconstruction of Humans

Akash Sengupta, Thiemo Alldieck, Nikos Kolotouros et al.

CVPR 2024 • arXiv:2404.00485

citations

#2111

BiPer: Binary Neural Networks using a Periodic Function

Edwin Vargas, Claudia Correa, Carlos Hinojosa et al.

CVPR 2024 • arXiv:2404.01278

citations

#2112

En3D: An Enhanced Generative Model for Sculpting 3D Humans from 2D Synthetic Data

Yifang Men, Biwen Lei, Yuan Yao et al.

CVPR 2024 • arXiv:2401.01173

citations

#2113

Complementing Event Streams and RGB Frames for Hand Mesh Reconstruction

Jianping Jiang, Xinyu Zhou, Bingxuan Wang et al.

CVPR 2024 • arXiv:2403.07346

citations

#2114

GenesisTex: Adapting Image Denoising Diffusion to Texture Space

Chenjian Gao, Boyan Jiang, Xinghui Li et al.

CVPR 2024 • arXiv:2403.17782

citations

#2115

Learned Image Compression with Dictionary-based Entropy Model

Jingbo Lu, Leheng Zhang, Xingyu Zhou et al.

CVPR 2025 • arXiv:2504.00496

citations

#2116

UFORecon: Generalizable Sparse-View Surface Reconstruction from Arbitrary and Unfavorable Sets

Youngju Na, Woo Jae Kim, Kyu Han et al.

CVPR 2024 • arXiv:2403.05086

citations

#2117

DiverGen: Improving Instance Segmentation by Learning Wider Data Distribution with More Diverse Generative Data

Chengxiang Fan, Muzhi Zhu, Hao Chen et al.

CVPR 2024 • arXiv:2405.10185

citations

#2118

Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models

Zhihang Liu, Chen-Wei Xie, Pandeng Li et al.

CVPR 2025 • arXiv:2503.16036

citations

#2119

ILIAS: Instance-Level Image retrieval At Scale

Giorgos Kordopatis-Zilos, Vladan Stojnić, Anna Manko et al.

CVPR 2025 • arXiv:2502.11748

citations

#2120

SleeperMark: Towards Robust Watermark against Fine-Tuning Text-to-image Diffusion Models

Zilan Wang, Junfeng Guo, Jiacheng Zhu et al.

CVPR 2025 • arXiv:2412.04852

citations

#2121

SOLVE: Synergy of Language-Vision and End-to-End Networks for Autonomous Driving

Xuesong Chen, Linjiang Huang, Tao Ma et al.

CVPR 2025 • arXiv:2505.16805

citations

#2122

4DGC: Rate-Aware 4D Gaussian Compression for Efficient Streamable Free-Viewpoint Video

Qiang Hu, Zihan Zheng, Houqiang Zhong et al.

CVPR 2025 • arXiv:2503.18421

citations

#2123

PTM-VQA: Efficient Video Quality Assessment Leveraging Diverse PreTrained Models from the Wild

Kun Yuan, Hongbo Liu, Mading Li et al.

CVPR 2024 • arXiv:2405.17765

citations

#2124

Living Scenes: Multi-object Relocalization and Reconstruction in Changing 3D Environments

Liyuan Zhu, Shengyu Huang, Konrad Schindler et al.

CVPR 2024 • highlight • arXiv:2312.09138

citations

#2125

Lost in Translation, Found in Context: Sign Language Translation with Contextual Cues

Youngjoon Jang, Haran Raajesh, Liliane Momeni et al.

CVPR 2025 • arXiv:2501.09754

citations

#2126

Towards Universal Soccer Video Understanding

Jiayuan Rao, Haoning Wu, Hao Jiang et al.

CVPR 2025 • arXiv:2412.01820

citations

#2127

MangaNinja: Line Art Colorization with Precise Reference Following

Zhiheng Liu, Ka Leong Cheng, Xi Chen et al.

CVPR 2025 • highlight • arXiv:2501.08332

citations

#2128

Text-to-Image Diffusion Models are Great Sketch-Photo Matchmakers

Subhadeep Koley, Ayan Kumar Bhunia, Aneeshan Sain et al.

CVPR 2024 • arXiv:2403.07214

citations

#2129

FedBiP: Heterogeneous One-Shot Federated Learning with Personalized Latent Diffusion Models

Haokun Chen, Hang Li, Yao Zhang et al.

CVPR 2025 • arXiv:2410.04810

citations

#2130

Dynamic Camera Poses and Where to Find Them

Chris Rockwell, Joseph Tung, Tsung-Yi Lin et al.

CVPR 2025 • arXiv:2504.17788

citations

#2131

Action-slot: Visual Action-centric Representations for Multi-label Atomic Activity Recognition in Traffic Scenes

Chi-Hsi Kung, 書緯呂, Yi-Hsuan Tsai et al.

CVPR 2024 • arXiv:2311.17948

citations

#2132

OVER-NAV: Elevating Iterative Vision-and-Language Navigation with Open-Vocabulary Detection and StructurEd Representation

Ganlong Zhao, Guanbin Li, Weikai Chen et al.

CVPR 2024 • arXiv:2403.17334

citations

#2133

The Mirrored Influence Hypothesis: Efficient Data Influence Estimation by Harnessing Forward Passes

Myeongseob Ko, Feiyang Kang, Weiyan Shi et al.

CVPR 2024 • arXiv:2402.08922

citations

#2134

Cyclic Learning for Binaural Audio Generation and Localization

Zhaojian Li, Bin Zhao, Yuan Yuan

CVPR 2024

citations

#2135

Synthetic Data is an Elegant GIFT for Continual Vision-Language Models

Bin Wu, Wuxuan Shi, Jinqiao Wang et al.

CVPR 2025 • arXiv:2503.04229

citations

#2136

FreeTimeGS: Free Gaussian Primitives at Anytime Anywhere for Dynamic Scene Reconstruction

Yifan Wang, Peishan Yang, Zhen Xu et al.

CVPR 2025

citations

#2137

STEP: Enhancing Video-LLMs’ Compositional Reasoning by Spatio-Temporal Graph-guided Self-Training

Haiyi Qiu, Minghe Gao, Long Qian et al.

CVPR 2025 • arXiv:2412.00161

citations

#2138

A Video is Worth 256 Bases: Spatial-Temporal Expectation-Maximization Inversion for Zero-Shot Video Editing

Li Maomao, Yu Li, Tianyu Yang et al.

CVPR 2024 • arXiv:2312.05856

citations

#2139

RENO: Real-Time Neural Compression for 3D LiDAR Point Clouds

Kang You, Tong Chen, Dandan Ding et al.

CVPR 2025 • arXiv:2503.12382

citations

#2140

JRDB-Social: A Multifaceted Robotic Dataset for Understanding of Context and Dynamics of Human Interactions Within Social Groups

Simindokht Jahangard, Zhixi Cai, Shiki Wen et al.

CVPR 2024 • arXiv:2404.04458

citations

#2141

Improving Spectral Snapshot Reconstruction with Spectral-Spatial Rectification

Jiancheng Zhang, Haijin Zeng, Yongyong Chen et al.

CVPR 2024

citations

#2142

NeRF Analogies: Example-Based Visual Attribute Transfer for NeRFs

Michael Fischer, Zhengqin Li, Thu Nguyen-Phuoc et al.

CVPR 2024 • arXiv:2402.08622

citations

#2143

What You See is What You GAN: Rendering Every Pixel for High-Fidelity Geometry in 3D GANs

Alex Trevithick, Matthew Chan, Towaki Takikawa et al.

CVPR 2024 • arXiv:2401.02411

citations

#2144

JamMa: Ultra-lightweight Local Feature Matching with Joint Mamba

Xiaoyong Lu, Songlin Du

CVPR 2025 • arXiv:2503.03437

citations

#2145

IncEventGS: Pose-Free Gaussian Splatting from a Single Event Camera

Jian Huang, Chengrui Dong, Xuanhua Chen et al.

CVPR 2025 • highlight • arXiv:2410.08107

citations

#2146

CADCrafter: Generating Computer-Aided Design Models from Unconstrained Images

Chen Cheng, Jiacheng Wei, Tianrun Chen et al.

CVPR 2025 • arXiv:2504.04753

citations

#2147

NeRSP: Neural 3D Reconstruction for Reflective Objects with Sparse Polarized Images

Yufei Han, Heng Guo, Koki Fukai et al.

CVPR 2024 • arXiv:2406.07111

citations

#2148

ConsistDreamer: 3D-Consistent 2D Diffusion for High-Fidelity Scene Editing

Jun-Kun Chen, Samuel Rota Bulò, Norman Müller et al.

CVPR 2024 • arXiv:2406.09404

citations

#2149

Modeling Multimodal Social Interactions: New Challenges and Baselines with Densely Aligned Representations

Sangmin Lee, Bolin Lai, Fiona Ryan et al.

CVPR 2024 • arXiv:2403.02090

citations

#2150

SpectroMotion: Dynamic 3D Reconstruction of Specular Scenes

Cheng-De Fan, Chen-Wei Chang, Yi-Ruei Liu et al.

CVPR 2025 • arXiv:2410.17249

citations

#2151

Hybrid Proposal Refiner: Revisiting DETR Series from the Faster R-CNN Perspective

Jinjing Zhao, Fangyun Wei, Chang Xu

CVPR 2024

citations

#2152

Codebook Transfer with Part-of-Speech for Vector-Quantized Image Modeling

Baoquan Zhang, Huaibin Wang, Luo Chuyao et al.

CVPR 2024 • arXiv:2403.10071

citations

#2153

HandBooster: Boosting 3D Hand-Mesh Reconstruction by Conditional Synthesis and Sampling of Hand-Object Interactions

Hao Xu, Li Haipeng, Yinqiao Wang et al.

CVPR 2024 • arXiv:2403.18575

citations

#2154

Through-The-Mask: Mask-based Motion Trajectories for Image-to-Video Generation

Guy Yariv, Yuval Kirstain, Amit Zohar et al.

CVPR 2025 • arXiv:2501.03059

citations

#2155

Multi-scale Dynamic and Hierarchical Relationship Modeling for Facial Action Units Recognition

Zihan Wang, Siyang Song, Cheng Luo et al.

CVPR 2024 • arXiv:2404.06443

citations

#2156

Unifying Automatic and Interactive Matting with Pretrained ViTs

Zixuan Ye, Wenze Liu, He Guo et al.

CVPR 2024

citations

#2157

Scenario Dreamer: Vectorized Latent Diffusion for Generating Driving Simulation Environments

Luke Rowe, Roger Girgis, Anthony Gosselin et al.

CVPR 2025 • arXiv:2503.22496

citations

#2158

SeaBird: Segmentation in Bird’s View with Dice Loss Improves Monocular 3D Detection of Large Objects

Abhinav Kumar, Yuliang Guo, Xinyu Huang et al.

CVPR 2024 • arXiv:2403.20318

citations

#2159

ProTeCt: Prompt Tuning for Taxonomic Open Set Classification

Tz-Ying Wu, Chih-Hui Ho, Nuno Vasconcelos

CVPR 2024 • arXiv:2306.02240

citations

#2160

ICT: Image-Object Cross-Level Trusted Intervention for Mitigating Object Hallucination in Large Vision-Language Models

Junzhe Chen, Tianshu Zhang, Shiyu Huang et al.

CVPR 2025 • arXiv:2411.15268

citations

#2161

MultiPly: Reconstruction of Multiple People from Monocular Video in the Wild

Zeren Jiang, Chen Guo, Manuel Kaufmann et al.

CVPR 2024 • arXiv:2406.01595

citations

#2162

Unified Entropy Optimization for Open-Set Test-Time Adaptation

Zhengqing Gao, Xu-Yao Zhang, Cheng-Lin Liu

CVPR 2024 • arXiv:2404.06065

citations

#2163

STEREO: A Two-Stage Framework for Adversarially Robust Concept Erasing from Text-to-Image Diffusion Models

Koushik Srivatsan, Fahad Shamshad, Muzammal Naseer et al.

CVPR 2025 • highlight • arXiv:2408.16807

citations

#2164

Dyn-HaMR: Recovering 4D Interacting Hand Motion from a Dynamic Camera

Zhengdi Yu, Stefanos Zafeiriou, Tolga Birdal

CVPR 2025 • highlight • arXiv:2412.12861

citations

#2165

Learn to Rectify the Bias of CLIP for Unsupervised Semantic Segmentation

Jingyun Wang, Guoliang Kang

CVPR 2024 • arXiv:2408.06747

citations

#2166

Region-Based Representations Revisited

Michal Shlapentokh-Rothman, Ansel Blume, Yao Xiao et al.

CVPR 2024 • arXiv:2402.02352

citations

#2167

SVDinsTN: A Tensor Network Paradigm for Efficient Structure Search from Regularized Modeling Perspective

Yu-Bang Zheng, Xile Zhao, Junhua Zeng et al.

CVPR 2024 • highlight • arXiv:2305.14912

citations

#2168

Misalignment-Robust Frequency Distribution Loss for Image Transformation

Zhangkai Ni, Juncheng Wu, Zian Wang et al.

CVPR 2024 • arXiv:2402.18192

citations

#2169

BiomedCoOp: Learning to Prompt for Biomedical Vision-Language Models

Taha Koleilat, Hojat Asgariandehkordi, Hassan Rivaz et al.

CVPR 2025 • arXiv:2411.15232

citations

#2170

Functional Diffusion

Biao Zhang, Peter Wonka

CVPR 2024 • arXiv:2311.15435

citations

#2171

GREAT: Geometry-Intention Collaborative Inference for Open-Vocabulary 3D Object Affordance Grounding

Yawen Shao, Wei Zhai, Yuhang Yang et al.

CVPR 2025 • arXiv:2411.19626

citations

#2172

Novel Class Discovery for Ultra-Fine-Grained Visual Categorization

Qi Jia, Yaqi Cai, Qi Jia et al.

CVPR 2024 • highlight • arXiv:2405.06283

citations

#2173

Human Motion Instruction Tuning

Lei Li, Sen Jia, Jianhao Wang et al.

CVPR 2025 • arXiv:2411.16805

citations

#2174

AerialMegaDepth: Learning Aerial-Ground Reconstruction and View Synthesis

Khiem Vuong, Anurag Ghosh, Deva Ramanan et al.

CVPR 2025 • arXiv:2504.13157

citations

#2175

EasyHOI: Unleashing the Power of Large Models for Reconstructing Hand-Object Interactions in the Wild

Yumeng Liu, Xiaoxiao Long, Zemin Yang et al.

CVPR 2025 • arXiv:2411.14280

citations

#2176

DPFlow: Adaptive Optical Flow Estimation with a Dual-Pyramid Framework

Henrique Morimitsu, Xiaobin Zhu, Roberto M. Cesar Jr et al.

CVPR 2025 • arXiv:2503.14880

citations

#2177

Pippo: High-Resolution Multi-View Humans from a Single Image

Yash Kant, Ethan Weber, Jin Kyu Kim et al.

CVPR 2025 • highlight • arXiv:2502.07785

citations

#2178

Memory-based Adapters for Online 3D Scene Perception

Xiuwei Xu, Chong Xia, Ziwei Wang et al.

CVPR 2024 • arXiv:2403.06974

citations

#2179

Dynamic Policy-Driven Adaptive Multi-Instance Learning for Whole Slide Image Classification

Tingting Zheng, Kui Jiang, Hongxun Yao

CVPR 2024 • highlight • arXiv:2403.07939

citations

#2180

MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation

Weijia Wu, Mingyu Liu, Zeyu Zhu et al.

CVPR 2025 • arXiv:2411.15262

citations

#2181

Navigate Beyond Shortcuts: Debiased Learning Through the Lens of Neural Collapse

Yining Wang, Junjie Sun, Chenyue Wang et al.

CVPR 2024 • highlight • arXiv:2405.05587

citations

#2182

MASH-VLM: Mitigating Action-Scene Hallucination in Video-LLMs through Disentangled Spatial-Temporal Representations

Kyungho Bae, Jinhyung Kim, Sihaeng Lee et al.

CVPR 2025 • highlight • arXiv:2503.15871

citations

#2183

MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling

Jian Yang, Dacheng Yin, Yizhou Zhou et al.

CVPR 2025 • arXiv:2410.10798

citations

#2184

Adversarial Backdoor Attack by Naturalistic Data Poisoning on Trajectory Prediction in Autonomous Driving

Mozhgan Pourkeshavarz, Mohammad Sabokrou, Amir Rasouli

CVPR 2024 • arXiv:2306.15755

citations

#2185

Bridging Modalities: Improving Universal Multimodal Retrieval by Multimodal Large Language Models

Xin Zhang, Yanzhao Zhang, Wen Xie et al.

CVPR 2025

citations

#2186

NitroFusion: High-Fidelity Single-Step Diffusion through Dynamic Adversarial Training

Dar-Yen Chen, Hmrishav Bandyopadhyay, Kai Zou et al.

CVPR 2025 • arXiv:2412.02030

citations

#2187

SALAD: Skeleton-aware Latent Diffusion for Text-driven Motion Generation and Editing

Seokhyeon Hong, Chaelin Kim, Serin Yoon et al.

CVPR 2025 • arXiv:2503.13836

citations

#2188

3D Neural Edge Reconstruction

Lei Li, Songyou Peng, Zehao Yu et al.

CVPR 2024 • arXiv:2405.19295

citations

#2189

MAGiC-SLAM: Multi-Agent Gaussian Globally Consistent SLAM

Vladimir Yugay, Theo Gevers, Martin R. Oswald

CVPR 2025 • arXiv:2411.16785

citations

#2190

Rethinking End-to-End 2D to 3D Scene Segmentation in Gaussian Splatting

Runsong Zhu, Shi Qiu, Zhengzhe Liu et al.

CVPR 2025 • arXiv:2503.14029

citations

#2191

DiffSal: Joint Audio and Video Learning for Diffusion Saliency Prediction

Junwen Xiong, Peng Zhang, Tao You et al.

CVPR 2024 • arXiv:2403.01226

citations

#2192

Alias-Free Latent Diffusion Models: Improving Fractional Shift Equivariance of Diffusion Latent Space

Yifan Zhou, Zeqi Xiao, Shuai Yang et al.

CVPR 2025 • arXiv:2503.09419

citations

#2193

SAIST: Segment Any Infrared Small Target Model Guided by Contrastive Language-Image Pretraining

Mingjin Zhang, Xiaolong Li, Fei Gao et al.

CVPR 2025

citations

#2194

MoSAR: Monocular Semi-Supervised Model for Avatar Reconstruction using Differentiable Shading

Abdallah Dib, Luiz Gustavo Hafemann, Emeline Got et al.

CVPR 2024 • arXiv:2312.13091

citations

#2195

From Words to Structured Visuals: A Benchmark and Framework for Text-to-Diagram Generation and Editing

Jingxuan Wei, Cheng Tan, Qi Chen et al.

CVPR 2025 • highlight • arXiv:2411.11916

citations

#2196

Learning Intra-view and Cross-view Geometric Knowledge for Stereo Matching

Rui Gong, Weide Liu, Zaiwang Gu et al.

CVPR 2024 • arXiv:2402.19270

citations

#2197

HEAL-SWIN: A Vision Transformer On The Sphere

Oscar Carlsson, Jan E. Gerken, Hampus Linander et al.

CVPR 2024 • arXiv:2307.07313

citations

#2198

DiffusionMTL: Learning Multi-Task Denoising Diffusion Model from Partially Annotated Data

Hanrong Ye, Dan Xu

CVPR 2024 • arXiv:2403.15389

citations

#2199

Robust Distillation via Untargeted and Targeted Intermediate Adversarial Samples

Junhao Dong, Piotr Koniusz, Junxi Chen et al.

CVPR 2024

citations

#2200

Explaining CLIP's Performance Disparities on Data from Blind/Low Vision Users

Daniela Massiceti, Camilla Longden, Agnieszka Słowik et al.

CVPR 2024 • arXiv:2311.17315

citations
