Most Cited CVPR "generalization to unseen models" Papers

5,589 papers found • Page 9 of 28

#1601

Distilling Monocular Foundation Model for Fine-grained Depth Completion

Yingping Liang, Yutao Hu, Wenqi Shao et al.

CVPR 2025 • poster • arXiv:2503.16970
9 citations
#1602

Deformable Radial Kernel Splatting

Yihua Huang, Mingxian Lin, Yangtian Sun et al.

CVPR 2025 • poster • arXiv:2412.11752
9 citations
#1603

LITA-GS: Illumination-Agnostic Novel View Synthesis via Reference-Free 3D Gaussian Splatting and Physical Priors

Han Zhou, Wei Dong, Jun Chen

CVPR 2025 • poster • arXiv:2504.00219
9 citations
#1604

GLASS: Guided Latent Slot Diffusion for Object-Centric Learning

Krishnakant Singh, Simone Schaub-Meyer, Stefan Roth

CVPR 2025 • poster • arXiv:2407.17929
9 citations
#1605

Knowledge-Aligned Counterfactual-Enhancement Diffusion Perception for Unsupervised Cross-Domain Visual Emotion Recognition

Wen Yin, Yong Wang, Guiduo Duan et al.

CVPR 2025 • poster • arXiv:2505.19694
9 citations
#1606

Neural Video Compression with Context Modulation

Chuanbo Tang, Zhuoyuan Li, Yifan Bian et al.

CVPR 2025 • poster • arXiv:2505.14541
9 citations
#1607

UnCommon Objects in 3D

Xingchen Liu, Piyush Tayal, Jianyuan Wang et al.

CVPR 2025 • poster • arXiv:2501.07574
9 citations
#1608

Physical Plausibility-aware Trajectory Prediction via Locomotion Embodiment

Hiromu Taketsugu, Takeru Oba, Takahiro Maeda et al.

CVPR 2025 • poster • arXiv:2503.17267
9 citations
#1609

PassionSR: Post-Training Quantization with Adaptive Scale in One-Step Diffusion based Image Super-Resolution

Zhu Li Bo, Jianze Li, Haotong Qin et al.

CVPR 2025 • poster • arXiv:2411.17106
9 citations
#1610

AnyAttack: Towards Large-scale Self-supervised Adversarial Attacks on Vision-language Models

Jiaming Zhang, Junhong Ye, Xingjun Ma et al.

CVPR 2025 • poster • arXiv:2410.05346
9 citations
#1611

Dual Prompting Image Restoration with Diffusion Transformers

Dehong Kong, Fan Li, Zhixin Wang et al.

CVPR 2025 • poster • arXiv:2504.17825
9 citations
#1612

An End-to-End Robust Point Cloud Semantic Segmentation Network with Single-Step Conditional Diffusion Models

Wentao Qu, Jing Wang, Yongshun Gong et al.

CVPR 2025 • poster • arXiv:2411.16308
9 citations
#1613

SAM2Object: Consolidating View Consistency via SAM2 for Zero-Shot 3D Instance Segmentation

Jihuai Zhao, Junbao Zhuo, Jiansheng Chen et al.

CVPR 2025 • poster
9 citations
#1614

High-Quality Facial Geometry and Appearance Capture at Home

Yuxuan Han, Junfeng Lyu, Feng Xu

CVPR 2024 • poster • arXiv:2312.03442
9 citations
#1615

SILMM: Self-Improving Large Multimodal Models for Compositional Text-to-Image Generation

Leigang Qu, Haochuan Li, Wenjie Wang et al.

CVPR 2025 • poster • arXiv:2412.05818
9 citations
#1616

DepthCues: Evaluating Monocular Depth Perception in Large Vision Models

Duolikun Danier, Mehmet Aygun, Changjian Li et al.

CVPR 2025 • poster • arXiv:2411.17385
9 citations
#1617

C3Net: Compound Conditioned ControlNet for Multimodal Content Generation

Juntao Zhang, Yuehuai Liu, Yu-Wing Tai et al.

CVPR 2024 • poster • arXiv:2311.17951
9 citations
#1618

AAMDM: Accelerated Auto-regressive Motion Diffusion Model

Tianyu Li, Calvin Zhuhan Qiao, Ren Guanqiao et al.

CVPR 2024 • poster • arXiv:2401.06146
9 citations
#1619

Mimic In-Context Learning for Multimodal Tasks

Yuchu Jiang, Jiale Fu, Chenduo Hao et al.

CVPR 2025 • poster • arXiv:2504.08851
9 citations
#1620

MAP: Unleashing Hybrid Mamba-Transformer Vision Backbone's Potential with Masked Autoregressive Pretraining

Yunze Liu, Li Yi

CVPR 2025 • poster • arXiv:2410.00871
9 citations
#1621

VERA: Explainable Video Anomaly Detection via Verbalized Learning of Vision-Language Models

Muchao Ye, Weiyang Liu, Pan He

CVPR 2025 • poster • arXiv:2412.01095
9 citations
#1622

MP-SfM: Monocular Surface Priors for Robust Structure-from-Motion

Zador Pataki, Paul-Edouard Sarlin, Johannes Schönberger et al.

CVPR 2025 • poster • arXiv:2504.20040
9 citations
#1623

ReVisionLLM: Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos

Tanveer Hannan, Md Mohaiminul Islam, Jindong Gu et al.

CVPR 2025 • poster • arXiv:2411.14901
9 citations
#1624

Memory-Scalable and Simplified Functional Map Learning

Robin Magnet, Maks Ovsjanikov

CVPR 2024 • poster • arXiv:2404.00330
9 citations
#1625

CoA: Towards Real Image Dehazing via Compression-and-Adaptation

Long Ma, Yuxin Feng, Yan Zhang et al.

CVPR 2025 • poster • arXiv:2504.05590
9 citations
#1626

Bayesian Test-Time Adaptation for Vision-Language Models

Lihua Zhou, Mao Ye, Shuaifeng Li et al.

CVPR 2025 • poster • arXiv:2503.09248
9 citations
#1627

DreamText: High Fidelity Scene Text Synthesis

Yibin Wang, Weizhong Zhang, Honghui Xu et al.

CVPR 2025 • poster • arXiv:2405.14701
9 citations
#1628

Meta-Point Learning and Refining for Category-Agnostic Pose Estimation

Junjie Chen, Jiebin Yan, Yuming Fang et al.

CVPR 2024 • poster • arXiv:2403.13647
9 citations
#1629

Rethinking Query-based Transformer for Continual Image Segmentation

Yuchen Zhu, Cheng Shi, Dingyou Wang et al.

CVPR 2025 • poster • arXiv:2507.07831
9 citations
#1630

Towards Transformer-Based Aligned Generation with Self-Coherence Guidance

Shulei Wang, Wang Lin, Hai Huang et al.

CVPR 2025 • poster • arXiv:2503.17675
9 citations
#1631

MOVIS: Enhancing Multi-Object Novel View Synthesis for Indoor Scenes

Ruijie Lu, Yixin Chen, Junfeng Ni et al.

CVPR 2025 • poster • arXiv:2412.11457
9 citations
#1632

FairDeDup: Detecting and Mitigating Vision-Language Fairness Disparities in Semantic Dataset Deduplication

Eric Slyman, Stefan Lee, Scott Cohen et al.

CVPR 2024 • poster • arXiv:2404.16123
9 citations
#1633

EventGPT: Event Stream Understanding with Multimodal Large Language Models

Shaoyu Liu, Jianing Li, Guanghui Zhao et al.

CVPR 2025 • poster • arXiv:2412.00832
9 citations
#1634

Vanishing-Point-Guided Video Semantic Segmentation of Driving Scenes

Diandian Guo, Deng-Ping Fan, Tongyu Lu et al.

CVPR 2024 • highlight • arXiv:2401.15261
9 citations
#1635

RELI11D: A Comprehensive Multimodal Human Motion Dataset and Method

Ming Yan, Yan Zhang, Shuqiang Cai et al.

CVPR 2024 • poster • arXiv:2403.19501
9 citations
#1636

Time-Efficient Light-Field Acquisition Using Coded Aperture and Events

Shuji Habuchi, Keita Takahashi, Chihiro Tsutake et al.

CVPR 2024 • poster • arXiv:2403.07244
9 citations
#1637

MP-GUI: Modality Perception with MLLMs for GUI Understanding

Ziwei Wang, Weizhi Chen, Leyang Yang et al.

CVPR 2025 • poster • arXiv:2503.14021
9 citations
#1638

Hierarchical Correlation Clustering and Tree Preserving Embedding

Morteza Haghir Chehreghani, Mostafa Haghir Chehreghani

CVPR 2024 • poster • arXiv:2002.07756
9 citations
#1639

InPO: Inversion Preference Optimization with Reparametrized DDIM for Efficient Diffusion Model Alignment

Yunhong Lu, Qichao Wang, Hengyuan Cao et al.

CVPR 2025 • highlight • arXiv:2503.18454
9 citations
#1640

GenDeg: Diffusion-based Degradation Synthesis for Generalizable All-In-One Image Restoration

Sudarshan Rajagopalan, Nithin Gopalakrishnan Nair, Jay Paranjape et al.

CVPR 2025 • poster • arXiv:2411.17687
9 citations
#1641

Probability Density Geodesics in Image Diffusion Latent Space

Qingtao Yu, Jaskirat Singh, Zhaoyuan Yang et al.

CVPR 2025 • poster • arXiv:2504.06675
9 citations
#1642

Language Guided Concept Bottleneck Models for Interpretable Continual Learning

Lu Yu, HaoYu Han, Zhe Tao et al.

CVPR 2025 • poster • arXiv:2503.23283
9 citations
#1643

Neural Super-Resolution for Real-time Rendering with Radiance Demodulation

Jia Li, Ziling Chen, Xiaolong Wu et al.

CVPR 2024 • poster • arXiv:2308.06699
9 citations
#1644

O-TPT: Orthogonality Constraints for Calibrating Test-time Prompt Tuning in Vision-Language Models

Ashshak Sharifdeen, Muhammad Akhtar Munir, Sanoojan Baliah et al.

CVPR 2025 • highlight • arXiv:2503.12096
9 citations
#1645

REWIND: Real-Time Egocentric Whole-Body Motion Diffusion with Exemplar-Based Identity Conditioning

Jihyun Lee, Weipeng Xu, Alexander Richard et al.

CVPR 2025 • poster • arXiv:2504.04956
9 citations
#1646

DiG-IN: Diffusion Guidance for Investigating Networks - Uncovering Classifier Differences, Neuron Visualisations, and Visual Counterfactual Explanations

Maximilian Augustin, Yannic Neuhaus, Matthias Hein

CVPR 2024 • poster • arXiv:2311.17833
9 citations
#1647

Improving Physics-Augmented Continuum Neural Radiance Field-Based Geometry-Agnostic System Identification with Lagrangian Particle Optimization

Takuhiro Kaneko

CVPR 2024 • poster • arXiv:2406.04155
9 citations
#1648

FisherTune: Fisher-Guided Robust Tuning of Vision Foundation Models for Domain Generalized Segmentation

Dong Zhao, Jinlong Li, Shuang Wang et al.

CVPR 2025 • poster • arXiv:2503.17940
9 citations
#1649

UnSAMFlow: Unsupervised Optical Flow Guided by Segment Anything Model

Shuai Yuan, Lei Luo, Zhuo Hui et al.

CVPR 2024 • poster • arXiv:2405.02608
9 citations
#1650

Motion Diversification Networks

Hee Jae Kim, Eshed Ohn-Bar

CVPR 2024 • poster
9 citations
#1651

MultiGO: Towards Multi-level Geometry Learning for Monocular 3D Textured Human Reconstruction

Gangjian Zhang, Nanjie Yao, Shunsi Zhang et al.

CVPR 2025 • poster • arXiv:2412.03103
9 citations
#1652

TULIP: Multi-camera 3D Precision Assessment of Parkinson’s Disease

Kyungdo Kim, Sihan Lyu, Sneha Mantri et al.

CVPR 2024 • poster
9 citations
#1653

RNG: Relightable Neural Gaussians

Jiahui Fan, Fujun Luan, Jian Yang et al.

CVPR 2025 • poster • arXiv:2409.19702
9 citations
#1654

Video-ColBERT: Contextualized Late Interaction for Text-to-Video Retrieval

Arun Reddy, Alexander Martin, Eugene Yang et al.

CVPR 2025 • poster • arXiv:2503.19009
9 citations
#1655

Post-pre-training for Modality Alignment in Vision-Language Foundation Models

Shin'ya Yamaguchi, Dewei Feng, Sekitoshi Kanai et al.

CVPR 2025 • poster • arXiv:2504.12717
9 citations
#1656

Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks

Miran Heo, Min-Hung Chen, De-An Huang et al.

CVPR 2025 • poster • arXiv:2501.08326
9 citations
#1657

Boosting Flow-based Generative Super-Resolution Models via Learned Prior

Li-Yuan Tsao, Yi-Chen Lo, Chia-Che Chang et al.

CVPR 2024 • poster • arXiv:2403.10988
9 citations
#1658

Seurat: From Moving Points to Depth

Seokju Cho, Gabriel Huang, Seungryong Kim et al.

CVPR 2025 • highlight • arXiv:2504.14687
9 citations
#1659

DIFFER: Disentangling Identity Features via Semantic Cues for Clothes-Changing Person Re-ID

Xin Liang, Yogesh S. Rawat

CVPR 2025 • poster • arXiv:2503.22912
9 citations
#1660

HumanRig: Learning Automatic Rigging for Humanoid Character in a Large Scale Dataset

Zedong Chu, Feng Xiong, Meiduo Liu et al.

CVPR 2025 • highlight • arXiv:2412.02317
9 citations
#1661

Active Object Detection with Knowledge Aggregation and Distillation from Large Models

Dejie Yang, Yang Liu

CVPR 2024 • poster • arXiv:2405.12509
9 citations
#1662

Circumventing Shortcuts in Audio-visual Deepfake Detection Datasets with Unsupervised Learning

Stefan Smeu, Dragos-Alexandru Boldisor, Dan Oneata et al.

CVPR 2025 • highlight • arXiv:2412.00175
9 citations
#1663

Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos

Sagnik Majumder, Ziad Al-Halah, Kristen Grauman

CVPR 2024 • poster • arXiv:2307.04760
9 citations
#1664

GEAL: Generalizable 3D Affordance Learning with Cross-Modal Consistency

Dongyue Lu, Lingdong Kong, Tianxin Huang et al.

CVPR 2025 • poster • arXiv:2412.09511
9 citations
#1665

K-Sort Arena: Efficient and Reliable Benchmarking for Generative Models via K-wise Human Preferences

Zhikai Li, Xuewen Liu, Dongrong Joe Fu et al.

CVPR 2025 • poster • arXiv:2408.14468
9 citations
#1666

GOV-NeSF: Generalizable Open-Vocabulary Neural Semantic Fields

Fangyin Wei, Hanlin Chen, Gim Hee Lee

CVPR 2024 • poster • arXiv:2404.00931
9 citations
#1667

ScribbleLight: Single Image Indoor Relighting with Scribbles

Jun Myeong Choi, Annie N. Wang, Pieter Peers et al.

CVPR 2025 • poster • arXiv:2411.17696
9 citations
#1668

Facial Identity Anonymization via Intrinsic and Extrinsic Attention Distraction

Zhenzhong Kuang, Xiaochen Yang, Yingjie Shen et al.

CVPR 2024 • poster • arXiv:2406.17219
9 citations
#1669

VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation

Weiming Ren, Huan Yang, Jie Min et al.

CVPR 2025 • poster • arXiv:2412.00927
9 citations
#1670

Bilateral Event Mining and Complementary for Event Stream Super-Resolution

Zhilin Huang, Quanmin Liang, Yijie Yu et al.

CVPR 2024 • poster • arXiv:2405.10037
9 citations
#1671

Fully Convolutional Slice-to-Volume Reconstruction for Single-Stack MRI

Sean I. Young, Yaël Balbastre, Bruce Fischl et al.

CVPR 2024 • poster • arXiv:2312.03102
9 citations
#1672

PlanarSplatting: Accurate Planar Surface Reconstruction in 3 Minutes

Bin Tan, Rui Yu, Yujun Shen et al.

CVPR 2025 • highlight • arXiv:2412.03451
9 citations
#1673

TurboSL: Dense, Accurate and Fast 3D by Neural Inverse Structured Light

Parsa Mirdehghan, Maxx Wu, Wenzheng Chen et al.

CVPR 2024 • poster
9 citations
#1674

LesionLocator: Zero-Shot Universal Tumor Segmentation and Tracking in 3D Whole-Body Imaging

Maximilian Rokuss, Yannick Kirchhoff, Seval Akbal et al.

CVPR 2025 • poster • arXiv:2502.20985
9 citations
#1675

SketchVideo: Sketch-based Video Generation and Editing

Feng-Lin Liu, Hongbo Fu, Xintao Wang et al.

CVPR 2025 • poster • arXiv:2503.23284
9 citations
#1676

Mamba4D: Efficient 4D Point Cloud Video Understanding with Disentangled Spatial-Temporal State Space Models

Jiuming Liu, Jinru Han, Lihao Liu et al.

CVPR 2025 • poster
9 citations
#1677

Edit Away and My Face Will not Stay: Personal Biometric Defense against Malicious Generative Editing

Hanhui Wang, Yihua Zhang, Ruizheng Bai et al.

CVPR 2025 • poster • arXiv:2411.16832
8 citations
#1678

FaceBench: A Multi-View Multi-Level Facial Attribute VQA Dataset for Benchmarking Face Perception MLLMs

Xiaoqin Wang, Xusen Ma, Xianxu Hou et al.

CVPR 2025 • poster • arXiv:2503.21457
8 citations
#1679

Efficient Fine-Tuning and Concept Suppression for Pruned Diffusion Models

Reza Shirkavand, Peiran Yu, Shangqian Gao et al.

CVPR 2025 • poster • arXiv:2412.15341
8 citations
#1680

Image Quality Assessment: Investigating Causal Perceptual Effects with Abductive Counterfactual Inference

Wenhao Shen, Mingliang Zhou, Yu Chen et al.

CVPR 2025 • poster • arXiv:2412.16939
8 citations
#1681

RAP: Retrieval-Augmented Personalization for Multimodal Large Language Models

Haoran Hao, Jiaming Han, Changsheng Li et al.

CVPR 2025 • poster • arXiv:2410.13360
8 citations
#1682

ACE: Anti-Editing Concept Erasure in Text-to-Image Models

Zihao Wang, Yuxiang Wei, Fan Li et al.

CVPR 2025 • poster • arXiv:2501.01633
8 citations
#1683

AMO Sampler: Enhancing Text Rendering with Overshooting

Xixi Hu, Keyang Xu, Bo Liu et al.

CVPR 2025 • poster • arXiv:2411.19415
8 citations
#1684

Diversity-aware Channel Pruning for StyleGAN Compression

Jiwoo Chung, Sangeek Hyun, Sang-Heon Shim et al.

CVPR 2024 • poster • arXiv:2403.13548
8 citations
#1685

Instruction-based Image Manipulation by Watching How Things Move

Mingdeng Cao, Xuaner Zhang, Yinqiang Zheng et al.

CVPR 2025 • highlight • arXiv:2412.12087
8 citations
#1686

FALCON: Fairness Learning via Contrastive Attention Approach to Continual Semantic Scene Understanding

Thanh-Dat Truong, Utsav Prabhu, Bhiksha Raj et al.

CVPR 2025 • poster • arXiv:2311.15965
8 citations
#1687

DORNet: A Degradation Oriented and Regularized Network for Blind Depth Super-Resolution

Zhengxue Wang, Zhiqiang Yan, Jinshan Pan et al.

CVPR 2025 • poster • arXiv:2410.11666
8 citations
#1688

Edge-SD-SR: Low Latency and Parameter Efficient On-device Super-Resolution with Stable Diffusion via Bidirectional Conditioning

Isma Hadji, Mehdi Noroozi, Victor Escorcia et al.

CVPR 2025 • poster • arXiv:2412.06978
8 citations
#1689

BARD-GS: Blur-Aware Reconstruction of Dynamic Scenes via Gaussian Splatting

Yiren Lu, Yunlai Zhou, Disheng Liu et al.

CVPR 2025 • poster • arXiv:2503.15835
8 citations
#1690

Entity-NeRF: Detecting and Removing Moving Entities in Urban Scenes

Takashi Otonari, Satoshi Ikehata, Kiyoharu Aizawa

CVPR 2024 • poster • arXiv:2403.16141
8 citations
#1691

3D-GSW: 3D Gaussian Splatting for Robust Watermarking

Youngdong Jang, Hyunje Park, Feng Yang et al.

CVPR 2025 • poster • arXiv:2409.13222
8 citations
#1692

SapiensID: Foundation for Human Recognition

Minchul Kim, Dingqiang Ye, Yiyang Su et al.

CVPR 2025 • poster • arXiv:2504.04708
8 citations
#1693

Focusing on Tracks for Online Multi-Object Tracking

Kyujin Shim, Kangwook Ko, YuJin Yang et al.

CVPR 2025 • poster
8 citations
#1694

Synthetic Prior for Few-Shot Drivable Head Avatar Inversion

Wojciech Zielonka, Stephan J. Garbin, Alexandros Lattas et al.

CVPR 2025 • poster • arXiv:2501.06903
8 citations
#1695

DiffFNO: Diffusion Fourier Neural Operator

Xiaoyi Liu, Hao Tang

CVPR 2025 • poster • arXiv:2411.09911
8 citations
#1696

Hypergraph Vision Transformers: Images are More than Nodes, More than Edges

Joshua Fixelle

CVPR 2025 • poster • arXiv:2504.08710
8 citations
#1697

Noise Calibration and Spatial-Frequency Interactive Network for STEM Image Enhancement

Hesong Li, Ziqi Wu, Ruiwen Shao et al.

CVPR 2025 • poster • arXiv:2504.02555
8 citations
#1698

UltraFusion: Ultra High Dynamic Imaging using Exposure Fusion

Zixuan Chen, Yujin Wang, Xin Cai et al.

CVPR 2025 • highlight • arXiv:2501.11515
8 citations
#1699

Unveiling Differences in Generative Models: A Scalable Differential Clustering Approach

Jingwei Zhang, Mohammad Jalali, Cheuk Ting Li et al.

CVPR 2025 • highlight • arXiv:2405.02700
8 citations
#1700

Learning Discriminative Dynamics with Label Corruption for Noisy Label Detection

Suyeon Kim, Dongha Lee, SeongKu Kang et al.

CVPR 2024 • poster • arXiv:2405.19902
8 citations
#1701

Adaptive Unimodal Regulation for Balanced Multimodal Information Acquisition

Chengxiang Huang, Yake Wei, Zequn Yang et al.

CVPR 2025 • poster • arXiv:2503.18595
8 citations
#1702

Deep Imbalanced Regression via Hierarchical Classification Adjustment

Haipeng Xiong, Angela Yao

CVPR 2024 • poster • arXiv:2310.17154
8 citations
#1703

Sparse2DGS: Geometry-Prioritized Gaussian Splatting for Surface Reconstruction from Sparse Views

Jiang Wu, Rui Li, Yu Zhu et al.

CVPR 2025 • poster • arXiv:2504.20378
8 citations
#1704

Joint Out-of-Distribution Filtering and Data Discovery Active Learning

Sebastian Schmidt, Leonard Schenk, Leo Schwinn et al.

CVPR 2025 • poster • arXiv:2503.02491
8 citations
#1705

VidLA: Video-Language Alignment at Scale

Mamshad Nayeem Rizve, Fan Fei, Jayakrishnan Unnikrishnan et al.

CVPR 2024 • poster • arXiv:2403.14870
8 citations
#1706

Generative Zero-Shot Composed Image Retrieval

Lan Wang, Wei Ao, Vishnu Naresh Boddeti et al.

CVPR 2025 • poster
8 citations
#1707

Task-Aware Encoder Control for Deep Video Compression

Xingtong Ge, Jixiang Luo, Xinjie Zhang et al.

CVPR 2024 • poster • arXiv:2404.04848
8 citations
#1708

RoboPEPP: Vision-Based Robot Pose and Joint Angle Estimation through Embedding Predictive Pre-Training

Raktim Gautam Goswami, Prashanth Krishnamurthy, Yann LeCun et al.

CVPR 2025 • highlight • arXiv:2411.17662
8 citations
#1709

PosterO: Structuring Layout Trees to Enable Language Models in Generalized Content-Aware Layout Generation

HsiaoYuan Hsu, Yuxin Peng

CVPR 2025 • poster • arXiv:2505.07843
8 citations
#1710

Re-HOLD: Video Hand Object Interaction Reenactment via adaptive Layout-instructed Diffusion Model

Yingying Fan, Quanwei Yang, Kaisiyuan Wang et al.

CVPR 2025 • poster • arXiv:2503.16942
8 citations
#1711

FlowRAM: Grounding Flow Matching Policy with Region-Aware Mamba Framework for Robotic Manipulation

Sen Wang, Le Wang, Sanping Zhou et al.

CVPR 2025 • poster • arXiv:2506.16201
8 citations
#1712

DTGBrepGen: A Novel B-rep Generative Model through Decoupling Topology and Geometry

Jing Li, Yihang Fu, Falai Chen

CVPR 2025 • poster • arXiv:2503.13110
8 citations
#1713

Mask^2DiT: Dual Mask-based Diffusion Transformer for Multi-Scene Long Video Generation

Tianhao Qi, Jianlong Yuan, Wanquan Feng et al.

CVPR 2025 • poster
8 citations
#1714

Stable Neighbor Denoising for Source-free Domain Adaptive Segmentation

Dong Zhao, Shuang Wang, Qi Zang et al.

CVPR 2024 • poster • arXiv:2406.06813
8 citations
#1715

DroneSplat: 3D Gaussian Splatting for Robust 3D Reconstruction from In-the-Wild Drone Imagery

Jiadong Tang, Yu Gao, Dianyi Yang et al.

CVPR 2025 • highlight • arXiv:2503.16964
8 citations
#1716

ROD-MLLM: Towards More Reliable Object Detection in Multimodal Large Language Models

Heng Yin, Yuqiang Ren, Ke Yan et al.

CVPR 2025 • poster
8 citations
#1717

JRDB-PanoTrack: An Open-world Panoptic Segmentation and Tracking Robotic Dataset in Crowded Human Environments

Duy Tho Le, Chenhui Gou, Stavya Datta et al.

CVPR 2024 • poster • arXiv:2404.01686
8 citations
#1718

DreamSalon: A Staged Diffusion Framework for Preserving Identity-Context in Editable Face Generation

Haonan Lin

CVPR 2024 • poster • arXiv:2403.19235
8 citations
#1719

GeoReF: Geometric Alignment Across Shape Variation for Category-level Object Pose Refinement

Linfang Zheng, Tze Ho Elden Tse, Chen Wang et al.

CVPR 2024 • poster • arXiv:2404.11139
8 citations
#1720

Adversarially Robust Few-shot Learning via Parameter Co-distillation of Similarity and Class Concept Learners

Junhao Dong, Piotr Koniusz, Junxi Chen et al.

CVPR 2024 • poster
8 citations
#1721

PICO: Reconstructing 3D People In Contact with Objects

Alpár Cseke, Shashank Tripathi, Sai Kumar Dwivedi et al.

CVPR 2025 • poster • arXiv:2504.17695
8 citations
#1722

Tartan IMU: A Light Foundation Model for Inertial Positioning in Robotics

Shibo Zhao, Sifan Zhou, Raphael Blanchard et al.

CVPR 2025 • poster
8 citations
#1723

PELA: Learning Parameter-Efficient Models with Low-Rank Approximation

Yangyang Guo, Guangzhi Wang, Mohan Kankanhalli

CVPR 2024 • poster • arXiv:2310.10700
8 citations
#1724

BHViT: Binarized Hybrid Vision Transformer

Tian Gao, Yu Zhang, Zhiyuan Zhang et al.

CVPR 2025 • poster • arXiv:2503.02394
8 citations
#1725

Finer-CAM: Spotting the Difference Reveals Finer Details for Visual Explanation

Ziheng Zhang, Jianyang Gu, Arpita Chowdhury et al.

CVPR 2025 • poster • arXiv:2501.11309
8 citations
#1726

MammAlps: A Multi-view Video Behavior Monitoring Dataset of Wild Mammals in the Swiss Alps

Valentin Gabeff, Haozhe Qi, Brendan Flaherty et al.

CVPR 2025 • highlight • arXiv:2503.18223
8 citations
#1727

Task-Adaptive Saliency Guidance for Exemplar-free Class Incremental Learning

Xialei Liu, Jiang-Tian Zhai, Andrew Bagdanov et al.

CVPR 2024 • poster • arXiv:2212.08251
8 citations
#1728

ProKeR: A Kernel Perspective on Few-Shot Adaptation of Large Vision-Language Models

Yassir Bendou, Amine Ouasfi, Vincent Gripon et al.

CVPR 2025 • poster • arXiv:2501.11175
8 citations
#1729

The Change You Want To Detect: Semantic Change Detection In Earth Observation With Hybrid Data Generation

Yanis Benidir, Nicolas Gonthier, Clement Mallet

CVPR 2025 • poster
8 citations
#1730

Diffusion-FOF: Single-View Clothed Human Reconstruction via Diffusion-Based Fourier Occupancy Field

Yuanzhen Li, Fei Luo, Chunxia Xiao

CVPR 2024 • poster
8 citations
#1731

Do Visual Imaginations Improve Vision-and-Language Navigation Agents?

Akhil Perincherry, Jacob Krantz, Stefan Lee

CVPR 2025 • poster • arXiv:2503.16394
8 citations
#1732

Boost Your Human Image Generation Model via Direct Preference Optimization

Sanghyeon Na, Yonggyu Kim, Hyunjoon Lee

CVPR 2025 • highlight • arXiv:2405.20216
8 citations
#1733

General Point Model Pretraining with Autoencoding and Autoregressive

Zhe Li, Zhangyang Gao, Cheng Tan et al.

CVPR 2024 • poster
8 citations
#1734

Exploring CLIP's Dense Knowledge for Weakly Supervised Semantic Segmentation

Zhiwei Yang, Yucong Meng, Kexue Fu et al.

CVPR 2025 • poster • arXiv:2503.20826
8 citations
#1735

vesselFM: A Foundation Model for Universal 3D Blood Vessel Segmentation

Bastian Wittmann, Yannick Wattenberg, Tamaz Amiranashvili et al.

CVPR 2025 • poster • arXiv:2411.17386
8 citations
#1736

Rashomon Sets for Prototypical-Part Networks: Editing Interpretable Models in Real-Time

Jon Donnelly, Zhicheng Guo, Alina Jade Barnett et al.

CVPR 2025 • poster • arXiv:2503.01087
8 citations
#1737

Self-Cross Diffusion Guidance for Text-to-Image Synthesis of Similar Subjects

Weimin Qiu, Jieke Wang, Meng Tang

CVPR 2025 • poster • arXiv:2411.18936
8 citations
#1738

AFL: A Single-Round Analytic Approach for Federated Learning with Pre-trained Models

Run He, Kai Tong, Di Fang et al.

CVPR 2025 • poster • arXiv:2405.16240
8 citations
#1739

DeCoTR: Enhancing Depth Completion with 2D and 3D Attentions

Yunxiao Shi, Manish Singh, Hong Cai et al.

CVPR 2024 • poster • arXiv:2403.12202
8 citations
#1740

EdgeTAM: On-Device Track Anything Model

Chong Zhou, Chenchen Zhu, Yunyang Xiong et al.

CVPR 2025 • poster • arXiv:2501.07256
8 citations
#1741

Cross-Dimension Affinity Distillation for 3D EM Neuron Segmentation

Xiaoyu Liu, Miaomiao Cai, Yinda Chen et al.

CVPR 2024 • poster
8 citations
#1742

RAD: Region-Aware Diffusion Models for Image Inpainting

Sora Kim, Sungho Suh, Minsik Lee

CVPR 2025 • poster • arXiv:2412.09191
8 citations
#1743

Show and Segment: Universal Medical Image Segmentation via In-Context Learning

Yunhe Gao, Di Liu, Zhuowei Li et al.

CVPR 2025 • poster • arXiv:2503.19359
8 citations
#1744

AirPlanes: Accurate Plane Estimation via 3D-Consistent Embeddings

Jamie Watson, Filippo Aleotti, Mohamed Sayed et al.

CVPR 2024 • poster • arXiv:2406.08960
8 citations
#1745

StreamingFlow: Streaming Occupancy Forecasting with Asynchronous Multi-modal Data Streams via Neural Ordinary Differential Equation

Yining Shi, Kun Jiang, Ke Wang et al.

CVPR 2024 • highlight • arXiv:2302.09585
8 citations
#1746

Cross Initialization for Face Personalization of Text-to-Image Models

Lianyu Pang, Jian Yin, Haoran Xie et al.

CVPR 2024 • poster
8 citations
#1747

Event-based Structure-from-Orbit

Ethan Elms, Yasir Latif, Tae Ha Park et al.

CVPR 2024 • highlight • arXiv:2405.06216
8 citations
#1748

Clustering for Protein Representation Learning

Ruijie Quan, Wenguan Wang, Fan Ma et al.

CVPR 2024 • poster • arXiv:2404.00254
8 citations
#1749

SoMA: Singular Value Decomposed Minor Components Adaptation for Domain Generalizable Representation Learning

Seokju Yun, Seunghye Chae, Dongheon Lee et al.

CVPR 2025 • highlight • arXiv:2412.04077
8 citations
#1750

Human-centered Interactive Learning via MLLMs for Text-to-Image Person Re-identification

Yang Qin, Chao Chen, Zhihang Fu et al.

CVPR 2025 • poster • arXiv:2506.11036
8 citations
#1751

Fish-Vista: A Multi-Purpose Dataset for Understanding & Identification of Traits from Images

Kazi Sajeed Mehrab, M. Maruf, Arka Daw et al.

CVPR 2025 • poster • arXiv:2407.08027
8 citations
#1752

Lessons and Insights from a Unifying Study of Parameter-Efficient Fine-Tuning (PEFT) in Visual Recognition

Zheda Mai, Ping Zhang, Cheng-Hao Tu et al.

CVPR 2025 • highlight • arXiv:2409.16434
8 citations
#1753

An N-Point Linear Solver for Line and Motion Estimation with Event Cameras

Ling Gao, Daniel Gehrig, Hang Su et al.

CVPR 2024 • poster • arXiv:2404.00842
8 citations
#1754

In-N-Out: Faithful 3D GAN Inversion with Volumetric Decomposition for Face Editing

Yiran Xu, Zhixin Shu, Cameron Smith et al.

CVPR 2024 • poster • arXiv:2302.04871
8 citations
#1755

Relational Matching for Weakly Semi-Supervised Oriented Object Detection

Wenhao Wu, Hau San Wong, Si Wu et al.

CVPR 2024 • poster
8 citations
#1756

MonoSplat: Generalizable 3D Gaussian Splatting from Monocular Depth Foundation Models

Yifan Liu, Keyu Fan, Weihao Yu et al.

CVPR 2025 • poster • arXiv:2505.15185
8 citations
#1757

Automatic Controllable Colorization via Imagination

Xiaoyan Cong, Yue Wu, Qifeng Chen et al.

CVPR 2024 • poster • arXiv:2404.05661
8 citations
#1758

Multi-Granularity Class Prototype Topology Distillation for Class-Incremental Source-Free Unsupervised Domain Adaptation

Peihua Deng, Jiehua Zhang, Xichun Sheng et al.

CVPR 2025 • poster • arXiv:2411.16064
8 citations
#1759

AniMer: Animal Pose and Shape Estimation Using Family Aware Transformer

Jin Lyu, Tianyi Zhu, Yi Gu et al.

CVPR 2025 • poster • arXiv:2412.00837
8 citations
#1760

Multirate Neural Image Compression with Adaptive Lattice Vector Quantization

Hao Xu, Xiaolin Wu, Xi Zhang

CVPR 2025 • highlight
8 citations
#1761

MedBN: Robust Test-Time Adaptation against Malicious Test Samples

Hyejin Park, Jeongyeon Hwang, Sunung Mun et al.

CVPR 2024 • poster • arXiv:2403.19326
8 citations
#1762

Scene4U: Hierarchical Layered 3D Scene Reconstruction from Single Panoramic Image for Your Immerse Exploration

Zilong Huang, Jun He, Junyan Ye et al.

CVPR 2025 • poster • arXiv:2504.00387
8 citations
#1763

Masked Spatial Propagation Network for Sparsity-Adaptive Depth Refinement

Jinyoung Jun, Jae-Han Lee, Chang-Su Kim

CVPR 2024 • poster • arXiv:2404.19294
8 citations
#1764

A Study of Dropout-Induced Modality Bias on Robustness to Missing Video Frames for Audio-Visual Speech Recognition

Yusheng Dai, Hang Chen, Jun Du et al.

CVPR 2024 • poster • arXiv:2403.04245
8 citations
#1765

g3D-LF: Generalizable 3D-Language Feature Fields for Embodied Tasks

Zihan Wang, Gim Hee Lee

CVPR 2025 • poster • arXiv:2411.17030
8 citations
#1766

SfM-Free 3D Gaussian Splatting via Hierarchical Training

Bo Ji, Angela Yao

CVPR 2025 • poster • arXiv:2412.01553
8 citations
#1767

LoTUS: Large-Scale Machine Unlearning with a Taste of Uncertainty

Christoforos N. Spartalis, Theodoros Semertzidis, Efstratios Gavves et al.

CVPR 2025 • poster • arXiv:2503.18314
8 citations
#1768

T2ICount: Enhancing Cross-modal Understanding for Zero-Shot Counting

Yifei Qian, Zhongliang Guo, Bowen Deng et al.

CVPR 2025 • highlight • arXiv:2502.20625
8 citations
#1769

Skip Tuning: Pre-trained Vision-Language Models are Effective and Efficient Adapters Themselves

Shihan Wu, Ji Zhang, Pengpeng Zeng et al.

CVPR 2025 • poster • arXiv:2412.11509
8 citations
#1770

FFF: Fixing Flawed Foundations in Contrastive Pre-Training Results in Very Strong Vision-Language Models

Adrian Bulat, Yassine Ouali, Georgios Tzimiropoulos

CVPR 2024 • poster • arXiv:2405.10286
8 citations
#1771

DiC: Rethinking Conv3x3 Designs in Diffusion Models

Yuchuan Tian, Jing Han, Chengcheng Wang et al.

CVPR 2025 • poster • arXiv:2501.00603
8 citations
#1772

AVF-MAE++: Scaling Affective Video Facial Masked Autoencoders via Efficient Audio-Visual Self-Supervised Learning

Xuecheng Wu, Heli Sun, Yifan Wang et al.

CVPR 2025 • poster
7 citations
#1773

Gaussian Splatting for Efficient Satellite Image Photogrammetry

Luca Savant Aira, Gabriele Facciolo, Thibaud Ehret

CVPR 2025 • poster • arXiv:2412.13047
7 citations
#1774

Benchmarking Segmentation Models with Mask-Preserved Attribute Editing

Zijin Yin, Kongming Liang, Bing Li et al.

CVPR 2024 • poster • arXiv:2403.01231
7 citations
#1775

Language Model Guided Interpretable Video Action Reasoning

Ning Wang, Guangming Zhu, Hongsheng Li et al.

CVPR 2024 • poster • arXiv:2404.01591
7 citations
#1776

DeNVeR: Deformable Neural Vessel Representations for Unsupervised Video Vessel Segmentation

Chun-Hung Wu, Shih-Hong Chen, Chih Yao Hu et al.

CVPR 2025 • poster • arXiv:2406.01591
7 citations
#1777

Video Recognition in Portrait Mode

Mingfei Han, Linjie Yang, Xiaojie Jin et al.

CVPR 2024 • poster • arXiv:2312.13746
7 citations
#1778

ProbPose: A Probabilistic Approach to 2D Human Pose Estimation

Miroslav Purkrábek, Jiri Matas

CVPR 2025 • poster • arXiv:2412.02254
7 citations
#1779

BiTT: Bi-directional Texture Reconstruction of Interacting Two Hands from a Single Image

Minje Kim, Tae-Kyun Kim

CVPR 2024 • poster • arXiv:2403.08262
7 citations
#1780

Debiasing Multimodal Large Language Models via Noise-Aware Preference Optimization

Zefeng Zhang, Hengzhu Tang, Jiawei Sheng et al.

CVPR 2025 • poster • arXiv:2503.17928
7 citations
#1781

L-MAGIC: Language Model Assisted Generation of Images with Coherence

Zhipeng Cai, Matthias Mueller, Reiner Birkl et al.

CVPR 2024 • poster • arXiv:2406.01843
7 citations
#1782

Luminance-GS: Adapting 3D Gaussian Splatting to Challenging Lighting Conditions with View-Adaptive Curve Adjustment

Ziteng Cui, Xuangeng Chu, Tatsuya Harada

CVPR 2025 • poster • arXiv:2504.01503
7 citations
#1783

Instance-based Max-margin for Practical Few-shot Recognition

Minghao Fu, Ke Zhu

CVPR 2024 • poster • arXiv:2305.17368
7 citations
#1784

Intriguing Properties of Diffusion Models: An Empirical Study of the Natural Attack Capability in Text-to-Image Generative Models

Takami Sato, Justin Yue, Nanze Chen et al.

CVPR 2024 • poster • arXiv:2308.15692
7 citations
#1785

Do Computer Vision Foundation Models Learn the Low-level Characteristics of the Human Visual System?

Yancheng Cai, Fei Yin, Dounia Hammou et al.

CVPR 2025 • arXiv:2502.20256
7 citations
#1786

Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation

Yuying Ge, Yizhuo Li, Yixiao Ge et al.

CVPR 2025 • poster • arXiv:2412.04432
7 citations
#1787

Parameter-efficient Fine-tuning in Hyperspherical Space for Open-vocabulary Semantic Segmentation

Zelin Peng, Zhengqin Xu, Zhilin Zeng et al.

CVPR 2025 • poster • arXiv:2405.18840
7 citations
#1788

Cross-modal Causal Relation Alignment for Video Question Grounding

Weixing Chen, Yang Liu, Binglin Chen et al.

CVPR 2025 • highlight • arXiv:2503.07635
7 citations
#1789

Hyperbolic Category Discovery

Yuanpei Liu, Zhenqi He, Kai Han

CVPR 2025 • poster • arXiv:2504.06120
7 citations
#1790

G3DR: Generative 3D Reconstruction in ImageNet

Pradyumna Reddy, Ismail Elezi, Jiankang Deng

CVPR 2024 • poster • arXiv:2403.00939
7 citations
#1791

Dynamic Updates for Language Adaptation in Visual-Language Tracking

Xiaohai Li, Bineng Zhong, Qihua Liang et al.

CVPR 2025 • poster • arXiv:2503.06621
7 citations
#1792

LookCloser: Frequency-aware Radiance Field for Tiny-Detail Scene

Xiaoyu Zhang, Weihong Pan, Chong Bao et al.

CVPR 2025 • poster • arXiv:2503.18513
7 citations
#1793

Towards Stable and Storage-efficient Dataset Distillation: Matching Convexified Trajectory

Wenliang Zhong, Haoyu Tang, Qinghai Zheng et al.

CVPR 2025 • poster • arXiv:2406.19827
7 citations
#1794

Panorama Generation From NFoV Image Done Right

Dian Zheng, Cheng Zhang, Xiao-Ming Wu et al.

CVPR 2025 • highlight • arXiv:2503.18420
7 citations
#1795

Enhancing 3D Gaze Estimation in the Wild using Weak Supervision with Gaze Following Labels

Pierre Vuillecard, Jean-Marc Odobez

CVPR 2025 • poster • arXiv:2502.20249
7 citations
#1796

Artist-Friendly Relightable and Animatable Neural Heads

Yingyan Xu, Prashanth Chandran, Sebastian Weiss et al.

CVPR 2024 • poster • arXiv:2312.03420
7 citations
#1797

Detail-Preserving Latent Diffusion for Stable Shadow Removal

Jiamin Xu, Yuxin Zheng, Zelong Li et al.

CVPR 2025 • poster • arXiv:2412.17630
7 citations
#1798

When Visual Grounding Meets Gigapixel-level Large-scale Scenes: Benchmark and Approach

Tao Ma, Bing Bai, Haozhe Lin et al.

CVPR 2024 • poster
7 citations
#1799

Towards Generalizable Scene Change Detection

Jae-Woo Kim, Ue-Hwan Kim

CVPR 2025 • poster • arXiv:2409.06214
7 citations
#1800

DiffLocks: Generating 3D Hair from a Single Image using Diffusion Models

Radu Alexandru Rosu, Keyu Wu, Yao Feng et al.

CVPR 2025 • poster • arXiv:2505.06166
7 citations