Most Cited CVPR "behavioral modeling" Papers

5,589 papers found

#4801

Real-time Free-view Human Rendering from Sparse-view RGB Videos using Double Unprojected Textures

Guoxing Sun, Rishabh Dabral, Heming Zhu et al.

CVPR 2025 • highlight • arXiv:2412.13183
#4802

Convolutional Prompting meets Language Models for Continual Learning

Anurag Roy, Riddhiman Moulick, Vinay Verma et al.

CVPR 2024 • poster • arXiv:2403.20317
#4803

Mitigating Noisy Correspondence by Geometrical Structure Consistency Learning

Zihua Zhao, Mengxi Chen, Tianjie Dai et al.

CVPR 2024 • poster • arXiv:2405.16996
#4804

UNOPose: Unseen Object Pose Estimation with an Unposed RGB-D Reference Image

Xingyu Liu, Gu Wang, Ruida Zhang et al.

CVPR 2025 • poster • arXiv:2411.16106
#4805

Contextual Augmented Global Contrast for Multimodal Intent Recognition

Kaili Sun, Zhiwen Xie, Mang Ye et al.

CVPR 2024 • poster
#4806

Intraoperative 2D/3D Image Registration via Differentiable X-ray Rendering

Vivek Gopalakrishnan, Neel Dey, Polina Golland

CVPR 2024 • poster • arXiv:2312.06358
#4807

AuraFusion360: Augmented Unseen Region Alignment for Reference-based 360° Unbounded Scene Inpainting

Chung-Ho Wu, Yang-Jung Chen, Ying-Huan Chen et al.

CVPR 2025 • poster • arXiv:2502.05176
#4808

SyncVP: Joint Diffusion for Synchronous Multi-Modal Video Prediction

Enrico Pallotta, Sina Mokhtarzadeh Azar, Shuai Li et al.

CVPR 2025 • poster • arXiv:2503.18933
#4809

V^2Dial: Unification of Video and Visual Dialog via Multimodal Experts

Adnen Abdessaied, Anna Rohrbach, Marcus Rohrbach et al.

CVPR 2025 • poster
#4810

Seeing the Abstract: Translating the Abstract Language for Vision Language Models

Davide Talon, Federico Girella, Ziyue Liu et al.

CVPR 2025 • poster • arXiv:2505.03242
#4811

Relaxed Contrastive Learning for Federated Learning

Seonguk Seo, Jinkyu Kim, Geeho Kim et al.

CVPR 2024 • poster • arXiv:2401.04928
#4812

EVCap: Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension

Jiaxuan Li, Duc Minh Vo, Akihiro Sugimoto et al.

CVPR 2024 • poster • arXiv:2311.15879
#4813

LLM4SGG: Large Language Models for Weakly Supervised Scene Graph Generation

Kibum Kim, Kanghoon Yoon, Jaehyeong Jeon et al.

CVPR 2024 • poster • arXiv:2310.10404
#4814

Mind the Gap: Detecting Black-box Adversarial Attacks in the Making through Query Update Analysis

Jeonghwan Park, Niall McLaughlin, Ihsen Alouani

CVPR 2025 • poster • arXiv:2503.02986
#4815

Geometrically-driven Aggregation for Zero-shot 3D Point Cloud Understanding

Guofeng Mei, Luigi Riz, Yiming Wang et al.

CVPR 2024 • highlight • arXiv:2312.02244
#4816

GauCho: Gaussian Distributions with Cholesky Decomposition for Oriented Object Detection

Jeffri Erwin Murrugarra Llerena, José Henrique Marques, Claudio Jung

CVPR 2025 • poster • arXiv:2502.01565
#4817

Dr. Splat: Directly Referring 3D Gaussian Splatting via Direct Language Embedding Registration

Junseong Kim, GeonU Kim, Kim Yu-Ji et al.

CVPR 2025 • highlight • arXiv:2502.16652
#4818

EgoPressure: A Dataset for Hand Pressure and Pose Estimation in Egocentric Vision

Yiming Zhao, Taein Kwon, Paul Streli et al.

CVPR 2025 • highlight • arXiv:2409.02224
#4819

Dyn-HaMR: Recovering 4D Interacting Hand Motion from a Dynamic Camera

Zhengdi Yu, Stefanos Zafeiriou, Tolga Birdal

CVPR 2025 • highlight • arXiv:2412.12861
#4820

Beyond Textual Constraints: Learning Novel Diffusion Conditions with Fewer Examples

Yuyang Yu, Bangzhen Liu, Chenxi Zheng et al.

CVPR 2024 • poster
#4821

Task-aligned Part-aware Panoptic Segmentation through Joint Object-Part Representations

Daan de Geus, Gijs Dubbelman

CVPR 2024 • poster • arXiv:2406.10114
#4822

FreeKD: Knowledge Distillation via Semantic Frequency Prompt

Yuan Zhang, Tao Huang, Jiaming Liu et al.

CVPR 2024 • poster • arXiv:2311.12079
#4823

Enhanced Motion-Text Alignment for Image-to-Video Transfer Learning

Wei Zhang, Chaoqun Wan, Tongliang Liu et al.

CVPR 2024 • poster
#4824

ProjAttacker: A Configurable Physical Adversarial Attack for Face Recognition via Projector

Yuanwei Liu, Hui Wei, Chengyu Jia et al.

CVPR 2025 • poster
#4825

Commonsense Video Question Answering through Video-Grounded Entailment Tree Reasoning

Huabin Liu, Filip Ilievski, Cees G. M. Snoek

CVPR 2025 • poster • arXiv:2501.05069
#4826

ShapeWords: Guiding Text-to-Image Synthesis with 3D Shape-Aware Prompts

Dmitrii M Petrov, Pradyumn Goyal, Divyansh Shivashok et al.

CVPR 2025 • poster • arXiv:2412.02912
#4827

SNIDA: Unlocking Few-Shot Object Detection with Non-linear Semantic Decoupling Augmentation

Yanjie Wang, Xu Zou, Luxin Yan et al.

CVPR 2024 • poster
#4828

ProAPO: Progressively Automatic Prompt Optimization for Visual Classification

Xiangyan Qu, Gaopeng Gou, Jiamin Zhuang et al.

CVPR 2025 • poster • arXiv:2502.19844
#4829

A Simple Data Augmentation for Feature Distribution Skewed Federated Learning

Yunlu Yan, Huazhu Fu, Yuexiang Li et al.

CVPR 2025 • poster • arXiv:2306.09363
#4830

Adaptive Multi-Modal Cross-Entropy Loss for Stereo Matching

Peng Xu, Zhiyu Xiang, Chengyu Qiao et al.

CVPR 2024 • poster • arXiv:2306.15612
#4831

Temporal Score Analysis for Understanding and Correcting Diffusion Artifacts

Yu Cao, Zengqun Zhao, Ioannis Patras et al.

CVPR 2025 • poster • arXiv:2503.16218
#4832

Driving-Video Dehazing with Non-Aligned Regularization for Safety Assistance

Junkai Fan, Jiangwei Weng, Kun Wang et al.

CVPR 2024 • poster • arXiv:2405.09996
#4833

Exploring Region-Word Alignment in Built-in Detector for Open-Vocabulary Object Detection

Heng Zhang, Qiuyu Zhao, Linyu Zheng et al.

CVPR 2024 • poster
#4834

L0-Sampler: An L0 Model Guided Volume Sampling for NeRF

Liangchen Li, Juyong Zhang

CVPR 2024 • poster
#4835

Nonisotropic Gaussian Diffusion for Realistic 3D Human Motion Prediction

Cecilia Curreli, Dominik Muhle, Abhishek Saroha et al.

CVPR 2025 • poster • arXiv:2501.06035
#4836

3D Gaussian Inpainting with Depth-Guided Cross-View Consistency

Sheng-Yu Huang, Zi-Ting Chou, Yu-Chiang Frank Wang

CVPR 2025 • poster • arXiv:2502.11801
#4837

Free-viewpoint Human Animation with Pose-correlated Reference Selection

Fa-Ting Hong, Zhan Xu, Haiyang Liu et al.

CVPR 2025 • highlight • arXiv:2412.17290
#4838

Diffusion 3D Features (Diff3F): Decorating Untextured Shapes with Distilled Semantic Features

Niladri Shekhar Dutt, Sanjeev Muralikrishnan, Niloy J. Mitra

CVPR 2024 • poster • arXiv:2311.17024
#4839

Unsupervised Occupancy Learning from Sparse Point Cloud

Amine Ouasfi, Adnane Boukhayma

CVPR 2024 • highlight • arXiv:2404.02759
#4840

OSDFace: One-Step Diffusion Model for Face Restoration

Jingkai Wang, Jue Gong, Lin Zhang et al.

CVPR 2025 • poster • arXiv:2411.17163
#4841

GLOW: Global Layout Aware Attacks on Object Detection

Jun Bao, Buyu Liu, Kui Ren et al.

CVPR 2024 • poster • arXiv:2302.14166
#4842

Context-based and Diversity-driven Specificity in Compositional Zero-Shot Learning

Yun Li, Zhe Liu, Hang Chen et al.

CVPR 2024 • poster • arXiv:2402.17251
#4843

Neural Underwater Scene Representation

Yunkai Tang, Chengxuan Zhu, Renjie Wan et al.

CVPR 2024 • poster
#4844

Scaled Decoupled Distillation

Shicai Wei, Chunbo Luo, Yang Luo

CVPR 2024 • poster
#4845

VISTA-LLAMA: Reducing Hallucination in Video Language Models via Equal Distance to Visual Tokens

Fan Ma, Xiaojie Jin, Heng Wang et al.

CVPR 2024 • poster • arXiv:2312.08870
#4846

Towards Lossless Implicit Neural Representation via Bit Plane Decomposition

Woo Kyoung Han, Byeonghun Lee, Hyunmin Cho et al.

CVPR 2025 • poster • arXiv:2502.21001
#4847

DPFlow: Adaptive Optical Flow Estimation with a Dual-Pyramid Framework

Henrique Morimitsu, Xiaobin Zhu, Roberto M. Cesar Jr et al.

CVPR 2025 • poster • arXiv:2503.14880
#4848

Hierarchical Intra-modal Correlation Learning for Label-free 3D Semantic Segmentation

Xin Kang, Lei Chu, Jiahao Li et al.

CVPR 2024 • poster
#4849

UCM-VeID V2: A Richer Dataset and A Pre-training Method for UAV Cross-Modality Vehicle Re-Identification

Xingyue Liu, Jiahao Qi, Chen Chen et al.

CVPR 2025 • poster
#4850

Vision-Language Model IP Protection via Prompt-based Learning

Lianyu Wang, Meng Wang, Huazhu Fu et al.

CVPR 2025 • poster • arXiv:2503.02393
#4851

InceptionNeXt: When Inception Meets ConvNeXt

Weihao Yu, Pan Zhou, Shuicheng Yan et al.

CVPR 2024 • poster • arXiv:2303.16900
#4852

PARA-Drive: Parallelized Architecture for Real-time Autonomous Driving

Xinshuo Weng, Boris Ivanovic, Yan Wang et al.

CVPR 2024 • poster
#4853

Towards Generalizable Tumor Synthesis

Qi Chen, Xiaoxi Chen, Haorui Song et al.

CVPR 2024 • poster • arXiv:2402.19470
#4854

S^3-Face: SSS-Compliant Facial Reflectance Estimation via Diffusion Priors

Xingyu Ren, Jiankang Deng, Yuhao Cheng et al.

CVPR 2025 • poster
#4855

Adaptive Hyper-graph Aggregation for Modality-Agnostic Federated Learning

Fan Qi, Shuai Li

CVPR 2024 • poster
#4856

Bi-SSC: Geometric-Semantic Bidirectional Fusion for Camera-based 3D Semantic Scene Completion

Yujie Xue, Ruihui Li, Fan Wu et al.

CVPR 2024 • poster
#4857

Efficient and Effective Weakly-Supervised Action Segmentation via Action-Transition-Aware Boundary Alignment

Angchi Xu, Wei-Shi Zheng

CVPR 2024 • poster • arXiv:2403.19225
#4858

Depth-Aware Concealed Crop Detection in Dense Agricultural Scenes

Liqiong Wang, Jinyu Yang, Yanfu Zhang et al.

CVPR 2024 • poster
#4859

FC-GNN: Recovering Reliable and Accurate Correspondences from Interferences

Haobo Xu, Jun Zhou, Hua Yang et al.

CVPR 2024 • poster
#4860

ShowMak3r: Compositional TV Show Reconstruction

Sangmin Kim, Seunguk Do, Jaesik Park

CVPR 2025 • poster • arXiv:2504.19584
#4861

MoMask: Generative Masked Modeling of 3D Human Motions

Chuan Guo, Yuxuan Mu, Muhammad Gohar Javed et al.

CVPR 2024 • poster • arXiv:2312.00063
#4862

CapsFusion: Rethinking Image-Text Data at Scale

Qiying Yu, Quan Sun, Xiaosong Zhang et al.

CVPR 2024 • poster • arXiv:2310.20550
#4863

Disco4D: Disentangled 4D Human Generation and Animation from a Single Image

Hui En Pang, Shuai Liu, Zhongang Cai et al.

CVPR 2025 • poster • arXiv:2409.17280
#4864

A General and Efficient Training for Transformer via Token Expansion

Wenxuan Huang, Yunhang Shen, Jiao Xie et al.

CVPR 2024 • poster • arXiv:2404.00672
#4865

BigGait: Learning Gait Representation You Want by Large Vision Models

Dingqiang Ye, Chao Fan, Jingzhe Ma et al.

CVPR 2024 • poster • arXiv:2402.19122
#4866

Event-based Visible and Infrared Fusion via Multi-task Collaboration

Mengyue Geng, Lin Zhu, Lizhi Wang et al.

CVPR 2024 • poster
#4867

Breathing Life Into Sketches Using Text-to-Video Priors

Rinon Gal, Yael Vinker, Yuval Alaluf et al.

CVPR 2024 • highlight • arXiv:2311.13608
#4868

Gaussian Shell Maps for Efficient 3D Human Generation

Rameen Abdal, Wang Yifan, Zifan Shi et al.

CVPR 2024 • poster • arXiv:2311.17857
#4869

3D Convex Splatting: Radiance Field Rendering with 3D Smooth Convexes

Jan Held, Renaud Vandeghen, Abdullah J Hamdi et al.

CVPR 2025 • highlight • arXiv:2411.14974
#4870

Byzantine-robust Decentralized Federated Learning via Dual-domain Clustering and Trust Bootstrapping

Peng Sun, Xinyang Liu, Zhibo Wang et al.

CVPR 2024 • poster
#4871

Scaling Mesh Generation via Compressive Tokenization

Haohan Weng, Zibo Zhao, Biwen Lei et al.

CVPR 2025 • poster • arXiv:2411.07025
#4872

MotionEditor: Editing Video Motion via Content-Aware Diffusion

Shuyuan Tu, Qi Dai, Zhi-Qi Cheng et al.

CVPR 2024 • poster • arXiv:2311.18830
#4873

State Space Models for Event Cameras

Nikola Zubic, Mathias Gehrig, Davide Scaramuzza

CVPR 2024 • poster
#4874

DiffInDScene: Diffusion-based High-Quality 3D Indoor Scene Generation

Xiaoliang Ju, Zhaoyang Huang, Yijin Li et al.

CVPR 2024 • poster • arXiv:2306.00519
#4875

Towards Calibrated Multi-label Deep Neural Networks

Jiacheng Cheng, Nuno Vasconcelos

CVPR 2024 • poster
#4876

TIM: A Time Interval Machine for Audio-Visual Action Recognition

Jacob Chalk, Jaesung Huh, Evangelos Kazakos et al.

CVPR 2024 • poster • arXiv:2404.05559
#4877

Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models

Yuhao Dong, Zuyan Liu, Hai-Long Sun et al.

CVPR 2025 • highlight • arXiv:2411.14432
#4878

Test-Time Linear Out-of-Distribution Detection

Ke Fan, Tong Liu, Xingyu Qiu et al.

CVPR 2024 • poster
#4879

World-consistent Video Diffusion with Explicit 3D Modeling

Qihang Zhang, Shuangfei Zhai, Miguel Ángel Bautista et al.

CVPR 2025 • highlight • arXiv:2412.01821
#4880

StageDesigner: Artistic Stage Generation for Scenography via Theater Scripts

Zhaoxing Gan, Mengtian Li, Ruhua Chen et al.

CVPR 2025 • poster • arXiv:2503.02595
#4881

Exploiting Style Latent Flows for Generalizing Deepfake Video Detection

Jongwook Choi, Taehoon Kim, Yonghyun Jeong et al.

CVPR 2024 • poster • arXiv:2403.06592
#4882

Learning Compatible Multi-Prize Subnetworks for Asymmetric Retrieval

Yushuai Sun, Zikun Zhou, Dongmei Jiang et al.

CVPR 2025 • poster • arXiv:2504.11879
#4883

LeGO: Leveraging a Surface Deformation Network for Animatable Stylized Face Generation with One Example

Soyeon Yoon, Kwan Yun, Kwanggyoon Seo et al.

CVPR 2024 • highlight • arXiv:2403.15227
#4884

Domain Generalization in CLIP via Learning with Diverse Text Prompts

Changsong Wen, Zelin Peng, Yu Huang et al.

CVPR 2025 • poster
#4885

Blood Flow Speed Estimation with Optical Coherence Tomography Angiography Images

Wensheng Cheng, Zhenghong Li, Jiaxiang Ren et al.

CVPR 2025 • poster
#4886

Leveraging Predicate and Triplet Learning for Scene Graph Generation

Jiankai Li, Yunhong Wang, Xiefan Guo et al.

CVPR 2024 • poster • arXiv:2406.02038
#4887

Unsupervised Semantic Segmentation Through Depth-Guided Feature Correlation and Sampling

Leon Sick, Dominik Engel, Pedro Hermosilla et al.

CVPR 2024 • poster • arXiv:2309.12378
#4888

HHMR: Holistic Hand Mesh Recovery by Enhancing the Multimodal Controllability of Graph Diffusion Models

Mengcheng Li, Hongwen Zhang, Yuxiang Zhang et al.

CVPR 2024 • highlight • arXiv:2406.01334
#4889

Associative Transformer

Yuwei Sun, Hideya Ochiai, Zhirong Wu et al.

CVPR 2025 • poster • arXiv:2309.12862
#4890

Identity-Preserving Text-to-Video Generation by Frequency Decomposition

Shenghai Yuan, Jinfa Huang, Xianyi He et al.

CVPR 2025 • highlight • arXiv:2411.17440
#4891

Enhancing Visual Continual Learning with Language-Guided Supervision

Bolin Ni, Hongbo Zhao, Chenghao Zhang et al.

CVPR 2024 • poster • arXiv:2403.16124
#4892

PhyScene: Physically Interactable 3D Scene Synthesis for Embodied AI

Yandan Yang, Baoxiong Jia, Peiyuan Zhi et al.

CVPR 2024 • highlight • arXiv:2404.09465
#4893

RestorGS: Depth-aware Gaussian Splatting for Efficient 3D Scene Restoration

Yuanjian Qiao, Mingwen Shao, Lingzhuang Meng et al.

CVPR 2025 • poster
#4894

High Dynamic Range Video Compression: A Large-Scale Benchmark Dataset and A Learned Bit-depth Scalable Compression Algorithm

Zhaoyi Tian, Feifeng Wang, Shiwei Wang et al.

CVPR 2025 • poster • arXiv:2503.00410
#4895

Effective Cloud Removal for Remote Sensing Images by an Improved Mean-Reverting Denoising Model with Elucidated Design Space

Yi Liu, Wengen Li, Jihong Guan et al.

CVPR 2025 • poster • arXiv:2503.23717
#4896

APHQ-ViT: Post-Training Quantization with Average Perturbation Hessian Based Reconstruction for Vision Transformers

Zhuguanyu Wu, Jiayi Zhang, Jiaxin Chen et al.

CVPR 2025 • poster • arXiv:2504.02508
#4897

Uncertainty-Aware Source-Free Adaptive Image Super-Resolution with Wavelet Augmentation Transformer

Yuang Ai, Xiaoqiang Zhou, Huaibo Huang et al.

CVPR 2024 • poster • arXiv:2303.17783
#4898

Generalizing 6-DoF Grasp Detection via Domain Prior Knowledge

Haoxiang Ma, Modi Shi, Boyang Gao et al.

CVPR 2024 • poster • arXiv:2404.01727
#4899

MonoTAKD: Teaching Assistant Knowledge Distillation for Monocular 3D Object Detection

Hou-I Liu, Christine Wu, Jen-Hao Cheng et al.

CVPR 2025 • poster • arXiv:2404.04910
#4900

Making Vision Transformers Truly Shift-Equivariant

Renan A. Rojas-Gomez, Teck-Yian Lim, Minh Do et al.

CVPR 2024 • poster • arXiv:2305.16316
#4901

DreamVideo: Composing Your Dream Videos with Customized Subject and Motion

Yujie Wei, Shiwei Zhang, Zhiwu Qing et al.

CVPR 2024 • poster • arXiv:2312.04433
#4902

MV-MATH: Evaluating Multimodal Math Reasoning in Multi-Visual Contexts

Peijie Wang, Zhong-Zhi Li, Fei Yin et al.

CVPR 2025 • poster • arXiv:2502.20808
#4903

Co-op: Correspondence-based Novel Object Pose Estimation

Sungphill Moon, Hyeontae Son, Dongcheol Hur et al.

CVPR 2025 • poster • arXiv:2503.17731
#4904

RankED: Addressing Imbalance and Uncertainty in Edge Detection Using Ranking-based Losses

Bedrettin Cetinkaya, Sinan Kalkan, Emre Akbas

CVPR 2024 • poster • arXiv:2403.01795
#4905

What’s in the Image? A Deep-Dive into the Vision of Vision Language Models

Omri Kaduri, Shai Bagon, Tali Dekel

CVPR 2025 • poster • arXiv:2411.17491
#4906

Text Embedding is Not All You Need: Attention Control for Text-to-Image Semantic Alignment with Text Self-Attention Maps

Jeeyung Kim, Erfan Esmaeili Fakhabi, Qiang Qiu

CVPR 2025 • poster • arXiv:2411.15236
#4907

Fine-Grained Bipartite Concept Factorization for Clustering

Chong Peng, Pengfei Zhang, Yongyong Chen et al.

CVPR 2024 • poster
#4908

Generalized Event Cameras

Varun Sundar, Matthew Dutson, Andrei Ardelean et al.

CVPR 2024 • poster • arXiv:2407.02683
#4909

Multimodal Prompt Perceiver: Empower Adaptiveness, Generalizability and Fidelity for All-in-One Image Restoration

Yuang Ai, Huaibo Huang, Xiaoqiang Zhou et al.

CVPR 2024 • poster • arXiv:2312.02918
#4910

Dual Diffusion for Unified Image Generation and Understanding

Zijie Li, Henry Li, Yichun Shi et al.

CVPR 2025 • poster • arXiv:2501.00289
#4911

BEVSpread: Spread Voxel Pooling for Bird’s-Eye-View Representation in Vision-based Roadside 3D Object Detection

Wenjie Wang, Yehao Lu, Guangcong Zheng et al.

CVPR 2024 • poster • arXiv:2406.08785
#4912

Dual Pose-invariant Embeddings: Learning Category and Object-specific Discriminative Representations for Recognition and Retrieval

Rohan Sarkar, Avinash Kak

CVPR 2024 • poster • arXiv:2403.00272
#4913

Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer

Zhen Zhao, Jingqun Tang, Chunhui Lin et al.

CVPR 2024 • poster • arXiv:2311.13120
#4914

NIVeL: Neural Implicit Vector Layers for Text-to-Vector Generation

Vikas Thamizharasan, Difan Liu, Matthew Fisher et al.

CVPR 2024 • poster • arXiv:2405.15217
#4915

Hyperbolic Anomaly Detection

Huimin Li, Zhentao Chen, Yunhao Xu et al.

CVPR 2024 • poster
#4916

Selective Nonlinearities Removal from Digital Signals

Krzysztof Maliszewski, Magdalena Urbanska, Varvara Vetrova et al.

CVPR 2024 • poster • arXiv:2403.09731
#4917

SimLTD: Simple Supervised and Semi-Supervised Long-Tailed Object Detection

Phi Vu Tran

CVPR 2025 • poster • arXiv:2412.20047
#4918

Backdoor Defense via Test-Time Detecting and Repairing

Jiyang Guan, Jian Liang, Ran He

CVPR 2024 • poster
#4919

Towards a Perceptual Evaluation Framework for Lighting Estimation

Justine Giroux, Mohammad Reza Karimi Dastjerdi, Yannick Hold-Geoffroy et al.

CVPR 2024 • poster • arXiv:2312.04334
#4920

DanceCamera3D: 3D Camera Movement Synthesis with Music and Dance

Zixuan Wang, Jia Jia, Shikun Sun et al.

CVPR 2024 • poster • arXiv:2403.13667
#4921

Alias-Free Latent Diffusion Models: Improving Fractional Shift Equivariance of Diffusion Latent Space

Yifan Zhou, Zeqi Xiao, Shuai Yang et al.

CVPR 2025 • poster • arXiv:2503.09419
#4922

LumiNet: Latent Intrinsics Meets Diffusion Models for Indoor Scene Relighting

Xiaoyan Xing, Konrad Groh, Sezer Karaoglu et al.

CVPR 2025 • poster • arXiv:2412.00177
#4923

HoloVIC: Large-scale Dataset and Benchmark for Multi-Sensor Holographic Intersection and Vehicle-Infrastructure Cooperative

Cong Ma, Qiao Lei, Chengkai Zhu et al.

CVPR 2024 • poster • arXiv:2403.02640
#4924

Navigating Image Restoration with VAR’s Distribution Alignment Prior

Siyang Wang, Naishan Zheng, Jie Huang et al.

CVPR 2025 • poster • arXiv:2412.21063
#4925

What Sketch Explainability Really Means for Downstream Tasks?

Hmrishav Bandyopadhyay, Pinaki Nath Chowdhury, Ayan Kumar Bhunia et al.

CVPR 2024 • poster • arXiv:2403.09480
#4926

Leveraging Frame Affinity for sRGB-to-RAW Video De-rendering

Chen Zhang, Wencheng Han, Yang Zhou et al.

CVPR 2024 • poster
#4927

Learning CNN on ViT: A Hybrid Model to Explicitly Class-specific Boundaries for Domain Adaptation

Ba Hung Ngo, Nhat-Tuong Do-Tran, Tuan-Ngoc Nguyen et al.

CVPR 2024 • poster • arXiv:2403.18360
#4928

GoMVS: Geometrically Consistent Cost Aggregation for Multi-View Stereo

Jiang Wu, Rui Li, Haofei Xu et al.

CVPR 2024 • poster • arXiv:2404.07992
#4929

Neuron: Learning Context-Aware Evolving Representations for Zero-Shot Skeleton Action Recognition

Yang Chen, Jingcai Guo, Song Guo et al.

CVPR 2025 • poster • arXiv:2411.11288
#4930

From Correspondences to Pose: Non-minimal Certifiably Optimal Relative Pose without Disambiguation

Javier Tirado-Garín, Javier Civera

CVPR 2024 • highlight • arXiv:2312.05995
#4931

CommonCanvas: Open Diffusion Models Trained on Creative-Commons Images

Aaron Gokaslan, A. Feder Cooper, Jasmine Collins et al.

CVPR 2024 • poster
#4932

Choose What You Need: Disentangled Representation Learning for Scene Text Recognition, Removal and Editing

Boqiang Zhang, Hongtao Xie, Zuan Gao et al.

CVPR 2024 • poster • arXiv:2405.04377
#4933

Consistency Posterior Sampling for Diverse Image Synthesis

Vishal Purohit, Matthew Repasky, Jianfeng Lu et al.

CVPR 2025 • poster
#4934

Weakly Supervised Contrastive Adversarial Training for Learning Robust Features from Semi-supervised Data

Lilin Zhang, Chengpei Wu, Ning Yang

CVPR 2025 • poster • arXiv:2503.11032
#4935

Memory-based Adapters for Online 3D Scene Perception

Xiuwei Xu, Chong Xia, Ziwei Wang et al.

CVPR 2024 • poster • arXiv:2403.06974
#4936

MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis

Ho Kei Cheng, Masato Ishii, Akio Hayakawa et al.

CVPR 2025 • poster • arXiv:2412.15322
#4937

VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models

Byung-Kwan Lee, Ryo Hachiuma, Yu-Chiang Frank Wang et al.

CVPR 2025 • poster • arXiv:2412.01822
#4938

Cross-spectral Gated-RGB Stereo Depth Estimation

Samuel Brucker, Stefanie Walz, Mario Bijelic et al.

CVPR 2024 • highlight • arXiv:2405.12759
#4939

MINIMA: Modality Invariant Image Matching

Jiangwei Ren, Xingyu Jiang, Zizhuo Li et al.

CVPR 2025 • poster • arXiv:2412.19412
#4940

GREAT: Geometry-Intention Collaborative Inference for Open-Vocabulary 3D Object Affordance Grounding

Yawen Shao, Wei Zhai, Yuhang Yang et al.

CVPR 2025 • poster • arXiv:2411.19626
#4941

Text-guided Sparse Voxel Pruning for Efficient 3D Visual Grounding

Wenxuan Guo, Xiuwei Xu, Ziwei Wang et al.

CVPR 2025 • highlight • arXiv:2502.10392
#4942

Test-time Augmentation Improves Efficiency in Conformal Prediction

Divya M Shanmugam, Helen Lu, Swami Sankaranarayanan et al.

CVPR 2025 • poster • arXiv:2505.22764
#4943

CacheQuant: Comprehensively Accelerated Diffusion Models

Xuewen Liu, Zhikai Li, Qingyi Gu

CVPR 2025 • poster • arXiv:2503.01323
#4944

From Faces to Voices: Learning Hierarchical Representations for High-quality Video-to-Speech

Jihoon Kim, Jeongsoo Choi, Jaehun Kim et al.

CVPR 2025 • highlight • arXiv:2503.16956
#4945

EASE-DETR: Easing the Competition among Object Queries

Yulu Gao, Yifan Sun, Xudong Ding et al.

CVPR 2024 • poster
#4946

GSNeRF: Generalizable Semantic Neural Radiance Fields with Enhanced 3D Scene Understanding

Zi-Ting Chou, Sheng-Yu Huang, I-Jieh Liu et al.

CVPR 2024 • poster • arXiv:2403.03608
#4947

CAT-DM: Controllable Accelerated Virtual Try-on with Diffusion Model

Jianhao Zeng, Dan Song, Weizhi Nie et al.

CVPR 2024 • poster • arXiv:2311.18405
#4948

Readout Guidance: Learning Control from Diffusion Features

Grace Luo, Trevor Darrell, Oliver Wang et al.

CVPR 2024 • highlight • arXiv:2312.02150
#4949

Token Cropr: Faster ViTs for Quite a Few Tasks

Benjamin Bergner, Christoph Lippert, Aravindh Mahendran

CVPR 2025 • poster • arXiv:2412.00965
#4950

Action Detection via an Image Diffusion Process

Lin Geng Foo, Tianjiao Li, Hossein Rahmani et al.

CVPR 2024 • poster • arXiv:2404.01051
#4951

Transcriptomics-guided Slide Representation Learning in Computational Pathology

Guillaume Jaume, Lukas Oldenburg, Anurag Vaidya et al.

CVPR 2024 • poster • arXiv:2405.11618
#4952

SurroundSDF: Implicit 3D Scene Understanding Based on Signed Distance Field

Lizhe Liu, Bohua Wang, Hongwei Xie et al.

CVPR 2024 • highlight • arXiv:2403.14366
#4953

A Unified Latent Schrödinger Bridge Diffusion Model for Unsupervised Anomaly Detection and Localization

Shilhora Akshay, Niveditha Lakshmi Narasimhan, Jacob George et al.

CVPR 2025 • poster
#4954

MLIP: Enhancing Medical Visual Representation with Divergence Encoder and Knowledge-guided Contrastive Learning

Zhe Li, Laurence Yang, Bocheng Ren et al.

CVPR 2024 • poster • arXiv:2402.02045
#4955

Active Open-Vocabulary Recognition: Let Intelligent Moving Mitigate CLIP Limitations

Lei Fan, Jianxiong Zhou, Xiaoying Xing et al.

CVPR 2024 • poster • arXiv:2311.17938
#4956

Continuous Locomotive Crowd Behavior Generation

Inhwan Bae, Junoh Lee, Hae-Gon Jeon

CVPR 2025 • poster • arXiv:2504.04756
#4957

DyBluRF: Dynamic Neural Radiance Fields from Blurry Monocular Video

Huiqiang Sun, Xingyi Li, Liao Shen et al.

CVPR 2024 • poster • arXiv:2403.10103
#4958

GIF: Generative Inspiration for Face Recognition at Scale

Mohammad Saadabadi Saadabadi, Sahar Rahimi Malakshan, Ali Dabouei et al.

CVPR 2025 • poster
#4959

Period-LLM: Extending the Periodic Capability of Multimodal Large Language Model

Yuting Zhang, Hao Lu, Qingyong Hu et al.

CVPR 2025 • poster • arXiv:2505.24476
#4960

SAOR: Single-View Articulated Object Reconstruction

Mehmet Aygun, Oisin Mac Aodha

CVPR 2024 • poster
#4961

GenHowTo: Learning to Generate Actions and State Transformations from Instructional Videos

Tomas Soucek, Dima Damen, Michael Wray et al.

CVPR 2024 • poster
#4962

SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training

Jierun Chen, Dongting Hu, Xijie Huang et al.

CVPR 2025 • highlight • arXiv:2412.09619
#4963

Density-Adaptive Model Based on Motif Matrix for Multi-Agent Trajectory Prediction

Di Wen, Haoran Xu, Zhaocheng He et al.

CVPR 2024 • poster
#4964

Towards Accurate Post-training Quantization for Diffusion Models

Changyuan Wang, Ziwei Wang, Xiuwei Xu et al.

CVPR 2024 • highlight • arXiv:2305.18723
#4965

VoCo-LLaMA: Towards Vision Compression with Large Language Models

Xubing Ye, Yukang Gan, Xiaoke Huang et al.

CVPR 2025 • poster • arXiv:2406.12275
#4966

MoST: Multi-Modality Scene Tokenization for Motion Prediction

Norman Mu, Jingwei Ji, Zhenpei Yang et al.

CVPR 2024 • poster • arXiv:2404.19531
#4967

Unraveling Normal Anatomy via Fluid-Driven Anomaly Randomization

Peirong Liu, Ana Lawry Aguila, Juan Iglesias

CVPR 2025 • poster • arXiv:2501.13370
#4968

Gear-NeRF: Free-Viewpoint Rendering and Tracking with Motion-aware Spatio-Temporal Sampling

Xinhang Liu, Yu-Wing Tai, Chi-Keung Tang et al.

CVPR 2024 • highlight • arXiv:2406.03723
#4969

3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion

Zhaoxi Chen, Jiaxiang Tang, Yuhao Dong et al.

CVPR 2025 • highlight • arXiv:2409.12957
#4970

MultiDiff: Consistent Novel View Synthesis from a Single Image

Norman Müller, Katja Schwarz, Barbara Roessle et al.

CVPR 2024 • poster • arXiv:2406.18524
#4971

Self-supervised ControlNet with Spatio-Temporal Mamba for Real-world Video Super-resolution

Shijun Shi, Jing Xu, Lijing Lu et al.

CVPR 2025 • poster • arXiv:2506.01037
#4972

Multi-Scale Video Anomaly Detection by Multi-Grained Spatio-Temporal Representation Learning

Menghao Zhang, Jingyu Wang, Qi Qi et al.

CVPR 2024 • highlight
#4973

Uncertainty-aware Action Decoupling Transformer for Action Anticipation

Hongji Guo, Nakul Agarwal, Shao-Yuan Lo et al.

CVPR 2024 • highlight
#4974

Continuous, Subject-Specific Attribute Control in T2I Models by Identifying Semantic Directions

Stefan Andreas Baumann, Felix Krause, Michael Neumayr et al.

CVPR 2025 • poster • arXiv:2403.17064
#4975

PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection

Kuan-Chih Huang, Weijie Lyu, Ming-Hsuan Yang et al.

CVPR 2024 • poster • arXiv:2312.08371
#4976

SpecTRe-GS: Modeling Highly Specular Surfaces with Reflected Nearby Objects by Tracing Rays in 3D Gaussian Splatting

Jiajun Tang, Fan Fei, Zhihao Li et al.

CVPR 2025 • highlight
#4977

VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models

Haoxin Chen, Yong Zhang, Xiaodong Cun et al.

CVPR 2024 • poster • arXiv:2401.09047
#4978

TextNeRF: A Novel Scene-Text Image Synthesis Method based on Neural Radiance Fields

Jialei Cui, Jianwei Du, Wenzhuo Liu et al.

CVPR 2024 • poster
#4979

ChatGarment: Garment Estimation, Generation and Editing via Large Language Models

Siyuan Bian, Chenghao Xu, Yuliang Xiu et al.

CVPR 2025 • poster • arXiv:2412.17811
#4980

DiverseFlow: Sample-Efficient Diverse Mode Coverage in Flows

Mashrur M. Morshed, Vishnu Naresh Boddeti

CVPR 2025 • poster • arXiv:2504.07894
#4981

VidTwin: Video VAE with Decoupled Structure and Dynamics

Yuchi Wang, Junliang Guo, Xinyi Xie et al.

CVPR 2025 • poster • arXiv:2412.17726
#4982

An Asymmetric Augmented Self-Supervised Learning Method for Unsupervised Fine-Grained Image Hashing

Feiran Hu, Chenlin Zhang, Jiangliang Guo et al.

CVPR 2024 • poster
#4983

MimicDiffusion: Purifying Adversarial Perturbation via Mimicking Clean Diffusion Model

Kaiyu Song, Hanjiang Lai, Yan Pan et al.

CVPR 2024 • poster • arXiv:2312.04802
#4984

Action Scene Graphs for Long-Form Understanding of Egocentric Videos

Ivan Rodin, Antonino Furnari, Kyle Min et al.

CVPR 2024 • poster • arXiv:2312.03391
#4985

Dynamic Integration of Task-Specific Adapters for Class Incremental Learning

Jiashuo Li, Shaokun Wang, Bo Qian et al.

CVPR 2025 • poster • arXiv:2409.14983
#4986

DiffusionTrack: Point Set Diffusion Model for Visual Object Tracking

Fei Xie, Zhongdao Wang, Chao Ma

CVPR 2024 • poster
#4987

EmoGen: Emotional Image Content Generation with Text-to-Image Diffusion Models

Jingyuan Yang, Jiawei Feng, Hui Huang

CVPR 2024 • poster • arXiv:2401.04608
#4988

Kiss3DGen: Repurposing Image Diffusion Models for 3D Asset Generation

Jiantao Lin, Xin Yang, Meixi Chen et al.

CVPR 2025 • poster • arXiv:2503.01370
#4989

SpiderMatch: 3D Shape Matching with Global Optimality and Geometric Consistency

Paul Roetzer, Florian Bernard

CVPR 2024 • poster
#4990

Realigning Confidence with Temporal Saliency Information for Point-Level Weakly-Supervised Temporal Action Localization

Ziying Xia, Jian Cheng, Siyu Liu et al.

CVPR 2024 • poster
#4991

3D Facial Expressions through Analysis-by-Neural-Synthesis

George Retsinas, Panagiotis Filntisis, Radek Danecek et al.

CVPR 2024 • poster • arXiv:2404.04104
#4992

Unsupervised Discovery of Facial Landmarks and Head Pose

Satyajit Tourani, Siddharth Tourani, Arif Mahmood et al.

CVPR 2025 • poster
#4993

Segment and Caption Anything

Xiaoke Huang, Jianfeng Wang, Yansong Tang et al.

CVPR 2024 • poster • arXiv:2312.00869
#4994

Brush2Prompt: Contextual Prompt Generator for Object Inpainting

Mang Tik Chiu, Yuqian Zhou, Lingzhi Zhang et al.

CVPR 2024 • poster
#4995

Using Diffusion Priors for Video Amodal Segmentation

Kaihua Chen, Deva Ramanan, Tarasha Khurana

CVPR 2025 • poster • arXiv:2412.04623
#4996

G^3-LQ: Marrying Hyperbolic Alignment with Explicit Semantic-Geometric Modeling for 3D Visual Grounding

Yuan Wang, Yali Li, Shengjin Wang

CVPR 2024 • poster
#4997

Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images

Chaoqin Huang, Aofan Jiang, Jinghao Feng et al.

CVPR 2024 • highlight • arXiv:2403.12570
#4998

NightCC: Nighttime Color Constancy via Adaptive Channel Masking

Shuwei Li, Robby T. Tan

CVPR 2024 • poster
#4999

One Model for ALL: Low-Level Task Interaction Is a Key to Task-Agnostic Image Fusion

Chunyang Cheng, Tianyang Xu, Zhenhua Feng et al.

CVPR 2025 • poster • arXiv:2502.19854
#5000

Sparse Views Near Light: A Practical Paradigm for Uncalibrated Point-light Photometric Stereo

Mohammed Brahimi, Bjoern Haefner, Zhenzhang Ye et al.

CVPR 2024 • poster • arXiv:2404.00098