Most Cited CVPR "decentralized policy" Papers

5,589 papers found • Page 5 of 28

#801

Summarize the Past to Predict the Future: Natural Language Descriptions of Context Boost Multimodal Object Interaction Anticipation

Razvan Pasca, Alexey Gavryushin, Muhammad Hamza et al.

CVPR 2024 • arXiv:2301.09209 • 22 citations
#802

Domain Prompt Learning with Quaternion Networks

Qinglong Cao, Zhengqin Xu, Yuntian Chen et al.

CVPR 2024 • highlight • arXiv:2312.08878 • 22 citations
#803

Rethinking Multi-view Representation Learning via Distilled Disentangling

Guanzhou Ke, Bo Wang, Xiao-Li Wang et al.

CVPR 2024 • arXiv:2403.10897 • 22 citations
#804

Unifying Top-down and Bottom-up Scanpath Prediction Using Transformers

Zhibo Yang, Sounak Mondal, Seoyoung Ahn et al.

CVPR 2024 • arXiv:2303.09383 • 22 citations
#805

Seeing Far and Clearly: Mitigating Hallucinations in MLLMs with Attention Causal Decoding

Feilong Tang, Chengzhi Liu, Zhongxing Xu et al.

CVPR 2025 • arXiv:2505.16652 • 22 citations
#806

Semantic-aware SAM for Point-Prompted Instance Segmentation

Zhaoyang Wei, Pengfei Chen, Xuehui Yu et al.

CVPR 2024 • highlight • arXiv:2312.15895 • 22 citations
#807

CPath-Omni: A Unified Multimodal Foundation Model for Patch and Whole Slide Image Analysis in Computational Pathology

Yuxuan Sun, Yixuan Si, Chenglu Zhu et al.

CVPR 2025 • arXiv:2412.12077 • 22 citations
#808

SegMAN: Omni-scale Context Modeling with State Space Models and Local Attention for Semantic Segmentation

Yunxiang Fu, Meng Lou, Yizhou Yu

CVPR 2025 • arXiv:2412.11890 • 22 citations
#809

F-LMM: Grounding Frozen Large Multimodal Models

Size Wu, Sheng Jin, Wenwei Zhang et al.

CVPR 2025 • arXiv:2406.05821 • 22 citations
#810

Large Language Models are Good Prompt Learners for Low-Shot Image Classification

Zhaoheng Zheng, Jingmin Wei, Xuefeng Hu et al.

CVPR 2024 • arXiv:2312.04076 • 22 citations
#811

Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation

Yudi Shi, Shangzhe Di, Qirui Chen et al.

CVPR 2025 • arXiv:2412.01694 • 22 citations
#812

StegoGAN: Leveraging Steganography for Non-Bijective Image-to-Image Translation

Sidi Wu, Yizi Chen, Loic Landrieu et al.

CVPR 2024 • arXiv:2403.20142 • 22 citations
#813

Material Anything: Generating Materials for Any 3D Object via Diffusion

Xin Huang, Tengfei Wang, Ziwei Liu et al.

CVPR 2025 • highlight • arXiv:2411.15138 • 22 citations
#814

GEARS: Local Geometry-aware Hand-object Interaction Synthesis

Keyang Zhou, Bharat Lal Bhatnagar, Jan Lenssen et al.

CVPR 2024 • arXiv:2404.01758 • 22 citations
#815

GAFusion: Adaptive Fusing LiDAR and Camera with Multiple Guidance for 3D Object Detection

Xiaotian Li, Baojie Fan, Jiandong Tian et al.

CVPR 2024 • arXiv:2411.00340 • 22 citations
#816

BooW-VTON: Boosting In-the-Wild Virtual Try-On via Mask-Free Pseudo Data Training

Xuanpu Zhang, Dan Song, Pengxin Zhan et al.

CVPR 2025 • arXiv:2408.06047 • 22 citations
#817

Category-Level Multi-Part Multi-Joint 3D Shape Assembly

Yichen Li, Kaichun Mo, Yueqi Duan et al.

CVPR 2024 • arXiv:2303.06163 • 22 citations
#818

Mind the Time: Temporally-Controlled Multi-Event Video Generation

Ziyi Wu, Aliaksandr Siarohin, Willi Menapace et al.

CVPR 2025 • arXiv:2412.05263 • 22 citations
#819

Spatio-Temporal Turbulence Mitigation: A Translational Perspective

Xingguang Zhang, Nicholas M Chimitt, Yiheng Chi et al.

CVPR 2024 • arXiv:2401.04244 • 22 citations
#820

HMD-Poser: On-Device Real-time Human Motion Tracking from Scalable Sparse Observations

Peng Dai, Yang Zhang, Tao Liu et al.

CVPR 2024 • arXiv:2403.03561 • 21 citations
#821

Language-Driven Anchors for Zero-Shot Adversarial Robustness

Xiao Li, Wei Zhang, Yining Liu et al.

CVPR 2024 • arXiv:2301.13096 • 21 citations
#822

PrPSeg: Universal Proposition Learning for Panoramic Renal Pathology Segmentation

Ruining Deng, Quan Liu, Can Cui et al.

CVPR 2024 • arXiv:2402.19286 • 21 citations
#823

Real-time 3D-aware Portrait Video Relighting

Ziqi Cai, Kaiwen Jiang, Shu-Yu Chen et al.

CVPR 2024 • highlight • arXiv:2410.18355 • 21 citations
#824

Guided Slot Attention for Unsupervised Video Object Segmentation

Minhyeok Lee, Suhwan Cho, Dogyoon Lee et al.

CVPR 2024 • arXiv:2303.08314 • 21 citations
#825

Rethinking Few-shot 3D Point Cloud Semantic Segmentation

Zhaochong An, Guolei Sun, Yun Liu et al.

CVPR 2024 • arXiv:2403.00592 • 21 citations
#826

MonoHair: High-Fidelity Hair Modeling from a Monocular Video

Keyu Wu, Lingchen Yang, Zhiyi Kuang et al.

CVPR 2024 • arXiv:2403.18356 • 21 citations
#827

Optimus-2: Multimodal Minecraft Agent with Goal-Observation-Action Conditioned Policy

Zaijing Li, Yuquan Xie, Rui Shao et al.

CVPR 2025 • arXiv:2502.19902 • 21 citations
#828

NRDF: Neural Riemannian Distance Fields for Learning Articulated Pose Priors

Yannan He, Garvita Tiwari, Tolga Birdal et al.

CVPR 2024 • highlight • arXiv:2403.03122 • 21 citations
#829

PracticalDG: Perturbation Distillation on Vision-Language Models for Hybrid Domain Generalization

Zining Chen, Weiqiu Wang, Zhicheng Zhao et al.

CVPR 2024 • arXiv:2404.09011 • 21 citations
#830

Composed Video Retrieval via Enriched Context and Discriminative Embeddings

Omkar Thawakar, Muzammal Naseer, Rao Anwer et al.

CVPR 2024 • arXiv:2403.16997 • 21 citations
#831

GAF: Gaussian Avatar Reconstruction from Monocular Videos via Multi-view Diffusion

Jiapeng Tang, Davide Davoli, Tobias Kirschstein et al.

CVPR 2025 • arXiv:2412.10209 • 21 citations
#832

PromptHMR: Promptable Human Mesh Recovery

Yufu Wang, Yu Sun, Priyanka Patel et al.

CVPR 2025 • arXiv:2504.06397 • 21 citations
#833

Self-Supervised Multi-Object Tracking with Path Consistency

Zijia Lu, Bing Shuai, Yanbei Chen et al.

CVPR 2024 • highlight • arXiv:2404.05136 • 21 citations
#834

Neural Spline Fields for Burst Image Fusion and Layer Separation

Ilya Chugunov, David Shustin, Ruyu Yan et al.

CVPR 2024 • arXiv:2312.14235 • 21 citations
#835

Visual Delta Generator with Large Multi-modal Models for Semi-supervised Composed Image Retrieval

Young Kyun Jang, Donghyun Kim, Zihang Meng et al.

CVPR 2024 • arXiv:2404.15516 • 21 citations
#836

SAR3D: Autoregressive 3D Object Generation and Understanding via Multi-scale 3D VQVAE

Yongwei Chen, Yushi Lan, Shangchen Zhou et al.

CVPR 2025 • arXiv:2411.16856 • 21 citations
#837

PosterMaker: Towards High-Quality Product Poster Generation with Accurate Text Rendering

Yifan Gao, Zihang Lin, Chuanbin Liu et al.

CVPR 2025 • arXiv:2504.06632 • 21 citations
#838

Unlocking the Potential of Prompt-Tuning in Bridging Generalized and Personalized Federated Learning

Wenlong Deng, Christos Thrampoulidis, Xiaoxiao Li

CVPR 2024 • arXiv:2310.18285 • 21 citations
#839

VideoRF: Rendering Dynamic Radiance Fields as 2D Feature Video Streams

Liao Wang, Kaixin Yao, Chengcheng Guo et al.

CVPR 2024 • arXiv:2312.01407 • 21 citations
#840

Visual Prompting for Generalized Few-shot Segmentation: A Multi-scale Approach

Mir Rayat Imtiaz Hossain, Mennatullah Siam, Leonid Sigal et al.

CVPR 2024 • arXiv:2404.11732 • 21 citations
#841

GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation

Lang Lin, Xueyang Yu, Ziqi Pang et al.

CVPR 2025 • arXiv:2504.07962 • 21 citations
#842

TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models

Haomiao Ni, Bernhard Egger, Suhas Lohit et al.

CVPR 2024 • arXiv:2404.16306 • 21 citations
#843

UVEB: A Large-scale Benchmark and Baseline Towards Real-World Underwater Video Enhancement

Yaofeng Xie, Lingwei Kong, Kai Chen et al.

CVPR 2024 • arXiv:2404.14542 • 21 citations
#844

Matrix3D: Large Photogrammetry Model All-in-One

Yuanxun Lu, Jingyang Zhang, Tian Fang et al.

CVPR 2025 • highlight • arXiv:2502.07685 • 21 citations
#845

Prometheus: 3D-Aware Latent Diffusion Models for Feed-Forward Text-to-3D Scene Generation

Yuanbo Yang, Jiahao Shao, Xinyang Li et al.

CVPR 2025 • arXiv:2412.21117 • 21 citations
#846

DreamPropeller: Supercharge Text-to-3D Generation with Parallel Sampling

Linqi Zhou, Andy Shih, Chenlin Meng et al.

CVPR 2024 • highlight • arXiv:2311.17082 • 21 citations
#847

MC^2: Multi-concept Guidance for Customized Multi-concept Generation

Jiaxiu Jiang, Yabo Zhang, Kailai Feng et al.

CVPR 2025 • arXiv:2404.05268 • 21 citations
#848

Clustering Propagation for Universal Medical Image Segmentation

Yuhang Ding, Liulei Li, Wenguan Wang et al.

CVPR 2024 • arXiv:2403.16646 • 21 citations
#849

3D-SceneDreamer: Text-Driven 3D-Consistent Scene Generation

Songchun Zhang, Yibo Zhang, Quan Zheng et al.

CVPR 2024 • arXiv:2403.09439 • 21 citations
#850

Carve3D: Improving Multi-view Reconstruction Consistency for Diffusion Models with RL Finetuning

Desai Xie, Jiahao Li, Hao Tan et al.

CVPR 2024 • arXiv:2312.13980 • 21 citations
#851

TopV: Compatible Token Pruning with Inference Time Optimization for Fast and Low-Memory Multimodal Vision Language Model

Cheng Yang, Yang Sui, Jinqi Xiao et al.

CVPR 2025 • arXiv:2503.18278 • 20 citations
#852

HyperFree: A Channel-adaptive and Tuning-free Foundation Model for Hyperspectral Remote Sensing Imagery

Jingtao Li, Yingyi Liu, Xinyu Wang et al.

CVPR 2025 • arXiv:2503.21841 • 20 citations
#853

Dual Prior Unfolding for Snapshot Compressive Imaging

Jiancheng Zhang, Haijin Zeng, Jiezhang Cao et al.

CVPR 2024 • 20 citations
#854

Improving Plasticity in Online Continual Learning via Collaborative Learning

Maorong Wang, Nicolas Michel, Ling Xiao et al.

CVPR 2024 • arXiv:2312.00600 • 20 citations
#855

Generative Image Layer Decomposition with Visual Effects

Jinrui Yang, Qing Liu, Yijun Li et al.

CVPR 2025 • arXiv:2411.17864 • 20 citations
#856

VideoMAC: Video Masked Autoencoders Meet ConvNets

Gensheng Pei, Tao Chen, Xiruo Jiang et al.

CVPR 2024 • arXiv:2402.19082 • 20 citations
#857

Adv-CPG: A Customized Portrait Generation Framework with Facial Adversarial Attacks

Junying Wang, Hongyuan Zhang, Yuan Yuan

CVPR 2025 • arXiv:2503.08269 • 20 citations
#858

Loose Inertial Poser: Motion Capture with IMU-attached Loose-Wear Jacket

Chengxu Zuo, Yiming Wang, Lishuang Zhan et al.

CVPR 2024 • 20 citations
#859

Modeling Collaborator: Enabling Subjective Vision Classification With Minimal Human Effort via LLM Tool-Use

Imad Eddine Toubal, Aditya Avinash, Neil Alldrin et al.

CVPR 2024 • arXiv:2403.02626 • 20 citations
#860

One-Class Face Anti-spoofing via Spoof Cue Map-Guided Feature Learning

Pei-Kai Huang, Cheng-Hsuan Chiang, Tzu-Hsien Chen et al.

CVPR 2024 • 20 citations
#861

Video Depth without Video Models

Bingxin Ke, Dominik Narnhofer, Shengyu Huang et al.

CVPR 2025 • arXiv:2411.19189 • 20 citations
#862

ZePT: Zero-Shot Pan-Tumor Segmentation via Query-Disentangling and Self-Prompting

Yankai Jiang, Zhongzhen Huang, Rongzhao Zhang et al.

CVPR 2024 • arXiv:2312.04964 • 20 citations
#863

SnapGen-V: Generating a Five-Second Video within Five Seconds on a Mobile Device

Yushu Wu, Zhixing Zhang, Yanyu Li et al.

CVPR 2025 • arXiv:2412.10494 • 20 citations
#864

Steerers: A Framework for Rotation Equivariant Keypoint Descriptors

Georg Bökman, Johan Edstedt, Michael Felsberg et al.

CVPR 2024 • arXiv:2312.02152 • 20 citations
#865

AlignMiF: Geometry-Aligned Multimodal Implicit Field for LiDAR-Camera Joint Synthesis

Tao Tang, Guangrun Wang, Yixing Lao et al.

CVPR 2024 • highlight • arXiv:2402.17483 • 20 citations
#866

ASAM: Boosting Segment Anything Model with Adversarial Tuning

Bo Li, Haoke Xiao, Lv Tang

CVPR 2024 • arXiv:2405.00256 • 20 citations
#867

SplineGS: Robust Motion-Adaptive Spline for Real-Time Dynamic 3D Gaussians from Monocular Video

Jongmin Park, Minh-Quan Viet Bui, Juan Luis Gonzalez Bello et al.

CVPR 2025 • arXiv:2412.09982 • 20 citations
#868

Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient

Zigeng Chen, Xinyin Ma, Gongfan Fang et al.

CVPR 2025 • arXiv:2411.17787 • 20 citations
#869

NeRF-HuGS: Improved Neural Radiance Fields in Non-static Scenes Using Heuristics-Guided Segmentation

Jiahao Chen, Yipeng Qin, Lingjie Liu et al.

CVPR 2024 • arXiv:2403.17537 • 20 citations
#870

FlowTrack: Revisiting Optical Flow for Long-Range Dense Tracking

Seokju Cho, Gabriel Huang, Seungryong Kim et al.

CVPR 2024 • 20 citations
#871

Instant Gaussian Stream: Fast and Generalizable Streaming of Dynamic Scene Reconstruction via Gaussian Splatting

Jinbo Yan, Rui Peng, Zhiyan Wang et al.

CVPR 2025 • highlight • arXiv:2503.16979 • 20 citations
#872

You'll Never Walk Alone: A Sketch and Text Duet for Fine-Grained Image Retrieval

Subhadeep Koley, Ayan Kumar Bhunia, Aneeshan Sain et al.

CVPR 2024 • arXiv:2403.07222 • 20 citations
#873

360Loc: A Dataset and Benchmark for Omnidirectional Visual Localization with Cross-device Queries

Huajian Huang, Changkun Liu, Yipeng Zhu et al.

CVPR 2024 • arXiv:2311.17389 • 20 citations
#874

DexGrasp Anything: Towards Universal Robotic Dexterous Grasping with Physics Awareness

Yiming Zhong, Qi Jiang, Jingyi Yu et al.

CVPR 2025 • highlight • arXiv:2503.08257 • 20 citations
#875

Generative Video Propagation

Shaoteng Liu, Tianyu Wang, Jui-Hsien Wang et al.

CVPR 2025 • arXiv:2412.19761 • 20 citations
#876

OccMamba: Semantic Occupancy Prediction with State Space Models

Heng Li, Yuenan Hou, Xiaohan Xing et al.

CVPR 2025 • arXiv:2408.09859 • 20 citations
#877

Enhanced Contrastive Learning with Multi-view Longitudinal Data for Chest X-ray Report Generation

Kang Liu, Zhuoqi Ma, Xiaolu Kang et al.

CVPR 2025 • arXiv:2502.20056 • 20 citations
#878

D^3: Scaling Up Deepfake Detection by Learning from Discrepancy

Yongqi Yang, Zhihao Qian, Ye Zhu et al.

CVPR 2025 • arXiv:2404.04584 • 20 citations
#879

DIBS: Enhancing Dense Video Captioning with Unlabeled Videos via Pseudo Boundary Enrichment and Online Refinement

Hao Wu, Huabin Liu, Yu Qiao et al.

CVPR 2024 • arXiv:2404.02755 • 20 citations
#880

A Call to Reflect on Evaluation Practices for Age Estimation: Comparative Analysis of the State-of-the-Art and a Unified Benchmark

Jakub Paplham, Vojtech Franc

CVPR 2024 • arXiv:2307.04570 • 20 citations
#881

Distilling Vision-Language Models on Millions of Videos

Yue Zhao, Long Zhao, Xingyi Zhou et al.

CVPR 2024 • arXiv:2401.06129 • 20 citations
#882

Long-Tailed Anomaly Detection with Learnable Class Names

Chih-Hui Ho, Kuan-Chuan Peng, Nuno Vasconcelos

CVPR 2024 • arXiv:2403.20236 • 20 citations
#883

LSNet: See Large, Focus Small

Ao Wang, Hui Chen, Zijia Lin et al.

CVPR 2025 • arXiv:2503.23135 • 20 citations
#884

ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation

Yifan Pu, Yiming Zhao, Zhicong Tang et al.

CVPR 2025 • arXiv:2502.18364 • 20 citations
#885

Structure-Guided Adversarial Training of Diffusion Models

Ling Yang, Haotian Qian, Zhilong Zhang et al.

CVPR 2024 • arXiv:2402.17563 • 20 citations
#886

2DMamba: Efficient State Space Model for Image Representation with Applications on Giga-Pixel Whole Slide Image Classification

Jingwei Zhang, Anh Tien Nguyen, Xi Han et al.

CVPR 2025 • arXiv:2412.00678 • 20 citations
#887

TeTriRF: Temporal Tri-Plane Radiance Fields for Efficient Free-Viewpoint Video

Minye Wu, Zehao Wang, Georgios Kouros et al.

CVPR 2024 • arXiv:2312.06713 • 20 citations
#888

Learning to Predict Activity Progress by Self-Supervised Video Alignment

Gerard Donahue, Ehsan Elhamifar

CVPR 2024 • 20 citations
#889

Multi-Sensor Object Anomaly Detection: Unifying Appearance, Geometry, and Internal Properties

Wenqiao Li, BoZhong Zheng, Xiaohao Xu et al.

CVPR 2025 • arXiv:2412.14592 • 20 citations
#890

PerAda: Parameter-Efficient Federated Learning Personalization with Generalization Guarantees

Chulin Xie, De-An Huang, Wenda Chu et al.

CVPR 2024 • arXiv:2302.06637 • 20 citations
#891

BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices

Xudong Lu, Yinghao Chen, Chencheng Chen et al.

CVPR 2025 • arXiv:2411.10640 • 20 citations
#892

Taming Teacher Forcing for Masked Autoregressive Video Generation

Deyu Zhou, Quan Sun, Yuang Peng et al.

CVPR 2025 • arXiv:2501.12389 • 20 citations
#893

An Upload-Efficient Scheme for Transferring Knowledge From a Server-Side Pre-trained Generator to Clients in Heterogeneous Federated Learning

Jianqing Zhang, Yang Liu, Yang Hua et al.

CVPR 2024 • arXiv:2403.15760 • 20 citations
#894

Zero-shot Referring Expression Comprehension via Structural Similarity Between Images and Captions

Zeyu Han, Fangrui Zhu, Qianru Lao et al.

CVPR 2024 • arXiv:2311.17048 • 20 citations
#895

Grid Diffusion Models for Text-to-Video Generation

Taegyeong Lee, Soyeong Kwon, Taehwan Kim

CVPR 2024 • arXiv:2404.00234 • 20 citations
#896

Multi-Level Neural Scene Graphs for Dynamic Urban Environments

Tobias Fischer, Lorenzo Porzi, Samuel Rota Bulò et al.

CVPR 2024 • arXiv:2404.00168 • 20 citations
#897

Any-Resolution AI-Generated Image Detection by Spectral Learning

Dimitrios Karageorgiou, Symeon Papadopoulos, Ioannis Kompatsiaris et al.

CVPR 2025 • arXiv:2411.19417 • 20 citations
#898

Black-Box Forgery Attacks on Semantic Watermarks for Diffusion Models

Andreas Müller, Denis Lukovnikov, Jonas Thietke et al.

CVPR 2025 • arXiv:2412.03283 • 20 citations
#899

MLP Can Be A Good Transformer Learner

Sihao Lin, Pumeng Lyu, Dongrui Liu et al.

CVPR 2024 • arXiv:2404.05657 • 20 citations
#900

NViST: In the Wild New View Synthesis from a Single Image with Transformers

Wonbong Jang, Lourdes Agapito

CVPR 2024 • arXiv:2312.08568 • 19 citations
#901

Towards Better Alignment: Training Diffusion Models with Reinforcement Learning Against Sparse Rewards

Zijing Hu, Fengda Zhang, Long Chen et al.

CVPR 2025 • arXiv:2503.11240 • 19 citations
#902

Dexterous Grasp Transformer

Guo-Hao Xu, Yi-Lin Wei, Dian Zheng et al.

CVPR 2024 • arXiv:2404.18135 • 19 citations
#903

DreamComposer: Controllable 3D Object Generation via Multi-View Conditions

Yunhan Yang, Yukun Huang, Xiaoyang Wu et al.

CVPR 2024 • arXiv:2312.03611 • 19 citations
#904

Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought

Yunze Man, De-An Huang, Guilin Liu et al.

CVPR 2025 • arXiv:2505.23766 • 19 citations
#905

Theoretically Achieving Continuous Representation of Oriented Bounding Boxes

Zikai Xiao, Guo-Ye Yang, Xue Yang et al.

CVPR 2024 • arXiv:2402.18975 • 19 citations
#906

As-Plausible-As-Possible: Plausibility-Aware Mesh Deformation Using 2D Diffusion Priors

Seungwoo Yoo, Kunho Kim, Vladimir G. Kim et al.

CVPR 2024 • arXiv:2311.16739 • 19 citations
#907

Decoupled Pseudo-labeling for Semi-Supervised Monocular 3D Object Detection

Jiacheng Zhang, Jiaming Li, Xiangru Lin et al.

CVPR 2024 • arXiv:2403.17387 • 19 citations
#908

The Devil is in Temporal Token: High Quality Video Reasoning Segmentation

Sitong Gong, Yunzhi Zhuge, Lu Zhang et al.

CVPR 2025 • arXiv:2501.08549 • 19 citations
#909

LAN: Learning to Adapt Noise for Image Denoising

Changjin Kim, Tae Hyun Kim, Sungyong Baik

CVPR 2024 • arXiv:2412.10651 • 19 citations
#910

ODIN: A Single Model for 2D and 3D Segmentation

Ayush Jain, Pushkal Katara, Nikolaos Gkanatsios et al.

CVPR 2024 • highlight • arXiv:2401.02416 • 19 citations
#911

OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation

Pengfei Zhou, Xiaopeng Peng, Jiajun Song et al.

CVPR 2025 • arXiv:2411.18499 • 19 citations
#912

Segmenting Maxillofacial Structures in CBCT Volumes

Federico Bolelli, Kevin Marchesini, Niels van Nistelrooij et al.

CVPR 2025 • 19 citations
#913

Unveiling the Ignorance of MLLMs: Seeing Clearly, Answering Incorrectly

Yexin Liu, Zhengyang Liang, Yueze Wang et al.

CVPR 2025 • arXiv:2406.10638 • 19 citations
#914

Joint Reconstruction of 3D Human and Object via Contact-Based Refinement Transformer

Hyeongjin Nam, Daniel Jung, Gyeongsik Moon et al.

CVPR 2024 • arXiv:2404.04819 • 19 citations
#915

Feat2GS: Probing Visual Foundation Models with Gaussian Splatting

Yue Chen, Xingyu Chen, Anpei Chen et al.

CVPR 2025 • arXiv:2412.09606 • 19 citations
#916

Distraction is All You Need for Multimodal Large Language Model Jailbreaking

Zuopeng Yang, Jiluan Fan, Anli Yan et al.

CVPR 2025 • highlight • arXiv:2502.10794 • 19 citations
#917

QMambaBSR: Burst Image Super-Resolution with Query State Space Model

Xin Di, Long Peng, Peizhe Xia et al.

CVPR 2025 • arXiv:2408.08665 • 19 citations
#918

Image-Text Co-Decomposition for Text-Supervised Semantic Segmentation

Ji-Jia Wu, Andy Chia-Hao Chang, Chieh-Yu Chuang et al.

CVPR 2024 • arXiv:2404.04231 • 19 citations
#919

DSL-FIQA: Assessing Facial Image Quality via Dual-Set Degradation Learning and Landmark-Guided Transformer

Wei-Ting Chen, Gurunandan Krishnan, Qiang Gao et al.

CVPR 2024 • arXiv:2406.09622 • 19 citations
#920

ParamISP: Learned Forward and Inverse ISPs using Camera Parameters

Woohyeok Kim, Geonu Kim, Junyong Lee et al.

CVPR 2024 • arXiv:2312.13313 • 19 citations
#921

Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment

Ziang Yan, Zhilin Li, Yinan He et al.

CVPR 2025 • arXiv:2412.19326 • 19 citations
#922

Split to Merge: Unifying Separated Modalities for Unsupervised Domain Adaptation

Xinyao Li, Yuke Li, Zhekai Du et al.

CVPR 2024 • arXiv:2403.06946 • 19 citations
#923

Leveraging Cross-Modal Neighbor Representation for Improved CLIP Classification

Chao Yi, Lu Ren, De-Chuan Zhan et al.

CVPR 2024 • arXiv:2404.17753 • 19 citations
#924

GlitchBench: Can Large Multimodal Models Detect Video Game Glitches?

Mohammad Reza Taesiri, Tianjun Feng, Cor-Paul Bezemer et al.

CVPR 2024 • arXiv:2312.05291 • 19 citations
#925

Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders

Fiona Ryan, Ajay Bati, Sangmin Lee et al.

CVPR 2025 • highlight • arXiv:2412.09586 • 19 citations
#926

Mixed-Precision Quantization for Federated Learning on Resource-Constrained Heterogeneous Devices

Huancheng Chen, Haris Vikalo

CVPR 2024 • arXiv:2311.18129 • 19 citations
#927

The Unreasonable Effectiveness of Pre-Trained Features for Camera Pose Refinement

Gabriele Trivigno, Carlo Masone, Barbara Caputo et al.

CVPR 2024 • highlight • arXiv:2404.10438 • 19 citations
#928

SplatFlow: Multi-View Rectified Flow Model for 3D Gaussian Splatting Synthesis

Hyojun Go, Byeongjun Park, Jiho Jang et al.

CVPR 2025 • arXiv:2411.16443 • 19 citations
#929

RecDiffusion: Rectangling for Image Stitching with Diffusion Models

Tianhao Zhou, Li Haipeng, Ziyi Wang et al.

CVPR 2024 • arXiv:2403.19164 • 19 citations
#930

Improving Transferable Targeted Adversarial Attacks with Model Self-Enhancement

Han Wu, Guanyan Ou, Weibin Wu et al.

CVPR 2024 • 19 citations
#931

Learning Occlusion-Robust Vision Transformers for Real-Time UAV Tracking

You Wu, Xucheng Wang, Xiangyang Yang et al.

CVPR 2025 • arXiv:2504.09228 • 19 citations
#932

Task-driven Image Fusion with Learnable Fusion Loss

Haowen Bai, Jiangshe Zhang, Zixiang Zhao et al.

CVPR 2025 • highlight • arXiv:2412.03240 • 19 citations
#933

Unsupervised Keypoints from Pretrained Diffusion Models

Eric Hedlin, Gopal Sharma, Shweta Mahajan et al.

CVPR 2024 • highlight • arXiv:2312.00065 • 19 citations
#934

LSK3DNet: Towards Effective and Efficient 3D Perception with Large Sparse Kernels

Tuo Feng, Wenguan Wang, Fan Ma et al.

CVPR 2024 • arXiv:2403.15173 • 19 citations
#935

LiDAR-based Person Re-identification

Wenxuan Guo, Zhiyu Pan, Yingping Liang et al.

CVPR 2024 • arXiv:2312.03033 • 19 citations
#936

Schedule On the Fly: Diffusion Time Prediction for Faster and Better Image Generation

Zilyu Ye, Zhiyang Chen, Tiancheng Li et al.

CVPR 2025 • arXiv:2412.01243 • 19 citations
#937

CA-Jaccard: Camera-aware Jaccard Distance for Person Re-identification

Yiyu Chen, Zheyi Fan, Zhaoru Chen et al.

CVPR 2024 • arXiv:2311.10605 • 19 citations
#938

LAKE-RED: Camouflaged Images Generation by Latent Background Knowledge Retrieval-Augmented Diffusion

Pancheng Zhao, Peng Xu, Pengda Qin et al.

CVPR 2024 • arXiv:2404.00292 • 19 citations
#939

Robust Image Denoising through Adversarial Frequency Mixup

Donghun Ryou, Inju Ha, Hyewon Yoo et al.

CVPR 2024 • 19 citations
#940

Improved Video VAE for Latent Video Diffusion Model

Pingyu Wu, Kai Zhu, Yu Liu et al.

CVPR 2025 • arXiv:2411.06449 • 19 citations
#941

MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D

Wei Cheng, Juncheng Mu, Xianfang Zeng et al.

CVPR 2025 • arXiv:2411.02336 • 19 citations
#942

EmoEdit: Evoking Emotions through Image Manipulation

Jingyuan Yang, Jiawei Feng, Weibin Luo et al.

CVPR 2025 • arXiv:2405.12661 • 19 citations
#943

Low-Rank Rescaled Vision Transformer Fine-Tuning: A Residual Design Approach

Wei Dong, Xing Zhang, Bihui Chen et al.

CVPR 2024 • arXiv:2403.19067 • 19 citations
#944

DriveGPT4-V2: Harnessing Large Language Model Capabilities for Enhanced Closed-Loop Autonomous Driving

Zhenhua Xu, Yan Bai, Yujia Zhang et al.

CVPR 2025 • highlight • 19 citations
#945

Scalable 3D Registration via Truncated Entry-wise Absolute Residuals

Tianyu Huang, Liangzu Peng, Rene Vidal et al.

CVPR 2024 • arXiv:2404.00915 • 19 citations
#946

DesignDiffusion: High-Quality Text-to-Design Image Generation with Diffusion Models

Zhendong Wang, Jianmin Bao, Shuyang Gu et al.

CVPR 2025 • arXiv:2503.01645 • 19 citations
#947

LaVin-DiT: Large Vision Diffusion Transformer

Zhaoqing Wang, Xiaobo Xia, Runnan Chen et al.

CVPR 2025 • arXiv:2411.11505 • 19 citations
#948

TASTE-Rob: Advancing Video Generation of Task-Oriented Hand-Object Interaction for Generalizable Robotic Manipulation

Hongxiang Zhao, Xingchen Liu, Mutian Xu et al.

CVPR 2025 • arXiv:2503.11423 • 19 citations
#949

VTQA: Visual Text Question Answering via Entity Alignment and Cross-Media Reasoning

Kang Chen, Xiangqian Wu

CVPR 2024 • arXiv:2303.02635 • 19 citations
#950

Discriminability-Driven Channel Selection for Out-of-Distribution Detection

Yue Yuan, Rundong He, Yicong Dong et al.

CVPR 2024 • 19 citations
#951

CXPMRG-Bench: Pre-training and Benchmarking for X-ray Medical Report Generation on CheXpert Plus Dataset

Xiao Wang, Fuling Wang, Yuehang Li et al.

CVPR 2025 • arXiv:2410.00379 • 19 citations
#952

HOISDF: Constraining 3D Hand-Object Pose Estimation with Global Signed Distance Fields

Haozhe Qi, Chen Zhao, Mathieu Salzmann et al.

CVPR 2024 • arXiv:2402.17062 • 19 citations
#953

ProS: Prompting-to-simulate Generalized knowledge for Universal Cross-Domain Retrieval

Fang Kaipeng, Jingkuan Song, Lianli Gao et al.

CVPR 2024 • arXiv:2312.12478 • 19 citations
#954

Textured Gaussians for Enhanced 3D Scene Appearance Modeling

Brian Chao, Hung-Yu Tseng, Lorenzo Porzi et al.

CVPR 2025 • arXiv:2411.18625 • 19 citations
#955

Exploring Efficient Asymmetric Blind-Spots for Self-Supervised Denoising in Real-World Scenarios

Shiyan Chen, Jiyuan Zhang, Zhaofei Yu et al.

CVPR 2024 • arXiv:2303.16783 • 19 citations
#956

UniK3D: Universal Camera Monocular 3D Estimation

Luigi Piccinelli, Christos Sakaridis, Mattia Segu et al.

CVPR 2025 • arXiv:2503.16591 • 19 citations
#957

Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens

Kaihang Pan, Wang Lin, Zhongqi Yue et al.

CVPR 2025 • arXiv:2504.14666 • 18 citations
#958

DriveTrack: A Benchmark for Long-Range Point Tracking in Real-World Videos

Arjun Balasingam, Joseph Chandler, Chenning Li et al.

CVPR 2024 • arXiv:2312.09523 • 18 citations
#959

Open-Set Domain Adaptation for Semantic Segmentation

Seun-An Choe, Ah-Hyung Shin, Keon Hee Park et al.

CVPR 2024 • arXiv:2405.19899 • 18 citations
#960

TexGaussian: Generating High-quality PBR Material via Octree-based 3D Gaussian Splatting

Bojun Xiong, Jialun Liu, JiaKui Hu et al.

CVPR 2025 • arXiv:2411.19654 • 18 citations
#961

Unveiling Visual Perception in Language Models: An Attention Head Analysis Approach

Jing Bi, Lianggong Bruce Wen, Zhang Liu et al.

CVPR 2025 • arXiv:2412.18108 • 18 citations
#962

MESA: Matching Everything by Segmenting Anything

Yesheng Zhang, Xu Zhao

CVPR 2024 • arXiv:2401.16741 • 18 citations
#963

PhysPT: Physics-aware Pretrained Transformer for Estimating Human Dynamics from Monocular Videos

Yufei Zhang, Jeffrey Kephart, Zijun Cui et al.

CVPR 2024 • arXiv:2404.04430 • 18 citations
#964

HiPose: Hierarchical Binary Surface Encoding and Correspondence Pruning for RGB-D 6DoF Object Pose Estimation

Yongliang Lin, Yongzhi Su, Praveen Nathan et al.

CVPR 2024 • arXiv:2311.12588 • 18 citations
#965

FreePoint: Unsupervised Point Cloud Instance Segmentation

Zhikai Zhang, Jian Ding, Li Jiang et al.

CVPR 2024 • arXiv:2305.06973 • 18 citations
#966

Decompositional Neural Scene Reconstruction with Generative Diffusion Prior

Junfeng Ni, Yu Liu, Ruijie Lu et al.

CVPR 2025 • arXiv:2503.14830 • 18 citations
#967

Fair-VPT: Fair Visual Prompt Tuning for Image Classification

Sungho Park, Hyeran Byun

CVPR 2024 • 18 citations
#968

UniPose: A Unified Multimodal Framework for Human Pose Comprehension, Generation and Editing

Yiheng Li, RuiBing Hou, Hong Chang et al.

CVPR 2025 • highlight • arXiv:2411.16781 • 18 citations
#969

Adaptive Rectangular Convolution for Remote Sensing Pansharpening

Xueyang Wang, Zhixin Zheng, Jiandong Shao et al.

CVPR 2025 • arXiv:2503.00467 • 18 citations
#970

Video Motion Transfer with Diffusion Transformers

Alexander Pondaven, Aliaksandr Siarohin, Sergey Tulyakov et al.

CVPR 2025 • arXiv:2412.07776 • 18 citations
#971

Playing the Fool: Jailbreaking LLMs and Multimodal LLMs with Out-of-Distribution Strategy

Joonhyun Jeong, Seyun Bae, Yeonsung Jung et al.

CVPR 2025 • arXiv:2503.20823 • 18 citations
#972

SuperSVG: Superpixel-based Scalable Vector Graphics Synthesis

Teng Hu, Ran Yi, Baihong Qian et al.

CVPR 2024 • arXiv:2406.09794 • 18 citations
#973

Ref-GS: Directional Factorization for 2D Gaussian Splatting

Youjia Zhang, Anpei Chen, Yumin Wan et al.

CVPR 2025 • arXiv:2412.00905 • 18 citations
#974

4Real-Video: Learning Generalizable Photo-Realistic 4D Video Diffusion

Chaoyang Wang, Peiye Zhuang, Tuan Duc Ngo et al.

CVPR 2025 • highlight • arXiv:2412.04462 • 18 citations
#975

Composing Object Relations and Attributes for Image-Text Matching

Khoi Pham, Chuong Huynh, Ser-Nam Lim et al.

CVPR 2024 • arXiv:2406.11820 • 18 citations
#976

Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis

Jiangyong Huang, Baoxiong Jia, Yan Wang et al.

CVPR 2025 • arXiv:2503.22420 • 18 citations
#977

Vid2Avatar-Pro: Authentic Avatar from Videos in the Wild via Universal Prior

Chen Guo, Junxuan Li, Yash Kant et al.

CVPR 2025 • arXiv:2503.01610 • 18 citations
#978

FedMef: Towards Memory-efficient Federated Dynamic Pruning

Hong Huang, Weiming Zhuang, Chen Chen et al.

CVPR 2024 • arXiv:2403.14737 • 18 citations
#979

RMem: Restricted Memory Banks Improve Video Object Segmentation

Junbao Zhou, Ziqi Pang, Yu-Xiong Wang

CVPR 2024 • arXiv:2406.08476 • 18 citations
#980

Rethinking the Evaluation Protocol of Domain Generalization

Han Yu, Xingxuan Zhang, Renzhe Xu et al.

CVPR 2024 • arXiv:2305.15253 • 18 citations
#981

GenFusion: Closing the Loop between Reconstruction and Generation via Videos

Sibo Wu, Congrong Xu, Binbin Huang et al.

CVPR 2025 • arXiv:2503.21219 • 18 citations
#982

TAMM: TriAdapter Multi-Modal Learning for 3D Shape Understanding

Zhihao Zhang, Shengcao Cao, Yu-Xiong Wang

CVPR 2024 • arXiv:2402.18490 • 18 citations
#983

Rotation-Agnostic Image Representation Learning for Digital Pathology

Saghir Alfasly, Abubakr Shafique, Peyman Nejat et al.

CVPR 2024 • arXiv:2311.08359 • 18 citations
#984

Blur-aware Spatio-temporal Sparse Transformer for Video Deblurring

Huicong Zhang, Haozhe Xie, Hongxun Yao

CVPR 2024 • arXiv:2406.07551 • 18 citations
#985

SketchAgent: Language-Driven Sequential Sketch Generation

Yael Vinker, Tamar Rott Shaham, Kristine Zheng et al.

CVPR 2025 • arXiv:2411.17673 • 18 citations
#986

Unmixing Diffusion for Self-Supervised Hyperspectral Image Denoising

Haijin Zeng, Jiezhang Cao, Yongyong Chen et al.

CVPR 2024 • 18 citations
#987

HumanDreamer: Generating Controllable Human-Motion Videos via Decoupled Generation

Boyuan Wang, Xiaofeng Wang, Chaojun Ni et al.

CVPR 2025 • arXiv:2503.24026 • 18 citations
#988

TreeMeshGPT: Artistic Mesh Generation with Autoregressive Tree Sequencing

Stefan Lionar, Jiabin Liang, Gim Hee Lee

CVPR 2025 • arXiv:2503.11629 • 18 citations
#989

FAR: Flexible Accurate and Robust 6DoF Relative Camera Pose Estimation

Chris Rockwell, Nilesh Kulkarni, Linyi Jin et al.

CVPR 2024 • highlight • arXiv:2403.03221 • 18 citations
#990

FaceChain-ImagineID: Freely Crafting High-Fidelity Diverse Talking Faces from Disentangled Audio

Chao Xu, Yang Liu, Jiazheng Xing et al.

CVPR 2024 • arXiv:2403.01901 • 18 citations
#991

Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning

Fan Lu, Wei Wu, Kecheng Zheng et al.

CVPR 2025 • arXiv:2412.08614 • 18 citations
#992

AssistGUI: Task-Oriented PC Graphical User Interface Automation

Difei Gao, Lei Ji, Zechen Bai et al.

CVPR 2024 • 18 citations
#993

Correspondence-Free Non-Rigid Point Set Registration Using Unsupervised Clustering Analysis

Mingyang Zhao, Jiang Jingen, Lei Ma et al.

CVPR 2024 • highlight • arXiv:2406.18817 • 18 citations
#994

Shadow Generation for Composite Image Using Diffusion Model

Qingyang Liu, Junqi You, Jian-Ting Wang et al.

CVPR 2024 • arXiv:2403.15234 • 18 citations
#995

OmniGuard: Hybrid Manipulation Localization via Augmented Versatile Deep Image Watermarking

Xuanyu Zhang, Zecheng Tang, Zhipei Xu et al.

CVPR 2025 • arXiv:2412.01615 • 18 citations
#996

Object Detection using Event Camera: A MoE Heat Conduction based Detector and A New Benchmark Dataset

Xiao Wang, Yu Jin, Wentao Wu et al.

CVPR 2025 • arXiv:2412.06647 • 18 citations
#997

T4P: Test-Time Training of Trajectory Prediction via Masked Autoencoder and Actor-specific Token Memory

Daehee Park, Jaeseok Jeong, Sung-Hoon Yoon et al.

CVPR 2024 • arXiv:2403.10052 • 18 citations
#998

A Simple and Effective Point-based Network for Event Camera 6-DOFs Pose Relocalization

Hongwei Ren, Jiadong Zhu, Yue Zhou et al.

CVPR 2024 • arXiv:2403.19412 • 18 citations
#999

Hash3D: Training-free Acceleration for 3D Generation

Xingyi Yang, Songhua Liu, Xinchao Wang

CVPR 2025 • arXiv:2404.06091 • 18 citations
#1000

Text-Enhanced Data-free Approach for Federated Class-Incremental Learning

Minh-Tuan Tran, Trung Le, Xuan-May Le et al.

CVPR 2024 • arXiv:2403.14101 • 18 citations