Most Cited CVPR "disentangled representation" Papers

5,589 papers found • Page 5 of 28

#801

Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation

Yuhui Zhang, Yuchang Su, Yiming Liu et al.

CVPR 2025 • poster • arXiv:2501.03225
23 citations
#802

Efficient Visual State Space Model for Image Deblurring

Lingshun Kong, Jiangxin Dong, Jinhui Tang et al.

CVPR 2025 • poster • arXiv:2405.14343
23 citations
#803

FSRT: Facial Scene Representation Transformer for Face Reenactment from Factorized Appearance Head-pose and Facial Expression Features

Andre Rochow, Max Schwarz, Sven Behnke

CVPR 2024 • poster • arXiv:2404.09736
23 citations
#804

6D-Diff: A Keypoint Diffusion Framework for 6D Object Pose Estimation

Li Xu, Haoxuan Qu, Yujun Cai et al.

CVPR 2024 • poster • arXiv:2401.00029
23 citations
#805

PanoContext-Former: Panoramic Total Scene Understanding with a Transformer

Yuan Dong, Chuan Fang, Liefeng Bo et al.

CVPR 2024 • poster • arXiv:2305.12497
23 citations
#806

UniVAD: A Training-free Unified Model for Few-shot Visual Anomaly Detection

Zhaopeng Gu, Bingke Zhu, Guibo Zhu et al.

CVPR 2025 • poster • arXiv:2412.03342
23 citations
#807

Would Deep Generative Models Amplify Bias in Future Models?

Tianwei Chen, Yusuke Hirota, Mayu Otani et al.

CVPR 2024 • poster • arXiv:2404.03242
23 citations
#808

Learning Equi-angular Representations for Online Continual Learning

Minhyuk Seo, Hyunseo Koh, Wonje Jeung et al.

CVPR 2024 • poster • arXiv:2404.01628
23 citations
#809

SANeRF-HQ: Segment Anything for NeRF in High Quality

Yichen Liu, Benran Hu, Chi-Keung Tang et al.

CVPR 2024 • poster • arXiv:2312.01531
23 citations
#810

Flatten Long-Range Loss Landscapes for Cross-Domain Few-Shot Learning

Yixiong Zou, Yicong Liu, Yiman Hu et al.

CVPR 2024 • poster • arXiv:2403.00567
23 citations
#811

Open-Vocabulary Functional 3D Scene Graphs for Real-World Indoor Spaces

Chenyangguang Zhang, Alexandros Delitzas, Fangjinhua Wang et al.

CVPR 2025 • highlight • arXiv:2503.19199
23 citations
#812

Language-Guided Image Tokenization for Generation

Kaiwen Zha, Lijun Yu, Alireza Fathi et al.

CVPR 2025 • poster • arXiv:2412.05796
23 citations
#813

Learning Disentangled Identifiers for Action-Customized Text-to-Image Generation

Siteng Huang, Biao Gong, Yutong Feng et al.

CVPR 2024 • poster • arXiv:2311.15841
23 citations
#814

GEARS: Local Geometry-aware Hand-object Interaction Synthesis

Keyang Zhou, Bharat Lal Bhatnagar, Jan Lenssen et al.

CVPR 2024 • poster • arXiv:2404.01758
22 citations
#815

Bridging Past and Future: End-to-End Autonomous Driving with Historical Prediction and Planning

Bozhou Zhang, Nan Song, Xin Jin et al.

CVPR 2025 • poster • arXiv:2503.14182
22 citations
#816

LiveHPS: LiDAR-based Scene-level Human Pose and Shape Estimation in Free Environment

Yiming Ren, Xiao Han, Chengfeng Zhao et al.

CVPR 2024 • highlight • arXiv:2402.17171
22 citations
#817

Domain Prompt Learning with Quaternion Networks

Qinglong Cao, Zhengqin Xu, Yuntian Chen et al.

CVPR 2024 • highlight • arXiv:2312.08878
22 citations
#818

Unifying Top-down and Bottom-up Scanpath Prediction Using Transformers

Zhibo Yang, Sounak Mondal, Seoyoung Ahn et al.

CVPR 2024 • poster • arXiv:2303.09383
22 citations
#819

Seeing Far and Clearly: Mitigating Hallucinations in MLLMs with Attention Causal Decoding

Feilong Tang, Chengzhi Liu, Zhongxing Xu et al.

CVPR 2025 • poster • arXiv:2505.16652
22 citations
#820

CPath-Omni: A Unified Multimodal Foundation Model for Patch and Whole Slide Image Analysis in Computational Pathology

Yuxuan Sun, Yixuan Si, Chenglu Zhu et al.

CVPR 2025 • poster • arXiv:2412.12077
22 citations
#821

SegMAN: Omni-scale Context Modeling with State Space Models and Local Attention for Semantic Segmentation

Yunxiang Fu, Meng Lou, Yizhou Yu

CVPR 2025 • poster • arXiv:2412.11890
22 citations
#822

F-LMM: Grounding Frozen Large Multimodal Models

Size Wu, Sheng Jin, Wenwei Zhang et al.

CVPR 2025 • poster • arXiv:2406.05821
22 citations
#823

Spatio-Temporal Turbulence Mitigation: A Translational Perspective

Xingguang Zhang, Nicholas M Chimitt, Yiheng Chi et al.

CVPR 2024 • poster • arXiv:2401.04244
22 citations
#824

StegoGAN: Leveraging Steganography for Non-Bijective Image-to-Image Translation

Sidi Wu, Yizi Chen, Loic Landrieu et al.

CVPR 2024 • poster • arXiv:2403.20142
22 citations
#825

Targeted Representation Alignment for Open-World Semi-Supervised Learning

Ruixuan Xiao, Lei Feng, Kai Tang et al.

CVPR 2024 • poster
22 citations
#826

Large Language Models are Good Prompt Learners for Low-Shot Image Classification

Zhaoheng Zheng, Jingmin Wei, Xuefeng Hu et al.

CVPR 2024 • poster • arXiv:2312.04076
22 citations
#827

Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation

Yudi Shi, Shangzhe Di, Qirui Chen et al.

CVPR 2025 • poster • arXiv:2412.01694
22 citations
#828

Material Anything: Generating Materials for Any 3D Object via Diffusion

Xin Huang, Tengfei Wang, Ziwei Liu et al.

CVPR 2025 • highlight • arXiv:2411.15138
22 citations
#829

GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation

Lang Lin, Xueyang Yu, Ziqi Pang et al.

CVPR 2025 • poster • arXiv:2504.07962
22 citations
#830

Extend Your Own Correspondences: Unsupervised Distant Point Cloud Registration by Progressive Distance Extension

Quan Liu, Hongzi Zhu, Zhenxi Wang et al.

CVPR 2024 • poster • arXiv:2403.03532
22 citations
#831

COCONut: Modernizing COCO Segmentation

Xueqing Deng, Qihang Yu, Peng Wang et al.

CVPR 2024 • poster • arXiv:2404.08639
22 citations
#832

Category-Level Multi-Part Multi-Joint 3D Shape Assembly

Yichen Li, Kaichun Mo, Yueqi Duan et al.

CVPR 2024 • poster • arXiv:2303.06163
22 citations
#833

Mind Marginal Non-Crack Regions: Clustering-Inspired Representation Learning for Crack Segmentation

Zhuangzhuang Chen, Zhuonan Lai, Jie Chen et al.

CVPR 2024 • poster
22 citations
#834

GAFusion: Adaptive Fusing LiDAR and Camera with Multiple Guidance for 3D Object Detection

Xiaotian Li, Baojie Fan, Jiandong Tian et al.

CVPR 2024 • poster • arXiv:2411.00340
22 citations
#835

Summarize the Past to Predict the Future: Natural Language Descriptions of Context Boost Multimodal Object Interaction Anticipation

Razvan Pasca, Alexey Gavryushin, Muhammad Hamza et al.

CVPR 2024 • poster • arXiv:2301.09209
22 citations
#836

BooW-VTON: Boosting In-the-Wild Virtual Try-On via Mask-Free Pseudo Data Training

Xuanpu Zhang, Dan Song, Pengxin Zhan et al.

CVPR 2025 • poster • arXiv:2408.06047
22 citations
#837

Semantic-aware SAM for Point-Prompted Instance Segmentation

Zhaoyang Wei, Pengfei Chen, Xuehui Yu et al.

CVPR 2024 • highlight • arXiv:2312.15895
22 citations
#838

Rethinking Multi-view Representation Learning via Distilled Disentangling

Guanzhou Ke, Bo Wang, Xiao-Li Wang et al.

CVPR 2024 • poster • arXiv:2403.10897
22 citations
#839

Mind the Time: Temporally-Controlled Multi-Event Video Generation

Ziyi Wu, Aliaksandr Siarohin, Willi Menapace et al.

CVPR 2025 • poster • arXiv:2412.05263
22 citations
#840

Self-Supervised Multi-Object Tracking with Path Consistency

Zijia Lu, Bing Shuai, Yanbei Chen et al.

CVPR 2024 • highlight • arXiv:2404.05136
21 citations
#841

PracticalDG: Perturbation Distillation on Vision-Language Models for Hybrid Domain Generalization

Zining Chen, Weiqiu Wang, Zhicheng Zhao et al.

CVPR 2024 • poster • arXiv:2404.09011
21 citations
#842

ZePT: Zero-Shot Pan-Tumor Segmentation via Query-Disentangling and Self-Prompting

Yankai Jiang, Zhongzhen Huang, Rongzhao Zhang et al.

CVPR 2024 • poster • arXiv:2312.04964
21 citations
#843

Composed Video Retrieval via Enriched Context and Discriminative Embeddings

Omkar Thawakar, Muzammal Naseer, Rao Anwer et al.

CVPR 2024 • poster • arXiv:2403.16997
21 citations
#844

NRDF: Neural Riemannian Distance Fields for Learning Articulated Pose Priors

Yannan He, Garvita Tiwari, Tolga Birdal et al.

CVPR 2024 • highlight • arXiv:2403.03122
21 citations
#845

Optimus-2: Multimodal Minecraft Agent with Goal-Observation-Action Conditioned Policy

Zaijing Li, Yuquan Xie, Rui Shao et al.

CVPR 2025 • poster • arXiv:2502.19902
21 citations
#846

Visual Prompting for Generalized Few-shot Segmentation: A Multi-scale Approach

Mir Rayat Imtiaz Hossain, Mennatullah Siam, Leonid Sigal et al.

CVPR 2024 • poster • arXiv:2404.11732
21 citations
#847

GAF: Gaussian Avatar Reconstruction from Monocular Videos via Multi-view Diffusion

Jiapeng Tang, Davide Davoli, Tobias Kirschstein et al.

CVPR 2025 • poster • arXiv:2412.10209
21 citations
#848

Unlocking the Potential of Prompt-Tuning in Bridging Generalized and Personalized Federated Learning

Wenlong Deng, Christos Thrampoulidis, Xiaoxiao Li

CVPR 2024 • poster • arXiv:2310.18285
21 citations
#849

PromptHMR: Promptable Human Mesh Recovery

Yufu Wang, Yu Sun, Priyanka Patel et al.

CVPR 2025 • poster • arXiv:2504.06397
21 citations
#850

Visual Delta Generator with Large Multi-modal Models for Semi-supervised Composed Image Retrieval

Young Kyun Jang, Donghyun Kim, Zihang Meng et al.

CVPR 2024 • poster • arXiv:2404.15516
21 citations
#851

UVEB: A Large-scale Benchmark and Baseline Towards Real-World Underwater Video Enhancement

Yaofeng Xie, Lingwei Kong, Kai Chen et al.

CVPR 2024 • poster • arXiv:2404.14542
21 citations
#852

Neural Spline Fields for Burst Image Fusion and Layer Separation

Ilya Chugunov, David Shustin, Ruyu Yan et al.

CVPR 2024 • poster • arXiv:2312.14235
21 citations
#853

PrPSeg: Universal Proposition Learning for Panoramic Renal Pathology Segmentation

Ruining Deng, Quan Liu, Can Cui et al.

CVPR 2024 • poster • arXiv:2402.19286
21 citations
#854

TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models

Haomiao Ni, Bernhard Egger, Suhas Lohit et al.

CVPR 2024 • poster • arXiv:2404.16306
21 citations
#855

VideoRF: Rendering Dynamic Radiance Fields as 2D Feature Video Streams

Liao Wang, Kaixin Yao, Chengcheng Guo et al.

CVPR 2024 • poster • arXiv:2312.01407
21 citations
#856

SAR3D: Autoregressive 3D Object Generation and Understanding via Multi-scale 3D VQVAE

Yongwei Chen, Yushi Lan, Shangchen Zhou et al.

CVPR 2025 • poster • arXiv:2411.16856
21 citations
#857

DreamPropeller: Supercharge Text-to-3D Generation with Parallel Sampling

Linqi Zhou, Andy Shih, Chenlin Meng et al.

CVPR 2024 • highlight • arXiv:2311.17082
21 citations
#858

PosterMaker: Towards High-Quality Product Poster Generation with Accurate Text Rendering

Yifan Gao, Zihang Lin, Chuanbin Liu et al.

CVPR 2025 • poster • arXiv:2504.06632
21 citations
#859

Real-time 3D-aware Portrait Video Relighting

Ziqi Cai, Kaiwen Jiang, Shu-Yu Chen et al.

CVPR 2024 • highlight • arXiv:2410.18355
21 citations
#860

Clustering Propagation for Universal Medical Image Segmentation

Yuhang Ding, Liulei Li, Wenguan Wang et al.

CVPR 2024 • poster • arXiv:2403.16646
21 citations
#861

Guided Slot Attention for Unsupervised Video Object Segmentation

Minhyeok Lee, Suhwan Cho, Dogyoon Lee et al.

CVPR 2024 • poster • arXiv:2303.08314
21 citations
#862

Matrix3D: Large Photogrammetry Model All-in-One

Yuanxun Lu, Jingyang Zhang, Tian Fang et al.

CVPR 2025 • highlight • arXiv:2502.07685
21 citations
#863

Prometheus: 3D-Aware Latent Diffusion Models for Feed-Forward Text-to-3D Scene Generation

Yuanbo Yang, Jiahao Shao, Xinyang Li et al.

CVPR 2025 • poster • arXiv:2412.21117
21 citations
#864

MC^2: Multi-concept Guidance for Customized Multi-concept Generation

Jiaxiu Jiang, Yabo Zhang, Kailai Feng et al.

CVPR 2025 • poster • arXiv:2404.05268
21 citations
#865

Language-Driven Anchors for Zero-Shot Adversarial Robustness

Xiao Li, Wei Zhang, Yining Liu et al.

CVPR 2024 • poster • arXiv:2301.13096
21 citations
#866

3D-SceneDreamer: Text-Driven 3D-Consistent Scene Generation

Songchun Zhang, Yibo Zhang, Quan Zheng et al.

CVPR 2024 • poster • arXiv:2403.09439
21 citations
#867

Carve3D: Improving Multi-view Reconstruction Consistency for Diffusion Models with RL Finetuning

Desai Xie, Jiahao Li, Hao Tan et al.

CVPR 2024 • poster • arXiv:2312.13980
21 citations
#868

MonoHair: High-Fidelity Hair Modeling from a Monocular Video

Keyu Wu, Lingchen Yang, Zhiyi Kuang et al.

CVPR 2024 • poster • arXiv:2403.18356
21 citations
#869

Rethinking Few-shot 3D Point Cloud Semantic Segmentation

Zhaochong An, Guolei Sun, Yun Liu et al.

CVPR 2024 • poster • arXiv:2403.00592
21 citations
#870

HMD-Poser: On-Device Real-time Human Motion Tracking from Scalable Sparse Observations

Peng Dai, Yang Zhang, Tao Liu et al.

CVPR 2024 • poster • arXiv:2403.03561
21 citations
#871

TopV: Compatible Token Pruning with Inference Time Optimization for Fast and Low-Memory Multimodal Vision Language Model

Cheng Yang, Yang Sui, Jinqi Xiao et al.

CVPR 2025 • poster • arXiv:2503.18278
20 citations
#872

Dual Prior Unfolding for Snapshot Compressive Imaging

Jiancheng Zhang, Haijin Zeng, Jiezhang Cao et al.

CVPR 2024 • poster
20 citations
#873

Modeling Collaborator: Enabling Subjective Vision Classification With Minimal Human Effort via LLM Tool-Use

Imad Eddine Toubal, Aditya Avinash, Neil Alldrin et al.

CVPR 2024 • poster • arXiv:2403.02626
20 citations
#874

HyperFree: A Channel-adaptive and Tuning-free Foundation Model for Hyperspectral Remote Sensing Imagery

Jingtao Li, Yingyi Liu, Xinyu Wang et al.

CVPR 2025 • poster • arXiv:2503.21841
20 citations
#875

Distilling Vision-Language Models on Millions of Videos

Yue Zhao, Long Zhao, Xingyi Zhou et al.

CVPR 2024 • poster • arXiv:2401.06129
20 citations
#876

Generative Image Layer Decomposition with Visual Effects

Jinrui Yang, Qing Liu, Yijun Li et al.

CVPR 2025 • poster • arXiv:2411.17864
20 citations
#877

Adv-CPG: A Customized Portrait Generation Framework with Facial Adversarial Attacks

Junying Wang, Hongyuan Zhang, Yuan Yuan

CVPR 2025 • poster • arXiv:2503.08269
20 citations
#878

Steerers: A Framework for Rotation Equivariant Keypoint Descriptors

Georg Bökman, Johan Edstedt, Michael Felsberg et al.

CVPR 2024 • poster • arXiv:2312.02152
20 citations
#879

One-Class Face Anti-spoofing via Spoof Cue Map-Guided Feature Learning

Pei-Kai Huang, Cheng-Hsuan Chiang, Tzu-Hsien Chen et al.

CVPR 2024 • poster
20 citations
#880

NeRF-HuGS: Improved Neural Radiance Fields in Non-static Scenes Using Heuristics-Guided Segmentation

Jiahao Chen, Yipeng Qin, Lingjie Liu et al.

CVPR 2024 • poster • arXiv:2403.17537
20 citations
#881

Video Depth without Video Models

Bingxin Ke, Dominik Narnhofer, Shengyu Huang et al.

CVPR 2025 • poster • arXiv:2411.19189
20 citations
#882

DIBS: Enhancing Dense Video Captioning with Unlabeled Videos via Pseudo Boundary Enrichment and Online Refinement

Hao Wu, Huabin Liu, Yu Qiao et al.

CVPR 2024 • poster • arXiv:2404.02755
20 citations
#883

360Loc: A Dataset and Benchmark for Omnidirectional Visual Localization with Cross-device Queries

Huajian Huang, Changkun Liu, Yipeng Zhu et al.

CVPR 2024 • poster • arXiv:2311.17389
20 citations
#884

FlowTrack: Revisiting Optical Flow for Long-Range Dense Tracking

Seokju Cho, Gabriel Huang, Seungryong Kim et al.

CVPR 2024 • poster
20 citations
#885

ASAM: Boosting Segment Anything Model with Adversarial Tuning

Bo Li, Haoke Xiao, Lv Tang

CVPR 2024 • poster • arXiv:2405.00256
20 citations
#886

Zero-shot Referring Expression Comprehension via Structural Similarity Between Images and Captions

Zeyu Han, Fangrui Zhu, Qianru Lao et al.

CVPR 2024 • poster • arXiv:2311.17048
20 citations
#887

SnapGen-V: Generating a Five-Second Video within Five Seconds on a Mobile Device

Yushu Wu, Zhixing Zhang, Yanyu Li et al.

CVPR 2025 • poster • arXiv:2412.10494
20 citations
#888

AlignMiF: Geometry-Aligned Multimodal Implicit Field for LiDAR-Camera Joint Synthesis

Tao Tang, Guangrun Wang, Yixing Lao et al.

CVPR 2024 • highlight • arXiv:2402.17483
20 citations
#889

You'll Never Walk Alone: A Sketch and Text Duet for Fine-Grained Image Retrieval

Subhadeep Koley, Ayan Kumar Bhunia, Aneeshan Sain et al.

CVPR 2024 • poster • arXiv:2403.07222
20 citations
#890

SplineGS: Robust Motion-Adaptive Spline for Real-Time Dynamic 3D Gaussians from Monocular Video

Jongmin Park, Minh-Quan Viet Bui, Juan Luis Gonzalez Bello et al.

CVPR 2025 • poster • arXiv:2412.09982
20 citations
#891

Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient

Zigeng Chen, Xinyin Ma, Gongfan Fang et al.

CVPR 2025 • poster • arXiv:2411.17787
20 citations
#892

Instant Gaussian Stream: Fast and Generalizable Streaming of Dynamic Scene Reconstruction via Gaussian Splatting

Jinbo Yan, Rui Peng, Zhiyan Wang et al.

CVPR 2025 • highlight • arXiv:2503.16979
20 citations
#893

A Call to Reflect on Evaluation Practices for Age Estimation: Comparative Analysis of the State-of-the-Art and a Unified Benchmark

Jakub Paplham, Vojtech Franc

CVPR 2024 • poster • arXiv:2307.04570
20 citations
#894

DexGrasp Anything: Towards Universal Robotic Dexterous Grasping with Physics Awareness

Yiming Zhong, Qi Jiang, Jingyi Yu et al.

CVPR 2025 • highlight • arXiv:2503.08257
20 citations
#895

Generative Video Propagation

Shaoteng Liu, Tianyu Wang, Jui-Hsien Wang et al.

CVPR 2025 • poster • arXiv:2412.19761
20 citations
#896

OccMamba: Semantic Occupancy Prediction with State Space Models

Heng Li, Yuenan Hou, Xiaohan Xing et al.

CVPR 2025 • poster • arXiv:2408.09859
20 citations
#897

Enhanced Contrastive Learning with Multi-view Longitudinal Data for Chest X-ray Report Generation

Kang Liu, Zhuoqi Ma, Xiaolu Kang et al.

CVPR 2025 • poster • arXiv:2502.20056
20 citations
#898

Long-Tailed Anomaly Detection with Learnable Class Names

Chih-Hui Ho, Kuan-Chuan Peng, Nuno Vasconcelos

CVPR 2024 • poster • arXiv:2403.20236
20 citations
#899

Structure-Guided Adversarial Training of Diffusion Models

Ling Yang, Haotian Qian, Zhilong Zhang et al.

CVPR 2024 • poster • arXiv:2402.17563
20 citations
#900

D^3: Scaling Up Deepfake Detection by Learning from Discrepancy

Yongqi Yang, Zhihao Qian, Ye Zhu et al.

CVPR 2025 • poster • arXiv:2404.04584
20 citations
#901

Learning to Predict Activity Progress by Self-Supervised Video Alignment

Gerard Donahue, Ehsan Elhamifar

CVPR 2024 • poster
20 citations
#902

MLP Can Be A Good Transformer Learner

Sihao Lin, Pumeng Lyu, Dongrui Liu et al.

CVPR 2024 • poster • arXiv:2404.05657
20 citations
#903

Grid Diffusion Models for Text-to-Video Generation

Taegyeong Lee, Soyeong Kwon, Taehwan Kim

CVPR 2024 • poster • arXiv:2404.00234
20 citations
#904

TE-TAD: Towards Full End-to-End Temporal Action Detection via Time-Aligned Coordinate Expression

Ho-Joong Kim, Jung-Ho Hong, Heejo Kong et al.

CVPR 2024 • poster • arXiv:2404.02405
20 citations
#905

Loose Inertial Poser: Motion Capture with IMU-attached Loose-Wear Jacket

Chengxu Zuo, Yiming Wang, Lishuang Zhan et al.

CVPR 2024 • poster
20 citations
#906

LSNet: See Large, Focus Small

Ao Wang, Hui Chen, Zijia Lin et al.

CVPR 2025 • poster • arXiv:2503.23135
20 citations
#907

Improving Plasticity in Online Continual Learning via Collaborative Learning

Maorong Wang, Nicolas Michel, Ling Xiao et al.

CVPR 2024 • poster • arXiv:2312.00600
20 citations
#908

An Upload-Efficient Scheme for Transferring Knowledge From a Server-Side Pre-trained Generator to Clients in Heterogeneous Federated Learning

Jianqing Zhang, Yang Liu, Yang Hua et al.

CVPR 2024 • poster • arXiv:2403.15760
20 citations
#909

ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation

Yifan Pu, Yiming Zhao, Zhicong Tang et al.

CVPR 2025 • poster • arXiv:2502.18364
20 citations
#910

TeTriRF: Temporal Tri-Plane Radiance Fields for Efficient Free-Viewpoint Video

Minye Wu, Zehao Wang, Georgios Kouros et al.

CVPR 2024 • poster • arXiv:2312.06713
20 citations
#911

2DMamba: Efficient State Space Model for Image Representation with Applications on Giga-Pixel Whole Slide Image Classification

Jingwei Zhang, Anh Tien Nguyen, Xi Han et al.

CVPR 2025 • poster • arXiv:2412.00678
20 citations
#912

Multi-Sensor Object Anomaly Detection: Unifying Appearance, Geometry, and Internal Properties

Wenqiao Li, BoZhong Zheng, Xiaohao Xu et al.

CVPR 2025 • poster • arXiv:2412.14592
20 citations
#913

PerAda: Parameter-Efficient Federated Learning Personalization with Generalization Guarantees

Chulin Xie, De-An Huang, Wenda Chu et al.

CVPR 2024 • poster • arXiv:2302.06637
20 citations
#914

Multi-Level Neural Scene Graphs for Dynamic Urban Environments

Tobias Fischer, Lorenzo Porzi, Samuel Rota Bulò et al.

CVPR 2024 • poster • arXiv:2404.00168
20 citations
#915

BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices

Xudong Lu, Yinghao Chen, Chencheng Chen et al.

CVPR 2025 • poster • arXiv:2411.10640
20 citations
#916

Taming Teacher Forcing for Masked Autoregressive Video Generation

Deyu Zhou, Quan Sun, Yuang Peng et al.

CVPR 2025 • poster • arXiv:2501.12389
20 citations
#917

VideoMAC: Video Masked Autoencoders Meet ConvNets

Gensheng Pei, Tao Chen, Xiruo Jiang et al.

CVPR 2024 • poster • arXiv:2402.19082
20 citations
#918

Any-Resolution AI-Generated Image Detection by Spectral Learning

Dimitrios Karageorgiou, Symeon Papadopoulos, Ioannis Kompatsiaris et al.

CVPR 2025 • poster • arXiv:2411.19417
20 citations
#919

Black-Box Forgery Attacks on Semantic Watermarks for Diffusion Models

Andreas Müller, Denis Lukovnikov, Jonas Thietke et al.

CVPR 2025 • poster • arXiv:2412.03283
20 citations
#920

Towards Better Alignment: Training Diffusion Models with Reinforcement Learning Against Sparse Rewards

Zijing Hu, Fengda Zhang, Long Chen et al.

CVPR 2025 • poster • arXiv:2503.11240
19 citations
#921

Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought

Yunze Man, De-An Huang, Guilin Liu et al.

CVPR 2025 • poster • arXiv:2505.23766
19 citations
#922

Split to Merge: Unifying Separated Modalities for Unsupervised Domain Adaptation

Xinyao Li, Yuke Li, Zhekai Du et al.

CVPR 2024 • poster • arXiv:2403.06946
19 citations
#923

The Devil is in Temporal Token: High Quality Video Reasoning Segmentation

Sitong Gong, Yunzhi Zhuge, Lu Zhang et al.

CVPR 2025 • poster • arXiv:2501.08549
19 citations
#924

Mixed-Precision Quantization for Federated Learning on Resource-Constrained Heterogeneous Devices

Huancheng Chen, Haris Vikalo

CVPR 2024 • poster • arXiv:2311.18129
19 citations
#925

OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation

Pengfei Zhou, Xiaopeng Peng, Jiajun Song et al.

CVPR 2025 • poster • arXiv:2411.18499
19 citations
#926

ParamISP: Learned Forward and Inverse ISPs using Camera Parameters

Woohyeok Kim, Geonu Kim, Junyong Lee et al.

CVPR 2024 • poster • arXiv:2312.13313
19 citations
#927

Segmenting Maxillofacial Structures in CBCT Volumes

Federico Bolelli, Kevin Marchesini, Niels van Nistelrooij et al.

CVPR 2025 • poster
19 citations
#928

Robust Image Denoising through Adversarial Frequency Mixup

Donghun Ryou, Inju Ha, Hyewon Yoo et al.

CVPR 2024 • poster
19 citations
#929

Unveiling the Ignorance of MLLMs: Seeing Clearly, Answering Incorrectly

Yexin Liu, Zhengyang Liang, Yueze Wang et al.

CVPR 2025 • poster • arXiv:2406.10638
19 citations
#930

Feat2GS: Probing Visual Foundation Models with Gaussian Splatting

Yue Chen, Xingyu Chen, Anpei Chen et al.

CVPR 2025 • poster • arXiv:2412.09606
19 citations
#931

Leveraging Cross-Modal Neighbor Representation for Improved CLIP Classification

Chao Yi, Lu Ren, De-Chuan Zhan et al.

CVPR 2024 • poster • arXiv:2404.17753
19 citations
#932

Unsupervised Keypoints from Pretrained Diffusion Models

Eric Hedlin, Gopal Sharma, Shweta Mahajan et al.

CVPR 2024 • highlight • arXiv:2312.00065
19 citations
#933

Distraction is All You Need for Multimodal Large Language Model Jailbreaking

Zuopeng Yang, Jiluan Fan, Anli Yan et al.

CVPR 2025 • highlight • arXiv:2502.10794
19 citations
#934

QMambaBSR: Burst Image Super-Resolution with Query State Space Model

Xin Di, Long Peng, Peizhe Xia et al.

CVPR 2025 • poster • arXiv:2408.08665
19 citations
#935

GlitchBench: Can Large Multimodal Models Detect Video Game Glitches?

Mohammad Reza Taesiri, Tianjun Feng, Cor-Paul Bezemer et al.

CVPR 2024 • poster • arXiv:2312.05291
19 citations
#936

LiDAR-based Person Re-identification

Wenxuan Guo, Zhiyu Pan, Yingping Liang et al.

CVPR 2024 • poster • arXiv:2312.03033
19 citations
#937

CA-Jaccard: Camera-aware Jaccard Distance for Person Re-identification

Yiyu Chen, Zheyi Fan, Zhaoru Chen et al.

CVPR 2024 • poster • arXiv:2311.10605
19 citations
#938

LAKE-RED: Camouflaged Images Generation by Latent Background Knowledge Retrieval-Augmented Diffusion

Pancheng Zhao, Peng Xu, Pengda Qin et al.

CVPR 2024 • poster • arXiv:2404.00292
19 citations
#939

eTraM: Event-based Traffic Monitoring Dataset

Aayush Atul Verma, Bharatesh Chakravarthi, Arpitsinh Vaghela et al.

CVPR 2024 • highlight • arXiv:2403.19976
19 citations
#940

DSL-FIQA: Assessing Facial Image Quality via Dual-Set Degradation Learning and Landmark-Guided Transformer

Wei-Ting Chen, Gurunandan Krishnan, Qiang Gao et al.

CVPR 2024 • poster • arXiv:2406.09622
19 citations
#941

Low-Rank Rescaled Vision Transformer Fine-Tuning: A Residual Design Approach

Wei Dong, Xing Zhang, Bihui Chen et al.

CVPR 2024 • poster • arXiv:2403.19067
19 citations
#942

Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment

Ziang Yan, Zhilin Li, Yinan He et al.

CVPR 2025 • poster • arXiv:2412.19326
19 citations
#943

Improving Transferable Targeted Adversarial Attacks with Model Self-Enhancement

Han Wu, Guanyan Ou, Weibin Wu et al.

CVPR 2024 • poster
19 citations
#944

RecDiffusion: Rectangling for Image Stitching with Diffusion Models

Tianhao Zhou, Li Haipeng, Ziyi Wang et al.

CVPR 2024 • poster • arXiv:2403.19164
19 citations
#945

Dexterous Grasp Transformer

Guo-Hao Xu, Yi-Lin Wei, Dian Zheng et al.

CVPR 2024 • poster • arXiv:2404.18135
19 citations
#946

Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders

Fiona Ryan, Ajay Bati, Sangmin Lee et al.

CVPR 2025 • highlight • arXiv:2412.09586
19 citations
#947

The Unreasonable Effectiveness of Pre-Trained Features for Camera Pose Refinement

Gabriele Trivigno, Carlo Masone, Barbara Caputo et al.

CVPR 2024 • highlight • arXiv:2404.10438
19 citations
#948

Image-Text Co-Decomposition for Text-Supervised Semantic Segmentation

Ji-Jia Wu, Andy Chia-Hao Chang, Chieh-Yu Chuang et al.

CVPR 2024 • poster • arXiv:2404.04231
19 citations
#949

SplatFlow: Multi-View Rectified Flow Model for 3D Gaussian Splatting Synthesis

Hyojun Go, Byeongjun Park, Jiho Jang et al.

CVPR 2025 • poster • arXiv:2411.16443
19 citations
#950

Learning Occlusion-Robust Vision Transformers for Real-Time UAV Tracking

You Wu, Xucheng Wang, Xiangyang Yang et al.

CVPR 2025 • poster • arXiv:2504.09228
19 citations
#951

Task-driven Image Fusion with Learnable Fusion Loss

Haowen Bai, Jiangshe Zhang, Zixiang Zhao et al.

CVPR 2025 • highlight • arXiv:2412.03240
19 citations
#952

HOISDF: Constraining 3D Hand-Object Pose Estimation with Global Signed Distance Fields

Haozhe Qi, Chen Zhao, Mathieu Salzmann et al.

CVPR 2024 • poster • arXiv:2402.17062
19 citations
#953

VTQA: Visual Text Question Answering via Entity Alignment and Cross-Media Reasoning

Kang Chen, Xiangqian Wu

CVPR 2024 • poster • arXiv:2303.02635
19 citations
#954

Decoupled Pseudo-labeling for Semi-Supervised Monocular 3D Object Detection

Jiacheng Zhang, Jiaming Li, Xiangru Lin et al.

CVPR 2024 • poster • arXiv:2403.17387
19 citations
#955

DreamComposer: Controllable 3D Object Generation via Multi-View Conditions

Yunhan Yang, Yukun Huang, Xiaoyang Wu et al.

CVPR 2024 • poster • arXiv:2312.03611
19 citations
#956

NViST: In the Wild New View Synthesis from a Single Image with Transformers

Wonbong Jang, Lourdes Agapito

CVPR 2024 • poster • arXiv:2312.08568
19 citations
#957

Schedule On the Fly: Diffusion Time Prediction for Faster and Better Image Generation

Zilyu Ye, Zhiyang Chen, Tiancheng Li et al.

CVPR 2025 • poster • arXiv:2412.01243
19 citations
#958

Scalable 3D Registration via Truncated Entry-wise Absolute Residuals

Tianyu Huang, Liangzu Peng, Rene Vidal et al.

CVPR 2024 • poster • arXiv:2404.00915
19 citations
#959

Improved Video VAE for Latent Video Diffusion Model

Pingyu Wu, Kai Zhu, Yu Liu et al.

CVPR 2025 • poster • arXiv:2411.06449
19 citations
#960

MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D

Wei Cheng, Juncheng Mu, Xianfang Zeng et al.

CVPR 2025 • poster • arXiv:2411.02336
19 citations
#961

Exploring Efficient Asymmetric Blind-Spots for Self-Supervised Denoising in Real-World Scenarios

Shiyan Chen, Jiyuan Zhang, Zhaofei Yu et al.

CVPR 2024 • poster • arXiv:2303.16783
19 citations
#962

EmoEdit: Evoking Emotions through Image Manipulation

Jingyuan Yang, Jiawei Feng, Weibin Luo et al.

CVPR 2025 • poster • arXiv:2405.12661
19 citations
#963

Joint Reconstruction of 3D Human and Object via Contact-Based Refinement Transformer

Hyeongjin Nam, Daniel Jung, Gyeongsik Moon et al.

CVPR 2024 • poster • arXiv:2404.04819
19 citations
#964

DriveGPT4-V2: Harnessing Large Language Model Capabilities for Enhanced Closed-Loop Autonomous Driving

Zhenhua Xu, Yan Bai, Yujia Zhang et al.

CVPR 2025 • highlight
19 citations
#965

LAN: Learning to Adapt Noise for Image Denoising

Changjin Kim, Tae Hyun Kim, Sungyong Baik

CVPR 2024 • poster • arXiv:2412.10651
19 citations
#966

As-Plausible-As-Possible: Plausibility-Aware Mesh Deformation Using 2D Diffusion Priors

Seungwoo Yoo, Kunho Kim, Vladimir G. Kim et al.

CVPR 2024 • poster • arXiv:2311.16739
19 citations
#967

DesignDiffusion: High-Quality Text-to-Design Image Generation with Diffusion Models

Zhendong Wang, Jianmin Bao, Shuyang Gu et al.

CVPR 2025 • poster • arXiv:2503.01645
19 citations
#968

LaVin-DiT: Large Vision Diffusion Transformer

Zhaoqing Wang, Xiaobo Xia, Runnan Chen et al.

CVPR 2025 • poster • arXiv:2411.11505
19 citations
#969

ODIN: A Single Model for 2D and 3D Segmentation

Ayush Jain, Pushkal Katara, Nikolaos Gkanatsios et al.

CVPR 2024 • highlight • arXiv:2401.02416
19 citations
#970

TASTE-Rob: Advancing Video Generation of Task-Oriented Hand-Object Interaction for Generalizable Robotic Manipulation

Hongxiang Zhao, Xingchen Liu, Mutian Xu et al.

CVPR 2025 • poster • arXiv:2503.11423
19 citations
#971

Discriminability-Driven Channel Selection for Out-of-Distribution Detection

Yue Yuan, Rundong He, Yicong Dong et al.

CVPR 2024 • poster
19 citations
#972

ProS: Prompting-to-simulate Generalized knowledge for Universal Cross-Domain Retrieval

Fang Kaipeng, Jingkuan Song, Lianli Gao et al.

CVPR 2024 • poster • arXiv:2312.12478
19 citations
#973

Theoretically Achieving Continuous Representation of Oriented Bounding Boxes

Zikai Xiao, Guo-Ye Yang, Xue Yang et al.

CVPR 2024 • poster • arXiv:2402.18975
19 citations
#974

CXPMRG-Bench: Pre-training and Benchmarking for X-ray Medical Report Generation on CheXpert Plus Dataset

Xiao Wang, Fuling Wang, Yuehang Li et al.

CVPR 2025 • poster • arXiv:2410.00379
19 citations
#975

Textured Gaussians for Enhanced 3D Scene Appearance Modeling

Brian Chao, Hung-Yu Tseng, Lorenzo Porzi et al.

CVPR 2025 • poster • arXiv:2411.18625
19 citations
#976

UniK3D: Universal Camera Monocular 3D Estimation

Luigi Piccinelli, Christos Sakaridis, Mattia Segu et al.

CVPR 2025 • poster • arXiv:2503.16591
19 citations
#977

LSK3DNet: Towards Effective and Efficient 3D Perception with Large Sparse Kernels

Tuo Feng, Wenguan Wang, Fan Ma et al.

CVPR 2024 • poster • arXiv:2403.15173
19 citations
#978

Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens

Kaihang Pan, Wang Lin, Zhongqi Yue et al.

CVPR 2025 • poster • arXiv:2504.14666
18 citations
#979

HiPose: Hierarchical Binary Surface Encoding and Correspondence Pruning for RGB-D 6DoF Object Pose Estimation

Yongliang Lin, Yongzhi Su, Praveen Nathan et al.

CVPR 2024 • poster • arXiv:2311.12588
18 citations
#980

PhysPT: Physics-aware Pretrained Transformer for Estimating Human Dynamics from Monocular Videos

Yufei Zhang, Jeffrey Kephart, Zijun Cui et al.

CVPR 2024 • poster • arXiv:2404.04430
18 citations
#981

TexGaussian: Generating High-quality PBR Material via Octree-based 3D Gaussian Splatting

Bojun Xiong, Jialun Liu, JiaKui Hu et al.

CVPR 2025 • poster • arXiv:2411.19654
18 citations
#982

Unveiling Visual Perception in Language Models: An Attention Head Analysis Approach

Jing Bi, Lianggong Bruce Wen, Zhang Liu et al.

CVPR 2025 • poster • arXiv:2412.18108
18 citations
#983

TAMM: TriAdapter Multi-Modal Learning for 3D Shape Understanding

Zhihao Zhang, Shengcao Cao, Yu-Xiong Wang

CVPR 2024 • poster • arXiv:2402.18490
18 citations
#984

Decompositional Neural Scene Reconstruction with Generative Diffusion Prior

Junfeng Ni, Yu Liu, Ruijie Lu et al.

CVPR 2025 • poster • arXiv:2503.14830
18 citations
#985

Rotation-Agnostic Image Representation Learning for Digital Pathology

Saghir Alfasly, Abubakr Shafique, Peyman Nejat et al.

CVPR 2024 • poster • arXiv:2311.08359
18 citations
#986

Exploring the Transferability of Visual Prompting for Multimodal Large Language Models

Yichi Zhang, Yinpeng Dong, Siyuan Zhang et al.

CVPR 2024 • highlight • arXiv:2404.11207
18 citations
#987

FedMef: Towards Memory-efficient Federated Dynamic Pruning

Hong Huang, Weiming Zhuang, Chen Chen et al.

CVPR 2024 • poster • arXiv:2403.14737
18 citations
#988

FAR: Flexible Accurate and Robust 6DoF Relative Camera Pose Estimation

Chris Rockwell, Nilesh Kulkarni, Linyi Jin et al.

CVPR 2024 • highlight • arXiv:2403.03221
18 citations
#989

UniPose: A Unified Multimodal Framework for Human Pose Comprehension, Generation and Editing

Yiheng Li, RuiBing Hou, Hong Chang et al.

CVPR 2025 • highlight • arXiv:2411.16781
18 citations
#990

Adaptive Rectangular Convolution for Remote Sensing Pansharpening

Xueyang Wang, Zhixin Zheng, Jiandong Shao et al.

CVPR 2025 • poster • arXiv:2503.00467
18 citations
#991

Video Motion Transfer with Diffusion Transformers

Alexander Pondaven, Aliaksandr Siarohin, Sergey Tulyakov et al.

CVPR 2025 • poster • arXiv:2412.07776
18 citations
#992

Playing the Fool: Jailbreaking LLMs and Multimodal LLMs with Out-of-Distribution Strategy

Joonhyun Jeong, Seyun Bae, Yeonsung Jung et al.

CVPR 2025 • poster • arXiv:2503.20823
18 citations
#993

Composing Object Relations and Attributes for Image-Text Matching

Khoi Pham, Chuong Huynh, Ser-Nam Lim et al.

CVPR 2024 • poster • arXiv:2406.11820
18 citations
#994

Cloud-Device Collaborative Learning for Multimodal Large Language Models

Guanqun Wang, Jiaming Liu, Chenxuan Li et al.

CVPR 2024 • poster • arXiv:2312.16279
18 citations
#995

Ref-GS: Directional Factorization for 2D Gaussian Splatting

Youjia Zhang, Anpei Chen, Yumin Wan et al.

CVPR 2025 • poster • arXiv:2412.00905
18 citations
#996

Rethinking the Evaluation Protocol of Domain Generalization

Han Yu, Xingxuan Zhang, Renzhe Xu et al.

CVPR 2024 • poster • arXiv:2305.15253
18 citations
#997

4Real-Video: Learning Generalizable Photo-Realistic 4D Video Diffusion

Chaoyang Wang, Peiye Zhuang, Tuan Duc Ngo et al.

CVPR 2025 • highlight • arXiv:2412.04462
18 citations
#998

Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis

Jiangyong Huang, Baoxiong Jia, Yan Wang et al.

CVPR 2025 • poster • arXiv:2503.22420
18 citations
#999

RMem: Restricted Memory Banks Improve Video Object Segmentation

Junbao Zhou, Ziqi Pang, Yu-Xiong Wang

CVPR 2024 • poster • arXiv:2406.08476
18 citations
#1000

Vid2Avatar-Pro: Authentic Avatar from Videos in the Wild via Universal Prior

Chen Guo, Junxuan Li, Yash Kant et al.

CVPR 2025 • poster • arXiv:2503.01610
18 citations