Most Cited CVPR "zero-shot machine translation" Papers

5,589 papers found • Page 9 of 28

#1601

SegEarth-OV: Towards Training-Free Open-Vocabulary Segmentation for Remote Sensing Images

Kaiyu Li, Ruixun Liu, Xiangyong Cao et al.

CVPR 2025arXiv:2410.01768
21
citations
#1602

Long-Tailed Anomaly Detection with Learnable Class Names

Chih-Hui Ho, Kuan-Chuan Peng, Nuno Vasconcelos

CVPR 2024arXiv:2403.20236
21
citations
#1603

Multi-Level Neural Scene Graphs for Dynamic Urban Environments

Tobias Fischer, Lorenzo Porzi, Samuel Rota Bulò et al.

CVPR 2024arXiv:2404.00168
21
citations
#1604

Task-driven Image Fusion with Learnable Fusion Loss

Haowen Bai, Jiangshe Zhang, Zixiang Zhao et al.

CVPR 2025highlightarXiv:2412.03240
21
citations
#1605

Video Depth without Video Models

Bingxin Ke, Dominik Narnhofer, Shengyu Huang et al.

CVPR 2025arXiv:2411.19189
21
citations
#1606

Adaptive Rectangular Convolution for Remote Sensing Pansharpening

Xueyang Wang, Zhixin Zheng, Jiandong Shao et al.

CVPR 2025arXiv:2503.00467
21
citations
#1607

PerAda: Parameter-Efficient Federated Learning Personalization with Generalization Guarantees

Chulin Xie, De-An Huang, Wenda Chu et al.

CVPR 2024arXiv:2302.06637
21
citations
#1608

Textured Gaussians for Enhanced 3D Scene Appearance Modeling

Brian Chao, Hung-Yu Tseng, Lorenzo Porzi et al.

CVPR 2025arXiv:2411.18625
21
citations
#1609

SplatFlow: Multi-View Rectified Flow Model for 3D Gaussian Splatting Synthesis

Hyojun Go, byeongjun park, Jiho Jang et al.

CVPR 2025arXiv:2411.16443
21
citations
#1610

DreamPropeller: Supercharge Text-to-3D Generation with Parallel Sampling

Linqi Zhou, Andy Shih, Chenlin Meng et al.

CVPR 2024highlightarXiv:2311.17082
21
citations
#1611

Clustering Propagation for Universal Medical Image Segmentation

Yuhang Ding, Liulei Li, Wenguan Wang et al.

CVPR 2024arXiv:2403.16646
21
citations
#1612

QMambaBSR: Burst Image Super-Resolution with Query State Space Model

Xin Di, Long Peng, Peizhe Xia et al.

CVPR 2025arXiv:2408.08665
21
citations
#1613

Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought

Yunze Man, De-An Huang, Guilin Liu et al.

CVPR 2025arXiv:2505.23766
21
citations
#1614

360Loc: A Dataset and Benchmark for Omnidirectional Visual Localization with Cross-device Queries

Huajian Huang, Changkun Liu, Yipeng Zhu et al.

CVPR 2024arXiv:2311.17389
21
citations
#1615

Spectral Meets Spatial: Harmonising 3D Shape Matching and Interpolation

Dongliang Cao, Marvin Eisenberger, Nafie El Amrani et al.

CVPR 2024arXiv:2402.18920
21
citations
#1616

SnapGen-V: Generating a Five-Second Video within Five Seconds on a Mobile Device

Yushu Wu, Zhixing Zhang, Yanyu Li et al.

CVPR 2025arXiv:2412.10494
21
citations
#1617

MoVE-KD: Knowledge Distillation for VLMs with Mixture of Visual Encoders

jiajun cao, Yuan Zhang, Tao Huang et al.

CVPR 2025arXiv:2501.01709
21
citations
#1618

Real-time 3D-aware Portrait Video Relighting

Ziqi Cai, Kaiwen Jiang, Shu-Yu Chen et al.

CVPR 2024highlightarXiv:2410.18355
21
citations
#1619

DrivingSphere: Building a High-fidelity 4D World for Closed-loop Simulation

Tianyi Yan, Dongming Wu, Wencheng Han et al.

CVPR 2025arXiv:2411.11252
21
citations
#1620

SplineGS: Robust Motion-Adaptive Spline for Real-Time Dynamic 3D Gaussians from Monocular Video

Jongmin Park, Minh-Quan Viet Bui, Juan Luis Gonzalez Bello et al.

CVPR 2025arXiv:2412.09982
21
citations
#1621

Improving Plasticity in Online Continual Learning via Collaborative Learning

Maorong Wang, Nicolas Michel, Ling Xiao et al.

CVPR 2024arXiv:2312.00600
21
citations
#1622

TeTriRF: Temporal Tri-Plane Radiance Fields for Efficient Free-Viewpoint Video

Minye Wu, Zehao Wang, Georgios Kouros et al.

CVPR 2024arXiv:2312.06713
21
citations
#1623

Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient

Zigeng Chen, Xinyin Ma, Gongfan Fang et al.

CVPR 2025arXiv:2411.17787
21
citations
#1624

OccMamba: Semantic Occupancy Prediction with State Space Models

Heng Li, Yuenan Hou, Xiaohan Xing et al.

CVPR 2025arXiv:2408.09859
21
citations
#1625

MLP Can Be A Good Transformer Learner

Sihao Lin, Pumeng Lyu, Dongrui Liu et al.

CVPR 2024arXiv:2404.05657
21
citations
#1626

VideoMAC: Video Masked Autoencoders Meet ConvNets

Gensheng Pei, Tao Chen, Xiruo Jiang et al.

CVPR 2024arXiv:2402.19082
21
citations
#1627

Grid Diffusion Models for Text-to-Video Generation

Taegyeong Lee, Soyeong Kwon, Taehwan Kim

CVPR 2024arXiv:2404.00234
21
citations
#1628

Generative Image Layer Decomposition with Visual Effects

Jinrui Yang, Qing Liu, Yijun Li et al.

CVPR 2025arXiv:2411.17864
21
citations
#1629

Composed Video Retrieval via Enriched Context and Discriminative Embeddings

Omkar Thawakar, Muzammal Naseer, Rao Anwer et al.

CVPR 2024arXiv:2403.16997
21
citations
#1630

StoryGPT-V: Large Language Models as Consistent Story Visualizers

Xiaoqian Shen, Mohamed Elhoseiny

CVPR 2025arXiv:2312.02252
21
citations
#1631

GEARS: Local Geometry-aware Hand-object Interaction Synthesis

Keyang Zhou, Bharat Lal Bhatnagar, Jan Lenssen et al.

CVPR 2024arXiv:2404.01758
21
citations
#1632

Guided Slot Attention for Unsupervised Video Object Segmentation

Minhyeok Lee, Suhwan Cho, Dogyoon Lee et al.

CVPR 2024arXiv:2303.08314
21
citations
#1633

Dual Prototype Attention for Unsupervised Video Object Segmentation

Suhwan Cho, Minhyeok Lee, Seunghoon Lee et al.

CVPR 2024arXiv:2211.12036
21
citations
#1634

Is Vanilla MLP in Neural Radiance Field Enough for Few-shot View Synthesis?

Hanxin Zhu, Tianyu He, Xin Li et al.

CVPR 2024arXiv:2403.06092
21
citations
#1635

LSNet: See Large, Focus Small

Ao Wang, Hui Chen, Zijia Lin et al.

CVPR 2025arXiv:2503.23135
21
citations
#1636

Noise-Consistent Siamese-Diffusion for Medical Image Synthesis and Segmentation

Kunpeng Qiu, Zhiqiang Gao, Zhiying Zhou et al.

CVPR 2025arXiv:2505.06068
21
citations
#1637

Steerers: A Framework for Rotation Equivariant Keypoint Descriptors

Georg Bökman, Johan Edstedt, Michael Felsberg et al.

CVPR 2024arXiv:2312.02152
21
citations
#1638

Self-Supervised Multi-Object Tracking with Path Consistency

Zijia Lu, Bing Shuai, Yanbei Chen et al.

CVPR 2024highlightarXiv:2404.05136
21
citations
#1639

LAN: Learning to Adapt Noise for Image Denoising

Changjin Kim, Tae Hyun Kim, Sungyong Baik

CVPR 2024arXiv:2412.10651
21
citations
#1640

PrPSeg: Universal Proposition Learning for Panoramic Renal Pathology Segmentation

Ruining Deng, Quan Liu, Can Cui et al.

CVPR 2024arXiv:2402.19286
21
citations
#1641

MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D

Wei Cheng, Juncheng Mu, Xianfang Zeng et al.

CVPR 2025arXiv:2411.02336
21
citations
#1642

InteractDiffusion: Interaction Control in Text-to-Image Diffusion Models

Jiun Tian Hoe, Xudong Jiang, Chee Seng Chan et al.

CVPR 2024arXiv:2312.05849
21
citations
#1643

InstanceGaussian: Appearance-Semantic Joint Gaussian Representation for 3D Instance-Level Perception

Haijie Li, Yanmin Wu, Jiarui Meng et al.

CVPR 2025arXiv:2411.19235
21
citations
#1644

An Upload-Efficient Scheme for Transferring Knowledge From a Server-Side Pre-trained Generator to Clients in Heterogeneous Federated Learning

Jianqing Zhang, Yang Liu, Yang Hua et al.

CVPR 2024arXiv:2403.15760
21
citations
#1645

Distilling Vision-Language Models on Millions of Videos

Yue Zhao, Long Zhao, Xingyi Zhou et al.

CVPR 2024arXiv:2401.06129
21
citations
#1646

You'll Never Walk Alone: A Sketch and Text Duet for Fine-Grained Image Retrieval

Subhadeep Koley, Ayan Kumar Bhunia, Aneeshan Sain et al.

CVPR 2024arXiv:2403.07222
21
citations
#1647

Unlocking the Potential of Prompt-Tuning in Bridging Generalized and Personalized Federated Learning

wenlong deng, Christos Thrampoulidis, Xiaoxiao Li

CVPR 2024arXiv:2310.18285
21
citations
#1648

BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices

Xudong LU, Yinghao Chen, chencheng Chen et al.

CVPR 2025arXiv:2411.10640
21
citations
#1649

Omni-Scene: Omni-Gaussian Representation for Ego-Centric Sparse-View Scene Reconstruction

Dongxu Wei, Zhiqi Li, Peidong Liu

CVPR 2025arXiv:2412.06273
21
citations
#1650

Prometheus: 3D-Aware Latent Diffusion Models for Feed-Forward Text-to-3D Scene Generation

Yuanbo Yang, Jiahao Shao, Xinyang Li et al.

CVPR 2025arXiv:2412.21117
21
citations
#1651

ZePT: Zero-Shot Pan-Tumor Segmentation via Query-Disentangling and Self-Prompting

Yankai Jiang, Zhongzhen Huang, Rongzhao Zhang et al.

CVPR 2024arXiv:2312.04964
21
citations
#1652

PACT: Pruning and Clustering-Based Token Reduction for Faster Visual Language Models

Dhouib Mohamed, Davide Buscaldi, Vanier Sonia et al.

CVPR 2025arXiv:2504.08966
21
citations
#1653

GRAM: Global Reasoning for Multi-Page VQA

Itshak Blau, Sharon Fogel, Roi Ronen et al.

CVPR 2024arXiv:2401.03411
21
citations
#1654

Unmixing Diffusion for Self-Supervised Hyperspectral Image Denoising

Haijin Zeng, Jiezhang Cao, Yongyong Chen et al.

CVPR 2024
21
citations
#1655

TAMM: TriAdapter Multi-Modal Learning for 3D Shape Understanding

Zhihao Zhang, Shengcao Cao, Yu-Xiong Wang

CVPR 2024arXiv:2402.18490
20
citations
#1656

VTQA: Visual Text Question Answering via Entity Alignment and Cross-Media Reasoning

Kang Chen, Xiangqian Wu

CVPR 2024arXiv:2303.02635
20
citations
#1657

DIBS: Enhancing Dense Video Captioning with Unlabeled Videos via Pseudo Boundary Enrichment and Online Refinement

Hao Wu, Huabin Liu, Yu Qiao et al.

CVPR 2024arXiv:2404.02755
20
citations
#1658

NeRF-HuGS: Improved Neural Radiance Fields in Non-static Scenes Using Heuristics-Guided Segmentation

Jiahao Chen, Yipeng Qin, Lingjie Liu et al.

CVPR 2024arXiv:2403.17537
20
citations
#1659

Improved Self-Training for Test-Time Adaptation

Jing Ma

CVPR 2024
20
citations
#1660

Viewpoint-Aware Visual Grounding in 3D Scenes

Xiangxi Shi, Zhonghua Wu, Stefan Lee

CVPR 2024
20
citations
#1661

The Devil is in Temporal Token: High Quality Video Reasoning Segmentation

Sitong Gong, Yunzhi Zhuge, Lu Zhang et al.

CVPR 2025arXiv:2501.08549
20
citations
#1662

OED: Towards One-stage End-to-End Dynamic Scene Graph Generation

Guan Wang, Zhimin Li, Qingchao Chen et al.

CVPR 2024arXiv:2405.16925
20
citations
#1663

Decentralized Directed Collaboration for Personalized Federated Learning

Yingqi Liu, Yifan Shi, Qinglun Li et al.

CVPR 2024arXiv:2405.17876
20
citations
#1664

Towards Better Alignment: Training Diffusion Models with Reinforcement Learning Against Sparse Rewards

Zijing Hu, Fengda Zhang, Long Chen et al.

CVPR 2025arXiv:2503.11240
20
citations
#1665

AlignMiF: Geometry-Aligned Multimodal Implicit Field for LiDAR-Camera Joint Synthesis

Tao Tang, Guangrun Wang, Yixing Lao et al.

CVPR 2024highlightarXiv:2402.17483
20
citations
#1666

Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders

Fiona Ryan, Ajay Bati, Sangmin Lee et al.

CVPR 2025highlightarXiv:2412.09586
20
citations
#1667

Mamba as a Bridge: Where Vision Foundation Models Meet Vision Language Models for Domain-Generalized Semantic Segmentation

Xin Zhang, Robby T. Tan

CVPR 2025highlightarXiv:2504.03193
20
citations
#1668

Theoretically Achieving Continuous Representation of Oriented Bounding Boxes

Zikai Xiao, Guo-Ye Yang, Xue Yang et al.

CVPR 2024arXiv:2402.18975
20
citations
#1669

Improving Transferable Targeted Adversarial Attacks with Model Self-Enhancement

Han Wu, Guanyan Ou, Weibin Wu et al.

CVPR 2024
20
citations
#1670

Video Motion Transfer with Diffusion Transformers

Alexander Pondaven, Aliaksandr Siarohin, Sergey Tulyakov et al.

CVPR 2025arXiv:2412.07776
20
citations
#1671

OSDFace: One-Step Diffusion Model for Face Restoration

Jingkai Wang, Jue Gong, Lin Zhang et al.

CVPR 2025arXiv:2411.17163
20
citations
#1672

FATE: Full-head Gaussian Avatar with Textural Editing from Monocular Video

Jiawei Zhang, Zijian Wu, Zhiyang Liang et al.

CVPR 2025arXiv:2411.15604
20
citations
#1673

Hierarchical Patch Diffusion Models for High-Resolution Video Generation

Ivan Skorokhodov, Willi Menapace, Aliaksandr Siarohin et al.

CVPR 2024arXiv:2406.07792
20
citations
#1674

Regressor-Segmenter Mutual Prompt Learning for Crowd Counting

Mingyue Guo, Li Yuan, Zhaoyi Yan et al.

CVPR 2024arXiv:2312.01711
20
citations
#1675

ODIN: A Single Model for 2D and 3D Segmentation

Ayush Jain, Pushkal Katara, Nikolaos Gkanatsios et al.

CVPR 2024highlightarXiv:2401.02416
20
citations
#1676

HandDiff: 3D Hand Pose Estimation with Diffusion on Image-Point Cloud

WENCAN CHENG, Hao Tang, Luc Van Gool et al.

CVPR 2024highlightarXiv:2404.03159
20
citations
#1677

Loose Inertial Poser: Motion Capture with IMU-attached Loose-Wear Jacket

Chengxu Zuo, Yiming Wang, Lishuang Zhan et al.

CVPR 2024
20
citations
#1678

Event-assisted Low-Light Video Object Segmentation

Li Hebei, Jin Wang, Jiahui Yuan et al.

CVPR 2024arXiv:2404.01945
20
citations
#1679

Learning to Predict Activity Progress by Self-Supervised Video Alignment

Gerard Donahue, Ehsan Elhamifar

CVPR 2024
20
citations
#1680

Vid2Avatar-Pro: Authentic Avatar from Videos in the Wild via Universal Prior

Chen Guo, Junxuan Li, Yash Kant et al.

CVPR 2025arXiv:2503.01610
20
citations
#1681

BEVSpread: Spread Voxel Pooling for Bird’s-Eye-View Representation in Vision-based Roadside 3D Object Detection

Wenjie Wang, Yehao Lu, Guangcong Zheng et al.

CVPR 2024arXiv:2406.08785
20
citations
#1682

Modeling Collaborator: Enabling Subjective Vision Classification With Minimal Human Effort via LLM Tool-Use

Imad Eddine Toubal, Aditya Avinash, Neil Alldrin et al.

CVPR 2024arXiv:2403.02626
20
citations
#1683

SuperNormal: Neural Surface Reconstruction via Multi-View Normal Integration

Xu Cao, Takafumi Taketomi

CVPR 2024arXiv:2312.04803
20
citations
#1684

Joint Reconstruction of 3D Human and Object via Contact-Based Refinement Transformer

Hyeongjin Nam, Daniel Jung, Gyeongsik Moon et al.

CVPR 2024arXiv:2404.04819
20
citations
#1685

MoST: Multi-Modality Scene Tokenization for Motion Prediction

Norman Mu, Jingwei Ji, Zhenpei Yang et al.

CVPR 2024arXiv:2404.19531
20
citations
#1686

SketchAgent: Language-Driven Sequential Sketch Generation

Yael Vinker, Tamar Rott Shaham, Kristine Zheng et al.

CVPR 2025arXiv:2411.17673
20
citations
#1687

Instant Gaussian Stream: Fast and Generalizable Streaming of Dynamic Scene Reconstruction via Gaussian Splatting

Jinbo Yan, Rui Peng, Zhiyan Wang et al.

CVPR 2025highlightarXiv:2503.16979
20
citations
#1688

ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation

Yifan Pu, Yiming Zhao, Zhicong Tang et al.

CVPR 2025arXiv:2502.18364
20
citations
#1689

Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning

Fan Lu, Wei Wu, Kecheng Zheng et al.

CVPR 2025arXiv:2412.08614
20
citations
#1690

DexGrasp Anything: Towards Universal Robotic Dexterous Grasping with Physics Awareness

Yiming Zhong, Qi Jiang, Jingyi Yu et al.

CVPR 2025highlightarXiv:2503.08257
20
citations
#1691

SelfPose3d: Self-Supervised Multi-Person Multi-View 3d Pose Estimation

Keqi Chen, vinkle srivastav, Nicolas Padoy

CVPR 2024arXiv:2404.02041
20
citations
#1692

Open-Set Domain Adaptation for Semantic Segmentation

Seun-An Choe, Ah-Hyung Shin, Keon Hee Park et al.

CVPR 2024arXiv:2405.19899
20
citations
#1693

Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens

Kaihang Pan, Wang Lin, Zhongqi Yue et al.

CVPR 2025arXiv:2504.14666
20
citations
#1694

Toward Generalized Image Quality Assessment: Relaxing the Perfect Reference Quality Assumption

Du CHEN, Tianhe Wu, Kede Ma et al.

CVPR 2025arXiv:2503.11221
20
citations
#1695

Weakly-Supervised Emotion Transition Learning for Diverse 3D Co-speech Gesture Generation

Xingqun Qi, Jiahao Pan, Peng Li et al.

CVPR 2024arXiv:2311.17532
20
citations
#1696

Revealing Key Details to See Differences: A Novel Prototypical Perspective for Skeleton-based Action Recognition

Hongda Liu, Yunfan Liu, Min Ren et al.

CVPR 2025highlightarXiv:2411.18941
20
citations
#1697

Unsupervised Semantic Segmentation Through Depth-Guided Feature Correlation and Sampling

Leon Sick, Dominik Engel, Pedro Hermosilla et al.

CVPR 2024arXiv:2309.12378
20
citations
#1698

D^3: Scaling Up Deepfake Detection by Learning from Discrepancy

Yongqi Yang, Zhihao Qian, Ye Zhu et al.

CVPR 2025arXiv:2404.04584
20
citations
#1699

OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation

Pengfei Zhou, Xiaopeng Peng, Jiajun Song et al.

CVPR 2025arXiv:2411.18499
20
citations
#1700

Improved Video VAE for Latent Video Diffusion Model

Pingyu Wu, Kai Zhu, Yu Liu et al.

CVPR 2025arXiv:2411.06449
20
citations
#1701

Learning Occlusion-Robust Vision Transformers for Real-Time UAV Tracking

You Wu, Xucheng Wang, Xiangyang Yang et al.

CVPR 2025arXiv:2504.09228
20
citations
#1702

Making Vision Transformers Truly Shift-Equivariant

Renan A. Rojas-Gomez, Teck-Yian Lim, Minh Do et al.

CVPR 2024arXiv:2305.16316
20
citations
#1703

EditSplat: Multi-View Fusion and Attention-Guided Optimization for View-Consistent 3D Scene Editing with 3D Gaussian Splatting

Dong In Lee, Hyeongcheol Park, Jiyoung Seo et al.

CVPR 2025arXiv:2412.11520
20
citations
#1704

Any-Shift Prompting for Generalization over Distributions

Zehao Xiao, Jiayi Shen, Mohammad Mahdi Derakhshani et al.

CVPR 2024arXiv:2402.10099
20
citations
#1705

TE-TAD: Towards Full End-to-End Temporal Action Detection via Time-Aligned Coordinate Expression

Ho-Joong Kim, Jung-Ho Hong, Heejo Kong et al.

CVPR 2024arXiv:2404.02405
20
citations
#1706

Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment

ziang yan, Zhilin Li, Yinan He et al.

CVPR 2025arXiv:2412.19326
20
citations
#1707

MoFlow: One-Step Flow Matching for Human Trajectory Forecasting via Implicit Maximum Likelihood Estimation based Distillation

Yuxiang Fu, Qi Yan, Ke Li et al.

CVPR 2025arXiv:2503.09950
20
citations
#1708

Multi-Sensor Object Anomaly Detection: Unifying Appearance, Geometry, and Internal Properties

wenqiao Li, BoZhong Zheng, Xiaohao Xu et al.

CVPR 2025arXiv:2412.14592
20
citations
#1709

One-Class Face Anti-spoofing via Spoof Cue Map-Guided Feature Learning

Pei-Kai Huang, Cheng-Hsuan Chiang, Tzu-Hsien Chen et al.

CVPR 2024
20
citations
#1710

ProS: Prompting-to-simulate Generalized knowledge for Universal Cross-Domain Retrieval

Fang Kaipeng, Jingkuan Song, Lianli Gao et al.

CVPR 2024arXiv:2312.12478
20
citations
#1711

Open-World Semantic Segmentation Including Class Similarity

Matteo Sodano, Federico Magistri, Lucas Nunes et al.

CVPR 2024arXiv:2403.07532
20
citations
#1712

Dexterous Grasp Transformer

Guo-Hao Xu, Yi-Lin Wei, Dian Zheng et al.

CVPR 2024arXiv:2404.18135
20
citations
#1713

FlowTrack: Revisiting Optical Flow for Long-Range Dense Tracking

Seokju Cho, Gabriel Huang, Seungryong Kim et al.

CVPR 2024
20
citations
#1714

Structure-Guided Adversarial Training of Diffusion Models

Ling Yang, Haotian Qian, Zhilong Zhang et al.

CVPR 2024arXiv:2402.17563
20
citations
#1715

DFormerv2: Geometry Self-Attention for RGBD Semantic Segmentation

Bo-Wen Yin, Jiao-Long Cao, Ming-Ming Cheng et al.

CVPR 2025arXiv:2504.04701
20
citations
#1716

Dual Prior Unfolding for Snapshot Compressive Imaging

Jiancheng Zhang, Haijin Zeng, Jiezhang Cao et al.

CVPR 2024
20
citations
#1717

A Call to Reflect on Evaluation Practices for Age Estimation: Comparative Analysis of the State-of-the-Art and a Unified Benchmark

Jakub Paplham, Vojtech Franc

CVPR 2024arXiv:2307.04570
20
citations
#1718

BA-SAM: Scalable Bias-Mode Attention Mask for Segment Anything Model

song yiran, Qianyu Zhou, Xiangtai Li et al.

CVPR 2024arXiv:2401.02317
20
citations
#1719

EmoEdit: Evoking Emotions through Image Manipulation

Jingyuan Yang, Jiawei Feng, Weibin Luo et al.

CVPR 2025arXiv:2405.12661
20
citations
#1720

Adapting Short-Term Transformers for Action Detection in Untrimmed Videos

Min Yang, gaohuan, Ping Guo et al.

CVPR 2024arXiv:2312.01897
20
citations
#1721

Correspondence-Free Non-Rigid Point Set Registration Using Unsupervised Clustering Analysis

Mingyang Zhao, Jiang Jingen, Lei Ma et al.

CVPR 2024highlightarXiv:2406.18817
20
citations
#1722

Unsupervised Occupancy Learning from Sparse Point Cloud

Amine Ouasfi, Adnane Boukhayma

CVPR 2024highlightarXiv:2404.02759
20
citations
#1723

Image-Text Co-Decomposition for Text-Supervised Semantic Segmentation

Ji-Jia Wu, Andy Chia-Hao Chang, Chieh-Yu Chuang et al.

CVPR 2024arXiv:2404.04231
20
citations
#1724

RecDiffusion: Rectangling for Image Stitching with Diffusion Models

Tianhao Zhou, Li Haipeng, Ziyi Wang et al.

CVPR 2024arXiv:2403.19164
20
citations
#1725

Decoupled Pseudo-labeling for Semi-Supervised Monocular 3D Object Detection

Jiacheng Zhang, Jiaming Li, Xiangru Lin et al.

CVPR 2024arXiv:2403.17387
20
citations
#1726

ChAda-ViT : Channel Adaptive Attention for Joint Representation Learning of Heterogeneous Microscopy Images

Nicolas Bourriez, Ihab Bendidi, Cohen Ethan et al.

CVPR 2024arXiv:2311.15264
20
citations
#1727

Mixed-Precision Quantization for Federated Learning on Resource-Constrained Heterogeneous Devices

Huancheng Chen, Haris Vikalo

CVPR 2024arXiv:2311.18129
20
citations
#1728

Playing the Fool: Jailbreaking LLMs and Multimodal LLMs with Out-of-Distribution Strategy

Joonhyun Jeong, Seyun Bae, Yeonsung Jung et al.

CVPR 2025arXiv:2503.20823
20
citations
#1729

LaVin-DiT: Large Vision Diffusion Transformer

Zhaoqing Wang, Xiaobo Xia, Runnan Chen et al.

CVPR 2025arXiv:2411.11505
20
citations
#1730

SmartMask: Context Aware High-Fidelity Mask Generation for Fine-grained Object Insertion and Layout Control

Jaskirat Singh, Jianming Zhang, Qing Liu et al.

CVPR 2024arXiv:2312.05039
20
citations
#1731

RMem: Restricted Memory Banks Improve Video Object Segmentation

Junbao Zhou, Ziqi Pang, Yu-Xiong Wang

CVPR 2024arXiv:2406.08476
20
citations
#1732

MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction

Jingcheng Ni, Yuxin Guo, Yichen Liu et al.

CVPR 2025arXiv:2502.11663
20
citations
#1733

LSK3DNet: Towards Effective and Efficient 3D Perception with Large Sparse Kernels

Tuo Feng, Wenguan Wang, Fan Ma et al.

CVPR 2024arXiv:2403.15173
20
citations
#1734

Robust Image Denoising through Adversarial Frequency Mixup

Donghun Ryou, Inju Ha, Hyewon Yoo et al.

CVPR 2024
20
citations
#1735

CA-Jaccard: Camera-aware Jaccard Distance for Person Re-identification

Yiyu Chen, Zheyi Fan, Zhaoru Chen et al.

CVPR 2024arXiv:2311.10605
20
citations
#1736

CLiC: Concept Learning in Context

Mehdi Safaee, Aryan Mikaeili, Or Patashnik et al.

CVPR 2024highlightarXiv:2311.17083
20
citations
#1737

Learning Inclusion Matching for Animation Paint Bucket Colorization

Yuekun Dai, Shangchen Zhou, Blake Li et al.

CVPR 2024arXiv:2403.18342
20
citations
#1738

Breaking the Low-Rank Dilemma of Linear Attention

Qihang Fan, Huaibo Huang, Ran He

CVPR 2025arXiv:2411.07635
20
citations
#1739

Dancing with Still Images: Video Distillation via Static-Dynamic Disentanglement

Ziyu Wang, Yue Xu, Cewu Lu et al.

CVPR 2024arXiv:2312.00362
20
citations
#1740

HaWoR: World-Space Hand Motion Reconstruction from Egocentric Videos

Jinglei Zhang, Jiankang Deng, Chao Ma et al.

CVPR 2025highlightarXiv:2501.02973
20
citations
#1741

Taming Teacher Forcing for Masked Autoregressive Video Generation

Deyu Zhou, Quan Sun, Yuang Peng et al.

CVPR 2025arXiv:2501.12389
20
citations
#1742

OmniGuard: Hybrid Manipulation Localization via Augmented Versatile Deep Image Watermarking

Xuanyu Zhang, Zecheng Tang, Zhipei Xu et al.

CVPR 2025arXiv:2412.01615
20
citations
#1743

Faces that Speak: Jointly Synthesising Talking Face and Speech from Text

Youngjoon Jang, Jihoon Kim, Junseok Ahn et al.

CVPR 2024arXiv:2405.10272
20
citations
#1744

UniK3D: Universal Camera Monocular 3D Estimation

Luigi Piccinelli, Christos Sakaridis, Mattia Segu et al.

CVPR 2025arXiv:2503.16591
20
citations
#1745

Chat2SVG: Vector Graphics Generation with Large Language Models and Image Diffusion Models

Ronghuan Wu, Wanchao Su, Jing Liao

CVPR 2025arXiv:2411.16602
20
citations
#1746

Genuine Knowledge from Practice: Diffusion Test-Time Adaptation for Video Adverse Weather Removal

Yijun Yang, Hongtao Wu, Angelica I. Aviles-Rivero et al.

CVPR 2024arXiv:2403.07684
20
citations
#1747

GlitchBench: Can Large Multimodal Models Detect Video Game Glitches?

Mohammad Reza Taesiri, Tianjun Feng, Cor-Paul Bezemer et al.

CVPR 2024arXiv:2312.05291
19
citations
#1748

NeRFCodec: Neural Feature Compression Meets Neural Radiance Fields for Memory-Efficient Scene Representation

Sicheng Li, Hao Li, Yiyi Liao et al.

CVPR 2024arXiv:2404.02185
19
citations
#1749

ShiftwiseConv: Small Convolutional Kernel with Large Kernel Effect

Dachong Li, li li, zhuangzhuang chen et al.

CVPR 2025arXiv:2401.12736
19
citations
#1750

Generative Omnimatte: Learning to Decompose Video into Layers

Yao-Chih Lee, Erika Lu, Sarah Rumbley et al.

CVPR 2025highlightarXiv:2411.16683
19
citations
#1751

DKDM: Data-Free Knowledge Distillation for Diffusion Models with Any Architecture

Qianlong Xiang, Miao Zhang, Yuzhang Shang et al.

CVPR 2025arXiv:2409.03550
19
citations
#1752

TexGaussian: Generating High-quality PBR Material via Octree-based 3D Gaussian Splatting

Bojun Xiong, Jialun Liu, JiaKui Hu et al.

CVPR 2025arXiv:2411.19654
19
citations
#1753

4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models

Wanhua Li, Renping Zhou, Jiawei Zhou et al.

CVPR 2025arXiv:2503.10437
19
citations
#1754

Feat2GS: Probing Visual Foundation Models with Gaussian Splatting

Yue Chen, Xingyu Chen, Anpei Chen et al.

CVPR 2025arXiv:2412.09606
19
citations
#1755

URHand: Universal Relightable Hands

Zhaoxi Chen, Gyeongsik Moon, Kaiwen Guo et al.

CVPR 2024arXiv:2401.05334
19
citations
#1756

Learning Continuous 3D Words for Text-to-Image Generation

Ta-Ying Cheng, Matheus Gadelha, Thibault Groueix et al.

CVPR 2024arXiv:2402.08654
19
citations
#1757

eTraM: Event-based Traffic Monitoring Dataset

Aayush Atul Verma, Bharatesh Chakravarthi, Arpitsinh Vaghela et al.

CVPR 2024highlightarXiv:2403.19976
19
citations
#1758

Around the World in 80 Timesteps: A Generative Approach to Global Visual Geolocation

Nicolas Dufour, Vicky Kalogeiton, David Picard et al.

CVPR 2025arXiv:2412.06781
19
citations
#1759

Scalable 3D Registration via Truncated Entry-wise Absolute Residuals

Tianyu Huang, Liangzu Peng, Rene Vidal et al.

CVPR 2024arXiv:2404.00915
19
citations
#1760

SuperSVG: Superpixel-based Scalable Vector Graphics Synthesis

Teng Hu, Ran Yi, Baihong Qian et al.

CVPR 2024arXiv:2406.09794
19
citations
#1761

DriveTrack: A Benchmark for Long-Range Point Tracking in Real-World Videos

Arjun Balasingam, Joseph Chandler, Chenning Li et al.

CVPR 2024arXiv:2312.09523
19
citations
#1762

MaskGaussian: Adaptive 3D Gaussian Representation from Probabilistic Masks

Yifei Liu, Zhihang Zhong, Yifan Zhan et al.

CVPR 2025arXiv:2412.20522
19
citations
#1763

A Stealthy Wrongdoer: Feature-Oriented Reconstruction Attack against Split Learning

Xiaoyang Xu, Mengda Yang, Wenzhe Yi et al.

CVPR 2024arXiv:2405.04115
19
citations
#1764

NViST: In the Wild New View Synthesis from a Single Image with Transformers

Wonbong Jang, Lourdes Agapito

CVPR 2024arXiv:2312.08568
19
citations
#1765

Text-Enhanced Data-free Approach for Federated Class-Incremental Learning

Minh-Tuan Tran, Trung Le, Xuan-May Le et al.

CVPR 2024arXiv:2403.14101
19
citations
#1766

Modality-Agnostic Structural Image Representation Learning for Deformable Multi-Modality Medical Image Registration

Tony C. W. MOK, Zi Li, Yunhao Bai et al.

CVPR 2024highlightarXiv:2402.18933
19
citations
#1767

As-Plausible-As-Possible: Plausibility-Aware Mesh Deformation Using 2D Diffusion Priors

Seungwoo Yoo, Kunho Kim, Vladimir G. Kim et al.

CVPR 2024arXiv:2311.16739
19
citations
#1768

Perturbing Attention Gives You More Bang for the Buck: Subtle Imaging Perturbations That Efficiently Fool Customized Diffusion Models

Jingyao Xu, Yuetong Lu, Yandong Li et al.

CVPR 2024arXiv:2404.15081
19
citations
#1769

MESA: Matching Everything by Segmenting Anything

Yesheng Zhang, Xu Zhao

CVPR 2024arXiv:2401.16741
19
citations
#1770

Exploring Efficient Asymmetric Blind-Spots for Self-Supervised Denoising in Real-World Scenarios

Shiyan Chen, Jiyuan Zhang, Zhaofei Yu et al.

CVPR 2024arXiv:2303.16783
19
citations
#1771

DreamComposer: Controllable 3D Object Generation via Multi-View Conditions

Yunhan Yang, Yukun Huang, Xiaoyang Wu et al.

CVPR 2024arXiv:2312.03611
19
citations
#1772

PhysPT: Physics-aware Pretrained Transformer for Estimating Human Dynamics from Monocular Videos

Yufei Zhang, Jeffrey Kephart, Zijun Cui et al.

CVPR 2024arXiv:2404.04430
19
citations
#1773

Split to Merge: Unifying Separated Modalities for Unsupervised Domain Adaptation

Xinyao Li, Yuke Li, Zhekai Du et al.

CVPR 2024arXiv:2403.06946
19
citations
#1774

Rotation-Agnostic Image Representation Learning for Digital Pathology

Saghir Alfasly, Abubakr Shafique, Peyman Nejat et al.

CVPR 2024arXiv:2311.08359
19
citations
#1775

Unsupervised Keypoints from Pretrained Diffusion Models

Eric Hedlin, Gopal Sharma, Shweta Mahajan et al.

CVPR 2024highlightarXiv:2312.00065
19
citations
#1776

Bridging the Synthetic-to-Authentic Gap: Distortion-Guided Unsupervised Domain Adaptation for Blind Image Quality Assessment

Aobo Li, Jinjian Wu, Yongxu Liu et al.

CVPR 2024arXiv:2405.04167
19
citations
#1777

Holoported Characters: Real-time Free-viewpoint Rendering of Humans from Sparse RGB Cameras

Ashwath Shetty, Marc Habermann, Guoxing Sun et al.

CVPR 2024arXiv:2312.07423
19
citations
#1778

Constructing and Exploring Intermediate Domains in Mixed Domain Semi-supervised Medical Image Segmentation

Qinghe Ma, Jian Zhang, Lei Qi et al.

CVPR 2024arXiv:2404.08951
19
citations
#1779

Factored-NeuS: Reconstructing Surfaces, Illumination, and Materials of Possibly Glossy Objects

Yue Fan, Ningjing Fan, Ivan Skorokhodov et al.

CVPR 2025arXiv:2305.17929
19
citations
#1780

DSGG: Dense Relation Transformer for an End-to-end Scene Graph Generation

Zeeshan Hayder, Xuming He

CVPR 2024arXiv:2403.14886
19
citations
#1781

Attribute-Guided Pedestrian Retrieval: Bridging Person Re-ID with Internal Attribute Variability

Yan Huang, Zhang Zhang, Qiang Wu et al.

CVPR 2024
19
citations
#1782

CarPlanner: Consistent Auto-regressive Trajectory Planning for Large-Scale Reinforcement Learning in Autonomous Driving

Dongkun Zhang, Jiaming Liang, Ke Guo et al.

CVPR 2025arXiv:2502.19908
19
citations
#1783

Segmenting Maxillofacial Structures in CBCT Volumes

Federico Bolelli, Kevin Marchesini, Niels van Nistelrooij et al.

CVPR 2025
19
citations
#1784

MoST: Motion Style Transformer Between Diverse Action Contents

Boeun Kim, Jungho Kim, Hyung Jin Chang et al.

CVPR 2024arXiv:2403.06225
19
citations
#1785

Discriminability-Driven Channel Selection for Out-of-Distribution Detection

Yue Yuan, Rundong He, Yicong Dong et al.

CVPR 2024
19
citations
#1786

Overload: Latency Attacks on Object Detection for Edge Devices

Erh-Chung Chen, Pin-Yu Chen, I-Hsin Chung et al.

CVPR 2024arXiv:2304.05370
19
citations
#1787

Instant3dit: Multiview Inpainting for Fast Editing of 3D Objects

Amir Barda, Matheus Gadelha, Vladimir G. Kim et al.

CVPR 2025arXiv:2412.00518
19
citations
#1788

SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training

Jierun Chen, Dongting Hu, Xijie Huang et al.

CVPR 2025highlightarXiv:2412.09619
19
citations
#1789

GenFusion: Closing the Loop between Reconstruction and Generation via Videos

Sibo Wu, Congrong Xu, Binbin Huang et al.

CVPR 2025arXiv:2503.21219
19
citations
#1790

Leveraging Cross-Modal Neighbor Representation for Improved CLIP Classification

Chao Yi, Lu Ren, De-Chuan Zhan et al.

CVPR 2024arXiv:2404.17753
19
citations
#1791

Segment Every Out-of-Distribution Object

Wenjie Zhao, Jia Li, Xin Dong et al.

CVPR 2024arXiv:2311.16516
19
citations
#1792

DashGaussian: Optimizing 3D Gaussian Splatting in 200 Seconds

Youyu Chen, Junjun Jiang, Kui Jiang et al.

CVPR 2025highlightarXiv:2503.18402
19
citations
#1793

Driving-Video Dehazing with Non-Aligned Regularization for Safety Assistance

Junkai Fan, Jiangwei Weng, Kun Wang et al.

CVPR 2024arXiv:2405.09996
19
citations
#1794

The Unreasonable Effectiveness of Pre-Trained Features for Camera Pose Refinement

Gabriele Trivigno, Carlo Masone, Barbara Caputo et al.

CVPR 2024highlightarXiv:2404.10438
19
citations
#1795

Parameter Efficient Fine-tuning via Cross Block Orchestration for Segment Anything Model

Zelin Peng, Zhengqin Xu, Zhilin Zeng et al.

CVPR 2024arXiv:2311.17112
19
citations
#1796

From Isolated Islands to Pangea: Unifying Semantic Space for Human Action Understanding

Yonglu Li, Xiaoqian Wu, Xinpeng Liu et al.

CVPR 2024highlightarXiv:2304.00553
19
citations
#1797

LumiNet: Latent Intrinsics Meets Diffusion Models for Indoor Scene Relighting

Xiaoyan Xing, Konrad Groh, Sezer Karaoglu et al.

CVPR 2025arXiv:2412.00177
19
citations
#1798

Schedule On the Fly: Diffusion Time Prediction for Faster and Better Image Generation

Zilyu Ye, Zhiyang Chen, Tiancheng Li et al.

CVPR 2025arXiv:2412.01243
19
citations
#1799

Fair-VPT: Fair Visual Prompt Tuning for Image Classification

Sungho Park, Hyeran Byun

CVPR 2024
19
citations
#1800

Cross-View Completion Models are Zero-shot Correspondence Estimators

Honggyu An, Jin Hyeon Kim, Seonghoon Park et al.

CVPR 2025highlightarXiv:2412.09072
19
citations