Most Cited CVPR "neural collapse" Papers

5,589 papers found • Page 13 of 28

#2401

D3T: Distinctive Dual-Domain Teacher Zigzagging Across RGB-Thermal Gap for Domain-Adaptive Object Detection

Dinh Phat Do, Taehoon Kim, JAEMIN NA et al.

CVPR 2024arXiv:2403.09359
12
citations
#2402

MVGenMaster: Scaling Multi-View Generation from Any Image via 3D Priors Enhanced Diffusion Model

Chenjie Cao, Chaohui Yu, Shang Liu et al.

CVPR 2025arXiv:2411.16157
12
citations
#2403

Bridging the Vision-Brain Gap with an Uncertainty-Aware Blur Prior

Haitao Wu, Qing Li, Changqing Zhang et al.

CVPR 2025arXiv:2503.04207
12
citations
#2404

Discriminative Sample-Guided and Parameter-Efficient Feature Space Adaptation for Cross-Domain Few-Shot Learning

Rashindrie Perera, Saman Halgamuge

CVPR 2024arXiv:2403.04492
12
citations
#2405

Data-Efficient Multimodal Fusion on a Single GPU

Noël Vouitsis, Zhaoyan Liu, Satya Krishna Gorti et al.

CVPR 2024highlightarXiv:2312.10144
12
citations
#2406

Tra-MoE: Learning Trajectory Prediction Model from Multiple Domains for Adaptive Policy Conditioning

Jiange Yang, Haoyi Zhu, Yating Wang et al.

CVPR 2025arXiv:2411.14519
12
citations
#2407

PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models

Chenyu Yang, Xuan Dong, Xizhou Zhu et al.

CVPR 2025arXiv:2412.09613
12
citations
#2408

Detecting Backdoor Attacks in Federated Learning via Direction Alignment Inspection

Jiahao Xu, Zikai Zhang, Rui Hu

CVPR 2025highlightarXiv:2503.07978
12
citations
#2409

HOIGPT: Learning Long-Sequence Hand-Object Interaction with Language Models

Mingzhen Huang, Fu-Jen Chu, Bugra Tekin et al.

CVPR 2025arXiv:2503.19157
12
citations
#2410

GUI-Xplore: Empowering Generalizable GUI Agents with One Exploration

Yuchen Sun, Shanhui Zhao, Tao Yu et al.

CVPR 2025arXiv:2503.17709
12
citations
#2411

GaussianSpa: An “Optimizing-Sparsifying” Simplification Framework for Compact and High-Quality 3D Gaussian Splatting

Yangming Zhang, Wenqi Jia, Wei Niu et al.

CVPR 2025arXiv:2411.06019
12
citations
#2412

DexHandDiff: Interaction-aware Diffusion Planning for Adaptive Dexterous Manipulation

Zhixuan Liang, Yao Mu, Yixiao Wang et al.

CVPR 2025arXiv:2411.18562
12
citations
#2413

Asymmetric Masked Distillation for Pre-Training Small Foundation Models

Zhiyu Zhao, Bingkun Huang, Sen Xing et al.

CVPR 2024arXiv:2311.03149
12
citations
#2414

Defense Against Adversarial Attacks on No-Reference Image Quality Models with Gradient Norm Regularization

Yujia Liu, Chenxi Yang, Dingquan Li et al.

CVPR 2024arXiv:2403.11397
12
citations
#2415

Enhancing Visual Continual Learning with Language-Guided Supervision

Bolin Ni, Hongbo Zhao, Chenghao Zhang et al.

CVPR 2024arXiv:2403.16124
12
citations
#2416

Latent Space Super-Resolution for Higher-Resolution Image Generation with Diffusion Models

Jinho Jeong, Sangmin Han, Jinwoo Kim et al.

CVPR 2025arXiv:2503.18446
12
citations
#2417

Tackling the Singularities at the Endpoints of Time Intervals in Diffusion Models

Pengze Zhang, Hubery Yin, Chen Li et al.

CVPR 2024highlightarXiv:2403.08381
12
citations
#2418

Emphasizing Discriminative Features for Dataset Distillation in Complex Scenarios

Kai Wang, Zekai Li, Zhi-Qi Cheng et al.

CVPR 2025arXiv:2410.17193
12
citations
#2419

Mr. DETR: Instructive Multi-Route Training for Detection Transformers

Chang-Bin Zhang, Yujie Zhong, Kai Han

CVPR 2025
12
citations
#2420

Neural Implicit Morphing of Face Images

Guilherme Schardong, Tiago Novello, Hallison Paz et al.

CVPR 2024arXiv:2308.13888
12
citations
#2421

CDMAD: Class-Distribution-Mismatch-Aware Debiasing for Class-Imbalanced Semi-Supervised Learning

Hyuck Lee, Heeyoung Kim

CVPR 2024arXiv:2403.10391
12
citations
#2422

Test-Time Backdoor Detection for Object Detection Models

Hangtao Zhang, Yichen Wang, Shihui Yan et al.

CVPR 2025arXiv:2503.15293
12
citations
#2423

FocusMAE: Gallbladder Cancer Detection from Ultrasound Videos with Focused Masked Autoencoders

Soumen Basu, Mayuna Gupta, Chetan Madan et al.

CVPR 2024arXiv:2403.08848
12
citations
#2424

MultiPhys: Multi-Person Physics-aware 3D Motion Estimation

Nicolás Ugrinovic, Boxiao Pan, Georgios Pavlakos et al.

CVPR 2024arXiv:2404.11987
12
citations
#2425

TexOct: Generating Textures of 3D Models with Octree-based Diffusion

Jialun Liu, Chenming Wu, Xinqi Liu et al.

CVPR 2024
12
citations
#2426

ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting

Shaofei Cai, Zihao Wang, Kewei Lian et al.

CVPR 2025arXiv:2410.17856
12
citations
#2427

Generative Powers of Ten

Xiaojuan Wang, Janne Kontkanen, Brian Curless et al.

CVPR 2024highlightarXiv:2312.02149
12
citations
#2428

OmniStyle: Filtering High Quality Style Transfer Data at Scale

Ye Wang, Ruiqi Liu, Jiang Lin et al.

CVPR 2025arXiv:2505.14028
12
citations
#2429

VideoICL: Confidence-based Iterative In-context Learning for Out-of-Distribution Video Understanding

Kangsan Kim, Geon Park, Youngwan Lee et al.

CVPR 2025arXiv:2412.02186
12
citations
#2430

Total-Decom: Decomposed 3D Scene Reconstruction with Minimal Interaction

Xiaoyang Lyu, Chirui Chang, Peng Dai et al.

CVPR 2024highlightarXiv:2403.19314
12
citations
#2431

Discriminative Probing and Tuning for Text-to-Image Generation

Leigang Qu, Wenjie Wang, Yongqi Li et al.

CVPR 2024arXiv:2403.04321
12
citations
#2432

Decoupled Distillation to Erase: A General Unlearning Method for Any Class-centric Tasks

Yu Zhou, Dian Zheng, Qijie Mo et al.

CVPR 2025highlightarXiv:2503.23751
12
citations
#2433

NexusGS: Sparse View Synthesis with Epipolar Depth Priors in 3D Gaussian Splatting

Yulong Zheng, Zicheng Jiang, Shengfeng He et al.

CVPR 2025highlightarXiv:2503.18794
12
citations
#2434

Seeing the Unseen: Visual Common Sense for Semantic Placement

Ram Ramrakhya, Aniruddha Kembhavi, Dhruv Batra et al.

CVPR 2024arXiv:2401.07770
12
citations
#2435

Two by Two: Learning Multi-Task Pairwise Objects Assembly for Generalizable Robot Manipulation

Yu Qi, Yuanchen Ju, Tianming Wei et al.

CVPR 2025arXiv:2504.06961
12
citations
#2436

VinTAGe: Joint Video and Text Conditioning for Holistic Audio Generation

Saksham Singh Kushwaha, Yapeng Tian

CVPR 2025arXiv:2412.10768
12
citations
#2437

Geometrically-driven Aggregation for Zero-shot 3D Point Cloud Understanding

Guofeng Mei, Luigi Riz, Yiming Wang et al.

CVPR 2024highlightarXiv:2312.02244
12
citations
#2438

S²MVTC: a Simple yet Efficient Scalable Multi-View Tensor Clustering

Zhen Long, Qiyuan Wang, Yazhou Ren et al.

CVPR 2024
12
citations
#2439

LoRACLR: Contrastive Adaptation for Customization of Diffusion Models

Enis Simsar, Thomas Hofmann, Federico Tombari et al.

CVPR 2025arXiv:2412.09622
12
citations
#2440

Discover and Mitigate Multiple Biased Subgroups in Image Classifiers

Zeliang Zhang, Mingqian Feng, Zhiheng Li et al.

CVPR 2024arXiv:2403.12777
12
citations
#2441

MCCD: Multi-Agent Collaboration-based Compositional Diffusion for Complex Text-to-Image Generation

Mingcheng Li, Xiaolu Hou, Ziyang Liu et al.

CVPR 2025arXiv:2505.02648
12
citations
#2442

On the Estimation of Image-matching Uncertainty in Visual Place Recognition

Mubariz Zaffar, Liangliang Nan, Julian F. P. Kooij

CVPR 2024highlightarXiv:2404.00546
12
citations
#2443

SharpDepth: Sharpening Metric Depth Predictions Using Diffusion Distillation

Duc-Hai Pham, Tung Do, Phong Nguyen et al.

CVPR 2025arXiv:2411.18229
12
citations
#2444

Weakly Supervised Monocular 3D Detection with a Single-View Image

Xueying Jiang, Sheng Jin, Lewei Lu et al.

CVPR 2024arXiv:2402.19144
12
citations
#2445

MoEE: Mixture of Emotion Experts for Audio-Driven Portrait Animation

Huaize Liu, WenZhang Sun, Donglin Di et al.

CVPR 2025arXiv:2501.01808
12
citations
#2446

Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning

Rui Li, Tobias Fischer, Mattia Segu et al.

CVPR 2024arXiv:2404.03658
12
citations
#2447

Puff-Net: Efficient Style Transfer with Pure Content and Style Feature Fusion Network

Sizhe Zheng, Pan Gao, Peng Zhou et al.

CVPR 2024arXiv:2405.19775
12
citations
#2448

Flowing from Words to Pixels: A Noise-Free Framework for Cross-Modality Evolution

Qihao Liu, Xi Yin, Alan L. Yuille et al.

CVPR 2025highlightarXiv:2412.15213
12
citations
#2449

SpiritSight Agent: Advanced GUI Agent with One Look

Zhiyuan Huang, Ziming Cheng, Junting Pan et al.

CVPR 2025arXiv:2503.03196
12
citations
#2450

SwiftEdit: Lightning Fast Text-Guided Image Editing via One-Step Diffusion

Trong-Tung Nguyen, Quang Nguyen, Khoi Nguyen et al.

CVPR 2025arXiv:2412.04301
12
citations
#2451

Unexplored Faces of Robustness and Out-of-Distribution: Covariate Shifts in Environment and Sensor Domains

Eunsu Baek, Keondo Park, Ji-yoon Kim et al.

CVPR 2024arXiv:2404.15882
12
citations
#2452

AKiRa: Augmentation Kit on Rays for Optical Video Generation

Xi Wang, Robin Courant, Marc Christie et al.

CVPR 2025arXiv:2412.14158
12
citations
#2453

Is Your World Simulator a Good Story Presenter? A Consecutive Events-Based Benchmark for Future Long Video Generation

Yiping Wang, Xuehai He, Kuan Wang et al.

CVPR 2025arXiv:2412.16211
12
citations
#2454

Mitigating Noisy Correspondence by Geometrical Structure Consistency Learning

Zihua Zhao, Mengxi Chen, Tianjie Dai et al.

CVPR 2024arXiv:2405.16996
12
citations
#2455

Data-Efficient Unsupervised Interpolation Without Any Intermediate Frame for 4D Medical Images

JungEun Kim, Hangyul Yoon, Geondo Park et al.

CVPR 2024arXiv:2404.01464
12
citations
#2456

PSHuman: Photorealistic Single-image 3D Human Reconstruction using Cross-Scale Multiview Diffusion and Explicit Remeshing

Peng Li, Wangguandong Zheng, Yuan Liu et al.

CVPR 2025arXiv:2409.10141
12
citations
#2457

InsightEdit: Towards Better Instruction Following for Image Editing

Yingjing Xu, Jie Kong, Jiazhi Wang et al.

CVPR 2025arXiv:2411.17323
12
citations
#2458

CholecTrack20: A Multi-Perspective Tracking Dataset for Surgical Tools

Chinedu Innocent Nwoye, Kareem elgohary, Anvita A. Srinivas et al.

CVPR 2025arXiv:2312.07352
11
citations
#2459

JointSQ: Joint Sparsification-Quantization for Distributed Learning

Weiying Xie, Haowei Li, Ma Jitao et al.

CVPR 2024
11
citations
#2460

Phoenix: A Motion-based Self-Reflection Framework for Fine-grained Robotic Action Correction

Wenke Xia, Ruoxuan Feng, Dong Wang et al.

CVPR 2025arXiv:2504.14588
11
citations
#2461

EMOE: Modality-Specific Enhanced Dynamic Emotion Experts

Yiyang Fang, Wenke Huang, Guancheng Wan et al.

CVPR 2025
11
citations
#2462

Gazing Into Missteps: Leveraging Eye-Gaze for Unsupervised Mistake Detection in Egocentric Videos of Skilled Human Activities

Michele Mazzamuto, Antonino Furnari, Yoichi Sato et al.

CVPR 2025arXiv:2406.08379
11
citations
#2463

MIMO: A Medical Vision Language Model with Visual Referring Multimodal Input and Pixel Grounding Multimodal Output

Yanyuan Chen, Dexuan Xu, Yu Huang et al.

CVPR 2025arXiv:2510.10011
11
citations
#2464

GeneAvatar: Generic Expression-Aware Volumetric Head Avatar Editing from a Single Image

Chong Bao, Yinda Zhang, Yuan Li et al.

CVPR 2024arXiv:2404.02152
11
citations
#2465

GenPC: Zero-shot Point Cloud Completion via 3D Generative Priors

An Li, Zhe Zhu, Mingqiang Wei

CVPR 2025arXiv:2502.19896
11
citations
#2466

Geometry Field Splatting with Gaussian Surfels

Kaiwen Jiang, Venkataram Sivaram, Cheng Peng et al.

CVPR 2025arXiv:2411.17067
11
citations
#2467

Bridging the Gap Between End-to-End and Two-Step Text Spotting

Mingxin Huang, Hongliang Li, Yuliang Liu et al.

CVPR 2024arXiv:2404.04624
11
citations
#2468

NeRFPrior: Learning Neural Radiance Field as a Prior for Indoor Scene Reconstruction

Wenyuan Zhang, Emily Yue-ting Jia, Junsheng Zhou et al.

CVPR 2025highlightarXiv:2503.18361
11
citations
#2469

MatAnyone: Stable Video Matting with Consistent Memory Propagation

Peiqing Yang, Shangchen Zhou, Jixin Zhao et al.

CVPR 2025arXiv:2501.14677
11
citations
#2470

YolOOD: Utilizing Object Detection Concepts for Multi-Label Out-of-Distribution Detection

Alon Zolfi, Guy AmiT, Amit Baras et al.

CVPR 2024arXiv:2212.02081
11
citations
#2471

LUWA Dataset: Learning Lithic Use-Wear Analysis on Microscopic Images

Jing Zhang, Irving Fang, Hao Wu et al.

CVPR 2024highlightarXiv:2403.13171
11
citations
#2472

Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level

Andong Deng, Tongjia Chen, Shoubin Yu et al.

CVPR 2025arXiv:2411.09921
11
citations
#2473

GaussianUDF: Inferring Unsigned Distance Functions through 3D Gaussian Splatting

Shujuan Li, Yu-Shen Liu, Zhizhong Han

CVPR 2025highlightarXiv:2503.19458
11
citations
#2474

From Poses to Identity: Training-Free Person Re-Identification via Feature Centralization

Chao Yuan, Guiwei Zhang, Changxiao Ma et al.

CVPR 2025arXiv:2503.00938
11
citations
#2475

Attention-Propagation Network for Egocentric Heatmap to 3D Pose Lifting

Taeho Kang, Youngki Lee

CVPR 2024highlightarXiv:2402.18330
11
citations
#2476

Forensics-Bench: A Comprehensive Forgery Detection Benchmark Suite for Large Vision Language Models

Jin Wang, Chenghui Lv, Xian Li et al.

CVPR 2025arXiv:2503.15024
11
citations
#2477

When the Future Becomes the Past: Taming Temporal Correspondence for Self-supervised Video Representation Learning

Yang Liu, Qianqian Xu, Peisong Wen et al.

CVPR 2025arXiv:2503.15096
11
citations
#2478

Quantifying Task Priority for Multi-Task Optimization

Wooseong Jeong, Kuk-Jin Yoon

CVPR 2024arXiv:2406.02996
11
citations
#2479

PlanarSplatting: Accurate Planar Surface Reconstruction in 3 Minutes

Bin Tan, Rui Yu, Yujun Shen et al.

CVPR 2025highlightarXiv:2412.03451
11
citations
#2480

DPHMs: Diffusion Parametric Head Models for Depth-based Tracking

Jiapeng Tang, Angela Dai, Yinyu Nie et al.

CVPR 2024arXiv:2312.01068
11
citations
#2481

CLIP is Strong Enough to Fight Back: Test-time Counterattacks towards Zero-shot Adversarial Robustness of CLIP

Songlong Xing, Zhengyu Zhao, Nicu Sebe

CVPR 2025arXiv:2503.03613
11
citations
#2482

EVolSplat: Efficient Volume-based Gaussian Splatting for Urban View Synthesis

Sheng Miao, Jiaxin Huang, Dongfeng Bai et al.

CVPR 2025arXiv:2503.20168
11
citations
#2483

BASKET: A Large-Scale Video Dataset for Fine-Grained Skill Estimation

Yulu Pan, Ce Zhang, Gedas Bertasius

CVPR 2025arXiv:2503.20781
11
citations
#2484

Repurposing Pre-trained Video Diffusion Models for Event-based Video Interpolation

Jingxi Chen, Brandon Y. Feng, Haoming Cai et al.

CVPR 2025arXiv:2412.07761
11
citations
#2485

Rethinking Few-Shot Adaptation of Vision-Language Models in Two Stages

Matteo Farina, Massimiliano Mancini, Giovanni Iacca et al.

CVPR 2025arXiv:2503.11609
11
citations
#2486

STCOcc: Sparse Spatial-Temporal Cascade Renovation for 3D Occupancy and Scene Flow Prediction

Zhimin Liao, Ping Wei, Shuaijia Chen et al.

CVPR 2025arXiv:2504.19749
11
citations
#2487

Generalized Recorrupted-to-Recorrupted: Self-Supervised Learning Beyond Gaussian Noise

Brayan Monroy, Jorge Bacca, Julián Tachella

CVPR 2025arXiv:2412.04648
11
citations
#2488

Towards Understanding and Improving Adversarial Robustness of Vision Transformers

Samyak Jain, Tanima Dutta

CVPR 2024
11
citations
#2489

Data Synthesis with Diverse Styles for Face Recognition via 3DMM-Guided Diffusion

Yuxi Mi, Zhizhou Zhong, Yuge Huang et al.

CVPR 2025arXiv:2504.00430
11
citations
#2490

ResCLIP: Residual Attention for Training-free Dense Vision-language Inference

Jinhong Deng, Yuhang Yang, Wen Li et al.

CVPR 2025arXiv:2411.15851
11
citations
#2491

Mosaic of Modalities: A Comprehensive Benchmark for Multimodal Graph Learning

Jing Zhu, Yuhang Zhou, Shengyi Qian et al.

CVPR 2025arXiv:2406.16321
11
citations
#2492

FSFM: A Generalizable Face Security Foundation Model via Self-Supervised Facial Representation Learning

Gaojian Wang, Feng Lin, Tong Wu et al.

CVPR 2025arXiv:2412.12032
11
citations
#2493

MFP: Making Full Use of Probability Maps for Interactive Image Segmentation

Chaewon Lee, Seon-Ho Lee, Chang-Su Kim

CVPR 2024arXiv:2404.18448
11
citations
#2494

FLHetBench: Benchmarking Device and State Heterogeneity in Federated Learning

Junyuan Zhang, Shuang Zeng, Miao Zhang et al.

CVPR 2024
11
citations
#2495

Synergistic Global-space Camera and Human Reconstruction from Videos

Yizhou Zhao, Tuanfeng Y. Wang, Bhiksha Raj et al.

CVPR 2024arXiv:2405.14855
11
citations
#2496

Generalized Event Cameras

Varun Sundar, Matthew Dutson, Andrei Ardelean et al.

CVPR 2024arXiv:2407.02683
11
citations
#2497

Multi-view Reconstruction via SfM-guided Monocular Depth Estimation

Haoyu Guo, He Zhu, Sida Peng et al.

CVPR 2025arXiv:2503.14483
11
citations
#2498

Objects as Volumes: A Stochastic Geometry View of Opaque Solids

Bailey Miller, Hanyu Chen, Alice Lai et al.

CVPR 2024arXiv:2312.15406
11
citations
#2499

VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step

Hanyang Wang, Fangfu Liu, Jiawei Chi et al.

CVPR 2025highlightarXiv:2504.01956
11
citations
#2500

Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering

Federico Cocchi, Nicholas Moratelli, Marcella Cornia et al.

CVPR 2025arXiv:2411.16863
11
citations
#2501

How to Train Neural Field Representations: A Comprehensive Study and Benchmark

Samuele Papa, Riccardo Valperga, David Knigge et al.

CVPR 2024arXiv:2312.10531
11
citations
#2502

Clockwork Diffusion: Efficient Generation With Model-Step Distillation

Amirhossein Habibian, Amir Ghodrati, Noor Fathima et al.

CVPR 2024highlightarXiv:2312.08128
11
citations
#2503

Distributionally Generative Augmentation for Fair Facial Attribute Classification

Fengda Zhang, Qianpei He, Kun Kuang et al.

CVPR 2024arXiv:2403.06606
11
citations
#2504

RGBAvatar: Reduced Gaussian Blendshapes for Online Modeling of Head Avatars

Linzhou Li, Yumeng Li, Yanlin Weng et al.

CVPR 2025highlightarXiv:2503.12886
11
citations
#2505

MaskINT: Video Editing via Interpolative Non-autoregressive Masked Transformers

Haoyu Ma, Shahin Mahdizadehaghdam, Bichen Wu et al.

CVPR 2024arXiv:2312.12468
11
citations
#2506

MPDrive: Improving Spatial Understanding with Marker-Based Prompt Learning for Autonomous Driving

Zhi-Yuan Zhang, Xiaofan Li, Zhihao Xu et al.

CVPR 2025highlightarXiv:2504.00379
11
citations
#2507

Mask-Adapter: The Devil is in the Masks for Open-Vocabulary Segmentation

Yongkang Li, Tianheng Cheng, Bin Feng et al.

CVPR 2025arXiv:2412.04533
11
citations
#2508

BlobGEN-Vid: Compositional Text-to-Video Generation with Blob Video Representations

Weixi Feng, Chao Liu, Sifei Liu et al.

CVPR 2025arXiv:2501.07647
11
citations
#2509

Finsler-Laplace-Beltrami Operators with Application to Shape Analysis

Simon Weber, Thomas Dagès, Maolin Gao et al.

CVPR 2024arXiv:2404.03999
11
citations
#2510

StraightPCF: Straight Point Cloud Filtering

Dasith de Silva Edirimuni, Xuequan Lu, Gang Li et al.

CVPR 2024arXiv:2405.08322
11
citations
#2511

A Backpack Full of Skills: Egocentric Video Understanding with Diverse Task Perspectives

Simone Alberto Peirone, Francesca Pistilli, Antonio Alliegro et al.

CVPR 2024arXiv:2403.03037
11
citations
#2512

Lifting Motion to the 3D World via 2D Diffusion

Jiaman Li, Karen Liu, Jiajun Wu

CVPR 2025highlightarXiv:2411.18808
11
citations
#2513

NTO3D: Neural Target Object 3D Reconstruction with Segment Anything

Xiaobao Wei, Renrui Zhang, Jiarui Wu et al.

CVPR 2024arXiv:2309.12790
11
citations
#2514

3D-LFM: Lifting Foundation Model

Mosam Dabhi, László A. Jeni, Simon Lucey

CVPR 2024arXiv:2312.11894
11
citations
#2515

Video Summarization with Large Language Models

Min Jung Lee, Dayoung Gong, Minsu Cho

CVPR 2025arXiv:2504.11199
11
citations
#2516

SOLAMI: Social Vision-Language-Action Modeling for Immersive Interaction with 3D Autonomous Characters

Jianping Jiang, Weiye Xiao, Zhengyu Lin et al.

CVPR 2025arXiv:2412.00174
11
citations
#2517

LiDAR-RT: Gaussian-based Ray Tracing for Dynamic LiDAR Re-simulation

Chenxu Zhou, Lvchang Fu, Sida Peng et al.

CVPR 2025arXiv:2412.15199
11
citations
#2518

Taming Video Diffusion Prior with Scene-Grounding Guidance for 3D Gaussian Splatting from Sparse Inputs

Yingji Zhong, Zhihao Li, Dave Zhenyu Chen et al.

CVPR 2025highlightarXiv:2503.05082
11
citations
#2519

Dual-Scale Transformer for Large-Scale Single-Pixel Imaging

Gang Qu, Ping Wang, Xin Yuan

CVPR 2024arXiv:2404.05001
11
citations
#2520

Real-World Efficient Blind Motion Deblurring via Blur Pixel Discretization

Insoo Kim, Jae Seok Choi, Geonseok Seo et al.

CVPR 2024arXiv:2404.12168
11
citations
#2521

Open-World Amodal Appearance Completion

Jiayang Ao, Yanbei Jiang, Qiuhong Ke et al.

CVPR 2025arXiv:2411.13019
11
citations
#2522

LOTUS: Evasive and Resilient Backdoor Attacks through Sub-Partitioning

Siyuan Cheng, Guanhong Tao, Yingqi Liu et al.

CVPR 2024arXiv:2403.17188
11
citations
#2523

Bayesian Test-Time Adaptation for Vision-Language Models

Lihua Zhou, Mao Ye, Shuaifeng Li et al.

CVPR 2025arXiv:2503.09248
11
citations
#2524

Cinematic Behavior Transfer via NeRF-based Differentiable Filming

Xuekun Jiang, Anyi Rao, Jingbo Wang et al.

CVPR 2024arXiv:2311.17754
11
citations
#2525

Specularity Factorization for Low-Light Enhancement

Saurabh Saini, P. J. Narayanan

CVPR 2024arXiv:2404.01998
11
citations
#2526

Robust Depth Enhancement via Polarization Prompt Fusion Tuning

Kei IKEMURA, Yiming Huang, Felix Heide et al.

CVPR 2024arXiv:2404.04318
11
citations
#2527

Rectified Diffusion Guidance for Conditional Generation

Mengfei Xia, Nan Xue, Yujun Shen et al.

CVPR 2025arXiv:2410.18737
11
citations
#2528

AttriHuman-3D: Editable 3D Human Avatar Generation with Attribute Decomposition and Indexing

Fan Yang, Tianyi Chen, XIAOSHENG HE et al.

CVPR 2024arXiv:2312.02209
11
citations
#2529

UltrAvatar: A Realistic Animatable 3D Avatar Diffusion Model with Authenticity Guided Textures

Mingyuan Zhou, Rakib Hyder, Ziwei Xuan et al.

CVPR 2024arXiv:2401.11078
11
citations
#2530

Learning Flow Fields in Attention for Controllable Person Image Generation

Zijian Zhou, Shikun Liu, Xiao Han et al.

CVPR 2025arXiv:2412.08486
11
citations
#2531

Point2RBox-v2: Rethinking Point-supervised Oriented Object Detection with Spatial Layout Among Instances

Yi Yu, Botao Ren, Peiyuan Zhang et al.

CVPR 2025arXiv:2502.04268
11
citations
#2532

SF2T: Self-supervised Fragment Finetuning of Video-LLMs for Fine-Grained Understanding

Yangliu Hu, Zikai Song, Na Feng et al.

CVPR 2025arXiv:2504.07745
11
citations
#2533

MP-GUI: Modality Perception with MLLMs for GUI Understanding

Ziwei Wang, Weizhi Chen, Leyang Yang et al.

CVPR 2025arXiv:2503.14021
11
citations
#2534

Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis

Bingda Tang, Sayak Paul, Boyang Zheng et al.

CVPR 2025arXiv:2505.10046
11
citations
#2535

Is this Generated Person Existed in Real-world? Fine-grained Detecting and Calibrating Abnormal Human-body

Zeqing Wang, Qingyang Ma, Wentao Wan et al.

CVPR 2025highlightarXiv:2411.14205
11
citations
#2536

Efficient and Effective Weakly-Supervised Action Segmentation via Action-Transition-Aware Boundary Alignment

Angchi Xu, Wei-Shi Zheng

CVPR 2024arXiv:2403.19225
11
citations
#2537

SIGNeRF: Scene Integrated Generation for Neural Radiance Fields

Jan-Niklas Dihlmann, Andreas Engelhardt, Hendrik Lensch

CVPR 2024arXiv:2401.01647
11
citations
#2538

DeIL: Direct-and-Inverse CLIP for Open-World Few-Shot Learning

Shuai Shao, Yu Bai, Yan WANG et al.

CVPR 2024
11
citations
#2539

HumanNeRF-SE: A Simple yet Effective Approach to Animate HumanNeRF with Diverse Poses

Caoyuan Ma, Yu-Lun Liu, Zhixiang Wang et al.

CVPR 2024arXiv:2312.02232
11
citations
#2540

nnWNet: Rethinking the Use of Transformers in Biomedical Image Segmentation and Calling for a Unified Evaluation Benchmark

Yanfeng Zhou, Lingrui Li, Le Lu et al.

CVPR 2025
11
citations
#2541

Soften to Defend: Towards Adversarial Robustness via Self-Guided Label Refinement

Daiwei Yu, Zhuorong Li, Lina Wei et al.

CVPR 2024arXiv:2403.09101
11
citations
#2542

ASAP: Advancing Semantic Alignment Promotes Multi-Modal Manipulation Detecting and Grounding

Zhenxing Zhang, Yaxiong Wang, Lechao Cheng et al.

CVPR 2025arXiv:2412.12718
11
citations
#2543

Using Diffusion Priors for Video Amodal Segmentation

Kaihua Chen, Deva Ramanan, Tarasha Khurana

CVPR 2025arXiv:2412.04623
11
citations
#2544

Test-Time Zero-Shot Temporal Action Localization

Benedetta Liberatori, Alessandro Conti, Paolo Rota et al.

CVPR 2024arXiv:2404.05426
11
citations
#2545

Dual Consolidation for Pre-Trained Model-Based Domain-Incremental Learning

Da-Wei Zhou, Zi-Wen Cai, Han-Jia Ye et al.

CVPR 2025arXiv:2410.00911
11
citations
#2546

COB-GS: Clear Object Boundaries in 3DGS Segmentation Based on Boundary-Adaptive Gaussian Splitting

Jiaxin Zhang, Junjun Jiang, Youyu Chen et al.

CVPR 2025arXiv:2503.19443
11
citations
#2547

HEIE: MLLM-Based Hierarchical Explainable AIGC Image Implausibility Evaluator

Fan Yang, Ru Zhen, Jianing Wang et al.

CVPR 2025arXiv:2411.17261
11
citations
#2548

TaoAvatar: Real-Time Lifelike Full-Body Talking Avatars for Augmented Reality via 3D Gaussian Splatting

Jianchuan Chen, Jingchuan Hu, Gaige Wang et al.

CVPR 2025highlightarXiv:2503.17032
11
citations
#2549

DREAM: Diffusion Rectification and Estimation-Adaptive Models

Jinxin Zhou, Tianyu Ding, Tianyi Chen et al.

CVPR 2024arXiv:2312.00210
11
citations
#2550

BlenderGym: Benchmarking Foundational Model Systems for Graphics Editing

Yunqi Gu, Ian Huang, Jihyeon Je et al.

CVPR 2025highlightarXiv:2504.01786
11
citations
#2551

From Activation to Initialization: Scaling Insights for Optimizing Neural Fields

Hemanth Saratchandran, Sameera Ramasinghe, Simon Lucey

CVPR 2024highlightarXiv:2403.19205
11
citations
#2552

MemoNav: Working Memory Model for Visual Navigation

Hongxin Li, Zeyu Wang, Xu Yang et al.

CVPR 2024highlightarXiv:2402.19161
11
citations
#2553

Disentangled Pre-training for Human-Object Interaction Detection

Zhuolong Li, Xingao Li, Changxing Ding et al.

CVPR 2024arXiv:2404.01725
11
citations
#2554

Science-T2I: Addressing Scientific Illusions in Image Synthesis

Jialuo Li, Wenhao Chai, XINGYU FU et al.

CVPR 2025arXiv:2504.13129
11
citations
#2555

Rapid Motor Adaptation for Robotic Manipulator Arms

Yichao Liang, Kevin Ellis, João F. Henriques

CVPR 2024arXiv:2312.04670
11
citations
#2556

ExpertAF: Expert Actionable Feedback from Video

Kumar Ashutosh, Tushar Nagarajan, Georgios Pavlakos et al.

CVPR 2025arXiv:2408.00672
11
citations
#2557

Initialization Matters for Adversarial Transfer Learning

Andong Hua, Jindong Gu, Zhiyu Xue et al.

CVPR 2024arXiv:2312.05716
11
citations
#2558

VladVA: Discriminative Fine-tuning of LVLMs

Yassine Ouali, Adrian Bulat, ALEXANDROS XENOS et al.

CVPR 2025arXiv:2412.04378
11
citations
#2559

PLeaS - Merging Models with Permutations and Least Squares

Anshul Nasery, Jonathan Hayase, Pang Wei Koh et al.

CVPR 2025arXiv:2407.02447
11
citations
#2560

ArtFormer: Controllable Generation of Diverse 3D Articulated Objects

Jiayi Su, Youhe Feng, Zheng Li et al.

CVPR 2025arXiv:2412.07237
11
citations
#2561

SHiNe: Semantic Hierarchy Nexus for Open-vocabulary Object Detection

Mingxuan Liu, Tyler Hayes, Elisa Ricci et al.

CVPR 2024highlightarXiv:2405.10053
11
citations
#2562

Layered Image Vectorization via Semantic Simplification

Zhenyu Wang, Jianxi Huang, Zhida Sun et al.

CVPR 2025arXiv:2406.05404
11
citations
#2563

Bi-Causal: Group Activity Recognition via Bidirectional Causality

Youliang Zhang, Wenxuan Liu, danni xu et al.

CVPR 2024
11
citations
#2564

TopoCellGen: Generating Histopathology Cell Topology with a Diffusion Model

Meilong Xu, Saumya Gupta, Xiaoling Hu et al.

CVPR 2025arXiv:2412.06011
11
citations
#2565

Identifying and Mitigating Position Bias of Multi-image Vision-Language Models

Xinyu Tian, Shu Zou, Zhaoyuan Yang et al.

CVPR 2025arXiv:2503.13792
11
citations
#2566

Task-Agnostic Guided Feature Expansion for Class-Incremental Learning

Bowen Zheng, Da-Wei Zhou, Han-Jia Ye et al.

CVPR 2025arXiv:2503.00823
11
citations
#2567

Visual-Instructed Degradation Diffusion for All-in-One Image Restoration

Haina Qin, Wenyang Luo, Zewen Chen et al.

CVPR 2025arXiv:2506.16960
11
citations
#2568

SyncMask: Synchronized Attentional Masking for Fashion-centric Vision-Language Pretraining

Chull Hwan Song, Taebaek Hwang, Jooyoung Yoon et al.

CVPR 2024arXiv:2404.01156
11
citations
#2569

SpatialCLIP: Learning 3D-aware Image Representations from Spatially Discriminative Language

zehan wang, Sashuai zhou, Shaoxuan He et al.

CVPR 2025
11
citations
#2570

Semantic and Sequential Alignment for Referring Video Object Segmentation

Feiyu Pan, Hao Fang, Fangkai Li et al.

CVPR 2025
11
citations
#2571

EventGPT: Event Stream Understanding with Multimodal Large Language Models

shaoyu liu, Jianing Li, guanghui zhao et al.

CVPR 2025arXiv:2412.00832
11
citations
#2572

DELT: A Simple Diversity-driven EarlyLate Training for Dataset Distillation

Zhiqiang Shen, Ammar Sherif, Zeyuan Yin et al.

CVPR 2025arXiv:2411.19946
11
citations
#2573

VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation

Weiming Ren, Huan Yang, Jie Min et al.

CVPR 2025arXiv:2412.00927
11
citations
#2574

Boosting Image Restoration via Priors from Pre-trained Models

Xiaogang Xu, Shu Kong, Tao Hu et al.

CVPR 2024arXiv:2403.06793
11
citations
#2575

Taming Stable Diffusion for Text to 360 Panorama Image Generation

Cheng Zhang, Qianyi Wu, Camilo Cruz Gambardella et al.

CVPR 2024highlightarXiv:2404.07949
11
citations
#2576

Video-Based Human Pose Regression via Decoupled Space-Time Aggregation

Jijie He, Wenwu Yang

CVPR 2024arXiv:2403.19926
11
citations
#2577

PeerAiD: Improving Adversarial Distillation from a Specialized Peer Tutor

Jaewon Jung, Hongsun Jang, Jaeyong Song et al.

CVPR 2024arXiv:2403.06668
11
citations
#2578

Reconstructing People, Places, and Cameras

Lea Müller, Hongsuk Choi, Anthony Zhang et al.

CVPR 2025highlightarXiv:2412.17806
11
citations
#2579

Single Mesh Diffusion Models with Field Latents for Texture Generation

Thomas W. Mitchel, Carlos Esteves, Ameesh Makadia

CVPR 2024arXiv:2312.09250
11
citations
#2580

Towards Enhanced Image Inpainting: Mitigating Unwanted Object Insertion and Preserving Color Consistency

Yikai Wang, Chenjie Cao, Junqiu Yu et al.

CVPR 2025highlightarXiv:2312.04831
11
citations
#2581

Towards Precise Scaling Laws for Video Diffusion Transformers

Yuanyang Yin, Yaqi Zhao, Mingwu Zheng et al.

CVPR 2025arXiv:2411.17470
11
citations
#2582

PNeRV: Enhancing Spatial Consistency via Pyramidal Neural Representation for Videos

Qi Zhao, M. Salman Asif, Zhan Ma

CVPR 2024arXiv:2404.08921
10
citations
#2583

Image Quality Assessment: Investigating Causal Perceptual Effects with Abductive Counterfactual Inference

Wenhao Shen, Mingliang Zhou, Yu Chen et al.

CVPR 2025arXiv:2412.16939
10
citations
#2584

Step Differences in Instructional Video

Tushar Nagarajan, Lorenzo Torresani

CVPR 2024arXiv:2404.16222
10
citations
#2585

SeqAfford: Sequential 3D Affordance Reasoning via Multimodal Large Language Model

Chunlin Yu, Hanqing Wang, Ye Shi et al.

CVPR 2025arXiv:2412.01550
10
citations
#2586

SceneTAP: Scene-Coherent Typographic Adversarial Planner against Vision-Language Models in Real-World Environments

Yue Cao, Yun Xing, Jie Zhang et al.

CVPR 2025arXiv:2412.00114
10
citations
#2587

FisherTune: Fisher-Guided Robust Tuning of Vision Foundation Models for Domain Generalized Segmentation

Dong Zhao, Jinlong Li, Shuang Wang et al.

CVPR 2025arXiv:2503.17940
10
citations
#2588

Language-driven Object Fusion into Neural Radiance Fields with Pose-Conditioned Dataset Updates

Ka Chun SHUM, Jaeyeon Kim, Binh-Son Hua et al.

CVPR 2024arXiv:2309.11281
10
citations
#2589

PELA: Learning Parameter-Efficient Models with Low-Rank Approximation

Yangyang Guo, Guangzhi Wang, Mohan Kankanhalli

CVPR 2024arXiv:2310.10700
10
citations
#2590

Continuous Optical Zooming: A Benchmark for Arbitrary-Scale Image Super-Resolution in Real World

Huiyuan Fu, Fei Peng, Xianwei Li et al.

CVPR 2024
10
citations
#2591

Zero-Painter: Training-Free Layout Control for Text-to-Image Synthesis

Marianna Ohanyan, Hayk Manukyan, Zhangyang Wang et al.

CVPR 2024arXiv:2406.04032
10
citations
#2592

Debiasing Multimodal Large Language Models via Noise-Aware Preference Optimization

zefeng zhang, Hengzhu Tang, Jiawei Sheng et al.

CVPR 2025arXiv:2503.17928
10
citations
#2593

VisionPAD: A Vision-Centric Pre-training Paradigm for Autonomous Driving

Haiming Zhang, Wending Zhou, Shenzhen The Chinese University of Hongkong et al.

CVPR 2025arXiv:2411.14716
10
citations
#2594

Neural Point Cloud Diffusion for Disentangled 3D Shape and Appearance Generation

Philipp Schröppel, Christopher Wewer, Jan Lenssen et al.

CVPR 2024arXiv:2312.14124
10
citations
#2595

Towards Accurate and Robust Architectures via Neural Architecture Search

Yuwei Ou, Yuqi Feng, Yanan Sun

CVPR 2024arXiv:2405.05502
10
citations
#2596

SplatFlow: Self-Supervised Dynamic Gaussian Splatting in Neural Motion Flow Field for Autonomous Driving

Su Sun, Cheng Zhao, Zhuoyang Sun et al.

CVPR 2025highlightarXiv:2411.15482
10
citations
#2597

Flexible Frame Selection for Efficient Video Reasoning

Shyamal Buch, Arsha Nagrani, Anurag Arnab et al.

CVPR 2025
10
citations
#2598

S4-Driver: Scalable Self-Supervised Driving Multimodal Large Language Model with Spatio-Temporal Visual Representation

Yichen Xie, Runsheng Xu, Tong He et al.

CVPR 2025
10
citations
#2599

AAMDM: Accelerated Auto-regressive Motion Diffusion Model

Tianyu Li, Calvin Zhuhan Qiao, Ren Guanqiao et al.

CVPR 2024arXiv:2401.06146
10
citations
#2600

Generating Non-Stationary Textures using Self-Rectification

Yang Zhou, Rongjun Xiao, Dani Lischinski et al.

CVPR 2024arXiv:2401.02847
10
citations