Most Cited CVPR "orientation-aligned 3d generation" Papers

5,589 papers found • Page 8 of 28

#1401

SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories

Muzhi Zhu, Yuzhuo Tian, Hao Chen et al.

CVPR 2025 • poster • arXiv:2503.08625
11
citations
#1402

Privacy-Preserving Optics for Enhancing Protection in Face De-Identification

Jhon Lopez, Carlos Hinojosa, Henry Arguello et al.

CVPR 2024 • poster • arXiv:2404.00777
11
citations
#1403

Tra-MoE: Learning Trajectory Prediction Model from Multiple Domains for Adaptive Policy Conditioning

Jiange Yang, Haoyi Zhu, Yating Wang et al.

CVPR 2025 • poster • arXiv:2411.14519
11
citations
#1404

From Activation to Initialization: Scaling Insights for Optimizing Neural Fields

Hemanth Saratchandran, Sameera Ramasinghe, Simon Lucey

CVPR 2024 • highlight • arXiv:2403.19205
11
citations
#1405

Detecting Backdoor Attacks in Federated Learning via Direction Alignment Inspection

Jiahao Xu, Zikai Zhang, Rui Hu

CVPR 2025 • highlight • arXiv:2503.07978
11
citations
#1406

Robust Depth Enhancement via Polarization Prompt Fusion Tuning

Kei Ikemura, Yiming Huang, Felix Heide et al.

CVPR 2024 • poster • arXiv:2404.04318
11
citations
#1407

GaussianSpa: An “Optimizing-Sparsifying” Simplification Framework for Compact and High-Quality 3D Gaussian Splatting

Yangming Zhang, Wenqi Jia, Wei Niu et al.

CVPR 2025 • poster • arXiv:2411.06019
11
citations
#1408

Rectified Diffusion Guidance for Conditional Generation

Mengfei Xia, Nan Xue, Yujun Shen et al.

CVPR 2025 • poster • arXiv:2410.18737
11
citations
#1409

NeRFPrior: Learning Neural Radiance Field as a Prior for Indoor Scene Reconstruction

Wenyuan Zhang, Emily Yue-ting Jia, Junsheng Zhou et al.

CVPR 2025 • highlight • arXiv:2503.18361
11
citations
#1410

GenPC: Zero-shot Point Cloud Completion via 3D Generative Priors

An Li, Zhe Zhu, Mingqiang Wei

CVPR 2025 • poster • arXiv:2502.19896
11
citations
#1411

Latent Space Super-Resolution for Higher-Resolution Image Generation with Diffusion Models

Jinho Jeong, Sangmin Han, Jinwoo Kim et al.

CVPR 2025 • poster • arXiv:2503.18446
11
citations
#1412

GaussianUDF: Inferring Unsigned Distance Functions through 3D Gaussian Splatting

Shujuan Li, Yu-Shen Liu, Zhizhong Han

CVPR 2025 • highlight • arXiv:2503.19458
11
citations
#1413

BiPer: Binary Neural Networks using a Periodic Function

Edwin Vargas, Claudia Correa, Carlos Hinojosa et al.

CVPR 2024 • poster • arXiv:2404.01278
11
citations
#1414

Bridging the Gap Between End-to-End and Two-Step Text Spotting

Mingxin Huang, Hongliang Li, Yuliang Liu et al.

CVPR 2024 • poster • arXiv:2404.04624
11
citations
#1415

UltrAvatar: A Realistic Animatable 3D Avatar Diffusion Model with Authenticity Guided Textures

Mingyuan Zhou, Rakib Hyder, Ziwei Xuan et al.

CVPR 2024 • poster • arXiv:2401.11078
11
citations
#1416

4DGC: Rate-Aware 4D Gaussian Compression for Efficient Streamable Free-Viewpoint Video

Qiang Hu, Zihan Zheng, Houqiang Zhong et al.

CVPR 2025 • poster • arXiv:2503.18421
11
citations
#1417

DriveGEN: Generalized and Robust 3D Detection in Driving via Controllable Text-to-Image Diffusion Generation

Hongbin Lin, Zilu Guo, Yifan Zhang et al.

CVPR 2025 • poster • arXiv:2503.11122
11
citations
#1418

EVolSplat: Efficient Volume-based Gaussian Splatting for Urban View Synthesis

Sheng Miao, Jiaxin Huang, Dongfeng Bai et al.

CVPR 2025 • poster • arXiv:2503.20168
11
citations
#1419

Towards Understanding and Improving Adversarial Robustness of Vision Transformers

Samyak Jain, Tanima Dutta

CVPR 2024 • poster
11
citations
#1420

TaoAvatar: Real-Time Lifelike Full-Body Talking Avatars for Augmented Reality via 3D Gaussian Splatting

Jianchuan Chen, Jingchuan Hu, Gaige Wang et al.

CVPR 2025 • highlight • arXiv:2503.17032
11
citations
#1421

BlockDance: Reuse Structurally Similar Spatio-Temporal Features to Accelerate Diffusion Transformers

Hui Zhang, Tingwei Gao, Jie Shao et al.

CVPR 2025 • poster • arXiv:2503.15927
11
citations
#1422

NoT: Federated Unlearning via Weight Negation

Yasser Khalil, Leo Maxime Brunswic, Soufiane Lamghari et al.

CVPR 2025 • poster • arXiv:2503.05657
11
citations
#1423

SF2T: Self-supervised Fragment Finetuning of Video-LLMs for Fine-Grained Understanding

Yangliu Hu, Zikai Song, Na Feng et al.

CVPR 2025 • poster • arXiv:2504.07745
11
citations
#1424

SoundingActions: Learning How Actions Sound from Narrated Egocentric Videos

Changan Chen, Kumar Ashutosh, Rohit Girdhar et al.

CVPR 2024 • poster • arXiv:2404.05206
11
citations
#1425

SIGNeRF: Scene Integrated Generation for Neural Radiance Fields

Jan-Niklas Dihlmann, Andreas Engelhardt, Hendrik Lensch

CVPR 2024 • poster • arXiv:2401.01647
11
citations
#1426

VladVA: Discriminative Fine-tuning of LVLMs

Yassine Ouali, Adrian Bulat, Alexandros Xenos et al.

CVPR 2025 • poster • arXiv:2412.04378
11
citations
#1427

TopoCellGen: Generating Histopathology Cell Topology with a Diffusion Model

Meilong Xu, Saumya Gupta, Xiaoling Hu et al.

CVPR 2025 • poster • arXiv:2412.06011
11
citations
#1428

SharpDepth: Sharpening Metric Depth Predictions Using Diffusion Distillation

Duc-Hai Pham, Tung Do, Phong Nguyen et al.

CVPR 2025 • poster • arXiv:2411.18229
11
citations
#1429

Data-Efficient Unsupervised Interpolation Without Any Intermediate Frame for 4D Medical Images

JungEun Kim, Hangyul Yoon, Geondo Park et al.

CVPR 2024 • poster • arXiv:2404.01464
11
citations
#1430

Data Synthesis with Diverse Styles for Face Recognition via 3DMM-Guided Diffusion

Yuxi Mi, Zhizhou Zhong, Yuge Huang et al.

CVPR 2025 • poster • arXiv:2504.00430
11
citations
#1431

Semantic and Sequential Alignment for Referring Video Object Segmentation

Feiyu Pan, Hao Fang, Fangkai Li et al.

CVPR 2025 • poster
11
citations
#1432

StraightPCF: Straight Point Cloud Filtering

Dasith de Silva Edirimuni, Xuequan Lu, Gang Li et al.

CVPR 2024 • poster • arXiv:2405.08322
11
citations
#1433

SwiftEdit: Lightning Fast Text-Guided Image Editing via One-Step Diffusion

Trong-Tung Nguyen, Quang Nguyen, Khoi Nguyen et al.

CVPR 2025 • poster • arXiv:2412.04301
11
citations
#1434

Finsler-Laplace-Beltrami Operators with Application to Shape Analysis

Simon Weber, Thomas Dagès, Maolin Gao et al.

CVPR 2024 • poster • arXiv:2404.03999
11
citations
#1435

Multi-view Reconstruction via SfM-guided Monocular Depth Estimation

Haoyu Guo, He Zhu, Sida Peng et al.

CVPR 2025 • poster • arXiv:2503.14483
11
citations
#1436

SyncMask: Synchronized Attentional Masking for Fashion-centric Vision-Language Pretraining

Chull Hwan Song, Taebaek Hwang, Jooyoung Yoon et al.

CVPR 2024 • poster • arXiv:2404.01156
11
citations
#1437

Horizon-GS: Unified 3D Gaussian Splatting for Large-Scale Aerial-to-Ground Scenes

Lihan Jiang, Kerui Ren, Mulin Yu et al.

CVPR 2025 • poster • arXiv:2412.01745
11
citations
#1438

MIMO: A Medical Vision Language Model with Visual Referring Multimodal Input and Pixel Grounding Multimodal Output

Yanyuan Chen, Dexuan Xu, Yu Huang et al.

CVPR 2025 • poster • arXiv:2510.10011
10
citations
#1439

DreamRelation: Bridging Customization and Relation Generation

Qingyu Shi, Lu Qi, Jianzong Wu et al.

CVPR 2025 • poster • arXiv:2410.23280
10
citations
#1440

Towards Training-free Anomaly Detection with Vision and Language Foundation Models

Jinjin Zhang, Guodong Wang, Yizhou Jin et al.

CVPR 2025 • poster • arXiv:2503.18325
10
citations
#1441

Flexible Frame Selection for Efficient Video Reasoning

Shyamal Buch, Arsha Nagrani, Anurag Arnab et al.

CVPR 2025 • poster
10
citations
#1442

Zero-Painter: Training-Free Layout Control for Text-to-Image Synthesis

Marianna Ohanyan, Hayk Manukyan, Zhangyang Wang et al.

CVPR 2024 • poster • arXiv:2406.04032
10
citations
#1443

S4-Driver: Scalable Self-Supervised Driving Multimodal Large Language Model with Spatio-Temporal Visual Representation

Yichen Xie, Runsheng Xu, Tong He et al.

CVPR 2025 • poster
10
citations
#1444

OrthCaps: An Orthogonal CapsNet with Sparse Attention Routing and Pruning

Geng Xinyu, Jiaming Wang, Jiawei Gong et al.

CVPR 2024 • poster • arXiv:2403.13351
10
citations
#1445

LUWA Dataset: Learning Lithic Use-Wear Analysis on Microscopic Images

Jing Zhang, Irving Fang, Hao Wu et al.

CVPR 2024 • highlight • arXiv:2403.13171
10
citations
#1446

HIIF: Hierarchical Encoding based Implicit Image Function for Continuous Super-resolution

Yuxuan Jiang, Ho Man Kwan, Jasmine Peng et al.

CVPR 2025 • poster • arXiv:2412.03748
10
citations
#1447

Unsigned Orthogonal Distance Fields: An Accurate Neural Implicit Representation for Diverse 3D Shapes

YuJie Lu, Long Wan, Nayu Ding et al.

CVPR 2024 • poster • arXiv:2403.01414
10
citations
#1448

PatchDPO: Patch-level DPO for Finetuning-free Personalized Image Generation

Qihan Huang, Weilong Dai, Jinlong Liu et al.

CVPR 2025 • poster • arXiv:2412.03177
10
citations
#1449

Forensics-Bench: A Comprehensive Forgery Detection Benchmark Suite for Large Vision Language Models

Jin Wang, Chenghui Lv, Xian Li et al.

CVPR 2025 • poster • arXiv:2503.15024
10
citations
#1450

ID-Patch: Robust ID Association for Group Photo Personalization

Yimeng Zhang, Tiancheng Zhi, Jing Liu et al.

CVPR 2025 • poster • arXiv:2411.13632
10
citations
#1451

CLIP is Strong Enough to Fight Back: Test-time Counterattacks towards Zero-shot Adversarial Robustness of CLIP

Songlong Xing, Zhengyu Zhao, Nicu Sebe

CVPR 2025 • poster • arXiv:2503.03613
10
citations
#1452

Improving Visual Recognition with Hyperbolical Visual Hierarchy Mapping

Hyeongjun Kwon, Jinhyun Jang, Jin Kim et al.

CVPR 2024 • poster • arXiv:2404.00974
10
citations
#1453

Towards Accurate and Robust Architectures via Neural Architecture Search

Yuwei Ou, Yuqi Feng, Yanan Sun

CVPR 2024 • poster • arXiv:2405.05502
10
citations
#1454

Anchor-based Robust Finetuning of Vision-Language Models

Jinwei Han, Zhiwen Lin, Zhongyisun Sun et al.

CVPR 2024 • poster • arXiv:2404.06244
10
citations
#1455

RayFlow: Instance-Aware Diffusion Acceleration via Adaptive Flow Trajectories

Huiyang Shao, Xin Xia, Yuhong Yang et al.

CVPR 2025 • poster • arXiv:2503.07699
10
citations
#1456

Generalized Recorrupted-to-Recorrupted: Self-Supervised Learning Beyond Gaussian Noise

Brayan Monroy, Jorge Bacca, Julián Tachella

CVPR 2025 • poster • arXiv:2412.04648
10
citations
#1457

ResCLIP: Residual Attention for Training-free Dense Vision-language Inference

Jinhong Deng, Yuhang Yang, Wen Li et al.

CVPR 2025 • poster • arXiv:2411.15851
10
citations
#1458

Gaussian Eigen Models for Human Heads

Wojciech Zielonka, Timo Bolkart, Thabo Beeler et al.

CVPR 2025 • poster • arXiv:2407.04545
10
citations
#1459

Continuous Optical Zooming: A Benchmark for Arbitrary-Scale Image Super-Resolution in Real World

Huiyuan Fu, Fei Peng, Xianwei Li et al.

CVPR 2024 • poster
10
citations
#1460

PLeaS - Merging Models with Permutations and Least Squares

Anshul Nasery, Jonathan Hayase, Pang Wei Koh et al.

CVPR 2025 • poster • arXiv:2407.02447
10
citations
#1461

CL-MoE: Enhancing Multimodal Large Language Model with Dual Momentum Mixture-of-Experts for Continual Visual Question Answering

Tianyu Huai, Jie Zhou, Xingjiao Wu et al.

CVPR 2025 • highlight • arXiv:2503.00413
10
citations
#1462

Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering

Federico Cocchi, Nicholas Moratelli, Marcella Cornia et al.

CVPR 2025 • poster • arXiv:2411.16863
10
citations
#1463

EDEN: Enhanced Diffusion for High-quality Large-motion Video Frame Interpolation

Zihao Zhang, Haoran Chen, Haoyu Zhao et al.

CVPR 2025 • poster • arXiv:2503.15831
10
citations
#1464

Temporal Separation with Entropy Regularization for Knowledge Distillation in Spiking Neural Networks

Kairong Yu, Chengting Yu, Tianqing Zhang et al.

CVPR 2025 • poster • arXiv:2503.03144
10
citations
#1465

DocVLM: Make Your VLM an Efficient Reader

Mor Shpigel Nacson, Aviad Aberdam, Roy Ganz et al.

CVPR 2025 • poster • arXiv:2412.08746
10
citations
#1466

RGBAvatar: Reduced Gaussian Blendshapes for Online Modeling of Head Avatars

Linzhou Li, Yumeng Li, Yanlin Weng et al.

CVPR 2025 • highlight • arXiv:2503.12886
10
citations
#1467

V2X-R: Cooperative LiDAR-4D Radar Fusion with Denoising Diffusion for 3D Object Detection

Xun Huang, Jinlong Wang, Qiming Xia et al.

CVPR 2025 • poster • arXiv:2411.08402
10
citations
#1468

Mind The Edge: Refining Depth Edges in Sparsely-Supervised Monocular Depth Estimation

Lior Talker, Aviad Cohen, Erez Yosef et al.

CVPR 2024 • poster • arXiv:2212.05315
10
citations
#1469

PCDreamer: Point Cloud Completion Through Multi-view Diffusion Priors

Guangshun Wei, Yuan Feng, Long Ma et al.

CVPR 2025 • poster • arXiv:2411.19036
10
citations
#1470

Trajectory Mamba: Efficient Attention-Mamba Forecasting Model Based on Selective SSM

Yizhou Huang, Yihua Cheng, Kezhi Wang

CVPR 2025 • poster • arXiv:2503.10898
10
citations
#1471

DELT: A Simple Diversity-driven EarlyLate Training for Dataset Distillation

Zhiqiang Shen, Ammar Sherif, Zeyuan Yin et al.

CVPR 2025 • poster • arXiv:2411.19946
10
citations
#1472

Benchmarking Implicit Neural Representation and Geometric Rendering in Real-Time RGB-D SLAM

Tongyan Hua, Addison Lin Wang

CVPR 2024 • poster • arXiv:2403.19473
10
citations
#1473

VTON 360: High-Fidelity Virtual Try-On from Any Viewing Direction

Zijian He, Yuwei Ning, Yipeng Qin et al.

CVPR 2025 • poster • arXiv:2503.12165
10
citations
#1474

Synergistic Global-space Camera and Human Reconstruction from Videos

Yizhou Zhao, Tuanfeng Y. Wang, Bhiksha Raj et al.

CVPR 2024 • poster • arXiv:2405.14855
10
citations
#1475

Spectrum AUC Difference (SAUCD): Human-aligned 3D Shape Evaluation

Tianyu Luan, Zhong Li, Lele Chen et al.

CVPR 2024 • poster • arXiv:2403.01619
10
citations
#1476

Hybrid Global-Local Representation with Augmented Spatial Guidance for Zero-Shot Referring Image Segmentation

Ting Liu, Siyuan Li

CVPR 2025 • poster • arXiv:2504.00356
10
citations
#1477

Video Summarization with Large Language Models

Min Jung Lee, Dayoung Gong, Minsu Cho

CVPR 2025 • poster • arXiv:2504.11199
10
citations
#1478

Generalized Few-shot 3D Point Cloud Segmentation with Vision-Language Model

Zhaochong An, Guolei Sun, Yun Liu et al.

CVPR 2025 • poster • arXiv:2503.16282
10
citations
#1479

Reconstructing People, Places, and Cameras

Lea Müller, Hongsuk Choi, Anthony Zhang et al.

CVPR 2025 • highlight • arXiv:2412.17806
10
citations
#1480

ReCap: Better Gaussian Relighting with Cross-Environment Captures

Jingzhi Li, Zongwei Wu, Eduard Zamfir et al.

CVPR 2025 • poster • arXiv:2412.07534
10
citations
#1481

BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation

Yuyang Peng, Shishi Xiao, Keming Wu et al.

CVPR 2025 • poster • arXiv:2503.20672
10
citations
#1482

Layered Image Vectorization via Semantic Simplification

Zhenyu Wang, Jianxi Huang, Zhida Sun et al.

CVPR 2025 • poster • arXiv:2406.05404
10
citations
#1483

Open-World Amodal Appearance Completion

Jiayang Ao, Yanbei Jiang, Qiuhong Ke et al.

CVPR 2025 • poster • arXiv:2411.13019
10
citations
#1484

InsightEdit: Towards Better Instruction Following for Image Editing

Yingjing Xu, Jie Kong, Jiazhi Wang et al.

CVPR 2025 • poster • arXiv:2411.17323
10
citations
#1485

Task-Agnostic Guided Feature Expansion for Class-Incremental Learning

Bowen Zheng, Da-Wei Zhou, Han-Jia Ye et al.

CVPR 2025 • poster • arXiv:2503.00823
10
citations
#1486

Chain of Attack: On the Robustness of Vision-Language Models Against Transfer-Based Adversarial Attacks

Peng Xie, Yequan Bie, Jianda Mao et al.

CVPR 2025 • poster • arXiv:2411.15720
10
citations
#1487

CCIN: Compositional Conflict Identification and Neutralization for Composed Image Retrieval

Likai Tian, Jian Zhao, Zechao Hu et al.

CVPR 2025 • highlight
10
citations
#1488

VTON-HandFit: Virtual Try-on for Arbitrary Hand Pose Guided by Hand Priors Embedding

Yujie Liang, Xiaobin Hu, Boyuan Jiang et al.

CVPR 2025 • poster • arXiv:2408.12340
10
citations
#1489

SpatialDreamer: Self-supervised Stereo Video Synthesis from Monocular Input

Zhen Lv, Yangqi Long, Congzhentao Huang et al.

CVPR 2025 • poster • arXiv:2411.11934
10
citations
#1490

Data-Efficient Multimodal Fusion on a Single GPU

Noël Vouitsis, Zhaoyan Liu, Satya Krishna Gorti et al.

CVPR 2024 • highlight • arXiv:2312.10144
10
citations
#1491

VisionPAD: A Vision-Centric Pre-training Paradigm for Autonomous Driving

Haiming Zhang, Wending Zhou et al.

CVPR 2025 • poster • arXiv:2411.14716
10
citations
#1492

ContextSeg: Sketch Semantic Segmentation by Querying the Context with Attention

Jiawei Wang, Changjian Li

CVPR 2024 • poster • arXiv:2311.16682
10
citations
#1493

Integrating Efficient Optimal Transport and Functional Maps For Unsupervised Shape Correspondence Learning

Tung Le, Khai Nguyen, Shanlin Sun et al.

CVPR 2024 • poster • arXiv:2403.01781
10
citations
#1494

SpikeNeRF: Learning Neural Radiance Fields from Continuous Spike Stream

Lin Zhu, Kangmin Jia, Yifan Zhao et al.

CVPR 2024 • poster • arXiv:2403.11222
10
citations
#1495

SplatFlow: Self-Supervised Dynamic Gaussian Splatting in Neural Motion Flow Field for Autonomous Driving

Su Sun, Cheng Zhao, Zhuoyang Sun et al.

CVPR 2025 • highlight • arXiv:2411.15482
10
citations
#1496

HOIGPT: Learning Long-Sequence Hand-Object Interaction with Language Models

Mingzhen Huang, Fu-Jen Chu, Bugra Tekin et al.

CVPR 2025 • poster • arXiv:2503.19157
10
citations
#1497

MemoNav: Working Memory Model for Visual Navigation

Hongxin Li, Zeyu Wang, Xu Yang et al.

CVPR 2024 • highlight • arXiv:2402.19161
10
citations
#1498

GUI-Xplore: Empowering Generalizable GUI Agents with One Exploration

Yuchen Sun, Shanhui Zhao, Tao Yu et al.

CVPR 2025 • poster • arXiv:2503.17709
10
citations
#1499

BSNet: Box-Supervised Simulation-assisted Mean Teacher for 3D Instance Segmentation

Jiahao Lu, Jiacheng Deng, Tianzhu Zhang

CVPR 2024 • poster • arXiv:2403.15019
10
citations
#1500

Extrapolating and Decoupling Image-to-Video Generation Models: Motion Modeling is Easier Than You Think

Zhenyi Lu, Xiaoye Qu, Zhenyi Lu et al.

CVPR 2025 • highlight • arXiv:2503.00948
10
citations
#1501

Distilling ODE Solvers of Diffusion Models into Smaller Steps

Sanghwan Kim, Hao Tang, Fisher Yu

CVPR 2024 • poster • arXiv:2309.16421
10
citations
#1502

Learning from One Continuous Video Stream

Joao Carreira, Michael King, Viorica Patraucean et al.

CVPR 2024 • poster • arXiv:2312.00598
10
citations
#1503

Smart Help: Strategic Opponent Modeling for Proactive and Adaptive Robot Assistance in Households

Zhihao Cao, ZiDong Wang, Siwen Xie et al.

CVPR 2024 • poster • arXiv:2404.09001
10
citations
#1504

Point2RBox-v2: Rethinking Point-supervised Oriented Object Detection with Spatial Layout Among Instances

Yi Yu, Botao Ren, Peiyuan Zhang et al.

CVPR 2025 • poster • arXiv:2502.04268
10
citations
#1505

Tuning the Frequencies: Robust Training for Sinusoidal Neural Networks

Tiago Novello, Diana Aldana Moreno, André Araujo et al.

CVPR 2025 • highlight • arXiv:2407.21121
10
citations
#1506

Dual-Scale Transformer for Large-Scale Single-Pixel Imaging

Gang Qu, Ping Wang, Xin Yuan

CVPR 2024 • poster • arXiv:2404.05001
10
citations
#1507

Efficient Privacy-Preserving Visual Localization Using 3D Ray Clouds

Heejoon Moon, Chunghwan Lee, Je Hyeong Hong

CVPR 2024 • poster
10
citations
#1508

EnergyMoGen: Compositional Human Motion Generation with Energy-Based Diffusion Model in Latent Space

Jianrong Zhang, Hehe Fan, Yi Yang

CVPR 2025 • highlight • arXiv:2412.14706
10
citations
#1509

Towards Automated Movie Trailer Generation

Dawit Argaw Argaw, Mattia Soldan, Alejandro Pardo et al.

CVPR 2024 • poster • arXiv:2404.03477
10
citations
#1510

LTA-PCS: Learnable Task-Agnostic Point Cloud Sampling

Jiaheng Liu, Jianhao Li, Kaisiyuan Wang et al.

CVPR 2024 • poster
10
citations
#1511

Disentangled Pre-training for Human-Object Interaction Detection

Zhuolong Li, Xingao Li, Changxing Ding et al.

CVPR 2024 • poster • arXiv:2404.01725
10
citations
#1512

Zero-Shot Structure-Preserving Diffusion Model for High Dynamic Range Tone Mapping

Ruoxi Zhu, Shusong Xu, Peiye Liu et al.

CVPR 2024 • highlight
10
citations
#1513

BASKET: A Large-Scale Video Dataset for Fine-Grained Skill Estimation

Yulu Pan, Ce Zhang, Gedas Bertasius

CVPR 2025 • poster • arXiv:2503.20781
10
citations
#1514

SIRA: Scalable Inter-frame Relation and Association for Radar Perception

Ryoma Yataka, Pu Wang, Petros Boufounos et al.

CVPR 2024 • poster • arXiv:2411.02220
10
citations
#1515

SHiNe: Semantic Hierarchy Nexus for Open-vocabulary Object Detection

Mingxuan Liu, Tyler Hayes, Elisa Ricci et al.

CVPR 2024 • highlight • arXiv:2405.10053
10
citations
#1516

SaCo Loss: Sample-wise Affinity Consistency for Vision-Language Pre-training

Wu Sitong, Haoru Tan, Zhuotao Tian et al.

CVPR 2024 • poster
10
citations
#1517

Characteristics Matching Based Hash Codes Generation for Efficient Fine-grained Image Retrieval

Zhen-Duo Chen, Li-Jun Zhao, Zi-Chao Zhang et al.

CVPR 2024 • poster
10
citations
#1518

GeneAvatar: Generic Expression-Aware Volumetric Head Avatar Editing from a Single Image

Chong Bao, Yinda Zhang, Yuan Li et al.

CVPR 2024 • poster • arXiv:2404.02152
10
citations
#1519

Neural Point Cloud Diffusion for Disentangled 3D Shape and Appearance Generation

Philipp Schröppel, Christopher Wewer, Jan Lenssen et al.

CVPR 2024 • poster • arXiv:2312.14124
10
citations
#1520

Diff2Flow: Training Flow Matching Models via Diffusion Model Alignment

Johannes Schusterbauer, Ming Gui, Frank Fundel et al.

CVPR 2025 • poster • arXiv:2506.02221
10
citations
#1521

YolOOD: Utilizing Object Detection Concepts for Multi-Label Out-of-Distribution Detection

Alon Zolfi, Guy Amit, Amit Baras et al.

CVPR 2024 • poster • arXiv:2212.02081
10
citations
#1522

Single Mesh Diffusion Models with Field Latents for Texture Generation

Thomas W. Mitchel, Carlos Esteves, Ameesh Makadia

CVPR 2024 • poster • arXiv:2312.09250
10
citations
#1523

Text-guided Explorable Image Super-resolution

Kanchana Vaishnavi Gandikota, Paramanand Chandramouli

CVPR 2024 • poster • arXiv:2403.01124
10
citations
#1524

STCOcc: Sparse Spatial-Temporal Cascade Renovation for 3D Occupancy and Scene Flow Prediction

Zhimin Liao, Ping Wei, Shuaijia Chen et al.

CVPR 2025 • poster • arXiv:2504.19749
10
citations
#1525

FSC: Few-point Shape Completion

Xianzu Wu, Xianfeng Wu, Tianyu Luan et al.

CVPR 2024 • poster • arXiv:2403.07359
10
citations
#1526

FedMIA: An Effective Membership Inference Attack Exploiting "All for One" Principle in Federated Learning

Gongxi Zhu, Donghao Li, Hanlin Gu et al.

CVPR 2025 • poster
10
citations
#1527

Flowing from Words to Pixels: A Noise-Free Framework for Cross-Modality Evolution

Qihao Liu, Xi Yin, Alan L. Yuille et al.

CVPR 2025 • highlight • arXiv:2412.15213
10
citations
#1528

CADDreamer: CAD Object Generation from Single-view Images

Yuan Li, Cheng Lin, Yuan Liu et al.

CVPR 2025 • highlight • arXiv:2502.20732
10
citations
#1529

Step Differences in Instructional Video

Tushar Nagarajan, Lorenzo Torresani

CVPR 2024 • poster • arXiv:2404.16222
10
citations
#1530

ObjectMover: Generative Object Movement with Video Prior

Xin Yu, Tianyu Wang, Soo Ye Kim et al.

CVPR 2025 • poster • arXiv:2503.08037
10
citations
#1531

STOP: Integrated Spatial-Temporal Dynamic Prompting for Video Understanding

Zichen Liu, Kunlun Xu, Bing Su et al.

CVPR 2025 • poster • arXiv:2503.15973
10
citations
#1532

A Theory of Joint Light and Heat Transport for Lambertian Scenes

Mani Ramanagopal, Sriram Narayanan, Aswin C. Sankaranarayanan et al.

CVPR 2024 • poster
10
citations
#1533

DocLayLLM: An Efficient Multi-modal Extension of Large Language Models for Text-rich Document Understanding

Wenhui Liao, Jiapeng Wang, Hongliang Li et al.

CVPR 2025 • poster • arXiv:2408.15045
10
citations
#1534

Splatter-360: Generalizable 360 Gaussian Splatting for Wide-baseline Panoramic Images

Zheng Chen, Chenming Wu, Zhelun Shen et al.

CVPR 2025 • poster
10
citations
#1535

Imagine and Seek: Improving Composed Image Retrieval with an Imagined Proxy

You Li, Fan Ma, Yi Yang

CVPR 2025 • poster • arXiv:2411.16752
10
citations
#1536

CholecTrack20: A Multi-Perspective Tracking Dataset for Surgical Tools

Chinedu Innocent Nwoye, Kareem Elgohary, Anvita A. Srinivas et al.

CVPR 2025 • poster • arXiv:2312.07352
9
citations
#1537

LoRA Subtraction for Drift-Resistant Space in Exemplar-Free Continual Learning

Xuan Liu, Xiaobin Chang

CVPR 2025 • poster • arXiv:2503.18985
9
citations
#1538

One More Step: A Versatile Plug-and-Play Module for Rectifying Diffusion Schedule Flaws and Enhancing Low-Frequency Controls

Minghui Hu, Jianbin Zheng, Chuanxia Zheng et al.

CVPR 2024 • poster • arXiv:2311.15744
9
citations
#1539

SceneTAP: Scene-Coherent Typographic Adversarial Planner against Vision-Language Models in Real-World Environments

Yue Cao, Yun Xing, Jie Zhang et al.

CVPR 2025 • poster • arXiv:2412.00114
9
citations
#1540

ProMotion: Prototypes As Motion Learners

Yawen Lu, Dongfang Liu, Qifan Wang et al.

CVPR 2024 • poster • arXiv:2406.04999
9
citations
#1541

Seurat: From Moving Points to Depth

Seokju Cho, Gabriel Huang, Seungryong Kim et al.

CVPR 2025 • highlight • arXiv:2504.14687
9
citations
#1542

Distilling Monocular Foundation Model for Fine-grained Depth Completion

Yingping Liang, Yutao Hu, Wenqi Shao et al.

CVPR 2025 • poster • arXiv:2503.16970
9
citations
#1543

Fusing Personal and Environmental Cues for Identification and Segmentation of First-Person Camera Wearers in Third-Person Views

Ziwei Zhao, Yuchen Wang, Chuhua Wang

CVPR 2024 • poster
9
citations
#1544

GLASS: Guided Latent Slot Diffusion for Object-Centric Learning

Krishnakant Singh, Simone Schaub-Meyer, Stefan Roth

CVPR 2025 • poster • arXiv:2407.17929
9
citations
#1545

Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos

Sagnik Majumder, Ziad Al-Halah, Kristen Grauman

CVPR 2024 • poster • arXiv:2307.04760
9
citations
#1546

Bilateral Event Mining and Complementary for Event Stream Super-Resolution

Zhilin Huang, Quanmin Liang, Yijie Yu et al.

CVPR 2024 • poster • arXiv:2405.10037
9
citations
#1547

SuperPC: A Single Diffusion Model for Point Cloud Completion, Upsampling, Denoising, and Colorization

Yi Du, Zhipeng Zhao, Shaoshu Su et al.

CVPR 2025 • poster • arXiv:2503.14558
9
citations
#1548

From Poses to Identity: Training-Free Person Re-Identification via Feature Centralization

Chao Yuan, Guiwei Zhang, Changxiao Ma et al.

CVPR 2025 • poster • arXiv:2503.00938
9
citations
#1549

Neural Video Compression with Context Modulation

Chuanbo Tang, Zhuoyuan Li, Yifan Bian et al.

CVPR 2025 • poster • arXiv:2505.14541
9
citations
#1550

MUST: The First Dataset and Unified Framework for Multispectral UAV Single Object Tracking

Haolin Qin, Tingfa Xu, Tianhao Li et al.

CVPR 2025 • poster • arXiv:2503.17699
9
citations
#1551

PreciseCam: Precise Camera Control for Text-to-Image Generation

Edurne Bernal-Berdun, Ana Serrano, Belen Masia et al.

CVPR 2025 • poster • arXiv:2501.12910
9
citations
#1552

An End-to-End Robust Point Cloud Semantic Segmentation Network with Single-Step Conditional Diffusion Models

Wentao Qu, Jing Wang, Yongshun Gong et al.

CVPR 2025 • poster • arXiv:2411.16308
9
citations
#1553

Online Video Understanding: OVBench and VideoChat-Online

Zhenpeng Huang, Xinhao Li, Jiaqi Li et al.

CVPR 2025 • poster • arXiv:2501.00584
9
citations
#1554

A Unified and Interpretable Emotion Representation and Expression Generation

Reni Paskaleva, Mykyta Holubakha, Andela Ilic et al.

CVPR 2024 • poster • arXiv:2404.01243
9
citations
#1555

TraF-Align: Trajectory-aware Feature Alignment for Asynchronous Multi-agent Perception

Zhiying Song, Lei Yang, Fuxi Wen et al.

CVPR 2025 • poster • arXiv:2503.19391
9
citations
#1556

Making Visual Sense of Oracle Bones for You and Me

Runqi Qiao, Lan Yang, Kaiyue Pang et al.

CVPR 2024 • poster
9
citations
#1557

SILMM: Self-Improving Large Multimodal Models for Compositional Text-to-Image Generation

Leigang Qu, Haochuan Li, Wenjie Wang et al.

CVPR 2025 • poster • arXiv:2412.05818
9
citations
#1558

KeyFace: Expressive Audio-Driven Facial Animation for Long Sequences via KeyFrame Interpolation

Antoni Bigata Casademunt, Michał Stypułkowski, Rodrigo Mira et al.

CVPR 2025 • poster • arXiv:2503.01715
9
citations
#1559

HiMoR: Monocular Deformable Gaussian Reconstruction with Hierarchical Motion Representation

Yiming Liang, Tianhan Xu, Yuta Kikuchi

CVPR 2025 • poster • arXiv:2504.06210
9
citations
#1560

Towards More Accurate Diffusion Model Acceleration with A Timestep Tuner

Mengfei Xia, Yujun Shen, Changsong Lei et al.

CVPR 2024 • poster • arXiv:2310.09469
9
citations
#1561

Advancing Myopia To Holism: Fully Contrastive Language-Image Pre-training

Haicheng Wang, Chen Ju, Weixiong Lin et al.

CVPR 2025 • poster • arXiv:2412.00440
9
citations
#1562

Semantics-aware Motion Retargeting with Vision-Language Models

Haodong Zhang, ZhiKe Chen, Haocheng Xu et al.

CVPR 2024 • poster • arXiv:2312.01964
9
citations
#1563

MAP: Unleashing Hybrid Mamba-Transformer Vision Backbone's Potential with Masked Autoregressive Pretraining

Yunze Liu, Li Yi

CVPR 2025 • poster • arXiv:2410.00871
9
citations
#1564

LITA-GS: Illumination-Agnostic Novel View Synthesis via Reference-Free 3D Gaussian Splatting and Physical Priors

Han Zhou, Wei Dong, Jun Chen

CVPR 2025 • poster • arXiv:2504.00219
9
citations
#1565

Coarse Correspondences Boost Spatial-Temporal Reasoning in Multimodal Language Model

Benlin Liu, Yuhao Dong, Yiqin Wang et al.

CVPR 2025 • poster • arXiv:2408.00754
9
citations
#1566

CDFormer: When Degradation Prediction Embraces Diffusion Model for Blind Image Super-Resolution

Qingguo Liu, Chenyi Zhuang, Pan Gao et al.

CVPR 2024 • poster
9
citations
#1567

Knowledge-Aligned Counterfactual-Enhancement Diffusion Perception for Unsupervised Cross-Domain Visual Emotion Recognition

Wen Yin, Yong Wang, Guiduo Duan et al.

CVPR 2025 • poster • arXiv:2505.19694
9
citations
#1568

CoA: Towards Real Image Dehazing via Compression-and-Adaptation

Long Ma, Yuxin Feng, Yan Zhang et al.

CVPR 2025 • poster • arXiv:2504.05590
9
citations
#1569

UnCommon Objects in 3D

Xingchen Liu, Piyush Tayal, Jianyuan Wang et al.

CVPR 2025 • poster • arXiv:2501.07574
9
citations
#1570

High-Quality Facial Geometry and Appearance Capture at Home

Yuxuan Han, Junfeng Lyu, Feng Xu

CVPR 2024 • poster • arXiv:2312.03442
9
citations
#1571

Physical Plausibility-aware Trajectory Prediction via Locomotion Embodiment

Hiromu Taketsugu, Takeru Oba, Takahiro Maeda et al.

CVPR 2025 • poster • arXiv:2503.17267
9
citations
#1572

DreamText: High Fidelity Scene Text Synthesis

Yibin Wang, Weizhong Zhang, Honghui Xu et al.

CVPR 2025 • poster • arXiv:2405.14701
9
citations
#1573

Rethinking Query-based Transformer for Continual Image Segmentation

Yuchen Zhu, Cheng Shi, Dingyou Wang et al.

CVPR 2025 • poster • arXiv:2507.07831
9
citations
#1574

Anyattack: Towards Large-scale Self-supervised Adversarial Attacks on Vision-language Models

Jiaming Zhang, Junhong Ye, Xingjun Ma et al.

CVPR 2025 • poster • arXiv:2410.05346
9
citations
#1575

Towards Transformer-Based Aligned Generation with Self-Coherence Guidance

Shulei Wang, Wang Lin, Hai Huang et al.

CVPR 2025 • poster • arXiv:2503.17675
9
citations
#1576

FairDeDup: Detecting and Mitigating Vision-Language Fairness Disparities in Semantic Dataset Deduplication

Eric Slyman, Stefan Lee, Scott Cohen et al.

CVPR 2024 • poster • arXiv:2404.16123
9
citations
#1577

EventGPT: Event Stream Understanding with Multimodal Large Language Models

Shaoyu Liu, Jianing Li, Guanghui Zhao et al.

CVPR 2025 • poster • arXiv:2412.00832
9
citations
#1578

Turb-Seg-Res: A Segment-then-Restore Pipeline for Dynamic Videos with Atmospheric Turbulence

Ripon Saha, Dehao Qin, Nianyi Li et al.

CVPR 2024 • poster • arXiv:2404.13605
9
citations
#1579

Dual Prompting Image Restoration with Diffusion Transformers

Dehong Kong, Fan Li, Zhixin Wang et al.

CVPR 2025 • poster • arXiv:2504.17825
9
citations
#1580

SAM2Object: Consolidating View Consistency via SAM2 for Zero-Shot 3D Instance Segmentation

Jihuai Zhao, Junbao Zhuo, Jiansheng Chen et al.

CVPR 2025 • poster
9
citations
#1581

DepthCues: Evaluating Monocular Depth Perception in Large Vision Models

Duolikun Danier, Mehmet Aygun, Changjian Li et al.

CVPR 2025 • poster • arXiv:2411.17385
9
citations
#1582

TurboSL: Dense Accurate and Fast 3D by Neural Inverse Structured Light

Parsa Mirdehghan, Maxx Wu, Wenzheng Chen et al.

CVPR 2024 • poster
9
citations
#1583

Clockwork Diffusion: Efficient Generation With Model-Step Distillation

Amirhossein Habibian, Amir Ghodrati, Noor Fathima et al.

CVPR 2024 • highlight • arXiv:2312.08128
9
citations
#1584

Mimic In-Context Learning for Multimodal Tasks

Yuchu Jiang, Jiale Fu, Chenduo Hao et al.

CVPR 2025 • poster • arXiv:2504.08851
9
citations
#1585

Memory-Scalable and Simplified Functional Map Learning

Robin Magnet, Maks Ovsjanikov

CVPR 2024 • poster • arXiv:2404.00330
9
citations
#1586

VERA: Explainable Video Anomaly Detection via Verbalized Learning of Vision-Language Models

Muchao Ye, Weiyang Liu, Pan He

CVPR 2025 • poster • arXiv:2412.01095
9
citations
#1587

MP-SfM: Monocular Surface Priors for Robust Structure-from-Motion

Zador Pataki, Paul-Edouard Sarlin, Johannes Schönberger et al.

CVPR 2025 • poster • arXiv:2504.20040
9
citations
#1588

ReVisionLLM: Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos

Tanveer Hannan, Md Mohaiminul Islam, Jindong Gu et al.

CVPR 2025 • poster • arXiv:2411.14901
9
citations
#1589

FisherTune: Fisher-Guided Robust Tuning of Vision Foundation Models for Domain Generalized Segmentation

Dong Zhao, Jinlong Li, Shuang Wang et al.

CVPR 2025 • poster • arXiv:2503.17940
9
citations
#1590

GenDeg: Diffusion-based Degradation Synthesis for Generalizable All-In-One Image Restoration

Sudarshan Rajagopalan, Nithin Gopalakrishnan Nair, Jay Paranjape et al.

CVPR 2025 • poster • arXiv:2411.17687
9
citations
#1591

Bayesian Test-Time Adaptation for Vision-Language Models

Lihua Zhou, Mao Ye, Shuaifeng Li et al.

CVPR 2025 • poster • arXiv:2503.09248
9
citations
#1592

AAMDM: Accelerated Auto-regressive Motion Diffusion Model

Tianyu Li, Calvin Zhuhan Qiao, Ren Guanqiao et al.

CVPR 2024 • poster • arXiv:2401.06146
9
citations
#1593

A Backpack Full of Skills: Egocentric Video Understanding with Diverse Task Perspectives

Simone Alberto Peirone, Francesca Pistilli, Antonio Alliegro et al.

CVPR 2024 • poster • arXiv:2403.03037
9
citations
#1594

C3Net: Compound Conditioned ControlNet for Multimodal Content Generation

Juntao Zhang, Yuehuai Liu, Yu-Wing Tai et al.

CVPR 2024 • poster • arXiv:2311.17951
9
citations
#1595

MOVIS: Enhancing Multi-Object Novel View Synthesis for Indoor Scenes

Ruijie Lu, Yixin Chen, Junfeng Ni et al.

CVPR 2025 • poster • arXiv:2412.11457
9
citations
#1596

Video-ColBERT: Contextualized Late Interaction for Text-to-Video Retrieval

Arun Reddy, Alexander Martin, Eugene Yang et al.

CVPR 2025 • poster • arXiv:2503.19009
9
citations
#1597

Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks

Miran Heo, Min-Hung Chen, De-An Huang et al.

CVPR 2025 • poster • arXiv:2501.08326
9
citations
#1598

Post-pre-training for Modality Alignment in Vision-Language Foundation Models

Shin'ya Yamaguchi, Dewei Feng, Sekitoshi Kanai et al.

CVPR 2025 • poster • arXiv:2504.12717
9
citations
#1599

Naturally Supervised 3D Visual Grounding with Language-Regularized Concept Learners

Chun Feng, Joy Hsu, Weiyu Liu et al.

CVPR 2024 • poster • arXiv:2404.19696
9
citations
#1600

MP-GUI: Modality Perception with MLLMs for GUI Understanding

Ziwei Wang, Weizhi Chen, Leyang Yang et al.

CVPR 2025 • poster • arXiv:2503.14021
9
citations