Most Cited CVPR "sample ordering strategies" Papers

5,589 papers found • Page 26 of 28

#5001

SurroundSDF: Implicit 3D Scene Understanding Based on Signed Distance Field

Lizhe Liu, Bohua Wang, Hongwei Xie et al.

CVPR 2024highlightarXiv:2403.14366
#5002

MLIP: Enhancing Medical Visual Representation with Divergence Encoder and Knowledge-guided Contrastive Learning

Zhe Li, Laurence Yang, Bocheng Ren et al.

CVPR 2024posterarXiv:2402.02045
#5003

Density-Adaptive Model Based on Motif Matrix for Multi-Agent Trajectory Prediction

Di Wen, Haoran Xu, Zhaocheng He et al.

CVPR 2024poster
#5004

Towards Accurate Post-training Quantization for Diffusion Models

Changyuan Wang, Ziwei Wang, Xiuwei Xu et al.

CVPR 2024highlightarXiv:2305.18723
#5005

Gear-NeRF: Free-Viewpoint Rendering and Tracking with Motion-aware Spatio-Temporal Sampling

Xinhang Liu, Yu-Wing Tai, Chi-Keung Tang et al.

CVPR 2024highlightarXiv:2406.03723
#5006

MultiDiff: Consistent Novel View Synthesis from a Single Image

Norman Müller, Katja Schwarz, Barbara Roessle et al.

CVPR 2024posterarXiv:2406.18524
#5007

Uncertainty-aware Action Decoupling Transformer for Action Anticipation

Hongji Guo, Nakul Agarwal, Shao-Yuan Lo et al.

CVPR 2024highlight
#5008

VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models

Haoxin Chen, Yong Zhang, Xiaodong Cun et al.

CVPR 2024posterarXiv:2401.09047
#5009

EmoGen: Emotional Image Content Generation with Text-to-Image Diffusion Models

Jingyuan Yang, Jiawei Feng, Hui Huang

CVPR 2024posterarXiv:2401.04608
#5010

3D Facial Expressions through Analysis-by-Neural-Synthesis

George Retsinas, Panagiotis Filntisis, Radek Danecek et al.

CVPR 2024posterarXiv:2404.04104
#5011

Visual Layout Composer: Image-Vector Dual Diffusion Model for Design Layout Generation

Mohammad Amin Shabani, Zhaowen Wang, Difan Liu et al.

CVPR 2024poster
#5012

A Simple Recipe for Language-guided Domain Generalized Segmentation

Mohammad Fahes, TUAN-HUNG VU, Andrei Bursuc et al.

CVPR 2024posterarXiv:2311.17922
#5013

An Edit Friendly DDPM Noise Space: Inversion and Manipulations

Inbar Huberman-Spiegelglas, Vladimir Kulikov, Tomer Michaeli

CVPR 2024posterarXiv:2304.06140
#5014

Causal Mode Multiplexer: A Novel Framework for Unbiased Multispectral Pedestrian Detection

Taeheon Kim, Sebin Shin, Youngjoon Yu et al.

CVPR 2024posterarXiv:2403.01300
#5015

GOAT-Bench: A Benchmark for Multi-Modal Lifelong Navigation

Mukul Khanna, Ram Ramrakhya, Gunjan Chhablani et al.

CVPR 2024posterarXiv:2404.06609
#5016

HRVDA: High-Resolution Visual Document Assistant

Chaohu Liu, Kun Yin, Haoyu Cao et al.

CVPR 2024posterarXiv:2404.06918
#5017

Building Bridges across Spatial and Temporal Resolutions: Reference-Based Super-Resolution via Change Priors and Conditional Diffusion Model

Runmin Dong, Shuai Yuan, Bin Luo et al.

CVPR 2024posterarXiv:2403.17460
#5018

Resolution Limit of Single-Photon LiDAR

Stanley H. Chan, Hashan K Weerasooriya, Weijian Zhang et al.

CVPR 2024posterarXiv:2403.17719
#5019

Generating Enhanced Negatives for Training Language-Based Object Detectors

Shiyu Zhao, Long Zhao, Vijay Kumar BG et al.

CVPR 2024posterarXiv:2401.00094
#5020

Object Recognition as Next Token Prediction

Kaiyu Yue, Bor-Chun Chen, Jonas Geiping et al.

CVPR 2024highlightarXiv:2312.02142
#5021

MuGE: Multiple Granularity Edge Detection

Caixia Zhou, Yaping Huang, Mengyang Pu et al.

CVPR 2024poster
#5022

LOTUS: Evasive and Resilient Backdoor Attacks through Sub-Partitioning

Siyuan Cheng, Guanhong Tao, Yingqi Liu et al.

CVPR 2024posterarXiv:2403.17188
#5023

Unsupervised Salient Instance Detection

Xin Tian, Ke Xu, Rynson W.H. Lau

CVPR 2024poster
#5024

Multi-modal Instruction Tuned LLMs with Fine-grained Visual Perception

Junwen He, Yifan Wang, Lijun Wang et al.

CVPR 2024highlightarXiv:2403.02969
#5025

PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation

Yuqi Wang, Yuntao Chen, Xingyu Liao et al.

CVPR 2024posterarXiv:2306.10013
#5026

XFibrosis: Explicit Vessel-Fiber Modeling for Fibrosis Staging from Liver Pathology Images

CHONG YIN, Siqi Liu, Fei Lyu et al.

CVPR 2024poster
#5027

Discriminative Pattern Calibration Mechanism for Source-Free Domain Adaptation

Haifeng Xia, Siyu Xia, Zhengming Ding

CVPR 2024poster
#5028

RAM-Avatar: Real-time Photo-Realistic Avatar from Monocular Videos with Full-body Control

xiang deng, Zerong Zheng, Yuxiang Zhang et al.

CVPR 2024poster
#5029

PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding

Zhen Li, Mingdeng Cao, Xintao Wang et al.

CVPR 2024posterarXiv:2312.04461
#5030

3DSFLabelling: Boosting 3D Scene Flow Estimation by Pseudo Auto-labelling

Chaokang Jiang, Guangming Wang, Jiuming Liu et al.

CVPR 2024posterarXiv:2402.18146
#5031

CPR-Coach: Recognizing Composite Error Actions based on Single-class Training

Shunli Wang, Shuaibing Wang, Dingkang Yang et al.

CVPR 2024posterarXiv:2309.11718
#5032

Restoration by Generation with Constrained Priors

Zheng Ding, Xuaner Zhang, Zhuowen Tu et al.

CVPR 2024highlightarXiv:2312.17161
#5033

Parameter Efficient Fine-tuning via Cross Block Orchestration for Segment Anything Model

Zelin Peng, Zhengqin Xu, Zhilin Zeng et al.

CVPR 2024posterarXiv:2311.17112
#5034

Align and Aggregate: Compositional Reasoning with Video Alignment and Answer Aggregation for Video Question-Answering

Zhaohe Liao, Jiangtong Li, Li Niu et al.

CVPR 2024posterarXiv:2407.03008
#5035

Communication-Efficient Collaborative Perception via Information Filling with Codebook

Yue Hu, Juntong Peng, Sifei Liu et al.

CVPR 2024posterarXiv:2405.04966
#5036

QUADify: Extracting Meshes with Pixel-level Details and Materials from Images

Maximilian Frühauf, Hayko Riemenschneider, Markus Gross et al.

CVPR 2024highlight
#5037

Focus on Hiders: Exploring Hidden Threats for Enhancing Adversarial Training

Qian Li, Yuxiao Hu, Yinpeng Dong et al.

CVPR 2024posterarXiv:2312.07067
#5038

Any-Shift Prompting for Generalization over Distributions

Zehao Xiao, Jiayi Shen, Mohammad Mahdi Derakhshani et al.

CVPR 2024posterarXiv:2402.10099
#5039

Revisiting Counterfactual Problems in Referring Expression Comprehension

Zhihan Yu, Ruifan Li

CVPR 2024poster
#5040

VMINer: Versatile Multi-view Inverse Rendering with Near- and Far-field Light Sources

Fan Fei, Jiajun Tang, Ping Tan et al.

CVPR 2024highlight
#5041

Generating Non-Stationary Textures using Self-Rectification

Yang Zhou, Rongjun Xiao, Dani Lischinski et al.

CVPR 2024posterarXiv:2401.02847
#5042

OOSTraj: Out-of-Sight Trajectory Prediction With Vision-Positioning Denoising

Haichao Zhang, Yi Xu, Hongsheng Lu et al.

CVPR 2024posterarXiv:2404.02227
#5043

Rethinking Interactive Image Segmentation with Low Latency High Quality and Diverse Prompts

Qin Liu, Jaemin Cho, Mohit Bansal et al.

CVPR 2024posterarXiv:2404.00741
#5044

NAYER: Noisy Layer Data Generation for Efficient and Effective Data-free Knowledge Distillation

Minh-Tuan Tran, Trung Le, Xuan-May Le et al.

CVPR 2024posterarXiv:2310.00258
#5045

Revamping Federated Learning Security from a Defender's Perspective: A Unified Defense with Homomorphic Encrypted Data Space

Naveen Kumar Kummari, Reshmi Mitra, Krishna Mohan Chalavadi

CVPR 2024poster
#5046

PanoRecon: Real-Time Panoptic 3D Reconstruction from Monocular Video

Dong Wu, Zike Yan, Hongbin Zha

CVPR 2024poster
#5047

TACO: Benchmarking Generalizable Bimanual Tool-ACtion-Object Understanding

Yun Liu, Haolin Yang, Xu Si et al.

CVPR 2024posterarXiv:2401.08399
#5048

Learning by Correction: Efficient Tuning Task for Zero-Shot Generative Vision-Language Reasoning

Rongjie Li, Yu Wu, Xuming He

CVPR 2024posterarXiv:2404.00909
#5049

Calibrating Multi-modal Representations: A Pursuit of Group Robustness without Annotations

Chenyu You, Yifei Min, Weicheng Dai et al.

CVPR 2024posterarXiv:2403.07241
#5050

Gaussian-Flow: 4D Reconstruction with Dynamic 3D Gaussian Particle

Youtian Lin, Zuozhuo Dai, Siyu Zhu et al.

CVPR 2024highlightarXiv:2312.03431
#5051

Dysen-VDM: Empowering Dynamics-aware Text-to-Video Diffusion with LLMs

Hao Fei, Shengqiong Wu, Wei Ji et al.

CVPR 2024posterarXiv:2308.13812
#5052

Modeling Multimodal Social Interactions: New Challenges and Baselines with Densely Aligned Representations

Sangmin Lee, Bolin Lai, Fiona Ryan et al.

CVPR 2024posterarXiv:2403.02090
#5053

How to Make Cross Encoder a Good Teacher for Efficient Image-Text Retrieval?

Yuxin Chen, Zongyang Ma, Ziqi Zhang et al.

CVPR 2024posterarXiv:2407.07479
#5054

EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Language Models

Sijie Cheng, Zhicheng Guo, Jingwen Wu et al.

CVPR 2024highlightarXiv:2311.15596
#5055

CaKDP: Category-aware Knowledge Distillation and Pruning Framework for Lightweight 3D Object Detection

Haonan Zhang, Longjun Liu, Yuqi Huang et al.

CVPR 2024poster
#5056

CustomListener: Text-guided Responsive Interaction for User-friendly Listening Head Generation

Xi Liu, Ying Guo, Cheng Zhen et al.

CVPR 2024posterarXiv:2403.00274
#5057

Structure Matters: Tackling the Semantic Discrepancy in Diffusion Models for Image Inpainting

Haipeng Liu, Yang Wang, Biao Qian et al.

CVPR 2024posterarXiv:2403.19898
#5058

SDSTrack: Self-Distillation Symmetric Adapter Learning for Multi-Modal Visual Object Tracking

Xiaojun Hou, Jiazheng Xing, Yijie Qian et al.

CVPR 2024posterarXiv:2403.16002
#5059

MACE: Mass Concept Erasure in Diffusion Models

Shilin Lu, Zilan Wang, Leyang Li et al.

CVPR 2024posterarXiv:2403.06135
#5060

Teeth-SEG: An Efficient Instance Segmentation Framework for Orthodontic Treatment based on Multi-Scale Aggregation and Anthropic Prior Knowledge

Bo Zou, Shaofeng Wang, Hao Liu et al.

CVPR 2024poster
#5061

Loopy-SLAM: Dense Neural SLAM with Loop Closures

Lorenzo Liso, Erik Sandström, Vladimir Yugay et al.

CVPR 2024posterarXiv:2402.09944
#5062

MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models

Sanjoy Chowdhury, Sayan Nag, Joseph K J et al.

CVPR 2024highlightarXiv:2406.04673
#5063

InstaGen: Enhancing Object Detection by Training on Synthetic Dataset

Chengjian Feng, Yujie Zhong, Zequn Jie et al.

CVPR 2024posterarXiv:2402.05937
#5064

Modality-agnostic Domain Generalizable Medical Image Segmentation by Multi-Frequency in Multi-Scale Attention

Ju-Hyeon Nam, Nur Suriza Syazwany, Su Jung Kim et al.

CVPR 2024posterarXiv:2405.06284
#5065

NC-TTT: A Noise Constrastive Approach for Test-Time Training

David OSOWIECHI, Gustavo Vargas Hakim, Mehrdad Noori et al.

CVPR 2024highlight
#5066

ESCAPE: Encoding Super-keypoints for Category-Agnostic Pose Estimation

Khoi D Nguyen, Chen Li, Gim Hee Lee

CVPR 2024poster
#5067

CRKD: Enhanced Camera-Radar Object Detection with Cross-modality Knowledge Distillation

Lingjun Zhao, Jingyu Song, Katherine Skinner

CVPR 2024posterarXiv:2403.19104
#5068

Towards Understanding Cross and Self-Attention in Stable Diffusion for Text-Guided Image Editing

Bingyan Liu, Chengyu Wang, Tingfeng Cao et al.

CVPR 2024posterarXiv:2403.03431
#5069

Incremental Residual Concept Bottleneck Models

Chenming Shang, Shiji Zhou, Hengyuan Zhang et al.

CVPR 2024posterarXiv:2404.08978
#5070

DUSt3R: Geometric 3D Vision Made Easy

Shuzhe Wang, Vincent Leroy, Yohann Cabon et al.

CVPR 2024posterarXiv:2312.14132
#5071

Adversarial Text to Continuous Image Generation

Kilichbek Haydarov, Aashiq Muhamed, Xiaoqian Shen et al.

CVPR 2024poster
#5072

InceptionNeXt: When Inception Meets ConvNeXt

Weihao Yu, Pan Zhou, Shuicheng Yan et al.

CVPR 2024posterarXiv:2303.16900
#5073

LEMON: Learning 3D Human-Object Interaction Relation from 2D Images

Yuhang Yang, Wei Zhai, Hongchen Luo et al.

CVPR 2024posterarXiv:2312.08963
#5074

Dynamic Prompt Optimizing for Text-to-Image Generation

Wenyi Mo, Tianyu Zhang, Yalong Bai et al.

CVPR 2024posterarXiv:2404.04095
#5075

DaReNeRF: Direction-aware Representation for Dynamic Scenes

Ange Lou, Benjamin Planche, Zhongpai Gao et al.

CVPR 2024posterarXiv:2403.02265
#5076

Text-to-Image Diffusion Models are Great Sketch-Photo Matchmakers

Subhadeep Koley, Ayan Kumar Bhunia, Aneeshan Sain et al.

CVPR 2024posterarXiv:2403.07214
#5077

Traceable Federated Continual Learning

Qiang Wang, Bingyan Liu, Yawen Li

CVPR 2024poster
#5078

LASA: Instance Reconstruction from Real Scans using A Large-scale Aligned Shape Annotation Dataset

Haolin Liu, Chongjie Ye, Yinyu Nie et al.

CVPR 2024posterarXiv:2312.12418
#5079

Why Not Use Your Textbook? Knowledge-Enhanced Procedure Planning of Instructional Videos

Kumaranage Ravindu Nagasinghe, Honglu Zhou, Malitha Gunawardhana et al.

CVPR 2024posterarXiv:2403.02782
#5080

HuMoCon: Concept Discovery for Human Motion Understanding

Qihang Fang, Chengcheng Tang, Bugra Tekin et al.

CVPR 2025posterarXiv:2505.20920
#5081

Reloc3r: Large-Scale Training of Relative Camera Pose Regression for Generalizable, Fast, and Accurate Visual Localization

Siyan Dong, Shuzhe Wang, Shaohui Liu et al.

CVPR 2025posterarXiv:2412.08376
#5082

Bridge Frame and Event: Common Spatiotemporal Fusion for High-Dynamic Scene Optical Flow

Hanyu Zhou, Haonan Wang, Haoyue Liu et al.

CVPR 2025posterarXiv:2503.06992
#5083

StoryGPT-V: Large Language Models as Consistent Story Visualizers

Xiaoqian Shen, Mohamed Elhoseiny

CVPR 2025posterarXiv:2312.02252
#5084

Invisible Backdoor Attack against Self-supervised Learning

Hanrong Zhang, Zhenting Wang, Boheng Li et al.

CVPR 2025posterarXiv:2405.14672
#5085

S^3-Face: SSS-Compliant Facial Reflectance Estimation via Diffusion Priors

Xingyu Ren, Jiankang Deng, Yuhao Cheng et al.

CVPR 2025poster
#5086

SeedVR: Seeding Infinity in Diffusion Transformer Towards Generic Video Restoration

Jianyi Wang, Zhijie Lin, Meng Wei et al.

CVPR 2025highlightarXiv:2501.01320
#5087

RSAR: Restricted State Angle Resolver and Rotated SAR Benchmark

Xin Zhang, Xue Yang, Yuxuan Li et al.

CVPR 2025posterarXiv:2501.04440
#5088

Diffusion Model is Effectively Its Own Teacher

Xinyin Ma, Runpeng Yu, Songhua Liu et al.

CVPR 2025poster
#5089

Towards Visual Discrimination and Reasoning of Real-World Physical Dynamics: Physics-Grounded Anomaly Detection

wenqiao Li, Yao Gu, Xintao Chen et al.

CVPR 2025posterarXiv:2503.03562
#5090

Flow-NeRF: Joint Learning of Geometry, Poses, and Dense Flow within Unified Neural Representations

Xunzhi Zheng, Dan Xu

CVPR 2025posterarXiv:2503.10464
#5091

LiVOS: Light Video Object Segmentation with Gated Linear Matching

Qin Liu, Jianfeng Wang, Zhengyuan Yang et al.

CVPR 2025posterarXiv:2411.02818
#5092

Dynamic Content Prediction with Motion-aware Priors for Blind Face Video Restoration

Lianxin Xie, csbingbing zheng, Si Wu et al.

CVPR 2025poster
#5093

BADGR: Bundle Adjustment Diffusion Conditioned by Gradients for Wide-Baseline Floor Plan Reconstruction

Yuguang Li, Ivaylo Boyadzhiev, Zixuan Liu et al.

CVPR 2025highlightarXiv:2503.19340
#5094

Towards More General Video-based Deepfake Detection through Facial Component Guided Adaptation for Foundation Model

Yue-Hua Han, Tai-Ming Huang, Kailung Hua et al.

CVPR 2025posterarXiv:2404.05583
#5095

WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model

Zongjian Li, Bin Lin, Yang Ye et al.

CVPR 2025posterarXiv:2411.17459
#5096

MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling

Yifang Men, Yuan Yao, Miaomiao Cui et al.

CVPR 2025posterarXiv:2409.16160
#5097

Leveraging Perturbation Robustness to Enhance Out-of-Distribution Detection

Wenxi Chen, Raymond A. Yeh, Shaoshuai Mou et al.

CVPR 2025posterarXiv:2503.18784
#5098

Noise-Consistent Siamese-Diffusion for Medical Image Synthesis and Segmentation

Kunpeng Qiu, Zhiqiang Gao, Zhiying Zhou et al.

CVPR 2025posterarXiv:2505.06068
#5099

Parametric Point Cloud Completion for Polygonal Surface Reconstruction

Zhaiyu Chen, Yuqing Wang, Liangliang Nan et al.

CVPR 2025posterarXiv:2503.08363
#5100

RoadSocial: A Diverse VideoQA Dataset and Benchmark for Road Event Understanding from Social Video Narratives

Chirag Parikh, Deepti Rawat, Rakshitha R. T. et al.

CVPR 2025posterarXiv:2503.21459
#5101

AIM-Fair: Advancing Algorithmic Fairness via Selectively Fine-Tuning Biased Models with Contextual Synthetic Data

Zengqun Zhao, Ziquan Liu, Yu Cao et al.

CVPR 2025posterarXiv:2503.05665
#5102

TAET: Two-Stage Adversarial Equalization Training on Long-Tailed Distributions

Wang Yu-Hang, Junkang Guo, Aolei Liu et al.

CVPR 2025poster
#5103

LiMoE: Mixture of LiDAR Representation Learners from Automotive Scenes

Xiang Xu, Lingdong Kong, hui shuai et al.

CVPR 2025posterarXiv:2501.04004
#5104

Interpreting Object-level Foundation Models via Visual Precision Search

Ruoyu Chen, Siyuan Liang, Jingzhi Li et al.

CVPR 2025highlightarXiv:2411.16198
#5105

Descriptor-In-Pixel : Point-Feature Tracking For Pixel Processor Arrays

Laurie Bose, Piotr Dudek, Jianing Chen

CVPR 2025poster
#5106

Consistent Normal Orientation for 3D Point Clouds via Least Squares on Delaunay Graph

Rao Fu, Jianmin Zheng, Liang Yu

CVPR 2025poster
#5107

AdaCM^2: On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction

Yuanbin Man, Ying Huang, Chengming Zhang et al.

CVPR 2025highlightarXiv:2411.12593
#5108

Movie Weaver: Tuning-Free Multi-Concept Video Personalization with Anchored Prompts

Feng Liang, Haoyu Ma, Zecheng He et al.

CVPR 2025posterarXiv:2502.07802
#5109

Exploring Timeline Control for Facial Motion Generation

Yifeng Ma, Jinwei Qi, Chaonan Ji et al.

CVPR 2025posterarXiv:2505.20861
#5110

IRGS: Inter-Reflective Gaussian Splatting with 2D Gaussian Ray Tracing

Chun Gu, Xiaofei Wei, Zixuan Zeng et al.

CVPR 2025posterarXiv:2412.15867
#5111

OSLoPrompt: Bridging Low-Supervision Challenges and Open-Set Domain Generalization in CLIP

Mohamad Hassan N C, Divyam Gupta, Mainak Singha et al.

CVPR 2025posterarXiv:2503.16106
#5112

EchoTraffic: Enhancing Traffic Anomaly Understanding with Audio-Visual Insights

Zhenghao Xing, Hao Chen, Binzhu Xie et al.

CVPR 2025poster
#5113

Learning Temporally Consistent Video Depth from Video Diffusion Priors

Jiahao Shao, Yuanbo Yang, Hongyu Zhou et al.

CVPR 2025posterarXiv:2406.01493
#5114

Yo’Chameleon: Personalized Vision and Language Generation

Thao Nguyen, Krishna Kumar Singh, Jing Shi et al.

CVPR 2025poster
#5115

PersonaBooth: Personalized Text-to-Motion Generation

Boeun Kim, Hea In Jeong, JungHoon Sung et al.

CVPR 2025posterarXiv:2503.07390
#5116

Benchmarking Object Detectors under Real-World Distribution Shifts in Satellite Imagery

Sara Al-Emadi, Yin Yang, Ferda Ofli

CVPR 2025posterarXiv:2503.19202
#5117

Electromyography-Informed Facial Expression Reconstruction for Physiological-Based Synthesis and Analysis

Tim Büchner, Christoph Anders, Orlando Guntinas-Lichius et al.

CVPR 2025highlightarXiv:2503.09556
#5118

InsTaG: Learning Personalized 3D Talking Head from Few-Second Video

Jiahe Li, Jiawei Zhang, Xiao Bai et al.

CVPR 2025posterarXiv:2502.20387
#5119

Unseen Visual Anomaly Generation

HAN SUN, Yunkang Cao, Hao Dong et al.

CVPR 2025posterarXiv:2406.01078
#5120

Towards Zero-Shot Anomaly Detection and Reasoning with Multimodal Large Language Models

Jiacong Xu, Shao-Yuan Lo, Bardia Safaei et al.

CVPR 2025highlightarXiv:2502.07601
#5121

SLAM3R: Real-Time Dense Scene Reconstruction from Monocular RGB Videos

Yuzheng Liu, Siyan Dong, Shuzhe Wang et al.

CVPR 2025highlightarXiv:2412.09401
#5122

EchoMatch: Partial-to-Partial Shape Matching via Correspondence Reflection

Yizheng Xie, Viktoria Ehm, Paul Roetzer et al.

CVPR 2025poster
#5123

JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation

Yiyang Ma, Xingchao Liu, Xiaokang Chen et al.

CVPR 2025posterarXiv:2411.07975
#5124

PatchDEMUX: A Certifiably Robust Framework for Multi-label Classifiers Against Adversarial Patches

Dennis Jacob, Chong Xiang, Prateek Mittal

CVPR 2025posterarXiv:2505.24703
#5125

CALICO: Part-Focused Semantic Co-Segmentation with Large Vision-Language Models

Kiet A. Nguyen, Adheesh Juvekar, Tianjiao Yu et al.

CVPR 2025posterarXiv:2412.19331
#5126

Repurposing Pre-trained Video Diffusion Models for Event-based Video Interpolation

Jingxi Chen, Brandon Y. Feng, Haoming Cai et al.

CVPR 2025posterarXiv:2412.07761
#5127

VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos

Ziyang Wang, Shoubin Yu, Elias Stengel-Eskin et al.

CVPR 2025posterarXiv:2405.19209
#5128

MM-OR: A Large Multimodal Operating Room Dataset for Semantic Understanding of High-Intensity Surgical Environments

Ege Özsoy, Chantal Pellegrini, Tobias Czempiel et al.

CVPR 2025posterarXiv:2503.02579
#5129

VideoSPatS: Video SPatiotemporal Splines for Disentangled Occlusion, Appearance and Motion Modeling and Editing

Juan Luis Gonzalez Bello, Xu Yao, Alex Whelan et al.

CVPR 2025posterarXiv:2504.07146
#5130

Masked Scene Modeling: Narrowing the Gap Between Supervised and Self-Supervised Learning in 3D Scene Understanding

Pedro Hermosilla, Christian Stippel, Leon Sick

CVPR 2025posterarXiv:2504.06719
#5131

Prior Does Matter: Visual Navigation via Denoising Diffusion Bridge Models

Hao Ren, Yiming Zeng, Zetong Bi et al.

CVPR 2025posterarXiv:2504.10041
#5132

LoRA Recycle: Unlocking Tuning-Free Few-Shot Adaptability in Visual Foundation Models by Recycling Pre-Tuned LoRAs

Zixuan Hu, Yongxian Wei, Li Shen et al.

CVPR 2025poster
#5133

TensoFlow: Tensorial Flow-based Sampler for Inverse Rendering

Chun Gu, Xiaofei Wei, Li Zhang et al.

CVPR 2025posterarXiv:2503.18328
#5134

STAR-Edge: Structure-aware Local Spherical Curve Representation for Thin-walled Edge Extraction from Unstructured Point Clouds

Zikuan Li, Honghua Chen, Yuecheng Wang et al.

CVPR 2025posterarXiv:2503.00801
#5135

ZoomLDM: Latent Diffusion Model for Multi-scale Image Generation

Srikar Yellapragada, Alexandros Graikos, Kostas Triaridis et al.

CVPR 2025posterarXiv:2411.16969
#5136

RainyGS: Efficient Rain Synthesis with Physically-Based Gaussian Splatting

Qiyu Dai, Xingyu Ni, Qianfan Shen et al.

CVPR 2025posterarXiv:2503.21442
#5137

Design2GarmentCode: Turning Design Concepts to Tangible Garments Through Program Synthesis

Feng Zhou, Ruiyang Liu, chen liu et al.

CVPR 2025posterarXiv:2412.08603
#5138

Efficient Dynamic Scene Editing via 4D Gaussian-based Static-Dynamic Separation

Joohyun Kwon, Hanbyel Cho, Junmo Kim

CVPR 2025posterarXiv:2502.02091
#5139

EventFly: Event Camera Perception from Ground to the Sky

Lingdong Kong, Dongyue Lu, Xiang Xu et al.

CVPR 2025posterarXiv:2503.19916
#5140

Exploiting Deblurring Networks for Radiance Fields

Haeyun Choi, Heemin Yang, Janghyeok Han et al.

CVPR 2025posterarXiv:2502.14454
#5141

ICE: Intrinsic Concept Extraction from a Single Image via Diffusion Models

Fernando Julio Cendra, Kai Han

CVPR 2025highlightarXiv:2503.19902
#5142

Can Generative Video Models Help Pose Estimation?

Ruojin Cai, Jason Y. Zhang, Philipp Henzler et al.

CVPR 2025highlightarXiv:2412.16155
#5143

MMRL: Multi-Modal Representation Learning for Vision-Language Models

Yuncheng Guo, Xiaodong Gu

CVPR 2025posterarXiv:2503.08497
#5144

VidTwin: Video VAE with Decoupled Structure and Dynamics

Yuchi Wang, Junliang Guo, Xinyi Xie et al.

CVPR 2025posterarXiv:2412.17726
#5145

Continuous, Subject-Specific Attribute Control in T2I Models by Identifying Semantic Directions

Stefan Andreas Baumann, Felix Krause, Michael Neumayr et al.

CVPR 2025posterarXiv:2403.17064
#5146

3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion

Zhaoxi Chen, Jiaxiang Tang, Yuhao Dong et al.

CVPR 2025highlightarXiv:2409.12957
#5147

Unraveling Normal Anatomy via Fluid-Driven Anomaly Randomization

Peirong Liu, Ana Lawry Aguila, Juan Iglesias

CVPR 2025posterarXiv:2501.13370
#5148

VoCo-LLaMA: Towards Vision Compression with Large Language Models

Xubing Ye, Yukang Gan, Xiaoke Huang et al.

CVPR 2025posterarXiv:2406.12275
#5149

Period-LLM: Extending the Periodic Capability of Multimodal Large Language Model

Yuting Zhang, Hao Lu, Qingyong Hu et al.

CVPR 2025posterarXiv:2505.24476
#5150

Continuous Locomotive Crowd Behavior Generation

Inhwan Bae, Junoh Lee, Hae-Gon Jeon

CVPR 2025posterarXiv:2504.04756
#5151

A Unified Latent Schrödinger Bridge Diffusion Model for Unsupervised Anomaly Detection and Localization

Shilhora Akshay, Niveditha Lakshmi Narasimhan, Jacob George et al.

CVPR 2025poster
#5152

Token Cropr: Faster ViTs for Quite a Few Tasks

Benjamin Bergner, Christoph Lippert, Aravindh Mahendran

CVPR 2025posterarXiv:2412.00965
#5153

CacheQuant: Comprehensively Accelerated Diffusion Models

Xuewen Liu, Zhikai Li, Qingyi Gu

CVPR 2025posterarXiv:2503.01323
#5154

Text-guided Sparse Voxel Pruning for Efficient 3D Visual Grounding

Wenxuan Guo, Xiuwei Xu, Ziwei Wang et al.

CVPR 2025highlightarXiv:2502.10392
#5155

SimLTD: Simple Supervised and Semi-Supervised Long-Tailed Object Detection

Phi Vu Tran

CVPR 2025posterarXiv:2412.20047
#5156

What’s in the Image? A Deep-Dive into the Vision of Vision Language Models

Omri Kaduri, Shai Bagon, Tali Dekel

CVPR 2025posterarXiv:2411.17491
#5157

MV-MATH: Evaluating Multimodal Math Reasoning in Multi-Visual Contexts

Peijie Wang, Zhong-Zhi Li, Fei Yin et al.

CVPR 2025posterarXiv:2502.20808
#5158

MonoTAKD: Teaching Assistant Knowledge Distillation for Monocular 3D Object Detection

Hou-I Liu, Christine Wu, Jen-Hao Cheng et al.

CVPR 2025posterarXiv:2404.04910
#5159

APHQ-ViT: Post-Training Quantization with Average Perturbation Hessian Based Reconstruction for Vision Transformers

Zhuguanyu Wu, Jiayi Zhang, Jiaxin Chen et al.

CVPR 2025posterarXiv:2504.02508
#5160

RestorGS: Depth-aware Gaussian Splatting for Efficient 3D Scene Restoration

Yuanjian Qiao, Mingwen Shao, Lingzhuang Meng et al.

CVPR 2025poster
#5161

Identity-Preserving Text-to-Video Generation by Frequency Decomposition

Shenghai Yuan, Jinfa Huang, Xianyi He et al.

CVPR 2025highlightarXiv:2411.17440
#5162

Associative Transformer

Yuwei Sun, Hideya Ochiai, Zhirong Wu et al.

CVPR 2025posterarXiv:2309.12862
#5163

Blood Flow Speed Estimation with Optical Coherence Tomography Angiography Images

Wensheng Cheng, Zhenghong Li, Jiaxiang Ren et al.

CVPR 2025poster
#5164

World-consistent Video Diffusion with Explicit 3D Modeling

Qihang Zhang, Shuangfei Zhai, Miguel Ángel Bautista et al.

CVPR 2025highlightarXiv:2412.01821
#5165

DPFlow: Adaptive Optical Flow Estimation with a Dual-Pyramid Framework

Henrique Morimitsu, Xiaobin Zhu, Roberto M. Cesar Jr et al.

CVPR 2025posterarXiv:2503.14880
#5166

OSDFace: One-Step Diffusion Model for Face Restoration

Jingkai Wang, Jue Gong, Lin Zhang et al.

CVPR 2025posterarXiv:2411.17163
#5167

Free-viewpoint Human Animation with Pose-correlated Reference Selection

Fa-Ting Hong, Zhan Xu, Haiyang Liu et al.

CVPR 2025highlightarXiv:2412.17290
#5168

3D Gaussian Inpainting with Depth-Guided Cross-View Consistency

Sheng-Yu Huang, Zi-Ting Chou, Yu-Chiang Frank Wang

CVPR 2025posterarXiv:2502.11801
#5169

Nonisotropic Gaussian Diffusion for Realistic 3D Human Motion Prediction

Cecilia Curreli, Dominik Muhle, Abhishek Saroha et al.

CVPR 2025posterarXiv:2501.06035
#5170

Temporal Score Analysis for Understanding and Correcting Diffusion Artifacts

Yu Cao, Zengqun Zhao, Ioannis Patras et al.

CVPR 2025posterarXiv:2503.16218
#5171

Visual Representation Learning through Causal Intervention for Controllable Image Editing

Shanshan Huang, Haoxuan Li, Chunyuan Zheng et al.

CVPR 2025highlight
#5172

Three-view Focal Length Recovery From Homographies

Yaqing Ding, Viktor Kocur, Zuzana Berger Haladova et al.

CVPR 2025posterarXiv:2501.07499
#5173

ProAPO: Progressively Automatic Prompt Optimization for Visual Classification

Xiangyan Qu, Gaopeng Gou, Jiamin Zhuang et al.

CVPR 2025posterarXiv:2502.19844
#5174

ShapeWords: Guiding Text-to-Image Synthesis with 3D Shape-Aware Prompts

Dmitrii M Petrov, Pradyumn Goyal, Divyansh Shivashok et al.

CVPR 2025posterarXiv:2412.02912
#5175

EgoPressure: A Dataset for Hand Pressure and Pose Estimation in Egocentric Vision

Yiming Zhao, Taein Kwon, Paul Streli et al.

CVPR 2025highlightarXiv:2409.02224
#5176

SyncVP: Joint Diffusion for Synchronous Multi-Modal Video Prediction

Enrico Pallotta, Sina Mokhtarzadeh Azar, Shuai Li et al.

CVPR 2025posterarXiv:2503.18933
#5177

AuraFusion360: Augmented Unseen Region Alignment for Reference-based 360° Unbounded Scene Inpainting

Chung-Ho Wu, Yang-Jung Chen, Ying-Huan Chen et al.

CVPR 2025posterarXiv:2502.05176
#5178

Real-time Free-view Human Rendering from Sparse-view RGB Videos using Double Unprojected Textures

Guoxing Sun, Rishabh Dabral, Heming Zhu et al.

CVPR 2025highlightarXiv:2412.13183
#5179

Scene-agnostic Pose Regression for Visual Localization

Junwei Zheng, Ruiping Liu, Yufan Chen et al.

CVPR 2025posterarXiv:2503.19543
#5180

Zero-Shot Image Restoration Using Few-Step Guidance of Consistency Models (and Beyond)

Tomer Garber, Tom Tirer

CVPR 2025posterarXiv:2412.20596
#5181

Localizing Events in Videos with Multimodal Queries

Gengyuan Zhang, Mang Ling Ada Fok, Jialu Ma et al.

CVPR 2025posterarXiv:2406.10079
#5182

HuPerFlow: A Comprehensive Benchmark for Human vs. Machine Motion Estimation Comparison

Yung-Hao Yang, Zitang Sun, Taiki Fukiage et al.

CVPR 2025highlight
#5183

Realistic Test-Time Adaptation of Vision-Language Models

Maxime Zanella, Clément Fuchs, Christophe De Vleeschouwer et al.

CVPR 2025highlightarXiv:2501.03729
#5184

GOAL: Global-local Object Alignment Learning

Hyungyu Choi, Young Kyun Jang, Chanho Eom

CVPR 2025posterarXiv:2503.17782
#5185

Magma: A Foundation Model for Multimodal AI Agents

Jianwei Yang, Reuben Tan, Qianhui Wu et al.

CVPR 2025posterarXiv:2502.13130
#5186

HOTFormerLoc: Hierarchical Octree Transformer for Versatile Lidar Place Recognition Across Ground and Aerial Views

Ethan Griffiths, Maryam Haghighat, Simon Denman et al.

CVPR 2025posterarXiv:2503.08140
#5187

Time of the Flight of the Gaussians: Optimizing Depth Indirectly in Dynamic Radiance Fields

Runfeng Li, Mikhail Okunev, Zixuan Guo et al.

CVPR 2025posterarXiv:2505.05356
#5188

Generative Photomontage

Sean J. Liu, Nupur Kumari, Ariel Shamir et al.

CVPR 2025posterarXiv:2408.07116
#5189

MambaVision: A Hybrid Mamba-Transformer Vision Backbone

Ali Hatamizadeh, Jan Kautz

CVPR 2025posterarXiv:2407.08083
#5190

MotiF: Making Text Count in Image Animation with Motion Focal Loss

Shijie Wang, Samaneh Azadi, Rohit Girdhar et al.

CVPR 2025posterarXiv:2412.16153
#5191

Learning Physics-Based Full-Body Human Reaching and Grasping from Brief Walking References

Yitang Li, Mingxian Lin, Zhuo Lin et al.

CVPR 2025posterarXiv:2503.07481
#5192

Prof. Robot: Differentiable Robot Rendering Without Static and Self-Collisions

Quanyuan Ruan, Jiabao Lei, Wenhao Yuan et al.

CVPR 2025posterarXiv:2503.11269
#5193

Attention IoU: Examining Biases in CelebA using Attention Maps

Aaron Serianni, Tyler Zhu, Olga Russakovsky et al.

CVPR 2025posterarXiv:2503.19846
#5194

Stochastic Human Motion Prediction with Memory of Action Transition and Action Characteristic

Jianwei Tang, Hong Yang, Tengyue Chen et al.

CVPR 2025posterarXiv:2507.04062
#5195

Feature Selection for Latent Factor Models

Rittwika Kansabanik, Adrian Barbu

CVPR 2025posterarXiv:2412.10128
#5196

Generative Multiview Relighting for 3D Reconstruction under Extreme Illumination Variation

Hadi Alzayer, Philipp Henzler, Jonathan T. Barron et al.

CVPR 2025highlightarXiv:2412.15211
#5197

LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant

Yikun Liu, Yajie Zhang, jiayin cai et al.

CVPR 2025posterarXiv:2412.01720
#5198

DeepLA-Net: Very Deep Local Aggregation Networks for Point Cloud Analysis

Ziyin Zeng, Mingyue Dong, Jian Zhou et al.

CVPR 2025poster
#5199

ClimbingCap: Multi-Modal Dataset and Method for Rock Climbing in World Coordinate

Ming Yan, Xincheng Lin, Yuhua Luo et al.

CVPR 2025highlightarXiv:2503.21268
#5200

MVDoppler-Pose: Multi-Modal Multi-View mmWave Sensing for Long-Distance Self-Occluded Human Walking Pose Estimation

Jae-Ho Choi, Soheil Hor, Shubo Yang et al.

CVPR 2025poster