Most Cited CVPR "fine-grained visual reasoning" Papers

5,589 papers found • Page 6 of 28

#1001

Multiagent Multitraversal Multimodal Self-Driving: Open MARS Dataset

Yiming Li, Zhiheng Li, Nuo Chen et al.

CVPR 2024posterarXiv:2406.09383
18
citations
#1002

FAR: Flexible Accurate and Robust 6DoF Relative Camera Pose Estimation

Chris Rockwell, Nilesh Kulkarni, Linyi Jin et al.

CVPR 2024highlightarXiv:2403.03221
18
citations
#1003

SuperSVG: Superpixel-based Scalable Vector Graphics Synthesis

Teng Hu, Ran Yi, Baihong Qian et al.

CVPR 2024posterarXiv:2406.09794
18
citations
#1004

Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis

Jiangyong Huang, Baoxiong Jia, Yan Wang et al.

CVPR 2025posterarXiv:2503.22420
18
citations
#1005

Vid2Avatar-Pro: Authentic Avatar from Videos in the Wild via Universal Prior

Chen Guo, Junxuan Li, Yash Kant et al.

CVPR 2025posterarXiv:2503.01610
18
citations
#1006

Cubify Anything: Scaling Indoor 3D Object Detection

Justin Lazarow, David Griffiths, Gefen Kohavi et al.

CVPR 2025highlightarXiv:2412.04458
18
citations
#1007

SketchAgent: Language-Driven Sequential Sketch Generation

Yael Vinker, Tamar Rott Shaham, Kristine Zheng et al.

CVPR 2025posterarXiv:2411.17673
18
citations
#1008

MESA: Matching Everything by Segmenting Anything

Yesheng Zhang, Xu Zhao

CVPR 2024posterarXiv:2401.16741
18
citations
#1009

FreePoint: Unsupervised Point Cloud Instance Segmentation

Zhikai Zhang, Jian Ding, Li Jiang et al.

CVPR 2024posterarXiv:2305.06973
18
citations
#1010

Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning

Fan Lu, Wei Wu, Kecheng Zheng et al.

CVPR 2025posterarXiv:2412.08614
18
citations
#1011

AssistGUI: Task-Oriented PC Graphical User Interface Automation

Difei Gao, Lei Ji, Zechen Bai et al.

CVPR 2024poster
18
citations
#1012

Blur-aware Spatio-temporal Sparse Transformer for Video Deblurring

Huicong Zhang, Haozhe Xie, Hongxun Yao

CVPR 2024posterarXiv:2406.07551
18
citations
#1013

Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens

Kaihang Pan, Wang Lin, Zhongqi Yue et al.

CVPR 2025posterarXiv:2504.14666
18
citations
#1014

HumanDreamer: Generating Controllable Human-Motion Videos via Decoupled Generation

Boyuan Wang, Xiaofeng Wang, Chaojun Ni et al.

CVPR 2025posterarXiv:2503.24026
18
citations
#1015

Towards Modern Image Manipulation Localization: A Large-Scale Dataset and Novel Methods

Chenfan Qu, Yiwu Zhong, Chongyu Liu et al.

CVPR 2024poster
18
citations
#1016

A Simple and Effective Point-based Network for Event Camera 6-DOFs Pose Relocalization

Hongwei Ren, Jiadong Zhu, Yue Zhou et al.

CVPR 2024posterarXiv:2403.19412
18
citations
#1017

TreeMeshGPT: Artistic Mesh Generation with Autoregressive Tree Sequencing

Stefan Lionar, Jiabin Liang, Gim Hee Lee

CVPR 2025posterarXiv:2503.11629
18
citations
#1018

Cloud-Device Collaborative Learning for Multimodal Large Language Models

Guanqun Wang, Jiaming Liu, Chenxuan Li et al.

CVPR 2024posterarXiv:2312.16279
18
citations
#1019

Rotation-Agnostic Image Representation Learning for Digital Pathology

Saghir Alfasly, Abubakr Shafique, Peyman Nejat et al.

CVPR 2024posterarXiv:2311.08359
18
citations
#1020

FaceChain-ImagineID: Freely Crafting High-Fidelity Diverse Talking Faces from Disentangled Audio

Chao Xu, Yang Liu, Jiazheng Xing et al.

CVPR 2024posterarXiv:2403.01901
18
citations
#1021

Open-Set Domain Adaptation for Semantic Segmentation

Seun-An Choe, Ah-Hyung Shin, Keon Hee Park et al.

CVPR 2024posterarXiv:2405.19899
18
citations
#1022

Correspondence-Free Non-Rigid Point Set Registration Using Unsupervised Clustering Analysis

Mingyang Zhao, Jiang Jingen, Lei Ma et al.

CVPR 2024highlightarXiv:2406.18817
18
citations
#1023

HiPose: Hierarchical Binary Surface Encoding and Correspondence Pruning for RGB-D 6DoF Object Pose Estimation

Yongliang Lin, Yongzhi Su, Praveen Nathan et al.

CVPR 2024posterarXiv:2311.12588
18
citations
#1024

4Real-Video: Learning Generalizable Photo-Realistic 4D Video Diffusion

Chaoyang Wang, Peiye Zhuang, Tuan Duc Ngo et al.

CVPR 2025highlightarXiv:2412.04462
18
citations
#1025

Object Detection using Event Camera: A MoE Heat Conduction based Detector and A New Benchmark Dataset

Xiao Wang, Yu Jin, Wentao Wu et al.

CVPR 2025posterarXiv:2412.06647
18
citations
#1026

Exploring the Transferability of Visual Prompting for Multimodal Large Language Models

Yichi Zhang, Yinpeng Dong, Siyuan Zhang et al.

CVPR 2024highlightarXiv:2404.11207
18
citations
#1027

T4P: Test-Time Training of Trajectory Prediction via Masked Autoencoder and Actor-specific Token Memory

Daehee Park, Jaeseok Jeong, Sung-Hoon Yoon et al.

CVPR 2024posterarXiv:2403.10052
18
citations
#1028

GenFusion: Closing the Loop between Reconstruction and Generation via Videos

Sibo Wu, Congrong Xu, Binbin Huang et al.

CVPR 2025posterarXiv:2503.21219
18
citations
#1029

Hash3D: Training-free Acceleration for 3D Generation

Xingyi Yang, Songhua Liu, Xinchao Wang

CVPR 2025posterarXiv:2404.06091
18
citations
#1030

OmniGuard: Hybrid Manipulation Localization via Augmented Versatile Deep Image Watermarking

Xuanyu Zhang, Zecheng Tang, Zhipei Xu et al.

CVPR 2025posterarXiv:2412.01615
18
citations
#1031

Composing Object Relations and Attributes for Image-Text Matching

Khoi Pham, Chuong Huynh, Ser-Nam Lim et al.

CVPR 2024posterarXiv:2406.11820
18
citations
#1032

PhysPT: Physics-aware Pretrained Transformer for Estimating Human Dynamics from Monocular Videos

Yufei Zhang, Jeffrey Kephart, Zijun Cui et al.

CVPR 2024posterarXiv:2404.04430
18
citations
#1033

RMem: Restricted Memory Banks Improve Video Object Segmentation

Junbao Zhou, Ziqi Pang, Yu-Xiong Wang

CVPR 2024posterarXiv:2406.08476
18
citations
#1034

Revisiting Adversarial Training Under Long-Tailed Distributions

Xinli Yue, Ningping Mou, Qian Wang et al.

CVPR 2024posterarXiv:2403.10073
17
citations
#1035

Adapting Short-Term Transformers for Action Detection in Untrimmed Videos

Min Yang, gaohuan, Ping Guo et al.

CVPR 2024posterarXiv:2312.01897
17
citations
#1036

Decompose-and-Compose: A Compositional Approach to Mitigating Spurious Correlation

Fahimeh Hosseini Noohdani, Parsa Hosseini, Aryan Yazdan Parast et al.

CVPR 2024posterarXiv:2402.18919
17
citations
#1037

Depth Any Camera: Zero-Shot Metric Depth Estimation from Any Camera

Yuliang Guo, Sparsh Garg, S. Mahdi H. Miangoleh et al.

CVPR 2025posterarXiv:2501.02464
17
citations
#1038

Align Before Adapt: Leveraging Entity-to-Region Alignments for Generalizable Video Action Recognition

Yifei Chen, Dapeng Chen, Ruijin Liu et al.

CVPR 2024posterarXiv:2311.15619
17
citations
#1039

HandDiff: 3D Hand Pose Estimation with Diffusion on Image-Point Cloud

WENCAN CHENG, Hao Tang, Luc Van Gool et al.

CVPR 2024highlightarXiv:2404.03159
17
citations
#1040

DIRECT-3D: Learning Direct Text-to-3D Generation on Massive Noisy 3D Data

Qihao Liu, Yi Zhang, Song Bai et al.

CVPR 2024posterarXiv:2406.04322
17
citations
#1041

DashGaussian: Optimizing 3D Gaussian Splatting in 200 Seconds

Youyu Chen, Junjun Jiang, Kui Jiang et al.

CVPR 2025highlightarXiv:2503.18402
17
citations
#1042

Mamba as a Bridge: Where Vision Foundation Models Meet Vision Language Models for Domain-Generalized Semantic Segmentation

Xin Zhang, Robby T. Tan

CVPR 2025highlightarXiv:2504.03193
17
citations
#1043

De-Diffusion Makes Text a Strong Cross-Modal Interface

Chen Wei, Chenxi Liu, Siyuan Qiao et al.

CVPR 2024posterarXiv:2311.00618
17
citations
#1044

INFP: Audio-Driven Interactive Head Generation in Dyadic Conversations

Yongming Zhu, Longhao Zhang, Zhengkun Rong et al.

CVPR 2025posterarXiv:2412.04037
17
citations
#1045

Understanding Video Transformers via Universal Concept Discovery

Matthew Kowal, Achal Dave, Rares Andrei Ambrus et al.

CVPR 2024highlightarXiv:2401.10831
17
citations
#1046

Spatiotemporal Skip Guidance for Enhanced Video Diffusion Sampling

Junha Hyung, Kinam Kim, Susung Hong et al.

CVPR 2025posterarXiv:2411.18664
17
citations
#1047

Data Valuation and Detections in Federated Learning

Wenqian Li, Shuran Fu, Fengrui Zhang et al.

CVPR 2024posterarXiv:2311.05304
17
citations
#1048

Spiking Transformer with Spatial-Temporal Attention

Donghyun Lee, Yuhang Li, Youngeun Kim et al.

CVPR 2025posterarXiv:2409.19764
17
citations
#1049

Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval

Yuanmin Tang, Jue Zhang, Xiaoting Qin et al.

CVPR 2025highlightarXiv:2412.11077
17
citations
#1050

PO3AD: Predicting Point Offsets toward Better 3D Point Cloud Anomaly Detection

Jianan Ye, Weiguang Zhao, Xi Yang et al.

CVPR 2025posterarXiv:2412.12617
17
citations
#1051

FINECAPTION: Compositional Image Captioning Focusing on Wherever You Want at Any Granularity

Hang Hua, Qing Liu, Lingzhi Zhang et al.

CVPR 2025posterarXiv:2411.15411
17
citations
#1052

Condition-Aware Neural Network for Controlled Image Generation

Han Cai, Muyang Li, Qinsheng Zhang et al.

CVPR 2024posterarXiv:2404.01143
17
citations
#1053

Spatial457: A Diagnostic Benchmark for 6D Spatial Reasoning of Large Mutimodal Models

Xingrui Wang, Wufei Ma, Tiezheng Zhang et al.

CVPR 2025highlight
17
citations
#1054

Mitigating the Human-Robot Domain Discrepancy in Visual Pre-training for Robotic Manipulation

Jiaming Zhou, Teli Ma, Kun-Yu Lin et al.

CVPR 2025posterarXiv:2406.14235
17
citations
#1055

HAVE-FUN: Human Avatar Reconstruction from Few-Shot Unconstrained Images

Xihe Yang, Xingyu Chen, Daiheng Gao et al.

CVPR 2024posterarXiv:2311.15672
17
citations
#1056

Neural Visibility Field for Uncertainty-Driven Active Mapping

Shangjie Xue, Jesse Dill, Pranay Mathur et al.

CVPR 2024posterarXiv:2406.06948
17
citations
#1057

SfmCAD: Unsupervised CAD Reconstruction by Learning Sketch-based Feature Modeling Operations

Pu Li, Jianwei Guo, HUIBIN LI et al.

CVPR 2024poster
17
citations
#1058

Morphable Diffusion: 3D-Consistent Diffusion for Single-image Avatar Creation

Xiyi Chen, Marko Mihajlovic, Shaofei Wang et al.

CVPR 2024posterarXiv:2401.04728
17
citations
#1059

Enhancing 3D Object Detection with 2D Detection-Guided Query Anchors

Haoxuanye Ji, Pengpeng Liang, Erkang Cheng

CVPR 2024posterarXiv:2403.06093
17
citations
#1060

SuperNormal: Neural Surface Reconstruction via Multi-View Normal Integration

Xu Cao, Takafumi Taketomi

CVPR 2024posterarXiv:2312.04803
17
citations
#1061

Towards Effective Usage of Human-Centric Priors in Diffusion Models for Text-based Human Image Generation

Junyan Wang, Zhenhong Sun, Stewart Tan et al.

CVPR 2024posterarXiv:2403.05239
17
citations
#1062

Differentiable Information Bottleneck for Deterministic Multi-view Clustering

Xiaoqiang Yan, Zhixiang Jin, Fengshou Han et al.

CVPR 2024posterarXiv:2403.15681
17
citations
#1063

PAIR Diffusion: A Comprehensive Multimodal Object-Level Image Editor

Vidit Goel, Elia Peruzzo, Yifan Jiang et al.

CVPR 2024poster
17
citations
#1064

UniRestore: Unified Perceptual and Task-Oriented Image Restoration Model Using Diffusion Prior

I-Hsiang Chen, Wei-Ting Chen, Yu-Wei Liu et al.

CVPR 2025highlightarXiv:2501.13134
17
citations
#1065

Lux Post Facto: Learning Portrait Performance Relighting with Conditional Video Diffusion and a Hybrid Dataset

Yiqun Mei, Mingming He, Li Ma et al.

CVPR 2025posterarXiv:2503.14485
17
citations
#1066

Detecting Out-of-Distribution Through the Lens of Neural Collapse

Litian Liu, Yao Qin

CVPR 2025posterarXiv:2311.01479
17
citations
#1067

AlignMamba: Enhancing Multimodal Mamba with Local and Global Cross-modal Alignment

Yan Li, Yifei Xing, Xiangyuan Lan et al.

CVPR 2025posterarXiv:2412.00833
17
citations
#1068

Precise, Fast, and Low-cost Concept Erasure in Value Space: Orthogonal Complement Matters

Yuan Wang, Ouxiang Li, Tingting Mu et al.

CVPR 2025posterarXiv:2412.06143
17
citations
#1069

Unleashing the Potential of Multi-modal Foundation Models and Video Diffusion for 4D Dynamic Physical Scene Simulation

Zhuoman Liu, Weicai Ye, Yan Luximon et al.

CVPR 2025posterarXiv:2411.14423
17
citations
#1070

Adaptive VIO: Deep Visual-Inertial Odometry with Online Continual Learning

Youqi Pan, Wugen Zhou, Yingdian Cao et al.

CVPR 2024posterarXiv:2405.16754
17
citations
#1071

HaWoR: World-Space Hand Motion Reconstruction from Egocentric Videos

Jinglei Zhang, Jiankang Deng, Chao Ma et al.

CVPR 2025highlightarXiv:2501.02973
17
citations
#1072

De-confounded Data-free Knowledge Distillation for Handling Distribution Shifts

Yuzheng Wang, Dingkang Yang, Zhaoyu Chen et al.

CVPR 2024posterarXiv:2403.19539
17
citations
#1073

DeiT-LT: Distillation Strikes Back for Vision Transformer Training on Long-Tailed Datasets

Harsh Rangwani, Pradipto Mondal, Mayank Mishra et al.

CVPR 2024posterarXiv:2404.02900
17
citations
#1074

CycleINR: Cycle Implicit Neural Representation for Arbitrary-Scale Volumetric Super-Resolution of Medical Data

Wei Fang, Yuxing Tang, Heng Guo et al.

CVPR 2024posterarXiv:2404.04878
17
citations
#1075

ScanFormer: Referring Expression Comprehension by Iteratively Scanning

Wei Su, Peihan Miao, Huanzhang Dou et al.

CVPR 2024posterarXiv:2406.18048
16
citations
#1076

Video Prediction by Modeling Videos as Continuous Multi-Dimensional Processes

Gaurav Shrivastava, Abhinav Shrivastava

CVPR 2024poster
16
citations
#1077

Frozen Feature Augmentation for Few-Shot Image Classification

Andreas Bär, Neil Houlsby, Mostafa Dehghani et al.

CVPR 2024posterarXiv:2403.10519
16
citations
#1078

Token Transformation Matters: Towards Faithful Post-hoc Explanation for Vision Transformer

Junyi Wu, Bin Duan, Weitai Kang et al.

CVPR 2024posterarXiv:2403.14552
16
citations
#1079

The Scene Language: Representing Scenes with Programs, Words, and Embeddings

Yunzhi Zhang, Zizhang Li, Matt Zhou et al.

CVPR 2025highlightarXiv:2410.16770
16
citations
#1080

MLLM-as-a-Judge for Image Safety without Human Labeling

Zhenting Wang, Shuming Hu, Shiyu Zhao et al.

CVPR 2025highlightarXiv:2501.00192
16
citations
#1081

ROS-SAM: High-Quality Interactive Segmentation for Remote Sensing Moving Object

Zhe Shan, Yang Liu, Lei Zhou et al.

CVPR 2025posterarXiv:2503.12006
16
citations
#1082

Object Pose Estimation via the Aggregation of Diffusion Features

Tianfu Wang, Guosheng Hu, Hongguang Wang

CVPR 2024highlightarXiv:2403.18791
16
citations
#1083

Inverse Rendering of Glossy Objects via the Neural Plenoptic Function and Radiance Fields

Haoyuan Wang, Wenbo Hu, Lei Zhu et al.

CVPR 2024posterarXiv:2403.16224
16
citations
#1084

Exploring Vision Transformers for 3D Human Motion-Language Models with Motion Patches

Qing Yu, Mikihiro Tanaka, Kent Fujiwara

CVPR 2024posterarXiv:2405.04771
16
citations
#1085

DreamOmni: Unified Image Generation and Editing

Bin Xia, Yuechen Zhang, Jingyao Li et al.

CVPR 2025posterarXiv:2412.17098
16
citations
#1086

Accelerating Multimodal Large Language Models by Searching Optimal Vision Token Reduction

Shiyu Zhao, Zhenting Wang, Felix Juefei-Xu et al.

CVPR 2025posterarXiv:2412.00556
16
citations
#1087

Mimir: Improving Video Diffusion Models for Precise Text Understanding

Shuai Tan, Biao Gong, Yutong Feng et al.

CVPR 2025posterarXiv:2412.03085
16
citations
#1088

Gaussian Shadow Casting for Neural Characters

Luis Bolanos, Shih-Yang Su, Helge Rhodin

CVPR 2024posterarXiv:2401.06116
16
citations
#1089

Diversified and Personalized Multi-rater Medical Image Segmentation

Yicheng Wu, Xiangde Luo, Zhe Xu et al.

CVPR 2024highlightarXiv:2403.13417
16
citations
#1090

Real-Time Simulated Avatar from Head-Mounted Sensors

Zhengyi Luo, Jinkun Cao, Rawal Khirodkar et al.

CVPR 2024highlightarXiv:2403.06862
16
citations
#1091

C^2RV: Cross-Regional and Cross-View Learning for Sparse-View CBCT Reconstruction

Yiqun Lin, Jiewen Yang, hualiang wang et al.

CVPR 2024posterarXiv:2406.03902
16
citations
#1092

DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery

Yixuan Zhu, Ao Li, Yansong Tang et al.

CVPR 2024posterarXiv:2404.01424
16
citations
#1093

LocLLM: Exploiting Generalizable Human Keypoint Localization via Large Language Model

Dongkai Wang, shiyu xuan, Shiliang Zhang

CVPR 2024highlightarXiv:2406.04659
16
citations
#1094

Instant3dit: Multiview Inpainting for Fast Editing of 3D Objects

Amir Barda, Matheus Gadelha, Vladimir G. Kim et al.

CVPR 2025posterarXiv:2412.00518
16
citations
#1095

MaGGIe: Masked Guided Gradual Human Instance Matting

Chuong Huynh, Seoung Wug Oh, Abhinav Shrivastava et al.

CVPR 2024posterarXiv:2404.16035
16
citations
#1096

Quantization without Tears

Minghao Fu, Hao Yu, Jie Shao et al.

CVPR 2025posterarXiv:2411.13918
16
citations
#1097

MagicArticulate: Make Your 3D Models Articulation-Ready

Chaoyue Song, Jianfeng Zhang, Xiu Li et al.

CVPR 2025posterarXiv:2502.12135
16
citations
#1098

Guess The Unseen: Dynamic 3D Scene Reconstruction from Partial 2D Glimpses

Inhee Lee, Byungjun Kim, Hanbyul Joo

CVPR 2024posterarXiv:2404.14410
16
citations
#1099

DART: Implicit Doppler Tomography for Radar Novel View Synthesis

Tianshu Huang, John Miller, Akarsh Prabhakara et al.

CVPR 2024posterarXiv:2403.03896
16
citations
#1100

Iterated Learning Improves Compositionality in Large Vision-Language Models

Chenhao Zheng, Jieyu Zhang, Aniruddha Kembhavi et al.

CVPR 2024posterarXiv:2404.02145
16
citations
#1101

ODM: A Text-Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting

Chen Duan, Pei Fu, Shan Guo et al.

CVPR 2024posterarXiv:2403.00303
16
citations
#1102

Around the World in 80 Timesteps: A Generative Approach to Global Visual Geolocation

Nicolas Dufour, Vicky Kalogeiton, David Picard et al.

CVPR 2025posterarXiv:2412.06781
16
citations
#1103

KD-DETR: Knowledge Distillation for Detection Transformer with Consistent Distillation Points Sampling

Yu Wang, Xin Li, Shengzhao Wen et al.

CVPR 2024posterarXiv:2211.08071
16
citations
#1104

MambaIC: State Space Models for High-Performance Learned Image Compression

Fanhu Zeng, Hao Tang, Yihua Shao et al.

CVPR 2025posterarXiv:2503.12461
16
citations
#1105

Toward Generalized Image Quality Assessment: Relaxing the Perfect Reference Quality Assumption

Du CHEN, Tianhe Wu, Kede Ma et al.

CVPR 2025posterarXiv:2503.11221
16
citations
#1106

Efficiently Assemble Normalization Layers and Regularization for Federated Domain Generalization

Khiem Le, Tuan Long Ho, Cuong Do et al.

CVPR 2024posterarXiv:2403.15605
16
citations
#1107

Efficient Meshflow and Optical Flow Estimation from Event Cameras

Xinglong Luo, Ao Luo, Zhengning Wang et al.

CVPR 2024poster
16
citations
#1108

Revealing Key Details to See Differences: A Novel Prototypical Perspective for Skeleton-based Action Recognition

Hongda Liu, Yunfan Liu, Min Ren et al.

CVPR 2025highlightarXiv:2411.18941
16
citations
#1109

VidComposition: Can MLLMs Analyze Compositions in Compiled Videos?

Yunlong Tang, JunJia Guo, Hang Hua et al.

CVPR 2025posterarXiv:2411.10979
16
citations
#1110

Chat2SVG: Vector Graphics Generation with Large Language Models and Image Diffusion Models

Ronghuan Wu, Wanchao Su, Jing Liao

CVPR 2025posterarXiv:2411.16602
16
citations
#1111

MonoInstance: Enhancing Monocular Priors via Multi-view Instance Alignment for Neural Rendering and Reconstruction

Wenyuan Zhang, Yixiao Yang, Han Huang et al.

CVPR 2025posterarXiv:2503.18363
16
citations
#1112

Programmable Motion Generation for Open-Set Motion Control Tasks

Hanchao Liu, Xiaohang Zhan, Shaoli Huang et al.

CVPR 2024highlightarXiv:2405.19283
16
citations
#1113

Octopus: Alleviating Hallucination via Dynamic Contrastive Decoding

Wei Suo, Lijun Zhang, Mengyang Sun et al.

CVPR 2025highlightarXiv:2503.00361
16
citations
#1114

Degradation-Aware Feature Perturbation for All-in-One Image Restoration

Xiangpeng Tian, Xiangyu Liao, Xiao Liu et al.

CVPR 2025posterarXiv:2505.12630
16
citations
#1115

Visual Concept Connectome (VCC): Open World Concept Discovery and their Interlayer Connections in Deep Models

Matthew Kowal, Richard P. Wildes, Kosta Derpanis

CVPR 2024highlightarXiv:2404.02233
16
citations
#1116

EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing

Gaoxiang Cong, Jiadong Pan, Liang Li et al.

CVPR 2025highlightarXiv:2412.08988
16
citations
#1117

AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation

Jeongsoo Choi, Se Jin Park, Minsu Kim et al.

CVPR 2024highlightarXiv:2312.02512
16
citations
#1118

Jointly Training and Pruning CNNs via Learnable Agent Guidance and Alignment

Alireza Ganjdanesh, Shangqian Gao, Heng Huang

CVPR 2024posterarXiv:2403.19490
16
citations
#1119

EnvGS: Modeling View-Dependent Appearance with Environment Gaussian

Tao Xie, Xi Chen, Zhen Xu et al.

CVPR 2025posterarXiv:2412.15215
16
citations
#1120

Commonsense Prototype for Outdoor Unsupervised 3D Object Detection

Hai Wu, Shijia Zhao, Xun Huang et al.

CVPR 2024posterarXiv:2404.16493
16
citations
#1121

Day-Night Cross-domain Vehicle Re-identification

Hongchao Li, Jingong Chen, AIHUA ZHENG et al.

CVPR 2024poster
16
citations
#1122

Interactive3D: Create What You Want by Interactive 3D Generation

Shaocong Dong, Lihe Ding, Zhanpeng Huang et al.

CVPR 2024posterarXiv:2404.16510
16
citations
#1123

Revisiting MAE Pre-training for 3D Medical Image Segmentation

Tassilo Wald, Constantin Ulrich, Stanislav Lukyanenko et al.

CVPR 2025highlightarXiv:2410.23132
16
citations
#1124

Confronting Ambiguity in 6D Object Pose Estimation via Score-Based Diffusion on SE(3)

Tsu-Ching Hsiao, Hao-Wei Chen, Hsuan-Kung Yang et al.

CVPR 2024posterarXiv:2305.15873
16
citations
#1125

Global-Local Tree Search in VLMs for 3D Indoor Scene Generation

Wei Deng, Mengshi Qi, Huadong Ma

CVPR 2025posterarXiv:2503.18476
16
citations
#1126

SHINOBI: Shape and Illumination using Neural Object Decomposition via BRDF Optimization In-the-wild

Andreas Engelhardt, Amit Raj, Mark Boss et al.

CVPR 2024posterarXiv:2401.10171
16
citations
#1127

UniGraspTransformer: Simplified Policy Distillation for Scalable Dexterous Robotic Grasping

Wenbo Wang, Fangyun Wei, Lei Zhou et al.

CVPR 2025posterarXiv:2412.02699
16
citations
#1128

ECBench: Can Multi-modal Foundation Models Understand the Egocentric World? A Holistic Embodied Cognition Benchmark

Ronghao Dang, Yuqian Yuan, Wenqi Zhang et al.

CVPR 2025posterarXiv:2501.05031
16
citations
#1129

Evidential Active Recognition: Intelligent and Prudent Open-World Embodied Perception

Lei Fan, Mingfu Liang, Yunxuan Li et al.

CVPR 2024posterarXiv:2311.13793
16
citations
#1130

PrEditor3D: Fast and Precise 3D Shape Editing

Ziya Erkoc, Can Gümeli, Chaoyang Wang et al.

CVPR 2025posterarXiv:2412.06592
16
citations
#1131

Make Me a BNN: A Simple Strategy for Estimating Bayesian Uncertainty from Pre-trained Models

Gianni Franchi, Olivier Laurent, Maxence Leguéry et al.

CVPR 2024posterarXiv:2312.15297
16
citations
#1132

SLICE: Stabilized LIME for Consistent Explanations for Image Classification

Revoti Prasad Bora, Kiran Raja, Philipp Terhörst et al.

CVPR 2024highlight
16
citations
#1133

CADTalk: An Algorithm and Benchmark for Semantic Commenting of CAD Programs

Haocheng Yuan, Jing Xu, Hao Pan et al.

CVPR 2024highlightarXiv:2311.16703
16
citations
#1134

Instance-Aware Group Quantization for Vision Transformers

Jaehyeon Moon, Dohyung Kim, Jun Yong Cheon et al.

CVPR 2024posterarXiv:2404.00928
15
citations
#1135

Learning to Visually Localize Sound Sources from Mixtures without Prior Source Knowledge

Dongjin Kim, Sung Jin Um, Sangmin Lee et al.

CVPR 2024posterarXiv:2403.17420
15
citations
#1136

Missing Target-Relevant Information Prediction with World Model for Accurate Zero-Shot Composed Image Retrieval

Yuanmin Tang, Jing Yu, Keke Gai et al.

CVPR 2025posterarXiv:2503.17109
15
citations
#1137

Codebook Transfer with Part-of-Speech for Vector-Quantized Image Modeling

Baoquan Zhang, Huaibin Wang, Luo Chuyao et al.

CVPR 2024posterarXiv:2403.10071
15
citations
#1138

Few-Shot Recognition via Stage-Wise Retrieval-Augmented Finetuning

Tian Liu, Huixin Zhang, Shubham Parashar et al.

CVPR 2025posterarXiv:2406.11148
15
citations
#1139

Hybrid Proposal Refiner: Revisiting DETR Series from the Faster R-CNN Perspective

Jinjing Zhao, Fangyun Wei, Chang Xu

CVPR 2024poster
15
citations
#1140

Single-to-Dual-View Adaptation for Egocentric 3D Hand Pose Estimation

Ruicong Liu, Takehiko Ohkawa, Mingfang Zhang et al.

CVPR 2024posterarXiv:2403.04381
15
citations
#1141

FrugalNeRF: Fast Convergence for Extreme Few-shot Novel View Synthesis without Learned Priors

Chin-Yang Lin, Chung-Ho Wu, Changhan Yeh et al.

CVPR 2025posterarXiv:2410.16271
15
citations
#1142

Scaling Properties of Diffusion Models For Perceptual Tasks

Rahul Ravishankar, Zeeshan Patel, Jathushan Rajasegaran et al.

CVPR 2025posterarXiv:2411.08034
15
citations
#1143

Efficient Vision-Language Pre-training by Cluster Masking

Zihao Wei, Zixuan Pan, Andrew Owens

CVPR 2024posterarXiv:2405.08815
15
citations
#1144

ConTex-Human: Free-View Rendering of Human from a Single Image with Texture-Consistent Synthesis

Xiangjun Gao, Xiaoyu Li, Chaopeng Zhang et al.

CVPR 2024posterarXiv:2311.17123
15
citations
#1145

JarvisIR: Elevating Autonomous Driving Perception with Intelligent Image Restoration

yunlong lin, Zixu Lin, Haoyu Chen et al.

CVPR 2025posterarXiv:2504.04158
15
citations
#1146

JRDB-Social: A Multifaceted Robotic Dataset for Understanding of Context and Dynamics of Human Interactions Within Social Groups

Simindokht Jahangard, Zhixi Cai, Shiki Wen et al.

CVPR 2024posterarXiv:2404.04458
15
citations
#1147

Multi-Layer Visual Feature Fusion in Multimodal LLMs: Methods, Analysis, and Best Practices

Junyan Lin, Haoran Chen, Yue Fan et al.

CVPR 2025posterarXiv:2503.06063
15
citations
#1148

Enhancing Vision-Language Pre-training with Rich Supervisions

Yuan Gao, Kunyu Shi, Pengkai Zhu et al.

CVPR 2024highlightarXiv:2403.03346
15
citations
#1149

ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing Collaborative AI Systems

Xiangyuan Xue, Zeyu Lu, Di Huang et al.

CVPR 2025posterarXiv:2409.01392
15
citations
#1150

Cyclic Learning for Binaural Audio Generation and Localization

Zhaojian Li, Bin Zhao, Yuan Yuan

CVPR 2024poster
15
citations
#1151

DiffHuman: Probabilistic Photorealistic 3D Reconstruction of Humans

Akash Sengupta, Thiemo Alldieck, NIKOS KOLOTOUROS et al.

CVPR 2024posterarXiv:2404.00485
15
citations
#1152

SOAC: Spatio-Temporal Overlap-Aware Multi-Sensor Calibration using Neural Radiance Fields

Quentin HERAU, Nathan Piasco, Moussab Bennehar et al.

CVPR 2024posterarXiv:2311.15803
15
citations
#1153

VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation

Ziyang Luo, Haoning Wu, Dongxu Li et al.

CVPR 2025posterarXiv:2411.13281
15
citations
#1154

Discrete to Continuous: Generating Smooth Transition Poses from Sign Language Observations

Shengeng Tang, Jiayi He, Lechao Cheng et al.

CVPR 2025posterarXiv:2411.16810
15
citations
#1155

FineLIP: Extending CLIP’s Reach via Fine-Grained Alignment with Longer Text Inputs

Mothilal Asokan, Kebin wu, Fatima Albreiki

CVPR 2025posterarXiv:2504.01916
15
citations
#1156

RoboGround: Robotic Manipulation with Grounded Vision-Language Priors

Haifeng Huang, Xinyi Chen, Yilun Chen et al.

CVPR 2025posterarXiv:2504.21530
15
citations
#1157

Adversarial Score Distillation: When score distillation meets GAN

Min Wei, Jingkai Zhou, Junyao Sun et al.

CVPR 2024posterarXiv:2312.00739
15
citations
#1158

Kandinsky Conformal Prediction: Efficient Calibration of Image Segmentation Algorithms

Joren Brunekreef, Eric Marcus, Ray Sheombarsing et al.

CVPR 2024posterarXiv:2311.11837
15
citations
#1159

TeMO: Towards Text-Driven 3D Stylization for Multi-Object Meshes

Xuying Zhang, Bo-Wen Yin, yuming chen et al.

CVPR 2024posterarXiv:2312.04248
15
citations
#1160

Dynamic Cues-Assisted Transformer for Robust Point Cloud Registration

Hong Chen, Pei Yan, sihe xiang et al.

CVPR 2024highlight
15
citations
#1161

Multiple View Geometry Transformers for 3D Human Pose Estimation

Ziwei Liao, jialiang zhu, Chunyu Wang et al.

CVPR 2024posterarXiv:2311.10983
15
citations
#1162

S2Gaussian: Sparse-View Super-Resolution 3D Gaussian Splatting

Yecong Wan, Mingwen Shao, Yuanshuo Cheng et al.

CVPR 2025posterarXiv:2503.04314
15
citations
#1163

Bidirectional Autoregessive Diffusion Model for Dance Generation

Canyu Zhang, Youbao Tang, NING Zhang et al.

CVPR 2024poster
15
citations
#1164

Predicated Diffusion: Predicate Logic-Based Attention Guidance for Text-to-Image Diffusion Models

Kota Sueyoshi, Takashi Matsubara

CVPR 2024highlightarXiv:2311.16117
15
citations
#1165

FaithDiff: Unleashing Diffusion Priors for Faithful Image Super-resolution

Junyang Chen, Jinshan Pan, Jiangxin Dong

CVPR 2025posterarXiv:2411.18824
15
citations
#1166

Versatile Medical Image Segmentation Learned from Multi-Source Datasets via Model Self-Disambiguation

Xiaoyang Chen, Hao Zheng, Yuemeng LI et al.

CVPR 2024posterarXiv:2311.10696
15
citations
#1167

Any6D: Model-free 6D Pose Estimation of Novel Object

Taeyeop Lee, Bowen Wen, Minjun Kang et al.

CVPR 2025posterarXiv:2503.18673
15
citations
#1168

What How and When Should Object Detectors Update in Continually Changing Test Domains?

Jayeon Yoo, Dongkwan Lee, Inseop Chung et al.

CVPR 2024posterarXiv:2312.08875
15
citations
#1169

On the Road to Portability: Compressing End-to-End Motion Planner for Autonomous Driving

Kaituo Feng, Changsheng Li, Dongchun Ren et al.

CVPR 2024posterarXiv:2403.01238
15
citations
#1170

Bridging the Synthetic-to-Authentic Gap: Distortion-Guided Unsupervised Domain Adaptation for Blind Image Quality Assessment

Aobo Li, Jinjian Wu, Yongxu Liu et al.

CVPR 2024posterarXiv:2405.04167
15
citations
#1171

Complementing Event Streams and RGB Frames for Hand Mesh Reconstruction

Jianping Jiang, xinyu zhou, Bingxuan Wang et al.

CVPR 2024posterarXiv:2403.07346
15
citations
#1172

Adapters Strike Back

Jan-Martin Steitz, Stefan Roth

CVPR 2024posterarXiv:2406.06820
15
citations
#1173

Dynamic Camera Poses and Where to Find Them

Chris Rockwell, Joseph Tung, Tsung-Yi Lin et al.

CVPR 2025posterarXiv:2504.17788
15
citations
#1174

SpectroMotion: Dynamic 3D Reconstruction of Specular Scenes

Cheng-De Fan, Chen-Wei Chang, Yi-Ruei Liu et al.

CVPR 2025posterarXiv:2410.17249
15
citations
#1175

ConsistDreamer: 3D-Consistent 2D Diffusion for High-Fidelity Scene Editing

Jun-Kun Chen, Samuel Rota Bulò, Norman Müller et al.

CVPR 2024posterarXiv:2406.09404
15
citations
#1176

Progressive Divide-and-Conquer via Subsampling Decomposition for Accelerated MRI

Chong Wang, Lanqing Guo, Yufei Wang et al.

CVPR 2024highlightarXiv:2403.10064
15
citations
#1177

OVER-NAV: Elevating Iterative Vision-and-Language Navigation with Open-Vocabulary Detection and StructurEd Representation

Ganlong Zhao, Guanbin Li, Weikai Chen et al.

CVPR 2024posterarXiv:2403.17334
15
citations
#1178

ILIAS: Instance-Level Image retrieval At Scale

Giorgos Kordopatis-Zilos, Vladan Stojnić, Anna Manko et al.

CVPR 2025posterarXiv:2502.11748
15
citations
#1179

Bootstrapping Chest CT Image Understanding by Distilling Knowledge from X-ray Expert Models

Weiwei Cao, Jianpeng Zhang, Yingda Xia et al.

CVPR 2024posterarXiv:2404.04936
15
citations
#1180

Exploring Sparse MoE in GANs for Text-conditioned Image Synthesis

Jiapeng Zhu, Ceyuan Yang, Kecheng Zheng et al.

CVPR 2025posterarXiv:2309.03904
15
citations
#1181

OmniMotionGPT: Animal Motion Generation with Limited Data

Zhangsihao Yang, Mingyuan Zhou, Mengyi Shan et al.

CVPR 2024posterarXiv:2311.18303
15
citations
#1182

FireEdit: Fine-grained Instruction-based Image Editing via Region-aware Vision Language Model

Jun Zhou, Jiahao Li, Zunnan Xu et al.

CVPR 2025posterarXiv:2503.19839
15
citations
#1183

Living Scenes: Multi-object Relocalization and Reconstruction in Changing 3D Environments

Liyuan Zhu, Shengyu Huang, Konrad Schindler et al.

CVPR 2024highlightarXiv:2312.09138
15
citations
#1184

Improving Spectral Snapshot Reconstruction with Spectral-Spatial Rectification

Jiancheng Zhang, Haijin Zeng, Yongyong Chen et al.

CVPR 2024poster
15
citations
#1185

The Audio-Visual Conversational Graph: From an Egocentric-Exocentric Perspective

Wenqi Jia, Miao Liu, Hao Jiang et al.

CVPR 2024posterarXiv:2312.12870
15
citations
#1186

MangaNinja: Line Art Colorization with Precise Reference Following

Zhiheng Liu, Ka Leong Cheng, Xi Chen et al.

CVPR 2025highlightarXiv:2501.08332
15
citations
#1187

Scaling Vision Pre-Training to 4K Resolution

Baifeng Shi, Boyi Li, Han Cai et al.

CVPR 2025highlightarXiv:2503.19903
15
citations
#1188

Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion

Jiuhai Chen, Jianwei Yang, Haiping Wu et al.

CVPR 2025posterarXiv:2412.04424
15
citations
#1189

Tri-Modal Motion Retrieval by Learning a Joint Embedding Space

Kangning Yin, Shihao Zou, Yuxuan Ge et al.

CVPR 2024highlightarXiv:2403.00691
15
citations
#1190

FreeTimeGS: Free Gaussian Primitives at Anytime Anywhere for Dynamic Scene Reconstruction

Yifan Wang, Peishan Yang, Zhen Xu et al.

CVPR 2025poster
15
citations
#1191

A Video is Worth 256 Bases: Spatial-Temporal Expectation-Maximization Inversion for Zero-Shot Video Editing

Li Maomao, Yu Li, Tianyu Yang et al.

CVPR 2024posterarXiv:2312.05856
15
citations
#1192

Breaking the Low-Rank Dilemma of Linear Attention

Qihang Fan, Huaibo Huang, Ran He

CVPR 2025posterarXiv:2411.07635
15
citations
#1193

GenN2N: Generative NeRF2NeRF Translation

Xiangyue Liu, Han Xue, Kunming Luo et al.

CVPR 2024posterarXiv:2404.02788
15
citations
#1194

A Noisy Elephant in the Room: Is Your Out-of-Distribution Detector Robust to Label Noise?

Galadrielle Humblot-Renaux, Sergio Escalera, Thomas B. Moeslund

CVPR 2024posterarXiv:2404.01775
15
citations
#1195

HiLo: Detailed and Robust 3D Clothed Human Reconstruction with High-and Low-Frequency Information of Parametric Models

Yifan Yang, Dong Liu, Shuhai Zhang et al.

CVPR 2024posterarXiv:2404.04876
15
citations
#1196

PACT: Pruning and Clustering-Based Token Reduction for Faster Visual Language Models

Dhouib Mohamed, Davide Buscaldi, Vanier Sonia et al.

CVPR 2025posterarXiv:2504.08966
15
citations
#1197

Customization Assistant for Text-to-Image Generation

Yufan Zhou, Ruiyi Zhang, Jiuxiang Gu et al.

CVPR 2024posterarXiv:2312.03045
15
citations
#1198

SOLVE: Synergy of Language-Vision and End-to-End Networks for Autonomous Driving

Xuesong Chen, Linjiang Huang, Tao Ma et al.

CVPR 2025posterarXiv:2505.16805
15
citations
#1199

NeRF Director: Revisiting View Selection in Neural Volume Rendering

Wenhui Xiao, Rodrigo Santa Cruz, David Ahmedt-Aristizabal et al.

CVPR 2024posterarXiv:2406.08839
15
citations
#1200

IncEventGS: Pose-Free Gaussian Splatting from a Single Event Camera

Jian Huang, Chengrui Dong, Xuanhua Chen et al.

CVPR 2025highlightarXiv:2410.08107
15
citations