Most Cited CVPR "demographic group misclassification" Papers

5,589 papers found • Page 6 of 28

#1001

SfmCAD: Unsupervised CAD Reconstruction by Learning Sketch-based Feature Modeling Operations

Pu Li, Jianwei Guo, Huibin Li et al.

CVPR 2024 poster
17 citations
#1002

Understanding Video Transformers via Universal Concept Discovery

Matthew Kowal, Achal Dave, Rares Andrei Ambrus et al.

CVPR 2024 highlight • arXiv:2401.10831
17 citations
#1003

Mimir: Improving Video Diffusion Models for Precise Text Understanding

Shuai Tan, Biao Gong, Yutong Feng et al.

CVPR 2025 poster • arXiv:2412.03085
16 citations
#1004

Instant3dit: Multiview Inpainting for Fast Editing of 3D Objects

Amir Barda, Matheus Gadelha, Vladimir G. Kim et al.

CVPR 2025 poster • arXiv:2412.00518
16 citations
#1005

EnvGS: Modeling View-Dependent Appearance with Environment Gaussian

Tao Xie, Xi Chen, Zhen Xu et al.

CVPR 2025 poster • arXiv:2412.15215
16 citations
#1006

GenFusion: Closing the Loop between Reconstruction and Generation via Videos

Sibo Wu, Congrong Xu, Binbin Huang et al.

CVPR 2025 poster • arXiv:2503.21219
16 citations
#1007

Chat2SVG: Vector Graphics Generation with Large Language Models and Image Diffusion Models

Ronghuan Wu, Wanchao Su, Jing Liao

CVPR 2025 poster • arXiv:2411.16602
16 citations
#1008

Degradation-Aware Feature Perturbation for All-in-One Image Restoration

Xiangpeng Tian, Xiangyu Liao, Xiao Liu et al.

CVPR 2025 poster • arXiv:2505.12630
16 citations
#1009

Depth Any Camera: Zero-Shot Metric Depth Estimation from Any Camera

Yuliang Guo, Sparsh Garg, S. Mahdi H. Miangoleh et al.

CVPR 2025 poster • arXiv:2501.02464
16 citations
#1010

Video Prediction by Modeling Videos as Continuous Multi-Dimensional Processes

Gaurav Shrivastava, Abhinav Shrivastava

CVPR 2024 poster
16 citations
#1011

Global-Local Tree Search in VLMs for 3D Indoor Scene Generation

Wei Deng, Mengshi Qi, Huadong Ma

CVPR 2025 poster • arXiv:2503.18476
16 citations
#1012

CADTalk: An Algorithm and Benchmark for Semantic Commenting of CAD Programs

Haocheng Yuan, Jing Xu, Hao Pan et al.

CVPR 2024 highlight • arXiv:2311.16703
16 citations
#1013

Evidential Active Recognition: Intelligent and Prudent Open-World Embodied Perception

Lei Fan, Mingfu Liang, Yunxuan Li et al.

CVPR 2024 poster • arXiv:2311.13793
16 citations
#1014

SHINOBI: Shape and Illumination using Neural Object Decomposition via BRDF Optimization In-the-wild

Andreas Engelhardt, Amit Raj, Mark Boss et al.

CVPR 2024 poster • arXiv:2401.10171
16 citations
#1015

ROS-SAM: High-Quality Interactive Segmentation for Remote Sensing Moving Object

Zhe Shan, Yang Liu, Lei Zhou et al.

CVPR 2025 poster • arXiv:2503.12006
16 citations
#1016

CXPMRG-Bench: Pre-training and Benchmarking for X-ray Medical Report Generation on CheXpert Plus Dataset

Xiao Wang, Fuling Wang, Yuehang Li et al.

CVPR 2025 poster • arXiv:2410.00379
16 citations
#1017

Frozen Feature Augmentation for Few-Shot Image Classification

Andreas Bär, Neil Houlsby, Mostafa Dehghani et al.

CVPR 2024 poster • arXiv:2403.10519
16 citations
#1018

UniRestore: Unified Perceptual and Task-Oriented Image Restoration Model Using Diffusion Prior

I-Hsiang Chen, Wei-Ting Chen, Yu-Wei Liu et al.

CVPR 2025 highlight • arXiv:2501.13134
16 citations
#1019

Diversified and Personalized Multi-rater Medical Image Segmentation

Yicheng Wu, Xiangde Luo, Zhe Xu et al.

CVPR 2024 highlight • arXiv:2403.13417
16 citations
#1020

ECBench: Can Multi-modal Foundation Models Understand the Egocentric World? A Holistic Embodied Cognition Benchmark

Ronghao Dang, Yuqian Yuan, Wenqi Zhang et al.

CVPR 2025 poster • arXiv:2501.05031
16 citations
#1021

C^2RV: Cross-Regional and Cross-View Learning for Sparse-View CBCT Reconstruction

Yiqun Lin, Jiewen Yang, Hualiang Wang et al.

CVPR 2024 poster • arXiv:2406.03902
16 citations
#1022

Object Pose Estimation via the Aggregation of Diffusion Features

Tianfu Wang, Guosheng Hu, Hongguang Wang

CVPR 2024 highlight • arXiv:2403.18791
16 citations
#1023

DashGaussian: Optimizing 3D Gaussian Splatting in 200 Seconds

Youyu Chen, Junjun Jiang, Kui Jiang et al.

CVPR 2025 highlight • arXiv:2503.18402
16 citations
#1024

MLLM-as-a-Judge for Image Safety without Human Labeling

Zhenting Wang, Shuming Hu, Shiyu Zhao et al.

CVPR 2025 highlight • arXiv:2501.00192
16 citations
#1025

Guess The Unseen: Dynamic 3D Scene Reconstruction from Partial 2D Glimpses

Inhee Lee, Byungjun Kim, Hanbyul Joo

CVPR 2024 poster • arXiv:2404.14410
16 citations
#1026

Token Transformation Matters: Towards Faithful Post-hoc Explanation for Vision Transformer

Junyi Wu, Bin Duan, Weitai Kang et al.

CVPR 2024 poster • arXiv:2403.14552
16 citations
#1027

Accelerating Multimodal Large Language Models by Searching Optimal Vision Token Reduction

Shiyu Zhao, Zhenting Wang, Felix Juefei-Xu et al.

CVPR 2025 poster • arXiv:2412.00556
16 citations
#1028

Revisiting MAE Pre-training for 3D Medical Image Segmentation

Tassilo Wald, Constantin Ulrich, Stanislav Lukyanenko et al.

CVPR 2025 highlight • arXiv:2410.23132
16 citations
#1029

DART: Implicit Doppler Tomography for Radar Novel View Synthesis

Tianshu Huang, John Miller, Akarsh Prabhakara et al.

CVPR 2024 poster • arXiv:2403.03896
16 citations
#1030

EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing

Gaoxiang Cong, Jiadong Pan, Liang Li et al.

CVPR 2025 highlight • arXiv:2412.08988
16 citations
#1031

LocLLM: Exploiting Generalizable Human Keypoint Localization via Large Language Model

Dongkai Wang, Shiyu Xuan, Shiliang Zhang

CVPR 2024 highlight • arXiv:2406.04659
16 citations
#1032

MaGGIe: Masked Guided Gradual Human Instance Matting

Chuong Huynh, Seoung Wug Oh, Abhinav Shrivastava et al.

CVPR 2024 poster • arXiv:2404.16035
16 citations
#1033

Adaptive Rectangular Convolution for Remote Sensing Pansharpening

Xueyang Wang, Zhixin Zheng, Jiandong Shao et al.

CVPR 2025 poster • arXiv:2503.00467
16 citations
#1034

PrEditor3D: Fast and Precise 3D Shape Editing

Ziya Erkoc, Can Gümeli, Chaoyang Wang et al.

CVPR 2025 poster • arXiv:2412.06592
16 citations
#1035

MagicArticulate: Make Your 3D Models Articulation-Ready

Chaoyue Song, Jianfeng Zhang, Xiu Li et al.

CVPR 2025 poster • arXiv:2502.12135
16 citations
#1036

Spiking Transformer with Spatial-Temporal Attention

Donghyun Lee, Yuhang Li, Youngeun Kim et al.

CVPR 2025 poster • arXiv:2409.19764
16 citations
#1037

DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery

Yixuan Zhu, Ao Li, Yansong Tang et al.

CVPR 2024 poster • arXiv:2404.01424
16 citations
#1038

KD-DETR: Knowledge Distillation for Detection Transformer with Consistent Distillation Points Sampling

Yu Wang, Xin Li, Shengzhao Wen et al.

CVPR 2024 poster • arXiv:2211.08071
16 citations
#1039

Around the World in 80 Timesteps: A Generative Approach to Global Visual Geolocation

Nicolas Dufour, Vicky Kalogeiton, David Picard et al.

CVPR 2025 poster • arXiv:2412.06781
16 citations
#1040

Commonsense Prototype for Outdoor Unsupervised 3D Object Detection

Hai Wu, Shijia Zhao, Xun Huang et al.

CVPR 2024 poster • arXiv:2404.16493
16 citations
#1041

Efficiently Assemble Normalization Layers and Regularization for Federated Domain Generalization

Khiem Le, Tuan Long Ho, Cuong Do et al.

CVPR 2024 poster • arXiv:2403.15605
16 citations
#1042

Interactive3D: Create What You Want by Interactive 3D Generation

Shaocong Dong, Lihe Ding, Zhanpeng Huang et al.

CVPR 2024 poster • arXiv:2404.16510
16 citations
#1043

ODM: A Text-Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting

Chen Duan, Pei Fu, Shan Guo et al.

CVPR 2024 poster • arXiv:2403.00303
16 citations
#1044

Iterated Learning Improves Compositionality in Large Vision-Language Models

Chenhao Zheng, Jieyu Zhang, Aniruddha Kembhavi et al.

CVPR 2024 poster • arXiv:2404.02145
16 citations
#1045

VidComposition: Can MLLMs Analyze Compositions in Compiled Videos?

Yunlong Tang, JunJia Guo, Hang Hua et al.

CVPR 2025 poster • arXiv:2411.10979
16 citations
#1046

Revealing Key Details to See Differences: A Novel Prototypical Perspective for Skeleton-based Action Recognition

Hongda Liu, Yunfan Liu, Min Ren et al.

CVPR 2025 highlight • arXiv:2411.18941
16 citations
#1047

Day-Night Cross-domain Vehicle Re-identification

Hongchao Li, Jingong Chen, Aihua Zheng et al.

CVPR 2024 poster
16 citations
#1048

DreamOmni: Unified Image Generation and Editing

Bin Xia, Yuechen Zhang, Jingyao Li et al.

CVPR 2025 poster • arXiv:2412.17098
16 citations
#1049

Toward Generalized Image Quality Assessment: Relaxing the Perfect Reference Quality Assumption

Du Chen, Tianhe Wu, Kede Ma et al.

CVPR 2025 poster • arXiv:2503.11221
16 citations
#1050

Inverse Rendering of Glossy Objects via the Neural Plenoptic Function and Radiance Fields

Haoyuan Wang, Wenbo Hu, Lei Zhu et al.

CVPR 2024 poster • arXiv:2403.16224
16 citations
#1051

Spatiotemporal Skip Guidance for Enhanced Video Diffusion Sampling

Junha Hyung, Kinam Kim, Susung Hong et al.

CVPR 2025 poster • arXiv:2411.18664
16 citations
#1052

Align Before Adapt: Leveraging Entity-to-Region Alignments for Generalizable Video Action Recognition

Yifei Chen, Dapeng Chen, Ruijin Liu et al.

CVPR 2024 poster • arXiv:2311.15619
16 citations
#1053

MonoInstance: Enhancing Monocular Priors via Multi-view Instance Alignment for Neural Rendering and Reconstruction

Wenyuan Zhang, Yixiao Yang, Han Huang et al.

CVPR 2025 poster • arXiv:2503.18363
16 citations
#1054

AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation

Jeongsoo Choi, Se Jin Park, Minsu Kim et al.

CVPR 2024 highlight • arXiv:2312.02512
16 citations
#1055

Programmable Motion Generation for Open-Set Motion Control Tasks

Hanchao Liu, Xiaohang Zhan, Shaoli Huang et al.

CVPR 2024 highlight • arXiv:2405.19283
16 citations
#1056

Visual Concept Connectome (VCC): Open World Concept Discovery and their Interlayer Connections in Deep Models

Matthew Kowal, Richard P. Wildes, Kosta Derpanis

CVPR 2024 highlight • arXiv:2404.02233
16 citations
#1057

HiLo: Detailed and Robust 3D Clothed Human Reconstruction with High-and Low-Frequency Information of Parametric Models

Yifan Yang, Dong Liu, Shuhai Zhang et al.

CVPR 2024 poster • arXiv:2404.04876
15 citations
#1058

NeRF Director: Revisiting View Selection in Neural Volume Rendering

Wenhui Xiao, Rodrigo Santa Cruz, David Ahmedt-Aristizabal et al.

CVPR 2024 poster • arXiv:2406.08839
15 citations
#1059

Multi-Layer Visual Feature Fusion in Multimodal LLMs: Methods, Analysis, and Best Practices

Junyan Lin, Haoran Chen, Yue Fan et al.

CVPR 2025 poster • arXiv:2503.06063
15 citations
#1060

OmniMotionGPT: Animal Motion Generation with Limited Data

Zhangsihao Yang, Mingyuan Zhou, Mengyi Shan et al.

CVPR 2024 poster • arXiv:2311.18303
15 citations
#1061

FaithDiff: Unleashing Diffusion Priors for Faithful Image Super-resolution

Junyang Chen, Jinshan Pan, Jiangxin Dong

CVPR 2025 poster • arXiv:2411.18824
15 citations
#1062

ILIAS: Instance-Level Image retrieval At Scale

Giorgos Kordopatis-Zilos, Vladan Stojnić, Anna Manko et al.

CVPR 2025 poster • arXiv:2502.11748
15 citations
#1063

SOLVE: Synergy of Language-Vision and End-to-End Networks for Autonomous Driving

Xuesong Chen, Linjiang Huang, Tao Ma et al.

CVPR 2025 poster • arXiv:2505.16805
15 citations
#1064

Improving Spectral Snapshot Reconstruction with Spectral-Spatial Rectification

Jiancheng Zhang, Haijin Zeng, Yongyong Chen et al.

CVPR 2024 poster
15 citations
#1065

Make Me a BNN: A Simple Strategy for Estimating Bayesian Uncertainty from Pre-trained Models

Gianni Franchi, Olivier Laurent, Maxence Leguéry et al.

CVPR 2024 poster • arXiv:2312.15297
15 citations
#1066

UniGraspTransformer: Simplified Policy Distillation for Scalable Dexterous Robotic Grasping

Wenbo Wang, Fangyun Wei, Lei Zhou et al.

CVPR 2025 poster • arXiv:2412.02699
15 citations
#1067

A Noisy Elephant in the Room: Is Your Out-of-Distribution Detector Robust to Label Noise?

Galadrielle Humblot-Renaux, Sergio Escalera, Thomas B. Moeslund

CVPR 2024 poster • arXiv:2404.01775
15 citations
#1068

TeMO: Towards Text-Driven 3D Stylization for Multi-Object Meshes

Xuying Zhang, Bo-Wen Yin, Yuming Chen et al.

CVPR 2024 poster • arXiv:2312.04248
15 citations
#1069

ConsistDreamer: 3D-Consistent 2D Diffusion for High-Fidelity Scene Editing

Jun-Kun Chen, Samuel Rota Bulò, Norman Müller et al.

CVPR 2024 poster • arXiv:2406.09404
15 citations
#1070

FireEdit: Fine-grained Instruction-based Image Editing via Region-aware Vision Language Model

Jun Zhou, Jiahao Li, Zunnan Xu et al.

CVPR 2025 poster • arXiv:2503.19839
15 citations
#1071

What How and When Should Object Detectors Update in Continually Changing Test Domains?

Jayeon Yoo, Dongkwan Lee, Inseop Chung et al.

CVPR 2024 poster • arXiv:2312.08875
15 citations
#1072

OVER-NAV: Elevating Iterative Vision-and-Language Navigation with Open-Vocabulary Detection and StructurEd Representation

Ganlong Zhao, Guanbin Li, Weikai Chen et al.

CVPR 2024 poster • arXiv:2403.17334
15 citations
#1073

Instance-Aware Group Quantization for Vision Transformers

Jaehyeon Moon, Dohyung Kim, Jun Yong Cheon et al.

CVPR 2024 poster • arXiv:2404.00928
15 citations
#1074

Breaking the Low-Rank Dilemma of Linear Attention

Qihang Fan, Huaibo Huang, Ran He

CVPR 2025 poster • arXiv:2411.07635
15 citations
#1075

Exploring Vision Transformers for 3D Human Motion-Language Models with Motion Patches

Qing Yu, Mikihiro Tanaka, Kent Fujiwara

CVPR 2024 poster • arXiv:2405.04771
15 citations
#1076

A Video is Worth 256 Bases: Spatial-Temporal Expectation-Maximization Inversion for Zero-Shot Video Editing

Li Maomao, Yu Li, Tianyu Yang et al.

CVPR 2024 poster • arXiv:2312.05856
15 citations
#1077

Gaussian Shadow Casting for Neural Characters

Luis Bolanos, Shih-Yang Su, Helge Rhodin

CVPR 2024 poster • arXiv:2401.06116
15 citations
#1078

Real-Time Simulated Avatar from Head-Mounted Sensors

Zhengyi Luo, Jinkun Cao, Rawal Khirodkar et al.

CVPR 2024 highlight • arXiv:2403.06862
15 citations
#1079

Codebook Transfer with Part-of-Speech for Vector-Quantized Image Modeling

Baoquan Zhang, Huaibin Wang, Luo Chuyao et al.

CVPR 2024 poster • arXiv:2403.10071
15 citations
#1080

GenesisTex: Adapting Image Denoising Diffusion to Texture Space

Chenjian Gao, Boyan Jiang, Xinghui Li et al.

CVPR 2024 poster • arXiv:2403.17782
15 citations
#1081

SpectroMotion: Dynamic 3D Reconstruction of Specular Scenes

Cheng-De Fan, Chen-Wei Chang, Yi-Ruei Liu et al.

CVPR 2025 poster • arXiv:2410.17249
15 citations
#1082

FrugalNeRF: Fast Convergence for Extreme Few-shot Novel View Synthesis without Learned Priors

Chin-Yang Lin, Chung-Ho Wu, Changhan Yeh et al.

CVPR 2025 poster • arXiv:2410.16271
15 citations
#1083

Single-to-Dual-View Adaptation for Egocentric 3D Hand Pose Estimation

Ruicong Liu, Takehiko Ohkawa, Mingfang Zhang et al.

CVPR 2024 poster • arXiv:2403.04381
15 citations
#1084

JarvisIR: Elevating Autonomous Driving Perception with Intelligent Image Restoration

Yunlong Lin, Zixu Lin, Haoyu Chen et al.

CVPR 2025 poster • arXiv:2504.04158
15 citations
#1085

Few-Shot Recognition via Stage-Wise Retrieval-Augmented Finetuning

Tian Liu, Huixin Zhang, Shubham Parashar et al.

CVPR 2025 poster • arXiv:2406.11148
15 citations
#1086

The Scene Language: Representing Scenes with Programs, Words, and Embeddings

Yunzhi Zhang, Zizhang Li, Matt Zhou et al.

CVPR 2025 highlight • arXiv:2410.16770
15 citations
#1087

Discrete to Continuous: Generating Smooth Transition Poses from Sign Language Observations

Shengeng Tang, Jiayi He, Lechao Cheng et al.

CVPR 2025 poster • arXiv:2411.16810
15 citations
#1088

Octopus: Alleviating Hallucination via Dynamic Contrastive Decoding

Wei Suo, Lijun Zhang, Mengyang Sun et al.

CVPR 2025 highlight • arXiv:2503.00361
15 citations
#1089

Enhancing Vision-Language Pre-training with Rich Supervisions

Yuan Gao, Kunyu Shi, Pengkai Zhu et al.

CVPR 2024 highlight • arXiv:2403.03346
15 citations
#1090

RoboGround: Robotic Manipulation with Grounded Vision-Language Priors

Haifeng Huang, Xinyi Chen, Yilun Chen et al.

CVPR 2025 poster • arXiv:2504.21530
15 citations
#1091

MANTA: A Large-Scale Multi-View and Visual-Text Anomaly Detection Dataset for Tiny Objects

Lei Fan, Dongdong Fan, Zhiguang Hu et al.

CVPR 2025 poster • arXiv:2412.04867
15 citations
#1092

Learning to Visually Localize Sound Sources from Mixtures without Prior Source Knowledge

Dongjin Kim, Sung Jin Um, Sangmin Lee et al.

CVPR 2024 poster • arXiv:2403.17420
15 citations
#1093

Efficient Vision-Language Pre-training by Cluster Masking

Zihao Wei, Zixuan Pan, Andrew Owens

CVPR 2024 poster • arXiv:2405.08815
15 citations
#1094

Adversarial Score Distillation: When score distillation meets GAN

Min Wei, Jingkai Zhou, Junyao Sun et al.

CVPR 2024 poster • arXiv:2312.00739
15 citations
#1095

MangaNinja: Line Art Colorization with Precise Reference Following

Zhiheng Liu, Ka Leong Cheng, Xi Chen et al.

CVPR 2025 highlight • arXiv:2501.08332
15 citations
#1096

ConTex-Human: Free-View Rendering of Human from a Single Image with Texture-Consistent Synthesis

Xiangjun Gao, Xiaoyu Li, Chaopeng Zhang et al.

CVPR 2024 poster • arXiv:2311.17123
15 citations
#1097

Cyclic Learning for Binaural Audio Generation and Localization

Zhaojian Li, Bin Zhao, Yuan Yuan

CVPR 2024 poster
15 citations
#1098

Hybrid Proposal Refiner: Revisiting DETR Series from the Faster R-CNN Perspective

Jinjing Zhao, Fangyun Wei, Chang Xu

CVPR 2024 poster
15 citations
#1099

DiffHuman: Probabilistic Photorealistic 3D Reconstruction of Humans

Akash Sengupta, Thiemo Alldieck, Nikos Kolotouros et al.

CVPR 2024 poster • arXiv:2404.00485
15 citations
#1100

Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval

Yuanmin Tang, Jue Zhang, Xiaoting Qin et al.

CVPR 2025 highlight • arXiv:2412.11077
15 citations
#1101

Any6D: Model-free 6D Pose Estimation of Novel Object

Taeyeop Lee, Bowen Wen, Minjun Kang et al.

CVPR 2025 poster • arXiv:2503.18673
15 citations
#1102

ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing Collaborative AI Systems

Xiangyuan Xue, Zeyu Lu, Di Huang et al.

CVPR 2025 poster • arXiv:2409.01392
15 citations
#1103

Scaling Properties of Diffusion Models For Perceptual Tasks

Rahul Ravishankar, Zeeshan Patel, Jathushan Rajasegaran et al.

CVPR 2025 poster • arXiv:2411.08034
15 citations
#1104

Versatile Medical Image Segmentation Learned from Multi-Source Datasets via Model Self-Disambiguation

Xiaoyang Chen, Hao Zheng, Yuemeng Li et al.

CVPR 2024 poster • arXiv:2311.10696
15 citations
#1105

IncEventGS: Pose-Free Gaussian Splatting from a Single Event Camera

Jian Huang, Chengrui Dong, Xuanhua Chen et al.

CVPR 2025 highlight • arXiv:2410.08107
15 citations
#1106

SOAC: Spatio-Temporal Overlap-Aware Multi-Sensor Calibration using Neural Radiance Fields

Quentin Herau, Nathan Piasco, Moussab Bennehar et al.

CVPR 2024 poster • arXiv:2311.15803
15 citations
#1107

Multiple View Geometry Transformers for 3D Human Pose Estimation

Ziwei Liao, Jialiang Zhu, Chunyu Wang et al.

CVPR 2024 poster • arXiv:2311.10983
15 citations
#1108

S2Gaussian: Sparse-View Super-Resolution 3D Gaussian Splatting

Yecong Wan, Mingwen Shao, Yuanshuo Cheng et al.

CVPR 2025 poster • arXiv:2503.04314
15 citations
#1109

Bidirectional Autoregessive Diffusion Model for Dance Generation

Canyu Zhang, Youbao Tang, Ning Zhang et al.

CVPR 2024 poster
15 citations
#1110

Predicated Diffusion: Predicate Logic-Based Attention Guidance for Text-to-Image Diffusion Models

Kota Sueyoshi, Takashi Matsubara

CVPR 2024 highlight • arXiv:2311.16117
15 citations
#1111

The Audio-Visual Conversational Graph: From an Egocentric-Exocentric Perspective

Wenqi Jia, Miao Liu, Hao Jiang et al.

CVPR 2024 poster • arXiv:2312.12870
15 citations
#1112

Adapters Strike Back

Jan-Martin Steitz, Stefan Roth

CVPR 2024 poster • arXiv:2406.06820
15 citations
#1113

Jointly Training and Pruning CNNs via Learnable Agent Guidance and Alignment

Alireza Ganjdanesh, Shangqian Gao, Heng Huang

CVPR 2024 poster • arXiv:2403.19490
15 citations
#1114

One-Shot Structure-Aware Stylized Image Synthesis

Hansam Cho, Jonghyun Lee, Seunggyu Chang et al.

CVPR 2024 poster • arXiv:2402.17275
15 citations
#1115

ScanFormer: Referring Expression Comprehension by Iteratively Scanning

Wei Su, Peihan Miao, Huanzhang Dou et al.

CVPR 2024 poster • arXiv:2406.18048
15 citations
#1116

Exploring Sparse MoE in GANs for Text-conditioned Image Synthesis

Jiapeng Zhu, Ceyuan Yang, Kecheng Zheng et al.

CVPR 2025 poster • arXiv:2309.03904
15 citations
#1117

Progressive Divide-and-Conquer via Subsampling Decomposition for Accelerated MRI

Chong Wang, Lanqing Guo, Yufei Wang et al.

CVPR 2024 highlight • arXiv:2403.10064
15 citations
#1118

Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion

Jiuhai Chen, Jianwei Yang, Haiping Wu et al.

CVPR 2025 poster • arXiv:2412.04424
15 citations
#1119

FreeTimeGS: Free Gaussian Primitives at Anytime Anywhere for Dynamic Scene Reconstruction

Yifan Wang, Peishan Yang, Zhen Xu et al.

CVPR 2025 poster
15 citations
#1120

Living Scenes: Multi-object Relocalization and Reconstruction in Changing 3D Environments

Liyuan Zhu, Shengyu Huang, Konrad Schindler et al.

CVPR 2024 highlight • arXiv:2312.09138
15 citations
#1121

Bridging the Synthetic-to-Authentic Gap: Distortion-Guided Unsupervised Domain Adaptation for Blind Image Quality Assessment

Aobo Li, Jinjian Wu, Yongxu Liu et al.

CVPR 2024 poster • arXiv:2405.04167
15 citations
#1122

Morphable Diffusion: 3D-Consistent Diffusion for Single-image Avatar Creation

Xiyi Chen, Marko Mihajlovic, Shaofei Wang et al.

CVPR 2024 poster • arXiv:2401.04728
15 citations
#1123

LSNet: See Large, Focus Small

Ao Wang, Hui Chen, Zijia Lin et al.

CVPR 2025 poster • arXiv:2503.23135
15 citations
#1124

Kandinsky Conformal Prediction: Efficient Calibration of Image Segmentation Algorithms

Joren Brunekreef, Eric Marcus, Ray Sheombarsing et al.

CVPR 2024 poster • arXiv:2311.11837
15 citations
#1125

PACT: Pruning and Clustering-Based Token Reduction for Faster Visual Language Models

Dhouib Mohamed, Davide Buscaldi, Vanier Sonia et al.

CVPR 2025 poster • arXiv:2504.08966
15 citations
#1126

Dynamic Camera Poses and Where to Find Them

Chris Rockwell, Joseph Tung, Tsung-Yi Lin et al.

CVPR 2025 poster • arXiv:2504.17788
15 citations
#1127

X-Dyna: Expressive Dynamic Human Image Animation

Di Chang, Hongyi Xu, You Xie et al.

CVPR 2025 highlight • arXiv:2501.10021
14 citations
#1128

Active Data Curation Effectively Distills Large-Scale Multimodal Models

Vishaal Udandarao, Nikhil Parthasarathy, Muhammad Ferjad Naeem et al.

CVPR 2025 poster • arXiv:2411.18674
14 citations
#1129

MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation

Weijia Wu, Mingyu Liu, Zeyu Zhu et al.

CVPR 2025 poster • arXiv:2411.15262
14 citations
#1130

Bootstrapping Chest CT Image Understanding by Distilling Knowledge from X-ray Expert Models

Weiwei Cao, Jianpeng Zhang, Yingda Xia et al.

CVPR 2024 poster • arXiv:2404.04936
14 citations
#1131

SURE: SUrvey REcipes for building reliable and robust deep networks

Yuting Li, Yingyi Chen, Xuanlong Yu et al.

CVPR 2024 poster • arXiv:2403.00543
14 citations
#1132

GigaHands: A Massive Annotated Dataset of Bimanual Hand Activities

Rao Fu, Dingxi Zhang, Alex Jiang et al.

CVPR 2025 highlight • arXiv:2412.04244
14 citations
#1133

Towards Universal Soccer Video Understanding

Jiayuan Rao, Haoning Wu, Hao Jiang et al.

CVPR 2025 poster • arXiv:2412.01820
14 citations
#1134

DyMO: Training-Free Diffusion Model Alignment with Dynamic Multi-Objective Scheduling

Xin Xie, Dong Gong

CVPR 2025 poster • arXiv:2412.00759
14 citations
#1135

TASeg: Temporal Aggregation Network for LiDAR Semantic Segmentation

Xiaopei Wu, Yuenan Hou, Xiaoshui Huang et al.

CVPR 2024 poster • arXiv:2407.09751
14 citations
#1136

ReCoRe: Regularized Contrastive Representation Learning of World Model

Rudra P. K. Poudel, Harit Pandya et al.

CVPR 2024 poster • arXiv:2312.09056
14 citations
#1137

Reversible Decoupling Network for Single Image Reflection Removal

Hao Zhao, Mingjia Li, Qiming Hu et al.

CVPR 2025 poster • arXiv:2410.08063
14 citations
#1138

ProTeCt: Prompt Tuning for Taxonomic Open Set Classification

Tz-Ying Wu, Chih-Hui Ho, Nuno Vasconcelos

CVPR 2024 poster • arXiv:2306.02240
14 citations
#1139

Vision-Language Gradient Descent-driven All-in-One Deep Unfolding Networks

Haijin Zeng, Xiangming Wang, Yongyong Chen et al.

CVPR 2025 poster • arXiv:2503.16930
14 citations
#1140

FineLIP: Extending CLIP’s Reach via Fine-Grained Alignment with Longer Text Inputs

Mothilal Asokan, Kebin Wu, Fatima Albreiki

CVPR 2025 poster • arXiv:2504.01916
14 citations
#1141

JamMa: Ultra-lightweight Local Feature Matching with Joint Mamba

Xiaoyong Lu, Songlin Du

CVPR 2025 poster • arXiv:2503.03437
14 citations
#1142

SAIST: Segment Any Infrared Small Target Model Guided by Contrastive Language-Image Pretraining

Mingjin Zhang, Xiaolong Li, Fei Gao et al.

CVPR 2025 poster
14 citations
#1143

Omnia de EgoTempo: Benchmarking Temporal Understanding of Multi-Modal LLMs in Egocentric Videos

Chiara Plizzari, Alessio Tonioni, Yongqin Xian et al.

CVPR 2025 poster • arXiv:2503.13646
14 citations
#1144

Unifying Automatic and Interactive Matting with Pretrained ViTs

Zixuan Ye, Wenze Liu, He Guo et al.

CVPR 2024 poster
14 citations
#1145

MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research

James Burgess, Jeffrey J Nirschl, Laura Bravo-Sánchez et al.

CVPR 2025 poster • arXiv:2503.13399
14 citations
#1146

En3D: An Enhanced Generative Model for Sculpting 3D Humans from 2D Synthetic Data

Yifang Men, Biwen Lei, Yuan Yao et al.

CVPR 2024 poster • arXiv:2401.01173
14 citations
#1147

VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation

Ziyang Luo, Haoning Wu, Dongxu Li et al.

CVPR 2025 poster • arXiv:2411.13281
14 citations
#1148

Customization Assistant for Text-to-Image Generation

Yufan Zhou, Ruiyi Zhang, Jiuxiang Gu et al.

CVPR 2024 poster • arXiv:2312.03045
14 citations
#1149

Robust Distillation via Untargeted and Targeted Intermediate Adversarial Samples

Junhao Dong, Piotr Koniusz, Junxi Chen et al.

CVPR 2024 poster
14 citations
#1150

DiffSal: Joint Audio and Video Learning for Diffusion Saliency Prediction

Junwen Xiong, Peng Zhang, Tao You et al.

CVPR 2024 poster • arXiv:2403.01226
14 citations
#1151

NeRF Analogies: Example-Based Visual Attribute Transfer for NeRFs

Michael Fischer, Zhengqin Li, Thu Nguyen-Phuoc et al.

CVPR 2024 poster • arXiv:2402.08622
14 citations
#1152

BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation

Yunhao Ge, Yihe Tang, Jiashu Xu et al.

CVPR 2024 highlight • arXiv:2405.09546
14 citations
#1153

Binarized Low-light Raw Video Enhancement

Gengchen Zhang, Yulun Zhang, Xin Yuan et al.

CVPR 2024 poster • arXiv:2403.19944
14 citations
#1154

CADCrafter: Generating Computer-Aided Design Models from Unconstrained Images

Chen Cheng, Jiacheng Wei, Tianrun Chen et al.

CVPR 2025 poster • arXiv:2504.04753
14 citations
#1155

SleeperMark: Towards Robust Watermark against Fine-Tuning Text-to-image Diffusion Models

Zilan Wang, Junfeng Guo, Jiacheng Zhu et al.

CVPR 2025 poster • arXiv:2412.04852
14 citations
#1156

Estimating Noisy Class Posterior with Part-level Labels for Noisy Label Learning

Rui Zhao, Bin Shi, Jianfei Ruan et al.

CVPR 2024 poster • arXiv:2405.05714
14 citations
#1157

Tri-Modal Motion Retrieval by Learning a Joint Embedding Space

Kangning Yin, Shihao Zou, Yuxuan Ge et al.

CVPR 2024 highlight • arXiv:2403.00691
14 citations
#1158

InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption

Tiehan Fan, Kepan Nan, Rui Xie et al.

CVPR 2025 poster • arXiv:2412.09283
14 citations
#1159

Missing Target-Relevant Information Prediction with World Model for Accurate Zero-Shot Composed Image Retrieval

Yuanmin Tang, Jing Yu, Keke Gai et al.

CVPR 2025 poster • arXiv:2503.17109
14 citations
#1160

In2SET: Intra-Inter Similarity Exploiting Transformer for Dual-Camera Compressive Hyperspectral Imaging

Xin Wang, Lizhi Wang, Xiangtian Ma et al.

CVPR 2024 poster • arXiv:2312.13319
14 citations
#1161

Hyperbolic Learning with Synthetic Captions for Open-World Detection

Fanjie Kong, Yanbei Chen, Jiarui Cai et al.

CVPR 2024 poster • arXiv:2404.05016
14 citations
#1162

Lux Post Facto: Learning Portrait Performance Relighting with Conditional Video Diffusion and a Hybrid Dataset

Yiqun Mei, Mingming He, Li Ma et al.

CVPR 2025 poster • arXiv:2503.14485
14 citations
#1163

DPU: Dynamic Prototype Updating for Multimodal Out-of-Distribution Detection

Li Li, Huixian Gong, Hao Dong et al.

CVPR 2025 highlight • arXiv:2411.08227
14 citations
#1164

HandBooster: Boosting 3D Hand-Mesh Reconstruction by Conditional Synthesis and Sampling of Hand-Object Interactions

Hao Xu, Li Haipeng, Yinqiao Wang et al.

CVPR 2024 poster • arXiv:2403.18575
14 citations
#1165

NitroFusion: High-Fidelity Single-Step Diffusion through Dynamic Adversarial Training

Dar-Yen Chen, Hmrishav Bandyopadhyay, Kai Zou et al.

CVPR 2025 poster • arXiv:2412.02030
14 citations
#1166

Quantization without Tears

Minghao Fu, Hao Yu, Jie Shao et al.

CVPR 2025 poster • arXiv:2411.13918
14 citations
#1167

Pippo: High-Resolution Multi-View Humans from a Single Image

Yash Kant, Ethan Weber, Jin Kyu Kim et al.

CVPR 2025 highlight • arXiv:2502.07785
14 citations
#1168

Adversarial Backdoor Attack by Naturalistic Data Poisoning on Trajectory Prediction in Autonomous Driving

Mozhgan Pourkeshavarz, Mohammad Sabokrou, Amir Rasouli

CVPR 2024 poster • arXiv:2306.15755
14 citations
#1169

Assessing and Learning Alignment of Unimodal Vision and Language Models

Le Zhang, Qian Yang, Aishwarya Agrawal

CVPR 2025 highlight • arXiv:2412.04616
14 citations
#1170

AerialMegaDepth: Learning Aerial-Ground Reconstruction and View Synthesis

Khiem Vuong, Anurag Ghosh, Deva Ramanan et al.

CVPR 2025 poster • arXiv:2504.13157
14 citations
#1171

Region-Based Representations Revisited

Michal Shlapentokh-Rothman, Ansel Blume, Yao Xiao et al.

CVPR 2024 poster • arXiv:2402.02352
14 citations
#1172

UniPose: A Unified Multimodal Framework for Human Pose Comprehension, Generation and Editing

Yiheng Li, RuiBing Hou, Hong Chang et al.

CVPR 2025 highlight • arXiv:2411.16781
14 citations
#1173

Bridging Modalities: Improving Universal Multimodal Retrieval by Multimodal Large Language Models

Xin Zhang, Yanzhao Zhang, Wen Xie et al.

CVPR 2025 poster
14 citations
#1174

Lost in Translation, Found in Context: Sign Language Translation with Contextual Cues

Youngjoon Jang, Haran Raajesh, Liliane Momeni et al.

CVPR 2025 poster • arXiv:2501.09754
14 citations
#1175

IDProtector: An Adversarial Noise Encoder to Protect Against ID-Preserving Image Generation

Yiren Song, Pei Yang, Hai Ci et al.

CVPR 2025 poster • arXiv:2412.11638
14 citations
#1176

Slice3D: Multi-Slice Occlusion-Revealing Single View 3D Reconstruction

Yizhi Wang, Wallace Lira, Wenqi Wang et al.

CVPR 2024 poster
14 citations
#1177

Retrieving Semantics from the Deep: an RAG Solution for Gesture Synthesis

M. Hamza Mughal, Rishabh Dabral, Merel CJ Scholman et al.

CVPR 2025 poster • arXiv:2412.06786
14 citations
#1178

GenN2N: Generative NeRF2NeRF Translation

Xiangyue Liu, Han Xue, Kunming Luo et al.

CVPR 2024 poster • arXiv:2404.02788
14 citations
#1179

MambaIC: State Space Models for High-Performance Learned Image Compression

Fanhu Zeng, Hao Tang, Yihua Shao et al.

CVPR 2025 poster • arXiv:2503.12461
14 citations
#1180

HEAL-SWIN: A Vision Transformer On The Sphere

Oscar Carlsson, Jan E. Gerken, Hampus Linander et al.

CVPR 2024 poster • arXiv:2307.07313
14 citations
#1181

Uni4D: Unifying Visual Foundation Models for 4D Modeling from a Single Video

David Yifan Yao, Albert J. Zhai, Shenlong Wang

CVPR 2025 highlight • arXiv:2503.21761
14 citations
#1182

Complementing Event Streams and RGB Frames for Hand Mesh Reconstruction

Jianping Jiang, Xinyu Zhou, Bingxuan Wang et al.

CVPR 2024 poster • arXiv:2403.07346
14 citations
#1183

Learning Intra-view and Cross-view Geometric Knowledge for Stereo Matching

Rui Gong, Weide Liu, Zaiwang Gu et al.

CVPR 2024 poster • arXiv:2402.19270
14 citations
#1184

On the Road to Portability: Compressing End-to-End Motion Planner for Autonomous Driving

Kaituo Feng, Changsheng Li, Dongchun Ren et al.

CVPR 2024 poster • arXiv:2403.01238
14 citations
#1185

DiffSCI: Zero-Shot Snapshot Compressive Imaging via Iterative Spectral Diffusion Model

Zhenghao Pan, Haijin Zeng, Jiezhang Cao et al.

CVPR 2024 poster • arXiv:2311.11417
14 citations
#1186

Docopilot: Improving Multimodal Models for Document-Level Understanding

Yuchen Duan, Zhe Chen, Yusong Hu et al.

CVPR 2025 poster • arXiv:2507.14675
14 citations
#1187

What You See is What You GAN: Rendering Every Pixel for High-Fidelity Geometry in 3D GANs

Alex Trevithick, Matthew Chan, Towaki Takikawa et al.

CVPR 2024 poster • arXiv:2401.02411
14 citations
#1188

DynamicScaler: Seamless and Scalable Video Generation for Panoramic Scenes

Jinxiu Liu, Shaoheng Lin, Yinxiao Li et al.

CVPR 2025 poster • arXiv:2412.11100
14 citations
#1189

PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs

Michael Dorkenwald, Nimrod Barazani, Cees G. M. Snoek et al.

CVPR 2024 poster • arXiv:2402.08657
14 citations
#1190

Rethinking End-to-End 2D to 3D Scene Segmentation in Gaussian Splatting

Runsong Zhu, Shi Qiu, Zhengzhe Liu et al.

CVPR 2025 poster • arXiv:2503.14029
14 citations
#1191

CO-SPY: Combining Semantic and Pixel Features to Detect Synthetic Images by AI

Siyuan Cheng, Lingjuan Lyu, Zhenting Wang et al.

CVPR 2025 poster • arXiv:2503.18286
14 citations
#1192

Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models

Zhihang Liu, Chen-Wei Xie, Pandeng Li et al.

CVPR 2025 poster • arXiv:2503.16036
14 citations
#1193

Synthetic-to-Real Self-supervised Robust Depth Estimation via Learning with Motion and Structure Priors

Weilong Yan, Ming Li, Li Haipeng et al.

CVPR 2025 poster • arXiv:2503.20211
14 citations
#1194

Learning Instance-Aware Correspondences for Robust Multi-Instance Point Cloud Registration in Cluttered Scenes

Zhiyuan Yu, Zheng Qin, Lintao Zheng et al.

CVPR 2024 poster • arXiv:2404.04557
14 citations
#1195

Image Over Text: Transforming Formula Recognition Evaluation with Character Detection Matching

Bin Wang, Fan Wu, Linke Ouyang et al.

CVPR 2025 poster • arXiv:2409.03643
13 citations
#1196

RoDLA: Benchmarking the Robustness of Document Layout Analysis Models

Yufan Chen, Jiaming Zhang, Kunyu Peng et al.

CVPR 2024 poster • arXiv:2403.14442
13 citations
#1197

Move Anything with Layered Scene Diffusion

Jiawei Ren, Mengmeng Xu, Jui-Chieh Wu et al.

CVPR 2024 poster • arXiv:2404.07178
13 citations
#1198

How to Handle Sketch-Abstraction in Sketch-Based Image Retrieval?

Subhadeep Koley, Ayan Kumar Bhunia, Aneeshan Sain et al.

CVPR 2024 poster • arXiv:2403.07203
13 citations
#1199

TANGO: Training-free Embodied AI Agents for Open-world Tasks

Filippo Ziliotto, Tommaso Campari, Luciano Serafini et al.

CVPR 2025 poster • arXiv:2412.10402
13 citations
#1200

MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling

Jian Yang, Dacheng Yin, Yizhou Zhou et al.

CVPR 2025 poster • arXiv:2410.10798
13 citations