Most Cited CVPR "object categories" Papers

5,589 papers found • Page 25 of 28

#4801

SleepVST: Sleep Staging from Near-Infrared Video Signals using Pre-Trained Transformers

Jonathan F. Carter, Joao Jorge, Oliver Gibson et al.

CVPR 2024highlightarXiv:2404.03831
#4802

Empowering Resampling Operation for Ultra-High-Definition Image Enhancement with Model-Aware Guidance

Yu, Jie Huang, Li et al.

CVPR 2024poster
#4803

From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models

Rongjie Li, Songyang Zhang, Dahua Lin et al.

CVPR 2024posterarXiv:2404.00906
#4804

OAKINK2: A Dataset of Bimanual Hands-Object Manipulation in Complex Task Completion

Xinyu Zhan, Lixin Yang, Yifei Zhao et al.

CVPR 2024posterarXiv:2403.19417
#4805

Visual Point Cloud Forecasting enables Scalable Autonomous Driving

Zetong Yang, Li Chen, Yanan Sun et al.

CVPR 2024highlightarXiv:2312.17655
#4806

HumanNorm: Learning Normal Diffusion Model for High-quality and Realistic 3D Human Generation

Xin Huang, Ruizhi Shao, Qi Zhang et al.

CVPR 2024posterarXiv:2310.01406
#4807

HOLD: Category-agnostic 3D Reconstruction of Interacting Hands and Objects from Video

Zicong Fan, Maria Parelli, Maria Kadoglou et al.

CVPR 2024highlightarXiv:2311.18448
#4808

AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving

Mingfu Liang, Jong-Chyi Su, Samuel Schulter et al.

CVPR 2024posterarXiv:2403.17373
#4809

Attentive Illumination Decomposition Model for Multi-Illuminant White Balancing

Dongyoung Kim, Jinwoo Kim, Junsang Yu et al.

CVPR 2024posterarXiv:2402.18277
#4810

OpenESS: Event-based Semantic Scene Understanding with Open Vocabularies

Lingdong Kong, Youquan Liu, Lai Xing Ng et al.

CVPR 2024highlightarXiv:2405.05259
#4811

Z*: Zero-shot Style Transfer via Attention Reweighting

Yingying Deng, Xiangyu He, Fan Tang et al.

CVPR 2024poster
#4812

G-HOP: Generative Hand-Object Prior for Interaction Reconstruction and Grasp Synthesis

Yufei Ye, Abhinav Gupta, Kris Kitani et al.

CVPR 2024posterarXiv:2404.12383
#4813

ConCon-Chi: Concept-Context Chimera Benchmark for Personalized Vision-Language Tasks

Andrea Rosasco, Stefano Berti, Giulia Pasquale et al.

CVPR 2024poster
#4814

Habitat Synthetic Scenes Dataset (HSSD-200): An Analysis of 3D Scene Scale and Realism Tradeoffs for ObjectGoal Navigation

Mukul Khanna, Yongsen Mao, Hanxiao Jiang et al.

CVPR 2024posterarXiv:2306.11290
#4815

KeyPoint Relative Position Encoding for Face Recognition

Minchul Kim, Feng Liu, Yiyang Su et al.

CVPR 2024posterarXiv:2403.14852
#4816

QDFormer: Towards Robust Audiovisual Segmentation in Complex Environments with Quantization-based Semantic Decomposition

Xiang Li, Jinglu Wang, Xiaohao Xu et al.

CVPR 2024posterarXiv:2310.00132
#4817

From a Bird's Eye View to See: Joint Camera and Subject Registration without the Camera Calibration

Zekun Qian, Ruize Han, Wei Feng et al.

CVPR 2024posterarXiv:2212.09298
#4818

Joint2Human: High-Quality 3D Human Generation via Compact Spherical Embedding of 3D Joints

Muxin Zhang, Qiao Feng, Zhuo Su et al.

CVPR 2024posterarXiv:2312.08591
#4819

FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis

Feng Liang, Bichen Wu, Jialiang Wang et al.

CVPR 2024highlightarXiv:2312.17681
#4820

Accept the Modality Gap: An Exploration in the Hyperbolic Space

Sameera Ramasinghe, Violetta Shevchenko, Gil Avraham et al.

CVPR 2024highlight
#4821

Random Entangled Tokens for Adversarially Robust Vision Transformer

Huihui Gong, Minjing Dong, Siqi Ma et al.

CVPR 2024poster
#4822

OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition

Tongjia Chen, Hongshan Yu, Zhengeng Yang et al.

CVPR 2024posterarXiv:2312.00096
#4823

Continuous Pose for Monocular Cameras in Neural Implicit Representation

Qi Ma, Danda Paudel, Ajad Chhatkuli et al.

CVPR 2024posterarXiv:2311.17119
#4824

BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive Learning

Siyuan Liang, Mingli Zhu, Aishan Liu et al.

CVPR 2024highlightarXiv:2311.12075
#4825

A Pedestrian is Worth One Prompt: Towards Language Guidance Person Re-Identification

Zexian Yang, Dayan Wu, Chenming Wu et al.

CVPR 2024highlight
#4826

From SAM to CAMs: Exploring Segment Anything Model for Weakly Supervised Semantic Segmentation

Hyeokjun Kweon, Kuk-Jin Yoon

CVPR 2024poster
#4827

Vlogger: Make Your Dream A Vlog

Shaobin Zhuang, Kunchang Li, Xinyuan Chen et al.

CVPR 2024posterarXiv:2401.09414
#4828

EpiDiff: Enhancing Multi-View Synthesis via Localized Epipolar-Constrained Diffusion

Zehuan Huang, Hao Wen, Junting Dong et al.

CVPR 2024posterarXiv:2312.06725
#4829

IPoD: Implicit Field Learning with Point Diffusion for Generalizable 3D Object Reconstruction from Single RGB-D Images

Yushuang Wu, Luyue Shi, Junhao Cai et al.

CVPR 2024highlightarXiv:2404.00269
#4830

SVGDreamer: Text Guided SVG Generation with Diffusion Model

XiMing Xing, Chuang Wang, Haitao Zhou et al.

CVPR 2024posterarXiv:2312.16476
#4831

Dual Prototype Attention for Unsupervised Video Object Segmentation

Suhwan Cho, Minhyeok Lee, Seunghoon Lee et al.

CVPR 2024posterarXiv:2211.12036
#4832

Contrastive Mean-Shift Learning for Generalized Category Discovery

Sua Choi, Dahyun Kang, Minsu Cho

CVPR 2024posterarXiv:2404.09451
#4833

Panacea: Panoramic and Controllable Video Generation for Autonomous Driving

Yuqing Wen, Yucheng Zhao, Yingfei Liu et al.

CVPR 2024posterarXiv:2408.07605
#4834

Flexible Biometrics Recognition: Bridging the Multimodality Gap through Attention Alignment and Prompt Tuning

Leslie Ching Ow Tiong, Dick Sigmund, Chen-Hui Chan et al.

CVPR 2024poster
#4835

ALGM: Adaptive Local-then-Global Token Merging for Efficient Semantic Segmentation with Plain Vision Transformers

Narges Norouzi, Svetlana Orlova, Daan de Geus et al.

CVPR 2024posterarXiv:2406.09936
#4836

UDiFF: Generating Conditional Unsigned Distance Fields with Optimal Wavelet Diffusion

Junsheng Zhou, Weiqi Zhang, Baorui Ma et al.

CVPR 2024posterarXiv:2404.06851
#4837

UFineBench: Towards Text-based Person Retrieval with Ultra-fine Granularity

Jialong Zuo, Hanyu Zhou, Ying Nie et al.

CVPR 2024posterarXiv:2312.03441
#4838

Test-Time Zero-Shot Temporal Action Localization

Benedetta Liberatori, Alessandro Conti, Paolo Rota et al.

CVPR 2024posterarXiv:2404.05426
#4839

Towards Text-guided 3D Scene Composition

Qihang Zhang, Chaoyang Wang, Aliaksandr Siarohin et al.

CVPR 2024posterarXiv:2312.08885
#4840

Instance-aware Exploration-Verification-Exploitation for Instance ImageGoal Navigation

Xiaohan Lei, Min Wang, Wengang Zhou et al.

CVPR 2024posterarXiv:2402.17587
#4841

Learning SO(3)-Invariant Semantic Correspondence via Local Shape Transform

Chunghyun Park, Seungwook Kim, Jaesik Park et al.

CVPR 2024posterarXiv:2404.11156
#4842

Taming Mode Collapse in Score Distillation for Text-to-3D Generation

Peihao Wang, Dejia Xu, Zhiwen Fan et al.

CVPR 2024posterarXiv:2401.00909
#4843

SnAG: Scalable and Accurate Video Grounding

Fangzhou Mu, Sicheng Mo, Yin Li

CVPR 2024posterarXiv:2404.02257
#4844

GLaMM: Pixel Grounding Large Multimodal Model

Hanoona Rasheed, Muhammad Maaz, Sahal Shaji Mullappilly et al.

CVPR 2024posterarXiv:2311.03356
#4845

ManiFPT: Defining and Analyzing Fingerprints of Generative Models

Hae Jin Song, Mahyar Khayatkhoei, Wael AbdAlmageed

CVPR 2024posterarXiv:2402.10401
#4846

CORE-MPI: Consistency Object Removal with Embedding MultiPlane Image

Donggeun Yoon, Donghyeon Cho

CVPR 2024poster
#4847

ScoreHypo: Probabilistic Human Mesh Estimation with Hypothesis Scoring

Yuan Xu, Xiaoxuan Ma, Jiajun Su et al.

CVPR 2024poster
#4848

HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models

Nataniel Ruiz, Yuanzhen Li, Varun Jampani et al.

CVPR 2024posterarXiv:2307.06949
#4849

UniMix: Towards Domain Adaptive and Generalizable LiDAR Semantic Segmentation in Adverse Weather

Haimei Zhao, Jing Zhang, Zhuo Chen et al.

CVPR 2024posterarXiv:2404.05145
#4850

BilevelPruning: Unified Dynamic and Static Channel Pruning for Convolutional Neural Networks

Shangqian Gao, Yanfu Zhang, Feihu Huang et al.

CVPR 2024poster
#4851

Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models

Jiayi Guo, Xingqian Xu, Yifan Pu et al.

CVPR 2024posterarXiv:2312.04410
#4852

Editable Scene Simulation for Autonomous Driving via Collaborative LLM-Agents

Yuxi Wei, Zi Wang, Yifan Lu et al.

CVPR 2024highlightarXiv:2402.05746
#4853

Instruct-ReID: A Multi-purpose Person Re-identification Task with Instructions

Weizhen He, Yiheng Deng, SHIXIANG TANG et al.

CVPR 2024posterarXiv:2306.07520
#4854

SPOT: Self-Training with Patch-Order Permutation for Object-Centric Learning with Autoregressive Transformers

Ioannis Kakogeorgiou, Spyros Gidaris, Konstantinos Karantzalos et al.

CVPR 2024highlightarXiv:2312.00648
#4855

MRC-Net: 6-DoF Pose Estimation with MultiScale Residual Correlation

Yuelong Li, Yafei Mao, Raja Bala et al.

CVPR 2024posterarXiv:2403.08019
#4856

Faces that Speak: Jointly Synthesising Talking Face and Speech from Text

Youngjoon Jang, Jihoon Kim, Junseok Ahn et al.

CVPR 2024posterarXiv:2405.10272
#4857

Learning to Segment Referred Objects from Narrated Egocentric Videos

Yuhan Shen, Huiyu Wang, Xitong Yang et al.

CVPR 2024poster
#4858

CLIP-BEVFormer: Enhancing Multi-View Image-Based BEV Detector with Ground Truth Flow

Chenbin Pan, Burhan Yaman, Senem Velipasalar et al.

CVPR 2024posterarXiv:2403.08919
#4859

ADFactory: An Effective Framework for Generalizing Optical Flow with NeRF

Han Ling, Quansen Sun, Yinghui Sun et al.

CVPR 2024posterarXiv:2311.04246
#4860

FastMAC: Stochastic Spectral Sampling of Correspondence Graph

Yifei Zhang, Hao Zhao, Hongyang Li et al.

CVPR 2024posterarXiv:2403.08770
#4861

PeLK: Parameter-efficient Large Kernel ConvNets with Peripheral Convolution

Honghao Chen, Xiangxiang Chu, Renyongjian et al.

CVPR 2024posterarXiv:2403.07589
#4862

Improving Semantic Correspondence with Viewpoint-Guided Spherical Maps

Octave Mariotti, Oisin Mac Aodha, Hakan Bilen

CVPR 2024posterarXiv:2312.13216
#4863

Learning Degradation-Independent Representations for Camera ISP Pipelines

Yanhui Guo, Fangzhou Luo, Xiaolin Wu

CVPR 2024posterarXiv:2307.00761
#4864

Low-Resource Vision Challenges for Foundation Models

Yunhua Zhang, Hazel Doughty, Cees G. M. Snoek

CVPR 2024posterarXiv:2401.04716
#4865

MaxQ: Multi-Axis Query for N:M Sparsity Network

Jingyang Xiang, Siqi Li, Junhao Chen et al.

CVPR 2024poster
#4866

Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models

Haoning Wu, Zicheng Zhang, Erli Zhang et al.

CVPR 2024posterarXiv:2311.06783
#4867

Efficient Scene Recovery Using Luminous Flux Prior

ZhongYu Li, Lei Zhang

CVPR 2024poster
#4868

Towards CLIP-driven Language-free 3D Visual Grounding via 2D-3D Relational Enhancement and Consistency

Yuqi Zhang, Han Luo, Yinjie Lei

CVPR 2024poster
#4869

LaRE^2: Latent Reconstruction Error Based Method for Diffusion-Generated Image Detection

Yunpeng Luo, Junlong Du, Ke Yan et al.

CVPR 2024posterarXiv:2403.17465
#4870

Stratified Avatar Generation from Sparse Observations

Han Feng, Wenchao Ma, Quankai Gao et al.

CVPR 2024posterarXiv:2405.20786
#4871

Compressed 3D Gaussian Splatting for Accelerated Novel View Synthesis

Simon Niedermayr, Josef Stumpfegger, rüdiger westermann

CVPR 2024posterarXiv:2401.02436
#4872

Motion Blur Decomposition with Cross-shutter Guidance

Xiang Ji, Haiyang Jiang, Yinqiang Zheng

CVPR 2024posterarXiv:2404.01120
#4873

NAPGuard: Towards Detecting Naturalistic Adversarial Patches

Siyang Wu, Jiakai Wang, Jiejie Zhao et al.

CVPR 2024poster
#4874

Active Prompt Learning in Vision Language Models

Jihwan Bang, Sumyeong Ahn, Jae-Gil Lee

CVPR 2024posterarXiv:2311.11178
#4875

Generating Handwritten Mathematical Expressions From Symbol Graphs: An End-to-End Pipeline

Yu chen, Fei Gao, YanguangZhang et al.

CVPR 2024poster
#4876

On the Robustness of Language Guidance for Low-Level Vision Tasks: Findings from Depth Estimation

Agneet Chatterjee, Tejas Gokhale, Chitta Baral et al.

CVPR 2024posterarXiv:2404.08540
#4877

SingularTrajectory: Universal Trajectory Predictor Using Diffusion Model

Inhwan Bae, Young-Jae Park, Hae-Gon Jeon

CVPR 2024posterarXiv:2403.18452
#4878

Solving the Catastrophic Forgetting Problem in Generalized Category Discovery

Xinzi Cao, Xiawu Zheng, Guanhong Wang et al.

CVPR 2024posterarXiv:2501.05272
#4879

MatchU: Matching Unseen Objects for 6D Pose Estimation from RGB-D Images

Junwen Huang, Hao Yu, Kuan-Ting Yu et al.

CVPR 2024posterarXiv:2403.01517
#4880

The Manga Whisperer: Automatically Generating Transcriptions for Comics

Ragav Sachdeva, Andrew Zisserman

CVPR 2024posterarXiv:2401.10224
#4881

3DInAction: Understanding Human Actions in 3D Point Clouds

Yizhak Ben-Shabat, Oren Shrout, Stephen Gould

CVPR 2024highlightarXiv:2303.06346
#4882

StyLitGAN: Image-Based Relighting via Latent Control

Anand Bhattad, James Soole, David Forsyth

CVPR 2024poster
#4883

Unsupervised Universal Image Segmentation

XuDong Wang, Dantong Niu, Xinyang Han et al.

CVPR 2024posterarXiv:2312.17243
#4884

Not All Classes Stand on Same Embeddings: Calibrating a Semantic Distance with Metric Tensor

Jae Hyeon Park, Gyoomin Lee, Seunggi Park et al.

CVPR 2024poster
#4885

Animatable Gaussians: Learning Pose-dependent Gaussian Maps for High-fidelity Human Avatar Modeling

Zhe Li, Zerong Zheng, Lizhen Wang et al.

CVPR 2024poster
#4886

Retrieval-Augmented Open-Vocabulary Object Detection

Jooyeon Kim, Eulrang Cho, Sehyung Kim et al.

CVPR 2024posterarXiv:2404.05687
#4887

LangSplat: 3D Language Gaussian Splatting

Minghan Qin, Wanhua Li, Jiawei ZHOU et al.

CVPR 2024highlightarXiv:2312.16084
#4888

SeaBird: Segmentation in Bird’s View with Dice Loss Improves Monocular 3D Detection of Large Objects

Abhinav Kumar, Yuliang Guo, Xinyu Huang et al.

CVPR 2024posterarXiv:2403.20318
#4889

Learning with Structural Labels for Learning with Noisy Labels

Noo-ri Kim, Jin-Seop Lee, Jee-Hyong Lee

CVPR 2024poster
#4890

Incremental Nuclei Segmentation from Histopathological Images via Future-class Awareness and Compatibility-inspired Distillation

Huyong Wang, Huisi Wu, Jing Qin

CVPR 2024poster
#4891

Model Inversion Robustness: Can Transfer Learning Help?

Sy-Tuyen Ho, Koh Jun Hao, Keshigeyan Chandrasegaran et al.

CVPR 2024posterarXiv:2405.05588
#4892

MaskCLR: Attention-Guided Contrastive Learning for Robust Action Representation Learning

Mohamed Abdelfattah, Mariam Hassan, Alex Alahi

CVPR 2024poster
#4893

D3T: Distinctive Dual-Domain Teacher Zigzagging Across RGB-Thermal Gap for Domain-Adaptive Object Detection

Dinh Phat Do, Taehoon Kim, JAEMIN NA et al.

CVPR 2024posterarXiv:2403.09359
#4894

Intrinsic Image Diffusion for Indoor Single-view Material Estimation

Peter Kocsis, Vincent Sitzmann, Matthias Nießner

CVPR 2024posterarXiv:2312.12274
#4895

NetTrack: Tracking Highly Dynamic Objects with a Net

Guangze Zheng, Shijie Lin, Haobo Zuo et al.

CVPR 2024posterarXiv:2403.11186
#4896

FADES: Fair Disentanglement with Sensitive Relevance

Taeuk Jang, Xiaoqian Wang

CVPR 2024poster
#4897

Deep Single Image Camera Calibration by Heatmap Regression to Recover Fisheye Images Under Manhattan World Assumption

Nobuhiko Wakai, Satoshi Sato, Yasunori Ishii et al.

CVPR 2024posterarXiv:2303.17166
#4898

HybridNeRF: Efficient Neural Rendering via Adaptive Volumetric Surfaces

Haithem Turki, Vasu Agrawal, Samuel Rota Bulò et al.

CVPR 2024highlightarXiv:2312.03160
#4899

IIRP-Net: Iterative Inference Residual Pyramid Network for Enhanced Image Registration

Tai Ma, zhangsuwei, Jiafeng Li et al.

CVPR 2024poster
#4900

SEED-Bench: Benchmarking Multimodal Large Language Models

Bohao Li, Yuying Ge, Yixiao Ge et al.

CVPR 2024poster
#4901

Style Aligned Image Generation via Shared Attention

Amir Hertz, Andrey Voynov, Shlomi Fruchter et al.

CVPR 2024posterarXiv:2312.02133
#4902

NeRFDeformer: NeRF Transformation from a Single View via 3D Scene Flows

Zhenggang Tang, Jason Ren, Xiaoming Zhao et al.

CVPR 2024posterarXiv:2406.10543
#4903

LTGC: Long-tail Recognition via Leveraging LLMs-driven Generated Content

Qihao Zhao, Yalun Dai, Hao Li et al.

CVPR 2024posterarXiv:2403.05854
#4904

How to Train Neural Field Representations: A Comprehensive Study and Benchmark

Samuele Papa, Riccardo Valperga, David Knigge et al.

CVPR 2024posterarXiv:2312.10531
#4905

Reg-PTQ: Regression-specialized Post-training Quantization for Fully Quantized Object Detector

Yifu Ding, Weilun Feng, Chuyan Chen et al.

CVPR 2024poster
#4906

FREE: Faster and Better Data-Free Meta-Learning

Yongxian Wei, Zixuan Hu, Zhenyi Wang et al.

CVPR 2024posterarXiv:2405.00984
#4907

You Only Need Less Attention at Each Stage in Vision Transformers

Shuoxi Zhang, Hanpeng Liu, Stephen Lin et al.

CVPR 2024posterarXiv:2406.00427
#4908

Hierarchical Patch Diffusion Models for High-Resolution Video Generation

Ivan Skorokhodov, Willi Menapace, Aliaksandr Siarohin et al.

CVPR 2024posterarXiv:2406.07792
#4909

CPLIP: Zero-Shot Learning for Histopathology with Comprehensive Vision-Language Alignment

Sajid Javed, Arif Mahmood, IYYAKUTTI IYAPPAN GANAPATHI et al.

CVPR 2024posterarXiv:2406.05205
#4910

Towards Memorization-Free Diffusion Models

Chen Chen, Daochang Liu, Chang Xu

CVPR 2024posterarXiv:2404.00922
#4911

FedHCA2: Towards Hetero-Client Federated Multi-Task Learning

Yuxiang Lu, Suizhi Huang, Yuwen Yang et al.

CVPR 2024poster
#4912

Improving Unsupervised Hierarchical Representation with Reinforcement Learning

Ruyi An, Yewen Li, Xu He et al.

CVPR 2024poster
#4913

Global Latent Neural Rendering

Thomas Tanay, Matteo Maggioni

CVPR 2024posterarXiv:2312.08338
#4914

Data Poisoning based Backdoor Attacks to Contrastive Learning

Jinghuai Zhang, Hongbin Liu, Jinyuan Jia et al.

CVPR 2024posterarXiv:2211.08229
#4915

ERMVP: Communication-Efficient and Collaboration-Robust Multi-Vehicle Perception in Challenging Environments

Jingyu Zhang, Kun Yang, Yilei Wang et al.

CVPR 2024poster
#4916

GRAM: Global Reasoning for Multi-Page VQA

Itshak Blau, Sharon Fogel, Roi Ronen et al.

CVPR 2024posterarXiv:2401.03411
#4917

HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data

Qifan Yu, Juncheng Li, Longhui Wei et al.

CVPR 2024posterarXiv:2311.13614
#4918

Perception-Oriented Video Frame Interpolation via Asymmetric Blending

Guangyang Wu, Xin Tao, Changlin Li et al.

CVPR 2024posterarXiv:2404.06692
#4919

Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation

Qi Yang, Xing Nie, Tong Li et al.

CVPR 2024highlightarXiv:2312.06462
#4920

Improving Generalized Zero-Shot Learning by Exploring the Diverse Semantics from External Class Names

Yapeng Li, Yong Luo, Zengmao Wang et al.

CVPR 2024poster
#4921

Continual Learning for Motion Prediction Model via Meta-Representation Learning and Optimal Memory Buffer Retention Strategy

Dae Jun Kang, Dongsuk Kum, Sanmin Kim

CVPR 2024poster
#4922

3DGS-Avatar: Animatable Avatars via Deformable 3D Gaussian Splatting

Zhiyin Qian, Shaofei Wang, Marko Mihajlovic et al.

CVPR 2024posterarXiv:2312.09228
#4923

Drag Your Noise: Interactive Point-based Editing via Diffusion Semantic Propagation

Haofeng Liu, Chenshu Xu, Yifei Yang et al.

CVPR 2024posterarXiv:2404.01050
#4924

Building Vision-Language Models on Solid Foundations with Masked Distillation

Sepehr Sameni, Kushal Kafle, Hao Tan et al.

CVPR 2024poster
#4925

FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation

Pengchong Qiao, Lei Shang, Chang Liu et al.

CVPR 2024posterarXiv:2403.06775
#4926

OmniSDF: Scene Reconstruction using Omnidirectional Signed Distance Functions and Adaptive Binoctrees

Hakyeong Kim, Andreas Meuleman, Hyeonjoong Jang et al.

CVPR 2024posterarXiv:2404.00678
#4927

VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models

Hyeonho Jeong, Geon Yeong Park, Jong Chul Ye

CVPR 2024posterarXiv:2312.00845
#4928

M3-UDA: A New Benchmark for Unsupervised Domain Adaptive Fetal Cardiac Structure Detection

Bin Pu, Liwen Wang, Jiewen Yang et al.

CVPR 2024poster
#4929

Expandable Subspace Ensemble for Pre-Trained Model-Based Class-Incremental Learning

Da-Wei Zhou, Hai-Long Sun, Han-Jia Ye et al.

CVPR 2024posterarXiv:2403.12030
#4930

Multiway Point Cloud Mosaicking with Diffusion and Global Optimization

Shengze Jin, Iro Armeni, Marc Pollefeys et al.

CVPR 2024posterarXiv:2404.00429
#4931

NeRSP: Neural 3D Reconstruction for Reflective Objects with Sparse Polarized Images

Yufei Han, Heng Guo, Koki Fukai et al.

CVPR 2024posterarXiv:2406.07111
#4932

Beyond Text: Frozen Large Language Models in Visual Signal Comprehension

Lei Zhu, Fangyun Wei, Yanye Lu

CVPR 2024posterarXiv:2403.07874
#4933

CAM Back Again: Large Kernel CNNs from a Weakly Supervised Object Localization Perspective

Shunsuke Yasuki, Masato Taki

CVPR 2024posterarXiv:2403.06676
#4934

Regularized Parameter Uncertainty for Improving Generalization in Reinforcement Learning

Pehuen Moure, Longbiao Cheng, Joachim Ott et al.

CVPR 2024poster
#4935

Robust Noisy Correspondence Learning with Equivariant Similarity Consistency

Yuchen Yang, Erkun Yang, Likai Wang et al.

CVPR 2024poster
#4936

OmniVec2 - A Novel Transformer based Network for Large Scale Multimodal and Multitask Learning

Siddharth Srivastava, Gaurav Sharma

CVPR 2024posterarXiv:2507.13364
#4937

Utility-Fairness Trade-Offs and How to Find Them

Sepehr Dehdashtian, Bashir Sadeghi, Vishnu Naresh Boddeti

CVPR 2024posterarXiv:2404.09454
#4938

Fitting Flats to Flats

Gabriel Dogadov, Ugo Finnendahl, Marc Alexa

CVPR 2024poster
#4939

HOIST-Former: Hand-held Objects Identification Segmentation and Tracking in the Wild

Supreeth Narasimhaswamy, Huy Anh Nguyen, Lihan Huang et al.

CVPR 2024poster
#4940

WaveFace: Authentic Face Restoration with Efficient Frequency Recovery

Yunqi Miao, Jiankang Deng, Jungong Han

CVPR 2024posterarXiv:2403.12760
#4941

Learning Adaptive Spatial Coherent Correlations for Speech-Preserving Facial Expression Manipulation

Tianshui Chen, Jianman Lin, Zhijing Yang et al.

CVPR 2024highlight
#4942

UFORecon: Generalizable Sparse-View Surface Reconstruction from Arbitrary and Unfavorable Sets

Youngju Na, Woo Jae Kim, Kyu Han et al.

CVPR 2024posterarXiv:2403.05086
#4943

Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data

Lihe Yang, Bingyi Kang, Zilong Huang et al.

CVPR 2024posterarXiv:2401.10891
#4944

EventDance: Unsupervised Source-free Cross-modal Adaptation for Event-based Object Recognition

Xu Zheng, Addison, Lin Wang

CVPR 2024posterarXiv:2403.14082
#4945

Real-World Efficient Blind Motion Deblurring via Blur Pixel Discretization

Insoo Kim, Jae Seok Choi, Geonseok Seo et al.

CVPR 2024posterarXiv:2404.12168
#4946

HUGS: Holistic Urban 3D Scene Understanding via Gaussian Splatting

Hongyu Zhou, Jiahao Shao, Lu Xu et al.

CVPR 2024posterarXiv:2403.12722
#4947

Human Motion Prediction Under Unexpected Perturbation

Jiangbei Yue, Baiyi Li, Julien Pettré et al.

CVPR 2024highlightarXiv:2403.15891
#4948

SynFog: A Photo-realistic Synthetic Fog Dataset based on End-to-end Imaging Simulation for Advancing Real-World Defogging in Autonomous Driving

Yiming Xie, Henglu Wei, Zhenyi Liu et al.

CVPR 2024posterarXiv:2403.17094
#4949

IntrinsicAvatar: Physically Based Inverse Rendering of Dynamic Humans from Monocular Videos via Explicit Ray Tracing

Shaofei Wang, Bozidar Antic, Andreas Geiger et al.

CVPR 2024posterarXiv:2312.05210
#4950

All in One Framework for Multimodal Re-identification in the Wild

He Li, Mang Ye, Ming Zhang et al.

CVPR 2024posterarXiv:2405.04741
#4951

FedSOL: Stabilized Orthogonal Learning with Proximal Restrictions in Federated Learning

Gihun Lee, Minchan Jeong, SangMook Kim et al.

CVPR 2024posterarXiv:2308.12532
#4952

Don't Look into the Dark: Latent Codes for Pluralistic Image Inpainting

Haiwei Chen, Yajie Zhao

CVPR 2024posterarXiv:2403.18186
#4953

WHAM: Reconstructing World-grounded Humans with Accurate 3D Motion

Soyong Shin, Juyong Kim, Eni Halilaj et al.

CVPR 2024posterarXiv:2312.07531
#4954

FedUV: Uniformity and Variance for Heterogeneous Federated Learning

Ha Min Son, Moon-Hyun Kim, Tai-Myoung Chung et al.

CVPR 2024posterarXiv:2402.18372
#4955

Language-driven Grasp Detection

An Dinh Vuong, Minh Nhat VU, Baoru Huang et al.

CVPR 2024posterarXiv:2406.09489
#4956

Each Test Image Deserves A Specific Prompt: Continual Test-Time Adaptation for 2D Medical Image Segmentation

Ziyang Chen, Yongsheng Pan, Yiwen Ye et al.

CVPR 2024posterarXiv:2311.18363
#4957

Prompting Vision Foundation Models for Pathology Image Analysis

CHONG YIN, Siqi Liu, Kaiyang Zhou et al.

CVPR 2024poster
#4958

Unmixing Before Fusion: A Generalized Paradigm for Multi-Source-based Hyperspectral Image Synthesis

Yang Yu, Erting Pan, Xinya Wang et al.

CVPR 2024poster
#4959

Navigating Beyond Dropout: An Intriguing Solution towards Generalizable Image Super Resolution

Hongjun Wang, Jiyuan Chen, Yinqiang Zheng et al.

CVPR 2024posterarXiv:2402.18929
#4960

Sculpt3D: Multi-View Consistent Text-to-3D Generation with Sparse 3D Prior

Chen Cheng, Xiaofeng Yang, Fan Yang et al.

CVPR 2024posterarXiv:2403.09140
#4961

Task2Box: Box Embeddings for Modeling Asymmetric Task Relationships

Rangel Daroya, Aaron Sun, Subhransu Maji

CVPR 2024highlightarXiv:2403.17173
#4962

ExMap: Leveraging Explainability Heatmaps for Unsupervised Group Robustness to Spurious Correlations

Rwiddhi Chakraborty, Adrian de Sena Sletten, Michael C. Kampffmeyer

CVPR 2024posterarXiv:2403.13870
#4963

DUDF: Differentiable Unsigned Distance Fields with Hyperbolic Scaling

Miguel Fainstein, Viviana Siless, Emmanuel Iarussi

CVPR 2024posterarXiv:2402.08876
#4964

PBWR: Parametric-Building-Wireframe Reconstruction from Aerial LiDAR Point Clouds

Shangfeng Huang, Ruisheng Wang, Bo Guo et al.

CVPR 2024poster
#4965

GenH2R: Learning Generalizable Human-to-Robot Handover via Scalable Simulation Demonstration and Imitation

Zifan Wang, Junyu Chen, Ziqing Chen et al.

CVPR 2024poster
#4966

Content-Style Decoupling for Unsupervised Makeup Transfer without Generating Pseudo Ground Truth

Zhaoyang Sun, Shengwu Xiong, Yaxiong Chen et al.

CVPR 2024posterarXiv:2405.17240
#4967

UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs

Yanwu Xu, Yang Zhao, Zhisheng Xiao et al.

CVPR 2024highlightarXiv:2311.09257
#4968

SelfPose3d: Self-Supervised Multi-Person Multi-View 3d Pose Estimation

Keqi Chen, vinkle srivastav, Nicolas Padoy

CVPR 2024posterarXiv:2404.02041
#4969

Context-Aware Integration of Language and Visual References for Natural Language Tracking

Yanyan Shao, Shuting He, Qi Ye et al.

CVPR 2024posterarXiv:2403.19975
#4970

PointBeV: A Sparse Approach for BeV Predictions

Loick Chambon, Éloi Zablocki, Mickaël Chen et al.

CVPR 2024posterarXiv:2312.00703
#4971

Contextual Augmented Global Contrast for Multimodal Intent Recognition

Kaili Sun, Zhiwen Xie, Mang Ye et al.

CVPR 2024poster
#4972

Intraoperative 2D/3D Image Registration via Differentiable X-ray Rendering

Vivek Gopalakrishnan, Neel Dey, Polina Golland

CVPR 2024posterarXiv:2312.06358
#4973

LLM4SGG: Large Language Models for Weakly Supervised Scene Graph Generation

Kibum Kim, Kanghoon Yoon, Jaehyeong Jeon et al.

CVPR 2024posterarXiv:2310.10404
#4974

FreeKD: Knowledge Distillation via Semantic Frequency Prompt

Yuan Zhang, Tao Huang, Jiaming Liu et al.

CVPR 2024posterarXiv:2311.12079
#4975

Enhanced Motion-Text Alignment for Image-to-Video Transfer Learning

Wei Zhang, Chaoqun Wan, Tongliang Liu et al.

CVPR 2024poster
#4976

SNIDA: Unlocking Few-Shot Object Detection with Non-linear Semantic Decoupling Augmentation

Yanjie Wang, Xu Zou, Luxin Yan et al.

CVPR 2024poster
#4977

Diffusion 3D Features (Diff3F): Decorating Untextured Shapes with Distilled Semantic Features

Niladri Shekhar Dutt, Sanjeev Muralikrishnan, Niloy J. Mitra

CVPR 2024posterarXiv:2311.17024
#4978

Unsupervised Occupancy Learning from Sparse Point Cloud

Amine Ouasfi, Adnane Boukhayma

CVPR 2024highlightarXiv:2404.02759
#4979

Hierarchical Intra-modal Correlation Learning for Label-free 3D Semantic Segmentation

Xin Kang, Lei Chu, Jiahao Li et al.

CVPR 2024poster
#4980

Adaptive Hyper-graph Aggregation for Modality-Agnostic Federated Learning

Fan Qi, Shuai Li

CVPR 2024poster
#4981

Efficient and Effective Weakly-Supervised Action Segmentation via Action-Transition-Aware Boundary Alignment

Angchi Xu, Wei-Shi Zheng

CVPR 2024posterarXiv:2403.19225
#4982

MoMask: Generative Masked Modeling of 3D Human Motions

chuan guo, Yuxuan Mu, Muhammad Gohar Javed et al.

CVPR 2024posterarXiv:2312.00063
#4983

Event-based Visible and Infrared Fusion via Multi-task Collaboration

Mengyue Geng, Lin Zhu, Lizhi Wang et al.

CVPR 2024poster
#4984

MotionEditor: Editing Video Motion via Content-Aware Diffusion

Shuyuan Tu, Qi Dai, Zhi-Qi Cheng et al.

CVPR 2024posterarXiv:2311.18830
#4985

State Space Models for Event Cameras

Nikola Zubic, Mathias Gehrig, Davide Scaramuzza

CVPR 2024poster
#4986

Test-Time Linear Out-of-Distribution Detection

Ke Fan, Tong Liu, Xingyu Qiu et al.

CVPR 2024poster
#4987

Exploiting Style Latent Flows for Generalizing Deepfake Video Detection

Jongwook Choi, Taehoon Kim, Yonghyun Jeong et al.

CVPR 2024posterarXiv:2403.06592
#4988

Unsupervised Semantic Segmentation Through Depth-Guided Feature Correlation and Sampling

Leon Sick, Dominik Engel, Pedro Hermosilla et al.

CVPR 2024posterarXiv:2309.12378
#4989

Generalizing 6-DoF Grasp Detection via Domain Prior Knowledge

Haoxiang Ma, Modi Shi, Boyang GAO et al.

CVPR 2024posterarXiv:2404.01727
#4990

Generalized Event Cameras

Varun Sundar, Matthew Dutson, Andrei Ardelean et al.

CVPR 2024posterarXiv:2407.02683
#4991

Dual Pose-invariant Embeddings: Learning Category and Object-specific Discriminative Representations for Recognition and Retrieval

Rohan Sarkar, Avinash Kak

CVPR 2024posterarXiv:2403.00272
#4992

NIVeL: Neural Implicit Vector Layers for Text-to-Vector Generation

Vikas Thamizharasan, Difan Liu, Matthew Fisher et al.

CVPR 2024posterarXiv:2405.15217
#4993

Hyperbolic Anomaly Detection

Huimin Li, Zhentao Chen, Yunhao Xu et al.

CVPR 2024poster
#4994

DanceCamera3D: 3D Camera Movement Synthesis with Music and Dance

Zixuan Wang, Jia Jia, Shikun Sun et al.

CVPR 2024posterarXiv:2403.13667
#4995

HoloVIC: Large-scale Dataset and Benchmark for Multi-Sensor Holographic Intersection and Vehicle-Infrastructure Cooperative

CONG MA, Qiao Lei, Chengkai Zhu et al.

CVPR 2024posterarXiv:2403.02640
#4996

What Sketch Explainability Really Means for Downstream Tasks?

Hmrishav Bandyopadhyay, Pinaki Nath Chowdhury, Ayan Kumar Bhunia et al.

CVPR 2024posterarXiv:2403.09480
#4997

CommonCanvas: Open Diffusion Models Trained on Creative-Commons Images

Aaron Gokaslan, A. Feder Cooper, Jasmine Collins et al.

CVPR 2024poster
#4998

Memory-based Adapters for Online 3D Scene Perception

Xiuwei Xu, Chong Xia, Ziwei Wang et al.

CVPR 2024posterarXiv:2403.06974
#4999

Cross-spectral Gated-RGB Stereo Depth Estimation

Samuel Brucker, Stefanie Walz, Mario Bijelic et al.

CVPR 2024highlightarXiv:2405.12759
#5000

Readout Guidance: Learning Control from Diffusion Features

Grace Luo, Trevor Darrell, Oliver Wang et al.

CVPR 2024highlightarXiv:2312.02150