Most Cited 2024 "modal integration" Papers

12,324 papers found • Page 51 of 62

Filters:Most Cited 2024 modal integration Clear all

Conference

AAAI 2025 (3,028)COLM 2025 (418)CVPR 2025 (2,873)ICCV 2025 (2,701)ICLR 2025 (3,827)ICML 2025 (3,340)ISMAR 2025 (229)NEURIPS 2025 (5,858)AAAI 2024 (2,289)CVPR 2024 (2,716)ECCV 2024 (2,387)ICLR 2024 (2,297)ICML 2024 (2,635)

Paper Type

poster (24,624)paper (8,558)oral (1,594)spotlight (1,421)highlight (975)

#10001

Learn to Rectify the Bias of CLIP for Unsupervised Semantic Segmentation

Jingyun Wang, Guoliang Kang

CVPR 2024arXiv:2408.06747

#10002

Leveraging Vision-Language Models for Improving Domain Generalization in Image Classification

Sravanti Addepalli, Ashish Asokan, Lakshay Sharma et al.

CVPR 2024arXiv:2310.08255

#10003

Absolute Pose from One or Two Scaled and Oriented Features

Jonathan Ventura, Zuzana Kukelova, Torsten Sattler et al.

CVPR 2024highlight

#10004

Draw Step by Step: Reconstructing CAD Construction Sequences from Point Clouds via Multimodal Diffusion.

Weijian Ma, Shuaiqi Chen, Yunzhong Lou et al.

CVPR 2024

#10005

DSGG: Dense Relation Transformer for an End-to-end Scene Graph Generation

Zeeshan Hayder, Xuming He

CVPR 2024arXiv:2403.14886

#10006

Open-Vocabulary 3D Semantic Segmentation with Foundation Models

Li Jiang, Shaoshuai Shi, Bernt Schiele

CVPR 2024highlight

#10007

Training Vision Transformers for Semi-Supervised Semantic Segmentation

Xinting Hu, Li Jiang, Bernt Schiele

CVPR 2024

#10008

APSeg: Auto-Prompt Network for Cross-Domain Few-Shot Semantic Segmentation

Weizhao He, Yang Zhang, Wei Zhuo et al.

CVPR 2024arXiv:2406.08372

#10009

Design2Cloth: 3D Cloth Generation from 2D Masks

Jiali Zheng, Rolandos Alexandros Potamias, Stefanos Zafeiriou

CVPR 2024arXiv:2404.02686

#10010

S-DyRF: Reference-Based Stylized Radiance Fields for Dynamic Scenes

Xingyi Li, Zhiguo Cao, Yizheng Wu et al.

CVPR 2024arXiv:2403.06205

#10011

SatSynth: Augmenting Image-Mask Pairs through Diffusion Models for Aerial Semantic Segmentation

Aysim Toker, Marvin Eisenberger, Daniel Cremers et al.

CVPR 2024arXiv:2403.16605

#10012

Dual-Consistency Model Inversion for Non-Exemplar Class Incremental Learning

Zihuan Qiu, Yi Xu, Fanman Meng et al.

CVPR 2024

#10013

DS-NeRV: Implicit Neural Video Representation with Decomposed Static and Dynamic Codes

Hao Yan, Zhihui Ke, Xiaobo Zhou et al.

CVPR 2024arXiv:2403.15679

#10014

Rolling Shutter Correction with Intermediate Distortion Flow Estimation

Mingdeng Cao, Sidi Yang, Yujiu Yang et al.

CVPR 2024arXiv:2404.06350

#10015

Towards Transferable Targeted 3D Adversarial Attack in the Physical World

Yao Huang, Yinpeng Dong, Shouwei Ruan et al.

CVPR 2024arXiv:2312.09558

#10016

Hybrid Functional Maps for Crease-Aware Non-Isometric Shape Matching

Lennart Bastian, Yizheng Xie, Nassir Navab et al.

CVPR 2024arXiv:2312.03678

#10017

Class Tokens Infusion for Weakly Supervised Semantic Segmentation

Sung-Hoon Yoon, Hoyong Kwon, Hyeonseong Kim et al.

CVPR 2024

#10018

SFOD: Spiking Fusion Object Detector

Yimeng Fan, Wei Zhang, Changsong Liu et al.

CVPR 2024arXiv:2403.15192

#10019

AnyDoor: Zero-shot Object-level Image Customization

Xi Chen, Lianghua Huang, Yu Liu et al.

CVPR 2024arXiv:2307.09481

#10020

GraphDreamer: Compositional 3D Scene Synthesis from Scene Graphs

Gege Gao, Weiyang Liu, Anpei Chen et al.

CVPR 2024arXiv:2312.00093

#10021

SeD: Semantic-Aware Discriminator for Image Super-Resolution

Bingchen Li, Xin Li, Hanxin Zhu et al.

CVPR 2024arXiv:2402.19387

#10022

InstanceDiffusion: Instance-level Control for Image Generation

XuDong Wang, Trevor Darrell, Sai Saketh Rambhatla et al.

CVPR 2024arXiv:2402.03290

#10023

Robust Emotion Recognition in Context Debiasing

Dingkang Yang, Kun Yang, Mingcheng Li et al.

CVPR 2024arXiv:2403.05963

#10024

Improving Training Efficiency of Diffusion Models via Multi-Stage Framework and Tailored Multi-Decoder Architecture

Huijie Zhang, Yifu Lu, Ismail Alkhouri et al.

CVPR 2024

#10025

Balancing Act: Distribution-Guided Debiasing in Diffusion Models

Rishubh Parihar, Abhijnya Bhat, Abhipsa Basu et al.

CVPR 2024arXiv:2402.18206

#10026

Sieve: Multimodal Dataset Pruning using Image Captioning Models

Anas Mahmoud, Mostafa Elhoushi, Amro Abbas et al.

CVPR 2024arXiv:2310.02110

#10027

Not All Voxels Are Equal: Hardness-Aware Semantic Scene Completion with Self-Distillation

Song Wang, Jiawei Yu, Wentong Li et al.

CVPR 2024arXiv:2404.11958

#10028

Towards Fairness-Aware Adversarial Learning

Yanghao Zhang, Tianle Zhang, Ronghui Mu et al.

CVPR 2024arXiv:2402.17729

#10029

SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge

Andong Wang, Bo Wu, Sunli Chen et al.

CVPR 2024arXiv:2405.09713

#10030

MuRF: Multi-Baseline Radiance Fields

Haofei Xu, Anpei Chen, Yuedong Chen et al.

CVPR 2024arXiv:2312.04565

#10031

Learnable Earth Parser: Discovering 3D Prototypes in Aerial Scans

Romain Loiseau, Elliot Vincent, Mathieu Aubry et al.

CVPR 2024arXiv:2304.09704

#10032

Hide in Thicket: Generating Imperceptible and Rational Adversarial Perturbations on 3D Point Clouds

Tianrui Lou, Xiaojun Jia, Jindong Gu et al.

CVPR 2024arXiv:2403.05247

#10033

PIGEON: Predicting Image Geolocations

Lukas Haas, Michal Skreta, Silas Alberti et al.

CVPR 2024highlightarXiv:2307.05845

#10034

JoAPR: Cleaning the Lens of Prompt Learning for Vision-Language Models

YUNCHENG GUO, Xiaodong Gu

CVPR 2024

#10035

Retrieval-Augmented Egocentric Video Captioning

Jilan Xu, Yifei Huang, Junlin Hou et al.

CVPR 2024arXiv:2401.00789

#10036

GPLD3D: Latent Diffusion of 3D Shape Generative Models by Enforcing Geometric and Physical Priors

Yuan Dong, Qi Zuo, Xiaodong Gu et al.

CVPR 2024

#10037

Low-Rank Knowledge Decomposition for Medical Foundation Models

Yuhang Zhou, Haolin li, Siyuan Du et al.

CVPR 2024arXiv:2404.17184

#10038

Pixel-level Semantic Correspondence through Layout-aware Representation Learning and Multi-scale Matching Integration

Yixuan Sun, Zhangyue Yin, Haibo Wang et al.

CVPR 2024

#10039

View From Above: Orthogonal-View aware Cross-view Localization

Shan Wang, Chuong Nguyen, Jiawei Liu et al.

CVPR 2024

#10040

WorDepth: Variational Language Prior for Monocular Depth Estimation

Ziyao Zeng, Hyoungseob Park, Fengyu Yang et al.

CVPR 2024arXiv:2404.03635

#10041

Event-assisted Low-Light Video Object Segmentation

Li Hebei, Jin Wang, Jiahui Yuan et al.

CVPR 2024arXiv:2404.01945

#10042

3DToonify: Creating Your High-Fidelity 3D Stylized Avatar Easily from 2D Portrait Images

Yifang Men, Hanxi Liu, Yuan Yao et al.

CVPR 2024

#10043

Synthesize Diagnose and Optimize: Towards Fine-Grained Vision-Language Understanding

Wujian Peng, Sicheng Xie, Zuyao You et al.

CVPR 2024

#10044

EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI

Tai Wang, Xiaohan Mao, Chenming Zhu et al.

CVPR 2024arXiv:2312.16170

#10045

DIOD: Self-Distillation Meets Object Discovery

Sandra Kara, Hejer AMMAR, Julien Denize et al.

CVPR 2024

#10046

FlashEval: Towards Fast and Accurate Evaluation of Text-to-image Diffusion Generative Models

LIn Zhao, Tianchen Zhao, Zinan Lin et al.

CVPR 2024arXiv:2403.16379

#10047

COLMAP-Free 3D Gaussian Splatting

Yang Fu, Sifei Liu, Amey Kulkarni et al.

CVPR 2024highlightarXiv:2312.07504

#10048

SNED: Superposition Network Architecture Search for Efficient Video Diffusion Model

Zhengang Li, Yan Kang, Yuchen Liu et al.

CVPR 2024arXiv:2406.00195

#10049

Personalized Residuals for Concept-Driven Text-to-Image Generation

Cusuh Ham, Matthew Fisher, James Hays et al.

CVPR 2024arXiv:2405.12978

#10050

CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation

Seokju Cho, Heeseong Shin, Sunghwan Hong et al.

CVPR 2024highlightarXiv:2303.11797

#10051

Deep Generative Model based Rate-Distortion for Image Downscaling Assessment

yuanbang liang, Bhavesh Garg, Paul L. Rosin et al.

CVPR 2024arXiv:2403.15139

#10052

Forecasting of 3D Whole-body Human Poses with Grasping Objects

yan haitao, Qiongjie Cui, Jiexin Xie et al.

CVPR 2024

#10053

VA3: Virtually Assured Amplification Attack on Probabilistic Copyright Protection for Text-to-Image Generative Models

Xiang Li, Qianli Shen, Kenji Kawaguchi

CVPR 2024highlightarXiv:2312.00057

#10054

PIE-NeRF: Physics-based Interactive Elastodynamics with NeRF

Yutao Feng, Yintong Shang, Xuan Li et al.

CVPR 2024arXiv:2311.13099

#10055

SNI-SLAM: Semantic Neural Implicit SLAM

Siting Zhu, Guangming Wang, Hermann Blum et al.

CVPR 2024arXiv:2311.11016

#10056

Edge-Aware 3D Instance Segmentation Network with Intelligent Semantic Prior

Wonseok Roh, Hwanhee Jung, Giljoo Nam et al.

CVPR 2024

#10057

TextureDreamer: Image-Guided Texture Synthesis Through Geometry-Aware Diffusion

Yu-Ying Yeh, Jia-Bin Huang, Changil Kim et al.

CVPR 2024arXiv:2401.09416

#10058

MAFA: Managing False Negatives for Vision-Language Pre-training

Jaeseok Byun, Dohoon Kim, Taesup Moon

CVPR 2024arXiv:2312.06112

#10059

Blur2Blur: Blur Conversion for Unsupervised Image Deblurring on Unknown Domains

Bang-Dang Pham, Phong Tran, Anh Tran et al.

CVPR 2024arXiv:2403.16205

#10060

RAVE: Randomized Noise Shuffling for Fast and Consistent Video Editing with Diffusion Models

Ozgur Kara, Bariscan Kurtkaya, Hidir Yesiltepe et al.

CVPR 2024highlightarXiv:2312.04524

#10061

ChatScene: Knowledge-Enabled Safety-Critical Scenario Generation for Autonomous Vehicles

Jiawei Zhang, Chejian Xu, Bo Li

CVPR 2024arXiv:2405.14062

#10062

MMSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos

Jielin Qiu, Jiacheng Zhu, William Han et al.

CVPR 2024highlightarXiv:2306.04216

#10063

Generalizable Novel-View Synthesis using a Stereo Camera

Haechan Lee, Wonjoon Jin, Seung-Hwan Baek et al.

CVPR 2024arXiv:2404.13541

#10064

Learning Structure-from-Motion with Graph Attention Networks

Lucas Brynte, José Pedro Iglesias, Carl Olsson et al.

CVPR 2024arXiv:2308.15984

#10065

Don’t Drop Your Samples! Coherence-Aware Training Benefits Conditional Diffusion

Nicolas Dufour, Victor Besnier, Vicky Kalogeiton et al.

CVPR 2024highlight

#10066

SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection

Peng Qi, Zehong Yan, Wynne Hsu et al.

CVPR 2024arXiv:2403.03170

#10067

Spatial-Aware Regression for Keypoint Localization

Dongkai Wang, Shiliang Zhang

CVPR 2024highlight

#10068

Action-slot: Visual Action-centric Representations for Multi-label Atomic Activity Recognition in Traffic Scenes

Chi-Hsi Kung, 書緯呂, Yi-Hsuan Tsai et al.

CVPR 2024arXiv:2311.17948

#10069

Diff-BGM: A Diffusion Model for Video Background Music Generation

Sizhe Li, Yiming Qin, Minghang Zheng et al.

CVPR 2024arXiv:2405.11913

#10070

ArtAdapter: Text-to-Image Style Transfer using Multi-Level Style Encoder and Explicit Adaptation

Dar-Yen Chen, Hamish Tennent, Ching-Wen Hsu

CVPR 2024arXiv:2312.02109

#10071

EMOPortraits: Emotion-enhanced Multimodal One-shot Head Avatars

Nikita Drobyshev, Antoni Bigata Casademunt, Konstantinos Vougioukas et al.

CVPR 2024arXiv:2404.19110

#10072

Shadow-Enlightened Image Outpainting

Hang Yu, Ruilin Li, Shaorong Xie et al.

CVPR 2024

#10073

Specularity Factorization for Low-Light Enhancement

Saurabh Saini, P. J. Narayanan

CVPR 2024arXiv:2404.01998

#10074

Latent Modulated Function for Computational Optimal Continuous Image Representation

Zongyao He, Zhi Jin

CVPR 2024highlightarXiv:2404.16451

#10075

Domain-Rectifying Adapter for Cross-Domain Few-Shot Segmentation

Jiapeng Su, Qi Fan, Wenjie Pei et al.

CVPR 2024arXiv:2404.10322

#10076

Shallow-Deep Collaborative Learning for Unsupervised Visible-Infrared Person Re-Identification

Bin Yang, Jun Chen, Mang Ye

CVPR 2024

#10077

L2B: Learning to Bootstrap Robust Models for Combating Label Noise

Yuyin Zhou, Xianhang li, Fengze Liu et al.

CVPR 2024arXiv:2202.04291

#10078

OED: Towards One-stage End-to-End Dynamic Scene Graph Generation

Guan Wang, Zhimin Li, Qingchao Chen et al.

CVPR 2024arXiv:2405.16925

#10079

SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis

Ziqiao Peng, Wentao Hu, Yue Shi et al.

CVPR 2024arXiv:2311.17590

#10080

Attack To Defend: Exploiting Adversarial Attacks for Detecting Poisoned Models

Samar Fares, Karthik Nandakumar

CVPR 2024

#10081

ViewFusion: Towards Multi-View Consistency via Interpolated Denoising

Xianghui Yang, Gil Avraham, Yan Zuo et al.

CVPR 2024arXiv:2402.18842

#10082

D3still: Decoupled Differential Distillation for Asymmetric Image Retrieval

Yi Xie, Yihong Lin, Wenjie Cai et al.

CVPR 2024

#10083

LiDAR-Net: A Real-scanned 3D Point Cloud Dataset for Indoor Scenes

Yanwen Guo, Yuanqi Li, Dayong Ren et al.

CVPR 2024

#10084

Delving into the Trajectory Long-tail Distribution for Muti-object Tracking

Sijia Chen, En Yu, Jinyang Li et al.

CVPR 2024arXiv:2403.04700

#10085

Non-autoregressive Sequence-to-Sequence Vision-Language Models

Kunyu Shi, Qi Dong, Luis Goncalves et al.

CVPR 2024arXiv:2403.02249

#10086

No More Ambiguity in 360° Room Layout via Bi-Layout Estimation

Yu-Ju Tsai, Jin-Cheng Jhang, JINGJING ZHENG et al.

CVPR 2024arXiv:2404.09993

#10087

MTLoRA: Low-Rank Adaptation Approach for Efficient Multi-Task Learning

Ahmed Agiza, Marina Neseem, Sherief Reda

CVPR 2024highlight

#10088

HUNTER: Unsupervised Human-centric 3D Detection via Transferring Knowledge from Synthetic Instances to Real Scenes

Yichen Yao, Zimo Jiang, YUJING SUN et al.

CVPR 2024arXiv:2403.02769

#10089

Streaming Dense Video Captioning

Xingyi Zhou, Anurag Arnab, Shyamal Buch et al.

CVPR 2024arXiv:2404.01297

#10090

3D LiDAR Mapping in Dynamic Environments using a 4D Implicit Neural Representation

Xingguang Zhong, Yue Pan, Cyrill Stachniss et al.

CVPR 2024arXiv:2405.03388

#10091

SelfOcc: Self-Supervised Vision-Based 3D Occupancy Prediction

Yuanhui Huang, Wenzhao Zheng, Borui Zhang et al.

CVPR 2024arXiv:2311.12754

#10092

On the Scalability of Diffusion-based Text-to-Image Generation

Hao Li, Yang Zou, Ying Wang et al.

CVPR 2024arXiv:2404.02883

#10093

Bootstrapping Autonomous Driving Radars with Self-Supervised Learning

Yiduo Hao, Sohrab Madani, Junfeng Guan et al.

CVPR 2024arXiv:2312.04519

#10094

Analyzing and Improving the Training Dynamics of Diffusion Models

Tero Karras, Miika Aittala, Jaakko Lehtinen et al.

CVPR 2024arXiv:2312.02696

#10095

DiffusionGAN3D: Boosting Text-guided 3D Generation and Domain Adaptation by Combining 3D GANs and Diffusion Priors

Biwen Lei, Kai Yu, Mengyang Feng et al.

CVPR 2024arXiv:2312.16837

#10096

OneLLM: One Framework to Align All Modalities with Language

Jiaming Han, Kaixiong Gong, Yiyuan Zhang et al.

CVPR 2024arXiv:2312.03700

#10097

LAFS: Landmark-based Facial Self-supervised Learning for Face Recognition

Zhonglin Sun, Chen Feng, Ioannis Patras et al.

CVPR 2024arXiv:2403.08161

#10098

See Say and Segment: Teaching LMMs to Overcome False Premises

Tsung-Han Wu, Giscard Biamby, David Chan et al.

CVPR 2024

#10099

PRDP: Proximal Reward Difference Prediction for Large-Scale Reward Finetuning of Diffusion Models

Fei Deng, Qifei Wang, Wei Wei et al.

CVPR 2024arXiv:2402.08714

#10100

Groupwise Query Specialization and Quality-Aware Multi-Assignment for Transformer-based Visual Relationship Detection

Jongha Kim, Jihwan Park, Jinyoung Park et al.

CVPR 2024arXiv:2403.17709

#10101

MMVP: A Multimodal MoCap Dataset with Vision and Pressure Sensors

He Zhang, Shenghao Ren, Haolei Yuan et al.

CVPR 2024arXiv:2403.17610

#10102

SD2Event:Self-supervised Learning of Dynamic Detectors and Contextual Descriptors for Event Cameras

Yuan Gao, Yuqing Zhu, Xinjun Li et al.

CVPR 2024

#10103

Tuning Stable Rank Shrinkage: Aiming at the Overlooked Structural Risk in Fine-tuning

Sicong Shen, Yang Zhou, Bingzheng Wei et al.

CVPR 2024

#10104

DiSR-NeRF: Diffusion-Guided View-Consistent Super-Resolution NeRF

Jie Long Lee, Chen Li, Gim Hee Lee

CVPR 2024arXiv:2404.00874

#10105

Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications

Junyi Ma, Xieyuanli Chen, Jiawei Huang et al.

CVPR 2024arXiv:2311.17663

#10106

Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation

Li Hu

CVPR 2024arXiv:2311.17117

#10107

Relightable and Animatable Neural Avatar from Sparse-View Video

Zhen Xu, Sida Peng, Chen Geng et al.

CVPR 2024highlightarXiv:2308.07903

#10108

Objects as Volumes: A Stochastic Geometry View of Opaque Solids

Bailey Miller, Hanyu Chen, Alice Lai et al.

CVPR 2024arXiv:2312.15406

#10109

Instance-Adaptive and Geometric-Aware Keypoint Learning for Category-Level 6D Object Pose Estimation

Xiao Lin, Wenfei Yang, Yuan Gao et al.

CVPR 2024arXiv:2403.19527

#10110

PaReNeRF: Toward Fast Large-scale Dynamic NeRF with Patch-based Reference

Xiao Tang, Min Yang, Penghui Sun et al.

CVPR 2024

#10111

PostureHMR: Posture Transformation for 3D Human Mesh Recovery

Yu-Pei Song, Xiao WU, Zhaoquan Yuan et al.

CVPR 2024

#10112

Desigen: A Pipeline for Controllable Design Template Generation

Haohan Weng, Danqing Huang, YU QIAO et al.

CVPR 2024arXiv:2403.09093

#10113

WANDR: Intention-guided Human Motion Generation

Markos Diomataris, Nikos Athanasiou, Omid Taheri et al.

CVPR 2024arXiv:2404.15383

#10114

WWW: A Unified Framework for Explaining What Where and Why of Neural Networks by Interpretation of Neuron Concepts

Yong Hyun Ahn, Hyeon Bae Kim, Seong Tae Kim

CVPR 2024arXiv:2402.18956

#10115

Do You Remember? Dense Video Captioning with Cross-Modal Memory Retrieval

Minkuk Kim, Hyeon Bae Kim, Jinyoung Moon et al.

CVPR 2024arXiv:2404.07610

#10116

Rich Human Feedback for Text-to-Image Generation

Youwei Liang, Junfeng He, Gang Li et al.

CVPR 2024arXiv:2312.10240

#10117

Dr. Bokeh: DiffeRentiable Occlusion-aware Bokeh Rendering

Yichen Sheng, Zixun Yu, Lu Ling et al.

CVPR 2024

#10118

SSR-Encoder: Encoding Selective Subject Representation for Subject-Driven Generation

Yuxuan Zhang, Yiren Song, Jiaming Liu et al.

CVPR 2024arXiv:2312.16272

#10119

SwitchLight: Co-design of Physics-driven Architecture and Pre-training Framework for Human Portrait Relighting

Hoon Kim, Minje Jang, Wonjun Yoon et al.

CVPR 2024highlightarXiv:2402.18848

#10120

CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor

Shuyang Sun, Runjia Li, Philip H.S. Torr et al.

CVPR 2024arXiv:2312.07661

#10121

Learning from Observer Gaze: Zero-Shot Attention Prediction Oriented by Human-Object Interaction Recognition

Yuchen Zhou, Linkai Liu, Chao Gou

CVPR 2024

#10122

Super-Resolution Reconstruction from Bayer-Pattern Spike Streams

Yanchen Dong, Ruiqin Xiong, Jian Zhang et al.

CVPR 2024

#10123

Image Neural Field Diffusion Models

Yinbo Chen, Oliver Wang, Richard Zhang et al.

CVPR 2024highlightarXiv:2406.07480

#10124

Denoising Point Clouds in Latent Space via Graph Convolution and Invertible Neural Network

Aihua Mao, Biao Yan, Zijing Ma et al.

CVPR 2024

#10125

Dual-View Visual Contextualization for Web Navigation

Jihyung Kil, Chan Hee Song, Boyuan Zheng et al.

CVPR 2024arXiv:2402.04476

#10126

Improving the Generalization of Segmentation Foundation Model under Distribution Shift via Weakly Supervised Adaptation

Haojie Zhang, Yongyi Su, Xun Xu et al.

CVPR 2024arXiv:2312.03502

#10127

Language-guided Image Reflection Separation

Haofeng Zhong, Yuchen Hong, Shuchen Weng et al.

CVPR 2024arXiv:2402.11874

#10128

SwiftBrush: One-Step Text-to-Image Diffusion Model with Variational Score Distillation

Thuan Nguyen, Anh Tran

CVPR 2024arXiv:2312.05239

#10129

Looking 3D: Anomaly Detection with 2D-3D Alignment

Ankan Kumar Bhunia, Changjian Li, Hakan Bilen

CVPR 2024arXiv:2406.19393

#10130

EventPS: Real-Time Photometric Stereo Using an Event Camera

Bohan Yu, Jieji Ren, Jin Han et al.

CVPR 2024

#10131

Diffusion Handles Enabling 3D Edits for Diffusion Models by Lifting Activations to 3D

Karran Pandey, Paul Guerrero, Matheus Gadelha et al.

CVPR 2024highlightarXiv:2312.02190

#10132

Circuit Design and Efficient Simulation of Quantum Inner Product and Empirical Studies of Its Effect on Near-Term Hybrid Quantum-Classic Machine Learning

Hao Xiong, Yehui Tang, Xinyu Ye et al.

CVPR 2024

#10133

Uncovering What Why and How: A Comprehensive Benchmark for Causation Understanding of Video Anomaly

Hang Du, Sicheng Zhang, Binzhu Xie et al.

CVPR 2024arXiv:2405.00181

#10134

CrowdDiff: Multi-hypothesis Crowd Density Estimation using Diffusion Models

Yasiru Ranasinghe, Nithin Gopalakrishnan Nair, Wele Gedara Chaminda Bandara et al.

CVPR 2024arXiv:2303.12790

#10135

PSDPM: Prototype-based Secondary Discriminative Pixels Mining for Weakly Supervised Semantic Segmentation

Xinqiao Zhao, Ziqian Yang, Tianhong Dai et al.

CVPR 2024

#10136

Towards 3D Vision with Low-Cost Single-Photon Cameras

Fangzhou Mu, Carter Sifferman, Sacha Jungerman et al.

CVPR 2024arXiv:2403.17801

#10137

Shadows Don't Lie and Lines Can't Bend! Generative Models don't know Projective Geometry...for now

Ayush Sarkar, Hanlin Mai, Amitabh Mahapatra et al.

CVPR 2024arXiv:2311.17138

#10138

Aligning Logits Generatively for Principled Black-Box Knowledge Distillation

Jing Ma, Xiang Xiang, Ke Wang et al.

CVPR 2024arXiv:2205.10490

#10139

Semantically-Shifted Incremental Adapter-Tuning is A Continual ViTransformer

Yuwen Tan, Qinhao Zhou, Xiang Xiang et al.

CVPR 2024

#10140

Physics-guided Shape-from-Template: Monocular Video Perception through Neural Surrogate Models

David Stotko, Nils Wandel, Reinhard Klein

CVPR 2024arXiv:2311.12796

#10141

GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation

Tong Wu, Guandao Yang, Zhibing Li et al.

CVPR 2024arXiv:2401.04092

#10142

On Train-Test Class Overlap and Detection for Image Retrieval

Chull Hwan Song, Jooyoung Yoon, Taebaek Hwang et al.

CVPR 2024arXiv:2404.01524

#10143

Orthogonal Adaptation for Modular Customization of Diffusion Models

Ryan Po, Guandao Yang, Kfir Aberman et al.

CVPR 2024highlightarXiv:2312.02432

#10144

Permutation Equivariance of Transformers and Its Applications

Hengyuan Xu, Liyao Xiang, Hangyu Ye et al.

CVPR 2024arXiv:2304.07735

#10145

ViTamin: Designing Scalable Vision Models in the Vision-Language Era

Jieneng Chen, Qihang Yu, Xiaohui Shen et al.

CVPR 2024arXiv:2404.02132

#10146

HomoFormer: Homogenized Transformer for Image Shadow Removal

Jie Xiao, Xueyang Fu, Yurui Zhu et al.

CVPR 2024

#10147

PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models

Yiming Zhang, Zhening Xing, Yanhong Zeng et al.

CVPR 2024arXiv:2312.13964

#10148

DemoCaricature: Democratising Caricature Generation with a Rough Sketch

Dar-Yen Chen, Ayan Kumar Bhunia, Subhadeep Koley et al.

CVPR 2024arXiv:2312.04364

#10149

End-to-End Spatio-Temporal Action Localisation with Video Transformers

Alexey Gritsenko, Xuehan Xiong, Josip Djolonga et al.

CVPR 2024arXiv:2304.12160

#10150

TRINS: Towards Multimodal Language Models that Can Read

Ruiyi Zhang, Yanzhe Zhang, Jian Chen et al.

CVPR 2024arXiv:2406.06730

#10151

EFHQ: Multi-purpose ExtremePose-Face-HQ dataset

Trung Dao, Duc H Vu, Cuong Pham et al.

CVPR 2024arXiv:2312.17205

#10152

Logarithmic Lenses: Exploring Log RGB Data for Image Classification

Bruce Maxwell, Sumegha Singhania, Avnish Patel et al.

CVPR 2024

#10153

Unlocking Pre-trained Image Backbones for Semantic Image Synthesis

Tariq Berrada, Jakob Verbeek, camille couprie et al.

CVPR 2024arXiv:2312.13314

#10154

TokenCompose: Text-to-Image Diffusion with Token-level Supervision

Zirui Wang, Zhizhou Sha, Zheng Ding et al.

CVPR 2024arXiv:2312.03626

#10155

Language-driven Object Fusion into Neural Radiance Fields with Pose-Conditioned Dataset Updates

Ka Chun SHUM, Jaeyeon Kim, Binh-Son Hua et al.

CVPR 2024arXiv:2309.11281

#10156

Infer from What You Have Seen Before: Temporally-dependent Classifier for Semi-supervised Video Segmentation

Jiafan Zhuang, Zilei Wang, Yixin Zhang et al.

CVPR 2024

#10157

Seeing the World through Your Eyes

Hadi Alzayer, Kevin Zhang, Brandon Y. Feng et al.

CVPR 2024arXiv:2306.09348

#10158

Learning Vision from Models Rivals Learning Vision from Data

Yonglong Tian, Lijie Fan, Kaifeng Chen et al.

CVPR 2024arXiv:2312.17742

#10159

RegionGPT: Towards Region Understanding Vision Language Model

Qiushan Guo, Shalini De Mello, Danny Yin et al.

CVPR 2024arXiv:2403.02330

#10160

Unlocking the Potential of Pre-trained Vision Transformers for Few-Shot Semantic Segmentation through Relationship Descriptors

Ziqin Zhou, Hai-Ming Xu, Yangyang Shu et al.

CVPR 2024

#10161

Holo-Relighting: Controllable Volumetric Portrait Relighting from a Single Image

Yiqun Mei, Yu Zeng, He Zhang et al.

CVPR 2024arXiv:2403.09632

#10162

Relightful Harmonization: Lighting-aware Portrait Background Replacement

Mengwei Ren, Wei Xiong, Jae Shin Yoon et al.

CVPR 2024arXiv:2312.06886

#10163

Taming the Tail in Class-Conditional GANs: Knowledge Sharing via Unconditional Training at Lower Resolutions

Saeed Khorram, Mingqi Jiang, Mohamad Shahbazi et al.

CVPR 2024arXiv:2402.17065

#10164

Matching 2D Images in 3D: Metric Relative Pose from Metric Correspondences

Axel Barroso-Laguna, Sowmya Munukutla, Victor Adrian Prisacariu et al.

CVPR 2024arXiv:2404.06337

#10165

MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

Xiang Yue, Yuansheng Ni, Kai Zhang et al.

CVPR 2024arXiv:2311.16502

#10166

MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation

Yanhui Wang, Jianmin Bao, Wenming Weng et al.

CVPR 2024highlightarXiv:2311.18829

#10167

MorpheuS: Neural Dynamic 360° Surface Reconstruction from Monocular RGB-D Video

Hengyi Wang, Jingwen Wang, Lourdes Agapito

CVPR 2024arXiv:2312.00778

#10168

Capturing Closely Interacted Two-Person Motions with Reaction Priors

Qi Fang, Yinghui Fan, Yanjun Li et al.

CVPR 2024

#10169

LightOctree: Lightweight 3D Spatially-Coherent Indoor Lighting Estimation

Xuecan Wang, Shibang Xiao, Xiaohui Liang

CVPR 2024arXiv:2404.03925

#10170

Bi-level Learning of Task-Specific Decoders for Joint Registration and One-Shot Medical Image Segmentation

Xin Fan, Xiaolin Wang, Jiaxin Gao et al.

CVPR 2024

#10171

FairCLIP: Harnessing Fairness in Vision-Language Learning

Yan Luo, MIN SHI, Muhammad Osama Khan et al.

CVPR 2024arXiv:2403.19949

#10172

Navigate Beyond Shortcuts: Debiased Learning Through the Lens of Neural Collapse

Yining Wang, Junjie Sun, Chenyue Wang et al.

CVPR 2024highlightarXiv:2405.05587

#10173

DiVa-360: The Dynamic Visual Dataset for Immersive Neural Fields

Cheng-You Lu, Peisen Zhou, Angela Xing et al.

CVPR 2024highlightarXiv:2307.16897

#10174

Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language

Mark Hamilton, Andrew Zisserman, John Hershey et al.

CVPR 2024

#10175

Learning Visual Prompt for Gait Recognition

Kang Ma, Ying Fu, Chunshui Cao et al.

CVPR 2024

#10176

ArGue: Attribute-Guided Prompt Tuning for Vision-Language Models

Xinyu Tian, Shu Zou, Zhaoyuan Yang et al.

CVPR 2024arXiv:2311.16494

#10177

PolarRec: Improving Radio Interferometric Data Reconstruction Using Polar Coordinates

Ruoqi Wang, Zhuoyang Chen, Jiayi Zhu et al.

CVPR 2024

#10178

LDP: Language-driven Dual-Pixel Image Defocus Deblurring Network

Hao Yang, Liyuan Pan, Yan Yang et al.

CVPR 2024arXiv:2307.09815

#10179

StyleCineGAN: Landscape Cinemagraph Generation using a Pre-trained StyleGAN

Jongwoo Choi, Kwanggyoon Seo, Amirsaman Ashtari et al.

CVPR 2024arXiv:2403.14186

#10180

MS-MANO: Enabling Hand Pose Tracking with Biomechanical Constraints

Pengfei Xie, Wenqiang Xu, Tutian Tang et al.

CVPR 2024arXiv:2404.10227

#10181

CGI-DM: Digital Copyright Authentication for Diffusion Models via Contrasting Gradient Inversion

Xiaoyu Wu, Yang Hua, Chumeng Liang et al.

CVPR 2024arXiv:2403.11162

#10182

Temporally Consistent Unbalanced Optimal Transport for Unsupervised Action Segmentation

Ming Xu, Stephen Gould

CVPR 2024arXiv:2404.01518

#10183

ICON: Incremental CONfidence for Joint Pose and Radiance Field Optimization

Weiyao Wang, Pierre Gleize, Hao Tang et al.

CVPR 2024arXiv:2401.08937

#10184

Learning Large-Factor EM Image Super-Resolution with Generative Priors

Jiateng Shou, Zeyu Xiao, Shiyu Deng et al.

CVPR 2024

#10185

PI3D: Efficient Text-to-3D Generation with Pseudo-Image Diffusion

Ying-Tian Liu, Yuan-Chen Guo, Guan Luo et al.

CVPR 2024arXiv:2312.09069

#10186

Learning for Transductive Threshold Calibration in Open-World Recognition

Qin ZHANG, DONGSHENG An, Tianjun Xiao et al.

CVPR 2024arXiv:2305.12039

#10187

Separate and Conquer: Decoupling Co-occurrence via Decomposition and Representation for Weakly Supervised Semantic Segmentation

Zhiwei Yang, Kexue Fu, Minghong Duan et al.

CVPR 2024arXiv:2402.18467

#10188

Telling Left from Right: Identifying Geometry-Aware Semantic Correspondence

Junyi Zhang, Charles Herrmann, Junhwa Hur et al.

CVPR 2024arXiv:2311.17034

#10189

Amodal Ground Truth and Completion in the Wild

Guanqi Zhan, Chuanxia Zheng, Weidi Xie et al.

CVPR 2024arXiv:2312.17247

#10190

MiKASA: Multi-Key-Anchor & Scene-Aware Transformer for 3D Visual Grounding

Chun-Peng Chang, Shaoxiang Wang, Alain Pagani et al.

CVPR 2024arXiv:2403.03077

#10191

Seeing Unseen: Discover Novel Biomedical Concepts via Geometry-Constrained Probabilistic Modeling

Jianan Fan, Dongnan Liu, Hang Chang et al.

CVPR 2024arXiv:2403.01053

#10192

Real-Time Exposure Correction via Collaborative Transformations and Adaptive Sampling

Ziwen Li, Feng Zhang, Meng Cao et al.

CVPR 2024

#10193

Gaussian Splatting SLAM

Hidenobu Matsuki, Riku Murai, Paul Kelly et al.

CVPR 2024highlightarXiv:2312.06741

#10194

PromptCoT: Align Prompt Distribution via Adapted Chain-of-Thought

Junyi Yao, Yijiang Liu, Zhen Dong et al.

CVPR 2024

#10195

NeLF-Pro: Neural Light Field Probes for Multi-Scale Novel View Synthesis

Zinuo You, Andreas Geiger, Anpei Chen

CVPR 2024arXiv:2312.13328

#10196

Modeling Dense Multimodal Interactions Between Biological Pathways and Histology for Survival Prediction

Guillaume Jaume, Anurag Vaidya, Richard J. Chen et al.

CVPR 2024arXiv:2304.06819

#10197

Practical Measurements of Translucent Materials with Inter-Pixel Translucency Prior

Zhenyu Chen, Jie Guo, Shuichang Lai et al.

CVPR 2024

#10198

RTracker: Recoverable Tracking via PN Tree Structured Memory

Yuqing Huang, Xin Li, Zikun Zhou et al.

CVPR 2024arXiv:2403.19242

#10199

View-Category Interactive Sharing Transformer for Incomplete Multi-View Multi-Label Learning

Shilong Ou, Zhe Xue, Yawen Li et al.

CVPR 2024highlight

#10200

Dual Memory Networks: A Versatile Adaptation Approach for Vision-Language Models

Yabin Zhang, Wenjie Zhu, Hui Tang et al.

CVPR 2024arXiv:2403.17589

← Previous

1...49 50 51 52 53...62