Most Cited 2024 "cluster-directed mixed graphs" Papers

12,324 papers found • Page 57 of 62

#11201

AETTA: Label-Free Accuracy Estimation for Test-Time Adaptation

Taeckyung Lee, Sorn Chottananurak, Taesik Gong et al.

CVPR 2024posterarXiv:2404.01351
#11202

A Simple Recipe for Language-guided Domain Generalized Segmentation

Mohammad Fahes, TUAN-HUNG VU, Andrei Bursuc et al.

CVPR 2024posterarXiv:2311.17922
#11203

An Edit Friendly DDPM Noise Space: Inversion and Manipulations

Inbar Huberman-Spiegelglas, Vladimir Kulikov, Tomer Michaeli

CVPR 2024posterarXiv:2304.06140
#11204

AdaShift: Learning Discriminative Self-Gated Neural Feature Activation With an Adaptive Shift Factor

Sudong Cai

CVPR 2024poster
#11205

PredToken: Predicting Unknown Tokens and Beyond with Coarse-to-Fine Iterative Decoding

Xuesong Nie, Haoyuan Jin, Yunfeng Yan et al.

CVPR 2024poster
#11206

Holistic Features are almost Sufficient for Text-to-Video Retrieval

Kaibin Tian, Ruixiang Zhao, Zijie Xin et al.

CVPR 2024poster
#11207

Causal Mode Multiplexer: A Novel Framework for Unbiased Multispectral Pedestrian Detection

Taeheon Kim, Sebin Shin, Youngjoon Yu et al.

CVPR 2024posterarXiv:2403.01300
#11208

Seeing the Unseen: Visual Common Sense for Semantic Placement

Ram Ramrakhya, Aniruddha Kembhavi, Dhruv Batra et al.

CVPR 2024posterarXiv:2401.07770
#11209

Diffuse Attend and Segment: Unsupervised Zero-Shot Segmentation using Stable Diffusion

Junjiao Tian, Lavisha Aggarwal, Andrea Colaco et al.

CVPR 2024poster
#11210

GOAT-Bench: A Benchmark for Multi-Modal Lifelong Navigation

Mukul Khanna, Ram Ramrakhya, Gunjan Chhablani et al.

CVPR 2024posterarXiv:2404.06609
#11211

WonderJourney: Going from Anywhere to Everywhere

Hong-Xing Yu, Haoyi Duan, Junhwa Hur et al.

CVPR 2024posterarXiv:2312.03884
#11212

CLIP-Driven Open-Vocabulary 3D Scene Graph Generation via Cross-Modality Contrastive Learning

Lianggangxu Chen, Xuejiao Wang, Jiale Lu et al.

CVPR 2024highlight
#11213

Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation

Luca Barsellotti, Roberto Amoroso, Marcella Cornia et al.

CVPR 2024posterarXiv:2404.06542
#11214

HRVDA: High-Resolution Visual Document Assistant

Chaohu Liu, Kun Yin, Haoyu Cao et al.

CVPR 2024posterarXiv:2404.06918
#11215

A Dual-Augmentor Framework for Domain Generalization in 3D Human Pose Estimation

Qucheng Peng, Ce Zheng, Chen Chen

CVPR 2024posterarXiv:2403.11310
#11216

Building Bridges across Spatial and Temporal Resolutions: Reference-Based Super-Resolution via Change Priors and Conditional Diffusion Model

Runmin Dong, Shuai Yuan, Bin Luo et al.

CVPR 2024posterarXiv:2403.17460
#11217

Gaussian Shading: Provable Performance-Lossless Image Watermarking for Diffusion Models

Zijin Yang, Kai Zeng, Kejiang Chen et al.

CVPR 2024posterarXiv:2404.04956
#11218

Multimodal Sense-Informed Forecasting of 3D Human Motions

Zhenyu Lou, Qiongjie Cui, Haofan Wang et al.

CVPR 2024poster
#11219

Resolution Limit of Single-Photon LiDAR

Stanley H. Chan, Hashan K Weerasooriya, Weijian Zhang et al.

CVPR 2024posterarXiv:2403.17719
#11220

Correlation-aware Coarse-to-fine MLPs for Deformable Medical Image Registration

Mingyuan Meng, Dagan Feng, Lei Bi et al.

CVPR 2024posterarXiv:2406.00123
#11221

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

Kristen Grauman, Andrew Westbury, Lorenzo Torresani et al.

CVPR 2024posterarXiv:2311.18259
#11222

CAD-SIGNet: CAD Language Inference from Point Clouds using Layer-wise Sketch Instance Guided Attention

Mohammad Sadil Khan, Elona Dupont, Sk Aziz Ali et al.

CVPR 2024highlightarXiv:2402.17678
#11223

LAA-Net: Localized Artifact Attention Network for Quality-Agnostic and Generalizable Deepfake Detection

Dat NGUYEN, Nesryne Mejri, Inder Pal Singh et al.

CVPR 2024posterarXiv:2401.13856
#11224

The Devil is in the Details: StyleFeatureEditor for Detail-Rich StyleGAN Inversion and High Quality Image Editing

Denis Bobkov, Vadim Titov, Aibek Alanov et al.

CVPR 2024posterarXiv:2406.10601
#11225

Adaptive Random Feature Regularization on Fine-tuning Deep Neural Networks

Shin&#x27, ya Yamaguchi, Sekitoshi Kanai et al.

CVPR 2024poster
#11226

Self-Training Large Language Models for Improved Visual Program Synthesis With Visual Reinforcement

Zaid Khan, Vijay Kumar BG, Samuel Schulter et al.

CVPR 2024posterarXiv:2404.04627
#11227

Generating Enhanced Negatives for Training Language-Based Object Detectors

Shiyu Zhao, Long Zhao, Vijay Kumar BG et al.

CVPR 2024posterarXiv:2401.00094
#11228

Joint-Task Regularization for Partially Labeled Multi-Task Learning

Kento Nishi, Junsik Kim, Wanhua Li et al.

CVPR 2024posterarXiv:2404.01976
#11229

MRFP: Learning Generalizable Semantic Segmentation from Sim-2-Real with Multi-Resolution Feature Perturbation

Sumanth Udupa, Prajwal Gurunath, Aniruddh Sikdar et al.

CVPR 2024posterarXiv:2311.18331
#11230

Object Recognition as Next Token Prediction

Kaiyu Yue, Bor-Chun Chen, Jonas Geiping et al.

CVPR 2024highlightarXiv:2312.02142
#11231

MuGE: Multiple Granularity Edge Detection

Caixia Zhou, Yaping Huang, Mengyang Pu et al.

CVPR 2024poster
#11232

Local-consistent Transformation Learning for Rotation-invariant Point Cloud Analysis

Yiyang Chen, Lunhao Duan, Shanshan Zhao et al.

CVPR 2024posterarXiv:2403.11113
#11233

Spacetime Gaussian Feature Splatting for Real-Time Dynamic View Synthesis

Zhan Li, Zhang Chen, Zhong Li et al.

CVPR 2024posterarXiv:2312.16812
#11234

LOTUS: Evasive and Resilient Backdoor Attacks through Sub-Partitioning

Siyuan Cheng, Guanhong Tao, Yingqi Liu et al.

CVPR 2024posterarXiv:2403.17188
#11235

The More You See in 2D the More You Perceive in 3D

Xinyang Han, Zelin Gao, Angjoo Kanazawa et al.

CVPR 2024highlightarXiv:2404.03652
#11236

What When and Where? Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions

Brian Chen, Nina Shvetsova, Andrew Rouditchenko et al.

CVPR 2024posterarXiv:2303.16990
#11237

A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions

Jack Urbanek, Florian Bordes, Pietro Astolfi et al.

CVPR 2024posterarXiv:2312.08578
#11238

Correlation-Decoupled Knowledge Distillation for Multimodal Sentiment Analysis with Incomplete Modalities

Mingcheng Li, Dingkang Yang, Xiao Zhao et al.

CVPR 2024posterarXiv:2404.16456
#11239

ES³: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations

Yuanhang Zhang, Shuang Yang, Shiguang Shan et al.

CVPR 2024poster
#11240

Depth-aware Test-Time Training for Zero-shot Video Object Segmentation

Weihuang Liu, Xi Shen, Haolun Li et al.

CVPR 2024posterarXiv:2403.04258
#11241

MSU-4S - The Michigan State University Four Seasons Dataset

Daniel Kent, Mohammed Alyaqoub, Xiaohu Lu et al.

CVPR 2024poster
#11242

An Interactive Navigation Method with Effect-oriented Affordance

Xiaohan Wang, Yuehu LIU, Xinhang Song et al.

CVPR 2024poster
#11243

Rapid 3D Model Generation with Intuitive 3D Input

Tianrun Chen, Chaotao Ding, Shangzhan Zhang et al.

CVPR 2024highlight
#11244

Unsupervised Salient Instance Detection

Xin Tian, Ke Xu, Rynson W.H. Lau

CVPR 2024poster
#11245

Multi-modal Instruction Tuned LLMs with Fine-grained Visual Perception

Junwen He, Yifan Wang, Lijun Wang et al.

CVPR 2024highlightarXiv:2403.02969
#11246

CoDi-2: In-Context Interleaved and Interactive Any-to-Any Generation

Zineng Tang, Ziyi Yang, MAHMOUD KHADEMI et al.

CVPR 2024highlight
#11247

PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation

Yuqi Wang, Yuntao Chen, Xingyu Liao et al.

CVPR 2024posterarXiv:2306.10013
#11248

AdaBM: On-the-Fly Adaptive Bit Mapping for Image Super-Resolution

Cheeun Hong, Kyoung Mu Lee

CVPR 2024posterarXiv:2404.03296
#11249

MULDE: Multiscale Log-Density Estimation via Denoising Score Matching for Video Anomaly Detection

Jakub Micorek, Horst Possegger, Dominik Narnhofer et al.

CVPR 2024posterarXiv:2403.14497
#11250

Instance-level Expert Knowledge and Aggregate Discriminative Attention for Radiology Report Generation

Shenshen Bu, Taiji Li, Zhiming Dai et al.

CVPR 2024poster
#11251

HyperSDFusion: Bridging Hierarchical Structures in Language and Geometry for Enhanced 3D Text2Shape Generation

Zhiying Leng, Tolga Birdal, Xiaohui Liang et al.

CVPR 2024posterarXiv:2403.00372
#11252

Just Add ?! Pose Induced Video Transformers for Understanding Activities of Daily Living

Dominick Reilly, Srijan Das

CVPR 2024poster
#11253

PhysGaussian: Physics-Integrated 3D Gaussians for Generative Dynamics

Tianyi Xie, Zeshun Zong, Yuxing Qiu et al.

CVPR 2024highlightarXiv:2311.12198
#11254

Viewpoint-Aware Visual Grounding in 3D Scenes

Xiangxi Shi, Zhonghua Wu, Stefan Lee

CVPR 2024poster
#11255

Long-Tail Class Incremental Learning via Independent Sub-prototype Construction

Xi Wang, Xu Yang, Jie Yin et al.

CVPR 2024poster
#11256

An Aggregation-Free Federated Learning for Tackling Data Heterogeneity

Yuan Wang, Huazhu Fu, Renuga Kanagavelu et al.

CVPR 2024posterarXiv:2404.18962
#11257

Infrared Adversarial Car Stickers

Xiaopei Zhu, Yuqiu Liu, Zhanhao Hu et al.

CVPR 2024posterarXiv:2405.09924
#11258

XFibrosis: Explicit Vessel-Fiber Modeling for Fibrosis Staging from Liver Pathology Images

CHONG YIN, Siqi Liu, Fei Lyu et al.

CVPR 2024poster
#11259

Advancing Saliency Ranking with Human Fixations: Dataset Models and Benchmarks

Bowen Deng, Siyang Song, Andrew French et al.

CVPR 2024poster
#11260

Implicit Event-RGBD Neural SLAM

Delin Qu, Chi Yan, Dong Wang et al.

CVPR 2024highlightarXiv:2311.11013
#11261

Retraining-Free Model Quantization via One-Shot Weight-Coupling Learning

Chen Tang, Yuan Meng, Jiacheng Jiang et al.

CVPR 2024posterarXiv:2401.01543
#11262

DreamControl: Control-Based Text-to-3D Generation with 3D Self-Prior

Tianyu Huang, Yihan Zeng, Zhilu Zhang et al.

CVPR 2024posterarXiv:2312.06439
#11263

From Coarse to Fine-Grained Open-Set Recognition

Nico Lang, Vésteinn Snæbjarnarson, Elijah Cole et al.

CVPR 2024poster
#11264

Exploiting Inter-sample and Inter-feature Relations in Dataset Distillation

Wenxiao Deng, Wenbin Li, Tianyu Ding et al.

CVPR 2024posterarXiv:2404.00563
#11265

Discriminative Pattern Calibration Mechanism for Source-Free Domain Adaptation

Haifeng Xia, Siyu Xia, Zhengming Ding

CVPR 2024poster
#11266

RAM-Avatar: Real-time Photo-Realistic Avatar from Monocular Videos with Full-body Control

xiang deng, Zerong Zheng, Yuxiang Zhang et al.

CVPR 2024poster
#11267

PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding

Zhen Li, Mingdeng Cao, Xintao Wang et al.

CVPR 2024posterarXiv:2312.04461
#11268

Privacy-Preserving Face Recognition Using Trainable Feature Subtraction

Yuxi Mi, Zhizhou Zhong, Yuge Huang et al.

CVPR 2024posterarXiv:2403.12457
#11269

Arbitrary Motion Style Transfer with Multi-condition Motion Latent Diffusion Model

Wenfeng Song, Xingliang Jin, Shuai Li et al.

CVPR 2024poster
#11270

3DSFLabelling: Boosting 3D Scene Flow Estimation by Pseudo Auto-labelling

Chaokang Jiang, Guangming Wang, Jiuming Liu et al.

CVPR 2024posterarXiv:2402.18146
#11271

CPR-Coach: Recognizing Composite Error Actions based on Single-class Training

Shunli Wang, Shuaibing Wang, Dingkang Yang et al.

CVPR 2024posterarXiv:2309.11718
#11272

Restoration by Generation with Constrained Priors

Zheng Ding, Xuaner Zhang, Zhuowen Tu et al.

CVPR 2024highlightarXiv:2312.17161
#11273

Unified Entropy Optimization for Open-Set Test-Time Adaptation

Zhengqing Gao, Xu-Yao Zhang, Cheng-Lin Liu

CVPR 2024posterarXiv:2404.06065
#11274

Poly Kernel Inception Network for Remote Sensing Detection

Xinhao Cai, Qiuxia Lai, Yuwei Wang et al.

CVPR 2024posterarXiv:2403.06258
#11275

Distraction is All You Need: Memory-Efficient Image Immunization against Diffusion-Based Image Editing

Ling Lo, Cheng Yeo, Hong-Han Shuai et al.

CVPR 2024highlight
#11276

Parameter Efficient Fine-tuning via Cross Block Orchestration for Segment Anything Model

Zelin Peng, Zhengqin Xu, Zhilin Zeng et al.

CVPR 2024posterarXiv:2311.17112
#11277

MGMap: Mask-Guided Learning for Online Vectorized HD Map Construction

Xiaolu Liu, Song Wang, Wentong Li et al.

CVPR 2024posterarXiv:2404.00876
#11278

ViT-Lens: Towards Omni-modal Representations

Stan Weixian Lei, Yixiao Ge, Kun Yi et al.

CVPR 2024posterarXiv:2311.16081
#11279

Prompt-Driven Referring Image Segmentation with Instance Contrasting

Chao Shang, Zichen Song, Heqian Qiu et al.

CVPR 2024poster
#11280

CosmicMan: A Text-to-Image Foundation Model for Humans

Shikai Li, Jianglin Fu, Kaiyuan Liu et al.

CVPR 2024highlightarXiv:2404.01294
#11281

MTMMC: A Large-Scale Real-World Multi-Modal Camera Tracking Benchmark

Sanghyun Woo, Kwanyong Park, Inkyu Shin et al.

CVPR 2024posterarXiv:2403.20225
#11282

Align and Aggregate: Compositional Reasoning with Video Alignment and Answer Aggregation for Video Question-Answering

Zhaohe Liao, Jiangtong Li, Li Niu et al.

CVPR 2024posterarXiv:2407.03008
#11283

Overload: Latency Attacks on Object Detection for Edge Devices

Erh-Chung Chen, Pin-Yu Chen, I-Hsin Chung et al.

CVPR 2024posterarXiv:2304.05370
#11284

Neural Exposure Fusion for High-Dynamic Range Object Detection

Emmanuel Onzon, Maximilian Bömer, Fahim Mannan et al.

CVPR 2024poster
#11285

Semantics Distortion and Style Matter: Towards Source-free UDA for Panoramic Segmentation

Xu Zheng, Pengyuan Zhou, ATHANASIOS et al.

CVPR 2024poster
#11286

PaSCo: Urban 3D Panoptic Scene Completion with Uncertainty Awareness

Anh-Quan Cao, Angela Dai, Raoul de Charette

CVPR 2024posterarXiv:2312.02158
#11287

Evaluating Transferability in Retrieval Tasks: An Approach Using MMD and Kernel Methods

Mengyu Dai, Amir Hossein Raffiee, Aashish Jain et al.

CVPR 2024poster
#11288

ParameterNet: Parameters Are All You Need for Large-scale Visual Pretraining of Mobile Networks

Kai Han, Yunhe Wang, Jianyuan Guo et al.

CVPR 2024poster
#11289

Confronting Ambiguity in 6D Object Pose Estimation via Score-Based Diffusion on SE(3)

Tsu-Ching Hsiao, Hao-Wei Chen, Hsuan-Kung Yang et al.

CVPR 2024posterarXiv:2305.15873
#11290

Communication-Efficient Collaborative Perception via Information Filling with Codebook

Yue Hu, Juntong Peng, Sifei Liu et al.

CVPR 2024posterarXiv:2405.04966
#11291

QUADify: Extracting Meshes with Pixel-level Details and Materials from Images

Maximilian Frühauf, Hayko Riemenschneider, Markus Gross et al.

CVPR 2024highlight
#11292

Enhancing Post-training Quantization Calibration through Contrastive Learning

Yuzhang Shang, Gaowen Liu, Ramana Kompella et al.

CVPR 2024poster
#11293

LASO: Language-guided Affordance Segmentation on 3D Object

Yicong Li, Na Zhao, Junbin Xiao et al.

CVPR 2024poster
#11294

Dispersed Structured Light for Hyperspectral 3D Imaging

Suhyun Shin, Seokjun Choi, Felix Heide et al.

CVPR 2024posterarXiv:2311.18287
#11295

DualAD: Disentangling the Dynamic and Static World for End-to-End Driving

Simon Doll, Niklas Hanselmann, Lukas Schneider et al.

CVPR 2024posterarXiv:2406.06264
#11296

Focus on Hiders: Exploring Hidden Threats for Enhancing Adversarial Training

Qian Li, Yuxiao Hu, Yinpeng Dong et al.

CVPR 2024posterarXiv:2312.07067
#11297

ColorPCR: Color Point Cloud Registration with Multi-Stage Geometric-Color Fusion

Juncheng Mu, Lin Bie, Shaoyi Du et al.

CVPR 2024poster
#11298

Any-Shift Prompting for Generalization over Distributions

Zehao Xiao, Jiayi Shen, Mohammad Mahdi Derakhshani et al.

CVPR 2024posterarXiv:2402.10099
#11299

Time- Memory- and Parameter-Efficient Visual Adaptation

Otniel-Bogdan Mercea, Alexey Gritsenko, Cordelia Schmid et al.

CVPR 2024highlightarXiv:2402.02887
#11300

Behind the Veil: Enhanced Indoor 3D Scene Reconstruction with Occluded Surfaces Completion

Su Sun, Cheng Zhao, Yuliang Guo et al.

CVPR 2024posterarXiv:2404.03070
#11301

Revisiting Counterfactual Problems in Referring Expression Comprehension

Zhihan Yu, Ruifan Li

CVPR 2024poster
#11302

Differentiable Point-based Inverse Rendering

Hoon-Gyu Chung, Seokjun Choi, Seung-Hwan Baek

CVPR 2024posterarXiv:2312.02480
#11303

VMINer: Versatile Multi-view Inverse Rendering with Near- and Far-field Light Sources

Fan Fei, Jiajun Tang, Ping Tan et al.

CVPR 2024highlight
#11304

ActiveDC: Distribution Calibration for Active Finetuning

Wenshuai Xu, Zhenghui Hu, Yu Lu et al.

CVPR 2024posterarXiv:2311.07634
#11305

AUEditNet: Dual-Branch Facial Action Unit Intensity Manipulation with Implicit Disentanglement

Shiwei Jin, Zhen Wang, Lei Wang et al.

CVPR 2024posterarXiv:2404.05063
#11306

VecFusion: Vector Font Generation with Diffusion

Vikas Thamizharasan, Difan Liu, Shantanu Agarwal et al.

CVPR 2024highlightarXiv:2312.10540
#11307

Generating Non-Stationary Textures using Self-Rectification

Yang Zhou, Rongjun Xiao, Dani Lischinski et al.

CVPR 2024posterarXiv:2401.02847
#11308

OOSTraj: Out-of-Sight Trajectory Prediction With Vision-Positioning Denoising

Haichao Zhang, Yi Xu, Hongsheng Lu et al.

CVPR 2024posterarXiv:2404.02227
#11309

CyberDemo: Augmenting Simulated Human Demonstration for Real-World Dexterous Manipulation

Jun Wang, Yuzhe Qin, Kaiming Kuang et al.

CVPR 2024posterarXiv:2402.14795
#11310

Video Harmonization with Triplet Spatio-Temporal Variation Patterns

Zonghui Guo, XinYu Han, Jie Zhang et al.

CVPR 2024poster
#11311

Rethinking Interactive Image Segmentation with Low Latency High Quality and Diverse Prompts

Qin Liu, Jaemin Cho, Mohit Bansal et al.

CVPR 2024posterarXiv:2404.00741
#11312

SeMoLi: What Moves Together Belongs Together

Jenny Seidenschwarz, Aljoša Ošep, Francesco Ferroni et al.

CVPR 2024posterarXiv:2402.19463
#11313

HINTED: Hard Instance Enhanced Detector with Mixed-Density Feature Fusion for Sparsely-Supervised 3D Object Detection

Qiming Xia, Wei Ye, Hai Wu et al.

CVPR 2024poster
#11314

NAYER: Noisy Layer Data Generation for Efficient and Effective Data-free Knowledge Distillation

Minh-Tuan Tran, Trung Le, Xuan-May Le et al.

CVPR 2024posterarXiv:2310.00258
#11315

Domain-Specific Block Selection and Paired-View Pseudo-Labeling for Online Test-Time Adaptation

Yeonguk Yu, Sungho Shin, Seunghyeok Back et al.

CVPR 2024posterarXiv:2404.10966
#11316

Revamping Federated Learning Security from a Defender's Perspective: A Unified Defense with Homomorphic Encrypted Data Space

Naveen Kumar Kummari, Reshmi Mitra, Krishna Mohan Chalavadi

CVPR 2024poster
#11317

LLMs are Good Sign Language Translators

Jia Gong, Lin Geng Foo, Yixuan He et al.

CVPR 2024posterarXiv:2404.00925
#11318

PanoRecon: Real-Time Panoptic 3D Reconstruction from Monocular Video

Dong Wu, Zike Yan, Hongbin Zha

CVPR 2024poster
#11319

TACO: Benchmarking Generalizable Bimanual Tool-ACtion-Object Understanding

Yun Liu, Haolin Yang, Xu Si et al.

CVPR 2024posterarXiv:2401.08399
#11320

Learning by Correction: Efficient Tuning Task for Zero-Shot Generative Vision-Language Reasoning

Rongjie Li, Yu Wu, Xuming He

CVPR 2024posterarXiv:2404.00909
#11321

Calibrating Multi-modal Representations: A Pursuit of Group Robustness without Annotations

Chenyu You, Yifei Min, Weicheng Dai et al.

CVPR 2024posterarXiv:2403.07241
#11322

Gaussian-Flow: 4D Reconstruction with Dynamic 3D Gaussian Particle

Youtian Lin, Zuozhuo Dai, Siyu Zhu et al.

CVPR 2024highlightarXiv:2312.03431
#11323

Dysen-VDM: Empowering Dynamics-aware Text-to-Video Diffusion with LLMs

Hao Fei, Shengqiong Wu, Wei Ji et al.

CVPR 2024posterarXiv:2308.13812
#11324

Can Biases in ImageNet Models Explain Generalization?

Paul Gavrikov, Janis Keuper

CVPR 2024posterarXiv:2404.01509
#11325

HumMUSS: Human Motion Understanding using State Space Models

Arnab Mondal, Stefano Alletto, Denis Tome

CVPR 2024posterarXiv:2404.10880
#11326

Modeling Multimodal Social Interactions: New Challenges and Baselines with Densely Aligned Representations

Sangmin Lee, Bolin Lai, Fiona Ryan et al.

CVPR 2024posterarXiv:2403.02090
#11327

FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition

Sicheng Mo, Fangzhou Mu, Kuan Heng Lin et al.

CVPR 2024posterarXiv:2312.07536
#11328

How to Make Cross Encoder a Good Teacher for Efficient Image-Text Retrieval?

Yuxin Chen, Zongyang Ma, Ziqi Zhang et al.

CVPR 2024posterarXiv:2407.07479
#11329

Revisiting Adversarial Training at Scale

Zeyu Wang, Xianhang li, Hongru Zhu et al.

CVPR 2024posterarXiv:2401.04727
#11330

G-FARS: Gradient-Field-based Auto-Regressive Sampling for 3D Part Grouping

Junfeng Cheng, Tania Stathaki

CVPR 2024posterarXiv:2405.06828
#11331

Make Pixels Dance: High-Dynamic Video Generation

Yan Zeng, Guoqiang Wei, Jiani Zheng et al.

CVPR 2024posterarXiv:2311.10982
#11332

Masked AutoDecoder is Effective Multi-Task Vision Generalist

Han Qiu, Jiaxing Huang, Peng Gao et al.

CVPR 2024posterarXiv:2403.07692
#11333

Generative Multi-modal Models are Good Class Incremental Learners

Xusheng Cao, Haori Lu, Linlan Huang et al.

CVPR 2024poster
#11334

Deciphering ‘What’ and ‘Where’ Visual Pathways from Spectral Clustering of Layer-Distributed Neural Representations

Xiao Zhang, David Yunis, Michael Maire

CVPR 2024highlightarXiv:2312.06716
#11335

LLaMA-Excitor: General Instruction Tuning via Indirect Feature Interaction

Bo Zou, Chao Yang, Yu Qiao et al.

CVPR 2024posterarXiv:2404.00913
#11336

EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Language Models

Sijie Cheng, Zhicheng Guo, Jingwen Wu et al.

CVPR 2024highlightarXiv:2311.15596
#11337

Probabilistic Speech-Driven 3D Facial Motion Synthesis: New Benchmarks Methods and Applications

Karren Yang, Anurag Ranjan, Jen-Hao Rick Chang et al.

CVPR 2024poster
#11338

From Feature to Gaze: A Generalizable Replacement of Linear Layer for Gaze Estimation

Yiwei Bao, Feng Lu

CVPR 2024highlight
#11339

NC-SDF: Enhancing Indoor Scene Reconstruction Using Neural SDFs with View-Dependent Normal Compensation

Ziyi Chen, Xiaolong Wu, Yu Zhang

CVPR 2024posterarXiv:2405.00340
#11340

Language Models as Black-Box Optimizers for Vision-Language Models

Shihong Liu, Samuel Yu, Zhiqiu Lin et al.

CVPR 2024posterarXiv:2309.05950
#11341

Transferable Structural Sparse Adversarial Attack Via Exact Group Sparsity Training

Di Ming, Peng Ren, Yunlong Wang et al.

CVPR 2024poster
#11342

Holistic Autonomous Driving Understanding by Bird’s-Eye-View Injected Multi-Modal Large Models

Xinpeng Ding, Jianhua Han, Hang Xu et al.

CVPR 2024posterarXiv:2401.00988
#11343

ASH: Animatable Gaussian Splats for Efficient and Photoreal Human Rendering

Haokai Pang, Heming Zhu, Adam Kortylewski et al.

CVPR 2024posterarXiv:2312.05941
#11344

Equivariant Plug-and-Play Image Reconstruction

Matthieu Terris, Thomas Moreau, Nelly Pustelnik et al.

CVPR 2024posterarXiv:2312.01831
#11345

DiffusionAvatars: Deferred Diffusion for High-fidelity 3D Head Avatars

Tobias Kirschstein, Simon Giebenhain, Matthias Nießner

CVPR 2024posterarXiv:2311.18635
#11346

Learning Background Prompts to Discover Implicit Knowledge for Open Vocabulary Object Detection

Jiaming Li, Jiacheng Zhang, Jichang Li et al.

CVPR 2024posterarXiv:2406.00510
#11347

Addressing Background Context Bias in Few-Shot Segmentation through Iterative Modulation

Lanyun Zhu, Tianrun Chen, Jianxiong Yin et al.

CVPR 2024poster
#11348

OVFoodSeg: Elevating Open-Vocabulary Food Image Segmentation via Image-Informed Textual Representation

Xiongwei Wu, Sicheng Yu, Ee-Peng Lim et al.

CVPR 2024posterarXiv:2404.01409
#11349

AVFF: Audio-Visual Feature Fusion for Video Deepfake Detection

Trevine Oorloff, Surya Koppisetti, Nicolo Bonettini et al.

CVPR 2024posterarXiv:2406.02951
#11350

CaKDP: Category-aware Knowledge Distillation and Pruning Framework for Lightweight 3D Object Detection

Haonan Zhang, Longjun Liu, Yuqi Huang et al.

CVPR 2024poster
#11351

Friendly Sharpness-Aware Minimization

Tao Li, Pan Zhou, Zhengbao He et al.

CVPR 2024posterarXiv:2403.12350
#11352

CustomListener: Text-guided Responsive Interaction for User-friendly Listening Head Generation

Xi Liu, Ying Guo, Cheng Zhen et al.

CVPR 2024posterarXiv:2403.00274
#11353

Brain Decodes Deep Nets

Huzheng Yang, James Gee, Jianbo Shi

CVPR 2024highlightarXiv:2312.01280
#11354

MoSAR: Monocular Semi-Supervised Model for Avatar Reconstruction using Differentiable Shading

Abdallah Dib, Luiz Gustavo Hafemann, Emeline Got et al.

CVPR 2024posterarXiv:2312.13091
#11355

Point2CAD: Reverse Engineering CAD Models from 3D Point Clouds

Yujia Liu, Anton Obukhov, Jan D. Wegner et al.

CVPR 2024highlightarXiv:2312.04962
#11356

A Unified Framework for Microscopy Defocus Deblur with Multi-Pyramid Transformer and Contrastive Learning

Yuelin Zhang, Pengyu Zheng, Wanquan Yan et al.

CVPR 2024posterarXiv:2403.02611
#11357

Structure Matters: Tackling the Semantic Discrepancy in Diffusion Models for Image Inpainting

Haipeng Liu, Yang Wang, Biao Qian et al.

CVPR 2024posterarXiv:2403.19898
#11358

Misalignment-Robust Frequency Distribution Loss for Image Transformation

Zhangkai Ni, Juncheng Wu, Zian Wang et al.

CVPR 2024posterarXiv:2402.18192
#11359

WildlifeMapper: Aerial Image Analysis for Multi-Species Detection and Identification

Satish Kumar, Bowen Zhang, Chandrakanth Gudavalli et al.

CVPR 2024poster
#11360

SDSTrack: Self-Distillation Symmetric Adapter Learning for Multi-Modal Visual Object Tracking

Xiaojun Hou, Jiazheng Xing, Yijie Qian et al.

CVPR 2024posterarXiv:2403.16002
#11361

SchurVINS: Schur Complement-Based Lightweight Visual Inertial Navigation System

Yunfei Fan, Tianyu Zhao, Guidong Wang

CVPR 2024posterarXiv:2312.01616
#11362

MACE: Mass Concept Erasure in Diffusion Models

Shilin Lu, Zilan Wang, Leyang Li et al.

CVPR 2024posterarXiv:2403.06135
#11363

DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations

Tianhao Qi, Shancheng Fang, Yanze Wu et al.

CVPR 2024highlightarXiv:2403.06951
#11364

Learning Degradation-unaware Representation with Prior-based Latent Transformations for Blind Face Restoration

Lianxin Xie, csbingbing zheng, Wen Xue et al.

CVPR 2024poster
#11365

360DVD: Controllable Panorama Video Generation with 360-Degree Video Diffusion Model

Qian Wang, Weiqi Li, Chong Mou et al.

CVPR 2024posterarXiv:2401.06578
#11366

Alpha Invariance: On Inverse Scaling Between Distance and Volume Density in Neural Radiance Fields

Joshua Ahn, Haochen Wang, Raymond A. Yeh et al.

CVPR 2024posterarXiv:2404.02155
#11367

Countering Personalized Text-to-Image Generation with Influence Watermarks

Hanwen Liu, Zhicheng Sun, Yadong Mu

CVPR 2024poster
#11368

Teeth-SEG: An Efficient Instance Segmentation Framework for Orthodontic Treatment based on Multi-Scale Aggregation and Anthropic Prior Knowledge

Bo Zou, Shaofeng Wang, Hao Liu et al.

CVPR 2024poster
#11369

T-VSL: Text-Guided Visual Sound Source Localization in Mixtures

Tanvir Mahmud, Yapeng Tian, Diana Marculescu

CVPR 2024posterarXiv:2404.01751
#11370

ANIM: Accurate Neural Implicit Model for Human Reconstruction from a single RGB-D Image

Marco Pesavento, Yuanlu Xu, Nikolaos Sarafianos et al.

CVPR 2024posterarXiv:2403.10357
#11371

vid-TLDR: Training Free Token Merging for Light-weight Video Transformer

Joonmyung Choi, Sanghyeok Lee, Jaewon Chu et al.

CVPR 2024posterarXiv:2403.13347
#11372

Initialization Matters for Adversarial Transfer Learning

Andong Hua, Jindong Gu, Zhiyu Xue et al.

CVPR 2024posterarXiv:2312.05716
#11373

MindBridge: A Cross-Subject Brain Decoding Framework

Shizun Wang, Songhua Liu, Zhenxiong Tan et al.

CVPR 2024highlightarXiv:2404.07850
#11374

Loopy-SLAM: Dense Neural SLAM with Loop Closures

Lorenzo Liso, Erik Sandström, Vladimir Yugay et al.

CVPR 2024posterarXiv:2402.09944
#11375

Weakly-Supervised Audio-Visual Video Parsing with Prototype-based Pseudo-Labeling

Kranthi Kumar Rachavarapu, Kalyan Ramakrishnan, A. N. Rajagopalan

CVPR 2024poster
#11376

MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models

Sanjoy Chowdhury, Sayan Nag, Joseph K J et al.

CVPR 2024highlightarXiv:2406.04673
#11377

InstaGen: Enhancing Object Detection by Training on Synthetic Dataset

Chengjian Feng, Yujie Zhong, Zequn Jie et al.

CVPR 2024posterarXiv:2402.05937
#11378

Narrative Action Evaluation with Prompt-Guided Multimodal Interaction

Shiyi Zhang, Sule Bai, Guangyi Chen et al.

CVPR 2024posterarXiv:2404.14471
#11379

DeconfuseTrack: Dealing with Confusion for Multi-Object Tracking

Cheng Huang, Shoudong Han, Mengyu He et al.

CVPR 2024poster
#11380

ChatPose: Chatting about 3D Human Pose

Yao Feng, Jing Lin, Sai Kumar Dwivedi et al.

CVPR 2024posterarXiv:2311.18836
#11381

Modality-agnostic Domain Generalizable Medical Image Segmentation by Multi-Frequency in Multi-Scale Attention

Ju-Hyeon Nam, Nur Suriza Syazwany, Su Jung Kim et al.

CVPR 2024posterarXiv:2405.06284
#11382

NC-TTT: A Noise Constrastive Approach for Test-Time Training

David OSOWIECHI, Gustavo Vargas Hakim, Mehrdad Noori et al.

CVPR 2024highlight
#11383

JRDB-PanoTrack: An Open-world Panoptic Segmentation and Tracking Robotic Dataset in Crowded Human Environments

Duy Tho Le, Chenhui Gou, Stavya Datta et al.

CVPR 2024posterarXiv:2404.01686
#11384

Perturbing Attention Gives You More Bang for the Buck: Subtle Imaging Perturbations That Efficiently Fool Customized Diffusion Models

Jingyao Xu, Yuetong Lu, Yandong Li et al.

CVPR 2024posterarXiv:2404.15081
#11385

ESCAPE: Encoding Super-keypoints for Category-Agnostic Pose Estimation

Khoi D Nguyen, Chen Li, Gim Hee Lee

CVPR 2024poster
#11386

Minimal Perspective Autocalibration

Andrea Porfiri Dal Cin, Timothy Duff, Luca Magri et al.

CVPR 2024posterarXiv:2405.05605
#11387

ReGenNet: Towards Human Action-Reaction Synthesis

Liang Xu, Yizhou Zhou, Yichao Yan et al.

CVPR 2024posterarXiv:2403.11882
#11388

RGBD Objects in the Wild: Scaling Real-World 3D Object Learning from RGB-D Videos

Hongchi Xia, Yang Fu, Sifei Liu et al.

CVPR 2024posterarXiv:2401.12592
#11389

Aligning and Prompting Everything All at Once for Universal Visual Perception

Yunhang Shen, Chaoyou Fu, Peixian Chen et al.

CVPR 2024posterarXiv:2312.02153
#11390

ZONE: Zero-Shot Instruction-Guided Local Editing

Shanglin Li, Bohan Zeng, Yutang Feng et al.

CVPR 2024posterarXiv:2312.16794
#11391

Closely Interactive Human Reconstruction with Proxemics and Physics-Guided Adaption

Buzhen Huang, Chen Li, Chongyang Xu et al.

CVPR 2024posterarXiv:2404.11291
#11392

Label Propagation for Zero-shot Classification with Vision-Language Models

Vladan Stojnić, Yannis Kalantidis, Giorgos Tolias

CVPR 2024posterarXiv:2404.04072
#11393

IQ-VFI: Implicit Quadratic Motion Estimation for Video Frame Interpolation

Mengshun Hu, Kui Jiang, Zhihang Zhong et al.

CVPR 2024poster
#11394

Part-aware Unified Representation of Language and Skeleton for Zero-shot Action Recognition

Anqi Zhu, Qiuhong Ke, Mingming Gong et al.

CVPR 2024posterarXiv:2406.13327
#11395

Diffusion-ES: Gradient-free Planning with Diffusion for Autonomous and Instruction-guided Driving

Brian Yang, Huangyuan Su, Nikolaos Gkanatsios et al.

CVPR 2024poster
#11396

Structured Model Probing: Empowering Efficient Transfer Learning by Structured Regularization

Zhi-Fan Wu, Chaojie Mao, Xue Wang et al.

CVPR 2024poster
#11397

CRKD: Enhanced Camera-Radar Object Detection with Cross-modality Knowledge Distillation

Lingjun Zhao, Jingyu Song, Katherine Skinner

CVPR 2024posterarXiv:2403.19104
#11398

Towards Understanding Cross and Self-Attention in Stable Diffusion for Text-Guided Image Editing

Bingyan Liu, Chengyu Wang, Tingfeng Cao et al.

CVPR 2024posterarXiv:2403.03431
#11399

TULIP: Transformer for Upsampling of LiDAR Point Clouds

Bin Yang, Patrick Pfreundschuh, Roland Siegwart et al.

CVPR 2024posterarXiv:2312.06733
#11400

Incremental Residual Concept Bottleneck Models

Chenming Shang, Shiji Zhou, Hengyuan Zhang et al.

CVPR 2024posterarXiv:2404.08978