Most Cited 2024 "cluster-directed mixed graphs" Papers
AETTA: Label-Free Accuracy Estimation for Test-Time Adaptation
Taeckyung Lee, Sorn Chottananurak, Taesik Gong et al.
A Simple Recipe for Language-guided Domain Generalized Segmentation
Mohammad Fahes, Tuan-Hung Vu, Andrei Bursuc et al.
An Edit Friendly DDPM Noise Space: Inversion and Manipulations
Inbar Huberman-Spiegelglas, Vladimir Kulikov, Tomer Michaeli
AdaShift: Learning Discriminative Self-Gated Neural Feature Activation With an Adaptive Shift Factor
Sudong Cai
PredToken: Predicting Unknown Tokens and Beyond with Coarse-to-Fine Iterative Decoding
Xuesong Nie, Haoyuan Jin, Yunfeng Yan et al.
Holistic Features are almost Sufficient for Text-to-Video Retrieval
Kaibin Tian, Ruixiang Zhao, Zijie Xin et al.
Causal Mode Multiplexer: A Novel Framework for Unbiased Multispectral Pedestrian Detection
Taeheon Kim, Sebin Shin, Youngjoon Yu et al.
Seeing the Unseen: Visual Common Sense for Semantic Placement
Ram Ramrakhya, Aniruddha Kembhavi, Dhruv Batra et al.
Diffuse, Attend, and Segment: Unsupervised Zero-Shot Segmentation using Stable Diffusion
Junjiao Tian, Lavisha Aggarwal, Andrea Colaco et al.
GOAT-Bench: A Benchmark for Multi-Modal Lifelong Navigation
Mukul Khanna, Ram Ramrakhya, Gunjan Chhablani et al.
WonderJourney: Going from Anywhere to Everywhere
Hong-Xing Yu, Haoyi Duan, Junhwa Hur et al.
CLIP-Driven Open-Vocabulary 3D Scene Graph Generation via Cross-Modality Contrastive Learning
Lianggangxu Chen, Xuejiao Wang, Jiale Lu et al.
Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation
Luca Barsellotti, Roberto Amoroso, Marcella Cornia et al.
HRVDA: High-Resolution Visual Document Assistant
Chaohu Liu, Kun Yin, Haoyu Cao et al.
A Dual-Augmentor Framework for Domain Generalization in 3D Human Pose Estimation
Qucheng Peng, Ce Zheng, Chen Chen
Building Bridges across Spatial and Temporal Resolutions: Reference-Based Super-Resolution via Change Priors and Conditional Diffusion Model
Runmin Dong, Shuai Yuan, Bin Luo et al.
Gaussian Shading: Provable Performance-Lossless Image Watermarking for Diffusion Models
Zijin Yang, Kai Zeng, Kejiang Chen et al.
Multimodal Sense-Informed Forecasting of 3D Human Motions
Zhenyu Lou, Qiongjie Cui, Haofan Wang et al.
Resolution Limit of Single-Photon LiDAR
Stanley H. Chan, Hashan K Weerasooriya, Weijian Zhang et al.
Correlation-aware Coarse-to-fine MLPs for Deformable Medical Image Registration
Mingyuan Meng, Dagan Feng, Lei Bi et al.
Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives
Kristen Grauman, Andrew Westbury, Lorenzo Torresani et al.
CAD-SIGNet: CAD Language Inference from Point Clouds using Layer-wise Sketch Instance Guided Attention
Mohammad Sadil Khan, Elona Dupont, Sk Aziz Ali et al.
LAA-Net: Localized Artifact Attention Network for Quality-Agnostic and Generalizable Deepfake Detection
Dat Nguyen, Nesryne Mejri, Inder Pal Singh et al.
The Devil is in the Details: StyleFeatureEditor for Detail-Rich StyleGAN Inversion and High Quality Image Editing
Denis Bobkov, Vadim Titov, Aibek Alanov et al.
Adaptive Random Feature Regularization on Fine-tuning Deep Neural Networks
Shin'ya Yamaguchi, Sekitoshi Kanai et al.
Self-Training Large Language Models for Improved Visual Program Synthesis With Visual Reinforcement
Zaid Khan, Vijay Kumar BG, Samuel Schulter et al.
Generating Enhanced Negatives for Training Language-Based Object Detectors
Shiyu Zhao, Long Zhao, Vijay Kumar BG et al.
Joint-Task Regularization for Partially Labeled Multi-Task Learning
Kento Nishi, Junsik Kim, Wanhua Li et al.
MRFP: Learning Generalizable Semantic Segmentation from Sim-2-Real with Multi-Resolution Feature Perturbation
Sumanth Udupa, Prajwal Gurunath, Aniruddh Sikdar et al.
Object Recognition as Next Token Prediction
Kaiyu Yue, Bor-Chun Chen, Jonas Geiping et al.
MuGE: Multiple Granularity Edge Detection
Caixia Zhou, Yaping Huang, Mengyang Pu et al.
Local-consistent Transformation Learning for Rotation-invariant Point Cloud Analysis
Yiyang Chen, Lunhao Duan, Shanshan Zhao et al.
Spacetime Gaussian Feature Splatting for Real-Time Dynamic View Synthesis
Zhan Li, Zhang Chen, Zhong Li et al.
LOTUS: Evasive and Resilient Backdoor Attacks through Sub-Partitioning
Siyuan Cheng, Guanhong Tao, Yingqi Liu et al.
The More You See in 2D, the More You Perceive in 3D
Xinyang Han, Zelin Gao, Angjoo Kanazawa et al.
What, When, and Where? Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions
Brian Chen, Nina Shvetsova, Andrew Rouditchenko et al.
A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions
Jack Urbanek, Florian Bordes, Pietro Astolfi et al.
Correlation-Decoupled Knowledge Distillation for Multimodal Sentiment Analysis with Incomplete Modalities
Mingcheng Li, Dingkang Yang, Xiao Zhao et al.
ES³: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations
Yuanhang Zhang, Shuang Yang, Shiguang Shan et al.
Depth-aware Test-Time Training for Zero-shot Video Object Segmentation
Weihuang Liu, Xi Shen, Haolun Li et al.
MSU-4S - The Michigan State University Four Seasons Dataset
Daniel Kent, Mohammed Alyaqoub, Xiaohu Lu et al.
An Interactive Navigation Method with Effect-oriented Affordance
Xiaohan Wang, Yuehu Liu, Xinhang Song et al.
Rapid 3D Model Generation with Intuitive 3D Input
Tianrun Chen, Chaotao Ding, Shangzhan Zhang et al.
Unsupervised Salient Instance Detection
Xin Tian, Ke Xu, Rynson W.H. Lau
Multi-modal Instruction Tuned LLMs with Fine-grained Visual Perception
Junwen He, Yifan Wang, Lijun Wang et al.
CoDi-2: In-Context Interleaved and Interactive Any-to-Any Generation
Zineng Tang, Ziyi Yang, Mahmoud Khademi et al.
PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation
Yuqi Wang, Yuntao Chen, Xingyu Liao et al.
AdaBM: On-the-Fly Adaptive Bit Mapping for Image Super-Resolution
Cheeun Hong, Kyoung Mu Lee
MULDE: Multiscale Log-Density Estimation via Denoising Score Matching for Video Anomaly Detection
Jakub Micorek, Horst Possegger, Dominik Narnhofer et al.
Instance-level Expert Knowledge and Aggregate Discriminative Attention for Radiology Report Generation
Shenshen Bu, Taiji Li, Zhiming Dai et al.
HyperSDFusion: Bridging Hierarchical Structures in Language and Geometry for Enhanced 3D Text2Shape Generation
Zhiying Leng, Tolga Birdal, Xiaohui Liang et al.
Just Add π! Pose Induced Video Transformers for Understanding Activities of Daily Living
Dominick Reilly, Srijan Das
PhysGaussian: Physics-Integrated 3D Gaussians for Generative Dynamics
Tianyi Xie, Zeshun Zong, Yuxing Qiu et al.
Viewpoint-Aware Visual Grounding in 3D Scenes
Xiangxi Shi, Zhonghua Wu, Stefan Lee
Long-Tail Class Incremental Learning via Independent Sub-prototype Construction
Xi Wang, Xu Yang, Jie Yin et al.
An Aggregation-Free Federated Learning for Tackling Data Heterogeneity
Yuan Wang, Huazhu Fu, Renuga Kanagavelu et al.
Infrared Adversarial Car Stickers
Xiaopei Zhu, Yuqiu Liu, Zhanhao Hu et al.
XFibrosis: Explicit Vessel-Fiber Modeling for Fibrosis Staging from Liver Pathology Images
Chong Yin, Siqi Liu, Fei Lyu et al.
Advancing Saliency Ranking with Human Fixations: Dataset, Models and Benchmarks
Bowen Deng, Siyang Song, Andrew French et al.
Implicit Event-RGBD Neural SLAM
Delin Qu, Chi Yan, Dong Wang et al.
Retraining-Free Model Quantization via One-Shot Weight-Coupling Learning
Chen Tang, Yuan Meng, Jiacheng Jiang et al.
DreamControl: Control-Based Text-to-3D Generation with 3D Self-Prior
Tianyu Huang, Yihan Zeng, Zhilu Zhang et al.
From Coarse to Fine-Grained Open-Set Recognition
Nico Lang, Vésteinn Snæbjarnarson, Elijah Cole et al.
Exploiting Inter-sample and Inter-feature Relations in Dataset Distillation
Wenxiao Deng, Wenbin Li, Tianyu Ding et al.
Discriminative Pattern Calibration Mechanism for Source-Free Domain Adaptation
Haifeng Xia, Siyu Xia, Zhengming Ding
RAM-Avatar: Real-time Photo-Realistic Avatar from Monocular Videos with Full-body Control
Xiang Deng, Zerong Zheng, Yuxiang Zhang et al.
PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding
Zhen Li, Mingdeng Cao, Xintao Wang et al.
Privacy-Preserving Face Recognition Using Trainable Feature Subtraction
Yuxi Mi, Zhizhou Zhong, Yuge Huang et al.
Arbitrary Motion Style Transfer with Multi-condition Motion Latent Diffusion Model
Wenfeng Song, Xingliang Jin, Shuai Li et al.
3DSFLabelling: Boosting 3D Scene Flow Estimation by Pseudo Auto-labelling
Chaokang Jiang, Guangming Wang, Jiuming Liu et al.
CPR-Coach: Recognizing Composite Error Actions based on Single-class Training
Shunli Wang, Shuaibing Wang, Dingkang Yang et al.
Restoration by Generation with Constrained Priors
Zheng Ding, Xuaner Zhang, Zhuowen Tu et al.
Unified Entropy Optimization for Open-Set Test-Time Adaptation
Zhengqing Gao, Xu-Yao Zhang, Cheng-Lin Liu
Poly Kernel Inception Network for Remote Sensing Detection
Xinhao Cai, Qiuxia Lai, Yuwei Wang et al.
Distraction is All You Need: Memory-Efficient Image Immunization against Diffusion-Based Image Editing
Ling Lo, Cheng Yeo, Hong-Han Shuai et al.
Parameter Efficient Fine-tuning via Cross Block Orchestration for Segment Anything Model
Zelin Peng, Zhengqin Xu, Zhilin Zeng et al.
MGMap: Mask-Guided Learning for Online Vectorized HD Map Construction
Xiaolu Liu, Song Wang, Wentong Li et al.
ViT-Lens: Towards Omni-modal Representations
Stan Weixian Lei, Yixiao Ge, Kun Yi et al.
Prompt-Driven Referring Image Segmentation with Instance Contrasting
Chao Shang, Zichen Song, Heqian Qiu et al.
CosmicMan: A Text-to-Image Foundation Model for Humans
Shikai Li, Jianglin Fu, Kaiyuan Liu et al.
MTMMC: A Large-Scale Real-World Multi-Modal Camera Tracking Benchmark
Sanghyun Woo, Kwanyong Park, Inkyu Shin et al.
Align and Aggregate: Compositional Reasoning with Video Alignment and Answer Aggregation for Video Question-Answering
Zhaohe Liao, Jiangtong Li, Li Niu et al.
Overload: Latency Attacks on Object Detection for Edge Devices
Erh-Chung Chen, Pin-Yu Chen, I-Hsin Chung et al.
Neural Exposure Fusion for High-Dynamic Range Object Detection
Emmanuel Onzon, Maximilian Bömer, Fahim Mannan et al.
Semantics, Distortion, and Style Matter: Towards Source-free UDA for Panoramic Segmentation
Xu Zheng, Pengyuan Zhou, Athanasios et al.
PaSCo: Urban 3D Panoptic Scene Completion with Uncertainty Awareness
Anh-Quan Cao, Angela Dai, Raoul de Charette
Evaluating Transferability in Retrieval Tasks: An Approach Using MMD and Kernel Methods
Mengyu Dai, Amir Hossein Raffiee, Aashish Jain et al.
ParameterNet: Parameters Are All You Need for Large-scale Visual Pretraining of Mobile Networks
Kai Han, Yunhe Wang, Jianyuan Guo et al.
Confronting Ambiguity in 6D Object Pose Estimation via Score-Based Diffusion on SE(3)
Tsu-Ching Hsiao, Hao-Wei Chen, Hsuan-Kung Yang et al.
Communication-Efficient Collaborative Perception via Information Filling with Codebook
Yue Hu, Juntong Peng, Sifei Liu et al.
QUADify: Extracting Meshes with Pixel-level Details and Materials from Images
Maximilian Frühauf, Hayko Riemenschneider, Markus Gross et al.
Enhancing Post-training Quantization Calibration through Contrastive Learning
Yuzhang Shang, Gaowen Liu, Ramana Kompella et al.
LASO: Language-guided Affordance Segmentation on 3D Object
Yicong Li, Na Zhao, Junbin Xiao et al.
Dispersed Structured Light for Hyperspectral 3D Imaging
Suhyun Shin, Seokjun Choi, Felix Heide et al.
DualAD: Disentangling the Dynamic and Static World for End-to-End Driving
Simon Doll, Niklas Hanselmann, Lukas Schneider et al.
Focus on Hiders: Exploring Hidden Threats for Enhancing Adversarial Training
Qian Li, Yuxiao Hu, Yinpeng Dong et al.
ColorPCR: Color Point Cloud Registration with Multi-Stage Geometric-Color Fusion
Juncheng Mu, Lin Bie, Shaoyi Du et al.
Any-Shift Prompting for Generalization over Distributions
Zehao Xiao, Jiayi Shen, Mohammad Mahdi Derakhshani et al.
Time-, Memory- and Parameter-Efficient Visual Adaptation
Otniel-Bogdan Mercea, Alexey Gritsenko, Cordelia Schmid et al.
Behind the Veil: Enhanced Indoor 3D Scene Reconstruction with Occluded Surfaces Completion
Su Sun, Cheng Zhao, Yuliang Guo et al.
Revisiting Counterfactual Problems in Referring Expression Comprehension
Zhihan Yu, Ruifan Li
Differentiable Point-based Inverse Rendering
Hoon-Gyu Chung, Seokjun Choi, Seung-Hwan Baek
VMINer: Versatile Multi-view Inverse Rendering with Near- and Far-field Light Sources
Fan Fei, Jiajun Tang, Ping Tan et al.
ActiveDC: Distribution Calibration for Active Finetuning
Wenshuai Xu, Zhenghui Hu, Yu Lu et al.
AUEditNet: Dual-Branch Facial Action Unit Intensity Manipulation with Implicit Disentanglement
Shiwei Jin, Zhen Wang, Lei Wang et al.
VecFusion: Vector Font Generation with Diffusion
Vikas Thamizharasan, Difan Liu, Shantanu Agarwal et al.
Generating Non-Stationary Textures using Self-Rectification
Yang Zhou, Rongjun Xiao, Dani Lischinski et al.
OOSTraj: Out-of-Sight Trajectory Prediction With Vision-Positioning Denoising
Haichao Zhang, Yi Xu, Hongsheng Lu et al.
CyberDemo: Augmenting Simulated Human Demonstration for Real-World Dexterous Manipulation
Jun Wang, Yuzhe Qin, Kaiming Kuang et al.
Video Harmonization with Triplet Spatio-Temporal Variation Patterns
Zonghui Guo, XinYu Han, Jie Zhang et al.
Rethinking Interactive Image Segmentation with Low Latency, High Quality, and Diverse Prompts
Qin Liu, Jaemin Cho, Mohit Bansal et al.
SeMoLi: What Moves Together Belongs Together
Jenny Seidenschwarz, Aljoša Ošep, Francesco Ferroni et al.
HINTED: Hard Instance Enhanced Detector with Mixed-Density Feature Fusion for Sparsely-Supervised 3D Object Detection
Qiming Xia, Wei Ye, Hai Wu et al.
NAYER: Noisy Layer Data Generation for Efficient and Effective Data-free Knowledge Distillation
Minh-Tuan Tran, Trung Le, Xuan-May Le et al.
Domain-Specific Block Selection and Paired-View Pseudo-Labeling for Online Test-Time Adaptation
Yeonguk Yu, Sungho Shin, Seunghyeok Back et al.
Revamping Federated Learning Security from a Defender's Perspective: A Unified Defense with Homomorphic Encrypted Data Space
Naveen Kumar Kummari, Reshmi Mitra, Krishna Mohan Chalavadi
LLMs are Good Sign Language Translators
Jia Gong, Lin Geng Foo, Yixuan He et al.
PanoRecon: Real-Time Panoptic 3D Reconstruction from Monocular Video
Dong Wu, Zike Yan, Hongbin Zha
TACO: Benchmarking Generalizable Bimanual Tool-ACtion-Object Understanding
Yun Liu, Haolin Yang, Xu Si et al.
Learning by Correction: Efficient Tuning Task for Zero-Shot Generative Vision-Language Reasoning
Rongjie Li, Yu Wu, Xuming He
Calibrating Multi-modal Representations: A Pursuit of Group Robustness without Annotations
Chenyu You, Yifei Min, Weicheng Dai et al.
Gaussian-Flow: 4D Reconstruction with Dynamic 3D Gaussian Particle
Youtian Lin, Zuozhuo Dai, Siyu Zhu et al.
Dysen-VDM: Empowering Dynamics-aware Text-to-Video Diffusion with LLMs
Hao Fei, Shengqiong Wu, Wei Ji et al.
Can Biases in ImageNet Models Explain Generalization?
Paul Gavrikov, Janis Keuper
HumMUSS: Human Motion Understanding using State Space Models
Arnab Mondal, Stefano Alletto, Denis Tome
Modeling Multimodal Social Interactions: New Challenges and Baselines with Densely Aligned Representations
Sangmin Lee, Bolin Lai, Fiona Ryan et al.
FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition
Sicheng Mo, Fangzhou Mu, Kuan Heng Lin et al.
How to Make Cross Encoder a Good Teacher for Efficient Image-Text Retrieval?
Yuxin Chen, Zongyang Ma, Ziqi Zhang et al.
Revisiting Adversarial Training at Scale
Zeyu Wang, Xianhang Li, Hongru Zhu et al.
G-FARS: Gradient-Field-based Auto-Regressive Sampling for 3D Part Grouping
Junfeng Cheng, Tania Stathaki
Make Pixels Dance: High-Dynamic Video Generation
Yan Zeng, Guoqiang Wei, Jiani Zheng et al.
Masked AutoDecoder is Effective Multi-Task Vision Generalist
Han Qiu, Jiaxing Huang, Peng Gao et al.
Generative Multi-modal Models are Good Class Incremental Learners
Xusheng Cao, Haori Lu, Linlan Huang et al.
Deciphering ‘What’ and ‘Where’ Visual Pathways from Spectral Clustering of Layer-Distributed Neural Representations
Xiao Zhang, David Yunis, Michael Maire
LLaMA-Excitor: General Instruction Tuning via Indirect Feature Interaction
Bo Zou, Chao Yang, Yu Qiao et al.
EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Language Models
Sijie Cheng, Zhicheng Guo, Jingwen Wu et al.
Probabilistic Speech-Driven 3D Facial Motion Synthesis: New Benchmarks, Methods, and Applications
Karren Yang, Anurag Ranjan, Jen-Hao Rick Chang et al.
From Feature to Gaze: A Generalizable Replacement of Linear Layer for Gaze Estimation
Yiwei Bao, Feng Lu
NC-SDF: Enhancing Indoor Scene Reconstruction Using Neural SDFs with View-Dependent Normal Compensation
Ziyi Chen, Xiaolong Wu, Yu Zhang
Language Models as Black-Box Optimizers for Vision-Language Models
Shihong Liu, Samuel Yu, Zhiqiu Lin et al.
Transferable Structural Sparse Adversarial Attack Via Exact Group Sparsity Training
Di Ming, Peng Ren, Yunlong Wang et al.
Holistic Autonomous Driving Understanding by Bird’s-Eye-View Injected Multi-Modal Large Models
Xinpeng Ding, Jianhua Han, Hang Xu et al.
ASH: Animatable Gaussian Splats for Efficient and Photoreal Human Rendering
Haokai Pang, Heming Zhu, Adam Kortylewski et al.
Equivariant Plug-and-Play Image Reconstruction
Matthieu Terris, Thomas Moreau, Nelly Pustelnik et al.
DiffusionAvatars: Deferred Diffusion for High-fidelity 3D Head Avatars
Tobias Kirschstein, Simon Giebenhain, Matthias Nießner
Learning Background Prompts to Discover Implicit Knowledge for Open Vocabulary Object Detection
Jiaming Li, Jiacheng Zhang, Jichang Li et al.
Addressing Background Context Bias in Few-Shot Segmentation through Iterative Modulation
Lanyun Zhu, Tianrun Chen, Jianxiong Yin et al.
OVFoodSeg: Elevating Open-Vocabulary Food Image Segmentation via Image-Informed Textual Representation
Xiongwei Wu, Sicheng Yu, Ee-Peng Lim et al.
AVFF: Audio-Visual Feature Fusion for Video Deepfake Detection
Trevine Oorloff, Surya Koppisetti, Nicolo Bonettini et al.
CaKDP: Category-aware Knowledge Distillation and Pruning Framework for Lightweight 3D Object Detection
Haonan Zhang, Longjun Liu, Yuqi Huang et al.
Friendly Sharpness-Aware Minimization
Tao Li, Pan Zhou, Zhengbao He et al.
CustomListener: Text-guided Responsive Interaction for User-friendly Listening Head Generation
Xi Liu, Ying Guo, Cheng Zhen et al.
Brain Decodes Deep Nets
Huzheng Yang, James Gee, Jianbo Shi
MoSAR: Monocular Semi-Supervised Model for Avatar Reconstruction using Differentiable Shading
Abdallah Dib, Luiz Gustavo Hafemann, Emeline Got et al.
Point2CAD: Reverse Engineering CAD Models from 3D Point Clouds
Yujia Liu, Anton Obukhov, Jan D. Wegner et al.
A Unified Framework for Microscopy Defocus Deblur with Multi-Pyramid Transformer and Contrastive Learning
Yuelin Zhang, Pengyu Zheng, Wanquan Yan et al.
Structure Matters: Tackling the Semantic Discrepancy in Diffusion Models for Image Inpainting
Haipeng Liu, Yang Wang, Biao Qian et al.
Misalignment-Robust Frequency Distribution Loss for Image Transformation
Zhangkai Ni, Juncheng Wu, Zian Wang et al.
WildlifeMapper: Aerial Image Analysis for Multi-Species Detection and Identification
Satish Kumar, Bowen Zhang, Chandrakanth Gudavalli et al.
SDSTrack: Self-Distillation Symmetric Adapter Learning for Multi-Modal Visual Object Tracking
Xiaojun Hou, Jiazheng Xing, Yijie Qian et al.
SchurVINS: Schur Complement-Based Lightweight Visual Inertial Navigation System
Yunfei Fan, Tianyu Zhao, Guidong Wang
MACE: Mass Concept Erasure in Diffusion Models
Shilin Lu, Zilan Wang, Leyang Li et al.
DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations
Tianhao Qi, Shancheng Fang, Yanze Wu et al.
Learning Degradation-unaware Representation with Prior-based Latent Transformations for Blind Face Restoration
Lianxin Xie, csbingbing zheng, Wen Xue et al.
360DVD: Controllable Panorama Video Generation with 360-Degree Video Diffusion Model
Qian Wang, Weiqi Li, Chong Mou et al.
Alpha Invariance: On Inverse Scaling Between Distance and Volume Density in Neural Radiance Fields
Joshua Ahn, Haochen Wang, Raymond A. Yeh et al.
Countering Personalized Text-to-Image Generation with Influence Watermarks
Hanwen Liu, Zhicheng Sun, Yadong Mu
Teeth-SEG: An Efficient Instance Segmentation Framework for Orthodontic Treatment based on Multi-Scale Aggregation and Anthropic Prior Knowledge
Bo Zou, Shaofeng Wang, Hao Liu et al.
T-VSL: Text-Guided Visual Sound Source Localization in Mixtures
Tanvir Mahmud, Yapeng Tian, Diana Marculescu
ANIM: Accurate Neural Implicit Model for Human Reconstruction from a single RGB-D Image
Marco Pesavento, Yuanlu Xu, Nikolaos Sarafianos et al.
vid-TLDR: Training Free Token Merging for Light-weight Video Transformer
Joonmyung Choi, Sanghyeok Lee, Jaewon Chu et al.
Initialization Matters for Adversarial Transfer Learning
Andong Hua, Jindong Gu, Zhiyu Xue et al.
MindBridge: A Cross-Subject Brain Decoding Framework
Shizun Wang, Songhua Liu, Zhenxiong Tan et al.
Loopy-SLAM: Dense Neural SLAM with Loop Closures
Lorenzo Liso, Erik Sandström, Vladimir Yugay et al.
Weakly-Supervised Audio-Visual Video Parsing with Prototype-based Pseudo-Labeling
Kranthi Kumar Rachavarapu, Kalyan Ramakrishnan, A. N. Rajagopalan
MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models
Sanjoy Chowdhury, Sayan Nag, Joseph K J et al.
InstaGen: Enhancing Object Detection by Training on Synthetic Dataset
Chengjian Feng, Yujie Zhong, Zequn Jie et al.
Narrative Action Evaluation with Prompt-Guided Multimodal Interaction
Shiyi Zhang, Sule Bai, Guangyi Chen et al.
DeconfuseTrack: Dealing with Confusion for Multi-Object Tracking
Cheng Huang, Shoudong Han, Mengyu He et al.
ChatPose: Chatting about 3D Human Pose
Yao Feng, Jing Lin, Sai Kumar Dwivedi et al.
Modality-agnostic Domain Generalizable Medical Image Segmentation by Multi-Frequency in Multi-Scale Attention
Ju-Hyeon Nam, Nur Suriza Syazwany, Su Jung Kim et al.
NC-TTT: A Noise Contrastive Approach for Test-Time Training
David Osowiechi, Gustavo Vargas Hakim, Mehrdad Noori et al.
JRDB-PanoTrack: An Open-world Panoptic Segmentation and Tracking Robotic Dataset in Crowded Human Environments
Duy Tho Le, Chenhui Gou, Stavya Datta et al.
Perturbing Attention Gives You More Bang for the Buck: Subtle Imaging Perturbations That Efficiently Fool Customized Diffusion Models
Jingyao Xu, Yuetong Lu, Yandong Li et al.
ESCAPE: Encoding Super-keypoints for Category-Agnostic Pose Estimation
Khoi D Nguyen, Chen Li, Gim Hee Lee
Minimal Perspective Autocalibration
Andrea Porfiri Dal Cin, Timothy Duff, Luca Magri et al.
ReGenNet: Towards Human Action-Reaction Synthesis
Liang Xu, Yizhou Zhou, Yichao Yan et al.
RGBD Objects in the Wild: Scaling Real-World 3D Object Learning from RGB-D Videos
Hongchi Xia, Yang Fu, Sifei Liu et al.
Aligning and Prompting Everything All at Once for Universal Visual Perception
Yunhang Shen, Chaoyou Fu, Peixian Chen et al.
ZONE: Zero-Shot Instruction-Guided Local Editing
Shanglin Li, Bohan Zeng, Yutang Feng et al.
Closely Interactive Human Reconstruction with Proxemics and Physics-Guided Adaption
Buzhen Huang, Chen Li, Chongyang Xu et al.
Label Propagation for Zero-shot Classification with Vision-Language Models
Vladan Stojnić, Yannis Kalantidis, Giorgos Tolias
IQ-VFI: Implicit Quadratic Motion Estimation for Video Frame Interpolation
Mengshun Hu, Kui Jiang, Zhihang Zhong et al.
Part-aware Unified Representation of Language and Skeleton for Zero-shot Action Recognition
Anqi Zhu, Qiuhong Ke, Mingming Gong et al.
Diffusion-ES: Gradient-free Planning with Diffusion for Autonomous and Instruction-guided Driving
Brian Yang, Huangyuan Su, Nikolaos Gkanatsios et al.
Structured Model Probing: Empowering Efficient Transfer Learning by Structured Regularization
Zhi-Fan Wu, Chaojie Mao, Xue Wang et al.
CRKD: Enhanced Camera-Radar Object Detection with Cross-modality Knowledge Distillation
Lingjun Zhao, Jingyu Song, Katherine Skinner
Towards Understanding Cross and Self-Attention in Stable Diffusion for Text-Guided Image Editing
Bingyan Liu, Chengyu Wang, Tingfeng Cao et al.
TULIP: Transformer for Upsampling of LiDAR Point Clouds
Bin Yang, Patrick Pfreundschuh, Roland Siegwart et al.
Incremental Residual Concept Bottleneck Models
Chenming Shang, Shiji Zhou, Hengyuan Zhang et al.