Most Cited 2024 "domain knowledge adaptation" Papers
Conference
A Multimodal Benchmark Dataset and Model for Crop Disease Diagnosis
Xiang Liu, Zhaoxiang Liu, Huan Hu et al.
Missing Modality Prediction for Unpaired Multimodal Learning via Joint Embedding of Unimodal Models
Taesup Kim, Donggeun Kim
Fast Diffusion-Based Counterfactuals for Shortcut Removal and Generation
Nina Weng, Paraskevas Pegios, Eike Petersen et al.
GroCo: Ground Constraint for Metric Self-Supervised Monocular Depth
Aurélien Cecille, Stefan Duffner, Franck Davoine et al.
EMIE-MAP: Large-Scale Road Surface Reconstruction Based on Explicit Mesh and Implicit Encoding
Wenhua Wu, Qi Wang, Guangming Wang et al.
HyperSpaceX: Radial and Angular Exploration of HyperSpherical Dimensions
Chiranjeev Chiranjeev, Muskan Dosi, Kartik Thakral et al.
Common Sense Reasoning for Deep Fake Detection
Yue Zhang, Ben Colman, Xiao Guo et al.
Tight and Efficient Upper Bound on Spectral Norm of Convolutional Layers
Ekaterina Grishina, Mikhail Gorbunov, Maxim Rakhuba
Deciphering the Role of Representation Disentanglement: Investigating Compositional Generalization in CLIP Models
Reza Abbasi, Mohammad Rohban, Mahdieh Soleymani Baghshah
ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling
Siming Yan, Min Bai, Weifeng Chen et al.
Learning Scalable Model Soup on a Single GPU: An Efficient Subspace Training Strategy
Tao Li, Weisen Jiang, Fanghui Liu et al.
Deep Companion Learning: Enhancing Generalization Through Historical Consistency
Ruizhao Zhu, Venkatesh Saligrama
Straightforward Layer-wise Pruning for More Efficient Visual Adaptation
Ruizi Han, Jinglei Tang
ABC Easy as 123: A Blind Counter for Exemplar-Free Multi-Class Class-agnostic Counting
Michael A Hobley, Victor Adrian Prisacariu
CrossScore: A Multi-View Approach to Image Evaluation and Scoring
Zirui Wang, Wenjing Bian, Victor Adrian Prisacariu
CPM: Class-conditional Prompting Machine for Audio-visual Segmentation
Yuanhong Chen, Chong Wang, Yuyuan Liu et al.
DiffClass: Diffusion-Based Class Incremental Learning
Zichong Meng, Jie Zhang, Changdi Yang et al.
Dual-Decoupling Learning and Metric-Adaptive Thresholding for Semi-Supervised Multi-Label Learning
Jiahao Xiao, Ming-Kun Xie, Heng-Bo Fan et al.
SEGIC: Unleashing the Emergent Correspondence for In-Context Segmentation
Lingchen Meng, Shiyi Lan, Hengduo Li et al.
DGE: Direct Gaussian 3D Editing by Consistent Multi-view Editing
Minghao Chen, Iro Laina, Andrea Vedaldi
Handling The Non-Smooth Challenge in Tensor SVD: A Multi-Objective Tensor Recovery Framework
Jingjing Zheng, Wanglong Lu, Wenzhe Wang et al.
3D Gaussian Parametric Head Model
Yuelang Xu, Lizhen Wang, Zerong Zheng et al.
Dynamic Neural Radiance Field From Defocused Monocular Video
Xianrui Luo, Huiqiang Sun, Juewen Peng et al.
4Diff: 3D-Aware Diffusion Model for Third-to-First Viewpoint Translation
Feng Cheng, Mi Luo, Huiyu Wang et al.
Realistic Human Motion Generation with Cross-Diffusion Models
Zeping Ren, Shaoli Huang, Xiu Li
UniTalker: Scaling up Audio-Driven 3D Facial Animation through A Unified Model
Xiangyu Fan, Jiaqi Li, Zhiqian Lin et al.
PartCraft: Crafting Creative Objects by Parts
Kam Woh Ng, Xiatian Zhu, Yi-Zhe Song et al.
AnyControl: Create Your Artwork with Versatile Control on Text-to-Image Generation
Sun Yanan, Yanchen Liu, Yinhao Tang et al.
MERLiN: Single-Shot Material Estimation and Relighting for Photometric Stereo
Ashish Tiwari, Satoshi Ikehata, Shanmuganathan Raman
Data Overfitting for On-Device Super-Resolution with Dynamic Algorithm and Compiler Co-Design
Li, Zhihao Shu, Jie Ji et al.
BI-MDRG: Bridging Image History in Multimodal Dialogue Response Generation
Hee Suk Yoon, Eunseop Yoon, Joshua Tian Jin Tee et al.
PRET: Planning with Directed Fidelity Trajectory for Vision and Language Navigation
Renjie Lu, Jing-Ke Meng, Weishi Zheng
Rethinking Few-shot Class-incremental Learning: Learning from Yourself
Yu-Ming Tang, Yi-Xing Peng, Jing-Ke Meng et al.
Long-CLIP: Unlocking the Long-Text Capability of CLIP
Beichen Zhang, Pan Zhang, Xiaoyi Dong et al.
RoGUENeRF: A Robust Geometry-Consistent Universal Enhancer for NeRF
Sibi Catley-Chandar, Richard Shaw, Greg Slabaugh et al.
FuseTeacher: Modality-fused Encoders are Strong Vision Supervisors
Chen-Wei Xie, Siyang Sun, Liming Zhao et al.
MVDD: Multi-View Depth Diffusion Models
Zhen Wang, Qiangeng Xu, Feitong Tan et al.
Learning with Counterfactual Explanations for Radiology Report Generation
Mingjie Li, Haokun Lin, Liang Qiu et al.
Pseudo-Embedding for Generalized Few-Shot Point Cloud Segmentation
Chih-Jung Tsai, Hwann-Tzong Chen, Tyng-Luh Liu
Wavelet Convolutions for Large Receptive Fields
Shahaf Finder, Roy Amoyal, Eran Treister et al.
AdaLog: Post-Training Quantization for Vision Transformers with Adaptive Logarithm Quantizer
Zhuguanyu Wu, Jiaxin Chen, Hanwen Zhong et al.
Gradient-based Out-of-Distribution Detection
Taha Entesari, Sina Sharifi, Bardia Safaei et al.
Veil Privacy on Visual Data: Concealing Privacy for Humans, Unveiling for DNNs
Shuchao Pang, Ruhao Ma, Bing Li et al.
Simple Unsupervised Knowledge Distillation With Space Similarity
Aditya Singh, Haohan Wang
Learning Natural Consistency Representation for Face Forgery Video Detection
Daichi Zhang, Zihao Xiao, Shikun Li et al.
View-Consistent 3D Editing with Gaussian Splatting
Yuxuan Wang, Xuanyu Yi, Zike Wu et al.
HO-Gaussian: Hybrid Optimization of 3D Gaussian Splatting for Urban Scenes
Zhuopeng Li, Yilin Zhang, Chenming Wu et al.
Generating Human Interaction Motions in Scenes with Text Control
Hongwei Yi, Justus Thies, Michael J. Black et al.
Optimizing Illuminant Estimation in Dual-Exposure HDR Imaging
Mahmoud Afifi, Zhenhua Hu, Liang Liang
Instruction Tuning-free Visual Token Complement for Multimodal LLMs
Dongsheng Wang, Jiequan Cui, Miaoge Li et al.
Improving Point-based Crowd Counting and Localization Based on Auxiliary Point Guidance
I-Hsiang Chen, Wei-Ting Chen, Yu-Wei Liu et al.
Chat-Edit-3D: Interactive 3D Scene Editing via Text Prompts
Shuangkang Fang, Yufeng Wang, Yi-Hsuan Tsai et al.
HybridBooth: Hybrid Prompt Inversion for Efficient Subject-Driven Generation
Shanyan Guan, Yanhao Ge, Ying Tai et al.
SC4D: Sparse-Controlled Video-to-4D Generation and Motion Transfer
Zijie Wu, Chaohui Yu, Yanqin Jiang et al.
Revisit Self-supervision with Local Structure-from-Motion
Shengjie Zhu, Xiaoming Liu
On the Viability of Monocular Depth Pre-training for Semantic Segmentation
Dong Lao, Fengyu Yang, Daniel Wang et al.
Weakly-supervised Camera Localization by Ground-to-satellite Image Registration
Yujiao Shi, Hongdong Li, Akhil Perincherry et al.
GraphBEV: Towards Robust BEV Feature Alignment for Multi-Modal 3D Object Detection
Ziying Song, Lei Yang, Shaoqing Xu et al.
ProtoComp: Diverse Point Cloud Completion with Controllable Prototype
Xumin Yu, Yanbo Wang, Jie Zhou et al.
Topo4D: Topology-Preserving Gaussian Splatting for High-Fidelity 4D Head Capture
Xuanchen Li, Yuhao Cheng, Xingyu Ren et al.
EDTalk: Efficient Disentanglement for Emotional Talking Head Synthesis
Shuai Tan, Bin Ji, Mengxiao Bi et al.
Put Myself in Your Shoes: Lifting the Egocentric Perspective from Exocentric Videos
Mi Luo, Zihui Xue, Alex Dimakis et al.
LivePhoto: Real Image Animation with Text-guided Motion Control
Xi Chen, Zhiheng Liu, Mengting Chen et al.
CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion
Wendi Zheng, Jiayan Teng, Zhuoyi Yang et al.
OAPT: Offset-Aware Partition Transformer for Double JPEG Artifacts Removal
Qiao Mo, Yukang Ding, Jinhua Hao et al.
Seeing the Unseen: A Frequency Prompt Guided Transformer for Image Restoration
Shihao Zhou, Jinshan Pan, Jinglei Shi et al.
Animate Your Motion: Turning Still Images into Dynamic Videos
Mingxiao Li, Bo Wan, Marie-Francine Moens et al.
Spatial-Temporal Multi-level Association for Video Object Segmentation
Deshui Miao, Xin Li, Zhenyu He et al.
Context-Aware Action Recognition: Introducing a Comprehensive Dataset for Behavior Contrast
Tatsuya Sasaki, Yoshiki Ito, Satoshi Kondo
DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors
Jinbo Xing, Menghan Xia, Yong Zhang et al.
UniProcessor: A Text-induced Unified Low-level Image Processor
Huiyu Duan, Xiongkuo Min, Sijing Wu et al.
Bridging Synthetic and Real Worlds for Pre-training Scene Text Detectors
Tongkun Guan, Wei Shen, Xue Yang et al.
Learning Chain of Counterfactual Thought for Bias-Robust Vision-Language Reasoning
Yifeng Zhang, Ming Jiang, Qi Zhao
Let the Avatar Talk using Texts without Paired Training Data
Xiuzhe Wu, Yang-Tian Sun, Handi Chen et al.
Attention Beats Linear for Fast Implicit Neural Representation Generation
Shuyi Zhang, Ke Liu, Jingjun Gu et al.
Prompt-Based Test-Time Real Image Dehazing: A Novel Pipeline
Zixuan Chen, Zewei He, Ziqian Lu et al.
RCS-Prompt: Learning Prompt to Rearrange Class Space for Prompt-based Continual Learning
Longrong Yang, Hanbin Zhao, Yunlong Yu et al.
Dynamic Guidance Adversarial Distillation with Enhanced Teacher Knowledge
Hyejin Park, Dongbo Min
Gaussian Grouping: Segment and Edit Anything in 3D Scenes
Mingqiao Ye, Martin Danelljan, Fisher Yu et al.
3D Hand Sequence Recovery from Real Blurry Images and Event Stream
Joonkyu Park, Gyeongsik Moon, Weipeng Xu et al.
Segmentation-guided Layer-wise Image Vectorization with Gradient Fills
Hengyu Zhou, Hui Zhang, Bin Wang
HeadStudio: Text to Animatable Head Avatars with 3D Gaussian Splatting
Zhenglin Zhou, Fan Ma, Hehe Fan et al.
Explicitly Guided Information Interaction Network for Cross-modal Point Cloud Completion
Hang Xu, Chen Long, Wenxiao Zhang et al.
StructLDM: Structured Latent Diffusion for 3D Human Generation
Tao Hu, Fangzhou Hong, Ziwei Liu
High-Fidelity Modeling of Generalizable Wrinkle Deformation
Jingfan Guo, Jae Shin Yoon, Shunsuke Saito et al.
COMPOSE: Comprehensive Portrait Shadow Editing
Andrew Hou, Zhixin Shu, Xuaner Zhang et al.
EchoScene: Indoor Scene Generation via Information Echo over Scene Graph Diffusion
Guangyao Zhai, Evin Pınar Örnek, Dave Zhenyu Chen et al.
Learning Representations from Foundation Models for Domain Generalized Stereo Matching
Yongjian Zhang, Longguang Wang, Kunhong Li et al.
NeRF-XL: NeRF at Any Scale with Multi-GPU
Ruilong Li, Sanja Fidler, Angjoo Kanazawa et al.
3D Hand Pose Estimation in Everyday Egocentric Images
Aditya Prakash, Ruisen Tu, Matthew Chang et al.
Controllable Human-Object Interaction Synthesis
Jiaman Li, Alexander Clegg, Roozbeh Mottaghi et al.
Nymeria: A Massive Collection of Egocentric Multi-modal Human Motion in the Wild
Lingni Ma, Yuting Ye, Rowan Postyeni et al.
Approaching Outside: Scaling Unsupervised 3D Object Detection from 2D Scene
Ruiyang Zhang, Hu Zhang, Hang Yu et al.
Six-Point Method for Multi-Camera Systems with Reduced Solution Space
Banglei Guan, Ji Zhao, Laurent Kneip
Tuning-Free Image Customization with Image and Text Guidance
Pengzhi Li, Qiang Nie, Ying Chen et al.
MegaScenes: Scene-Level View Synthesis at Scale
Joseph Tung, Gene Chou, Ruojin Cai et al.
Mono-ViFI: A Unified Learning Framework for Self-supervised Single- and Multi-frame Monocular Depth Estimation
Jinfeng Liu, Lingtong Kong, Bo Li et al.
Idea2Img: Iterative Self-Refinement with GPT-4V for Automatic Image Design and Generation
Zhengyuan Yang, Jianfeng Wang, Linjie Li et al.
Preventing Catastrophic Forgetting through Memory Networks in Continuous Detection
Gaurav Bhatt, Leonid Sigal, James Ross
COIN-Matting: Confounder Intervention for Image Matting
Zhaohe Liao, Jiangtong Li, Jun Lan et al.
Score Distillation Sampling with Learned Manifold Corrective
Thiemo Alldieck, Nikos Kolotouros, Cristian Sminchisescu
Rethinking Weakly-supervised Video Temporal Grounding From a Game Perspective
Xiang Fang, Zeyu Xiong, Wanlong Fang et al.
AdaDiffSR: Adaptive Region-aware Dynamic Acceleration Diffusion Model for Real-World Image Super-Resolution
Yuanting Fan, Chengxu Liu, Nengzhong Yin et al.
Domesticating SAM for Breast Ultrasound Image Segmentation via Spatial-frequency Fusion and Uncertainty Correction
Wanting Zhang, Huisi Wu, Jing Qin
Interaction-centric Spatio-Temporal Context Reasoning for Multi-Person Video HOI Recognition
Yisong Wang, Nan Xi, Jingjing Meng et al.
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models
Yanwei Li, Chengyao Wang, Jiaya Jia
Agent3D-Zero: An Agent for Zero-shot 3D Understanding
Sha Zhang, Di Huang, Jiajun Deng et al.
Structured-NeRF: Hierarchical Scene Graph with Neural Representation
Zhide Zhong, Jiakai Cao, Songen Gu et al.
APL: Anchor-based Prompt Learning for One-stage Weakly Supervised Referring Expression Comprehension
Yaxin Luo, Jiayi Ji, Xiaofu Chen et al.
DeCo: Decoupled Human-Centered Diffusion Video Editing with Motion Consistency
Xiaojing Zhong, Xinyi Huang, Xiaofeng Yang et al.
MeshFeat: Multi-Resolution Features for Neural Fields on Meshes
Mihir Mahajan, Florian Hofherr, Daniel Cremers
TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias
Sanghyun Jo, Soohyun Ryu, Sungyub Kim et al.
DragAPart: Learning a Part-Level Motion Prior for Articulated Objects
Ruining Li, Chuanxia Zheng, Christian Rupprecht et al.
Learning to Unlearn for Robust Machine Unlearning
Mark Huang, Lin Geng Foo, Jun Liu
Diff-Tracker: Text-to-Image Diffusion Models are Unsupervised Trackers
Zhengbo Zhang, Li Xu, Duo Peng et al.
Echoes of the Past: Boosting Long-tail Recognition via Reflective Learning
Qihao Zhao, Yalun Dai, Shen Lin et al.
Lego: Learning to Disentangle and Invert Personalized Concepts Beyond Object Appearance in Text-to-Image Diffusion Models
Saman Motamed, Danda Pani Paudel, Luc Van Gool
Taming CLIP for Fine-grained and Structured Visual Understanding of Museum Exhibits
Ada-Astrid Balauca, Danda Paudel, Kristina Toutanova et al.
Visual Text Generation in the Wild
Yuanzhi Zhu, Jiawei Liu, Feiyu Gao et al.
A Unified Image Compression Method for Human Perception and Multiple Vision Tasks
Sha Guo, Sui Lin, Chen-Lin Zhang et al.
Learning Quantized Adaptive Conditions for Diffusion Models
Yuchen Liang, Yuchuan Tian, Lei Yu et al.
Learn to Optimize Denoising Scores: A Unified and Improved Diffusion Prior for 3D Generation
Xiaofeng Yang, Yiwen Chen, Cheng Chen et al.
Discovering Unwritten Visual Classifiers with Large Language Models
Mia Chiquier, Utkarsh Mall, Carl Vondrick
Parameter-Efficient and Memory-Efficient Tuning for Vision Transformer: A Disentangled Approach
Taolin Zhang, Jiawang Bai, Zhihe Lu et al.
On the Approximation Risk of Few-Shot Class-Incremental Learning
Xuan Wang, Zhong Ji, Xiyao Liu et al.
In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation
Dahyun Kang, Minsu Cho
URS-NeRF: Unordered Rolling Shutter Bundle Adjustment for Neural Radiance Fields
Bo Xu, Liu Ziao, Mengqi Guo et al.
Hierarchically Structured Neural Bones for Reconstructing Animatable Objects from Casual Videos
Subin Jeon, In Cho, Minsu Kim et al.
Protecting NeRFs' Copyright via Plug-And-Play Watermarking Base Model
Qi Song, Ziyuan Luo, Ka Chun Cheung et al.
MOD-UV: Learning Mobile Object Detectors from Unlabeled Videos
Yihong Sun, Bharath Hariharan
V-Trans4Style: Visual Transition Recommendation for Video Production Style Adaptation
Pooja Guhan, Tsung-Wei Huang, Guan-Ming Su et al.
WoVoGen: World Volume-aware Diffusion for Controllable Multi-camera Driving Scene Generation
Jiachen Lu, Ze Huang, Zeyu Yang et al.
Uncertainty-Driven Spectral Compressive Imaging with Spatial-Frequency Transformer
Lintao Peng, Siyu Xie, Liheng Bian
Weakly-Supervised Spatio-Temporal Video Grounding with Variational Cross-Modal Alignment
Yang Jin, Yadong Mu
Chronologically Accurate Retrieval for Temporal Grounding of Motion-Language Models
Kent Fujiwara, Mikihiro Tanaka, Qing Yu
SlimFlow: Training Smaller One-Step Diffusion Models with Rectified Flow
Yuanzhi Zhu, Xingchao Liu, Qiang Liu
Domain Reduction Strategy for Non-Line-of-Sight Imaging
Hyunbo Shim, In Cho, Daekyu Kwon et al.
Learning to Enhance Aperture Phasor Field for Non-Line-of-Sight Imaging
In Cho, Hyunbo Shim, Seon Joo Kim
FRI-Net: Floorplan Reconstruction via Room-wise Implicit Representation
Honghao Xu, Juzhan Xu, Zeyu Huang et al.
A Framework for Efficient Model Evaluation through Stratification, Sampling, and Estimation
Riccardo Fogliato, Pratik Patil, Mathew Monfort et al.
DECIDER: Leveraging Foundation Model Priors for Improved Model Failure Detection and Explanation
Rakshith Subramanyam, Kowshik Thopalli, Vivek Sivaraman Narayanaswamy et al.
ExMatch: Self-guided Exploitation for Semi-Supervised Learning with Scarce Labeled Samples
Noo-ri Kim, Jin-Seop Lee, Jee-Hyong Lee
CaesarNeRF: Calibrated Semantic Representation for Few-Shot Generalizable Neural Rendering
Haidong Zhu, Tianyu Ding, Tianyi Chen et al.
Open-Vocabulary RGB-Thermal Semantic Segmentation
Guoqiang Zhao, JunJie Huang, Xiaoyun Yan et al.
UNIT: Backdoor Mitigation via Automated Neural Distribution Tightening
Siyuan Cheng, Guangyu Shen, Kaiyuan Zhang et al.
Unsupervised Moving Object Segmentation with Atmospheric Turbulence
Dehao Qin, Ripon Saha, Woojeh Chung et al.
Modeling Label Correlations with Latent Context for Multi-Label Recognition
Zhao-Min Chen, Quan Cui, Ruoxi Deng et al.
Towards Reliable Advertising Image Generation Using Human Feedback
Zhenbang Du, Wei Feng, Haohan Wang et al.
Weak-to-Strong Compositional Learning from Generative Models for Language-based Object Detection
Kwanyong Park, Kuniaki Saito, Donghyun Kim
TurboEdit: Real-time text-based disentangled real image editing
Zongze Wu, Nicholas I Kolkin, Jonathan Brandt et al.
The Role of Masking for Efficient Supervised Knowledge Distillation of Vision Transformers
Seungwoo Son, Jegwang Ryu, Namhoon Lee et al.
Improving Vision and Language Concepts Understanding with Multimodal Counterfactual Samples
Chengen Lai, Shengli Song, Sitong Yan et al.
Functional Transform-Based Low-Rank Tensor Factorization for Multi-Dimensional Data Recovery
Jian-Li Wang, Xi-Le Zhao
Clean & Compact: Efficient Data-Free Backdoor Defense with Model Compactness
Huy Phan, Jinqi Xiao, Yang Sui et al.
Language-Assisted Skeleton Action Understanding for Skeleton-Based Temporal Action Segmentation
Haoyu Ji, Bowen Chen, Xinglong Xu et al.
A Geometric Distortion Immunized Deep Watermarking Framework with Robustness Generalizability
Linfeng Ma, Han Fang, Tianyi Wei et al.
VideoAgent: Long-form Video Understanding with Large Language Model as Agent
Xiaohan Wang, Yuhui Zhang, Orr Zohar et al.
MoE-DiffIR: Task-customized Diffusion Priors for Universal Compressed Image Restoration
Yulin Ren, Xin Li, Bingchen Li et al.
Adaptive Human Trajectory Prediction via Latent Corridors
Neerja Thakkar, Karttikeya Mangalam, Andrea Bajcsy et al.
Generalizable Facial Expression Recognition
Yuhang Zhang, Xiuqi Zheng, Chenyi Liang et al.
RS-NeRF: Neural Radiance Fields from Rolling Shutter Images
Muyao Niu, Tong Chen, Yifan Zhan et al.
MARs: Multi-view Attention Regularizations for Patch-based Feature Recognition of Space Terrain
Timothy Chase, Karthik Dantu
Grid-Attention: Enhancing Computational Efficiency of Large Vision Models without Fine-Tuning
Pengyu Li, Biao Wang, Tianchu Guo et al.
Enhanced Motion Forecasting with Visual Relation Reasoning
Sungjune Kim, Hadam Baek, Seunggwan Lee et al.
DSA: Discriminative Scatter Analysis for Early Smoke Segmentation
Lujian Yao, Haitao Zhao, Jingchao Peng et al.
DualBEV: Unifying Dual View Transformation with Probabilistic Correspondences
Peidong Li, Wancheng Shen, Qihao Huang et al.
Continuous SO(3) Equivariant Convolution for 3D Point Cloud Analysis
Jaein Kim, Hee Bin Yoo, Dong-Sig Han et al.
MedRAT: Unpaired Medical Report Generation via Auxiliary Tasks
Elad Hirsch, Gefen Dawidowicz, Ayellet Tal
Towards Unified Representation of Invariant-Specific Features in Missing Modality Face Anti-Spoofing
Guanghao Zheng, Yuchen Liu, Wenrui Dai et al.
OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web
Raghav Kapoor, Yash Parag Butala, Melisa A Russak et al.
Self-Supervised Underwater Caustics Removal and Descattering via Deep Monocular SLAM
Jonathan Sauder, Devis Tuia
SCAPE: A Simple and Strong Category-Agnostic Pose Estimator
Yujia Liang, Zixuan Ye, Wenze Liu et al.
Image-to-Lidar Relational Distillation for Autonomous Driving Data
Anas Mahmoud, Ali Harakeh, Steven Waslander
IGNORE: Information Gap-based False Negative Loss Rejection for Single Positive Multi-Label Learning
Gyeong Ryeol Song, Noo-ri Kim, Jin-Seop Lee et al.
CoLA: Conditional Dropout and Language-driven Robust Dual-modal Salient Object Detection
Shuang Hao, Chunlin Zhong, He Tang
Siamese Vision Transformers are Scalable Audio-visual Learners
Yan-Bo Lin, Gedas Bertasius
Visual Relationship Transformation
Xiaoyu Xu, Jiayan Qiu, Baosheng Yu et al.
Scene-aware Human Motion Forecasting via Mutual Distance Prediction
Chaoyue Xing, Wei Mao, Miaomiao Liu
Elysium: Exploring Object-level Perception in Videos through Semantic Integration Using MLLMs
Han Wang, Yanjie Wang, Ye Yongjie et al.
Rethinking Data Bias: Dataset Copyright Protection via Embedding Class-wise Hidden Bias
Jinhyeok Jang, ByungOk Han, Jaehong Kim et al.
Federated Learning with Local Openset Noisy Labels
Zonglin Di, Zhaowei Zhu, Xiaoxiao Li et al.
Match-Stereo-Videos: Bidirectional Alignment for Consistent Dynamic Stereo Matching
Junpeng Jing, Ye Mao, Krystian Mikolajczyk
PoseSOR: Human Pose Can Guide Our Attention
Huankang Guan, Rynson W.H. Lau
SpeedUpNet: A Plug-and-Play Adapter Network for Accelerating Text-to-Image Diffusion Models
Weilong Chai, Dandan Zheng, Jiajiong Cao et al.
Rethinking Tree-Ring Watermarking for Enhanced Multi-Key Identification
Hai Ci, Pei Yang, Yiren Song et al.
Dual-stage Hyperspectral Image Classification Model with Spectral Supertoken
Peifu Liu, Tingfa Xu, Jie Wang et al.
Optimal Transport of Diverse Unsupervised Tasks for Robust Learning from Noisy Few-Shot Data
Xiaofan Que, Qi Yu
LITA: Language Instructed Temporal-Localization Assistant
De-An Huang, Shijia Liao, Subhashree Radhakrishnan et al.
BurstM: Deep Burst Multi-scale SR using Fourier Space with Optical Flow
EungGu Kang, Byeonghun Lee, Sunghoon Im et al.
Unsupervised Dense Prediction using Differentiable Normalized Cuts
Yanbin Liu, Stephen Gould
uCAP: An Unsupervised Prompting Method for Vision-Language Models
A. Tuan Nguyen, Kai Sheng Tai, Bor-Chun Chen et al.
Flexible Distribution Alignment: Towards Long-tailed Semi-supervised Learning with Proper Calibration
Emanuel Sanchez Aimar, Nathaniel D Helgesen, Yonghao Xu et al.
Efficient Frequency-Domain Image Deraining with Contrastive Regularization
Ning Gao, Xingyu Jiang, Xiuhui Zhang et al.
Deep Cost Ray Fusion for Sparse Depth Video Completion
Jungeon Kim, Soongjin Kim, Jaesik Park et al.
SSL-Cleanse: Trojan Detection and Mitigation in Self-Supervised Learning
Mengxin Zheng, Jiaqi Xue, Zihao Wang et al.
Norma: A Noise Robust Memory-Augmented Framework for Whole Slide Image Classification
Yu Bai, Bo Zhang, Zheng Zhang et al.
Adaptive High-Frequency Transformer for Diverse Wildlife Re-Identification
Chenyue Li, Shuoyi Chen, Mang Ye
An accurate detection is not all you need to combat label noise in web-noisy datasets
Paul Albert, Kevin McGuinness, Eric Arazo et al.
Gated Temporal Diffusion for Stochastic Long-term Dense Anticipation
Olga Zatsarynna, Emad Bahrami, Yazan Abu Farha et al.
Causal Subgraphs and Information Bottlenecks: Redefining OOD Robustness in Graph Neural Networks
Weizhi An, Wenliang Zhong, Feng Jiang et al.