Most Cited ICCV "in-distribution mapping" Papers
AM-Adapter: Appearance Matching Adapter for Exemplar-based Semantic Image Synthesis in-the-Wild
Siyoon Jin, Jisu Nam, Jiyoung Kim et al.
MultiModal Action Conditioned Video Simulation
Yichen Li, Antonio Torralba
Processing and acquisition traces in visual encoders: What does CLIP know about your camera?
Ryan Ramos, Vladan Stojnić, Giorgos Kordopatis-Zilos et al.
DiffIP: Representation Fingerprints for Robust IP Protection of Diffusion Models
Zhuoling Li, Haoxuan Qu, Jason Kuen et al.
SafetyDPO: Scalable Safety Alignment for Text-to-Image Generation
Runtao Liu, I Chen, Jindong Gu et al.
Toward Better Out-painting: Improving the Image Composition with Initialization Policy Model
Xuan Han, Yihao Zhao, Yanhao Ge et al.
Soft Local Completeness: Rethinking Completeness in XAI
Ziv Weiss Haddad, Oren Barkan, Yehonatan Elisha et al.
Dual-Temporal Exemplar Representation Network for Video Semantic Segmentation
Xiaolong Xu, Lei Zhang, Jiayi Li et al.
PBFG: A New Physically-Based Dataset and Removal of Lens Flares and Glares
Jie Zhu, Sungkil Lee
Correspondence as Video: Test-Time Adaption on SAM2 for Reference Segmentation in the Wild
Haoran Wang, Zekun Li, Jian Zhang et al.
An Information-Theoretic Regularizer for Lossy Neural Image Compression
ZHANG YINGWEN, Meng Wang, Xihua Sheng et al.
Knowledge-Guided Part Segmentation
Xuejian Gou, Fang Liu, Licheng Jiao et al.
Controllable Feature Whitening for Hyperparameter-Free Bias Mitigation
Yooshin Cho, Hanbyel Cho, Janghyeon Lee et al.
PersonaCraft: Personalized and Controllable Full-Body Multi-Human Scene Generation Using Occlusion-Aware 3D-Conditioned Diffusion
Gwanghyun Kim, Suh Jeon Jeon, Seunggyu Lee et al.
Co-Painter: Fine-Grained Controllable Image Stylization via Implicit Decoupling and Adaptive Injection
Bowen Fu, Wei Wei, Jiaqi Tang et al.
FusionPhys: A Flexible Framework for Fusing Complementary Sensing Modalities in Remote Physiological Measurement
Chenhang Ying, Huiyu Yang, Jieyi Ge et al.
FiVE-Bench: A Fine-grained Video Editing Benchmark for Evaluating Emerging Diffusion and Rectified Flow Models
Minghan LI, Chenxi Xie, Yichen Wu et al.
TeEFusion: Blending Text Embeddings to Distill Classifier-Free Guidance
Minghao Fu, Guo-Hua Wang, Xiaohao Chen et al.
Stroke2Sketch: Harnessing Stroke Attributes for Training-Free Sketch Generation
Rui Yang, Huining Li, Yiyi Long et al.
Power of Cooperative Supervision: Multiple Teachers Framework for Advanced 3D Semi-Supervised Object Detection
Jin-Hee Lee, Jae-keun Lee, Jeseok Kim et al.
Cross-Granularity Online Optimization with Masked Compensated Information for Learned Image Compression
Haowei Kuang, Wenhan Yang, Zongming Guo et al.
Reusing Computation in Text-to-Image Diffusion for Efficient Generation of Image Sets
Dale Decatur, Thibault Groueix, Wang Yifan et al.
ASGS: Single-Domain Generalizable Open-Set Object Detection via Adaptive Subgraph Searching
Yuxuan Yuan, Luyao Tang, Chaoqi Chen et al.
DADet: Safeguarding Image Conditional Diffusion Models against Adversarial and Backdoor Attacks via Diffusion Anomaly Detection
Hongwei Yu, Xinlong Ding, Jiawei Li et al.
From Linearity to Non-Linearity: How Masked Autoencoders Capture Spatial Correlations
Anthony Bisulco, Rahul Ramesh, Randall Balestriero et al.
DiGA3D: Coarse-to-Fine Diffusional Propagation of Geometry and Appearance for Versatile 3D Inpainting
Jingyi Pan, Dan Xu, Qiong Luo
LEGO-Maker: A Semantic-Driven Algorithm for Text-to-3D Generation
Yifei Zhang, Lei Chen
One-Step Specular Highlight Removal with Adapted Diffusion Models
Mahir Atmis, LEVENT KARACAN, Mehmet SARIGÜL
COVTrack: Continuous Open-Vocabulary Tracking via Adaptive Multi-Cue Fusion
Zekun Qian, Ruize Han, Zhixiang Wang et al.
Text2Outfit: Controllable Outfit Generation with Multimodal Language Models
Yuanhao Zhai, Yen-Liang Lin, Minxu Peng et al.
monoVLN: Bridging the Observation Gap between Monocular and Panoramic Vision and Language Navigation
Ren-Jie Lu, Yu Zhou, Hao Cheng et al.
MCID: Multi-aspect Copyright Infringement Detection for Generated Images
Chuanwei Huang, Zexi Jia, Hongyan Fei et al.
Beyond Perspective: Neural 360-Degree Video Compression
Andy Regensky, Marc Windsheimer, Fabian Brand et al.
Harnessing Input-Adaptive Inference for Efficient VLN
Dongwoo Kang, Akhil Perincherry, Zachary Coalson et al.
TextMaster: A Unified Framework for Realistic Text Editing via Glyph-Style Dual-Control
Zhenyu Yan, Jian Wang, Aoqiang Wang et al.
Rethinking the Upsampling Process in Light Field Super-Resolution with Spatial-Epipolar Implicit Image Function
Ruixuan Cong, Yu Wang, Mingyuan Zhao et al.
Performing Defocus Deblurring by Modeling its Formation Process
Zhengbo Zhang, Lin Geng Foo, Hossein Rahmani et al.
CasP: Improving Semi-Dense Feature Matching Pipeline Leveraging Cascaded Correspondence Priors for Guidance
Peiqi Chen, Lei Yu, Yi Wan et al.
Supervised Exploratory Learning for Long-Tailed Visual Recognition
Zhongquan Jian, Yanhao Chen, Wangyancheng Wangyancheng et al.
Vision-Language Interactive Relation Mining for Open-Vocabulary Scene Graph Generation
Yukuan Min, Muli Yang, Jinhao Zhang et al.
FontAnimate: High Quality Few-shot Font Generation via Animating Font Transfer Process
Bin Fu, Zixuan Wang, Kainan Yan et al.
Who Controls the Authorization? Invertible Networks for Copyright Protection in Text-to-Image Synthesis
Baoyue Hu, Yang Wei, Junhao Xiao et al.
OrderChain: Towards General Instruct-Tuning for Stimulating the Ordinal Understanding Ability of MLLM
Jinhong Wang, Shuo Tong, Jintai CHEN et al.
Unified Open-World Segmentation with Multi-Modal Prompts
Yang Liu, Yufei Yin, Chenchen Jing et al.
Toward Long-Tailed Online Anomaly Detection through Class-Agnostic Concepts
Chiao-An Yang, Kuan-Chuan Peng, Raymond A. Yeh
EventUPS: Uncalibrated Photometric Stereo Using an Event Camera
Jinxiu Liang, Bohan Yu, Siqi Yang et al.
More Reliable Pseudo-labels, Better Performance: A Generalized Approach to Single Positive Multi-label Learning
Luong Tran, Thieu Vo, Anh Nguyen et al.
Less is More: Empowering GUI Agent with Context-Aware Simplification
Gongwei Chen, Xurui Zhou, Rui Shao et al.
Pose-Star: Anatomy-Aware Editing for Open-World Fashion Images
Yuran Dong, Mang Ye
TRKT: Weakly Supervised Dynamic Scene Graph Generation with Temporal-enhanced Relation-aware Knowledge Transferring
Zhu Xu, Ting Lei, Zhimin Li et al.
DCHM: Depth-Consistent Human Modeling for Multiview Detection
Jiahao Ma, Tianyu Wang, Miaomiao Liu et al.
Adversarial Robustness of Discriminative Self-Supervised Learning in Vision
Ömer Veysel Çağatan, Ömer TAL, M. Emre Gursoy
Global and Local Entailment Learning for Natural World Imagery
Srikumar Sastry, Aayush Dhakal, Eric Xing et al.
Generalized Deep Multi-view Clustering via Causal Learning with Partially Aligned Cross-view Correspondence
Xihong Yang, Siwei Wang, Jiaqi Jin et al.
Active Perception Meets Rule-Guided RL: A Two-Phase Approach for Precise Object Navigation in Complex Environments
Liang Qin, Min Wang, Peiwei Li et al.
Zero-Shot Depth Aware Image Editing with Diffusion Models
Rishubh Parihar, Sachidanand VS, Venkatesh Babu Radhakrishnan
UNIS: A Unified Framework for Achieving Unbiased Neural Implicit Surfaces in Volume Rendering
Junkai Deng, Hanting Niu, Jiaze Li et al.
Split-and-Combine: Enhancing Style Augmentation for Single Domain Generalization
Zhen Zhang, Zhen Zhang, Qianlong Dang et al.
Continual Personalization for Diffusion Models
Yu-Chien Liao, Jr-Jen Chen, Chi-Pin Huang et al.
IntrinsicControlNet: Cross-distribution Image Generation with Real and Unreal
Jiayuan Lu, Rengan Xie, Zixuan Xie et al.
Anti-Tamper Protection for Unauthorized Individual Image Generation
Zelin Li, Ruohan Zong, Yifan Liu et al.
Dual Recursive Feedback on Generation and Appearance Latents for Pose-Robust Text-to-Image Diffusion
Jiwon Kim, Pureum Kim, SeonHwa Kim et al.
IQA-Adapter: Exploring Knowledge Transfer from Image Quality Assessment to Diffusion-based Generative Models
Khaled Abud, Sergey Lavrushkin, Alexey Kirillov et al.
Loss Functions for Predictor-based Neural Architecture Search
Han Ji, Yuqi Feng, Jiahao Fan et al.
Advancing Text-to-3D Generation with Linearized Lookahead Variational Score Distillation
Yu Lei, Bingde Liu, Qingsong Xie et al.
Language-Driven Multi-Label Zero-Shot Learning with Semantic Granularity
Shouwen Wang, Qian Wan, Junbin Gao et al.
LHM: Large Animatable Human Reconstruction Model for Single Image to 3D in Seconds
Lingteng Qiu, Xiaodong Gu, Peihao Li et al.
SynTag: Enhancing the Geometric Robustness of Inversion-based Generative Image Watermarking
Han Fang, Kejiang Chen, Zehua Ma et al.
Instruction-Grounded Visual Projectors for Continual Learning of Generative Vision-Language Models
Hyundong Jin, Hyung Jin Chang, Eunwoo Kim
Learning Separable Fine-Grained Representation via Dendrogram Construction from Coarse Labels for Fine-grained Visual Recognition
Guanghui Shi, Xuefeng Liang, Wenjie Li et al.
Domain-aware Category-level Geometry Learning Segmentation for 3D Point Clouds
Pei He, Lingling Li, Licheng Jiao et al.
Translation of Text Embedding via Delta Vector to Suppress Strongly Entangled Content in Text-to-Image Diffusion Models
Eunseo Koh, SeungHoo Hong, Tae-Young Kim et al.
GaussianReg: Rapid 2D/3D Registration for Emergency Surgery via Explicit 3D Modeling with Gaussian Primitives
Weihao Yu, Xiaoqing Guo, Xinyu Liu et al.
Hyper-Depth: Hypergraph-based Multi-Scale Representation Fusion for Monocular Depth Estimation
Lin Bie, Siqi Li, Yifan Feng et al.
ArgoTweak: Towards Self-Updating HD Maps through Structured Priors
Lena Wild, Rafael Valencia, Patric Jensfelt
Event-aided Dense and Continuous Point Tracking: Everywhere and Anytime
Zhexiong Wan, Jianqin Luo, Yuchao Dai et al.
Context-Aware Academic Emotion Dataset and Benchmark
Luming Zhao, Jingwen Xuan, Jiamin Lou et al.
FlowChef: Steering of Rectified Flow Models for Controlled Generations
Maitreya Patel, Song Wen, Dimitris Metaxas et al.
TPG-INR: Target Prior-Guided Implicit 3D CT Reconstruction for Enhanced Sparse-view Imaging
Qinglei Cao, Ziyao Tang, Xiaoqin Tang
LUSD: Localized Update Score Distillation for Text-Guided Image Editing
Worameth Chinchuthakun, Tossaporn Saengja, Nontawat Tritrong et al.
Unknown Text Learning for CLIP-based Few-Shot Open-set Recognition
Rui Ma, Qilong Wang, Bing Cao et al.
MaTe: Images Are All You Need for Material Transfer via Diffusion Transformer
Nisha Huang, Henglin Liu, Yizhou Lin et al.
Efficient Visual Place Recognition Through Multimodal Semantic Knowledge Integration
Sitao Zhang, Hongda Mao, Qingshuang Chen et al.
COME: Dual Structure-Semantic Learning with Collaborative MoE for Universal Lesion Detection Across Heterogeneous Ultrasound Datasets
Lingyu Chen, Yawen Zeng, Yue Wang et al.
NATRA: Noise-Agnostic Framework for Trajectory Prediction with Noisy Observations
Rongqing Li, Changsheng Li, Ruilin Lv et al.
MS3D: High-Quality 3D Generation via Multi-Scale Representation Modeling
Guan Luo, Jianfeng Zhang
Tree-NeRV: Efficient Non-Uniform Sampling for Neural Video Representation via Tree-Structured Feature Grids
Jiancheng Zhao, Yifan Zhan, Qingtian Zhu et al.
UniDxMD: Towards Unified Representation for Cross-Modal Unsupervised Domain Adaptation in 3D Semantic Segmentation
Zhengyin Liang, Hui Yin, Min Liang et al.
Hybrid Layout Control for Diffusion Transformer: Fewer Annotations, Superior Aesthetics
Keming Wu, Junwen Chen, Zhanhao Liang et al.
PLAN: Proactive Low-Rank Allocation for Continual Learning
XIEQUN WANG, Zhan Zhuang, Yu Zhang
Leveraging Spatial Invariance to Boost Adversarial Transferability
Zihan Zhou, LI LI, Yanli Ren et al.
CRAM: Large Scale Video Continual Learning with Bootstrapped Compression
Shivani Mall, Joao F. Henriques
Scalable Dual Fingerprinting for Hierarchical Attribution of Text-to-Image Models
Jianwei Fei, Yunshu Dai, Peipeng Yu et al.
FedXDS: Leveraging Model Attribution Methods to counteract Data Heterogeneity in Federated Learning
Maximilian Hoefler, Karsten Mueller, Wojciech Samek
Visual Textualization for Image Prompted Object Detection
Yongjian Wu, Yang Zhou, Jiya Saiyin et al.
Straighten Viscous Rectified Flow via Noise Optimization
Jimin Dai, Jiexi Yan, Jian Yang et al.
LLaVA-SP: Enhancing Visual Representation with Visual Spatial Tokens for MLLMs
Haoran Lou, Chunxiao Fan, Ziyan Liu et al.
UDC-VIT: A Real-World Video Dataset for Under-Display Cameras
Kyusu Ahn, JiSoo Kim, Sangik Lee et al.
Dual-Expert Consistency Model for Efficient and High-Quality Video Generation
Zhengyao Lyu, Chenyang Si, Tianlin Pan et al.
RogSplat: Robust Gaussian Splatting via Generative Priors
Hanyang Kong, Xingyi Yang, Xinchao Wang
Penalizing Boundary Activation for Object Completeness in Diffusion Models
Haoyang Xu, Tianhao Zhao, Sibei Yang et al.
Transformer-based Tooth Alignment Prediction with Occlusion and Collision Constraints
DongZhenXing DongZhenXing, Jiazhou Chen
Efficient Concertormer for Image Deblurring and Beyond
Pin-Hung Kuo, Jinshan Pan, Shao-Yi Chien et al.
One Object, Multiple Lies: A Benchmark for Cross-task Adversarial Attack on Unified Vision-Language Models
Jiale Zhao, XINYANG JIANG, Junyao Gao et al.
DeSPITE: Exploring Contrastive Deep Skeleton-Pointcloud-IMU-Text Embeddings for Advanced Point Cloud Human Activity Understanding
Thomas Kreutz, Max Mühlhäuser, Alejandro Sanchez Guinea
SD2Actor: Continuous State Decomposition via Diffusion Embeddings for Robotic Manipulation
lijiayi jiayi
MixANT: Observation-dependent Memory Propagation for Stochastic Dense Action Anticipation
Syed Talal Wasim, Hamid Suleman, Olga Zatsarynna et al.
Omegance: A Single Parameter for Various Granularities in Diffusion-Based Synthesis
Xinyu Hou, Zongsheng Yue, Xiaoming Li et al.
SEREP: Semantic Facial Expression Representation for Robust In-the-Wild Capture and Retargeting
Arthur Josi, Luiz Gustavo Hafemann, Abdallah Dib et al.
Scene Graph Guided Generation: Enable Accurate Relations Generation in Text-to-Image Models via Textural Rectification
Guibao SHEN, Luozhou Wang, Jiantao Lin et al.
ReMP-AD: Retrieval-enhanced Multi-modal Prompt Fusion for Few-Shot Industrial Visual Anomaly Detection
Hongchi Ma, Guanglei Yang, Debin Zhao et al.
GMMamba: Group Masking Mamba for Whole Slide Image Classification
Tingting Zheng, Hongxun Yao, Kui Jiang et al.
MistSense: Versatile Online Detection of Procedural and Execution Mistakes
Constantin Patsch, Yuankai Wu, Marsil Zakour et al.
Attention to Trajectory: Trajectory-Aware Open-Vocabulary Tracking
Yunhao Li, Yifan Jiao, Dan Meng et al.
RareCLIP: Rarity-aware Online Zero-shot Industrial Anomaly Detection
Jianfang He, Min Cao, Silong Peng et al.
Fine-Grained 3D Gaussian Head Avatars Modeling from Static Captures via Joint Reconstruction and Registration
Yuan Sun, Xuan Wang, Cong Wang et al.
Not All Degradations Are Equal: A Targeted Feature Denoising Framework for Generalizable Image Super-Resolution
Hongjun Wang, Jiyuan Chen, Zhengwei Yin et al.
Generic Event Boundary Detection via Denoising Diffusion
Jaejun Hwang, Dayoung Gong, Manjin Kim et al.
Temporal Rate Reduction Clustering for Human Motion Segmentation
Xianghan Meng, Zhengyu Tong, Zhiyuan Huang et al.
Hierarchy UGP: Hierarchy Unified Gaussian Primitive for Large-Scale Dynamic Scene Reconstruction
Hongyang Sun, Qinglin Yang, Jiawei Wang et al.
RobAVA: A Large-scale Dataset and Baseline Towards Video based Robotic Arm Action Understanding
Baoli Sun, Ning Wang, Xinzhu Ma et al.
Backdoor Mitigation by Distance-Driven Detoxification
Shaokui Wei, Jiayin Liu, Hongyuan Zha
Democratizing High-Fidelity Co-Speech Gesture Video Generation
Xu Yang, Shaoli Huang, Shenbo Xie et al.
π-AVAS: Can Physics-Integrated Audio-Visual Modeling Boost Neural Acoustic Synthesis?
Susan Liang, Chao Huang, Yolo Yunlong Tang et al.
HFD-Teacher: High-Frequency Depth Distillation from Depth Foundation Models for Enhanced Depth Completion
Zhiyuan Yang, Anqi Cheng, Haiyue Zhu et al.
Proxy-Bridged Game Transformer for Interactive Extreme Motion Prediction
Yanwen Fang, Wenqi Jia, Xu Cao et al.
Q-Norm: Robust Representation Learning via Quality-Adaptive Normalization
Lanning Zhang, Ying Zhou, Fei Gao et al.
Separation for Better Integration: Disentangling Edge and Motion in Event-based Deblurring
Yufei Zhu, Hao Chen, Yongjian Deng et al.
EVDM: Event-based Real-world Video Deblurring with Mamba
Zhijing Sun, Senyan Xu, Kean Liu et al.
LOMM: Latest Object Memory Management for Temporally Consistent Video Instance Segmentation
Seunghun Lee, Jiwan Seo, Minwoo Choi et al.
Diversity-Enhanced Distribution Alignment for Dataset Distillation
Hongcheng Li, Yucan Zhou, Xiaoyan Gu et al.
Height-Fidelity Dense Global Fusion for Multi-modal 3D Object Detection
Hanshi Wang, Jin Gao, Weiming Hu et al.
SMSTracker: Tri-path Score Mask Sigma Fusion for Multi-Modal Tracking
Sixian Chan, Zedong Li, Xiaoqin Zhang et al.
Two Losses, One Goal: Balancing Conflict Gradients for Semi-supervised Semantic Segmentation
Rui Sun, Huayu Mai, Wangkai Li et al.
MOERL: When Mixture-of-Experts Meet Reinforcement Learning for Adverse Weather Image Restoration
Tao Wang, Peiwen Xia, Bo Li et al.
CMB-ML: A Cosmic Microwave Background Dataset for the Oldest Possible Computer Vision Task
James Amato, Yunan Xie, Leonel Medina-Varela et al.
Adapt Foundational Segmentation Models with Heterogeneous Searching Space
Li Yi, Jie Hu, Songan Zhang et al.
Think Twice: Test-Time Reasoning for Robust CLIP Zero-Shot Classification
Shenyu Lu, Zhaoying Pan, Xiaoqian Wang
Continual Adaptation: Environment-Conditional Parameter Generation for Object Detection in Dynamic Scenarios
Deng Li, Aming WU, Yang Li et al.
Unfolding-Associative Encoder-Decoder Network with Progressive Alignment for Pansharpening
Shijie Fang, Hongping Gan
FedAGC: Federated Continual Learning with Asymmetric Gradient Correction
Chengchao Zhang, Fanhua Shang, Hongying Liu et al.
Privacy-centric Deep Motion Retargeting for Anonymization of Skeleton-Based Motion Visualization
Thomas Carr, Depeng Xu, Shuhan Yuan et al.
MAVFlow: Preserving Paralinguistic Elements with Conditional Flow Matching for Zero-Shot AV2AV Multilingual Translation
Sungwoo Cho, Jeongsoo Choi, Sungnyun Kim et al.
EditCLIP: Representation Learning for Image Editing
Qian Wang, Aleksandar Cvejic, Abdelrahman Eldesokey et al.
Capturing head avatar with hand contacts from a monocular video
Haonan He, Yufeng Zheng, Jie Song
GaussianSpeech: Audio-Driven Personalized 3D Gaussian Avatars
Shivangi Aneja, Artem Sevastopolsky, Tobias Kirschstein et al.
From Sharp to Blur: Unsupervised Domain Adaptation for 2D Human Pose Estimation Under Extreme Motion Blur Using Event Cameras
Youngho Kim, Hoonhee Cho, Kuk-Jin Yoon
Allowing Oscillation Quantization: Overcoming Solution Space Limitation in Low Bit-Width Quantization
Weiying Xie, Zihan Meng, Jitao Ma et al.
MorphoGen: Efficient Unconditional Generation of Long-Range Projection Neuronal Morphology via a Global-to-Local Framework
Tianfang Zhu, Hongyang Zhou, Anan LI
AdaDCP: Learning an Adapter with Discrete Cosine Prior for Clear-to-Adverse Domain Generalization
Qi Bi, Yixian Shen, Jingjun Yi et al.
Less Static, More Private: Towards Transferable Privacy-Preserving Action Recognition by Generative Decoupled Learning
Zhi-Wei Xia, Kun-Yu Lin, Yuan-Ming Li et al.
MemDistill: Distilling LiDAR Knowledge into Memory for Camera-Only 3D Object Detection
Donghyeon Kwon, Youngseok Yoon, Hyeongseok Son et al.
Image Intrinsic Scale Assessment: Bridging the Gap Between Quality and Resolution
Vlad Hosu, Lorenzo Agnolucci, Daisuke Iso et al.
SDFormer: Vision-based 3D Semantic Scene Completion via SAM-assisted Dual-channel Voxel Transformer
Yujie Xue, Huilong Pi, Jiapeng Zhang et al.
Joint Learning of Pose Regression and Denoising Diffusion with Score Scaling Sampling for Category-level 6D Pose Estimation
Seunghyun Lee, Tae-Kyun Kim
TopoTTA: Topology-Enhanced Test-Time Adaptation for Tubular Structure Segmentation
Jiale Zhou, Wenhan Wang, Shikun Li et al.
Neuromanifold-Regularized KANs for Shape-fair Feature Representations
Mazlum Arslan, Weihong Guo, Shuo Li
Highlight What You Want: Weakly-Supervised Instance-Level Controllable Infrared-Visible Image Fusion
Zeyu Wang, Jizheng Zhang, Haiyu Song et al.
GestureHYDRA: Semantic Co-speech Gesture Synthesis via Hybrid Modality Diffusion Transformer and Cascaded-Synchronized Retrieval-Augmented Generation
Quanwei Yang, Luying Huang, Kaisiyuan Wang et al.
Learning A Unified Template for Gait Recognition
Panjian Huang, Saihui Hou, Junzhou Huang et al.
ZFusion: Efficient Deep Compositional Zero-shot Learning for Blind Image Super-Resolution with Generative Diffusion Prior
Alireza Esmaeilzehi, Hossein Zaredar, Yapeng Tian et al.
DeFSS: Image-to-Mask Denoising Learning for Few-shot Segmentation
Zishu Qin, Junhao Xu, Weifeng Ge
FlowDPS: Flow-Driven Posterior Sampling for Inverse Problems
Jeongsol Kim, Bryan Sangwoo Kim, Jong Ye
HADES: Human Avatar with Dynamic Explicit Hair Strands
Zhanfeng Liao, Hanzhang Tu, Cheng Peng et al.
TAD-E2E: A Large-scale End-to-end Autonomous Driving Dataset
Chang Liu, mingxuzhu mingxuzhu, Zheyuan Zhang et al.
Intra-modal and Cross-modal Synchronization for Audio-visual Deepfake Detection and Temporal Localization
Ashutosh Anshul, Shreyas Gopal, Deepu Rajan et al.
Cooperative Pseudo Labeling for Unsupervised Federated Classification
Kuangpu Guo, Lijun Sheng, Yongcan Yu et al.
NAPPure: Adversarial Purification for Robust Image Classification under Non-Additive Perturbations
Junjie Nan, Jianing Li, Wei Chen et al.
Photolithography Overlay Map Generation with Implicit Knowledge Distillation Diffusion Transformer
YuanFu Yang, Hsiu-Hui Hsiao
VideoSetDiff: Identifying and Reasoning Similarities and Differences in Similar Videos
YUE QIU, Yanjun Sun, Takuma Yagi et al.
What's Making That Sound Right Now? Video-centric Audio-Visual Localization
Hahyeon Choi, Junhoo Lee, Nojun Kwak
MissRAG: Addressing the Missing Modality Challenge in Multimodal Large Language Models
Vittorio Pipoli, Alessia Saporita, Federico Bolelli et al.
Scaling Action Detection: AdaTAD++ with Transformer-Enhanced Temporal-Spatial Adaptation
Tanay Agrawal, Abid Ali, Antitza Dantcheva et al.
VehicleMAE: View-asymmetry Mutual Learning for Vehicle Re-identification Pre-training via Masked AutoEncoders
Qi Wang, Zeyu Zhang, Dong Wang et al.
GDKVM: Echocardiography Video Segmentation via Spatiotemporal Key-Value Memory with Gated Delta Rule
Rui Wang, Yimu Sun, Jingxing Guo et al.
MagicCity: Geometry-Aware 3D City Generation from Satellite Imagery with Multi-View Consistency
Xingbo YAO, Xuanmin Wang, Hao WU et al.
Enhancing Adversarial Transferability by Balancing Exploration and Exploitation with Gradient-Guided Sampling
Zenghao Niu, Weicheng Xie, Siyang Song et al.
Multi-scenario Overlapping Text Segmentation with Depth Awareness
Yang Liu, Xudong Xie, Yuliang Liu et al.
Factorized Learning for Temporally Grounded Video-Language Models
Wenzheng Zeng, Difei Gao, Mike Zheng Shou et al.
MinCD-PnP: Learning 2D-3D Correspondences with Approximate Blind PnP
Pei An, Jiaqi Yang, Muyao Peng et al.
FullDiT: Video Generative Foundation Models with Multimodal Control via Full Attention
Xuan Ju, Weicai Ye, Quande Liu et al.
SC-Lane: Slope-aware and Consistent Road Height Estimation Framework for 3D Lane Detection
Chaesong Park, Eunbin Seo, Jihyeon Hwang et al.
TimeBooth: Disentangled Facial Invariant Representation for Diverse and Personalized Face Aging
Zepeng Su, Zhulin Liu, Zongyan Zhang et al.
Switch-a-View: View Selection Learned from Unlabeled In-the-wild Videos
Sagnik Majumder, Tushar Nagarajan, Ziad Al-Halah et al.
ConceptSplit: Decoupled Multi-Concept Personalization of Diffusion Models via Token-wise Adaptation and Attention Disentanglement
Habin Lim, Youngseob Won, Juwon Seo et al.
DISTIL: Data-Free Inversion of Suspicious Trojan Inputs via Latent Diffusion
Hossein Mirzaei, Zeinab Taghavi, Sepehr Rezaee et al.
Backdoor Defense via Enhanced Splitting and Trap Isolation
Hongrui Yu, Lu Qi, Wanyu Lin et al.
Learning Hierarchical Line Buffer for Image Processing
Jiacheng Li, Feiran Li, Daisuke Iso
Task-Aware Prompt Gradient Projection for Parameter-Efficient Tuning Federated Class-Incremental Learning
Hualong Ke, Yachao Zhang, Jiangming Shi et al.
Robust Adverse Weather Removal via Spectral-based Spatial Grouping
Yuhwan Jeong, Yunseo Yang, Youngho Yoon et al.
ART: Adaptive Relation Tuning for Generalized Relation Prediction
Gopika Sudhakaran, Hikaru Shindo, Patrick Schramowski et al.
Humans as Checkerboards: Calibrating Camera Motion Scale for World-Coordinate Human Mesh Recovery
Fengyuan Yang, Kerui Gu, Ha Linh Nguyen et al.
Learning Efficient and Generalizable Human Representation with Human Gaussian Model
Yifan Liu, Shengjun Zhang, Chensheng Dai et al.
Overcoming Dual Drift for Continual Long-Tailed Visual Question Answering
Feifei Zhang, Zhihao Wang, Xi Zhang et al.
Event-guided HDR Reconstruction with Diffusion Priors
Yixin Yang, Jiawei Zhang, Yang Zhang et al.
GeoMan: Temporally Consistent Human Geometry Estimation using Image-to-Video Diffusion
Gwanghyun Kim, Xueting Li, Ye Yuan et al.
Fast Image Super-Resolution via Consistency Rectified Flow
Jiaqi Xu, Wenbo Li, Haoze Sun et al.
MBTI: Masked Blending Transformers with Implicit Positional Encoding for Frame-rate Agnostic Motion Estimation
Jungwoo Huh, Yeseung Park, Seongjean Kim et al.
χ: Symmetry Understanding of 3D Shapes via Chirality Disentanglement
Weikang Wang, Tobias Weißberg, Nafie El Amrani et al.
Dirichlet-Constrained Variational Codebook Learning for Temporally Coherent Video Face Restoration
Baoyou Chen, Ce Liu, Weihao Yuan et al.