Most Cited 2025 "image-caption pairs" Papers
22,274 papers found • Page 100 of 112
DMF-Net: Image-Guided Point Cloud Completion with Dual-Channel Modality Fusion and Shape-Aware Upsampling Transformer
Aihua Mao, Yuxuan Tang, Jiangtao Huang et al.
Black-Box Test-Time Prompt Tuning for Vision-Language Models
Fan'an Meng, Chaoran Cui, Hongjun Dai et al.
Sp3ctralMamba: Physics-Driven Joint State Space Model for Hyperspectral Image Reconstruction
Ge Meng, Jingyan Tu, Jingjia Huang et al.
Qua2SeDiMo: Quantifiable Quantization Sensitivity of Diffusion Models
Keith G. Mills, Mohammad Salameh, Ruichen Chen et al.
Energy vs. Noise: Towards Robust Temporal Action Localization in Open-World
Chenyu Mu, Jiahua Li, Kun Wei et al.
SegFace: Face Segmentation of Long-Tail Classes
Kartik Narayan, Vibashan Vs, Vishal M. Patel
HiGDA: Hierarchical Graph of Nodes to Learn Local-to-Global Topology for Semi-Supervised Domain Adaptation
Ba Hung Ngo, Doanh C. Bui, Nhat-Tuong Do-Tran et al.
iMoT: Inertial Motion Transformer for Inertial Navigation
Son Minh Nguyen, Duc Viet Le, Paul Havinga
SPU-IMR: Self-supervised Arbitrary-scale Point Cloud Upsampling via Iterative Mask-recovery Network
Ziming Nie, Qiao Wu, Chenlei Lv et al.
Exploring Semantic Consistency and Style Diversity for Domain Generalized Semantic Segmentation
Hongwei Niu, Linhuang Xie, Jianghang Lin et al.
Locate Anything on Earth: Advancing Open-Vocabulary Object Detection for Remote Sensing Community
Jiancheng Pan, Yanxing Liu, Yuqian Fu et al.
Learning with Open-world Noisy Data via Class-independent Margin in Dual Representation Space
Linchao Pan, Can Gao, Jie Zhou et al.
DuSSS: Dual Semantic Similarity-Supervised Vision-Language Model for Semi-Supervised Medical Image Segmentation
Qingtao Pan, Wenhao Qiao, Jingjiao Lou et al.
Fair Training with Zero Inputs
Wenjie Pan, Jianqing Zhu, Huanqiang Zeng
Procedure Knowledge Decoupled Distillation Strategy for Procedure Planning in Instructional Videos
Xiaotian Pan, Zhaobo Qi, Xin Sun et al.
S2S2: Semantic Stacking for Robust Semantic Segmentation in Medical Imaging
Yimu Pan, Sitao Zhang, Alison D. Gernand et al.
Point Cloud Semantic Segmentation with Sparse and Inhomogeneous Annotations
Zhiyi Pan, Nan Zhang, Wei Gao et al.
Modular-Cam: Modular Dynamic Camera-view Video Generation with LLM
Zirui Pan, Xin Wang, Yipeng Zhang et al.
Partially Blinded Unlearning: Class Unlearning for Deep Networks from Bayesian Perspective
Subhodip Panda, Shashwat Sourav, Prathosh A.P.
Beyond Text: Fine-Grained Multi-Modal Fact Verification with Hypergraph Transformers
Hui Pang, Chaozhuo Li, Litian Zhang et al.
SeeDiff: Off-the-Shelf Seeded Mask Generation from Diffusion Models
Joon Hyun Park, Kumju Jo, Sungyong Baik
EfficientVMamba: Atrous Selective Scan for Light Weight Visual Mamba
Xiaohuan Pei, Tao Huang, Chang Xu
CDE-Learning: Camera Deviation Elimination Learning for Unsupervised Person Re-identification
Jinjia Peng, Songyu Zhang, Huibing Wang
Adaptive Dual-domain Learning for Underwater Image Enhancement
Lintao Peng, Liheng Bian
Boosting Image De-Raining via Central-Surrounding Synergistic Convolution
Long Peng, Yang Wang, Xin Di et al.
3D-aware Select, Expand, and Squeeze Token for Aerial Action Recognition
Luying Peng, Xiangbo Shu, Yazhou Yao et al.
OAMaskFlow: Occlusion-Aware Motion Mask for Scene Flow
Xiongfeng Peng, Zhihua Liu, Weiming Li et al.
HVDualformer: Histogram-Vision Dual Transformer for White Balance
Yan-Tsung Peng, Guan-Rong Chen
Semi-supervised 3D Semantic Scene Completion with 2D Vision Foundation Model Guidance
Duc-Hai Pham, Duc-Dung Nguyen, Anh Pham et al.
Leveraging Anatomical Consistency for Multi-Object Detection in Ultrasound Images via Source-free Unsupervised Domain Adaptation
Bin Pu, Xingguo Lv, Jiewen Yang et al.
Dive into Aerial Remote Sensing Underwater Depth Estimation with Hyperspectral Imagery
Jiahao Qi, Xingyue Liu, Chen Chen et al.
Unsupervised Domain Adaptive Person Search via Dual Self-Calibration
Linfeng Qi, Huibing Wang, Jiqing Zhang et al.
PhysDiff: Physiology-based Dynamicity Disentangled Diffusion Model for Remote Physiological Measurement
Wei Qian, Gaoji Su, Dan Guo et al.
Holistic Correction with Object Prototype for Video Object Segmentation
Shengye Qiao, Changqun Xia, Yanjie Liang et al.
Integrating Low-Level Visual Cues for Enhanced Unsupervised Semantic Segmentation
Yuhao Qing, Dan Zeng, Shaorong Xie et al.
PC-BEV: An Efficient Polar-Cartesian BEV Fusion Framework for LiDAR Semantic Segmentation
Shoumeng Qiu, Xinrun Li, Xiangyang Xue et al.
High-Fidelity Polarimetric Implicit 3D Reconstruction with View-Dependent Physical Representation
Yu Qiu, Sijia Wen, Hainan Zhang et al.
HSOD-BIT-V2: A Challenging Benchmark for Hyperspectral Salient Object Detection
Yuhao Qiu, Shuyan Bai, Tingfa Xu et al.
Universal Features Guided Zero-Shot Category-Level Object Pose Estimation
Wentian Qu, Chenyu Meng, Heng Li et al.
GHOST: Gaussian Hypothesis Open-Set Technique
Ryan Rabinowitz, Steve Cruz, Manuel Günther et al.
CDTR: Semantic Alignment for Video Moment Retrieval Using Concept Decomposition Transformer
Ran Ran, Jiwei Wei, Xiangyi Cai et al.
Improving Integrated Gradient-based Transferable Adversarial Examples by Refining the Integration Path
Yuchen Ren, Zhengyu Zhao, Chenhao Lin et al.
GenHMR: Generative Human Mesh Recovery
Muhammad Usama Saleem, Ekkasit Pinyoanuntapong, Pu Wang et al.
FunEditor: Achieving Complex Image Edits via Function Aggregation with Diffusion Models
Mohammadreza Samadi, Fred X. Han, Mohammad Salameh et al.
PVTree: Realistic and Controllable Palm Vein Generation for Recognition Tasks
Sheng Shang, Chenglong Zhao, Ruixin Zhang et al.
Video Summarization Using Denoising Diffusion Probabilistic Model
Zirui Shang, Yubo Zhu, Hongxi Li et al.
IMAGDressing-v1: Customizable Virtual Dressing
Fei Shen, Xin Jiang, Xin He et al.
In2NeCT: Inter-class and Intra-class Neural Collapse Tuning for Semantic Segmentation of Imbalanced Remote Sensing Images
Junao Shen, Qiyun Hu, Tian Feng et al.
Topology-Aware 3D Gaussian Splatting: Leveraging Persistent Homology for Optimized Structural Integrity
Tianqi Shen, Shaohua Liu, Jiaqi Feng et al.
Free-Moving Object Reconstruction and Pose Estimation with Virtual Camera
Haixin Shi, Yinlin Hu, Daniel Koguciuk et al.
Normal-NeRF: Ambiguity-Robust Normal Estimation for Highly Reflective Scenes
Ji Shi, Xianghua Ying, Ruohao Guo et al.
Neural Block Compression: Variable Bitrates Feature Blocks for Texture Representation
Rui Shi, Yishun Dou, Zhong Zheng et al.
HS-FPN: High Frequency and Spatial Perception FPN for Tiny Object Detection
Zican Shi, Jing Hu, Jie Ren et al.
SdalsNet: Self-Distilled Attention Localization and Shift Network for Unsupervised Camouflaged Object Detection
Peiyao Shou, Yixiu Liu, Wei Wang et al.
OGP-Net: Optical Guidance Meets Pixel-Level Contrastive Distillation for Robust Multi-Modal and Missing Modality Segmentation
Aniruddh Sikdar, Jayant Teotia, Suresh Sundaram
Fine-Grained Perception in Panoramic Scenes: A Novel Task, Dataset, and Method for Object Importance Ranking
Jia Song, Chenglizhao Chen, Xu Yu et al.
CtrlAvatar: Controllable Avatars Generation via Disentangled Invertible Networks
Wenfeng Song, Yang Ding, Fei Hou et al.
ERL-MPP: Evolutionary Reinforcement Learning with Multi-head Puzzle Perception for Solving Large-scale Jigsaw Puzzles of Eroded Gaps
Xingke Song, Xiaoying Yang, Chenglin Yao et al.
Temporal Coherent Object Flow for Multi-Object Tracking
Zikai Song, Run Luo, Lintao Ma et al.
Toward Improving Robustness and Accuracy in Unsupervised Domain Adaptation
Aishwarya Soni, Tanima Dutta
Hierarchical Vector Quantization for Unsupervised Action Segmentation
Federico Spurio, Emad Bahrami, Gianpiero Francesca et al.
Can We Get Rid of Handcrafted Feature Extractors? SparseViT: Nonsemantics-Centered, Parameter-Efficient Image Manipulation Localization Through Spare-Coding Transformer
Lei Su, Xiaochen Ma, Xuekang Zhu et al.
EigenSR: Eigenimage-Bridged Pre-Trained RGB Learners for Single Hyperspectral Image Super-Resolution
Xi Su, Xiangfei Shen, Mingyang Wan et al.
Dual-branch Graph Feature Learning for NLOS Imaging
Xiongfei Su, Tianyi Zhu, Lina Liu et al.
Explicit Relational Reasoning Network for Scene Text Detection
Yuchen Su, Zhineng Chen, Yongkun Du et al.
3D Annotation-Free Learning by Distilling 2D Open-Vocabulary Segmentation Models for Autonomous Driving
Boyi Sun, Yuhang Liu, Xingxia Wang et al.
NeuralFlix: A Simple While Effective Framework for Semantic Decoding of Videos from Non-invasive Brain Recordings
Jingyuan Sun, Mingxiao Li, Marie-Francine Moens
Guided and Variance-Corrected Fusion with One-shot Style Alignment for Large-Content Image Generation
Shoukun Sun, Min Xian, Tiankai Yao et al.
M2Flow: A Motion Information Fusion Framework for Enhanced Unsupervised Optical Flow Estimation in Autonomous Driving
Xunpei Sun, Gang Chen, Zuoxun Hou
Leveraging Large Vision-Language Model as User Intent-Aware Encoder for Composed Image Retrieval
Zelong Sun, Dong Jing, Guoxing Yang et al.
C2P-CLIP: Injecting Category Common Prompt in CLIP to Enhance Generalization in Deepfake Detection
Chuangchuang Tan, Renshuai Tao, Huan Liu et al.
Neighbor Does Matter: Density-Aware Contrastive Learning for Medical Semi-supervised Segmentation
Feilong Tang, Zhongxing Xu, Ming Hu et al.
MUSE: Mamba Is Efficient Multi-scale Learner for Text-video Retrieval
Haoran Tang, Meng Cao, Jinfa Huang et al.
BEV-TSR: Text-Scene Retrieval in BEV Space for Autonomous Driving
Tao Tang, Dafeng Wei, Zhengyu Jia et al.
More Text, Less Point: Towards 3D Data-Efficient Point-Language Understanding
Yuan Tang, Xu Han, Xianzhi Li et al.
RAGG: Retrieval-Augmented Grasp Generation Model
Zhenhua Tang, Bin Zhu, Yanbin Hao et al.
From Representation Space to Prognostic Insights: Whole Slide Image Generation with Hierarchical Diffusion Model for Survival Prediction
Zhihao Tang, Xi Zhang, Chaozhuo Li
3D²-Actor: Learning Pose-Conditioned 3D-Aware Denoiser for Realistic Gaussian Avatar Modeling
Zichen Tang, Hongyu Yang, Hanchen Zhang et al.
Stitch, Contrast, and Segment: Learning a Human Action Segmentation Model Using Trimmed Skeleton Videos
Haitao Tian, Pierre Payeur
Unsupervised Self-Prior Embedding Neural Representation for Iterative Sparse-View CT Reconstruction
Xuanyu Tian, Lixuan Chen, Qing Wu et al.
AI-generated Image Quality Assessment in Visual Communication
Yu Tian, Yixuan Li, Baoliang Chen et al.
G-VEval: A Versatile Metric for Evaluating Image and Video Captions Using GPT-4o
Tony Cheng Tong, Sirui He, Zhiwen Shao et al.
Memory-Augmented Re-Completion for 3D Semantic Scene Completion
Yu-Wen Tseng, Sheng-Ping Yang, Jhih-Ciang Wu et al.
TextToucher: Fine-Grained Text-to-Touch Generation
Jiahang Tu, Hao Fu, Fengyu Yang et al.
Watch Video, Catch Keyword: Context-aware Keyword Attention for Moment Retrieval and Highlight Detection
Sung Jin Um, Dongjin Kim, Sangmin Lee et al.
VOILA: Complexity-Aware Universal Segmentation of CT Images by Voxel Interacting with Language
Zishuo Wan, Yu Gao, Wanyuan Pang et al.
ParGo: Bridging Vision-Language with Partial and Global Views
An-Lan Wang, Bin Shan, Wei Shi et al.
RA-GAR: A Richly Annotated Benchmark for Gait Attribute Recognition
Chenye Wang, Saihui Hou, Aoqi Li et al.
Towards Efficient Object Re-Identification with a Novel Cloud-Edge Collaborative Framework
Chuanming Wang, Yuxin Yang, Mengshi Qi et al.
Target-Driven Distillation: Consistency Distillation with Target Timestep Selection and Decoupled Guidance
Cunzheng Wang, Ziyuan Guo, Yuxuan Duan et al.
A Black-Box Evaluation Framework for Semantic Robustness in Bird’s Eye View Detection
Fu Wang, Yanghao Zhang, Xiangyu Yin et al.
Scene Graph-Grounded Image Generation
Fuyun Wang, Tong Zhang, Yuanzhi Wang et al.
S³-Mamba: Small-Size-Sensitive Mamba for Lesion Segmentation
Gui Wang, Yuexiang Li, Wenting Chen et al.
BLS-GAN: A Deep Layer Separation Framework for Eliminating Bone Overlap in Conventional Radiographs
Haolin Wang, Yafei Ou, Prasoon Ambalathankandy et al.
EMControl: Adding Conditional Control to Text-to-Image Diffusion Models via Expectation-Maximization
He Wang, Longquan Dai, Jinhui Tang
M2OST: Many-to-one Regression for Predicting Spatial Transcriptomics from Digital Pathology Images
Hongyi Wang, Xiuju Du, Jing Liu et al.
RAP-SR: RestorAtion Prior Enhancement in Diffusion Models for Realistic Image Super-Resolution
Jiangang Wang, Qingnan Fan, Jinwei Chen et al.
MM-Mixing: Multi-Modal Mixing Alignment for 3D Understanding
Jiaze Wang, Yi Wang, Ziyu Guo et al.
OV-DQUO: Open-Vocabulary DETR with Denoising Text Query Training and Open-World Unknown Objects Supervision
Junjie Wang, Bin Chen, Bin Kang et al.
InpDiffusion: Image Inpainting Localization via Conditional Diffusion Models
Kai Wang, Shaozhang Niu, Qixian Hao et al.
Tracking Everything Everywhere across Multiple Cameras
Li-Heng Wang, YuJu Cheng, Tyng-Luh Liu
VLScene: Vision-Language Guidance Distillation for Camera-Based 3D Semantic Scene Completion
Meng Wang, Huilong Pi, Ruihui Li et al.
Deep Multi-modal Graph Clustering via Graph Transformer Network
Qianqian Wang, Haiming Xu, Zihao Zhang et al.
The Parables of the Mustard Seed and the Yeast: Extremely Low-Budget, High-Performance Nighttime Semantic Segmentation
Shiqin Wang, Xin Xu, Haoyang Chen et al.
GFlow: Recovering 4D World from Monocular Video
Shizun Wang, Xingyi Yang, Qiuhong Shen et al.
Imagine: Image-Guided 3D Part Assembly with Structure Knowledge Graph
Weihao Wang, Yu Lan, Mingyu You et al.
MVReward: Better Aligning and Evaluating Multi-View Diffusion Models with Human Preferences
Weitao Wang, Haoran Xu, Yuxiao Yang et al.
FreeGen: Bridging Visual-Linguistic Discrepancies Towards Diffusion-based Pixel-level Data Synthesis
Wenzhuang Wang, Mingcan Ma, Yong Chen et al.
DCTMamba: Advancing JPEG Image Restoration Through Long-Sequence Modeling and Adaptive Frequency Strategy
Xi Wang, Xueyang Fu, Liang Li et al.
From 2D CAD Drawings to 3D Parametric Models: A Vision-Language Approach
Xilin Wang, Jia Zheng, Yuanchao Hu et al.
Lifting Scheme-Based Implicit Disentanglement of Emotion-Related Facial Dynamics in the Wild
Xingjian Wang, Li Chai
MIMTrack: In-Context Tracking via Masked Image Modeling
Xingmei Wang, Guohao Nie, Jiaxiang Meng et al.
From Coarse to Fine: A Matching and Alignment Framework for Unsupervised Cross-View Geo-Localization
Xueyi Wang, Lele Zhang, Zheng Fan et al.
RefDetector: A Simple Yet Effective Matching-based Method for Referring Expression Comprehension
Yabing Wang, Zhuotao Tian, Zheng Qin et al.
Hierarchical Alignment-enhanced Adaptive Grounding Network for Generalized Referring Expression Comprehension
Yaxian Wang, Henghui Ding, Shuting He et al.
Breaking Barriers in Physical-World Adversarial Examples: Improving Robustness and Transferability via Robust Feature
Yichen Wang, Yuxuan Chou, Ziqi Zhou et al.
Capturing the Unseen: Vision-Free Facial Motion Capture Using Inertial Measurement Units
Youjia Wang, Yiwen Wu, Hengan Zhou et al.
Re-Attentional Controllable Video Diffusion Editing
Yuanzhi Wang, Yong Li, Mengyi Liu et al.
MambaPro: Multi-Modal Object Re-identification with Mamba Aggregation and Synergistic Prompt
Yuhao Wang, Xuehu Liu, Tianyu Yan et al.
IteRPrimE: Zero-shot Referring Image Segmentation with Iterative Grad-CAM Refinement and Primary Word Emphasis
Yuji Wang, Jingchen Ni, Yong Liu et al.
Target Scanpath-Guided 360-Degree Image Enhancement
Yujia Wang, Fang-Lue Zhang, Neil A. Dodgson
DualNet: Robust Self-Supervised Stereo Matching with Pseudo-Label Supervision
Yun Wang, Jiahao Zheng, Chenghao Zhang et al.
Mamba YOLO: A Simple Baseline for Object Detection with State Space Model
Zeyu Wang, Chen Li, Huiying Xu et al.
Style Nursing with Spatial and Semantic Guidance for Zero-Shot Traffic Scene Style Transfer
Zhen Wang, Zihang Lin, Meng Yuan et al.
Thermal-Aware Low-Light Image Enhancement: A Real-World Benchmark and a New Light-Weight Model
Zhen Wang, Yaozu Wu, Dongyuan Li et al.
Attention-Imperceptible Backdoor Attacks on Vision Transformers
Zhishen Wang, Rui Wang, Lihua Jing
LLM-RG4: Flexible and Factual Radiology Report Generation Across Diverse Input Contexts
Zhuhao Wang, Yihua Sun, Zihan Li et al.
MSV-PCT: Multi-Sparse-View Enhanced Transformer Framework for Salient Object Detection in Point Clouds
Zihao Wang, Yiming Huang, Gengyu Lyu et al.
GlyphSR: A Simple Glyph-Aware Framework for Scene Text Image Super-Resolution
Baole Wei, Yuxuan Zhou, Liangcai Gao et al.
Power of Diversity: Enhancing Data-Free Black-Box Attack with Domain-Augmented Learning
Yang Wei, Jingyu Tan, Guowen Xu et al.
Achieving Lightweight Super-Resolution for Real-Time Computer Graphics
Yu Wen, Chen Zhang, Chenhao Xie et al.
Multi-axis Prompt and Multi-dimension Fusion Network for All-in-one Weather-degraded Image Restoration
Yuanbo Wen, Tao Gao, Jing Zhang et al.
USDRL: Unified Skeleton-Based Dense Representation Learning with Multi-Grained Feature Decorrelation
Wanjiang Weng, Hongsong Wang, Junbo Wang et al.
Spin: Diffusion-based Semantic Image Painting Through Independent Information Injection
Dantong Wu, Zhiqiang Chen, Tianjiao Du et al.
Structural Pruning via Spatial-aware Information Redundancy for Semantic Segmentation
Dongyue Wu, Zilin Guo, Li Yu et al.
SVRMamba: Slice-to-Volume Reconstruction from Multiple MRI Stacks with Slice Sequence Guided Mamba
Jiangjie Wu, Hongjiang Wei, Yuyao Zhang
VarCMP: Adapting Cross-Modal Pre-Training Models for Video Anomaly Retrieval
Peng Wu, Wanshun Su, Xiangteng He et al.
Realistic Noise Synthesis with Diffusion Models
Qi Wu, Mingyan Han, Ting Jiang et al.
PanAdapter: Two-Stage Fine-Tuning with Spatial-Spectral Priors Injecting for Pansharpening
RuoCheng Wu, Zien Zhang, Shangqi Deng et al.
Combating Multimodal LLM Hallucination via Bottom-Up Holistic Reasoning
Shengqiong Wu, Hao Fei, Liangming Pan et al.
CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities
Tao Wu, Yong Zhang, Xintao Wang et al.
Deconfound Semantic Shift and Incompleteness in Incremental Few-shot Semantic Segmentation
Yirui Wu, Yuhang Xia, Hao Li et al.
Video Repurposing from User Generated Content: A Large-scale Dataset and Benchmark
Yongliang Wu, Wenbo Zhu, Jiawang Cao et al.
MUCD: Unsupervised Point Cloud Change Detection via Masked Consistency
Yue Wu, Zhipeng Wang, Yongzhe Yuan et al.
Unified Knowledge Maintenance Pruning and Progressive Recovery with Weight Recalling for Large Vision-Language Models
Zimeng Wu, Jiaxin Chen, Yunhong Wang
RETRACTED: GEONet: Global Enhancement and Optimization Network for Lane Detection
Suyang Xi, Yunhao Liu, Hong Ding et al.
PlaNet: Learning to Mitigate Atmospheric Turbulence in Planetary Images
Yifei Xia, Chu Zhou, Chengxuan Zhu et al.
CA-Edit: Causality-Aware Condition Adapter for High-Fidelity Local Facial Attribute Editing
Xiaole Xian, Xilin He, Zenghao Niu et al.
ReMask-Animate: Refined Character Image Animation Using Mask-Guided Adapters
Xunzhi Xiang, Haiwei Xue, Zonghong Dai et al.
SMR-Net: Semantic-Guided Mutually Reinforcing Network for Cross-Modal Image Fusion and Salient Object Detection
Guobao Xiao, Xinyu Liu, Zebin Lin et al.
Boosting Vision State Space Model with Fractal Scanning
Haoke Xiao, Lv Tang, Peng-tao Jiang et al.
Text Proxy: Decomposing Retrieval from a 1-to-N Relationship into N 1-to-1 Relationships for Text-Video Retrieval
Jian Xiao, Zhenzhen Hu, Jia Li et al.
Cross-modulated Attention Transformer for RGBT Tracking
Yun Xiao, Jiacong Zhao, Andong Lu et al.
Omni-Query Active Learning for Source-Free Domain Adaptive Cross-Modality 3D Semantic Segmentation
Jianxiang Xie, Yao Wu, Yachao Zhang et al.
TextRefiner: Internal Visual Feature as Efficient Refiner for Vision-Language Models Prompt Tuning
Jingjing Xie, Yuxin Zhang, Jun Peng et al.
Discrete Prior-Based Temporal-Coherent Content Prediction for Blind Face Video Restoration
Lianxin Xie, Bingbing Zheng, Wen Xue et al.
Expand VSR Benchmark for VLLM to Expertize in Spatial Rules
Peijin Xie, Lin Sun, Bingquan Liu et al.
PointTalk: Audio-Driven Dynamic Lip Point Cloud for 3D Gaussian-based Talking Head Synthesis
Yifan Xie, Tao Feng, Xin Zhang et al.
HieraFashDiff: Hierarchical Fashion Design with Multi-stage Diffusion Models
Zhifeng Xie, Hao Li, Huiming Ding et al.
Few-Shot Incremental Learning via Foreground Aggregation and Knowledge Transfer for Audio-Visual Semantic Segmentation
Jingqiao Xiu, Mengze Li, Zongxin Yang et al.
DiffScene: Diffusion-Based Safety-Critical Scenario Generation for Autonomous Vehicles
Chejian Xu, Aleksandr Petiushko, Ding Zhao et al.
FR²Seg: Continual Segmentation Across Multiple Sites via Fourier Style Replay and Adaptive Consistency Regularization
Cheng Xu, Weiwen Zhang, Hongrui Zhang et al.
Less Is More: Token Context-Aware Learning for Object Tracking
Chenlong Xu, Bineng Zhong, Qihua Liang et al.
3DHumanEdit: Multi-modal Body Part-aware Conditioning Information Integration for 3D Human Manipulation
FeiFan Xu, Tianyi Chen, Fan Yang et al.
Motion Artifact Removal in Pixel-Frequency Domain via Alternate Masks and Diffusion Model
Jiahua Xu, Dawei Zhou, Lei Hu et al.
OmniSR: Shadow Removal Under Direct and Indirect Lighting
Jiamin Xu, Zelong Li, Yuxin Zheng et al.
Multiple Feature Refining Network for Visual Emotion Distribution Learning
Qinfu Xu, Shaozu Yuan, Yiwei Wei et al.
SCKD: Semi-Supervised Cross-Modality Knowledge Distillation for 4D Radar Object Detection
Ruoyu Xu, Zhiyu Xiang, Chenwei Zhang et al.
LiON: Learning Point-Wise Abstaining Penalty for LiDAR Outlier DetectioN Using Diverse Synthetic Data
Shaocong Xu, Pengfei Li, Qianpu Sun et al.
Zero-shot Video Moment Retrieval via Off-the-shelf Multimodal Large Language Models
Yifang Xu, Yunzhuo Sun, Benxiang Zhai et al.
HOIMamba: Efficient Mamba-based Disentangled Progressive Learning for HOI Detection
Yongchao Xu, Jiawei Liu, Sen Tao et al.
OOTDiffusion: Outfitting Fusion Based Latent Diffusion for Controllable Virtual Try-On
Yuhao Xu, Tao Gu, Weifeng Chen et al.
FLAME: Learning to Navigate with Multimodal LLM in Urban Environments
Yunzhe Xu, Yiyuan Pan, Zhe Liu et al.
FATE: Feature-Adapted Parameter Tuning for Vision-Language Models
Zhengqin Xu, Zelin Peng, Xiaokang Yang et al.
Toward Modality Gap: Vision Prototype Learning for Weakly-supervised Semantic Segmentation with CLIP
Zhongxing Xu, Feilong Tang, Zhe Chen et al.
RetouchGPT: LLM-based Interactive High-Fidelity Face Retouching via Imperfection Prompting
Wen Xue, Chun Ding, Ruotao Xu et al.
Physical Marker: Revealing Invisible Hyperlinks Hidden in Printed Trademarks
Yuliang Xue, Lei Tan, Guobiao Li et al.
Towards Universal Rainy Image Restoration: Benchmark and Baseline
Hujie Yan
SGTC: Semantic-Guided Triplet Co-training for Sparsely Annotated Semi-Supervised Medical Image Segmentation
Ke Yan, Qing Cai, Fan Zhang et al.
Data-Free Universal Attack by Exploiting the Intrinsic Vulnerability of Deep Models
YangTian Yan, Jinyu Tian
Robust Image Hashing Based on Contrastive Masked Autoencoder with Weak-Strong Augmentation Alignment
Cundian Yang, Guibo Luo, Yuesheng Zhu et al.
PlanLLM: Video Procedure Planning with Refinable Large Language Models
Dejie Yang, Zijing Zhao, Yang Liu
3CAD: A Large-Scale Real-World 3C Product Dataset for Unsupervised Anomaly Detection
Enquan Yang, Peng Xing, Hanyang Sun et al.
Diffusion Prior Interpolation for Flexibility Real-World Face Super-Resolution
Jiarui Yang, Tao Dai, Yufei Zhu et al.
SMamba: Sparse Mamba for Event-based Object Detection
Nan Yang, Yang Wang, Zhanwen Liu et al.
One-Shot Reference-based Structure-Aware Image to Sketch Synthesis
Rui Yang, Honghong Yang, Li Zhao et al.
LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding
Senqiao Yang, Jiaming Liu, Renrui Zhang et al.
Asymmetric Hierarchical Difference-aware Interaction Network for Event-guided Motion Deblurring
Wen Yang, Jinjian Wu, Leida Li et al.
Dual Information Purification for Lightweight SAR Object Detection
Xi Yang, Jiachen Sun, Songsong Duan et al.
DriveGazen: Event-Based Driving Status Recognition Using Conventional Camera
Xiaoyin Yang, Xin Yang
Semantic Segmentation on Raindrop Degraded Images Using Two-Stage Dual Teacher-Student Learning
Xin Yang, Wending Yan, Yuan Yuan et al.
ERF: A Benchmark Dataset for Robust Semantic Segmentation Under Extreme Rainfall Conditions
Xin Yang, Xin Zhang, Xinchao Wang
FreqTS: Frequency-Aware Token Selection for Accelerating Diffusion Models
Xinye Yang, Yuxin Yang, Haoran Pang et al.
Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving
Yu Yang, Jianbiao Mei, Yukai Ma et al.
UAWTrack: Universal 3D Single Object Tracking in Adverse Weather
Yuxiang Yang, Hongjie Gu, Yingqi Deng et al.
RealPortrait: Realistic Portrait Animation with Diffusion Transformers
Zejun Yang, Huawei Wei, Zhisheng Wang
Single Image Rolling Shutter Removal with Diffusion Models
Zhanglei Yang, Haipeng Li, Mingbo Hong et al.
MMGDreamer: Mixed-Modality Graph for Geometry-Controllable 3D Indoor Scene Generation
Zhifei Yang, Keyang Lu, Chao Zhang et al.
MoRe: Class Patch Attention Needs Regularization for Weakly Supervised Semantic Segmentation
Zhiwei Yang, Yucong Meng, Kexue Fu et al.
MM-Tracker: Motion Mamba for UAV-platform Multiple Object Tracking
Mufeng Yao, Jinlong Peng, Qingdong He et al.