Most Cited 2025 "message-passing paradigm" Papers
22,274 papers found • Page 82 of 112
Conference
Authentic 4D Driving Simulation with a Video Generation Model
Lening Wang, Wenzhao Zheng, Dalong Du et al.
Lidar Waveforms are Worth 40x128x33 Words
Dominik Scheuble, Hanno Holzhüter, Steven Peters et al.
Spherical Epipolar Rectification for Deep Two-View Absolute Depth Estimation
Pierre-André Brousseau, Sébastien Roy
Stochastic Interpolants for Revealing Stylistic Flows across the History of Art
Pingchuan Ma, Ming Gui, Johannes Schusterbauer et al.
Real3D: Towards Scaling Large Reconstruction Models with Real Images
Hanwen Jiang, Qixing Huang, Georgios Pavlakos
Wide2Long: Learning Lens Compression and Perspective Adjustment for Wide-Angle to Telephoto Translation
Soumyadipta Banerjee, Jiaul Paik, Debashis Sen
Leveraging 2D Priors and SDF Guidance for Urban Scene Rendering
Siddharth Tourani, Jayaram Reddy, Akash Kumbar et al.
SparseLaneSTP: Leveraging Spatio-Temporal Priors with Sparse Transformers for 3D Lane Detection
Maximilian Pittner, Joel Janai, Mario Faigle et al.
Relative Illumination Fields: Learning Medium and Light Independent Underwater Scenes
Mengkun She, Felix Seegräber, David Nakath et al.
Super Resolved Imaging with Adaptive Optics
Robin Swanson, Esther Y. H. Lin, Masen Lamb et al.
HVPUNet: Hybrid-Voxel Point-cloud Upsampling Network
Juhyung Ha, Vibhas Vats, Alimoor Reza et al.
Stealthy Backdoor Attack in Federated Learning via Adaptive Layer-wise Gradient Alignment
Qingqian Yang, Peishen Yan, Xiaoyu Wu et al.
SAC-GNC: SAmple Consensus for adaptive Graduated Non-Convexity
Valter Piedade, Chitturi Sidhartha, José Gaspar et al.
DiffuMatch: Category-Agnostic Spectral Diffusion Priors for Robust Non-rigid Shape Matching
Emery Pierson, Lei Li, Angela Dai et al.
Knowledge Distillation for Learned Image Compression
Yunuo Chen, Zezheng Lyu, Bing He et al.
Invisible Watermarks, Visible Gains: Steering Machine Unlearning with Bi-Level Watermarking Design
Yuhao Sun, Yihua Zhang, Gaowen Liu et al.
Diffusion-Based Extreme High-speed Scenes Reconstruction with the Complementary Vision Sensor
Yapeng Meng, Yihan Lin, Taoyi Wang et al.
CAT: A Unified Click-and-Track Framework for Realistic Tracking
Yongsheng Yuan, Jie Zhao, Dong Wang et al.
Boosting Class Representation via Semantically Related Instances for Robust Long-Tailed Learning with Noisy Labels
Yuhang Li, Zhuying Li, Yuheng Jia
Teeth Reconstruction and Performance Capture Using a Phone Camera
Weixi Zheng, Jingwang Ling, Zhibo Wang et al.
TorchAdapt: Towards Light-Agnostic Real-Time Visual Perception
Khurram Azeem Hashmi, Karthik Suresh, Didier Stricker et al.
Beyond RGB: Adaptive Parallel Processing for RAW Object Detection
Shani Gamrian, Hila Barel, Feiran Li et al.
Met2Net: A Decoupled Two-Stage Spatio-Temporal Forecasting Model for Complex Meteorological Systems
Shaohan Li, Hao Yang, Min Chen et al.
ROAR: Reducing Inversion Error in Generative Image Watermarking
Hanyi Wang, Han Fang, Shi-Lin Wang et al.
Diffusion Transformer meets Multi-level Wavelet Spectrum for Single Image Super-Resolution
Peng Du, Hui Li, Han Xu et al.
Automated Model Evaluation for Object Detection via Prediction Consistency and Reliability
Seungju Yoo, Hyuk Kwon, Joong-Won Hwang et al.
LOTS of Fashion! Multi-Conditioning for Image Generation via Sketch-Text Pairing
Federico Girella, Davide Talon, Ziyue Liu et al.
Multispectral Demosaicing via Dual Cameras
SaiKiran Tedla, Junyong Lee, Beixuan Yang et al.
Spatially-Varying Autofocus
Yingsi Qin, Aswin Sankaranarayanan, Matthew O'Toole
Event-based Visual Vibrometry
Xinyu Zhou, Peiqi Duan, Yeliduosi Xiaokaiti et al.
Flexi-FSCIL: Adaptive Knowledge Retention for Breaking the Stability-Plasticity Dilemma in Few-Shot Class-Incremental Learning
Wufei Xie, Yalin Wang, Chenliang Liu et al.
What to Distill? Fast Knowledge Distillation with Adaptive Sampling
Byungchul Chae, Seonyeong Heo
Finding Low-Rank Matrix Weights in DNNs via Riemannian Optimization: RAdaGrad and RAdamW
Fengmiao Bian, Jinyang ZHENG, Ziyun Liu et al.
Rare Text Semantics Were Always There in Your Diffusion Transformer
seil kang, Woojung Han, Dayun Ju et al.
TRNAS: A Training-Free Robust Neural Architecture Search
Yeming Yang, Qingling Zhu, Jianping Luo et al.
AVAM: a Universal Training-free Adaptive Visual Anchoring Embedded into Multimodal Large Language Model for Multi-image Question Answering
Kang Zeng, Guojin Zhong, Jintao Cheng et al.
Multi-Cache Enhanced Prototype Learning for Test-Time Generalization of Vision-Language Models
Xinyu Chen, Haotian Zhai, Can Zhang et al.
SU-RGS: Relightable 3D Gaussian Splatting from Sparse Views under Unconstrained Illuminations
Qi Zhang, Chi Huang, Qian Zhang et al.
Sibai: A Few-Shot Meta-Classifier for Poisoning Detection in Federated Learning
Melanie Götz, Torsten Krauß, Alexandra Dmitrienko
Prototype Guided Backdoor Defense via Activation Space Manipulation
Venkat Adithya Amula, Sunayana Samavedam, Saurabh Saini et al.
Scaling Transformer-Based Novel View Synthesis with Models Token Disentanglement and Synthetic Data
Nithin Gopalakrishnan Nair, Srinivas Kaza, Xuan Luo et al.
Customizing Domain Adapters for Domain Generalization
Yuyang Ji, Zeyi Huang, Haohan Wang et al.
MamV2XCalib: V2X-based Target-less Infrastructure Camera Calibration with State Space Model
Yaoye Zhu, Zhe Wang, Yan Wang
Soft Separation and Distillation: Toward Global Uniformity in Federated Unsupervised Learning
Hung-Chieh Fang, Hsuan-Tien Lin, Irwin King et al.
FedMeNF: Privacy-Preserving Federated Meta-Learning for Neural Fields
Junhyeog Yun, Minui Hong, Gunhee Kim
Adaptive Dual Uncertainty Optimization: Boosting Monocular 3D Object Detection under Test-Time Shifts
Zixuan Hu, Dongxiao Li, Xinzhu Ma et al.
Dynamic Multi-Layer Null Space Projection for Vision-Language Continual Learning
Borui Kang, Lei Wang, Zhiping Wu et al.
AirCache: Activating Inter-modal Relevancy KV Cache Compression for Efficient Large Vision-Language Model Inference
Kai Huang, hao zou, Bochen Wang et al.
Re-coding for Uncertainties: Edge-awareness Semantic Concordance for Resilient Event-RGB Segmentation
Nan Bao, Yifan Zhao, Lin Zhu et al.
FlowStyler: Artistic Video Stylization via Transformation Fields Transports
YuNing Gong, Jiaming Chen, Xiaohua Ren et al.
Large Learning Rates Simultaneously Achieve Robustness to Spurious Correlations and Compressibility
Melih Barsbey, Lucas Prieto, Stefanos Zafeiriou et al.
FastJSMA: Accelerating Jacobian-based Saliency Map Attacks through Gradient Decoupling
Zhenghao Gao, Shengjie Xu, Zijing Li et al.
Toward Fair and Accurate Cross-Domain Medical Image Segmentation: A VLM-Driven Active Domain Adaptation Paradigm
Hongqiu Wang, Wu Chen, Xiangde Luo et al.
Federated Continuous Category Discovery and Learning
Lixu Wang, Chenxi Liu, Junfeng Guo et al.
BlueNeg: A 35mm Negative Film Dataset for Restoring Channel-Heterogeneous Deterioration
Hanyuan Liu, Chengze Li, Minshan Xie et al.
Rethinking Key-frame-based Micro-expression Recognition: A Robust and Accurate Framework Against Key-frame Errors
Zheyuan Zhang, Weihao Tang, Hong Chen
Target Bias Is All You Need: Zero-Shot Debiasing of Vision-Language Models with Bias Corpus
Taeuk Jang, Hoin Jung, Xiaoqian Wang
Pretend Benign: A Stealthy Adversarial Attack by Exploiting Vulnerabilities in Cooperative Perception
Hongwei Lin, Dongyu Pan, Qiming Xia et al.
What we need is explicit controllability: Training 3D gaze estimator using only facial images
Tingwei Li, Jun Bao, Zhenzhong Kuang et al.
SemiVisBooster: Boosting Semi-Supervised Learning for Fine-Grained Classification through Pseudo-Label Semantic Guidance
Wenjin Zhang, Xinyu Li, Chenyang Gao et al.
Enhancing Prompt Generation with Adaptive Refinement for Camouflaged Object Detection
Xuehan Chen, Guangyu Ren, Tianhong Dai et al.
Hypergraph Clustering Network with Partial Attribute Imputation
Qianqian Wang, Bowen Zhao, Zhengming Ding et al.
SAMPLE: Semantic Alignment through Temporal-Adaptive Multimodal Prompt Learning for Event-Based Open-Vocabulary Action Recognition
Jing Wang, Rui Zhao, Ruiqin Xiong et al.
Learning Null Geodesics for Gravitational Lensing Rendering in General Relativity
Mingyuan Sun, Zheng Fang, Jiaxu Wang et al.
EFTViT: Efficient Federated Training of Vision Transformers with Masked Images on Resource-Constrained Clients
meihan wu, Tao Chang, Cui Miao et al.
Multi-view Gaze Target Estimation
Qiaomu Miao, Vivek Golani, Jingyi Xu et al.
LIRA: Reasoning Reconstruction via Multimodal Large Language Models
Zhen Zhou, Tong Wang, Yunkai Ma et al.
Class-Wise Federated Averaging for Efficient Personalization
Gyuejeong Lee, Daeyoung Choi
EchoMatch: Partial-to-Partial Shape Matching via Correspondence Reflection
Yizheng Xie, Viktoria Ehm, Paul Roetzer et al.
Learning an Implicit Physics Model for Image-based Fluid Simulation
Emily Jia, Jiageng Mao, Zhiyuan Gao et al.
Graph Few-Shot Learning via Adaptive Spectrum Experts and Cross-Set Distribution Calibration
Yonghao Liu, Yajun Wang, Chunli Guo et al.
Exploiting Frequency Dynamics for Enhanced Multimodal Event-based Action Recognition
Meiqi Cao, Xiangbo Shu, Xin Jiang et al.
RMultiplex200K: Toward Reliable Multimodal Process Supervision for Visual Language Models on Telecommunications
Sijia Chen, Bin Song
PEFTDiff: Diffusion-Guided Transferability Estimation for Parameter-Efficient Fine-Tuning
PRAFFUL KHOBA, Zijian Wang, Chetan Arora et al.
SIC: Similarity-Based Interpretable Image Classification with Neural Networks
Tom Nuno Wolf, Emre Kavak, Fabian Bongratz et al.
CE-FAM: Concept-Based Explanation via Fusion of Activation Maps
Michihiro Kuroki, Toshihiko Yamasaki
Controlling Multimodal LLMs via Reward-guided Decoding
Oscar Mañas, Pierluca D'Oro, Koustuv Sinha et al.
MambaML: Exploring State Space Models for Multi-Label Image Classification
Xuelin Zhu, Jian liu, Jiuxin Cao et al.
CoSMIC: Continual Self-supervised Learning for Multi-Domain Medical Imaging via Conditional Mutual Information Maximization
Yihang Liu, Ying Wen, Longzhen Yang et al.
TWIST & SCOUT: Grounding Multimodal LLM-Experts by Forget-Free Tuning
Aritra Bhowmik, Mohammad Mahdi Derakhshani, Dennis Koelma et al.
AnnofreeOD: Detecting All Classes at Low Frame Rates Without Human Annotations
Boyi Sun, Yuhang Liu, Houxin He et al.
ArchiSet: Benchmarking Editable and Consistent Single-View 3D Reconstruction of Buildings with Specific Window-to-Wall Ratios
Jun Yin, Pengyu Zeng, Licheng Shen et al.
VGMamba: Attribute-to-Location Clue Reasoning for Quantity-Agnostic 3D Visual Grounding
Zhu Yihang, Jinhao Zhang, Yuxuan Wang et al.
Unsupervised Identification of Protein Compositions and Conformations via Implicit Content-Transformation Disentanglement
Mostofa Rafid Uddin, Jana Armouti, Min Xu
Splat-based 3D Scene Reconstruction with Extreme Motion-blur
Hyeonjoong Jang, Dongyoung Choi, Donggun Kim et al.
Diffusion Curriculum: Synthetic-to-Real Data Curriculum via Image-Guided Diffusion
Yijun Liang, Shweta Bhardwaj, Tianyi Zhou
OpenSubstance: A High-quality Measured Dataset of Multi-View and -Lighting Images and Shapes
Fan Pei, jinchen bai, Xiang Feng et al.
AR-1-to-3: Single Image to Consistent 3D Object via Next-View Prediction
Xuying Zhang, Yupeng Zhou, Kai Wang et al.
Dual-Rate Dynamic Teacher for Source-Free Domain Adaptive Object Detection
Qi He, Xiao Wu, Jun-Yan He et al.
MPBR: Multimodal Progressive Bidirectional Reasoning for Open-Set Fine-Grained Recognition
Junfu Tan, Peiguang Jing, Yu Zhu et al.
OV3D-CG: Open-vocabulary 3D Instance Segmentation with Contextual Guidance
Mingquan Zhou, Chen He, Ruiping Wang et al.
CogCM: Cognition-Inspired Contextual Modeling for Audio-Visual Speech Enhancement
Feixiang Wang, Shuang Yang, Shiguang Shan et al.
Consistent Time-of-Flight Depth Denoising via Graph-Informed Geometric Attention
Weida Wang, Changyong He, Jin Zeng et al.
EDFFDNet: Towards Accurate and Efficient Unsupervised Multi-Grid Image Registration
Haokai Zhu, Bo Qu, Si-Yuan Cao et al.
Enhancing Mamba Decoder with Bidirectional Interaction in Multi-Task Dense Prediction
Mang Cao, Sanping Zhou, Yizhe Li et al.
Leveraging Debiased Cross-modal Attention Maps and Code-based Reasoning for Zero-shot Referring Expression Comprehension
Juntao Chen, Wen Shen, Zhihua Wei et al.
UST-SSM: Unified Spatio-Temporal State Space Models for Point Cloud Video Modeling
Peiming Li, Ziyi Wang, Yulin Yuan et al.
Depth Any Event Stream: Enhancing Event-based Monocular Depth Estimation via Dense-to-Sparse Distillation
Jinjing Zhu, Tianbo Pan, Zidong Cao et al.
SHIFT: Smoothing Hallucinations by Information Flow Tuning for Multimodal Large Language Models
Sudong Wang, Yunjian Zhang, Yao Zhu et al.
Quanta Neural Networks: From Photons to Perception
Varun Sundar, Tianyi Zhang, Sacha Jungerman et al.
Automated Red Teaming for Text-to-Image Models through Feedback-Guided Prompt Iteration with Vision-Language Models
Wei Xu, Kangjie Chen, Jiawei Qiu et al.
Enhancing Spatial Reasoning in Multimodal Large Language Models through Reasoning-based Segmentation
Zhenhua Ning, Zhuotao Tian, Shaoshuai Shi et al.
OVG-HQ: Online Video Grounding with Hybrid-modal Queries
Runhao Zeng, Jiaqi Mao, Minghao Lai et al.
CLIPSym: Delving into Symmetry Detection with CLIP
Tinghan Yang, Md Ashiqur Rahman, Raymond A. Yeh
A Good Teacher Adapts Their Knowledge for Distillation
Chengyao Qian, Trung Le, Mehrtash Harandi
VisionMath: Vision-Form Mathematical Problem-Solving
Zongyang Ma, Yuxin Chen, Ziqi Zhang et al.
Towards Comprehensive Lecture Slides Understanding: Large-scale Dataset and Effective Method
Enming Zhang, Yuzhe Li, Yuliang Liu et al.
A Unified Interpretation of Training-Time Out-of-Distribution Detection
Xu Cheng, Xin Jiang, Zechao Li
Removing Out-of-Focus Reflective Flares via Color Alignment
Fengbo Lan, Chang Wen Chen
A Learning-Augmented Dynamic Programming Approach for Orienteering Problem with Time Windows
Guansheng Peng, Lining Xing, Fuyan Ma et al.
RainbowPrompt: Diversity-Enhanced Prompt-Evolving for Continual Learning
Kiseong Hong, Gyeong-Hyeon Kim, Eunwoo Kim
Mamba-3VL: Taming State Space Model for 3D Vision Language Learning
Yuan Wang, Yuxin Chen, Zhongang Qi et al.
Embodied Representation Alignment with Mirror Neurons
Wentao Zhu, Zhining Zhang, Yuwei Ren et al.
M2EIT: Multi-Domain Mixture of Experts for Robust Neural Inertial Tracking
Yan Li, Yang Xu, Changhao Chen et al.
MobileViCLIP: An Efficient Video-Text Model for Mobile Devices
Min Yang, Zihan Jia, Zhilin Dai et al.
EVOLVE: Event-Guided Deformable Feature Transfer and Dual-Memory Refinement for Low-Light Video Object Segmentation
Jong Hyeon Baek, Jiwon oh, Yeong Jun Koh
MATE: Motion-Augmented Temporal Consistency for Event-based Point Tracking
Han Han, Wei Zhai, Yang Cao et al.
Asynchronous Event Error-Minimizing Noise for Safeguarding Event Dataset
Ruofei WANG, Peiqi Duan, Boxin Shi et al.
AG2aussian: Anchor-Graph Structured Gaussian Splatting for Instance-Level 3D Scene Understanding and Editing
Zhaonan Wang, Manyi Li, Changhe Tu
Vector Contrastive Learning For Pixel-Wise Pretraining In Medical Vision
Yuting He, Shuo Li
Benchmarking Multimodal Large Language Models Against Image Corruptions
Xinkuan Qiu, Meina Kan, Yongbin Zhou et al.
A Framework for Double-Blind Federated Adaptation of Foundation Models
Nurbek Tastan, Karthik Nandakumar
Efficient Fine-Tuning of Large Models via Nested Low-Rank Adaptation
Lujun Li, Cheng Lin, Dezhi Li et al.
Dual-level Prototype Learning for Composite Degraded Image Restoration
Zhongze Wang, Haitao Zhao, Lujian Yao et al.
Deterministic Object Pose Confidence Region Estimation
Jinghao Wang, Zhang Li, Zi Wang et al.
Decoupled Multi-Predictor Optimization for Inference-Efficient Model Tuning
Liwei Luo, Shuaitengyuan Li, Dongwei Ren et al.
GReg: Geometry-Aware Region Refinement for Sign Language Video Generation
Tongkai Shi, Lianyu Hu, Fanhua Shang et al.
Unsupervised Part Discovery via Descriptor-Based Masked Image Restoration with Optimized Constraints
Jiahao Xia, Yike Wu, Wenjian Huang et al.
NETracer: A Topology-Aware Iterative Tracing Approach for Tubular Structure Extraction
Chao Liu, Yangbo Jiang, Nenggan Zheng
MotionCtrl: A Real-time Controllable Vision-Language-Motion Model
Bin Cao, Sipeng Zheng, Ye Wang et al.
UIPro: Unleashing Superior Interaction Capability For GUI Agents
Hongxin Li, Jingran Su, Jingfan CHEN et al.
Diffusion Guided Adaptive Augmentation for Generalization in Visual Reinforcement Learning
Jeong Woon Lee, Hyoseok Hwang
From Easy to Hard: The MIR Benchmark for Progressive Interleaved Multi-Image Reasoning
Hang Du, Jiayang Zhang, Guoshun Nan et al.
VLR-Driver: Large Vision-Language-Reasoning Models for Embodied Autonomous Driving
Fanjie Kong, Yitong Li, Weihuang Chen et al.
Vid-Group: Temporal Video Grounding Pretraining from Unlabeled Videos in the Wild
Peijun Bao, Chenqi Kong, SIYUAN YANG et al.
Semi-supervised Deep Transfer for Regression without Domain Alignment
Mainak Biswas, Ambedkar Dukkipati, Devarajan Sridharan
Knowledge Transfer from Interaction Learning
Yilin Gao, Kangyi Chen, Zhongxing Peng et al.
Temperature in Cosine-based Softmax Loss
Takumi Kobayashi
Multi-modal Segment Anything Model for Camouflaged Scene Segmentation
Guangyu Ren, Hengyan Liu, Michalis Lazarou et al.
Cassic: Towards Content-Adaptive State-Space Models for Learned Image Compression
Shiyu Qin, Jinpeng Wang, Yimin Zhou et al.
Beyond the Limits: Overcoming Negative Correlation of Activation-Based Training-Free NAS
Haidong Kang, Lianbo Ma, Pengjun Chen et al.
Boosting Adversarial Transferability via Negative Hessian Trace Regularization
Yunfei Long, Zilin Tian, Liguo Zhang et al.
AcZeroTS: Active Learning for Zero-shot Tissue Segmentation in Pathology Images
Jiao Tang, Junjie Zhou, Bo Qian et al.
OneGT: One-Shot Geometry-Texture Neural Rendering for Head Avatars
Jinshu Chen, Bingchuan Li, Fan Zhang et al.
Unsupervised Visible-Infrared Person Re-identification under Unpaired Settings
Haoyu Yao, Bin Yang, Wenke Huang et al.
Adaptive Prompt Learning via Gaussian Outlier Synthesis for Out-of-distribution Detection
Yongkang Zhang, Dongyu She, Zhong Zhou
Can We Achieve Efficient Diffusion Without Self-Attention? Distilling Self-Attention into Convolutions
ZiYi Dong, Chengxing Zhou, Weijian Deng et al.
Ultra-Precision 6DoF Pose Estimation Using 2-D Interpolated Discrete Fourier Transform
Guowei Shi, Zian Mao, Peisen Huang
AMDANet: Attention-Driven Multi-Perspective Discrepancy Alignment for RGB-Infrared Image Fusion and Segmentation
Haifeng Zhong, Fan Tang, Zhuo Chen et al.
Contact-Aware Refinement of Human Pose Pseudo-Ground Truth via Bioimpedance Sensing
Maria-Paola Forte, Nikos Athanasiou, Giulia Ballardini et al.
OCK: Unsupervised Dynamic Video Prediction with Object-Centric Kinematics
YeonJi Song, Jaein Kim, Suhyung Choi et al.
Prompt Guidance and Human Proximal Perception for HOT Prediction with Regional Joint Loss
Yuxiao Wang, Yu Lei, Zhenao WEI et al.
Semi-ViM: Bidirectional State Space Model for Mitigating Label Imbalance in Semi-Supervised Learning
Hongyang He, Hongyang Xie, Haochen You et al.
Coupling the Generator with Teacher for Effective Data-Free Knowledge Distillation
Xu Chen, Yang Li, Yahong Han et al.
Towards a Universal Image Degradation Model via Content-Degradation Disentanglement
Wenbo Yang, Zhongling Wang, Zhou Wang
Intra-view and Inter-view Correlation Guided Multi-view Novel Class Discovery
Xinhang Wan, Jiyuan Liu, Qian Qu et al.
HUST: High-Fidelity Unbiased Skin Tone Estimation via Texture Quantization
Zimin Ran, Xingyu Ren, Xiang An et al.
Know Your Attention Maps: Class-specific Token Masking for Weakly Supervised Semantic Segmentation
Joëlle Hanna, Damian Borth
Structure-Guided Diffusion Models for High-Fidelity Portrait Shadow Removal
wanchang Yu, Qing Zhang, Rongjia Zheng et al.
FreeDNA: Endowing Domain Adaptation of Diffusion-Based Dense Prediction with Training-Free Domain Noise Alignment
Hang Xu, Jie Huang, Linjiang Huang et al.
ProbMED: A Probabilistic Framework for Medical Multimodal Binding
Yuan Gao, Sangwook Kim, Jianzhong You et al.
Structured Policy Optimization: Enhance Large Vision-Language Model via Self-referenced Dialogue
Guohao Sun, Can Qin, Yihao Feng et al.
FDPT: Federated Discrete Prompt Tuning for Black-Box Visual-Language Models
Jiaqi Wu, Simin Chen, Jing Tang et al.
Spatial Preference Rewarding for MLLMs Spatial Understanding
Han Qiu, Peng Gao, Lewei Lu et al.
Dynamic Group Detection using VLM-augmented Temporal Groupness Graph
Kaname Yokoyama, Chihiro Nakatani, Norimichi Ukita
Open-Unfairness Adversarial Mitigation for Generalized Deepfake Detection
Zhaoyang Li, Zhu Teng, Baopeng Zhang et al.
A Tiny Change, A Giant Leap: Long-Tailed Class-Incremental Learning via Geometric Prototype Alignment
xinyi lai, Luojun Lin, Weijie Chen et al.
CountSE: Soft Exemplar Open-set Object Counting
Shuai Liu, Peng Zhang, Shiwei Zhang et al.
GenieBlue: Integrating both Linguistic and Multimodal Capabilities for Large Language Models on Mobile Devices
Xudong LU, Yinghao Chen, Renshou Wu et al.
MedVSR: Medical Video Super-Resolution with Cross State-Space Propagation
Xinyu Liu, Guolei Sun, Cheng Wang et al.
Top2Pano: Learning to Generate Indoor Panoramas from Top-Down View
Zitong Zhang, Suranjan Gautam, Rui Yu
MuGS: Multi-Baseline Generalizable Gaussian Splatting Reconstruction
Yaopeng Lou, Liao Shen, Tianqi Liu et al.
Region-Level Data Attribution for Text-to-Image Generative Models
Trong Bang Nguyen, Phi Le Nguyen, Simon Lucey et al.
Generalization-Preserved Learning: Closing the Backdoor to Catastrophic Forgetting in Continual Deepfake Detection
Xueyi Zhang, Peiyin Zhu, Chengwei Zhang et al.
Confound from All Sides, Distill with Resilience: Multi-Objective Adversarial Paths to Zero-Shot Robustness
Junhao Dong, Jiao Liu, Xinghua Qu et al.
GCAV: A Global Concept Activation Vector Framework for Cross-Layer Consistency in Interpretability
Zhenghao He, Sanchit Sinha, Guangzhi Xiong et al.
A Unified Framework to BRIDGE Complete and Incomplete Deep Multi-View Clustering under Non-IID Missing Patterns
Xiaorui Jiang, Buyun He, Peng Yuan Zhou et al.
NegRefine: Refining Negative Label-Based Zero-Shot OOD Detection
Amirhossein Ansari, Ke Wang, Pulei Xiong
Adversarial Data Augmentation for Single Domain Generalization via Lyapunov Exponent-Guided Optimization
ZUYU ZHANG, Ning Chen, Yongshan Liu et al.
AIRA: Activation-Informed Low-Rank Adaptation for Large Models
Lujun Li, Dezhi Li, Cheng Lin et al.
Face Retouching with Diffusion Data Generation and Spectral Restorement
Zhidan Xu, Xiaoqin Zhang, Shijian Lu
Boosting Generative Adversarial Transferability with Self-supervised Vision Transformer Features
Shangbo Wu, Yu-an Tan, Ruinan Ma et al.
Att-Adapter: A Robust and Precise Domain-Specific Multi-Attributes T2I Diffusion Adapter via Conditional Variational Autoencoder
Wonwoong Cho, Yan-Ying Chen, Matthew Klenk et al.
Neural Solver of Dichromatic Reflection Model for Specular Highlight Removal
Gang Fu
Wavelet Policy: Lifting Scheme for Policy Learning in Long-Horizon Tasks
Hao Huang, Shuaihang Yuan, Geeta Chandra Raju Bethala et al.
UPRE: Zero-Shot Domain Adaptation for Object Detection via Unified Prompt and Representation Enhancement
Xiao Zhang, Fei Wei, Yong Wang et al.
Class Token as Proxy: Optimal Transport-assisted Proxy Learning for Weakly Supervised Semantic Segmentation
Jian Wang, Tianhong Dai, Bingfeng Zhang et al.
Boundary Probing for Input Privacy Protection When Using LMM Services
Xiaofei Hui, Haoxuan Qu, Ping Hu et al.
AllGCD: Leveraging All Unlabeled Data for Generalized Category Discovery
Xinzi Cao, Ke Chen, Feidiao Yang et al.
Towards Long-Horizon Vision-Language-Action System: Reasoning, Acting and Memory
Daixun Li, Yusi Zhang, Mingxiang Cao et al.
CA2C: A Prior-Knowledge-Free Approach for Robust Label Noise Learning via Asymmetric Co-learning and Co-training
Mengmeng Sheng, Zeren Sun, Tianfei Zhou et al.
Learnable Logit Adjustment for Imbalanced Semi-Supervised Learning under Class Distribution Mismatch
lee hyuck, Taemin Park, Heeyoung Kim
DiffRefine: Diffusion-based Proposal Specific Point Cloud Densification for Cross-Domain Object Detection
Sangyun Shin, Yuhang He, Xinyu Hou et al.
CARL: Causality-guided Architecture Representation Learning for an Interpretable Performance Predictor
Han Ji, Yuqi Feng, Jiahao Fan et al.
TCFG: Truncated Classifier-Free Guidance for Efficient and Scalable Text-to-Image Acceleration
Xiaomeng Fu, Jia Li
Point Cloud Self-supervised Learning via 3D to Multi-view Masked Learner
Zhimin Chen, Xuewei Chen, Xiao Guo et al.
MSA2: Multi-task Framework with Structure-aware and Style-adaptive Character Representation for Open-set Chinese Text Recognition
Yangfu Li, Hongjian Zhan, Qi Liu et al.
DiffPCI: Large Motion Point Cloud frame Interpolation with Diffusion Model
tianyu zhang, Haobo Jiang, jian Yang et al.
MultiModal Action Conditioned Video Simulation
Yichen Li, Antonio Torralba
Soft Local Completeness: Rethinking Completeness in XAI
Ziv Weiss Haddad, Oren Barkan, Yehonatan Elisha et al.