Most Cited 2025 "fixed point iteration" Papers
Pretend Benign: A Stealthy Adversarial Attack by Exploiting Vulnerabilities in Cooperative Perception
Hongwei Lin, Dongyu Pan, Qiming Xia et al.
What we need is explicit controllability: Training 3D gaze estimator using only facial images
Tingwei Li, Jun Bao, Zhenzhong Kuang et al.
SemiVisBooster: Boosting Semi-Supervised Learning for Fine-Grained Classification through Pseudo-Label Semantic Guidance
Wenjin Zhang, Xinyu Li, Chenyang Gao et al.
ViM-VQ: Efficient Post-Training Vector Quantization for Visual Mamba
Juncan Deng, Shuaiting Li, Zeyu Wang et al.
Enhancing Prompt Generation with Adaptive Refinement for Camouflaged Object Detection
Xuehan Chen, Guangyu Ren, Tianhong Dai et al.
Hypergraph Clustering Network with Partial Attribute Imputation
Qianqian Wang, Bowen Zhao, Zhengming Ding et al.
PhysAnimator: Physics-Guided Generative Cartoon Animation
Tianyi Xie, Yiwei Zhao, Ying Jiang et al.
SAMPLE: Semantic Alignment through Temporal-Adaptive Multimodal Prompt Learning for Event-Based Open-Vocabulary Action Recognition
Jing Wang, Rui Zhao, Ruiqin Xiong et al.
Learning Null Geodesics for Gravitational Lensing Rendering in General Relativity
Mingyuan Sun, Zheng Fang, Jiaxu Wang et al.
Object-centric Video Question Answering with Visual Grounding and Referring
Haochen Wang, Qirui Chen, Cilin Yan et al.
DSO: Aligning 3D Generators with Simulation Feedback for Physical Soundness
Ruining Li, Chuanxia Zheng, Christian Rupprecht et al.
EgoAgent: A Joint Predictive Agent Model in Egocentric Worlds
Lu Chen, Yizhou Wang, Shixiang Tang et al.
Unbiased Missing-modality Multimodal Learning
Ruiting Dai, Chenxi Li, Yandong Yan et al.
Hybrid-Tower: Fine-grained Pseudo-query Interaction and Generation for Text-to-Video Retrieval
Bangxiang Lan, Ruobing Xie, Ruixiang Zhao et al.
LIRA: Reasoning Reconstruction via Multimodal Large Language Models
Zhen Zhou, Tong Wang, Yunkai Ma et al.
MEH: A Multi-Style Dataset and Toolkit for Advancing Egyptian Hieroglyph Recognition
Maksim Golyadkin, Rubanova Alexandrovna, Aleksandr Utkov et al.
MaskSAM: Auto-prompt SAM with Mask Classification for Volumetric Medical Image Segmentation
Bin Xie, Hao Tang, Bin Duan et al.
Breaking Grid Constraints: Dynamic Graph Reconstruction Network for Multi-organ Segmentation
Junhao Xiao, Yang Wei, Jingyu Wang et al.
Synchronizing Task Behavior: Aligning Multiple Tasks during Test-Time Training
Wooseong Jeong, Jegyeong Cho, Youngho Yoon et al.
Learning an Implicit Physics Model for Image-based Fluid Simulation
Emily Jia, Jiageng Mao, Zhiyuan Gao et al.
Player-Centric Multimodal Prompt Generation for Large Language Model Based Identity-Aware Basketball Video Captioning
Zeyu Xi, Haoying Sun, Yaofei Wu et al.
Exploiting Frequency Dynamics for Enhanced Multimodal Event-based Action Recognition
Meiqi Cao, Xiangbo Shu, Xin Jiang et al.
Enrich and Detect: Video Temporal Grounding with Multimodal LLMs
Shraman Pramanick, Effrosyni Mavroudi, Yale Song et al.
Dual-Path Temporal Decoder for End-to-End Multi-Object Tracking
Hyunseop Kim, Juheon Jeong, Hanul Kim et al.
Region-aware Anchoring Mechanism for Efficient Referring Visual Grounding
Shuyi Ouyang, Ziwei Niu, Hongyi Wang et al.
Token-Efficient VLM: High-Resolution Image Understanding via Dynamic Region Proposal
Yitong Jiang, Jinwei Gu, Tianfan Xue et al.
How Far are AI-generated Videos from Simulating the 3D Visual World: A Learned 3D Evaluation Approach
Chirui Chang, Jiahui Liu, Zhengzhe Liu et al.
SIC: Similarity-Based Interpretable Image Classification with Neural Networks
Tom Nuno Wolf, Emre Kavak, Fabian Bongratz et al.
Teaching AI the Anatomy Behind the Scan: Addressing Anatomical Flaws in Medical Image Segmentation with Learnable Prior
Young Seok Jeon, Hongfei Yang, Huazhu Fu et al.
Scaling Tumor Segmentation: Best Lessons from Real and Synthetic Data
Qi Chen, Xinze Zhou, Chen Liu et al.
Open-Vocabulary HOI Detection with Interaction-aware Prompt and Concept Calibration
Ting Lei, Shaofeng Yin, Qingchao Chen et al.
First Attentions Last: Better Exploiting First Attentions for Efficient Parallel Training
Gyudong Kim, Hyukju Na, Jin Kim et al.
LawDIS: Language-Window-based Controllable Dichotomous Image Segmentation
Xinyu Yan, Meijun Sun, Ge-Peng Ji et al.
VideoMiner: Iteratively Grounding Key Frames of Hour-Long Videos via Tree-based Group Relative Policy Optimization
Xinye Cao, Hongcan Guo, Jiawen Qian et al.
WIPES: Wavelet-based Visual Primitives
Wenhao Zhang, Hao Zhu, Delong Wu et al.
MambaML: Exploring State Space Models for Multi-Label Image Classification
Xuelin Zhu, Jian Liu, Jiuxin Cao et al.
SSVQ: Unleashing the Potential of Vector Quantization with Sign-Splitting
Shuaiting Li, Juncan Deng, Chengxuan Wang et al.
Vision-Language Neural Graph Featurization for Extracting Retinal Lesions
Taimur Hassan, Anabia Sohail, Muzammal Naseer et al.
Flow-MIL: Constructing Highly-expressive Latent Feature Space For Whole Slide Image Classification Using Normalizing Flow
Yingfan Ma, Bohan An, Ao Shen et al.
What Moves the Eyes: Doubling Mechanistic Model Performance Using Deep Networks to Discover and Test Cognitive Hypotheses
Federico D'Agostino, Lisa Schwetlick, Matthias Bethge et al.
CoSMIC: Continual Self-supervised Learning for Multi-Domain Medical Imaging via Conditional Mutual Information Maximization
Yihang Liu, Ying Wen, Longzhen Yang et al.
Towards Robustness of Person Search against Corruptions
Woojung Son, Yoonki Cho, Guoyuan An et al.
VIPerson: Flexibly Generating Virtual Identity for Person Re-Identification
Xiao-Wen Zhang, Delong Zhang, Yi-Xing Peng et al.
SEAL: Semantic Aware Image Watermarking
Kasra Arabi, R. Teal Witter, Chinmay Hegde et al.
ArchiSet: Benchmarking Editable and Consistent Single-View 3D Reconstruction of Buildings with Specific Window-to-Wall Ratios
Jun Yin, Pengyu Zeng, Licheng Shen et al.
UINavBench: A Framework for Comprehensive Evaluation of Interactive Digital Agents
Harsh Agrawal, Eldon Schoop, Xinlei Pan et al.
Unsupervised Identification of Protein Compositions and Conformations via Implicit Content-Transformation Disentanglement
Mostofa Rafid Uddin, Jana Armouti, Min Xu
How Do Optical Flow and Textual Prompts Collaborate to Assist in Audio-Visual Semantic Segmentation?
Yujian Lee, Peng Gao, Yongqi Xu et al.
Splat-based 3D Scene Reconstruction with Extreme Motion-blur
Hyeonjoong Jang, Dongyoung Choi, Donggun Kim et al.
Diffusion Curriculum: Synthetic-to-Real Data Curriculum via Image-Guided Diffusion
Yijun Liang, Shweta Bhardwaj, Tianyi Zhou
Unsupervised Histopathological Image Semantic Segmentation with Overlapping Patches Consistency Constraint
Wentian Cai, Weizhao Weng, Zihao Huang et al.
VISO: Accelerating In-orbit Object Detection with Language-Guided Mask Learning and Sparse Inference
Meiqi Wang, Han Qiu
Advancing Textual Prompt Learning with Anchored Attributes
Zheng Li, Yibing Song, Ming-Ming Cheng et al.
FIND: Few-Shot Anomaly Inspection with Normal-Only Multi-Modal Data
Yiting Li, Fayao Liu, Jingyi Liao et al.
AR-1-to-3: Single Image to Consistent 3D Object via Next-View Prediction
Xuying Zhang, Yupeng Zhou, Kai Wang et al.
DC-TTA: Divide-and-Conquer Framework for Test-Time Adaptation of Interactive Segmentation
Jihun Kim, Hoyong Kwon, Hyeokjun Kweon et al.
Dual-Rate Dynamic Teacher for Source-Free Domain Adaptive Object Detection
Qi He, Xiao Wu, Jun-Yan He et al.
From Trial to Triumph: Advancing Long Video Understanding via Visual Context Sample Scaling and Self-reward Alignment
Yucheng Suo, Fan Ma, Linchao Zhu et al.
OV3D-CG: Open-vocabulary 3D Instance Segmentation with Contextual Guidance
Mingquan Zhou, Chen He, Ruiping Wang et al.
ReferEverything: Towards Segmenting Everything We Can Speak of in Videos
Anurag Bagchi, Zhipeng Bao, Yu-Xiong Wang et al.
MotionBind: Multi-Modal Human Motion Alignment for Retrieval, Recognition, and Generation
Kaleab Kinfu, Rene Vidal
Rethinking Discrete Tokens: Treating Them as Conditions for Continuous Autoregressive Image Synthesis
Peng Zheng, Junke Wang, Yi Chang et al.
Accelerate 3D Object Detection Models via Zero-Shot Attention Key Pruning
Lizhen Xu, Xiuxiu Bai, Xiaojun Jia et al.
CogCM: Cognition-Inspired Contextual Modeling for Audio-Visual Speech Enhancement
Feixiang Wang, Shuang Yang, Shiguang Shan et al.
Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition
Zhisheng Zhong, Chengyao Wang, Yuqi Liu et al.
CalliReader: Contextualizing Chinese Calligraphy via an Embedding-Aligned Vision-Language Model
Yuxuan Luo, Jiaqi Tang, Chenyi Huang et al.
EDFFDNet: Towards Accurate and Efficient Unsupervised Multi-Grid Image Registration
Haokai Zhu, Bo Qu, Si-Yuan Cao et al.
Enhancing Mamba Decoder with Bidirectional Interaction in Multi-Task Dense Prediction
Mang Cao, Sanping Zhou, Yizhe Li et al.
Leveraging Debiased Cross-modal Attention Maps and Code-based Reasoning for Zero-shot Referring Expression Comprehension
Juntao Chen, Wen Shen, Zhihua Wei et al.
UST-SSM: Unified Spatio-Temporal State Space Models for Point Cloud Video Modeling
Peiming Li, Ziyi Wang, Yulin Yuan et al.
SITE: towards Spatial Intelligence Thorough Evaluation
Wenqi Wang, Reuben Tan, Pengyue Zhu et al.
SHIFT: Smoothing Hallucinations by Information Flow Tuning for Multimodal Large Language Models
Sudong Wang, Yunjian Zhang, Yao Zhu et al.
Similarity Memory Prior is All You Need for Medical Image Segmentation
Hao Tang, Zhiqing Guo, Liejun Wang et al.
ODDR: Outlier Detection & Dimension Reduction Based Defense Against Adversarial Patches
Nandish Chattopadhyay, Amira Guesmi, Muhammad Abdullah Hanif et al.
Debiasing Trace Guidance: Top-down Trace Distillation and Bottom-up Velocity Alignment for Unsupervised Anomaly Detection
Xingjian Wang, Li Chai, Jiming Chen
Neural Collapse under Gradient Flow on Shallow ReLU Networks for Orthogonally Separable Data
Hancheng Min, Zhihui Zhu, Rene Vidal
Automated Red Teaming for Text-to-Image Models through Feedback-Guided Prompt Iteration with Vision-Language Models
Wei Xu, Kangjie Chen, Jiawei Qiu et al.
Conformal Prediction for Zero-Shot Models
Julio Silva-Rodríguez, Ismail Ben Ayed, Jose Dolz
Enhancing Spatial Reasoning in Multimodal Large Language Models through Reasoning-based Segmentation
Zhenhua Ning, Zhuotao Tian, Shaoshuai Shi et al.
OVG-HQ: Online Video Grounding with Hybrid-modal Queries
Runhao Zeng, Jiaqi Mao, Minghao Lai et al.
Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring
Yufei Zhan, Shurong Zheng, Yousong Zhu et al.
BézierGS: Dynamic Urban Scene Reconstruction with Bézier Curve Gaussian Splatting
Zipei Ma, Junzhe Jiang, Yurui Chen et al.
Convergence Rates for Gradient Descent on the Edge of Stability for Overparametrised Least Squares
Lachlan MacDonald, Hancheng Min, Leandro Palma et al.
CLIPSym: Delving into Symmetry Detection with CLIP
Tinghan Yang, Md Ashiqur Rahman, Raymond A. Yeh
HRScene: How Far Are VLMs from Effective High-Resolution Image Understanding?
Yusen Zhang, Wenliang Zheng, Aashrith Madasu et al.
HERMES: temporal-coHERent long-forM understanding with Episodes and Semantics
Gueter Josmy Faure, Jia-Fong Yeh, Min-Hung Chen et al.
Text2VDM: Text to Vector Displacement Maps for Expressive and Interactive 3D Sculpting
Hengyu Meng, Duotun Wang, Zhijing Shao et al.
LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models
Yuzhang Shang, Mu Cai, Bingxin Xu et al.
Towards Comprehensive Lecture Slides Understanding: Large-scale Dataset and Effective Method
Enming Zhang, Yuzhe Li, Yuliang Liu et al.
A Unified Interpretation of Training-Time Out-of-Distribution Detection
Xu Cheng, Xin Jiang, Zechao Li
SECA: Semantically Equivalent and Coherent Attacks for Eliciting LLM Hallucinations
Buyun Liang, Liangzu Peng, Jinqi Luo et al.
Federated Domain Generalization with Domain-specific Soft Prompts Generation
Jianhan Wu, Xiaoyang Qu, Zhangcheng Huang et al.
Removing Out-of-Focus Reflective Flares via Color Alignment
Fengbo Lan, Chang Wen Chen
ForgeLens: Data-Efficient Forgery Focus for Generalizable Forgery Image Detection
Yingjian Chen, Lei Zhang, Yakun Niu
Feather the Throttle: Revisiting Visual Token Pruning for Vision-Language Model Acceleration
Mark Endo, Xiaohan Wang, Serena Yeung-Levy
Attention on the Sphere
Boris Bonev, Max Rietmann, Andrea Paris et al.
Mamba-3VL: Taming State Space Model for 3D Vision Language Learning
Yuan Wang, Yuxin Chen, Zhongang Qi et al.
Embodied Representation Alignment with Mirror Neurons
Wentao Zhu, Zhining Zhang, Yuwei Ren et al.
DIH-CLIP: Unleashing the Diversity of Multi-Head Self-Attention for Training-Free Open-Vocabulary Semantic Segmentation
Songsong Duan, Xi Yang, Nannan Wang
Selective Contrastive Learning for Weakly Supervised Affordance Grounding
WonJun Moon, Hyun Seok Seong, Jae-Pil Heo
DASH: Detection and Assessment of Systematic Hallucinations of VLMs
Maximilian Augustin, Yannic Neuhaus, Matthias Hein
M2EIT: Multi-Domain Mixture of Experts for Robust Neural Inertial Tracking
Yan Li, Yang Xu, Changhao Chen et al.
MobileViCLIP: An Efficient Video-Text Model for Mobile Devices
Min Yang, Zihan Jia, Zhilin Dai et al.
Real-time Free-view Human Rendering from Sparse-view RGB Videos using Double Unprojected Textures
Guoxing Sun, Rishabh Dabral, Heming Zhu et al.
No More Sibling Rivalry: Debiasing Human-Object Interaction Detection
Bin Yang, Yulin Zhang, Hong-Yu Zhou et al.
Memory-Efficient 4-bit Preconditioned Stochastic Optimization
Jingyang Li, Kuangyu Ding, Kim-chuan Toh et al.
Prompt-driven Transferable Adversarial Attack on Person Re-Identification with Attribute-aware Textual Inversion
Yuan Bian, Min Liu, Yunqi Yi et al.
EVOLVE: Event-Guided Deformable Feature Transfer and Dual-Memory Refinement for Low-Light Video Object Segmentation
Jong Hyeon Baek, Jiwon Oh, Yeong Jun Koh
MATE: Motion-Augmented Temporal Consistency for Event-based Point Tracking
Han Han, Wei Zhai, Yang Cao et al.
Asynchronous Event Error-Minimizing Noise for Safeguarding Event Dataset
Ruofei Wang, Peiqi Duan, Boxin Shi et al.
AG2aussian: Anchor-Graph Structured Gaussian Splatting for Instance-Level 3D Scene Understanding and Editing
Zhaonan Wang, Manyi Li, Changhe Tu
Vector Contrastive Learning For Pixel-Wise Pretraining In Medical Vision
Yuting He, Shuo Li
InterGSEdit: Interactive 3D Gaussian Splatting Editing with 3D Geometry-Consistent Attention Prior
Minghao Wen, Shengjie Wu, Kangkan Wang et al.
Anomaly Detection of Integrated Circuits Package Substrates Using the Large Vision Model SAIC: Dataset Construction, Methodology, and Application
Ruiyun Yu, Bingyang Guo, Haoyuan Li
Learnable Retrieval Enhanced Visual-Text Alignment and Fusion for Radiology Report Generation
Qin Zhou, Guoyan Liang, Xindi Li et al.
Benchmarking Multimodal Large Language Models Against Image Corruptions
Xinkuan Qiu, Meina Kan, Yongbin Zhou et al.
Temporal-aware Query Routing for Real-time Video Instance Segmentation
Zesen Cheng, Kehan Li, Yian Zhao et al.
scGeneScope: A Treatment-Matched Single Cell Imaging and Transcriptomics Dataset and Benchmark for Treatment Response Modeling
Joel Dapello, Marcel Nassar, Ridvan Eksi et al.
Weak-to-Strong Generalization under Distribution Shifts
Myeongho Jeon, Jan Sobotka, Suhwan Choi et al.
RvLLM: LLM Runtime Verification with Domain Knowledge
Yedi Zhang, Sun Emma, Annabelle En et al.
Dynamic Dictionary Learning for Remote Sensing Image Segmentation
Xuechao Zou, Yue Li, Shun Zhang et al.
Efficient Fine-Tuning of Large Models via Nested Low-Rank Adaptation
Lujun Li, Cheng Lin, Dezhi Li et al.
Dual-level Prototype Learning for Composite Degraded Image Restoration
Zhongze Wang, Haitao Zhao, Lujian Yao et al.
HQ-CLIP: Leveraging Large Vision-Language Models to Create High-Quality Image-Text Datasets and CLIP Models
Zhixiang Wei, Guangting Wang, Xiaoxiao Ma et al.
UNOPose: Unseen Object Pose Estimation with an Unposed RGB-D Reference Image
Xingyu Liu, Gu Wang, Ruida Zhang et al.
Worse than Zero-shot? A Fact-Checking Dataset for Evaluating the Robustness of RAG Against Misleading Retrievals
Linda Zeng, Rithwik Gupta, Divij Motwani et al.
Deterministic Object Pose Confidence Region Estimation
Jinghao Wang, Zhang Li, Zi Wang et al.
Is CLIP ideal? No. Can we fix it? Yes!
Raphaela Kang, Yue Song, Georgia Gkioxari et al.
Learning Beyond Still Frames: Scaling Vision-Language Models with Video
Yiyuan Zhang, Handong Li, Jing Liu et al.
Efficient Input-level Backdoor Defense on Text-to-Image Synthesis via Neuron Activation Variation
Shengfang Zhai, Jiajun Li, Yue Liu et al.
Decoupled Multi-Predictor Optimization for Inference-Efficient Model Tuning
Liwei Luo, Shuaitengyuan Li, Dongwei Ren et al.
Revisiting Efficient Semantic Segmentation: Learning Offsets for Better Spatial and Class Feature Alignment
Shi-Chen Zhang, Yunheng Li, Yu-Huan Wu et al.
ACAM-KD: Adaptive and Cooperative Attention Masking for Knowledge Distillation
Qizhen Lan, Qing Tian
GReg: Geometry-Aware Region Refinement for Sign Language Video Generation
Tongkai Shi, Lianyu Hu, Fanhua Shang et al.
AuraFusion360: Augmented Unseen Region Alignment for Reference-based 360° Unbounded Scene Inpainting
Chung-Ho Wu, Yang-Jung Chen, Ying-Huan Chen et al.
Unsupervised Part Discovery via Descriptor-Based Masked Image Restoration with Optimized Constraints
Jiahao Xia, Yike Wu, Wenjian Huang et al.
NETracer: A Topology-Aware Iterative Tracing Approach for Tubular Structure Extraction
Chao Liu, Yangbo Jiang, Nenggan Zheng
Interpretable point cloud classification using multiple instance learning
Matt De Vries, Reed Naidoo, Olga Fourkioti et al.
MotionCtrl: A Real-time Controllable Vision-Language-Motion Model
Bin Cao, Sipeng Zheng, Ye Wang et al.
UIPro: Unleashing Superior Interaction Capability For GUI Agents
Hongxin Li, Jingran Su, Jingfan Chen et al.
SALAD -- Semantics-Aware Logical Anomaly Detection
Matic Fučka, Vitjan Zavrtanik, Danijel Skočaj
FineMotion: A Dataset and Benchmark with both Spatial and Temporal Annotation for Fine-grained Motion Generation and Editing
Bizhu Wu, Jinheng Xie, Meidan Ding et al.
Controllable Latent Space Augmentation for Digital Pathology
Sofiène Boutaj, Marin Scalbert, Pierre Marza et al.
Advancing Visual Large Language Model for Multi-granular Versatile Perception
Wentao Xiang, Haoxian Tan, Cong Wei et al.
Beyond Single Images: Retrieval Self-Augmented Unsupervised Camouflaged Object Detection
Ji Du, Xin Wang, Fangwei Hao et al.
Motion4D: Learning 3D-Consistent Motion and Semantics for 4D Scene Understanding
Haoran Zhou, Gim Hee Lee
Modeling Saliency Dataset Bias
Matthias Kümmerer, Harneet Singh Khanuja, Matthias Bethge
VLR-Driver: Large Vision-Language-Reasoning Models for Embodied Autonomous Driving
Fanjie Kong, Yitong Li, Weihuang Chen et al.
Vid-Group: Temporal Video Grounding Pretraining from Unlabeled Videos in the Wild
Peijun Bao, Chenqi Kong, Siyuan Yang et al.
CARIM: Caption-Based Autonomous Driving Scene Retrieval via Inclusive Text Matching
Minjoo Ki, Dae Jung Kim, Kisung Kim et al.
Knowledge Transfer from Interaction Learning
Yilin Gao, Kangyi Chen, Zhongxing Peng et al.
WIR3D: Visually-Informed and Geometry-Aware 3D Shape Abstraction
Richard Liu, Daniel Fu, Noah Tan et al.
Temperature in Cosine-based Softmax Loss
Takumi Kobayashi
Pseudo-Riemannian Graph Transformer
Viet Quan Le, Cuong Viet Ta
Multi-modal Segment Anything Model for Camouflaged Scene Segmentation
Guangyu Ren, Hengyan Liu, Michalis Lazarou et al.
WeaveSeg: Iterative Contrast-weaving and Spectral Feature-refining for Nuclei Instance Segmentation
Jiajia Li, Huisi Wu, Jing Qin
DisTime: Distribution-based Time Representation for Video Large Language Models
Yingsen Zeng, Zepeng Huang, Yujie Zhong et al.
Synthesizing Near-Boundary OOD Samples for Out-of-Distribution Detection
Jinglun Li, Kaixun Jiang, Zhaoyu Chen et al.
Cassic: Towards Content-Adaptive State-Space Models for Learned Image Compression
Shiyu Qin, Jinpeng Wang, Yimin Zhou et al.
SpectralAR: Spectral Autoregressive Visual Generation
Yuanhui Huang, Weiliang Chen, Wenzhao Zheng et al.
Bridging the Gap between Brain and Machine in Interpreting Visual Semantics: Towards Self-adaptive Brain-to-Text Decoding
Jiaxuan Chen, Yu Qi, Yueming Wang et al.
Boosting Adversarial Transferability via Negative Hessian Trace Regularization
Yunfei Long, Zilin Tian, Liguo Zhang et al.
AcZeroTS: Active Learning for Zero-shot Tissue Segmentation in Pathology Images
Jiao Tang, Junjie Zhou, Bo Qian et al.
OneGT: One-Shot Geometry-Texture Neural Rendering for Head Avatars
Jinshu Chen, Bingchuan Li, Fan Zhang et al.
Transformer Key-Value Memories Are Nearly as Interpretable as Sparse Autoencoders
Mengyu Ye, Jun Suzuki, Tatsuro Inaba et al.
Unsupervised Visible-Infrared Person Re-identification under Unpaired Settings
Haoyu Yao, Bin Yang, Wenke Huang et al.
Adaptive Prompt Learning via Gaussian Outlier Synthesis for Out-of-distribution Detection
Yongkang Zhang, Dongyu She, Zhong Zhou
Auto-Controlled Image Perception in MLLMs via Visual Perception Tokens
Runpeng Yu, Xinyin Ma, Xinchao Wang
Can We Achieve Efficient Diffusion Without Self-Attention? Distilling Self-Attention into Convolutions
ZiYi Dong, Chengxing Zhou, Weijian Deng et al.
Ultra-Precision 6DoF Pose Estimation Using 2-D Interpolated Discrete Fourier Transform
Guowei Shi, Zian Mao, Peisen Huang
Prototypes are Balanced Units for Efficient and Effective Partially Relevant Video Retrieval
WonJun Moon, Cheol-Ho Cho, Woojin Jun et al.
On the sample complexity of semi-supervised multi-objective learning
Tobias Wegel, Geelon So, Junhyung Park et al.
A Differentiable Wave Optics Model for End-to-End Computational Imaging System Optimization
Chi-Jui Ho, Yash Belhe, Steve Rotenberg et al.
Exploring Probabilistic Modeling Beyond Domain Generalization for Semantic Segmentation
I-Hsiang Chen, Hua-En Chang, Wei-Ting Chen et al.
AMDANet: Attention-Driven Multi-Perspective Discrepancy Alignment for RGB-Infrared Image Fusion and Segmentation
Haifeng Zhong, Fan Tang, Zhuo Chen et al.
DisCo: Towards Distinct and Coherent Visual Encapsulation in Video MLLMs
Jiahe Zhao, Rongkun Zheng, Yi Wang et al.
OCK: Unsupervised Dynamic Video Prediction with Object-Centric Kinematics
YeonJi Song, Jaein Kim, Suhyung Choi et al.
Information-theoretic Generalization Analysis for VQ-VAEs: A Role of Latent Variables
Futoshi Futami, Masahiro Fujisawa
Prompt Guidance and Human Proximal Perception for HOT Prediction with Regional Joint Loss
Yuxiao Wang, Yu Lei, Zhenao Wei et al.
RA-BUSSeg: Relation-aware Semi-supervised Breast Ultrasound Image Segmentation via Adjacent Propagation and Cross-layer Alignment
Wanting Zhang, Zhenhui Ding, Guilian Chen et al.
Hierarchical Event Memory for Accurate and Low-latency Online Video Temporal Grounding
Minghang Zheng, Yuxin Peng, Benyuan Sun et al.
Coupling the Generator with Teacher for Effective Data-Free Knowledge Distillation
Xu Chen, Yang Li, Yahong Han et al.
Towards a Universal Image Degradation Model via Content-Degradation Disentanglement
Wenbo Yang, Zhongling Wang, Zhou Wang
Intra-view and Inter-view Correlation Guided Multi-view Novel Class Discovery
Xinhang Wan, Jiyuan Liu, Qian Qu et al.
HUST: High-Fidelity Unbiased Skin Tone Estimation via Texture Quantization
Zimin Ran, Xingyu Ren, Xiang An et al.
Few-Shot Pattern Detection via Template Matching and Regression
Eunchan Jo, Dahyun Kang, Sanghyun Kim et al.
Know Your Attention Maps: Class-specific Token Masking for Weakly Supervised Semantic Segmentation
Joëlle Hanna, Damian Borth
Contextual Dynamic Pricing with Heterogeneous Buyers
Thodoris Lykouris, Sloan Nietert, Princewill Okoroafor et al.
Structure-Guided Diffusion Models for High-Fidelity Portrait Shadow Removal
Wanchang Yu, Qing Zhang, Rongjia Zheng et al.
FreeDNA: Endowing Domain Adaptation of Diffusion-Based Dense Prediction with Training-Free Domain Noise Alignment
Hang Xu, Jie Huang, Linjiang Huang et al.
ProbMED: A Probabilistic Framework for Medical Multimodal Binding
Yuan Gao, Sangwook Kim, Jianzhong You et al.
DecAD: Decoupling Anomalies in Latent Space for Multi-Class Unsupervised Anomaly Detection
Xiaolei Wang, Xiaoyang Wang, Huihui Bai et al.
Unraveling the Smoothness Properties of Diffusion Models: A Gaussian Mixture Perspective
Yingyu Liang, Zhizhou Sha, Zhenmei Shi et al.
Provably Efficient RL under Episode-Wise Safety in Constrained MDPs with Linear Function Approximation
Toshinori Kitamura, Arnob Ghosh, Tadashi Kozuno et al.
MedSegFactory: Text-Guided Generation of Medical Image-Mask Pairs
Jiawei Mao, Yuhan Wang, Yucheng Tang et al.
FDPT: Federated Discrete Prompt Tuning for Black-Box Visual-Language Models
Jiaqi Wu, Simin Chen, Jing Tang et al.
STDDNet: Harnessing Mamba for Video Polyp Segmentation via Spatial-aligned Temporal Modeling and Discriminative Dynamic Representation Learning
Guilian Chen, Huisi Wu, Jing Qin
CATP-LLM: Empowering Large Language Models for Cost-Aware Tool Planning
Duo Wu, Jinghe Wang, Yuan Meng et al.
Dynamic Group Detection using VLM-augmented Temporal Groupness Graph
Kaname Yokoyama, Chihiro Nakatani, Norimichi Ukita
Cracking Instance Jigsaw Puzzles: A Superior Alternative to Multiple Instance Learning for Whole Slide Image Analysis
Xiwen Chen, Peijie Qiu, Wenhui Zhu et al.