Most Cited 2025 "schrödinger bridge framework" Papers
22,274 papers found
Cracking Instance Jigsaw Puzzles: A Superior Alternative to Multiple Instance Learning for Whole Slide Image Analysis
Xiwen Chen, Peijie Qiu, Wenhui Zhu et al.
A Tiny Change, A Giant Leap: Long-Tailed Class-Incremental Learning via Geometric Prototype Alignment
xinyi lai, Luojun Lin, Weijie Chen et al.
CountSE: Soft Exemplar Open-set Object Counting
Shuai Liu, Peng Zhang, Shiwei Zhang et al.
Sparfels: Fast Reconstruction from Sparse Unposed Imagery
Shubhendu Jena, Amine Ouasfi, Mae Younes et al.
Bias-Resilient Weakly Supervised Semantic Segmentation Using Normalizing Flows
Xianglin Qiu, Xiaoyang Wang, Zhen Zhang et al.
Text-guided Visual Prompt DINO for Generic Segmentation
Yuchen Guan, Chong Sun, Canmiao Fu et al.
GenieBlue: Integrating both Linguistic and Multimodal Capabilities for Large Language Models on Mobile Devices
Xudong LU, Yinghao Chen, Renshou Wu et al.
MedVSR: Medical Video Super-Resolution with Cross State-Space Propagation
Xinyu Liu, Guolei Sun, Cheng Wang et al.
FE-CLIP: Frequency Enhanced CLIP Model for Zero-Shot Anomaly Detection and Segmentation
Tao Gong, Qi Chu, Bin Liu et al.
Top2Pano: Learning to Generate Indoor Panoramas from Top-Down View
Zitong Zhang, Suranjan Gautam, Rui Yu
SAMora: Enhancing SAM through Hierarchical Self-Supervised Pre-Training for Medical Images
Shuhang Chen, Hangjie Yuan, Pengwei Liu et al.
Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers
Weiming Ren, Wentao Ma, Huan Yang et al.
MuGS: Multi-Baseline Generalizable Gaussian Splatting Reconstruction
Yaopeng Lou, Liao Shen, Tianqi Liu et al.
Region-Level Data Attribution for Text-to-Image Generative Models
Trong Bang Nguyen, Phi Le Nguyen, Simon Lucey et al.
Trans-Adapter: A Plug-and-Play Framework for Transparent Image Inpainting
Yuekun Dai, Haitian Li, Shangchen Zhou et al.
4D-Bench: Benchmarking Multi-modal Large Language Models for 4D Object Understanding
Wenxuan Zhu, Bing Li, Cheng Zheng et al.
Multi-View Slot Attention Using Paraphrased Texts for Face Anti-Spoofing
Jeongmin Yu, Susang Kim, Kisu Lee et al.
Generalization-Preserved Learning: Closing the Backdoor to Catastrophic Forgetting in Continual Deepfake Detection
Xueyi Zhang, Peiyin Zhu, Chengwei Zhang et al.
SegAnyPET: Universal Promptable Segmentation from Positron Emission Tomography Images
Yichi Zhang, Le Xue, Wenbo zhang et al.
LangBridge: Interpreting Image as a Combination of Language Embeddings
Jiaqi Liao, Yuwei Niu, Fanqing Meng et al.
IGD: Instructional Graphic Design with Multimodal Layer Generation
Yadong Qu, Shancheng Fang, Yuxin Wang et al.
Robustifying Zero-Shot Vision Language Models by Subspaces Alignment
Junhao Dong, Piotr Koniusz, Liaoyuan Feng et al.
Exploration via Feature Perturbation in Contextual Bandits
Seouh-won Yi, Min-hwan Oh
Any Large Language Model Can Be a Reliable Judge: Debiasing with a Reasoning-based Bias Detector
Haoyan Yang, Runxue Bao, Cao (Danica) Xiao et al.
CABLD: Contrast-Agnostic Brain Landmark Detection with Consistency-Based Regularization
Soorena Salari, Arash Harirpoush, Hassan Rivaz et al.
The Devil is in the Spurious Correlations: Boosting Moment Retrieval with Dynamic Learning
Xinyang Zhou, Fanyue Wei, Lixin Duan et al.
Parameter-Efficient Adaptation of Geospatial Foundation Models through Embedding Deflection
Romain Thoreau, Valerio Marsocci, Dawa Derksen
On the Recovery of Cameras from Fundamental Matrices
Rakshith Madhavan, Federica Arrigoni
RhythmGuassian: Repurposing Generalizable Gaussian Model For Remote Physiological Measurement
Hao LU, Yuting Zhang, Jiaqi Tang et al.
Superpowering Open-Vocabulary Object Detectors for X-ray Vision
Pablo Garcia-Fernandez, Lorenzo Vaquero, Mingxuan Liu et al.
Generating Physically Sound Designs from Text and a Set of Physical Constraints
Gregory Barber, Todd Henry, Mulugeta Haile
True Impact of Cascade Length in Contextual Cascading Bandits
Hyun-jun Choi, Joongkyu Lee, Min-hwan Oh
CityGS-X: A Scalable Architecture for Efficient and Geometrically Accurate Large-Scale Scene Reconstruction
Yuanyuan Gao, Hao Li, Jiaqi Chen et al.
AIRA: Activation-Informed Low-Rank Adaptation for Large Models
Lujun Li, Dezhi Li, Cheng Lin et al.
Thompson Sampling for Multi-Objective Linear Contextual Bandit
Somangchan Park, Heesang Ann, Min-hwan Oh
Embodied Navigation with Auxiliary Task of Action Description Prediction
Haru Kondoh, Asako Kanezaki
Bayesian Optimization with Preference Exploration using a Monotonic Neural Network Ensemble
Hanyang Wang, Juergen Branke, Matthias Poloczek
Geo4D: Leveraging Video Generators for Geometric 4D Scene Reconstruction
Zeren Jiang, Chuanxia Zheng, Iro Laina et al.
Cross-View Isolated Sign Language Recognition via View Synthesis and Feature Disentanglement
Xin Shen, Xinyu Wang, Lei Shen et al.
Towards Video Thinking Test: A Holistic Benchmark for Advanced Video Reasoning and Understanding
Yuanhan Zhang, Yunice Chew, Yuhao Dong et al.
Semantic versus Identity: A Divide-and-Conquer Approach towards Adjustable Medical Image De-Identification
Yuan Tian, Shuo Wang, Rongzhao Zhang et al.
Face Retouching with Diffusion Data Generation and Spectral Restorement
Zhidan Xu, Xiaoqin Zhang, Shijian Lu
Fuzzy Contrastive Decoding to Alleviate Object Hallucination in Large Vision-Language Models
Jieun Kim, Jinmyeong Kim, Yoonji Kim et al.
Accident Anticipation via Temporal Occurrence Prediction
Tianhao Zhao, Yiyang Zou, Zihao Mao et al.
Zero-Shot Compositional Video Learning with Coding Rate Reduction
Heeseok Jung, Jun-Hyeon Bak, Yujin Jeong et al.
Att-Adapter: A Robust and Precise Domain-Specific Multi-Attributes T2I Diffusion Adapter via Conditional Variational Autoencoder
Wonwoong Cho, Yan-Ying Chen, Matthew Klenk et al.
LaCoOT: Layer Collapse through Optimal Transport
Victor Quétu, Zhu LIAO, Nour Hezbri et al.
Neural Solver of Dichromatic Reflection Model for Specular Highlight Removal
Gang Fu
Wavelet Policy: Lifting Scheme for Policy Learning in Long-Horizon Tasks
Hao Huang, Shuaihang Yuan, Geeta Chandra Raju Bethala et al.
FlowMixer: A Depth-Agnostic Neural Architecture for Interpretable Spatiotemporal Forecasting
Fares Mehouachi, Saif Eddin Jabari
ProSAM: Enhancing the Robustness of SAM-based Visual Reference Segmentation with Probabilistic Prompts
Xiaoqi Wang, Clint Sebastian, Wenbin He et al.
ZipVL: Accelerating Vision-Language Models through Dynamic Token Sparsity
Yefei He, Feng Chen, Jing Liu et al.
Explore In-Context Message Passing Operator for Graph Neural Networks in A Mean Field Game
Tingting Dan, Xinwei Huang, Won Hwa Kim et al.
Contrastive Flow Matching
George Stoica, Vivek Ramanujan, Xiang Fan et al.
Class Token as Proxy: Optimal Transport-assisted Proxy Learning for Weakly Supervised Semantic Segmentation
Jian Wang, Tianhong Dai, Bingfeng Zhang et al.
Topology-aware Graph Diffusion Model with Persistent Homology
Joonhyuk Park, Donghyun Lee, Yujee Song et al.
Representation Shift: Unifying Token Compression with FlashAttention
Joonmyung Choi, Sanghyeok Lee, Byungoh Ko et al.
HOLa: Zero-Shot HOI Detection with Low-Rank Decomposed VLM Feature Adaptation
Qinqian Lei, Bo Wang, Robby Tan
V^2Dial: Unification of Video and Visual Dialog via Multimodal Experts
Adnen Abdessaied, Anna Rohrbach, Marcus Rohrbach et al.
AllGCD: Leveraging All Unlabeled Data for Generalized Category Discovery
Xinzi Cao, Ke Chen, Feidiao Yang et al.
Towards Long-Horizon Vision-Language-Action System: Reasoning, Acting and Memory
Daixun Li, Yusi Zhang, Mingxiang Cao et al.
UniFuse: A Unified All-in-One Framework for Multi-Modal Medical Image Fusion Under Diverse Degradations and Misalignments
Dayong Su, Yafei Zhang, Huafeng Li et al.
3DGS-LM: Faster Gaussian-Splatting Optimization with Levenberg-Marquardt
Lukas Höllein, Aljaz Bozic, Michael Zollhöfer et al.
Online Mixture of Experts: No-Regret Learning for Optimal Collective Decision-Making
Larkin Liu, Jalal Etesami
Contextual Integrity in LLMs via Reasoning and Reinforcement Learning
Guangchen (Eric) Lan, Huseyin A. Inan, Sahar Abdelnabi et al.
GLEAM: Learning Generalizable Exploration Policy for Active Mapping in Complex 3D Indoor Scene
Xiao Chen, Tai Wang, Quanyi Li et al.
CopyrightShield: Enhancing Diffusion Model Security Against Copyright Infringement Attacks
Zhixiang Guo, Siyuan Liang, Aishan Liu et al.
CA2C: A Prior-Knowledge-Free Approach for Robust Label Noise Learning via Asymmetric Co-learning and Co-training
Mengmeng Sheng, Zeren Sun, Tianfei Zhou et al.
Learnable Logit Adjustment for Imbalanced Semi-Supervised Learning under Class Distribution Mismatch
lee hyuck, Taemin Park, Heeyoung Kim
Mind the Gap: Aligning Vision Foundation Models to Image Feature Matching
Yuhan Liu, Jingwen Fu, Yang Wu et al.
Seeing the Abstract: Translating the Abstract Language for Vision Language Models
Davide Talon, Federico Girella, Ziyue Liu et al.
CARL: Causality-guided Architecture Representation Learning for an Interpretable Performance Predictor
Han Ji, Yuqi Feng, Jiahao Fan et al.
SPRO: Improving Image Generation via Self-Play
Ritika Jha, Aanisha Bhattacharyya, Yaman Singla et al.
TCFG: Truncated Classifier-Free Guidance for Efficient and Scalable Text-to-Image Acceleration
Xiaomeng Fu, Jia Li
Point Cloud Self-supervised Learning via 3D to Multi-view Masked Learner
Zhimin Chen, Xuewei Chen, Xiao Guo et al.
OPHR: Mastering Volatility Trading with Multi-Agent Deep Reinforcement Learning
Zeting Chen, Xinyu Cai, Molei Qin et al.
MSA2: Multi-task Framework with Structure-aware and Style-adaptive Character Representation for Open-set Chinese Text Recognition
Yangfu Li, Hongjian Zhan, Qi Liu et al.
DiffPCI: Large Motion Point Cloud frame Interpolation with Diffusion Model
tianyu zhang, Haobo Jiang, jian Yang et al.
DiffPS: Leveraging Prior Knowledge of Diffusion Model for Person Search
Giyeol Kim, Sooyoung Yang, Jihyong Oh et al.
Feature Purification Matters: Suppressing Outlier Propagation for Training-Free Open-Vocabulary Semantic Segmentation
Shuo Jin, Siyue Yu, Bingfeng Zhang et al.
ROVI: A VLM-LLM Re-Captioned Dataset for Open-Vocabulary Instance-Grounded Text-to-Image Generation
Cihang Peng, Qiming HOU, Zhong Ren et al.
MultiModal Action Conditioned Video Simulation
Yichen Li, Antonio Torralba
Local Dense Logit Relations for Enhanced Knowledge Distillation
Liuchi Xu, Kang Liu, Jinshuai Liu et al.
FW-Merging: Scaling Model Merging with Frank-Wolfe Optimization
Hao Chen, Shell Xu Hu, Wayne Luk et al.
HIS-GPT: Towards 3D Human-In-Scene Multimodal Understanding
JIAHE ZHAO, RuiBing Hou, zejie tian et al.
SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition
Yongkun Du, Zhineng Chen, Hongtao Xie et al.
Soft Local Completeness: Rethinking Completeness in XAI
Ziv Weiss Haddad, Oren Barkan, Yehonatan Elisha et al.
ClearSight: Human Vision-Inspired Solutions for Event-Based Motion Deblurring
Xiaopeng LIN, Yulong Huang, Hongwei Ren et al.
PBFG: A New Physically-Based Dataset and Removal of Lens Flares and Glares
Jie Zhu, Sungkil Lee
Correspondence as Video: Test-Time Adaption on SAM2 for Reference Segmentation in the Wild
Haoran Wang, Zekun Li, Jian Zhang et al.
An Information-Theoretic Regularizer for Lossy Neural Image Compression
ZHANG YINGWEN, Meng Wang, Xihua Sheng et al.
Knowledge-Guided Part Segmentation
Xuejian Gou, Fang Liu, Licheng Jiao et al.
Controllable Feature Whitening for Hyperparameter-Free Bias Mitigation
Yooshin Cho, Hanbyel Cho, Janghyeon Lee et al.
Mind the Gap: Detecting Black-box Adversarial Attacks in the Making through Query Update Analysis
Jeonghwan Park, Niall McLaughlin, Ihsen Alouani
KV-Edit: Training-Free Image Editing for Precise Background Preservation
Tianrui Zhu, Shiyi Zhang, Jiawei Shao et al.
FusionPhys: A Flexible Framework for Fusing Complementary Sensing Modalities in Remote Physiological Measurement
Chenhang Ying, Huiyu Yang, Jieyi Ge et al.
DiffVSR: Revealing an Effective Recipe for Taming Robust Video Super-Resolution Against Complex Degradations
Xiaohui Li, Yihao Liu, Shuo Cao et al.
Moment Quantization for Video Temporal Grounding
Xiaolong Sun, Le Wang, Sanping Zhou et al.
Test-time Adaptation for Foundation Medical Segmentation Model Without Parametric Updates
Kecheng Chen, Xinyu Luo, Tiexin Qin et al.
Power of Cooperative Supervision: Multiple Teachers Framework for Advanced 3D Semi-Supervised Object Detection
Jin-Hee Lee, Jae-keun Lee, Jeseok Kim et al.
Adapting In-Domain Few-Shot Segmentation to New Domains without Source Domain Retraining
Qi Fan, Kaiqi Liu, Nian Liu et al.
ESCNet: Edge-Semantic Collaborative Network for Camouflaged Object Detection
Sheng Ye, Xin Chen, Yan Zhang et al.
ASGS: Single-Domain Generalizable Open-Set Object Detection via Adaptive Subgraph Searching
Yuxuan Yuan, Luyao Tang, Chaoqi Chen et al.
DADet: Safeguarding Image Conditional Diffusion Models against Adversarial and Backdoor Attacks via Diffusion Anomaly Detection
Hongwei Yu, Xinlong Ding, Jiawei Li et al.
ReferDINO: Referring Video Object Segmentation with Visual Grounding Foundations
Tianming Liang, Kun-Yu Lin, Chaolei Tan et al.
Multi-Schema Proximity Network for Composed Image Retrieval
Jiangming Shi, Xiangbo Yin, yeyunchen yeyunchen et al.
LEGO-Maker: A Semantic-Driven Algorithm for Text-to-3D Generation
Yifei Zhang, Lei Chen
CNS-Bench: Benchmarking Image Classifier Robustness Under Continuous Nuisance Shifts
Olaf Dünkel, Artur Jesslen, Jiahao Xie et al.
COVTrack: Continuous Open-Vocabulary Tracking via Adaptive Multi-Cue Fusion
Zekun Qian, Ruize Han, Zhixiang Wang et al.
Dense Policy: Bidirectional Autoregressive Learning of Actions
Yue Su, Xinyu Zhan, Hongjie Fang et al.
monoVLN: Bridging the Observation Gap between Monocular and Panoramic Vision and Language Navigation
Ren-Jie Lu, Yu Zhou, hao cheng et al.
Graph Domain Adaptation with Dual-branch Encoder and Two-level Alignment for Whole Slide Image-based Survival Prediction
Yuntao Shou, Xiangyong Cao, PeiqiangYan PeiqiangYan et al.
DOGR: Towards Versatile Visual Document Grounding and Referring
Yinan Zhou, Yuxin Chen, Haokun Lin et al.
ReME: A Data-Centric Framework for Training-Free Open-Vocabulary Segmentation
Xiwei Xuan, Ziquan Deng, Kwan-Liu Ma
MonoMobility: Zero-Shot 3D Mobility Analysis from Monocular Videos
Hongyi Zhou, Xiaogang Wang, Yulan Guo et al.
The Burden of Interactive Alignment with Inconsistent Preferences
Ali Shirali
Performing Defocus Deblurring by Modeling its Formation Process
Zhengbo Zhang, Lin Geng Foo, Hossein Rahmani et al.
CasP: Improving Semi-Dense Feature Matching Pipeline Leveraging Cascaded Correspondence Priors for Guidance
Peiqi Chen, Lei Yu, Yi Wan et al.
Supervised Exploratory Learning for Long-Tailed Visual Recognition
Zhongquan Jian, Yanhao Chen, Wangyancheng Wangyancheng et al.
Collective Counterfactual Explanations: Balancing Individual Goals and Collective Dynamics
Ahmad-Reza Ehyaei, Ali Shirali, Samira Samadi
An Efficient Hybrid Vision Transformer for TinyML Applications
Fanhong Zeng, Huanan LI, Juntao Guan et al.
MMAIF: Multi-task and Multi-degradation All-in-One for Image Fusion with Language Guidance
Zihan Cao, Yu Zhong, Ziqi Wang et al.
Blind Video Super-Resolution based on Implicit Kernels
Qiang Zhu, Yuxuan Jiang, Shuyuan Zhu et al.
OmniDiff: A Comprehensive Benchmark for Fine-grained Image Difference Captioning
Yuan Liu, Saihui Hou, Saijie Hou et al.
Toward Long-Tailed Online Anomaly Detection through Class-Agnostic Concepts
Chiao-An Yang, Kuan-Chuan Peng, Raymond A. Yeh
GauCho: Gaussian Distributions with Cholesky Decomposition for Oriented Object Detection
Jeffri Erwin Murrugarra Llerena, José Henrique Marques, Claudio Jung
More Reliable Pseudo-labels, Better Performance: A Generalized Approach to Single Positive Multi-label Learning
Luong Tran, Thieu Vo, Anh Nguyen et al.
Self supervised learning for in vivo localization of microelectrode arrays using raw local field potential
Tianxiao He, Malhar Patel, Chenyi Li et al.
TimeExpert: An Expert-Guided Video LLM for Video Temporal Grounding
Zuhao Yang, Yingchen Yu, Yunqing Zhao et al.
CaptionSmiths: Flexibly Controlling Language Pattern in Image Captioning
Kuniaki Saito, Donghyun Kim, Kwanyong Park et al.
DCHM: Depth-Consistent Human Modeling for Multiview Detection
Jiahao Ma, Tianyu Wang, Miaomiao Liu et al.
Adversarial Robustness of Discriminative Self-Supervised Learning in Vision
Ömer Veysel Çağatan, Ömer TAL, M. Emre Gursoy
HPSv3: Towards Wide-Spectrum Human Preference Score
Yuhang Ma, Keqiang Sun, Xiaoshi Wu et al.
GUI-Rise: Structured Reasoning and History Summarization for GUI Navigation
Tao Liu, Chongyu Wang, Rongjie Li et al.
Active Perception Meets Rule-Guided RL: A Two-Phase Approach for Precise Object Navigation in Complex Environments
Liang Qin, Min Wang, Peiwei Li et al.
HiERO: Understanding the Hierarchy of Human Behavior Enhances Reasoning on Egocentric Videos
Simone Alberto Peirone, Francesca Pistilli, Giuseppe Averta
UNIS: A Unified Framework for Achieving Unbiased Neural Implicit Surfaces in Volume Rendering
Junkai Deng, Hanting Niu, Jiaze Li et al.
Inter2Former: Dynamic Hybrid Attention for Efficient High-Precision Interactive Segmentation
You Huang, Lichao Chen, Jiayi Ji et al.
On the Provable Importance of Gradients for Autonomous Language-Assisted Image Clustering
Bo Peng, Jie Lu, Guangquan Zhang et al.
IntrinsicControlNet: Cross-distribution Image Generation with Real and Unreal
Jiayuan Lu, Rengan Xie, Zixuan Xie et al.
MH-LVC: Multi-Hypothesis Temporal Prediction for Learned Conditional Residual Video Coding
Gao Zong lin, Huu-Tai Phung, Yi-Chen Yao et al.
DC-AE 1.5: Accelerating Diffusion Model Convergence with Structured Latent Space
Junyu Chen, Dongyun Zou, Wenkun He et al.
Hate in Plain Sight: On the Risks of Moderating AI-Generated Hateful Illusions
Yiting Qu, Ziqing Yang, Yihan Ma et al.
Loss Functions for Predictor-based Neural Architecture Search
Han Ji, Yuqi Feng, Jiahao Fan et al.
Advancing Text-to-3D Generation with Linearized Lookahead Variational Score Distillation
Yu Lei, Bingde Liu, Qingsong Xie et al.
Structural Information-based Hierarchical Diffusion for Offline Reinforcement Learning
Xianghua Zeng, Hao Peng, Yicheng Pan et al.
Steering Guidance for Personalized Text-to-Image Diffusion Models
Sunghyun Park, Seokeon Choi, Hyoungwoo Park et al.
ONLY: One-Layer Intervention Sufficiently Mitigates Hallucinations in Large Vision-Language Models
Zifu Wan, Ce Zhang, Silong Yong et al.
Single GPU Task Adaptation of Pathology Foundation Models for Whole Slide Image Analysis
Neeraj Kumar, Chad Vanderbilt
Precise Diffusion Inversion: Towards Novel Samples and Few-Step Models
Jing Zuo, Luoping Cui, Chuang Zhu et al.
Domain-aware Category-level Geometry Learning Segmentation for 3D Point Clouds
Pei He, Lingling Li, Licheng Jiao et al.
Function-centric Bayesian Network for Zero-Shot Object Goal Navigation
Sixian Zhang, Xinyao Yu, Xinhang Song et al.
GaussianReg: Rapid 2D/3D Registration for Emergency Surgery via Explicit 3D Modeling with Gaussian Primitives
Weihao Yu, Xiaoqing Guo, Xinyu Liu et al.
Consistently Simulating Human Personas with Multi-Turn Reinforcement Learning
Marwa Abdulhai, Ryan Cheng, Donovan Clay et al.
ArgoTweak: Towards Self-Updating HD Maps through Structured Priors
Lena Wild, Rafael Valencia, Patric Jensfelt
Event-aided Dense and Continuous Point Tracking: Everywhere and Anytime
Zhexiong Wan, Jianqin Luo, Yuchao Dai et al.
Context-Aware Academic Emotion Dataset and Benchmark
Luming Zhao, Jingwen Xuan, Jiamin Lou et al.
FlowSeek: Optical Flow Made Easier with Depth Foundation Models and Motion Bases
Matteo Poggi, Fabio Tosi
TPG-INR: Target Prior-Guided Implicit 3D CT Reconstruction for Enhanced Sparse-view Imaging
QingleiCao QingleiCao, Ziyao Tang, Xiaoqin Tang
SpatialCrafter: Unleashing the Imagination of Video Diffusion Models for Scene Reconstruction from Limited Observations
Songchun Zhang, Huiyao Xu, Sitong Guo et al.
Dr. Splat: Directly Referring 3D Gaussian Splatting via Direct Language Embedding Registration
JUNSEONG KIM, GeonU Kim, Kim Yu-Ji et al.
All Parts Matter: A Unified Mask-Free Virtual Try-On Framework
Chenghu Du, Shengwu Xiong, Yi Rong
Efficient Visual Place Recognition Through Multimodal Semantic Knowledge Integration
Sitao Zhang, Hongda Mao, Qingshuang Chen et al.
COME: Dual Structure-Semantic Learning with Collaborative MoE for Universal Lesion Detection Across Heterogeneous Ultrasound Datasets
Lingyu Chen, Yawen Zeng, Yue Wang et al.
NATRA: Noise-Agnostic Framework for Trajectory Prediction with Noisy Observations
Rongqing Li, Changsheng Li, Ruilin Lv et al.
MS3D: High-Quality 3D Generation via Multi-Scale Representation Modeling
Guan Luo, Jianfeng Zhang
JPEG Processing Neural Operator for Backward-Compatible Coding
Woo Kyoung Han, Yongjun Lee, Byeonghun Lee et al.
UniDxMD: Towards Unified Representation for Cross-Modal Unsupervised Domain Adaptation in 3D Semantic Segmentation
Zhengyin Liang, Hui Yin, Min Liang et al.
Hybrid Layout Control for Diffusion Transformer: Fewer Annotations, Superior Aesthetics
Keming Wu, Junwen Chen, Zhanhao Liang et al.
PLAN: Proactive Low-Rank Allocation for Continual Learning
XIEQUN WANG, Zhan Zhuang, Yu Zhang
Leveraging Spatial Invariance to Boost Adversarial Transferability
Zihan Zhou, LI LI, Yanli Ren et al.
LayerLock: Non-collapsing Representation Learning with Progressive Freezing
Goker Erdogan, Nikhil Parthasarathy, Catalin Ionescu et al.
T2I-Copilot: A Training-Free Multi-Agent Text-to-Image System for Enhanced Prompt Interpretation and Interactive Generation
Chieh-Yun Chen, Min Shi, Gong Zhang et al.
FedXDS: Leveraging Model Attribution Methods to counteract Data Heterogeneity in Federated Learning
Maximilian Hoefler, Karsten Mueller, Wojciech Samek
Visual Textualization for Image Prompted Object Detection
Yongjian Wu, Yang Zhou, Jiya Saiyin et al.
TerraMind: Large-Scale Generative Multimodality for Earth Observation
Johannes Jakubik, Felix Yang, Benedikt Blumenstiel et al.
LLaVA-SP: Enhancing Visual Representation with Visual Spatial Tokens for MLLMs
Haoran Lou, Chunxiao Fan, Ziyan Liu et al.
EgoPressure: A Dataset for Hand Pressure and Pose Estimation in Egocentric Vision
Yiming Zhao, Taein Kwon, Paul Streli et al.
AV-Link: Temporally-Aligned Diffusion Features for Cross-Modal Audio-Video Generation
Moayed Haji-Ali, Willi Menapace, Aliaksandr Siarohin et al.
FHGS: Feature-Homogenized Gaussian Splatting
qigeng duan, Benyun ZHAO, Mingqiao Han et al.
Generative Video Bi-flow
Chen Liu, Tobias Ritschel
Transformer-based Tooth Alignment Prediction with Occlusion and Collision Constraints
DongZhenXing DongZhenXing, Jiazhou Chen
A Unified Framework for Industrial Cel-Animation Colorization with Temporal-Structural Awareness
Xiaoyi Feng, Tao Huang, Peng Wang et al.
Memory-Integrated Reconfigurable Adapters: A Unified Framework for Settings with Multiple Tasks
Susmit Agrawal, Krishn Vishwas Kher, Saksham Mittal et al.
Towards Robust Defense against Customization via Protective Perturbation Resistant to Diffusion-based Purification
Wenkui Yang, Jie Cao, Junxian Duan et al.
SD2Actor: Continuous State Decomposition via Diffusion Embeddings for Robotic Manipulation
lijiayi jiayi
ADCD-Net: Robust Document Image Forgery Localization via Adaptive DCT Feature and Hierarchical Content Disentanglement
KA WONG, Jicheng Zhou, Haiwei Wu et al.
Omegance: A Single Parameter for Various Granularities in Diffusion-Based Synthesis
Xinyu Hou, Zongsheng Yue, Xiaoming Li et al.
PixTalk: Controlling Photorealistic Image Processing and Editing with Language
Marcos Conde, Zihao Lu, Radu Timofte
Scene Graph Guided Generation: Enable Accurate Relations Generation in Text-to-Image Models via Textural Rectification
Guibao SHEN, Luozhou Wang, Jiantao Lin et al.
ReMP-AD: Retrieval-enhanced Multi-modal Prompt Fusion for Few-Shot Industrial Visual Anomaly Detection
Hongchi Ma, Guanglei Yang, Debin Zhao et al.
GMMamba: Group Masking Mamba for Whole Slide Image Classification
Tingting Zheng, Hongxun Yao, Kui Jiang et al.
TimeFormer: Capturing Temporal Relationships of Deformable 3D Gaussians for Robust Reconstruction
Dadong Jiang, Zhi Hou, Zhihui Ke et al.
Beyond Brain Decoding: Visual-Semantic Reconstructions to Mental Creation Extension Based on fMRI
Haodong Jing, Dongyao Jiang, Yongqiang Ma et al.
RareCLIP: Rarity-aware Online Zero-shot Industrial Anomaly Detection
Jianfang He, Min Cao, Silong Peng et al.
Tracing Copied Pixels and Regularizing Patch Affinity in Copy Detection
Yichen Lu, Siwei Nie, Minlong Lu et al.
Pretrained Reversible Generation as Unsupervised Visual Representation Learning
Rongkun Xue, Jinouwen Zhang, Yazhe Niu et al.
BadVideo: Stealthy Backdoor Attack against Text-to-Video Generation
Ruotong Wang, Mingli Zhu, Jiarong Ou et al.
Temporal Rate Reduction Clustering for Human Motion Segmentation
Xianghan Meng, Zhengyu Tong, Zhiyuan Huang et al.
Hierarchy UGP: Hierarchy Unified Gaussian Primitive for Large-Scale Dynamic Scene Reconstruction
Hongyang Sun, Qinglin Yang, Jiawei Wang et al.