Most Cited ICCV "uncertainty-aware exploration" Papers
UniGS: Modeling Unitary 3D Gaussians for Novel View Synthesis from Sparse-view Images
Jiamin Wu, Kenkun Liu, Xiaoke Jiang et al.
UnMix-NeRF: Spectral Unmixing Meets Neural Radiance Fields
Fabian Perez, Sara Rojas Martinez, Carlos Hinojosa et al.
Efficient Spiking Point Mamba for Point Cloud Analysis
Peixi Wu, Bosong Chai, Menghua Zheng et al.
Visual Surface Wave Elastography: Revealing Subsurface Physical Properties via Visible Surface Waves
Alexander Ogren, Berthy Feng, Jihoon Ahn et al.
PolarAnything: Diffusion-based Polarimetric Image Synthesis
Kailong Zhang, Youwei Lyu, Heng Guo et al.
MergeOcc: Bridge the Domain Gap between Different LiDARs for Robust Occupancy Prediction
Zikun Xu, Shaobing Xu
LoD-Loc v2: Aerial Visual Localization over Low Level-of-Detail City Models using Explicit Silhouette Alignment
Juelin Zhu, Shuaibang Peng, Long Wang et al.
LANGTRAJ: Diffusion Model and Dataset for Language-Conditioned Trajectory Simulation
Wei-Jer Chang, Masayoshi Tomizuka, Wei Zhan et al.
ACE-G: Improving Generalization of Scene Coordinate Regression Through Query Pre-Training
Leonard Bruns, Axel Barroso-Laguna, Tommaso Cavallari et al.
Adversarial Exploitation of Data Diversity Improves Visual Localization
Sihang Li, Siqi Tan, Bowen Chang et al.
AlignDiff: Learning Physically-Grounded Camera Alignment via Diffusion
Liuyue Xie, Jiancong Guo, Ozan Cakmakci et al.
SGAD: Semantic and Geometric-aware Descriptor for Local Feature Matching
Xiangzeng Liu, Chi Wang, Guanglu Shi et al.
Egocentric Action-aware Inertial Localization in Point Clouds with Vision-Language Guidance
Mingfang Zhang, Ryo Yonetani, Yifei Huang et al.
Coordinate-based Speed of Sound Recovery for Aberration-Corrected Photoacoustic Computed Tomography
Tianao Li, Manxiu Cui, Cheng Ma et al.
Purge-Gate: Efficient Backpropagation-Free Test-Time Adaptation for Point Clouds via Token purging
Moslem Yazdanpanah, Ali Bahri, Mehrdad Noori et al.
CF3: Compact and Fast 3D Feature Fields
Hyunjoon Lee, Joonkyu Min, Jaesik Park
ToF-Splatting: Dense SLAM using Sparse Time-of-Flight Depth and Multi-Frame Integration
Andrea Conti, Matteo Poggi, Valerio Cambareri et al.
Unsupervised Imaging Inverse Problems with Diffusion Distribution Matching
Giacomo Meanti, Thomas Ryckeboer, Michael Arbel et al.
Perspective-aware 3D Gaussian Inpainting with Multi-view Consistency
Yuxin Cheng, Binxiao Huang, Taiqiang Wu et al.
Interaction-Merged Motion Planning: Effectively Leveraging Diverse Motion Datasets for Robust Planning
Giwon Lee, Wooseong Jeong, Daehee Park et al.
DONUT: A Decoder-Only Model for Trajectory Prediction
Markus Knoche, Daan de Geus, Bastian Leibe
PanoSplatt3R: Leveraging Perspective Pretraining for Generalized Unposed Wide-Baseline Panorama Reconstruction
Jiahui Ren, Mochu Xiang, Jiajun Zhu et al.
Variance-Based Pruning for Accelerating and Compressing Trained Networks
Uranik Berisha, Jens Mehnert, Alexandru Condurache
Forecasting Continuous Non-Conservative Dynamical Systems in SO(3)
Lennart Bastian, Mohammad Rashed, Nassir Navab et al.
Certifiably Optimal Anisotropic Rotation Averaging
Carl Olsson, Yaroslava Lochman, Johan Malmport et al.
MIORe & VAR-MIORe: Benchmarks to Push the Boundaries of Restoration
George Ciubotariu, Zhuyun Zhou, Zongwei Wu et al.
E-SAM: Training-Free Segment Every Entity Model
Weiming Zhang, Dingwen Xiao, Lei Chen et al.
Towards Foundational Models for Single-Chip Radar
Tianshu Huang, Akarsh Prabhakara, Chuhan Chen et al.
Understanding Museum Exhibits using Vision-Language Reasoning
Ada-Astrid Balauca, Sanjana Garai, Stefan Balauca et al.
Gradient Extrapolation for Debiased Representation Learning
Ihab Asaad, Maha Shadaydeh, Joachim Denzler
InstantEdit: Text-Guided Few-Step Image Editing with Piecewise Rectified Flow
Yiming Gong, Zhen Zhu, Minjia Zhang
MemoryTalker: Personalized Speech-Driven 3D Facial Animation via Audio-Guided Stylization
Hyung Kyu Kim, Sangmin Lee, Hak Gu Kim
PARTE: Part-Guided Texturing for 3D Human Reconstruction from a Single Image
Hyeongjin Nam, Donghwan Kim, Gyeongsik Moon et al.
LLM-enhanced Action-aware Multi-modal Prompt Tuning for Image-Text Matching
Meng Tian, Shuo Yang, Xinxiao Wu
ShadowHack: Hacking Shadows via Luminance-Color Divide and Conquer
Jin Hu, Mingjia Li, Xiaojie Guo
Easy3D: A Simple Yet Effective Method for 3D Interactive Segmentation
Andrea Simonelli, Norman Müller, Peter Kontschieder
GeoProg3D: Compositional Visual Reasoning for City-Scale 3D Language Fields
Shunsuke Yasuki, Taiki Miyanishi, Nakamasa Inoue et al.
Depth AnyEvent: A Cross-Modal Distillation Paradigm for Event-Based Monocular Depth Estimation
Luca Bartolomei, Enrico Mannocci, Fabio Tosi et al.
WIPES: Wavelet-based Visual Primitives
Wenhao Zhang, Hao Zhu, Delong Wu et al.
Fuse Before Transfer: Knowledge Fusion for Heterogeneous Distillation
Guopeng Li, Qiang Wang, Ke Yan et al.
Memory-Efficient Generative Models via Product Quantization
Jie Shao, Hanxiao Zhang, Hao Yu et al.
Hierarchical Visual Prompt Learning for Continual Video Instance Segmentation
Jiahua Dong, Hui Yin, Wenqi Liang et al.
RAGD: Regional-Aware Diffusion Model for Text-to-Image Generation
Chen Zhennan, Yajie Li, Haofan Wang et al.
Domain Generalizable Portrait Style Transfer
Xinbo Wang, Wenju Xu, Qing Zhang et al.
VQ-SGen: A Vector Quantized Stroke Representation for Creative Sketch Generation
Jiawei Wang, Zhiming Cui, Changjian Li
G2PDiffusion: Cross-species Genotype-to-Phenotype Prediction via Evolutionary Diffusion
Mengdi Liu, Zhangyang Gao, Hong Chang et al.
Task-Specific Zero-shot Quantization-Aware Training for Object Detection
Changhao Li, Xinrui Chen, Ji Wang et al.
DexH2R: A Benchmark for Dynamic Dexterous Grasping in Human-to-Robot Handover
Youzhuo Wang, Jiayi Ye, Chuyang Xiao et al.
Latent Expression Generation for Referring Image Segmentation and Grounding
Seonghoon Yu, Junbeom Hong, Joonseok Lee et al.
BASIC: Boosting Visual Alignment with Intrinsic Refined Embeddings in Multimodal Large Language Models
Jianting Tang, Yubo Wang, Haoyu Cao et al.
Bridging Diffusion Models and 3D Representations: A 3D Consistent Super-Resolution Framework
Yi-Ting Chen, Ting-Hsuan Liao, Pengsheng Guo et al.
Efficient Input-level Backdoor Defense on Text-to-Image Synthesis via Neuron Activation Variation
Shengfang Zhai, Jiajun Li, Yue Liu et al.
MMAT-1M: A Large Reasoning Dataset for Multimodal Agent Tuning
Tianhong Gao, Yannian Fu, Weiqun Wu et al.
SALAD -- Semantics-Aware Logical Anomaly Detection
Matic Fučka, Vitjan Zavrtanik, Danijel Skocaj
Visual Relation Diffusion for Human-Object Interaction Detection
Ping Cao, Yepeng Tang, Chunjie Zhang et al.
WIR3D: Visually-Informed and Geometry-Aware 3D Shape Abstraction
Richard Liu, Daniel Fu, Noah Tan et al.
GSV3D: Gaussian Splatting-based Geometric Distillation with Stable Video Diffusion for Single-Image 3D Object Generation
Ye Tao, Jiawei Zhang, Yahao Shi et al.
METEOR: Multi-Encoder Collaborative Token Pruning for Efficient Vision Language Models
Yuchen Liu, Yaoming Wang, Bowen Shi et al.
PixelStitch: Structure-Preserving Pixel-Wise Bidirectional Warps for Unsupervised Image Stitching
Hengzhe Jin, Lang Nie, Chunyu Lin et al.
S³E: Self-Supervised State Estimation for Radar-Inertial System
Shengpeng Wang, Yulong Xie, Qing Liao et al.
Video Color Grading via Look-Up Table Generation
Seunghyun Shin, Dongmin Shin, Jisu Shin et al.
Trans-Adapter: A Plug-and-Play Framework for Transparent Image Inpainting
Yuekun Dai, Haitian Li, Shangchen Zhou et al.
IGD: Instructional Graphic Design with Multimodal Layer Generation
Yadong Qu, Shancheng Fang, Yuxin Wang et al.
Ask and Remember: A Questions-Only Replay Strategy for Continual Visual Question Answering
Imad Eddine Marouf, Enzo Tartaglione, Stéphane Lathuilière et al.
Benefit From Seen: Enhancing Open-Vocabulary Object Detection by Bridging Visual and Textual Co-Occurrence Knowledge
Yanqi Li, Jianwei Niu, Tao Ren
Bridging the Gap Between Ideal and Real-world Evaluation: Benchmarking AI-Generated Image Detection in Challenging Scenarios
Chunxiao Li, Xiaoxiao Wang, Meiling Li et al.
You Share Beliefs, I Adapt: Progressive Heterogeneous Collaborative Perception
Hao Si, Ehsan Javanmardi, Manabu Tsukada
Robust Unfolding Network for HDR Imaging with Modulo Cameras
Zhile Chen, Hui Ji
Embodied Navigation with Auxiliary Task of Action Description Prediction
Haru Kondoh, Asako Kanezaki
IAP: Invisible Adversarial Patch Attack through Perceptibility-Aware Localization and Perturbation Optimization
Subrat Kishore Dutta, Xiao Zhang
OcRFDet: Object-Centric Radiance Fields for Multi-View 3D Object Detection in Autonomous Driving
Mingqian Ji, Jian Yang, Shanshan Zhang
Neural Compression for 3D Geometry Sets
Siyu Ren, Junhui Hou, Weiyao Lin et al.
DAP-MAE: Domain-Adaptive Point Cloud Masked Autoencoder for Effective Cross-Domain Learning
Ziqi Gao, Qiufu Li, Linlin Shen
UniFuse: A Unified All-in-One Framework for Multi-Modal Medical Image Fusion Under Diverse Degradations and Misalignments
Dayong Su, Yafei Zhang, Huafeng Li et al.
FreeCus: Free Lunch Subject-driven Customization in Diffusion Transformers
Yanbing Zhang, Zhe Wang, Qin Zhou et al.
CopyrightShield: Enhancing Diffusion Model Security Against Copyright Infringement Attacks
Zhixiang Guo, Siyuan Liang, Aishan Liu et al.
Dataset Ownership Verification for Pre-trained Masked Models
Yuechen Xie, Jie Song, Yicheng Shan et al.
From Holistic to Localized: Local Enhanced Adapters for Efficient Visual Instruction Fine-Tuning
Pengkun Jiao, Bin Zhu, Jingjing Chen et al.
ClearSight: Human Vision-Inspired Solutions for Event-Based Motion Deblurring
Xiaopeng Lin, Yulong Huang, Hongwei Ren et al.
UMDATrack: Unified Multi-Domain Adaptive Tracking Under Adverse Weather Conditions
Siyuan Yao, Rui Zhu, Ziqi Wang et al.
ReME: A Data-Centric Framework for Training-Free Open-Vocabulary Segmentation
Xiwei Xuan, Ziquan Deng, Kwan-Liu Ma
PCR-GS: COLMAP-Free 3D Gaussian Splatting via Pose Co-Regularizations
Yu Wei, Jiahui Zhang, Xiaoqin Zhang et al.
Membership Inference Attacks with False Discovery Rate Control
Chenxu Zhao, Wei Qian, Aobo Chen et al.
Blind Video Super-Resolution based on Implicit Kernels
Qiang Zhu, Yuxuan Jiang, Shuyuan Zhu et al.
OmniDiff: A Comprehensive Benchmark for Fine-grained Image Difference Captioning
Yuan Liu, Saihui Hou, Saijie Hou et al.
PLMP - Point-Line Minimal Problems for Projective SfM
Kim Kiehn, Albin Ahlbäck, Kathlén Kohn
SpiLiFormer: Enhancing Spiking Transformers with Lateral Inhibition
Zeqi Zheng, Yanchen Huang, Yingchao Yu et al.
SparseVILA: Decoupling Visual Sparsity for Efficient VLM Inference
Samir Khaki, Junxian Guo, Jiaming Tang et al.
Decoding Correlation-Induced Misalignment in the Stable Diffusion Workflow for Text-to-Image Generation
Yunze Tong, Fengda Zhang, Didi Zhu et al.
Steering Guidance for Personalized Text-to-Image Diffusion Models
Sunghyun Park, Seokeon Choi, Hyoungwoo Park et al.
M-SpecGene: Generalized Foundation Model for RGBT Multispectral Vision
Kailai Zhou, Fuqiang Yang, Shixian Wang et al.
MeshMamba: State Space Models for Articulated 3D Mesh Generation and Reconstruction
Yusuke Yoshiyasu, Leyuan Sun, Ryusuke Sagawa
Learning Robust Image Watermarking with Lossless Cover Recovery
Jiale Chen, Wei Wang, Chongyang Shi et al.
Seeing Through Deepfakes: A Human-Inspired Framework for Multi-Face Detection
Juan Hu, Shaojing Fan, Terence Sim
Deep Space Weather Model: Long-Range Solar Flare Prediction from Multi-Wavelength Images
Shunya Nagashima, Komei Sugiura
Inference-Time Diffusion Model Distillation
Geon Yeong Park, Sang Wan Lee, Jong Ye
Fusion Meets Diverse Conditions: A High-diversity Benchmark and Baseline for UAV-based Multimodal Object Detection with Condition Cues
Chen Chen, Kangcheng Bin, Hu Ting et al.
Rethinking Detecting Salient and Camouflaged Objects in Unconstrained Scenes
Zhangjun Zhou, Yiping Li, Chunlin Zhong et al.
Guiding Noisy Label Conditional Diffusion Models with Score-based Discriminator Correction
Dat Cong, Hieu Tran, Hoang Thanh-Tung
TrackAny3D: Transferring Pretrained 3D Models for Category-unified 3D Point Cloud Tracking
Mengmeng Wang, Haonan Wang, Yulong Li et al.
FreqPDE: Rethinking Positional Depth Embedding for Multi-View 3D Object Detection Transformers
Junjie Zhang, Haisheng Su, Feixiang Song et al.
Enhancing Numerical Prediction of MLLMs with Soft Labeling
Pei Wang, Zhaowei Cai, Hao Yang et al.
MotionDiff: Training-free Zero-shot Interactive Motion Editing via Flow-assisted Multi-view Diffusion
Yikun Ma, Yiqing Li, Jiawei Wu et al.
FPEM: Face Prior Enhanced Facial Attractiveness Prediction for Live Videos with Face Retouching
Hui Li, Xiaoyu Ren, Hongjiu Yu et al.
STD-GS: Exploring Frame-Event Interaction for SpatioTemporal-Disentangled Gaussian Splatting to Reconstruct High-Dynamic Scene
Hanyu Zhou, Haonan Wang, Haoyue Liu et al.
RARE: Refine Any Registration of Pairwise Point Clouds via Zero-Shot Learning
Chengyu Zheng, Honghua Chen, Jin Huang et al.
Zero-Shot Vision Encoder Grafting via LLM Surrogates
Kaiyu Yue, Vasu Singla, Menglin Jia et al.
OV-SCAN: Semantically Consistent Alignment for Novel Object Discovery in Open-Vocabulary 3D Object Detection
Adrian Chow, Evelien Riddell, Yimu Wang et al.
ULTHO: Ultra-Lightweight yet Efficient Hyperparameter Optimization in Deep Reinforcement Learning
Mingqi Yuan, Bo Li, Xin Jin et al.
ATAS: Any-to-Any Self-Distillation for Enhanced Open-Vocabulary Dense Prediction
Soonwoo Cha, Jiwoo Song, Juan Yeo et al.
A Plug-and-Play Physical Motion Restoration Approach for In-the-Wild High-Difficulty Motions
Youliang Zhang, Ronghui Li, Yachao Zhang et al.
ViT-Split: Unleashing the Power of Vision Foundation Models via Efficient Splitting Heads
Yifan Li, Xin Li, Tianqin Li et al.
When Confidence Fails: Revisiting Pseudo-Label Selection in Semi-supervised Semantic Segmentation
Pan Liu, Jinshi Liu
Unlearning the Noisy Correspondence Makes CLIP More Robust
Haochen Han, Alex Jinpeng Wang, Peijun Ye et al.
Global-Aware Monocular Semantic Scene Completion with State Space Models
Shijie Li, Zhongyao Cheng, Rong Li et al.
DIMO: Diverse 3D Motion Generation for Arbitrary Objects
Linzhan Mou, Jiahui Lei, Chen Wang et al.
Beyond Blur: A Fluid Perspective on Generative Diffusion Models
Grzegorz Gruszczynski, Jakub Meixner, Michał Włodarczyk et al.
Revisiting Adversarial Patch Defenses on Object Detectors: Unified Evaluation, Large-Scale Dataset, and New Insights
Junhao Zheng, Jiahao Sun, Chenhao Lin et al.
AIM: Amending Inherent Interpretability via Self-Supervised Masking
Eyad Alshami, Shashank Agnihotri, Bernt Schiele et al.
One Last Attention for Your Vision-Language Model
Liang Chen, Ghazi Shazan Ahmad, Tianjun Yao et al.
Text-IRSTD: Leveraging Semantic Text to Promote Infrared Small Target Detection in Complex Scenes
Feng Huang, Shuyuan Zheng, Zhaobing Qiu et al.
Balancing Conservatism and Aggressiveness: Prototype-Affinity Hybrid Network for Few-Shot Segmentation
Tianyu Zou, Shengwu Xiong, Ruilin Yao et al.
MCOP: Multi-UAV Collaborative Occupancy Prediction
Zefu Lin, Wenbo Chen, Xiaojuan Jin et al.
Serialization based Point Cloud Oversegmentation
Chenghui Lu, Dilong Li, Jianlong Kwan et al.
Reinforcement Learning-Guided Data Selection via Redundancy Assessment
Suorong Yang, Peijia Li, Furao Shen et al.
Recognizing Actions from Robotic View for Natural Human-Robot Interaction
Ziyi Wang, Peiming Li, Hong Liu et al.
DDB: Diffusion Driven Balancing to Address Spurious Correlations
Aryan Yazdan Parast, Basim Azam, Naveed Akhtar
TurboVSR: Fantastic Video Upscalers and Where to Find Them
Zhongdao Wang, Guodongfang Zhao, Jingjing Ren et al.
FRET: Feature Redundancy Elimination for Test Time Adaptation
Linjing You, Jiabao Lu, Xiayuan Huang et al.
SPA: Efficient User-Preference Alignment against Uncertainty in Medical Image Segmentation
Jiayuan Zhu, Junde Wu, Cheng Ouyang et al.
Controllable and Expressive One-Shot Video Head Swapping
Chaonan Ji, Jinwei Qi, Peng Zhang et al.
Learning Dense Feature Matching via Lifting Single 2D Image to 3D Space
Yingping Liang, Yutao Hu, Wenqi Shao et al.
Seeing 3D Through 2D Lenses: 3D Few-Shot Class-Incremental Learning via Cross-Modal Geometric Rectification
Tuo Xiang, Xuemiao Xu, Bangzhen Liu et al.
RayGaussX: Accelerating Gaussian-Based Ray Marching for Real-Time and High-Quality Novel View Synthesis
Hugo Blanc, Jean-Emmanuel Deschaud, Alexis Paljic
Learning Pixel-adaptive Multi-layer Perceptrons for Real-time Image Enhancement
Junyu Lou, Xiaorui Zhao, Kexuan Shi et al.
CULTURE3D: A Large-Scale and Diverse Dataset of Cultural Landmarks and Terrains for Gaussian-Based Scene Rendering
Xinyi Zheng, Steve Zhang, Weizhe Lin et al.
Information-Bottleneck Driven Binary Neural Network for Change Detection
Kaijie Yin, Zhiyuan Zhang, Shu Kong et al.
VisHall3D: Monocular Semantic Scene Completion from Reconstructing the Visible Regions to Hallucinating the Invisible Regions
Haoang Lu, Yuanqi Su, Xiaoning Zhang et al.
Evidential Knowledge Distillation
Liangyu Xiang, Junyu Gao, Changsheng Xu
Pruning All-Rounder: Rethinking and Improving Inference Efficiency for Large Vision Language Models
Wei Suo, Ji Ma, Mengyang Sun et al.
DanceEditor: Towards Iterative Editable Music-driven Dance Generation with Open-Vocabulary Descriptions
Hengyuan Zhang, Zhe Li, Xingqun Qi et al.
TAG-WM: Tamper-Aware Generative Image Watermarking via Diffusion Inversion Sensitivity
Yuzhuo Chen, Zehua Ma, Han Fang et al.
Diffusion-based 3D Hand Motion Recovery with Intuitive Physics
Yufei Zhang, Zijun Cui, Jeffrey Kephart et al.
Language Decoupling with Fine-grained Knowledge Guidance for Referring Multi-object Tracking
Guangyao Li, Siping Zhuang, Yajun Jian et al.
Reminiscence Attack on Residuals: Exploiting Approximate Machine Unlearning for Privacy
Yaxin Xiao, Qingqing Ye, Li Hu et al.
Devil is in the Uniformity: Exploring Diverse Learners within Transformer for Image Restoration
Shihao Zhou, Dayu Li, Jinshan Pan et al.
EMatch: A Unified Framework for Event-based Optical Flow and Stereo Matching
Pengjie Zhang, Lin Zhu, Xiao Wang et al.
CanFields: Consolidating Diffeomorphic Flows for Non-Rigid 4D Interpolation from Arbitrary-Length Sequences
Miaowei Wang, Changjian Li, Amir Vaxman
QR-LoRA: Efficient and Disentangled Fine-tuning via QR Decomposition for Customized Generation
Jiahui Yang, Yongjia Ma, Donglin Di et al.
Multidimensional Byte Pair Encoding: Shortened Sequences for Improved Visual Data Generation
Tim Elsner, Paula Usinger, Julius Nehring-Wirxel et al.
PoseAnchor: Robust Root Position Estimation for 3D Human Pose Estimation
Jun-Hee Kim, Jumin Han, Seong-Whan Lee
Self-Supervised Sparse Sensor Fusion for Long Range Perception
Edoardo Palladin, Samuel Brucker, Filippo Ghilotti et al.
Implicit Counterfactual Learning for Audio-Visual Segmentation
Mingfeng Zha, Tianyu Li, Guoqing Wang et al.
Competitive Distillation: A Simple Learning Strategy for Improving Visual Classification
Daqian Shi, Xiaolei Diao, Xu Chen et al.
AIComposer: Any Style and Content Image Composition via Feature Integration
Haowen Li, Zhenfeng Fan, Zhang Wen et al.
Rethink Sparse Signals for Pose-guided Text-to-image Generation
Wenjie Xuan, Jing Zhang, Juhua Liu et al.
Embodied Image Captioning: Self-supervised Learning Agents for Spatially Coherent Image Descriptions
Tommaso Galliena, Tommaso Apicella, Stefano Rosa et al.
Single-Scanline Relative Pose Estimation for Rolling Shutter Cameras
Petr Hruby, Marc Pollefeys
OURO: A Self-Bootstrapped Framework for Enhancing Multimodal Scene Understanding
Tianrun Xu, Guanyu Chen, Ye Li et al.
ResidualViT for Efficient Temporally Dense Video Encoding
Mattia Soldan, Fabian Caba Heilbron, Bernard Ghanem et al.
Beyond Low-Rank Tuning: Model Prior-Guided Rank Allocation for Effective Transfer in Low-Data and Large-Gap Regimes
Chuyan Zhang, Kefan Wang, Yun Gu
MUG: Pseudo Labeling Augmented Audio-Visual Mamba Network for Audio-Visual Video Parsing
Langyu Wang, Yingying Chen et al.
Progressive Homeostatic and Plastic Prompt Tuning for Audio-Visual Multi-Task Incremental Learning
Jiong Yin, Liang Li, Jiehua Zhang et al.
AutoPrompt: Automated Red-Teaming of Text-to-Image Models via LLM-Driven Adversarial Prompts
Yufan Liu, Wanqian Zhang, Huashan Chen et al.
FreeDance: Towards Harmonic Free-Number Group Dance Generation via a Unified Framework
Yiwen Zhao, Yang Wang, Liting Wen et al.
LLM-Assisted Semantic Guidance for Sparsely Annotated Remote Sensing Object Detection
Wei Liao, Chunyan Xu, Chenxu Wang et al.
BoxDreamer: Dreaming Box Corners for Generalizable Object Pose Estimation
Yuanhong Yu, Xingyi He, Chen Zhao et al.
SRefiner: Soft-Braid Attention for Multi-Agent Trajectory Refinement
Liwen Xiao, Zhiyu Pan, Zhicheng Wang et al.
Seam360GS: Seamless 360° Gaussian Splatting from Real-World Omnidirectional Images
Changha Shin, Woong Oh Cho, Seon Joo Kim
SpikePack: Enhanced Information Flow in Spiking Neural Networks with High Hardware Compatibility
Guobin Shen, Jindong Li, Tenglong Li et al.
FA: Forced Prompt Learning of Vision-Language Models for Out-of-Distribution Detection
Xinhua Lu, Runhe Lai, Yanqi Wu et al.
ARMO: Autoregressive Rigging for Multi-Category Objects
Mingze Sun, Shiwei Mao, Keyi Chen et al.
SuperEvent: Cross-Modal Learning of Event-based Keypoint Detection for SLAM
Yannick Burkhardt, Simon Schaefer, Stefan Leutenegger
Mind the Cost of Scaffold! Benign Clients May Even Become Accomplices of Backdoor Attack
Xingshuo Han, Xuanye Zhang, Xiang Lan et al.
BlinkTrack: Feature Tracking over 80 FPS via Events and Images
Yichen Shen, Yijin Li, Shuo Chen et al.
DICE: Staleness-Centric Optimizations for Parallel Diffusion MoE Inference
Jiajun Luo, Lizhuo Luo, Jianru Xu et al.
Measuring the Impact of Rotation Equivariance on Aerial Object Detection
Xiuyu Wu, Xinhao Wang, Xiubin Zhu et al.
Wasserstein Style Distribution Analysis and Transform for Stylized Image Generation
Xi Yu, Xiang Gu, Zhihao Shi et al.
Visual Intention Grounding for Egocentric Assistants
Pengzhan Sun, Junbin Xiao, Tze Ho Elden Tse et al.
MVQA: Mamba with Unified Sampling for Efficient Video Quality Assessment
Yachun Mi, Yu Li, Weicheng Meng et al.
Breaking Grid Constraints: Dynamic Graph Reconstruction Network for Multi-organ Segmentation
Junhao Xiao, Yang Wei, Jingyu Wang et al.
Prototype-based Contrastive Learning with Stage-wise Progressive Augmentation for Self-Supervised Fine-Grained Learning
BaoFeng Tan, Xiu-Shen Wei, Lin Zhao
Enrich and Detect: Video Temporal Grounding with Multimodal LLMs
Shraman Pramanick, Effrosyni Mavroudi, Yale Song et al.
Region-aware Anchoring Mechanism for Efficient Referring Visual Grounding
Shuyi Ouyang, Ziwei Niu, Hongyi Wang et al.
CogCM: Cognition-Inspired Contextual Modeling for Audio-Visual Speech Enhancement
Feixiang Wang, Shuang Yang, Shiguang Shan et al.
Token-Efficient VLM: High-Resolution Image Understanding via Dynamic Region Proposal
Yitong Jiang, Jinwei Gu, Tianfan Xue et al.
Teaching AI the Anatomy Behind the Scan: Addressing Anatomical Flaws in Medical Image Segmentation with Learnable Prior
Young Seok Jeon, Hongfei Yang, Huazhu Fu et al.
EDFFDNet: Towards Accurate and Efficient Unsupervised Multi-Grid Image Registration
Haokai Zhu, Bo Qu, Si-Yuan Cao et al.
Enhancing Mamba Decoder with Bidirectional Interaction in Multi-Task Dense Prediction
Mang Cao, Sanping Zhou, Yizhe Li et al.
Leveraging Debiased Cross-modal Attention Maps and Code-based Reasoning for Zero-shot Referring Expression Comprehension
Juntao Chen, Wen Shen, Zhihua Wei et al.
UST-SSM: Unified Spatio-Temporal State Space Models for Point Cloud Video Modeling
Peiming Li, Ziyi Wang, Yulin Yuan et al.
Vision-Language Neural Graph Featurization for Extracting Retinal Lesions
Taimur Hassan, Anabia Sohail, Muzammal Naseer et al.
SHIFT: Smoothing Hallucinations by Information Flow Tuning for Multimodal Large Language Models
Sudong Wang, Yunjian Zhang, Yao Zhu et al.
Flow-MIL: Constructing Highly-expressive Latent Feature Space For Whole Slide Image Classification Using Normalizing Flow
Yingfan Ma, Bohan An, Ao Shen et al.
Towards Robustness of Person Search against Corruptions
Woojung Son, Yoonki Cho, Guoyuan An et al.
VIPerson: Flexibly Generating Virtual Identity for Person Re-Identification
Xiao-Wen Zhang, Delong Zhang, Yi-Xing Peng et al.
Engage for All: Making Ordinary Image Descriptions Appealing Again!
Yuyan Chen, Yifan Jiang, Li Zhou et al.
Automated Red Teaming for Text-to-Image Models through Feedback-Guided Prompt Iteration with Vision-Language Models
Wei Xu, Kangjie Chen, Jiawei Qiu et al.
Omni-scene Perception-oriented Point Cloud Geometry Enhancement for Coordinate Quantization
Wang Liu, Wei Gao
Enhancing Spatial Reasoning in Multimodal Large Language Models through Reasoning-based Segmentation
Zhenhua Ning, Zhuotao Tian, Shaoshuai Shi et al.