Most Cited ICCV "multimodal benchmark" Papers
Distilling Diffusion Models to Efficient 3D LiDAR Scene Completion
Shengyuan Zhang, An Zhao, Ling Yang et al.
SuperEvent: Cross-Modal Learning of Event-based Keypoint Detection for SLAM
Yannick Burkhardt, Simon Schaefer, Stefan Leutenegger
RogSplat: Robust Gaussian Splatting via Generative Priors
Hanyang Kong, Xingyi Yang, Xinchao Wang
FedAGC: Federated Continual Learning with Asymmetric Gradient Correction
Chengchao Zhang, Fanhua Shang, Hongying Liu et al.
ATCTrack: Aligning Target-Context Cues with Dynamic Target States for Robust Vision-Language Tracking
Xiaokun Feng, Shiyu Hu, Xuchen Li et al.
Joint Learning of Pose Regression and Denoising Diffusion with Score Scaling Sampling for Category-level 6D Pose Estimation
Seunghyun Lee, Tae-Kyun Kim
Intra-modal and Cross-modal Synchronization for Audio-visual Deepfake Detection and Temporal Localization
Ashutosh Anshul, Shreyas Gopal, Deepu Rajan et al.
Enhancing Adversarial Transferability by Balancing Exploration and Exploitation with Gradient-Guided Sampling
Zenghao Niu, Weicheng Xie, Siyang Song et al.
CWNet: Causal Wavelet Network for Low-Light Image Enhancement
Tongshun Zhang, Pingping Liu, Yubing Lu et al.
MinCD-PnP: Learning 2D-3D Correspondences with Approximate Blind PnP
Pei An, Jiaqi Yang, Muyao Peng et al.
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation
Shiqi Huang, Shuting He, Huaiyuan Qin et al.
Federated Representation Angle Learning
Liping Yi, Han Yu, Gang Wang et al.
GeoDistill: Geometry-Guided Self-Distillation for Weakly Supervised Cross-View Localization
Shaowen Tong, Zimin Xia, Alexandre Alahi et al.
CoTMR: Chain-of-Thought Multi-Scale Reasoning for Training-Free Zero-Shot Composed Image Retrieval
Zelong Sun, Dong Jing, Zhiwu Lu
The Curse of Conditions: Analyzing and Improving Optimal Transport for Conditional Flow-Based Generation
Ho Kei Cheng, Alex Schwing
Diffusion-based Source-biased Model for Single Domain Generalized Object Detection
Han Jiang, Wenfei Yang, Tianzhu Zhang et al.
DepthSync: Diffusion Guidance-Based Depth Synchronization for Scale- and Geometry-Consistent Video Depth Estimation
Yue-Jiang Dong, Wang Zhao, Jiale Xu et al.
Measuring the Impact of Rotation Equivariance on Aerial Object Detection
Xiuyu Wu, Xinhao Wang, Xiubin Zhu et al.
InfiniDreamer: Arbitrarily Long Human Motion Generation via Segment Score Distillation
Wenjie Zhuo, Fan Ma, Hehe Fan
Enhanced Pansharpening via Quaternion Spatial-Spectral Interactions
Dong Li, Chunhui Luo, Yuanfei Bao et al.
Client2Vec: Improving Federated Learning by Distribution Shifts Aware Client Indexing
Yongxin Guo, Lin Wang, Xiaoying Tang et al.
Instance-Level Video Depth in Groups Beyond Occlusions
Yuan Liang, Yang Zhou, Ziming Sun et al.
Flow Stochastic Segmentation Networks
Fabio De Sousa Ribeiro, Omar Todd, Charles Jones et al.
Future-Aware Interaction Network For Motion Forecasting
Shijie Li, Chunyu Liu, Xun Xu et al.
From Gaze to Movement: Predicting Visual Attention for Autonomous Driving Human-Machine Interaction based on Programmatic Imitation Learning
Yexin Huang, Yongbin Lin, Lishengsa Yue et al.
ALOcc: Adaptive Lifting-based 3D Semantic Occupancy and Cost Volume-based Flow Predictions
Dubing Chen, Jin Fang, Wencheng Han et al.
DreamCube: RGB-D Panorama Generation via Multi-plane Synchronization
Yukun Huang, Yanning Zhou, Jianan Wang et al.
From Enhancement to Understanding: Build a Generalized Bridge for Low-light Vision via Semantically Consistent Unsupervised Fine-tuning
Sen Wang, Shao Zeng, Tianjun Gu et al.
ScanEdit: Hierarchically-Guided Functional 3D Scan Editing
Mohamed El Amine Boudjoghra, Ivan Laptev, Angela Dai
Optical Model-Driven Sharpness Mapping for Autofocus in Small Depth-of-Field and Severe Defocus Scenarios
Chen-Liang Fan, Mingpei Cao, Chih-Chien Hung et al.
HyPiDecoder: Hybrid Pixel Decoder for Efficient Segmentation and Detection
Fengzhe Zhou, Humphrey Shi
MMAD: Multi-label Micro-Action Detection in Videos
Kun Li, Pengyu Liu, Dan Guo et al.
G2D: Boosting Multimodal Learning with Gradient-Guided Distillation
Mohammed Rakib, Arunkumar Bagavathi
Unified Video Generation via Next-Set Prediction in Continuous Domain
Zhanzhou Feng, Qingpei Guo, Xinyu Xiao et al.
LazyMAR: Accelerating Masked Autoregressive Models via Feature Caching
Feihong Yan, Qingyan Wei, Jiayi Tang et al.
Perceive, Understand and Restore: Real-World Image Super-Resolution with Autoregressive Multimodal Generative Models
Hongyang Wei, Shuaizheng Liu, Chun Yuan et al.
Omni-scene Perception-oriented Point Cloud Geometry Enhancement for Coordinate Quantization
Wang Liu, Wei Gao
Auto-Regressive Transformation for Image Alignment
Kanggeon Lee, Soochahn Lee, Kyoung Mu Lee
Training-Free Industrial Defect Generation with Diffusion Models
Ruyi Xu, Yen-Tzu Chiu, Tai-I Chen et al.
Feature Decomposition-Recomposition in Large Vision-Language Model for Few-Shot Class-Incremental Learning
Zongyao Xue, Meina Kan, Shiguang Shan et al.
TRCE: Towards Reliable Malicious Concept Erasure in Text-to-Image Diffusion Models
Ruidong Chen, Honglin Guo, Lanjun Wang et al.
Can Knowledge be Transferred from Unimodal to Multimodal? Investigating the Transitivity of Multimodal Knowledge Editing
Lingyong Fang, Xinzhong Wang, Depeng Wang et al.
Zero-Shot Composed Image Retrieval via Dual-Stream Instruction-Aware Distillation
Wenliang Zhong, Rob Barton, Weizhi An et al.
Token Activation Map to Visually Explain Multimodal LLMs
Yi Li, Hualiang Wang, Xinpeng Ding et al.
Learning Neural Scene Representation from iToF Imaging
Wenjie Chang, Hanzhi Chang, Yueyi Zhang et al.
Multi-Modal Multi-Task Unified Embedding Model (M3T-UEM): A Task-Adaptive Representation Learning Framework
Rohan Sharma, Changyou Chen, Feng-Ju Chang et al.
ConsNoTrainLoRA: Data-driven Weight Initialization of Low-rank Adapters using Constraints
Debasmit Das, Hyoungwoo Park, Munawar Hayat et al.
Task-Aware Prompt Gradient Projection for Parameter-Efficient Tuning Federated Class-Incremental Learning
Hualong Ke, Yachao Zhang, Jiangming Shi et al.
UDC-VIT: A Real-World Video Dataset for Under-Display Cameras
Kyusu Ahn, JiSoo Kim, Sangik Lee et al.
InvRGB+L: Inverse Rendering of Complex Scenes with Unified Color and LiDAR Reflectance Modeling
Xiaoxue Chen, Bhargav Chandaka, Chih-Hao Lin et al.
Is Visual in-Context Learning for Compositional Medical Tasks within Reach?
Simon Reiß, Zdravko Marinov, Alexander Jaus et al.
SpinMeRound: Consistent Multi-View Identity Generation Using Diffusion Models
Stathis Galanakis, Alexandros Lattas, Stylianos Moschoglou et al.
Geminio: Language-Guided Gradient Inversion Attacks in Federated Learning
Junjie Shan, Ziqi Zhao, Jialin Lu et al.
ObjectMate: A Recurrence Prior for Object Insertion and Subject-Driven Generation
Daniel Winter, Asaf Shul, Matan Cohen et al.
Active Learning Meets Foundation Models: Fast Remote Sensing Data Annotation for Object Detection
Marvin Burges, Philipe Dias, Dalton Lunga et al.
Safeguarding Vision-Language Models: Mitigating Vulnerabilities to Gaussian Noise in Perturbation-based Attacks
Jiawei Wang, Yushen Zuo, Yuanjun Chai et al.
Optimal Transport for Brain-Image Alignment: Unveiling Redundancy and Synergy in Neural Information Processing
Yang Xiao, Wang Lu, Jie Ji et al.
MVTrajecter: Multi-View Pedestrian Tracking with Trajectory Motion Cost and Trajectory Appearance Cost
Taiga Yamane, Ryo Masumura, Satoshi Suzuki et al.
Intervening in Black Box: Concept Bottleneck Model for Enhancing Human Neural Network Mutual Understanding
Nuoye Xiong, Anqi Dong, Ning Wang et al.
Resolving Token-Space Gradient Conflicts: Token Space Manipulation for Transformer-Based Multi-Task Learning
Wooseong Jeong, Kuk-Jin Yoon
DisCoPatch: Taming Adversarially-driven Batch Statistics for Improved Out-of-Distribution Detection
Francisco Caetano, Christiaan Viviers, Luis Zavala-Mondragón et al.
Scaling and Taming Adversarial Training with Synthetic Data
Juntao Wu, Xianting Huang, Yu Chen et al.
DOLLAR: Few-Step Video Generation via Distillation and Latent Reward Optimization
Zihan Ding, Chi Jin, Difan Liu et al.
Music Grounding by Short Video
Zijie Xin, Minquan Wang, Jingyu Liu et al.
Fewer Denoising Steps or Cheaper Per-Step Inference: Towards Compute-Optimal Diffusion Model Deployment
Zhenbang Du, Yonggan Fu, Lifu Wang et al.
Your Text Encoder Can Be An Object-Level Watermarking Controller
Naresh Kumar Devulapally, Mingzhen Huang, Vishal Asnani et al.
Chimera: Improving Generalist Model with Domain-Specific Experts
Tianshuo Peng, Mingsheng Li, Jiakang Yuan et al.
Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation
Yuseung Lee, Jihyeon Je, Chanho Park et al.
Enhanced Event-based Dense Stereo via Cross-Sensor Knowledge Distillation
Haihao Zhang, Yunjian Zhang, Jianing Li et al.
GauUpdate: New Object Insertion in 3D Gaussian Fields with Consistent Global Illumination
Chengwei Ren, Fan Zhang, Liangchao Xu et al.
Any-SSR: How Recursive Least Squares Works in Continual Learning of Large Language Model
Kai Tong, Kang Pan, Xiao Zhang et al.
Not Only Vision: Evolve Visual Speech Recognition via Peripheral Information
Zhaoxin Yuan, Shuang Yang, Shiguang Shan et al.
KOEnsAttack: Towards Efficient Data-Free Black-Box Adversarial Attacks via Knowledge-Orthogonalized Substitute Ensembles
Chaoyong Yang, Jia-Li Yin, Bin Chen et al.
Erasing More Than Intended? How Concept Erasure Degrades the Generation of Non-Target Concepts
Ibtihel Amara, Ahmed Imtiaz Humayun, Ivana Kajic et al.
ReFlex: Text-Guided Editing of Real Images in Rectified Flow via Mid-Step Feature Extraction and Attention Adaptation
Jimyeong Kim, Jungwon Park, Yeji Song et al.
MDD: A Dataset for Text-and-Music Conditioned Duet Dance Generation
Prerit Gupta, Jason Alexander Fotso-Puepi, Zhengyuan Li et al.
Removing Cost Volumes from Optical Flow Estimators
Simon Kiefhaber, Stefan Roth, Simone Schaub-Meyer
PASG: A Closed-Loop Framework for Automated Geometric Primitive Extraction and Semantic Anchoring in Robotic Manipulation
Zhihao Zhu, Yifan Zheng, Siyu Pan et al.
PanSt3R: Multi-view Consistent Panoptic Segmentation
Lojze Zust, Yohann Cabon, Juliette Marrie et al.
GARF: Learning Generalizable 3D Reassembly for Real-World Fractures
Sihang Li, Zeyu Jiang, Grace Chen et al.
Imbalance in Balance: Online Concept Balancing in Generation Models
Yukai Shi, Jiarong Ou, Rui Chen et al.
Progressive Distribution Bridging: Unsupervised Adaptation for Large-scale Pre-trained Models via Adaptive Auxiliary Data
Weinan He, Yixin Zhang, Zilei Wang
RALoc: Enhancing Outdoor LiDAR Localization via Rotation Awareness
Yuyang Yang, Wen Li, Sheng Ao et al.
Derm1M: A Million-scale Vision-Language Dataset Aligned with Clinical Ontology Knowledge for Dermatology
Siyuan Yan, Ming Hu, Yiwen Jiang et al.
SummDiff: Generative Modeling of Video Summarization with Diffusion
Kwanseok Kim, Jaehoon Hahm, Sumin Kim et al.
MagicID: Hybrid Preference Optimization for ID-Consistent and Dynamic-Preserved Video Customization
Hengjia Li, Lifan Jiang, Xi Xiao et al.
Towards Performance Consistency in Multi-Level Model Collaboration
Qi Li, Runpeng Yu, Xinchao Wang
Visual Interestingness Decoded: How GPT-4o Mirrors Human Interests
Fitim Abdullahu, Helmut Grabner
FuXi-RTM: A Physics-Guided Prediction Framework with Radiative Transfer Modeling
Qiusheng Huang, Xiaohui Zhong, Xu Fan et al.
D-Attn: Decomposed Attention for Large Vision-and-Language Model
Chia-Wen Kuo, Sijie Zhu, Fan Chen et al.
Democratizing Text-to-Image Masked Generative Models with Compact Text-Aware One-Dimensional Tokens
Dongwon Kim, Ju He, Qihang Yu et al.
Understanding Personal Concept in Open-Vocabulary Semantic Segmentation
Sunghyun Park, Jungsoo Lee, Shubhankar Borse et al.
Discovering Divergent Representations between Text-to-Image Models
Lisa Dunlap, Trevor Darrell, Joseph Gonzalez et al.
VRM: Knowledge Distillation via Virtual Relation Matching
Weijia Zhang, Fei Xie, Weidong Cai et al.
CoDa-4DGS: Dynamic Gaussian Splatting with Context and Deformation Awareness for Autonomous Driving
Rui Song, Chenwei Liang, Yan Xia et al.
Clink! Chop! Thud! - Learning Object Sounds from Real-World Interactions
Mengyu Yang, Yiming Chen, Haozheng Pei et al.
UnZipLoRA: Separating Content and Style from a Single Image
Chang Liu, Viraj Shah, Aiyu Cui et al.
Learning Visual Proxy for Compositional Zero-Shot Learning
Shiyu Zhang, Cheng Yan, Yang Liu et al.
ImageGem: In-the-wild Generative Image Interaction Dataset for Generative Model Personalization
Yuanhe Guo, Linxi Xie, Zhuoran Chen et al.
One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory
Chenhao Zheng, Jieyu Zhang, Mohammadreza Salehi et al.
LGA-Net: Learning Local and Global Affinities for Sparse Scribble based Image Colorization
Hongjin Lyu, Bo Li, Paul Rosin et al.
Gaze-Language Alignment for Zero-Shot Prediction of Visual Search Targets from Human Gaze Scanpaths
Sounak Mondal, Naveen Sendhilnathan, Ting Zhang et al.
SAM Encoder Breach by Adversarial Simplicial Complex Triggers Downstream Model Failures
Yi Qin, Rui Wang, Tao Huang et al.
Semi-supervised Concept Bottleneck Models
Lijie Hu, Tianhao Huang, Huanyi Xie et al.
DuET: Dual Incremental Object Detection via Exemplar-Free Task Arithmetic
Munish Monga, Vishal Chudasama, Pankaj Wasnik et al.
WINS: Winograd Structured Pruning for Fast Winograd Convolution
Cheonjun Park, Hyunjae Oh, Mincheol Park et al.
Sparsity Outperforms Low-Rank Projections in Few-Shot Adaptation
Nairouz Mrabah, Nicolas Richet, Ismail Ayed et al.
ART: Adaptive Relation Tuning for Generalized Relation Prediction
Gopika Sudhakaran, Hikaru Shindo, Patrick Schramowski et al.
Feed-Forward SceneDINO for Unsupervised Semantic Scene Completion
Aleksandar Jevtić, Christoph Reich, Felix Wimbauer et al.
DISTIL: Data-Free Inversion of Suspicious Trojan Inputs via Latent Diffusion
Hossein Mirzaei, Zeinab Taghavi, Sepehr Rezaee et al.
Factorized Learning for Temporally Grounded Video-Language Models
Wenzheng Zeng, Difei Gao, Mike Zheng Shou et al.
FedPall: Prototype-based Adversarial and Collaborative Learning for Federated Learning with Feature Drift
Yong Zhang, Feng Liang, Guanghu Yuan et al.
MissRAG: Addressing the Missing Modality Challenge in Multimodal Large Language Models
Vittorio Pipoli, Alessia Saporita, Federico Bolelli et al.
Iris: Breaking GUI Complexity with Adaptive Focus and Self-Refining
Zhiqi Ge, Juncheng Li, Xinglei Pang et al.
No Pose at All: Self-Supervised Pose-Free 3D Gaussian Splatting from Sparse Views
Ranran Huang, Krystian Mikolajczyk
External Knowledge Injection for CLIP-Based Class-Incremental Learning
Da-Wei Zhou, Kai-Wen Li, Jingyi Ning et al.
Cooperative Pseudo Labeling for Unsupervised Federated Classification
Kuangpu Guo, Lijun Sheng, Yongcan Yu et al.
MemDistill: Distilling LiDAR Knowledge into Memory for Camera-Only 3D Object Detection
Donghyeon Kwon, Youngseok Yoon, Hyeongseok Son et al.
From Sharp to Blur: Unsupervised Domain Adaptation for 2D Human Pose Estimation Under Extreme Motion Blur Using Event Cameras
Youngho Kim, Hoonhee Cho, Kuk-Jin Yoon
Augmenting Moment Retrieval: Zero-Dependency Two-Stage Learning
Zhengxuan Wei, Jiajin Tang, Sibei Yang
PAN-Crafter: Learning Modality-Consistent Alignment for PAN-Sharpening
Jeonghyeok Do, Sungpyo Kim, Geunhyuk Youk et al.
Continual Adaptation: Environment-Conditional Parameter Generation for Object Detection in Dynamic Scenarios
Deng Li, Aming Wu, Yang Li et al.
Differentially Private Fine-Tuning of Diffusion Models
Yu-Lin Tsai, Yizhe Li, Zekai Chen et al.
IRGPT: Understanding Real-world Infrared Image with Bi-cross-modal Curriculum on Large-scale Benchmark
Zhe Cao, Jin Zhang, Ruiheng Zhang
One Object, Multiple Lies: A Benchmark for Cross-task Adversarial Attack on Unified Vision-Language Models
Jiale Zhao, Xinyang Jiang, Junyao Gao et al.
Unknown Text Learning for CLIP-based Few-Shot Open-set Recognition
Rui Ma, Qilong Wang, Bing Cao et al.
Reducing Unimodal Bias in Multi-Modal Semantic Segmentation with Multi-Scale Functional Entropy Regularization
Xu Zheng, Yuanhuiyi Lyu, Lutao Jiang et al.
Hyper-Depth: Hypergraph-based Multi-Scale Representation Fusion for Monocular Depth Estimation
Lin Bie, Siqi Li, Yifan Feng et al.
Learning Separable Fine-Grained Representation via Dendrogram Construction from Coarse Labels for Fine-grained Visual Recognition
Guanghui Shi, Xuefeng Liang, Wenjie Li et al.
PRVQL: Progressive Knowledge-guided Refinement for Robust Egocentric Visual Query Localization
Bing Fan, Yunhe Feng, Yapeng Tian et al.
Instruction-Grounded Visual Projectors for Continual Learning of Generative Vision-Language Models
Hyundong Jin, Hyung Jin Chang, Eunwoo Kim
PRO-VPT: Distribution-Adaptive Visual Prompt Tuning via Prompt Relocation
Chikai Shang, Mengke Li, Yiqun Zhang et al.
Language-Driven Multi-Label Zero-Shot Learning with Semantic Granularity
Shouwen Wang, Qian Wan, Junbin Gao et al.
Generalized Deep Multi-view Clustering via Causal Learning with Partially Aligned Cross-view Correspondence
Xihong Yang, Siwei Wang, Jiaqi Jin et al.
Less is More: Empowering GUI Agent with Context-Aware Simplification
Gongwei Chen, Xurui Zhou, Rui Shao et al.
IM360: Large-scale Indoor Mapping with 360 Cameras
Dongki Jung, Jaehoon Choi, Yonghan Lee et al.
EventUPS: Uncalibrated Photometric Stereo Using an Event Camera
Jinxiu Liang, Bohan Yu, Siqi Yang et al.
Rethinking the Upsampling Process in Light Field Super-Resolution with Spatial-Epipolar Implicit Image Function
Ruixuan Cong, Yu Wang, Mingyuan Zhao et al.
Harnessing Input-Adaptive Inference for Efficient VLN
Dongwoo Kang, Akhil Perincherry, Zachary Coalson et al.
When Lighting Deceives: Exposing Vision-Language Models' Illumination Vulnerability Through Illumination Transformation Attack
Hanqing Liu, Shouwei Ruan, Yao Huang et al.
PersonaCraft: Personalized and Controllable Full-Body Multi-Human Scene Generation Using Occlusion-Aware 3D-Conditioned Diffusion
Gwanghyun Kim, Suh Jeon Jeon, Seunggyu Lee et al.
SemTalk: Holistic Co-speech Motion Generation with Frame-level Semantic Emphasis
Xiangyue Zhang, Jianfang Li, Jiaxu Zhang et al.
Guiding Diffusion Models with Adaptive Negative Sampling Without External Resources
Alakh Desai, Nuno Vasconcelos
DiffDoctor: Diagnosing Image Diffusion Models Before Treating
Yiyang Wang, Xi Chen, Xiaogang Xu et al.
Training-Free Class Purification for Open-Vocabulary Semantic Segmentation
Qi Chen, Lingxiao Yang, Yun Chen et al.
MA-CIR: A Multimodal Arithmetic Benchmark for Composed Image Retrieval
Jaeseok Byun, Young Kyun Jang, Seokhyeon Jeong et al.
Adaptive Learning of High-Value Regions for Semi-Supervised Medical Image Segmentation
Tao Lei, Ziyao Yang, Xingwu Wang et al.
Constructing Ophthalmic MLLM for Positioning-diagnosis Collaboration Through Clinical Cognitive Chain Reasoning
Xinyao Liu, Diping Song
Keep Your Friends Close, and Your Enemies Farther: Distance-aware Voxel-wise Contrastive Learning for Semi-supervised Multi-organ Segmentation
Haochen Zhao, Jianwei Niu, Xuefeng Liu et al.
Integrating Biological Knowledge for Robust Microscopy Image Profiling on De Novo Cell Lines
Jiayuan Chen, Thai-Hoang Pham, Yuanlong Wang et al.
Spectral Sensitivity Estimation with an Uncalibrated Diffraction Grating
Lilika Makabe, Hiroaki Santo, Fumio Okura et al.
TransiT: Transient Transformer for Non-line-of-sight Videography
Ruiqian Li, Siyuan Shen, Suan Xia et al.
LA-MOTR: End-to-End Multi-Object Tracking by Learnable Association
Peng Wang, Yongcai Wang, Hualong Cao et al.
On the Complexity-Faithfulness Trade-off of Gradient-Based Explanations
Amir Mehrpanah, Matteo Gamba, Kevin Smith et al.
Learning to Inference Adaptively for Multimodal Large Language Models
Zhuoyan Xu, Khoi Nguyen, Preeti Mukherjee et al.
Hierarchical Divide-and-Conquer Grouping for Classification Adaptation of Pre-Trained Models
Ziqian Lu, Yunlong Yu, Qinyue Tong et al.
Lark: Low-Rank Updates After Knowledge Localization for Few-shot Class-Incremental Learning
Jinxin Shi, Jiabao Zhao, Yifan Yang et al.
Large Multi-modal Models Can Interpret Features in Large Multi-modal Models
Kaichen Zhang, Yifei Shen, Bo Li et al.
p-MoD: Building Mixture-of-Depths MLLMs via Progressive Ratio Decay
Jun Zhang, Desen Meng, Zhengming Zhang et al.
FedDifRC: Unlocking the Potential of Text-to-Image Diffusion Models in Heterogeneous Federated Learning
Huan Wang, Haoran Li, Huaming Chen et al.
Category-Specific Selective Feature Enhancement for Long-Tailed Multi-Label Image Classification
Ruiqi Du, Xu Tang, Xiangrong Zhang et al.
Registration beyond Points: General Affine Subspace Alignment via Geodesic Distance on Grassmann Manifold
Jaeho Shin, Hyeonjae Gil, Junwoo Jang et al.
Mind the Gap: Preserving and Compensating for the Modality Gap in CLIP-Based Continual Learning
Linlan Huang, Xusheng Cao, Haori Lu et al.
An Efficient Post-hoc Framework for Reducing Task Discrepancy of Text Encoders for Composed Image Retrieval
Jaeseok Byun, Seokhyeon Jeong, Wonjae Kim et al.
ZIUM: Zero-Shot Intent-Aware Adversarial Attack on Unlearned Models
Hyun Jun Yook, Ga San Jhun, Cho Hyun et al.
Find a Scapegoat: Poisoning Membership Inference Attack and Defense to Federated Learning
Wenjin Mo, Zhiyuan Li, Minghong Fang et al.
To Label or Not to Label: PALM – A Predictive Model for Evaluating Sample Efficiency in Active Learning Models
Julia Machnio, Mads Nielsen, Mostafa Mehdipour Ghazi
Uncalibrated Structure from Motion on a Sphere
Jonathan Ventura, Viktor Larsson, Fredrik Kahl
Personalized Federated Learning under Local Supervision
Qiqi Liu, Jiaqiang Li, Yuchen Liu et al.
Noise-Modeled Diffusion Models for Low-Light Spike Image Restoration
Ruonan Liu, Lin Zhu, Xijie Xiang et al.
Prototype-based Contrastive Learning with Stage-wise Progressive Augmentation for Self-Supervised Fine-Grained Learning
BaoFeng Tan, Xiu-Shen Wei, Lin Zhao
Radiant Foam: Real-Time Differentiable Ray Tracing
Shrisudhan Govindarajan, Daniel Rebain, Kwang Moo Yi et al.
COSTARR: Consolidated Open Set Technique with Attenuation for Robust Recognition
Ryan Rabinowitz, Steve Cruz, Walter Scheirer et al.
Information Density Principle for MLLM Benchmarks
Chunyi Li, Xiaozhe Li, Zicheng Zhang et al.
Perspective-Aware Teaching: Adapting Knowledge for Heterogeneous Distillation
Jhe-Hao Lin, Yi Yao, Chan-Feng Hsu et al.
Is Meta-Learning Out? Rethinking Unsupervised Few-Shot Classification with Limited Entropy
Yunchuan Guan, Yu Liu, Ke Zhou et al.
LMM-Det: Make Large Multimodal Models Excel in Object Detection
Jincheng Li, Chunyu Xie, Ji Ao et al.
Attention to the Burstiness in Visual Prompt Tuning!
Yuzhu Wang, Manni Duan, Shu Kong
Long-Tailed Classification with Multi-Granularity Semantics
Yuting Liu, Liu Yang, Yu Wang
ReTracker: Exploring Image Matching for Robust Online Any Point Tracking
Dongli Tan, Xingyi He, Sida Peng et al.
Multimodal Large Language Model-Guided ISP Hyperparameter Optimization with Dynamic Preference Learning
Xinyu Sun, Zhikun Zhao, Congyan Lang et al.
FEVER-OOD: Free Energy Vulnerability Elimination for Robust Out-of-Distribution Detection
Brian Isaac-Medina, Mauricio Che, Yona Falinie A. Gaus et al.
Adversarial Purification via Super-Resolution and Diffusion
Mincheol Park, Cheonjun Park, Seungseop Lim et al.
FedWSQ: Efficient Federated Learning with Weight Standardization and Distribution-Aware Non-Uniform Quantization
Seung-Wook Kim, Seongyeol Kim, Jiah Kim et al.
CMAD: Correlation-Aware and Modalities-Aware Distillation for Multimodal Sentiment Analysis with Missing Modalities
Yan Zhuang, Minhao Liu, Wei Bai et al.
SimpleVQA: Multimodal Factuality Evaluation for Multimodal Large Language Models
Xianfu Cheng, Wei Zhang, Shiwei Zhang et al.
Revelio: Interpreting and leveraging semantic information in diffusion models
Dahye Kim, Xavier Thomas, Deepti Ghadiyaram
CLIP-GS: Unifying Vision-Language Representation with 3D Gaussian Splatting
Siyu Jiao, Haoye Dong, Yuyang Yin et al.
ProJudge: A Multi-Modal Multi-Discipline Benchmark and Instruction-Tuning Dataset for MLLM-based Process Judges
Jiaxin Ai, Pengfei Zhou, Xu Pan et al.
Failure Cases Are Better Learned But Boundary Says Sorry: Facilitating Smooth Perception Change for Accuracy-Robustness Trade-Off in Adversarial Training
Yanyun Wang, Li Liu
SplatTalk: 3D VQA with Gaussian Splatting
Anh Thai, Kyle Genova, Songyou Peng et al.
Improved Noise Schedule for Diffusion Training
Tiankai Hang, Shuyang Gu, Jianmin Bao et al.
Secure On-Device Video OOD Detection Without Backpropagation
Li Li, Peilin Cai, Yuxiao Zhou et al.
Learning Counterfactually Decoupled Attention for Open-World Model Attribution
Yu Zheng, Boyang Gong, Fanye Kong et al.
Latte: Collaborative Test-Time Adaptation of Vision-Language Models in Federated Learning
Wenxuan Bao, Ruxi Deng, Ruizhong Qiu et al.
Is Less More? Exploring Token Condensation as Training-free Test-time Adaptation
Zixin Wang, Dong Gong, Sen Wang et al.
Mastering Collaborative Multi-modal Data Selection: A Focus on Informativeness, Uniqueness, and Representativeness
Qifan Yu, Zhebei Shen, Zhongqi Yue et al.
Generalized Tensor-based Parameter-Efficient Fine-Tuning via Lie Group Transformations
Chongjie Si, Zhiyi Shi, Xuehui Wang et al.
Test-Time Prompt Tuning for Zero-Shot Depth Completion
Chanhwi Jeong, Inhwan Bae, Jin-Hwi Park et al.
One Encoder to Rule them All: Representation Learning for Model-free Visual Reinforcement Learning using Fourier Neural Operators
Parag Dutta, Mohd Ayyoob, Shalabh Bhatnagar et al.