Most Cited ICCV "adaptive feature fusion" Papers
2,701 papers found
Leveraging the Power of MLLMs for Gloss-Free Sign Language Translation
Jungeun Kim, Hyeongwoo Jeon, Jongseong Bae et al.
GenFlowRL: Shaping Rewards with Generative Object-Centric Flow in Visual Reinforcement Learning
Kelin Yu, Sheng Zhang, Harshit Soora et al.
TF-TI2I: Training-Free Text-and-Image-to-Image Generation via Multi-Modal Implicit-Context Learning In Text-to-Image Models
Teng-Fang Hsiao, Bo-Kai Ruan, Yi-Lun Wu et al.
DuoLoRA : Cycle-consistent and Rank-disentangled Content-Style Personalization
Aniket Roy, Shubhankar Borse, Shreya Kadambi et al.
IDF: Iterative Dynamic Filtering Networks for Generalizable Image Denoising
Dongjin Kim, Jaekyun Ko, Muhammad Kashif Ali et al.
Learning Streaming Video Representation via Multitask Training
Yibin Yan, Jilan Xu, Shangzhe Di et al.
What You Have is What You Track: Adaptive and Robust Multimodal Tracking
Yuedong Tan, Jiawei Shao, Eduard Zamfir et al.
Articulate3D: Holistic Understanding of 3D Scenes as Universal Scene Description
Anna-Maria Halacheva, Yang Miao, Jan-Nico Zaech et al.
EAMamba: Efficient All-Around Vision State Space Model for Image Restoration
Yu-Cheng Lin, Yu-Syuan Xu, Hao-Wei Chen et al.
D3: Training-Free AI-Generated Video Detection Using Second-Order Features
Chende Zheng, Ruiqi Suo, Chenhao Lin et al.
GS-ID: Illumination Decomposition on Gaussian Splatting via Adaptive Light Aggregation and Diffusion-Guided Material Priors
Kang Du, Zhihao Liang, Yulin Shen et al.
CutS3D: Cutting Semantics in 3D for 2D Unsupervised Instance Segmentation
Leon Sick, Dominik Engel, Sebastian Hartwig et al.
Repurposing 2D Diffusion Models with Gaussian Atlas for 3D Generation
Tiange Xiang, Kai Li, Chengjiang Long et al.
GAP: Gaussianize Any Point Clouds with Text Guidance
Weiqi Zhang, Junsheng Zhou, Haotian Geng et al.
Towards Higher Effective Rank in Parameter-Efficient Fine-tuning using Khatri-Rao Product
Paul Albert, Frederic Zhang, Hemanth Saratchandran et al.
OuroMamba: A Data-Free Quantization Framework for Vision Mamba
Akshat Ramachandran, Mingyu Lee, Huan Xu et al.
Acknowledging Focus Ambiguity in Visual Questions
Chongyan Chen, Yu-Yun Tseng, Zhuoheng Li et al.
ARGUS: Hallucination and Omission Evaluation in Video-LLMs
Ruchit Rawal, Reza Shirkavand, Heng Huang et al.
OccluGaussian: Occlusion-Aware Gaussian Splatting for Large Scene Reconstruction and Rendering
Shiyong Liu, Xiao Tang, Zhihao Li et al.
GECKO: Gigapixel Vision-Concept Contrastive Pretraining in Histopathology
Saarthak Kapse, Pushpak Pati, Srikar Yellapragada et al.
INS-MMBench: A Comprehensive Benchmark for Evaluating LVLMs' Performance in Insurance
Chenwei Lin, Hanjia Lyu, Xian Xu et al.
Test-time Adaptation for Foundation Medical Segmentation Model Without Parametric Updates
Kecheng Chen, Xinyu Luo, Tiexin Qin et al.
VGGSounder: Audio-Visual Evaluations for Foundation Models
Daniil Zverev, Thaddäus Wiedemer, Ameya Prabhu et al.
TimeFormer: Capturing Temporal Relationships of Deformable 3D Gaussians for Robust Reconstruction
Dadong Jiang, Zhi Hou, Zhihui Ke et al.
Learning to Generalize without Bias for Open-Vocabulary Action Recognition
Yating Yu, Congqi Cao, Yifan Zhang et al.
Sculpting Memory: Multi-Concept Forgetting in Diffusion Models via Dynamic Mask and Concept-Aware Optimization
Li, Yang Xiao, Jie Ji et al.
Self-Calibrated Variance-Stabilizing Transformations for Real-World Image Denoising
Sébastien Herbreteau, Michael Unser
MP-HSIR: A Multi-Prompt Framework for Universal Hyperspectral Image Restoration
Zhehui Wu, Yong Chen, Naoto Yokoya et al.
DocThinker: Explainable Multimodal Large Language Models with Rule-based Reinforcement Learning for Document Understanding
Wenwen Yu, Zhibo Yang, Yuliang Liu et al.
Few-Shot Image Quality Assessment via Adaptation of Vision-Language Models
Xudong Li, Zihao Huang, Yan Zhang et al.
Ouroboros: Single-step Diffusion Models for Cycle-consistent Forward and Inverse Rendering
Shanlin Sun, Yifan Wang, Hanwen Zhang et al.
UniEgoMotion: A Unified Model for Egocentric Motion Reconstruction, Forecasting, and Generation
Chaitanya Patel, Hiroki Nakamura, Yuta Kyuragi et al.
SAM4D: Segment Anything in Camera and LiDAR Streams
Jianyun Xu, Song Wang, Ziqian Ni et al.
MultiVerse: A Multi-Turn Conversation Benchmark for Evaluating Large Vision and Language Models
Young-Jun Lee, Byung-Kwan Lee, Jianshu Zhang et al.
SparseRecon: Neural Implicit Surface Reconstruction from Sparse Views with Feature and Depth Consistencies
Liang Han, Xu Zhang, Haichuan Song et al.
BadVideo: Stealthy Backdoor Attack against Text-to-Video Generation
Ruotong Wang, Mingli Zhu, Jiarong Ou et al.
A Token-level Text Image Foundation Model for Document Understanding
Tongkun Guan, Zining Wang, Pei Fu et al.
Learning 3D Object Spatial Relationships from Pre-trained 2D Diffusion Models
Sangwon Baik, Hyeonwoo Kim, Hanbyul Joo
Self-supervised Learning of Hybrid Part-aware 3D Representations of 2D Gaussians and Superquadrics
Zhirui Gao, Renjiao Yi, Yuhang Huang et al.
Region-based Cluster Discrimination for Visual Representation Learning
Yin Xie, Kaicheng Yang, Xiang An et al.
SViM3D: Stable Video Material Diffusion for Single Image 3D Generation
Andreas Engelhardt, Mark Boss, Vikram Voleti et al.
Collaborative Instance Object Navigation: Leveraging Uncertainty-Awareness to Minimize Human-Agent Dialogues
Francesco Taioli, Edoardo Zorzi, Gianni Franchi et al.
Controllable 3D Outdoor Scene Generation via Scene Graphs
Yuheng Liu, Xinke Li, Yuning Zhang et al.
Multi-View 3D Point Tracking
Frano Rajič, Haofei Xu, Marko Mihajlovic et al.
Dynamic Multimodal Prototype Learning in Vision-Language Models
Xingyu Zhu, Shuo Wang, Beier Zhu et al.
CHROME: Clothed Human Reconstruction with Occlusion-Resilience and Multiview-Consistency from a Single Image
Arindam Dutta, Meng Zheng, Zhongpai Gao et al.
Not all Views are Created Equal: Analyzing Viewpoint Instabilities in Vision Foundation Models
Mateusz Michalkiewicz, Xinyue Bai, Mahsa Baktashmotlagh et al.
SP2T: Sparse Proxy Attention for Dual-stream Point Transformer
Jiaxu Wan, Hong Zhang, Ziqi He et al.
MonoMobility: Zero-Shot 3D Mobility Analysis from Monocular Videos
Hongyi Zhou, Xiaogang Wang, Yulan Guo et al.
QuickSplat: Fast 3D Surface Reconstruction via Learned Gaussian Initialization
Yueh-Cheng Liu, Lukas Höllein, Matthias Nießner et al.
BridgeDepth: Bridging Monocular and Stereo Reasoning with Latent Alignment
Tongfan Guan, Jiaxin Guo, Chen Wang et al.
CoMatch: Dynamic Covisibility-Aware Transformer for Bilateral Subpixel-Level Semi-Dense Image Matching
Zizhuo Li, Yifan Lu, Linfeng Tang et al.
UniGlyph: Unified Segmentation-Conditioned Diffusion for Precise Visual Text Synthesis
Yuanrui Wang, Cong Han, Yafei Li et al.
BokehDiff: Neural Lens Blur with One-Step Diffusion
Chengxuan Zhu, Qingnan Fan, Qi Zhang et al.
SIMS: Simulating Stylized Human-Scene Interactions with Retrieval-Augmented Script Generation
Wenjia Wang, Liang Pan, Zhiyang Dou et al.
ShortV: Efficient Multimodal Large Language Models by Freezing Visual Tokens in Ineffective Layers
Qianhao Yuan, Qingyu Zhang, Yanjiang Liu et al.
Anchor Token Matching: Implicit Structure Locking for Training-free AR Image Editing
Taihang Hu, Linxuan Li, Kai Wang et al.
LightSwitch: Multi-view Relighting with Material-guided Diffusion
Yehonathan Litman, Fernando De la Torre, Shubham Tulsiani
PanoLlama: Generating Endless and Coherent Panoramas with Next-Token-Prediction LLMs
Teng Zhou, Xiaoyu Zhang, Yongchuan Tang
Bilateral Collaboration with Large Vision-Language Models for Open Vocabulary Human-Object Interaction Detection
Yupeng Hu, Changxing Ding, Chang Sun et al.
LayerD: Decomposing Raster Graphic Designs into Layers
Tomoyuki Suzuki, Kang-Jun Liu, Naoto Inoue et al.
PriorMotion: Generative Class-Agnostic Motion Prediction with Raster-Vector Motion Field Priors
Kangan Qian, Jinyu Miao, Xinyu Jiao et al.
ResGS: Residual Densification of 3D Gaussian for Efficient Detail Recovery
Yanzhe Lyu, Kai Cheng, Kang Xin et al.
SMoLoRA: Exploring and Defying Dual Catastrophic Forgetting in Continual Visual Instruction Tuning
Ziqi Wang, Chang Che, Qi Wang et al.
Mastering Collaborative Multi-modal Data Selection: A Focus on Informativeness, Uniqueness, and Representativeness
Qifan Yu, Zhebei Shen, Zhongqi Yue et al.
Inverse 3D Microscopy Rendering for Cell Shape Inference with Active Mesh
Sacha Ichbiah, Anshuman Sinha, Fabrice Delbary et al.
Balanced Image Stylization with Style Matching Score
Yuxin Jiang, Liming Jiang, Shuai Yang et al.
VOVTrack: Exploring the Potentiality in Raw Videos for Open-Vocabulary Multi-Object Tracking
Zekun Qian, Ruize Han, Junhui Hou et al.
3D-MOOD: Lifting 2D to 3D for Monocular Open-Set Object Detection
Yung-Hsu Yang, Luigi Piccinelli, Mattia Segu et al.
Occupancy Learning with Spatiotemporal Memory
Ziyang Leng, Jiawei Yang, Wenlong Yi et al.
CLIP-GS: Unifying Vision-Language Representation with 3D Gaussian Splatting
Siyu Jiao, Haoye Dong, Yuyang Yin et al.
From Image to Video: An Empirical Study of Diffusion Representations
Pedro Vélez, Luisa Polania Cabrera, Yi Yang et al.
TrustMark: Robust Watermarking and Watermark Removal for Arbitrary Resolution Images
Tu Bui, Shruti Agarwal, John Collomosse
X-Capture: An Open-Source Portable Device for Multi-Sensory Learning
Samuel Clarke, Suzannah Wistreich, Yanjie Ze et al.
Feed-Forward SceneDINO for Unsupervised Semantic Scene Completion
Aleksandar Jevtić, Christoph Reich, Felix Wimbauer et al.
DiMPLe - Disentangled Multi-Modal Prompt Learning: Enhancing Out-Of-Distribution Alignment with Invariant and Spurious Feature Separation
Umaima Rahman, Mohammad Yaqub, Dwarikanath Mahapatra
LLaFEA: Frame-Event Complementary Fusion for Fine-Grained Spatiotemporal Understanding in LMMs
Hanyu Zhou, Gim Hee Lee
Jigsaw++: Imagining Complete Shape Priors for Object Reassembly
Jiaxin Lu, Gang Hua, Qixing Huang
Bring Your Rear Cameras for Egocentric 3D Human Pose Estimation
Hiroyasu Akada, Jian Wang, Vladislav Golyanik et al.
GaRe: Relightable 3D Gaussian Splatting for Outdoor Scenes from Unconstrained Photo Collections
Haiyang Bai, Jiaqi Zhu, Songru Jiang et al.
A Differentiable Wave Optics Model for End-to-End Computational Imaging System Optimization
Chi-Jui Ho, Yash Belhe, Steve Rotenberg et al.
Predict-Optimize-Distill: A Self-Improving Cycle for 4D Object Understanding
Mingxuan Wu, Huang Huang, Justin Kerr et al.
GT-Loc: Unifying When and Where in Images through a Joint Embedding Space
David G. Shatwell, Ishan Rajendrakumar Dave, Swetha Sirnam et al.
Synthesizing Near-Boundary OOD Samples for Out-of-Distribution Detection
Jinglun Li, Kaixun Jiang, Zhaoyu Chen et al.
On the Generalization of Representation Uncertainty in Earth Observation
Spyros Kondylatos, Nikolaos Ioannis Bountos, Dimitrios Michail et al.
On Large Multimodal Models as Open-World Image Classifiers
Alessandro Conti, Massimiliano Mancini, Enrico Fini et al.
Joint Diffusion Models in Continual Learning
Paweł Skierś, Kamil Deja
ATLAS: Decoupling Skeletal and Shape Parameters for Expressive Parametric Human Modeling
Jinhyung Park, Javier Romero, Shunsuke Saito et al.
Enhancing Few-Shot Vision-Language Classification with Large Multimodal Model Features
Chancharik Mitra, Brandon Huang, Tianning Chai et al.
CoLMDriver: LLM-based Negotiation Benefits Cooperative Autonomous Driving
Changxing Liu, Genjia Liu, Zijun Wang et al.
FiffDepth: Feed-forward Transformation of Diffusion-Based Generators for Detailed Depth Estimation
Yunpeng Bai, Qixing Huang
Pinco: Position-induced Consistent Adapter for Diffusion Transformer in Foreground-conditioned Inpainting
Guangben Lu, Yuzhen N/A, Zhimin Sun et al.
VoteSplat: Hough Voting Gaussian Splatting for 3D Scene Understanding
Minchao Jiang, Shunyu Jia, Jiaming Gu et al.
Online Language Splatting
Saimouli Katragadda, Cho-Ying Wu, Yuliang Guo et al.
Dynamic Reconstruction of Hand-Object Interaction with Distributed Force-aware Contact Representation
Zhenjun Yu, Wenqiang Xu, Pengfei Xie et al.
Shot-by-Shot: Film-Grammar-Aware Training-Free Audio Description Generation
Junyu Xie, Tengda Han, Max Bain et al.
Resonance: Learning to Predict Social-Aware Pedestrian Trajectories as Co-Vibrations
Conghao Wong, Ziqian Zou, Beihao Xia
GS-Occ3D: Scaling Vision-only Occupancy Reconstruction with Gaussian Splatting
Baijun Ye, Minghui Qin, Saining Zhang et al.
RoboTron-Nav: A Unified Framework for Embodied Navigation Integrating Perception, Planning, and Prediction
Yufeng Zhong, Chengjian Feng, Feng Yan et al.
Detection, Pose Estimation and Segmentation for Multiple Bodies: Closing the Virtuous Circle
Miroslav Purkrabek, Jiri Matas
Enhancing Image Restoration Transformer via Adaptive Translation Equivariance
JiaKui Hu, Zhengjian Yao, Lujia Jin et al.
SCAN: Bootstrapping Contrastive Pre-training for Data Efficiency
Yangyang Guo, Mohan Kankanhalli
Differential-informed Sample Selection Accelerates Multimodal Contrastive Learning
Zihua Zhao, Feng Hong, Mengxi Chen et al.
EmbodiedSplat: Personalized Real-to-Sim-to-Real Navigation with Gaussian Splats from a Mobile Device
Gunjan Chhablani, Xiaomeng Ye, Muhammad Zubair Irshad et al.
Bridging Domain Generalization to Multimodal Domain Generalization via Unified Representations
Hai Huang, Yan Xia, Sashuai Zhou et al.
Grouped Speculative Decoding for Autoregressive Image Generation
Junhyuk So, Juncheol Shin, Hyunho Kook et al.
Task Vector Quantization for Memory-Efficient Model Merging
Youngeun Kim, Seunghwan Lee, Aecheon Jung et al.
AHCPTQ: Accurate and Hardware-Compatible Post-Training Quantization for Segment Anything Model
Wenlun Zhang, Yunshan Zhong, Shimpei Ando et al.
O-MaMa: Learning Object Mask Matching between Egocentric and Exocentric Views
Lorenzo Mur-Labadia, Maria Santos-Villafranca, Jesus Bermudez-Cameo et al.
Scale Your Instructions: Enhance the Instruction-Following Fidelity of Unified Image Generation Model by Self-Adaptive Attention Scaling
Chao Zhou, Tianyi Wei, Nenghai Yu
Semantic Causality-Aware Vision-Based 3D Occupancy Prediction
Dubing Chen, Huan Zheng, Yucheng Zhou et al.
X2-Gaussian: 4D Radiative Gaussian Splatting for Continuous-time Tomographic Reconstruction
Weihao Yu, Yuanhao Cai, Ruyi Zha et al.
Dark-ISP: Enhancing RAW Image Processing for Low-Light Object Detection
Jiasheng Guo, Xin Gao, Yuxiang Yan et al.
Large-scale Pre-training for Grounded Video Caption Generation
Evangelos Kazakos, Cordelia Schmid, Josef Sivic
Align Your Rhythm: Generating Highly Aligned Dance Poses with Gating-Enhanced Rhythm-Aware Feature Representation
Congyi Fan, Jian Guan, Xuanjia Zhao et al.
4D Gaussian Splatting SLAM
Yanyan Li, Youxu Fang, Zunjie Zhu et al.
Weakly Supervised Visible-Infrared Person Re-Identification via Heterogeneous Expert Collaborative Consistency Learning
Yafei Zhang, Lingqi Kong, Huafeng Li et al.
Integrating Visual Interpretation and Linguistic Reasoning for Geometric Problem Solving
Zixian Guo, Ming Liu, Qilong Wang et al.
PHATNet: A Physics-guided Haze Transfer Network for Domain-adaptive Real-world Image Dehazing
Fu-Jen Tsai, Yan-Tsung Peng, Yen-Yu Lin et al.
TAViS: Text-bridged Audio-Visual Segmentation with Foundation Models
Ziyang Luo, Nian Liu, Xuguang Yang et al.
Boosting Adversarial Transferability via Residual Perturbation Attack
Jinjia Peng, Zeze Tao, Huibing Wang et al.
Breaking the Encoder Barrier for Seamless Video-Language Understanding
Handong Li, Yiyuan Zhang, Longteng Guo et al.
SEAL: Semantic Aware Image Watermarking
Kasra Arabi, R. Teal Witter, Chinmay Hegde et al.
TARO: Timestep-Adaptive Representation Alignment with Onset-Aware Conditioning for Synchronized Video-to-Audio Synthesis
Tri Ton, Ji Woo Hong, Chang Yoo
PlugMark: A Plug-in Zero-Watermarking Framework for Diffusion Models
Pengzhen Chen, Yanwei Liu, Xiaoyan Gu et al.
SHeaP: Self-supervised Head Geometry Predictor Learned via 2D Gaussians
Liam Schoneveld, Zhe Chen, Davide Davoli et al.
Flow4Agent: Long-form Video Understanding via Motion Prior from Optical Flow
Ruyang Liu, Shangkun Sun, Haoran Tang et al.
DMesh++: An Efficient Differentiable Mesh for Complex Shapes
Sanghyun Son, Matheus Gadelha, Yang Zhou et al.
SemGes: Semantics-aware Co-Speech Gesture Generation using Semantic Coherence and Relevance Learning
Lanmiao Liu, Esam Ghaleb, Asli Ozyurek et al.
Signs as Tokens: A Retrieval-Enhanced Multilingual Sign Language Generator
Ronglai Zuo, Rolandos Alexandros Potamias, Evangelos Ververas et al.
Amodal Depth Anything: Amodal Depth Estimation in the Wild
Zhenyu Li, Mykola Lavreniuk, Jian Shi et al.
Towards Stabilized and Efficient Diffusion Transformers through Long-Skip-Connections with Spectral Constraints
Guanjie Chen, Xinyu Zhao, Yucheng Zhou et al.
AdaHuman: Animatable Detailed 3D Human Generation with Compositional Multiview Diffusion
Yangyi Huang, Ye Yuan, Xueting Li et al.
MVGBench: a Comprehensive Benchmark for Multi-view Generation Models
Xianghui Xie, Jan Lenssen, Gerard Pons-Moll
ReferEverything: Towards Segmenting Everything We Can Speak of in Videos
Anurag Bagchi, Zhipeng Bao, Yu-Xiong Wang et al.
StyleMotif: Multi-Modal Motion Stylization using Style-Content Cross Fusion
Ziyu Guo, Young-Yoon Lee, Joseph Liu et al.
Player-Centric Multimodal Prompt Generation for Large Language Model Based Identity-Aware Basketball Video Captioning
Zeyu Xi, Haoying Sun, Yaofei Wu et al.
UniRes: Universal Image Restoration for Complex Degradations
Mo Zhou, Keren Ye, Mauricio Delbracio et al.
PVChat: Personalized Video Chat with One-Shot Learning
Yufei Shi, Weilong Yan, Gang Xu et al.
Efficient Unsupervised Shortcut Learning Detection and Mitigation in Transformers
Lukas Kuhn, Sari Sadiya, Jörg Schlötterer et al.
GenM3: Generative Pretrained Multi-path Motion Model for Text Conditional Human Motion Generation
Junyu Shi, Lijiang Liu, Yong Sun et al.
From Panels to Prose: Generating Literary Narratives from Comics
Ragav Sachdeva, Andrew Zisserman
VAFlow: Video-to-Audio Generation with Cross-Modality Flow Matching
Xihua Wang, Xin Cheng, Yuyue Wang et al.
Self-Reinforcing Prototype Evolution with Dual-Knowledge Cooperation for Semi-Supervised Lifelong Person Re-Identification
Kunlun Xu, Fan Zhuo, Jiangmeng Li et al.
GGTalker: Talking Head Systhesis with Generalizable Gaussian Priors and Identity-Specific Adaptation
Wentao Hu, Shunkai Li, Ziqiao Peng et al.
Visual Modality Prompt for Adapting Vision-Language Object Detectors
Heitor Rapela Medeiros, Atif Belal, Srikanth Muralidharan et al.
Memory-Efficient 4-bit Preconditioned Stochastic Optimization
Jingyang Li, Kuangyu Ding, Kim-Chuan Toh et al.
MPG-SAM 2: Adapting SAM 2 with Mask Priors and Global Context for Referring Video Object Segmentation
Fu Rong, Meng Lan, Qian Zhang et al.
CSD-VAR: Content-Style Decomposition in Visual Autoregressive Models
Quang-Binh Nguyen, Minh Luu, Quang Nguyen et al.
Discretized Gaussian Representation for Tomographic Reconstruction
Shaokai Wu, Yuxiang Lu, Yapan Guo et al.
FG-OrIU: Towards Better Forgetting via Feature-Gradient Orthogonality for Incremental Unlearning
Qian Feng, Jiahang Tu, Mintong Kang et al.
Cross-Subject Mind Decoding from Inaccurate Representations
Yangyang Xu, Bangzhen Liu, Wenqi Shao et al.
Exploiting Diffusion Prior for Task-driven Image Restoration
Jaeha Kim, Junghun Oh, Kyoung Mu Lee
Image as an IMU: Estimating Camera Motion from a Single Motion-Blurred Image
Jerred Chen, Ronald Clark
Progressive Test Time Energy Adaptation for Medical Image Segmentation
Xiaoran Zhang, Byung-Woo Hong, Hyoungseob Park et al.
Bidirectional Likelihood Estimation with Multi-Modal Large Language Models for Text-Video Retrieval
Dohwan Ko, Ji Soo Lee, Minhyuk Choi et al.
Vivid4D: Improving 4D Reconstruction from Monocular Video by Video Inpainting
Jiaxin Huang, Sheng Miao, Bangbang Yang et al.
Video Individual Counting for Moving Drones
Yaowu Fan, Jia Wan, Tao Han et al.
BATCLIP: Bimodal Online Test-Time Adaptation for CLIP
Sarthak Kumar Maharana, Baoming Zhang, Leonid Karlinsky et al.
Generalizable Object Re-Identification via Visual In-Context Prompting
Zhizhong Huang, Xiaoming Liu
Who is a Better Talker: Subjective and Objective Quality Assessment for AI-Generated Talking Heads
Yingjie Zhou, Jiezhang Cao, Zicheng Zhang et al.
Forgetting Through Transforming: Enabling Federated Unlearning via Class-Aware Representation Transformation
Qi Guo, Zhen Tian, Minghao Yao et al.
OracleFusion: Assisting the Decipherment of Oracle Bone Script with Structurally Constrained Semantic Typography
Li Caoshuo, Zengmao Ding, Xiaobin Hu et al.
NeRF Is a Valuable Assistant for 3D Gaussian Splatting
Shuangkang Fang, I-Chao Shen, Takeo Igarashi et al.
Prototypes are Balanced Units for Efficient and Effective Partially Relevant Video Retrieval
WonJun Moon, Cheol-Ho Cho, Woojin Jun et al.
VideoAds for Fast-Paced Video Understanding
Zheyuan Zhang, Wanying Dou, Linkai Peng et al.
SC-Captioner: Improving Image Captioning with Self-Correction by Reinforcement Learning
Lin Zhang, Xianfang Zeng, Kangcong Li et al.
M2SFormer: Multi-Spectral and Multi-Scale Attention with Edge-Aware Difficulty Guidance for Image Forgery Localization
Ju-Hyeon Nam, Dong-Hyun Moon, Sang-Chul Lee
DiTaiListener: Controllable High Fidelity Listener Video Generation with Diffusion
Maksim Siniukov, Di Chang, Minh Tran et al.
ExCap3D: Expressive 3D Scene Understanding via Object Captioning with Varying Detail
Chandan Yeshwanth, David Rozenberszki, Angela Dai
Robust and Efficient 3D Gaussian Splatting for Urban Scene Reconstruction
Zhensheng Yuan, Haozhi Huang, Zhen Xiong et al.
Towards a Universal 3D Medical Multi-modality Generalization via Learning Personalized Invariant Representation
Zhaorui Tan, Xi Yang, Tan Pan et al.
Heavy Labels Out! Dataset Distillation with Label Space Lightening
Ruonan Yu, Songhua Liu, Zigeng Chen et al.
Dataset Distillation via Vision-Language Category Prototype
Yawen Zou, Guang Li, Duo Su et al.
ETVA: Evaluation of Text-to-Video Alignment via Fine-grained Question Generation and Answering
Kaisi Guan, Zhengfeng Lai, Yuchong Sun et al.
Hallucinatory Image Tokens: A Training-free EAZY Approach to Detecting and Mitigating Object Hallucinations in LVLMs
Liwei Che, Qingze T Liu, Jing Jia et al.
Diving into the Fusion of Monocular Priors for Generalized Stereo Matching
Chengtang Yao, Lidong Yu, Zhidan Liu et al.
Constructing Ophthalmic MLLM for Positioning-diagnosis Collaboration Through Clinical Cognitive Chain Reasoning
Xinyao Liu, Diping Song
Stereo Any Video: Temporally Consistent Stereo Matching
Junpeng Jing, Weixun Luo, Ye Mao et al.
VAGUE: Visual Contexts Clarify Ambiguous Expressions
Heejeong Nam, Jinwoo Ahn, Keummin Ka et al.
Taming Flow Matching with Unbalanced Optimal Transport into Fast Pansharpening
Zihan Cao, Yu Zhong, Liang-Jian Deng
Disentangled Clothed Avatar Generation with Layered Representation
Weitian Zhang, Yichao Yan, Sijing Wu et al.
Extending Foundational Monocular Depth Estimators to Fisheye Cameras with Calibration Tokens
Suchisrit Gangopadhyay, Jung Hee Kim, Xien Chen et al.
HairCUP: Hair Compositional Universal Prior for 3D Gaussian Avatars
Byungjun Kim, Shunsuke Saito, Giljoo Nam et al.
Hierarchical-aware Orthogonal Disentanglement Framework for Fine-grained Skeleton-based Action Recognition
Haochen Chang, Pengfei Ren, Haoyang Zhang et al.
Beyond Spatial Frequency: Pixel-wise Temporal Frequency-based Deepfake Video Detection
Taehoon Kim, Jongwook Choi, Yonghyun Jeong et al.
MOVE: Motion-Guided Few-Shot Video Object Segmentation
Kaining Ying, Hengrui Hu, Henghui Ding
DiffDoctor: Diagnosing Image Diffusion Models Before Treating
Yiyang Wang, Xi Chen, Xiaogang Xu et al.
ViT-Linearizer: Distilling Quadratic Knowledge into Linear-Time Vision Models
Guoyizhe Wei, Rama Chellappa
Unveiling the Invisible: Reasoning Complex Occlusions Amodally with AURA
Zhixuan Li, Hyunse Yoon, Sanghoon Lee et al.
AdaDrive: Self-Adaptive Slow-Fast System for Language-Grounded Autonomous Driving
Ruifei Zhang, Junlin Xie, Wei Zhang et al.
Transformed Low-rank Adaptation via Tensor Decomposition and Its Applications to Text-to-image Models
Zerui Tao, Yuhta Takida, Naoki Murata et al.
F-Bench: Rethinking Human Preference Evaluation Metrics for Benchmarking Face Generation, Customization, and Restoration
Lu Liu, Huiyu Duan, Qiang Hu et al.
Joint Self-Supervised Video Alignment and Action Segmentation
Ali Shah Ali, Syed Ahmed Mahmood, Mubin Saeed et al.
Revisiting Pool-based Prompt Learning for Few-shot Class-incremental Learning
Yongwei Jiang, Yixiong Zou, Yuhua Li et al.
Generate, Refine, and Encode: Leveraging Synthesized Novel Samples for On-the-Fly Fine-Grained Category Discovery
Xiao Liu, Nan Pu, Haiyang Zheng et al.
EA-Vit: Efficient Adaptation for Elastic Vision Transformer
Chen Zhu, Wangbo Zhao, Huiwen Zhang et al.
LangScene-X: Reconstruct Generalizable 3D Language-Embedded Scenes with TriMap Video Diffusion
Fangfu Liu, Hao Li, Jiawei Chi et al.
DAViD: Data-efficient and Accurate Vision Models from Synthetic Data
Fatemeh Saleh, Sadegh Aliakbarian, Charlie Hewitt et al.
A Visual Leap in CLIP Compositionality Reasoning through Generation of Counterfactual Sets
Zexi Jia, Chuanwei Huang, Yeshuang Zhu et al.