Most Cited ICCV "narrative coherence" Papers
2,701 papers found
Embodied Image Captioning: Self-supervised Learning Agents for Spatially Coherent Image Descriptions
Tommaso Galliena, Tommaso Apicella, Stefano Rosa et al.
FLOSS: Free Lunch in Open-vocabulary Semantic Segmentation
Yasser Benigmim, Mohammad Fahes, Tuan-Hung Vu et al.
AIComposer: Any Style and Content Image Composition via Feature Integration
Haowen Li, Zhenfeng Fan, Zhang Wen et al.
Text Embedding Knows How to Quantize Text-Guided Diffusion Models
Hongjae Lee, Myungjun Son, Dongjea Kang et al.
Referring Expression Comprehension for Small Objects
Kanoko Goto, Takumi Hirose, Mahiro Ukai et al.
PersPose: 3D Human Pose Estimation with Perspective Encoding and Perspective Rotation
Xiaoyang Hao, Han Li
Achieving More with Less: Additive Prompt Tuning for Rehearsal-Free Class-Incremental Learning
Haoran Chen, Ping Wang, Zihan Zhou et al.
Multidimensional Byte Pair Encoding: Shortened Sequences for Improved Visual Data Generation
Tim Elsner, Paula Usinger, Julius Nehring-Wirxel et al.
Describe, Adapt and Combine: Empowering CLIP Encoders for Open-set 3D Object Retrieval
Zhichuan Wang, Yang Zhou, Zhe Liu et al.
StrandHead: Text to Hair-Disentangled 3D Head Avatars Using Human-Centric Priors
Xiaokun Sun, Zeyu Cai, Ying Tai et al.
FakeRadar: Probing Forgery Outliers to Detect Unknown Deepfake Videos
Zhaolun Li, Jichang Li, Yinqi Cai et al.
Devil is in the Uniformity: Exploring Diverse Learners within Transformer for Image Restoration
Shihao Zhou, Dayu Li, Jinshan Pan et al.
Diffusion-based 3D Hand Motion Recovery with Intuitive Physics
Yufei Zhang, Zijun Cui, Jeffrey Kephart et al.
VisHall3D: Monocular Semantic Scene Completion from Reconstructing the Visible Regions to Hallucinating the Invisible Regions
Haoang Lu, Yuanqi Su, Xiaoning Zhang et al.
M-Net: MRI Brain Tumor Sequential Segmentation Network via Mesh-Cast
Jiacheng Lu, Hui Ding, Shiyu Zhang et al.
Activation Subspaces for Out-of-Distribution Detection
Barış Zöngür, Robin Hesse, Stefan Roth
CULTURE3D: A Large-Scale and Diverse Dataset of Cultural Landmarks and Terrains for Gaussian-Based Scene Rendering
Xinyi Zheng, Steve Zhang, Weizhe Lin et al.
Saliency-Aware Quantized Imitation Learning for Efficient Robotic Control
Seongmin Park, Hyungmin Kim, Sangwoo Kim et al.
ASCENT: Annotation-free Self-supervised Contrastive Embeddings for 3D Neuron Tracking in Fluorescence Microscopy
Haejun Han, Hang Lu
Augmented and Softened Matching for Unsupervised Visible-Infrared Person Re-Identification
Zhiqi Pang, Chunyu Wang, Lingling Zhao et al.
SPA: Efficient User-Preference Alignment against Uncertainty in Medical Image Segmentation
Jiayuan Zhu, Junde Wu, Cheng Ouyang et al.
PINO: Person-Interaction Noise Optimization for Long-Duration and Customizable Motion Generation of Arbitrary-Sized Groups
Sakuya Ota, Qing Yu, Kent Fujiwara et al.
TurboVSR: Fantastic Video Upscalers and Where to Find Them
Zhongdao Wang, Guodongfang Zhao, Jingjing Ren et al.
Towards Adversarial Robustness via Debiased High-Confidence Logit Alignment
Kejia Zhang, Juanjuan Weng, Zhiming Luo et al.
Serialization based Point Cloud Oversegmentation
Chenghui Lu, Dilong Li, Jianlong Kwan et al.
Balancing Conservatism and Aggressiveness: Prototype-Affinity Hybrid Network for Few-Shot Segmentation
Tianyu Zou, Shengwu Xiong, Ruilin Yao et al.
Beyond Blur: A Fluid Perspective on Generative Diffusion Models
Grzegorz Gruszczynski, Jakub Meixner, Michał Włodarczyk et al.
LoRAverse: A Submodular Framework to Retrieve Diverse Adapters for Diffusion Models
Mert Sonmezer, Matthew Zheng, Pinar Yanardag
DIMO: Diverse 3D Motion Generation for Arbitrary Objects
Linzhan Mou, Jiahui Lei, Chen Wang et al.
Global-Aware Monocular Semantic Scene Completion with State Space Models
Shijie Li, Zhongyao Cheng, Rong Li et al.
VAFlow: Video-to-Audio Generation with Cross-Modality Flow Matching
Xihua Wang, Xin Cheng, Yuyue Wang et al.
ContraGS: Codebook-Condensed and Trainable Gaussian Splatting for Fast, Memory-Efficient Reconstruction
Sankeerth Durvasula, Sharanshangar Muhunthan, Zain Moustafa et al.
After the Party: Navigating the Mapping From Color to Ambient Lighting
Florin-Alexandru Vasluianu, Tim Seizinger, Zongwei Wu et al.
Rep-MTL: Unleashing the Power of Representation-level Task Saliency for Multi-Task Learning
Zedong Wang, Siyuan Li, Dan Xu
VSRM: A Robust Mamba-Based Framework for Video Super-Resolution
Phu Tran Dinh, Hung Dao, Daeyoung Kim
Revisiting Pool-based Prompt Learning for Few-shot Class-incremental Learning
Yongwei Jiang, Yixiong Zou, Yuhua Li et al.
Tune-Your-Style: Intensity-tunable 3D Style Transfer with Gaussian Splatting
Yian Zhao, Rushi Ye, Ruochong Zheng et al.
ULTHO: Ultra-Lightweight yet Efficient Hyperparameter Optimization in Deep Reinforcement Learning
Mingqi Yuan, Bo Li, Xin Jin et al.
Bi-Level Optimization for Self-Supervised AI-Generated Face Detection
Mian Zou, Nan Zhong, Baosheng Yu et al.
Context Guided Transformer Entropy Modeling for Video Compression
Junlong Tong, Wei Zhang, Yaohui Jin et al.
Early Timestep Zero-Shot Candidate Selection for Instruction-Guided Image Editing
Joowon Kim, Ziseok Lee, Donghyeon Cho et al.
Causality-guided Prompt Learning for Vision-language Models via Visual Granulation
Mengyu Gao, Qiulei Dong
STD-GS: Exploring Frame-Event Interaction for SpatioTemporal-Disentangled Gaussian Splatting to Reconstruct High-Dynamic Scene
Hanyu Zhou, Haonan Wang, Haoyue Liu et al.
Rethinking Detecting Salient and Camouflaged Objects in Unconstrained Scenes
Zhangjun Zhou, Yiping Li, Chunlin Zhong et al.
Taming the Untamed: Graph-Based Knowledge Retrieval and Reasoning for MLLMs to Conquer the Unknown
Bowen Wang, Zhouqiang Jiang, Yasuaki Susumu et al.
Fusion Meets Diverse Conditions: A High-diversity Benchmark and Baseline for UAV-based Multimodal Object Detection with Condition Cues
Chen Chen, Kangcheng Bin, Hu Ting et al.
Dual Reciprocal Learning of Language-based Human Motion Understanding and Generation
Chen Liang, Zhicheng Shi, Wenguan Wang et al.
Learning Implicit Features with Flow-Infused Transformations for Realistic Virtual Try-On
Delong Zhang, Qiwei Huang, Yang Sun et al.
DAMap: Distance-aware MapNet for High Quality HD Map Construction
Jinpeng Dong, Chen Li, Yutong Lin et al.
M-SpecGene: Generalized Foundation Model for RGBT Multispectral Vision
Kailai Zhou, Fuqiang Yang, Shixian Wang et al.
PlaceIt3D: Language-Guided Object Placement in Real 3D Scenes
Ahmed Abdelreheem, Filippo Aleotti, Jamie Watson et al.
SparseVILA: Decoupling Visual Sparsity for Efficient VLM Inference
Samir Khaki, Junxian Guo, Jiaming Tang et al.
CompleteMe: Reference-based Human Image Completion
Yu-Ju Tsai, Brian Price, Qing Liu et al.
PLMP - Point-Line Minimal Problems for Projective SfM
Kim Kiehn, Albin Ahlbäck, Kathlén Kohn
PCR-GS: COLMAP-Free 3D Gaussian Splatting via Pose Co-Regularizations
Yu Wei, Jiahui Zhang, Xiaoqin Zhang et al.
Causal Disentanglement and Cross-Modal Alignment for Enhanced Few-Shot Learning
Tianjiao Jiang, Zhen Zhang, Yuhang Liu et al.
UMDATrack: Unified Multi-Domain Adaptive Tracking Under Adverse Weather Conditions
Siyuan Yao, Rui Zhu, Ziqi Wang et al.
Fast Globally Optimal and Geometrically Consistent 3D Shape Matching
Paul Roetzer, Florian Bernard
FICGen: Frequency-Inspired Contextual Disentanglement for Layout-driven Degraded Image Generation
Wenzhuang Wang, Yifan Zhao, Mingcan Ma et al.
From Holistic to Localized: Local Enhanced Adapters for Efficient Visual Instruction Fine-Tuning
Pengkun Jiao, Bin Zhu, Jingjing Chen et al.
Dataset Ownership Verification for Pre-trained Masked Models
Yuechen Xie, Jie Song, Yicheng Shan et al.
DAP-MAE: Domain-Adaptive Point Cloud Masked Autoencoder for Effective Cross-Domain Learning
Ziqi Gao, Qiufu Li, Linlin Shen
ShortFT: Diffusion Model Alignment via Shortcut-based Fine-Tuning
Xiefan Guo, Miaomiao Cui, Liefeng Bo et al.
Balanced Sharpness-Aware Minimization for Imbalanced Regression
Yahao Liu, Qin Wang, Lixin Duan et al.
SpecGuard: Spectral Projection-based Advanced Invisible Watermarking
Inzamamul Alam, Md Islam, Simon Woo et al.
AnimalClue: Recognizing Animals by their Traces
Risa Shinoda, Nakamasa Inoue, Iro Laina et al.
Beyond Simple Edits: Composed Video Retrieval with Dense Modifications
Omkar Thawakar, Dmitry Demidov, Ritesh Thawkar et al.
Robust Unfolding Network for HDR Imaging with Modulo Cameras
Zhile Chen, Hui Ji
You Share Beliefs, I Adapt: Progressive Heterogeneous Collaborative Perception
Hao Si, Ehsan Javanmardi, Manabu Tsukada
VITAL: More Understandable Feature Visualization through Distribution Alignment and Relevant Information Flow
Ada Görgün, Bernt Schiele, Jonas Fischer
ConformalSAM: Unlocking the Potential of Foundational Segmentation Models in Semi-Supervised Semantic Segmentation with Conformal Prediction
Danhui Chen, Ziquan Liu, Chuxi Yang et al.
HypDAE: Hyperbolic Diffusion Autoencoders for Hierarchical Few-shot Image Generation
Lingxiao Li, Kaixuan Fan, Boqing Gong et al.
Progressive Artwork Outpainting via Latent Diffusion Models
Dae-Young Song, Jung-Jae Yu, Donghyeon Cho
MosaicDiff: Training-free Structural Pruning for Diffusion Model Acceleration Reflecting Pretraining Dynamics
Bowei Guo, Shengkun Tang, Cong Zeng et al.
GT-Mean Loss: A Simple Yet Effective Solution for Brightness Mismatch in Low-Light Image Enhancement
Jingxi Liao, Shijie Hao, Richang Hong et al.
TITAN-Guide: Taming Inference-Time Alignment for Guided Text-to-Video Diffusion Models
Christian Simon, Masato Ishii, Akio Hayakawa et al.
Video Color Grading via Look-Up Table Generation
Seunghyun Shin, Dongmin Shin, Jisu Shin et al.
S$^3$E: Self-Supervised State Estimation for Radar-Inertial System
Shengpeng Wang, Yulong Xie, Qing Liao et al.
METEOR: Multi-Encoder Collaborative Token Pruning for Efficient Vision Language Models
Yuchen Liu, Yaoming Wang, Bowen Shi et al.
Robust Low-light Scene Restoration via Illumination Transition
Ze Li, Feng Zhang, Xiatian Zhu et al.
GSV3D: Gaussian Splatting-based Geometric Distillation with Stable Video Diffusion for Single-Image 3D Object Generation
Ye Tao, Jiawei Zhang, Yahao Shi et al.
Visual Relation Diffusion for Human-Object Interaction Detection
Ping Cao, Yepeng Tang, Chunjie Zhang et al.
TrafficLoc: Localizing Traffic Surveillance Cameras in 3D Scenes
Yan Xia, Yunxiang Lu, Rui Song et al.
Modeling Human Gaze Behavior with Diffusion Models for Unified Scanpath Prediction
Giuseppe Cartella, Vittorio Cuculo, Alessandro D'Amelio et al.
BASIC: Boosting Visual Alignment with Intrinsic Refined Embeddings in Multimodal Large Language Models
Jianting Tang, Yubo Wang, Haoyu Cao et al.
Outlier-Aware Post-Training Quantization for Image Super-Resolution
Hailing Wang, Jianglin Lu, Yitian Zhang et al.
MeshPad: Interactive Sketch-Conditioned Artist-Reminiscent Mesh Generation and Editing
Haoxuan Li, Ziya Erkoç, Lei Li et al.
LiON-LoRA: Rethinking LoRA Fusion to Unify Controllable Spatial and Temporal Generation for Video Diffusion
Yisu Zhang, Chenjie Cao, Chaohui Yu et al.
Latent Expression Generation for Referring Image Segmentation and Grounding
Seonghoon Yu, Junbeom Hong, Joonseok Lee et al.
Toward Material-Agnostic System Identification from Videos
Yizhou Zhao, Haoyu Chen, Chunjiang Liu et al.
MoSiC: Optimal-Transport Motion Trajectory for Dense Self-Supervised Learning
Mohammadreza Salehi, Shashanka Venkataramanan, Ioana Simion et al.
Generating, Fast and Slow: Scalable Parallel Video Generation with Video Interface Networks
Bhishma Dedhia, David Bourgin, Krishna Kumar Singh et al.
Task-Specific Zero-shot Quantization-Aware Training for Object Detection
Changhao Li, Xinrui Chen, Ji Wang et al.
CoST: Efficient Collaborative Perception From Unified Spatiotemporal Perspective
Zongheng Tang, Yi Liu, Yifan Sun et al.
PoseAnchor: Robust Root Position Estimation for 3D Human Pose Estimation
Jun-Hee Kim, Jumin Han, Seong-Whan Lee
VQ-SGen: A Vector Quantized Stroke Representation for Creative Sketch Generation
Jiawei Wang, Zhiming Cui, Changjian Li
SAFER: Sharpness Aware layer-selective Finetuning for Enhanced Robustness in vision transformers
Bhavna Gopal, Huanrui Yang, Mark Horton et al.
AstroLoc: Robust Space to Ground Image Localizer
Gabriele Berton, Alex Stoken, Carlo Masone
Domain Generalizable Portrait Style Transfer
Xinbo Wang, Wenju Xu, Qing Zhang et al.
Learning Pixel-adaptive Multi-layer Perceptrons for Real-time Image Enhancement
Junyu Lou, Xiaorui Zhao, Kexuan Shi et al.
MCOP: Multi-UAV Collaborative Occupancy Prediction
Zefu Lin, Wenbo Chen, Xiaojuan Jin et al.
CapeLLM: Support-Free Category-Agnostic Pose Estimation with Multimodal Large Language Models
Junho Kim, Hyungjin Chung, Byung-Hoon Kim
Stylized-Face: A Million-level Stylized Face Dataset for Face Recognition
Zhengyuan Peng, Jianqing Xu, Yuge Huang et al.
ForCenNet: Foreground-Centric Network for Document Image Rectification
Peng Cai, liqiang liqiang, Kaicheng Yang et al.
Enhancing Numerical Prediction of MLLMs with Soft Labeling
Pei Wang, Zhaowei Cai, Hao Yang et al.
DAViD: Data-efficient and Accurate Vision Models from Synthetic Data
Fatemeh Saleh, Sadegh Aliakbarian, Charlie Hewitt et al.
Guiding Noisy Label Conditional Diffusion Models with Score-based Discriminator Correction
Dat Cong, Hieu Tran, Hoang Thanh-Tung
MotionDiff: Training-free Zero-shot Interactive Motion Editing via Flow-assisted Multi-view Diffusion
Yikun Ma, Yiqing Li, Jiawei Wu et al.
MeshMamba: State Space Models for Articulated 3D Mesh Generation and Reconstruction
Yusuke Yoshiyasu, Leyuan Sun, Ryusuke Sagawa
Decoding Correlation-Induced Misalignment in the Stable Diffusion Workflow for Text-to-Image Generation
Yunze Tong, Fengda Zhang, Didi Zhu et al.
Learning Robust Image Watermarking with Lossless Cover Recovery
Jiale Chen, Wei Wang, Chongyang Shi et al.
DISTA-Net: Dynamic Closely-Spaced Infrared Small Target Unmixing
Shengdong Han, Shangdong Yang, Yuxuan Li et al.
FPEM: Face Prior Enhanced Facial Attractiveness Prediction for Live Videos with Face Retouching
Hui Li, Xiaoyu Ren, Hongjiu Yu et al.
OmniVTON: Training-Free Universal Virtual Try-On
Zhaotong Yang, Yuhui Li, Shengfeng He et al.
Federated Prompt-Tuning with Heterogeneous and Incomplete Multimodal Client Data
Hang Phung, Manh Nguyen, Thanh Huynh et al.
Membership Inference Attacks with False Discovery Rate Control
Chenxu Zhao, Wei Qian, Aobo Chen et al.
Fuse Before Transfer: Knowledge Fusion for Heterogeneous Distillation
Guopeng Li, Qiang Wang, Ke Yan et al.
PUMPS: Skeleton-Agnostic Point-based Universal Motion Pre-Training for Synthesis in Human Motion Tasks
Clinton A Mo, Kun Hu, Chengjiang Long et al.
IAP: Invisible Adversarial Patch Attack through Perceptibility-Aware Localization and Perturbation Optimization
Subrat Kishore Dutta, Xiao Zhang
DeepShield: Fortifying Deepfake Video Detection with Local and Global Forgery Analysis
Yinqi Cai, Jichang Li, Zhaolun Li et al.
Depth AnyEvent: A Cross-Modal Distillation Paradigm for Event-Based Monocular Depth Estimation
Luca Bartolomei, Enrico Mannocci, Fabio Tosi et al.
SpiLiFormer: Enhancing Spiking Transformers with Lateral Inhibition
Zeqi Zheng, Yanchen Huang, Yingchao Yu et al.
Towards Efficient General Feature Prediction in Masked Skeleton Modeling
Shengkai Sun, Zefan Zhang, Jianfeng Dong et al.
MMAT-1M: A Large Reasoning Dataset for Multimodal Agent Tuning
Tianhong Gao, Yannian Fu, Weiqun Wu et al.
Skeleton Motion Words for Unsupervised Skeleton-based Temporal Action Segmentation
Uzay Gökay, Federico Spurio, Dominik Bach et al.
FED-PsyAU: Privacy-Preserving Micro-Expression Recognition via Psychological AU Coordination and Dynamic Facial Motion Modeling
Jingting Li, Yu Qian, Lin Zhao et al.
Ask and Remember: A Questions-Only Replay Strategy for Continual Visual Question Answering
Imad Eddine Marouf, Enzo Tartaglione, Stéphane Lathuilière et al.
NoiseController: Towards Consistent Multi-view Video Generation via Noise Decomposition and Collaboration
Haotian Dong, Xin Wang, Di Lin et al.
G2PDiffusion: Cross-species Genotype-to-Phenotype Prediction via Evolutionary Diffusion
Mengdi Liu, Zhangyang Gao, Hong Chang et al.
Bridging Diffusion Models and 3D Representations: A 3D Consistent Super-Resolution Framework
Yi-Ting Chen, Ting-Hsuan Liao, Pengsheng Guo et al.
Easy3D: A Simple Yet Effective Method for 3D Interactive Segmentation
Andrea Simonelli, Norman Müller, Peter Kontschieder
Balancing Task-invariant Interaction and Task-specific Adaptation for Unified Image Fusion
Xingyu Hu, Junjun Jiang, Chenyang Wang et al.
IM-LUT: Interpolation Mixing Look-Up Tables for Image Super-Resolution
Sejin Park, Sangmin Lee, Kyong Hwan Jin et al.
GeoProg3D: Compositional Visual Reasoning for City-Scale 3D Language Fields
Shunsuke Yasuki, Taiki Miyanishi, Nakamasa Inoue et al.
LLM-enhanced Action-aware Multi-modal Prompt Tuning for Image-Text Matching
Meng Tian, Shuo Yang, Xinxiao Wu
Hierarchical Visual Prompt Learning for Continual Video Instance Segmentation
Jiahua Dong, Hui Yin, Wenqi Liang et al.
A Plug-and-Play Physical Motion Restoration Approach for In-the-Wild High-Difficulty Motions
Youliang Zhang, Ronghui Li, Yachao Zhang et al.
Reverse Convolution and Its Applications to Image Restoration
Xuhong Huang, Shiqi Liu, Kai Zhang et al.
DPoser-X: Diffusion Model as Robust 3D Whole-body Human Pose Prior
Junzhe Lu, Jing Lin, Hongkun Dou et al.
MoMaps: Semantics-Aware Scene Motion Generation with Motion Maps
Jiahui Lei, Kyle Genova, George Kopanas et al.
InstantEdit: Text-Guided Few-Step Image Editing with Piecewise Rectified Flow
Yiming Gong, Zhen Zhu, Minjia Zhang
2D Gaussian Splatting-based Sparse-view Transparent Object Depth Reconstruction via Physics Simulation for Scene Update
Jeongyun Kim, Seunghoon Jeong, Giseop Kim et al.
Unsupervised Imaging Inverse Problems with Diffusion Distribution Matching
Giacomo Meanti, Thomas Ryckeboer, Michael Arbel et al.
Tiling artifacts and trade-offs of feature normalization in the segmentation of large biological images
Elena Buglakova, Anwai Archit, Edoardo D'Imprima et al.
TeethGenerator: A two-stage framework for paired pre- and post-orthodontic 3D dental data generation
Changsong Lei, Yaqian Liang, Shaofeng Wang et al.
Hybrid-TTA: Continual Test-time Adaptation via Dynamic Domain Shift Detection
Hyewon Park, Hyejin Park, Jueun Ko et al.
Kaputt: A Large-Scale Dataset for Visual Defect Detection
Sebastian Höfer, Dorian Henning, Artemij Amiranashvili et al.
TrackAny3D: Transferring Pretrained 3D Models for Category-unified 3D Point Cloud Tracking
Mengmeng Wang, Haonan Wang, Yulong Li et al.
DynImg: Key Frames with Visual Prompts are Good Representation for Multi-Modal Video Understanding
Xiaoyi Bao, Chen-Wei Xie, Hao Tang et al.
Seeing the Unseen: A Semantic Alignment and Context-Aware Prompt Framework for Open-Vocabulary Camouflaged Object Segmentation
Peng Ren, Tian Bai, Jing Sun et al.
HoliTracer: Holistic Vectorization of Geographic Objects from Large-Size Remote Sensing Imagery
Yu Wang, Bo Dang, Wanchun Li et al.
Fish2Mesh Transformer: 3D Human Mesh Recovery from Egocentric Vision
Tianma Shen, Aditya Shrish Puranik, James Vong et al.
Towards Foundational Models for Single-Chip Radar
Tianshu Huang, Akarsh Prabhakara, Chuhan Chen et al.
Breaking Rectangular Shackles: Cross-View Object Segmentation for Fine-Grained Object Geo-Localization
Qingwang Zhang, Yingying Zhu
Bidirectional Likelihood Estimation with Multi-Modal Large Language Models for Text-Video Retrieval
Dohwan Ko, Ji Soo Lee, Minhyuk Choi et al.
MIORe & VAR-MIORe: Benchmarks to Push the Boundaries of Restoration
George Ciubotariu, Zhuyun Zhou, Zongwei Wu et al.
Aligning Effective Tokens with Video Anomaly in Large Language Models
Yingxian Chen, Jiahui Liu, Ruidi Fan et al.
Certifiably Optimal Anisotropic Rotation Averaging
Carl Olsson, Yaroslava Lochman, Johan Malmport et al.
Forecasting Continuous Non-Conservative Dynamical Systems in SO(3)
Lennart Bastian, Mohammad Rashed, Nassir Navab et al.
Learning Yourself: Class-Incremental Semantic Segmentation with Language-Inspired Bootstrapped Disentanglement
Ruitao Wu, Yifan Zhao, Jia Li
Learning 3D Scene Analogies with Neural Contextual Scene Maps
Junho Kim, Gwangtak Bae, Eun Sun Lee et al.
Perceiving and Acting in First-Person: A Dataset and Benchmark for Egocentric Human-Object-Human Interactions
Liang Xu, Chengqun Yang, Zili Lin et al.
Verbalized Representation Learning for Interpretable Few-Shot Generalization
Cheng-Fu Yang, Da Yin, Wenbo Hu et al.
Aligning Information Capacity Between Vision and Language via Dense-to-Sparse Feature Distillation for Image-Text matching
Yang Liu, Wentao Feng, Zhuoyao Liu et al.
PS3: A Multimodal Transformer Integrating Pathology Reports with Histology Images and Biological Pathways for Cancer Survival Prediction
Manahil Raza, Ayesha Azam, Talha Qaiser et al.
PERSONA: Personalized Whole-Body 3D Avatar with Pose-Driven Deformations from a Single Image
Geonhee Sim, Gyeongsik Moon
From Prompt to Progression: Taming Video Diffusion Models for Seamless Attribute Transition
Ling Lo, Kelvin Chan, Wen-Huang Cheng et al.
DiSCO-3D: Discovering and Segmenting Sub-Concepts from Open-vocabulary Queries in NeRF
Doriand Petit, Steve Bourgeois, Vincent Gay-Bellile et al.
GlassWizard: Harvesting Diffusion Priors for Glass Surface Detection
Wenxue Li, Tian Ye, Xinyu Xiong et al.
DONUT: A Decoder-Only Model for Trajectory Prediction
Markus Knoche, Daan de Geus, Bastian Leibe
Equipping Vision Foundation Model with Mixture of Experts for Out-of-Distribution Detection
Shizhen Zhao, Jiahui Liu, Xin Wen et al.
PanoSplatt3R: Leveraging Perspective Pretraining for Generalized Unposed Wide-Baseline Panorama Reconstruction
Jiahui Ren, Mochu Xiang, Jiajun Zhu et al.
Epipolar Consistent Attention Aggregation Network for Unsupervised Light Field Disparity Estimation
Chen Gao, Shuo Zhang, Youfang Lin
Prior-aware Dynamic Temporal Modeling Framework for Sequential 3D Hand Pose Estimation
Pengfei Ren, Jingyu Wang, Haifeng Sun et al.
DRaM-LHM: A Quaternion Framework for Iterative Camera Pose Estimation
Chen Lin, Weizhi Du, Zhixiang Min et al.
Arti-PG: A Toolbox for Procedurally Synthesizing Large-Scale and Diverse Articulated Objects with Rich Annotations
Jianhua Sun, Yuxuan Li, Jiude Wei et al.
Manual-PA: Learning 3D Part Assembly from Instruction Diagrams
Jiahao Zhang, Anoop Cherian, Cristian Rodriguez-Opazo et al.
Scaling Inference-Time Search with Vision Value Model for Improved Visual Comprehension
Xiyao Wang, Zhengyuan Yang, Linjie Li et al.
MGSfM: Multi-Camera Geometry Driven Global Structure-from-Motion
Peilin Tao, Hainan Cui, Diantao Tu et al.
Learning Large Motion Estimation from Intermediate Representations with a High-Resolution Optical Flow Dataset Featuring Long-Range Dynamic Motion
Hoonhee Cho, Yuhwan Jeong, Kuk-Jin Yoon
GeoExplorer: Active Geo-localization with Curiosity-Driven Exploration
Li Mi, Manon Béchaz, Zeming Chen et al.
NAPPure: Adversarial Purification for Robust Image Classification under Non-Additive Perturbations
Junjie Nan, Jianing Li, Wei Chen et al.
PEFTDiff: Diffusion-Guided Transferability Estimation for Parameter-Efficient Fine-Tuning
Prafful Khoba, Zijian Wang, Chetan Arora et al.
Is Tracking really more challenging in First Person Egocentric Vision?
Matteo Dunnhofer, Zaira Manigrasso, Christian Micheloni
Stochastic Interpolants for Revealing Stylistic Flows across the History of Art
Pingchuan Ma, Ming Gui, Johannes Schusterbauer et al.
POMATO: Marrying Pointmap Matching with Temporal Motions for Dynamic 3D Reconstruction
Songyan Zhang, Yongtao Ge, Jinyuan Tian et al.
Multispectral Demosaicing via Dual Cameras
SaiKiran Tedla, Junyong Lee, Beixuan Yang et al.
Flexi-FSCIL: Adaptive Knowledge Retention for Breaking the Stability-Plasticity Dilemma in Few-Shot Class-Incremental Learning
Wufei Xie, Yalin Wang, Chenliang Liu et al.
Staining and Locking Computer Vision Models Without Retraining
Oliver Sutton, Qinghua Zhou, George Leete et al.
AVAM: a Universal Training-free Adaptive Visual Anchoring Embedded into Multimodal Large Language Model for Multi-image Question Answering
Kang Zeng, Guojin Zhong, Jintao Cheng et al.
Analyzing Finetuning Representation Shift for Multimodal LLMs Steering
Pegah Khayatan, Mustafa Shukor, Jayneel Parekh et al.
RIPE: Reinforcement Learning on Unlabeled Image Pairs for Robust Keypoint Extraction
Johannes Künzel, Anna Hilsmann, Peter Eisert
Prototype Guided Backdoor Defense via Activation Space Manipulation
Venkat Adithya Amula, Sunayana Samavedam, Saurabh Saini et al.
Dynamic Multi-Layer Null Space Projection for Vision-Language Continual Learning
Borui Kang, Lei Wang, Zhiping Wu et al.
Robust 3D Object Detection using Probabilistic Point Clouds from Single-Photon LiDARs
Bhavya Goyal, Felipe Gutierrez-Barragan, Wei Lin et al.
CE-FAM: Concept-Based Explanation via Fusion of Activation Maps
Michihiro Kuroki, Toshihiko Yamasaki
Large Learning Rates Simultaneously Achieve Robustness to Spurious Correlations and Compressibility
Melih Barsbey, Lucas Prieto, Stefanos Zafeiriou et al.
Instruction-Oriented Preference Alignment for Enhancing Multi-Modal Comprehension Capability of MLLMs
Zitian Wang, Yue Liao, Rong Kang et al.
SynCity: Training-Free Generation of 3D Cities
Paul Engstler, Aleksandar Shtedritski, Iro Laina et al.
Multi-view Gaze Target Estimation
Qiaomu Miao, Vivek Golani, Jingyi Xu et al.