Most Cited ICCV "protein-molecule interaction network" Papers
2,701 papers found • Page 8 of 14
Conference
PRO-VPT: Distribution-Adaptive Visual Prompt Tuning via Prompt Relocation
Chikai Shang, Mengke Li, Yiqun Zhang et al.
Granular Concept Circuits: Toward a Fine-Grained Circuit Discovery for Concept Representations
Dahee Kwon, Sehyun Lee, Jaesik Choi
IM360: Large-scale Indoor Mapping with 360 Cameras
Dongki Jung, Jaehoon Choi, Yonghan Lee et al.
DLFR-Gen: Diffusion-based Video Generation with Dynamic Latent Frame Rate
Zhihang Yuan, Rui Xie, Yuzhang Shang et al.
Probabilistic Prototype Calibration of Vision-language Models for Generalized Few-shot Semantic Segmentation
Jie Liu, Jiayi Shen, Pan Zhou et al.
Training-Free Class Purification for Open-Vocabulary Semantic Segmentation
Qi Chen, Lingxiao Yang, Yun Chen et al.
Revisiting Point Cloud Completion: Are We Ready For The Real-World?
Stuti Pathak, Prashant Kumar, Dheeraj Baiju et al.
Progressive Artwork Outpainting via Latent Diffusion Models
Dae-Young Song, Jung-Jae Yu, Donghyeon Cho
A Conditional Probability Framework for Compositional Zero-shot Learning
Peng Wu, Qiuxia Lai, Hao Fang et al.
An Efficient Post-hoc Framework for Reducing Task Discrepancy of Text Encoders for Composed Image Retrieval
Jaeseok Byun, Seokhyeon Jeong, Wonjae Kim et al.
Federated Prompt-Tuning with Heterogeneous and Incomplete Multimodal Client Data
Hang Phung, Manh Nguyen, Thanh Huynh et al.
Find a Scapegoat: Poisoning Membership Inference Attack and Defense to Federated Learning
Wenjin Mo, Zhiyuan Li, Minghong Fang et al.
A Linear N-Point Solver for Structure and Motion from Asynchronous Tracks
Hang Su, Yunlong Feng, Daniel Gehrig et al.
VITAL: More Understandable Feature Visualization through Distribution Alignment and Relevant Information Flow
Ada Görgün, Bernt Schiele, Jonas Fischer
Taming the Untamed: Graph-Based Knowledge Retrieval and Reasoning for MLLMs to Conquer the Unknown
Bowen Wang, Zhouqiang Jiang, Yasuaki Susumu et al.
Causality-guided Prompt Learning for Vision-language Models via Visual Granulation
Mengyu Gao, Qiulei Dong
Auxiliary Prompt Tuning of Vision-Language Models for Few-Shot Out-of-Distribution Detection
Wenjun Miao, Guansong Pang, Zihan Wang et al.
Is Less More? Exploring Token Condensation as Training-free Test-time Adaptation
Zixin Wang, Dong Gong, Sen Wang et al.
Generalized Tensor-based Parameter-Efficient Fine-Tuning via Lie Group Transformations
Chongjie Si, Zhiyi Shi, Xuehui Wang et al.
Achieving More with Less: Additive Prompt Tuning for Rehearsal-Free Class-Incremental Learning
Haoran Chen, Ping Wang, Zihan Zhou et al.
Meta-Learning Dynamic Center Distance: Hard Sample Mining for Learning with Noisy Labels
Chenyu Mu, Yijun Qu, Jiexi Yan et al.
On the Robustness Tradeoff in Fine-Tuning
Kunyang Li, Jean-Charles Noirot Ferrand, Ryan Sheatsley et al.
Dataset Distillation as Data Compression: A Rate-Utility Perspective
Youneng Bao, Yiping Liu, Zhuo Chen et al.
Divide-and-Conquer for Enhancing Unlabeled Learning, Stability, and Plasticity in Semi-supervised Continual Learning
Yue Duan, Taicai Chen, Lei Qi et al.
HumorDB: Can AI understand graphical humor?
Vedaant V Jain, Gabriel Kreiman, Felipe Feitosa
Active Membership Inference Test (aMINT): Enhancing Model Auditability with Multi-Task Learning.
Daniel DeAlcala, Aythami Morales, Julian Fierrez et al.
One-Shot Knowledge Transfer for Scalable Person Re-Identification
Longhua Li, Lei Qi, Xin Geng
ShortFT: Diffusion Model Alignment via Shortcut-based Fine-Tuning
Xiefan Guo, Miaomiao Cui, Liefeng Bo et al.
Seal Your Backdoor with Variational Defense
Ivan Sabolic, Matej Grcic, Siniša Šegvić
CODE-CL: Conceptor-Based Gradient Projection for Deep Continual Learning
Marco P. Apolinario, Sakshi Choudhary, Kaushik Roy
Causal Disentanglement and Cross-Modal Alignment for Enhanced Few-Shot Learning
Tianjiao Jiang, Zhen Zhang, Yuhang Liu et al.
BabyVLM: Data-Efficient Pretraining of VLMs Inspired by Infant Learning
Shengao Wang, Arjun Chandra, Aoming Liu et al.
Improving Large Vision and Language Models by Learning from a Panel of Peers
Jefferson Hernandez, Jing Shi, Simon Jenni et al.
Verbalized Representation Learning for Interpretable Few-Shot Generalization
Cheng-Fu Yang, Da Yin, Wenbo Hu et al.
Equipping Vision Foundation Model with Mixture of Experts for Out-of-Distribution Detection
Shizhen Zhao, Jiahui Liu, Xin Wen et al.
Towards Privacy-preserved Pre-training of Remote Sensing Foundation Models with Federated Mutual-guidance Learning
Jieyi Tan, Chengwei Zhang, Bo Dang et al.
Boosting Domain Generalized and Adaptive Detection with Diffusion Models: Fitness, Generalization, and Transferability
Boyong He, Yuxiang Ji, Zhuoyue Tan et al.
ViT-EnsembleAttack: Augmenting Ensemble Models for Stronger Adversarial Transferability in Vision Transformers
Hanwen Cao, Haobo Lu, Xiaosen Wang et al.
Instruction-Oriented Preference Alignment for Enhancing Multi-Modal Comprehension Capability of MLLMs
Zitian Wang, Yue Liao, RONG KANG et al.
Staining and Locking Computer Vision Models Without Retraining
Oliver Sutton, Qinghua Zhou, George Leete et al.
The Inter-Intra Modal Measure: A Predictive Lens on Fine-Tuning Outcomes in Vision-Language Models
Laura Niss, Kevin Vogt-Lowell, Theodoros Tsiligkaridis
PoseSyn: Synthesizing Diverse 3D Pose Data from In-the-Wild 2D Data
CHANGHEE YANG, Hyeonseop Song, Seokhun Choi et al.
Human-in-the-Loop Local Corrections of 3D Scene Layouts via Infilling
Christopher Xie, Armen Avetisyan, Henry Howard-Jenkins et al.
AstroLoc: Robust Space to Ground Image Localizer
Gabriele Berton, Alex Stoken, Carlo Masone
Toward Material-Agnostic System Identification from Videos
Yizhou Zhao, Haoyu Chen, Chunjiang Liu et al.
Boosting Multi-View Indoor 3D Object Detection via Adaptive 3D Volume Construction
Runmin Zhang, Zhu Yu, Si-Yuan Cao et al.
Robust Low-light Scene Restoration via Illumination Transition
Ze Li, Feng Zhang, Xiatian Zhu et al.
Dual Reciprocal Learning of Language-based Human Motion Understanding and Generation
CHEN LIANG, Zhicheng Shi, Wenguan Wang et al.
HazeFlow: Revisit Haze Physical Model as ODE and Non-Homogeneous Haze Generation for Real-World Dehazing
Junseong Shin, Seungwoo Chung, Yunjeong Yang et al.
DAMap: Distance-aware MapNet for High Quality HD Map Construction
JINPENG DONG, Chen Li, Yutong Lin et al.
PlaceIt3D: Language-Guided Object Placement in Real 3D Scenes
Ahmed Abdelreheem, Filippo Aleotti, Jamie Watson et al.
Understanding Flatness in Generative Models: Its Role and Benefits
Taehwan Lee, Kyeongkook Seo, Jaejun Yoo et al.
Princeton365: A Diverse Dataset with Accurate Camera Pose
Karhan Kayan, Stamatis Alexandropoulos, Rishabh Jain et al.
Voyaging into Perpetual Dynamic Scenes from a Single View
Fengrui Tian, Tianjiao Ding, Jinqi Luo et al.
Learning 3D Scene Analogies with Neural Contextual Scene Maps
Junho Kim, Gwangtak Bae, Eun Sun Lee et al.
RegGS: Unposed Sparse Views Gaussian Splatting with 3DGS Registration
Chong Cheng, Yu Hu, Sicheng Yu et al.
PersPose: 3D Human Pose Estimation with Perspective Encoding and Perspective Rotation
Xiaoyang Hao, Han Li
Breaking Rectangular Shackles: Cross-View Object Segmentation for Fine-Grained Object Geo-Localization
Qingwang Zhang, Yingying Zhu
MaskHand: Generative Masked Modeling for Robust Hand Mesh Reconstruction in the Wild
Muhammad Usama Saleem, Ekkasit Pinyoanuntapong, Mayur Patel et al.
Fish2Mesh Transformer: 3D Human Mesh Recovery from Egocentric Vision
Tianma Shen, Aditya Shrish Puranik, James Vong et al.
HoliTracer: Holistic Vectorization of Geographic Objects from Large-Size Remote Sensing Imagery
Yu Wang, Bo Dang, Wanchun Li et al.
DialNav: Multi-turn Dialog Navigation with a Remote Guide
Leekyeung Han, Hyunji Min, Gyeom Hwangbo et al.
Online Dense Point Tracking with Streaming Memory
Qiaole Dong, Yanwei Fu
Hybrid-TTA: Continual Test-time Adaptation via Dynamic Domain Shift Detection
Hyewon Park, Hyejin Park, Jueun Ko et al.
Learning on the Go: A Meta-learning Object Navigation Model
Xiaorong Qin, Xinhang Song, Sixian Zhang et al.
ProGait: A Multi-Purpose Video Dataset and Benchmark for Transfemoral Prosthesis Users
Xiangyu Yin, Boyuan Yang, Weichen Liu et al.
After the Party: Navigating the Mapping From Color to Ambient Lighting
Florin-Alexandru Vasluianu, Tim Seizinger, Zongwei Wu et al.
3D Gaussian Map with Open-Set Semantic Grouping for Vision-Language Navigation
Jianzhe Gao, Rui Liu, Wenguan Wang
DyGS-SLAM: Real-Time Accurate Localization and Gaussian Reconstruction for Dynamic Scenes
Xinggang Hu, Chenyangguang Zhang, Mingyuan Zhao et al.
EvRT-DETR: Latent Space Adaptation of Image Detectors for Event-based Vision
Dmitrii Torbunov, Yihui Ren, Animesh Ghose et al.
A Hyperdimensional One Place Signature to Represent Them All: Stackable Descriptors For Visual Place Recognition
Connor Malone, Somayeh Hussaini, Tobias Fischer et al.
MoMaps: Semantics-Aware Scene Motion Generation with Motion Maps
Jiahui Lei, Kyle Genova, George Kopanas et al.
Expressive Talking Human from Single-Image with Imperfect Priors
Jun Xiang, Yudong Guo, Leipeng Hu et al.
Beyond Label Semantics: Language-Guided Action Anatomy for Few-shot Action Recognition
Zefeng Qian, Xincheng Yao, Yifei Huang et al.
Reverse Convolution and Its Applications to Image Restoration
Xuhong Huang, Shiqi Liu, Kai Zhang et al.
MamTiff-CAD: Multi-Scale Latent Diffusion with Mamba+ for Complex Parametric Sequence
Liyuan Deng, Yunpeng Bai, Yongkang Dai et al.
Im2Haircut: Single-view Strand-based Hair Reconstruction for Human Avatars
Vanessa Sklyarova, Egor Zakharov, Malte Prinzler et al.
AFUNet: Cross-Iterative Alignment-Fusion Synergy for HDR Reconstruction via Deep Unfolding Paradigm
Xinyue Li, Zhangkai Ni, Wenhan Yang
PINO: Person-Interaction Noise Optimization for Long-Duration and Customizable Motion Generation of Arbitrary-Sized Groups
Sakuya Ota, Qing Yu, Kent Fujiwara et al.
TeRA: Rethinking Text-guided Realistic 3D Avatar Generation
Yanwen Wang, Yiyu Zhuang, Jiawei Zhang et al.
Vulnerability-Aware Spatio-Temporal Learning for Generalizable Deepfake Video Detection
Dat NGUYEN, Marcella Astrid, Anis Kacem et al.
Latent Swap Joint Diffusion for 2D Long-Form Latent Generation
Yusheng Dai, Chenxi Wang, Chang Li et al.
Augmented and Softened Matching for Unsupervised Visible-Infrared Person Re-Identification
Zhiqi Pang, Chunyu Wang, Lingling Zhao et al.
Balancing Task-invariant Interaction and Task-specific Adaptation for Unified Image Fusion
Xingyu Hu, Junjun Jiang, Chenyang Wang et al.
PatchScaler: An Efficient Patch-Independent Diffusion Model for Image Super-Resolution
Yong Liu, Hang Dong, Jinshan Pan et al.
PrimHOI: Compositional Human-Object Interaction via Reusable Primitives
Kai Jia, Tengyu Liu, Mingtao Pei et al.
Event-Driven Storytelling with Multiple Lifelike Humans in a 3D Scene
Donggeun Lim, Jinseok Bae, Inwoo Hwang et al.
Hipandas: Hyperspectral Image Joint Denoising and Super-Resolution by Image Fusion with the Panchromatic Image
Shuang Xu, Zixiang Zhao, Haowen Bai et al.
Skeleton Motion Words for Unsupervised Skeleton-based Temporal Action Segmentation
Uzay Gökay, Federico Spurio, Dominik Bach et al.
Towards Efficient General Feature Prediction in Masked Skeleton Modeling
Shengkai Sun, Zefan Zhang, Jianfeng Dong et al.
How Would It Sound? Material-Controlled Multimodal Acoustic Profile Generation for Indoor Scenes
Mahnoor Saad, Ziad Al-Halah
Occlusion-robust Stylization for Drawing-based 3D Animation
Sunjae Yoon, Gwanhyeong Koo, Younghwan Lee et al.
DeepShield: Fortifying Deepfake Video Detection with Local and Global Forgery Analysis
Yinqi Cai, Jichang Li, Zhaolun Li et al.
GeoAvatar: Adaptive Geometrical Gaussian Splatting for 3D Head Avatar
SeungJun Moon, Hah Min Lew, Seungeun Lee et al.
Bitrate-Controlled Diffusion for Disentangling Motion and Content in Video
Xiao Li, Qi Chen, Xiulian Peng et al.
Tiling artifacts and trade-offs of feature normalization in the segmentation of large biological images
Elena Buglakova, Anwai Archit, Edoardo D'Imprima et al.
Saliency-Aware Quantized Imitation Learning for Efficient Robotic Control
Seongmin Park, Hyungmin Kim, Sangwoo kim et al.
Group-wise Scaling and Orthogonal Decomposition for Domain-Invariant Feature Extraction in Face Anti-Spoofing
Seungjin Jung, Kanghee Lee, Yonghyun Jeong et al.
FakeRadar: Probing Forgery Outliers to Detect Unknown Deepfake Videos
Zhaolun Li, Jichang Li, Yinqi Cai et al.
StrandHead: Text to Hair-Disentangled 3D Head Avatars Using Human-Centric Priors
Xiaokun Sun, Zeyu Cai, Ying Tai et al.
T2Bs: Text-to-Character Blendshapes via Video Generation
Jiahao Luo, Chaoyang Wang, Michael Vasilkovsky et al.
DuoCLR: Dual-Surrogate Contrastive Learning for Skeleton-based Human Action Segmentation
Haitao Tian
IM-LUT: Interpolation Mixing Look-Up Tables for Image Super-Resolution
Sejin Park, Sangmin Lee, Kyong Hwan Jin et al.
NoiseController: Towards Consistent Multi-view Video Generation via Noise Decomposition and Collaboration
Haotian Dong, Xin WANG, Di Lin et al.
FED-PsyAU: Privacy-Preserving Micro-Expression Recognition via Psychological AU Coordination and Dynamic Facial Motion Modeling
Jingting Li, Yu Qian, Lin Zhao et al.
PUMPS: Skeleton-Agnostic Point-based Universal Motion Pre-Training for Synthesis in Human Motion Tasks
Clinton A Mo, Kun Hu, Chengjiang Long et al.
DISTA-Net: Dynamic Closely-Spaced Infrared Small Target Unmixing
Shengdong Han, Shangdong Yang, Yuxuan Li et al.
ASCENT: Annotation-free Self-supervised Contrastive Embeddings for 3D Neuron Tracking in Fluorescence Microscopy
Haejun Han, Hang Lu
VSRM: A Robust Mamba-Based Framework for Video Super-Resolution
Phu Tran Dinh, Hung Dao, Daeyoung Kim
AnimalClue: Recognizing Animals by their Traces
Risa Shinoda, Nakamasa Inoue, Iro Laina et al.
SCFlow: Implicitly Learning Style and Content Disentanglement with Flow Models
Pingchuan Ma, Xiaopei Yang, Ming Gui et al.
ForCenNet: Foreground-Centric Network for Document Image Rectification
Peng Cai, liqiang liqiang, Kaicheng Yang et al.
CycleVAR: Repurposing Autoregressive Model for Unsupervised One-Step Image Translation
Yi Liu, Shengqian Li, Zuzeng Lin et al.
Generating, Fast and Slow: Scalable Parallel Video Generation with Video Interface Networks
Bhishma Dedhia, David Bourgin, Krishna Kumar Singh et al.
Text Embedding Knows How to Quantize Text-Guided Diffusion Models
Hongjae Lee, Myungjun Son, Dongjea Kang et al.
SegmentDreamer: Towards High-fidelity Text-to-3D Synthesis with Segmented Consistency Trajectory Distillation
Jiahao Zhu, Zixuan Chen, Guangcong Wang et al.
IMG: Calibrating Diffusion Models via Implicit Multimodal Guidance
Jiayi Guo, Chuanhao Yan, Xingqian Xu et al.
Outlier-Aware Post-Training Quantization for Image Super-Resolution
Hailing Wang, Jianglin Lu, Yitian Zhang et al.
Modeling Human Gaze Behavior with Diffusion Models for Unified Scanpath Prediction
Giuseppe Cartella, Vittorio Cuculo, Alessandro D'Amelio et al.
MeshPad: Interactive Sketch-Conditioned Artist-Reminiscent Mesh Generation and Editing
Haoxuan Li, Ziya Erkoç, Lei Li et al.
PLADIS: Pushing the Limits of Attention in Diffusion Models at Inference Time by Leveraging Sparsity
Kwanyoung Kim, Byeongsu Sim
D3QE: Learning Discrete Distribution Discrepancy-aware Quantization Error for Autoregressive-Generated Image Detection
Yanran Zhang, Bingyao Yu, Yu Zheng et al.
TITAN-Guide: Taming Inference-Time Alignment for Guided Text-to-Video Diffusion Models
Christian Simon, Masato Ishii, Akio Hayakawa et al.
CompSlider: Compositional Slider for Disentangled Multiple-Attribute Image Generation
Zixin Zhu, Kevin Duarte, Mamshad Nayeem Rizve et al.
FairHuman: Boosting Hand and Face Quality in Human Image Generation with Minimum Potential Delay Fairness in Diffusion Models
Yuxuan Wang, Tianwei Cao, Huayu Zhang et al.
HypDAE: Hyperbolic Diffusion Autoencoders for Hierarchical Few-shot Image Generation
Lingxiao Li, Kaixuan Fan, Boqing Gong et al.
V.I.P. : Iterative Online Preference Distillation for Efficient Video Diffusion Models
Jisoo Kim, Wooseok Seo, Junwan Kim et al.
Streamlining Image Editing with Layered Diffusion Brushes
Peyman Gholami, Robert Xiao
GlassWizard: Harvesting Diffusion Priors for Glass Surface Detection
Wenxue Li, Tian Ye, Xinyu Xiong et al.
DACoN: DINO for Anime Paint Bucket Colorization with Any Number of Reference Images
Kazuma Nagata, Naoshi Kaneko
SpecGuard: Spectral Projection-based Advanced Invisible Watermarking
Inzamamul Alam, Md Islam, Simon Woo et al.
Preserve Anything: Controllable Image Synthesis with Object Preservation
Prasen Kumar Sharma, Neeraj Matiyali, Siddharth Srivastava et al.
CompleteMe: Reference-based Human Image Completion
Yu-Ju Tsai, Brian Price, Qing Liu et al.
REGEN: Learning Compact Video Embedding with (Re-)Generative Decoder
Yitian Zhang, Long Mai, Aniruddha Mahapatra et al.
From Prompt to Progression: Taming Video Diffusion Models for Seamless Attribute Transition
Ling Lo, Kelvin Chan, Wen-Huang Cheng et al.
Learning Implicit Features with Flow-Infused Transformations for Realistic Virtual Try-On
Delong Zhang, Qiwei Huang, Yang Sun et al.
Early Timestep Zero-Shot Candidate Selection for Instruction-Guided Image Editing
Joowon Kim, Ziseok Lee, Donghyeon Cho et al.
Context Guided Transformer Entropy Modeling for Video Compression
Junlong Tong, Wei Zhang, Yaohui Jin et al.
UIP2P: Unsupervised Instruction-based Image Editing via Edit Reversibility Constraint
Enis Simsar, Alessio Tonioni, Yongqin Xian et al.
Tune-Your-Style: Intensity-tunable 3D Style Transfer with Gaussian Splatting
Yian Zhao, rushi ye, Ruochong Zheng et al.
Blended Point Cloud Diffusion for Localized Text-guided Shape Editing
Etai Sella, Noam Atia, Ron Mokady et al.
Pretrained Reversible Generation as Unsupervised Visual Representation Learning
Rongkun Xue, Jinouwen Zhang, Yazhe Niu et al.
Attention to Neural Plagiarism: Diffusion Models Can Plagiarize Your Copyrighted Images!
zihang zou, Boqing Gong, Liqiang Wang
HiERO: Understanding the Hierarchy of Human Behavior Enhances Reasoning on Egocentric Videos
Simone Alberto Peirone, Francesca Pistilli, Giuseppe Averta
DiSCO-3D : Discovering and Segmenting Sub-Concepts from Open-vocabulary Queries in NeRF
Doriand Petit, Steve Bourgeois, Vincent Gay-Bellile et al.
M-Net: MRI Brain Tumor Sequential Segmentation Network via Mesh-Cast
Jiacheng Lu, Hui Ding, Shiyu Zhang et al.
Moment Quantization for Video Temporal Grounding
Xiaolong Sun, Le Wang, Sanping Zhou et al.
S⁴M: Boosting Semi-Supervised Instance Segmentation with SAM
Heeji Yoon, Heeseong Shin, Eunbeen Hong et al.
Describe, Adapt and Combine: Empowering CLIP Encoders for Open-set 3D Object Retrieval
Zhichuan Wang, Yang Zhou, Zhe Liu et al.
Enhancing Zero-shot Object Counting via Text-guided Local Ranking and Number-evoked Global Attention
Shiwei Zhang, Qi Zhou, Wei Ke
4D-Bench: Benchmarking Multi-modal Large Language Models for 4D Object Understanding
Wenxuan Zhu, Bing Li, Cheng Zheng et al.
Referring Expression Comprehension for Small Objects
Kanoko Goto, Takumi Hirose, Mahiro Ukai et al.
Text-guided Visual Prompt DINO for Generic Segmentation
Yuchen Guan, Chong Sun, Canmiao Fu et al.
FLOSS: Free Lunch in Open-vocabulary Semantic Segmentation
Yasser Benigmim, Mohammad Fahes, Tuan-Hung Vu et al.
Sparse-Dense Side-Tuner for efficient Video Temporal Grounding
David Pujol-Perich, Sergio Escalera, Albert Clapés
Learning Yourself: Class-Incremental Semantic Segmentation with Language-Inspired Bootstrapped Disentanglement
Ruitao Wu, Yifan Zhao, Jia Li
Aligning Information Capacity Between Vision and Language via Dense-to-Sparse Feature Distillation for Image-Text matching
Yang Liu, Wentao Feng, Zhuoyao Liu et al.
PS3: A Multimodal Transformer Integrating Pathology Reports with Histology Images and Biological Pathways for Cancer Survival Prediction
Manahil Raza, Ayesha Azam, Talha Qaiser et al.
Controllable-LPMoE: Adapting to Challenging Object Segmentation via Dynamic Local Priors from Mixture-of-Experts
Yanguang Sun, Jiawei Lian, jian Yang et al.
Revisiting Efficient Semantic Segmentation: Learning Offsets for Better Spatial and Class Feature Alignment
Shi-Chen Zhang, Yunheng Li, Yu-Huan Wu et al.
Pseudo-SD: Pseudo Controlled Stable Diffusion for Semi-Supervised and Cross-Domain Semantic Segmentation
Dong Zhao, Qi Zang, Shuang Wang et al.
Free-MoRef: Instantly Multiplexing Context Perception Capabilities of Video-MLLMs within Single Inference
KUO WANG, Quanlong Zheng, Junlin Xie et al.
Towards Fine-grained Interactive Segmentation in Images and Videos
Yuan Yao, Qiushi Yang, Miaomiao Cui et al.
TAB: Transformer Attention Bottlenecks enable User Intervention and Debugging in Vision-Language Models
Pooyan Rahmanzadehgervi, Hung Nguyen, Rosanne Liu et al.
Aligning Effective Tokens with Video Anomaly in Large Language Models
YINGXIAN Chen, Jiahui Liu, Ruidi Fan et al.
No More Sibling Rivalry: Debiasing Human-Object Interaction Detection
Bin Yang, Yulin Zhang, Hong-Yu Zhou et al.
Borrowing Eyes for the Blind Spot: Overcoming Data Scarcity in Malicious Video Detection via Cross-Domain Retrieval Augmentation
Rongpei Hong, Jian Lang, Ting Zhong et al.
Plug-in Feedback Self-adaptive Attention in CLIP for Training-free Open-Vocabulary Segmentation
Zhixiang Chi, Yanan Wu, Li Gu et al.
Intermediate Connectors and Geometric Priors for Language-Guided Affordance Segmentation on Unseen Object Categories
Yicong Li, Yiyang Chen, Zhenyuan Ma et al.
How Do Optical Flow and Textual Prompts Collaborate to Assist in Audio-Visual Semantic Segmentation?
Yujian Lee, Peng Gao, Yongqi Xu et al.
Scheduling Weight Transitions for Quantization-Aware Training
Junghyup Lee, Jeimin Jeon, Dohyung Kim et al.
HarmonySeg: Tubular Structure Segmentation with Deep-Shallow Feature Fusion and Growth-Suppression Balanced Loss
Ke Zhang, Yi Huang, Wei Liu et al.
Seeing the Unseen: A Semantic Alignment and Context-Aware Prompt Framework for Open-Vocabulary Camouflaged Object Segmentation
Peng Ren, Tian Bai, Jing Sun et al.
DynImg: Key Frames with Visual Prompts are Good Representation for Multi-Modal Video Understanding
Xiaoyi Bao, Chen-Wei Xie, Hao Tang et al.
SSVQ: Unleashing the Potential of Vector Quantization with Sign-Splitting
Shuaiting Li, Juncan Deng, Chengxuan Wang et al.
VideoMiner: Iteratively Grounding Key Frames of Hour-Long Videos via Tree-based Group Relative Policy Optimization
Xinye Cao, Hongcan Guo, Jiawen Qian et al.
LawDIS: Language-Window-based Controllable Dichotomous Image Segmentation
Xinyu Yan, Meijun Sun, Ge-Peng Ji et al.
ModalTune: Fine-Tuning Slide-Level Foundation Models with Multi-Modal Information for Multi-task Learning in Digital Pathology
Vishwesh Ramanathan, Tony Xu, Pushpak Pati et al.
Open-Vocabulary HOI Detection with Interaction-aware Prompt and Concept Calibration
Ting Lei, Shaofeng Yin, Qingchao Chen et al.
LIRA: Inferring Segmentation in Large Multi-modal Models with Local Interleaved Region Assistance
Zhang Li, Biao Yang, Qiang Liu et al.
Synchronizing Task Behavior: Aligning Multiple Tasks during Test-Time Training
Wooseong Jeong, Jegyeong Cho, Youngho Yoon et al.
Conditional Latent Diffusion Models for Zero-Shot Instance Segmentation
Maximilian Ulmer, Wout Boerdijk, Rudolph Triebel et al.
HyperGCT: A Dynamic Hyper-GNN-Learned Geometric Constraint for 3D Registration
Xiyu Zhang, Jiayi Ma, Jianwei Guo et al.
All in One: Visual-Description-Guided Unified Point Cloud Segmentation
Zongyan Han, Mohamed El Amine Boudjoghra, Jiahua Dong et al.
Benchmarking Burst Super-Resolution for Polarization Images: Noise Dataset and Analysis
Inseung Hwang, Kiseok Choi, Hyunho Ha et al.
RESCUE: Crowd Evacuation Simulation via Controlling SDM-United Characters
Xiaolin Liu, Tianyi zhou, Hongbo Kang et al.
Generalizable Non-Line-of-Sight Imaging with Learnable Physical Priors
Shida Sun, Yue Li, Yueyi Zhang et al.
AdaptiveAE: An Adaptive Exposure Strategy for HDR Capturing in Dynamic Scenes
Tianyi Xu, Fan Zhang, Boxin Shi et al.
Benchmarking Egocentric Visual-Inertial SLAM at City Scale
Anusha Krishnan, Shaohui Liu, Paul-Edouard Sarlin et al.
A Real-world Display Inverse Rendering Dataset
Seokjun Choi, Hoon-Gyu Chung, Yujin Jeon et al.
AutoScape: Geometry-Consistent Long-Horizon Scene Generation
Jiacheng Chen, Ziyu Jiang, Mingfu Liang et al.
RGE-GS: Reward-Guided Expansive Driving Scene Reconstruction via Diffusion Priors
Sicong Du, Jiarun Liu, Qifeng Chen et al.
Scene Coordinate Reconstruction Priors
Wenjing Bian, Axel Barroso-Laguna, Tommaso Cavallari et al.
TeethGenerator: A two-stage framework for paired pre- and post-orthodontic 3D dental data generation
Changsong Lei, Yaqian Liang, Shaofeng Wang et al.
Towards Accurate and Efficient 3D Object Detection for Autonomous Driving: A Mixture of Experts Computing System on Edge
Linshen Liu, Boyan Su, Junyue Jiang et al.
ClaraVid: A Holistic Scene Reconstruction Benchmark From Aerial Perspective With Delentropy-Based Complexity Profiling
Radu Beche, Sergiu Nedevschi
Discontinuity-aware Normal Integration for Generic Central Camera Models
Francesco Milano, Manuel Lopez-Antequera, Naina Dhingra et al.
SL2A-INR: Single-Layer Learnable Activation for Implicit Neural Representation
Reza Rezaeian, Moein Heidari, Reza Azad et al.
DoppDrive: Doppler-Driven Temporal Aggregation for Improved Radar Object Detection
Yuval Haitman, Oded Bialer
GVDepth: Zero-Shot Monocular Depth Estimation for Ground Vehicles based on Probabilistic Cue Fusion
Karlo Koledic, Luka Petrovic, Ivan Marković et al.