Most Cited ICCV "adaptive token caching" Papers
2,701 papers found • Page 11 of 14
Conference
A3GS: Arbitrary Artistic Style into Arbitrary 3D Gaussian Splatting
Zhiyuan Fang, Rengan Xie, Xuancheng Jin et al.
HouseTour: A Virtual Real Estate A(I)gent
Ata Çelen, Iro Armeni, Daniel Barath et al.
HyTIP: Hybrid Temporal Information Propagation for Masked Conditional Residual Video Coding
Yi-Hsin Chen, Yi-Chen Yao, Kuan-Wei Ho et al.
DACoN: DINO for Anime Paint Bucket Colorization with Any Number of Reference Images
Kazuma Nagata, Naoshi Kaneko
Free2Guide: Training-Free Text-to-Video Alignment using Image LVLM
Jaemin Kim, Bryan Sangwoo Kim, Jong Ye
VideoVAE+: Large Motion Video Autoencoding with Cross-modal Video VAE
Yazhou Xing, Yang Fei, Yingqing He et al.
DIA: The Adversarial Exposure of Deterministic Inversion in Diffusion Models
SeungHoo Hong, GeonHo Son, Juhun Lee et al.
GFPack++: Attention-Driven Gradient Fields for Optimizing 2D Irregular Packing
Tianyang Xue, Lin Lu, Yang Liu et al.
LACONIC: A 3D Layout Adapter for Controllable Image Creation
Léopold Maillard, Tom Durand, Adrien RAMANANA RAHARY et al.
Preserve Anything: Controllable Image Synthesis with Object Preservation
Prasen Kumar Sharma, Neeraj Matiyali, Siddharth Srivastava et al.
Anchor Token Matching: Implicit Structure Locking for Training-free AR Image Editing
Taihang Hu, Linxuan Li, Kai Wang et al.
Parametric Shadow Control for Portrait Generation in Text-to-Image Diffusion Models
Haoming Cai, Tsung-Wei Huang, Shiv Gehlot et al.
EEGMirror: Leveraging EEG data in the wild via Montage-Agnostic Self-Supervision for EEG to Video Decoding
Xuan-Hao Liu, Bao-liang Lu, Wei-Long Zheng
Accelerating Diffusion Sampling via Exploiting Local Transition Coherence
shangwen zhu, Han Zhang, Zhantao Yang et al.
SA-LUT: Spatial Adaptive 4D Look-Up Table for Photorealistic Style Transfer
Zerui Gong, Zhonghua Wu, Qingyi Tao et al.
UniversalBooth: Model-Agnostic Personalized Text-to-Image Generation
Songhua Liu, Ruonan Yu, Xinchao Wang
UniGlyph: Unified Segmentation-Conditioned Diffusion for Precise Visual Text Synthesis
Yuanrui Wang, Cong Han, Yafei Li et al.
Semantic Discrepancy-aware Detector for Image Forgery Identification
Wang Ziye, Minghang Yu, Chunyan Xu et al.
CoMatch: Dynamic Covisibility-Aware Transformer for Bilateral Subpixel-Level Semi-Dense Image Matching
Zizhuo Li, Yifan Lu, Linfeng Tang et al.
Gain-MLP: Improving HDR Gain Map Encoding via a Lightweight MLP
Trevor Canham, SaiKiran Tedla, Michael Murdoch et al.
Less-to-More Generalization: Unlocking More Controllability by In-Context Generation
shaojin wu, Mengqi Huang, wenxu wu et al.
FlexGen: Flexible Multi-View Generation from Text and Image Inputs
Xinli Xu, Wenhang Ge, Jiantao Lin et al.
LOCATEdit: Graph Laplacian Optimized Cross Attention for Localized Text-Guided Image Editing
Achint Soni, Meet Soni, Sirisha Rambhatla
Teleportraits: Training-Free People Insertion into Any Scene
Jialu Gao, Joseph K J, Fernando De la Torre
DiT4SR: Taming Diffusion Transformer for Real-World Image Super-Resolution
Zheng-Peng Duan, jiawei zhang, Xin Jin et al.
FastVAR: Linear Visual Autoregressive Modeling via Cached Token Pruning
Hang Guo, Yawei Li, Taolin Zhang et al.
QK-Edit: Revisiting Attention-based Injection in MM-DiT for Image and Video Editing
Tiancheng SHEN, Jun Hao Liew, Zilong Huang et al.
Beyond Isolated Words: Diffusion Brush for Handwritten Text-Line Generation
Gang Dai, Yifan Zhang, Yutao Qin et al.
BadVideo: Stealthy Backdoor Attack against Text-to-Video Generation
Ruotong Wang, Mingli Zhu, Jiarong Ou et al.
Pretrained Reversible Generation as Unsupervised Visual Representation Learning
Rongkun Xue, Jinouwen Zhang, Yazhe Niu et al.
Tracing Copied Pixels and Regularizing Patch Affinity in Copy Detection
Yichen Lu, Siwei Nie, Minlong Lu et al.
Beyond Brain Decoding: Visual-Semantic Reconstructions to Mental Creation Extension Based on fMRI
Haodong Jing, Dongyao Jiang, Yongqiang Ma et al.
PixTalk: Controlling Photorealistic Image Processing and Editing with Language
Marcos Conde, Zihao Lu, Radu Timofte
ADCD-Net: Robust Document Image Forgery Localization via Adaptive DCT Feature and Hierarchical Content Disentanglement
KA WONG, Jicheng Zhou, Haiwei Wu et al.
Towards Robust Defense against Customization via Protective Perturbation Resistant to Diffusion-based Purification
Wenkui Yang, Jie Cao, Junxian Duan et al.
A Unified Framework for Industrial Cel-Animation Colorization with Temporal-Structural Awareness
Xiaoyi Feng, Tao Huang, Peng Wang et al.
Generative Video Bi-flow
Chen Liu, Tobias Ritschel
AV-Link: Temporally-Aligned Diffusion Features for Cross-Modal Audio-Video Generation
Moayed Haji-Ali, Willi Menapace, Aliaksandr Siarohin et al.
T2I-Copilot: A Training-Free Multi-Agent Text-to-Image System for Enhanced Prompt Interpretation and Interactive Generation
Chieh-Yun Chen, Min Shi, Gong Zhang et al.
LayerLock: Non-collapsing Representation Learning with Progressive Freezing
Goker Erdogan, Nikhil Parthasarathy, Catalin Ionescu et al.
JPEG Processing Neural Operator for Backward-Compatible Coding
Woo Kyoung Han, Yongjun Lee, Byeonghun Lee et al.
All Parts Matter: A Unified Mask-Free Virtual Try-On Framework
Chenghu Du, Shengwu Xiong, Yi Rong
Function-centric Bayesian Network for Zero-Shot Object Goal Navigation
Sixian Zhang, Xinyao Yu, Xinhang Song et al.
Hate in Plain Sight: On the Risks of Moderating AI-Generated Hateful Illusions
Yiting Qu, Ziqing Yang, Yihan Ma et al.
DC-AE 1.5: Accelerating Diffusion Model Convergence with Structured Latent Space
Junyu Chen, Dongyun Zou, Wenkun He et al.
MH-LVC: Multi-Hypothesis Temporal Prediction for Learned Conditional Residual Video Coding
Gao Zong lin, Huu-Tai Phung, Yi-Chen Yao et al.
On the Provable Importance of Gradients for Autonomous Language-Assisted Image Clustering
Bo Peng, Jie Lu, Guangquan Zhang et al.
Inter2Former: Dynamic Hybrid Attention for Efficient High-Precision Interactive Segmentation
You Huang, Lichao Chen, Jiayi Ji et al.
HiERO: Understanding the Hierarchy of Human Behavior Enhances Reasoning on Egocentric Videos
Simone Alberto Peirone, Francesca Pistilli, Giuseppe Averta
CaptionSmiths: Flexibly Controlling Language Pattern in Image Captioning
Kuniaki Saito, Donghyun Kim, Kwanyong Park et al.
An Efficient Hybrid Vision Transformer for TinyML Applications
Fanhong Zeng, Huanan LI, Juntao Guan et al.
Graph Domain Adaptation with Dual-branch Encoder and Two-level Alignment for Whole Slide Image-based Survival Prediction
Yuntao Shou, Xiangyong Cao, PeiqiangYan PeiqiangYan et al.
CNS-Bench: Benchmarking Image Classifier Robustness Under Continuous Nuisance Shifts
Olaf Dünkel, Artur Jesslen, Jiahao Xie et al.
Multi-Schema Proximity Network for Composed Image Retrieval
Jiangming Shi, Xiangbo Yin, yeyunchen yeyunchen et al.
ReferDINO: Referring Video Object Segmentation with Visual Grounding Foundations
Tianming Liang, Kun-Yu Lin, Chaolei Tan et al.
ESCNet:Edge-Semantic Collaborative Network for Camouflaged Object Detection
Sheng Ye, Xin Chen, Yan Zhang et al.
Test-time Adaptation for Foundation Medical Segmentation Model Without Parametric Updates
Kecheng Chen, Xinyu Luo, Tiexin Qin et al.
Moment Quantization for Video Temporal Grounding
Xiaolong Sun, Le Wang, Sanping Zhou et al.
SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition
Yongkun Du, Zhineng Chen, Hongtao Xie et al.
ROVI: A VLM-LLM Re-Captioned Dataset for Open-Vocabulary Instance-Grounded Text-to-Image Generation
Cihang Peng, Qiming HOU, Zhong Ren et al.
Feature Purification Matters: Suppressing Outlier Propagation for Training-Free Open-Vocabulary Semantic Segmentation
Shuo Jin, Siyue Yu, Bingfeng Zhang et al.
DiffPS: Leveraging Prior Knowledge of Diffusion Model for Person Search
Giyeol Kim, Sooyoung Yang, Jihyong Oh et al.
Mind the Gap: Aligning Vision Foundation Models to Image Feature Matching
Yuhan Liu, Jingwen Fu, Yang Wu et al.
Representation Shift: Unifying Token Compression with FlashAttention
Joonmyung Choi, Sanghyeok Lee, Byungoh Ko et al.
ZipVL: Accelerating Vision-Language Models through Dynamic Token Sparsity
Yefei He, Feng Chen, Jing Liu et al.
ProSAM: Enhancing the Robustness of SAM-based Visual Reference Segmentation with Probabilistic Prompts
Xiaoqi Wang, Clint Sebastian, Wenbin He et al.
LaCoOT: Layer Collapse through Optimal Transport
Victor Quétu, Zhu LIAO, Nour Hezbri et al.
Zero-Shot Compositional Video Learning with Coding Rate Reduction
Heeseok Jung, Jun-Hyeon Bak, Yujin Jeong et al.
Fuzzy Contrastive Decoding to Alleviate Object Hallucination in Large Vision-Language Models
Jieun Kim, Jinmyeong Kim, Yoonji Kim et al.
Semantic versus Identity: A Divide-and-Conquer Approach towards Adjustable Medical Image De-Identification
Yuan Tian, Shuo Wang, Rongzhao Zhang et al.
Towards Video Thinking Test: A Holistic Benchmark for Advanced Video Reasoning and Understanding
Yuanhan Zhang, Yunice Chew, Yuhao Dong et al.
Cross-View Isolated Sign Language Recognition via View Synthesis and Feature Disentanglement
Xin Shen, Xinyu Wang, Lei Shen et al.
Geo4D: Leveraging Video Generators for Geometric 4D Scene Reconstruction
Zeren Jiang, Chuanxia Zheng, Iro Laina et al.
Superpowering Open-Vocabulary Object Detectors for X-ray Vision
Pablo Garcia-Fernandez, Lorenzo Vaquero, Mingxuan Liu et al.
RhythmGuassian: Repurposing Generalizable Gaussian Model For Remote Physiological Measurement
Hao LU, Yuting Zhang, Jiaqi Tang et al.
On the Recovery of Cameras from Fundamental Matrices
Rakshith Madhavan, Federica Arrigoni
The Devil is in the Spurious Correlations: Boosting Moment Retrieval with Dynamic Learning
Xinyang Zhou, Fanyue Wei, Lixin Duan et al.
CABLD: Contrast-Agnostic Brain Landmark Detection with Consistency-Based Regularization
Soorena Salari, Arash Harirpoush, Hassan Rivaz et al.
Robustifying Zero-Shot Vision Language Models by Subspaces Alignment
Junhao Dong, Piotr Koniusz, Liaoyuan Feng et al.
SegAnyPET: Universal Promptable Segmentation from Positron Emission Tomography Images
Yichi Zhang, Le Xue, Wenbo zhang et al.
Multi-View Slot Attention Using Paraphrased Texts for Face Anti-Spoofing
Jeongmin Yu, Susang Kim, Kisu Lee et al.
4D-Bench: Benchmarking Multi-modal Large Language Models for 4D Object Understanding
Wenxuan Zhu, Bing Li, Cheng Zheng et al.
Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers
Weiming Ren, Wentao Ma, Huan Yang et al.
SAMora: Enhancing SAM through Hierarchical Self-Supervised Pre-Training for Medical Images
Shuhang Chen, Hangjie Yuan, Pengwei Liu et al.
FE-CLIP: Frequency Enhanced CLIP Model for Zero-Shot Anomaly Detection and Segmentation
Tao Gong, Qi Chu, Bin Liu et al.
Text-guided Visual Prompt DINO for Generic Segmentation
Yuchen Guan, Chong Sun, Canmiao Fu et al.
Bias-Resilient Weakly Supervised Semantic Segmentation Using Normalizing Flows
Xianglin Qiu, Xiaoyang Wang, Zhen Zhang et al.
Cracking Instance Jigsaw Puzzles: A Superior Alternative to Multiple Instance Learning for Whole Slide Image Analysis
Xiwen Chen, Peijie Qiu, Wenhui Zhu et al.
STDDNet: Harnessing Mamba for Video Polyp Segmentation via Spatial-aligned Temporal Modeling and Discriminative Dynamic Representation Learning
Guilian Chen, Huisi Wu, Jing Qin
MedSegFactory: Text-Guided Generation of Medical Image-Mask Pairs
Jiawei Mao, Yuhan Wang, Yucheng Tang et al.
DecAD: Decoupling Anomalies in Latent Space for Multi-Class Unsupervised Anomaly Detection
Xiaolei Wang, Xiaoyang Wang, Huihui Bai et al.
Few-Shot Pattern Detection via Template Matching and Regression
Eunchan Jo, Dahyun Kang, Sanghyun Kim et al.
Hierarchical Event Memory for Accurate and Low-latency Online Video Temporal Grounding
Minghang Zheng, Yuxin Peng, Benyuan Sun et al.
RA-BUSSeg: Relation-aware Semi-supervised Breast Ultrasound Image Segmentation via Adjacent Propagation and Cross-layer Alignment
Wanting ZHANG, Zhenhui Ding, Guilian Chen et al.
DisCo: Towards Distinct and Coherent Visual Encapsulation in Video MLLMs
JIAHE ZHAO, rongkun Zheng, Yi Wang et al.
Exploring Probabilistic Modeling Beyond Domain Generalization for Semantic Segmentation
I-Hsiang Chen, Hua-En Chang, Wei-Ting Chen et al.
Prototypes are Balanced Units for Efficient and Effective Partially Relevant Video Retrieval
WonJun Moon, Cheol-Ho Cho, Woojin Jun et al.
Auto-Controlled Image Perception in MLLMs via Visual Perception Tokens
Runpeng Yu, Xinyin Ma, Xinchao Wang
Bridging the Gap between Brain and Machine in Interpreting Visual Semantics: Towards Self-adaptive Brain-to-Text Decoding
Jiaxuan Chen, Yu Qi, Yueming Wang et al.
DisTime: Distribution-based Time Representation for Video Large Language Models
yingsen zeng, Zepeng Huang, Yujie Zhong et al.
WeaveSeg: Iterative Contrast-weaving and Spectral Feature-refining for Nuclei Instance Segmentation
Jiajia Li, Huisi Wu, Jing Qin
CARIM: Caption-Based Autonomous Driving Scene Retrieval via Inclusive Text Matching
Minjoo Ki, Dae Jung Kim, Kisung Kim et al.
Modeling Saliency Dataset Bias
Matthias Kümmerer, Harneet Singh Khanuja, Matthias Bethge
Beyond Single Images: Retrieval Self-Augmented Unsupervised Camouflaged Object Detection
Ji Du, Xin WANG, Fangwei Hao et al.
Advancing Visual Large Language Model for Multi-granular Versatile Perception
Wentao Xiang, Haoxian Tan, Cong Wei et al.
Controllable Latent Space Augmentation for Digital Pathology
Sofiène Boutaj, Marin Scalbert, Pierre Marza et al.
Interpretable point cloud classification using multiple instance learning
Matt De Vries, Reed Naidoo, Olga Fourkioti et al.
Revisiting Efficient Semantic Segmentation: Learning Offsets for Better Spatial and Class Feature Alignment
Shi-Chen Zhang, Yunheng Li, Yu-Huan Wu et al.
Learning Beyond Still Frames: Scaling Vision-Language Models with Video
Yiyuan Zhang, Handong Li, Jing Liu et al.
Is CLIP ideal? No. Can we fix it? Yes!
Raphaela Kang, Yue Song, Georgia Gkioxari et al.
HQ-CLIP: Leveraging Large Vision-Language Models to Create High-Quality Image-Text Datasets and CLIP Models
ZHIXIANG WEI, Guangting Wang, Xiaoxiao Ma et al.
Dynamic Dictionary Learning for Remote Sensing Image Segmentation
Xuechao Zou, Yue Li, Shun Zhang et al.
Temporal-aware Query Routing for Real-time Video Instance Segmentation
Zesen Cheng, Kehan Li, Yian Zhao et al.
Learnable Retrieval Enhanced Visual-Text Alignment and Fusion for Radiology Report Generation
Qin Zhou, Guoyan Liang, Xindi Li et al.
Anomaly Detection of Integrated Circuits Package Substrates Using the Large Vision Model SAIC: Dataset Construction, Methodology, and Application
Ruiyun Yu, Bingyang Guo, Haoyuan Li
Prompt-driven Transferable Adversarial Attack on Person Re-Identification with Attribute-aware Textual Inversion
Yuan Bian, Min Liu, Yunqi Yi et al.
Memory-Efficient 4-bit Preconditioned Stochastic Optimization
Jingyang Li, Kuangyu Ding, Kim-chuan Toh et al.
No More Sibling Rivalry: Debiasing Human-Object Interaction Detection
Bin Yang, Yulin Zhang, Hong-Yu Zhou et al.
DASH: Detection and Assessment of Systematic Hallucinations of VLMs
Maximilian Augustin, Yannic Neuhaus, Matthias Hein
DIH-CLIP: Unleashing the Diversity of Multi-Head Self-Attention for Training-Free Open-Vocabulary Semantic Segmentation
Songsong Duan, Xi Yang, Nannan Wang
Feather the Throttle: Revisiting Visual Token Pruning for Vision-Language Model Acceleration
Mark Endo, Xiaohan Wang, Serena Yeung-Levy
LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models
Yuzhang Shang, Mu Cai, Bingxin Xu et al.
HERMES: temporal-coHERent long-forM understanding with Episodes and Semantics
Gueter Josmy Faure, Jia-Fong Yeh, Min-Hung Chen et al.
HRScene: How Far Are VLMs from Effective High-Resolution Image Understanding?
Yusen Zhang, Wenliang Zheng, Aashrith Madasu et al.
Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring
Yufei Zhan, Shurong Zheng, Yousong Zhu et al.
Debiasing Trace Guidance: Top-down Trace Distillation and Bottom-up Velocity Alignment for Unsupervised Anomaly Detection
Xingjian Wang, Li Chai, Jiming Chen
ODDR: Outlier Detection & Dimension Reduction Based Defense Against Adversarial Patches
Nandish Chattopadhyay, Amira Guesmi, Muhammad Abdullah Hanif et al.
Similarity Memory Prior is All You Need for Medical Image Segmentation
Hao Tang, Zhiqing Guo, Liejun Wang et al.
CalliReader: Contextualizing Chinese Calligraphy via an Embedding-Aligned Vision-Language Model
Yuxuan Luo, Jiaqi Tang, Chenyi Huang et al.
Accelerate 3D Object Detection Models via Zero-Shot Attention Key Pruning
Lizhen Xu, Xiuxiu Bai, Xiaojun Jia et al.
ReferEverything: Towards Segmenting Everything We Can Speak of in Videos
Anurag Bagchi, Zhipeng Bao, Yu-Xiong Wang et al.
From Trial to Triumph: Advancing Long Video Understanding via Visual Context Sample Scaling and Self-reward Alignment
Yucheng Suo, Fan Ma, Linchao Zhu et al.
DC-TTA: Divide-and-Conquer Framework for Test-Time Adaptation of Interactive Segmentation
Jihun Kim, Hoyong Kwon, Hyeokjun Kweon et al.
FIND: Few-Shot Anomaly Inspection with Normal-Only Multi-Modal Data
YITING LI, Fayao Liu, Jingyi Liao et al.
VISO: Accelerating In-orbit Object Detection with Language-Guided Mask Learning and Sparse Inference
Meiqi Wang, Han Qiu
Unsupervised Histopathological Image Semantic Segmentation with Overlapping Patches Consistency Constraint
Wentian Cai, Weizhao Weng, Zihao Huang et al.
How Do Optical Flow and Textual Prompts Collaborate to Assist in Audio-Visual Semantic Segmentation?
Yujian Lee, Peng Gao, Yongqi Xu et al.
UINavBench: A Framework for Comprehensive Evaluation of Interactive Digital Agents
Harsh Agrawal, Eldon Schoop, Xinlei Pan et al.
VIPerson: Flexibly Generating Virtual Identity for Person Re-Identification
Xiao-Wen Zhang, Delong Zhang, Yi-Xing Peng et al.
Towards Robustness of Person Search against Corruptions
Woojung Son, Yoonki Cho, Guoyuan An et al.
Flow-MIL: Constructing Highly-expressive Latent Feature Space For Whole Slide Image Classification Using Normalizing Flow
Yingfan MA, Bohan An, Ao Shen et al.
Vision-Language Neural Graph Featurization for Extracting Retinal Lesions
Taimur Hassan, Anabia Sohail, Muzammal Naseer et al.
SSVQ: Unleashing the Potential of Vector Quantization with Sign-Splitting
Shuaiting Li, Juncan Deng, Chengxuan Wang et al.
VideoMiner: Iteratively Grounding Key Frames of Hour-Long Videos via Tree-based Group Relative Policy Optimization
Xinye Cao, Hongcan Guo, Jiawen Qian et al.
LawDIS: Language-Window-based Controllable Dichotomous Image Segmentation
Xinyu Yan, Meijun Sun, Ge-Peng Ji et al.
Open-Vocabulary HOI Detection with Interaction-aware Prompt and Concept Calibration
Ting Lei, Shaofeng Yin, Qingchao Chen et al.
Scaling Tumor Segmentation: Best Lessons from Real and Synthetic Data
Qi Chen, Xinze Zhou, Chen Liu et al.
Teaching AI the Anatomy Behind the Scan: Addressing Anatomical Flaws in Medical Image Segmentation with Learnable Prior
Young Seok Jeon, Hongfei Yang, Huazhu Fu et al.
Token-Efficient VLM: High-Resolution Image Understanding via Dynamic Region Proposal
Yitong Jiang, Jinwei Gu, Tianfan Xue et al.
Region-aware Anchoring Mechanism for Efficient Referring Visual Grounding
Shuyi Ouyang, Ziwei Niu, Hongyi Wang et al.
Enrich and Detect: Video Temporal Grounding with Multimodal LLMs
Shraman Pramanick, Effrosyni Mavroudi, Yale Song et al.
Player-Centric Multimodal Prompt Generation for Large Language Model Based Identity-Aware Basketball Video Captioning
Zeyu Xi, Haoying Sun, Yaofei Wu et al.
Synchronizing Task Behavior: Aligning Multiple Tasks during Test-Time Training
Wooseong Jeong, Jegyeong Cho, Youngho Yoon et al.
Breaking Grid Constraints: Dynamic Graph Reconstruction Network for Multi-organ Segmentation
Junhao Xiao, Yang Wei, Jingyu Wang et al.
MaskSAM: Auto-prompt SAM with Mask Classification for Volumetric Medical Image Segmentation
Bin Xie, Hao Tang, Bin Duan et al.
MEH: A Multi-Style Dataset and Toolkit for Advancing Egyptian Hieroglyph Recognition
Maksim Golyadkin, Rubanova Alexandrovna, Aleksandr Utkov et al.
Hybrid-Tower: Fine-grained Pseudo-query Interaction and Generation for Text-to-Video Retrieval
Bangxiang Lan, Ruobing Xie, Ruixiang Zhao et al.
Unbiased Missing-modality Multimodal Learning
Ruiting Dai, Chenxi Li, Yandong Yan et al.
ViM-VQ: Efficient Post-Training Vector Quantization for Visual Mamba
Juncan Deng, Shuaiting Li, Zeyu Wang et al.
B-VLLM: A Vision Large Language Model with Balanced Spatio-Temporal Tokens
Zhuqiang Lu, Zhenfei Yin, Mengwei He et al.
DM-EFS: Dynamically Multiplexed Expanded Features Set Form for Robust and Efficient Small Object Detection
Aashish Sharma
DiffTell: A High-Quality Dataset for Describing Image Manipulation Changes
Zonglin Di, Jing Shi, Yifei Fan et al.
Mixture-of-Scores: Robust Image-Text Data Valuation via Three Lines of Code
WU Sitong, Haoru Tan, Yukang Chen et al.
Inverse Image-Based Rendering for Light Field Generation from Single Images
Hyunjun Jung, Hae-Gon Jeon
HyperGCT: A Dynamic Hyper-GNN-Learned Geometric Constraint for 3D Registration
Xiyu Zhang, Jiayi Ma, Jianwei Guo et al.
AD-GS: Object-Aware B-Spline Gaussian Splatting for Self-Supervised Autonomous Driving
Jiawei Xu, Kai Deng, Zexin Fan et al.
Axis-level Symmetry Detection with Group-Equivariant Representation
Wongyun Yu, Ahyun Seo, Minsu Cho
PossLoss: A Reliable and Sensitive Facial Landmark Detection Loss Function
Qikui Zhu
U-ViLAR: Uncertainty-Aware Visual Localization for Autonomous Driving via Differentiable Association and Registration
Xiaofan Li, Zhihao Xu, Chenming Wu et al.
Group Inertial Poser: Multi-Person Pose and Global Translation from Sparse Inertial Sensors and Ultra-Wideband Ranging
Ying Xue, Jiaxi Jiang, Rayan Armani et al.
RESCUE: Crowd Evacuation Simulation via Controlling SDM-United Characters
Xiaolin Liu, Tianyi zhou, Hongbo Kang et al.
SG-LDM: Semantic-Guided LiDAR Generation via Latent-Aligned Diffusion
Zhengkang Xiang, Zizhao Li, Amir Khodabandeh et al.
PointGAC: Geometric-Aware Codebook for Masked Point Modeling
Abiao Li, Chenlei Lv, Guofeng Mei et al.
Statistical Confidence Rescoring for Robust 3D Scene Graph Generation from Multi-View Images
Qi Xun Yeo, Yanyan Li, Gim Hee Lee
PRM: Photometric Stereo based Large Reconstruction Model
Wenhang Ge, Jiantao Lin, Guibao SHEN et al.
Hi3DGen: High-fidelity 3D Geometry Generation from Images via Normal Bridging
Chongjie Ye, Yushuang Wu, Ziteng Lu et al.
Dual-S3D: Hierarchical Dual-Path Selective SSM-CNN for High-Fidelity Implicit Reconstruction
Luoxi Zhang, Pragyan Shrestha, Yu Zhou et al.
FastPoint: Accelerating 3D Point Cloud Model Inference via Sample Point Distance Prediction
Donghyun Lee, Dawoon Jeong, Jae W. Lee et al.
RobuSTereo: Robust Zero-Shot Stereo Matching under Adverse Weather
Yuran Wang, Yingping Liang, Yutao Hu et al.
MMGeo: Multimodal Compositional Geo-Localization for UAVs
Yuxiang Ji, Boyong He, Zhuoyue Tan et al.
Large Scene Generation with Cube-Absorb Discrete Diffusion
Qianjiang Hu, Wei Hu
SynAD: Enhancing Real-World End-to-End Autonomous Driving Models through Synthetic Data Integration
Jongsuk Kim, Jae Young Lee, Gyojin Han et al.
Benchmarking Egocentric Visual-Inertial SLAM at City Scale
Anusha Krishnan, Shaohui Liu, Paul-Edouard Sarlin et al.
Gaussian-based World Model: Gaussian Priors for Voxel-Based Occupancy Prediction and Future Motion Prediction
Tuo Feng, Wenguan Wang, Yi Yang
Momentum-GS: Momentum Gaussian Self-Distillation for High-Quality Large Scene Reconstruction
JIXUAN FAN, Wanhua Li, Yifei Han et al.
DAA*: Deep Angular A Star for Image-based Path Planning
Zhiwei Xu
RCTDistill: Cross-Modal Knowledge Distillation Framework for Radar-Camera 3D Object Detection with Temporal Fusion
Geonho Bang, Minjae Seong, Jisong Kim et al.
GSRecon: Efficient Generalizable Gaussian Splatting for Surface Reconstruction from Sparse Views
Hang Yang, Le Hui, Jianjun Qian et al.
REPARO: Compositional 3D Assets Generation with Differentiable 3D Layout Alignment
Haonan Han, Rui Yang, Huan Liao et al.
Towards Safer and Understandable Driver Intention Prediction
Mukilan Karuppasamy, Shankar Gangisetty, Shyam Nandan Rai et al.
InstaDrive: Instance-Aware Driving World Models for Realistic and Consistent Video Generation
Zhuoran Yang, Xi Guo, Chenjing Ding et al.
NormalLoc: Visual Localization on Textureless 3D Models using Surface Normals
Jiro Abe, Gaku Nakano, Kazumine Ogura
EmbodiedSplat: Personalized Real-to-Sim-to-Real Navigation with Gaussian Splats from a Mobile Device
Gunjan Chhablani, Xiaomeng Ye, Muhammad Zubair Irshad et al.
NGD: Neural Gradient Based Deformation for Monocular Garment Reconstruction
Soham Dasgupta, Shanthika Naik, Preet Savalia et al.
Lifting the Structural Morphing for Wide-Angle Images Rectification: Unified Content and Boundary Modeling
Wenting Luan, Siqi Lu, Yongbin Zheng et al.
RayletDF: Raylet Distance Fields for Generalizable 3D Surface Reconstruction from Point Clouds or Gaussians
Shenxing Wei, Jinxi Li, Yafei YANG et al.
Semantic-guided Camera Ray Regression for Visual Localization
Yesheng Zhang, Xu Zhao
Polarimetric Neural Field via Unified Complex-Valued Wave Representation
Chu Zhou, Yixin Yang, Junda Liao et al.
High-Precision 3D Measurement of Complex Textured Surfaces Using Multiple Filtering Approach
Yuchong Chen, Jian Yu, Shaoyan Gai et al.
From Gallery to Wrist: Realistic 3D Bracelet Insertion in Videos
Chenjian Gao, Lihe Ding, Rui Han et al.