Most Cited ICCV "continual adaptation" Papers
2,701 papers found • Page 7 of 14
Conference
FIND: Few-Shot Anomaly Inspection with Normal-Only Multi-Modal Data
YITING LI, Fayao Liu, Jingyi Liao et al.
Unsupervised Histopathological Image Semantic Segmentation with Overlapping Patches Consistency Constraint
Wentian Cai, Weizhao Weng, Zihao Huang et al.
How Do Optical Flow and Textual Prompts Collaborate to Assist in Audio-Visual Semantic Segmentation?
Yujian Lee, Peng Gao, Yongqi Xu et al.
UINavBench: A Framework for Comprehensive Evaluation of Interactive Digital Agents
Harsh Agrawal, Eldon Schoop, Xinlei Pan et al.
LawDIS: Language-Window-based Controllable Dichotomous Image Segmentation
Xinyu Yan, Meijun Sun, Ge-Peng Ji et al.
Open-Vocabulary HOI Detection with Interaction-aware Prompt and Concept Calibration
Ting Lei, Shaofeng Yin, Qingchao Chen et al.
Teaching AI the Anatomy Behind the Scan: Addressing Anatomical Flaws in Medical Image Segmentation with Learnable Prior
Young Seok Jeon, Hongfei Yang, Huazhu Fu et al.
Enrich and Detect: Video Temporal Grounding with Multimodal LLMs
Shraman Pramanick, Effrosyni Mavroudi, Yale Song et al.
Player-Centric Multimodal Prompt Generation for Large Language Model Based Identity-Aware Basketball Video Captioning
Zeyu Xi, Haoying Sun, Yaofei Wu et al.
Conditional Latent Diffusion Models for Zero-Shot Instance Segmentation
Maximilian Ulmer, Wout Boerdijk, Rudolph Triebel et al.
Breaking Grid Constraints: Dynamic Graph Reconstruction Network for Multi-organ Segmentation
Junhao Xiao, Yang Wei, Jingyu Wang et al.
Hybrid-Tower: Fine-grained Pseudo-query Interaction and Generation for Text-to-Video Retrieval
Bangxiang Lan, Ruobing Xie, Ruixiang Zhao et al.
ViM-VQ: Efficient Post-Training Vector Quantization for Visual Mamba
Juncan Deng, Shuaiting Li, Zeyu Wang et al.
Mixture-of-Scores: Robust Image-Text Data Valuation via Three Lines of Code
WU Sitong, Haoru Tan, Yukang Chen et al.
Axis-level Symmetry Detection with Group-Equivariant Representation
Wongyun Yu, Ahyun Seo, Minsu Cho
U-ViLAR: Uncertainty-Aware Visual Localization for Autonomous Driving via Differentiable Association and Registration
Xiaofan Li, Zhihao Xu, Chenming Wu et al.
Statistical Confidence Rescoring for Robust 3D Scene Graph Generation from Multi-View Images
Qi Xun Yeo, Yanyan Li, Gim Hee Lee
Hi3DGen: High-fidelity 3D Geometry Generation from Images via Normal Bridging
Chongjie Ye, Yushuang Wu, Ziteng Lu et al.
Dual-S3D: Hierarchical Dual-Path Selective SSM-CNN for High-Fidelity Implicit Reconstruction
Luoxi Zhang, Pragyan Shrestha, Yu Zhou et al.
MMGeo: Multimodal Compositional Geo-Localization for UAVs
Yuxiang Ji, Boyong He, Zhuoyue Tan et al.
Large Scene Generation with Cube-Absorb Discrete Diffusion
Qianjiang Hu, Wei Hu
SynAD: Enhancing Real-World End-to-End Autonomous Driving Models through Synthetic Data Integration
Jongsuk Kim, Jae Young Lee, Gyojin Han et al.
Benchmarking Egocentric Visual-Inertial SLAM at City Scale
Anusha Krishnan, Shaohui Liu, Paul-Edouard Sarlin et al.
DAA*: Deep Angular A Star for Image-based Path Planning
Zhiwei Xu
RCTDistill: Cross-Modal Knowledge Distillation Framework for Radar-Camera 3D Object Detection with Temporal Fusion
Geonho Bang, Minjae Seong, Jisong Kim et al.
EmbodiedSplat: Personalized Real-to-Sim-to-Real Navigation with Gaussian Splats from a Mobile Device
Gunjan Chhablani, Xiaomeng Ye, Muhammad Zubair Irshad et al.
NGD: Neural Gradient Based Deformation for Monocular Garment Reconstruction
Soham Dasgupta, Shanthika Naik, Preet Savalia et al.
RayletDF: Raylet Distance Fields for Generalizable 3D Surface Reconstruction from Point Clouds or Gaussians
Shenxing Wei, Jinxi Li, Yafei YANG et al.
Semantic-guided Camera Ray Regression for Visual Localization
Yesheng Zhang, Xu Zhao
Polarimetric Neural Field via Unified Complex-Valued Wave Representation
Chu Zhou, Yixin Yang, Junda Liao et al.
From Gallery to Wrist: Realistic 3D Bracelet Insertion in Videos
Chenjian Gao, Lihe Ding, Rui Han et al.
Street Gaussians without 3D Object Tracker
Ruida Zhang, Chengxi Li, Chenyangguang Zhang et al.
HiNeuS: High-fidelity Neural Surface Mitigating Low-texture and Reflective Ambiguity
Yida Wang, Xueyang Zhang, Kun Zhan et al.
I2-World: Intra-Inter Tokenization for Efficient Dynamic 4D Scene Forecasting
Zhimin Liao, Ping Wei, Ruijie Zhang et al.
MetaScope: Optics-Driven Neural Network for Ultra-Micro Metalens Endoscopy
Wuyang Li, Wentao Pan, Xiaoyuan Liu et al.
Free-running vs Synchronous: Single-Photon Lidar for High-flux 3D Imaging
Ruangrawee Kitichotkul, Shashwath Bharadwaj, Joshua Rapp et al.
Leaps and Bounds: An Improved Point Cloud Winding Number Formulation for Fast Normal Estimation and Surface Reconstruction
Chamin Hewa Koneputugodage, Dylan Campbell, Stephen Gould
Harnessing Text-to-Image Diffusion Models for Point Cloud Self-Supervised Learning
Yiyang Chen, Shanshan Zhao, Lunhao Duan et al.
OD-RASE: Ontology-Driven Risk Assessment and Safety Enhancement for Autonomous Driving
Kota Shimomura, Masaki Nambata, Atsuya Ishikawa et al.
UniGS: Modeling Unitary 3D Gaussians for Novel View Synthesis from Sparse-view Images
Jiamin WU, Kenkun Liu, Xiaoke Jiang et al.
TOTP: Transferable Online Pedestrian Trajectory Prediction with Temporal-Adaptive Mamba Latent Diffusion
Ziyang Ren, Ping Wei, Shangqi Deng et al.
UnMix-NeRF: Spectral Unmixing Meets Neural Radiance Fields
Fabian Perez, Sara Rojas Martinez, Carlos Hinojosa et al.
MaterialMVP: Illumination-Invariant Material Generation via Multi-view PBR Diffusion
Zebin He, Mx Yang, Shuhui Yang et al.
Visual Surface Wave Elastography: Revealing Subsurface Physical Properties via Visible Surface Waves
Alexander Ogren, Berthy Feng, Jihoon Ahn et al.
LANGTRAJ: Diffusion Model and Dataset for Language-Conditioned Trajectory Simulation
WEI-JER Chang, Masayoshi Tomizuka, Wei Zhan et al.
Temporal Overlapping Prediction: A Self-supervised Pre-training Method for LiDAR Moving Object Segmentation
Ziliang Miao, Runjian Chen, Yixi Cai et al.
GeoFormer: Geometry Point Encoder for 3D Object Detection with Graph-based Transformer
Xin Jin, Haisheng Su, Cong Ma et al.
AlignDiff: Learning Physically-Grounded Camera Alignment via Diffusion
Liuyue Xie, Jiancong Guo, Ozan Cakmakci et al.
Tile-wise vs. Image-wise: Random-Tile Loss and Training Paradigm for Gaussian Splatting
Xiaoyu Zhang, Weihong Pan, Xiaojun Xiang et al.
RoCo-Sim: Enhancing Roadside Collaborative Perception through Foreground Simulation
Yuwen Du, Anning Hu, Zichen Chao et al.
LaneDiffusion: Improving Centerline Graph Learning via Prior Injected BEV Feature Generation
Zijie Wang, Weiming Zhang, Wei Zhang et al.
Planar Affine Rectification from Local Change of Scale and Orientation
Yuval Nissan, Marc Pollefeys, Daniel Barath
ERNet: Efficient Non-Rigid Registration Network for Point Sequences
Guangzhao He, Yuxi Xiao, Zhen Xu et al.
Doppler-Aware LiDAR-RADAR Fusion for Weather-Robust 3D Detection
Yujeong Chae, Heejun Park, Hyeonseong Kim et al.
Egocentric Action-aware Inertial Localization in Point Clouds with Vision-Language Guidance
Mingfang Zhang, Ryo Yonetani, Yifei Huang et al.
InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models
Yifan Lu, Xuanchi Ren, Jiawei Yang et al.
GenFlow3D: Generative Scene Flow Estimation and Prediction on Point Cloud Sequences
Hanlin Li, Wenming Weng, Yueyi Zhang et al.
Splat-LOAM: Gaussian Splatting LiDAR Odometry and Mapping
Emanuele Giacomini, Luca Di Giammarino, Lorenzo De Rebotti et al.
AAA-Gaussians: Anti-Aliased and Artifact-Free 3D Gaussian Rendering
Michael Steiner, Thomas Köhler, Lukas Radl et al.
SAFT: Shape and Appearance of Fabrics from Template via Differentiable Physical Simulations from Monocular Video
David Stotko, Reinhard Klein
BridgeDepth: Bridging Monocular and Stereo Reasoning with Latent Alignment
Tongfan Guan, Jiaxin Guo, Chen Wang et al.
Decoupled Diffusion Sparks Adaptive Scene Generation
Yunsong Zhou, Naisheng Ye, William Ljungbergh et al.
Recover Biological Structure from Sparse-View Diffraction Images with Neural Volumetric Prior
Renzhi He, Haowen Zhou, Yubei Chen et al.
HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation
Xin Zhou, DINGKANG LIANG, Sifan Tu et al.
Instant GaussianImage: A Generalizable and Self-Adaptive Image Representation via 2D Gaussian Splatting
Zhaojie Zeng, Yuesong Wang, Chao Yang et al.
NeuraLeaf: Neural Parametric Leaf Models with Shape and Deformation Disentanglement
Yang Yang, Dongni Mao, Hiroaki Santo et al.
Stochastic Gradient Estimation for Higher-Order Differentiable Rendering
Zican Wang, Michael Fischer, Tobias Ritschel
Uncertainty-Aware Diffusion-Guided Refinement of 3D Scenes
Sarosij Bose, Arindam Dutta, Sayak Nag et al.
HouseCrafter: Lifting Floorplans to 3D Scenes with 2D Diffusion Models
YIWEN CHEN, Hieu Nguyen, Vikram Voleti et al.
Hi-Gaussian: Hierarchical Gaussians under Normalized Spherical Projection for Single-View 3D Reconstruction
Binjian Xie, Pengju Zhang, Hao Wei et al.
Exploiting Vision Language Model for Training-Free 3D Point Cloud OOD Detection via Graph Score Propagation
Tiankai Chen, Yushu Li, Adam Goodge et al.
Unraveling the Effects of Synthetic Data on End-to-End Autonomous Driving
Junhao Ge, Zuhong Liu, Longteng Fan et al.
Lidar Waveforms are Worth 40x128x33 Words
Dominik Scheuble, Hanno Holzhüter, Steven Peters et al.
Wide2Long: Learning Lens Compression and Perspective Adjustment for Wide-Angle to Telephoto Translation
Soumyadipta Banerjee, Jiaul Paik, Debashis Sen
SparseLaneSTP: Leveraging Spatio-Temporal Priors with Sparse Transformers for 3D Lane Detection
Maximilian Pittner, Joel Janai, Mario Faigle et al.
Relative Illumination Fields: Learning Medium and Light Independent Underwater Scenes
Mengkun She, Felix Seegräber, David Nakath et al.
HVPUNet: Hybrid-Voxel Point-cloud Upsampling Network
Juhyung Ha, Vibhas Vats, Alimoor Reza et al.
Stealthy Backdoor Attack in Federated Learning via Adaptive Layer-wise Gradient Alignment
Qingqian Yang, Peishen Yan, Xiaoyu Wu et al.
RS-vHeat: Heat Conduction Guided Efficient Remote Sensing Foundation Model
Huiyang Hu, Peijin Wang, Hanbo Bi et al.
SparseFlex: High-Resolution and Arbitrary-Topology 3D Shape Modeling
Xianglong He, Zi-Xin Zou, Chia Hao Chen et al.
Diffusion Transformer meets Multi-level Wavelet Spectrum for Single Image Super-Resolution
Peng Du, Hui Li, Han Xu et al.
Spatially-Varying Autofocus
Yingsi Qin, Aswin Sankaranarayanan, Matthew O'Toole
M2SFormer: Multi-Spectral and Multi-Scale Attention with Edge-Aware Difficulty Guidance for Image Forgery Localization
Ju-Hyeon Nam, Dong-Hyun Moon, Sang-Chul Lee
Articulate3D: Holistic Understanding of 3D Scenes as Universal Scene Description
Anna-Maria Halacheva, Yang Miao, Jan-Nico Zaech et al.
SU-RGS: Relightable 3D Gaussian Splatting from Sparse Views under Unconstrained Illuminations
Qi Zhang, Chi Huang, Qian Zhang et al.
Gradient Extrapolation for Debiased Representation Learning
Ihab Asaad, Maha Shadaydeh, Joachim Denzler
World4Drive: End-to-End Autonomous Driving via Intention-aware Physical Latent World Model
Yupeng Zheng, Pengxuan Yang, Zebin Xing et al.
Customizing Domain Adapters for Domain Generalization
Yuyang Ji, Zeyi Huang, Haohan Wang et al.
Soft Separation and Distillation: Toward Global Uniformity in Federated Unsupervised Learning
Hung-Chieh Fang, Hsuan-Tien Lin, Irwin King et al.
Image as an IMU: Estimating Camera Motion from a Single Motion-Blurred Image
Jerred Chen, Ronald Clark
Adaptive Dual Uncertainty Optimization: Boosting Monocular 3D Object Detection under Test-Time Shifts
Zixuan Hu, Dongxiao Li, Xinzhu Ma et al.
Beyond Losses Reweighting: Empowering Multi-Task Learning via the Generalization Perspective
Hoang Phan, Tung Lam Tran, Quyen Tran et al.
Learning Null Geodesics for Gravitational Lensing Rendering in General Relativity
Mingyuan Sun, Zheng Fang, Jiaxu Wang et al.
Object-centric Video Question Answering with Visual Grounding and Referring
Haochen Wang, Qirui Chen, Cilin Yan et al.
Exploiting Frequency Dynamics for Enhanced Multimodal Event-based Action Recognition
Meiqi Cao, Xiangbo Shu, Xin Jiang et al.
How Far are AI-generated Videos from Simulating the 3D Visual World: A Learned 3D Evaluation Approach
Chirui CHANG, Jiahui Liu, Zhengzhe Liu et al.
WIPES: Wavelet-based Visual Primitives
Wenhao Zhang, Hao Zhu, Delong Wu et al.
CoSMIC: Continual Self-supervised Learning for Multi-Domain Medical Imaging via Conditional Mutual Information Maximization
Yihang Liu, Ying Wen, Longzhen Yang et al.
Diffusion Curriculum: Synthetic-to-Real Data Curriculum via Image-Guided Diffusion
Yijun Liang, Shweta Bhardwaj, Tianyi Zhou
Advancing Textual Prompt Learning with Anchored Attributes
Zheng Li, Yibing Song, Ming-Ming Cheng et al.
Dual-Rate Dynamic Teacher for Source-Free Domain Adaptive Object Detection
Qi He, Xiao Wu, Jun-Yan He et al.
OV3D-CG: Open-vocabulary 3D Instance Segmentation with Contextual Guidance
Mingquan Zhou, Chen He, Ruiping Wang et al.
Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition
Zhisheng Zhong, Chengyao Wang, Yuqi Liu et al.
Enhancing Mamba Decoder with Bidirectional Interaction in Multi-Task Dense Prediction
Mang Cao, Sanping Zhou, Yizhe Li et al.
SITE: towards Spatial Intelligence Thorough Evaluation
Wenqi Wang, Reuben Tan, Pengyue Zhu et al.
SHIFT: Smoothing Hallucinations by Information Flow Tuning for Multimodal Large Language Models
Sudong Wang, Yunjian Zhang, Yao Zhu et al.
Text2VDM: Text to Vector Displacement Maps for Expressive and Interactive 3D Sculpting
Hengyu Meng, Duotun Wang, Zhijing Shao et al.
Mamba-3VL: Taming State Space Model for 3D Vision Language Learning
Yuan Wang, Yuxin Chen, Zhongang Qi et al.
MobileViCLIP: An Efficient Video-Text Model for Mobile Devices
Min Yang, Zihan Jia, Zhilin Dai et al.
MATE: Motion-Augmented Temporal Consistency for Event-based Point Tracking
Han Han, Wei Zhai, Yang Cao et al.
Asynchronous Event Error-Minimizing Noise for Safeguarding Event Dataset
Ruofei WANG, Peiqi Duan, Boxin Shi et al.
Vector Contrastive Learning For Pixel-Wise Pretraining In Medical Vision
Yuting He, Shuo Li
Efficient Fine-Tuning of Large Models via Nested Low-Rank Adaptation
Lujun Li, Cheng Lin, Dezhi Li et al.
Dual-level Prototype Learning for Composite Degraded Image Restoration
Zhongze Wang, Haitao Zhao, Lujian Yao et al.
Efficient Input-level Backdoor Defense on Text-to-Image Synthesis via Neuron Activation Variation
Shengfang ZHAI, Jiajun Li, Yue Liu et al.
GReg: Geometry-Aware Region Refinement for Sign Language Video Generation
Tongkai Shi, Lianyu Hu, Fanhua Shang et al.
FineMotion: A Dataset and Benchmark with both Spatial and Temporal Annotation for Fine-grained Motion Generation and Editing
Bizhu Wu, Jinheng Xie, Meidan Ding et al.
Knowledge Transfer from Interaction Learning
Yilin Gao, Kangyi Chen, Zhongxing Peng et al.
WIR3D: Visually-Informed and Geometry-Aware 3D Shape Abstraction
Richard Liu, Daniel Fu, Noah Tan et al.
Synthesizing Near-Boundary OOD Samples for Out-of-Distribution Detection
Jinglun Li, Kaixun Jiang, Zhaoyu Chen et al.
Cassic: Towards Content-Adaptive State-Space Models for Learned Image Compression
Shiyu Qin, Jinpeng Wang, Yimin Zhou et al.
SpectralAR: Spectral Autoregressive Visual Generation
Yuanhui Huang, Weiliang Chen, Wenzhao Zheng et al.
Boosting Adversarial Transferability via Negative Hessian Trace Regularization
Yunfei Long, Zilin Tian, Liguo Zhang et al.
OneGT: One-Shot Geometry-Texture Neural Rendering for Head Avatars
Jinshu Chen, Bingchuan Li, Fan Zhang et al.
Unsupervised Visible-Infrared Person Re-identification under Unpaired Settings
Haoyu Yao, Bin Yang, Wenke Huang et al.
Adaptive Prompt Learning via Gaussian Outlier Synthesis for Out-of-distribution Detection
Yongkang Zhang, Dongyu She, Zhong Zhou
A Differentiable Wave Optics Model for End-to-End Computational Imaging System Optimization
Chi-Jui Ho, Yash Belhe, Steve Rotenberg et al.
OCK: Unsupervised Dynamic Video Prediction with Object-Centric Kinematics
YeonJi Song, Jaein Kim, Suhyung Choi et al.
Intra-view and Inter-view Correlation Guided Multi-view Novel Class Discovery
Xinhang Wan, Jiyuan Liu, Qian Qu et al.
HUST: High-Fidelity Unbiased Skin Tone Estimation via Texture Quantization
Zimin Ran, Xingyu Ren, Xiang An et al.
ProbMED: A Probabilistic Framework for Medical Multimodal Binding
Yuan Gao, Sangwook Kim, Jianzhong You et al.
CATP-LLM: Empowering Large Language Models for Cost-Aware Tool Planning
Duo Wu, Jinghe Wang, Yuan Meng et al.
Dynamic Group Detection using VLM-augmented Temporal Groupness Graph
Kaname Yokoyama, Chihiro Nakatani, Norimichi Ukita
CountSE: Soft Exemplar Open-set Object Counting
Shuai Liu, Peng Zhang, Shiwei Zhang et al.
GenieBlue: Integrating both Linguistic and Multimodal Capabilities for Large Language Models on Mobile Devices
Xudong LU, Yinghao Chen, Renshou Wu et al.
MedVSR: Medical Video Super-Resolution with Cross State-Space Propagation
Xinyu Liu, Guolei Sun, Cheng Wang et al.
Trans-Adapter: A Plug-and-Play Framework for Transparent Image Inpainting
Yuekun Dai, Haitian Li, Shangchen Zhou et al.
Generalization-Preserved Learning: Closing the Backdoor to Catastrophic Forgetting in Continual Deepfake Detection
Xueyi Zhang, Peiyin Zhu, Chengwei Zhang et al.
IGD: Instructional Graphic Design with Multimodal Layer Generation
Yadong Qu, Shancheng Fang, Yuxin Wang et al.
Parameter-Efficient Adaptation of Geospatial Foundation Models through Embedding Deflection
Romain Thoreau, Valerio Marsocci, Dawa Derksen
CityGS-X: A Scalable Architecture for Efficient and Geometrically Accurate Large-Scale Scene Reconstruction
Yuanyuan Gao, Hao Li, Jiaqi Chen et al.
AIRA: Activation-Informed Low-Rank Adaptation for Large Models
Lujun Li, Dezhi Li, Cheng Lin et al.
Face Retouching with Diffusion Data Generation and Spectral Restorement
Zhidan Xu, Xiaoqin Zhang, Shijian Lu
Att-Adapter: A Robust and Precise Domain-Specific Multi-Attributes T2I Diffusion Adapter via Conditional Variational Autoencoder
Wonwoong Cho, Yan-Ying Chen, Matthew Klenk et al.
Class Token as Proxy: Optimal Transport-assisted Proxy Learning for Weakly Supervised Semantic Segmentation
Jian Wang, Tianhong Dai, Bingfeng Zhang et al.
3DGS-LM: Faster Gaussian-Splatting Optimization with Levenberg-Marquardt
Lukas Höllein, Aljaz Bozic, Michael Zollhöfer et al.
GLEAM: Learning Generalizable Exploration Policy for Active Mapping in Complex 3D Indoor Scene
Xiao Chen, Tai Wang, Quanyi Li et al.
CA2C: A Prior-Knowledge-Free Approach for Robust Label Noise Learning via Asymmetric Co-learning and Co-training
Mengmeng Sheng, Zeren Sun, Tianfei Zhou et al.
Point Cloud Self-supervised Learning via 3D to Multi-view Masked Learner
Zhimin Chen, Xuewei Chen, Xiao Guo et al.
MSA2: Multi-task Framework with Structure-aware and Style-adaptive Character Representation for Open-set Chinese Text Recognition
Yangfu Li, Hongjian Zhan, Qi Liu et al.
MultiModal Action Conditioned Video Simulation
Yichen Li, Antonio Torralba
FW-Merging: Scaling Model Merging with Frank-Wolfe Optimization
Hao Chen, Shell Xu Hu, Wayne Luk et al.
VisRL: Intention-Driven Visual Perception via Reinforced Reasoning
Zhangquan Chen, Xufang Luo, Dongsheng Li
ClearSight: Human Vision-Inspired Solutions for Event-Based Motion Deblurring
Xiaopeng LIN, Yulong Huang, Hongwei Ren et al.
LEGO-Maker: A Semantic-Driven Algorithm for Text-to-3D Generation
Yifei Zhang, Lei Chen
Dense Policy: Bidirectional Autoregressive Learning of Actions
Yue Su, Xinyu Zhan, Hongjie Fang et al.
DOGR: Towards Versatile Visual Document Grounding and Referring
Yinan Zhou, Yuxin Chen, Haokun Lin et al.
ReME: A Data-Centric Framework for Training-Free Open-Vocabulary Segmentation
Xiwei Xuan, Ziquan Deng, Kwan-Liu Ma
MonoMobility: Zero-Shot 3D Mobility Analysis from Monocular Videos
Hongyi Zhou, Xiaogang Wang, Yulan Guo et al.
Performing Defocus Deblurring by Modeling its Formation Process
Zhengbo Zhang, Lin Geng Foo, Hossein Rahmani et al.
Supervised Exploratory Learning for Long-Tailed Visual Recognition
Zhongquan Jian, Yanhao Chen, Wangyancheng Wangyancheng et al.
OmniDiff: A Comprehensive Benchmark for Fine-grained Image Difference Captioning
Yuan Liu, Saihui Hou, Saijie Hou et al.
TimeExpert: An Expert-Guided Video LLM for Video Temporal Grounding
Zuhao Yang, Yingchen Yu, Yunqing Zhao et al.
Active Perception Meets Rule-Guided RL: A Two-Phase Approach for Precise Object Navigation in Complex Environments
Liang Qin, Min Wang, Peiwei Li et al.
GaussianReg: Rapid 2D/3D Registration for Emergency Surgery via Explicit 3D Modeling with Gaussian Primitives
Weihao Yu, Xiaoqing Guo, Xinyu Liu et al.
ArgoTweak: Towards Self-Updating HD Maps through Structured Priors
Lena Wild, Rafael Valencia, Patric Jensfelt
SpatialCrafter: Unleashing the Imagination of Video Diffusion Models for Scene Reconstruction from Limited Observations
Songchun Zhang, Huiyao Xu, Sitong Guo et al.
Hybrid Layout Control for Diffusion Transformer: Fewer Annotations, Superior Aesthetics
Keming Wu, Junwen Chen, Zhanhao Liang et al.
FedXDS: Leveraging Model Attribution Methods to counteract Data Heterogeneity in Federated Learning
Maximilian Hoefler, Karsten Mueller, Wojciech Samek
Visual Textualization for Image Prompted Object Detection
Yongjian Wu, Yang Zhou, Jiya Saiyin et al.
LLaVA-SP: Enhancing Visual Representation with Visual Spatial Tokens for MLLMs
Haoran Lou, Chunxiao Fan, Ziyan Liu et al.
GMMamba: Group Masking Mamba for Whole Slide Image Classification
Tingting Zheng, Hongxun Yao, Kui Jiang et al.
RareCLIP: Rarity-aware Online Zero-shot Industrial Anomaly Detection
Jianfang He, Min Cao, Silong Peng et al.
Temporal Rate Reduction Clustering for Human Motion Segmentation
Xianghan Meng, Zhengyu Tong, Zhiyuan Huang et al.
Separation for Better Integration: Disentangling Edge and Motion in Event-based Deblurring
Yufei Zhu, Hao Chen, Yongjian Deng et al.
Diversity-Enhanced Distribution Alignment for Dataset Distillation
Hongcheng Li, Yucan Zhou, Xiaoyan Gu et al.
Adapt Foundational Segmentation Models with Heterogeneous Searching Space
Li Yi, Jie Hu, Songan Zhang et al.
Think Twice: Test-Time Reasoning for Robust CLIP Zero-Shot Classification
Shenyu Lu, Zhaoying Pan, Xiaoqian Wang
Counting Stacked Objects
Corentin Dumery, Noa Ette, Aoxiang Fan et al.
GestureLSM: Latent Shortcut based Co-Speech Gesture Generation with Spatial-Temporal Modeling
Pinxin Liu, Luchuan Song, Junhua Huang et al.
SDFormer: Vision-based 3D Semantic Scene Completion via SAM-assisted Dual-channel Voxel Transformer
Yujie Xue, Huilong Pi, Jiapeng Zhang et al.
TopoTTA: Topology-Enhanced Test-Time Adaptation for Tubular Structure Segmentation
Jiale Zhou, Wenhan Wang, Shikun Li et al.
MagShield: Towards Better Robustness in Sparse Inertial Motion Capture Under Magnetic Disturbances
Yunzhe Shao, Xinyu Yi, Lu Yin et al.
DeFSS: Image-to-Mask Denoising Learning for Few-shot Segmentation
Zishu Qin, Junhao Xu, Weifeng Ge
TAD-E2E: A Large-scale End-to-end Autonomous Driving Dataset
Chang Liu, mingxuzhu mingxuzhu, Zheyuan Zhang et al.
Photolithography Overlay Map Generation with Implicit Knowledge Distillation Diffusion Transformer
YuanFu Yang, Hsiu-Hui Hsiao
VehicleMAE: View-asymmetry Mutual Learning for Vehicle Re-identification Pre-training via Masked AutoEncoders
Qi Wang, Zeyu Zhang, Dong Wang et al.
Multi-scenario Overlapping Text Segmentation with Depth Awareness
Yang Liu, Xudong Xie, Yuliang Liu et al.
FullDiT: Video Generative Foundation Models with Multimodal Control via Full Attention
Xuan Ju, Weicai Ye, Quande Liu et al.
Learning Hierarchical Line Buffer for Image Processing
Jiacheng Li, Feiran Li, Daisuke Iso
Humans as Checkerboards: Calibrating Camera Motion Scale for World-Coordinate Human Mesh Recovery
Fengyuan Yang, Kerui Gu, Ha Linh Nguyen et al.
GeoMan: Temporally Consistent Human Geometry Estimation using Image-to-Video Diffusion
Gwanghyun Kim, Xueting Li, Ye Yuan et al.
Stereo Any Video: Temporally Consistent Stereo Matching
Junpeng Jing, Weixun Luo, Ye Mao et al.
ViT-Split: Unleashing the Power of Vision Foundation Models via Efficient Splitting Heads
Yifan Li, Xin Li, Tianqin Li et al.
Cycle-Consistent Learning for Joint Layout-to-Image Generation and Object Detection
Xinhao Cai, Qiuxia Lai, Gensheng Pei et al.
CarGait: Cross-Attention based Re-ranking for Gait recognition
Gavriel Habib, Noa Barzilay, Or Shimshi et al.
StyleSRN: Scene Text Image Super-Resolution with Text Style Embedding
Shengrong Yuan, Runmin Wang, Ke Hao et al.
Frequency-Guided Diffusion for Training-Free Text-Driven Image Translation
Zheng Gao, Jifei Song, Zhensong Zhang et al.
Frequency-Semantic Enhanced Variational Autoencoder for Zero-Shot Skeleton-based Action Recognition
Wenhan Wu, Zhishuai Guo, Chen Chen et al.
Cross-Category Subjectivity Generalization for Style-Adaptive Sketch Re-ID
Zechao Hu, Zhengwei Yang, Hao Li et al.