Most Cited ICCV "microtransactions" Papers
2,701 papers found • Page 13 of 14
Conference
Efficient Input-level Backdoor Defense on Text-to-Image Synthesis via Neuron Activation Variation
Shengfang ZHAI, Jiajun Li, Yue Liu et al.
Decoupled Multi-Predictor Optimization for Inference-Efficient Model Tuning
Liwei Luo, Shuaitengyuan Li, Dongwei Ren et al.
ACAM-KD: Adaptive and Cooperative Attention Masking for Knowledge Distillation
Qizhen Lan, Qing Tian
GReg: Geometry-Aware Region Refinement for Sign Language Video Generation
Tongkai Shi, Lianyu Hu, Fanhua Shang et al.
Unsupervised Part Discovery via Descriptor-Based Masked Image Restoration with Optimized Constraints
Jiahao Xia, Yike Wu, Wenjian Huang et al.
NETracer: A Topology-Aware Iterative Tracing Approach for Tubular Structure Extraction
Chao Liu, Yangbo Jiang, Nenggan Zheng
MotionCtrl: A Real-time Controllable Vision-Language-Motion Model
Bin Cao, Sipeng Zheng, Ye Wang et al.
UIPro: Unleashing Superior Interaction Capability For GUI Agents
Hongxin Li, Jingran Su, Jingfan CHEN et al.
SALAD -- Semantics-Aware Logical Anomaly Detection
Matic Fučka, Vitjan Zavrtanik, Danijel Skocaj
FineMotion: A Dataset and Benchmark with both Spatial and Temporal Annotation for Fine-grained Motion Generation and Editing
Bizhu Wu, Jinheng Xie, Meidan Ding et al.
VLR-Driver: Large Vision-Language-Reasoning Models for Embodied Autonomous Driving
Fanjie Kong, Yitong Li, Weihuang Chen et al.
Vid-Group: Temporal Video Grounding Pretraining from Unlabeled Videos in the Wild
Peijun Bao, Chenqi Kong, SIYUAN YANG et al.
Knowledge Transfer from Interaction Learning
Yilin Gao, Kangyi Chen, Zhongxing Peng et al.
WIR3D: Visually-Informed and Geometry-Aware 3D Shape Abstraction
Richard Liu, Daniel Fu, Noah Tan et al.
Temperature in Cosine-based Softmax Loss
Takumi Kobayashi
Multi-modal Segment Anything Model for Camouflaged Scene Segmentation
Guangyu Ren, Hengyan Liu, Michalis Lazarou et al.
Synthesizing Near-Boundary OOD Samples for Out-of-Distribution Detection
Jinglun Li, Kaixun Jiang, Zhaoyu Chen et al.
Cassic: Towards Content-Adaptive State-Space Models for Learned Image Compression
Shiyu Qin, Jinpeng Wang, Yimin Zhou et al.
SpectralAR: Spectral Autoregressive Visual Generation
Yuanhui Huang, Weiliang Chen, Wenzhao Zheng et al.
Boosting Adversarial Transferability via Negative Hessian Trace Regularization
Yunfei Long, Zilin Tian, Liguo Zhang et al.
AcZeroTS: Active Learning for Zero-shot Tissue Segmentation in Pathology Images
Jiao Tang, Junjie Zhou, Bo Qian et al.
OneGT: One-Shot Geometry-Texture Neural Rendering for Head Avatars
Jinshu Chen, Bingchuan Li, Fan Zhang et al.
Unsupervised Visible-Infrared Person Re-identification under Unpaired Settings
Haoyu Yao, Bin Yang, Wenke Huang et al.
Adaptive Prompt Learning via Gaussian Outlier Synthesis for Out-of-distribution Detection
Yongkang Zhang, Dongyu She, Zhong Zhou
Can We Achieve Efficient Diffusion Without Self-Attention? Distilling Self-Attention into Convolutions
ZiYi Dong, Chengxing Zhou, Weijian Deng et al.
Ultra-Precision 6DoF Pose Estimation Using 2-D Interpolated Discrete Fourier Transform
Guowei Shi, Zian Mao, Peisen Huang
A Differentiable Wave Optics Model for End-to-End Computational Imaging System Optimization
Chi-Jui Ho, Yash Belhe, Steve Rotenberg et al.
AMDANet: Attention-Driven Multi-Perspective Discrepancy Alignment for RGB-Infrared Image Fusion and Segmentation
Haifeng Zhong, Fan Tang, Zhuo Chen et al.
OCK: Unsupervised Dynamic Video Prediction with Object-Centric Kinematics
YeonJi Song, Jaein Kim, Suhyung Choi et al.
Prompt Guidance and Human Proximal Perception for HOT Prediction with Regional Joint Loss
Yuxiao Wang, Yu Lei, Zhenao WEI et al.
Coupling the Generator with Teacher for Effective Data-Free Knowledge Distillation
Xu Chen, Yang Li, Yahong Han et al.
Towards a Universal Image Degradation Model via Content-Degradation Disentanglement
Wenbo Yang, Zhongling Wang, Zhou Wang
Intra-view and Inter-view Correlation Guided Multi-view Novel Class Discovery
Xinhang Wan, Jiyuan Liu, Qian Qu et al.
HUST: High-Fidelity Unbiased Skin Tone Estimation via Texture Quantization
Zimin Ran, Xingyu Ren, Xiang An et al.
Know Your Attention Maps: Class-specific Token Masking for Weakly Supervised Semantic Segmentation
Joëlle Hanna, Damian Borth
Structure-Guided Diffusion Models for High-Fidelity Portrait Shadow Removal
wanchang Yu, Qing Zhang, Rongjia Zheng et al.
FreeDNA: Endowing Domain Adaptation of Diffusion-Based Dense Prediction with Training-Free Domain Noise Alignment
Hang Xu, Jie Huang, Linjiang Huang et al.
ProbMED: A Probabilistic Framework for Medical Multimodal Binding
Yuan Gao, Sangwook Kim, Jianzhong You et al.
Unraveling the Smoothness Properties of Diffusion Models: A Gaussian Mixture Perspective
Yingyu Liang, Zhizhou Sha, Zhenmei Shi et al.
FDPT: Federated Discrete Prompt Tuning for Black-Box Visual-Language Models
Jiaqi Wu, Simin Chen, Jing Tang et al.
CATP-LLM: Empowering Large Language Models for Cost-Aware Tool Planning
Duo Wu, Jinghe Wang, Yuan Meng et al.
Dynamic Group Detection using VLM-augmented Temporal Groupness Graph
Kaname Yokoyama, Chihiro Nakatani, Norimichi Ukita
A Tiny Change, A Giant Leap: Long-Tailed Class-Incremental Learning via Geometric Prototype Alignment
xinyi lai, Luojun Lin, Weijie Chen et al.
CountSE: Soft Exemplar Open-set Object Counting
Shuai Liu, Peng Zhang, Shiwei Zhang et al.
Sparfels: Fast Reconstruction from Sparse Unposed Imagery
Shubhendu Jena, Amine Ouasfi, Mae Younes et al.
GenieBlue: Integrating both Linguistic and Multimodal Capabilities for Large Language Models on Mobile Devices
Xudong LU, Yinghao Chen, Renshou Wu et al.
MedVSR: Medical Video Super-Resolution with Cross State-Space Propagation
Xinyu Liu, Guolei Sun, Cheng Wang et al.
Top2Pano: Learning to Generate Indoor Panoramas from Top-Down View
Zitong Zhang, Suranjan Gautam, Rui Yu
MuGS: Multi-Baseline Generalizable Gaussian Splatting Reconstruction
Yaopeng Lou, Liao Shen, Tianqi Liu et al.
Region-Level Data Attribution for Text-to-Image Generative Models
Trong Bang Nguyen, Phi Le Nguyen, Simon Lucey et al.
Trans-Adapter: A Plug-and-Play Framework for Transparent Image Inpainting
Yuekun Dai, Haitian Li, Shangchen Zhou et al.
Generalization-Preserved Learning: Closing the Backdoor to Catastrophic Forgetting in Continual Deepfake Detection
Xueyi Zhang, Peiyin Zhu, Chengwei Zhang et al.
LangBridge: Interpreting Image as a Combination of Language Embeddings
Jiaqi Liao, Yuwei Niu, Fanqing Meng et al.
IGD: Instructional Graphic Design with Multimodal Layer Generation
Yadong Qu, Shancheng Fang, Yuxin Wang et al.
Parameter-Efficient Adaptation of Geospatial Foundation Models through Embedding Deflection
Romain Thoreau, Valerio Marsocci, Dawa Derksen
CityGS-X: A Scalable Architecture for Efficient and Geometrically Accurate Large-Scale Scene Reconstruction
Yuanyuan Gao, Hao Li, Jiaqi Chen et al.
AIRA: Activation-Informed Low-Rank Adaptation for Large Models
Lujun Li, Dezhi Li, Cheng Lin et al.
Embodied Navigation with Auxiliary Task of Action Description Prediction
Haru Kondoh, Asako Kanezaki
Face Retouching with Diffusion Data Generation and Spectral Restorement
Zhidan Xu, Xiaoqin Zhang, Shijian Lu
Att-Adapter: A Robust and Precise Domain-Specific Multi-Attributes T2I Diffusion Adapter via Conditional Variational Autoencoder
Wonwoong Cho, Yan-Ying Chen, Matthew Klenk et al.
Neural Solver of Dichromatic Reflection Model for Specular Highlight Removal
Gang Fu
Wavelet Policy: Lifting Scheme for Policy Learning in Long-Horizon Tasks
Hao Huang, Shuaihang Yuan, Geeta Chandra Raju Bethala et al.
Contrastive Flow Matching
George Stoica, Vivek Ramanujan, Xiang Fan et al.
Class Token as Proxy: Optimal Transport-assisted Proxy Learning for Weakly Supervised Semantic Segmentation
Jian Wang, Tianhong Dai, Bingfeng Zhang et al.
HOLa: Zero-Shot HOI Detection with Low-Rank Decomposed VLM Feature Adaptation
Qinqian Lei, Bo Wang, Robby Tan
AllGCD: Leveraging All Unlabeled Data for Generalized Category Discovery
Xinzi Cao, Ke Chen, Feidiao Yang et al.
Towards Long-Horizon Vision-Language-Action System: Reasoning, Acting and Memory
Daixun Li, Yusi Zhang, Mingxiang Cao et al.
UniFuse: A Unified All-in-One Framework for Multi-Modal Medical Image Fusion Under Diverse Degradations and Misalignments
Dayong Su, Yafei Zhang, Huafeng Li et al.
3DGS-LM: Faster Gaussian-Splatting Optimization with Levenberg-Marquardt
Lukas Höllein, Aljaz Bozic, Michael Zollhöfer et al.
GLEAM: Learning Generalizable Exploration Policy for Active Mapping in Complex 3D Indoor Scene
Xiao Chen, Tai Wang, Quanyi Li et al.
CopyrightShield: Enhancing Diffusion Model Security Against Copyright Infringement Attacks
Zhixiang Guo, Siyuan Liang, Aishan Liu et al.
CA2C: A Prior-Knowledge-Free Approach for Robust Label Noise Learning via Asymmetric Co-learning and Co-training
Mengmeng Sheng, Zeren Sun, Tianfei Zhou et al.
Learnable Logit Adjustment for Imbalanced Semi-Supervised Learning under Class Distribution Mismatch
lee hyuck, Taemin Park, Heeyoung Kim
CARL: Causality-guided Architecture Representation Learning for an Interpretable Performance Predictor
Han Ji, Yuqi Feng, Jiahao Fan et al.
TCFG: Truncated Classifier-Free Guidance for Efficient and Scalable Text-to-Image Acceleration
Xiaomeng Fu, Jia Li
Point Cloud Self-supervised Learning via 3D to Multi-view Masked Learner
Zhimin Chen, Xuewei Chen, Xiao Guo et al.
MSA2: Multi-task Framework with Structure-aware and Style-adaptive Character Representation for Open-set Chinese Text Recognition
Yangfu Li, Hongjian Zhan, Qi Liu et al.
DiffPCI: Large Motion Point Cloud frame Interpolation with Diffusion Model
tianyu zhang, Haobo Jiang, jian Yang et al.
MultiModal Action Conditioned Video Simulation
Yichen Li, Antonio Torralba
Local Dense Logit Relations for Enhanced Knowledge Distillation
Liuchi Xu, Kang Liu, Jinshuai Liu et al.
FW-Merging: Scaling Model Merging with Frank-Wolfe Optimization
Hao Chen, Shell Xu Hu, Wayne Luk et al.
HIS-GPT: Towards 3D Human-In-Scene Multimodal Understanding
JIAHE ZHAO, RuiBing Hou, zejie tian et al.
Soft Local Completeness: Rethinking Completeness in XAI
Ziv Weiss Haddad, Oren Barkan, Yehonatan Elisha et al.
ClearSight: Human Vision-Inspired Solutions for Event-Based Motion Deblurring
Xiaopeng LIN, Yulong Huang, Hongwei Ren et al.
PBFG: A New Physically-Based Dataset and Removal of Lens Flares and Glares
Jie Zhu, Sungkil Lee
Correspondence as Video: Test-Time Adaption on SAM2 for Reference Segmentation in the Wild
Haoran Wang, Zekun Li, Jian Zhang et al.
An Information-Theoretic Regularizer for Lossy Neural Image Compression
ZHANG YINGWEN, Meng Wang, Xihua Sheng et al.
Knowledge-Guided Part Segmentation
Xuejian Gou, Fang Liu, Licheng Jiao et al.
Controllable Feature Whitening for Hyperparameter-Free Bias Mitigation
Yooshin Cho, Hanbyel Cho, Janghyeon Lee et al.
KV-Edit: Training-Free Image Editing for Precise Background Preservation
Tianrui Zhu, Shiyi Zhang, Jiawei Shao et al.
FusionPhys: A Flexible Framework for Fusing Complementary Sensing Modalities in Remote Physiological Measurement
Chenhang Ying, Huiyu Yang, Jieyi Ge et al.
DiffVSR: Revealing an Effective Recipe for Taming Robust Video Super-Resolution Against Complex Degradations
Xiaohui Li, Yihao Liu, Shuo Cao et al.
Power of Cooperative Supervision: Multiple Teachers Framework for Advanced 3D Semi-Supervised Object Detection
Jin-Hee Lee, Jae-keun Lee, Jeseok Kim et al.
Adapting In-Domain Few-Shot Segmentation to New Domains without Source Domain Retraining
Qi Fan, Kaiqi Liu, Nian Liu et al.
ASGS: Single-Domain Generalizable Open-Set Object Detection via Adaptive Subgraph Searching
Yuxuan Yuan, Luyao Tang, Chaoqi Chen et al.
DADet: Safeguarding Image Conditional Diffusion Models against Adversarial and Backdoor Attacks via Diffusion Anomaly Detection
Hongwei Yu, Xinlong Ding, Jiawei Li et al.
LEGO-Maker: A Semantic-Driven Algorithm for Text-to-3D Generation
Yifei Zhang, Lei Chen
COVTrack: Continuous Open-Vocabulary Tracking via Adaptive Multi-Cue Fusion
Zekun Qian, Ruize Han, Zhixiang Wang et al.
Dense Policy: Bidirectional Autoregressive Learning of Actions
Yue Su, Xinyu Zhan, Hongjie Fang et al.
monoVLN: Bridging the Observation Gap between Monocular and Panoramic Vision and Language Navigation
Ren-Jie Lu, Yu Zhou, hao cheng et al.
DOGR: Towards Versatile Visual Document Grounding and Referring
Yinan Zhou, Yuxin Chen, Haokun Lin et al.
ReME: A Data-Centric Framework for Training-Free Open-Vocabulary Segmentation
Xiwei Xuan, Ziquan Deng, Kwan-Liu Ma
MonoMobility: Zero-Shot 3D Mobility Analysis from Monocular Videos
Hongyi Zhou, Xiaogang Wang, Yulan Guo et al.
Performing Defocus Deblurring by Modeling its Formation Process
Zhengbo Zhang, Lin Geng Foo, Hossein Rahmani et al.
CasP: Improving Semi-Dense Feature Matching Pipeline Leveraging Cascaded Correspondence Priors for Guidance
Peiqi Chen, Lei Yu, Yi Wan et al.
Supervised Exploratory Learning for Long-Tailed Visual Recognition
Zhongquan Jian, Yanhao Chen, Wangyancheng Wangyancheng et al.
MMAIF: Multi-task and Multi-degradation All-in-One for Image Fusion with Language Guidance
Zihan Cao, Yu Zhong, Ziqi Wang et al.
Blind Video Super-Resolution based on Implicit Kernels
Qiang Zhu, Yuxuan Jiang, Shuyuan Zhu et al.
OmniDiff: A Comprehensive Benchmark for Fine-grained Image Difference Captioning
Yuan Liu, Saihui Hou, Saijie Hou et al.
Toward Long-Tailed Online Anomaly Detection through Class-Agnostic Concepts
Chiao-An Yang, Kuan-Chuan Peng, Raymond A. Yeh
More Reliable Pseudo-labels, Better Performance: A Generalized Approach to Single Positive Multi-label Learning
Luong Tran, Thieu Vo, Anh Nguyen et al.
TimeExpert: An Expert-Guided Video LLM for Video Temporal Grounding
Zuhao Yang, Yingchen Yu, Yunqing Zhao et al.
DCHM: Depth-Consistent Human Modeling for Multiview Detection
Jiahao Ma, Tianyu Wang, Miaomiao Liu et al.
Adversarial Robustness of Discriminative Self-Supervised Learning in Vision
Ömer Veysel Çağatan, Ömer TAL, M. Emre Gursoy
HPSv3: Towards Wide-Spectrum Human Preference Score
Yuhang Ma, Keqiang Sun, Xiaoshi Wu et al.
Active Perception Meets Rule-Guided RL: A Two-Phase Approach for Precise Object Navigation in Complex Environments
Liang Qin, Min Wang, Peiwei Li et al.
UNIS: A Unified Framework for Achieving Unbiased Neural Implicit Surfaces in Volume Rendering
Junkai Deng, Hanting Niu, Jiaze Li et al.
IntrinsicControlNet: Cross-distribution Image Generation with Real and Unreal
Jiayuan Lu, Rengan Xie, Zixuan Xie et al.
Loss Functions for Predictor-based Neural Architecture Search
Han Ji, Yuqi Feng, Jiahao Fan et al.
Advancing Text-to-3D Generation with Linearized Lookahead Variational Score Distillation
Yu Lei, Bingde Liu, Qingsong Xie et al.
Steering Guidance for Personalized Text-to-Image Diffusion Models
Sunghyun Park, Seokeon Choi, Hyoungwoo Park et al.
ONLY: One-Layer Intervention Sufficiently Mitigates Hallucinations in Large Vision-Language Models
Zifu Wan, Ce Zhang, Silong Yong et al.
Domain-aware Category-level Geometry Learning Segmentation for 3D Point Clouds
Pei He, Lingling Li, Licheng Jiao et al.
GaussianReg: Rapid 2D/3D Registration for Emergency Surgery via Explicit 3D Modeling with Gaussian Primitives
Weihao Yu, Xiaoqing Guo, Xinyu Liu et al.
ArgoTweak: Towards Self-Updating HD Maps through Structured Priors
Lena Wild, Rafael Valencia, Patric Jensfelt
Event-aided Dense and Continuous Point Tracking: Everywhere and Anytime
Zhexiong Wan, Jianqin Luo, Yuchao Dai et al.
Context-Aware Academic Emotion Dataset and Benchmark
Luming Zhao, Jingwen Xuan, Jiamin Lou et al.
FlowSeek: Optical Flow Made Easier with Depth Foundation Models and Motion Bases
Matteo Poggi, Fabio Tosi
TPG-INR: Target Prior-Guided Implicit 3D CT Reconstruction for Enhanced Sparse-view Imaging
QingleiCao QingleiCao, Ziyao Tang, Xiaoqin Tang
SpatialCrafter: Unleashing the Imagination of Video Diffusion Models for Scene Reconstruction from Limited Observations
Songchun Zhang, Huiyao Xu, Sitong Guo et al.
Efficient Visual Place Recognition Through Multimodal Semantic Knowledge Integration
Sitao Zhang, Hongda Mao, Qingshuang Chen et al.
COME: Dual Structure-Semantic Learning with Collaborative MoE for Universal Lesion Detection Across Heterogeneous Ultrasound Datasets
Lingyu Chen, Yawen Zeng, Yue Wang et al.
NATRA: Noise-Agnostic Framework for Trajectory Prediction with Noisy Observations
Rongqing Li, Changsheng Li, Ruilin Lv et al.
MS3D: High-Quality 3D Generation via Multi-Scale Representation Modeling
Guan Luo, Jianfeng Zhang
UniDxMD: Towards Unified Representation for Cross-Modal Unsupervised Domain Adaptation in 3D Semantic Segmentation
Zhengyin Liang, Hui Yin, Min Liang et al.
Hybrid Layout Control for Diffusion Transformer: Fewer Annotations, Superior Aesthetics
Keming Wu, Junwen Chen, Zhanhao Liang et al.
PLAN: Proactive Low-Rank Allocation for Continual Learning
XIEQUN WANG, Zhan Zhuang, Yu Zhang
Leveraging Spatial Invariance to Boost Adversarial Transferability
Zihan Zhou, LI LI, Yanli Ren et al.
FedXDS: Leveraging Model Attribution Methods to counteract Data Heterogeneity in Federated Learning
Maximilian Hoefler, Karsten Mueller, Wojciech Samek
Visual Textualization for Image Prompted Object Detection
Yongjian Wu, Yang Zhou, Jiya Saiyin et al.
TerraMind: Large-Scale Generative Multimodality for Earth Observation
Johannes Jakubik, Felix Yang, Benedikt Blumenstiel et al.
LLaVA-SP: Enhancing Visual Representation with Visual Spatial Tokens for MLLMs
Haoran Lou, Chunxiao Fan, Ziyan Liu et al.
Transformer-based Tooth Alignment Prediction with Occlusion and Collision Constraints
DongZhenXing DongZhenXing, Jiazhou Chen
SD2Actor: Continuous State Decomposition via Diffusion Embeddings for Robotic Manipulation
lijiayi jiayi
Omegance: A Single Parameter for Various Granularities in Diffusion-Based Synthesis
Xinyu Hou, Zongsheng Yue, Xiaoming Li et al.
Scene Graph Guided Generation: Enable Accurate Relations Generation in Text-to-Image Models via Textural Rectification
Guibao SHEN, Luozhou Wang, Jiantao Lin et al.
ReMP-AD: Retrieval-enhanced Multi-modal Prompt Fusion for Few-Shot Industrial Visual Anomaly Detection
Hongchi Ma, Guanglei Yang, Debin Zhao et al.
GMMamba: Group Masking Mamba for Whole Slide Image Classification
Tingting Zheng, Hongxun Yao, Kui Jiang et al.
TimeFormer: Capturing Temporal Relationships of Deformable 3D Gaussians for Robust Reconstruction
Dadong Jiang, Zhi Hou, Zhihui Ke et al.
RareCLIP: Rarity-aware Online Zero-shot Industrial Anomaly Detection
Jianfang He, Min Cao, Silong Peng et al.
Temporal Rate Reduction Clustering for Human Motion Segmentation
Xianghan Meng, Zhengyu Tong, Zhiyuan Huang et al.
Hierarchy UGP: Hierarchy Unified Gaussian Primitive for Large-Scale Dynamic Scene Reconstruction
Hongyang Sun, Qinglin Yang, Jiawei Wang et al.
Backdoor Mitigation by Distance-Driven Detoxification
Shaokui Wei, Jiayin Liu, Hongyuan Zha
Democratizing High-Fidelity Co-Speech Gesture Video Generation
Xu Yang, Shaoli Huang, Shenbo Xie et al.
UnrealZoo: Enriching Photo-realistic Virtual Worlds for Embodied AI
Fangwei Zhong, Kui Wu, Churan Wang et al.
HFD-Teacher: High-Frequency Depth Distillation from Depth Foundation Models for Enhanced Depth Completion
Zhiyuan Yang, Anqi Cheng, Haiyue Zhu et al.
Separation for Better Integration: Disentangling Edge and Motion in Event-based Deblurring
Yufei Zhu, Hao Chen, Yongjian Deng et al.
Diversity-Enhanced Distribution Alignment for Dataset Distillation
Hongcheng Li, Yucan Zhou, Xiaoyan Gu et al.
Height-Fidelity Dense Global Fusion for Multi-modal 3D Object Detection
Hanshi Wang, Jin Gao, Weiming Hu et al.
SMSTracker: Tri-path Score Mask Sigma Fusion for Multi-Modal Tracking
Sixian Chan, Zedong Li, Xiaoqin Zhang et al.
Two Losses, One Goal: Balancing Conflict Gradients for Semi-supervised Semantic Segmentation
Rui Sun, Huayu Mai, Wangkai Li et al.
Region-based Cluster Discrimination for Visual Representation Learning
Yin Xie, Kaicheng Yang, Xiang An et al.
CMB-ML: A Cosmic Microwave Background Dataset for the Oldest Possible Computer Vision Task
James Amato, Yunan Xie, Leonel Medina-Varela et al.
Adapt Foundational Segmentation Models with Heterogeneous Searching Space
Li Yi, Jie Hu, Songan Zhang et al.
Think Twice: Test-Time Reasoning for Robust CLIP Zero-Shot Classification
Shenyu Lu, Zhaoying Pan, Xiaoqian Wang
Shape of Motion: 4D Reconstruction from a Single Video
Qianqian Wang, Vickie Ye, Hang Gao et al.
EditCLIP: Representation Learning for Image Editing
Qian Wang, Aleksandar Cvejic, Abdelrahman Eldesokey et al.
Counting Stacked Objects
Corentin Dumery, Noa Ette, Aoxiang Fan et al.
Allowing Oscillation Quantization: Overcoming Solution Space Limitation in Low Bit-Width Quantization
Weiying Xie, Zihan Meng, Jitao Ma et al.
MOVE: Motion-Guided Few-Shot Video Object Segmentation
Kaining Ying, Hengrui Hu, Henghui Ding
CorrCLIP: Reconstructing Patch Correlations in CLIP for Open-Vocabulary Semantic Segmentation
Dengke Zhang, Fagui Liu, Quan Tang
mmCooper: A Multi-agent Multi-stage Communication-efficient and Collaboration-robust Cooperative Perception Framework
Bingyi Liu, Jian Teng, Hongfei Xue et al.
FreqPDE: Rethinking Positional Depth Embedding for Multi-View 3D Object Detection Transformers
Junjie Zhang, Haisheng Su, Feixiang Song et al.
GestureLSM: Latent Shortcut based Co-Speech Gesture Generation with Spatial-Temporal Modeling
Pinxin Liu, Luchuan Song, Junhua Huang et al.
SDFormer: Vision-based 3D Semantic Scene Completion via SAM-assisted Dual-channel Voxel Transformer
Yujie Xue, Huilong Pi, Jiapeng Zhang et al.
TopoTTA: Topology-Enhanced Test-Time Adaptation for Tubular Structure Segmentation
Jiale Zhou, Wenhan Wang, Shikun Li et al.
RealCam-I2V: Real-World Image-to-Video Generation with Interactive Complex Camera Control
Teng Li, Guangcong Zheng, Rui Jiang et al.
MagShield: Towards Better Robustness in Sparse Inertial Motion Capture Under Magnetic Disturbances
Yunzhe Shao, Xinyu Yi, Lu Yin et al.
DeFSS: Image-to-Mask Denoising Learning for Few-shot Segmentation
Zishu Qin, Junhao Xu, Weifeng Ge
TAD-E2E: A Large-scale End-to-end Autonomous Driving Dataset
Chang Liu, mingxuzhu mingxuzhu, Zheyuan Zhang et al.
VAGUE: Visual Contexts Clarify Ambiguous Expressions
Heejeong Nam, Jinwoo Ahn, Keummin Ka et al.
Photolithography Overlay Map Generation with Implicit Knowledge Distillation Diffusion Transformer
YuanFu Yang, Hsiu-Hui Hsiao
What's Making That Sound Right Now? Video-centric Audio-Visual Localization
hahyeon choi, Junhoo Lee, Nojun Kwak
VehicleMAE: View-asymmetry Mutual Learning for Vehicle Re-identification Pre-training via Masked AutoEncoders
Qi Wang, Zeyu Zhang, Dong Wang et al.
MagicCity: Geometry-Aware 3D City Generation from Satellite Imagery with Multi-View Consistency
Xingbo YAO, xuanmin Wang, Hao WU et al.
RARE: Refine Any Registration of Pairwise Point Clouds via Zero-Shot Learning
Chengyu Zheng, Honghua Chen, Jin Huang et al.
Multi-scenario Overlapping Text Segmentation with Depth Awareness
Yang Liu, Xudong Xie, Yuliang Liu et al.
Zero-Shot Vision Encoder Grafting via LLM Surrogates
Kaiyu Yue, Vasu Singla, Menglin Jia et al.
OV-SCAN: Semantically Consistent Alignment for Novel Object Discovery in Open-Vocabulary 3D Object Detection
Adrian Chow, Evelien Riddell, Yimu Wang et al.
FullDiT: Video Generative Foundation Models with Multimodal Control via Full Attention
Xuan Ju, Weicai Ye, Quande Liu et al.
SC-Lane: Slope-aware and Consistent Road Height Estimation Framework for 3D Lane Detection
Chaesong Park, Eunbin Seo, JihyeonHwang JihyeonHwang et al.
Exploring the Visual Feature Space for Multimodal Neural Decoding
Weihao Xia, Cengiz Oztireli
ConceptSplit: Decoupled Multi-Concept Personalization of Diffusion Models via Token-wise Adaptation and Attention Disentanglement
Habin Lim, Youngseob Won, Juwon Seo et al.
Backdoor Defense via Enhanced Splitting and Trap Isolation
Hongrui Yu, Lu Qi, Wanyu Lin et al.
Learning Hierarchical Line Buffer for Image Processing
Jiacheng Li, Feiran Li, Daisuke Iso
ATAS: Any-to-Any Self-Distillation for Enhanced Open-Vocabulary Dense Prediction
Soonwoo Cha, Jiwoo Song, Juan Yeo et al.
Humans as Checkerboards: Calibrating Camera Motion Scale for World-Coordinate Human Mesh Recovery
Fengyuan Yang, Kerui Gu, Ha Linh Nguyen et al.
D3: Training-Free AI-Generated Video Detection Using Second-Order Features
Chende Zheng, Ruiqi suo, Chenhao Lin et al.
Overcoming Dual Drift for Continual Long-Tailed Visual Question Answering
Feifei Zhang, Zhihao Wang, Xi Zhang et al.
GeoMan: Temporally Consistent Human Geometry Estimation using Image-to-Video Diffusion
Gwanghyun Kim, Xueting Li, Ye Yuan et al.