Most Cited ICCV "backdoor mitigation" Papers
DALIP: Distribution Alignment-based Language-Image Pre-Training for Domain-Specific Data
Junjie Wu, Jiangtao Xie, Zhaolin Zhang et al.
Self-Reinforcing Prototype Evolution with Dual-Knowledge Cooperation for Semi-Supervised Lifelong Person Re-Identification
Kunlun Xu, Fan Zhuo, Jiangmeng Li et al.
What's in a Latent? Leveraging Diffusion Latent Space for Domain Generalization
Xavier Thomas, Deepti Ghadiyaram
RePoseD: Efficient Relative Pose Estimation With Known Depth Information
Yaqing Ding, Viktor Kocur, Vaclav Vavra et al.
Frequency-Guided Posterior Sampling for Diffusion-Based Image Restoration
Darshan Thaker, Abhishek Goyal, Rene Vidal
Understanding Co-speech Gestures in-the-wild
Sindhu Hegde, K R Prajwal, Taein Kwon et al.
Bridging the Skeleton-Text Modality Gap: Diffusion-Powered Modality Alignment for Zero-shot Skeleton-based Action Recognition
Jeonghyeok Do, Munchurl Kim
Aligning Constraint Generation with Design Intent in Parametric CAD
Evan Casey, Tianyu Zhang, Shu Ishida et al.
FG-OrIU: Towards Better Forgetting via Feature-Gradient Orthogonality for Incremental Unlearning
Qian Feng, Jiahang Tu, Mintong Kang et al.
Quantifying and Narrowing the Unknown: Interactive Text-to-Video Retrieval via Uncertainty Minimization
Bingqing Zhang, Zhuo Cao, Heming Du et al.
Synchronization of Multiple Videos
Avihai Naaman, Ron Shapira Weber, Oren Freifeld
Timestep-Aware Diffusion Model for Extreme Image Rescaling
Ce Wang, Zhenyu Hu, Wanjie Sun et al.
ODP-Bench: Benchmarking Out-of-Distribution Performance Prediction
Han Yu, Kehan Li, Dongbai Li et al.
Demeter: A Parametric Model of Crop Plant Morphology from the Real World
Tianhang Cheng, Albert Zhai, Evan Chen et al.
What Changed and What Could Have Changed? State-Change Counterfactuals for Procedure-Aware Video Representation Learning
Chi-Hsi Kung, Frangil Ramirez, Juhyung Ha et al.
Contact-Aware Amodal Completion for Human-Object Interaction via Multi-Regional Inpainting
Seunggeun Chi, Pin-Hao Huang, Enna Sachdeva et al.
Towards More Diverse and Challenging Pre-training for Point Cloud Learning: Self-Supervised Cross Reconstruction with Decoupled Views
Xiangdong Zhang, Shaofeng Zhang, Junchi Yan
ScenePainter: Semantically Consistent Perpetual 3D Scene Generation with Concept Relation Alignment
Chong Xia, Shengjun Zhang, Fangfu Liu et al.
Learning Visual Hierarchies in Hyperbolic Space for Image Retrieval
Ziwei Wang, Sameera Ramasinghe, Chenchen Xu et al.
LINR-PCGC: Lossless Implicit Neural Representations for Point Cloud Geometry Compression
Wenjie Huang, Qi Yang, Shuting Xia et al.
D2ST-Adapter: Disentangled-and-Deformable Spatio-Temporal Adapter for Few-shot Action Recognition
Wenjie Pei, Qizhong Tan, Guangming Lu et al.
HUG: Hierarchical Urban Gaussian Splatting with Block-Based Reconstruction for Large-Scale Aerial Scenes
Mai Su, Zhongtao Wang, Huishan Au et al.
Who is a Better Talker: Subjective and Objective Quality Assessment for AI-Generated Talking Heads
Yingjie Zhou, Jiezhang Cao, Zicheng Zhang et al.
Robust 3D Object Detection using Probabilistic Point Clouds from Single-Photon LiDARs
Bhavya Goyal, Felipe Gutierrez-Barragan, Wei Lin et al.
IDF: Iterative Dynamic Filtering Networks for Generalizable Image Denoising
Dongjin Kim, Jaekyun Ko, Muhammad Kashif Ali et al.
Identity Preserving 3D Head Stylization with Multiview Score Distillation
Bahri Batuhan Bilecen, Ahmet Berke Gokmen, Furkan Güzelant et al.
Adversarial Attention Perturbations for Large Object Detection Transformers
Zachary Yahn, Selim Tekin, Fatih Ilhan et al.
CCL-LGS: Contrastive Codebook Learning for 3D Language Gaussian Splatting
Lei Tian, Xiaomin Li, Liqian Ma et al.
LaRender: Training-Free Occlusion Control in Image Generation via Latent Rendering
Xiaohang Zhan, Dingming Liu
Ponimator: Unfolding Interactive Pose for Versatile Human-human Interaction Animation
Shaowei Liu, Chuan Guo, Bing Zhou et al.
Generalized Few-Shot Point Cloud Segmentation via LLM-Assisted Hyper-Relation Matching
Zhaoyang Li, Yuan Wang, Guoxin Xiong et al.
CleanPose: Category-Level Object Pose Estimation via Causal Learning and Knowledge Distillation
Xiao Lin, Yun Peng, Liuyi Wang et al.
SMGDiff: Soccer Motion Generation using Diffusion Probabilistic Models
Hongdi Yang, Chengyang Li, Zhenxuan Wu et al.
Generative Active Learning for Long-tail Trajectory Prediction via Controllable Diffusion Model
Daehee Park, Monu Surana, Pranav Desai et al.
Generate, Transduct, Adapt: Iterative Transduction with VLMs
Oindrila Saha, Logan Lawrence, Grant Horn et al.
Generative Adversarial Diffusion
U-Chae Jun, Jaeeun Ko, Jiwoo Kang
MixA-Q: Revisiting Activation Sparsity for Vision Transformers from a Mixed-Precision Quantization Perspective
Weitian Wang, Shubham Rai, Cecilia De la Parra et al.
Efficient Autoregressive Shape Generation via Octree-Based Adaptive Tokenization
Kangle Deng, Hsueh-Ti Derek Liu, Yiheng Zhu et al.
EvolvingGrasp: Evolutionary Grasp Generation via Efficient Preference Alignment
Yufei Zhu, Yiming Zhong, Zemin Yang et al.
FaceCraft4D: Animated 3D Facial Avatar Generation from a Single Image
Fei Yin, Mallikarjun Reddy, Chun-Han Yao et al.
Sequential keypoint density estimator: an overlooked baseline of skeleton-based video anomaly detection
Anja Delić, Matej Grcic, Siniša Šegvić
MGSR: 2D/3D Mutual-boosted Gaussian Splatting for High-fidelity Surface Reconstruction under Various Light Conditions
Qingyuan Zhou, Yuehu Gong, Weidong Yang et al.
PseudoMapTrainer: Learning Online Mapping without HD Maps
Christian Löwens, Thorben Funke, Jingchao Xie et al.
MOSAIC: Generating Consistent, Privacy-Preserving Scenes from Multiple Depth Views in Multi-Room Environments
Zhixuan Liu, Haokun Zhu, Rui Chen et al.
Multi-Object Sketch Animation by Scene Decomposition and Motion Planning
Jingyu Liu, Zijie Xin, Yuhan Fu et al.
Leveraging Local Patch Alignment to Seam-cutting for Large Parallax Image Stitching
Tianli Liao, Chenyang Zhao, Lei Li et al.
DIP: Unsupervised Dense In-Context Post-training of Visual Representations
Sophia Sirko-Galouchenko, Spyros Gidaris, Antonin Vobecky et al.
Towards Open-World Generation of Stereo Images and Unsupervised Matching
Feng Qiao, Zhexiao Xiong, Eric Xing et al.
Extending Foundational Monocular Depth Estimators to Fisheye Cameras with Calibration Tokens
Suchisrit Gangopadhyay, Jung Hee Kim, Xien Chen et al.
Stylized-Face: A Million-level Stylized Face Dataset for Face Recognition
Zhengyuan Peng, Jianqing Xu, Yuge Huang et al.
Beyond Spatial Frequency: Pixel-wise Temporal Frequency-based Deepfake Video Detection
Taehoon Kim, Jongwook Choi, Yonghyun Jeong et al.
CoMoGaussian: Continuous Motion-Aware Gaussian Splatting from Motion-Blurred Images
Jungho Lee, DongHyeong Kim, Dogyoon Lee et al.
Transformed Low-rank Adaptation via Tensor Decomposition and Its Applications to Text-to-image Models
Zerui Tao, Yuhta Takida, Naoki Murata et al.
Joint Asymmetric Loss for Learning with Noisy Labels
Jialiang Wang, Xianming Liu, Xiong Zhou et al.
F-Bench: Rethinking Human Preference Evaluation Metrics for Benchmarking Face Generation, Customization, and Restoration
Lu Liu, Huiyu Duan, Qiang Hu et al.
Degradation-Modeled Multipath Diffusion for Tunable Metalens Photography
Jianing Zhang, Jiayi Zhu, Feiyu Ji et al.
Noise2Score3D: Tweedie's Approach for Unsupervised Point Cloud Denoising
Xiangbin Wei, Yuanfeng Wang, Ao Xu et al.
SketchSplat: 3D Edge Reconstruction via Differentiable Multi-view Sketch Splatting
Haiyang Ying, Matthias Zwicker
DNF-Intrinsic: Deterministic Noise-Free Diffusion for Indoor Inverse Rendering
Rongjia Zheng, Qing Zhang, Chengjiang Long et al.
EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception
Sanjoy Chowdhury, Subrata Biswas, Sayan Nag et al.
GCRayDiffusion: Pose-Free Surface Reconstruction via Geometric Consistent Ray Diffusion
Li-Heng Chen, Zi-Xin Zou, Chang Liu et al.
Web Artifact Attacks Disrupt Vision Language Models
Maan Qraitem, Piotr Teterwak, Kate Saenko et al.
A Unified Framework for Motion Reasoning and Generation in Human Interaction
Jeongeun Park, Sungjoon Choi, Sangdoo Yun
Online Generic Event Boundary Detection
Hyung Rok Jung, Daneul Kim, Seunggyun Lim et al.
StruMamba3D: Exploring Structural Mamba for Self-supervised Point Cloud Representation Learning
Chuxin Wang, Yixin Zha, Wenfei Yang et al.
Global Regulation and Excitation via Attention Tuning for Stereo Matching
Jiahao Li, Xinhong Chen, Zhengmin Jiang et al.
Enhancing Transformers Through Conditioned Embedded Tokens
Hemanth Saratchandran, Simon Lucey
Few-Shot Image Quality Assessment via Adaptation of Vision-Language Models
Xudong Li, Zihao Huang, Yan Zhang et al.
LookOut: Real-World Humanoid Egocentric Navigation
Boxiao Pan, Adam Harley, Francis Engelmann et al.
SAMO: A Lightweight Sharpness-Aware Approach for Multi-Task Optimization with Joint Global-Local Perturbation
Hao Ban, Gokul Ram Subramani, Kaiyi Ji
PROGRESSOR: A Perceptually Guided Reward Estimator with Self-Supervised Online Refinement
Tewodros W. Ayalew, Xiao Zhang, Kevin Y Wu et al.
VTimeCoT: Thinking by Drawing for Video Temporal Grounding and Reasoning
Jinglei Zhang, Yuanfan Guo, Rolandos Alexandros Potamias et al.
Multi-modal Multi-platform Person Re-Identification: Benchmark and Method
Ruiyang Ha, Songyi Jiang, Bin Li et al.
GraspCoT: Integrating Physical Property Reasoning for 6-DoF Grasping under Flexible Language Instructions
Xiaomeng Chu, Jiajun Deng, Guoliang You et al.
RoboPearls: Editable Video Simulation for Robot Manipulation
Tao Tang, Likui Zhang, Youpeng Wen et al.
TAViS: Text-bridged Audio-Visual Segmentation with Foundation Models
Ziyang Luo, Nian Liu, Xuguang Yang et al.
PRISM: Reducing Spurious Implicit Biases in Vision-Language Models with LLM-Guided Embedding Projection
Mahdiyar Molahasani, Azadeh Motamedi, Michael Greenspan et al.
GroundFlow: A Plug-in Module for Temporal Reasoning on 3D Point Cloud Sequential Grounding
Zijun Lin, Shuting He, Cheston Tan et al.
Object-level Correlation for Few-Shot Segmentation
Chunlin Wen, Yu Zhang, Jie Fan et al.
Prior2Former - Evidential Modeling of Mask Transformers for Assumption-Free Open-World Panoptic Segmentation
Sebastian Schmidt, Julius Koerner, Dominik Fuchsgruber et al.
Efficient Multi-Person Motion Prediction by Lightweight Spatial and Temporal Interactions
Yuanhong Zheng, Ruixuan Yu, Jian Sun
CoST: Efficient Collaborative Perception From Unified Spatiotemporal Perspective
Zongheng Tang, Yi Liu, Yifan Sun et al.
Training-Free Personalization via Retrieval and Reasoning on Fingerprints
Deepayan Das, Davide Talon, Yiming Wang et al.
Teaching VLMs to Localize Specific Objects from In-context Examples
Sivan Doveh, Nimrod Shabtay, Eli Schwartz et al.
Cross-Architecture Distillation Made Simple with Redundancy Suppression
Weijia Zhang, Yuehao Liu, Wu Ran et al.
AMD: Adaptive Momentum and Decoupled Contrastive Learning Framework for Robust Long-Tail Trajectory Prediction
Bin Rao, Haicheng Liao, Yanchen Guan et al.
ViCTr: Vital Consistency Transfer for Pathology Aware Image Synthesis
Onkar Susladkar, Gayatri Deshmukh, Yalcin Tur et al.
Can3Tok: Canonical 3D Tokenization and Latent Modeling of Scene-Level 3D Gaussians
Quankai Gao, Iliyan Georgiev, Tuanfeng Wang et al.
Sim-DETR: Unlock DETR for Temporal Sentence Grounding
Jiajin Tang, Zhengxuan Wei, Yuchen Zhu et al.
Alleviating Textual Reliance in Medical Language-guided Segmentation via Prototype-driven Semantic Approximation
Shuchang Ye, Usman Naseem, Mingyuan Meng et al.
Seeing the Trees for the Forest: Rethinking Weakly-Supervised Medical Visual Grounding
Huy Ta, Duy Anh Huynh, Yutong Xie et al.
Unlocking Constraints: Source-Free Occlusion-Aware Seamless Segmentation
Yihong Cao, Jiaming Zhang, Xu Zheng et al.
CAFA: a Controllable Automatic Foley Artist
Roi Benita, Michael Finkelson, Tavi Halperin et al.
Everything is a Video: Unifying Modalities through Next-Frame Prediction
G Thomas Hudson, Dean Slack, Thomas Winterbottom et al.
Diorama: Unleashing Zero-shot Single-view 3D Indoor Scene Modeling
Qirui Wu, Denys Iliash, Daniel Ritchie et al.
Interpretable Zero-Shot Learning with Locally-Aligned Vision-Language Model
Shiming Chen, Bowen Duan, Salman Khan et al.
Boosting Vision Semantic Density with Anatomy Normality Modeling for Medical Vision-language Pre-training
Weiwei Cao, Jianpeng Zhang, Zhongyi Shui et al.
Refer to Any Segmentation Mask Group With Vision-Language Prompts
Shengcao Cao, Zijun Wei, Jason Kuen et al.
A Hidden Stumbling Block in Generalized Category Discovery: Distracted Attention
Qiyu Xu, Zhanxuan Hu, Yu Duan et al.
Towards a Universal 3D Medical Multi-modality Generalization via Learning Personalized Invariant Representation
Zhaorui Tan, Xi Yang, Tan Pan et al.
TRACE: Learning 3D Gaussian Physical Dynamics from Multi-view Videos
Jinxi Li, Ziyang Song, Bo Yang
VideoAds for Fast-Paced Video Understanding
Zheyuan Zhang, Wanying Dou, Linkai Peng et al.
Triad: Empowering LMM-based Anomaly Detection with Expert-guided Region-of-Interest Tokenizer and Manufacturing Process
Yuanze Li, Shihao Yuan, Haolin Wang et al.
Adaptive Articulated Object Manipulation On The Fly with Foundation Model Reasoning and Part Grounding
Xiaojie Zhang, Yuanfei Wang, Ruihai Wu et al.
MR-FIQA: Face Image Quality Assessment with Multi-Reference Representations from Synthetic Data Generation
Fu-Zhao Ou, Chongyi Li, Shiqi Wang et al.
Uncertainty-Driven Expert Control: Enhancing the Reliability of Medical Vision-Language Models
Xiao Liang, Di Wang, Zhicheng Jiao et al.
Preacher: Paper-to-Video Agentic System
Jingwei Liu, Ling Yang, Hao Luo et al.
Gait-X: Exploring X modality for Generalized Gait Recognition
Zengbin Wang, Saihui Hou, Junjie Li et al.
From Objects to Events: Unlocking Complex Visual Understanding in Object Detectors via LLM-guided Symbolic Reasoning
Yuhui Zeng, Haoxiang Wu, Wenjie Nie et al.
MaTVLM: Hybrid Mamba-Transformer for Efficient Vision-Language Modeling
Yingyue Li, Bencheng Liao, Wenyu Liu et al.
Diffusion-Based Imaginative Coordination for Bimanual Manipulation
Huilin Xu, Jian Ding, Jiakun Xu et al.
Improving Noise Efficiency in Privacy-preserving Dataset Distillation
Runkai Zheng, Vishnu Dasu, Yinong Wang et al.
UniConvNet: Expanding Effective Receptive Field while Maintaining Asymptotically Gaussian Distribution for ConvNets of Any Scale
Yuhao Wang, Wei Xi
G2SF: Geometry-Guided Score Fusion for Multimodal Industrial Anomaly Detection
Chengyu Tao, Xuanming Cao, Juan Du
Learn2Synth: Learning Optimal Data Synthesis Using Hypergradients for Brain Image Segmentation
Xiaoling Hu, Xiangrui Zeng, Oula Puonti et al.
Structure-aware Semantic Discrepancy and Consistency for 3D Medical Image Self-supervised Learning
Tan Pan, Zhaorui Tan, Kaiyu Guo et al.
COIN: Confidence Score-Guided Distillation for Annotation-Free Cell Segmentation
Sanghyun Jo, Seo Lee, Seungwoo Lee et al.
ViT-Linearizer: Distilling Quadratic Knowledge into Linear-Time Vision Models
Guoyizhe Wei, Rama Chellappa
Exploiting Domain Properties in Language-Driven Domain Generalization for Semantic Segmentation
Seogkyu Jeon, Kibeom Hong, Hyeran Byun
Towards Adversarial Robustness via Debiased High-Confidence Logit Alignment
Kejia Zhang, Juanjuan Weng, Zhiming Luo et al.
MEMFOF: High-Resolution Training for Memory-Efficient Multi-Frame Optical Flow Estimation
Vladislav Bargatin, Egor Chistov, Alexander Yakovenko et al.
Training-Free Generation of Temporally Consistent Rewards from VLMs
Yinuo Zhao, Jiale Yuan, Zhiyuan Xu et al.
Guiding Diffusion-Based Articulated Object Generation by Partial Point Cloud Alignment and Physical Plausibility Constraints
Jens U. Kreber, Joerg Stueckler
Adaptive Routing of Text-to-Image Generation Requests Between Large Cloud Model and Light-Weight Edge Model
Zewei Xin, Qinya Li, Chaoyue Niu et al.
Towards Cross-modal Backward-compatible Representation Learning for Vision-Language Models
Young Kyun Jang, Ser-Nam Lim
VSC: Visual Search Compositional Text-to-Image Diffusion Model
Do Dat, Nam Hyeon-Woo, Po-Yuan Mao et al.
A Structure-aware and Motion-adaptive Framework for 3D Human Pose Estimation with Mamba
Ye Lu, Jie Wang, Jianjun Gao et al.
DCT-Shield: A Robust Frequency Domain Defense against Malicious Image Editing
Aniruddha Bala, Rohit Chowdhury, Rohan Jaiswal et al.
AdvDreamer Unveils: Are Vision-Language Models Truly Ready for Real-World 3D Variations?
Shouwei Ruan, Hanqing Liu, Yao Huang et al.
GausSim: Foreseeing Reality by Gaussian Simulator for Elastic Objects
Yidi Shao, Mu Huang, Chen Change Loy et al.
Learnable Feature Patches and Vectors for Boosting Low-light Image Enhancement without External Knowledge
Xiaogang Xu, Jiafei Wu, Qingsen Yan et al.
PASTA: Part-Aware Sketch-to-3D Shape Generation with Text-Aligned Prior
Seunggwan Lee, Hwanhee Jung, ByoungSoo Koh et al.
SciVid: Cross-Domain Evaluation of Video Models in Scientific Applications
Yana Hasson, Pauline Luc, Liliane Momeni et al.
Acknowledging Focus Ambiguity in Visual Questions
Chongyan Chen, Yu-Yun Tseng, Zhuoheng Li et al.
Trust but Verify: Programmatic VLM Evaluation in the Wild
Viraj Prabhu, Senthil Purushwalkam, An Yan et al.
FinMMR: Make Financial Numerical Reasoning More Multimodal, Comprehensive, and Challenging
Zichen Tang, Haihong E, Jiacheng Liu et al.
Supercharged One-step Text-to-Image Diffusion Models with Negative Prompts
Viet Nguyen, Anh Nguyen, Trung Dao et al.
InfGen: A Resolution-Agnostic Paradigm for Scalable Image Synthesis
Tao Han, Wanghan Xu, Junchao Gong et al.
Denoising Token Prediction in Masked Autoregressive Models
Ting Yao, Yehao Li, Yingwei Pan et al.
Continual Multiple Instance Learning with Enhanced Localization for Histopathological Whole Slide Image Analysis
Byung Hyun Lee, Wongi Jeong, Woojae Han et al.
EC-Flow: Enabling Versatile Robotic Manipulation from Action-Unlabeled Videos via Embodiment-Centric Flow
Yixiang Chen, Peiyan Li, Yan Huang et al.
DAViD: Data-efficient and Accurate Vision Models from Synthetic Data
Fatemeh Saleh, Sadegh Aliakbarian, Charlie Hewitt et al.
ViLU: Learning Vision-Language Uncertainties for Failure Prediction
Marc Lafon, Yannis Karmim, Julio Silva-Rodríguez et al.
Physics Context Builders: A Modular Framework for Physical Reasoning in Vision-Language Models
Vahid Balazadeh, Mohammadmehdi Ataei, Hyunmin Cheong et al.
Subjective Camera 1.0: Bridging Human Cognition and Visual Reconstruction through Sequence-Aware Sketch-Guided Diffusion
Haoyang Chen, Dongfang Sun, Caoyuan Ma et al.
Trade-offs in Image Generation: How Do Different Dimensions Interact?
Sicheng Zhang, Binzhu Xie, Zhonghao Yan et al.
Color Matching Using Hypernetwork-Based Kolmogorov-Arnold Networks
Artem Nikonorov, Georgy Perevozchikov, Andrei Korepanov et al.
CMT: A Cascade MAR with Topology Predictor for Multimodal Conditional CAD Generation
Jianyu Wu, Yizhou Wang, Xiangyu Yue et al.
FedMVP: Federated Multimodal Visual Prompt Tuning for Vision-Language Models
Mainak Singha, Subhankar Roy, Sarthak Mehrotra et al.
Deep Incomplete Multi-view Clustering with Distribution Dual-Consistency Recovery Guidance
Jiaqi Jin, Siwei Wang, Zhibin Dong et al.
Learning Robust Image Watermarking with Lossless Cover Recovery
Jiale Chen, Wei Wang, Chongyang Shi et al.
Seeing Through Deepfakes: A Human-Inspired Framework for Multi-Face Detection
Juan Hu, Shaojing Fan, Terence Sim
M-SpecGene: Generalized Foundation Model for RGBT Multispectral Vision
Kailai Zhou, Fuqiang Yang, Shixian Wang et al.
MeshMamba: State Space Models for Articulated 3D Mesh Generation and Reconstruction
Yusuke Yoshiyasu, Leyuan Sun, Ryusuke Sagawa
MosaicDiff: Training-free Structural Pruning for Diffusion Model Acceleration Reflecting Pretraining Dynamics
Bowei Guo, Shengkun Tang, Cong Zeng et al.
Decoding Correlation-Induced Misalignment in the Stable Diffusion Workflow for Text-to-Image Generation
Yunze Tong, Fengda Zhang, Didi Zhu et al.
SparseVILA: Decoupling Visual Sparsity for Efficient VLM Inference
Samir Khaki, Junxian Guo, Jiaming Tang et al.
CompSlider: Compositional Slider for Disentangled Multiple-Attribute Image Generation
Zixin Zhu, Kevin Duarte, Mamshad Nayeem Rizve et al.
PLMP - Point-Line Minimal Problems for Projective SfM
Kim Kiehn, Albin Ahlbäck, Kathlén Kohn
TITAN-Guide: Taming Inference-Time Alignment for Guided Text-to-Video Diffusion Models
Christian Simon, Masato Ishii, Akio Hayakawa et al.
PlaceIt3D: Language-Guided Object Placement in Real 3D Scenes
Ahmed Abdelreheem, Filippo Aleotti, Jamie Watson et al.
VITAL: More Understandable Feature Visualization through Distribution Alignment and Relevant Information Flow
Ada Görgün, Bernt Schiele, Jonas Fischer
SpiLiFormer: Enhancing Spiking Transformers with Lateral Inhibition
Zeqi Zheng, Yanchen Huang, Yingchao Yu et al.
PCR-GS: COLMAP-Free 3D Gaussian Splatting via Pose Co-Regularizations
Yu Wei, Jiahui Zhang, Xiaoqin Zhang et al.
UMDATrack: Unified Multi-Domain Adaptive Tracking Under Adverse Weather Conditions
Siyuan Yao, Rui Zhu, Ziqi Wang et al.
Membership Inference Attacks with False Discovery Rate Control
Chenxu Zhao, Wei Qian, Aobo Chen et al.
D3QE: Learning Discrete Distribution Discrepancy-aware Quantization Error for Autoregressive-Generated Image Detection
Yanran Zhang, Bingyao Yu, Yu Zheng et al.
FICGen: Frequency-Inspired Contextual Disentanglement for Layout-driven Degraded Image Generation
Wenzhuang Wang, Yifan Zhao, Mingcan Ma et al.
PLADIS: Pushing the Limits of Attention in Diffusion Models at Inference Time by Leveraging Sparsity
Kwanyoung Kim, Byeongsu Sim
MeshPad: Interactive Sketch-Conditioned Artist-Reminiscent Mesh Generation and Editing
Haoxuan Li, Ziya Erkoç, Lei Li et al.
Modeling Human Gaze Behavior with Diffusion Models for Unified Scanpath Prediction
Giuseppe Cartella, Vittorio Cuculo, Alessandro D'Amelio et al.
Outlier-Aware Post-Training Quantization for Image Super-Resolution
Hailing Wang, Jianglin Lu, Yitian Zhang et al.
A Linear N-Point Solver for Structure and Motion from Asynchronous Tracks
Hang Su, Yunlong Feng, Daniel Gehrig et al.
Normal and Abnormal Pathology Knowledge-Augmented Vision-Language Model for Anomaly Detection in Pathology Images
Jinsol Song, Jiamu Wang, Anh Nguyen et al.
FreeCus: Free Lunch Subject-driven Customization in Diffusion Transformers
Yanbing Zhang, Zhe Wang, Qin Zhou et al.
OmniVTON: Training-Free Universal Virtual Try-On
Zhaotong Yang, Yuhui Li, Shengfeng He et al.
DAMap: Distance-aware MapNet for High Quality HD Map Construction
Jinpeng Dong, Chen Li, Yutong Lin et al.
DAP-MAE: Domain-Adaptive Point Cloud Masked Autoencoder for Effective Cross-Domain Learning
Ziqi Gao, Qiufu Li, Linlin Shen
Dataset Ownership Verification for Pre-trained Masked Models
Yuechen Xie, Jie Song, Yicheng Shan et al.
Neural Compression for 3D Geometry Sets
Siyu Ren, Junhui Hou, Weiyao Lin et al.
IMG: Calibrating Diffusion Models via Implicit Multimodal Guidance
Jiayi Guo, Chuanhao Yan, Xingqian Xu et al.
From Holistic to Localized: Local Enhanced Adapters for Efficient Visual Instruction Fine-Tuning
Pengkun Jiao, Bin Zhu, Jingjing Chen et al.
Beyond Simple Edits: Composed Video Retrieval with Dense Modifications
Omkar Thawakar, Dmitry Demidov, Ritesh Thawkar et al.
SegmentDreamer: Towards High-fidelity Text-to-3D Synthesis with Segmented Consistency Trajectory Distillation
Jiahao Zhu, Zixuan Chen, Guangcong Wang et al.
Balanced Sharpness-Aware Minimization for Imbalanced Regression
Yahao Liu, Qin Wang, Lixin Duan et al.
You Share Beliefs, I Adapt: Progressive Heterogeneous Collaborative Perception
Hao Si, Ehsan Javanmardi, Manabu Tsukada
Robust Unfolding Network for HDR Imaging with Modulo Cameras
Zhile Chen, Hui Ji
IAP: Invisible Adversarial Patch Attack through Perceptibility-Aware Localization and Perturbation Optimization
Subrat Kishore Dutta, Xiao Zhang
Bridging the Gap Between Ideal and Real-world Evaluation: Benchmarking AI-Generated Image Detection in Challenging Scenarios
Chunxiao Li, Xiaoxiao Wang, Meiling Li et al.
Ask and Remember: A Questions-Only Replay Strategy for Continual Visual Question Answering
Imad Eddine Marouf, Enzo Tartaglione, Stéphane Lathuilière et al.
Benefit From Seen: Enhancing Open-Vocabulary Object Detection by Bridging Visual and Textual Co-Occurrence Knowledge
Yanqi Li, Jianwei Niu, Tao Ren
HazeFlow: Revisit Haze Physical Model as ODE and Non-Homogeneous Haze Generation for Real-World Dehazing
Junseong Shin, Seungwoo Chung, Yunjeong Yang et al.
Dual Reciprocal Learning of Language-based Human Motion Understanding and Generation
Chen Liang, Zhicheng Shi, Wenguan Wang et al.
Text Embedding Knows How to Quantize Text-Guided Diffusion Models
Hongjae Lee, Myungjun Son, Dongjea Kang et al.
Generating, Fast and Slow: Scalable Parallel Video Generation with Video Interface Networks
Bhishma Dedhia, David Bourgin, Krishna Kumar Singh et al.
Video Color Grading via Look-Up Table Generation
Seunghyun Shin, Dongmin Shin, Jisu Shin et al.
S³E: Self-Supervised State Estimation for Radar-Inertial System
Shengpeng Wang, Yulong Xie, Qing Liao et al.
Robust Low-light Scene Restoration via Illumination Transition
Ze Li, Feng Zhang, Xiatian Zhu et al.
ForCenNet: Foreground-Centric Network for Document Image Rectification
Peng Cai, liqiang liqiang, Kaicheng Yang et al.
METEOR: Multi-Encoder Collaborative Token Pruning for Efficient Vision Language Models
Yuchen Liu, Yaoming Wang, Bowen Shi et al.