Most Cited CVPR "clean-label backdoor attack" Papers
5,589 papers found • Page 7 of 28
Conference
Reversible Decoupling Network for Single Image Reflection Removal
Hao Zhao, Mingjia Li, Qiming Hu et al.
GigaHands: A Massive Annotated Dataset of Bimanual Hand Activities
Rao Fu, Dingxi Zhang, Alex Jiang et al.
Omnia de EgoTempo: Benchmarking Temporal Understanding of Multi-Modal LLMs in Egocentric Videos
Chiara Plizzari, Alessio Tonioni, Yongqin Xian et al.
Bridging Modalities: Improving Universal Multimodal Retrieval by Multimodal Large Language Models
Xin Zhang, Yanzhao Zhang, Wen Xie et al.
NitroFusion: High-Fidelity Single-Step Diffusion through Dynamic Adversarial Training
Dar-Yen Chen, Hmrishav Bandyopadhyay, Kai Zou et al.
Assessing and Learning Alignment of Unimodal Vision and Language Models
Le Zhang, Qian Yang, Aishwarya Agrawal
X-Dyna: Expressive Dynamic Human Image Animation
Di Chang, Hongyi Xu, You Xie et al.
DynamicScaler: Seamless and Scalable Video Generation for Panoramic Scenes
Jinxiu Liu, Shaoheng Lin, Yinxiao Li et al.
MVSAnywhere: Zero-Shot Multi-View Stereo
Sergio Izquierdo, Mohamed Sayed, Michael Firman et al.
Generative 3D Part Assembly via Part-Whole-Hierarchy Message Passing
Bi'an Du, Xiang Gao, Wei Hu et al.
Dissecting and Mitigating Diffusion Bias via Mechanistic Interpretability
Yingdong Shi, Changming Li, Yifan Wang et al.
What You See is What You GAN: Rendering Every Pixel for High-Fidelity Geometry in 3D GANs
Alex Trevithick, Matthew Chan, Towaki Takikawa et al.
TASeg: Temporal Aggregation Network for LiDAR Semantic Segmentation
Xiaopei Wu, Yuenan Hou, Xiaoshui Huang et al.
PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs
Michael Dorkenwald, Nimrod Barazani, Cees G. M. Snoek et al.
UnionFormer: Unified-Learning Transformer with Multi-View Representation for Image Manipulation Detection and Localization
Shuaibo Li, Wei Ma, Jianwei Guo et al.
R-TPT: Improving Adversarial Robustness of Vision-Language Models through Test-Time Prompt Tuning
Lijun Sheng, Jian Liang, Zilei Wang et al.
ReCoRe: Regularized Contrastive Representation Learning of World Model
Rudra P, K. Poudel, Harit Pandya et al.
Learning Instance-Aware Correspondences for Robust Multi-Instance Point Cloud Registration in Cluttered Scenes
Zhiyuan Yu, Zheng Qin, lintao zheng et al.
Towards Universal Soccer Video Understanding
Jiayuan Rao, Haoning Wu, Hao Jiang et al.
SleeperMark: Towards Robust Watermark against Fine-Tuning Text-to-image Diffusion Models
Zilan Wang, Junfeng Guo, Jiacheng Zhu et al.
InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption
Tiehan Fan, Kepan Nan, Rui Xie et al.
MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation
Weijia Wu, Mingyu Liu, Zeyu Zhu et al.
SkillMimic: Learning Basketball Interaction Skills from Demonstrations
Yinhuai Wang, Qihan Zhao, Runyi Yu et al.
Estimating Noisy Class Posterior with Part-level Labels for Noisy Label Learning
Rui Zhao, Bin Shi, Jianfei Ruan et al.
ProTeCt: Prompt Tuning for Taxonomic Open Set Classification
Tz-Ying Wu, Chih-Hui Ho, Nuno Vasconcelos
Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models
Zhihang Liu, Chen-Wei Xie, Pandeng Li et al.
Personalized Preference Fine-tuning of Diffusion Models
Meihua Dang, Anikait Singh, Linqi Zhou et al.
Learning Intra-view and Cross-view Geometric Knowledge for Stereo Matching
Rui Gong, Weide Liu, ZAIWANG GU et al.
DPU: Dynamic Prototype Updating for Multimodal Out-of-Distribution Detection
Li Li, Huixian Gong, Hao Dong et al.
HandBooster: Boosting 3D Hand-Mesh Reconstruction by Conditional Synthesis and Sampling of Hand-Object Interactions
Hao Xu, Li Haipeng, Yinqiao Wang et al.
DiffSCI: Zero-Shot Snapshot Compressive Imaging via Iterative Spectral Diffusion Model
Zhenghao Pan, Haijin Zeng, Jiezhang Cao et al.
IDProtector: An Adversarial Noise Encoder to Protect Against ID-Preserving Image Generation
Yiren Song, Pei Yang, Hai Ci et al.
Adversarial Backdoor Attack by Naturalistic Data Poisoning on Trajectory Prediction in Autonomous Driving
Mozhgan Pourkeshavarz, Mohammad Sabokrou, Amir Rasouli
DRAWER: Digital Reconstruction and Articulation With Environment Realism
Hongchi Xia, Entong Su, Marius Memmel et al.
SURE: SUrvey REcipes for building reliable and robust deep networks
Yuting Li, Yingyi Chen, Xuanlong Yu et al.
Vision-Language Gradient Descent-driven All-in-One Deep Unfolding Networks
Haijin Zeng, Xiangming Wang, Yongyong Chen et al.
Rethinking End-to-End 2D to 3D Scene Segmentation in Gaussian Splatting
Runsong Zhu, Shi Qiu, ZHENGZHE LIU et al.
Uni4D: Unifying Visual Foundation Models for 4D Modeling from a Single Video
David Yifan Yao, Albert J. Zhai, Shenlong Wang
Lost in Translation, Found in Context: Sign Language Translation with Contextual Cues
Youngjoon Jang, Haran Raajesh, Liliane Momeni et al.
Robust Distillation via Untargeted and Targeted Intermediate Adversarial Samples
Junhao Dong, Piotr Koniusz, Junxi Chen et al.
SAIST: Segment Any Infrared Small Target Model Guided by Contrastive Language-Image Pretraining
Mingjin Zhang, Xiaolong Li, Fei Gao et al.
Scaling Inference Time Compute for Diffusion Models
Nanye Ma, Shangyuan Tong, Haolin Jia et al.
Retrieving Semantics from the Deep: an RAG Solution for Gesture Synthesis
M. Hamza Mughal, Rishabh Dabral, Merel CJ Scholman et al.
NeRF Analogies: Example-Based Visual Attribute Transfer for NeRFs
Michael Fischer, Zhengqin Li, Thu Nguyen-Phuoc et al.
Bootstrapping Chest CT Image Understanding by Distilling Knowledge from X-ray Expert Models
Weiwei Cao, Jianpeng Zhang, Yingda Xia et al.
Region-Based Representations Revisited
Michal Shlapentokh-Rothman, Ansel Blume, Yao Xiao et al.
MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research
James Burgess, Jeffrey J Nirschl, Laura Bravo-Sánchez et al.
In2SET: Intra-Inter Similarity Exploiting Transformer for Dual-Camera Compressive Hyperspectral Imaging
Xin Wang, Lizhi Wang, Xiangtian Ma et al.
Docopilot: Improving Multimodal Models for Document-Level Understanding
Yuchen Duan, Zhe Chen, Yusong Hu et al.
JamMa: Ultra-lightweight Local Feature Matching with Joint Mamba
Xiaoyong Lu, Songlin Du
Slice3D: Multi-Slice Occlusion-Revealing Single View 3D Reconstruction
Yizhi Wang, Wallace Lira, Wenqi Wang et al.
Through-The-Mask: Mask-based Motion Trajectories for Image-to-Video Generation
Guy Yariv, Yuval Kirstain, Amit Zohar et al.
HRAvatar: High-Quality and Relightable Gaussian Head Avatar
Dongbin Zhang, Yunfei Liu, Lijian Lin et al.
Synthetic Data is an Elegant GIFT for Continual Vision-Language Models
Bin Wu, Wuxuan Shi, Jinqiao Wang et al.
Change3D: Revisiting Change Detection and Captioning from A Video Modeling Perspective
Duowang Zhu, Xiaohu Huang, Haiyan Huang et al.
VisionArena: 230k Real World User-VLM Conversations with Preference Labels
Christopher Chou, Lisa Dunlap, Wei-Lin Chiang et al.
MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling
Jian Yang, Dacheng Yin, Yizhou Zhou et al.
Consistent and Controllable Image Animation with Motion Diffusion Models
Xin Ma, Yaohui Wang, Gengyun Jia et al.
Scenario Dreamer: Vectorized Latent Diffusion for Generating Driving Simulation Environments
Luke Rowe, Roger Girgis, Anthony Gosselin et al.
Human Motion Instruction Tuning
Lei Li, Sen Jia, Jianhao Wang et al.
Generative Photography: Scene-Consistent Camera Control for Realistic Text-to-Image Synthesis
Yu Yuan, Xijun Wang, Yichen Sheng et al.
Unlearning through Knowledge Overwriting: Reversible Federated Unlearning via Selective Sparse Adapter
Zhengyi Zhong, Weidong Bao, Ji Wang et al.
LQMFormer: Language-aware Query Mask Transformer for Referring Image Segmentation
Nisarg Shah, Vibashan VS, Vishal M. Patel
MBQ: Modality-Balanced Quantization for Large Vision-Language Models
Shiyao Li, Yingchun Hu, Xuefei Ning et al.
Effective Video Mirror Detection with Inconsistent Motion Cues
Alex Warren, Ke Xu, Jiaying Lin et al.
GDA: Generalized Diffusion for Robust Test-time Adaptation
Yun-Yun Tsai, Fu-Chen Chen, Albert Chen et al.
Single-View Scene Point Cloud Human Grasp Generation
Yan-Kang Wang, Chengyi Xing, Yi-Lin Wei et al.
ECVC: Exploiting Non-Local Correlations in Multiple Frames for Contextual Video Compression
Wei Jiang, Junru Li, Kai Zhang et al.
ShotAdapter: Text-to-Multi-Shot Video Generation with Diffusion Models
Ozgur Kara, Krishna Kumar Singh, Feng Liu et al.
Causal Composition Diffusion Model for Closed-loop Traffic Generation
Haohong Lin, Xin Huang, Tung Phan-Minh et al.
UniHuman: A Unified Model For Editing Human Images in the Wild
Nannan Li, Qing Liu, Krishna Kumar Singh et al.
USE: Universal Segment Embeddings for Open-Vocabulary Image Segmentation
Xiaoqi Wang, Wenbin He, Xiwei Xuan et al.
SALAD: Skeleton-aware Latent Diffusion for Text-driven Motion Generation and Editing
Seokhyeon Hong, Chaelin Kim, Serin Yoon et al.
HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models
Runhui Huang, Xinpeng Ding, Chunwei Wang et al.
UniNet: A Contrastive Learning-guided Unified Framework with Feature Selection for Anomaly Detection
Shun Wei, Jielin Jiang, Xiaolong Xu
Rethinking Training for De-biasing Text-to-Image Generation: Unlocking the Potential of Stable Diffusion
Eunji Kim, Siwon Kim, Minjun Park et al.
Move Anything with Layered Scene Diffusion
Jiawei Ren, Mengmeng Xu, Jui-Chieh Wu et al.
Real-World Mobile Image Denoising Dataset with Efficient Baselines
Roman Flepp, Andrey Ignatov, Radu Timofte et al.
MCPNet: An Interpretable Classifier via Multi-Level Concept Prototypes
Bor Shiun Wang, Chien-Yi Wang, Wei-Chen Chiu
Partial-to-Partial Shape Matching with Geometric Consistency
Viktoria Ehm, Maolin Gao, Paul Roetzer et al.
Event-based Video Super-Resolution via State Space Models
Zeyu Xiao, Xinchao Wang
Lift3D: Zero-Shot Lifting of Any 2D Vision Model to 3D
Mukund Varma T, Peihao Wang, Zhiwen Fan et al.
How to Handle Sketch-Abstraction in Sketch-Based Image Retrieval?
Subhadeep Koley, Ayan Kumar Bhunia, Aneeshan Sain et al.
VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning
Xueqing Wu, Yuheng Ding, Bingxuan Li et al.
FedBiP: Heterogeneous One-Shot Federated Learning with Personalized Latent Diffusion Models
Haokun Chen, Hang Li, Yao Zhang et al.
SportsSloMo: A New Benchmark and Baselines for Human-centric Video Frame Interpolation
Jiaben Chen, Huaizu Jiang
Representing Part-Whole Hierarchies in Foundation Models by Learning Localizability Composability and Decomposability from Anatomy via Self Supervision
Mohammad Reza Hosseinzadeh Taher, Michael Gotway, Jianming Liang
RoDLA: Benchmarking the Robustness of Document Layout Analysis Models
Yufan Chen, Jiaming Zhang, Kunyu Peng et al.
3D Neural Edge Reconstruction
Lei Li, Songyou Peng, Zehao Yu et al.
CVT-xRF: Contrastive In-Voxel Transformer for 3D Consistent Radiance Fields from Sparse Inputs
Yingji Zhong, Lanqing Hong, Zhenguo Li et al.
TANGO: Training-free Embodied AI Agents for Open-world Tasks
Filippo Ziliotto, Tommaso Campari, Luciano Serafini et al.
ASIGN: An Anatomy-aware Spatial Imputation Graphic Network for 3D Spatial Transcriptomics
Junchao Zhu, Ruining Deng, Tianyuan Yao et al.
MotionStone: Decoupled Motion Intensity Modulation with Diffusion Transformer for Image-to-Video Generation
Shuwei Shi, Biao Gong, Xi Chen et al.
An Empirical Study of the Generalization Ability of Lidar 3D Object Detectors to Unseen Domains
George Eskandar
SketchINR: A First Look into Sketches as Implicit Neural Representations
Hmrishav Bandyopadhyay, Ayan Kumar Bhunia, Pinaki Nath Chowdhury et al.
HOIGen-1M: A Large-scale Dataset for Human-Object Interaction Video Generation
Kun Liu, Qi Liu, Xinchen Liu et al.
3D Multi-frame Fusion for Video Stabilization
Zhan Peng, Xinyi Ye, Weiyue Zhao et al.
MultiPly: Reconstruction of Multiple People from Monocular Video in the Wild
Zeren Jiang, Chen Guo, Manuel Kaufmann et al.
Novel Class Discovery for Ultra-Fine-Grained Visual Categorization
Qi Jia, Yaqi Cai, Qi Jia et al.
ClearSight: Visual Signal Enhancement for Object Hallucination Mitigation in Multimodal Large Language Models
Hao Yin, Guangzong Si, Zilei Wang
Image Over Text: Transforming Formula Recognition Evaluation with Character Detection Matching
Bin Wang, Fan Wu, Linke Ouyang et al.
AG-VPReID: A Challenging Large-Scale Benchmark for Aerial-Ground Video-based Person Re-Identification
Huy Nguyen, Kien Nguyen Thanh, Akila Pemasiri et al.
Boosting Adversarial Training via Fisher-Rao Norm-based Regularization
Xiangyu Yin, Wenjie Ruan
F3Loc: Fusion and Filtering for Floorplan Localization
Changan Chen, Rui Wang, Christoph Vogel et al.
PH-Net: Semi-Supervised Breast Lesion Segmentation via Patch-wise Hardness
Siyao Jiang, Huisi Wu, Junyang Chen et al.
Spatial-Temporal Graph Diffusion Policy with Kinematic Modeling for Bimanual Robotic Manipulation
Qi Lv, Hao Li, Xiang Deng et al.
Functional Diffusion
Biao Zhang, Peter Wonka
AKiRa: Augmentation Kit on Rays for Optical Video Generation
Xi Wang, Robin Courant, Marc Christie et al.
Learning Hazing to Dehazing: Towards Realistic Haze Generation for Real-World Image Dehazing
Ruiyi Wang, Yushuo Zheng, Zicheng Zhang et al.
ReasonGrounder: LVLM-Guided Hierarchical Feature Splatting for Open-Vocabulary 3D Visual Grounding and Reasoning
Zhenyang Liu, Yikai Wang, Sixiao Zheng et al.
Patient-Level Anatomy Meets Scanning-Level Physics: Personalized Federated Low-Dose CT Denoising Empowered by Large Language Model
Ziyuan Yang, Yingyu Chen, Zhiwen Wang et al.
ACL: Activating Capability of Linear Attention for Image Restoration
Yubin Gu, Yuan Meng, Jiayi Ji et al.
S²MVTC: a Simple yet Efficient Scalable Multi-View Tensor Clustering
Zhen Long, Qiyuan Wang, Yazhou Ren et al.
Unsupervised Foundation Model-Agnostic Slide-Level Representation Learning
Tim Lenz, Peter Neidlinger, Marta Ligero et al.
StarGen: A Spatiotemporal Autoregression Framework with Video Diffusion Model for Scalable and Controllable Scene Generation
Shangjin Zhai, Zhichao Ye, Jialin Liu et al.
Discriminative Sample-Guided and Parameter-Efficient Feature Space Adaptation for Cross-Domain Few-Shot Learning
Rashindrie Perera, Saman Halgamuge
Generative Powers of Ten
Xiaojuan Wang, Janne Kontkanen, Brian Curless et al.
RoomTour3D: Geometry-Aware Video-Instruction Tuning for Embodied Navigation
Mingfei Han, Liang Ma, Kamila Zhumakhanova et al.
Image Generation Diversity Issues and How to Tame Them
Mischa Dombrowski, Weitong Zhang, Hadrien Reynaud et al.
EasyHOI: Unleashing the Power of Large Models for Reconstructing Hand-Object Interactions in the Wild
Yumeng Liu, Xiaoxiao Long, Zemin Yang et al.
Emergent Open-Vocabulary Semantic Segmentation from Off-the-shelf Vision-Language Models
Luo Jiayun, Siddhesh Khandelwal, Leonid Sigal et al.
MotionPro: A Precise Motion Controller for Image-to-Video Generation
Zhongwei Zhang, Fuchen Long, Zhaofan Qiu et al.
NexusGS: Sparse View Synthesis with Epipolar Depth Priors in 3D Gaussian Splatting
Yulong Zheng, Zicheng Jiang, Shengfeng He et al.
VideoICL: Confidence-based Iterative In-context Learning for Out-of-Distribution Video Understanding
Kangsan Kim, Geon Park, Youngwan Lee et al.
SmartEraser: Remove Anything from Images using Masked-Region Guidance
Longtao Jiang, Zhendong Wang, Jianmin Bao et al.
Defense Against Adversarial Attacks on No-Reference Image Quality Models with Gradient Norm Regularization
Yujia Liu, Chenxi Yang, Dingquan Li et al.
One-for-More: Continual Diffusion Model for Anomaly Detection
Xiaofan Li, Xin Tan, Zhuo Chen et al.
Towards a Universal Synthetic Video Detector: From Face or Background Manipulations to Fully AI-Generated Content
Rohit Kundu, Hao Xiong, Vishal Mohanty et al.
STEREO: A Two-Stage Framework for Adversarially Robust Concept Erasing from Text-to-Image Diffusion Models
Koushik Srivatsan, Fahad Shamshad, Muzammal Naseer et al.
Unexplored Faces of Robustness and Out-of-Distribution: Covariate Shifts in Environment and Sensor Domains
Eunsu Baek, Keondo Park, Ji-yoon Kim et al.
The Illusion of Unlearning: The Unstable Nature of Machine Unlearning in Text-to-Image Diffusion Models
Naveen George, Karthik Nandan Dasaraju, Rutheesh Reddy Chittepu et al.
CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation
Wei Chen, Lin Li, Yongqi Yang et al.
MVPortrait: Text-Guided Motion and Emotion Control for Multi-view Vivid Portrait Animation
Yukang Lin, Hokit Fung, Jianjin Xu et al.
MVGenMaster: Scaling Multi-View Generation from Any Image via 3D Priors Enhanced Diffusion Model
Chenjie Cao, Chaohui Yu, Shang Liu et al.
TexOct: Generating Textures of 3D Models with Octree-based Diffusion
Jialun Liu, Chenming Wu, Xinqi Liu et al.
Accelerating Neural Field Training via Soft Mining
Shakiba Kheradmand, Daniel Rebain, Gopal Sharma et al.
Multi-Attribute Interactions Matter for 3D Visual Grounding
Can Xu, Yuehui Han, Rui Xu et al.
Correcting Diffusion Generation through Resampling
Yujian Liu, Yang Zhang, Tommi Jaakkola et al.
Total-Decom: Decomposed 3D Scene Reconstruction with Minimal Interaction
Xiaoyang Lyu, Chirui Chang, Peng Dai et al.
The Power of Context: How Multimodality Improves Image Super-Resolution
Kangfu Mei, Vishal M. Patel, Mojtaba Sahraee-Ardakan et al.
Discover and Mitigate Multiple Biased Subgroups in Image Classifiers
Zeliang Zhang, Mingqian Feng, Zhiheng Li et al.
DexHandDiff: Interaction-aware Diffusion Planning for Adaptive Dexterous Manipulation
Zhixuan Liang, Yao Mu, Yixiao Wang et al.
Robust Overfitting Does Matter: Test-Time Adversarial Purification With FGSM
Linyu Tang, Lei Zhang
Universal Novelty Detection Through Adaptive Contrastive Learning
Hossein Mirzaei, Mojtaba Nafez, Mohammad Jafari et al.
Federated Online Adaptation for Deep Stereo
Matteo Poggi, Fabio Tosi
DualTalk: Dual-Speaker Interaction for 3D Talking Head Conversations
Ziqiao Peng, Yanbo Fan, Haoyu Wu et al.
HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding
Shehreen Azad, Vibhav Vineet, Yogesh S. Rawat
MoDec-GS: Global-to-Local Motion Decomposition and Temporal Interval Adjustment for Compact Dynamic 3D Gaussian Splatting
Sangwoon Kwak, Joonsoo Kim, Jun Young Jeong et al.
Zero-Shot Monocular Scene Flow Estimation in the Wild
Yiqing Liang, Abhishek Badki, Hang Su et al.
MG-MotionLLM: A Unified Framework for Motion Comprehension and Generation across Multiple Granularities
Bizhu Wu, Jinheng Xie, Keming Shen et al.
Prototype-Based Image Prompting for Weakly Supervised Histopathological Image Segmentation
Qingchen Tang, Lei Fan, Maurice Pagnucco et al.
Mr. DETR: Instructive Multi-Route Training for Detection Transformers
Chang-Bin Zhang, Yujie Zhong, Kai Han
CuVLER: Enhanced Unsupervised Object Discoveries through Exhaustive Self-Supervised Transformers
Shahaf Arica, Or Rubin, Sapir Gershov et al.
DropoutGS: Dropping Out Gaussians for Better Sparse-view Rendering
Yexing Xu, Longguang Wang, Minglin Chen et al.
Improving Bird's Eye View Semantic Segmentation by Task Decomposition
Tianhao Zhao, Yongcan Chen, Yu Wu et al.
Unsupervised Gaze Representation Learning from Multi-view Face Images
Yiwei Bao, Feng Lu
PairAug: What Can Augmented Image-Text Pairs Do for Radiology?
Yutong Xie, Qi Chen, Sinuo Wang et al.
Instant Adversarial Purification with Adversarial Consistency Distillation
Chun Tong Lei, Hon Ming Yam, Zhongliang Guo et al.
EgoLM: Multi-Modal Language Model of Egocentric Motions
Fangzhou Hong, Vladimir Guzov, Hyo Jin Kim et al.
DRiVE: Diffusion-based Rigging Empowers Generation of Versatile and Expressive Characters
Mingze Sun, Junting Dong, Junhao Chen et al.
Uncertainty-guided Perturbation for Image Super-Resolution Diffusion Model
Leheng Zhang, Weiyi You, Kexuan Shi et al.
VinTAGe: Joint Video and Text Conditioning for Holistic Audio Generation
Saksham Singh Kushwaha, Yapeng Tian
PointInfinity: Resolution-Invariant Point Diffusion Models
Zixuan Huang, Justin Johnson, Shoubhik Debnath et al.
LoRACLR: Contrastive Adaptation for Customization of Diffusion Models
Enis Simsar, Thomas Hofmann, Federico Tombari et al.
METASCENES: Towards Automated Replica Creation for Real-world 3D Scans
Huangyue Yu, Baoxiong Jia, Yixin Chen et al.
MCCD: Multi-Agent Collaboration-based Compositional Diffusion for Complex Text-to-Image Generation
Mingcheng Li, Xiaolu Hou, Ziyang Liu et al.
Learning Dynamic Tetrahedra for High-Quality Talking Head Synthesis
Zicheng Zhang, RUOBING ZHENG, Bonan Li et al.
Asymmetric Masked Distillation for Pre-Training Small Foundation Models
Zhiyu Zhao, Bingkun Huang, Sen Xing et al.
Completion as Enhancement: A Degradation-Aware Selective Image Guided Network for Depth Completion
Zhiqiang Yan, Zhengxue Wang, Kun Wang et al.
Weakly Supervised Monocular 3D Detection with a Single-View Image
Xueying Jiang, Sheng Jin, Lewei Lu et al.
PSHuman: Photorealistic Single-image 3D Human Reconstruction using Cross-Scale Multiview Diffusion and Explicit Remeshing
Peng Li, Wangguandong Zheng, Yuan Liu et al.
SpiritSight Agent: Advanced GUI Agent with One Look
Zhiyuan Huang, Ziming Cheng, Junting Pan et al.
JointSQ: Joint Sparsification-Quantization for Distributed Learning
Weiying Xie, Haowei Li, Ma Jitao et al.
PeerAiD: Improving Adversarial Distillation from a Specialized Peer Tutor
Jaewon Jung, Hongsun Jang, Jaeyong Song et al.
ASAP: Advancing Semantic Alignment Promotes Multi-Modal Manipulation Detecting and Grounding
Zhenxing Zhang, Yaxiong Wang, Lechao Cheng et al.
Adapter Merging with Centroid Prototype Mapping for Scalable Class-Incremental Learning
Takuma Fukuda, Hiroshi Kera, Kazuhiko Kawamoto
Robust Depth Enhancement via Polarization Prompt Fusion Tuning
Kei IKEMURA, Yiming Huang, Felix Heide et al.
ICT: Image-Object Cross-Level Trusted Intervention for Mitigating Object Hallucination in Large Vision-Language Models
Junzhe Chen, Tianshu Zhang, Shiyu Huang et al.
Quantifying Task Priority for Multi-Task Optimization
Wooseong Jeong, Kuk-Jin Yoon
When the Future Becomes the Past: Taming Temporal Correspondence for Self-supervised Video Representation Learning
Yang Liu, Qianqian Xu, Peisong Wen et al.
EMOE: Modality-Specific Enhanced Dynamic Emotion Experts
Yiyang Fang, Wenke Huang, Guancheng Wan et al.
Disentangled Pre-training for Human-Object Interaction Detection
Zhuolong Li, Xingao Li, Changxing Ding et al.
nnWNet: Rethinking the Use of Transformers in Biomedical Image Segmentation and Calling for a Unified Evaluation Benchmark
Yanfeng Zhou, Lingrui Li, Le Lu et al.
Gazing Into Missteps: Leveraging Eye-Gaze for Unsupervised Mistake Detection in Egocentric Videos of Skilled Human Activities
Michele Mazzamuto, Antonino Furnari, Yoichi Sato et al.
ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting
Shaofei Cai, Zihao Wang, Kewei Lian et al.
OmniStyle: Filtering High Quality Style Transfer Data at Scale
Ye Wang, Ruiqi Liu, Jiang Lin et al.
Physical Backdoor: Towards Temperature-based Backdoor Attacks in the Physical World
Wen Yin, Jian Lou, Pan Zhou et al.
HEIE: MLLM-Based Hierarchical Explainable AIGC Image Implausibility Evaluator
Fan Yang, Ru Zhen, Jianing Wang et al.
FSFM: A Generalizable Face Security Foundation Model via Self-Supervised Facial Representation Learning
Gaojian Wang, Feng Lin, Tong Wu et al.
From Activation to Initialization: Scaling Insights for Optimizing Neural Fields
Hemanth Saratchandran, Sameera Ramasinghe, Simon Lucey
A Backpack Full of Skills: Egocentric Video Understanding with Diverse Task Perspectives
Simone Alberto Peirone, Francesca Pistilli, Antonio Alliegro et al.
GenPC: Zero-shot Point Cloud Completion via 3D Generative Priors
An Li, Zhe Zhu, Mingqiang Wei
NeRFPrior: Learning Neural Radiance Field as a Prior for Indoor Scene Reconstruction
Wenyuan Zhang, Emily Yue-ting Jia, Junsheng Zhou et al.
VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step
Hanyang Wang, Fangfu Liu, Jiawei Chi et al.
GaussianUDF: Inferring Unsigned Distance Functions through 3D Gaussian Splatting
Shujuan Li, Yu-Shen Liu, Zhizhong Han
BlobGEN-Vid: Compositional Text-to-Video Generation with Blob Video Representations
Weixi Feng, Chao Liu, Sifei Liu et al.
CDMAD: Class-Distribution-Mismatch-Aware Debiasing for Class-Imbalanced Semi-Supervised Learning
Hyuck Lee, Heeyoung Kim
Flattening the Parent Bias: Hierarchical Semantic Segmentation in the Poincaré Ball
Simon Weber, Barış Zöngür, Nikita Araslanov et al.
LiDAR-RT: Gaussian-based Ray Tracing for Dynamic LiDAR Re-simulation
Chenxu Zhou, Lvchang Fu, Sida Peng et al.
RaCFormer: Towards High-Quality 3D Object Detection via Query-based Radar-Camera Fusion
Xiaomeng Chu, Jiajun Deng, Guoliang You et al.