Most Cited ICCV "attention localization" Papers
2,701 papers found
Variance-Based Pruning for Accelerating and Compressing Trained Networks
Uranik Berisha, Jens Mehnert, Alexandru Condurache
PoseSyn: Synthesizing Diverse 3D Pose Data from In-the-Wild 2D Data
Changhee Yang, Hyeonseop Song, Seokhun Choi et al.
Conditional Latent Diffusion Models for Zero-Shot Instance Segmentation
Maximilian Ulmer, Wout Boerdijk, Rudolph Triebel et al.
Outlier-Aware Post-Training Quantization for Image Super-Resolution
Hailing Wang, Jianglin Lu, Yitian Zhang et al.
Rep-MTL: Unleashing the Power of Representation-level Task Saliency for Multi-Task Learning
Zedong Wang, Siyuan Li, Dan Xu
Achieving More with Less: Additive Prompt Tuning for Rehearsal-Free Class-Incremental Learning
Haoran Chen, Ping Wang, Zihan Zhou et al.
Learning Pixel-adaptive Multi-layer Perceptrons for Real-time Image Enhancement
Junyu Lou, Xiaorui Zhao, Kexuan Shi et al.
MMAT-1M: A Large Reasoning Dataset for Multimodal Agent Tuning
Tianhong Gao, Yannian Fu, Weiqun Wu et al.
CULTURE3D: A Large-Scale and Diverse Dataset of Cultural Landmarks and Terrains for Gaussian-Based Scene Rendering
Xinyi Zheng, Steve Zhang, Weizhe Lin et al.
Mind the Cost of Scaffold! Benign Clients May Even Become Accomplices of Backdoor Attack
Xingshuo Han, Xuanye Zhang, Xiang Lan et al.
TrackAny3D: Transferring Pretrained 3D Models for Category-unified 3D Point Cloud Tracking
Mengmeng Wang, Haonan Wang, Yulong Li et al.
PanoSplatt3R: Leveraging Perspective Pretraining for Generalized Unposed Wide-Baseline Panorama Reconstruction
Jiahui Ren, Mochu Xiang, Jiajun Zhu et al.
MotionDiff: Training-free Zero-shot Interactive Motion Editing via Flow-assisted Multi-view Diffusion
Yikun Ma, Yiqing Li, Jiawei Wu et al.
IMG: Calibrating Diffusion Models via Implicit Multimodal Guidance
Jiayi Guo, Chuanhao Yan, Xingqian Xu et al.
Latent Swap Joint Diffusion for 2D Long-Form Latent Generation
Yusheng Dai, Chenxi Wang, Chang Li et al.
BlinkTrack: Feature Tracking over 80 FPS via Events and Images
Yichen Shen, Yijin Li, Shuo Chen et al.
Aligning Moments in Time using Video Queries
Yogesh Kumar, Uday Agarwal, Manish Gupta et al.
MoMaps: Semantics-Aware Scene Motion Generation with Motion Maps
Jiahui Lei, Kyle Genova, George Kopanas et al.
GVDepth: Zero-Shot Monocular Depth Estimation for Ground Vehicles based on Probabilistic Cue Fusion
Karlo Koledic, Luka Petrovic, Ivan Marković et al.
CapeLLM: Support-Free Category-Agnostic Pose Estimation with Multimodal Large Language Models
Junho Kim, Hyungjin Chung, Byung-Hoon Kim
DONUT: A Decoder-Only Model for Trajectory Prediction
Markus Knoche, Daan de Geus, Bastian Leibe
Fish2Mesh Transformer: 3D Human Mesh Recovery from Egocentric Vision
Tianma Shen, Aditya Shrish Puranik, James Vong et al.
RGE-GS: Reward-Guided Expansive Driving Scene Reconstruction via Diffusion Priors
Sicong Du, Jiarun Liu, Qifeng Chen et al.
Bridging Diffusion Models and 3D Representations: A 3D Consistent Super-Resolution Framework
Yi-Ting Chen, Ting-Hsuan Liao, Pengsheng Guo et al.
PS3: A Multimodal Transformer Integrating Pathology Reports with Histology Images and Biological Pathways for Cancer Survival Prediction
Manahil Raza, Ayesha Azam, Talha Qaiser et al.
BASIC: Boosting Visual Alignment with Intrinsic Refined Embeddings in Multimodal Large Language Models
Jianting Tang, Yubo Wang, Haoyu Cao et al.
Granular Concept Circuits: Toward a Fine-Grained Circuit Discovery for Concept Representations
Dahee Kwon, Sehyun Lee, Jaesik Choi
Perceiving and Acting in First-Person: A Dataset and Benchmark for Egocentric Human-Object-Human Interactions
Liang Xu, Chengqun Yang, Zili Lin et al.
Early Timestep Zero-Shot Candidate Selection for Instruction-Guided Image Editing
Joowon Kim, Ziseok Lee, Donghyeon Cho et al.
VisHall3D: Monocular Semantic Scene Completion from Reconstructing the Visible Regions to Hallucinating the Invisible Regions
Haoang Lu, Yuanqi Su, Xiaoning Zhang et al.
DICE: Staleness-Centric Optimizations for Parallel Diffusion MoE Inference
Jiajun Luo, Lizhuo Luo, Jianru Xu et al.
LiON-LoRA: Rethinking LoRA Fusion to Unify Controllable Spatial and Temporal Generation for Video Diffusion
Yisu Zhang, Chenjie Cao, Chaohui Yu et al.
Bidirectional Likelihood Estimation with Multi-Modal Large Language Models for Text-Video Retrieval
Dohwan Ko, Ji Soo Lee, Minhyuk Choi et al.
PERSONA: Personalized Whole-Body 3D Avatar with Pose-Driven Deformations from a Single Image
Geonhee Sim, Gyeongsik Moon
Controllable-LPMoE: Adapting to Challenging Object Segmentation via Dynamic Local Priors from Mixture-of-Experts
Yanguang Sun, Jiawei Lian, Jian Yang et al.
Pruning All-Rounder: Rethinking and Improving Inference Efficiency for Large Vision Language Models
Wei Suo, Ji Ma, Mengyang Sun et al.
FPEM: Face Prior Enhanced Facial Attractiveness Prediction for Live Videos with Face Retouching
Hui Li, Xiaoyu Ren, Hongjiu Yu et al.
Latent Expression Generation for Referring Image Segmentation and Grounding
Seonghoon Yu, Junbeom Hong, Joonseok Lee et al.
Breaking Rectangular Shackles: Cross-View Object Segmentation for Fine-Grained Object Geo-Localization
Qingwang Zhang, Yingying Zhu
Pseudo-SD: Pseudo Controlled Stable Diffusion for Semi-Supervised and Cross-Domain Semantic Segmentation
Dong Zhao, Qi Zang, Shuang Wang et al.
Seeing Through Deepfakes: A Human-Inspired Framework for Multi-Face Detection
Juan Hu, Shaojing Fan, Terence Sim
DexH2R: A Benchmark for Dynamic Dexterous Grasping in Human-to-Robot Handover
Youzhuo Wang, Jiayi Ye, Chuyang Xiao et al.
HoliTracer: Holistic Vectorization of Geographic Objects from Large-Size Remote Sensing Imagery
Yu Wang, Bo Dang, Wanchun Li et al.
Boosting Multi-View Indoor 3D Object Detection via Adaptive 3D Volume Construction
Runmin Zhang, Zhu Yu, Si-Yuan Cao et al.
DMQ: Dissecting Outliers of Diffusion Models for Post-Training Quantization
Dongyeun Lee, Jiwan Hur, Hyounguk Shon et al.
Visual Intention Grounding for Egocentric Assistants
Pengzhan Sun, Junbin Xiao, Tze Ho Elden Tse et al.
SAFER: Sharpness Aware layer-selective Finetuning for Enhanced Robustness in vision transformers
Bhavna Gopal, Huanrui Yang, Mark Horton et al.
SegmentDreamer: Towards High-fidelity Text-to-3D Synthesis with Segmented Consistency Trajectory Distillation
Jiahao Zhu, Zixuan Chen, Guangcong Wang et al.
StrandHead: Text to Hair-Disentangled 3D Head Avatars Using Human-Centric Priors
Xiaokun Sun, Zeyu Cai, Ying Tai et al.
FakeRadar: Probing Forgery Outliers to Detect Unknown Deepfake Videos
Zhaolun Li, Jichang Li, Yinqi Cai et al.
Context Guided Transformer Entropy Modeling for Video Compression
Junlong Tong, Wei Zhang, Yaohui Jin et al.
Free-MoRef: Instantly Multiplexing Context Perception Capabilities of Video-MLLMs within Single Inference
Kuo Wang, Quanlong Zheng, Junlin Xie et al.
Towards Fine-grained Interactive Segmentation in Images and Videos
Yuan Yao, Qiushi Yang, Miaomiao Cui et al.
Augmented and Softened Matching for Unsupervised Visible-Infrared Person Re-Identification
Zhiqi Pang, Chunyu Wang, Lingling Zhao et al.
Representing 3D Shapes With 64 Latent Vectors for 3D Diffusion Models
In Cho, Youngbeom Yoo, Subin Jeon et al.
TAB: Transformer Attention Bottlenecks enable User Intervention and Debugging in Vision-Language Models
Pooyan Rahmanzadehgervi, Hung Nguyen, Rosanne Liu et al.
SL2A-INR: Single-Layer Learnable Activation for Implicit Neural Representation
Reza Rezaeian, Moein Heidari, Reza Azad et al.
Perspective-aware 3D Gaussian Inpainting with Multi-view Consistency
Yuxin Cheng, Binxiao Huang, Taiqiang Wu et al.
Task-Specific Zero-shot Quantization-Aware Training for Object Detection
Changhao Li, Xinrui Chen, Ji Wang et al.
Time-Aware Auto White Balance in Mobile Photography
Mahmoud Afifi, Luxi Zhao, Abhijith Punnappurath et al.
Leveraging Panoptic Scene Graph for Evaluating Fine-Grained Text-to-Image Generation
Xueqing Deng, Linjie Yang, Qihang Yu et al.
ViewSRD: 3D Visual Grounding via Structured Multi-View Decomposition
Ronggang Huang, Haoxin Yang, Yan Cai et al.
Physical Degradation Model-Guided Interferometric Hyperspectral Reconstruction with Unfolding Transformer
Yuansheng Li, Yunhao Zou, Linwei Chen et al.
VPR-Cloak: A First Look at Privacy Cloak Against Visual Place Recognition
Shuting Dong, Mingzhi Chen, Feng Lu et al.
Evidential Knowledge Distillation
Liangyu Xiang, Junyu Gao, Changsheng Xu
Hierarchical Variational Test-Time Prompt Generation for Zero-Shot Generalization
Zhaoyang Wu, Fang Liu, Licheng Jiao et al.
GUAVA: Generalizable Upper Body 3D Gaussian Avatar
Dongbin Zhang, Yunfei Liu, Lijian Lin et al.
CO2-Net: A Physics-Informed Spatio-Temporal Model for Global Surface CO2 Reconstruction
Hao Zheng, Yuting Zheng, Hanbo Huang et al.
HOMO-Feature: Cross-Arbitrary-Modal Image Matching with Homomorphism of Organized Major Orientation
Chenzhong Gao, Wei Li, Desheng Weng
OCSplats: Observation Completeness Quantification and Label Noise Separation in 3DGS
Han Ling, Yinghui Sun, Xian Xu et al.
GSOT3D: Towards Generic 3D Single Object Tracking in the Wild
Yifan Jiao, Yunhao Li, Junhua Ding et al.
GWM: Towards Scalable Gaussian World Models for Robotic Manipulation
Guanxing Lu, Baoxiong Jia, Puhao Li et al.
Semantic Alignment and Reinforcement for Data-Free Quantization of Vision Transformers
Yunshan Zhong, Yuyao Zhou, Yuxin Zhang et al.
Boosting Multimodal Learning via Disentangled Gradient Learning
Shicai Wei, Chunbo Luo, Yang Luo
Dynamic-DINO: Fine-Grained Mixture of Experts Tuning for Real-time Open-Vocabulary Object Detection
Yehao Lu, Minghe Weng, Zekang Xiao et al.
WAVE: Warp-Based View Guidance for Consistent Novel View Synthesis Using a Single Image
Jiwoo Park, Tae Choi, Youngjun Jun et al.
DanceEditor: Towards Iterative Editable Music-driven Dance Generation with Open-Vocabulary Descriptions
Hengyuan Zhang, Zhe Li, Xingqun Qi et al.
TAG-WM: Tamper-Aware Generative Image Watermarking via Diffusion Inversion Sensitivity
Yuzhuo Chen, Zehua Ma, Han Fang et al.
HORT: Monocular Hand-held Objects Reconstruction with Transformers
Zerui Chen, Rolandos Alexandros Potamias, Shizhe Chen et al.
SurfaceSplat: Connecting Surface Reconstruction and Gaussian Splatting
Zihui Gao, Jia-Wang Bian, Guosheng Lin et al.
CaliMatch: Adaptive Calibration for Improving Safe Semi-supervised Learning
Jinsoo Bae, Seoung Bum Kim, Hyungrok Do
Lightweight and Fast Real-time Image Enhancement via Decomposition of the Spatial-aware Lookup Tables
Wontae Kim, Keuntek Lee, Nam Ik Cho
Neural Multi-View Self-Calibrated Photometric Stereo without Photometric Stereo Cues
Xu Cao, Takafumi Taketomi
Reminiscence Attack on Residuals: Exploiting Approximate Machine Unlearning for Privacy
Yaxin Xiao, Qingqing Ye, Li Hu et al.
RayPose: Ray Bundling Diffusion for Template Views in Unseen 6D Object Pose Estimation
Junwen Huang, Shishir Reddy Vutukur, Peter Yu et al.
Tensor-aggregated LoRA in Federated Fine-tuning
Zhixuan Li, Binqian Xu, Xiangbo Shu et al.
EMatch: A Unified Framework for Event-based Optical Flow and Stereo Matching
Pengjie Zhang, Lin Zhu, Xiao Wang et al.
QR-LoRA: Efficient and Disentangled Fine-tuning via QR Decomposition for Customized Generation
Jiahui Yang, Yongjia Ma, Donglin Di et al.
Backdooring Self-Supervised Contrastive Learning by Noisy Alignment
Tuo Chen, Jie Gui, Minjing Dong et al.
CounterPC: Counterfactual Feature Realignment for Unsupervised Domain Adaptation on Point Clouds
Feng Yang, Yichao Cao, Xiu Su et al.
Robust Dataset Condensation using Supervised Contrastive Learning
Nicole Kim, Hwanjun Song
Liberated-GS: 3D Gaussian Splatting Independent from SfM Point Clouds
Weihong Pan, Xiaoyu Zhang, Hongjia Zhai et al.
Unlocking the Potential of Diffusion Priors in Blind Face Restoration
Yunqi Miao, Zhiyu Qu, Mingqi Gao et al.
Self-Supervised Sparse Sensor Fusion for Long Range Perception
Edoardo Palladin, Samuel Brucker, Filippo Ghilotti et al.
AccidentalGS: 3D Gaussian Splatting from Accidental Camera Motion
Mao Mao, Xujie Shen, Guyuan Chen et al.
Event-boosted Deformable 3D Gaussians for Dynamic Scene Reconstruction
Wenhao Xu, Wenming Weng, Yueyi Zhang et al.
Implicit Counterfactual Learning for Audio-Visual Segmentation
Mingfeng Zha, Tianyu Li, Guoqing Wang et al.
STaR: Seamless Spatial-Temporal Aware Motion Retargeting with Penetration and Consistency Constraints
Xiaohang Yang, Qing Wang, Jiahao Yang et al.
MRGen: Segmentation Data Engine For Underrepresented MRI Modalities
Haoning Wu, Ziheng Zhao, Ya Zhang et al.
Competitive Distillation: A Simple Learning Strategy for Improving Visual Classification
Daqian Shi, Xiaolei Diao, Xu Chen et al.
Rethink Sparse Signals for Pose-guided Text-to-image Generation
Wenjie Xuan, Jing Zhang, Juhua Liu et al.
MoFRR: Mixture of Diffusion Models for Face Retouching Restoration
Jiaxin Liu, Qichao Ying, Zhenxing Qian et al.
Adversarial Reconstruction Feedback for Robust Fine-grained Generalization
Shijie Wang, Jian Shi, Haojie Li
Unified Adversarial Augmentation for Improving Palmprint Recognition
Jianlong Jin, Chenglong Zhao, Ruixin Zhang et al.
Adding Additional Control to One-Step Diffusion with Joint Distribution Matching
Yihong Luo, Tianyang Hu, Yifan Song et al.
Uncover Treasures in DCT: Advancing JPEG Quality Enhancement by Exploiting Latent Correlations
Jing Yang, Qunliang Xing, Mai Xu et al.
Single-Scanline Relative Pose Estimation for Rolling Shutter Cameras
Petr Hruby, Marc Pollefeys
Unified Multi-Agent Trajectory Modeling with Masked Trajectory Diffusion
Songru Yang, Zhenwei Shi, Zhengxia Zou
Enhancing Transferability of Targeted Adversarial Examples via Inverse Target Gradient Competition and Spatial Distance Stretching
Zhankai Li, Weiping Wang, Jie Li et al.
LDPose: Towards Inclusive Human Pose Estimation for Limb-Deficient Individuals in the Wild
Jiaying Ying, Heming Du, Kaihao Zhang et al.
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion
Ahmed Nassar, Matteo Omenetti, Maksym Lysak et al.
Images as Noisy Labels: Unleashing the Potential of the Diffusion Model for Open-Vocabulary Semantic Segmentation
Fan Li, Xuanbin Wang, Xuan Wang et al.
ContextFace: Generating Facial Expressions from Emotional Contexts
Minjung Kim, Minsang Kim, Seung Jun Baek
SMP-Attack: Boosting the Transferability of Feature Importance-based Adversarial Attack with Semantics-aware Multi-granularity Patchout
Wen Yang, Guodong Liu, Di Ming
Spatial-Temporal Forgery Trace based Forgery Image Identification
Yilin Wang, Zunlei Feng, Jiachi Wang et al.
Towards Annotation-Free Evaluation: KPAScore for Human Keypoint Detection
Xiaoxiao Wang, Chunxiao Li, Peng Sun et al.
Ultra High-Resolution Image Inpainting with Patch-Based Content Consistency Adapter
JianHui Zhang, Shen Cheng, Qirui Sun et al.
Agreement aware and dissimilarity oriented GLOM
Ru Zeng, Yan Song, Yang Zhang et al.
The Best of Both Worlds: Integrating Language Models and Diffusion Models for Video Generation
Aoxiong Yin, Kai Shen, Yichong Leng et al.
Bridging Class Imbalance and Partial Labeling via Spectral-Balanced Energy Propagation for Skeleton-based Action Recognition
Yandan Wang, Chenqi Guo, Yinglong Ma et al.
MeasureXpert: Automatic Anthropometric Measurement Extraction from Two Unregistered, Partial, Posed, and Dressed Body Scans
Ran Zhao, Xinxin Dai, Pengpeng Hu et al.
ForeSight: Multi-View Streaming Joint Object Detection and Trajectory Forecasting
Sandro Papais, Letian Wang, Brian Cheong et al.
PROL : Rehearsal Free Continual Learning in Streaming Data via Prompt Online Learning
Muhammad Anwar Ma'sum, Mahardhika Pratama, Savitha Ramasamy et al.
Dual Domain Control via Active Learning for Remote Sensing Domain Incremental Object Detection
Jiachen Sun, De Cheng, Xi Yang et al.
SUV: Suppressing Undesired Video Content via Semantic Modulation Based on Text Embeddings
Xiang Lv, Mingwen Shao, Lingzhuang Meng et al.
Enpowering Your Pansharpening Models with Generalizability: Unified Distribution is All You Need
Yongchuan Cui, Peng Liu, Hui Zhang
DiMPLe - Disentangled Multi-Modal Prompt Learning: Enhancing Out-Of-Distribution Alignment with Invariant and Spurious Feature Separation
Umaima Rahman, Mohammad Yaqub, Dwarikanath Mahapatra
ResidualViT for Efficient Temporally Dense Video Encoding
Mattia Soldan, Fabian Caba Heilbron, Bernard Ghanem et al.
LLM Thought Divergence and Convergence for Dialogue-Based Image Generation Control
Hui Li
Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics
Ruining Li, Chuanxia Zheng, Christian Rupprecht et al.
Beyond Low-Rank Tuning: Model Prior-Guided Rank Allocation for Effective Transfer in Low-Data and Large-Gap Regimes
Chuyan Zhang, Kefan Wang, Yun Gu
OracleFusion: Assisting the Decipherment of Oracle Bone Script with Structurally Constrained Semantic Typography
Li Caoshuo, Zengmao Ding, Xiaobin Hu et al.
COSMO: Combination of Selective Memorization for Low-cost Vision-and-Language Navigation
Siqi Zhang, Yanyuan Qiao, Qunbo Wang et al.
CoStoDet-DDPM: Collaborative Training of Stochastic and Deterministic Models Improves Surgical Workflow Anticipation and Recognition
Kaixiang Yang, Xin Li, Qiang Li et al.
Exploring Weather-aware Aggregation and Adaptation for Semantic Segmentation under Adverse Conditions
Yuwen Pan, Rui Sun, Wangkai Li et al.
Randomized Autoregressive Visual Generation
Qihang Yu, Ju He, Xueqing Deng et al.
Unsupervised RGB-D Point Cloud Registration for Scenes with Low Overlap and Photometric Inconsistency
Yejun Shou, Haocheng Wang, Lingfeng Shen et al.
TOGA: Temporally Grounded Open-Ended Video QA with Weak Supervision
Ayush Gupta, Anirban Roy, Rama Chellappa et al.
DynFaceRestore: Balancing Fidelity and Quality in Diffusion-Guided Blind Face Restoration with Dynamic Blur-Level Mapping and Guidance
Huu Phu Do, Yu-Wei Chen, Yi-Cheng Liao et al.
Gradient-Reweighted Adversarial Camouflage for Physical Object Detection Evasion
Jiawei Liang, Siyuan Liang, Tianrui Lou et al.
Training-free Geometric Image Editing on Diffusion Models
Hanshen Zhu, Zhen Zhu, Kaile Zhang et al.
Monocular Facial Appearance Capture in the Wild
Yingyan Xu, Kate Gadola, Prashanth Chandran et al.
Growing a Twig to Accelerate Large Vision-Language Models
Zhenwei Shao, Mingyang Wang, Zhou Yu et al.
SignRep: Enhancing Self-Supervised Sign Representations
Ryan Wong, Necati Cihan Camgoz, Richard Bowden
MixA: A Mixed Attention approach with Stable Lightweight Linear Attention to enhance Efficiency of Vision Transformers at the Edge
Sabbir Ahmed, Jingtao Li, Weiming Zhuang et al.
Transparent Vision: A Theory of Hierarchical Invariant Representations
Shuren Qi, Yushu Zhang, Chao Wang et al.
TemCoCo: Temporally Consistent Multi-modal Video Fusion with Visual-Semantic Collaboration
Gong Meiqi, Hao Zhang, Xunpeng Yi et al.
RetinexMCNet: A Memory Controller Dominated Network for Low-Light Video Enhancement Based on Retinex
Meiao Wang, Xuejing Kang, Yaxi Lu et al.
Sliced Wasserstein Bridge for Open-Vocabulary Video Instance Segmentation
Zheyun Qin, Deng Yu, Chuanchen Luo et al.
Frequency-Aware Autoregressive Modeling for Efficient High-Resolution Image Synthesis
Zhuokun Chen, Jugang Fan, Zhuowei Yu et al.
OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation
Junyuan Zhang, Qintong Zhang, Bin Wang et al.
Efficient Event Camera Data Pretraining with Adaptive Prompt Fusion
Quanmin Liang, Qiang Li, Shuai Liu et al.
Lightweight Gradient-Aware Upscaling of 3D Gaussian Splatting Images
Simon Niedermayr, Christoph Neuhauser, Rüdiger Westermann
RoBridge: A Hierarchical Architecture Bridging Cognition and Execution for General Robotic Manipulation
Kaidong Zhang, Rongtao Xu, Ren Pengzhen et al.
3D Gaussian Splatting Driven Multi-View Robust Physical Adversarial Camouflage Generation
Tianrui Lou, Xiaojun Jia, Siyuan Liang et al.
Head2Body: Body Pose Generation from Multi-sensory Head-mounted Inputs
Minh Tran, Hongda Mao, Qingshuang Chen et al.
LLM-Assisted Semantic Guidance for Sparsely Annotated Remote Sensing Object Detection
Wei Liao, Chunyan Xu, Chenxu Wang et al.
Looking in the Mirror: A Faithful Counterfactual Explanation Method for Interpreting Deep Image Classification Models
Townim Chowdhury, Vu Phan, Kewen Liao et al.
FLSeg: Enhancing Privacy and Robustness in Federated Learning under Heterogeneous Data via Model Segmentation
Zichun Su, Zhi Lu, Yutong Wu et al.
Self-Calibrating Gaussian Splatting for Large Field-of-View Reconstruction
Youming Deng, Wenqi Xian, Guandao Yang et al.
DADM: Dual Alignment of Domain and Modality for Face Anti-spoofing
Yang JingYi, Xun Lin, Zitong Yu et al.
Gradient Decomposition and Alignment for Incremental Object Detection
Wenlong Luo, Shizhou Zhang, De Cheng et al.
PacGDC: Label-Efficient Generalizable Depth Completion with Projection Ambiguity and Consistency
Haotian Wang, Aoran Xiao, Xiaoqin Zhang et al.
MSQ: Memory-Efficient Bit Sparsification Quantization
Seokho Han, Seoyeon Yoon, Jinhee Kim et al.
SuMa: A Subspace Mapping Approach for Robust and Effective Concept Erasure in Text-to-Image Diffusion Models
Kien Nguyen, Anh Tran, Cuong Pham
Recovering Parametric Scenes from Very Few Time-of-Flight Pixels
Carter Sifferman, Yiquan Li, Yiming Li et al.
MCAM: Multimodal Causal Analysis Model for Ego-Vehicle-Level Driving Video Understanding
Tongtong Cheng, Rongzhen Li, Yixin Xiong et al.
When and Where do Data Poisons Attack Textual Inversion?
Jeremy Styborski, Mingzhi Lyu, Jiayou Lu et al.
SRefiner: Soft-Braid Attention for Multi-Agent Trajectory Refinement
Liwen Xiao, Zhiyu Pan, Zhicheng Wang et al.
Rethinking Few Shot CLIP Benchmarks: A Critical Analysis in the Inductive Setting
Alexey Kravets, Da Chen, Vinay Namboodiri
Engage for All: Making Ordinary Image Descriptions Appealing Again!
Yuyan Chen, Yifan Jiang, Li Zhou et al.
Seam360GS: Seamless 360° Gaussian Splatting from Real-World Omnidirectional Images
Changha Shin, Woong Oh Cho, Seon Joo Kim
HiGarment: Cross-modal Harmony Based Diffusion Model for Flat Sketch to Realistic Garment Image
Junyi Guo, Jingxuan Zhang, Fangyu Wu et al.
AU-Blendshape for Fine-grained Stylized 3D Facial Expression Manipulation
Hao Li, Ju Dai, Feng Zhou et al.
BokehDiff: Neural Lens Blur with One-Step Diffusion
Chengxuan Zhu, Qingnan Fan, Qi Zhang et al.
Geometry Distributions
Biao Zhang, Jing Ren, Peter Wonka
Trial-Oriented Visual Rearrangement
Yuyi Liu, Xinhang Song, Tianliang Qi et al.
Debiased Teacher for Day-to-Night Domain Adaptive Object Detection
Yiming Cui, Liang Li, Haibing Yin et al.
Towards Effective Foundation Model Adaptation for Extreme Cross-Domain Few-Shot Learning
Fei Zhou, Peng Wang, Lei Zhang et al.
Social Debiasing for Fair Multi-modal LLMs
Harry Cheng, Yangyang Guo, Qingpei Guo et al.
Hierarchy-Aware Pseudo Word Learning with Text Adaptation for Zero-Shot Composed Image Retrieval
Zhe Li, Lei Zhang, Zheren Fu et al.
UPP: Unified Point-Level Prompting for Robust Point Cloud Analysis
Zixiang Ai, Zhenyu Cui, Yuxin Peng et al.
AV-Flow: Transforming Text to Audio-Visual Human-like Interactions
Aggelina Chatziagapi, Louis-Philippe Morency, Hongyu Gong et al.
Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy
Yiting Yang, Hao Luo, Yuan Sun et al.
Probabilistic Inertial Poser (ProbIP): Uncertainty-aware Human Motion Modeling from Sparse Inertial Sensors
Min Kim, Younho Jeon, Sungho Jo
SFUOD: Source-Free Unknown Object Detection
Keon-Hee Park, Seun-An Choe, Gyeong-Moon Park
Compression-Aware One-Step Diffusion Model for JPEG Artifact Removal
Jinpei Guo, Zheng Chen, Wenbo Li et al.
ConstStyle: Robust Domain Generalization with Unified Style Transformation
Nam Duong Tran, Nam Nguyen Phuong, Hieu Pham et al.
ConsistentCity: Semantic Flow-guided Occupancy DiT for Temporally Consistent Driving Scene Synthesis
Benjin Zhu, Xiaogang Wang, Hongsheng Li
CLOT: Closed Loop Optimal Transport for Unsupervised Action Segmentation
Elena Bueno-Benito, Mariella Dimiccoli
Dual-Temporal Exemplar Representation Network for Video Semantic Segmentation
Xiaolong Xu, Lei Zhang, Jiayi Li et al.
Gaussian Variation Field Diffusion for High-fidelity Video-to-4D Synthesis
Bowen Zhang, Sicheng Xu, Chuxin Wang et al.
Golden Noise for Diffusion Models: A Learning Framework
Zikai Zhou, Shitong Shao, Lichen Bai et al.
Vision-Language Interactive Relation Mining for Open-Vocabulary Scene Graph Generation
Yukuan Min, Muli Yang, Jinhao Zhang et al.
OrderChain: Towards General Instruct-Tuning for Stimulating the Ordinal Understanding Ability of MLLM
Jinhong Wang, Shuo Tong, Jintai Chen et al.
Unified Open-World Segmentation with Multi-Modal Prompts
Yang Liu, Yufei Yin, Chenchen Jing et al.
Neurons: Emulating the Human Visual Cortex Improves Fidelity and Interpretability in fMRI-to-Video Reconstruction
Haonan Wang, Qixiang Zhang, Lehan Wang et al.
Outdoor Monocular SLAM with Global Scale-Consistent 3D Gaussian Pointmaps
Chong Cheng, Sicheng Yu, Zijian Wang et al.
LayerAnimate: Layer-level Control for Animation
Yuxue Yang, Lue Fan, Zuzeng Lin et al.
LHM: Large Animatable Human Reconstruction Model for Single Image to 3D in Seconds
Lingteng Qiu, Xiaodong Gu, Peihao Li et al.