Most Cited CVPR "active vision" Papers
5,589 papers found
A3: Few-shot Prompt Learning of Unlearnable Examples with Cross-Modal Adversarial Feature Alignment
Xuan Wang, Xitong Gao, Dongping Liao et al.
HybridMQA: Exploring Geometry-Texture Interactions for Colored Mesh Quality Assessment
Armin Shafiee Sarvestani, Sheyang Tang, Zhou Wang
MaRI: Material Retrieval Integration across Domains
Jianhui Wang, Zhifei Yang, Yangfan He et al.
JDEC: JPEG Decoding via Enhanced Continuous Cosine Coefficients
Woo Kyoung Han, Sunghoon Im, Jaedeok Kim et al.
SGFormer: Satellite-Ground Fusion for 3D Semantic Scene Completion
Xiyue Guo, Jiarui Hu, Junjie Hu et al.
Efficient Video Face Enhancement with Enhanced Spatial-Temporal Consistency
Yutong Wang, Jiajie Teng, Jiajiong Cao et al.
UniPTS: A Unified Framework for Proficient Post-Training Sparsity
JingJing Xie, Yuxin Zhang, Mingbao Lin et al.
PoseTraj: Pose-Aware Trajectory Control in Video Diffusion
Longbin Ji, Lei Zhong, Pengfei Wei et al.
EigenGS Representation: From Eigenspace to Gaussian Image Space
Lo-Wei Tai, Ching-En Li et al.
IM-Zero: Instance-level Motion Controllable Video Generation in a Zero-shot Manner
Yuyang Huang, Yabo Chen, Li Ding et al.
Watermarking One for All: A Robust Watermarking Scheme Against Partial Image Theft
Gaozhi Liu, Silu Cao, Zhenxing Qian et al.
Is 'Right' Right? Enhancing Object Orientation Understanding in Multimodal Large Language Models through Egocentric Instruction Tuning
JiHyeok Jung, EunTae Kim, SeoYeon Kim et al.
FaceCom: Towards High-fidelity 3D Facial Shape Completion via Optimization and Inpainting Guidance
Yinglong Li, Hongyu Wu, Wang et al.
Decoder Gradient Shield: Provable and High-Fidelity Prevention of Gradient-Based Box-Free Watermark Removal
Haonan An, Guang Hua, Zhengru Fang et al.
Learning from Synchronization: Self-Supervised Uncalibrated Multi-View Person Association in Challenging Scenes
Keqi Chen, Vinkle Srivastav, Didier Mutter et al.
Decoupling Training-Free Guided Diffusion by ADMM
Youyuan Zhang, Zehua Liu, Zenan Li et al.
CMMLoc: Advancing Text-to-PointCloud Localization with Cauchy-Mixture-Model Based Framework
Yanlong Xu, Haoxuan Qu, Jun Liu et al.
Fine-Grained Image-Text Correspondence with Cost Aggregation for Open-Vocabulary Part Segmentation
Jiho Choi, Seonho Lee, Minhyun Lee et al.
Search and Detect: Training-Free Long Tail Object Detection via Web-Image Retrieval
Mankeerat Sidhu, Hetarth Chopra, Ansel Blume et al.
Classifier-to-Bias: Toward Unsupervised Automatic Bias Detection for Visual Classifiers
Quentin Guimard, Moreno D'Incà, Massimiliano Mancini et al.
Multi-focal Conditioned Latent Diffusion for Person Image Synthesis
Jiaqi Liu, Jichao Zhang, Paolo Rota et al.
Diffusion-based Realistic Listening Head Generation via Hybrid Motion Modeling
Yinuo Wang, Yanbo Fan, Xuan Wang et al.
Zero-Shot Blind-spot Image Denoising via Implicit Neural Sampling
Yuhui Quan, Tianxiang Zheng, Zhiyuan Ma et al.
TailedCore: Few-Shot Sampling for Unsupervised Long-Tail Noisy Anomaly Detection
Yoon Gyo Jung, Jaewoo Park, Jaeho Yoon et al.
Previously on ... From Recaps to Story Summarization
Aditya Kumar Singh, Dhruv Srivastava, Makarand Tapaswi
DOF-GS: Adjustable Depth-of-Field 3D Gaussian Splatting for Post-Capture Refocusing, Defocus Rendering and Blur Removal
Yujie Wang, Praneeth Chakravarthula, Baoquan Chen
3D-AVS: LiDAR-based 3D Auto-Vocabulary Segmentation
Weijie Wei, Osman Ülger, Fatemeh Karimi Nejadasl et al.
Scale Efficient Training for Large Datasets
Qing Zhou, Junyu Gao, Qi Wang
Pioneering 4-Bit FP Quantization for Diffusion Models: Mixup-Sign Quantization and Timestep-Aware Fine-Tuning
Maosen Zhao, Pengtao Chen, Chong Yu et al.
Degrees of Freedom Matter: Inferring Dynamics from Point Trajectories
Yan Zhang, Sergey Prokudin, Marko Mihajlovic et al.
Empowering Large Language Models with 3D Situation Awareness
Zhihao Yuan, Yibo Peng, Jinke Ren et al.
Learnable Infinite Taylor Gaussian for Dynamic View Rendering
Bingbing Hu, Yanyan Li, Rui Xie et al.
ArcPro: Architectural Programs for Structured 3D Abstraction of Sparse Points
Qirui Huang, Runze Zhang, Kangjun Liu et al.
Reanimating Images using Neural Representations of Dynamic Stimuli
Jacob Yeung, Andrew Luo, Gabriel Sarch et al.
GRAPHGPT-O: Synergistic Multimodal Comprehension and Generation on Graphs
Yi Fang, Bowen Jin, Jiacheng Shen et al.
Comprehensive Relighting: Generalizable and Consistent Monocular Human Relighting and Harmonization
Junying Wang, Jingyuan Liu, Xin Sun et al.
PGC: Physics-Based Gaussian Cloth from a Single Pose
Michelle Guo, Matt Jen-Yuan Chiang, Igor Santesteban et al.
Making Old Film Great Again: Degradation-aware State Space Model for Old Film Restoration
Yudong Mao, Hao Luo, Zhiwei Zhong et al.
Which Viewpoint Shows it Best? Language for Weakly Supervising View Selection in Multi-view Instructional Videos
Sagnik Majumder, Tushar Nagarajan, Ziad Al-Halah et al.
FilmComposer: LLM-Driven Music Production for Silent Film Clips
Zhifeng Xie, Qile He, Youjia Zhu et al.
Distilled Datamodel with Reverse Gradient Matching
Jingwen Ye, Ruonan Yu, Songhua Liu et al.
MuTri: Multi-view Tri-alignment for OCT to OCTA 3D Image Translation
Zhuangzhuang Chen, Hualiang Wang, Chubin Ou et al.
TutteNet: Injective 3D Deformations by Composition of 2D Mesh Deformations
Bo Sun, Thibault Groueix, Chen Song et al.
Pixel-aligned RGB-NIR Stereo Imaging and Dataset for Robot Vision
Jinnyeong Kim, Seung-Hwan Baek
One-shot 3D Object Canonicalization based on Geometric and Semantic Consistency
Li Jin, Yujie Wang, Wenzheng Chen et al.
Differentiable Display Photometric Stereo
Seokjun Choi, Seungwoo Yoon, Giljoo Nam et al.
Enhanced Visual-Semantic Interaction with Tailored Prompts for Pedestrian Attribute Recognition
Junyi Wu, Yan Huang, Min Gao et al.
I2VGuard: Safeguarding Images against Misuse in Diffusion-based Image-to-Video Models
Dongnan Gui, Xun Guo, Wengang Zhou et al.
SceneCrafter: Controllable Multi-View Driving Scene Editing
Zehao Zhu, Yuliang Zou, Chiyu “Max” Jiang et al.
DoF-Gaussian: Controllable Depth-of-Field for 3D Gaussian Splatting
Liao Shen, Tianqi Liu, Huiqiang Sun et al.
ImViD: Immersive Volumetric Videos for Enhanced VR Engagement
Zhengxian Yang, Shi Pan, Shengqi Wang et al.
UrbanCAD: Towards Highly Controllable and Photorealistic 3D Vehicles for Urban Scene Simulation
Yichong Lu, Yichi Cai, Shangzhan Zhang et al.
Supervising Sound Localization by In-the-wild Egomotion
Anna Min, Ziyang Chen, Hang Zhao et al.
Towards Consistent Multi-Task Learning: Unlocking the Potential of Task-Specific Parameters
Xiaohan Qin, Xiaoxing Wang, Junchi Yan
Recognition-Synergistic Scene Text Editing
Zhengyao Fang, Pengyuan Lyu, Jingjing Wu et al.
RL-RC-DoT: A Block-level RL agent for Task-Aware Video Compression
Uri Gadot, Shie Mannor, Assaf Shocher et al.
MV-SSM: Multi-View State Space Modeling for 3D Human Pose Estimation
Aviral Chharia, Wenbo Gou, Haoye Dong
Towards Unbiased and Robust Spatio-Temporal Scene Graph Generation and Anticipation
Rohith Peddi, Saurabh, Ayush Abhay Shrivastava et al.
HumanMM: Global Human Motion Recovery from Multi-shot Videos
Yuhong Zhang, Guanlin Wu, Ling-Hao Chen et al.
AudCast: Audio-Driven Human Video Generation by Cascaded Diffusion Transformers
Jiazhi Guan, Kaisiyuan Wang, Zhiliang Xu et al.
NADER: Neural Architecture Design via Multi-Agent Collaboration
Zekang Yang, Wang Zeng, Sheng Jin et al.
A Flag Decomposition for Hierarchical Datasets
Nathan Mankovich, Ignacio Santamaria, Gustau Camps-Valls et al.
KP-RED: Exploiting Semantic Keypoints for Joint 3D Shape Retrieval and Deformation
Ruida Zhang, Chenyangguang Zhang, Yan Di et al.
From Head to Tail: Towards Balanced Representation in Large Vision-Language Models through Adaptive Data Calibration
Mingyang Song, Xiaoye Qu, Jiawei Zhou et al.
A Lightweight UDF Learning Framework for 3D Reconstruction Based on Local Shape Functions
Jiangbei Hu, Yanggeng Li, Fei Hou et al.
Pick-or-Mix: Dynamic Channel Sampling for ConvNets
Ashish Kumar, Daneul Kim, Jaesik Park et al.
Face Forgery Video Detection via Temporal Forgery Cue Unraveling
Zonghui Guo, YingJie Liu, Jie Zhang et al.
BACON: Improving Clarity of Image Captions via Bag-of-Concept Graphs
Zhantao Yang, Ruili Feng, Keyu Yan et al.
TAMT: Temporal-Aware Model Tuning for Cross-Domain Few-Shot Action Recognition
Yilong Wang, Zilin Gao, Qilong Wang et al.
Uncertainty Weighted Gradients for Model Calibration
Jinxu Lin, Linwei Tao, Minjing Dong et al.
Identity-Clothing Similarity Modeling for Unsupervised Clothing Change Person Re-Identification
Zhiqi Pang, Junjie Wang, Lingling Zhao et al.
Gen3DEval: Using vLLMs for Automatic Evaluation of Generated 3D Objects
Shalini Maiti, Lourdes Agapito, Filippos Kokkinos
Sensitivity-Aware Efficient Fine-Tuning via Compact Dynamic-Rank Adaptation
Tianran Chen, Jiarui Chen, Baoquan Zhang et al.
Less Attention is More: Prompt Transformer for Generalized Category Discovery
Wei Zhang, Baopeng Zhang, Zhu Teng et al.
DeCafNet: Delegate and Conquer for Efficient Temporal Grounding in Long Videos
Zijia Lu, ASM Iftekhar, Gaurav Mittal et al.
FeedEdit: Text-Based Image Editing with Dynamic Feedback Regulation
Fengyi Fu, Lei Zhang, Mengqi Huang et al.
Not Only Text: Exploring Compositionality of Visual Representations in Vision-Language Models
Davide Berasi, Matteo Farina, Massimiliano Mancini et al.
Preserve or Modify? Context-Aware Evaluation for Balancing Preservation and Modification in Text-Guided Image Editing
Yoonjeon Kim, Soohyun Ryu, Yeonsung Jung et al.
Leak and Learn: An Attacker's Cookbook to Train Using Leaked Data from Federated Learning
Joshua C. Zhao, Ahaan Dabholkar, Atul Sharma et al.
BrepGiff: Lightweight Generation of Complex B-rep with 3D GAT Diffusion
Hao Guo, Xiaoshui Huang, Hao Jiacheng et al.
Resilient Sensor Fusion Under Adverse Sensor Failures via Multi-Modal Expert Fusion
Konyul Park, Yecheol Kim, Daehun Kim et al.
PS-Diffusion: Photorealistic Subject-Driven Image Editing with Disentangled Control and Attention
Weicheng Wang, Guoli Jia, Zhongqi Zhang et al.
Hierarchical Features Matter: A Deep Exploration of Progressive Parameterization Method for Dataset Distillation
Xinhao Zhong, Hao Fang, Bin Chen et al.
DriveScape: High-Resolution Driving Video Generation by Multi-View Feature Fusion
Wei Wu, Xi Guo, Weixuan Tang et al.
SonicVisionLM: Playing Sound with Vision Language Models
Zhifeng Xie, Shengye Yu, Qile He et al.
Leveraging SD Map to Augment HD Map-based Trajectory Prediction
Zhiwei Dong, Ran Ding, Wei Li et al.
SUM Parts: Benchmarking Part-Level Semantic Segmentation of Urban Meshes
Weixiao Gao, Liangliang Nan, Hugo Ledoux
VODiff: Controlling Object Visibility Order in Text-to-Image Generation
Dong Liang, Jinyuan Jia, Yuhao Liu et al.
Self-Supervised Dual Contouring
Ramana Sundararaman, Roman Klokov, Maks Ovsjanikov
Optimizing for the Shortest Path in Denoising Diffusion Model
Ping Chen, Xingpeng Zhang, Zhaoxiang Liu et al.
ICP: Immediate Compensation Pruning for Mid-to-high Sparsity
Xin Luo, Fu Xueming, Zihang Jiang et al.
Wavelet and Prototype Augmented Query-based Transformer for Pixel-level Surface Defect Detection
Feng Yan, Xiaoheng Jiang, Yang Lu et al.
Believing is Seeing: Unobserved Object Detection using Generative Models
Subhransu S. Bhattacharjee, Dylan Campbell, Rahul Shome
Your Scale Factors are My Weapon: Targeted Bit-Flip Attacks on Vision Transformers via Scale Factor Manipulation
Jialai Wang, Yuxiao Wu, Weiye Xu et al.
In-distribution Public Data Synthesis with Diffusion Models for Differentially Private Image Classification
Jinseong Park, Yujin Choi, Jaewook Lee
GPAvatar: High-fidelity Head Avatars by Learning Efficient Gaussian Projections
Weiqi Feng, Dong Han, Zekang Zhou et al.
Wav2Sem: Plug-and-Play Audio Semantic Decoupling for 3D Speech-Driven Facial Animation
Hao Li, Ju Dai, Xin Zhao et al.
Charm: The Missing Piece in ViT Fine-Tuning for Image Aesthetic Assessment
Fatemeh Behrad, Tinne Tuytelaars, Johan Wagemans
ComRoPE: Scalable and Robust Rotary Position Embedding Parameterized by Trainable Commuting Angle Matrices
Hao Yu, Tangyu Jiang, Shuning Jia et al.
Antidote: A Unified Framework for Mitigating LVLM Hallucinations in Counterfactual Presupposition and Object Perception
Yuanchen Wu, Lu Zhang, Hang Yao et al.
UNICL-SAM: Uncertainty-Driven In-Context Segmentation with Part Prototype Discovery
Dianmo Sheng, Dongdong Chen, Zhentao Tan et al.
PI-HMR: Towards Robust In-bed Temporal Human Shape Reconstruction with Contact Pressure Sensing
Ziyu Wu, Yufan Xiong, Mengting Niu et al.
Fast and Accurate Gigapixel Pathological Image Classification with Hierarchical Distillation Multi-Instance Learning
Jiuyang Dong, Junjun Jiang, Kui Jiang et al.
TimeTracker: Event-based Continuous Point Tracking for Video Frame Interpolation with Non-linear Motion
Haoyue Liu, Jinghan Xu, Yi Chang et al.
Image Referenced Sketch Colorization Based on Animation Creation Workflow
Dingkun Yan, Xinrui Wang, Zhuoru Li et al.
HoGS: Unified Near and Far Object Reconstruction via Homogeneous Gaussian Splatting
Xinpeng Liu, Zeyi Huang, Fumio Okura et al.
Parameterized Blur Kernel Prior Learning for Local Motion Deblurring
Zhenxuan Fang, Fangfang Wu, Tao Huang et al.
EDCFlow: Exploring Temporally Dense Difference Maps for Event-based Optical Flow Estimation
Daikun Liu, Lei Cheng, Teng Wang et al.
OODD: Test-time Out-of-Distribution Detection with Dynamic Dictionary
Yifeng Yang, Lin Zhu, Zewen Sun et al.
Efficient Long Video Tokenization via Coordinate-based Patch Reconstruction
Huiwon Jang, Sihyun Yu, Jinwoo Shin et al.
Gyro-based Neural Single Image Deblurring
Heemin Yang, Jaesung Rim, Seungyong Lee et al.
UniPhy: Learning a Unified Constitutive Model for Inverse Physics Simulation
Himangi Mittal, Peiye Zhuang, Hsin-Ying Lee et al.
FedCALM: Conflict-aware Layer-wise Mitigation for Selective Aggregation in Deeper Personalized Federated Learning
Hao Zheng, Zhigang Hu, Boyu Wang et al.
FIction: 4D Future Interaction Prediction from Video
Kumar Ashutosh, Georgios Pavlakos, Kristen Grauman
What Makes a Good Dataset for Knowledge Distillation?
Logan Frank, Jim Davis
Six-CD: Benchmarking Concept Removals for Text-to-image Diffusion Models
Jie Ren, Kangrui Chen, Yingqian Cui et al.
Instantaneous Perception of Moving Objects in 3D
Di Liu, Bingbing Zhuang, Dimitris N. Metaxas et al.
Bridge the Gap: From Weak to Full Supervision for Temporal Action Localization with PseudoFormer
Ziyi Liu, Yangcen Liu
Balanced Rate-Distortion Optimization in Learned Image Compression
Yichi Zhang, Zhihao Duan, Yuning Huang et al.
Tightening Robustness Verification of MaxPool-based Neural Networks via Minimizing the Over-Approximation Zone
Yuan Xiao, Yuchen Chen, Shiqing Ma et al.
Silence is Golden: Leveraging Adversarial Examples to Nullify Audio Control in LDM-based Talking-Head Generation
Yuan Gan, Jiaxu Miao, Yunze Wang et al.
Detecting Adversarial Data Using Perturbation Forgery
Qian Wang, Chen Li, Yuchen Luo et al.
GO-N3RDet: Geometry Optimized NeRF-enhanced 3D Object Detector
Zechuan Li, Hongshan Yu, Yihao Ding et al.
Annotation Ambiguity Aware Semi-Supervised Medical Image Segmentation
Suruchi Kumari, Pravendra Singh
Notes-guided MLLM Reasoning: Enhancing MLLM with Knowledge and Visual Notes for Visual Question Answering
Wenlong Fang, Qiaofeng Wu, Jing Chen et al.
A Hubness Perspective on Representation Learning for Graph-Based Multi-View Clustering
Zheming Xu, He Liu, Congyan Lang et al.
BWFormer: Building Wireframe Reconstruction from Airborne LiDAR Point Cloud with Transformer
Yuzhou Liu, Lingjie Zhu, Hanqiao Ye et al.
DynRefer: Delving into Region-level Multimodal Tasks via Dynamic Resolution
Yuzhong Zhao, Feng Liu, Yue Liu et al.
A Tale of Two Classes: Adapting Supervised Contrastive Learning to Binary Imbalanced Datasets
David Mildenberger, Paul Hager, Daniel Rueckert et al.
SLVR: Super-Light Visual Reconstruction via Blueprint Controllable Convolutions and Exploring Feature Diversity Representation
Ning Ni, Libao Zhang
Vision-Language Embodiment for Monocular Depth Estimation
Jinchang Zhang, Guoyu Lu
All Rivers Run to the Sea: Private Learning with Asymmetric Flows
Yue Niu, Ramy E. Ali, Saurav Prakash et al.
WeakMCN: Multi-task Collaborative Network for Weakly Supervised Referring Expression Comprehension and Segmentation
Silin Cheng, Yang Liu, Xinwei He et al.
Are Spatial-Temporal Graph Convolution Networks for Human Action Recognition Over-Parameterized?
Jianyang Xie, Yitian Zhao, Yanda Meng et al.
Differentiable Inverse Rendering with Interpretable Basis BRDFs
Hoon-Gyu Chung, Seokjun Choi, Seung-Hwan Baek
Mono2Stereo: A Benchmark and Empirical Study for Stereo Conversion
Songsong Yu, Yuxin Chen, Zhongang Qi et al.
Masking meets Supervision: A Strong Learning Alliance
Byeongho Heo, Taekyung Kim, Sangdoo Yun et al.
See Further When Clear: Curriculum Consistency Model
Yunpeng Liu, Boxiao Liu, Yi Zhang et al.
Adaptive Markup Language Generation for Contextually-Grounded Visual Document Understanding
Han Xiao, Yina Xie, Guanxin Tan et al.
ODE: Open-Set Evaluation of Hallucinations in Multimodal Large Language Models
Yahan Tu, Rui Hu, Jitao Sang
Olympus: A Universal Task Router for Computer Vision Tasks
Yuanze Lin, Yunsheng Li, Dongdong Chen et al.
Articulated Kinematics Distillation from Video Diffusion Models
Xuan Li, Qianli Ma, Tsung-Yi Lin et al.
D2SP: Dynamic Dual-Stage Purification Framework for Dual Noise Mitigation in Vision-based Affective Recognition
Haoran Wang, Xinji Mai, Zeng Tao et al.
Zero-shot RGB-D Point Cloud Registration with Pre-trained Large Vision Model
Haobo Jiang, Jin Xie, Jian Yang et al.
Selective Interpretable and Motion Consistent Privacy Attribute Obfuscation for Action Recognition
Filip Ilic, He Zhao, Thomas Pock et al.
Distilling Spatially-Heterogeneous Distortion Perception for Blind Image Quality Assessment
Xudong Li, Wenjie Nie, Yan Zhang et al.
OW-OVD: Unified Open World and Open Vocabulary Object Detection
Xing Xi, Yangyang Huang, Ronghua Luo et al.
Adaptive Dropout: Unleashing Dropout across Layers for Generalizable Image Super-Resolution
Hang Xu, Jie Huang, Wei Yu et al.
AeSPa: Attention-guided Self-supervised Parallel Imaging for MRI Reconstruction
Jinho Joo, Hyeseong Kim, Hyeyeon Won et al.
TopNet: Transformer-Efficient Occupancy Prediction Network for Octree-Structured Point Cloud Geometry Compression
Xinjie Wang, Yifan Zhang, Ting Liu et al.
Improve Representation for Imbalanced Regression through Geometric Constraints
Zijian Dong, Yilei Wu, Chongyao Chen et al.
ControlFace: Harnessing Facial Parametric Control for Face Rigging
Wooseok Jang, Youngjun Hong, Geonho Cha et al.
Compositional Caching for Training-free Open-vocabulary Attribute Detection
Marco Garosi, Alessandro Conti, Gaowen Liu et al.
Shading Meets Motion: Self-supervised Indoor 3D Reconstruction Via Simultaneous Shape-from-Shading and Structure-from-Motion
Guoyu Lu
DocSAM: Unified Document Image Segmentation via Query Decomposition and Heterogeneous Mixed Learning
Xiao-Hui Li, Fei Yin, Cheng-Lin Liu
NSD-Imagery: A Benchmark Dataset for Extending fMRI Vision Decoding Methods to Mental Imagery
Reese Kneeland, Paul Scotti, Ghislain St-Yves et al.
CustomKD: Customizing Large Vision Foundation for Edge Model Improvement via Knowledge Distillation
Jungsoo Lee, Debasmit Das, Munawar Hayat et al.
Towards Understanding How Knowledge Evolves in Large Vision-Language Models
Sudong Wang, Yunjian Zhang, Yao Zhu et al.
Cheb-GR: Rethinking K-nearest Neighbor Search in Re-ranking for Person Re-identification
Jinxi Yang, He Li, Bo Du et al.
Efficient Video Super-Resolution for Real-time Rendering with Decoupled G-buffer Guidance
Mingjun Zheng, Long Sun, Jiangxin Dong et al.
Positive2Negative: Breaking the Information-Lossy Barrier in Self-Supervised Single Image Denoising
Tong Li, Lizhi Wang, Zhiyuan Xu et al.
Rethinking Decoder Design: Improving Biomarker Segmentation Using Depth-to-Space Restoration and Residual Linear Attention
Saad Wazir, Daeyoung Kim
CoSpace: Benchmarking Continuous Space Perception Ability for Vision-Language Models
Yiqi Zhu, Ziyue Wang, Can Zhang et al.
RivuletMLP: An MLP-based Architecture for Efficient Compressed Video Quality Enhancement
Gang He, Weiran Wang, Guancheng Quan et al.
Acquire and then Adapt: Squeezing out Text-to-Image Model for Image Restoration
Junyuan Deng, Xinyi Wu, Yongxing Yang et al.
SMILE: Infusing Spatial and Motion Semantics in Masked Video Learning
Fida Mohammad Thoker, Letian Jiang, Chen Zhao et al.
Spherical Manifold Guided Diffusion Model for Panoramic Image Generation
Xiancheng Sun, Mai Xu, Shengxi Li et al.
Anatomical Consistency and Adaptive Prior-informed Transformation for Multi-contrast MR Image Synthesis via Diffusion Model
Yejee Shin, Yeeun Lee, Hanbyol Jang et al.
SMTPD: A New Benchmark for Temporal Prediction of Social Media Popularity
Yijie Xu, Bolun Zheng, Wei Zhu et al.
Decouple Distortion from Perception: Region Adaptive Diffusion for Extreme-low Bitrate Perception Image Compression
Jinchang Xu, Shaokang Wang, Jintao Chen et al.
Towards Efficient Foundation Model for Zero-shot Amodal Segmentation
Zhaochen Liu, Limeng Qiao, Xiangxiang Chu et al.
FedSPA: Generalizable Federated Graph Learning under Homophily Heterogeneity
Zihan Tan, Guancheng Wan, Wenke Huang et al.
WALT3D: Generating Realistic Training Data from Time-Lapse Imagery for Reconstructing Dynamic Objects Under Occlusion
Khiem Vuong, N. Dinesh Reddy, Robert Tamburo et al.
Nearly Zero-Cost Protection Against Mimicry by Personalized Diffusion Models
Namhyuk Ahn, KiYoon Yoo, Wonhyuk Ahn et al.
Learning Phase Distortion with Selective State Space Models for Video Turbulence Mitigation
Xingguang Zhang, Nicholas M Chimitt, Xijun Wang et al.
FiRe: Fixed-points of Restoration Priors for Solving Inverse Problems
Matthieu Terris, Ulugbek Kamilov, Thomas Moreau
Towards High-fidelity 3D Talking Avatar with Personalized Dynamic Texture
Xuanchen Li, Jianyu Wang, Yuhao Cheng et al.
Infighting in the Dark: Multi-Label Backdoor Attack in Federated Learning
Ye Li, Yanchao Zhao, Chengcheng Zhu et al.
DiffCAM: Data-Driven Saliency Maps by Capturing Feature Differences
Xingjian Li, Qiming Zhao, Neelesh Bisht et al.
SynTab-LLaVA: Enhancing Multimodal Table Understanding with Decoupled Synthesis
Bangbang Zhou, Zuan Gao, Zixiao Wang et al.
Beyond Human Perception: Understanding Multi-Object World from Monocular View
Keyu Guo, Yongle Huang, Shijie Sun et al.
Hybrid Concept Bottleneck Models
Yang Liu, Tianwei Zhang, Shi Gu
Optimal Transport-Guided Source-Free Adaptation for Face Anti-Spoofing
Zhuowei Li, Tianchen Zhao, Xiang Xu et al.
CorrBEV: Multi-View 3D Object Detection by Correlation Learning with Multi-modal Prototypes
Ziteng Xue, Mingzhe Guo, Heng Fan et al.
BOE-ViT: Boosting Orientation Estimation with Equivariance in Self-Supervised 3D Subtomogram Alignment
Runmin Jiang, Jackson Daggett, Shriya Pingulkar et al.
PoseBH: Prototypical Multi-Dataset Training Beyond Human Pose Estimation
Uyoung Jeong, Jonathan Freer, Seungryul Baek et al.
MDP: Multidimensional Vision Model Pruning with Latency Constraint
Xinglong Sun, Barath Lakshmanan, Maying Shen et al.
Hyperbolic Uncertainty-Aware Few-Shot Incremental Point Cloud Segmentation
Tanuj Sur, Samrat Mukherjee, Kaizer Rahaman et al.
Parameter Efficient Mamba Tuning via Projector-targeted Diagonal-centric Linear Transformation
Seokil Ham, Hee-Seon Kim, Sangmin Woo et al.
Compass Control: Multi Object Orientation Control for Text-to-Image Generation
Rishubh Parihar, Vaibhav Agrawal, Sachidanand VS et al.
Perceptual Video Compression with Neural Wrapping
Muhammad Umar Karim Khan, Aaron Chadha, Mohammad Ashraful Anam et al.
DH-Set: Improving Vision-Language Alignment with Diverse and Hybrid Set-Embeddings Learning
Kun Zhang, Jingyu Li, Zhe Li et al.
Exploring Semantic Feature Discrimination for Perceptual Image Super-Resolution and Opinion-Unaware No-Reference Image Quality Assessment
Guanglu Dong, Xiangyu Liao, Mingyang Li et al.
Three Cars Approaching within 100m! Enhancing Distant Geometry by Tri-Axis Voxel Scanning for Camera-based Semantic Scene Completion
Jongseong Bae, Junwoo Ha, Ha Young Kim
LOCORE: Image Re-ranking with Long-Context Sequence Modeling
Zilin Xiao, Pavel Suma, Ayush Sachdeva et al.
Sample- and Parameter-Efficient Auto-Regressive Image Models
Elad Amrani, Leonid Karlinsky, Alex M. Bronstein
Deep Fair Multi-View Clustering with Attention KAN
HaiMing Xu, Qianqian Wang, Boyue Wang et al.
Distinguish Then Exploit: Source-free Open Set Domain Adaptation via Weight Barcode Estimation and Sparse Label Assignment
Weiming Liu, Jun Dan, Fan Wang et al.
Rethinking Token Reduction with Parameter-Efficient Fine-Tuning in ViT for Pixel-Level Tasks
Cheng Lei, Ao Li, Hu Yao et al.
Reasoning Mamba: Hypergraph-Guided Region Relation Calculating for Weakly Supervised Affordance Grounding
Yuxuan Wang, Aming Wu, Muli Yang et al.