Most Cited CVPR "unknown object discovery" Papers
5,589 papers found • Page 13 of 28
Conference
ATA: Adaptive Transformation Agent for Text-Guided Subject-Position Variable Background Inpainting
Yizhe Tang, Zhimin Sun, Yuzhen Du et al.
PMA: Towards Parameter-Efficient Point Cloud Understanding via Point Mamba Adapter
Yaohua Zha, Yanzi Wang, Hang Guo et al.
4D-Fly: Fast 4D Reconstruction from a Single Monocular Video
Diankun Wu, Fangfu Liu, Yi-Hsin Hung et al.
Scaling Down Text Encoders of Text-to-Image Diffusion Models
Lifu Wang, Daqing Liu, Xinchen Liu et al.
Simpler Diffusion: 1.5 FID on ImageNet512 with Pixel-space Diffusion
Emiel Hoogeboom, Thomas Mensink, Jonathan Heek et al.
PolarMatte: Fully Computational Ground-Truth-Quality Alpha Matte Extraction for Images and Video using Polarized Screen Matting
Kenji Enomoto, TJ Rhodes, Brian Price et al.
FFR: Frequency Feature Rectification for Weakly Supervised Semantic Segmentation
Ziqian Yang, Xinqiao Zhao, Xiaolei Wang et al.
Action Detail Matters: Refining Video Recognition with Local Action Queries
Mengmeng Wang, Zeyi Huang, Xiangjie Kong et al.
From Laboratory to Real World: A New Benchmark Towards Privacy-Preserved Visible-Infrared Person Re-Identification
Yan Jiang, Hao Yu, Xu Cheng et al.
LibraGrad: Balancing Gradient Flow for Universally Better Vision Transformer Attributions
Faridoun Mehri, Mahdieh Baghshah, Mohammad Taher Pilehvar
Referring Expression Counting
Siyang Dai, Jun Liu, Ngai-Man Cheung
DFM: Differentiable Feature Matching for Anomaly Detection
Wu Sheng, Yimi Wang, Xudong Liu et al.
JDEC: JPEG Decoding via Enhanced Continuous Cosine Coefficients
Woo Kyoung Han, Sunghoon Im, Jaedeok Kim et al.
EvDiG: Event-guided Direct and Global Components Separation
xinyu zhou, Peiqi Duan, Boyu Li et al.
LogoSP: Local-global Grouping of Superpoints for Unsupervised Semantic Segmentation of 3D Point Clouds
Zihui Zhang, Weisheng Dai, Hongtao Wen et al.
Towards Understanding and Quantifying Uncertainty for Text-to-Image Generation
Gianni Franchi, Nacim Belkhir, Dat NGUYEN et al.
Conditional Balance: Improving Multi-Conditioning Trade-Offs in Image Generation
Nadav Z. Cohen, Oron Nir, Ariel Shamir
On the Generalization of Handwritten Text Recognition Models
Carlos Garrido-Munoz, Jorge Calvo-Zaragoza
StickMotion: Generating 3D Human Motions by Drawing a Stickman
Tao Wang, Zhihua Wu, Qiaozhi He et al.
A Regularization-Guided Equivariant Approach for Image Restoration
Yulu Bai, Jiahong Fu, Qi Xie et al.
SFDM: Robust Decomposition of Geometry and Reflectance for Realistic Face Rendering from Sparse-view Images
Daisheng Jin, Jiangbei Hu, Baixin Xu et al.
Finsler Multi-Dimensional Scaling: Manifold Learning for Asymmetric Dimensionality Reduction and Embedding
Thomas Dagès, Simon Weber, Ya-Wei Eileen Lin et al.
UniPTS: A Unified Framework for Proficient Post-Training Sparsity
JingJing Xie, Yuxin Zhang, Mingbao Lin et al.
Zero-1-to-A: Zero-Shot One Image to Animatable Head Avatars Using Video Diffusion
Zhenglin Zhou, Fan Ma, Hehe Fan et al.
FADE: Frequency-Aware Diffusion Model Factorization for Video Editing
Yixuan Zhu, Haolin Wang, Shilin Ma et al.
A3: Few-shot Prompt Learning of Unlearnable Examples with Cross-Modal Adversarial Feature Alignment
Xuan Wang, Xitong Gao, Dongping Liao et al.
HybridMQA: Exploring Geometry-Texture Interactions for Colored Mesh Quality Assessment
Armin Shafiee Sarvestani, Sheyang Tang, Zhou Wang
MaRI: Material Retrieval Integration across Domains
Jianhui Wang, Zhifei Yang, Yangfan He et al.
SGFormer: Satellite-Ground Fusion for 3D Semantic Scene Completion
Xiyue Guo, Jiarui Hu, Junjie Hu et al.
Efficient Video Face Enhancement with Enhanced Spatial-Temporal Consistency
Yutong Wang, Jiajie Teng, Jiajiong Cao et al.
PoseTraj: Pose-Aware Trajectory Control in Video Diffusion
longbin ji, Lei Zhong, Pengfei Wei et al.
EigenGS Representation: From Eigenspace to Gaussian Image Space
LO-WEI TAI, Ching-En Ching En, Li et al.
IM-Zero: Instance-level Motion Controllable Video Generation in a Zero-shot Manner
Yuyang Huang, Yabo Chen, Li Ding et al.
FaceCom: Towards High-fidelity 3D Facial Shape Completion via Optimization and Inpainting Guidance
Yinglong Li, Hongyu Wu, Wang et al.
Watermarking One for All: A Robust Watermarking Scheme Against Partial Image Theft
Gaozhi Liu, Silu Cao, Zhenxing Qian et al.
Is `Right' Right? Enhancing Object Orientation Understanding in Multimodal Large Language Models through Egocentric Instruction Tuning
JiHyeok Jung, EunTae Kim, SeoYeon Kim et al.
Decoder Gradient Shield: Provable and High-Fidelity Prevention of Gradient-Based Box-Free Watermark Removal
Haonan An, Guang Hua, Zhengru Fang et al.
Learning from Synchronization: Self-Supervised Uncalibrated Multi-View Person Association in Challenging Scenes
Keqi Chen, vinkle srivastav, Didier MUTTER et al.
Decoupling Training-Free Guided Diffusion by ADMM
Youyuan Zhang, Zehua Liu, Zenan Li et al.
CMMLoc: Advancing Text-to-PointCloud Localization with Cauchy-Mixture-Model Based Framework
Yanlong Xu, Haoxuan Qu, Jun Liu et al.
Fine-Grained Image-Text Correspondence with Cost Aggregation for Open-Vocabulary Part Segmentation
Jiho Choi, Seonho Lee, Minhyun Lee et al.
Search and Detect: Training-Free Long Tail Object Detection via Web-Image Retrieval
Mankeerat Sidhu, Hetarth Chopra, Ansel Blume et al.
Classifier-to-Bias: Toward Unsupervised Automatic Bias Detection for Visual Classifiers
Quentin Guimard, Moreno D'Incà, Massimiliano Mancini et al.
Multi-focal Conditioned Latent Diffusion for Person Image Synthesis
Jiaqi Liu, Jichao Zhang, Paolo Rota et al.
Diffusion-based Realistic Listening Head Generation via Hybrid Motion Modeling
Yinuo Wang, Yanbo Fan, Xuan Wang et al.
Zero-Shot Blind-spot Image Denoising via Implicit Neural Sampling
Yuhui Quan, Tianxiang Zheng, Zhiyuan Ma et al.
TailedCore: Few-Shot Sampling for Unsupervised Long-Tail Noisy Anomaly Detection
Yoon Gyo Jung, Jaewoo Park, Jaeho Yoon et al.
Previously on ... From Recaps to Story Summarization
Aditya Kumar Singh, Dhruv Srivastava, Makarand Tapaswi
DOF-GS: Adjustable Depth-of-Field 3D Gaussian Splatting for Post-Capture Refocusing, Defocus Rendering and Blur Removal
Yujie Wang, Praneeth Chakravarthula, Baoquan Chen
3D-AVS: LiDAR-based 3D Auto-Vocabulary Segmentation
Weijie Wei, Osman Ülger, Fatemeh Karimi Nejadasl et al.
Degrees of Freedom Matter: Inferring Dynamics from Point Trajectories
Yan Zhang, Sergey Prokudin, Marko Mihajlovic et al.
Scale Efficient Training for Large Datasets
Qing Zhou, Junyu Gao, Qi Wang
Self-Supervised Dual Contouring
Ramana Sundararaman, Roman Klokov, Maks Ovsjanikov
Pioneering 4-Bit FP Quantization for Diffusion Models: Mixup-Sign Quantization and Timestep-Aware Fine-Tuning
Maosen Zhao, Pengtao Chen, Chong Yu et al.
Empowering Large Language Models with 3D Situation Awareness
Zhihao Yuan, Yibo Peng, Jinke Ren et al.
Learnable Infinite Taylor Gaussian for Dynamic View Rendering
Bingbing Hu, Yanyan Li, rui xie et al.
ArcPro: Architectural Programs for Structured 3D Abstraction of Sparse Points
Qirui Huang, Runze Zhang, Kangjun Liu et al.
Reanimating Images using Neural Representations of Dynamic Stimuli
Jacob Yeung, Andrew Luo, Gabriel Sarch et al.
GRAPHGPT-O: Synergistic Multimodal Comprehension and Generation on Graphs
Yi Fang, Bowen Jin, Jiacheng Shen et al.
Comprehensive Relighting: Generalizable and Consistent Monocular Human Relighting and Harmonization
Junying Wang, Jingyuan Liu, Xin Sun et al.
PGC: Physics-Based Gaussian Cloth from a Single Pose
Michelle Guo, Matt Jen-Yuan Chiang, Igor Santesteban et al.
Distilled Datamodel with Reverse Gradient Matching
Jingwen Ye, Ruonan Yu, Songhua Liu et al.
Making Old Film Great Again: Degradation-aware State Space Model for Old Film Restoration
Yudong Mao, Hao Luo, Zhiwei Zhong et al.
Which Viewpoint Shows it Best? Language for Weakly Supervising View Selection in Multi-view Instructional Videos
Sagnik Majumder, Tushar Nagarajan, Ziad Al-Halah et al.
TutteNet: Injective 3D Deformations by Composition of 2D Mesh Deformations
Bo Sun, Thibault Groueix, Chen Song et al.
FilmComposer: LLM-Driven Music Production for Silent Film Clips
Zhifeng Xie, Qile He, Youjia Zhu et al.
MuTri: Multi-view Tri-alignment for OCT to OCTA 3D Image Translation
zhuangzhuang chen, hualiang wang, Chubin Ou et al.
Pixel-aligned RGB-NIR Stereo Imaging and Dataset for Robot Vision
Jinneyong Kim, Seung-Hwan Baek
Differentiable Display Photometric Stereo
Seokjun Choi, Seungwoo Yoon, Giljoo Nam et al.
One-shot 3D Object Canonicalization based on Geometric and Semantic Consistency
Li Jin, Yujie Wang, Wenzheng Chen et al.
Enhanced Visual-Semantic Interaction with Tailored Prompts for Pedestrian Attribute Recognition
Junyi Wu, Yan Huang, Min Gao et al.
I2VGuard: Safeguarding Images against Misuse in Diffusion-based Image-to-Video Models
Dongnan Gui, Xun Guo, Wengang Zhou et al.
SceneCrafter: Controllable Multi-View Driving Scene Editing
Zehao Zhu, Yuliang Zou, Chiyu “Max” Jiang et al.
DoF-Gaussian: Controllable Depth-of-Field for 3D Gaussian Splatting
Liao Shen, Tianqi Liu, Huiqiang Sun et al.
ImViD: Immersive Volumetric Videos for Enhanced VR Engagement
Zhengxian Yang, Shi Pan, Shengqi Wang et al.
UrbanCAD: Towards Highly Controllable and Photorealistic 3D Vehicles for Urban Scene Simulation
Yichong Lu, Yichi Cai, Shangzhan Zhang et al.
Supervising Sound Localization by In-the-wild Egomotion
Anna Min, Ziyang Chen, Hang Zhao et al.
Towards Consistent Multi-Task Learning: Unlocking the Potential of Task-Specific Parameters
Xiaohan Qin, Xiaoxing Wang, Junchi Yan
Recognition-Synergistic Scene Text Editing
Zhengyao Fang, Pengyuan Lyu, Jingjing Wu et al.
RL-RC-DoT: A Block-level RL agent for Task-Aware Video Compression
Uri Gadot, Shie Mannor, Assaf Shocher et al.
MV-SSM: Multi-View State Space Modeling for 3D Human Pose Estimation
Aviral Chharia, Wenbo Gou, Haoye Dong
KP-RED: Exploiting Semantic Keypoints for Joint 3D Shape Retrieval and Deformation
Ruida Zhang, Chenyangguang Zhang, Yan Di et al.
Towards Unbiased and Robust Spatio-Temporal Scene Graph Generation and Anticipation
Rohith Peddi, Saurabh ., Ayush Abhay Shrivastava et al.
HumanMM: Global Human Motion Recovery from Multi-shot Videos
Yuhong Zhang, Guanlin Wu, Ling-Hao Chen et al.
AudCast: Audio-Driven Human Video Generation by Cascaded Diffusion Transformers
Jiazhi Guan, Kaisiyuan Wang, Zhiliang Xu et al.
NADER: Neural Architecture Design via Multi-Agent Collaboration
Zekang Yang, Wang ZENG, Sheng Jin et al.
A Flag Decomposition for Hierarchical Datasets
Nathan Mankovich, Ignacio Santamaria, Gustau Camps-Valls et al.
From Head to Tail: Towards Balanced Representation in Large Vision-Language Models through Adaptive Data Calibration
Mingyang Song, Xiaoye Qu, Jiawei Zhou et al.
Pick-or-Mix: Dynamic Channel Sampling for ConvNets
Ashish Kumar, Daneul Kim, Jaesik Park et al.
A Lightweight UDF Learning Framework for 3D Reconstruction Based on Local Shape Functions
Jiangbei Hu, Yanggeng Li, Fei Hou et al.
Face Forgery Video Detection via Temporal Forgery Cue Unraveling
Zonghui Guo, YingJie Liu, Jie Zhang et al.
BACON: Improving Clarity of Image Captions via Bag-of-Concept Graphs
Zhantao Yang, Ruili Feng, Keyu Yan et al.
TAMT: Temporal-Aware Model Tuning for Cross-Domain Few-Shot Action Recognition
yilong wang, Zilin Gao, Qilong Wang et al.
Uncertainty Weighted Gradients for Model Calibration
Jinxu Lin, Linwei Tao, Minjing Dong et al.
Identity-Clothing Similarity Modeling for Unsupervised Clothing Change Person Re-Identification
Zhiqi Pang, Junjie Wang, Lingling Zhao et al.
Gen3DEval: Using vLLMs for Automatic Evaluation of Generated 3D Objects
Shalini Maiti, Lourdes Agapito, Filippos Kokkinos
Sensitivity-Aware Efficient Fine-Tuning via Compact Dynamic-Rank Adaptation
Tianran Chen, Jiarui Chen, Baoquan Zhang et al.
Less Attention is More: Prompt Transformer for Generalized Category Discovery
Wei Zhang, Baopeng Zhang, Zhu Teng et al.
DeCafNet: Delegate and Conquer for Efficient Temporal Grounding in Long Videos
Zijia Lu, ASM Iftekhar, Gaurav Mittal et al.
SonicVisionLM: Playing Sound with Vision Language Models
Zhifeng Xie, Shengye Yu, Qile He et al.
FeedEdit: Text-Based Image Editing with Dynamic Feedback Regulation
Fengyi Fu, Lei Zhang, Mengqi Huang et al.
Not Only Text: Exploring Compositionality of Visual Representations in Vision-Language Models
Davide Berasi, Matteo Farina, Massimiliano Mancini et al.
Preserve or Modify? Context-Aware Evaluation for Balancing Preservation and Modification in Text-Guided Image Editing
Yoonjeon Kim, Soohyun Ryu, Yeonsung Jung et al.
BrepGiff: Lightweight Generation of Complex B-rep with 3D GAT Diffusion
Hao Guo, Xiaoshui Huang, Hao jiacheng et al.
Resilient Sensor Fusion Under Adverse Sensor Failures via Multi-Modal Expert Fusion
Konyul Park, Yecheol Kim, Daehun Kim et al.
PS-Diffusion: Photorealistic Subject-Driven Image Editing with Disentangled Control and Attention
Weicheng Wang, Guoli Jia, Zhongqi Zhang et al.
Hierarchical Features Matter: A Deep Exploration of Progressive Parameterization Method for Dataset Distillation
Xinhao Zhong, Hao Fang, Bin Chen et al.
DriveScape: High-Resolution Driving Video Generation by Multi-View Feature Fusion
Wei Wu, Xi Guo, Weixuan TANG et al.
Leveraging SD Map to Augment HD Map-based Trajectory Prediction
Zhiwei Dong, Ran Ding, Wei Li et al.
SUM Parts: Benchmarking Part-Level Semantic Segmentation of Urban Meshes
Weixiao Gao, Liangliang Nan, Hugo Ledoux
VODiff: Controlling Object Visibility Order in Text-to-Image Generation
Dong Liang, Jinyuan Jia, Yuhao Liu et al.
Optimizing for the Shortest Path in Denoising Diffusion Model
Ping Chen, Xingpeng Zhang, Zhaoxiang Liu et al.
ICP: Immediate Compensation Pruning for Mid-to-high Sparsity
Xin Luo, Fu Xueming, Zihang Jiang et al.
Wavelet and Prototype Augmented Query-based Transformer for Pixel-level Surface Defect Detection
Feng Yan, Xiaoheng Jiang, Yang Lu et al.
Believing is Seeing: Unobserved Object Detection using Generative Models
Subhransu S. Bhattacharjee, Dylan Campbell, Rahul Shome
Your Scale Factors are My Weapon: Targeted Bit-Flip Attacks on Vision Transformers via Scale Factor Manipulation
Jialai Wang, Yuxiao Wu, Weiye Xu et al.
In-distribution Public Data Synthesis with Diffusion Models for Differentially Private Image Classification
Jinseong Park, Yujin Choi, Jaewook Lee
Instantaneous Perception of Moving Objects in 3D
Di Liu, Bingbing Zhuang, Dimitris N. Metaxas et al.
GPAvatar: High-fidelity Head Avatars by Learning Efficient Gaussian Projections
Weiqi Feng, Dong Han, Zekang Zhou et al.
Wav2Sem: Plug-and-Play Audio Semantic Decoupling for 3D Speech-Driven Facial Animation
Hao Li, Ju Dai, Xin Zhao et al.
Charm: The Missing Piece in ViT Fine-Tuning for Image Aesthetic Assessment
Fatemeh Behrad, Tinne Tuytelaars, Johan Wagemans
ComRoPE: Scalable and Robust Rotary Position Embedding Parameterized by Trainable Commuting Angle Matrices
Hao Yu, Tangyu Jiang, Shuning Jia et al.
Antidote: A Unified Framework for Mitigating LVLM Hallucinations in Counterfactual Presupposition and Object Perception
Yuanchen Wu, Lu Zhang, Hang Yao et al.
UNICL-SAM: Uncertainty-Driven In-Context Segmentation with Part Prototype Discovery
Dianmo Sheng, Dongdong Chen, Zhentao Tan et al.
PI-HMR: Towards Robust In-bed Temporal Human Shape Reconstruction with Contact Pressure Sensing
Ziyu Wu, Yufan Xiong, Mengting Niu et al.
Fast and Accurate Gigapixel Pathological Image Classification with Hierarchical Distillation Multi-Instance Learning
Jiuyang Dong, Junjun Jiang, Kui Jiang et al.
TimeTracker: Event-based Continuous Point Tracking for Video Frame Interpolation with Non-linear Motion
Haoyue Liu, Jinghan Xu, Yi Chang et al.
Image Referenced Sketch Colorization Based on Animation Creation Workflow
Dingkun Yan, Xinrui Wang, Zhuoru Li et al.
HoGS: Unified Near and Far Object Reconstruction via Homogeneous Gaussian Splatting
Xinpeng Liu, Zeyi Huang, Fumio Okura et al.
Parameterized Blur Kernel Prior Learning for Local Motion Deblurring
Zhenxuan Fang, Fangfang Wu, Tao Huang et al.
EDCFlow: Exploring Temporally Dense Difference Maps for Event-based Optical Flow Estimation
Daikun Liu, Lei Cheng, Teng Wang et al.
OODD: Test-time Out-of-Distribution Detection with Dynamic Dictionary
Yifeng Yang, Lin Zhu, Zewen Sun et al.
Gyro-based Neural Single Image Deblurring
Heemin Yang, Jaesung Rim, Seungyong Lee et al.
UniPhy: Learning a Unified Constitutive Model for Inverse Physics Simulation
Himangi Mittal, Peiye Zhuang, Hsin-Ying Lee et al.
FedCALM: Conflict-aware Layer-wise Mitigation for Selective Aggregation in Deeper Personalized Federated Learning
Hao Zheng, Zhigang Hu, Boyu Wang et al.
All Rivers Run to the Sea: Private Learning with Asymmetric Flows
Yue Niu, Ramy E. Ali, Saurav Prakash et al.
Tightening Robustness Verification of MaxPool-based Neural Networks via Minimizing the Over-Approximation Zone
Yuan Xiao, Yuchen Chen, Shiqing Ma et al.
FIction: 4D Future Interaction Prediction from Video
Kumar Ashutosh, Georgios Pavlakos, Kristen Grauman
What Makes a Good Dataset for Knowledge Distillation?
Logan Frank, Jim Davis
Six-CD: Benchmarking Concept Removals for Text-to-image Diffusion Models
Jie Ren, Kangrui Chen, Yingqian Cui et al.
Bridge the Gap: From Weak to Full Supervision for Temporal Action Localization with PseudoFormer
Ziyi Liu, Yangcen Liu
Balanced Rate-Distortion Optimization in Learned Image Compression
Yichi Zhang, Zhihao Duan, Yuning Huang et al.
Silence is Golden: Leveraging Adversarial Examples to Nullify Audio Control in LDM-based Talking-Head Generation
Yuan Gan, Jiaxu Miao, Yunze Wang et al.
Selective Interpretable and Motion Consistent Privacy Attribute Obfuscation for Action Recognition
Filip Ilic, He Zhao, Thomas Pock et al.
Detecting Adversarial Data Using Perturbation Forgery
Qian Wang, Chen Li, Yuchen Luo et al.
Notes-guided MLLM Reasoning: Enhancing MLLM with Knowledge and Visual Notes for Visual Question Answering
Wenlong Fang, Qiaofeng Wu, Jing Chen et al.
GO-N3RDet: Geometry Optimized NeRF-enhanced 3D Object Detector
Zechuan Li, Hongshan Yu, Yihao Ding et al.
Annotation Ambiguity Aware Semi-Supervised Medical Image Segmentation
Suruchi Kumari, Pravendra Singh
A Tale of Two Classes: Adapting Supervised Contrastive Learning to Binary Imbalanced Datasets
David Mildenberger, Paul Hager, Daniel Rueckert et al.
A Hubness Perspective on Representation Learning for Graph-Based Multi-View Clustering
Zheming Xu, He Liu, Congyan Lang et al.
BWFormer: Building Wireframe Reconstruction from Airborne LiDAR Point Cloud with Transformer
Yuzhou Liu, Lingjie Zhu, Hanqiao Ye et al.
DynRefer: Delving into Region-level Multimodal Tasks via Dynamic Resolution
Yuzhong Zhao, Feng Liu, Yue Liu et al.
Vision-Language Embodiment for Monocular Depth Estimation
Jinchang Zhang, Guoyu Lu
SLVR: Super-Light Visual Reconstruction via Blueprint Controllable Convolutions and Exploring Feature Diversity Representation
Ning Ni, Libao Zhang
WeakMCN: Multi-task Collaborative Network for Weakly Supervised Referring Expression Comprehension and Segmentation
Silin Cheng, Yang Liu, Xinwei He et al.
Are Spatial-Temporal Graph Convolution Networks for Human Action Recognition Over-Parameterized?
Jianyang Xie, Yitian Zhao, Yanda Meng et al.
Mono2Stereo: A Benchmark and Empirical Study for Stereo Conversion
Songsong Yu, Yuxin Chen, Zhongang Qi et al.
Masking meets Supervision: A Strong Learning Alliance
Byeongho Heo, Taekyung Kim, Sangdoo Yun et al.
Differentiable Inverse Rendering with Interpretable Basis BRDFs
Hoon-Gyu Chung, Seokjun Choi, Seung-Hwan Baek
Adaptive Markup Language Generation for Contextually-Grounded Visual Document Understanding
Han Xiao, yina xie, Guanxin tan et al.
ODE: Open-Set Evaluation of Hallucinations in Multimodal Large Language Models
Yahan Tu, Rui Hu, Jitao Sang
See Further When Clear: Curriculum Consistency Model
Yunpeng Liu, Boxiao Liu, Yi Zhang et al.
Olympus: A Universal Task Router for Computer Vision Tasks
Yuanze Lin, Yunsheng Li, Dongdong Chen et al.
Articulated Kinematics Distillation from Video Diffusion Models
Xuan Li, Qianli Ma, Tsung-Yi Lin et al.
WALT3D: Generating Realistic Training Data from Time-Lapse Imagery for Reconstructing Dynamic Objects Under Occlusion
Khiem Vuong, N. Dinesh Reddy, Robert Tamburo et al.
D2SP: Dynamic Dual-Stage Purification Framework for Dual Noise Mitigation in Vision-based Affective Recognition.
Haoran Wang, Xinji Mai, Zeng Tao et al.
Zero-shot RGB-D Point Cloud Registration with Pre-trained Large Vision Model
Haobo Jiang, Jin Xie, Jian Yang et al.
Distilling Spatially-Heterogeneous Distortion Perception for Blind Image Quality Assessment
Xudong Li, Wenjie Nie, Yan Zhang et al.
OW-OVD: Unified Open World and Open Vocabulary Object Detection
Xing Xi, Yangyang Huang, Ronghua Luo et al.
AeSPa : Attention-guided Self-supervised Parallel Imaging for MRI Reconstruction
Jinho Joo, Hyeseong Kim, Hyeyeon Won et al.
Adaptive Dropout: Unleashing Dropout across Layers for Generalizable Image Super-Resolution
Hang Xu, Jie Huang, Wei Yu et al.
CAPE: CAM as a Probabilistic Ensemble for Enhanced DNN Interpretation
Townim Chowdhury, Kewen Liao, Vu Minh Hieu Phan et al.
TopNet: Transformer-Efficient Occupancy Prediction Network for Octree-Structured Point Cloud Geometry Compression
Xinjie Wang, Yifan Zhang, Ting Liu et al.
Improve Representation for Imbalanced Regression through Geometric Constraints
Zijian Dong, Yilei Wu, Chongyao Chen et al.
Compositional Caching for Training-free Open-vocabulary Attribute Detection
Marco Garosi, Alessandro Conti, Gaowen Liu et al.
ControlFace: Harnessing Facial Parametric Control for Face Rigging
Wooseok Jang, Youngjun Hong, Geonho Cha et al.
Shading Meets Motion: Self-supervised Indoor 3D Reconstruction Via Simultaneous Shape-from-Shading and Structure-from-Motion
Guoyu Lu
DocSAM: Unified Document Image Segmentation via Query Decomposition and Heterogeneous Mixed Learning
Xiao-Hui Li, Fei Yin, Cheng-Lin Liu
NSD-Imagery: A Benchmark Dataset for Extending fMRI Vision Decoding Methods to Mental Imagery
Reese Kneeland, Paul Scotti, Ghislain St-Yves et al.
CustomKD: Customizing Large Vision Foundation for Edge Model Improvement via Knowledge Distillation
Jungsoo Lee, Debasmit Das, Munawar Hayat et al.
Towards Understanding How Knowledge Evolves in Large Vision-Language Models
Sudong Wang, Yunjian Zhang, Yao Zhu et al.
Cheb-GR: Rethinking K-nearest Neighbor Search in Re-ranking for Person Re-identification
Jinxi Yang, He Li, Bo Du et al.
Efficient Video Super-Resolution for Real-time Rendering with Decoupled G-buffer Guidance
Mingjun Zheng, Long Sun, Jiangxin Dong et al.
Positive2Negative: Breaking the Information-Lossy Barrier in Self-Supervised Single Image Denoising
Tong Li, Lizhi Wang, Zhiyuan Xu et al.
Rethinking Decoder Design: Improving Biomarker Segmentation Using Depth-to-Space Restoration and Residual Linear Attention
Saad Wazir, Daeyoung Kim
CoSpace: Benchmarking Continuous Space Perception Ability for Vision-Language Models
Yiqi Zhu, Ziyue Wang, Can Zhang et al.
RivuletMLP: An MLP-based Architecture for Efficient Compressed Video Quality Enhancement
Gang He, Weiran Wang, Guancheng Quan et al.
Acquire and then Adapt: Squeezing out Text-to-Image Model for Image Restoration
Junyuan Deng, Xinyi Wu, Yongxing Yang et al.
SMILE: Infusing Spatial and Motion Semantics in Masked Video Learning
Fida Mohammad Thoker, Letian Jiang, Chen Zhao et al.
Spherical Manifold Guided Diffusion Model for Panoramic Image Generation
Xiancheng Sun, Mai Xu, Shengxi Li et al.
Anatomical Consistency and Adaptive Prior-informed Transformation for Multi-contrast MR Image Synthesis via Diffusion Model
Yejee Shin, Yeeun Lee, Hanbyol Jang et al.
SMTPD: A New Benchmark for Temporal Prediction of Social Media Popularity
Yijie Xu, Bolun Zheng, Wei Zhu et al.
Decouple Distortion from Perception: Region Adaptive Diffusion for Extreme-low Bitrate Perception Image Compression
Jinchang Xu, Shaokang Wang, Jintao Chen et al.
Towards Efficient Foundation Model for Zero-shot Amodal Segmentation
Zhaochen Liu, Limeng Qiao, Xiangxiang Chu et al.
FedSPA: Generalizable Federated Graph Learning under Homophily Heterogeneity
Zihan Tan, Guancheng Wan, Wenke Huang et al.
Nearly Zero-Cost Protection Against Mimicry by Personalized Diffusion Models
Namhyuk Ahn, KiYoon Yoo, Wonhyuk Ahn et al.
Learning Phase Distortion with Selective State Space Models for Video Turbulence Mitigation
Xingguang Zhang, Nicholas M Chimitt, Xijun Wang et al.
FiRe: Fixed-points of Restoration Priors for Solving Inverse Problems
Matthieu Terris, Ulugbek Kamilov, Thomas Moreau
Towards High-fidelity 3D Talking Avatar with Personalized Dynamic Texture
Xuanchen Li, Jianyu Wang, Yuhao Cheng et al.
Infighting in the Dark: Multi-Label Backdoor Attack in Federated Learning
Ye Li, Yanchao Zhao, chengcheng zhu et al.