Most Cited CVPR "human action analysis" Papers
Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception
Ruotian Peng, Haiying He, Yake Wei et al.
Harnessing Global-Local Collaborative Adversarial Perturbation for Anti-Customization
Long Xu, Jiakai Wang, Haojie Hao et al.
Plug-and-Play PPO: An Adaptive Point Prompt Optimizer Making SAM Greater
Xueyu Liu, Rui Wang, Yexin Lai et al.
Dr. Bokeh: DiffeRentiable Occlusion-aware Bokeh Rendering
Yichen Sheng, Zixun Yu, Lu Ling et al.
SSR-Encoder: Encoding Selective Subject Representation for Subject-Driven Generation
Yuxuan Zhang, Yiren Song, Jiaming Liu et al.
SwitchLight: Co-design of Physics-driven Architecture and Pre-training Framework for Human Portrait Relighting
Hoon Kim, Minje Jang, Wonjun Yoon et al.
Do We Really Need Curated Malicious Data for Safety Alignment in Multi-modal Large Language Models?
Yanbo Wang, Jiyang Guan, Jian Liang et al.
CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor
Shuyang Sun, Runjia Li, Philip H.S. Torr et al.
Learning from Observer Gaze: Zero-Shot Attention Prediction Oriented by Human-Object Interaction Recognition
Yuchen Zhou, Linkai Liu, Chao Gou
CAP-Net: A Unified Network for 6D Pose and Size Estimation of Categorical Articulated Parts from a Single RGB-D Image
Jingshun Huang, Haitao Lin, Tianyu Wang et al.
Super-Resolution Reconstruction from Bayer-Pattern Spike Streams
Yanchen Dong, Ruiqin Xiong, Jian Zhang et al.
Image Neural Field Diffusion Models
Yinbo Chen, Oliver Wang, Richard Zhang et al.
Multi-Scale Neighborhood Occupancy Masked Autoencoder for Self-Supervised Learning in LiDAR Point Clouds
Mohamed Abdelsamad, Michael Ulrich, Claudius Glaeser et al.
Decoupling Fine Detail and Global Geometry for Compressed Depth Map Super-Resolution
Huan Zheng, Wencheng Han, Jianbing Shen
Denoising Point Clouds in Latent Space via Graph Convolution and Invertible Neural Network
Aihua Mao, Biao Yan, Zijing Ma et al.
Dual-View Visual Contextualization for Web Navigation
Jihyung Kil, Chan Hee Song, Boyuan Zheng et al.
SVFR: A Unified Framework for Generalized Video Face Restoration
Zhiyao Wang, Xu Chen, Chengming Xu et al.
Improving the Generalization of Segmentation Foundation Model under Distribution Shift via Weakly Supervised Adaptation
Haojie Zhang, Yongyi Su, Xun Xu et al.
Language-guided Image Reflection Separation
Haofeng Zhong, Yuchen Hong, Shuchen Weng et al.
SwiftBrush: One-Step Text-to-Image Diffusion Model with Variational Score Distillation
Thuan Nguyen, Anh Tran
Looking 3D: Anomaly Detection with 2D-3D Alignment
Ankan Kumar Bhunia, Changjian Li, Hakan Bilen
EventPS: Real-Time Photometric Stereo Using an Event Camera
Bohan Yu, Jieji Ren, Jin Han et al.
Diffusion Handles: Enabling 3D Edits for Diffusion Models by Lifting Activations to 3D
Karran Pandey, Paul Guerrero, Matheus Gadelha et al.
Circuit Design and Efficient Simulation of Quantum Inner Product and Empirical Studies of Its Effect on Near-Term Hybrid Quantum-Classic Machine Learning
Hao Xiong, Yehui Tang, Xinyu Ye et al.
Uncovering What Why and How: A Comprehensive Benchmark for Causation Understanding of Video Anomaly
Hang Du, Sicheng Zhang, Binzhu Xie et al.
CrowdDiff: Multi-hypothesis Crowd Density Estimation using Diffusion Models
Yasiru Ranasinghe, Nithin Gopalakrishnan Nair, Wele Gedara Chaminda Bandara et al.
PSDPM: Prototype-based Secondary Discriminative Pixels Mining for Weakly Supervised Semantic Segmentation
Xinqiao Zhao, Ziqian Yang, Tianhong Dai et al.
Effective SAM Combination for Open-Vocabulary Semantic Segmentation
Minhyeok Lee, Suhwan Cho, Jungho Lee et al.
A Unified Framework for Microscopy Defocus Deblur with Multi-Pyramid Transformer and Contrastive Learning
Yuelin Zhang, Pengyu Zheng, Wanquan Yan et al.
Towards 3D Vision with Low-Cost Single-Photon Cameras
Fangzhou Mu, Carter Sifferman, Sacha Jungerman et al.
Structure Matters: Tackling the Semantic Discrepancy in Diffusion Models for Image Inpainting
Haipeng Liu, Yang Wang, Biao Qian et al.
Shadows Don't Lie and Lines Can't Bend! Generative Models don't know Projective Geometry...for now
Ayush Sarkar, Hanlin Mai, Amitabh Mahapatra et al.
Aligning Logits Generatively for Principled Black-Box Knowledge Distillation
Jing Ma, Xiang Xiang, Ke Wang et al.
A Unified Approach to Interpreting Self-supervised Pre-training Methods for 3D Point Clouds via Interactions
Qiang Li, Jian Ruan, Fanghao Wu et al.
Semantically-Shifted Incremental Adapter-Tuning is A Continual ViTransformer
Yuwen Tan, Qinhao Zhou, Xiang Xiang et al.
CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning
Yang Yue, Yulin Wang, Chenxin Tao et al.
Physics-guided Shape-from-Template: Monocular Video Perception through Neural Surrogate Models
David Stotko, Nils Wandel, Reinhard Klein
Towards Long-Horizon Vision-Language Navigation: Platform, Benchmark and Method
Xinshuai Song, Weixing Chen, Yang Liu et al.
Open Set Label Shift with Test Time Out-of-Distribution Reference
Changkun Ye, Russell Tsuchida, Lars Petersson et al.
Dynamic Motion Blending for Versatile Motion Editing
Nan Jiang, Hongjie Li, Ziye Yuan et al.
GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation
Tong Wu, Guandao Yang, Zhibing Li et al.
On Train-Test Class Overlap and Detection for Image Retrieval
Chull Hwan Song, Jooyoung Yoon, Taebaek Hwang et al.
Orthogonal Adaptation for Modular Customization of Diffusion Models
Ryan Po, Guandao Yang, Kfir Aberman et al.
Permutation Equivariance of Transformers and Its Applications
Hengyuan Xu, Liyao Xiang, Hangyu Ye et al.
AdaCM^2: On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction
Yuanbin Man, Ying Huang, Chengming Zhang et al.
PDFactor: Learning Tri-Perspective View Policy Diffusion Field for Multi-Task Robotic Manipulation
Jingyi Tian, Le Wang, Sanping Zhou et al.
ViTamin: Designing Scalable Vision Models in the Vision-Language Era
Jieneng Chen, Qihang Yu, Xiaohui Shen et al.
Building Vision Models upon Heat Conduction
Zhaozhi Wang, Yue Liu, Yunjie Tian et al.
HomoFormer: Homogenized Transformer for Image Shadow Removal
Jie Xiao, Xueyang Fu, Yurui Zhu et al.
Incomplete Multi-View Multi-label Learning via Disentangled Representation and Label Semantic Embedding
Xu Yan, Jun Yin, Jie Wen
Misalignment-Robust Frequency Distribution Loss for Image Transformation
Zhangkai Ni, Juncheng Wu, Zian Wang et al.
PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models
Yiming Zhang, Zhening Xing, Yanhong Zeng et al.
DnLUT: Ultra-Efficient Color Image Denoising via Channel-Aware Lookup Tables
Sidi Yang, Binxiao Huang, Yulun Zhang et al.
DemoCaricature: Democratising Caricature Generation with a Rough Sketch
Dar-Yen Chen, Ayan Kumar Bhunia, Subhadeep Koley et al.
End-to-End Spatio-Temporal Action Localisation with Video Transformers
Alexey Gritsenko, Xuehan Xiong, Josip Djolonga et al.
TRINS: Towards Multimodal Language Models that Can Read
Ruiyi Zhang, Yanzhe Zhang, Jian Chen et al.
EFHQ: Multi-purpose ExtremePose-Face-HQ dataset
Trung Dao, Duc H Vu, Cuong Pham et al.
CocoER: Aligning Multi-Level Feature by Competition and Coordination for Emotion Recognition
Xuli Shen, Hua Cai, Weilin Shen et al.
Logarithmic Lenses: Exploring Log RGB Data for Image Classification
Bruce Maxwell, Sumegha Singhania, Avnish Patel et al.
Unlocking Pre-trained Image Backbones for Semantic Image Synthesis
Tariq Berrada, Jakob Verbeek, Camille Couprie et al.
TokenCompose: Text-to-Image Diffusion with Token-level Supervision
Zirui Wang, Zhizhou Sha, Zheng Ding et al.
Brain-Inspired Spiking Neural Networks for Energy-Efficient Object Detection
Ziqi Li, Tao Gao, Yisheng An et al.
Self-Expansion of Pre-trained Models with Mixture of Adapters for Continual Learning
Huiyi Wang, Haodong Lu, Lina Yao et al.
PointSR: Self-Regularized Point Supervision for Drone-View Object Detection
Weizhuo Li, Yue Xi, Wenjing Jia et al.
Language-driven Object Fusion into Neural Radiance Fields with Pose-Conditioned Dataset Updates
Ka Chun Shum, Jaeyeon Kim, Binh-Son Hua et al.
Infer from What You Have Seen Before: Temporally-dependent Classifier for Semi-supervised Video Segmentation
Jiafan Zhuang, Zilei Wang, Yixin Zhang et al.
Seeing the World through Your Eyes
Hadi Alzayer, Kevin Zhang, Brandon Y. Feng et al.
BimArt: A Unified Approach for the Synthesis of 3D Bimanual Interaction with Articulated Objects
Wanyue Zhang, Rishabh Dabral, Vladislav Golyanik et al.
Video Depth Anything: Consistent Depth Estimation for Super-Long Videos
Sili Chen, Hengkai Guo, Shengnan Zhu et al.
Learning Vision from Models Rivals Learning Vision from Data
Yonglong Tian, Lijie Fan, Kaifeng Chen et al.
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation
Jianzong Wu, Chao Tang, Jingbo Wang et al.
WildlifeMapper: Aerial Image Analysis for Multi-Species Detection and Identification
Satish Kumar, Bowen Zhang, Chandrakanth Gudavalli et al.
RegionGPT: Towards Region Understanding Vision Language Model
Qiushan Guo, Shalini De Mello, Danny Yin et al.
Unlocking the Potential of Pre-trained Vision Transformers for Few-Shot Semantic Segmentation through Relationship Descriptors
Ziqin Zhou, Hai-Ming Xu, Yangyang Shu et al.
Holo-Relighting: Controllable Volumetric Portrait Relighting from a Single Image
Yiqun Mei, Yu Zeng, He Zhang et al.
Relightful Harmonization: Lighting-aware Portrait Background Replacement
Mengwei Ren, Wei Xiong, Jae Shin Yoon et al.
Taming the Tail in Class-Conditional GANs: Knowledge Sharing via Unconditional Training at Lower Resolutions
Saeed Khorram, Mingqi Jiang, Mohamad Shahbazi et al.
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
Xiang Yue, Yuansheng Ni, Kai Zhang et al.
AniDoc: Animation Creation Made Easier
Yihao Meng, Hao Ouyang, Hanlin Wang et al.
MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation
Yanhui Wang, Jianmin Bao, Wenming Weng et al.
MorpheuS: Neural Dynamic 360° Surface Reconstruction from Monocular RGB-D Video
Hengyi Wang, Jingwen Wang, Lourdes Agapito
Capturing Closely Interacted Two-Person Motions with Reaction Priors
Qi Fang, Yinghui Fan, Yanjun Li et al.
Good, Cheap, and Fast: Overfitted Image Compression with Wasserstein Distortion
Jona Ballé, Luca Versari, Emilien Dupont et al.
Camouflage Anything: Learning to Hide using Controlled Out-painting and Representation Engineering
Biplab Das, Viswanath Gopalakrishnan
LightOctree: Lightweight 3D Spatially-Coherent Indoor Lighting Estimation
Xuecan Wang, Shibang Xiao, Xiaohui Liang
Leveraging Temporal Cues for Semi-Supervised Multi-View 3D Object Detection
Jinhyung Park, Navyata Sanghvi, Hiroki Adachi et al.
CGMatch: A Different Perspective of Semi-supervised Learning
Bo Cheng, Jueqing Lu, Yuan Tian et al.
Bi-level Learning of Task-Specific Decoders for Joint Registration and One-Shot Medical Image Segmentation
Xin Fan, Xiaolin Wang, Jiaxin Gao et al.
FairCLIP: Harnessing Fairness in Vision-Language Learning
Yan Luo, Min Shi, Muhammad Osama Khan et al.
Navigate Beyond Shortcuts: Debiased Learning Through the Lens of Neural Collapse
Yining Wang, Junjie Sun, Chenyue Wang et al.
DiVa-360: The Dynamic Visual Dataset for Immersive Neural Fields
Cheng-You Lu, Peisen Zhou, Angela Xing et al.
Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language
Mark Hamilton, Andrew Zisserman, John Hershey et al.
Learning Visual Prompt for Gait Recognition
Kang Ma, Ying Fu, Chunshui Cao et al.
NoiseCtrl: A Sampling-Algorithm-Agnostic Conditional Generation Method for Diffusion Models
Longquan Dai, He Wang, Jinhui Tang
Compositional Targeted Multi-Label Universal Perturbations
Hassan Mahmood, Ehsan Elhamifar
ODA-GAN: Orthogonal Decoupling Alignment GAN Assisted by Weakly-supervised Learning for Virtual Immunohistochemistry Staining
Tong Wang, Mingkang Wang, Zhongze Wang et al.
ArGue: Attribute-Guided Prompt Tuning for Vision-Language Models
Xinyu Tian, Shu Zou, Zhaoyuan Yang et al.
Proximal Algorithm Unrolling: Flexible and Efficient Reconstruction Networks for Single-Pixel Imaging
Ping Wang, Lishun Wang, Gang Qu et al.
PolarRec: Improving Radio Interferometric Data Reconstruction Using Polar Coordinates
Ruoqi Wang, Zhuoyang Chen, Jiayi Zhu et al.
LDP: Language-driven Dual-Pixel Image Defocus Deblurring Network
Hao Yang, Liyuan Pan, Yan Yang et al.
DiPrompT: Disentangled Prompt Tuning for Multiple Latent Domain Generalization in Federated Learning
Sikai Bai, Jie Zhang, Song Guo et al.
StyleCineGAN: Landscape Cinemagraph Generation using a Pre-trained StyleGAN
Jongwoo Choi, Kwanggyoon Seo, Amirsaman Ashtari et al.
MS-MANO: Enabling Hand Pose Tracking with Biomechanical Constraints
Pengfei Xie, Wenqiang Xu, Tutian Tang et al.
CGI-DM: Digital Copyright Authentication for Diffusion Models via Contrasting Gradient Inversion
Xiaoyu Wu, Yang Hua, Chumeng Liang et al.
Temporally Consistent Unbalanced Optimal Transport for Unsupervised Action Segmentation
Ming Xu, Stephen Gould
ICON: Incremental CONfidence for Joint Pose and Radiance Field Optimization
Weiyao Wang, Pierre Gleize, Hao Tang et al.
Learning Large-Factor EM Image Super-Resolution with Generative Priors
Jiateng Shou, Zeyu Xiao, Shiyu Deng et al.
From Head to Tail: Efficient Black-box Model Inversion Attack via Long-tailed Learning
Ziang Li, Hongguang Zhang, Juan Wang et al.
PI3D: Efficient Text-to-3D Generation with Pseudo-Image Diffusion
Ying-Tian Liu, Yuan-Chen Guo, Guan Luo et al.
Learning for Transductive Threshold Calibration in Open-World Recognition
Qin Zhang, Dongsheng An, Tianjun Xiao et al.
SimAvatar: Simulation-Ready Avatars with Layered Hair and Clothing
Xueting Li, Ye Yuan, Shalini De Mello et al.
SerialGen: Personalized Image Generation by First Standardization Then Personalization
Cong Xie, Han Zou, Ruiqi Yu et al.
Spiking Transformer: Introducing Accurate Addition-Only Spiking Self-Attention for Transformer
Yufei Guo, Xiaode Liu, Yuanpei Chen et al.
SDSTrack: Self-Distillation Symmetric Adapter Learning for Multi-Modal Visual Object Tracking
Xiaojun Hou, Jiazheng Xing, Yijie Qian et al.
Separate and Conquer: Decoupling Co-occurrence via Decomposition and Representation for Weakly Supervised Semantic Segmentation
Zhiwei Yang, Kexue Fu, Minghong Duan et al.
Telling Left from Right: Identifying Geometry-Aware Semantic Correspondence
Junyi Zhang, Charles Herrmann, Junhwa Hur et al.
Amodal Ground Truth and Completion in the Wild
Guanqi Zhan, Chuanxia Zheng, Weidi Xie et al.
MiKASA: Multi-Key-Anchor & Scene-Aware Transformer for 3D Visual Grounding
Chun-Peng Chang, Shaoxiang Wang, Alain Pagani et al.
RSAR: Restricted State Angle Resolver and Rotated SAR Benchmark
Xin Zhang, Xue Yang, Yuxuan Li et al.
Seeing Unseen: Discover Novel Biomedical Concepts via Geometry-Constrained Probabilistic Modeling
Jianan Fan, Dongnan Liu, Hang Chang et al.
Real-Time Exposure Correction via Collaborative Transformations and Adaptive Sampling
Ziwen Li, Feng Zhang, Meng Cao et al.
Gaussian Splatting SLAM
Hidenobu Matsuki, Riku Murai, Paul Kelly et al.
PromptCoT: Align Prompt Distribution via Adapted Chain-of-Thought
Junyi Yao, Yijiang Liu, Zhen Dong et al.
NeLF-Pro: Neural Light Field Probes for Multi-Scale Novel View Synthesis
Zinuo You, Andreas Geiger, Anpei Chen
Few-shot Implicit Function Generation via Equivariance
Suizhi Huang, Xingyi Yang, Hongtao Lu et al.
MaIR: A Locality- and Continuity-Preserving Mamba for Image Restoration
Boyun Li, Haiyu Zhao, Wenxin Wang et al.
Modeling Dense Multimodal Interactions Between Biological Pathways and Histology for Survival Prediction
Guillaume Jaume, Anurag Vaidya, Richard J. Chen et al.
Practical Measurements of Translucent Materials with Inter-Pixel Translucency Prior
Zhenyu Chen, Jie Guo, Shuichang Lai et al.
ViKIENet: Towards Efficient 3D Object Detection with Virtual Key Instance Enhanced Network
Zhuochen Yu, Bijie Qiu, Andy W. H. Khong
CoCoGaussian: Leveraging Circle of Confusion for Gaussian Splatting from Defocused Images
Jungho Lee, Suhwan Cho, Taeoh Kim et al.
RTracker: Recoverable Tracking via PN Tree Structured Memory
Yuqing Huang, Xin Li, Zikun Zhou et al.
View-Category Interactive Sharing Transformer for Incomplete Multi-View Multi-Label Learning
Shilong Ou, Zhe Xue, Yawen Li et al.
Dual Memory Networks: A Versatile Adaptation Approach for Vision-Language Models
Yabin Zhang, Wenjie Zhu, Hui Tang et al.
Knowledge Bridger: Towards Training-Free Missing Modality Completion
Guanzhou Ke, Shengfeng He, Xiao-Li Wang et al.
ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning
David Junhao Zhang, Roni Paiss, Shiran Zada et al.
Perception Tokens Enhance Visual Reasoning in Multimodal Language Models
Mahtab Bigverdi, Zelun Luo, Cheng-Yu Hsieh et al.
A Versatile Framework for Continual Test-Time Domain Adaptation: Balancing Discriminability and Generalizability
Xu Yang, Xuan Chen, Moqi Li et al.
Efficient Solution of Point-Line Absolute Pose
Petr Hruby, Timothy Duff, Marc Pollefeys
SPIN: Simultaneous Perception Interaction and Navigation
Shagun Uppal, Ananye Agarwal, Haoyu Xiong et al.
CAMixerSR: Only Details Need More "Attention"
Yan Wang, Yi Liu, Shijie Zhao et al.
Augmenting Perceptual Super-Resolution via Image Quality Predictors
Fengjia Zhang, Samrudhdhi Rangrej, Tristan T Aumentado-Armstrong et al.
Beyond Single-Modal Boundary: Cross-Modal Anomaly Detection through Visual Prototype and Harmonization
Kai Mao, Ping Wei, Yiyang Lian et al.
FISBe: A Real-World Benchmark Dataset for Instance Segmentation of Long-Range Thin Filamentous Structures
Lisa Mais, Peter Hirsch, Claire Managan et al.
POPDG: Popular 3D Dance Generation with PopDanceSet
Zhenye Luo, Min Ren, Xuecai Hu et al.
RankMatch: Exploring the Better Consistency Regularization for Semi-supervised Semantic Segmentation
Huayu Mai, Rui Sun, Tianzhu Zhang et al.
CoDe: An Explicit Content Decoupling Framework for Image Restoration
Enxuan Gu, Hongwei Ge, Yong Guo
D^4: Dataset Distillation via Disentangled Diffusion Model
Duo Su, Junjie Hou, Weizhi Gao et al.
Text Augmented Correlation Transformer For Few-shot Classification & Segmentation
Srinivasa Rao Nandam, Sara Atito, Zhenhua Feng et al.
MoManipVLA: Transferring Vision-language-action Models for General Mobile Manipulation
Zhenyu Wu, Yuheng Zhou, Xiuwei Xu et al.
MambaVO: Deep Visual Odometry Based on Sequential Matching Refinement and Training Smoothing
Shuo Wang, Wanting Li, Yongcai Wang et al.
TAGA: Self-supervised Learning for Template-free Animatable Gaussian Articulated Model
Zhichao Zhai, Guikun Chen, Wenguan Wang et al.
MarkovGen: Structured Prediction for Efficient Text-to-Image Generation
Sadeep Jayasumana, Daniel Glasner, Srikumar Ramalingam et al.
Rethinking Few-Shot Adaptation of Vision-Language Models in Two Stages
Matteo Farina, Massimiliano Mancini, Giovanni Iacca et al.
Rethinking the Representation in Federated Unsupervised Learning with Non-IID Data
Xinting Liao, Weiming Liu, Chaochao Chen et al.
Document Haystacks: Vision-Language Reasoning Over Piles of 1000+ Documents
Jun Chen, Dannong Xu, Junjie Fei et al.
All-Day Multi-Camera Multi-Target Tracking
Huijie Fan, Yu Qiao, Yihao Zhen et al.
Task-aware Cross-modal Feature Refinement Transformer with Large Language Models for Visual Grounding
Wenbo Chen, Zhen Xu, Ruotao Xu et al.
ViLa-MIL: Dual-scale Vision-Language Multiple Instance Learning for Whole Slide Image Classification
Jiangbo Shi, Chen Li, Tieliang Gong et al.
Segment Any Motion in Videos
Nan Huang, Wenzhao Zheng, Chenfeng Xu et al.
Open3DIS: Open-Vocabulary 3D Instance Segmentation with 2D Mask Guidance
Phuc Nguyen, Tuan Duc Ngo, Evangelos Kalogerakis et al.
Learning to Localize Objects Improves Spatial Reasoning in Visual-LLMs
Kanchana Ranasinghe, Satya Narayan Shukla, Omid Poursaeed et al.
Visual Prompting for One-shot Controllable Video Editing without Inversion
Zhengbo Zhang, Yuxi Zhou, Duo Peng et al.
CaDeT: a Causal Disentanglement Approach for Robust Trajectory Prediction in Autonomous Driving
Mozhgan Pourkeshavarz, Junrui Zhang, Amir Rasouli
Continual Forgetting for Pre-trained Vision Models
Hongbo Zhao, Bolin Ni, Junsong Fan et al.
BIMBA: Selective-Scan Compression for Long-Range Video Question Answering
Md Mohaiminul Islam, Tushar Nagarajan, Huiyu Wang et al.
SINR: Sparsity Driven Compressed Implicit Neural Representations
Dhananjaya Jayasundara, Sudarshan Rajagopalan, Yasiru Ranasinghe et al.
Boosting Neural Representations for Videos with a Conditional Decoder
Xinjie Zhang, Ren Yang, Dailan He et al.
Unsupervised Feature Learning with Emergent Data-Driven Prototypicality
Yunhui Guo, Youren Zhang, Yubei Chen et al.
Text-Guided 3D Face Synthesis - From Generation to Editing
Yunjie Wu, Yapeng Meng, Zhipeng Hu et al.
Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models
Huan Ling, Seung Wook Kim, Antonio Torralba et al.
IReNe: Instant Recoloring of Neural Radiance Fields
Alessio Mazzucchelli, Adrian Garcia-Garcia, Elena Garces et al.
Beyond Sight: Towards Cognitive Alignment in LVLM via Enriched Visual Knowledge
Yaqi Zhao, Yuanyang Yin, Lin Li et al.
Hazy Low-Quality Satellite Video Restoration Via Learning Optimal Joint Degradation Patterns and Continuous-Scale Super-Resolution Reconstruction
Ning Ni, Libao Zhang
ADD: Attribution-Driven Data Augmentation Framework for Boosting Image Super-Resolution
Zeyu Mi, Yu-Bin Yang
Constrained Layout Generation with Factor Graphs
Mohammed Haroon Dupty, Yanfei Dong, Sicong Leng et al.
SASep: Saliency-Aware Structured Separation of Geometry and Feature for Open Set Learning on Point Clouds
Jinfeng Xu, Xianzhi Li, Yuan Tang et al.
URHand: Universal Relightable Hands
Zhaoxi Chen, Gyeongsik Moon, Kaiwen Guo et al.
SIR-DIFF: Sparse Image Sets Restoration with Multi-View Diffusion Model
Yucheng Mao, Boyang Wang, Nilesh Kulkarni et al.
Neural Implicit Morphing of Face Images
Guilherme Schardong, Tiago Novello, Hallison Paz et al.
All-Optical Nonlinear Diffractive Deep Network for Ultrafast Image Denoising
Xiaoling Zhou, Zhemg Lee, Wei Ye et al.
Distilling CLIP with Dual Guidance for Learning Discriminative Human Body Shape Representation
Feng Liu, Minchul Kim, Zhiyuan Ren et al.
Snapshot Lidar: Fourier Embedding of Amplitude and Phase for Single-Image Depth Reconstruction
Sarah Friday, Yunzi Shi, Yaswanth Kumar Cherivirala et al.
CARZero: Cross-Attention Alignment for Radiology Zero-Shot Classification
Haoran Lai, Qingsong Yao, Zihang Jiang et al.
MedM2G: Unifying Medical Multi-Modal Generation via Cross-Guided Diffusion with Visual Invariant
Chenlu Zhan, Gaoang Wang, Yu Lin et al.
GLID: Pre-training a Generalist Encoder-Decoder Vision Model
Jihao Liu, Jinliang Zheng, Yu Liu et al.
Your Image is My Video: Reshaping the Receptive Field via Image-To-Video Differentiable AutoAugmentation and Fusion
Sofia Casarin, Cynthia Ugwu, Sergio Escalera et al.
Kernel Adaptive Convolution for Scene Text Detection via Distance Map Prediction
Jinzhi Zheng, Heng Fan, Libo Zhang
DPSeg: Dual-Prompt Cost Volume Learning for Open-Vocabulary Semantic Segmentation
Ziyu Zhao, Xiaoguang Li, Lingjia Shi et al.
DejaVid: Encoder-Agnostic Learned Temporal Matching for Video Classification
Darryl Ho, Samuel Madden
NeRFCodec: Neural Feature Compression Meets Neural Radiance Fields for Memory-Efficient Scene Representation
Sicheng Li, Hao Li, Yiyi Liao et al.
Contrastive Denoising Score for Text-guided Latent Diffusion Image Editing
Hyelin Nam, Gihyun Kwon, Geon Yeong Park et al.
From Prototypes to General Distributions: An Efficient Curriculum for Masked Image Modeling
Jinhong Lin, Cheng-En Wu, Huanran Li et al.
Hierarchical Knowledge Prompt Tuning for Multi-task Test-Time Adaptation
Qiang Zhang, Mengsheng Zhao, Jiawei Liu et al.
PlatoNeRF: 3D Reconstruction in Plato's Cave via Single-View Two-Bounce Lidar
Tzofi Klinghoffer, Xiaoyu Xiang, Siddharth Somasundaram et al.
DiffLoc: Diffusion Model for Outdoor LiDAR Localization
Wen Li, Yuyang Yang, Shangshu Yu et al.
CoSDH: Communication-Efficient Collaborative Perception via Supply-Demand Awareness and Intermediate-Late Hybridization
Junhao Xu, Yanan Zhang, Zhi Cai et al.
A Focused Human Body Model for Accurate Anthropometric Measurements Extraction
Shuhang Chen, Xianliang Huang, Zhizhou Zhong et al.
Text-Guided Variational Image Generation for Industrial Anomaly Detection and Segmentation
Mingyu Lee, Jongwon Choi
Tackling the Singularities at the Endpoints of Time Intervals in Diffusion Models
Pengze Zhang, Hubery Yin, Chen Li et al.
Advancing Adversarial Robustness in GNeRFs: The IL2-NeRF Attack
Nicole Meng, Caleb Manicke, Ronak Sahu et al.