Most Cited 2024 "video recognition" Papers
12,324 papers found • Page 51 of 62
Conference
Learn to Rectify the Bias of CLIP for Unsupervised Semantic Segmentation
Jingyun Wang, Guoliang Kang
Leveraging Vision-Language Models for Improving Domain Generalization in Image Classification
Sravanti Addepalli, Ashish Asokan, Lakshay Sharma et al.
Absolute Pose from One or Two Scaled and Oriented Features
Jonathan Ventura, Zuzana Kukelova, Torsten Sattler et al.
Draw Step by Step: Reconstructing CAD Construction Sequences from Point Clouds via Multimodal Diffusion.
Weijian Ma, Shuaiqi Chen, Yunzhong Lou et al.
DSGG: Dense Relation Transformer for an End-to-end Scene Graph Generation
Zeeshan Hayder, Xuming He
Open-Vocabulary 3D Semantic Segmentation with Foundation Models
Li Jiang, Shaoshuai Shi, Bernt Schiele
Training Vision Transformers for Semi-Supervised Semantic Segmentation
Xinting Hu, Li Jiang, Bernt Schiele
APSeg: Auto-Prompt Network for Cross-Domain Few-Shot Semantic Segmentation
Weizhao He, Yang Zhang, Wei Zhuo et al.
Design2Cloth: 3D Cloth Generation from 2D Masks
Jiali Zheng, Rolandos Alexandros Potamias, Stefanos Zafeiriou
S-DyRF: Reference-Based Stylized Radiance Fields for Dynamic Scenes
Xingyi Li, Zhiguo Cao, Yizheng Wu et al.
SatSynth: Augmenting Image-Mask Pairs through Diffusion Models for Aerial Semantic Segmentation
Aysim Toker, Marvin Eisenberger, Daniel Cremers et al.
Dual-Consistency Model Inversion for Non-Exemplar Class Incremental Learning
Zihuan Qiu, Yi Xu, Fanman Meng et al.
DS-NeRV: Implicit Neural Video Representation with Decomposed Static and Dynamic Codes
Hao Yan, Zhihui Ke, Xiaobo Zhou et al.
Rolling Shutter Correction with Intermediate Distortion Flow Estimation
Mingdeng Cao, Sidi Yang, Yujiu Yang et al.
Towards Transferable Targeted 3D Adversarial Attack in the Physical World
Yao Huang, Yinpeng Dong, Shouwei Ruan et al.
Hybrid Functional Maps for Crease-Aware Non-Isometric Shape Matching
Lennart Bastian, Yizheng Xie, Nassir Navab et al.
Class Tokens Infusion for Weakly Supervised Semantic Segmentation
Sung-Hoon Yoon, Hoyong Kwon, Hyeonseong Kim et al.
SFOD: Spiking Fusion Object Detector
Yimeng Fan, Wei Zhang, Changsong Liu et al.
AnyDoor: Zero-shot Object-level Image Customization
Xi Chen, Lianghua Huang, Yu Liu et al.
GraphDreamer: Compositional 3D Scene Synthesis from Scene Graphs
Gege Gao, Weiyang Liu, Anpei Chen et al.
SeD: Semantic-Aware Discriminator for Image Super-Resolution
Bingchen Li, Xin Li, Hanxin Zhu et al.
InstanceDiffusion: Instance-level Control for Image Generation
XuDong Wang, Trevor Darrell, Sai Saketh Rambhatla et al.
Robust Emotion Recognition in Context Debiasing
Dingkang Yang, Kun Yang, Mingcheng Li et al.
Improving Training Efficiency of Diffusion Models via Multi-Stage Framework and Tailored Multi-Decoder Architecture
Huijie Zhang, Yifu Lu, Ismail Alkhouri et al.
Balancing Act: Distribution-Guided Debiasing in Diffusion Models
Rishubh Parihar, Abhijnya Bhat, Abhipsa Basu et al.
Sieve: Multimodal Dataset Pruning using Image Captioning Models
Anas Mahmoud, Mostafa Elhoushi, Amro Abbas et al.
Not All Voxels Are Equal: Hardness-Aware Semantic Scene Completion with Self-Distillation
Song Wang, Jiawei Yu, Wentong Li et al.
Towards Fairness-Aware Adversarial Learning
Yanghao Zhang, Tianle Zhang, Ronghui Mu et al.
SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge
Andong Wang, Bo Wu, Sunli Chen et al.
MuRF: Multi-Baseline Radiance Fields
Haofei Xu, Anpei Chen, Yuedong Chen et al.
Learnable Earth Parser: Discovering 3D Prototypes in Aerial Scans
Romain Loiseau, Elliot Vincent, Mathieu Aubry et al.
Hide in Thicket: Generating Imperceptible and Rational Adversarial Perturbations on 3D Point Clouds
Tianrui Lou, Xiaojun Jia, Jindong Gu et al.
PIGEON: Predicting Image Geolocations
Lukas Haas, Michal Skreta, Silas Alberti et al.
JoAPR: Cleaning the Lens of Prompt Learning for Vision-Language Models
YUNCHENG GUO, Xiaodong Gu
Retrieval-Augmented Egocentric Video Captioning
Jilan Xu, Yifei Huang, Junlin Hou et al.
GPLD3D: Latent Diffusion of 3D Shape Generative Models by Enforcing Geometric and Physical Priors
Yuan Dong, Qi Zuo, Xiaodong Gu et al.
Low-Rank Knowledge Decomposition for Medical Foundation Models
Yuhang Zhou, Haolin li, Siyuan Du et al.
Pixel-level Semantic Correspondence through Layout-aware Representation Learning and Multi-scale Matching Integration
Yixuan Sun, Zhangyue Yin, Haibo Wang et al.
View From Above: Orthogonal-View aware Cross-view Localization
Shan Wang, Chuong Nguyen, Jiawei Liu et al.
WorDepth: Variational Language Prior for Monocular Depth Estimation
Ziyao Zeng, Hyoungseob Park, Fengyu Yang et al.
Event-assisted Low-Light Video Object Segmentation
Li Hebei, Jin Wang, Jiahui Yuan et al.
3DToonify: Creating Your High-Fidelity 3D Stylized Avatar Easily from 2D Portrait Images
Yifang Men, Hanxi Liu, Yuan Yao et al.
Synthesize Diagnose and Optimize: Towards Fine-Grained Vision-Language Understanding
Wujian Peng, Sicheng Xie, Zuyao You et al.
EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI
Tai Wang, Xiaohan Mao, Chenming Zhu et al.
DIOD: Self-Distillation Meets Object Discovery
Sandra Kara, Hejer AMMAR, Julien Denize et al.
FlashEval: Towards Fast and Accurate Evaluation of Text-to-image Diffusion Generative Models
LIn Zhao, Tianchen Zhao, Zinan Lin et al.
COLMAP-Free 3D Gaussian Splatting
Yang Fu, Sifei Liu, Amey Kulkarni et al.
SNED: Superposition Network Architecture Search for Efficient Video Diffusion Model
Zhengang Li, Yan Kang, Yuchen Liu et al.
Personalized Residuals for Concept-Driven Text-to-Image Generation
Cusuh Ham, Matthew Fisher, James Hays et al.
CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation
Seokju Cho, Heeseong Shin, Sunghwan Hong et al.
Deep Generative Model based Rate-Distortion for Image Downscaling Assessment
yuanbang liang, Bhavesh Garg, Paul L. Rosin et al.
Forecasting of 3D Whole-body Human Poses with Grasping Objects
yan haitao, Qiongjie Cui, Jiexin Xie et al.
VA3: Virtually Assured Amplification Attack on Probabilistic Copyright Protection for Text-to-Image Generative Models
Xiang Li, Qianli Shen, Kenji Kawaguchi
PIE-NeRF: Physics-based Interactive Elastodynamics with NeRF
Yutao Feng, Yintong Shang, Xuan Li et al.
SNI-SLAM: Semantic Neural Implicit SLAM
Siting Zhu, Guangming Wang, Hermann Blum et al.
Edge-Aware 3D Instance Segmentation Network with Intelligent Semantic Prior
Wonseok Roh, Hwanhee Jung, Giljoo Nam et al.
TextureDreamer: Image-Guided Texture Synthesis Through Geometry-Aware Diffusion
Yu-Ying Yeh, Jia-Bin Huang, Changil Kim et al.
MAFA: Managing False Negatives for Vision-Language Pre-training
Jaeseok Byun, Dohoon Kim, Taesup Moon
Blur2Blur: Blur Conversion for Unsupervised Image Deblurring on Unknown Domains
Bang-Dang Pham, Phong Tran, Anh Tran et al.
RAVE: Randomized Noise Shuffling for Fast and Consistent Video Editing with Diffusion Models
Ozgur Kara, Bariscan Kurtkaya, Hidir Yesiltepe et al.
ChatScene: Knowledge-Enabled Safety-Critical Scenario Generation for Autonomous Vehicles
Jiawei Zhang, Chejian Xu, Bo Li
MMSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos
Jielin Qiu, Jiacheng Zhu, William Han et al.
Generalizable Novel-View Synthesis using a Stereo Camera
Haechan Lee, Wonjoon Jin, Seung-Hwan Baek et al.
Learning Structure-from-Motion with Graph Attention Networks
Lucas Brynte, José Pedro Iglesias, Carl Olsson et al.
Don’t Drop Your Samples! Coherence-Aware Training Benefits Conditional Diffusion
Nicolas Dufour, Victor Besnier, Vicky Kalogeiton et al.
SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection
Peng Qi, Zehong Yan, Wynne Hsu et al.
Spatial-Aware Regression for Keypoint Localization
Dongkai Wang, Shiliang Zhang
Action-slot: Visual Action-centric Representations for Multi-label Atomic Activity Recognition in Traffic Scenes
Chi-Hsi Kung, 書緯 呂, Yi-Hsuan Tsai et al.
Diff-BGM: A Diffusion Model for Video Background Music Generation
Sizhe Li, Yiming Qin, Minghang Zheng et al.
ArtAdapter: Text-to-Image Style Transfer using Multi-Level Style Encoder and Explicit Adaptation
Dar-Yen Chen, Hamish Tennent, Ching-Wen Hsu
EMOPortraits: Emotion-enhanced Multimodal One-shot Head Avatars
Nikita Drobyshev, Antoni Bigata Casademunt, Konstantinos Vougioukas et al.
Shadow-Enlightened Image Outpainting
Hang Yu, Ruilin Li, Shaorong Xie et al.
Specularity Factorization for Low-Light Enhancement
Saurabh Saini, P. J. Narayanan
Latent Modulated Function for Computational Optimal Continuous Image Representation
Zongyao He, Zhi Jin
Domain-Rectifying Adapter for Cross-Domain Few-Shot Segmentation
Jiapeng Su, Qi Fan, Wenjie Pei et al.
Shallow-Deep Collaborative Learning for Unsupervised Visible-Infrared Person Re-Identification
Bin Yang, Jun Chen, Mang Ye
L2B: Learning to Bootstrap Robust Models for Combating Label Noise
Yuyin Zhou, Xianhang li, Fengze Liu et al.
OED: Towards One-stage End-to-End Dynamic Scene Graph Generation
Guan Wang, Zhimin Li, Qingchao Chen et al.
SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis
Ziqiao Peng, Wentao Hu, Yue Shi et al.
Attack To Defend: Exploiting Adversarial Attacks for Detecting Poisoned Models
Samar Fares, Karthik Nandakumar
ViewFusion: Towards Multi-View Consistency via Interpolated Denoising
Xianghui Yang, Gil Avraham, Yan Zuo et al.
D3still: Decoupled Differential Distillation for Asymmetric Image Retrieval
Yi Xie, Yihong Lin, Wenjie Cai et al.
LiDAR-Net: A Real-scanned 3D Point Cloud Dataset for Indoor Scenes
Yanwen Guo, Yuanqi Li, Dayong Ren et al.
Delving into the Trajectory Long-tail Distribution for Muti-object Tracking
Sijia Chen, En Yu, Jinyang Li et al.
Non-autoregressive Sequence-to-Sequence Vision-Language Models
Kunyu Shi, Qi Dong, Luis Goncalves et al.
No More Ambiguity in 360° Room Layout via Bi-Layout Estimation
Yu-Ju Tsai, Jin-Cheng Jhang, JINGJING ZHENG et al.
MTLoRA: Low-Rank Adaptation Approach for Efficient Multi-Task Learning
Ahmed Agiza, Marina Neseem, Sherief Reda
HUNTER: Unsupervised Human-centric 3D Detection via Transferring Knowledge from Synthetic Instances to Real Scenes
Yichen Yao, Zimo Jiang, YUJING SUN et al.
Streaming Dense Video Captioning
Xingyi Zhou, Anurag Arnab, Shyamal Buch et al.
3D LiDAR Mapping in Dynamic Environments using a 4D Implicit Neural Representation
Xingguang Zhong, Yue Pan, Cyrill Stachniss et al.
SelfOcc: Self-Supervised Vision-Based 3D Occupancy Prediction
Yuanhui Huang, Wenzhao Zheng, Borui Zhang et al.
On the Scalability of Diffusion-based Text-to-Image Generation
Hao Li, Yang Zou, Ying Wang et al.
Bootstrapping Autonomous Driving Radars with Self-Supervised Learning
Yiduo Hao, Sohrab Madani, Junfeng Guan et al.
Analyzing and Improving the Training Dynamics of Diffusion Models
Tero Karras, Miika Aittala, Jaakko Lehtinen et al.
DiffusionGAN3D: Boosting Text-guided 3D Generation and Domain Adaptation by Combining 3D GANs and Diffusion Priors
Biwen Lei, Kai Yu, Mengyang Feng et al.
OneLLM: One Framework to Align All Modalities with Language
Jiaming Han, Kaixiong Gong, Yiyuan Zhang et al.
LAFS: Landmark-based Facial Self-supervised Learning for Face Recognition
Zhonglin Sun, Chen Feng, Ioannis Patras et al.
See Say and Segment: Teaching LMMs to Overcome False Premises
Tsung-Han Wu, Giscard Biamby, David Chan et al.
PRDP: Proximal Reward Difference Prediction for Large-Scale Reward Finetuning of Diffusion Models
Fei Deng, Qifei Wang, Wei Wei et al.
Groupwise Query Specialization and Quality-Aware Multi-Assignment for Transformer-based Visual Relationship Detection
Jongha Kim, Jihwan Park, Jinyoung Park et al.
MMVP: A Multimodal MoCap Dataset with Vision and Pressure Sensors
He Zhang, Shenghao Ren, Haolei Yuan et al.
SD2Event:Self-supervised Learning of Dynamic Detectors and Contextual Descriptors for Event Cameras
Yuan Gao, Yuqing Zhu, Xinjun Li et al.
Tuning Stable Rank Shrinkage: Aiming at the Overlooked Structural Risk in Fine-tuning
Sicong Shen, Yang Zhou, Bingzheng Wei et al.
DiSR-NeRF: Diffusion-Guided View-Consistent Super-Resolution NeRF
Jie Long Lee, Chen Li, Gim Hee Lee
Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications
Junyi Ma, Xieyuanli Chen, Jiawei Huang et al.
Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation
Li Hu
Relightable and Animatable Neural Avatar from Sparse-View Video
Zhen Xu, Sida Peng, Chen Geng et al.
Objects as Volumes: A Stochastic Geometry View of Opaque Solids
Bailey Miller, Hanyu Chen, Alice Lai et al.
Instance-Adaptive and Geometric-Aware Keypoint Learning for Category-Level 6D Object Pose Estimation
Xiao Lin, Wenfei Yang, Yuan Gao et al.
PaReNeRF: Toward Fast Large-scale Dynamic NeRF with Patch-based Reference
Xiao Tang, Min Yang, Penghui Sun et al.
PostureHMR: Posture Transformation for 3D Human Mesh Recovery
Yu-Pei Song, Xiao WU, Zhaoquan Yuan et al.
Desigen: A Pipeline for Controllable Design Template Generation
Haohan Weng, Danqing Huang, YU QIAO et al.
WANDR: Intention-guided Human Motion Generation
Markos Diomataris, Nikos Athanasiou, Omid Taheri et al.
WWW: A Unified Framework for Explaining What Where and Why of Neural Networks by Interpretation of Neuron Concepts
Yong Hyun Ahn, Hyeon Bae Kim, Seong Tae Kim
Do You Remember? Dense Video Captioning with Cross-Modal Memory Retrieval
Minkuk Kim, Hyeon Bae Kim, Jinyoung Moon et al.
Rich Human Feedback for Text-to-Image Generation
Youwei Liang, Junfeng He, Gang Li et al.
Dr. Bokeh: DiffeRentiable Occlusion-aware Bokeh Rendering
Yichen Sheng, Zixun Yu, Lu Ling et al.
SSR-Encoder: Encoding Selective Subject Representation for Subject-Driven Generation
Yuxuan Zhang, Yiren Song, Jiaming Liu et al.
SwitchLight: Co-design of Physics-driven Architecture and Pre-training Framework for Human Portrait Relighting
Hoon Kim, Minje Jang, Wonjun Yoon et al.
CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor
Shuyang Sun, Runjia Li, Philip H.S. Torr et al.
Learning from Observer Gaze: Zero-Shot Attention Prediction Oriented by Human-Object Interaction Recognition
Yuchen Zhou, Linkai Liu, Chao Gou
Super-Resolution Reconstruction from Bayer-Pattern Spike Streams
Yanchen Dong, Ruiqin Xiong, Jian Zhang et al.
Image Neural Field Diffusion Models
Yinbo Chen, Oliver Wang, Richard Zhang et al.
Denoising Point Clouds in Latent Space via Graph Convolution and Invertible Neural Network
Aihua Mao, Biao Yan, Zijing Ma et al.
Dual-View Visual Contextualization for Web Navigation
Jihyung Kil, Chan Hee Song, Boyuan Zheng et al.
Improving the Generalization of Segmentation Foundation Model under Distribution Shift via Weakly Supervised Adaptation
Haojie Zhang, Yongyi Su, Xun Xu et al.
Language-guided Image Reflection Separation
Haofeng Zhong, Yuchen Hong, Shuchen Weng et al.
SwiftBrush: One-Step Text-to-Image Diffusion Model with Variational Score Distillation
Thuan Nguyen, Anh Tran
Looking 3D: Anomaly Detection with 2D-3D Alignment
Ankan Kumar Bhunia, Changjian Li, Hakan Bilen
EventPS: Real-Time Photometric Stereo Using an Event Camera
Bohan Yu, Jieji Ren, Jin Han et al.
Diffusion Handles Enabling 3D Edits for Diffusion Models by Lifting Activations to 3D
Karran Pandey, Paul Guerrero, Matheus Gadelha et al.
Circuit Design and Efficient Simulation of Quantum Inner Product and Empirical Studies of Its Effect on Near-Term Hybrid Quantum-Classic Machine Learning
Hao Xiong, Yehui Tang, Xinyu Ye et al.
Uncovering What Why and How: A Comprehensive Benchmark for Causation Understanding of Video Anomaly
Hang Du, Sicheng Zhang, Binzhu Xie et al.
CrowdDiff: Multi-hypothesis Crowd Density Estimation using Diffusion Models
Yasiru Ranasinghe, Nithin Gopalakrishnan Nair, Wele Gedara Chaminda Bandara et al.
PSDPM: Prototype-based Secondary Discriminative Pixels Mining for Weakly Supervised Semantic Segmentation
Xinqiao Zhao, Ziqian Yang, Tianhong Dai et al.
Towards 3D Vision with Low-Cost Single-Photon Cameras
Fangzhou Mu, Carter Sifferman, Sacha Jungerman et al.
Shadows Don't Lie and Lines Can't Bend! Generative Models don't know Projective Geometry...for now
Ayush Sarkar, Hanlin Mai, Amitabh Mahapatra et al.
Aligning Logits Generatively for Principled Black-Box Knowledge Distillation
Jing Ma, Xiang Xiang, Ke Wang et al.
Semantically-Shifted Incremental Adapter-Tuning is A Continual ViTransformer
Yuwen Tan, Qinhao Zhou, Xiang Xiang et al.
Physics-guided Shape-from-Template: Monocular Video Perception through Neural Surrogate Models
David Stotko, Nils Wandel, Reinhard Klein
GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation
Tong Wu, Guandao Yang, Zhibing Li et al.
On Train-Test Class Overlap and Detection for Image Retrieval
Chull Hwan Song, Jooyoung Yoon, Taebaek Hwang et al.
Orthogonal Adaptation for Modular Customization of Diffusion Models
Ryan Po, Guandao Yang, Kfir Aberman et al.
Permutation Equivariance of Transformers and Its Applications
Hengyuan Xu, Liyao Xiang, Hangyu Ye et al.
ViTamin: Designing Scalable Vision Models in the Vision-Language Era
Jieneng Chen, Qihang Yu, Xiaohui Shen et al.
HomoFormer: Homogenized Transformer for Image Shadow Removal
Jie Xiao, Xueyang Fu, Yurui Zhu et al.
PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models
Yiming Zhang, Zhening Xing, Yanhong Zeng et al.
DemoCaricature: Democratising Caricature Generation with a Rough Sketch
Dar-Yen Chen, Ayan Kumar Bhunia, Subhadeep Koley et al.
End-to-End Spatio-Temporal Action Localisation with Video Transformers
Alexey Gritsenko, Xuehan Xiong, Josip Djolonga et al.
TRINS: Towards Multimodal Language Models that Can Read
Ruiyi Zhang, Yanzhe Zhang, Jian Chen et al.
EFHQ: Multi-purpose ExtremePose-Face-HQ dataset
Trung Dao, Duc H Vu, Cuong Pham et al.
Logarithmic Lenses: Exploring Log RGB Data for Image Classification
Bruce Maxwell, Sumegha Singhania, Avnish Patel et al.
Unlocking Pre-trained Image Backbones for Semantic Image Synthesis
Tariq Berrada, Jakob Verbeek, camille couprie et al.
TokenCompose: Text-to-Image Diffusion with Token-level Supervision
Zirui Wang, Zhizhou Sha, Zheng Ding et al.
Language-driven Object Fusion into Neural Radiance Fields with Pose-Conditioned Dataset Updates
Ka Chun SHUM, Jaeyeon Kim, Binh-Son Hua et al.
Infer from What You Have Seen Before: Temporally-dependent Classifier for Semi-supervised Video Segmentation
Jiafan Zhuang, Zilei Wang, Yixin Zhang et al.
Seeing the World through Your Eyes
Hadi Alzayer, Kevin Zhang, Brandon Y. Feng et al.
Learning Vision from Models Rivals Learning Vision from Data
Yonglong Tian, Lijie Fan, Kaifeng Chen et al.
RegionGPT: Towards Region Understanding Vision Language Model
Qiushan Guo, Shalini De Mello, Danny Yin et al.
Unlocking the Potential of Pre-trained Vision Transformers for Few-Shot Semantic Segmentation through Relationship Descriptors
Ziqin Zhou, Hai-Ming Xu, Yangyang Shu et al.
Holo-Relighting: Controllable Volumetric Portrait Relighting from a Single Image
Yiqun Mei, Yu Zeng, He Zhang et al.
Relightful Harmonization: Lighting-aware Portrait Background Replacement
Mengwei Ren, Wei Xiong, Jae Shin Yoon et al.
Taming the Tail in Class-Conditional GANs: Knowledge Sharing via Unconditional Training at Lower Resolutions
Saeed Khorram, Mingqi Jiang, Mohamad Shahbazi et al.
Matching 2D Images in 3D: Metric Relative Pose from Metric Correspondences
Axel Barroso-Laguna, Sowmya Munukutla, Victor Adrian Prisacariu et al.
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
Xiang Yue, Yuansheng Ni, Kai Zhang et al.
MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation
Yanhui Wang, Jianmin Bao, Wenming Weng et al.
MorpheuS: Neural Dynamic 360° Surface Reconstruction from Monocular RGB-D Video
Hengyi Wang, Jingwen Wang, Lourdes Agapito
Capturing Closely Interacted Two-Person Motions with Reaction Priors
Qi Fang, Yinghui Fan, Yanjun Li et al.
LightOctree: Lightweight 3D Spatially-Coherent Indoor Lighting Estimation
Xuecan Wang, Shibang Xiao, Xiaohui Liang
Bi-level Learning of Task-Specific Decoders for Joint Registration and One-Shot Medical Image Segmentation
Xin Fan, Xiaolin Wang, Jiaxin Gao et al.
FairCLIP: Harnessing Fairness in Vision-Language Learning
Yan Luo, MIN SHI, Muhammad Osama Khan et al.
Navigate Beyond Shortcuts: Debiased Learning Through the Lens of Neural Collapse
Yining Wang, Junjie Sun, Chenyue Wang et al.
DiVa-360: The Dynamic Visual Dataset for Immersive Neural Fields
Cheng-You Lu, Peisen Zhou, Angela Xing et al.
Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language
Mark Hamilton, Andrew Zisserman, John Hershey et al.
Learning Visual Prompt for Gait Recognition
Kang Ma, Ying Fu, Chunshui Cao et al.
ArGue: Attribute-Guided Prompt Tuning for Vision-Language Models
Xinyu Tian, Shu Zou, Zhaoyuan Yang et al.
PolarRec: Improving Radio Interferometric Data Reconstruction Using Polar Coordinates
Ruoqi Wang, Zhuoyang Chen, Jiayi Zhu et al.
LDP: Language-driven Dual-Pixel Image Defocus Deblurring Network
Hao Yang, Liyuan Pan, Yan Yang et al.
StyleCineGAN: Landscape Cinemagraph Generation using a Pre-trained StyleGAN
Jongwoo Choi, Kwanggyoon Seo, Amirsaman Ashtari et al.
MS-MANO: Enabling Hand Pose Tracking with Biomechanical Constraints
Pengfei Xie, Wenqiang Xu, Tutian Tang et al.
CGI-DM: Digital Copyright Authentication for Diffusion Models via Contrasting Gradient Inversion
Xiaoyu Wu, Yang Hua, Chumeng Liang et al.
Temporally Consistent Unbalanced Optimal Transport for Unsupervised Action Segmentation
Ming Xu, Stephen Gould
ICON: Incremental CONfidence for Joint Pose and Radiance Field Optimization
Weiyao Wang, Pierre Gleize, Hao Tang et al.
Learning Large-Factor EM Image Super-Resolution with Generative Priors
Jiateng Shou, Zeyu Xiao, Shiyu Deng et al.
PI3D: Efficient Text-to-3D Generation with Pseudo-Image Diffusion
Ying-Tian Liu, Yuan-Chen Guo, Guan Luo et al.
Learning for Transductive Threshold Calibration in Open-World Recognition
Qin ZHANG, DONGSHENG An, Tianjun Xiao et al.
Separate and Conquer: Decoupling Co-occurrence via Decomposition and Representation for Weakly Supervised Semantic Segmentation
Zhiwei Yang, Kexue Fu, Minghong Duan et al.
Telling Left from Right: Identifying Geometry-Aware Semantic Correspondence
Junyi Zhang, Charles Herrmann, Junhwa Hur et al.
Amodal Ground Truth and Completion in the Wild
Guanqi Zhan, Chuanxia Zheng, Weidi Xie et al.
MiKASA: Multi-Key-Anchor & Scene-Aware Transformer for 3D Visual Grounding
Chun-Peng Chang, Shaoxiang Wang, Alain Pagani et al.
Seeing Unseen: Discover Novel Biomedical Concepts via Geometry-Constrained Probabilistic Modeling
Jianan Fan, Dongnan Liu, Hang Chang et al.
Real-Time Exposure Correction via Collaborative Transformations and Adaptive Sampling
Ziwen Li, Feng Zhang, Meng Cao et al.
Gaussian Splatting SLAM
Hidenobu Matsuki, Riku Murai, Paul Kelly et al.
PromptCoT: Align Prompt Distribution via Adapted Chain-of-Thought
Junyi Yao, Yijiang Liu, Zhen Dong et al.
NeLF-Pro: Neural Light Field Probes for Multi-Scale Novel View Synthesis
Zinuo You, Andreas Geiger, Anpei Chen
Modeling Dense Multimodal Interactions Between Biological Pathways and Histology for Survival Prediction
Guillaume Jaume, Anurag Vaidya, Richard J. Chen et al.
Practical Measurements of Translucent Materials with Inter-Pixel Translucency Prior
Zhenyu Chen, Jie Guo, Shuichang Lai et al.
RTracker: Recoverable Tracking via PN Tree Structured Memory
Yuqing Huang, Xin Li, Zikun Zhou et al.
View-Category Interactive Sharing Transformer for Incomplete Multi-View Multi-Label Learning
Shilong Ou, Zhe Xue, Yawen Li et al.
Dual Memory Networks: A Versatile Adaptation Approach for Vision-Language Models
Yabin Zhang, Wenjie Zhu, Hui Tang et al.