2024 Highlight Papers
324 papers found • Page 1 of 7
3D Face Reconstruction with the Geometric Guidance of Facial Part Segmentation
Zidu Wang, Xiangyu Zhu, Tianshuo Zhang et al.
3DGStream: On-the-Fly Training of 3D Gaussians for Efficient Streaming of Photo-Realistic Free-Viewpoint Videos
Jiakai Sun, Han Jiao, Guangyuan Li et al.
3D Human Pose Perception from Egocentric Stereo Videos
Hiroyasu Akada, Jian Wang, Vladislav Golyanik et al.
3DInAction: Understanding Human Actions in 3D Point Clouds
Yizhak Ben-Shabat, Oren Shrout, Stephen Gould
4D-DRESS: A 4D Dataset of Real-World Human Clothing With Semantic Annotations
Wenbo Wang, Hsuan-I Ho, Chen Guo et al.
Abductive Ego-View Accident Video Understanding for Safe Driving Perception
Jianwu Fang, Lei-lei Li, Junfei Zhou et al.
Absolute Pose from One or Two Scaled and Oriented Features
Jonathan Ventura, Zuzana Kukelova, Torsten Sattler et al.
Accept the Modality Gap: An Exploration in the Hyperbolic Space
Sameera Ramasinghe, Violetta Shevchenko, Gil Avraham et al.
Active Domain Adaptation with False Negative Prediction for Object Detection
Yuzuru Nakamura, Yasunori Ishii, Takayoshi Yamashita
Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images
Chaoqin Huang, Aofan Jiang, Jinghao Feng et al.
AlignMiF: Geometry-Aligned Multimodal Implicit Field for LiDAR-Camera Joint Synthesis
Tao Tang, Guangrun Wang, Yixing Lao et al.
Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models
Huan Ling, Seung Wook Kim, Antonio Torralba et al.
Amodal Completion via Progressive Mixed Context Diffusion
Katherine Xu, Lingzhi Zhang, Jianbo Shi
A Pedestrian is Worth One Prompt: Towards Language Guidance Person Re-Identification
Zexian Yang, Dayan Wu, Chenming Wu et al.
Are Conventional SNNs Really Efficient? A Perspective from Network Quantization
Guobin Shen, Dongcheng Zhao, Tenglong Li et al.
Atlantis: Enabling Underwater Depth Estimation with Stable Diffusion
Fan Zhang, Shaodi You, Yu Li et al.
Attention-Propagation Network for Egocentric Heatmap to 3D Pose Lifting
Taeho Kang, Youngki Lee
AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation
Jeongsoo Choi, Se Jin Park, Minsu Kim et al.
A Vision Check-up for Language Models
Pratyusha Sharma, Tamar Rott Shaham, Manel Baradad et al.
BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive Learning
Siyuan Liang, Mingli Zhu, Aishan Liu et al.
Bayes' Rays: Uncertainty Quantification for Neural Radiance Fields
Leili Goli, Cody Reading, Silvia Sellán et al.
BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation
Yunhao Ge, Yihe Tang, Jiashu Xu et al.
Boosting Neural Representations for Videos with a Conditional Decoder
XINJIE ZHANG, Ren Yang, Dailan He et al.
Brain Decodes Deep Nets
Huzheng Yang, James Gee, Jianbo Shi
Breathing Life Into Sketches Using Text-to-Video Priors
Rinon Gal, Yael Vinker, Yuval Alaluf et al.
C2KD: Bridging the Modality Gap for Cross-Modal Knowledge Distillation
Fushuo Huo, Wenchao Xu, Jingcai Guo et al.
CAD-SIGNet: CAD Language Inference from Point Clouds using Layer-wise Sketch Instance Guided Attention
Mohammad Sadil Khan, Elona Dupont, Sk Aziz Ali et al.
CADTalk: An Algorithm and Benchmark for Semantic Commenting of CAD Programs
Haocheng Yuan, Jing Xu, Hao Pan et al.
Can I Trust Your Answer? Visually Grounded Video Question Answering
Junbin Xiao, Angela Yao, Yicong Li et al.
CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation
Seokju Cho, Heeseong Shin, Sunghwan Hong et al.
Causal-CoG: A Causal-Effect Look at Context Generation for Boosting Multi-modal Language Models
Shitian Zhao, Zhuowan Li, YadongLu et al.
CFAT: Unleashing Triangular Windows for Image Super-resolution
Abhisek Ray, Gaurav Kumar, Maheshkumar Kolekar
CFPL-FAS: Class Free Prompt Learning for Generalizable Face Anti-spoofing
Ajian Liu, Shuai Xue, Gan Jianwen et al.
Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
Peng Jin, Ryuichi Takanobu, Cai Zhang et al.
CLiC: Concept Learning in Context
Mehdi Safaee, Aryan Mikaeili, Or Patashnik et al.
CLIP-Driven Open-Vocabulary 3D Scene Graph Generation via Cross-Modality Contrastive Learning
Lianggangxu Chen, Xuejiao Wang, Jiale Lu et al.
Clockwork Diffusion: Efficient Generation With Model-Step Distillation
Amirhossein Habibian, Amir Ghodrati, Noor Fathima et al.
Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis
Yanzuo Lu, Manlin Zhang, Jinhua Ma et al.
CoDeF: Content Deformation Fields for Temporally Consistent Video Processing
Hao Ouyang, Qiuyu Wang, Yuxi Xiao et al.
CoDi-2: In-Context Interleaved and Interactive Any-to-Any Generation
Zineng Tang, Ziyi Yang, MAHMOUD KHADEMI et al.
CogAgent: A Visual Language Model for GUI Agents
Wenyi Hong, Weihan Wang, Qingsong Lv et al.
Coherence As Texture – Passive Textureless 3D Reconstruction by Self-interference
Wei-Yu Chen, Aswin C. Sankaranarayanan, Anat Levin et al.
COLMAP-Free 3D Gaussian Splatting
Yang Fu, Sifei Liu, Amey Kulkarni et al.
Compact 3D Gaussian Representation for Radiance Field
Joo Chan Lee, Daniel Rho, Xiangyu Sun et al.
Continual Self-supervised Learning: Towards Universal Multi-modal Medical Data Representation Learning
Yiwen Ye, Yutong Xie, Jianpeng Zhang et al.
Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation
Qi Yang, Xing Nie, Tong Li et al.
CoralSCOP: Segment any COral Image on this Planet
Zheng Ziqiang, Liang Haixin, Binh-Son Hua et al.
Correcting Diffusion Generation through Resampling
Yujian Liu, Yang Zhang, Tommi Jaakkola et al.
Correspondence-Free Non-Rigid Point Set Registration Using Unsupervised Clustering Analysis
Mingyang Zhao, Jiang Jingen, Lei Ma et al.
CosmicMan: A Text-to-Image Foundation Model for Humans
Shikai Li, Jianglin Fu, Kaiyuan Liu et al.