Most Cited 2024 Poster Papers
SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos
Tao Wu, Runyu He, Gangshan Wu et al.
Classes Are Not Equal: An Empirical Study on Image Recognition Fairness
Jiequan Cui, Beier Zhu, Xin Wen et al.
ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object
Chenshuang Zhang, Fei Pan, Junmo Kim et al.
BlockGCN: Redefine Topology Awareness for Skeleton-Based Action Recognition
Yuxuan Zhou, Xudong Yan, Zhi-Qi Cheng et al.
Dynamic Inertial Poser (DynaIP): Part-Based Motion Dynamics Learning for Enhanced Human Pose Estimation with Sparse Inertial Sensors
Yu Zhang, Songpengcheng Xia, Lei Chu et al.
Person-in-WiFi 3D: End-to-End Multi-Person 3D Pose Estimation with Wi-Fi
Kangwei Yan, Fei Wang, Bo Qian et al.
ERMVP: Communication-Efficient and Collaboration-Robust Multi-Vehicle Perception in Challenging Environments
Jingyu Zhang, Kun Yang, Yilei Wang et al.
GRAM: Global Reasoning for Multi-Page VQA
Itshak Blau, Sharon Fogel, Roi Ronen et al.
HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data
Qifan Yu, Juncheng Li, Longhui Wei et al.
Tri-Perspective View Decomposition for Geometry-Aware Depth Completion
Zhiqiang Yan, Yuankai Lin, Kun Wang et al.
MoPE-CLIP: Structured Pruning for Efficient Vision-Language Models with Module-wise Pruning Error Metric
Haokun Lin, Haoli Bai, Zhili Liu et al.
DiffusionRegPose: Enhancing Multi-Person Pose Estimation using a Diffusion-Based End-to-End Regression Approach
Dayi Tan, Hansheng Chen, Wei Tian et al.
Tumor Micro-environment Interactions Guided Graph Learning for Survival Analysis of Human Cancers from Whole-slide Pathological Images
Wei Shao, YangYang Shi, Daoqiang Zhang et al.
Perception-Oriented Video Frame Interpolation via Asymmetric Blending
Guangyang Wu, Xin Tao, Changlin Li et al.
Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation
Qi Yang, Xing Nie, Tong Li et al.
Exact Fusion via Feature Distribution Matching for Few-shot Image Generation
Yingbo Zhou, Yutong Ye, Pengyu Zhang et al.
Fooling Polarization-Based Vision using Locally Controllable Polarizing Projection
Zhuoxiao Li, Zhihang Zhong, Shohei Nobuhara et al.
Affine Equivariant Networks Based on Differential Invariants
Yikang Li, Yeqing Qiu, Yuxuan Chen et al.
Diffusion-based Blind Text Image Super-Resolution
Yuzhe Zhang, Jiawei Zhang, Hao Li et al.
Improving Generalized Zero-Shot Learning by Exploring the Diverse Semantics from External Class Names
Yapeng Li, Yong Luo, Zengmao Wang et al.
Continual Learning for Motion Prediction Model via Meta-Representation Learning and Optimal Memory Buffer Retention Strategy
Dae Jun Kang, Dongsuk Kum, Sanmin Kim
FlowDiffuser: Advancing Optical Flow Estimation with Diffusion Models
Ao Luo, Xin Li, Fan Yang et al.
3DGS-Avatar: Animatable Avatars via Deformable 3D Gaussian Splatting
Zhiyin Qian, Shaofei Wang, Marko Mihajlovic et al.
Drag Your Noise: Interactive Point-based Editing via Diffusion Semantic Propagation
Haofeng Liu, Chenshu Xu, Yifei Yang et al.
AdaRevD: Adaptive Patch Exiting Reversible Decoder Pushes the Limit of Image Deblurring
Xintian Mao, Xiwen Gao, Yan Wang
Puff-Net: Efficient Style Transfer with Pure Content and Style Feature Fusion Network
Sizhe Zheng, Pan Gao, Peng Zhou et al.
SynSP: Synergy of Smoothness and Precision in Pose Sequences Refinement
Tao Wang, Lei Jin, Zheng Wang et al.
Building Vision-Language Models on Solid Foundations with Masked Distillation
Sepehr Sameni, Kushal Kafle, Hao Tan et al.
MS-DETR: Efficient DETR Training with Mixed Supervision
Chuyang Zhao, Yifan Sun, Wenhao Wang et al.
FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation
Pengchong Qiao, Lei Shang, Chang Liu et al.
Towards High-fidelity Artistic Image Vectorization via Texture-Encapsulated Shape Parameterization
Ye Chen, Bingbing Ni, Jinfan Liu et al.
OmniSDF: Scene Reconstruction using Omnidirectional Signed Distance Functions and Adaptive Binoctrees
Hakyeong Kim, Andreas Meuleman, Hyeonjoong Jang et al.
Deformable One-shot Face Stylization via DINO Semantic Guidance
Yang Zhou, Zichong Chen, Hui Huang
Density-Guided Semi-Supervised 3D Semantic Segmentation with Dual-Space Hardness Sampling
Jianan Li, Qiulei Dong
Decomposing Disease Descriptions for Enhanced Pathology Detection: A Multi-Aspect Vision-Language Pre-training Framework
Vu Minh Hieu Phan, Yutong Xie, Yuankai Qi et al.
LP++: A Surprisingly Strong Linear Probe for Few-Shot CLIP
Yunshi Huang, Fereshteh Shakeri, Jose Dolz et al.
1-Lipschitz Layers Compared: Memory, Speed, and Certifiable Robustness
Bernd Prach, Fabio Brau, Giorgio Buttazzo et al.
VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models
Hyeonho Jeong, Geon Yeong Park, Jong Chul Ye
PoNQ: a Neural QEM-based Mesh Representation
Nissim Maruani, Maks Ovsjanikov, Pierre Alliez et al.
M3-UDA: A New Benchmark for Unsupervised Domain Adaptive Fetal Cardiac Structure Detection
Bin Pu, Liwen Wang, Jiewen Yang et al.
Expandable Subspace Ensemble for Pre-Trained Model-Based Class-Incremental Learning
Da-Wei Zhou, Hai-Long Sun, Han-Jia Ye et al.
Beyond Image Super-Resolution for Image Recognition with Task-Driven Perceptual Loss
Jaeha Kim, Junghun Oh, Kyoung Mu Lee
Point-VOS: Pointing Up Video Object Segmentation
Sabarinath Mahadevan, Idil Esen Zulfikar, Paul Voigtlaender et al.
3D Face Tracking from 2D Video through Iterative Dense UV to Image Flow
Felix Taubner, Prashant Raina, Mathieu Tuli et al.
HIT: Estimating Internal Human Implicit Tissues from the Body Surface
Marilyn Keller, Vaibhav Arora, Abdelmouttaleb Dakri et al.
Authentic Hand Avatar from a Phone Scan via Universal Hand Model
Gyeongsik Moon, Weipeng Xu, Rohan Joshi et al.
Task-Driven Exploration: Decoupling and Inter-Task Feedback for Joint Moment Retrieval and Highlight Detection
Jin Yang, Ping Wei, Huan Li et al.
Multiway Point Cloud Mosaicking with Diffusion and Global Optimization
Shengze Jin, Iro Armeni, Marc Pollefeys et al.
NeRSP: Neural 3D Reconstruction for Reflective Objects with Sparse Polarized Images
Yufei Han, Heng Guo, Koki Fukai et al.
HDRFlow: Real-Time HDR Video Reconstruction with Large Motions
Gangwei Xu, Yujin Wang, Jinwei Gu et al.
Beyond Average: Individualized Visual Scanpath Prediction
Xianyu Chen, Ming Jiang, Qi Zhao
Beyond Text: Frozen Large Language Models in Visual Signal Comprehension
Lei Zhu, Fangyun Wei, Yanye Lu
LEDITS++: Limitless Image Editing using Text-to-Image Models
Manuel Brack, Felix Friedrich, Katharina Kornmeier et al.
CAM Back Again: Large Kernel CNNs from a Weakly Supervised Object Localization Perspective
Shunsuke Yasuki, Masato Taki
Regularized Parameter Uncertainty for Improving Generalization in Reinforcement Learning
Pehuen Moure, Longbiao Cheng, Joachim Ott et al.
Robust Noisy Correspondence Learning with Equivariant Similarity Consistency
Yuchen Yang, Erkun Yang, Likai Wang et al.
Situational Awareness Matters in 3D Vision Language Reasoning
Yunze Man, Liang-Yan Gui, Yu-Xiong Wang
Decentralized Directed Collaboration for Personalized Federated Learning
Yingqi Liu, Yifan Shi, Qinglun Li et al.
Task-Driven Wavelets using Constrained Empirical Risk Minimization
Eric Marcus, Ray Sheombarsing, Jan-Jakob Sonke et al.
SIFU: Side-view Conditioned Implicit Function for Real-world Usable Clothed Human Reconstruction
Zechuan Zhang, Zongxin Yang, Yi Yang
OmniVec2 - A Novel Transformer based Network for Large Scale Multimodal and Multitask Learning
Siddharth Srivastava, Gaurav Sharma
Probing Synergistic High-Order Interaction in Infrared and Visible Image Fusion
Naishan Zheng, Man Zhou, Jie Huang et al.
Scaling Up Dynamic Human-Scene Interaction Modeling
Nan Jiang, Zhiyuan Zhang, Hongjie Li et al.
Utility-Fairness Trade-Offs and How to Find Them
Sepehr Dehdashtian, Bashir Sadeghi, Vishnu Naresh Boddeti
Data-Free Quantization via Pseudo-label Filtering
Chunxiao Fan, Ziqi Wang, Dan Guo et al.
Fitting Flats to Flats
Gabriel Dogadov, Ugo Finnendahl, Marc Alexa
HOIST-Former: Hand-held Objects Identification, Segmentation, and Tracking in the Wild
Supreeth Narasimhaswamy, Huy Anh Nguyen, Lihan Huang et al.
Animating General Image with Large Visual Motion Model
Dengsheng Chen, Xiaoming Wei, Xiaolin Wei
MetaCloak: Preventing Unauthorized Subject-driven Text-to-image Diffusion-based Synthesis via Meta-learning
Yixin Liu, Chenrui Fan, Yutong Dai et al.
EventEgo3D: 3D Human Motion Capture from Egocentric Event Streams
Christen Millerdurai, Hiroyasu Akada, Jian Wang et al.
ModaVerse: Efficiently Transforming Modalities with LLMs
Xinyu Wang, Bohan Zhuang, Qi Wu
Improving Generalization via Meta-Learning on Hard Samples
Nishant Jain, Arun Suggala, Pradeep Shenoy
WaveFace: Authentic Face Restoration with Efficient Frequency Recovery
Yunqi Miao, Jiankang Deng, Jungong Han
Hierarchical Histogram Threshold Segmentation – Auto-terminating High-detail Oversegmentation
Thomas Chang, Simon Seibt, Bartosz von Rymon Lipinski
CogAgent: A Visual Language Model for GUI Agents
Wenyi Hong, Weihan Wang, Qingsong Lv et al.
Learning Adaptive Spatial Coherent Correlations for Speech-Preserving Facial Expression Manipulation
Tianshui Chen, Jianman Lin, Zhijing Yang et al.
UFORecon: Generalizable Sparse-View Surface Reconstruction from Arbitrary and Unfavorable Sets
Youngju Na, Woo Jae Kim, Kyu Han et al.
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
Lihe Yang, Bingyi Kang, Zilong Huang et al.
EventDance: Unsupervised Source-free Cross-modal Adaptation for Event-based Object Recognition
Xu Zheng, Addison, Lin Wang
Real-World Efficient Blind Motion Deblurring via Blur Pixel Discretization
Insoo Kim, Jae Seok Choi, Geonseok Seo et al.
Multi-criteria Token Fusion with One-step-ahead Attention for Efficient Vision Transformers
Sanghyeok Lee, Joonmyung Choi, Hyunwoo J. Kim
BANF: Band-Limited Neural Fields for Levels of Detail Reconstruction
Ahan Shabanov, Shrisudhan Govindarajan, Cody Reading et al.
HUGS: Holistic Urban 3D Scene Understanding via Gaussian Splatting
Hongyu Zhou, Jiahao Shao, Lu Xu et al.
Human Motion Prediction Under Unexpected Perturbation
Jiangbei Yue, Baiyi Li, Julien Pettré et al.
LLMs are Good Action Recognizers
Haoxuan Qu, Yujun Cai, Jun Liu
SynFog: A Photo-realistic Synthetic Fog Dataset based on End-to-end Imaging Simulation for Advancing Real-World Defogging in Autonomous Driving
Yiming Xie, Henglu Wei, Zhenyi Liu et al.
NeRFiller: Completing Scenes via Generative 3D Inpainting
Ethan Weber, Aleksander Holynski, Varun Jampani et al.
PeVL: Pose-Enhanced Vision-Language Model for Fine-Grained Human Action Recognition
Haosong Zhang, Mei Leong, Liyuan Li et al.
MPOD123: One Image to 3D Content Generation Using Mask-enhanced Progressive Outline-to-Detail Optimization
Jimin Xu, Tianbao Wang, Tao Jin et al.
Look-Up Table Compression for Efficient Image Restoration
Yinglong Li, Jiacheng Li, Zhiwei Xiong
Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation
Wenhao Li, Mengyuan Liu, Hong Liu et al.
PAPR in Motion: Seamless Point-level 3D Scene Interpolation
Shichong Peng, Yanshu Zhang, Ke Li
Towards Modern Image Manipulation Localization: A Large-Scale Dataset and Novel Methods
Chenfan Qu, Yiwu Zhong, Chongyu Liu et al.
Dense Vision Transformer Compression with Few Samples
Hanxiao Zhang, Yifan Zhou, Guo-Hua Wang
IntrinsicAvatar: Physically Based Inverse Rendering of Dynamic Humans from Monocular Videos via Explicit Ray Tracing
Shaofei Wang, Bozidar Antic, Andreas Geiger et al.
Exploring Pose-Aware Human-Object Interaction via Hybrid Learning
Eastman Z Y Wu, Yali Li, Yuan Wang et al.
All in One Framework for Multimodal Re-identification in the Wild
He Li, Mang Ye, Ming Zhang et al.
Bilateral Adaptation for Human-Object Interaction Detection with Occlusion-Robustness
Guangzhi Wang, Yangyang Guo, Ziwei Xu et al.
TCP: Textual-based Class-aware Prompt tuning for Visual-Language Model
Hantao Yao, Rui Zhang, Changsheng Xu
RMT: Retentive Networks Meet Vision Transformers
Qihang Fan, Huaibo Huang, Mingrui Chen et al.
FedSOL: Stabilized Orthogonal Learning with Proximal Restrictions in Federated Learning
Gihun Lee, Minchan Jeong, SangMook Kim et al.
Low-Rank Approximation for Sparse Attention in Multi-Modal LLMs
Lin Song, Yukang Chen, Shuai Yang et al.
LAENeRF: Local Appearance Editing for Neural Radiance Fields
Lukas Radl, Michael Steiner, Andreas Kurz et al.
Don't Look into the Dark: Latent Codes for Pluralistic Image Inpainting
Haiwei Chen, Yajie Zhao
Improved Visual Grounding through Self-Consistent Explanations
Ruozhen He, Paola Cascante-Bonilla, Ziyan Yang et al.
On the Faithfulness of Vision Transformer Explanations
Junyi Wu, Weitai Kang, Hao Tang et al.
CHAIN: Enhancing Generalization in Data-Efficient GANs via lipsCHitz continuity constrAIned Normalization
Yao Ni, Piotr Koniusz
OneFormer3D: One Transformer for Unified Point Cloud Segmentation
Maksim Kolodiazhnyi, Anna Vorontsova, Anton Konushin et al.
One-2-3-45++: Fast Single Image to 3D Objects with Consistent Multi-View Generation and 3D Diffusion
Minghua Liu, Ruoxi Shi, Linghao Chen et al.
C2KD: Bridging the Modality Gap for Cross-Modal Knowledge Distillation
Fushuo Huo, Wenchao Xu, Jingcai Guo et al.
StrokeFaceNeRF: Stroke-based Facial Appearance Editing in Neural Radiance Field
Xiao-juan Li, Dingxi Zhang, Shu-Yu Chen et al.
Neural Modes: Self-supervised Learning of Nonlinear Modal Subspaces
Jiahong Wang, Yinwei Du, Stelian Coros et al.
WHAM: Reconstructing World-grounded Humans with Accurate 3D Motion
Soyong Shin, Juyong Kim, Eni Halilaj et al.
CLOAF: CoLlisiOn-Aware Human Flow
Andrey Davydov, Martin Engilberge, Mathieu Salzmann et al.
FedUV: Uniformity and Variance for Heterogeneous Federated Learning
Ha Min Son, Moon-Hyun Kim, Tai-Myoung Chung et al.
Learning Occupancy for Monocular 3D Object Detection
Liang Peng, Junkai Xu, Haoran Cheng et al.
Bring Event into RGB and LiDAR: Hierarchical Visual-Motion Fusion for Scene Flow
Hanyu Zhou, Yi Chang, Zhiwei Shi
Language-driven Grasp Detection
An Dinh Vuong, Minh Nhat Vu, Baoru Huang et al.
Each Test Image Deserves A Specific Prompt: Continual Test-Time Adaptation for 2D Medical Image Segmentation
Ziyang Chen, Yongsheng Pan, Yiwen Ye et al.
Abductive Ego-View Accident Video Understanding for Safe Driving Perception
Jianwu Fang, Lei-lei Li, Junfei Zhou et al.
Prompting Vision Foundation Models for Pathology Image Analysis
Chong Yin, Siqi Liu, Kaiyang Zhou et al.
Unmixing Before Fusion: A Generalized Paradigm for Multi-Source-based Hyperspectral Image Synthesis
Yang Yu, Erting Pan, Xinya Wang et al.
Navigating Beyond Dropout: An Intriguing Solution towards Generalizable Image Super Resolution
Hongjun Wang, Jiyuan Chen, Yinqiang Zheng et al.
Auto-Train-Once: Controller Network Guided Automatic Network Pruning from Scratch
Xidong Wu, Shangqian Gao, Zeyu Zhang et al.
SCINeRF: Neural Radiance Fields from a Snapshot Compressive Image
Yunhao Li, Xiaodong Wang, Ping Wang et al.
Learning to Control Camera Exposure via Reinforcement Learning
Kyunghyun Lee, Ukcheol Shin, Byeong-Uk Lee
Regressor-Segmenter Mutual Prompt Learning for Crowd Counting
Mingyue Guo, Li Yuan, Zhaoyi Yan et al.
Vector Graphics Generation via Mutually Impulsed Dual-domain Diffusion
Zhongyin Zhao, Ye Chen, Zhangli Hu et al.
Spectral Meets Spatial: Harmonising 3D Shape Matching and Interpolation
Dongliang Cao, Marvin Eisenberger, Nafie El Amrani et al.
FocSAM: Delving Deeply into Focused Objects in Segmenting Anything
You Huang, Zongyu Lan, Liujuan Cao et al.
Sculpt3D: Multi-View Consistent Text-to-3D Generation with Sparse 3D Prior
Chen Cheng, Xiaofeng Yang, Fan Yang et al.
Learning to Transform Dynamically for Better Adversarial Transferability
Rongyi Zhu, Zeliang Zhang, Susan Liang et al.
Spin-UP: Spin Light for Natural Light Uncalibrated Photometric Stereo
Zongrui Li, Zhan Lu, Haojie Yan et al.
SEAS: ShapE-Aligned Supervision for Person Re-Identification
Haidong Zhu, Pranav Budhwant, Zhaoheng Zheng et al.
Learning to Select Views for Efficient Multi-View Understanding
Yunzhong Hou, Stephen Gould, Liang Zheng
LidaRF: Delving into Lidar for Neural Radiance Field on Street Scenes
Shanlin Sun, Bingbing Zhuang, Ziyu Jiang et al.
Task2Box: Box Embeddings for Modeling Asymmetric Task Relationships
Rangel Daroya, Aaron Sun, Subhransu Maji
UniGS: Unified Representation for Image Generation and Segmentation
Lu Qi, Lehan Yang, Weidong Guo et al.
ExMap: Leveraging Explainability Heatmaps for Unsupervised Group Robustness to Spurious Correlations
Rwiddhi Chakraborty, Adrian de Sena Sletten, Michael C. Kampffmeyer
DUDF: Differentiable Unsigned Distance Fields with Hyperbolic Scaling
Miguel Fainstein, Viviana Siless, Emmanuel Iarussi
UV-IDM: Identity-Conditioned Latent Diffusion Model for Face UV-Texture Generation
Hong Li, Yutang Feng, Song Xue et al.
PBWR: Parametric-Building-Wireframe Reconstruction from Aerial LiDAR Point Clouds
Shangfeng Huang, Ruisheng Wang, Bo Guo et al.
GenH2R: Learning Generalizable Human-to-Robot Handover via Scalable Simulation Demonstration and Imitation
Zifan Wang, Junyu Chen, Ziqing Chen et al.
Content-Style Decoupling for Unsupervised Makeup Transfer without Generating Pseudo Ground Truth
Zhaoyang Sun, Shengwu Xiong, Yaxiong Chen et al.
UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs
Yanwu Xu, Yang Zhao, Zhisheng Xiao et al.
SelfPose3d: Self-Supervised Multi-Person Multi-View 3d Pose Estimation
Keqi Chen, Vinkle Srivastav, Nicolas Padoy
Context-Aware Integration of Language and Visual References for Natural Language Tracking
Yanyan Shao, Shuting He, Qi Ye et al.
Open-Vocabulary Segmentation with Semantic-Assisted Calibration
Yong Liu, Sule Bai, Guanbin Li et al.
Training Generative Image Super-Resolution Models by Wavelet-Domain Losses Enables Better Control of Artifacts
Cansu Korkmaz, Ahmet Murat Tekalp, Zafer Dogan
RCooper: A Real-world Large-scale Dataset for Roadside Cooperative Perception
Ruiyang Hao, Siqi Fan, Yingru Dai et al.
Task-Customized Mixture of Adapters for General Image Fusion
Pengfei Zhu, Yang Sun, Bing Cao et al.
PointBeV: A Sparse Approach for BeV Predictions
Loick Chambon, Éloi Zablocki, Mickaël Chen et al.
Learning Spatial Adaptation and Temporal Coherence in Diffusion Models for Video Super-Resolution
Zhikai Chen, Fuchen Long, Zhaofan Qiu et al.
Ensemble Diversity Facilitates Adversarial Transferability
Bowen Tang, Zheng Wang, Yi Bin et al.
CFAT: Unleashing Triangular Windows for Image Super-resolution
Abhisek Ray, Gaurav Kumar, Maheshkumar Kolekar
Convolutional Prompting meets Language Models for Continual Learning
Anurag Roy, Riddhiman Moulick, Vinay Verma et al.
Mitigating Noisy Correspondence by Geometrical Structure Consistency Learning
Zihua Zhao, Mengxi Chen, Tianjie Dai et al.
Contextual Augmented Global Contrast for Multimodal Intent Recognition
Kaili Sun, Zhiwen Xie, Mang Ye et al.
Intraoperative 2D/3D Image Registration via Differentiable X-ray Rendering
Vivek Gopalakrishnan, Neel Dey, Polina Golland
Relaxed Contrastive Learning for Federated Learning
Seonguk Seo, Jinkyu Kim, Geeho Kim et al.
EVCap: Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension
Jiaxuan Li, Duc Minh Vo, Akihiro Sugimoto et al.
LLM4SGG: Large Language Models for Weakly Supervised Scene Graph Generation
Kibum Kim, Kanghoon Yoon, Jaehyeong Jeon et al.
Geometrically-driven Aggregation for Zero-shot 3D Point Cloud Understanding
Guofeng Mei, Luigi Riz, Yiming Wang et al.
Beyond Textual Constraints: Learning Novel Diffusion Conditions with Fewer Examples
Yuyang Yu, Bangzhen Liu, Chenxi Zheng et al.
Task-aligned Part-aware Panoptic Segmentation through Joint Object-Part Representations
Daan de Geus, Gijs Dubbelman
FreeKD: Knowledge Distillation via Semantic Frequency Prompt
Yuan Zhang, Tao Huang, Jiaming Liu et al.
Enhanced Motion-Text Alignment for Image-to-Video Transfer Learning
Wei Zhang, Chaoqun Wan, Tongliang Liu et al.
SNIDA: Unlocking Few-Shot Object Detection with Non-linear Semantic Decoupling Augmentation
Yanjie Wang, Xu Zou, Luxin Yan et al.
Adaptive Multi-Modal Cross-Entropy Loss for Stereo Matching
Peng Xu, Zhiyu Xiang, Chengyu Qiao et al.
Driving-Video Dehazing with Non-Aligned Regularization for Safety Assistance
Junkai Fan, Jiangwei Weng, Kun Wang et al.
Exploring Region-Word Alignment in Built-in Detector for Open-Vocabulary Object Detection
Heng Zhang, Qiuyu Zhao, Linyu Zheng et al.
L0-Sampler: An L0 Model Guided Volume Sampling for NeRF
Liangchen Li, Juyong Zhang
Diffusion 3D Features (Diff3F): Decorating Untextured Shapes with Distilled Semantic Features
Niladri Shekhar Dutt, Sanjeev Muralikrishnan, Niloy J. Mitra
Unsupervised Occupancy Learning from Sparse Point Cloud
Amine Ouasfi, Adnane Boukhayma
GLOW: Global Layout Aware Attacks on Object Detection
Jun Bao, Buyu Liu, Kui Ren et al.
Context-based and Diversity-driven Specificity in Compositional Zero-Shot Learning
Yun Li, Zhe Liu, Hang Chen et al.
Neural Underwater Scene Representation
Yunkai Tang, Chengxuan Zhu, Renjie Wan et al.
Scaled Decoupled Distillation
Shicai Wei, Chunbo Luo, Yang Luo
VISTA-LLAMA: Reducing Hallucination in Video Language Models via Equal Distance to Visual Tokens
Fan Ma, Xiaojie Jin, Heng Wang et al.
Hierarchical Intra-modal Correlation Learning for Label-free 3D Semantic Segmentation
Xin Kang, Lei Chu, Jiahao Li et al.
PARA-Drive: Parallelized Architecture for Real-time Autonomous Driving
Xinshuo Weng, Boris Ivanovic, Yan Wang et al.
Towards Generalizable Tumor Synthesis
Qi Chen, Xiaoxi Chen, Haorui Song et al.
Adaptive Hyper-graph Aggregation for Modality-Agnostic Federated Learning
Fan Qi, Shuai Li
Bi-SSC: Geometric-Semantic Bidirectional Fusion for Camera-based 3D Semantic Scene Completion
Yujie Xue, Ruihui Li, Fan Wu et al.
Efficient and Effective Weakly-Supervised Action Segmentation via Action-Transition-Aware Boundary Alignment
Angchi Xu, Wei-Shi Zheng
Depth-Aware Concealed Crop Detection in Dense Agricultural Scenes
Liqiong Wang, Jinyu Yang, Yanfu Zhang et al.
FC-GNN: Recovering Reliable and Accurate Correspondences from Interferences
Haobo Xu, Jun Zhou, Hua Yang et al.
MoMask: Generative Masked Modeling of 3D Human Motions
Chuan Guo, Yuxuan Mu, Muhammad Gohar Javed et al.
CapsFusion: Rethinking Image-Text Data at Scale
Qiying Yu, Quan Sun, Xiaosong Zhang et al.
A General and Efficient Training for Transformer via Token Expansion
Wenxuan Huang, Yunhang Shen, Jiao Xie et al.
BigGait: Learning Gait Representation You Want by Large Vision Models
Dingqiang Ye, Chao Fan, Jingzhe Ma et al.
Event-based Visible and Infrared Fusion via Multi-task Collaboration
Mengyue Geng, Lin Zhu, Lizhi Wang et al.
Breathing Life Into Sketches Using Text-to-Video Priors
Rinon Gal, Yael Vinker, Yuval Alaluf et al.
Gaussian Shell Maps for Efficient 3D Human Generation
Rameen Abdal, Wang Yifan, Zifan Shi et al.
Byzantine-robust Decentralized Federated Learning via Dual-domain Clustering and Trust Bootstrapping
Peng Sun, Xinyang Liu, Zhibo Wang et al.
MotionEditor: Editing Video Motion via Content-Aware Diffusion
Shuyuan Tu, Qi Dai, Zhi-Qi Cheng et al.
State Space Models for Event Cameras
Nikola Zubic, Mathias Gehrig, Davide Scaramuzza
DiffInDScene: Diffusion-based High-Quality 3D Indoor Scene Generation
Xiaoliang Ju, Zhaoyang Huang, Yijin Li et al.
Towards Calibrated Multi-label Deep Neural Networks
Jiacheng Cheng, Nuno Vasconcelos
TIM: A Time Interval Machine for Audio-Visual Action Recognition
Jacob Chalk, Jaesung Huh, Evangelos Kazakos et al.