Most Cited CVPR "benchmark failure modes" Papers
SleepVST: Sleep Staging from Near-Infrared Video Signals using Pre-Trained Transformers
Jonathan F. Carter, Joao Jorge, Oliver Gibson et al.
Empowering Resampling Operation for Ultra-High-Definition Image Enhancement with Model-Aware Guidance
Yu, Jie Huang, Li et al.
From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models
Rongjie Li, Songyang Zhang, Dahua Lin et al.
OAKINK2: A Dataset of Bimanual Hands-Object Manipulation in Complex Task Completion
Xinyu Zhan, Lixin Yang, Yifei Zhao et al.
Visual Point Cloud Forecasting enables Scalable Autonomous Driving
Zetong Yang, Li Chen, Yanan Sun et al.
HumanNorm: Learning Normal Diffusion Model for High-quality and Realistic 3D Human Generation
Xin Huang, Ruizhi Shao, Qi Zhang et al.
HOLD: Category-agnostic 3D Reconstruction of Interacting Hands and Objects from Video
Zicong Fan, Maria Parelli, Maria Kadoglou et al.
AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving
Mingfu Liang, Jong-Chyi Su, Samuel Schulter et al.
Attentive Illumination Decomposition Model for Multi-Illuminant White Balancing
Dongyoung Kim, Jinwoo Kim, Junsang Yu et al.
OpenESS: Event-based Semantic Scene Understanding with Open Vocabularies
Lingdong Kong, Youquan Liu, Lai Xing Ng et al.
Z*: Zero-shot Style Transfer via Attention Reweighting
Yingying Deng, Xiangyu He, Fan Tang et al.
G-HOP: Generative Hand-Object Prior for Interaction Reconstruction and Grasp Synthesis
Yufei Ye, Abhinav Gupta, Kris Kitani et al.
ConCon-Chi: Concept-Context Chimera Benchmark for Personalized Vision-Language Tasks
Andrea Rosasco, Stefano Berti, Giulia Pasquale et al.
Habitat Synthetic Scenes Dataset (HSSD-200): An Analysis of 3D Scene Scale and Realism Tradeoffs for ObjectGoal Navigation
Mukul Khanna, Yongsen Mao, Hanxiao Jiang et al.
KeyPoint Relative Position Encoding for Face Recognition
Minchul Kim, Feng Liu, Yiyang Su et al.
QDFormer: Towards Robust Audiovisual Segmentation in Complex Environments with Quantization-based Semantic Decomposition
Xiang Li, Jinglu Wang, Xiaohao Xu et al.
From a Bird's Eye View to See: Joint Camera and Subject Registration without the Camera Calibration
Zekun Qian, Ruize Han, Wei Feng et al.
Joint2Human: High-Quality 3D Human Generation via Compact Spherical Embedding of 3D Joints
Muxin Zhang, Qiao Feng, Zhuo Su et al.
FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis
Feng Liang, Bichen Wu, Jialiang Wang et al.
Accept the Modality Gap: An Exploration in the Hyperbolic Space
Sameera Ramasinghe, Violetta Shevchenko, Gil Avraham et al.
Random Entangled Tokens for Adversarially Robust Vision Transformer
Huihui Gong, Minjing Dong, Siqi Ma et al.
OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition
Tongjia Chen, Hongshan Yu, Zhengeng Yang et al.
Continuous Pose for Monocular Cameras in Neural Implicit Representation
Qi Ma, Danda Paudel, Ajad Chhatkuli et al.
BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive Learning
Siyuan Liang, Mingli Zhu, Aishan Liu et al.
A Pedestrian is Worth One Prompt: Towards Language Guidance Person Re-Identification
Zexian Yang, Dayan Wu, Chenming Wu et al.
From SAM to CAMs: Exploring Segment Anything Model for Weakly Supervised Semantic Segmentation
Hyeokjun Kweon, Kuk-Jin Yoon
Vlogger: Make Your Dream A Vlog
Shaobin Zhuang, Kunchang Li, Xinyuan Chen et al.
EpiDiff: Enhancing Multi-View Synthesis via Localized Epipolar-Constrained Diffusion
Zehuan Huang, Hao Wen, Junting Dong et al.
IPoD: Implicit Field Learning with Point Diffusion for Generalizable 3D Object Reconstruction from Single RGB-D Images
Yushuang Wu, Luyue Shi, Junhao Cai et al.
SVGDreamer: Text Guided SVG Generation with Diffusion Model
XiMing Xing, Chuang Wang, Haitao Zhou et al.
Dual Prototype Attention for Unsupervised Video Object Segmentation
Suhwan Cho, Minhyeok Lee, Seunghoon Lee et al.
Contrastive Mean-Shift Learning for Generalized Category Discovery
Sua Choi, Dahyun Kang, Minsu Cho
Panacea: Panoramic and Controllable Video Generation for Autonomous Driving
Yuqing Wen, Yucheng Zhao, Yingfei Liu et al.
Flexible Biometrics Recognition: Bridging the Multimodality Gap through Attention Alignment and Prompt Tuning
Leslie Ching Ow Tiong, Dick Sigmund, Chen-Hui Chan et al.
ALGM: Adaptive Local-then-Global Token Merging for Efficient Semantic Segmentation with Plain Vision Transformers
Narges Norouzi, Svetlana Orlova, Daan de Geus et al.
UDiFF: Generating Conditional Unsigned Distance Fields with Optimal Wavelet Diffusion
Junsheng Zhou, Weiqi Zhang, Baorui Ma et al.
UFineBench: Towards Text-based Person Retrieval with Ultra-fine Granularity
Jialong Zuo, Hanyu Zhou, Ying Nie et al.
Test-Time Zero-Shot Temporal Action Localization
Benedetta Liberatori, Alessandro Conti, Paolo Rota et al.
Towards Text-guided 3D Scene Composition
Qihang Zhang, Chaoyang Wang, Aliaksandr Siarohin et al.
Instance-aware Exploration-Verification-Exploitation for Instance ImageGoal Navigation
Xiaohan Lei, Min Wang, Wengang Zhou et al.
Learning SO(3)-Invariant Semantic Correspondence via Local Shape Transform
Chunghyun Park, Seungwook Kim, Jaesik Park et al.
Taming Mode Collapse in Score Distillation for Text-to-3D Generation
Peihao Wang, Dejia Xu, Zhiwen Fan et al.
SnAG: Scalable and Accurate Video Grounding
Fangzhou Mu, Sicheng Mo, Yin Li
GLaMM: Pixel Grounding Large Multimodal Model
Hanoona Rasheed, Muhammad Maaz, Sahal Shaji Mullappilly et al.
ManiFPT: Defining and Analyzing Fingerprints of Generative Models
Hae Jin Song, Mahyar Khayatkhoei, Wael AbdAlmageed
CORE-MPI: Consistency Object Removal with Embedding MultiPlane Image
Donggeun Yoon, Donghyeon Cho
ScoreHypo: Probabilistic Human Mesh Estimation with Hypothesis Scoring
Yuan Xu, Xiaoxuan Ma, Jiajun Su et al.
HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models
Nataniel Ruiz, Yuanzhen Li, Varun Jampani et al.
UniMix: Towards Domain Adaptive and Generalizable LiDAR Semantic Segmentation in Adverse Weather
Haimei Zhao, Jing Zhang, Zhuo Chen et al.
BilevelPruning: Unified Dynamic and Static Channel Pruning for Convolutional Neural Networks
Shangqian Gao, Yanfu Zhang, Feihu Huang et al.
Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models
Jiayi Guo, Xingqian Xu, Yifan Pu et al.
Editable Scene Simulation for Autonomous Driving via Collaborative LLM-Agents
Yuxi Wei, Zi Wang, Yifan Lu et al.
Instruct-ReID: A Multi-purpose Person Re-identification Task with Instructions
Weizhen He, Yiheng Deng, Shixiang Tang et al.
SPOT: Self-Training with Patch-Order Permutation for Object-Centric Learning with Autoregressive Transformers
Ioannis Kakogeorgiou, Spyros Gidaris, Konstantinos Karantzalos et al.
MRC-Net: 6-DoF Pose Estimation with MultiScale Residual Correlation
Yuelong Li, Yafei Mao, Raja Bala et al.
Faces that Speak: Jointly Synthesising Talking Face and Speech from Text
Youngjoon Jang, Jihoon Kim, Junseok Ahn et al.
Learning to Segment Referred Objects from Narrated Egocentric Videos
Yuhan Shen, Huiyu Wang, Xitong Yang et al.
CLIP-BEVFormer: Enhancing Multi-View Image-Based BEV Detector with Ground Truth Flow
Chenbin Pan, Burhan Yaman, Senem Velipasalar et al.
ADFactory: An Effective Framework for Generalizing Optical Flow with NeRF
Han Ling, Quansen Sun, Yinghui Sun et al.
FastMAC: Stochastic Spectral Sampling of Correspondence Graph
Yifei Zhang, Hao Zhao, Hongyang Li et al.
PeLK: Parameter-efficient Large Kernel ConvNets with Peripheral Convolution
Honghao Chen, Xiangxiang Chu, Renyongjian et al.
Improving Semantic Correspondence with Viewpoint-Guided Spherical Maps
Octave Mariotti, Oisin Mac Aodha, Hakan Bilen
Learning Degradation-Independent Representations for Camera ISP Pipelines
Yanhui Guo, Fangzhou Luo, Xiaolin Wu
Low-Resource Vision Challenges for Foundation Models
Yunhua Zhang, Hazel Doughty, Cees G. M. Snoek
MaxQ: Multi-Axis Query for N:M Sparsity Network
Jingyang Xiang, Siqi Li, Junhao Chen et al.
Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models
Haoning Wu, Zicheng Zhang, Erli Zhang et al.
Efficient Scene Recovery Using Luminous Flux Prior
ZhongYu Li, Lei Zhang
Towards CLIP-driven Language-free 3D Visual Grounding via 2D-3D Relational Enhancement and Consistency
Yuqi Zhang, Han Luo, Yinjie Lei
LaRE^2: Latent Reconstruction Error Based Method for Diffusion-Generated Image Detection
Yunpeng Luo, Junlong Du, Ke Yan et al.
Stratified Avatar Generation from Sparse Observations
Han Feng, Wenchao Ma, Quankai Gao et al.
Compressed 3D Gaussian Splatting for Accelerated Novel View Synthesis
Simon Niedermayr, Josef Stumpfegger, Rüdiger Westermann
Motion Blur Decomposition with Cross-shutter Guidance
Xiang Ji, Haiyang Jiang, Yinqiang Zheng
NAPGuard: Towards Detecting Naturalistic Adversarial Patches
Siyang Wu, Jiakai Wang, Jiejie Zhao et al.
Active Prompt Learning in Vision Language Models
Jihwan Bang, Sumyeong Ahn, Jae-Gil Lee
Generating Handwritten Mathematical Expressions From Symbol Graphs: An End-to-End Pipeline
Yu Chen, Fei Gao, Yanguang Zhang et al.
On the Robustness of Language Guidance for Low-Level Vision Tasks: Findings from Depth Estimation
Agneet Chatterjee, Tejas Gokhale, Chitta Baral et al.
SingularTrajectory: Universal Trajectory Predictor Using Diffusion Model
Inhwan Bae, Young-Jae Park, Hae-Gon Jeon
Solving the Catastrophic Forgetting Problem in Generalized Category Discovery
Xinzi Cao, Xiawu Zheng, Guanhong Wang et al.
MatchU: Matching Unseen Objects for 6D Pose Estimation from RGB-D Images
Junwen Huang, Hao Yu, Kuan-Ting Yu et al.
The Manga Whisperer: Automatically Generating Transcriptions for Comics
Ragav Sachdeva, Andrew Zisserman
3DInAction: Understanding Human Actions in 3D Point Clouds
Yizhak Ben-Shabat, Oren Shrout, Stephen Gould
StyLitGAN: Image-Based Relighting via Latent Control
Anand Bhattad, James Soole, David Forsyth
Unsupervised Universal Image Segmentation
XuDong Wang, Dantong Niu, Xinyang Han et al.
Not All Classes Stand on Same Embeddings: Calibrating a Semantic Distance with Metric Tensor
Jae Hyeon Park, Gyoomin Lee, Seunggi Park et al.
Animatable Gaussians: Learning Pose-dependent Gaussian Maps for High-fidelity Human Avatar Modeling
Zhe Li, Zerong Zheng, Lizhen Wang et al.
Retrieval-Augmented Open-Vocabulary Object Detection
Jooyeon Kim, Eulrang Cho, Sehyung Kim et al.
LangSplat: 3D Language Gaussian Splatting
Minghan Qin, Wanhua Li, Jiawei Zhou et al.
SeaBird: Segmentation in Bird’s View with Dice Loss Improves Monocular 3D Detection of Large Objects
Abhinav Kumar, Yuliang Guo, Xinyu Huang et al.
Learning with Structural Labels for Learning with Noisy Labels
Noo-ri Kim, Jin-Seop Lee, Jee-Hyong Lee
Incremental Nuclei Segmentation from Histopathological Images via Future-class Awareness and Compatibility-inspired Distillation
Huyong Wang, Huisi Wu, Jing Qin
Model Inversion Robustness: Can Transfer Learning Help?
Sy-Tuyen Ho, Koh Jun Hao, Keshigeyan Chandrasegaran et al.
MaskCLR: Attention-Guided Contrastive Learning for Robust Action Representation Learning
Mohamed Abdelfattah, Mariam Hassan, Alex Alahi
D3T: Distinctive Dual-Domain Teacher Zigzagging Across RGB-Thermal Gap for Domain-Adaptive Object Detection
Dinh Phat Do, Taehoon Kim, Jaemin Na et al.
Intrinsic Image Diffusion for Indoor Single-view Material Estimation
Peter Kocsis, Vincent Sitzmann, Matthias Nießner
NetTrack: Tracking Highly Dynamic Objects with a Net
Guangze Zheng, Shijie Lin, Haobo Zuo et al.
FADES: Fair Disentanglement with Sensitive Relevance
Taeuk Jang, Xiaoqian Wang
Deep Single Image Camera Calibration by Heatmap Regression to Recover Fisheye Images Under Manhattan World Assumption
Nobuhiko Wakai, Satoshi Sato, Yasunori Ishii et al.
HybridNeRF: Efficient Neural Rendering via Adaptive Volumetric Surfaces
Haithem Turki, Vasu Agrawal, Samuel Rota Bulò et al.
IIRP-Net: Iterative Inference Residual Pyramid Network for Enhanced Image Registration
Tai Ma, Suwei Zhang, Jiafeng Li et al.
SEED-Bench: Benchmarking Multimodal Large Language Models
Bohao Li, Yuying Ge, Yixiao Ge et al.
Style Aligned Image Generation via Shared Attention
Amir Hertz, Andrey Voynov, Shlomi Fruchter et al.
NeRFDeformer: NeRF Transformation from a Single View via 3D Scene Flows
Zhenggang Tang, Jason Ren, Xiaoming Zhao et al.
LTGC: Long-tail Recognition via Leveraging LLMs-driven Generated Content
Qihao Zhao, Yalun Dai, Hao Li et al.
How to Train Neural Field Representations: A Comprehensive Study and Benchmark
Samuele Papa, Riccardo Valperga, David Knigge et al.
Reg-PTQ: Regression-specialized Post-training Quantization for Fully Quantized Object Detector
Yifu Ding, Weilun Feng, Chuyan Chen et al.
FREE: Faster and Better Data-Free Meta-Learning
Yongxian Wei, Zixuan Hu, Zhenyi Wang et al.
You Only Need Less Attention at Each Stage in Vision Transformers
Shuoxi Zhang, Hanpeng Liu, Stephen Lin et al.
Hierarchical Patch Diffusion Models for High-Resolution Video Generation
Ivan Skorokhodov, Willi Menapace, Aliaksandr Siarohin et al.
CPLIP: Zero-Shot Learning for Histopathology with Comprehensive Vision-Language Alignment
Sajid Javed, Arif Mahmood, Iyyakutti Iyappan Ganapathi et al.
Towards Memorization-Free Diffusion Models
Chen Chen, Daochang Liu, Chang Xu
FedHCA2: Towards Hetero-Client Federated Multi-Task Learning
Yuxiang Lu, Suizhi Huang, Yuwen Yang et al.
Improving Unsupervised Hierarchical Representation with Reinforcement Learning
Ruyi An, Yewen Li, Xu He et al.
Global Latent Neural Rendering
Thomas Tanay, Matteo Maggioni
Data Poisoning based Backdoor Attacks to Contrastive Learning
Jinghuai Zhang, Hongbin Liu, Jinyuan Jia et al.
ERMVP: Communication-Efficient and Collaboration-Robust Multi-Vehicle Perception in Challenging Environments
Jingyu Zhang, Kun Yang, Yilei Wang et al.
GRAM: Global Reasoning for Multi-Page VQA
Itshak Blau, Sharon Fogel, Roi Ronen et al.
HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data
Qifan Yu, Juncheng Li, Longhui Wei et al.
Perception-Oriented Video Frame Interpolation via Asymmetric Blending
Guangyang Wu, Xin Tao, Changlin Li et al.
Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation
Qi Yang, Xing Nie, Tong Li et al.
Improving Generalized Zero-Shot Learning by Exploring the Diverse Semantics from External Class Names
Yapeng Li, Yong Luo, Zengmao Wang et al.
Continual Learning for Motion Prediction Model via Meta-Representation Learning and Optimal Memory Buffer Retention Strategy
Dae Jun Kang, Dongsuk Kum, Sanmin Kim
3DGS-Avatar: Animatable Avatars via Deformable 3D Gaussian Splatting
Zhiyin Qian, Shaofei Wang, Marko Mihajlovic et al.
Drag Your Noise: Interactive Point-based Editing via Diffusion Semantic Propagation
Haofeng Liu, Chenshu Xu, Yifei Yang et al.
Building Vision-Language Models on Solid Foundations with Masked Distillation
Sepehr Sameni, Kushal Kafle, Hao Tan et al.
FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation
Pengchong Qiao, Lei Shang, Chang Liu et al.
OmniSDF: Scene Reconstruction using Omnidirectional Signed Distance Functions and Adaptive Binoctrees
Hakyeong Kim, Andreas Meuleman, Hyeonjoong Jang et al.
VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models
Hyeonho Jeong, Geon Yeong Park, Jong Chul Ye
M3-UDA: A New Benchmark for Unsupervised Domain Adaptive Fetal Cardiac Structure Detection
Bin Pu, Liwen Wang, Jiewen Yang et al.
Expandable Subspace Ensemble for Pre-Trained Model-Based Class-Incremental Learning
Da-Wei Zhou, Hai-Long Sun, Han-Jia Ye et al.
Multiway Point Cloud Mosaicking with Diffusion and Global Optimization
Shengze Jin, Iro Armeni, Marc Pollefeys et al.
NeRSP: Neural 3D Reconstruction for Reflective Objects with Sparse Polarized Images
Yufei Han, Heng Guo, Koki Fukai et al.
Beyond Text: Frozen Large Language Models in Visual Signal Comprehension
Lei Zhu, Fangyun Wei, Yanye Lu
CAM Back Again: Large Kernel CNNs from a Weakly Supervised Object Localization Perspective
Shunsuke Yasuki, Masato Taki
Regularized Parameter Uncertainty for Improving Generalization in Reinforcement Learning
Pehuen Moure, Longbiao Cheng, Joachim Ott et al.
Robust Noisy Correspondence Learning with Equivariant Similarity Consistency
Yuchen Yang, Erkun Yang, Likai Wang et al.
OmniVec2 - A Novel Transformer based Network for Large Scale Multimodal and Multitask Learning
Siddharth Srivastava, Gaurav Sharma
Utility-Fairness Trade-Offs and How to Find Them
Sepehr Dehdashtian, Bashir Sadeghi, Vishnu Naresh Boddeti
Fitting Flats to Flats
Gabriel Dogadov, Ugo Finnendahl, Marc Alexa
HOIST-Former: Hand-held Objects Identification Segmentation and Tracking in the Wild
Supreeth Narasimhaswamy, Huy Anh Nguyen, Lihan Huang et al.
WaveFace: Authentic Face Restoration with Efficient Frequency Recovery
Yunqi Miao, Jiankang Deng, Jungong Han
Learning Adaptive Spatial Coherent Correlations for Speech-Preserving Facial Expression Manipulation
Tianshui Chen, Jianman Lin, Zhijing Yang et al.
UFORecon: Generalizable Sparse-View Surface Reconstruction from Arbitrary and Unfavorable Sets
Youngju Na, Woo Jae Kim, Kyu Han et al.
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
Lihe Yang, Bingyi Kang, Zilong Huang et al.
EventDance: Unsupervised Source-free Cross-modal Adaptation for Event-based Object Recognition
Xu Zheng, Addison Lin Wang
Real-World Efficient Blind Motion Deblurring via Blur Pixel Discretization
Insoo Kim, Jae Seok Choi, Geonseok Seo et al.
HUGS: Holistic Urban 3D Scene Understanding via Gaussian Splatting
Hongyu Zhou, Jiahao Shao, Lu Xu et al.
Human Motion Prediction Under Unexpected Perturbation
Jiangbei Yue, Baiyi Li, Julien Pettré et al.
SynFog: A Photo-realistic Synthetic Fog Dataset based on End-to-end Imaging Simulation for Advancing Real-World Defogging in Autonomous Driving
Yiming Xie, Henglu Wei, Zhenyi Liu et al.
IntrinsicAvatar: Physically Based Inverse Rendering of Dynamic Humans from Monocular Videos via Explicit Ray Tracing
Shaofei Wang, Bozidar Antic, Andreas Geiger et al.
All in One Framework for Multimodal Re-identification in the Wild
He Li, Mang Ye, Ming Zhang et al.
FedSOL: Stabilized Orthogonal Learning with Proximal Restrictions in Federated Learning
Gihun Lee, Minchan Jeong, SangMook Kim et al.
Don't Look into the Dark: Latent Codes for Pluralistic Image Inpainting
Haiwei Chen, Yajie Zhao
WHAM: Reconstructing World-grounded Humans with Accurate 3D Motion
Soyong Shin, Juyong Kim, Eni Halilaj et al.
FedUV: Uniformity and Variance for Heterogeneous Federated Learning
Ha Min Son, Moon-Hyun Kim, Tai-Myoung Chung et al.
Language-driven Grasp Detection
An Dinh Vuong, Minh Nhat Vu, Baoru Huang et al.
Each Test Image Deserves A Specific Prompt: Continual Test-Time Adaptation for 2D Medical Image Segmentation
Ziyang Chen, Yongsheng Pan, Yiwen Ye et al.
Prompting Vision Foundation Models for Pathology Image Analysis
Chong Yin, Siqi Liu, Kaiyang Zhou et al.
Unmixing Before Fusion: A Generalized Paradigm for Multi-Source-based Hyperspectral Image Synthesis
Yang Yu, Erting Pan, Xinya Wang et al.
Navigating Beyond Dropout: An Intriguing Solution towards Generalizable Image Super Resolution
Hongjun Wang, Jiyuan Chen, Yinqiang Zheng et al.
Sculpt3D: Multi-View Consistent Text-to-3D Generation with Sparse 3D Prior
Chen Cheng, Xiaofeng Yang, Fan Yang et al.
Task2Box: Box Embeddings for Modeling Asymmetric Task Relationships
Rangel Daroya, Aaron Sun, Subhransu Maji
ExMap: Leveraging Explainability Heatmaps for Unsupervised Group Robustness to Spurious Correlations
Rwiddhi Chakraborty, Adrian de Sena Sletten, Michael C. Kampffmeyer
DUDF: Differentiable Unsigned Distance Fields with Hyperbolic Scaling
Miguel Fainstein, Viviana Siless, Emmanuel Iarussi
PBWR: Parametric-Building-Wireframe Reconstruction from Aerial LiDAR Point Clouds
Shangfeng Huang, Ruisheng Wang, Bo Guo et al.
GenH2R: Learning Generalizable Human-to-Robot Handover via Scalable Simulation Demonstration and Imitation
Zifan Wang, Junyu Chen, Ziqing Chen et al.
Content-Style Decoupling for Unsupervised Makeup Transfer without Generating Pseudo Ground Truth
Zhaoyang Sun, Shengwu Xiong, Yaxiong Chen et al.
UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs
Yanwu Xu, Yang Zhao, Zhisheng Xiao et al.
SelfPose3d: Self-Supervised Multi-Person Multi-View 3d Pose Estimation
Keqi Chen, Vinkle Srivastav, Nicolas Padoy
Context-Aware Integration of Language and Visual References for Natural Language Tracking
Yanyan Shao, Shuting He, Qi Ye et al.
PointBeV: A Sparse Approach for BeV Predictions
Loick Chambon, Éloi Zablocki, Mickaël Chen et al.
Contextual Augmented Global Contrast for Multimodal Intent Recognition
Kaili Sun, Zhiwen Xie, Mang Ye et al.
Intraoperative 2D/3D Image Registration via Differentiable X-ray Rendering
Vivek Gopalakrishnan, Neel Dey, Polina Golland
LLM4SGG: Large Language Models for Weakly Supervised Scene Graph Generation
Kibum Kim, Kanghoon Yoon, Jaehyeong Jeon et al.
FreeKD: Knowledge Distillation via Semantic Frequency Prompt
Yuan Zhang, Tao Huang, Jiaming Liu et al.
Enhanced Motion-Text Alignment for Image-to-Video Transfer Learning
Wei Zhang, Chaoqun Wan, Tongliang Liu et al.
SNIDA: Unlocking Few-Shot Object Detection with Non-linear Semantic Decoupling Augmentation
Yanjie Wang, Xu Zou, Luxin Yan et al.
Diffusion 3D Features (Diff3F): Decorating Untextured Shapes with Distilled Semantic Features
Niladri Shekhar Dutt, Sanjeev Muralikrishnan, Niloy J. Mitra
Unsupervised Occupancy Learning from Sparse Point Cloud
Amine Ouasfi, Adnane Boukhayma
Hierarchical Intra-modal Correlation Learning for Label-free 3D Semantic Segmentation
Xin Kang, Lei Chu, Jiahao Li et al.
Adaptive Hyper-graph Aggregation for Modality-Agnostic Federated Learning
Fan Qi, Shuai Li
Efficient and Effective Weakly-Supervised Action Segmentation via Action-Transition-Aware Boundary Alignment
Angchi Xu, Wei-Shi Zheng
MoMask: Generative Masked Modeling of 3D Human Motions
Chuan Guo, Yuxuan Mu, Muhammad Gohar Javed et al.
Event-based Visible and Infrared Fusion via Multi-task Collaboration
Mengyue Geng, Lin Zhu, Lizhi Wang et al.
MotionEditor: Editing Video Motion via Content-Aware Diffusion
Shuyuan Tu, Qi Dai, Zhi-Qi Cheng et al.
State Space Models for Event Cameras
Nikola Zubic, Mathias Gehrig, Davide Scaramuzza
Test-Time Linear Out-of-Distribution Detection
Ke Fan, Tong Liu, Xingyu Qiu et al.
Exploiting Style Latent Flows for Generalizing Deepfake Video Detection
Jongwook Choi, Taehoon Kim, Yonghyun Jeong et al.
Unsupervised Semantic Segmentation Through Depth-Guided Feature Correlation and Sampling
Leon Sick, Dominik Engel, Pedro Hermosilla et al.
Generalizing 6-DoF Grasp Detection via Domain Prior Knowledge
Haoxiang Ma, Modi Shi, Boyang Gao et al.
Generalized Event Cameras
Varun Sundar, Matthew Dutson, Andrei Ardelean et al.
Dual Pose-invariant Embeddings: Learning Category and Object-specific Discriminative Representations for Recognition and Retrieval
Rohan Sarkar, Avinash Kak
NIVeL: Neural Implicit Vector Layers for Text-to-Vector Generation
Vikas Thamizharasan, Difan Liu, Matthew Fisher et al.
Hyperbolic Anomaly Detection
Huimin Li, Zhentao Chen, Yunhao Xu et al.
DanceCamera3D: 3D Camera Movement Synthesis with Music and Dance
Zixuan Wang, Jia Jia, Shikun Sun et al.
HoloVIC: Large-scale Dataset and Benchmark for Multi-Sensor Holographic Intersection and Vehicle-Infrastructure Cooperative
Cong Ma, Qiao Lei, Chengkai Zhu et al.
What Sketch Explainability Really Means for Downstream Tasks?
Hmrishav Bandyopadhyay, Pinaki Nath Chowdhury, Ayan Kumar Bhunia et al.
CommonCanvas: Open Diffusion Models Trained on Creative-Commons Images
Aaron Gokaslan, A. Feder Cooper, Jasmine Collins et al.
Memory-based Adapters for Online 3D Scene Perception
Xiuwei Xu, Chong Xia, Ziwei Wang et al.
Cross-spectral Gated-RGB Stereo Depth Estimation
Samuel Brucker, Stefanie Walz, Mario Bijelic et al.
Readout Guidance: Learning Control from Diffusion Features
Grace Luo, Trevor Darrell, Oliver Wang et al.