Most Cited CVPR "feature representation spaces" Papers
5,589 papers found • Page 18 of 28
Conference
MRFS: Mutually Reinforcing Image Fusion and Segmentation
HAO ZHANG, Xuhui Zuo, Jie Jiang et al.
Multi-agent Long-term 3D Human Pose Forecasting via Interaction-aware Trajectory Conditioning
Jaewoo Jeong, Daehee Park, Kuk-Jin Yoon
3D-LFM: Lifting Foundation Model
Mosam Dabhi, László A. Jeni, Simon Lucey
Defense Against Adversarial Attacks on No-Reference Image Quality Models with Gradient Norm Regularization
Yujia Liu, Chenxi Yang, Dingquan Li et al.
Active Domain Adaptation with False Negative Prediction for Object Detection
Yuzuru Nakamura, Yasunori Ishii, Takayoshi Yamashita
Semantic-Aware Multi-Label Adversarial Attacks
Hassan Mahmood, Ehsan Elhamifar
Motion-adaptive Separable Collaborative Filters for Blind Motion Deblurring
Chengxu Liu, Xuan Wang, Xiangyu Xu et al.
CoralSCOP: Segment any COral Image on this Planet
Zheng Ziqiang, Liang Haixin, Binh-Son Hua et al.
Open Vocabulary Semantic Scene Sketch Understanding
Ahmed Bourouis, Judith Fan, Yulia Gryaditskaya
Rethinking the Up-Sampling Operations in CNN-based Generative Network for Generalizable Deepfake Detection
Chuangchuang Tan, Huan Liu, Yao Zhao et al.
BoQ: A Place is Worth a Bag of Learnable Queries
Amar Ali-bey, Brahim Chaib-draa, Philippe Giguère
UFC-Net: Unrolling Fixed-point Continuous Network for Deep Compressive Sensing
Xiaoyang Wang, Hongping Gan
MaskPLAN: Masked Generative Layout Planning from Partial Input
Hang Zhang, Anton Savov, Benjamin Dillenburger
Solving Masked Jigsaw Puzzles with Diffusion Vision Transformers
Jinyang Liu, Wondmgezahu Teshome, Sandesh Ghimire et al.
A-Teacher: Asymmetric Network for 3D Semi-Supervised Object Detection
Hanshi Wang, Zhipeng Zhang, Jin Gao et al.
3D Feature Tracking via Event Camera
Siqi Li, Zhou Zhikuan, Zhou Xue et al.
RoHM: Robust Human Motion Reconstruction via Diffusion
Siwei Zhang, Bharat Lal Bhatnagar, Yuanlu Xu et al.
SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos
Tao Wu, Runyu He, Gangshan Wu et al.
ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object
Chenshuang Zhang, Fei Pan, Junmo Kim et al.
BlockGCN: Redefine Topology Awareness for Skeleton-Based Action Recognition
Yuxuan Zhou, Xudong Yan, Zhi-Qi Cheng et al.
Person-in-WiFi 3D: End-to-End Multi-Person 3D Pose Estimation with Wi-Fi
Kangwei Yan, Fei Wang, Bo Qian et al.
DiffusionRegPose: Enhancing Multi-Person Pose Estimation using a Diffusion-Based End-to-End Regression Approach
Dayi Tan, Hansheng Chen, Wei Tian et al.
Exact Fusion via Feature Distribution Matching for Few-shot Image Generation
Yingbo Zhou, Yutong Ye, Pengyu Zhang et al.
Affine Equivariant Networks Based on Differential Invariants
Yikang Li, Yeqing Qiu, Yuxuan Chen et al.
FlowDiffuser: Advancing Optical Flow Estimation with Diffusion Models
Ao Luo, XIN LI, Fan Yang et al.
Video Super-Resolution Transformer with Masked Inter&Intra-Frame Attention
Xingyu Zhou, Leheng Zhang, Xiaorui Zhao et al.
Puff-Net: Efficient Style Transfer with Pure Content and Style Feature Fusion Network
Sizhe Zheng, Pan Gao, Peng Zhou et al.
SynSP: Synergy of Smoothness and Precision in Pose Sequences Refinement
Tao Wang, Lei Jin, Zheng Wang et al.
Towards High-fidelity Artistic Image Vectorization via Texture-Encapsulated Shape Parameterization
Ye Chen, Bingbing Ni, Jinfan Liu et al.
Deformable One-shot Face Stylization via DINO Semantic Guidance
Yang Zhou, Zichong Chen, Hui Huang
Density-Guided Semi-Supervised 3D Semantic Segmentation with Dual-Space Hardness Sampling
Jianan Li, Qiulei Dong
Decomposing Disease Descriptions for Enhanced Pathology Detection: A Multi-Aspect Vision-Language Pre-training Framework
Vu Minh Hieu Phan, Yutong Xie, Yuankai Qi et al.
1-Lipschitz Layers Compared: Memory Speed and Certifiable Robustness
Bernd Prach, Fabio Brau, Giorgio Buttazzo et al.
Point-VOS: Pointing Up Video Object Segmentation
Sabarinath Mahadevan, Idil Esen Zulfikar, Paul Voigtlaender et al.
Authentic Hand Avatar from a Phone Scan via Universal Hand Model
Gyeongsik Moon, Weipeng Xu, Rohan Joshi et al.
Task-Driven Exploration: Decoupling and Inter-Task Feedback for Joint Moment Retrieval and Highlight Detection
Jin Yang, Ping Wei, Huan Li et al.
HDRFlow: Real-Time HDR Video Reconstruction with Large Motions
Gangwei Xu, Yujin Wang, Jinwei Gu et al.
Beyond Average: Individualized Visual Scanpath Prediction
Xianyu Chen, Ming Jiang, Qi Zhao
Decentralized Directed Collaboration for Personalized Federated Learning
Yingqi Liu, Yifan Shi, Qinglun Li et al.
Task-Driven Wavelets using Constrained Empirical Risk Minimization
Eric Marcus, Ray Sheombarsing, Jan-Jakob Sonke et al.
Probing Synergistic High-Order Interaction in Infrared and Visible Image Fusion
Naishan Zheng, Man Zhou, Jie Huang et al.
Scaling Up Dynamic Human-Scene Interaction Modeling
Nan Jiang, Zhiyuan Zhang, Hongjie Li et al.
Data-Free Quantization via Pseudo-label Filtering
Chunxiao Fan, Ziqi Wang, Dan Guo et al.
Towards Robust Learning to Optimize with Theoretical Guarantees
Qingyu Song, Wei Lin, Juncheng Wang et al.
Animating General Image with Large Visual Motion Model
Dengsheng Chen, Xiaoming Wei, Xiaolin Wei
EventEgo3D: 3D Human Motion Capture from Egocentric Event Streams
Christen Millerdurai, Hiroyasu Akada, Jian Wang et al.
Improving Generalization via Meta-Learning on Hard Samples
Nishant Jain, Arun Suggala, Pradeep Shenoy
BANF: Band-Limited Neural Fields for Levels of Detail Reconstruction
Ahan Shabanov, Shrisudhan Govindarajan, Cody Reading et al.
LLMs are Good Action Recognizers
Haoxuan Qu, Yujun Cai, Jun Liu
NeRFiller: Completing Scenes via Generative 3D Inpainting
Ethan Weber, Aleksander Holynski, Varun Jampani et al.
MPOD123: One Image to 3D Content Generation Using Mask-enhanced Progressive Outline-to-Detail Optimization
Jimin Xu, Tianbao Wang, Tao Jin et al.
RepAn: Enhanced Annealing through Re-parameterization
Xiang Fei, Xiawu Zheng, Yan Wang et al.
PAPR in Motion: Seamless Point-level 3D Scene Interpolation
Shichong Peng, Yanshu Zhang, Ke Li
Towards Modern Image Manipulation Localization: A Large-Scale Dataset and Novel Methods
Chenfan Qu, Yiwu Zhong, Chongyu Liu et al.
Exploring Pose-Aware Human-Object Interaction via Hybrid Learning
EASTMAN Z Y WU, Yali Li, Yuan Wang et al.
TCP:Textual-based Class-aware Prompt tuning for Visual-Language Model
Hantao Yao, Rui Zhang, Changsheng Xu
LAENeRF: Local Appearance Editing for Neural Radiance Fields
Lukas Radl, Michael Steiner, Andreas Kurz et al.
Improved Visual Grounding through Self-Consistent Explanations
Ruozhen He, Paola Cascante-Bonilla, Ziyan Yang et al.
AZ-NAS: Assembling Zero-Cost Proxies for Network Architecture Search
Junghyup Lee, Bumsub Ham
On the Faithfulness of Vision Transformer Explanations
Junyi Wu, Weitai Kang, Hao Tang et al.
StrokeFaceNeRF: Stroke-based Facial Appearance Editing in Neural Radiance Field
Xiao-juan Li, Dingxi Zhang, Shu-Yu Chen et al.
Neural Modes: Self-supervised Learning of Nonlinear Modal Subspaces
Jiahong Wang, Yinwei DU, Stelian Coros et al.
Learning Occupancy for Monocular 3D Object Detection
Liang Peng, Junkai Xu, Haoran Cheng et al.
Bring Event into RGB and LiDAR: Hierarchical Visual-Motion Fusion for Scene Flow
Hanyu Zhou, Yi Chang, Zhiwei Shi
Auto-Train-Once: Controller Network Guided Automatic Network Pruning from Scratch
Xidong Wu, Shangqian Gao, Zeyu Zhang et al.
Learning to Control Camera Exposure via Reinforcement Learning
Kyunghyun Lee, Ukcheol Shin, Byeong-Uk Lee
FocSAM: Delving Deeply into Focused Objects in Segmenting Anything
You Huang, Zongyu Lan, Liujuan Cao et al.
Spin-UP: Spin Light for Natural Light Uncalibrated Photometric Stereo
Zongrui Li, Zhan Lu, Haojie Yan et al.
SEAS: ShapE-Aligned Supervision for Person Re-Identification
Haidong Zhu, Pranav Budhwant, Zhaoheng Zheng et al.
LidaRF: Delving into Lidar for Neural Radiance Field on Street Scenes
Shanlin Sun, Bingbing Zhuang, Ziyu Jiang et al.
Training Generative Image Super-Resolution Models by Wavelet-Domain Losses Enables Better Control of Artifacts
Cansu Korkmaz, Ahmet Murat Tekalp, Zafer Dogan
Task-Customized Mixture of Adapters for General Image Fusion
Pengfei Zhu, Yang Sun, Bing Cao et al.
Learning Spatial Adaptation and Temporal Coherence in Diffusion Models for Video Super-Resolution
Zhikai Chen, Fuchen Long, Zhaofan Qiu et al.
Ensemble Diversity Facilitates Adversarial Transferability
Bowen Tang, Zheng Wang, Yi Bin et al.
CFAT: Unleashing Triangular Windows for Image Super-resolution
Abhisek Ray, Gaurav Kumar, Maheshkumar Kolekar
Convolutional Prompting meets Language Models for Continual Learning
Anurag Roy, Riddhiman Moulick, Vinay Verma et al.
Mitigating Noisy Correspondence by Geometrical Structure Consistency Learning
Zihua Zhao, Mengxi Chen, Tianjie Dai et al.
Relaxed Contrastive Learning for Federated Learning
Seonguk Seo, Jinkyu Kim, Geeho Kim et al.
FFF: Fixing Flawed Foundations in Contrastive Pre-Training Results in Very Strong Vision-Language Models
Adrian Bulat, Yassine Ouali, Georgios Tzimiropoulos
Beyond Textual Constraints: Learning Novel Diffusion Conditions with Fewer Examples
Yuyang Yu, Bangzhen Liu, Chenxi Zheng et al.
Task-aligned Part-aware Panoptic Segmentation through Joint Object-Part Representations
Daan de Geus, Gijs Dubbelman
Targeted Representation Alignment for Open-World Semi-Supervised Learning
Ruixuan Xiao, Lei Feng, Kai Tang et al.
Adaptive Multi-Modal Cross-Entropy Loss for Stereo Matching
Peng Xu, Zhiyu Xiang, Chengyu Qiao et al.
Driving-Video Dehazing with Non-Aligned Regularization for Safety Assistance
Junkai Fan, Jiangwei Weng, Kun Wang et al.
Exploring Region-Word Alignment in Built-in Detector for Open-Vocabulary Object Detection
Heng Zhang, Qiuyu Zhao, Linyu Zheng et al.
Generative 3D Part Assembly via Part-Whole-Hierarchy Message Passing
Bi'an Du, Xiang Gao, Wei Hu et al.
GLOW: Global Layout Aware Attacks on Object Detection
Jun Bao, Buyu Liu, Kui Ren et al.
Neural Underwater Scene Representation
Yunkai Tang, Chengxuan Zhu, Renjie Wan et al.
Scaled Decoupled Distillation
Shicai Wei, Chunbo Luo, Yang Luo
Real-World Mobile Image Denoising Dataset with Efficient Baselines
Roman Flepp, Andrey Ignatov, Radu Timofte et al.
PARA-Drive: Parallelized Architecture for Real-time Autonomous Driving
Xinshuo Weng, Boris Ivanovic, Yan Wang et al.
Towards Generalizable Tumor Synthesis
Qi Chen, Xiaoxi Chen, Haorui Song et al.
Bi-SSC: Geometric-Semantic Bidirectional Fusion for Camera-based 3D Semantic Scene Completion
Yujie Xue, Ruihui Li, F anWu et al.
Depth-Aware Concealed Crop Detection in Dense Agricultural Scenes
Liqiong Wang, Jinyu Yang, Yanfu Zhang et al.
UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio Video Point Cloud Time-Series and Image Recognition
Xiaohan Ding, Yiyuan Zhang, Yixiao Ge et al.
BigGait: Learning Gait Representation You Want by Large Vision Models
Dingqiang Ye, Chao Fan, Jingzhe Ma et al.
Gaussian Shell Maps for Efficient 3D Human Generation
Rameen Abdal, Wang Yifan, Zifan Shi et al.
DiffInDScene: Diffusion-based High-Quality 3D Indoor Scene Generation
Xiaoliang Ju, Zhaoyang Huang, Yijin Li et al.
LeGO: Leveraging a Surface Deformation Network for Animatable Stylized Face Generation with One Example
Soyeon Yoon, Kwan Yun, Kwanggyoon Seo et al.
Leveraging Predicate and Triplet Learning for Scene Graph Generation
Jiankai Li, Yunhong Wang, Xiefan Guo et al.
HHMR: Holistic Hand Mesh Recovery by Enhancing the Multimodal Controllability of Graph Diffusion Models
Mengcheng Li, Hongwen Zhang, Yuxiang Zhang et al.
Improved Baselines with Visual Instruction Tuning
Haotian Liu, Chunyuan Li, Yuheng Li et al.
Enhancing Visual Continual Learning with Language-Guided Supervision
Bolin Ni, Hongbo Zhao, Chenghao Zhang et al.
Rotation-Agnostic Image Representation Learning for Digital Pathology
Saghir Alfasly, Abubakr Shafique, Peyman Nejat et al.
Uncertainty-Aware Source-Free Adaptive Image Super-Resolution with Wavelet Augmentation Transformer
Yuang Ai, Xiaoqiang Zhou, Huaibo Huang et al.
Making Vision Transformers Truly Shift-Equivariant
Renan A. Rojas-Gomez, Teck-Yian Lim, Minh Do et al.
DreamVideo: Composing Your Dream Videos with Customized Subject and Motion
Yujie Wei, Shiwei Zhang, Zhiwu Qing et al.
RankED: Addressing Imbalance and Uncertainty in Edge Detection Using Ranking-based Losses
bedrettin cetinkaya, Sinan Kalkan, Emre Akbas
Fine-Grained Bipartite Concept Factorization for Clustering
Chong Peng, Pengfei Zhang, Yongyong Chen et al.
Multimodal Prompt Perceiver: Empower Adaptiveness Generalizability and Fidelity for All-in-One Image Restoration
Yuang Ai, Huaibo Huang, Xiaoqiang Zhou et al.
BEVSpread: Spread Voxel Pooling for Bird’s-Eye-View Representation in Vision-based Roadside 3D Object Detection
Wenjie Wang, Yehao Lu, Guangcong Zheng et al.
Backdoor Defense via Test-Time Detecting and Repairing
Jiyang Guan, Jian Liang, Ran He
Leveraging Frame Affinity for sRGB-to-RAW Video De-rendering
Chen Zhang, Wencheng Han, Yang Zhou et al.
Learning CNN on ViT: A Hybrid Model to Explicitly Class-specific Boundaries for Domain Adaptation
Ba Hung Ngo, Nhat-Tuong Do-Tran, Tuan-Ngoc Nguyen et al.
GoMVS: Geometrically Consistent Cost Aggregation for Multi-View Stereo
Jiang Wu, Rui Li, Haofei Xu et al.
GSNeRF: Generalizable Semantic Neural Radiance Fields with Enhanced 3D Scene Understanding
Zi-Ting Chou, Sheng-Yu Huang, I-Jieh Liu et al.
CAT-DM: Controllable Accelerated Virtual Try-on with Diffusion Model
Jianhao Zeng, Dan Song, Weizhi Nie et al.
Action Detection via an Image Diffusion Process
Lin Geng Foo, Tianjiao Li, Hossein Rahmani et al.
DyBluRF: Dynamic Neural Radiance Fields from Blurry Monocular Video
Huiqiang Sun, Xingyi Li, Liao Shen et al.
MimicDiffusion: Purifying Adversarial Perturbation via Mimicking Clean Diffusion Model
Kaiyu Song, Hanjiang Lai, Yan Pan et al.
Action Scene Graphs for Long-Form Understanding of Egocentric Videos
Ivan Rodin, Antonino Furnari, Kyle Min et al.
SpiderMatch: 3D Shape Matching with Global Optimality and Geometric Consistency
Paul Roetzer, Florian Bernard
Realigning Confidence with Temporal Saliency Information for Point-Level Weakly-Supervised Temporal Action Localization
Ziying Xia, Jian Cheng, Siyu Liu et al.
Segment and Caption Anything
Xiaoke Huang, Jianfeng Wang, Yansong Tang et al.
Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images
Chaoqin Huang, Aofan Jiang, Jinghao Feng et al.
NightCC: Nighttime Color Constancy via Adaptive Channel Masking
Shuwei Li, Robby T. Tan
HumanGaussian: Text-Driven 3D Human Generation with Gaussian Splatting
Xian Liu, Xiaohang Zhan, Jiaxiang Tang et al.
Depth Prompting for Sensor-Agnostic Depth Estimation
Jin-Hwi Park, Chanhwi Jeong, Junoh Lee et al.
Modality-Collaborative Test-Time Adaptation for Action Recognition
Baochen Xiong, Xiaoshan Yang, Yaguang Song et al.
DRESS: Instructing Large Vision-Language Models to Align and Interact with Humans via Natural Language Feedback
Yangyi Chen, Karan Sikka, Michael Cogswell et al.
Rethinking Inductive Biases for Surface Normal Estimation
Gwangbin Bae, Andrew J. Davison
Pose-Guided Self-Training with Two-Stage Clustering for Unsupervised Landmark Discovery
Siddharth Tourani, Ahmed Alwheibi, Arif Mahmood et al.
OVMR: Open-Vocabulary Recognition with Multi-Modal References
Zehong Ma, Shiliang Zhang, Longhui Wei et al.
AETTA: Label-Free Accuracy Estimation for Test-Time Adaptation
Taeckyung Lee, Sorn Chottananurak, Taesik Gong et al.
AdaShift: Learning Discriminative Self-Gated Neural Feature Activation With an Adaptive Shift Factor
Sudong Cai
Holistic Features are almost Sufficient for Text-to-Video Retrieval
Kaibin Tian, Ruixiang Zhao, Zijie Xin et al.
CLIP-Driven Open-Vocabulary 3D Scene Graph Generation via Cross-Modality Contrastive Learning
Lianggangxu Chen, Xuejiao Wang, Jiale Lu et al.
Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation
Luca Barsellotti, Roberto Amoroso, Marcella Cornia et al.
Gaussian Shading: Provable Performance-Lossless Image Watermarking for Diffusion Models
Zijin Yang, Kai Zeng, Kejiang Chen et al.
Multimodal Sense-Informed Forecasting of 3D Human Motions
Zhenyu Lou, Qiongjie Cui, Haofan Wang et al.
Correlation-aware Coarse-to-fine MLPs for Deformable Medical Image Registration
Mingyuan Meng, Dagan Feng, Lei Bi et al.
The Devil is in the Details: StyleFeatureEditor for Detail-Rich StyleGAN Inversion and High Quality Image Editing
Denis Bobkov, Vadim Titov, Aibek Alanov et al.
Self-Training Large Language Models for Improved Visual Program Synthesis With Visual Reinforcement
Zaid Khan, Vijay Kumar BG, Samuel Schulter et al.
Joint-Task Regularization for Partially Labeled Multi-Task Learning
Kento Nishi, Junsik Kim, Wanhua Li et al.
Local-consistent Transformation Learning for Rotation-invariant Point Cloud Analysis
Yiyang Chen, Lunhao Duan, Shanshan Zhao et al.
Spacetime Gaussian Feature Splatting for Real-Time Dynamic View Synthesis
Zhan Li, Zhang Chen, Zhong Li et al.
The More You See in 2D the More You Perceive in 3D
Xinyang Han, Zelin Gao, Angjoo Kanazawa et al.
What When and Where? Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions
Brian Chen, Nina Shvetsova, Andrew Rouditchenko et al.
Correlation-Decoupled Knowledge Distillation for Multimodal Sentiment Analysis with Incomplete Modalities
Mingcheng Li, Dingkang Yang, Xiao Zhao et al.
ES³: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations
Yuanhang Zhang, Shuang Yang, Shiguang Shan et al.
Depth-aware Test-Time Training for Zero-shot Video Object Segmentation
Weihuang Liu, Xi Shen, Haolun Li et al.
MSU-4S - The Michigan State University Four Seasons Dataset
Daniel Kent, Mohammed Alyaqoub, Xiaohu Lu et al.
Rapid 3D Model Generation with Intuitive 3D Input
Tianrun Chen, Chaotao Ding, Shangzhan Zhang et al.
CoDi-2: In-Context Interleaved and Interactive Any-to-Any Generation
Zineng Tang, Ziyi Yang, MAHMOUD KHADEMI et al.
AdaBM: On-the-Fly Adaptive Bit Mapping for Image Super-Resolution
Cheeun Hong, Kyoung Mu Lee
Instance-level Expert Knowledge and Aggregate Discriminative Attention for Radiology Report Generation
Shenshen Bu, Taiji Li, Zhiming Dai et al.
HyperSDFusion: Bridging Hierarchical Structures in Language and Geometry for Enhanced 3D Text2Shape Generation
Zhiying Leng, Tolga Birdal, Xiaohui Liang et al.
Just Add ?! Pose Induced Video Transformers for Understanding Activities of Daily Living
Dominick Reilly, Srijan Das
Viewpoint-Aware Visual Grounding in 3D Scenes
Xiangxi Shi, Zhonghua Wu, Stefan Lee
Long-Tail Class Incremental Learning via Independent Sub-prototype Construction
Xi Wang, Xu Yang, Jie Yin et al.
An Aggregation-Free Federated Learning for Tackling Data Heterogeneity
Yuan Wang, Huazhu Fu, Renuga Kanagavelu et al.
Advancing Saliency Ranking with Human Fixations: Dataset Models and Benchmarks
Bowen Deng, Siyang Song, Andrew French et al.
DreamControl: Control-Based Text-to-3D Generation with 3D Self-Prior
Tianyu Huang, Yihan Zeng, Zhilu Zhang et al.
From Coarse to Fine-Grained Open-Set Recognition
Nico Lang, Vésteinn Snæbjarnarson, Elijah Cole et al.
Arbitrary Motion Style Transfer with Multi-condition Motion Latent Diffusion Model
Wenfeng Song, Xingliang Jin, Shuai Li et al.
Distraction is All You Need: Memory-Efficient Image Immunization against Diffusion-Based Image Editing
Ling Lo, Cheng Yeo, Hong-Han Shuai et al.
MGMap: Mask-Guided Learning for Online Vectorized HD Map Construction
Xiaolu Liu, Song Wang, Wentong Li et al.
ViT-Lens: Towards Omni-modal Representations
Stan Weixian Lei, Yixiao Ge, Kun Yi et al.
Prompt-Driven Referring Image Segmentation with Instance Contrasting
Chao Shang, Zichen Song, Heqian Qiu et al.
CosmicMan: A Text-to-Image Foundation Model for Humans
Shikai Li, Jianglin Fu, Kaiyuan Liu et al.
Overload: Latency Attacks on Object Detection for Edge Devices
Erh-Chung Chen, Pin-Yu Chen, I-Hsin Chung et al.
Neural Exposure Fusion for High-Dynamic Range Object Detection
Emmanuel Onzon, Maximilian Bömer, Fahim Mannan et al.
Semantics Distortion and Style Matter: Towards Source-free UDA for Panoramic Segmentation
Xu Zheng, Pengyuan Zhou, ATHANASIOS et al.
PaSCo: Urban 3D Panoptic Scene Completion with Uncertainty Awareness
Anh-Quan Cao, Angela Dai, Raoul de Charette
Evaluating Transferability in Retrieval Tasks: An Approach Using MMD and Kernel Methods
Mengyu Dai, Amir Hossein Raffiee, Aashish Jain et al.
Confronting Ambiguity in 6D Object Pose Estimation via Score-Based Diffusion on SE(3)
Tsu-Ching Hsiao, Hao-Wei Chen, Hsuan-Kung Yang et al.
Enhancing Post-training Quantization Calibration through Contrastive Learning
Yuzhang Shang, Gaowen Liu, Ramana Kompella et al.
DualAD: Disentangling the Dynamic and Static World for End-to-End Driving
Simon Doll, Niklas Hanselmann, Lukas Schneider et al.
ColorPCR: Color Point Cloud Registration with Multi-Stage Geometric-Color Fusion
Juncheng Mu, Lin Bie, Shaoyi Du et al.
Time- Memory- and Parameter-Efficient Visual Adaptation
Otniel-Bogdan Mercea, Alexey Gritsenko, Cordelia Schmid et al.
Differentiable Point-based Inverse Rendering
Hoon-Gyu Chung, Seokjun Choi, Seung-Hwan Baek
CyberDemo: Augmenting Simulated Human Demonstration for Real-World Dexterous Manipulation
Jun Wang, Yuzhe Qin, Kaiming Kuang et al.
Video Harmonization with Triplet Spatio-Temporal Variation Patterns
Zonghui Guo, XinYu Han, Jie Zhang et al.
Can Biases in ImageNet Models Explain Generalization?
Paul Gavrikov, Janis Keuper
HumMUSS: Human Motion Understanding using State Space Models
Arnab Mondal, Stefano Alletto, Denis Tome
Revisiting Adversarial Training at Scale
Zeyu Wang, Xianhang li, Hongru Zhu et al.
Make Pixels Dance: High-Dynamic Video Generation
Yan Zeng, Guoqiang Wei, Jiani Zheng et al.
Generative Multi-modal Models are Good Class Incremental Learners
Xusheng Cao, Haori Lu, Linlan Huang et al.
LLaMA-Excitor: General Instruction Tuning via Indirect Feature Interaction
Bo Zou, Chao Yang, Yu Qiao et al.
NC-SDF: Enhancing Indoor Scene Reconstruction Using Neural SDFs with View-Dependent Normal Compensation
Ziyi Chen, Xiaolong Wu, Yu Zhang
Holistic Autonomous Driving Understanding by Bird’s-Eye-View Injected Multi-Modal Large Models
Xinpeng Ding, Jianhua Han, Hang Xu et al.
Addressing Background Context Bias in Few-Shot Segmentation through Iterative Modulation
Lanyun Zhu, Tianrun Chen, Jianxiong Yin et al.
OVFoodSeg: Elevating Open-Vocabulary Food Image Segmentation via Image-Informed Textual Representation
Xiongwei Wu, Sicheng Yu, Ee-Peng Lim et al.
Friendly Sharpness-Aware Minimization
Tao Li, Pan Zhou, Zhengbao He et al.
MoSAR: Monocular Semi-Supervised Model for Avatar Reconstruction using Differentiable Shading
Abdallah Dib, Luiz Gustavo Hafemann, Emeline Got et al.
Point2CAD: Reverse Engineering CAD Models from 3D Point Clouds
Yujia Liu, Anton Obukhov, Jan D. Wegner et al.
Misalignment-Robust Frequency Distribution Loss for Image Transformation
Zhangkai Ni, Juncheng Wu, Zian Wang et al.
WildlifeMapper: Aerial Image Analysis for Multi-Species Detection and Identification
Satish Kumar, Bowen Zhang, Chandrakanth Gudavalli et al.
DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations
Tianhao Qi, Shancheng Fang, Yanze Wu et al.
Learning Degradation-unaware Representation with Prior-based Latent Transformations for Blind Face Restoration
Lianxin Xie, csbingbing zheng, Wen Xue et al.