Most Cited 2024 "object categories" Papers
Dynamic Knowledge Injection for AIXI Agents
Samuel Yang-Zhao, Kee Siong Ng, Marcus Hutter
Factored Online Planning in Many-Agent POMDPs
Maris Galesloot, Thiago Simão, Sebastian Junges et al.
Principal-Agent Reward Shaping in MDPs
Omer Ben-Porat, Yishay Mansour, Michal Moshkovitz et al.
Feature Distribution Matching by Optimal Transport for Effective and Robust Coreset Selection
Dialogues Are Not Just Text: Modeling Cognition for Dialogue Coherence Evaluation
A Unified Self-Distillation Framework for Multimodal Sentiment Analysis with Uncertain Missing Modalities
Guiding a Harsh-Environments Robust Detector via RAW Data Characteristic Mining
LimeAttack: Local Explainable Method for Textual Hard-Label Adversarial Attack
A Novel Skip Orthogonal List for Dynamic Optimal Transport Problem
Mixed-Effects Contextual Bandits
Weiwei Xiao, Yongyong Chen, Qiben Shan et al.
Beyond Attention: Breaking the Limits of Transformer Context Length with Recurrent Memory
Aydar Bulatov, Yuri Kuratov, Yermek Kapushev et al.
Resisting Backdoor Attacks in Federated Learning via Bidirectional Elections and Individual Perspective
Zhen Qin, Feiyi Chen, Chen Zhi et al.
Transportable Representations for Domain Generalization
Kasra Jalaldoust, Elias Bareinboim
Exponential Hardness of Optimization from the Locality in Quantum Neural Networks
Hao-Kai Zhang, Chengkai Zhu, Geng Liu et al.
MFOS: Model-Free & One-Shot Object Pose Estimation
JongMin Lee, Yohann Cabon, Romain Brégier et al.
Hierarchical Topology Isomorphism Expertise Embedded Graph Contrastive Learning
Jiangmeng Li, Yifan Jin, Hang Gao et al.
PDE+: Enhancing Generalization via PDE with Adaptive Distributional Diffusion
Yige Yuan, Bingbing Xu, Bo Lin et al.
Towards Real-World Test-Time Adaptation: Tri-net Self-Training with Balanced Normalization
Yongyi Su, Xun Xu, Kui Jia
Probabilistic Offline Policy Ranking with Approximate Bayesian Computation
Longchao Da, Porter Jenkins, Trevor Schwantes et al.
DRF: Improving Certified Robustness via Distributional Robustness Framework
Zekai Wang, Zhengyu Zhou, Weiwei Liu
Revisiting Disentanglement in Downstream Tasks: A Study on Its Necessity for Abstract Visual Reasoning
Ruiqian Nai, Zixin Wen, Ji Li et al.
Xiezhi: An Ever-Updating Benchmark for Holistic Domain Knowledge Evaluation
Zhouhong Gu, Xiaoxuan Zhu, Haoning Ye et al.
Out of Thin Air: Exploring Data-Free Adversarial Robustness Distillation
Yuzheng Wang, Zhaoyu Chen, Dingkang Yang et al.
Towards Detailed Text-to-Motion Synthesis via Basic-to-Advanced Hierarchical Diffusion Model
Zhenyu Xie, Yang Wu, Xuehao Gao et al.
Dirichlet-Based Prediction Calibration for Learning with Noisy Labels
Chen-Chen Zong, Ye-Wen Wang, Ming-Kun Xie et al.
HAGO-Net: Hierarchical Geometric Message Passing for Molecular Representation Learning
Hongbin Pei, Taile Chen, Chen A et al.
Unsupervised Template-assisted Point Cloud Shape Correspondence Network
Jiacheng Deng, Jiahao Lu, Tianzhu Zhang
X-3D: Explicit 3D Structure Modeling for Point Cloud Recognition
Shuofeng Sun, Yongming Rao, Jiwen Lu et al.
Spectral and Polarization Vision: Spectro-polarimetric Real-world Dataset
Yujin Jeon, Eunsue Choi, Youngchan Kim et al.
Efficient Model Stealing Defense with Noise Transition Matrix
Dong-Dong Wu, Chilin Fu, Weichang Wu et al.
HOIAnimator: Generating Text-prompt Human-object Animations using Novel Perceptive Diffusion Models
Wenfeng Song, Xinyu Zhang, Shuai Li et al.
MV-Adapter: Multimodal Video Transfer Learning for Video Text Retrieval
Bowen Zhang, Xiaojie Jin, Weibo Gong et al.
Diffusion Models Without Attention
Jing Nathan Yan, Jiatao Gu, Alexander Rush
HDQMF: Holographic Feature Decomposition Using Quantum Algorithms
Prathyush Poduval, Zhuowen Zou, Mohsen Imani
DrivingGaussian: Composite Gaussian Splatting for Surrounding Dynamic Autonomous Driving Scenes
Xiaoyu Zhou, Zhiwei Lin, Xiaojun Shan et al.
H-ViT: A Hierarchical Vision Transformer for Deformable Image Registration
Morteza Ghahremani, Mohammad Khateri, Bailiang Jian et al.
Going Beyond Multi-Task Dense Prediction with Synergy Embedding Models
Huimin Huang, Yawen Huang, Lanfen Lin et al.
FLHetBench: Benchmarking Device and State Heterogeneity in Federated Learning
Junyuan Zhang, Shuang Zeng, Miao Zhang et al.
MR-VNet: Media Restoration using Volterra Networks
Siddharth Roheda, Amit Unde, Loay Rashid
OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition
Jianqiang Wan, Sibo Song, Wenwen Yu et al.
PortraitBooth: A Versatile Portrait Model for Fast Identity-preserved Personalization
Xu Peng, Junwei Zhu, Boyuan Jiang et al.
MULAN: A Multi Layer Annotated Dataset for Controllable Text-to-Image Generation
Petru-Daniel Tudosiu, Yongxin Yang, Shifeng Zhang et al.
Dr.Hair: Reconstructing Scalp-Connected Hair Strands without Pre-Training via Differentiable Rendering of Line Segments
Yusuke Takimoto, Hikari Takehara, Hiroyuki Sato et al.
CroSel: Cross Selection of Confident Pseudo Labels for Partial-Label Learning
Shiyu Tian, Hongxin Wei, Yiqun Wang et al.
PTM-VQA: Efficient Video Quality Assessment Leveraging Diverse PreTrained Models from the Wild
Kun Yuan, Hongbo Liu, Mading Li et al.
Improved Self-Training for Test-Time Adaptation
Jing Ma
Mudslide: A Universal Nuclear Instance Segmentation Method
Jun Wang
Collaborative Learning of Anomalies with Privacy (CLAP) for Unsupervised Video Anomaly Detection: A New Baseline
Anas Al-lahham, Muhammad Zaigham Zaheer, Nurbek Tastan et al.
Cache Me if You Can: Accelerating Diffusion Models through Block Caching
Felix Wimbauer, Bichen Wu, Edgar Schoenfeld et al.
Rewrite the Stars
Xu Ma, Xiyang Dai, Yue Bai et al.
Virtual Immunohistochemistry Staining for Histological Images Assisted by Weakly-supervised Learning
Jiahan Li, Jiuyang Dong, Shenjin Huang et al.
3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features
Chenfeng Xu, Huan Ling, Sanja Fidler et al.
Model Adaptation for Time Constrained Embodied Control
Jaehyun Song, Minjong Yoo, Honguk Woo
DiverGen: Improving Instance Segmentation by Learning Wider Data Distribution with More Diverse Generative Data
Chengxiang Fan, Muzhi Zhu, Hao Chen et al.
SPAD: Spatially Aware Multi-View Diffusers
Yash Kant, Aliaksandr Siarohin, Ziyi Wu et al.
SCE-MAE: Selective Correspondence Enhancement with Masked Autoencoder for Self-Supervised Landmark Estimation
Kejia Yin, Varshanth Rao, Ruowei Jiang et al.
DiffPerformer: Iterative Learning of Consistent Latent Guidance for Diffusion-based Human Video Generation
Chenyang Wang, Zerong Zheng, Tao Yu et al.
SparseOcc: Rethinking Sparse Latent Representation for Vision-Based Semantic Occupancy Prediction
Pin Tang, Zhongdao Wang, Guoqing Wang et al.
Beyond First-Order Tweedie: Solving Inverse Problems using Latent Diffusion
Litu Rout, Yujia Chen, Abhishek Kumar et al.
Unsupervised Video Domain Adaptation with Masked Pre-Training and Collaborative Self-Training
Arun Reddy, William Paul, Corban Rivera et al.
RCBEVDet: Radar-camera Fusion in Bird's Eye View for 3D Object Detection
Zhiwei Lin, Zhe Liu, Zhongyu Xia et al.
FineParser: A Fine-grained Spatio-temporal Action Parser for Human-centric Action Quality Assessment
Jinglin Xu, Sibo Yin, Guohao Zhao et al.
SceneFun3D: Fine-Grained Functionality and Affordance Understanding in 3D Scenes
Alexandros Delitzas, Ayça Takmaz, Federico Tombari et al.
MAPLM: A Real-World Large-Scale Vision-Language Benchmark for Map and Traffic Scene Understanding
Xu Cao, Tong Zhou, Yunsheng Ma et al.
Do Vision and Language Encoders Represent the World Similarly?
Mayug Maniparambil, Raiymbek Akshulakov, Yasser Abdelaziz Dahou Djilali et al.
Weakly Supervised Point Cloud Semantic Segmentation via Artificial Oracle
Hyeokjun Kweon, Jihun Kim, Kuk-Jin Yoon
Customize your NeRF: Adaptive Source Driven 3D Scene Editing via Local-Global Iterative Training
Runze He, Shaofei Huang, Xuecheng Nie et al.
Construct to Associate: Cooperative Context Learning for Domain Adaptive Point Cloud Segmentation
Guangrui Li
Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft
Hao Li, Xue Yang, Zhaokai Wang et al.
Wavelet-based Fourier Information Interaction with Frequency Diffusion Adjustment for Underwater Image Restoration
Chen Zhao, Weiling Cai, Chenyu Dong et al.
Generating Content for HDR Deghosting from Frequency View
Tao Hu, Qingsen Yan, Yuankai Qi et al.
Direct2.5: Diverse Text-to-3D Generation via Multi-view 2.5D Diffusion
Yuanxun Lu, Jingyang Zhang, Shiwei Li et al.
Not All Prompts Are Secure: A Switchable Backdoor Attack Against Pre-trained Vision Transformers
Sheng Yang, Jiawang Bai, Kuofeng Gao et al.
LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning
Sijin Chen, Xin Chen, Chi Zhang et al.
GenTron: Diffusion Transformers for Image and Video Generation
Shoufa Chen, Mengmeng Xu, Jiawei Ren et al.
Map-Relative Pose Regression for Visual Re-Localization
Shuai Chen, Tommaso Cavallari, Victor Adrian Prisacariu et al.
Gradient-based Parameter Selection for Efficient Fine-Tuning
Zhi Zhang, Qizhe Zhang, Zijun Gao et al.
Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis
Willi Menapace, Aliaksandr Siarohin, Ivan Skorokhodov et al.
Backpropagation-free Network for 3D Test-time Adaptation
Yanshuo Wang, Ali Cheraghian, Zeeshan Hayder et al.
TransNeXt: Robust Foveal Visual Perception for Vision Transformers
Dai Shi
InstructDiffusion: A Generalist Modeling Interface for Vision Tasks
Zigang Geng, Binxin Yang, Tiankai Hang et al.
HPL-ESS: Hybrid Pseudo-Labeling for Unsupervised Event-based Semantic Segmentation
Linglin Jing, Yiming Ding, Yunpeng Gao et al.
Promptable Behaviors: Personalizing Multi-Objective Rewards from Human Preferences
Minyoung Hwang, Luca Weihs, Chanwoo Park et al.
Fourier Priors-Guided Diffusion for Zero-Shot Joint Low-Light Enhancement and Deblurring
Xiaoqian Lv, Shengping Zhang, Chenyang Wang et al.
Towards General Robustness Verification of MaxPool-based Convolutional Neural Networks via Tightening Linear Approximation
Yuan Xiao, Shiqing Ma, Juan Zhai et al.
RichDreamer: A Generalizable Normal-Depth Diffusion Model for Detail Richness in Text-to-3D
Lingteng Qiu, Guanying Chen, Xiaodong Gu et al.
Robust Synthetic-to-Real Transfer for Stereo Matching
Jiawei Zhang, Jiahe Li, Lei Huang et al.
Understanding and Improving Source-free Domain Adaptation from a Theoretical Perspective
Yu Mitsuzumi, Akisato Kimura, Hisashi Kashima
From Isolated Islands to Pangea: Unifying Semantic Space for Human Action Understanding
Yonglu Li, Xiaoqian Wu, Xinpeng Liu et al.
LowRankOcc: Tensor Decomposition and Low-Rank Recovery for Vision-based 3D Semantic Occupancy Prediction
Linqing Zhao, Xiuwei Xu, Ziwei Wang et al.
Overcoming Generic Knowledge Loss with Selective Parameter Update
Wenxuan Zhang, Paul Janson, Rahaf Aljundi et al.
CoDeF: Content Deformation Fields for Temporally Consistent Video Processing
Hao Ouyang, Qiuyu Wang, Yuxi Xiao et al.
BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning
Ruyang Liu, Chen Li, Yixiao Ge et al.
Video Frame Interpolation via Direct Synthesis with the Event-based Reference
Yuhan Liu, Yongjian Deng, Hao Chen et al.
Lane2Seq: Towards Unified Lane Detection via Sequence Generation
Kunyang Zhou
CorrMatch: Label Propagation via Correlation Matching for Semi-Supervised Semantic Segmentation
Bo-Yuan Sun, Yuqi Yang, Le Zhang et al.
Rethinking Boundary Discontinuity Problem for Oriented Object Detection
Hang Xu, Xinyuan Liu, Haonan Xu et al.
MCNet: Rethinking the Core Ingredients for Accurate and Efficient Homography Estimation
Haokai Zhu, Si-Yuan Cao, Jianxin Hu et al.
UniDepth: Universal Monocular Metric Depth Estimation
Luigi Piccinelli, Yung-Hsu Yang, Christos Sakaridis et al.
Diffusion Model Alignment Using Direct Preference Optimization
Bram Wallace, Meihua Dang, Rafael Rafailov et al.
SD4Match: Learning to Prompt Stable Diffusion Model for Semantic Matching
Xinghui Li, Jingyi Lu, Kai Han et al.
Uncertainty-Guided Never-Ending Learning to Drive
Lei Lai, Eshed Ohn-Bar, Sanjay Arora et al.
Feedback-Guided Autonomous Driving
Jimuyang Zhang, Zanming Huang, Arijit Ray et al.
Small Steps and Level Sets: Fitting Neural Surface Models with Point Guidance
Chamin Hewa Koneputugodage, Yizhak Ben-Shabat, Dylan Campbell et al.
Adapt or Perish: Adaptive Sparse Transformer with Attentive Feature Refinement for Image Restoration
Shihao Zhou, Duosheng Chen, Jinshan Pan et al.
3DGStream: On-the-Fly Training of 3D Gaussians for Efficient Streaming of Photo-Realistic Free-Viewpoint Videos
Jiakai Sun, Han Jiao, Guangyuan Li et al.
LTM: Lightweight Textured Mesh Extraction and Refinement of Large Unbounded Scenes for Efficient Storage and Real-time Rendering
Jaehoon Choi, Rajvi Shah, Qinbo Li et al.
Geometry Transfer for Stylizing Radiance Fields
Hyunyoung Jung, Seonghyeon Nam, Nikolaos Sarafianos et al.
3D Human Pose Perception from Egocentric Stereo Videos
Hiroyasu Akada, Jian Wang, Vladislav Golyanik et al.
QN-Mixer: A Quasi-Newton MLP-Mixer Model for Sparse-View CT Reconstruction
Ishak Ayad, Nicolas Larue, Mai K. Nguyen
Check, Locate, Rectify: A Training-Free Layout Calibration System for Text-to-Image Generation
Biao Gong, Siteng Huang, Yutong Feng et al.
Prompt3D: Random Prompt Assisted Weakly-Supervised 3D Object Detection
Xiaohong Zhang, Huisheng Ye, Jingwen Li et al.
Boosting Self-Supervision for Single-View Scene Completion via Knowledge Distillation
Keonhee Han, Dominik Muhle, Felix Wimbauer et al.
Volumetric Environment Representation for Vision-Language Navigation
Liu, Wenguan Wang, Yi Yang
CrossKD: Cross-Head Knowledge Distillation for Object Detection
JiaBao Wang, Yuming Chen, Zhaohui Zheng et al.
Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation
Jiaming Liu, Ran Xu, Senqiao Yang et al.
TiNO-Edit: Timestep and Noise Optimization for Robust Diffusion-Based Image Editing
Sherry X. Chen, Yaron Vaxman, Elad Ben Baruch et al.
Leveraging Camera Triplets for Efficient and Accurate Structure-from-Motion
Lalit Manam, Venu Madhav Govindu
CG-HOI: Contact-Guided 3D Human-Object Interaction Generation
Christian Diller, Angela Dai
Is Vanilla MLP in Neural Radiance Field Enough for Few-shot View Synthesis?
Hanxin Zhu, Tianyu He, Xin Li et al.
Resurrecting Old Classes with New Data for Exemplar-Free Continual Learning
Dipam Goswami, Albin Soutif, Yuyang Liu et al.
DIEM: Decomposition-Integration Enhancing Multimodal Insights
Xinyi Jiang, Guoming Wang, Junhao Guo et al.
Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters
Jiazuo Yu, Yunzhi Zhuge, Lu Zhang et al.
HOI-M^3: Capture Multiple Humans and Objects Interaction within Contextual Environment
Juze Zhang, Jingyan Zhang, Zining Song et al.
CORES: Convolutional Response-based Score for Out-of-distribution Detection
Keke Tang, Chao Hou, Weilong Peng et al.
Equivariant Multi-Modality Image Fusion
Zixiang Zhao, Haowen Bai, Jiangshe Zhang et al.
PDF: A Probability-Driven Framework for Open World 3D Point Cloud Semantic Segmentation
Jinfeng Xu, Siyuan Yang, Xianzhi Li et al.
NeISF: Neural Incident Stokes Field for Geometry and Material Estimation
Chenhao Li, Taishi Ono, Takeshi Uemori et al.
PromptKD: Unsupervised Prompt Distillation for Vision-Language Models
Zheng Li, Xiang Li, Xinyi Fu et al.
DeMatch: Deep Decomposition of Motion Field for Two-View Correspondence Learning
Shihua Zhang, Zizhuo Li, Yuan Gao et al.
Domain Gap Embeddings for Generative Dataset Augmentation
Yinong Oliver Wang, Younjoon Chung, Chen Henry Wu et al.
Domain-Agnostic Mutual Prompting for Unsupervised Domain Adaptation
Zhekai Du, Xinyao Li, Fengling Li et al.
TransLoc4D: Transformer-based 4D Radar Place Recognition
Guohao Peng, Heshan Li, Yangyang Zhao et al.
Higher-order Relational Reasoning for Pedestrian Trajectory Prediction
Sungjune Kim, Hyung-gun Chi, Hyerin Lim et al.
Learn to Rectify the Bias of CLIP for Unsupervised Semantic Segmentation
Jingyun Wang, Guoliang Kang
Leveraging Vision-Language Models for Improving Domain Generalization in Image Classification
Sravanti Addepalli, Ashish Asokan, Lakshay Sharma et al.
Absolute Pose from One or Two Scaled and Oriented Features
Jonathan Ventura, Zuzana Kukelova, Torsten Sattler et al.
Draw Step by Step: Reconstructing CAD Construction Sequences from Point Clouds via Multimodal Diffusion
Weijian Ma, Shuaiqi Chen, Yunzhong Lou et al.
DSGG: Dense Relation Transformer for an End-to-end Scene Graph Generation
Zeeshan Hayder, Xuming He
Open-Vocabulary 3D Semantic Segmentation with Foundation Models
Li Jiang, Shaoshuai Shi, Bernt Schiele
Training Vision Transformers for Semi-Supervised Semantic Segmentation
Xinting Hu, Li Jiang, Bernt Schiele
APSeg: Auto-Prompt Network for Cross-Domain Few-Shot Semantic Segmentation
Weizhao He, Yang Zhang, Wei Zhuo et al.
Design2Cloth: 3D Cloth Generation from 2D Masks
Jiali Zheng, Rolandos Alexandros Potamias, Stefanos Zafeiriou
S-DyRF: Reference-Based Stylized Radiance Fields for Dynamic Scenes
Xingyi Li, Zhiguo Cao, Yizheng Wu et al.
SatSynth: Augmenting Image-Mask Pairs through Diffusion Models for Aerial Semantic Segmentation
Aysim Toker, Marvin Eisenberger, Daniel Cremers et al.
Dual-Consistency Model Inversion for Non-Exemplar Class Incremental Learning
Zihuan Qiu, Yi Xu, Fanman Meng et al.
DS-NeRV: Implicit Neural Video Representation with Decomposed Static and Dynamic Codes
Hao Yan, Zhihui Ke, Xiaobo Zhou et al.
Rolling Shutter Correction with Intermediate Distortion Flow Estimation
Mingdeng Cao, Sidi Yang, Yujiu Yang et al.
Towards Transferable Targeted 3D Adversarial Attack in the Physical World
Yao Huang, Yinpeng Dong, Shouwei Ruan et al.
Hybrid Functional Maps for Crease-Aware Non-Isometric Shape Matching
Lennart Bastian, Yizheng Xie, Nassir Navab et al.
Class Tokens Infusion for Weakly Supervised Semantic Segmentation
Sung-Hoon Yoon, Hoyong Kwon, Hyeonseong Kim et al.
SFOD: Spiking Fusion Object Detector
Yimeng Fan, Wei Zhang, Changsong Liu et al.
AnyDoor: Zero-shot Object-level Image Customization
Xi Chen, Lianghua Huang, Yu Liu et al.
SeD: Semantic-Aware Discriminator for Image Super-Resolution
Bingchen Li, Xin Li, Hanxin Zhu et al.
InstanceDiffusion: Instance-level Control for Image Generation
XuDong Wang, Trevor Darrell, Sai Saketh Rambhatla et al.
Robust Emotion Recognition in Context Debiasing
Dingkang Yang, Kun Yang, Mingcheng Li et al.
Improving Training Efficiency of Diffusion Models via Multi-Stage Framework and Tailored Multi-Decoder Architecture
Huijie Zhang, Yifu Lu, Ismail Alkhouri et al.
Balancing Act: Distribution-Guided Debiasing in Diffusion Models
Rishubh Parihar, Abhijnya Bhat, Abhipsa Basu et al.
Sieve: Multimodal Dataset Pruning using Image Captioning Models
Anas Mahmoud, Mostafa Elhoushi, Amro Abbas et al.
Not All Voxels Are Equal: Hardness-Aware Semantic Scene Completion with Self-Distillation
Song Wang, Jiawei Yu, Wentong Li et al.
Towards Fairness-Aware Adversarial Learning
Yanghao Zhang, Tianle Zhang, Ronghui Mu et al.
SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge
Andong Wang, Bo Wu, Sunli Chen et al.
MuRF: Multi-Baseline Radiance Fields
Haofei Xu, Anpei Chen, Yuedong Chen et al.
Learnable Earth Parser: Discovering 3D Prototypes in Aerial Scans
Romain Loiseau, Elliot Vincent, Mathieu Aubry et al.
Hide in Thicket: Generating Imperceptible and Rational Adversarial Perturbations on 3D Point Clouds
Tianrui Lou, Xiaojun Jia, Jindong Gu et al.
PIGEON: Predicting Image Geolocations
Lukas Haas, Michal Skreta, Silas Alberti et al.
JoAPR: Cleaning the Lens of Prompt Learning for Vision-Language Models
Yuncheng Guo, Xiaodong Gu
Retrieval-Augmented Egocentric Video Captioning
Jilan Xu, Yifei Huang, Junlin Hou et al.
GPLD3D: Latent Diffusion of 3D Shape Generative Models by Enforcing Geometric and Physical Priors
Yuan Dong, Qi Zuo, Xiaodong Gu et al.
Low-Rank Knowledge Decomposition for Medical Foundation Models
Yuhang Zhou, Haolin Li, Siyuan Du et al.
Pixel-level Semantic Correspondence through Layout-aware Representation Learning and Multi-scale Matching Integration
Yixuan Sun, Zhangyue Yin, Haibo Wang et al.
View From Above: Orthogonal-View aware Cross-view Localization
Shan Wang, Chuong Nguyen, Jiawei Liu et al.
WorDepth: Variational Language Prior for Monocular Depth Estimation
Ziyao Zeng, Hyoungseob Park, Fengyu Yang et al.
Event-assisted Low-Light Video Object Segmentation
Li Hebei, Jin Wang, Jiahui Yuan et al.
3DToonify: Creating Your High-Fidelity 3D Stylized Avatar Easily from 2D Portrait Images
Yifang Men, Hanxi Liu, Yuan Yao et al.
Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding
Wujian Peng, Sicheng Xie, Zuyao You et al.
EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI
Tai Wang, Xiaohan Mao, Chenming Zhu et al.
DIOD: Self-Distillation Meets Object Discovery
Sandra Kara, Hejer Ammar, Julien Denize et al.
FlashEval: Towards Fast and Accurate Evaluation of Text-to-image Diffusion Generative Models
Lin Zhao, Tianchen Zhao, Zinan Lin et al.
COLMAP-Free 3D Gaussian Splatting
Yang Fu, Sifei Liu, Amey Kulkarni et al.
SNED: Superposition Network Architecture Search for Efficient Video Diffusion Model
Zhengang Li, Yan Kang, Yuchen Liu et al.
Personalized Residuals for Concept-Driven Text-to-Image Generation
Cusuh Ham, Matthew Fisher, James Hays et al.
CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation
Seokju Cho, Heeseong Shin, Sunghwan Hong et al.
Deep Generative Model based Rate-Distortion for Image Downscaling Assessment
Yuanbang Liang, Bhavesh Garg, Paul L. Rosin et al.
Forecasting of 3D Whole-body Human Poses with Grasping Objects
Yan Haitao, Qiongjie Cui, Jiexin Xie et al.
VA3: Virtually Assured Amplification Attack on Probabilistic Copyright Protection for Text-to-Image Generative Models
Xiang Li, Qianli Shen, Kenji Kawaguchi
PIE-NeRF: Physics-based Interactive Elastodynamics with NeRF
Yutao Feng, Yintong Shang, Xuan Li et al.
SNI-SLAM: Semantic Neural Implicit SLAM
Siting Zhu, Guangming Wang, Hermann Blum et al.
Edge-Aware 3D Instance Segmentation Network with Intelligent Semantic Prior
Wonseok Roh, Hwanhee Jung, Giljoo Nam et al.
TextureDreamer: Image-Guided Texture Synthesis Through Geometry-Aware Diffusion
Yu-Ying Yeh, Jia-Bin Huang, Changil Kim et al.
MAFA: Managing False Negatives for Vision-Language Pre-training
Jaeseok Byun, Dohoon Kim, Taesup Moon
Blur2Blur: Blur Conversion for Unsupervised Image Deblurring on Unknown Domains
Bang-Dang Pham, Phong Tran, Anh Tran et al.
RAVE: Randomized Noise Shuffling for Fast and Consistent Video Editing with Diffusion Models
Ozgur Kara, Bariscan Kurtkaya, Hidir Yesiltepe et al.
ChatScene: Knowledge-Enabled Safety-Critical Scenario Generation for Autonomous Vehicles
Jiawei Zhang, Chejian Xu, Bo Li
MMSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos
Jielin Qiu, Jiacheng Zhu, William Han et al.
Generalizable Novel-View Synthesis using a Stereo Camera
Haechan Lee, Wonjoon Jin, Seung-Hwan Baek et al.
Learning Structure-from-Motion with Graph Attention Networks
Lucas Brynte, José Pedro Iglesias, Carl Olsson et al.
Don’t Drop Your Samples! Coherence-Aware Training Benefits Conditional Diffusion
Nicolas Dufour, Victor Besnier, Vicky Kalogeiton et al.
SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection
Peng Qi, Zehong Yan, Wynne Hsu et al.
Spatial-Aware Regression for Keypoint Localization
Dongkai Wang, Shiliang Zhang