Most Cited 2024 "planning" Papers
12,324 papers found • Page 9 of 62
Conference
Improving Plasticity in Online Continual Learning via Collaborative Learning
Maorong Wang, Nicolas Michel, Ling Xiao et al.
ColorPeel: Color Prompt Learning with Diffusion Models via Color and Shape Disentanglement
Muhammad Atif Butt, Kai Wang, Javier Vazquez-Corral et al.
MLP Can Be A Good Transformer Learner
Sihao Lin, Pumeng Lyu, Dongrui Liu et al.
VideoMAC: Video Masked Autoencoders Meet ConvNets
Gensheng Pei, Tao Chen, Xiruo Jiang et al.
Grid Diffusion Models for Text-to-Video Generation
Taegyeong Lee, Soyeong Kwon, Taehwan Kim
Neuro-Inspired Information-Theoretic Hierarchical Perception for Multimodal Learning
Xiongye Xiao, Gengshuo Liu, Gaurav Gupta et al.
PerAda: Parameter-Efficient Federated Learning Personalization with Generalization Guarantees
Chulin Xie, De-An Huang, Wenda Chu et al.
Upper Bounding Barlow Twins: A Novel Filter for Multi-Relational Clustering
Xiaowei Qian, Bingheng Li, Zhao Kang
ECoFLaP: Efficient Coarse-to-Fine Layer-Wise Pruning for Vision-Language Models
Yi-Lin Sung, Jaehong Yoon, Mohit Bansal
Federated Learning with Extremely Noisy Clients via Negative Distillation
Yang Lu, Lin Chen, Yonggang Zhang et al.
Pre-training Sequence, Structure, and Surface Features for Comprehensive Protein Representation Learning
Youhan Lee, Hasun Yu, Jaemyung Lee et al.
Improving Video Segmentation via Dynamic Anchor Queries
Yikang Zhou, Tao Zhang, Xiangtai Li et al.
Instrumental Variable Estimation for Causal Inference in Longitudinal Data with Time-Dependent Latent Confounders
Debo Cheng, Ziqi Xu, Jiuyong Li et al.
FlowTrack: Revisiting Optical Flow for Long-Range Dense Tracking
Seokju Cho, Gabriel Huang, Seungryong Kim et al.
Exploring Diverse Representations for Open Set Recognition
Yu Wang, Junxian Mu, Pengfei Zhu et al.
Embarrassingly Simple Dataset Distillation
Yunzhen Feng, Shanmukha Ramakrishna Vedantam, Julia Kempe
Improving Zero-shot Generalization of Learned Prompts via Unsupervised Knowledge Distillation
Marco Mistretta, Alberto Baldrati, Marco Bertini et al.
WordRobe: Text-Guided Generation of Textured 3D Garments
Astitva Srivastava, Pranav Manu, Amit Raj et al.
Conformal Autoregressive Generation: Beam Search with Coverage Guarantees
Nicolas Deutschmann, Marvin Alberts, María Rodríguez Martínez
DIBS: Enhancing Dense Video Captioning with Unlabeled Videos via Pseudo Boundary Enrichment and Online Refinement
Hao Wu, Huabin Liu, Yu Qiao et al.
Aligning Geometric Spatial Layout in Cross-View Geo-Localization via Feature Recombination
Qingwang Zhang, Yingying Zhu
Training-free Video Temporal Grounding using Large-scale Pre-trained Models
Minghang Zheng, Xinhao Cai, Qingchao Chen et al.
Improving Cross-Modal Alignment with Synthetic Pairs for Text-Only Image Captioning
Zhiyue Liu, Jinyuan Liu, Fanrong Ma
ConR: Contrastive Regularizer for Deep Imbalanced Regression
Mahsa Keramati, Lili Meng, R. Evans
STAR: Boosting Low-Resource Information Extraction by Structure-to-Text Data Generation with Large Language Models
Mingyu Derek Ma, Xiaoxuan Wang, Po-Nien Kung et al.
Learning to Predict Activity Progress by Self-Supervised Video Alignment
Gerard Donahue, Ehsan Elhamifar
MMVR: Millimeter-wave Multi-View Radar Dataset and Benchmark for Indoor Perception
Mohammad Mahbubur Rahman, Ryoma Yataka, Sorachi Kato et al.
AlignMiF: Geometry-Aligned Multimodal Implicit Field for LiDAR-Camera Joint Synthesis
Tao Tang, Guangrun Wang, Yixing Lao et al.
AMEGO: Active Memory from long EGOcentric videos
Gabriele Goletto, Tushar Nagarajan, Giuseppe Averta et al.
Restore Anything with Masks: Leveraging Mask Image Modeling for Blind All-in-One Image Restoration
Chu Jie Qin, Ruiqi Wu, Zikun Liu et al.
One-Class Face Anti-spoofing via Spoof Cue Map-Guided Feature Learning
Pei-Kai Huang, Cheng-Hsuan Chiang, Tzu-Hsien Chen et al.
Domain Randomization via Entropy Maximization
Gabriele Tiboni, Pascal Klink, Jan Peters et al.
Navigation Instruction Generation with BEV Perception and Large Language Models
Sheng Fan, Rui Liu, Wenguan Wang et al.
Loose Inertial Poser: Motion Capture with IMU-attached Loose-Wear Jacket
Chengxu Zuo, Yiming Wang, Lishuang Zhan et al.
You'll Never Walk Alone: A Sketch and Text Duet for Fine-Grained Image Retrieval
Subhadeep Koley, Ayan Kumar Bhunia, Aneeshan Sain et al.
NeRF-HuGS: Improved Neural Radiance Fields in Non-static Scenes Using Heuristics-Guided Segmentation
Jiahao Chen, Yipeng Qin, Lingjie Liu et al.
360Loc: A Dataset and Benchmark for Omnidirectional Visual Localization with Cross-device Queries
Huajian Huang, Changkun Liu, Yipeng Zhu et al.
ExeDec: Execution Decomposition for Compositional Generalization in Neural Program Synthesis
Kensen Shi, Joey Hong, Yinlin Deng et al.
Cascade-CLIP: Cascaded Vision-Language Embeddings Alignment for Zero-Shot Semantic Segmentation
Yunheng Li, Zhong-Yu Li, Quan-Sheng Zeng et al.
FedRA: A Random Allocation Strategy for Federated Tuning to Unleash the Power of Heterogeneous Clients
Shangchao Su, Bin Li, Xiangyang Xue
Multi-Level Neural Scene Graphs for Dynamic Urban Environments
Tobias Fischer, Lorenzo Porzi, Samuel Rota Bulò et al.
MirrorGaussian: Reflecting 3D Gaussians for Reconstructing Mirror Reflections
Jiayue Liu, Tang Xiao, Freeman Cheng et al.
Leaving the Nest: Going beyond Local Loss Functions for Predict-Then-Optimize
Sanket Shah, Bryan Wilder, Andrew Perrault et al.
SelEx: Self-Expertise in Fine-Grained Generalized Category Discovery
Sarah Rastegar, Mohammadreza Salehi, Yuki M Asano et al.
Modeling Collaborator: Enabling Subjective Vision Classification With Minimal Human Effort via LLM Tool-Use
Imad Eddine Toubal, Aditya Avinash, Neil Alldrin et al.
ZePT: Zero-Shot Pan-Tumor Segmentation via Query-Disentangling and Self-Prompting
Yankai Jiang, Zhongzhen Huang, Rongzhao Zhang et al.
On the Variance of Neural Network Training with respect to Test Sets and Distributions
Keller Jordan
Gaussian Frosting: Editable Complex Radiance Fields with Real-Time Rendering
Antoine Guedon, Vincent Lepetit
Underwater Organism Color Fine-Tuning via Decomposition and Guidance
Xiaofeng Cong, Jie Gui, Junming Hou
Diffusion for Natural Image Matting
Yihan Hu, Yiheng Lin, Wei Wang et al.
Exploring the Promise and Limits of Real-Time Recurrent Learning
Kazuki Irie, Anand Gopalakrishnan, Jürgen Schmidhuber
Towards Open Domain Text-Driven Synthesis of Multi-Person Motions
Shan Mengyi, Lu Dong, Yutao Han et al.
GOODAT: Towards Test-Time Graph Out-of-Distribution Detection
Luzhi Wang, Di Jin, He Zhang et al.
Sketch and Refine: Towards Fast and Accurate Lane Detection
Chao Chen, Jie Liu, Chang Zhou et al.
A Graph-Based Approach for Category-Agnostic Pose Estimation
Or Hirschorn, Shai Avidan
EditShield: Protecting Unauthorized Image Editing by Instruction-guided Diffusion Models
Ruoxi Chen, Haibo Jin, Yixin Liu et al.
PORF: POSE RESIDUAL FIELD FOR ACCURATE NEURAL SURFACE RECONSTRUCTION
Jia-Wang Bian, Wenjing Bian, Victor Prisacariu et al.
ASAM: Boosting Segment Anything Model with Adversarial Tuning
Bo Li, Haoke Xiao, Lv Tang
An Upload-Efficient Scheme for Transferring Knowledge From a Server-Side Pre-trained Generator to Clients in Heterogeneous Federated Learning
Jianqing Zhang, Yang Liu, Yang Hua et al.
Steerers: A Framework for Rotation Equivariant Keypoint Descriptors
Georg Bökman, Johan Edstedt, Michael Felsberg et al.
Self-Distillation Regularized Connectionist Temporal Classification Loss for Text Recognition: A Simple Yet Effective Approach
Ziyin Zhang, Ning Lu, Minghui Liao et al.
Generating and Reweighting Dense Contrastive Patterns for Unsupervised Anomaly Detection
Songmin Dai, Yifan Wu, Xiaoqiang Li et al.
Distilling Vision-Language Models on Millions of Videos
Yue Zhao, Long Zhao, Xingyi Zhou et al.
Long-Tailed Anomaly Detection with Learnable Class Names
Chih-Hui Ho, Kuan-Chuan Peng, Nuno Vasconcelos
WildVidFit: Video Virtual Try-On in the Wild via Image-Based Controlled Diffusion Models
Zijian He, Peixin Chen, Guangrun Wang et al.
TeTriRF: Temporal Tri-Plane Radiance Fields for Efficient Free-Viewpoint Video
Minye Wu, Zehao Wang, Georgios Kouros et al.
Structure-Guided Adversarial Training of Diffusion Models
Ling Yang, Haotian Qian, Zhilong Zhang et al.
A Call to Reflect on Evaluation Practices for Age Estimation: Comparative Analysis of the State-of-the-Art and a Unified Benchmark
Jakub Paplham, Vojtech Franc
Towards Efficient Diffusion-Based Image Editing with Instant Attention Masks
Siyu Zou, Jiji Tang, Yiyi Zhou et al.
HENet: Hybrid Encoding for End-to-end Multi-task 3D Perception from Multi-view Cameras
Zhongyu Xia, ZhiWei Lin, Xinhao Wang et al.
Dexterous Grasp Transformer
Guo-Hao Xu, Yi-Lin Wei, Dian Zheng et al.
Minimum Coverage Sets for Training Robust Ad Hoc Teamwork Agents
Arrasy Rahman, Jiaxun Cui, Peter Stone
SmartControl: Enhancing ControlNet for Handling Rough Visual Conditions
XIAOYU LIU, Yuxiang WEI, Ming LIU et al.
HR-Pro: Point-Supervised Temporal Action Localization via Hierarchical Reliability Propagation
Huaxin Zhang, Xiang Wang, Xiaohao Xu et al.
DreamComposer: Controllable 3D Object Generation via Multi-View Conditions
Yunhan Yang, Yukun Huang, Xiaoyang Wu et al.
Diffusion Language-Shapelets for Semi-supervised Time-Series Classification
Zhen Liu, Wenbin Pei, Disen Lan et al.
NViST: In the Wild New View Synthesis from a Single Image with Transformers
Wonbong Jang, Lourdes Agapito
As-Plausible-As-Possible: Plausibility-Aware Mesh Deformation Using 2D Diffusion Priors
Seungwoo Yoo, Kunho Kim, Vladimir G. Kim et al.
Unsupervised Keypoints from Pretrained Diffusion Models
Eric Hedlin, Gopal Sharma, Shweta Mahajan et al.
Weighted Envy-Freeness for Submodular Valuations
Luisa Montanari, Ulrike Schmidt-Kraepelin, Warut Suksompong et al.
Exemplar-free Continual Representation Learning via Learnable Drift Compensation
Alex Gomez-Villa, Dipam Goswami, Kai Wang et al.
Scalable 3D Registration via Truncated Entry-wise Absolute Residuals
Tianyu Huang, Liangzu Peng, Rene Vidal et al.
Any2Point: Empowering Any-modality Transformers for Efficient 3D Understanding
YIWEN TANG, Renrui Zhang, Jiaming Liu et al.
iHuman: Instant Animatable Digital Humans From Monocular Videos
Pramish Paudel, Anubhav Khanal, Danda Pani Paudel et al.
Customizing Language Model Responses with Contrastive In-Context Learning
Xiang Gao, Kamalika Das
Continuous Memory Representation for Anomaly Detection
Joo Chan Lee, Taejune Kim, Eunbyung Park et al.
FusionFormer: A Concise Unified Feature Fusion Transformer for 3D Pose Estimation
Yanlu Cai, Weizhong Zhang, Yuan Wu et al.
FedFixer: Mitigating Heterogeneous Label Noise in Federated Learning
Xinyuan Ji, Zhaowei Zhu, Wei Xi et al.
SPIRE: Semantic Prompt-Driven Image Restoration
Chenyang Qi, Zhengzhong Tu, Keren Ye et al.
RPSC: Robust Pseudo-Labeling for Semantic Clustering
Sihang Liu, Wenming Cao, Ruigang Fu et al.
TexGen: Text-Guided 3D Texture Generation with Multi-view Sampling and Resampling
Dong Huo, Zixin Guo, Xinxin Zuo et al.
Enhancing Semantic Fidelity in Text-to-Image Synthesis: Attention Regulation in Diffusion Models
Yang Zhang, Tze Tzun Teoh, Wei Hern Lim et al.
Adversarial Supervision Makes Layout-to-Image Diffusion Models Thrive
Yumeng Li, Margret Keuper, Dan Zhang et al.
Distinguished In Uniform: Self-Attention Vs. Virtual Nodes
Eran Rosenbluth, Jan Tönshoff, Martin Ritzert et al.
DSL-FIQA: Assessing Facial Image Quality via Dual-Set Degradation Learning and Landmark-Guided Transformer
Wei-Ting Chen, Gurunandan Krishnan, Qiang Gao et al.
The Unreasonable Effectiveness of Pre-Trained Features for Camera Pose Refinement
Gabriele Trivigno, Carlo Masone, Barbara Caputo et al.
Low-Rank Rescaled Vision Transformer Fine-Tuning: A Residual Design Approach
Wei Dong, Xing Zhang, Bihui Chen et al.
Robust Image Denoising through Adversarial Frequency Mixup
Donghun Ryou, Inju Ha, Hyewon Yoo et al.
LSK3DNet: Towards Effective and Efficient 3D Perception with Large Sparse Kernels
Tuo Feng, Wenguan Wang, Fan Ma et al.
Image-Text Co-Decomposition for Text-Supervised Semantic Segmentation
Ji-Jia Wu, Andy Chia-Hao Chang, Chieh-Yu Chuang et al.
InterFusion: Text-Driven Generation of 3D Human-Object Interaction
Sisi Dai, Wenhao Li, Haowen Sun et al.
ZeroI2V: Zero-Cost Adaptation of Pre-Trained Transformers from Image to Video
Xinhao Li, Yuhan Zhu, Limin Wang
Repeated Fair Allocation of Indivisible Items
Ayumi Igarashi, Martin Lackner, Oliviero Nardi et al.
HOISDF: Constraining 3D Hand-Object Pose Estimation with Global Signed Distance Fields
Haozhe Qi, Chen Zhao, Mathieu Salzmann et al.
Is Retain Set All You Need in Machine Unlearning? Restoring Performance of Unlearned Models with Out-Of-Distribution Images
Jacopo Bonato, Marco Cotogni, Luigi Sabetta
Boosting of Thoughts: Trial-and-Error Problem Solving with Large Language Models
Sijia Chen, Baochun Li, Di Niu
Mixed-Precision Quantization for Federated Learning on Resource-Constrained Heterogeneous Devices
Huancheng Chen, Haris Vikalo
WebVLN: Vision-and-Language Navigation on Websites
Qi Chen, Dileepa Pitawela, Chongyang Zhao et al.
Constrained Bayesian Optimization under Partial Observations: Balanced Improvements and Provable Convergence
Shengbo Wang, Ke Li
ODIN: A Single Model for 2D and 3D Segmentation
Ayush Jain, Pushkal Katara, Nikolaos Gkanatsios et al.
Exploring Efficient Asymmetric Blind-Spots for Self-Supervised Denoising in Real-World Scenarios
Shiyan Chen, Jiyuan Zhang, Zhaofei Yu et al.
Adversarial Adaptive Sampling: Unify PINN and Optimal Transport for the Approximation of PDEs
Kejun Tang, Jiayu Zhai, Xiaoliang Wan et al.
GlitchBench: Can Large Multimodal Models Detect Video Game Glitches?
Mohammad Reza Taesiri, Tianjun Feng, Cor-Paul Bezemer et al.
OmniNOCS: A unified NOCS dataset and model for 3D lifting of 2D objects
Akshay Krishnan, Abhijit Kundu, Kevis Maninis et al.
Towards Real-world Event-guided Low-light Video Enhancement and Deblurring
Taewoo Kim, Jaeseok Jeong, Hoonhee Cho et al.
LAN: Learning to Adapt Noise for Image Denoising
Changjin Kim, Tae Hyun Kim, Sungyong Baik
CVT-Occ: Cost Volume Temporal Fusion for 3D Occupancy Prediction
Zhangchen Ye, Tao Jiang, Chenfeng Xu et al.
SparseCraft: Few-Shot Neural Reconstruction through Stereopsis Guided Geometric Linearization
Mae Younes, Amine Ouasfi, Adnane Boukhayma
Deep Incomplete Multi-View Learning Network with Insufficient Label Information
Zhangqi Jiang, Tingjin Luo, Xinyan Liang
Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos
Changan Chen, Puyuan Peng, Ami Baid et al.
Unified Embedding Alignment for Open-Vocabulary Video Instance Segmentation
Hao Fang, Peng Wu, Yawei Li et al.
You Only Need One Step: Fast Super-Resolution with Stable Diffusion via Scale Distillation
Mehdi Noroozi, Isma Hadji, Brais Martinez et al.
Image Clustering via the Principle of Rate Reduction in the Age of Pretrained Models
Tianzhe Chu, Shengbang Tong, Tianjiao Ding et al.
Centering the Value of Every Modality: Towards Efficient and Resilient Modality-agnostic Semantic Segmentation
Xu Zheng, Yuanhuiyi Lyu, jiazhou zhou et al.
Bounds on Representation-Induced Confounding Bias for Treatment Effect Estimation
Valentyn Melnychuk, Dennis Frauen, Stefan Feuerriegel
Joint Reconstruction of 3D Human and Object via Contact-Based Refinement Transformer
Hyeongjin Nam, Daniel Jung, Gyeongsik Moon et al.
Towards Neuro-Symbolic Video Understanding
Minkyu Choi, Harsh Goel, Mohammad Omama et al.
Contourlet Residual for Prompt Learning Enhanced Infrared Image Super-Resolution
Xingyuan Li, Jinyuan Liu, ZHIXIN CHEN et al.
Robust-Wide: Robust Watermarking against Instruction-driven Image Editing
Runyi Hu, Jie Zhang, Ting Xu et al.
ConGeo: Robust Cross-view Geo-localization across Ground View Variations
Li Mi, Chang Xu, Javiera Castillo Navarro et al.
Decoupled Pseudo-labeling for Semi-Supervised Monocular 3D Object Detection
Jiacheng Zhang, Jiaming Li, Xiangru Lin et al.
Split to Merge: Unifying Separated Modalities for Unsupervised Domain Adaptation
Xinyao Li, Yuke Li, Zhekai Du et al.
LatentEditor: Text Driven Local Editing of 3D Scenes
Umar Khalid, Hasan Iqbal, Muhammad Tayyab et al.
ParamISP: Learned Forward and Inverse ISPs using Camera Parameters
Woohyeok Kim, Geonu Kim, Junyong Lee et al.
RecDiffusion: Rectangling for Image Stitching with Diffusion Models
Tianhao Zhou, Li Haipeng, Ziyi Wang et al.
Leveraging Cross-Modal Neighbor Representation for Improved CLIP Classification
Chao Yi, Lu Ren, De-Chuan Zhan et al.
LayoutFlow: Flow Matching for Layout Generation
Julian Jorge Andrade Guerreiro, Naoto Inoue, Kento Masui et al.
SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding
Weitai Kang, Gaowen Liu, Shah Mubarak et al.
OV-Uni3DETR: Towards Unified Open-Vocabulary 3D Object Detection via Cycle-Modality Propagation
Zhenyu Wang, Ya-Li Li, TAICHI LIU et al.
LAKE-RED: Camouflaged Images Generation by Latent Background Knowledge Retrieval-Augmented Diffusion
Pancheng Zhao, Peng Xu, Pengda Qin et al.
Federated Q-Learning: Linear Regret Speedup with Low Communication Cost
Zhong Zheng, Fengyu Gao, Lingzhou Xue et al.
Improving Transferable Targeted Adversarial Attacks with Model Self-Enhancement
Han Wu, Guanyan Ou, Weibin Wu et al.
GarmentCodeData: A Dataset of 3D Made-to-Measure Garments With Sewing Patterns
Maria Korosteleva, Timur Levent Kesdogan, Fabian Kemper et al.
Deep Orthogonal Hypersphere Compression for Anomaly Detection
Yunhe Zhang, Yan Sun, Jinyu Cai et al.
CA-Jaccard: Camera-aware Jaccard Distance for Person Re-identification
Yiyu Chen, Zheyi Fan, Zhaoru Chen et al.
Neuroformer: Multimodal and Multitask Generative Pretraining for Brain Data
Antonis Antoniades, Yiyi Yu, Joe Canzano et al.
LiDAR-based Person Re-identification
Wenxuan Guo, Zhiyu Pan, Yingping Liang et al.
Theoretically Achieving Continuous Representation of Oriented Bounding Boxes
Zikai Xiao, Guo-Ye Yang, Xue Yang et al.
To Grok or not to Grok: Disentangling Generalization and Memorization on Corrupted Algorithmic Datasets
Darshil Doshi, Aritra Das, Tianyu He et al.
VTQA: Visual Text Question Answering via Entity Alignment and Cross-Media Reasoning
Kang Chen, Xiangqian Wu
NICP: Neural ICP for 3D Human Registration at Scale
Riccardo Marin, Enric Corona, Gerard Pons-Moll
Discriminability-Driven Channel Selection for Out-of-Distribution Detection
Yue Yuan, Rundong He, Yicong Dong et al.
When Pedestrian Detection Meets Multi-Modal Learning: Generalist Model and Benchmark Dataset
Yi Zhang, Wang Zeng, Sheng Jin et al.
Diffusion-Driven Data Replay: A Novel Approach to Combat Forgetting in Federated Class Continual Learning
Jinglin Liang, Jin Zhong, Hanlin Gu et al.
ProS: Prompting-to-simulate Generalized knowledge for Universal Cross-Domain Retrieval
Fang Kaipeng, Jingkuan Song, Lianli Gao et al.
Efficient Subgraph GNNs by Learning Effective Selection Policies
Beatrice Bevilacqua, Moshe Eliasof, Eli Meirom et al.
StoryImager: A Unified and Efficient Framework for Coherent Story Visualization and Completion
Ming Tao, BINGKUN BAO, Hao Tang et al.
Unprocessing Seven Years of Algorithmic Fairness
André F. Cruz, Moritz Hardt
MetaCoCo: A New Few-Shot Classification Benchmark with Spurious Correlation
Min Zhang, Haoxuan Li, Fei Wu et al.
Text-Enhanced Data-free Approach for Federated Class-Incremental Learning
Minh-Tuan Tran, Trung Le, Xuan-May Le et al.
Cloud-Device Collaborative Learning for Multimodal Large Language Models
Guanqun Wang, Jiaming Liu, Chenxuan Li et al.
Relightable and Animatable Neural Avatars from Videos
Wenbin Lin, Chengwei Zheng, Jun-hai Yong et al.
AssistGUI: Task-Oriented PC Graphical User Interface Automation
Difei Gao, Lei Ji, Zechen Bai et al.
Class Incremental Learning via Likelihood Ratio Based Task Prediction
Haowei Lin, Yijia Shao, Weinan Qian et al.
MESA: Matching Everything by Segmenting Anything
Yesheng Zhang, Xu Zhao
Locality Sensitive Sparse Encoding for Learning World Models Online
Zichen Liu, Chao Du, Wee Sun Lee et al.
Rotation-Agnostic Image Representation Learning for Digital Pathology
Saghir Alfasly, Abubakr Shafique, Peyman Nejat et al.
PromptCCD: Learning Gaussian Mixture Prompt Pool for Continual Category Discovery
Fernando Julio Cendra, Bingchen Zhao, Kai Han
FreePoint: Unsupervised Point Cloud Instance Segmentation
Zhikai Zhang, Jian Ding, Li Jiang et al.
Geometric-Facilitated Denoising Diffusion Model for 3D Molecule Generation
6428 Can Xu, Haosen Wang, Weigang Wang et al.
MoST: Motion Style Transformer Between Diverse Action Contents
Boeun Kim, Jungho Kim, Hyung Jin Chang et al.
PILoRA: Prototype Guided Incremental LoRA for Federated Class-Incremental Learning
Haiyang Guo, Fei Zhu, Wenzhuo Liu et al.
Source Prompt Disentangled Inversion for Boosting Image Editability with Diffusion Models
Ruibin Li, Ruihuang Li, Song Guo et al.
Text Image Inpainting via Global Structure-Guided Diffusion Models
Shipeng Zhu, Pengfei Fang, Chenjie Zhu et al.
InsMapper: Exploring Inner-instance Information for Vectorized HD Mapping
Zhenhua Xu, Kwan-Yee K. Wong, Hengshuang ZHAO
DVSAI: Diverse View-Shared Anchors Based Incomplete Multi-View Clustering
Shengju Yu, Siwei Wang, Pei Zhang et al.
Adaptive Bounding Box Uncertainties via Two-Step Conformal Prediction
Alexander Timans, Christoph-Nikolas Straehle, Kaspar Sakmann et al.
FedMef: Towards Memory-efficient Federated Dynamic Pruning
Hong Huang, Weiming Zhuang, Chen Chen et al.
OVOR: OnePrompt with Virtual Outlier Regularization for Rehearsal-Free Class-Incremental Learning
Wei-Cheng Huang, Chun-Fu Chen, Hsiang Hsu
STDiff: Spatio-Temporal Diffusion for Continuous Stochastic Video Prediction
Xi Ye, Guillaume-Alexandre Bilodeau
Fair-VPT: Fair Visual Prompt Tuning for Image Classification
Sungho Park, Hyeran Byun
Improving Virtual Try-On with Garment-focused Diffusion Models
Siqi Wan, Yehao Li, Jingwen Chen et al.
A Simple and Effective Point-based Network for Event Camera 6-DOFs Pose Relocalization
Hongwei Ren, Jiadong Zhu, Yue Zhou et al.
T4P: Test-Time Training of Trajectory Prediction via Masked Autoencoder and Actor-specific Token Memory
Daehee Park, Jaeseok Jeong, Sung-Hoon Yoon et al.
MM-Point: Multi-View Information-Enhanced Multi-Modal Self-Supervised 3D Point Cloud Understanding
HaiTao Yu, Mofei Song
Benchmarking Algorithms for Federated Domain Generalization
Ruqi Bai, Saurabh Bagchi, David Inouye
Blur-aware Spatio-temporal Sparse Transformer for Video Deblurring
Huicong Zhang, Haozhe Xie, Hongxun Yao
FAR: Flexible Accurate and Robust 6DoF Relative Camera Pose Estimation
Chris Rockwell, Nilesh Kulkarni, Linyi Jin et al.
Code-Style In-Context Learning for Knowledge-Based Question Answering
Zhijie Nie, Richong Zhang, Zhongyuan Wang et al.
Enhancing High-Resolution 3D Generation through Pixel-wise Gradient Clipping
Zijie Pan, Jiachen Lu, Xiatian Zhu et al.
Open-Set Domain Adaptation for Semantic Segmentation
Seun-An Choe, Ah-Hyung Shin, Keon Hee Park et al.
Traffic Flow Optimisation for Lifelong Multi-Agent Path Finding
Zhe Chen, Daniel Harabor, Jiaoyang Li et al.
Diverse Person: Customize Your Own Dataset for Text-Based Person Search
Zifan Song, Guosheng Hu, Cairong Zhao
Towards Faithful XAI Evaluation via Generalization-Limited Backdoor Watermark
Mengxi Ya, Yiming Li, Tao Dai et al.
Temporally and Distributionally Robust Optimization for Cold-Start Recommendation
Xinyu Lin, Wenjie Wang, Jujia Zhao et al.
Dataset Enhancement with Instance-Level Augmentations
Orest Kupyn, Christian Rupprecht
HiPose: Hierarchical Binary Surface Encoding and Correspondence Pruning for RGB-D 6DoF Object Pose Estimation
Yongliang Lin, Yongzhi Su, Praveen Nathan et al.
GSD: View-Guided Gaussian Splatting Diffusion for 3D Reconstruction
Yuxuan Mu, Xinxin Zuo, Chuan Guo et al.
TAMM: TriAdapter Multi-Modal Learning for 3D Shape Understanding
Zhihao Zhang, Shengcao Cao, Yu-Xiong Wang
A Dual-Way Enhanced Framework from Text Matching Point of View for Multimodal Entity Linking
Shezheng Song, Shan Zhao, ChengYu Wang et al.