Most Cited 2024 "parameterized environment configurations" Papers
12,324 papers found
Conference
CLiC: Concept Learning in Context
Mehdi Safaee, Aryan Mikaeili, Or Patashnik et al.
Feature Re-Embedding: Towards Foundation Model-Level Performance in Computational Pathology
Wenhao Tang, Fengtao ZHOU, Sheng Huang et al.
Real-time Acquisition and Reconstruction of Dynamic Volumes with Neural Structured Illumination
Yixin Zeng, Zoubin Bi, Yin Mingrui et al.
Perceptual Assessment and Optimization of HDR Image Rendering
Peibei Cao, Rafal Mantiuk, Kede Ma
Coherent Temporal Synthesis for Incremental Action Segmentation
Guodong Ding, Hans Golong, Angela Yao
Estimating Extreme 3D Image Rotations using Cascaded Attention
Shay Dekel, Yosi Keller, Martin Čadík
Towards Real-World HDR Video Reconstruction: A Large-Scale Benchmark Dataset and A Two-Stage Alignment Network
Yong Shu, Liquan Shen, Xiangyu Hu et al.
SmartMask: Context Aware High-Fidelity Mask Generation for Fine-grained Object Insertion and Layout Control
Jaskirat Singh, Jianming Zhang, Qing Liu et al.
GraCo: Granularity-Controllable Interactive Segmentation
Yian Zhao, Kehan Li, Zesen Cheng et al.
Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution
Shangchen Zhou, Peiqing Yang, Jianyi Wang et al.
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Zeyi Sun, Ye Fang, Tong Wu et al.
Activity-Biometrics: Person Identification from Daily Activities
Shehreen Azad, Yogesh S. Rawat
Referring Image Editing: Object-level Image Editing via Referring Expressions
Chang Liu, Xiangtai Li, Henghui Ding
InNeRF360: Text-Guided 3D-Consistent Object Inpainting on 360-degree Neural Radiance Fields
Dongqing Wang, Tong Zhang, Alaa Abboud et al.
Unsupervised Blind Image Deblurring Based on Self-Enhancement
Lufei Chen, Xiangpeng Tian, Shuhua Xiong et al.
DGC-GNN: Leveraging Geometry and Color Cues for Visual Descriptor-Free 2D-3D Matching
Shuzhe Wang, Juho Kannala, Daniel Barath
FreeDrag: Feature Dragging for Reliable Point-based Image Editing
Pengyang Ling, Lin Chen, Pan Zhang et al.
GaussianAvatars: Photorealistic Head Avatars with Rigged 3D Gaussians
Shenhan Qian, Tobias Kirschstein, Liam Schoneveld et al.
Explaining CLIP's Performance Disparities on Data from Blind/Low Vision Users
Daniela Massiceti, Camilla Longden, Agnieszka Słowik et al.
CURSOR: Scalable Mixed-Order Hypergraph Matching with CUR Decomposition
Qixuan Zheng, Ming Zhang, Hong Yan
FCS: Feature Calibration and Separation for Non-Exemplar Class Incremental Learning
Qiwei Li, Yuxin Peng, Jiahuan Zhou
GreedyViG: Dynamic Axial Graph Construction for Efficient Vision GNNs
Mustafa Munir, William Avery, Md Mostafijur Rahman et al.
Improving Graph Contrastive Learning via Adaptive Positive Sampling
Jiaming Zhuo, Feiyang Qin, Can Cui et al.
Bayesian Exploration of Pre-trained Models for Low-shot Image Classification
Yibo Miao, Yu lei, Feng Zhou et al.
RepKPU: Point Cloud Upsampling with Kernel Point Representation and Deformation
Yi Rong, Haoran Zhou, Kang Xia et al.
Context-Guided Spatio-Temporal Video Grounding
Xin Gu, Heng Fan, Yan Huang et al.
Logit Standardization in Knowledge Distillation
Shangquan Sun, Wenqi Ren, Jingzhi Li et al.
SPECAT: SPatial-spEctral Cumulative-Attention Transformer for High-Resolution Hyperspectral Image Reconstruction
Zhiyang Yao, Shuyang Liu, Xiaoyun Yuan et al.
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
Jianyuan Wang, Nikita Karaev, Christian Rupprecht et al.
CPP-Net: Embracing Multi-Scale Feature Fusion into Deep Unfolding CP-PPA Network for Compressive Sensing
Zhen Guo, Hongping Gan
GenFlow: Generalizable Recurrent Flow for 6D Pose Refinement of Novel Objects
Sungphill Moon, Hyeontae Son, Dongcheol Hur et al.
DiffCast: A Unified Framework via Residual Diffusion for Precipitation Nowcasting
Demin Yu, Xutao Li, Yunming Ye et al.
Turb-Seg-Res: A Segment-then-Restore Pipeline for Dynamic Videos with Atmospheric Turbulence
Ripon Saha, Dehao Qin, Nianyi Li et al.
Dancing with Still Images: Video Distillation via Static-Dynamic Disentanglement
Ziyu Wang, Yue Xu, Cewu Lu et al.
SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models
Tongtian Yue, Jie Cheng, Longteng Guo et al.
ECoDepth: Effective Conditioning of Diffusion Models for Monocular Depth Estimation
Suraj Patni, Aradhye Agarwal, Chetan Arora
Bi-Causal: Group Activity Recognition via Bidirectional Causality
Youliang Zhang, Wenxuan Liu, danni xu et al.
OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM
Yutao Hu, Tianbin, Quanfeng Lu et al.
S²MVTC: a Simple yet Efficient Scalable Multi-View Tensor Clustering
Zhen Long, Qiyuan Wang, Yazhou Ren et al.
Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding
Zhihao Yuan, Jinke Ren, Chun-Mei Feng et al.
Towards HDR and HFR Video from Rolling-Mixed-Bit Spikings
Yakun Chang, Yeliduosi Xiaokaiti, Yujia Liu et al.
Discriminative Probing and Tuning for Text-to-Image Generation
Leigang Qu, Wenjie Wang, Yongqi Li et al.
Outdoor Scene Extrapolation with Hierarchical Generative Cellular Automata
Dongsu Zhang, Francis Williams, Žan Gojčič et al.
Comparing the Decision-Making Mechanisms by Transformers and CNNs via Explanation Methods
Mingqi Jiang, Saeed Khorram, Li Fuxin
Attribute-Guided Pedestrian Retrieval: Bridging Person Re-ID with Internal Attribute Variability
Yan Huang, Zhang Zhang, Qiang Wu et al.
Weakly Misalignment-free Adaptive Feature Alignment for UAVs-based Multimodal Object Detection
Chen Chen, Jiahao Qi, Xingyue Liu et al.
Prompt-Driven Dynamic Object-Centric Learning for Single Domain Generalization
Deng Li, Aming Wu, Yaowei Wang et al.
MVCPS-NeuS: Multi-view Constrained Photometric Stereo for Neural Surface Reconstruction
Hiroaki Santo, Fumio Okura, Yasuyuki Matsushita
OHTA: One-shot Hand Avatar via Data-driven Implicit Priors
Xiaozheng Zheng, Chao Wen, Zhuo Su et al.
MultiPhys: Multi-Person Physics-aware 3D Motion Estimation
Nicolás Ugrinovic, Boxiao Pan, Georgios Pavlakos et al.
LMDrive: Closed-Loop End-to-End Driving with Large Language Models
Hao Shao, Yuxuan Hu, Letian Wang et al.
CAGE: Controllable Articulation GEneration
Jiayi Liu, Hou In Ivan Tam, Ali Mahdavi Amiri et al.
GauHuman: Articulated Gaussian Splatting from Monocular Human Videos
Shoukang Hu, Tao Hu, Ziwei Liu
SAM-6D: Segment Anything Model Meets Zero-Shot 6D Object Pose Estimation
Jiehong Lin, lihua liu, Dekun Lu et al.
ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation
Xiaoqi Li, Mingxu Zhang, Yiran Geng et al.
Mask4Align: Aligned Entity Prompting with Color Masks for Multi-Entity Localization Problems
Haoquan Zhang, Ronggang Huang, Yi Xie et al.
Global and Local Prompts Cooperation via Optimal Transport for Federated Learning
Hongxia Li, Wei Huang, Jingya Wang et al.
Dense Optical Tracking: Connecting the Dots
Guillaume Le Moing, Jean Ponce, Cordelia Schmid
Multi-agent Collaborative Perception via Motion-aware Robust Communication Network
Shixin Hong, Yu LIU, Zhi Li et al.
Rapid Motor Adaptation for Robotic Manipulator Arms
Yichao Liang, Kevin Ellis, João F. Henriques
LiSA: LiDAR Localization with Semantic Awareness
Bochun Yang, Zijun Li, Wen Li et al.
Adaptive Slot Attention: Object Discovery with Dynamic Slot Number
Ke Fan, Zechen Bai, Tianjun Xiao et al.
C3: High-Performance and Low-Complexity Neural Compression from a Single Image or Video
Hyunjik Kim, Matthias Bauer, Lucas Theis et al.
AttriHuman-3D: Editable 3D Human Avatar Generation with Attribute Decomposition and Indexing
Fan Yang, Tianyi Chen, XIAOSHENG HE et al.
iToF-flow-based High Frame Rate Depth Imaging
Yu Meng, Zhou Xue, Xu Chang et al.
Rethinking Human Motion Prediction with Symplectic Integral
Haipeng Chen, Kedi Lyu, Zhenguang Liu et al.
DiVAS: Video and Audio Synchronization with Dynamic Frame Rates
Clara Maria Fernandez Labrador, Mertcan Akcay, Eitan Abecassis et al.
Benchmarking Audio Visual Segmentation for Long-Untrimmed Videos
Chen Liu, Peike Li, Qingtao Yu et al.
Inter-X: Towards Versatile Human-Human Interaction Analysis
Liang Xu, Xintao Lv, Yichao Yan et al.
PartDistill: 3D Shape Part Segmentation by Vision-Language Model Distillation
Ardian Umam, Cheng-Kun Yang, Min-Hung Chen et al.
Anomaly Heterogeneity Learning for Open-set Supervised Anomaly Detection
Jiawen Zhu, Choubo Ding, Yu Tian et al.
IBD-SLAM: Learning Image-Based Depth Fusion for Generalizable SLAM
Minghao Yin, Shangzhe Wu, Kai Han
Boosting Image Quality Assessment through Efficient Transformer Adaptation with Local Feature Enhancement
Kangmin Xu, Liang Liao, Jing Xiao et al.
Towards Realistic Scene Generation with LiDAR Diffusion Models
Haoxi Ran, Vitor Guizilini, Yue Wang
SD-DiT: Unleashing the Power of Self-supervised Discrimination in Diffusion Transformer
Rui Zhu, Yingwei Pan, Yehao Li et al.
KPConvX: Modernizing Kernel Point Convolution with Kernel Attention
Hugues Thomas, Yao-Hung Hubert Tsai, Timothy Barfoot et al.
Latency Correction for Event-guided Deblurring and Frame Interpolation
Yixin Yang, Jinxiu Liang, Bohan Yu et al.
NTO3D: Neural Target Object 3D Reconstruction with Segment Anything
Xiaobao Wei, Renrui Zhang, Jiarui Wu et al.
Adversarial Distillation Based on Slack Matching and Attribution Region Alignment
Shenglin Yin, Zhen Xiao, Mingxuan Song et al.
Multi-View Attentive Contextualization for Multi-View 3D Object Detection
Xianpeng Liu, Ce Zheng, Ming Qian et al.
AHIVE: Anatomy-aware Hierarchical Vision Encoding for Interactive Radiology Report Retrieval
Sixing Yan, William K. Cheung, Ivor Tsang et al.
SPU-PMD: Self-Supervised Point Cloud Upsampling via Progressive Mesh Deformation
Yanzhe Liu, Rong Chen, Yushi Li et al.
Self-Supervised Class-Agnostic Motion Prediction with Spatial and Temporal Consistency Regularizations
Kewei Wang, Yizheng Wu, Jun Cen et al.
MMA: Multi-Modal Adapter for Vision-Language Models
Lingxiao Yang, Ru-Yuan Zhang, Yanchen Wang et al.
Linguistic-Aware Patch Slimming Framework for Fine-grained Cross-Modal Alignment
Zheren Fu, Lei Zhang, Hou Xia et al.
A Category Agnostic Model for Visual Rearrangment
Yuyi Liu, Xinhang Song, Weijie Li et al.
Towards Progressive Multi-Frequency Representation for Image Warping
Jun Xiao, Zihang Lyu, Cong Zhang et al.
VastGaussian: Vast 3D Gaussians for Large Scene Reconstruction
Jiaqi Lin, Zhihao Li, Xiao Tang et al.
OTE: Exploring Accurate Scene Text Recognition Using One Token
Jianjun Xu, Yuxin Wang, Hongtao Xie et al.
Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models
Yushi Hu, Otilia Stretcu, Chun-Ta Lu et al.
DVMNet: Computing Relative Pose for Unseen Objects Beyond Hypotheses
Chen Zhao, Tong Zhang, Zheng Dang et al.
TULIP: Multi-camera 3D Precision Assessment of Parkinson’s Disease
Kyungdo Kim, Sihan Lyu, Sneha Mantri et al.
MFP: Making Full Use of Probability Maps for Interactive Image Segmentation
Chaewon Lee, Seon-Ho Lee, Chang-Su Kim
When StyleGAN Meets Stable Diffusion: a W+ Adapter for Personalized Image Generation
Xiaoming Li, Xinyu Hou, Chen Change Loy
Low-power Continuous Remote Behavioral Localization with Event Cameras
Friedhelm Hamann, Suman Ghosh, Ignacio Juarez Martinez et al.
Attention-Propagation Network for Egocentric Heatmap to 3D Pose Lifting
Taeho Kang, Youngki Lee
AM-RADIO: Agglomerative Vision Foundation Model Reduce All Domains Into One
Mike Ranzinger, Greg Heinrich, Jan Kautz et al.
BA-SAM: Scalable Bias-Mode Attention Mask for Segment Anything Model
song yiran, Qianyu Zhou, Xiangtai Li et al.
EarthLoc: Astronaut Photography Localization by Indexing Earth from Space
Gabriele Berton, Alex Stoken, Barbara Caputo et al.
PairDETR: Joint Detection and Association of Human Bodies and Faces
Ammar Ali, Georgii Gaikov, Denis Rybalchenko et al.
Querying as Prompt: Parameter-Efficient Learning for Multimodal Language Model
Tian Liang, Jing Huang, Ming Kong et al.
DiffuScene: Denoising Diffusion Models for Generative Indoor Scene Synthesis
Jiapeng Tang, Yinyu Nie, Lev Markhasin et al.
YOLO-World: Real-Time Open-Vocabulary Object Detection
Tianheng Cheng, Lin Song, Yixiao Ge et al.
Ink Dot-Oriented Differentiable Optimization for Neural Image Halftoning
Hao Jiang, Bingfeng Zhou, Yadong Mu
ShapeMatcher: Self-Supervised Joint Shape Canonicalization, Segmentation, Retrieval and Deformation
Yan Di, Chenyangguang Zhang, Chaowei Wang et al.
Error Detection in Egocentric Procedural Task Videos
Shih-Po Lee, Zijia Lu, Zekun Zhang et al.
FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept Composition
Ganggui Ding, Canyu Zhao, Wen Wang et al.
Generative Unlearning for Any Identity
Juwon Seo, Sung-Hoon Lee, Tae-Young Lee et al.
Mitigating Object Dependencies: Improving Point Cloud Self-Supervised Learning through Object Exchange
Yanhao Wu, Tong Zhang, Wei Ke et al.
Efficient Privacy-Preserving Visual Localization Using 3D Ray Clouds
Heejoon Moon, Chunghwan Lee, Je Hyeong Hong
Modality-Agnostic Structural Image Representation Learning for Deformable Multi-Modality Medical Image Registration
Tony C. W. MOK, Zi Li, Yunhao Bai et al.
PoseIRM: Enhance 3D Human Pose Estimation on Unseen Camera Settings via Invariant Risk Minimization
Yanlu Cai, Weizhong Zhang, Yuan Wu et al.
On the Estimation of Image-matching Uncertainty in Visual Place Recognition
Mubariz Zaffar, Liangliang Nan, Julian F. P. Kooij
LoS: Local Structure-Guided Stereo Matching
Kunhong Li, Longguang Wang, Ye Zhang et al.
RadSimReal: Bridging the Gap Between Synthetic and Real Data in Radar Object Detection With Simulation
Oded Bialer, Yuval Haitman
VideoGrounding-DINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding
Syed Talal Wasim, Muzammal Naseer, Salman Khan et al.
Plug-and-Play Diffusion Distillation
Yi-Ting Hsiao, Siavash Khodadadeh, Kevin Duarte et al.
Differentiable Micro-Mesh Construction
Yishun Dou, Zhong Zheng, Qiaoqiao Jin et al.
ProxyCap: Real-time Monocular Full-body Capture in World Space via Human-Centric Proxy-to-Motion Learning
Yuxiang Zhang, Hongwen Zhang, Liangxiao Hu et al.
Unsupervised 3D Structure Inference from Category-Specific Image Collections
Weikang Wang, Dongliang Cao, Florian Bernard
Multi-Modal Proxy Learning Towards Personalized Visual Multiple Clustering
Jiawei Yao, Qi Qian, Juhua Hu
DreamAvatar: Text-and-Shape Guided 3D Human Avatar Generation via Diffusion Models
Yukang Cao, Yan-Pei Cao, Kai Han et al.
Genuine Knowledge from Practice: Diffusion Test-Time Adaptation for Video Adverse Weather Removal
Yijun Yang, Hongtao Wu, Angelica I. Aviles-Rivero et al.
Sharingan: A Transformer Architecture for Multi-Person Gaze Following
Samy Tafasca, Anshul Gupta, Jean-marc Odobez
OA-CNNs: Omni-Adaptive Sparse CNNs for 3D Semantic Segmentation
Bohao Peng, Xiaoyang Wu, Li Jiang et al.
Pre-trained Model Guided Fine-Tuning for Zero-Shot Adversarial Robustness
Sibo Wang, Jie Zhang, Zheng Yuan et al.
MART: Masked Affective RepresenTation Learning via Masked Temporal Distribution Distillation
Zhicheng Zhang, Pancheng Zhao, Eunil Park et al.
CosalPure: Learning Concept from Group Images for Robust Co-Saliency Detection
Jiayi Zhu, Qing Guo, Felix Juefei Xu et al.
CrossMAE: Cross-Modality Masked Autoencoders for Region-Aware Audio-Visual Pre-Training
Yuxin Guo, Siyang Sun, Shuailei Ma et al.
Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection
Yicheng Xiao, Zhuoyan Luo, Yong Liu et al.
Point Transformer V3: Simpler, Faster, Stronger
Xiaoyang Wu, Li Jiang, Peng-Shuai Wang et al.
Improving Distant 3D Object Detection Using 2D Box Supervision
Zetong Yang, Zhiding Yu, Christopher Choy et al.
ExtraNeRF: Visibility-Aware View Extrapolation of Neural Radiance Fields with Diffusion Models
Meng-Li Shih, Wei-Chiu Ma, Lorenzo Boyice et al.
Mining Supervision for Dynamic Regions in Self-Supervised Monocular Depth Estimation
Hoang Chuong Nguyen, Tianyu Wang, Jose M. Alvarez et al.
Enhancing 3D Fidelity of Text-to-3D using Cross-View Correspondences
Seungwook Kim, Kejie Li, Xueqing Deng et al.
EVS-assisted Joint Deblurring, Rolling-Shutter Correction and Video Frame Interpolation through Sensor Inverse Modeling
Rui Jiang, Fangwen Tu, Yixuan Long et al.
Open-World Semantic Segmentation Including Class Similarity
Matteo Sodano, Federico Magistri, Lucas Nunes et al.
READ: Retrieval-Enhanced Asymmetric Diffusion for Motion Planning
Takeru Oba, Matthew Walter, Norimichi Ukita
MeshPose: Unifying DensePose and 3D Body Mesh Reconstruction
Eric-Tuan Le, Antonios Kakolyris, Petros Koutras et al.
Bayesian Differentiable Physics for Cloth Digitalization
Deshan Gong, Ningtao Mao, He Wang
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action
Jiasen Lu, Christopher Clark, Sangho Lee et al.
Training Diffusion Models Towards Diverse Image Generation with Reinforcement Learning
Zichen Miao, Jiang Wang, Ze Wang et al.
Light the Night: A Multi-Condition Diffusion Framework for Unpaired Low-Light Enhancement in Autonomous Driving
JINLONG LI, Baolu Li, Zhengzhong Tu et al.
Your Student is Better Than Expected: Adaptive Teacher-Student Collaboration for Text-Conditional Diffusion Models
Nikita Starodubcev, Dmitry Baranchuk, Artem Fedorov et al.
Unbiased Estimator for Distorted Conics in Camera Calibration
Chaehyeon Song, Jaeho Shin, Myung-Hwan Jeon et al.
Siamese Learning with Joint Alignment and Regression for Weakly-Supervised Video Paragraph Grounding
Chaolei Tan, Jianhuang Lai, Wei-Shi Zheng et al.
Learning Continual Compatible Representation for Re-indexing Free Lifelong Person Re-identification
Zhenyu Cui, Jiahuan Zhou, Xun Wang et al.
Quilt-LLaVA: Visual Instruction Tuning by Extracting Localized Narratives from Open-Source Histopathology Videos
Mehmet Saygin Seyfioglu, Wisdom Ikezogwo, Fatemeh Ghezloo et al.
X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model
Lingmin Ran, Xiaodong Cun, Jia-Wei Liu et al.
Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models
Pablo Marcos-Manchón, Roberto Alcover-Couso, Juan SanMiguel et al.
A&B BNN: Add&Bit-Operation-Only Hardware-Friendly Binary Neural Network
Ruichen Ma, Guanchao Qiao, Yian Liu et al.
3D Face Reconstruction with the Geometric Guidance of Facial Part Segmentation
Zidu Wang, Xiangyu Zhu, Tianshuo Zhang et al.
Spike-guided Motion Deblurring with Unknown Modal Spatiotemporal Alignment
Jiyuan Zhang, Shiyan Chen, Yajing Zheng et al.
Multi-Task Dense Prediction via Mixture of Low-Rank Experts
Yuqi Yang, Peng-Tao Jiang, Qibin Hou et al.
A Bayesian Approach to OOD Robustness in Image Classification
Prakhar Kaushik, Adam Kortylewski, Alan L. Yuille
Instance-aware Contrastive Learning for Occluded Human Mesh Reconstruction
Mi-Gyeong Gwon, Gi-Mun Um, Won-Sik Cheong et al.
DITTO: Dual and Integrated Latent Topologies for Implicit 3D Reconstruction
Jaehyeok Shim, Kyungdon Joo
UniMODE: Unified Monocular 3D Object Detection
Zhuoling Li, Xiaogang Xu, Ser-Nam Lim et al.
Investigating Compositional Challenges in Vision-Language Models for Visual Grounding
Yunan Zeng, Yan Huang, Jinjin Zhang et al.
SDPose: Tokenized Pose Estimation via Circulation-Guide Self-Distillation
Chen Sichen, Yingyi Zhang, Siming Huang et al.
Multi-Attribute Interactions Matter for 3D Visual Grounding
Can Xu, Yuehui Han, Rui Xu et al.
UnionFormer: Unified-Learning Transformer with Multi-View Representation for Image Manipulation Detection and Localization
Shuaibo Li, Wei Ma, Jianwei Guo et al.
Towards Better Vision-Inspired Vision-Language Models
Yun-Hao Cao, Kaixiang Ji, Ziyuan Huang et al.
Noisy-Correspondence Learning for Text-to-Image Person Re-identification
Yang Qin, Yingke Chen, Dezhong Peng et al.
DYSON: Dynamic Feature Space Self-Organization for Online Task-Free Class Incremental Learning
Yuhang He, YingJie Chen, Yuhan Jin et al.
Harnessing Large Language Models for Training-free Video Anomaly Detection
Luca Zanella, Willi Menapace, Massimiliano Mancini et al.
Rethinking the Objectives of Vector-Quantized Tokenizers for Image Synthesis
Yuchao Gu, Xintao Wang, Yixiao Ge et al.
Weakly Supervised Video Individual Counting
Xinyan Liu, Guorong Li, Yuankai Qi et al.
Learning Inclusion Matching for Animation Paint Bucket Colorization
Yuekun Dai, Shangchen Zhou, Blake Li et al.
Preserving Fairness Generalization in Deepfake Detection
Li Lin, Xinan He, Yan Ju et al.
RepViT: Revisiting Mobile CNN From ViT Perspective
Ao Wang, Hui Chen, Zijia Lin et al.
Prompt-Free Diffusion: Taking “Text” out of Text-to-Image Diffusion Models
Xingqian Xu, Jiayi Guo, Zhangyang Wang et al.
Insights from the Use of Previously Unseen Neural Architecture Search Datasets
Rob Geada, David Towers, Matthew Forshaw et al.
Referring Expression Counting
Siyang Dai, Jun Liu, Ngai-Man Cheung
R-Cyclic Diffuser: Reductive and Cyclic Latent Diffusion for 3D Clothed Human Digitalization
Kennard Chan, Fayao Liu, Guosheng Lin et al.
ControlRoom3D: Room Generation using Semantic Proxy Rooms
Jonas Schult, Sam Tsai, Lukas Höllein et al.
VSRD: Instance-Aware Volumetric Silhouette Rendering for Weakly Supervised 3D Object Detection
Zihua Liu, Hiroki Sakuma, Masatoshi Okutomi
Class Incremental Learning with Multi-Teacher Distillation
Haitao Wen, Lili Pan, Yu Dai et al.
Beyond Seen Primitive Concepts and Attribute-Object Compositional Learning
Nirat Saini, Khoi Pham, Abhinav Shrivastava
Scaling Laws of Synthetic Images for Model Training ... for Now
Lijie Fan, Kaifeng Chen, Dilip Krishnan et al.
Learning Group Activity Features Through Person Attribute Prediction
Chihiro Nakatani, Hiroaki Kawashima, Norimichi Ukita
FreeU: Free Lunch in Diffusion U-Net
Chenyang Si, Ziqi Huang, Yuming Jiang et al.
AnyScene: Customized Image Synthesis with Composited Foreground
Ruidong Chen, Lanjun Wang, Weizhi Nie et al.
TE-TAD: Towards Full End-to-End Temporal Action Detection via Time-Aligned Coordinate Expression
Ho-Joong Kim, Jung-Ho Hong, Heejo Kong et al.
NICE: Neurogenesis Inspired Contextual Encoding for Replay-free Class Incremental Learning
Mustafa B Gurbuz, Jean Moorman, Constantine Dovrolis
FocusMAE: Gallbladder Cancer Detection from Ultrasound Videos with Focused Masked Autoencoders
Soumen Basu, Mayuna Gupta, Chetan Madan et al.
Noisy One-point Homographies are Surprisingly Good
Yaqing Ding, Jonathan Astermark, Magnus Oskarsson et al.
SUGAR: Pre-training 3D Visual Representations for Robotics
Shizhe Chen, Ricardo Garcia Pinel, Ivan Laptev et al.
Self-Calibrating Vicinal Risk Minimisation for Model Calibration
Jiawei Liu, Changkun Ye, Ruikai Cui et al.
Finding Lottery Tickets in Vision Models via Data-driven Spectral Foresight Pruning
Leonardo Iurada, Marco Ciccone, Tatiana Tommasi
Learning Continuous 3D Words for Text-to-Image Generation
Ta-Ying Cheng, Matheus Gadelha, Thibault Groueix et al.
Device-Wise Federated Network Pruning
Shangqian Gao, Junyi Li, Zeyu Zhang et al.
Rethinking FID: Towards a Better Evaluation Metric for Image Generation
Sadeep Jayasumana, Srikumar Ramalingam, Andreas Veit et al.
Progress-Aware Online Action Segmentation for Egocentric Procedural Task Videos
Yuhan Shen, Ehsan Elhamifar
MOHO: Learning Single-view Hand-held Object Reconstruction with Multi-view Occlusion-Aware Supervision
Chenyangguang Zhang, Guanlong Jiao, Yan Di et al.
Towards Generalizing to Unseen Domains with Few Labels
Chamuditha Jayanga Galappaththige, Sanoojan Baliah, Malitha Gunawardhana et al.
Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts
Jialin Wu, Xia Hu, Yaqing Wang et al.
Low-Latency Neural Stereo Streaming
Qiqi Hou, Farzad Farhadzadeh, Amir Said et al.
Your Transferability Barrier is Fragile: Free-Lunch for Transferring the Non-Transferable Learning
Ziming Hong, Li Shen, Tongliang Liu
ARTrackV2: Prompting Autoregressive Tracker Where to Look and How to Describe
Yifan Bai, Zeyang Zhao, Yihong Gong et al.