Most Cited CVPR "one-shot federated learning" Papers
5,589 papers found • Page 9 of 28
Conference
SegEarth-OV: Towards Training-Free Open-Vocabulary Segmentation for Remote Sensing Images
Kaiyu Li, Ruixun Liu, Xiangyong Cao et al.
Long-Tailed Anomaly Detection with Learnable Class Names
Chih-Hui Ho, Kuan-Chuan Peng, Nuno Vasconcelos
Multi-Level Neural Scene Graphs for Dynamic Urban Environments
Tobias Fischer, Lorenzo Porzi, Samuel Rota Bulò et al.
Task-driven Image Fusion with Learnable Fusion Loss
Haowen Bai, Jiangshe Zhang, Zixiang Zhao et al.
Video Depth without Video Models
Bingxin Ke, Dominik Narnhofer, Shengyu Huang et al.
Adaptive Rectangular Convolution for Remote Sensing Pansharpening
Xueyang Wang, Zhixin Zheng, Jiandong Shao et al.
PerAda: Parameter-Efficient Federated Learning Personalization with Generalization Guarantees
Chulin Xie, De-An Huang, Wenda Chu et al.
Textured Gaussians for Enhanced 3D Scene Appearance Modeling
Brian Chao, Hung-Yu Tseng, Lorenzo Porzi et al.
SplatFlow: Multi-View Rectified Flow Model for 3D Gaussian Splatting Synthesis
Hyojun Go, byeongjun park, Jiho Jang et al.
DreamPropeller: Supercharge Text-to-3D Generation with Parallel Sampling
Linqi Zhou, Andy Shih, Chenlin Meng et al.
Clustering Propagation for Universal Medical Image Segmentation
Yuhang Ding, Liulei Li, Wenguan Wang et al.
QMambaBSR: Burst Image Super-Resolution with Query State Space Model
Xin Di, Long Peng, Peizhe Xia et al.
Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought
Yunze Man, De-An Huang, Guilin Liu et al.
360Loc: A Dataset and Benchmark for Omnidirectional Visual Localization with Cross-device Queries
Huajian Huang, Changkun Liu, Yipeng Zhu et al.
Spectral Meets Spatial: Harmonising 3D Shape Matching and Interpolation
Dongliang Cao, Marvin Eisenberger, Nafie El Amrani et al.
SnapGen-V: Generating a Five-Second Video within Five Seconds on a Mobile Device
Yushu Wu, Zhixing Zhang, Yanyu Li et al.
MoVE-KD: Knowledge Distillation for VLMs with Mixture of Visual Encoders
jiajun cao, Yuan Zhang, Tao Huang et al.
Real-time 3D-aware Portrait Video Relighting
Ziqi Cai, Kaiwen Jiang, Shu-Yu Chen et al.
DrivingSphere: Building a High-fidelity 4D World for Closed-loop Simulation
Tianyi Yan, Dongming Wu, Wencheng Han et al.
SplineGS: Robust Motion-Adaptive Spline for Real-Time Dynamic 3D Gaussians from Monocular Video
Jongmin Park, Minh-Quan Viet Bui, Juan Luis Gonzalez Bello et al.
Improving Plasticity in Online Continual Learning via Collaborative Learning
Maorong Wang, Nicolas Michel, Ling Xiao et al.
TeTriRF: Temporal Tri-Plane Radiance Fields for Efficient Free-Viewpoint Video
Minye Wu, Zehao Wang, Georgios Kouros et al.
Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient
Zigeng Chen, Xinyin Ma, Gongfan Fang et al.
OccMamba: Semantic Occupancy Prediction with State Space Models
Heng Li, Yuenan Hou, Xiaohan Xing et al.
MLP Can Be A Good Transformer Learner
Sihao Lin, Pumeng Lyu, Dongrui Liu et al.
VideoMAC: Video Masked Autoencoders Meet ConvNets
Gensheng Pei, Tao Chen, Xiruo Jiang et al.
Grid Diffusion Models for Text-to-Video Generation
Taegyeong Lee, Soyeong Kwon, Taehwan Kim
Generative Image Layer Decomposition with Visual Effects
Jinrui Yang, Qing Liu, Yijun Li et al.
Composed Video Retrieval via Enriched Context and Discriminative Embeddings
Omkar Thawakar, Muzammal Naseer, Rao Anwer et al.
StoryGPT-V: Large Language Models as Consistent Story Visualizers
Xiaoqian Shen, Mohamed Elhoseiny
GEARS: Local Geometry-aware Hand-object Interaction Synthesis
Keyang Zhou, Bharat Lal Bhatnagar, Jan Lenssen et al.
Guided Slot Attention for Unsupervised Video Object Segmentation
Minhyeok Lee, Suhwan Cho, Dogyoon Lee et al.
Dual Prototype Attention for Unsupervised Video Object Segmentation
Suhwan Cho, Minhyeok Lee, Seunghoon Lee et al.
Is Vanilla MLP in Neural Radiance Field Enough for Few-shot View Synthesis?
Hanxin Zhu, Tianyu He, Xin Li et al.
LSNet: See Large, Focus Small
Ao Wang, Hui Chen, Zijia Lin et al.
Noise-Consistent Siamese-Diffusion for Medical Image Synthesis and Segmentation
Kunpeng Qiu, Zhiqiang Gao, Zhiying Zhou et al.
Steerers: A Framework for Rotation Equivariant Keypoint Descriptors
Georg Bökman, Johan Edstedt, Michael Felsberg et al.
Self-Supervised Multi-Object Tracking with Path Consistency
Zijia Lu, Bing Shuai, Yanbei Chen et al.
LAN: Learning to Adapt Noise for Image Denoising
Changjin Kim, Tae Hyun Kim, Sungyong Baik
PrPSeg: Universal Proposition Learning for Panoramic Renal Pathology Segmentation
Ruining Deng, Quan Liu, Can Cui et al.
MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D
Wei Cheng, Juncheng Mu, Xianfang Zeng et al.
InteractDiffusion: Interaction Control in Text-to-Image Diffusion Models
Jiun Tian Hoe, Xudong Jiang, Chee Seng Chan et al.
InstanceGaussian: Appearance-Semantic Joint Gaussian Representation for 3D Instance-Level Perception
Haijie Li, Yanmin Wu, Jiarui Meng et al.
An Upload-Efficient Scheme for Transferring Knowledge From a Server-Side Pre-trained Generator to Clients in Heterogeneous Federated Learning
Jianqing Zhang, Yang Liu, Yang Hua et al.
Distilling Vision-Language Models on Millions of Videos
Yue Zhao, Long Zhao, Xingyi Zhou et al.
You'll Never Walk Alone: A Sketch and Text Duet for Fine-Grained Image Retrieval
Subhadeep Koley, Ayan Kumar Bhunia, Aneeshan Sain et al.
Unlocking the Potential of Prompt-Tuning in Bridging Generalized and Personalized Federated Learning
wenlong deng, Christos Thrampoulidis, Xiaoxiao Li
BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices
Xudong LU, Yinghao Chen, chencheng Chen et al.
Omni-Scene: Omni-Gaussian Representation for Ego-Centric Sparse-View Scene Reconstruction
Dongxu Wei, Zhiqi Li, Peidong Liu
Prometheus: 3D-Aware Latent Diffusion Models for Feed-Forward Text-to-3D Scene Generation
Yuanbo Yang, Jiahao Shao, Xinyang Li et al.
ZePT: Zero-Shot Pan-Tumor Segmentation via Query-Disentangling and Self-Prompting
Yankai Jiang, Zhongzhen Huang, Rongzhao Zhang et al.
PACT: Pruning and Clustering-Based Token Reduction for Faster Visual Language Models
Dhouib Mohamed, Davide Buscaldi, Vanier Sonia et al.
GRAM: Global Reasoning for Multi-Page VQA
Itshak Blau, Sharon Fogel, Roi Ronen et al.
Unmixing Diffusion for Self-Supervised Hyperspectral Image Denoising
Haijin Zeng, Jiezhang Cao, Yongyong Chen et al.
TAMM: TriAdapter Multi-Modal Learning for 3D Shape Understanding
Zhihao Zhang, Shengcao Cao, Yu-Xiong Wang
VTQA: Visual Text Question Answering via Entity Alignment and Cross-Media Reasoning
Kang Chen, Xiangqian Wu
DIBS: Enhancing Dense Video Captioning with Unlabeled Videos via Pseudo Boundary Enrichment and Online Refinement
Hao Wu, Huabin Liu, Yu Qiao et al.
NeRF-HuGS: Improved Neural Radiance Fields in Non-static Scenes Using Heuristics-Guided Segmentation
Jiahao Chen, Yipeng Qin, Lingjie Liu et al.
Improved Self-Training for Test-Time Adaptation
Jing Ma
Viewpoint-Aware Visual Grounding in 3D Scenes
Xiangxi Shi, Zhonghua Wu, Stefan Lee
The Devil is in Temporal Token: High Quality Video Reasoning Segmentation
Sitong Gong, Yunzhi Zhuge, Lu Zhang et al.
OED: Towards One-stage End-to-End Dynamic Scene Graph Generation
Guan Wang, Zhimin Li, Qingchao Chen et al.
Decentralized Directed Collaboration for Personalized Federated Learning
Yingqi Liu, Yifan Shi, Qinglun Li et al.
Towards Better Alignment: Training Diffusion Models with Reinforcement Learning Against Sparse Rewards
Zijing Hu, Fengda Zhang, Long Chen et al.
AlignMiF: Geometry-Aligned Multimodal Implicit Field for LiDAR-Camera Joint Synthesis
Tao Tang, Guangrun Wang, Yixing Lao et al.
Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders
Fiona Ryan, Ajay Bati, Sangmin Lee et al.
Mamba as a Bridge: Where Vision Foundation Models Meet Vision Language Models for Domain-Generalized Semantic Segmentation
Xin Zhang, Robby T. Tan
Theoretically Achieving Continuous Representation of Oriented Bounding Boxes
Zikai Xiao, Guo-Ye Yang, Xue Yang et al.
Improving Transferable Targeted Adversarial Attacks with Model Self-Enhancement
Han Wu, Guanyan Ou, Weibin Wu et al.
Video Motion Transfer with Diffusion Transformers
Alexander Pondaven, Aliaksandr Siarohin, Sergey Tulyakov et al.
OSDFace: One-Step Diffusion Model for Face Restoration
Jingkai Wang, Jue Gong, Lin Zhang et al.
FATE: Full-head Gaussian Avatar with Textural Editing from Monocular Video
Jiawei Zhang, Zijian Wu, Zhiyang Liang et al.
Hierarchical Patch Diffusion Models for High-Resolution Video Generation
Ivan Skorokhodov, Willi Menapace, Aliaksandr Siarohin et al.
Regressor-Segmenter Mutual Prompt Learning for Crowd Counting
Mingyue Guo, Li Yuan, Zhaoyi Yan et al.
ODIN: A Single Model for 2D and 3D Segmentation
Ayush Jain, Pushkal Katara, Nikolaos Gkanatsios et al.
HandDiff: 3D Hand Pose Estimation with Diffusion on Image-Point Cloud
WENCAN CHENG, Hao Tang, Luc Van Gool et al.
Loose Inertial Poser: Motion Capture with IMU-attached Loose-Wear Jacket
Chengxu Zuo, Yiming Wang, Lishuang Zhan et al.
Event-assisted Low-Light Video Object Segmentation
Li Hebei, Jin Wang, Jiahui Yuan et al.
Learning to Predict Activity Progress by Self-Supervised Video Alignment
Gerard Donahue, Ehsan Elhamifar
Vid2Avatar-Pro: Authentic Avatar from Videos in the Wild via Universal Prior
Chen Guo, Junxuan Li, Yash Kant et al.
BEVSpread: Spread Voxel Pooling for Bird’s-Eye-View Representation in Vision-based Roadside 3D Object Detection
Wenjie Wang, Yehao Lu, Guangcong Zheng et al.
Modeling Collaborator: Enabling Subjective Vision Classification With Minimal Human Effort via LLM Tool-Use
Imad Eddine Toubal, Aditya Avinash, Neil Alldrin et al.
SuperNormal: Neural Surface Reconstruction via Multi-View Normal Integration
Xu Cao, Takafumi Taketomi
Joint Reconstruction of 3D Human and Object via Contact-Based Refinement Transformer
Hyeongjin Nam, Daniel Jung, Gyeongsik Moon et al.
MoST: Multi-Modality Scene Tokenization for Motion Prediction
Norman Mu, Jingwei Ji, Zhenpei Yang et al.
SketchAgent: Language-Driven Sequential Sketch Generation
Yael Vinker, Tamar Rott Shaham, Kristine Zheng et al.
Instant Gaussian Stream: Fast and Generalizable Streaming of Dynamic Scene Reconstruction via Gaussian Splatting
Jinbo Yan, Rui Peng, Zhiyan Wang et al.
ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation
Yifan Pu, Yiming Zhao, Zhicong Tang et al.
Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning
Fan Lu, Wei Wu, Kecheng Zheng et al.
DexGrasp Anything: Towards Universal Robotic Dexterous Grasping with Physics Awareness
Yiming Zhong, Qi Jiang, Jingyi Yu et al.
SelfPose3d: Self-Supervised Multi-Person Multi-View 3d Pose Estimation
Keqi Chen, vinkle srivastav, Nicolas Padoy
Open-Set Domain Adaptation for Semantic Segmentation
Seun-An Choe, Ah-Hyung Shin, Keon Hee Park et al.
Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens
Kaihang Pan, Wang Lin, Zhongqi Yue et al.
Toward Generalized Image Quality Assessment: Relaxing the Perfect Reference Quality Assumption
Du CHEN, Tianhe Wu, Kede Ma et al.
Weakly-Supervised Emotion Transition Learning for Diverse 3D Co-speech Gesture Generation
Xingqun Qi, Jiahao Pan, Peng Li et al.
Revealing Key Details to See Differences: A Novel Prototypical Perspective for Skeleton-based Action Recognition
Hongda Liu, Yunfan Liu, Min Ren et al.
Unsupervised Semantic Segmentation Through Depth-Guided Feature Correlation and Sampling
Leon Sick, Dominik Engel, Pedro Hermosilla et al.
D^3: Scaling Up Deepfake Detection by Learning from Discrepancy
Yongqi Yang, Zhihao Qian, Ye Zhu et al.
OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation
Pengfei Zhou, Xiaopeng Peng, Jiajun Song et al.
Improved Video VAE for Latent Video Diffusion Model
Pingyu Wu, Kai Zhu, Yu Liu et al.
Learning Occlusion-Robust Vision Transformers for Real-Time UAV Tracking
You Wu, Xucheng Wang, Xiangyang Yang et al.
Making Vision Transformers Truly Shift-Equivariant
Renan A. Rojas-Gomez, Teck-Yian Lim, Minh Do et al.
EditSplat: Multi-View Fusion and Attention-Guided Optimization for View-Consistent 3D Scene Editing with 3D Gaussian Splatting
Dong In Lee, Hyeongcheol Park, Jiyoung Seo et al.
Any-Shift Prompting for Generalization over Distributions
Zehao Xiao, Jiayi Shen, Mohammad Mahdi Derakhshani et al.
TE-TAD: Towards Full End-to-End Temporal Action Detection via Time-Aligned Coordinate Expression
Ho-Joong Kim, Jung-Ho Hong, Heejo Kong et al.
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment
ziang yan, Zhilin Li, Yinan He et al.
MoFlow: One-Step Flow Matching for Human Trajectory Forecasting via Implicit Maximum Likelihood Estimation based Distillation
Yuxiang Fu, Qi Yan, Ke Li et al.
Multi-Sensor Object Anomaly Detection: Unifying Appearance, Geometry, and Internal Properties
wenqiao Li, BoZhong Zheng, Xiaohao Xu et al.
One-Class Face Anti-spoofing via Spoof Cue Map-Guided Feature Learning
Pei-Kai Huang, Cheng-Hsuan Chiang, Tzu-Hsien Chen et al.
ProS: Prompting-to-simulate Generalized knowledge for Universal Cross-Domain Retrieval
Fang Kaipeng, Jingkuan Song, Lianli Gao et al.
Open-World Semantic Segmentation Including Class Similarity
Matteo Sodano, Federico Magistri, Lucas Nunes et al.
Dexterous Grasp Transformer
Guo-Hao Xu, Yi-Lin Wei, Dian Zheng et al.
FlowTrack: Revisiting Optical Flow for Long-Range Dense Tracking
Seokju Cho, Gabriel Huang, Seungryong Kim et al.
Structure-Guided Adversarial Training of Diffusion Models
Ling Yang, Haotian Qian, Zhilong Zhang et al.
DFormerv2: Geometry Self-Attention for RGBD Semantic Segmentation
Bo-Wen Yin, Jiao-Long Cao, Ming-Ming Cheng et al.
Dual Prior Unfolding for Snapshot Compressive Imaging
Jiancheng Zhang, Haijin Zeng, Jiezhang Cao et al.
A Call to Reflect on Evaluation Practices for Age Estimation: Comparative Analysis of the State-of-the-Art and a Unified Benchmark
Jakub Paplham, Vojtech Franc
BA-SAM: Scalable Bias-Mode Attention Mask for Segment Anything Model
song yiran, Qianyu Zhou, Xiangtai Li et al.
EmoEdit: Evoking Emotions through Image Manipulation
Jingyuan Yang, Jiawei Feng, Weibin Luo et al.
Adapting Short-Term Transformers for Action Detection in Untrimmed Videos
Min Yang, gaohuan, Ping Guo et al.
Correspondence-Free Non-Rigid Point Set Registration Using Unsupervised Clustering Analysis
Mingyang Zhao, Jiang Jingen, Lei Ma et al.
Unsupervised Occupancy Learning from Sparse Point Cloud
Amine Ouasfi, Adnane Boukhayma
Image-Text Co-Decomposition for Text-Supervised Semantic Segmentation
Ji-Jia Wu, Andy Chia-Hao Chang, Chieh-Yu Chuang et al.
RecDiffusion: Rectangling for Image Stitching with Diffusion Models
Tianhao Zhou, Li Haipeng, Ziyi Wang et al.
Decoupled Pseudo-labeling for Semi-Supervised Monocular 3D Object Detection
Jiacheng Zhang, Jiaming Li, Xiangru Lin et al.
ChAda-ViT : Channel Adaptive Attention for Joint Representation Learning of Heterogeneous Microscopy Images
Nicolas Bourriez, Ihab Bendidi, Cohen Ethan et al.
Mixed-Precision Quantization for Federated Learning on Resource-Constrained Heterogeneous Devices
Huancheng Chen, Haris Vikalo
Playing the Fool: Jailbreaking LLMs and Multimodal LLMs with Out-of-Distribution Strategy
Joonhyun Jeong, Seyun Bae, Yeonsung Jung et al.
LaVin-DiT: Large Vision Diffusion Transformer
Zhaoqing Wang, Xiaobo Xia, Runnan Chen et al.
SmartMask: Context Aware High-Fidelity Mask Generation for Fine-grained Object Insertion and Layout Control
Jaskirat Singh, Jianming Zhang, Qing Liu et al.
RMem: Restricted Memory Banks Improve Video Object Segmentation
Junbao Zhou, Ziqi Pang, Yu-Xiong Wang
MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction
Jingcheng Ni, Yuxin Guo, Yichen Liu et al.
LSK3DNet: Towards Effective and Efficient 3D Perception with Large Sparse Kernels
Tuo Feng, Wenguan Wang, Fan Ma et al.
Robust Image Denoising through Adversarial Frequency Mixup
Donghun Ryou, Inju Ha, Hyewon Yoo et al.
CA-Jaccard: Camera-aware Jaccard Distance for Person Re-identification
Yiyu Chen, Zheyi Fan, Zhaoru Chen et al.
CLiC: Concept Learning in Context
Mehdi Safaee, Aryan Mikaeili, Or Patashnik et al.
Learning Inclusion Matching for Animation Paint Bucket Colorization
Yuekun Dai, Shangchen Zhou, Blake Li et al.
Breaking the Low-Rank Dilemma of Linear Attention
Qihang Fan, Huaibo Huang, Ran He
Dancing with Still Images: Video Distillation via Static-Dynamic Disentanglement
Ziyu Wang, Yue Xu, Cewu Lu et al.
HaWoR: World-Space Hand Motion Reconstruction from Egocentric Videos
Jinglei Zhang, Jiankang Deng, Chao Ma et al.
Taming Teacher Forcing for Masked Autoregressive Video Generation
Deyu Zhou, Quan Sun, Yuang Peng et al.
OmniGuard: Hybrid Manipulation Localization via Augmented Versatile Deep Image Watermarking
Xuanyu Zhang, Zecheng Tang, Zhipei Xu et al.
Faces that Speak: Jointly Synthesising Talking Face and Speech from Text
Youngjoon Jang, Jihoon Kim, Junseok Ahn et al.
UniK3D: Universal Camera Monocular 3D Estimation
Luigi Piccinelli, Christos Sakaridis, Mattia Segu et al.
Chat2SVG: Vector Graphics Generation with Large Language Models and Image Diffusion Models
Ronghuan Wu, Wanchao Su, Jing Liao
Genuine Knowledge from Practice: Diffusion Test-Time Adaptation for Video Adverse Weather Removal
Yijun Yang, Hongtao Wu, Angelica I. Aviles-Rivero et al.
GlitchBench: Can Large Multimodal Models Detect Video Game Glitches?
Mohammad Reza Taesiri, Tianjun Feng, Cor-Paul Bezemer et al.
NeRFCodec: Neural Feature Compression Meets Neural Radiance Fields for Memory-Efficient Scene Representation
Sicheng Li, Hao Li, Yiyi Liao et al.
ShiftwiseConv: Small Convolutional Kernel with Large Kernel Effect
Dachong Li, li li, zhuangzhuang chen et al.
Generative Omnimatte: Learning to Decompose Video into Layers
Yao-Chih Lee, Erika Lu, Sarah Rumbley et al.
DKDM: Data-Free Knowledge Distillation for Diffusion Models with Any Architecture
Qianlong Xiang, Miao Zhang, Yuzhang Shang et al.
TexGaussian: Generating High-quality PBR Material via Octree-based 3D Gaussian Splatting
Bojun Xiong, Jialun Liu, JiaKui Hu et al.
4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models
Wanhua Li, Renping Zhou, Jiawei Zhou et al.
Feat2GS: Probing Visual Foundation Models with Gaussian Splatting
Yue Chen, Xingyu Chen, Anpei Chen et al.
URHand: Universal Relightable Hands
Zhaoxi Chen, Gyeongsik Moon, Kaiwen Guo et al.
Learning Continuous 3D Words for Text-to-Image Generation
Ta-Ying Cheng, Matheus Gadelha, Thibault Groueix et al.
eTraM: Event-based Traffic Monitoring Dataset
Aayush Atul Verma, Bharatesh Chakravarthi, Arpitsinh Vaghela et al.
Around the World in 80 Timesteps: A Generative Approach to Global Visual Geolocation
Nicolas Dufour, Vicky Kalogeiton, David Picard et al.
Scalable 3D Registration via Truncated Entry-wise Absolute Residuals
Tianyu Huang, Liangzu Peng, Rene Vidal et al.
SuperSVG: Superpixel-based Scalable Vector Graphics Synthesis
Teng Hu, Ran Yi, Baihong Qian et al.
DriveTrack: A Benchmark for Long-Range Point Tracking in Real-World Videos
Arjun Balasingam, Joseph Chandler, Chenning Li et al.
MaskGaussian: Adaptive 3D Gaussian Representation from Probabilistic Masks
Yifei Liu, Zhihang Zhong, Yifan Zhan et al.
A Stealthy Wrongdoer: Feature-Oriented Reconstruction Attack against Split Learning
Xiaoyang Xu, Mengda Yang, Wenzhe Yi et al.
NViST: In the Wild New View Synthesis from a Single Image with Transformers
Wonbong Jang, Lourdes Agapito
Text-Enhanced Data-free Approach for Federated Class-Incremental Learning
Minh-Tuan Tran, Trung Le, Xuan-May Le et al.
Modality-Agnostic Structural Image Representation Learning for Deformable Multi-Modality Medical Image Registration
Tony C. W. MOK, Zi Li, Yunhao Bai et al.
As-Plausible-As-Possible: Plausibility-Aware Mesh Deformation Using 2D Diffusion Priors
Seungwoo Yoo, Kunho Kim, Vladimir G. Kim et al.
Perturbing Attention Gives You More Bang for the Buck: Subtle Imaging Perturbations That Efficiently Fool Customized Diffusion Models
Jingyao Xu, Yuetong Lu, Yandong Li et al.
MESA: Matching Everything by Segmenting Anything
Yesheng Zhang, Xu Zhao
Exploring Efficient Asymmetric Blind-Spots for Self-Supervised Denoising in Real-World Scenarios
Shiyan Chen, Jiyuan Zhang, Zhaofei Yu et al.
DreamComposer: Controllable 3D Object Generation via Multi-View Conditions
Yunhan Yang, Yukun Huang, Xiaoyang Wu et al.
PhysPT: Physics-aware Pretrained Transformer for Estimating Human Dynamics from Monocular Videos
Yufei Zhang, Jeffrey Kephart, Zijun Cui et al.
Split to Merge: Unifying Separated Modalities for Unsupervised Domain Adaptation
Xinyao Li, Yuke Li, Zhekai Du et al.
Rotation-Agnostic Image Representation Learning for Digital Pathology
Saghir Alfasly, Abubakr Shafique, Peyman Nejat et al.
Unsupervised Keypoints from Pretrained Diffusion Models
Eric Hedlin, Gopal Sharma, Shweta Mahajan et al.
Bridging the Synthetic-to-Authentic Gap: Distortion-Guided Unsupervised Domain Adaptation for Blind Image Quality Assessment
Aobo Li, Jinjian Wu, Yongxu Liu et al.
Holoported Characters: Real-time Free-viewpoint Rendering of Humans from Sparse RGB Cameras
Ashwath Shetty, Marc Habermann, Guoxing Sun et al.
Constructing and Exploring Intermediate Domains in Mixed Domain Semi-supervised Medical Image Segmentation
Qinghe Ma, Jian Zhang, Lei Qi et al.
Factored-NeuS: Reconstructing Surfaces, Illumination, and Materials of Possibly Glossy Objects
Yue Fan, Ningjing Fan, Ivan Skorokhodov et al.
DSGG: Dense Relation Transformer for an End-to-end Scene Graph Generation
Zeeshan Hayder, Xuming He
Attribute-Guided Pedestrian Retrieval: Bridging Person Re-ID with Internal Attribute Variability
Yan Huang, Zhang Zhang, Qiang Wu et al.
CarPlanner: Consistent Auto-regressive Trajectory Planning for Large-Scale Reinforcement Learning in Autonomous Driving
Dongkun Zhang, Jiaming Liang, Ke Guo et al.
Segmenting Maxillofacial Structures in CBCT Volumes
Federico Bolelli, Kevin Marchesini, Niels van Nistelrooij et al.
MoST: Motion Style Transformer Between Diverse Action Contents
Boeun Kim, Jungho Kim, Hyung Jin Chang et al.
Discriminability-Driven Channel Selection for Out-of-Distribution Detection
Yue Yuan, Rundong He, Yicong Dong et al.
Overload: Latency Attacks on Object Detection for Edge Devices
Erh-Chung Chen, Pin-Yu Chen, I-Hsin Chung et al.
Instant3dit: Multiview Inpainting for Fast Editing of 3D Objects
Amir Barda, Matheus Gadelha, Vladimir G. Kim et al.
SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training
Jierun Chen, Dongting Hu, Xijie Huang et al.
GenFusion: Closing the Loop between Reconstruction and Generation via Videos
Sibo Wu, Congrong Xu, Binbin Huang et al.
Leveraging Cross-Modal Neighbor Representation for Improved CLIP Classification
Chao Yi, Lu Ren, De-Chuan Zhan et al.
Segment Every Out-of-Distribution Object
Wenjie Zhao, Jia Li, Xin Dong et al.
DashGaussian: Optimizing 3D Gaussian Splatting in 200 Seconds
Youyu Chen, Junjun Jiang, Kui Jiang et al.
Driving-Video Dehazing with Non-Aligned Regularization for Safety Assistance
Junkai Fan, Jiangwei Weng, Kun Wang et al.
The Unreasonable Effectiveness of Pre-Trained Features for Camera Pose Refinement
Gabriele Trivigno, Carlo Masone, Barbara Caputo et al.
Parameter Efficient Fine-tuning via Cross Block Orchestration for Segment Anything Model
Zelin Peng, Zhengqin Xu, Zhilin Zeng et al.
From Isolated Islands to Pangea: Unifying Semantic Space for Human Action Understanding
Yonglu Li, Xiaoqian Wu, Xinpeng Liu et al.
LumiNet: Latent Intrinsics Meets Diffusion Models for Indoor Scene Relighting
Xiaoyan Xing, Konrad Groh, Sezer Karaoglu et al.
Schedule On the Fly: Diffusion Time Prediction for Faster and Better Image Generation
Zilyu Ye, Zhiyang Chen, Tiancheng Li et al.
Fair-VPT: Fair Visual Prompt Tuning for Image Classification
Sungho Park, Hyeran Byun
Cross-View Completion Models are Zero-shot Correspondence Estimators
Honggyu An, Jin Hyeon Kim, Seonghoon Park et al.