Most Cited ECCV "viewpoint invariance" Papers
Object-Centric Diffusion for Efficient Video Editing
Kumara Kahatapitiya, Adil Karjauv, Davide Abati et al.
NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields
Muhammad Zubair Irshad, Sergey Zakharov, Vitor Guizilini et al.
ControlNet-XS: Rethinking the Control of Text-to-Image Diffusion Models as Feedback-Control Systems
Denis Zavadski, Johann-Friedrich Feiden, Carsten Rother
Robust Calibration of Large Vision-Language Adapters
Balamurali Murugesan, Julio Silva-Rodríguez, Ismail Ben Ayed et al.
Reliability in Semantic Segmentation: Can We Use Synthetic Data?
Thibaut Loiseau, Tuan Hung Vu, Mickael Chen et al.
EgoExo-Fitness: Towards Egocentric and Exocentric Full-Body Action Understanding
Yuanming Li, Wei-Jin Huang, An-Lan Wang et al.
OGNI-DC: Robust Depth Completion with Optimization-Guided Neural Iterations
Yiming Zuo, Jia Deng
ViLA: Efficient Video-Language Alignment for Video Question Answering
Xijun Wang, Junbang Liang, Chun-Kai Wang et al.
RadEdit: stress-testing biomedical vision models via diffusion image editing
Fernando Pérez-García, Sam Bond-Taylor, Pedro Sanchez et al.
F-HOI: Toward Fine-grained Semantic-Aligned 3D Human-Object Interactions
Jie Yang, Xuesong Niu, Nan Jiang et al.
DIM: Dyadic Interaction Modeling for Social Behavior Generation
Minh Tran, Di Chang, Maksim Siniukov et al.
An Incremental Unified Framework for Small Defect Inspection
Jiaqi Tang, Hao Lu, Xiaogang Xu et al.
WTS: A Pedestrian-Centric Traffic Video Dataset for Fine-grained Spatial-Temporal Understanding
Quan Kong, Yuki Kawana, Rajat Saini et al.
A Simple Baseline for Spoken Language to Sign Language Translation with 3D Avatars
Ronglai Zuo, Fangyun Wei, Zenggui Chen et al.
One-Shot Diffusion Mimicker for Handwritten Text Generation
Gang Dai, Yifan Zhang, Quhui Ke et al.
Region-Adaptive Transform with Segmentation Prior for Image Compression
Yuxi Liu, Wenhan Yang, Huihui Bai et al.
VisionTrap: Vision-Augmented Trajectory Prediction Guided by Textual Descriptions
Seokha Moon, Hyun Woo, Hongbeen Park et al.
ZeST: Zero-Shot Material Transfer from a Single Image
Ta-Ying Cheng, Prafull Sharma, Andrew Markham et al.
RealViformer: Investigating Attention for Real-World Video Super-Resolution
Yuehan Zhang, Angela Yao
SEED: A Simple and Effective 3D DETR in Point Clouds
Zhe Liu, Jinghua Hou, Xiaoqing Ye et al.
Learning to Adapt SAM for Segmenting Cross-domain Point Clouds
Xidong Peng, Runnan Chen, Feng Qiao et al.
Online Zero-Shot Classification with CLIP
Qi Qian, Juhua Hu
PreSight: Enhancing Autonomous Vehicle Perception with City-Scale NeRF Priors
Tianyuan Yuan, Mao Yucheng, Jiawei Yang et al.
Surface Reconstruction for 3D Gaussian Splatting via Local Structural Hints
Qianyi Wu, Jianmin Zheng, Jianfei Cai
Factorized Diffusion: Perceptual Illusions by Noise Decomposition
Daniel Geng, Inbum Park, Andrew Owens
SAGS: Structure-Aware 3D Gaussian Splatting
Evangelos Ververas, Rolandos Alexandros Potamias, Song Jifei et al.
Visible and Clear: Finding Tiny Objects in Difference Map
Bing Cao, Haiyu Yao, Pengfei Zhu et al.
Few-Shot Anomaly-Driven Generation for Anomaly Classification and Segmentation
Guan Gui, Bin-Bin Gao, Jun Liu et al.
PromptFusion: Decoupling Stability and Plasticity for Continual Learning
Haoran Chen, Zuxuan Wu, Xintong Han et al.
Isomorphic Pruning for Vision Models
Gongfan Fang, Xinyin Ma, Michael Bi Mi et al.
Improving Video Segmentation via Dynamic Anchor Queries
Yikang Zhou, Tao Zhang, Xiangtai Li et al.
EditShield: Protecting Unauthorized Image Editing by Instruction-guided Diffusion Models
Ruoxi Chen, Haibo Jin, Yixin Liu et al.
ColorPeel: Color Prompt Learning with Diffusion Models via Color and Shape Disentanglement
Muhammad Atif Butt, Kai Wang, Javier Vazquez-Corral et al.
Training-free Video Temporal Grounding using Large-scale Pre-trained Models
Minghang Zheng, Xinhao Cai, Qingchao Chen et al.
Restore Anything with Masks: Leveraging Mask Image Modeling for Blind All-in-One Image Restoration
Chu Jie Qin, Ruiqi Wu, Zikun Liu et al.
AMEGO: Active Memory from long EGOcentric videos
Gabriele Goletto, Tushar Nagarajan, Giuseppe Averta et al.
WordRobe: Text-Guided Generation of Textured 3D Garments
Astitva Srivastava, Pranav Manu, Amit Raj et al.
Navigation Instruction Generation with BEV Perception and Large Language Models
Sheng Fan, Rui Liu, Wenguan Wang et al.
MirrorGaussian: Reflecting 3D Gaussians for Reconstructing Mirror Reflections
Jiayue Liu, Tang Xiao, Freeman Cheng et al.
MMVR: Millimeter-wave Multi-View Radar Dataset and Benchmark for Indoor Perception
Mohammad Mahbubur Rahman, Ryoma Yataka, Sorachi Kato et al.
Towards Open Domain Text-Driven Synthesis of Multi-Person Motions
Shan Mengyi, Lu Dong, Yutao Han et al.
Gaussian Frosting: Editable Complex Radiance Fields with Real-Time Rendering
Antoine Guedon, Vincent Lepetit
Improving Zero-shot Generalization of Learned Prompts via Unsupervised Knowledge Distillation
Marco Mistretta, Alberto Baldrati, Marco Bertini et al.
FedRA: A Random Allocation Strategy for Federated Tuning to Unleash the Power of Heterogeneous Clients
Shangchao Su, Bin Li, Xiangyang Xue
SelEx: Self-Expertise in Fine-Grained Generalized Category Discovery
Sarah Rastegar, Mohammadreza Salehi, Yuki M Asano et al.
A Graph-Based Approach for Category-Agnostic Pose Estimation
Or Hirschorn, Shai Avidan
Diffusion for Natural Image Matting
Yihan Hu, Yiheng Lin, Wei Wang et al.
WildVidFit: Video Virtual Try-On in the Wild via Image-Based Controlled Diffusion Models
Zijian He, Peixin Chen, Guangrun Wang et al.
Simplifying Source-Free Domain Adaptation for Object Detection: Effective Self-Training Strategies and Performance Insights
Yan Hao, Florent Forest, Olga Fink
Diffusion-Driven Data Replay: A Novel Approach to Combat Forgetting in Federated Class Continual Learning
Jinglin Liang, Jin Zhong, Hanlin Gu et al.
TexGen: Text-Guided 3D Texture Generation with Multi-view Sampling and Resampling
Dong Huo, Zixin Guo, Xinxin Zuo et al.
SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding
Weitai Kang, Gaowen Liu, Shah Mubarak et al.
SPIRE: Semantic Prompt-Driven Image Restoration
Chenyang Qi, Zhengzhong Tu, Keren Ye et al.
iHuman: Instant Animatable Digital Humans From Monocular Videos
Pramish Paudel, Anubhav Khanal, Danda Pani Paudel et al.
StoryImager: A Unified and Efficient Framework for Coherent Story Visualization and Completion
Ming Tao, Bingkun Bao, Hao Tang et al.
InterFusion: Text-Driven Generation of 3D Human-Object Interaction
Sisi Dai, Wenhao Li, Haowen Sun et al.
Contourlet Residual for Prompt Learning Enhanced Infrared Image Super-Resolution
Xingyuan Li, Jinyuan Liu, Zhixin Chen et al.
CVT-Occ: Cost Volume Temporal Fusion for 3D Occupancy Prediction
Zhangchen Ye, Tao Jiang, Chenfeng Xu et al.
OmniNOCS: A unified NOCS dataset and model for 3D lifting of 2D objects
Akshay Krishnan, Abhijit Kundu, Kevis Maninis et al.
Is Retain Set All You Need in Machine Unlearning? Restoring Performance of Unlearned Models with Out-Of-Distribution Images
Jacopo Bonato, Marco Cotogni, Luigi Sabetta
SparseCraft: Few-Shot Neural Reconstruction through Stereopsis Guided Geometric Linearization
Mae Younes, Amine Ouasfi, Adnane Boukhayma
Robust-Wide: Robust Watermarking against Instruction-driven Image Editing
Runyi Hu, Jie Zhang, Ting Xu et al.
Enhancing Semantic Fidelity in Text-to-Image Synthesis: Attention Regulation in Diffusion Models
Yang Zhang, Tze Tzun Teoh, Wei Hern Lim et al.
SmartControl: Enhancing ControlNet for Handling Rough Visual Conditions
Xiaoyu Liu, Yuxiang Wei, Ming Liu et al.
NICP: Neural ICP for 3D Human Registration at Scale
Riccardo Marin, Enric Corona, Gerard Pons-Moll
Towards Real-world Event-guided Low-light Video Enhancement and Deblurring
Taewoo Kim, Jaeseok Jeong, Hoonhee Cho et al.
Exemplar-free Continual Representation Learning via Learnable Drift Compensation
Alex Gomez-Villa, Dipam Goswami, Kai Wang et al.
Towards Neuro-Symbolic Video Understanding
Minkyu Choi, Harsh Goel, Mohammad Omama et al.
GarmentCodeData: A Dataset of 3D Made-to-Measure Garments With Sewing Patterns
Maria Korosteleva, Timur Levent Kesdogan, Fabian Kemper et al.
You Only Need One Step: Fast Super-Resolution with Stable Diffusion via Scale Distillation
Mehdi Noroozi, Isma Hadji, Brais Martinez et al.
LayoutFlow: Flow Matching for Layout Generation
Julian Jorge Andrade Guerreiro, Naoto Inoue, Kento Masui et al.
Any2Point: Empowering Any-modality Transformers for Efficient 3D Understanding
Yiwen Tang, Renrui Zhang, Jiaming Liu et al.
Continuous Memory Representation for Anomaly Detection
Joo Chan Lee, Taejune Kim, Eunbyung Park et al.
HENet: Hybrid Encoding for End-to-end Multi-task 3D Perception from Multi-view Cameras
Zhongyu Xia, ZhiWei Lin, Xinhao Wang et al.
ZeroI2V: Zero-Cost Adaptation of Pre-Trained Transformers from Image to Video
Xinhao Li, Yuhan Zhu, Limin Wang
Unified Embedding Alignment for Open-Vocabulary Video Instance Segmentation
Hao Fang, Peng Wu, Yawei Li et al.
Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos
Changan Chen, Puyuan Peng, Ami Baid et al.
When Pedestrian Detection Meets Multi-Modal Learning: Generalist Model and Benchmark Dataset
Yi Zhang, Wang Zeng, Sheng Jin et al.
LatentEditor: Text Driven Local Editing of 3D Scenes
Umar Khalid, Hasan Iqbal, Muhammad Tayyab et al.
Centering the Value of Every Modality: Towards Efficient and Resilient Modality-agnostic Semantic Segmentation
Xu Zheng, Yuanhuiyi Lyu, Jiazhou Zhou et al.
OV-Uni3DETR: Towards Unified Open-Vocabulary 3D Object Detection via Cycle-Modality Propagation
Zhenyu Wang, Ya-Li Li, Taichi Liu et al.
ConGeo: Robust Cross-view Geo-localization across Ground View Variations
Li Mi, Chang Xu, Javiera Castillo Navarro et al.
Improving Virtual Try-On with Garment-focused Diffusion Models
Siqi Wan, Yehao Li, Jingwen Chen et al.
PromptCCD: Learning Gaussian Mixture Prompt Pool for Continual Category Discovery
Fernando Julio Cendra, Bingchen Zhao, Kai Han
InsMapper: Exploring Inner-instance Information for Vectorized HD Mapping
Zhenhua Xu, Kwan-Yee K. Wong, Hengshuang Zhao
Source Prompt Disentangled Inversion for Boosting Image Editability with Diffusion Models
Ruibin Li, Ruihuang Li, Song Guo et al.
Adaptive Bounding Box Uncertainties via Two-Step Conformal Prediction
Alexander Timans, Christoph-Nikolas Straehle, Kaspar Sakmann et al.
Good Teachers Explain: Explanation-Enhanced Knowledge Distillation
Amin Parchami, Moritz Böhle, Sukrut Rao et al.
Diffusion Model for Robust Multi-Sensor Fusion in 3D Object Detection and BEV Segmentation
Duy Tho Le, Hengcan Shi, Jianfei Cai et al.
Dataset Enhancement with Instance-Level Augmentations
Orest Kupyn, Christian Rupprecht
CC-SAM: Enhancing SAM with Cross-feature Attention and Context for Ultrasound Image Segmentation
Shreyank Narayana Gowda, David A Clifton
Learning to Detect Multi-class Anomalies with Just One Normal Image Prompt
Bin-Bin Gao
Implicit Concept Removal of Diffusion Models
Zhili Liu, Kai Chen, Yifan Zhang et al.
Crowd-SAM: SAM as a smart annotator for object detection in crowded scenes
Zhi Cai, Yingjie Gao, Yaoyan Zheng et al.
GSD: View-Guided Gaussian Splatting Diffusion for 3D Reconstruction
Yuxuan Mu, Xinxin Zuo, Chuan Guo et al.
MagDiff: Multi-Alignment Diffusion for High-Fidelity Video Generation and Editing
Haoyu Zhao, Tianyi Lu, Jiaxi Gu et al.
Beta-Tuned Timestep Diffusion Model
Tianyi Zheng, Peng-Tao Jiang, Ben Wan et al.
Beat-It: Beat-Synchronized Multi-Condition 3D Dance Generation
Zikai Huang, Xuemiao Xu, Cheng Xu et al.
Connecting Consistency Distillation to Score Distillation for Text-to-3D Generation
Zongrui Li, Minghui Hu, Qian Zheng et al.
InfMAE: A Foundation Model in The Infrared Modality
Fangcen Liu, Chenqiang Gao, Yaming Zhang et al.
PILoRA: Prototype Guided Incremental LoRA for Federated Class-Incremental Learning
Haiyang Guo, Fei Zhu, Wenzhuo Liu et al.
Synchronization is All You Need: Exocentric-to-Egocentric Transfer for Temporal Action Segmentation with Unlabeled Synchronized Video Pairs
Camillo Quattrocchi, Antonino Furnari, Daniele Di Mauro et al.
EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval
Thomas Hummel, Shyamgopal Karthik, Mariana-Iuliana Georgescu et al.
UNIC: Universal Classification Models via Multi-teacher Distillation
Yannis Kalantidis, Larlus Diane, Mert Bulent Sariyildiz et al.
CoDA: Instructive Chain-of-Domain Adaptation with Severity-Aware Visual Prompt Tuning
Ziyang Gong, FuHao Li, Yupeng Deng et al.
PartSTAD: 2D-to-3D Part Segmentation Task Adaptation
Hyunjin Kim, Minhyuk Sung
Diffusion Model is a Good Pose Estimator from 3D RF-Vision
Junqiao Fan, Jianfei Yang, Yuecong Xu et al.
SFPNet: Sparse Focal Point Network for Semantic Segmentation on General LiDAR Point Clouds
Yanbo Wang, Wentao Zhao, Cao Chuan et al.
Towards Multimodal Open-Set Domain Generalization and Adaptation through Self-supervision
Hao Dong, Eleni Chatzi, Olga Fink
Mismatch Quest: Visual and Textual Feedback for Image-Text Misalignment
Brian Gordon, Yonatan Bitton, Yonatan Shafir et al.
Keypoint Promptable Re-Identification
Vladimir Somers, Alexandre Alahi, Christophe De Vleeschouwer
Masked Video and Body-worn IMU Autoencoder for Egocentric Action Recognition
Mingfang Zhang, Yifei Huang, Ruicong Liu et al.
City-on-Web: Real-time Neural Rendering of Large-scale Scenes on the Web
Kaiwen Song, Xiaoyi Zeng, Chenqu Ren et al.
PetFace: A Large-Scale Dataset and Benchmark for Animal Identification
Risa Shinoda, Kaede Shiohara
Latent Diffusion Prior Enhanced Deep Unfolding for Snapshot Spectral Compressive Imaging
Zongliang Wu, Ruiying Lu, Ying Fu et al.
RAW-Adapter: Adapting Pretrained Visual Model to Camera RAW Images
Ziteng Cui, Tatsuya Harada
PromptIQA: Boosting the Performance and Generalization for No-Reference Image Quality Assessment via Prompts
Zewen Chen, Haina Qin, Juan Wang et al.
Raindrop Clarity: A Dual-Focused Dataset for Day and Night Raindrop Removal
Yeying Jin, Xin Li, Jiadong Wang et al.
CLOSER: Towards Better Representation Learning for Few-Shot Class-Incremental Learning
Junghun Oh, Sungyong Baik, Kyoung Mu Lee
Exploring Pre-trained Text-to-Video Diffusion Models for Referring Video Object Segmentation
Xuelu Feng, Dongdong Chen, Junsong Yuan et al.
PosFormer: Recognizing Complex Handwritten Mathematical Expression with Position Forest Transformer
Tongkun Guan, Chengyu Lin, Wei Shen et al.
Self-Supervised Video Desmoking for Laparoscopic Surgery
Renlong Wu, Zhilu Zhang, Shuohao Zhang et al.
Explain via Any Concept: Concept Bottleneck Model with Open Vocabulary Concepts
Andong Tan, Fengtao Zhou, Hao Chen
Knowledge Transfer with Simulated Inter-Image Erasing for Weakly Supervised Semantic Segmentation
Tao Chen, Xiruo Jiang, Gensheng Pei et al.
Motion-prior Contrast Maximization for Dense Continuous-Time Motion Estimation
Friedhelm Hamann, Ziyun Wang, Ioannis Asmanis et al.
Language-Driven 6-DoF Grasp Detection Using Negative Prompt Guidance
Tien Toan Nguyen, Minh Nhat Vu, Baoru Huang et al.
One-stage Prompt-based Continual Learning
Youngeun Kim, Yuhang Li, Priyadarshini Panda
Unlocking the Potential of Federated Learning: The Symphony of Dataset Distillation via Deep Generative Latents
Yuqi Jia, Saeed Vahidian, Jingwei Sun et al.
Visual Alignment Pre-training for Sign Language Translation
Peiqi Jiao, Yuecong Min, Xilin Chen
CoReS: Orchestrating the Dance of Reasoning and Segmentation
Xiaoyi Bao, Siyang Sun, Shuailei Ma et al.
LiFT: A Surprisingly Simple Lightweight Feature Transform for Dense ViT Descriptors
Saksham Suri, Matthew Walmer, Kamal Gupta et al.
Listen to Look into the Future: Audio-Visual Egocentric Gaze Anticipation
Bolin Lai, Fiona Ryan, Wenqi Jia et al.
M2D2M: Multi-Motion Generation from Text with Discrete Diffusion Models
Seunggeun Chi, Hyung-gun Chi, Hengbo Ma et al.
Deep Diffusion Image Prior for Efficient OOD Adaptation in 3D Inverse Problems
Hyungjin Chung, Jong Chul Ye
Emergent Visual-Semantic Hierarchies in Image-Text Representations
Morris Alper, Hadar Averbuch-Elor
UniM2AE: Multi-modal Masked Autoencoders with Unified 3D Representation for 3D Perception in Autonomous Driving
Jian Zou, Tianyu Huang, Guanglei Yang et al.
TransCAD: A Hierarchical Transformer for CAD Sequence Inference from Point Clouds
Dupont Elona, Kseniya Cherenkova, Dimitrios Mallis et al.
Towards Latent Masked Image Modeling for Self-Supervised Visual Representation Learning
Yibing Wei, Abhinav Gupta, Pedro Morgado
Taming Latent Diffusion Model for Neural Radiance Field Inpainting
Chieh Lin, Changil Kim, Jia-Bin Huang et al.
GaussReg: Fast 3D Registration with Gaussian Splatting
Jiahao Chang, Yinglin Xu, Yihao Li et al.
Optimizing Diffusion Models for Joint Trajectory Prediction and Controllable Generation
Yixiao Wang, Chen Tang, Lingfeng Sun et al.
SuperGaussian: Repurposing Video Models for 3D Super Resolution
Yuan Shen, Duygu Ceylan, Paul Guerrero et al.
Controllable Navigation Instruction Generation with Chain of Thought Prompting
Xianghao Kong, Jinyu Chen, Wenguan Wang et al.
Thermal3D-GS: Physics-induced 3D Gaussians for Thermal Infrared Novel-view Synthesis
Qian Chen, Shihao Shu, Xiangzhi Bai
SegGen: Supercharging Segmentation Models with Text2Mask and Mask2Img Synthesis
Hanrong Ye, Jason Wen Yong Kuen, Qing Liu et al.
Tackling Structural Hallucination in Image Translation with Local Diffusion
Seunghoi Kim, Chen Jin, Tom Diethe et al.
PARE-Net: Position-Aware Rotation-Equivariant Networks for Robust Point Cloud Registration
Runzhao Yao, Shaoyi Du, Wenting Cui et al.
Select and Distill: Selective Dual-Teacher Knowledge Transfer for Continual Learning on Vision-Language Models
Yu-Chu Yu, Chi-Pin Huang, Jr-Jen Chen et al.
The Hard Positive Truth about Vision-Language Compositionality
Amita Kamath, Cheng-Yu Hsieh, Kai-Wei Chang et al.
TCLC-GS: Tightly Coupled LiDAR-Camera Gaussian Splatting for Autonomous Driving
Cheng Zhao, Su Sun, Ruoyu Wang et al.
Osmosis: RGBD Diffusion Prior for Underwater Image Restoration
Opher Bar Nathan, Deborah Steinberger-Levy, Tali Treibitz et al.
Lazy Diffusion Transformer for Interactive Image Editing
Yotam Nitzan, Zongze Wu, Richard Zhang et al.
Image Demoireing in RAW and sRGB Domains
Shuning Xu, Binbin Song, Xiangyu Chen et al.
Efficient Image Pre-Training with Siamese Cropped Masked Autoencoders
Alexandre Eymaël, Renaud Vandeghen, Anthony Cioppa et al.
Auto-GAS: Automated Proxy Discovery for Training-free Generative Architecture Search
Lujun Li, Haosen Sun, Shiwen Li et al.
An Efficient and Effective Transformer Decoder-Based Framework for Multi-Task Visual Grounding
Wei Chen, Long Chen, Yu Wu
Beyond MOT: Semantic Multi-Object Tracking
Yunhao Li, Qin Li, Hao Wang et al.
PatchRefiner: Leveraging Synthetic Data for Real-Domain High-Resolution Monocular Metric Depth Estimation
Zhenyu Li, Shariq Farooq Bhat, Peter Wonka
AnatoMask: Enhancing Medical Image Segmentation with Reconstruction-guided Self-masking
Yuheng Li, Tianyu Luan, Yizhou Wu et al.
TP2O: Creative Text Pair-to-Object Generation using Balance Swap-Sampling
Jun Li, Zedong Zhang, Jian Yang
Context Diffusion: In-Context Aware Image Generation
Ivona Najdenkoska, Animesh Sinha, Abhimanyu Dubey et al.
MSD: A Benchmark Dataset for Floor Plan Generation of Building Complexes
Casper van Engelenburg, Fatemeh Mostafavi, Emanuel Kuhn et al.
TimeLens-XL: Real-time Event-based Video Frame Interpolation with Large Motion
Shi Guo, Yutian Chen, Tianfan Xue et al.
VividDreamer: Invariant Score Distillation for Hyper-Realistic Text-to-3D Generation
Wenjie Zhuo, Fan Ma, Hehe Fan et al.
LookupViT: Compressing visual information to a limited number of tokens
Rajat Koner, Gagan Jain, Sujoy Paul et al.
Accelerating Image Generation with Sub-path Linear Approximation Model
Chen Xu, Tianhui Song, Weixin Feng et al.
A Simple Latent Diffusion Approach for Panoptic Segmentation and Mask Inpainting
Wouter Van Gansbeke, Bert De Brabandere
SAFNet: Selective Alignment Fusion Network for Efficient HDR Imaging
Lingtong Kong, Bo Li, Yike Xiong et al.
MeshSegmenter: Zero-Shot Mesh Segmentation via Texture Synthesis
Ziming Zhong, Yanyu Xu, Jing Li et al.
AdaNAT: Exploring Adaptive Policy for Token-Based Image Generation
Zanlin Ni, Yulin Wang, Renping Zhou et al.
LaPose: Laplacian Mixture Shape Modeling for RGB-Based Category-Level Object Pose Estimation
Ruida Zhang, Ziqin Huang, Gu Wang et al.
UniMD: Towards Unifying Moment Retrieval and Temporal Action Detection
Yingsen Zeng, Yujie Zhong, Chengjian Feng et al.
FRDiff: Feature Reuse for Universal Training-free Acceleration of Diffusion Models
Junhyuk So, Jungwon Lee, Eunhyeok Park
Instant 3D Human Avatar Generation using Image Diffusion Models
Nikos Kolotouros, Thiemo Alldieck, Enric Corona et al.
DG-PIC: Domain Generalized Point-In-Context Learning for Point Cloud Understanding
Jincen Jiang, Qianyu Zhou, Yuhang Li et al.
CardiacNet: Learning to Reconstruct Abnormalities for Cardiac Disease Assessment from Echocardiogram Videos
Jiewen Yang, Yiqun Lin, Bin Pu et al.
Unveiling Typographic Deceptions: Insights of the Typographic Vulnerability in Large Vision-Language Models
Hao Cheng, Erjia Xiao, Jindong Gu et al.
Faceptor: A Generalist Model for Face Perception
Lixiong Qin, Mei Wang, Xuannan Liu et al.
Dynamic Retraining-Updating Mean Teacher for Source-Free Object Detection
Ba Khanh Trinh Le, Huy-Hung Nguyen, Long Hoang Pham et al.
Diffusion Bridges for 3D Point Cloud Denoising
Mathias Vogel, Keisuke Tateno, Marc Pollefeys et al.
Animal Avatars: Reconstructing Animatable 3D Animals from Casual Videos
Remy Sabathier, David Novotny, Niloy Mitra
Learning Camouflaged Object Detection from Noisy Pseudo Label
Jin Zhang, Ruiheng Zhang, Yanjiao Shi et al.
Tri²-plane: Thinking Head Avatar via Feature Pyramid
Luchuan Song, Pinxin Liu, Lele Chen et al.
Large-Scale Multi-Hypotheses Cell Tracking Using Ultrametric Contours Maps
Jordao Bragantini, Merlin Lange, Loïc A Royer
VEON: Vocabulary-Enhanced Occupancy Prediction
Jilai Zheng, Pin Tang, Zhongdao Wang et al.
Enhancing Source-Free Domain Adaptive Object Detection with Low-confidence Pseudo Label Distillation
Ilhoon Yoon, Hyeongjun Kwon, Jin Kim et al.
PixOOD: Pixel-Level Out-of-Distribution Detection
Tomas Vojir, Jan Sochman, Jiri Matas
ColorMNet: A Memory-based Deep Spatial-Temporal Feature Propagation Network for Video Colorization
Yixin Yang, Jiangxin Dong, Jinhui Tang et al.
Reinforcement Learning Friendly Vision-Language Model for Minecraft
Haobin Jiang, Junpeng Yue, Hao Luo et al.
Open Panoramic Segmentation
Junwei Zheng, Ruiping Liu, Yufan Chen et al.
AugUndo: Scaling Up Augmentations for Monocular Depth Completion and Estimation
Yangchao Wu, Tian Yu Liu, Hyoungseob Park et al.
The First to Know: How Token Distributions Reveal Hidden Knowledge in Large Vision-Language Models?
Qinyu Zhao, Ming Xu, Kartik Gupta et al.
A Simple Background Augmentation Method for Object Detection with Diffusion Model
Yuhang Li, Xin Dong, Chen Chen et al.
Long-term Temporal Context Gathering for Neural Video Compression
Linfeng Qi, Zhaoyang Jia, Jiahao Li et al.
TetraDiffusion: Tetrahedral Diffusion Models for 3D Shape Generation
Nikolai Kalischek, Torben Peters, Jan Dirk Wegner et al.
EvSign: Sign Language Recognition and Translation with Streaming Events
Pengyu Zhang, Hao Yin, Zeren Wang et al.
Norface: Improving Facial Expression Analysis by Identity Normalization
Hanwei Liu, Rudong An, Zhimeng Zhang et al.
Iterative Ensemble Training with Anti-Gradient Control for Mitigating Memorization in Diffusion Models
Xiao Liu, Xiaoliu Guan, Yu Wu et al.
Improving Text-guided Object Inpainting with Semantic Pre-inpainting
Yifu Chen, Jingwen Chen, Yingwei Pan et al.
Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation
Shihao Zhao, Shaozhe Hao, Bojia Zi et al.