Most Cited ECCV "explanation quality" Papers
Denoising Vision Transformers
Jiawei Yang, Katie Luo, Jiefeng Li et al.
An Economic Framework for 6-DoF Grasp Detection
Xiao-Ming Wu, Jia-Feng Cai, Jian-Jian Jiang et al.
Gaussian Splatting on the Move: Blur and Rolling Shutter Compensation for Natural Camera Motion
Otto Seiskari, Jerry Ylilammi, Valtteri Kaatrasalo et al.
Griffon: Spelling out All Object Locations at Any Granularity with Large Language Models
Yufei Zhan, Yousong Zhu, Zhiyang Chen et al.
MVPGS: Excavating Multi-view Priors for Gaussian Splatting from Sparse Input Views
Wangze Xu, Huachen Gao, Shihe Shen et al.
Put Myself in Your Shoes: Lifting the Egocentric Perspective from Exocentric Videos
Mi Luo, Zihui Xue, Alex Dimakis et al.
Masked Generative Video-to-Audio Transformers with Enhanced Synchronicity
Santiago Pascual, Chunghsin Yeh, Ioannis Tsiamas et al.
Hierarchical Gaussian Mixture Normalizing Flow Modeling for Unified Anomaly Detection
Xincheng Yao, Ruoqi Li, Zefeng Qian et al.
View Selection for 3D Captioning via Diffusion Ranking
Tiange Luo, Justin Johnson, Honglak Lee
VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks
Xiangxiang Chu, Jianlin Su, Bo Zhang et al.
Agent3D-Zero: An Agent for Zero-shot 3D Understanding
Sha Zhang, Di Huang, Jiajun Deng et al.
N2F2: Hierarchical Scene Understanding with Nested Neural Feature Fields
Yash Bhalgat, Iro Laina, Joao F Henriques et al.
MapDistill: Boosting Efficient Camera-based HD Map Construction via Camera-LiDAR Fusion Model Distillation
Xiaoshuai Hao, Ruikai Li, Hui Zhang et al.
Diffusion-Generated Pseudo-Observations for High-Quality Sparse-View Reconstruction
Xinhang Liu, Jiaben Chen, Shiu-Hong Kao et al.
Meta-Prompting for Automating Zero-shot Visual Recognition with LLMs
Muhammad Jehanzeb Mirza, Leonid Karlinsky, Wei Lin et al.
UMBRAE: Unified Multimodal Brain Decoding
Weihao Xia, Raoul de Charette, Cengiz Oztireli et al.
Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation
Fu-Yun Wang, Xiaoshi Wu, Zhaoyang Huang et al.
PosterLlama: Bridging Design Ability of Language Model to Content-Aware Layout Generation
Jaejung Seol, Seojun Kim, Jaejun Yoo
EventBind: Learning a Unified Representation to Bind Them All for Event-based Open-world Understanding
Jiazhou Zhou, Xu Zheng, Yuanhuiyi Lyu et al.
CAT-SAM: Conditional Tuning for Few-Shot Adaptation of Segment Anything Model
Aoran Xiao, Weihao Xuan, Heli Qi et al.
Soft Prompt Generation for Domain Generalization
Shuanghao Bai, Yuedi Zhang, Wanqi Zhou et al.
DragAPart: Learning a Part-Level Motion Prior for Articulated Objects
Ruining Li, Chuanxia Zheng, Christian Rupprecht et al.
BAM-DETR: Boundary-Aligned Moment Detection Transformer for Temporal Sentence Grounding in Videos
Pilhyeon Lee, Hyeran Byun
WiMANS: A Benchmark Dataset for WiFi-based Multi-user Activity Sensing
Shuokang Huang, Kaihan Li, Di You et al.
DiffClass: Diffusion-Based Class Incremental Learning
Zichong Meng, Jie Zhang, Changdi Yang et al.
Learning Natural Consistency Representation for Face Forgery Video Detection
Daichi Zhang, Zihao Xiao, Shikun Li et al.
HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal Reasoning
Zhecan Wang, Garrett Bingham, Adams Wei Yu et al.
Embracing Events and Frames with Hierarchical Feature Refinement Network for Object Detection
Hu Cao, Zehua Zhang, Yan Xia et al.
Video Editing via Factorized Diffusion Distillation
Uriel Singer, Amit Zohar, Yuval Kirstain et al.
I-MedSAM: Implicit Medical Image Segmentation with Segment Anything
Xiaobao Wei, Jiajun Cao, Yizhu Jin et al.
Diffusion Models for Monocular Depth Estimation: Overcoming Challenging Conditions
Fabio Tosi, Pierluigi Zama Ramirez, Matteo Poggi
Four Ways to Improve Verbo-visual Fusion for Dense 3D Visual Grounding
Ozan Unal, Christos Sakaridis, Suman Saha et al.
Dataset Distillation by Automatic Training Trajectories
Dai Liu, Jindong Gu, Hu Cao et al.
Boosting the Power of Small Multimodal Reasoning Models to Match Larger Models with Self-Consistency Training
Cheng Tan, Jingxuan Wei, Zhangyang Gao et al.
Nuvo: Neural UV Mapping for Unruly 3D Representations
Pratul Srinivasan, Stephan J Garbin, Dor Verbin et al.
MTMamba: Enhancing Multi-Task Dense Scene Understanding by Mamba-Based Decoders
Baijiong Lin, Weisen Jiang, Pengguang Chen et al.
Personalized Federated Domain-Incremental Learning based on Adaptive Knowledge Matching
Yichen Li, Wenchao Xu, Haozhao Wang et al.
PPAD: Iterative Interactions of Prediction and Planning for End-to-end Autonomous Driving
Zhili Chen, Maosheng Ye, Shuangjie Xu et al.
WHAC: World-grounded Humans and Cameras
Wanqi Yin, Zhongang Cai, Chen Wei et al.
HIMO: A New Benchmark for Full-Body Human Interacting with Multiple Objects
Xintao Lv, Liang Xu, Yichao Yan et al.
Human Hair Reconstruction with Strand-Aligned 3D Gaussians
Egor Zakharov, Vanessa Sklyarova, Michael J. Black et al.
Learning Unified Reference Representation for Unsupervised Multi-class Anomaly Detection
Liren He, Zhengkai Jiang, Jinlong Peng et al.
The Nerfect Match: Exploring NeRF Features for Visual Localization
Qunjie Zhou, Maxim Maximov, Or Litany et al.
T2IShield: Defending Against Backdoors on Text-to-Image Diffusion Models
Zhongqi Wang, Jie Zhang, Shiguang Shan et al.
VEGS: View Extrapolation of Urban Scenes in 3D Gaussian Splatting using Learned Priors
Sungwon Hwang, Min-Jung Kim, Taewoong Kang et al.
Image Compression for Machine and Human Vision With Spatial-Frequency Adaptation
Han Li, Shaohui Li, Shuangrui Ding et al.
4D Contrastive Superflows are Dense 3D Representation Learners
Xiang Xu, Lingdong Kong, Hui Shuai et al.
Trackastra: Transformer-based cell tracking for live-cell microscopy
Benjamin Gallusser, Martin Weigert
Occupancy as Set of Points
Yiang Shi, Tianheng Cheng, Qian Zhang et al.
Cascade-Zero123: One Image to Highly Consistent 3D with Self-Prompted Nearby Views
Yabo Chen, Jiemin Fang, Yuyang Huang et al.
Zero-shot Object Counting with Good Exemplars
Huilin Zhu, Jingling Yuan, Zhengwei Yang et al.
Learning Modality-agnostic Representation for Semantic Segmentation from Any Modalities
Xu Zheng, Yuanhuiyi Lyu, Lin Wang
MeshAvatar: Learning High-quality Triangular Human Avatars from Multi-view Videos
Yushuo Chen, Zerong Zheng, Zhe Li et al.
AccDiffusion: An Accurate Method for Higher-Resolution Image Generation
Zhihang Lin, Mingbao Lin, Meng Zhao et al.
Language-Driven Physics-Based Scene Synthesis and Editing via Feature Splatting
Ri-Zhao Qiu, Ge Yang, Weijia Zeng et al.
SphereHead: Stable 3D Full-head Synthesis with Spherical Tri-plane Representation
Heyuan Li, Ce Chen, Tianhao Shi et al.
DGR-MIL: Exploring Diverse Global Representation in Multiple Instance Learning for Whole Slide Image Classification
Wenhui Zhu, Xiwen Chen, Peijie Qiu et al.
Fast Context-Based Low-Light Image Enhancement via Neural Implicit Representations
Tomáš Chobola, Yu Liu, Hanyi Zhang et al.
Attention Prompting on Image for Large Vision-Language Models
Runpeng Yu, Weihao Yu, Xinchao Wang
Exploring Conditional Multi-Modal Prompts for Zero-shot HOI Detection
Ting Lei, Shaofeng Yin, Yuxin Peng et al.
GeneralAD: Anomaly Detection Across Domains by Attending to Distorted Features
Luc Sträter, Mohammadreza Salehi, Efstratios Gavves et al.
OphNet: A Large-Scale Video Benchmark for Ophthalmic Surgical Workflow Understanding
Ming Hu, Peng Xia, Lin Wang et al.
Masked Angle-Aware Autoencoder for Remote Sensing Images
Zhihao Li, Biao Hou, Siteng Ma et al.
DreamMotion: Space-Time Self-Similar Score Distillation for Zero-Shot Video Editing
Hyeonho Jeong, Jinho Chang, Geon Yeong Park et al.
Do text-free diffusion models learn discriminative visual representations?
Soumik Mukhopadhyay, Matthew Gwilliam, Yosuke Yamaguchi et al.
ScaleDreamer: Scalable Text-to-3D Synthesis with Asynchronous Score Distillation
Zhiyuan Ma, Yuxiang Wei, Yabin Zhang et al.
MART: MultiscAle Relational Transformer Networks for Multi-agent Trajectory Prediction
Seongju Lee, Junseok Lee, Yeonguk Yu et al.
Tuning-Free Image Customization with Image and Text Guidance
Pengzhi Li, Qiang Nie, Ying Chen et al.
Safe-Sim: Safety-Critical Closed-Loop Traffic Simulation with Diffusion-Controllable Adversaries
Wei-Jer Chang, Francesco Pittaluga, Masayoshi Tomizuka et al.
Few-shot Class Incremental Learning with Attention-Aware Self-Adaptive Prompt
Chenxi Liu, Zhenyi Wang, Tianyi Xiong et al.
FairDomain: Achieving Fairness in Cross-Domain Medical Image Segmentation and Classification
Yu Tian, Congcong Wen, Min Shi et al.
StableDrag: Stable Dragging for Point-based Image Editing
Yutao Cui, Xiaotong Zhao, Guozhen Zhang et al.
SAM-COD: SAM-guided Unified Framework for Weakly-Supervised Camouflaged Object Detection
Huafeng Chen, Pengxu Wei, Guangqian Guo et al.
ViewFormer: Exploring Spatiotemporal Modeling for Multi-View 3D Occupancy Perception via View-Guided Transformers
Jinke Li, Xiao He, Chonghua Zhou et al.
Multistain Pretraining for Slide Representation Learning in Pathology
Guillaume Jaume, Anurag J Vaidya, Andrew Zhang et al.
Progressive Pretext Task Learning for Human Trajectory Prediction
Xiaotong Lin, Tianming Liang, Jian-Huang Lai et al.
Benchmarking Object Detectors with COCO: A New Path Forward
Shweta Singh, Aayan Yadav, Jitesh Jain et al.
Dolfin: Diffusion Layout Transformers without Autoencoder
Yilin Wang, Zeyuan Chen, Liangjun Zhong et al.
Make-Your-3D: Fast and Consistent Subject-Driven 3D Content Generation
Fangfu Liu, Hanyang Wang, Weiliang Chen et al.
SWinGS: Sliding Windows for Dynamic 3D Gaussian Splatting
Richard Shaw, Michal Nazarczuk, Song Jifei et al.
Efficient Diffusion Transformer with Step-wise Dynamic Attention Mediators
Yifan Pu, Xia Zhuofan, Jiayi Guo et al.
Towards Natural Language-Guided Drones: GeoText-1652 Benchmark with Spatial Relation Matching
Meng Chu, Zhedong Zheng, Wei Ji et al.
Seeing the Unseen: A Frequency Prompt Guided Transformer for Image Restoration
Shihao Zhou, Jinshan Pan, Jinglei Shi et al.
SignAvatars: A Large-scale 3D Sign Language Holistic Motion Dataset and Benchmark
Zhengdi Yu, Shaoli Huang, Yongkang Cheng et al.
Getting it Right: Improving Spatial Consistency in Text-to-Image Models
Agneet Chatterjee, Gabriela Ben Melech Stan, Estelle Guez Aflalo et al.
Prioritized Semantic Learning for Zero-shot Instance Navigation
Xinyu Sun, Lizhao Liu, Hongyan Zhi et al.
HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning
Fucai Ke, Zhixi Cai, Simindokht Jahangard et al.
PYRA: Parallel Yielding Re-Activation for Training-Inference Efficient Task Adaptation
Yizhe Xiong, Hui Chen, Tianxiang Hao et al.
Enhancing Diffusion Models with Text-Encoder Reinforcement Learning
Chaofeng Chen, Annan Wang, Haoning Wu et al.
LEGO: Learning EGOcentric Action Frame Generation via Visual Instruction Tuning
Bolin Lai, Xiaoliang Dai, Lawrence Chen et al.
Efficient Inference of Vision Instruction-Following Models with Elastic Cache
Zuyan Liu, Benlin Liu, Jiahui Wang et al.
In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation
Dahyun Kang, Minsu Cho
GeoCalib: Learning Single-image Calibration with Geometric Optimization
Alexander Veicht, Paul-Edouard Sarlin, Philipp Lindenberger et al.
MTKD: Multi-Teacher Knowledge Distillation for Image Super-Resolution
Yuxuan Jiang, Chen Feng, Fan Zhang et al.
WSI-VQA: Interpreting Whole Slide Images by Generative Visual Question Answering
Pingyi Chen, Chenglu Zhu, Sunyi Zheng et al.
Enhancing Vectorized Map Perception with Historical Rasterized Maps
Xiaoyu Zhang, Guangwei Liu, Zihao Liu et al.
A Closer Look at GAN Priors: Exploiting Intermediate Features for Enhanced Model Inversion Attacks
Yixiang Qiu, Hao Fang, Hongyao Yu et al.
OpenSight: A Simple Open-Vocabulary Framework for LiDAR-Based Object Detection
Hu Zhang, Xu Jianhua, Tao Tang et al.
LLMGA: Multimodal Large Language Model based Generation Assistant
Bin Xia, Shiyin Wang, Yingfan Tao et al.
Improving Agent Behaviors with RL Fine-tuning for Autonomous Driving
Zhenghao Peng, Wenjie Luo, Yiren Lu et al.
SlimFlow: Training Smaller One-Step Diffusion Models with Rectified Flow
Yuanzhi Zhu, Xingchao Liu, Qiang Liu
Toward Open Vocabulary Aerial Object Detection with CLIP-Activated Student-Teacher Learning
Yan Li, Weiwei Guo, Xue Yang et al.
TrojVLM: Backdoor Attack Against Vision Language Models
Weimin Lyu, Lu Pang, Tengfei Ma et al.
MUSES: The Multi-Sensor Semantic Perception Dataset for Driving under Uncertainty
Tim Broedermann, David Brüggemann, Christos Sakaridis et al.
Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes
Yaoting Wang, Peiwen Sun, Dongzhan Zhou et al.
SparseLIF: High-Performance Sparse LiDAR-Camera Fusion for 3D Object Detection
Hongcheng Zhang, Liu Liang, Pengxin Zeng et al.
Mask2Map: Vectorized HD Map Construction Using Bird's Eye View Segmentation Masks
Sehwan Choi, Jun Won Choi, Jungho Kim et al.
Bad Students Make Great Teachers: Active Learning Accelerates Large-Scale Visual Understanding
Talfan Evans, Shreya Pathak, Hamza Merzic et al.
Self-supervised Feature Adaptation for 3D Industrial Anomaly Detection
Yuanpeng Tu, Boshen Zhang, Liang Liu et al.
Isomorphic Pruning for Vision Models
Gongfan Fang, Xinyin Ma, Michael Bi Mi et al.
Unleashing the Power of Prompt-driven Nucleus Instance Segmentation
Zhongyi Shui, Yunlong Zhang, Kai Yao et al.
Causality-inspired Discriminative Feature Learning in Triple Domains for Gait Recognition
Haijun Xiong, Bin Feng, Xinggang Wang et al.
EgoPoser: Robust Real-Time Egocentric Pose Estimation from Sparse and Intermittent Observations Everywhere
Jiaxi Jiang, Paul Streli, Manuel Meier et al.
Beyond the Contact: Discovering Comprehensive Affordance for 3D Objects from Pre-trained 2D Diffusion Models
Hyeonwoo Kim, Sookwan Han, Patrick Kwon et al.
AdaLog: Post-Training Quantization for Vision Transformers with Adaptive Logarithm Quantizer
Zhuguanyu Wu, Jiaxin Chen, Hanwen Zhong et al.
Cascade Prompt Learning for Visual-Language Model Adaptation
Ge Wu, Xin Zhang, Zheng Li et al.
Score Distillation Sampling with Learned Manifold Corrective
Thiemo Alldieck, Nikos Kolotouros, Cristian Sminchisescu
DyFADet: Dynamic Feature Aggregation for Temporal Action Detection
Le Yang, Ziwei Zheng, Yizeng Han et al.
LISO: Lidar-only Self-Supervised 3D Object Detection
Stefan Baur, Frank Moosmann, Andreas Geiger
VideoMamba: Spatio-Temporal Selective State Space Model
Jinyoung Park, Hee-Seon Kim, Kangwook Ko et al.
SQ-LLaVA: Self-Questioning for Large Vision-Language Assistant
Guohao Sun, Can Qin, Jiaminan Wang et al.
Semantic Residual Prompts for Continual Learning
Martin Menabue, Emanuele Frascaroli, Matteo Boschini et al.
FlexiEdit: Frequency-Aware Latent Refinement for Enhanced Non-Rigid Editing
Gwanhyeong Koo, Sunjae Yoon, Ji Woo Hong et al.
Diffusion Soup: Model Merging for Text-to-Image Diffusion Models
Benjamin J Biggs, Arjun Seshadri, Yang Zou et al.
OccGen: Generative Multi-modal 3D Occupancy Prediction for Autonomous Driving
Guoqing Wang, Zhongdao Wang, Pin Tang et al.
TIP: Tabular-Image Pre-training for Multimodal Classification with Incomplete Data
Siyi Du, Shaoming Zheng, Yinsong Wang et al.
Photorealistic Object Insertion with Diffusion-Guided Inverse Rendering
Ruofan Liang, Zan Gojcic, Merlin Nimier-David et al.
PreciseControl: Enhancing Text-To-Image Diffusion Models with Fine-Grained Attribute Control
Rishubh Parihar, Sachidanand VS, Sabariswaran Mani et al.
Facial Affective Behavior Analysis with Instruction Tuning
Yifan Li, Anh Dao, Wentao Bao et al.
Text-Conditioned Resampler For Long Form Video Understanding
Bruno Korbar, Yongqin Xian, Alessio Tonioni et al.
Fast Diffusion-Based Counterfactuals for Shortcut Removal and Generation
Nina Weng, Paraskevas Pegios, Eike Petersen et al.
G3R: Gradient Guided Generalizable Reconstruction
Yun Chen, Jingkang Wang, Ze Yang et al.
EAS-SNN: End-to-End Adaptive Sampling and Representation for Event-based Detection with Recurrent Spiking Neural Networks
Ziming Wang, Ziling Wang, Huaning Li et al.
RealViformer: Investigating Attention for Real-World Video Super-Resolution
Yuehan Zhang, Angela Yao
F-HOI: Toward Fine-grained Semantic-Aligned 3D Human-Object Interactions
Jie Yang, Xuesong Niu, Nan Jiang et al.
Object-Centric Diffusion for Efficient Video Editing
Kumara Kahatapitiya, Adil Karjauv, Davide Abati et al.
StyleTokenizer: Defining Image Style by a Single Instance for Controlling Diffusion Models
Wen Li, Muyuan Fang, Cheng Zou et al.
Region-Adaptive Transform with Segmentation Prior for Image Compression
Yuxi Liu, Wenhan Yang, Huihui Bai et al.
FocusDiffuser: Perceiving Local Disparities for Camouflaged Object Detection
Jianwei Zhao, Xin Li, Fan Yang et al.
Defect Spectrum: A Granular Look of Large-scale Defect Datasets with Rich Semantics
Shuai Yang, ZhiFei Chen, Pengguang Chen et al.
Teaching Tailored to Talent: Adverse Weather Restoration via Prompt Pool and Depth-Anything Constraint
Sixiang Chen, Tian Ye, Kai Zhang et al.
HiDiffusion: Unlocking Higher-Resolution Creativity and Efficiency in Pretrained Diffusion Models
Shen Zhang, Zhaowei Chen, Zhenyu Zhao et al.
ControlNet-XS: Rethinking the Control of Text-to-Image Diffusion Models as Feedback-Control Systems
Denis Zavadski, Johann-Friedrich Feiden, Carsten Rother
EgoExo-Fitness: Towards Egocentric and Exocentric Full-Body Action Understanding
Yuanming Li, Wei-Jin Huang, An-Lan Wang et al.
Improving Medical Multi-modal Contrastive Learning with Expert Annotations
Yogesh Kumar, Pekka Marttinen
Meerkat: Audio-Visual Large Language Model for Grounding in Space and Time
Sanjoy Chowdhury, Sayan Nag, Subhrajyoti Dasgupta et al.
DataDream: Few-shot Guided Dataset Generation
Jae Myung Kim, Jessica Bader, Stephan Alaniz et al.
Switch Diffusion Transformer: Synergizing Denoising Tasks with Sparse Mixture-of-Experts
Byeongjun Park, Hyojun Go, Jin-Young Kim et al.
milliFlow: Scene Flow Estimation on mmWave Radar Point Cloud for Human Motion Sensing
Fangqiang Ding, Zhen Luo, Peijun Zhao et al.
PALM: Predicting Actions through Language Models
Sanghwan Kim, Daoji Huang, Yongqin Xian et al.
Stepping Stones: A Progressive Training Strategy for Audio-Visual Semantic Segmentation
Juncheng Ma, Peiwen Sun, Yaoting Wang et al.
Revisit Anything: Visual Place Recognition via Image Segment Retrieval
Kartik Garg, Sai Shubodh Puligilla, Shishir N Y Kolathaya et al.
GiT: Towards Generalist Vision Transformer through Universal Language Interface
Haiyang Wang, Hao Tang, Li Jiang et al.
An Incremental Unified Framework for Small Defect Inspection
Jiaqi Tang, Hao Lu, Xiaogang Xu et al.
Online Zero-Shot Classification with CLIP
Qi Qian, Juhua Hu
Factorized Diffusion: Perceptual Illusions by Noise Decomposition
Daniel Geng, Inbum Park, Andrew Owens
PromptFusion: Decoupling Stability and Plasticity for Continual Learning
Haoran Chen, Zuxuan Wu, Xintong Han et al.
ViLA: Efficient Video-Language Alignment for Video Question Answering
Xijun Wang, Junbang Liang, Chun-Kai Wang et al.
PreSight: Enhancing Autonomous Vehicle Perception with City-Scale NeRF Priors
Tianyuan Yuan, Mao Yucheng, Jiawei Yang et al.
IRGen: Generative Modeling for Image Retrieval
Yidan Zhang, Ting Zhang, Dong Chen et al.
DIFFender: Diffusion-Based Adversarial Defense against Patch Attacks
Caixin Kang, Yinpeng Dong, Zhengyi Wang et al.
Robust Calibration of Large Vision-Language Adapters
Balamurali Murugesan, Julio Silva-Rodríguez, Ismail Ben Ayed et al.
SLEDGE: Synthesizing Driving Environments with Generative Models and Rule-Based Traffic
Kashyap Chitta, Daniel Dauner, Andreas Geiger
Collaborative Control for Geometry-Conditioned PBR Image Generation
Shimon Vainer, Mark Boss, Mathias Parger et al.
Reliability in Semantic Segmentation: Can We Use Synthetic Data?
Thibaut Loiseau, Tuan Hung Vu, Mickael Chen et al.
Continuous Memory Representation for Anomaly Detection
Joo Chan Lee, Taejune Kim, Eunbyung Park et al.
Eliminating Feature Ambiguity for Few-Shot Segmentation
Qianxiong Xu, Guosheng Lin, Chen Change Loy et al.
OGNI-DC: Robust Depth Completion with Optimization-Guided Neural Iterations
Yiming Zuo, Jia Deng
RadEdit: stress-testing biomedical vision models via diffusion image editing
Fernando Pérez-García, Sam Bond-Taylor, Pedro Sanchez et al.
VisionTrap: Vision-Augmented Trajectory Prediction Guided by Textual Descriptions
Seokha Moon, Hyun Woo, Hongbeen Park et al.
AddressCLIP: Empowering Vision-Language Models for City-wide Image Address Localization
Shixiong Xu, Chenghao Zhang, Lubin Fan et al.
SEED: A Simple and Effective 3D DETR in Point Clouds
Zhe Liu, Jinghua Hou, Xiaoqing Ye et al.
SelEx: Self-Expertise in Fine-Grained Generalized Category Discovery
Sarah Rastegar, Mohammadreza Salehi, Yuki M Asano et al.
Videoshop: Localized Semantic Video Editing with Noise-Extrapolated Diffusion Inversion
Xiang Fan, Anand Bhattad, Ranjay Krishna
Robust-Wide: Robust Watermarking against Instruction-driven Image Editing
Runyi Hu, Jie Zhang, Ting Xu et al.
3D Hand Pose Estimation in Everyday Egocentric Images
Aditya Prakash, Ruisen Tu, Matthew Chang et al.
NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields
Muhammad Zubair Irshad, Sergey Zakharov, Vitor Guizilini et al.
DIM: Dyadic Interaction Modeling for Social Behavior Generation
Minh Tran, Di Chang, Maksim Siniukov et al.
MotionChain: Conversational Motion Controllers via Multimodal Prompts
Biao Jiang, Xin Chen, Chi Zhang et al.
Text2LiDAR: Text-guided LiDAR Point Clouds Generation via Equirectangular Transformer
Yang Wu, Kaihua Zhang, Jianjun Qian et al.
Visible and Clear: Finding Tiny Objects in Difference Map
Bing Cao, Haiyu Yao, Pengfei Zhu et al.
One-Shot Diffusion Mimicker for Handwritten Text Generation
Gang Dai, Yifan Zhang, Quhui Ke et al.
FedRA: A Random Allocation Strategy for Federated Tuning to Unleash the Power of Heterogeneous Clients
Shangchao Su, Bin Li, Xiangyang Xue
Restore Anything with Masks: Leveraging Mask Image Modeling for Blind All-in-One Image Restoration
Chu Jie Qin, Ruiqi Wu, Zikun Liu et al.
Is Retain Set All You Need in Machine Unlearning? Restoring Performance of Unlearned Models with Out-Of-Distribution Images
Jacopo Bonato, Marco Cotogni, Luigi Sabetta
A Simple Baseline for Spoken Language to Sign Language Translation with 3D Avatars
Ronglai Zuo, Fangyun Wei, Zenggui Chen et al.
Match-Stereo-Videos: Bidirectional Alignment for Consistent Dynamic Stereo Matching
Junpeng Jing, Ye Mao, Krystian Mikolajczyk
Textual Query-Driven Mask Transformer for Domain Generalized Segmentation
Byeonghyun Pak, Byeongju Woo, Sunghwan Kim et al.
AMEGO: Active Memory from long EGOcentric videos
Gabriele Goletto, Tushar Nagarajan, Giuseppe Averta et al.
ZeST: Zero-Shot Material Transfer from a Single Image
Ta-Ying Cheng, Prafull Sharma, Andrew Markham et al.
Navigation Instruction Generation with BEV Perception and Large Language Models
Sheng Fan, Rui Liu, Wenguan Wang et al.
WTS: A Pedestrian-Centric Traffic Video Dataset for Fine-grained Spatial-Temporal Understanding
Quan Kong, Yuki Kawana, Rajat Saini et al.
Few-Shot Anomaly-Driven Generation for Anomaly Classification and Segmentation
Guan Gui, Bin-Bin Gao, Jun Liu et al.
Protecting NeRFs' Copyright via Plug-And-Play Watermarking Base Model
Qi Song, Ziyuan Luo, Ka Chun Cheung et al.
Training-free Video Temporal Grounding using Large-scale Pre-trained Models
Minghang Zheng, Xinhao Cai, Qingchao Chen et al.
LayoutFlow: Flow Matching for Layout Generation
Julian Jorge Andrade Guerreiro, Naoto Inoue, Kento Masui et al.
EditShield: Protecting Unauthorized Image Editing by Instruction-guided Diffusion Models
Ruoxi Chen, Haibo Jin, Yixin Liu et al.
CadVLM: Bridging Language and Vision in the Generation of Parametric CAD Sketches
Sifan Wu, Amir Hosein Khasahmadi, Mor Katz et al.
SAGS: Structure-Aware 3D Gaussian Splatting
Evangelos Ververas, Rolandos Alexandros Potamias, Song Jifei et al.
HERGen: Elevating Radiology Report Generation with Longitudinal Data
Fuying Wang, Shenghui Du, Lequan Yu