Most Cited ECCV "watermark accessibility" Papers
Towards Stable 3D Object Detection
Jiabao Wang, Qiang Meng, Guochao Liu et al.
HO-Gaussian: Hybrid Optimization of 3D Gaussian Splatting for Urban Scenes
Zhuopeng Li, Yilin Zhang, Chenming Wu et al.
KeypointDETR: An End-to-End 3D Keypoint Detector
Hairong Jin, Yuefan Shen, Jianwen Lou et al.
Generating Human Interaction Motions in Scenes with Text Control
Hongwei Yi, Justus Thies, Michael J. Black et al.
Optimizing Illuminant Estimation in Dual-Exposure HDR Imaging
Mahmoud Afifi, Zhenhua Hu, Liang Liang
Revisit Human-Scene Interaction via Space Occupancy
Xinpeng Liu, Haowen Hou, Yanchao Yang et al.
Progressive Proxy Anchor Propagation for Unsupervised Semantic Segmentation
Hyun Seok Seong, WonJun Moon, SuBeen Lee et al.
Multi-branch Collaborative Learning Network for 3D Visual Grounding
Zhipeng Qian, Yiwei Ma, Zhekai Lin et al.
FLAT: Flux-aware Imperceptible Adversarial Attacks on 3D Point Clouds
Keke Tang, Lujie Huang, Weilong Peng et al.
Instruction Tuning-free Visual Token Complement for Multimodal LLMs
Dongsheng Wang, Jiequan Cui, Miaoge Li et al.
Improving Point-based Crowd Counting and Localization Based on Auxiliary Point Guidance
I-Hsiang Chen, Wei-Ting Chen, Yu-Wei Liu et al.
JointDreamer: Ensuring Geometry Consistency and Text Congruence in Text-to-3D Generation via Joint Score Distillation
ChenHan Jiang, Yihan Zeng, Tianyang Hu et al.
Chat-Edit-3D: Interactive 3D Scene Editing via Text Prompts
Shuangkang Fang, Yufeng Wang, Yi-Hsuan Tsai et al.
Online Temporal Action Localization with Memory-Augmented Transformer
Youngkil Song, Dongkeun Kim, Minsu Cho et al.
Disentangled Generation and Aggregation for Robust Radiance Fields
Shihe Shen, Huachen Gao, Wangze Xu et al.
MANIKIN: Biomechanically Accurate Neural Inverse Kinematics for Human Motion Estimation
Jiaxi Jiang, Paul Streli, Xuejing Luo et al.
Click-Gaussian: Interactive Segmentation to Any 3D Gaussians
Seokhun Choi, Hyeonseop Song, Jaechul Kim et al.
Online Vectorized HD Map Construction using Geometry
Zhixin Zhang, Yiyuan Zhang, Xiaohan Ding et al.
HybridBooth: Hybrid Prompt Inversion for Efficient Subject-Driven Generation
Shanyan Guan, Yanhao Ge, Ying Tai et al.
Diffusion-Guided Weakly Supervised Semantic Segmentation
Sung-Hoon Yoon, Hoyong Kwon, Jaeseok Jeong et al.
Panel-Specific Degradation Representation for Raw Under-Display Camera Image Restoration
Youngjin Oh, Keuntek Lee, Jooyoung Lee et al.
SC4D: Sparse-Controlled Video-to-4D Generation and Motion Transfer
Zijie Wu, Chaohui Yu, Yanqin Jiang et al.
Fully Authentic Visual Question Answering Dataset from Online Communities
Chongyan Chen, Mengchen Liu, Noel C Codella et al.
CoMusion: Towards Consistent Stochastic Human Motion Prediction via Motion Diffusion
Jiarui Sun, Girish Chowdhary
Real-data-driven 2000 FPS Color Video from Mosaicked Chromatic Spikes
Siqi Yang, Zhaojun Huang, Yakun Chang et al.
Revisit Self-supervision with Local Structure-from-Motion
Shengjie Zhu, Xiaoming Liu
On the Viability of Monocular Depth Pre-training for Semantic Segmentation
Dong Lao, Fengyu Yang, Daniel Wang et al.
Weakly-supervised Camera Localization by Ground-to-satellite Image Registration
Yujiao Shi, Hongdong Li, Akhil Perincherry et al.
GLARE: Low Light Image Enhancement via Generative Latent Feature based Codebook Retrieval
Han Zhou, Wei Dong, Xiaohong Liu et al.
Open-Vocabulary Camouflaged Object Segmentation
Youwei Pang, Xiaoqi Zhao, JiaMing Zuo et al.
ProMerge: Prompt and Merge for Unsupervised Instance Segmentation
Dylan Li, Gyungin Shin
GraphBEV: Towards Robust BEV Feature Alignment for Multi-Modal 3D Object Detection
Ziying Song, Lei Yang, Shaoqing Xu et al.
Revisiting Feature Disentanglement Strategy in Diffusion Training and Breaking Conditional Independence Assumption in Sampling
Wonwoong Cho, Hareesh Ravi, Midhun Harikumar et al.
ProtoComp: Diverse Point Cloud Completion with Controllable Prototype
Xumin Yu, Yanbo Wang, Jie Zhou et al.
IAM-VFI: Interpolate Any Motion for Video Frame Interpolation with motion complexity map
Kihwan Yoon, Yong Han Kim, Sungjei Kim et al.
Co-Student: Collaborating Strong and Weak Students for Sparsely Annotated Object Detection
Lianjun Wu, Jiangxiao Han, Zengqiang Zheng et al.
Topo4D: Topology-Preserving Gaussian Splatting for High-Fidelity 4D Head Capture
Xuanchen Li, Yuhao Cheng, Xingyu Ren et al.
EDTalk: Efficient Disentanglement for Emotional Talking Head Synthesis
Shuai Tan, Bin Ji, Mengxiao Bi et al.
Geospecific View Generation - Geometry-Context Aware High-resolution Ground View Inference from Satellite Views
Ningli Xu, Rongjun Qin
Put Myself in Your Shoes: Lifting the Egocentric Perspective from Exocentric Videos
Mi Luo, Zihui Xue, Alex Dimakis et al.
TexDreamer: Towards Zero-Shot High-Fidelity 3D Human Texture Generation
Yufei Liu, Junwei Zhu, Junshu Tang et al.
Privacy-Preserving Adaptive Re-Identification without Image Transfer
Hamza Rami, Jhony H. Giraldo, Nicolas Winckler et al.
LivePhoto: Real Image Animation with Text-guided Motion Control
Xi Chen, Zhiheng Liu, Mengting Chen et al.
GroupDiff: Diffusion-based Group Portrait Editing
Yuming Jiang, Nanxuan Zhao, Qing Liu et al.
Motion Aware Event Representation-driven Image Deblurring
Zhijing Sun, Xueyang Fu, Longzhuo Huang et al.
DiffPMAE: Diffusion Masked Autoencoders for Point Cloud Reconstruction
Yanlong Li, Chamara Madarasingha, Kanchana Thilakarathna
OLAF: A Plug-and-Play Framework for Enhanced Multi-object Multi-part Scene Parsing
Pranav Gupta, Rishubh Singh, Pradeep Shenoy et al.
CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion
Wendi Zheng, Jiayan Teng, Zhuoyi Yang et al.
Uncertainty Calibration with Energy Based Instance-wise Scaling in the Wild Dataset
Mijoo Kim, Junseok Kwon
OAPT: Offset-Aware Partition Transformer for Double JPEG Artifacts Removal
Qiao Mo, Yukang Ding, Jinhua Hao et al.
Seeing the Unseen: A Frequency Prompt Guided Transformer for Image Restoration
Shihao Zhou, Jinshan Pan, Jinglei Shi et al.
Category Adaptation Meets Projected Distillation in Generalized Continual Category Discovery
Grzegorz Rypesc, Daniel Marczak, Sebastian Cygert et al.
Animate Your Motion: Turning Still Images into Dynamic Videos
Mingxiao Li, Bo Wan, Marie-Francine Moens et al.
Spatial-Temporal Multi-level Association for Video Object Segmentation
Deshui Miao, Xin Li, Zhenyu He et al.
High-Resolution and Few-shot View Synthesis from Asymmetric Dual-lens Inputs
Ruikang Xu, Mingde Yao, Yue Li et al.
MapDistill: Boosting Efficient Camera-based HD Map Construction via Camera-LiDAR Fusion Model Distillation
Xiaoshuai Hao, Ruikai Li, Hui Zhang et al.
Context-Aware Action Recognition: Introducing a Comprehensive Dataset for Behavior Contrast
Tatsuya Sasaki, Yoshiki Ito, Satoshi Kondo
ST-LDM: A Universal Framework for Text-Grounded Object Generation in Real Images
Xiangtian Xue, Jiasong Wu, Youyong Kong et al.
AdvDiff: Generating Unrestricted Adversarial Examples using Diffusion Models
Xuelong Dai, Kaisheng Liang, Bin Xiao
Multimodal Cross-Domain Few-Shot Learning for Egocentric Action Recognition
Masashi Hatano, Ryo Hachiuma, Ryo Fujii et al.
Textual Knowledge Matters: Cross-Modality Co-Teaching for Generalized Visual Class Discovery
Haiyang Zheng, Pu Nan, Wenjing Li et al.
DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors
Jinbo Xing, Menghan Xia, Yong Zhang et al.
UniProcessor: A Text-induced Unified Low-level Image Processor
Huiyu Duan, Xiongkuo Min, Sijing Wu et al.
Bridging Synthetic and Real Worlds for Pre-training Scene Text Detectors
Tongkun Guan, Wei Shen, Xue Yang et al.
Textual Grounding for Open-vocabulary Visual Information Extraction in Layout-diversified Documents
Mengjun Cheng, Chengquan Zhang, Chang Liu et al.
TAPTR: Tracking Any Point with Transformers as Detection
Hongyang Li, Hao Zhang, Shilong Liu et al.
Cross-Platform Video Person ReID: A New Benchmark Dataset and Adaptation Approach
Shizhou Zhang, Wenlong Luo, De Cheng et al.
Learning Chain of Counterfactual Thought for Bias-Robust Vision-Language Reasoning
Yifeng Zhang, Ming Jiang, Qi Zhao
Text2Place: Affordance-aware Text Guided Human Placement
Rishubh Parihar, Harsh Gupta, Sachidanand VS et al.
CPT-VR: Improving Surface Rendering via Closest Point Transform with View-Reflection Appearance
Zhipeng Hu, Yongqiang Zhang, Chen Liu et al.
Relightable 3D Gaussians: Realistic Point Cloud Relighting with BRDF Decomposition and Ray Tracing
Jian Gao, Chun Gu, Youtian Lin et al.
A Closer Look at GAN Priors: Exploiting Intermediate Features for Enhanced Model Inversion Attacks
Yixiang Qiu, Hao Fang, Hongyao Yu et al.
BlazeBVD: Make Scale-Time Equalization Great Again for Blind Video Deflickering
Xinmin Qiu, Congying Han, Zicheng Zhang et al.
Let the Avatar Talk using Texts without Paired Training Data
Xiuzhe Wu, Yang-Tian Sun, Handi Chen et al.
SAFARI: Adaptive Sequence Transformer for Weakly Supervised Referring Expression Segmentation
Sayan Nag, Koustava Goswami, Srikrishna Karanam
MAD-DR: Map Compression for Visual Localization with Matchness Aware Descriptor Dimension Reduction
Qiang Wang
Motion-Guided Latent Diffusion for Temporally Consistent Real-world Video Super-resolution
Xi Yang, Chenhang He, Jianqi Ma et al.
LLM as Copilot for Coarse-grained Vision-and-Language Navigation
Yanyuan Qiao, Qianyi Liu, Jiajun Liu et al.
Physically Plausible Color Correction for Neural Radiance Fields
Qi Zhang, Ying Feng, Hongdong Li
Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation
Fu-Yun Wang, Xiaoshi Wu, Zhaoyang Huang et al.
Attention Beats Linear for Fast Implicit Neural Representation Generation
Shuyi Zhang, Ke Liu, Jingjun Gu et al.
Distractors-Immune Representation Learning with Cross-modal Contrastive Regularization for Change Captioning
Yunbin Tu, Liang Li, Li Su et al.
Prompt-Driven Contrastive Learning for Transferable Adversarial Attacks
Hunmin Yang, Jongoh Jeong, Kuk-Jin Yoon
PLOT: Text-based Person Search with Part Slot Attention for Corresponding Part Discovery
Jicheol Park, Dongwon Kim, Boseung Jeong et al.
Prompt-Based Test-Time Real Image Dehazing: A Novel Pipeline
Zixuan Chen, Zewei He, Ziqian Lu et al.
RCS-Prompt: Learning Prompt to Rearrange Class Space for Prompt-based Continual Learning
Longrong Yang, Hanbin Zhao, Yunlong Yu et al.
Dynamic Guidance Adversarial Distillation with Enhanced Teacher Knowledge
Hyejin Park, Dongbo Min
Solving Motion Planning Tasks with a Scalable Generative Model
Yihan Hu, Siqi Chai, Zhening Yang et al.
Versatile Incremental Learning: Towards Class and Domain-Agnostic Incremental Learning
Minyeong Park, Jae-Ho Lee, Gyeong-Moon Park
Parrot Captions Teach CLIP to Spot Text
Yiqi Lin, Conghui He, Alex Jinpeng Wang et al.
Gaussian Grouping: Segment and Edit Anything in 3D Scenes
Mingqiao Ye, Martin Danelljan, Fisher Yu et al.
3D Hand Sequence Recovery from Real Blurry Images and Event Stream
Joonkyu Park, Gyeongsik Moon, Weipeng Xu et al.
A Direct Approach to Viewing Graph Solvability
Federica Arrigoni, Andrea Fusiello, Tomas Pajdla
Three Things We Need to Know About Transferring Stable Diffusion to Visual Dense Prediction Tasks
Manyuan Zhang, Guanglu Song, Xiaoyu Shi et al.
Object-Conditioned Energy-Based Attention Map Alignment in Text-to-Image Diffusion Models
Yasi Zhang, Peiyu Yu, Ying Nian Wu
Unsqueeze [CLS] Bottleneck to Learn Rich Representations
Qing Su, Shihao Ji
DomainFusion: Generalizing To Unseen Domains with Latent Diffusion Models
Yuyang Huang, Yabo Chen, Yuchen Liu et al.
Tree-D Fusion: Simulation-Ready Tree Dataset from Single Images with Diffusion Priors
Jae Joong Lee, Bosheng Li, Sara Beery et al.
Segmentation-guided Layer-wise Image Vectorization with Gradient Fills
Hengyu Zhou, Hui Zhang, Bin Wang
Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects
Zicong Fan, Takehiko Ohkawa, Linlin Yang et al.
TrajPrompt: Aligning Color Trajectory with Vision-Language Representations
Li-Wu Tsao, Hao-Tang Tsui, Yu-Rou Tuan et al.
Strike a Balance in Continual Panoptic Segmentation
Jinpeng Chen, Runmin Cong, Yuxuan Luo et al.
Expressive Whole-Body 3D Gaussian Avatar
Gyeongsik Moon, Takaaki Shiratori, Shunsuke Saito
HeadStudio: Text to Animatable Head Avatars with 3D Gaussian Splatting
Zhenglin Zhou, Fan Ma, Hehe Fan et al.
Explicitly Guided Information Interaction Network for Cross-modal Point Cloud Completion
Hang Xu, Chen Long, Wenxiao Zhang et al.
Bridging the Gap: Studio-like Avatar Creation from a Monocular Phone Capture
ShahRukh Athar, Shunsuke Saito, Stanislav Pidhorskyi et al.
StructLDM: Structured Latent Diffusion for 3D Human Generation
Tao Hu, Fangzhou Hong, Ziwei Liu
Receler: Reliable Concept Erasing of Text-to-Image Diffusion Models via Lightweight Erasers
Chi-Pin Huang, Kai-Po Chang, Chung-Ting Tsai et al.
HYPE: Hyperbolic Entailment Filtering for Underspecified Images and Texts
Wonjae Kim, Sanghyuk Chun, Taekyung Kim et al.
High-Fidelity Modeling of Generalizable Wrinkle Deformation
Jingfan Guo, Jae Shin Yoon, Shunsuke Saito et al.
COMPOSE: Comprehensive Portrait Shadow Editing
Andrew Hou, Zhixin Shu, Xuaner Zhang et al.
EchoScene: Indoor Scene Generation via Information Echo over Scene Graph Diffusion
Guangyao Zhai, Evin Pınar Örnek, Dave Zhenyu Chen et al.
MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation
Kunpeng Song, Yizhe Zhu, Bingchen Liu et al.
Learning Representations from Foundation Models for Domain Generalized Stereo Matching
Yongjian Zhang, Longguang Wang, Kunhong Li et al.
Global Structure-from-Motion Revisited
Linfei Pan, Daniel Barath, Marc Pollefeys et al.
NeRF-XL: NeRF at Any Scale with Multi-GPU
Ruilong Li, Sanja Fidler, Angjoo Kanazawa et al.
ReMatching: Low-Resolution Representations for Scalable Shape Correspondence
Filippo Maggioli, Daniele Baieri, Emanuele Rodola et al.
3D Hand Pose Estimation in Everyday Egocentric Images
Aditya Prakash, Ruisen Tu, Matthew Chang et al.
DEAL: Disentangle and Localize Concept-level Explanations for VLMs
Tang Li, Mengmeng Ma, Xi Peng
Controllable Human-Object Interaction Synthesis
Jiaman Li, Alexander Clegg, Roozbeh Mottaghi et al.
Nymeria: A Massive Collection of Egocentric Multi-modal Human Motion in the Wild
Lingni Ma, Yuting Ye, Rowan Postyeni et al.
MTKD: Multi-Teacher Knowledge Distillation for Image Super-Resolution
Yuxuan Jiang, Chen Feng, Fan Zhang et al.
Appearance-based Refinement for Object-Centric Motion Segmentation
Junyu Xie, Weidi Xie, Andrew Zisserman
iMatching: Imperative Correspondence Learning
Chen Wang, Dasong Gao, Yun-Jou Lin et al.
AnyHome: Open-Vocabulary Large-Scale Indoor Scene Generation with First-Person View Exploration
Rao Fu, Zehao Wen, Zichen Liu et al.
Towards High-Quality 3D Motion Transfer with Realistic Apparel Animation
Rong Wang, Wei Mao, Changsheng Lu et al.
Approaching Outside: Scaling Unsupervised 3D Object Detection from 2D Scene
Ruiyang Zhang, Hu Zhang, Hang Yu et al.
SlotLifter: Slot-guided Feature Lifting for Learning Object-Centric Radiance Fields
Yu Liu, Baoxiong Jia, Yixin Chen et al.
Confidence Self-Calibration for Multi-Label Class-Incremental Learning
Kaile Du, Yifan Zhou, Fan Lyu et al.
Fast View Synthesis of Casual Videos with Soup-of-Planes
Yao-Chih Lee, Zhoutong Zhang, Kevin Blackburn-Matzen et al.
Six-Point Method for Multi-Camera Systems with Reduced Solution Space
Banglei Guan, Ji Zhao, Laurent Kneip
A Semantic Space is Worth 256 Language Descriptions: Make Stronger Segmentation Models with Descriptive Properties
Junfei Xiao, Ziqi Zhou, Wenxuan Li et al.
Tuning-Free Image Customization with Image and Text Guidance
Pengzhi Li, Qiang Nie, Ying Chen et al.
MegaScenes: Scene-Level View Synthesis at Scale
Joseph Tung, Gene Chou, Ruojin Cai et al.
Mono-ViFI: A Unified Learning Framework for Self-supervised Single- and Multi-frame Monocular Depth Estimation
Jinfeng Liu, Lingtong Kong, Bo Li et al.
Watch Your Steps: Local Image and Scene Editing by Text Instructions
Ashkan Mirzaei, Tristan T Aumentado-Armstrong, Marcus A Brubaker et al.
ControlCap: Controllable Region-level Captioning
Yuzhong Zhao, Liu Yue, Zonghao Guo et al.
Neural graphics texture compression supporting random access
Farzad Farhadzadeh, Qiqi Hou, Hoang Le et al.
U-COPE: Taking a Further Step to Universal 9D Category-level Object Pose Estimation
Li Zhang, Weiqing Meng, Yan Zhong et al.
Idea2Img: Iterative Self-Refinement with GPT-4V for Automatic Image Design and Generation
Zhengyuan Yang, Jianfeng Wang, Linjie Li et al.
Preventing Catastrophic Forgetting through Memory Networks in Continuous Detection
Gaurav Bhatt, Leonid Sigal, James Ross
Trajectory-aligned Space-time Tokens for Few-shot Action Recognition
Pulkit Kumar, Namitha Padmanabhan, Luke Luo et al.
Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models
Longxiang Tang, Zhuotao Tian, Kai Li et al.
Robust Incremental Structure-from-Motion with Hybrid Features
Shaohui Liu, Yidan Gao, Tianyi Zhang et al.
COIN-Matting: Confounder Intervention for Image Matting
Zhaohe Liao, Jiangtong Li, Jun Lan et al.
E3V-K5: An Authentic Benchmark for Redefining Video-Based Energy Expenditure Estimation
Shengxuming Zhang, Lei Jin, Yifan Wang et al.
Global-Local Collaborative Inference with LLM for Lidar-Based Open-Vocabulary Detection
Xingyu Peng, Yan Bai, Chen Gao et al.
Score Distillation Sampling with Learned Manifold Corrective
Thiemo Alldieck, Nikos Kolotouros, Cristian Sminchisescu
EGIC: Enhanced Low-Bit-Rate Generative Image Compression Guided by Semantic Segmentation
Nikolai Körber, Eduard Kromer, Andreas Siebert et al.
Rethinking Weakly-supervised Video Temporal Grounding From a Game Perspective
Xiang Fang, Zeyu Xiong, Wanlong Fang et al.
FlexAttention for Efficient High-Resolution Vision-Language Models
Junyan Li, Delin Chen, Tianle Cai et al.
AdaDiffSR: Adaptive Region-aware Dynamic acceleration Diffusion Model for Real-World Image Super-Resolution
Yuanting Fan, Chengxu Liu, Nengzhong Yin et al.
Asymmetric Mask Scheme for Self-Supervised Real Image Denoising
Xiangyu Liao, Tianheng Zheng, Jiayu Zhong et al.
Pathformer3D: A 3D Scanpath Transformer for 360° Images
Rong Quan, Yantao Lai, Mengyu Qiu et al.
AdaCLIP: Adapting CLIP with Hybrid Learnable Prompts for Zero-Shot Anomaly Detection
Yunkang Cao, Jiangning Zhang, Luca Frittoli et al.
Visual Prompting via Partial Optimal Transport
Mengyu Zheng, Zhiwei Hao, Yehui Tang et al.
LiteSAM is Actually what you Need for segment Everything
Jianhai Fu, Yuanjie Yu, Ningchuan Li et al.
Deep Patch Visual SLAM
Lahav Lipson, Zachary Teed, Jia Deng
Efficient Training of Spiking Neural Networks with Multi-Parallel Implicit Stream Architecture
Zhigao Cao, Meng Li, Xiashuang Wang et al.
SplatFields: Neural Gaussian Splats for Sparse 3D and 4D Reconstruction
Marko Mihajlovic, Sergey Prokudin, Siyu Tang et al.
Segment3D: Learning Fine-Grained Class-Agnostic 3D Segmentation without Manual Labels
Rui Huang, Songyou Peng, Ayca Takmaz et al.
Domesticating SAM for Breast Ultrasound Image Segmentation via Spatial-frequency Fusion and Uncertainty Correction
Wanting Zhang, Huisi Wu, Jing Qin
BRAVE: Broadening the visual encoding of vision-language models
Oguzhan Fatih Kar, Alessio Tonioni, Petra Poklukar et al.
Diff3DETR: Agent-based Diffusion Model for Semi-supervised 3D Object Detection
Jiacheng Deng, Jiahao Lu, Tianzhu Zhang
MarineInst: A Foundation Model for Marine Image Analysis with Instance Visual Description
Ziqiang Zheng, Yiwei Chen, Huimin Zeng et al.
Interaction-centric Spatio-Temporal Context Reasoning for Multi-Person Video HOI Recognition
Yisong Wang, Nan Xi, Jingjing Meng et al.
Multi-modal Relation Distillation for Unified 3D Representation Learning
Huiqun Wang, Yiping Bao, Panwang Pan et al.
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models
Yanwei Li, Chengyao Wang, Jiaya Jia
Masked Angle-Aware Autoencoder for Remote Sensing Images
Zhihao Li, Biao Hou, Siteng Ma et al.
6DoF Head Pose Estimation through Explicit Bidirectional Interaction with Face Geometry
Sungho Chun, Ju Yong Chang
Leveraging Near-Field Lighting for Monocular Depth Estimation from Endoscopy Videos
Akshay Paruchuri, Samuel Ehrenstein, Shuxian Wang et al.
ProTIP: Probabilistic Robustness Verification on Text-to-Image Diffusion Models against Stochastic Perturbation
Yi Zhang, Yun Tang, Wenjie Ruan et al.
Agent3D-Zero: An Agent for Zero-shot 3D Understanding
Sha Zhang, Di Huang, Jiajun Deng et al.
S-JEPA: A Joint Embedding Predictive Architecture for Skeletal Action Recognition
Mohamed Abdelfattah, Alexandre Alahi
Integer-Valued Training and Spike-driven Inference Spiking Neural Network for High-performance and Energy-efficient Object Detection
Xinhao Luo, Man Yao, Yuhong Chou et al.
Structured-NeRF: Hierarchical Scene Graph with Neural Representation
Zhide Zhong, Jiakai Cao, Songen Gu et al.
Improving Unsupervised Domain Adaptation: A Pseudo-Candidate Set Approach
Aveen Dayal, Rishabh Lalla, Linga Reddy Cenkeramaddi et al.
GiT: Towards Generalist Vision Transformer through Universal Language Interface
Haiyang Wang, Hao Tang, Li Jiang et al.
PoseAugment: Generative Human Pose Data Augmentation with Physical Plausibility for IMU-based Motion Capture
Zhuojun Li, Chun Yu, Chen Liang et al.
APL: Anchor-based Prompt Learning for One-stage Weakly Supervised Referring Expression Comprehension
Yaxin Luo, Jiayi Ji, Xiaofu Chen et al.
SCP-Diff: Spatial-Categorical Joint Prior for Diffusion Based Semantic Image Synthesis
Huan-ang Gao, Mingju Gao, Jiaju Li et al.
DeCo: Decoupled Human-Centered Diffusion Video Editing with Motion Consistency
Xiaojing Zhong, Xinyi Huang, Xiaofeng Yang et al.
MeshFeat: Multi-Resolution Features for Neural Fields on Meshes
Mihir Mahajan, Florian Hofherr, Daniel Cremers
TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias
Sanghyun Jo, Soohyun Ryu, Sungyub Kim et al.
DragAPart: Learning a Part-Level Motion Prior for Articulated Objects
Ruining Li, Chuanxia Zheng, Christian Rupprecht et al.
BeNeRF: Neural Radiance Fields from a Single Blurry Image and Event Stream
Wenpu Li, Pian Wan, Peng Wang et al.
Enhancing Optimization Robustness in 1-bit Neural Networks through Stochastic Sign Descent
NianHui Guo, Hong Guo, Christoph Meinel et al.
Learning to Unlearn for Robust Machine Unlearning
Mark Huang, Lin Geng Foo, Jun Liu
Diff-Tracker: Text-to-Image Diffusion Models are Unsupervised Trackers
Zhengbo Zhang, Li Xu, Duo Peng et al.
Echoes of the Past: Boosting Long-tail Recognition via Reflective Learning
Qihao Zhao, Yalun Dai, Shen Lin et al.
OMG: Occlusion-friendly Personalized Multi-concept Generation in Diffusion Models
Kong Zhe, Yong Zhang, Tianyu Yang et al.
Lego: Learning to Disentangle and Invert Personalized Concepts Beyond Object Appearance in Text-to-Image Diffusion Models
Saman Motamed, Danda Pani Paudel, Luc Van Gool
Taming CLIP for Fine-grained and Structured Visual Understanding of Museum Exhibits
Ada-Astrid Balauca, Danda Paudel, Kristina Toutanova et al.
Visual Text Generation in the Wild
Yuanzhi Zhu, Jiawei Liu, Feiyu Gao et al.
Cross-Domain Learning for Video Anomaly Detection with Limited Supervision
Yashika Jain, Ali Dabouei, Min Xu
DreamMotion: Space-Time Self-Similar Score Distillation for Zero-Shot Video Editing
Hyeonho Jeong, Jinho Chang, Geon Yeong Park et al.
A Unified Image Compression Method for Human Perception and Multiple Vision Tasks
Sha Guo, Sui Lin, Chen-Lin Zhang et al.
Domain-adaptive Video Deblurring via Test-time Blurring
Jin-Ting He, Fu-Jen Tsai, Jia-Hao Wu et al.
3DEgo: 3D Editing on the Go!
Umar Khalid, Hasan Iqbal, Azib Farooq et al.
Unleashing the Power of Prompt-driven Nucleus Instance Segmentation
Zhongyi Shui, Yunlong Zhang, Kai Yao et al.