Most Cited ECCV "state-action occupancy" Papers
2,387 papers found • Page 11 of 12
Conference
Delving into Adversarial Robustness on Document Tampering Localization
Huiru Shao, Zhuang Qian, Kaizhu Huang et al.
SceneTeller: Language-to-3D Scene Generation
Basak Melis Ocal, Maxim Tatarchenko, Sezer Karaoglu et al.
MagMax: Leveraging Model Merging for Seamless Continual Learning
Daniel Marczak, Bartlomiej Twardowski, Tomasz Trzcinski et al.
Spline-based Transformers
Prashanth Chandran, Agon Serifi, Markus Gross et al.
Efficient NeRF Optimization - Not All Samples Remain Equally Hard
Juuso Korhonen, Goutham Rangu, Hamed Rezazadegan Tavakoli et al.
VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks
Xiangxiang Chu, Jianlin Su, Bo Zhang et al.
Missing Modality Prediction for Unpaired Multimodal Learning via Joint Embedding of Unimodal Models
Taesup Kim, Donggeun Kim
Fast Diffusion-Based Counterfactuals for Shortcut Removal and Generation
Nina Weng, Paraskevas Pegios, Eike Petersen et al.
Expanding Scene Graph Boundaries: Fully Open-vocabulary Scene Graph Generation via Visual-Concept Alignment and Retention
Zuyao Chen, Jinlin Wu, Zhen Lei et al.
FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent Font Effect Generation
Xinzhi MU, Li Chen, Bohan CHEN et al.
GroCo: Ground Constraint for Metric Self-Supervised Monocular Depth
Aurélien Cecille, Stefan Duffner, Franck DAVOINE et al.
MART: MultiscAle Relational Transformer Networks for Multi-agent Trajectory Prediction
Seongju Lee, Junseok Lee, Yeonguk Yu et al.
Distributed Active Client Selection With Noisy Clients Using Model Association Scores
Kwang In Kim
Tight and Efficient Upper Bound on Spectral Norm of Convolutional Layers
Ekaterina Grishina, Mikhail Gorbunov, Maxim Rakhuba
Deciphering the Role of Representation Disentanglement: Investigating Compositional Generalization in CLIP Models
Reza Abbasi, Mohammad Rohban, Mahdieh Soleymani Baghshah
A Simple Baseline for Spoken Language to Sign Language Translation with 3D Avatars
Ronglai Zuo, Fangyun Wei, Zenggui Chen et al.
Unveiling Privacy Risks in Stochastic Neural Networks Training: Effective Image Reconstruction from Gradients
Yiming Chen, Xiangyu Yang, Nikos Deligiannis
Towards compact reversible image representations for neural style transfer
Xiyao Liu, Siyu Yang, Jian Zhang et al.
SPARO: Selective Attention for Robust and Compositional Transformer Encodings for Vision
Ankit Vani, Bac Nguyen, Samuel Lavoie et al.
Learning Scalable Model Soup on a Single GPU: An Efficient Subspace Training Strategy
Tao Li, Weisen Jiang, Fanghui Liu et al.
Straightforward Layer-wise Pruning for More Efficient Visual Adaptation
Ruizi Han, Jinglei Tang
Dual-Decoupling Learning and Metric-Adaptive Thresholding for Semi-Supervised Multi-Label Learning
Jiahao Xiao, Ming-Kun Xie, Heng-Bo Fan et al.
SEGIC: Unleashing the Emergent Correspondence for In-Context Segmentation
Lingchen Meng, Shiyi Lan, Hengduo Li et al.
Handling The Non-Smooth Challenge in Tensor SVD: A Multi-Objective Tensor Recovery Framework
Jingjing Zheng, Wanglong Lu, Wenzhe Wang et al.
DoubleTake: Geometry Guided Depth Estimation
Mohamed Sayed, Filippo Aleotti, Jamie Watson et al.
Sur^2f: A Hybrid Representation for High-Quality and Efficient Surface Reconstruction from Multi-view Images
Zhangjin Huang, Zhihao Liang, Kui Jia
UniTalker: Scaling up Audio-Driven 3D Facial Animation through A Unified Model
Xiangyu Fan, Jiaqi Li, Zhiqian Lin et al.
PartCraft: Crafting Creative Objects by Parts
Kam Woh Ng, Xiatian Zhu, Yi-Zhe Song et al.
Coarse-to-Fine Implicit Representation Learning for 3D Hand-Object Reconstruction from a Single RGB-D Image
Xingyu Liu, Pengfei Ren, Jingyu Wang et al.
AnyControl: Create Your Artwork with Versatile Control on Text-to-Image Generation
Sun Yanan, Yanchen Liu, Yinhao Tang et al.
Data Overfitting for On-Device Super-Resolution with Dynamic Algorithm and Compiler Co-Design
Li, zhihao shu, Jie Ji et al.
PRET: Planning with Directed Fidelity Trajectory for Vision and Language Navigation
Renjie Lu, Jing-Ke Meng, WEISHI ZHENG
Long-CLIP: Unlocking the Long-Text Capability of CLIP
Beichen Zhang, Pan Zhang, Xiaoyi Dong et al.
Learning with Counterfactual Explanations for Radiology Report Generation
Mingjie Li, Haokun Lin, Liang Qiu et al.
Pseudo-Embedding for Generalized Few-Shot Point Cloud Segmentation
Chih-Jung Tsai, Hwann-Tzong Chen, Tyng-Luh Liu
AdaLog: Post-Training Quantization for Vision Transformers with Adaptive Logarithm Quantizer
Zhuguanyu Wu, Jiaxin Chen, Hanwen Zhong et al.
Oulu Remote-photoplethysmography Physical Domain Attacks Database (ORPDAD)
Marko Savic, Guoying Zhao
Optimizing Illuminant Estimation in Dual-Exposure HDR Imaging
Mahmoud Afifi, Zhenhua Hu, Liang Liang
Chat-Edit-3D: Interactive 3D Scene Editing via Text Prompts
Shuangkang Fang, Yufeng Wang, Yi-Hsuan Tsai et al.
Embodied Understanding of Driving Scenarios
Yunsong Zhou, Linyan Huang, Qingwen Bu et al.
HybridBooth: Hybrid Prompt Inversion for Efficient Subject-Driven Generation
Shanyan Guan, Yanhao Ge, Ying Tai et al.
On the Viability of Monocular Depth Pre-training for Semantic Segmentation
DONG LAO, Fengyu Yang, Daniel Wang et al.
Weakly-supervised Camera Localization by Ground-to-satellite Image Registration
Yujiao Shi, HONGDONG LI, Akhil Perincherry et al.
ProtoComp: Diverse Point Cloud Completion with Controllable Prototype
Xumin Yu, Yanbo Wang, Jie Zhou et al.
Topo4D: Topology-Preserving Gaussian Splatting for High-Fidelity 4D Head Capture
Xuanchen Li, Yuhao Cheng, Xingyu Ren et al.
SPHINX: A Mixer of Weights, Visual Embeddings and Image Scales for Multi-modal Large Language Models
Ziyi Lin, Dongyang Liu, Renrui Zhang et al.
Diffusion Models as Data Mining Tools
Ioannis Siglidis, Aleksander Holynski, Alexei Efros et al.
SCPNet: Unsupervised Cross-modal Homography Estimation via Intra-modal Self-supervised Learning
Runmin Zhang, Jun Ma, Lun Luo et al.
Leveraging Representations from Intermediate Encoder-blocks for Synthetic Image Detection
Christos Koutlis, Symeon Papadopoulos
Seeing the Unseen: A Frequency Prompt Guided Transformer for Image Restoration
shihao zhou, Jinshan Pan, Jinglei Shi et al.
Animate Your Motion: Turning Still Images into Dynamic Videos
Mingxiao Li, Bo Wan, Marie-Francine Moens et al.
Spatial-Temporal Multi-level Association for Video Object Segmentation
Deshui Miao, Xin Li, Zhenyu He et al.
Adaptive Multi-head Contrastive Learning
Lei Wang, Piotr Koniusz, Tom Gedeon et al.
UniProcessor: A Text-induced Unified Low-level Image Processor
Huiyu Duan, Xiongkuo Min, Sijing Wu et al.
GRACE: Graph-Based Contextual Debiasing for Fair Visual Question Answering
Yifeng Zhang, Ming Jiang, Qi Zhao
Learning Chain of Counterfactual Thought for Bias-Robust Vision-Language Reasoning
Yifeng Zhang, Ming Jiang, Qi Zhao
Generalizing to Unseen Domains via Text-guided Augmentation
Daiqing Qi, Handong Zhao, Aidong Zhang et al.
On Calibration of Object Detectors: Pitfalls, Evaluation and Baselines
Selim Kuzucu, Kemal Oksuz, Jonathan Sadeghi et al.
IVTP: Instruction-guided Visual Token Pruning for Large Vision-Language Models
Kai Huang, Hao Zou, Ye Xi et al.
Unsqueeze [CLS] Bottleneck to Learn Rich Representations
Qing Su, Shihao Ji
RCS-Prompt: Learning Prompt to Rearrange Class Space for Prompt-based Continual Learning
Longrong Yang, Hanbin Zhao, Yunlong Yu et al.
Dynamic Guidance Adversarial Distillation with Enhanced Teacher Knowledge
Hyejin Park, Dongbo Min
LAPT: Label-driven Automated Prompt Tuning for OOD Detection with Vision-Language Models
Yabin Zhang, Wenjie Zhu, Chenhang He et al.
Gaussian Grouping: Segment and Edit Anything in 3D Scenes
Mingqiao Ye, Martin Danelljan, Fisher Yu et al.
StructLDM: Structured Latent Diffusion for 3D Human Generation
Tao Hu, Fangzhou Hong, Ziwei Liu
High-Fidelity Modeling of Generalizable Wrinkle Deformation
Jingfan Guo, Jae Shin Yoon, Shunsuke Saito et al.
EchoScene: Indoor Scene Generation via Information Echo over Scene Graph Diffusion
Guangyao Zhai, Evin Pınar Örnek, Dave Zhenyu Chen et al.
NeRF-XL: NeRF at Any Scale with Multi-GPU
Ruilong Li, Sanja Fidler, Angjoo Kanazawa et al.
Learning Neural Deformation Representation for 4D Dynamic Shape Generation
Gyojin Han, Jiwan Hur, Jaehyun Choi et al.
3D Hand Pose Estimation in Everyday Egocentric Images
Aditya Prakash, Ruisen Tu, Matthew Chang et al.
Towards Robust Event-based Networks for Nighttime via Unpaired Day-to-Night Event Translation
Yuhwan Jeong, Hoonhee Cho, Kuk-Jin Yoon
Approaching Outside: Scaling Unsupervised 3D Object Detection from 2D Scene
Ruiyang Zhang, Hu Zhang, Hang Yu et al.
Robust Nearest Neighbors for Source-Free Domain Adaptation under Class Distribution Shift
Antonio Tejero-de-Pablos, Riku Togashi, Mayu Otani et al.
Six-Point Method for Multi-Camera Systems with Reduced Solution Space
Banglei Guan, Ji Zhao, Laurent Kneip
Physics-informed Knowledge Transfer for Underwater Monocular Depth Estimation
Jinghe Yang, Mingming Gong, Ye Pu
Learning Equilibrium Transformation for Gamut Expansion and Color Restoration
JUN XIAO, Changjian Shui, Zhi-Song Liu et al.
COIN-Matting: Confounder Intervention for Image Matting
Zhaohe Liao, Jiangtong Li, Jun Lan et al.
FoundPose: Unseen Object Pose Estimation with Foundation Features
Evin Pınar Örnek, Yann Labbé, Bugra Tekin et al.
SimPB: A Single Model for 2D and 3D Object Detection from Multiple Cameras
Yingqi Tang, Zhaotie Meng, Guoliang Chen et al.
ConceptExpress: Harnessing Diffusion Models for Single-image Unsupervised Concept Extraction
Shaozhe Hao, Kai Han, Zhengyao Lv et al.
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models
Yanwei Li, Chengyao Wang, Jiaya Jia
DreamMotion: Space-Time Self-Similar Score Distillation for Zero-Shot Video Editing
Hyeonho Jeong, Jinho Chang, GEON YEONG PARK et al.
Agent3D-Zero: An Agent for Zero-shot 3D Understanding
Sha Zhang, Di Huang, Jiajun Deng et al.
Structured-NeRF: Hierarchical Scene Graph with Neural Representation
Zhide Zhong, Jiakai Cao, songen gu et al.
Robustness Preserving Fine-tuning using Neuron Importance
Guangrui Li, Rahul Duggal, Aaditya Singh et al.
DeCo: Decoupled Human-Centered Diffusion Video Editing with Motion Consistency
Xiaojing Zhong, Xinyi Huang, Xiaofeng Yang et al.
MeshFeat: Multi-Resolution Features for Neural Fields on Meshes
Mihir Mahajan, Florian Hofherr, Daniel Cremers
DragAPart: Learning a Part-Level Motion Prior for Articulated Objects
Ruining Li, Chuanxia Zheng, Christian Rupprecht et al.
CLR-GAN: Improving GANs Stability and Quality via Consistent Latent Representation and Reconstruction
Shengke Sun, Ziqian Luan, Zhanshan Zhao et al.
Learning to Unlearn for Robust Machine Unlearning
Mark HUANG, Lin Geng Foo, Jun Liu
Taming CLIP for Fine-grained and Structured Visual Understanding of Museum Exhibits
Ada-Astrid Balauca, Danda Paudel, Kristina Toutanova et al.
Visual Text Generation in the Wild
Yuanzhi Zhu, Jiawei Liu, Feiyu Gao et al.
FreeInit: Bridging Initialization Gap in Video Diffusion Models
Tianxing Wu, Chenyang Si, Yuming Jiang et al.
Learning Quantized Adaptive Conditions for Diffusion Models
Yuchen Liang, Yuchuan Tian, Lei Yu et al.
Learn to Optimize Denoising Scores: A Unified and Improved Diffusion Prior for 3D Generation
Xiaofeng Yang, Yiwen Chen, Cheng Chen et al.
Rethinking Fast Adversarial Training: A Splitting Technique To Overcome Catastrophic Overfitting
Masoumeh Zareapoor, Pourya Shamsolmoali
Similarity of Neural Architectures using Adversarial Attack Transferability
Jaehui Hwang, Dongyoon Han, Byeongho Heo et al.
In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation
Dahyun Kang, Minsu Cho
CadVLM: Bridging Language and Vision in the Generation of Parametric CAD Sketches
Sifan Wu, Amir Hosein Khasahmadi, Mor Katz et al.
AttentionHand: Text-driven Controllable Hand Image Generation for 3D Hand Reconstruction in the Wild
Junho Park, Kyeongbo Kong, Suk-Ju Kang
Embracing Events and Frames with Hierarchical Feature Refinement Network for Object Detection
Hu Cao, Zehua Zhang, Yan Xia et al.
TimeLens-XL: Real-time Event-based Video Frame Interpolation with Large Motion
Shi Guo, Yutian Chen, Tianfan Xue et al.
Identity-Consistent Diffusion Network for Grading Knee Osteoarthritis Progression in Radiographic Imaging
Wenhua Wu, Kun Hu, Wenxi Yue et al.
Teach CLIP to Develop a Number Sense for Ordinal Regression
Yao DU, Qiang Zhai, Weihang Dai et al.
Compact 3D Scene Representation via Self-Organizing Gaussian Grids
Wieland Morgenstern, Florian Barthel, Anna Hilsmann et al.
Flash-Splat: 3D Reflection Removal with Flash Cues and Gaussian Splats
Mingyang Xie, Haoming Cai, Sachin Shah et al.
Instant Uncertainty Calibration of NeRFs Using a Meta-Calibrator
Niki Amini-Naieni, Tomas Jakab, Andrea Vedaldi et al.
SHIC: Shape-Image Correspondences with no Keypoint Supervision
Aleksandar Shtedritski, Christian Rupprecht, Andrea Vedaldi
Debiasing surgeon: fantastic weights and how to find them
Remi Nahon, Ivan Luiz De Moura Matos, Van-Tam Nguyen et al.
R.A.C.E.: Robust Adversarial Concept Erasure for Secure Text-to-Image Diffusion Model
Changhoon Kim, Kyle Min, Yezhou Yang
Labeled Data Selection for Category Discovery
Bingchen Zhao, Nico Lang, Serge Belongie et al.
Mew: Multiplexed Immunofluorescence Image Analysis through an Efficient Multiplex Network
Sukwon Yun, Jie Peng, Alexandro E Trevino et al.
A Multimodal Benchmark Dataset and Model for Crop Disease Diagnosis
Xiang Liu, Zhaoxiang Liu, Huan Hu et al.
Understanding and Mitigating Human-Labelling Errors in Supervised Contrastive Learning
Zijun Long, Lipeng Zhuang, George W Killick et al.
MOD-UV: Learning Mobile Object Detectors from Unlabeled Videos
Yihong Sun, Bharath Hariharan
Leveraging scale- and orientation-covariant features for planar motion estimation
Marcus Valtonen Örnhag, Alberto Jaenal
WoVoGen: World Volume-aware Diffusion for Controllable Multi-camera Driving Scene Generation
Jiachen Lu, Ze Huang, Zeyu Yang et al.
EMIE-MAP: Large-Scale Road Surface Reconstruction Based on Explicit Mesh and Implicit Encoding
Wenhua Wu, Qi Wang, Guangming Wang et al.
HyperSpaceX: Radial and Angular Exploration of HyperSpherical Dimensions
Chiranjeev Chiranjeev, Muskan Dosi, Kartik Thakral et al.
Nonverbal Interaction Detection
Jianan Wei, Tianfei Zhou, Yi Yang et al.
PoseSOR: Human Pose Can Guide Our Attention
Huankang Guan, Rynson W.H. Lau
Generalized Coverage for More Robust Low-Budget Active Learning
Wonho Bae, Junhyug Noh, Danica J. Sutherland
Track Everything Everywhere Fast and Robustly
Yunzhou Song, Jiahui Lei, Ziyun Wang et al.
Common Sense Reasoning for Deep Fake Detection
Yue Zhang, Ben Colman, Xiao Guo et al.
Safe-CLIP: Removing NSFW Concepts from Vision-and-Language Models
Samuele Poppi, Tobia Poppi, Federico Cocchi et al.
Idling Neurons, Appropriately Lenient Workload During Fine-tuning Leads to Better Generalization
Hongjing Niu, Hanting Li, Bin Li et al.
Improving image synthesis with diffusion-negative sampling
Alakh Desai, Nuno Vasconcelos
WRIM-Net: Wide-Ranging Information Mining Network for Visible-Infrared Person Re-Identification
Yonggan Wu, Ling-Chao Meng, Yuan Zichao et al.
CatchBackdoor: Backdoor Detection via Critical Trojan Neural Path Fuzzing
Haibo Jin, Ruoxi Chen, Jinyin Chen et al.
DriveLM: Driving with Graph Visual Question Answering
Chonghao Sima, Katrin Renz, Kashyap Chitta et al.
ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling
Siming Yan, Min Bai, Weifeng Chen et al.
Region-aware Distribution Contrast: A Novel Approach to Multi-Task Partially Supervised Learning
Meixuan Li, Tianyu Li, Guoqing Wang et al.
Deep Companion Learning: Enhancing Generalization Through Historical Consistency
Ruizhao Zhu, Venkatesh Saligrama
ABC Easy as 123: A Blind Counter for Exemplar-Free Multi-Class Class-agnostic Counting
Michael A Hobley, Victor Adrian Prisacariu
CrossScore: A Multi-View Approach to Image Evaluation and Scoring
Zirui Wang, Wenjing Bian, Victor Adrian Prisacariu
CPM: Class-conditional Prompting Machine for Audio-visual Segmentation
Yuanhong Chen, Chong Wang, Yuyuan Liu et al.
DiffClass: Diffusion-Based Class Incremental Learning
Zichong Meng, Jie Zhang, Changdi Yang et al.
Dual-Rain: Video Rain Removal using Assertive and Gentle Teachers
Tingting Chen, Beibei Lin, Yeying Jin et al.
DGE: Direct Gaussian 3D Editing by Consistent Multi-view Editing
Minghao Chen, Iro Laina, Andrea Vedaldi
Dynamic Neural Radiance Field From Defocused Monocular Video
Xianrui Luo, Huiqiang Sun, Juewen Peng et al.
4Diff: 3D-Aware Diffusion Model for Third-to-First Viewpoint Translation
Feng Cheng, Mi Luo, Huiyu Wang et al.
Chronologically Accurate Retrieval for Temporal Grounding of Motion-Language Models
Kent Fujiwara, Mikihiro Tanaka, Qing Yu
Realistic Human Motion Generation with Cross-Diffusion Models
Zeping Ren, Shaoli Huang, Xiu Li
Diffusion Models as Optimizers for Efficient Planning in Offline RL
Renming Huang, Yunqiang Pei, Guoqing Wang et al.
MERLiN: Single-Shot Material Estimation and Relighting for Photometric Stereo
Ashish Tiwari, Satoshi Ikehata, Shanmuganathan Raman
BI-MDRG: Bridging Image History in Multimodal Dialogue Response Generation
Hee Suk Yoon, Eunseop Yoon, Joshua Tian Jin Tee et al.
Rethinking Few-shot Class-incremental Learning: Learning from Yourself
Yu-Ming Tang, Yi-Xing Peng, Jing-Ke Meng et al.
Attention Decomposition for Cross-Domain Semantic Segmentation
Liqiang He, Sinisa Todorovic
RoGUENeRF: A Robust Geometry-Consistent Universal Enhancer for NeRF
Sibi Catley-Chandar, Richard Shaw, Greg Slabaugh et al.
FuseTeacher: Modality-fused Encoders are Strong Vision Supervisors
Chen-Wei Xie, Siyang Sun, Liming Zhao et al.
MVDD: Multi-View Depth Diffusion Models
Zhen Wang, Qiangeng Xu, Feitong Tan et al.
Wavelet Convolutions for Large Receptive Fields
Shahaf Finder, Roy Amoyal, Eran Treister et al.
Gradient-based Out-of-Distribution Detection
Taha Entesari, Sina Sharifi, Bardia Safaei et al.
Veil Privacy on Visual Data: Concealing Privacy for Humans, Unveiling for DNNs
Shuchao Pang, Ruhao Ma, Bing Li et al.
FYI: Flip Your Images for Dataset Distillation
Byunggwan Son, Youngmin Oh, Donghyeon Baek et al.
SlimFlow: Training Smaller One-Step Diffusion Models with Rectified Flow
Yuanzhi Zhu, Xingchao Liu, Qiang Liu
Simple Unsupervised Knowledge Distillation With Space Similarity
Aditya Singh, Haohan Wang
Efficient Vision Transformers with Partial Attention
Xuan-Thuy Vo, Duy-Linh Nguyen, Adri Priadana et al.
Learning Natural Consistency Representation for Face Forgery Video Detection
Daichi Zhang, Zihao Xiao, Shikun Li et al.
Towards Stable 3D Object Detection
Jiabao Wang, Qiang Meng, Guochao Liu et al.
Revisit Human-Scene Interaction via Space Occupancy
Xinpeng Liu, Haowen Hou, Yanchao Yang et al.
View-Consistent 3D Editing with Gaussian Splatting
Yuxuan Wang, Xuanyu Yi, Zike Wu et al.
HO-Gaussian: Hybrid Optimization of 3D Gaussian Splatting for Urban Scenes
Zhuopeng Li, Yilin Zhang, Chenming Wu et al.
Generating Human Interaction Motions in Scenes with Text Control
Hongwei Yi, Justus Thies, Michael J. Black et al.
Multi-branch Collaborative Learning Network for 3D Visual Grounding
Zhipeng Qian, Yiwei Ma, Zhekai Lin et al.
KeypointDETR: An End-to-End 3D Keypoint Detector
Hairong Jin, Yuefan Shen, Jianwen Lou et al.
Instruction Tuning-free Visual Token Complement for Multimodal LLMs
Dongsheng Wang, Jiequan Cui, Miaoge Li et al.
Improving Point-based Crowd Counting and Localization Based on Auxiliary Point Guidance
I-HSIANG CHEN, Wei-Ting Chen, Yu-Wei Liu et al.
Online Temporal Action Localization with Memory-Augmented Transformer
Youngkil Song, Dongkeun Kim, Minsu Cho et al.
Bayesian Self-Training for Semi-Supervised 3D Segmentation
Ozan Unal, Christos Sakaridis, Luc Van Gool
SC4D: Sparse-Controlled Video-to-4D Generation and Motion Transfer
Zijie Wu, Chaohui Yu, Yanqin Jiang et al.
Deep Reward Supervisions for Tuning Text-to-Image Diffusion Models
Xiaoshi Wu, Yiming Hao, Manyuan Zhang et al.
Fully Authentic Visual Question Answering Dataset from Online Communities
Chongyan Chen, Mengchen Liu, Noel C Codella et al.
Revisit Self-supervision with Local Structure-from-Motion
Shengjie Zhu, Xiaoming Liu
Open-Vocabulary Camouflaged Object Segmentation
Youwei Pang, Xiaoqi Zhao, JiaMing Zuo et al.
Revisiting Feature Disentanglement Strategy in Diffusion Training and Breaking Conditional Independence Assumption in Sampling
Wonwoong Cho, Hareesh Ravi, Midhun Harikumar et al.
TexDreamer: Towards Zero-Shot High-Fidelity 3D Human Texture Generation
Yufei Liu, Junwei Zhu, Junshu Tang et al.
GraphBEV: Towards Robust BEV Feature Alignment for Multi-Modal 3D Object Detection
Ziying Song, Lei Yang, Shaoqing Xu et al.
EDTalk: Efficient Disentanglement for Emotional Talking Head Synthesis
Shuai Tan, Bin Ji, Mengxiao Bi et al.
Put Myself in Your Shoes: Lifting the Egocentric Perspective from Exocentric Videos
Mi Luo, Zihui Xue, Alex Dimakis et al.
SRPose: Two-view Relative Pose Estimation with Sparse Keypoints
Rui Yin, Yulun Zhang, Zherong Pan et al.
LivePhoto: Real Image Animation with Text-guided Motion Control
Xi Chen, Zhiheng Liu, Mengting Chen et al.
CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion
Wendi Zheng, Jiayan Teng, Zhuoyi Yang et al.
OAPT: Offset-Aware Partition Transformer for Double JPEG Artifacts Removal
Qiao Mo, Yukang Ding, Jinhua Hao et al.
Context-Aware Action Recognition: Introducing a Comprehensive Dataset for Behavior Contrast
Tatsuya Sasaki, Yoshiki Ito, Satoshi Kondo
Privacy-Preserving Adaptive Re-Identification without Image Transfer
Hamza Rami, Jhony H. Giraldo, Nicolas Winckler et al.
FRI-Net: Floorplan Reconstruction via Room-wise Implicit Representation
Honghao Xu, Juzhan Xu, Zeyu Huang et al.
DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors
Jinbo Xing, Menghan Xia, Yong Zhang et al.
Bridging Synthetic and Real Worlds for Pre-training Scene Text Detectors
Tongkun Guan, Wei Shen, Xue Yang et al.
Motion Aware Event Representation-driven Image Deblurring
Zhijing Sun, Xueyang Fu, Longzhuo Huang et al.
OLAF: A Plug-and-Play Framework for Enhanced Multi-object Multi-part Scene Parsing
Pranav Gupta, Rishubh Singh, Pradeep Shenoy et al.
Let the Avatar Talk using Texts without Paired Training Data
Xiuzhe Wu, Yang-Tian Sun, Handi Chen et al.
Uncertainty Calibration with Energy Based Instance-wise Scaling in the Wild Dataset
Mijoo Kim, Junseok Kwon
Attention Beats Linear for Fast Implicit Neural Representation Generation
Shuyi Zhang, Ke Liu, Jingjun Gu et al.
Prompt-Based Test-Time Real Image Dehazing: A Novel Pipeline
Zixuan Chen, Zewei He, Ziqian Lu et al.
Category Adaptation Meets Projected Distillation in Generalized Continual Category Discovery
Grzegorz Rypesc, Daniel Marczak, Sebastian Cygert et al.
3D Hand Sequence Recovery from Real Blurry Images and Event Stream
Joonkyu Park, Gyeongsik Moon, Weipeng Xu et al.
SG-NeRF: Neural Surface Reconstruction with Scene Graph Optimization
Yiyang Chen, Siyan Dong, Xulong Wang et al.
Segmentation-guided Layer-wise Image Vectorization with Gradient Fills
Hengyu Zhou, Hui Zhang, Bin Wang
ST-LDM: A Universal Framework for Text-Grounded Object Generation in Real Images
Xiangtian Xue, Jiasong Wu, Youyong Kong et al.