Most Cited ECCV "pseudo-supervision generation" Papers
2,387 papers found • Page 5 of 12
Conference
Event Camera Data Dense Pre-training
Yan Yang, Liyuan Pan, Liu liu
Just a Hint: Point-Supervised Camouflaged Object Detection
Huafeng Chen, Dian SHAO, Guangqian Guo et al.
PLOT: Text-based Person Search with Part Slot Attention for Corresponding Part Discovery
Jicheol Park, Dongwon Kim, Boseung Jeong et al.
EmoTalk3D: High-Fidelity Free-View Synthesis of Emotional 3D Talking Head
Qianyun He, Xinya Ji, Yicheng Gong et al.
Visual Text Generation in the Wild
Yuanzhi Zhu, Jiawei Liu, Feiyu Gao et al.
Improving Geo-diversity of Generated Images with Contextualized Vendi Score Guidance
Reyhane Askari Hemmat, Melissa Hall, Alicia Yi Sun et al.
HYPE: Hyperbolic Entailment Filtering for Underspecified Images and Texts
Wonjae Kim, Sanghyuk Chun, Taekyung Kim et al.
Open Panoramic Segmentation
Junwei Zheng, Ruiping Liu, Yufan Chen et al.
FRDiff : Feature Reuse for Universal Training-free Acceleration of Diffusion Models
Junhyuk So, Jungwon Lee, Eunhyeok Park
Animal Avatars: Reconstructing Animatable 3D Animals from Casual Videos
Remy Sabathier, David Novotny, Niloy Mitra
Unveiling Typographic Deceptions: Insights of the Typographic Vulnerability in Large Vision-Language Models
Hao Cheng, Erjia Xiao, Jindong Gu et al.
MaxFusion: Plug&Play Multi-Modal Generation in Text-to-Image Diffusion Models
Nithin Gopalakrishnan Nair, Jeya Maria Jose Valanarasu, Vishal Patel
Learning Camouflaged Object Detection from Noisy Pseudo Label
Jin Zhang, Ruiheng Zhang, Yanjiao Shi et al.
SEGIC: Unleashing the Emergent Correspondence for In-Context Segmentation
Lingchen Meng, Shiyi Lan, Hengduo Li et al.
UniProcessor: A Text-induced Unified Low-level Image Processor
Huiyu Duan, Xiongkuo Min, Sijing Wu et al.
Reinforcement Learning Friendly Vision-Language Model for Minecraft
Haobin Jiang, Junpeng Yue, Hao Luo et al.
Temporal Event Stereo via Joint Learning with Stereoscopic Flow
Hoonhee Cho, Jae-young Kang, Kuk-Jin Yoon
Enhancing Source-Free Domain Adaptive Object Detection with Low-confidence Pseudo Label Distillation
Ilhoon Yoon, Hyeongjun Kwon, Jin Kim et al.
The First to Know: How Token Distributions Reveal Hidden Knowledge in Large Vision-Language Models?
Qinyu Zhao, Ming Xu, Kartik Gupta et al.
Dynamic Retraining-Updating Mean Teacher for Source-Free Object Detection
BA KHANH TRINH LE, Huy-Hung Nguyen, Long Hoang Pham et al.
Instant 3D Human Avatar Generation using Image Diffusion Models
Nikos Kolotouros, Thiemo Alldieck, Enric Corona et al.
NeuSDFusion: A Spatial-Aware Generative Model for 3D Shape Completion, Reconstruction, and Generation
Ruikai Cui, Weizhe Liu, Weixuan Sun et al.
General and Task-Oriented Video Segmentation
Mu Chen, Liulei Li, Wenguan Wang et al.
Generalizable Facial Expression Recognition
Yuhang Zhang, Xiuqi Zheng, Chenyi Liang et al.
EntAugment: Entropy-Driven Adaptive Data Augmentation Framework for Image Classification
Suorong Yang, Furao Shen, Jian Zhao
CLIFF: Continual Latent Diffusion for Open-Vocabulary Object Detection
Wuyang Li, Xinyu Liu, Jiayi Ma et al.
Controlling the World by Sleight of Hand
Sruthi Sudhakar, Ruoshi Liu, Basile Van Hoorick et al.
EvSign: Sign Language Recognition and Translation with Streaming Events
Pengyu Zhang, Hao Yin, Zeren Wang et al.
InstaStyle: Inversion Noise of a Stylized Image is Secretly a Style Adviser
Xing Cui, Zekun Li, Peipei Li et al.
ScanTalk: 3D Talking Heads from Unregistered Scans
Federico Nocentini, Thomas Besnier, Claudio Ferrari et al.
MoEAD: A Parameter-efficient Model for Multi-class Anomaly Detection
Shiyuan Meng, Wenchao Meng, Qihang Zhou et al.
Track Everything Everywhere Fast and Robustly
Yunzhou Song, Jiahui Lei, Ziyun Wang et al.
TetraDiffusion: Tetrahedral Diffusion Models for 3D Shape Generation
Nikolai Kalischek, Torben Peters, Jan Dirk Wegner et al.
GRIDS: Grouped Multiple-Degradation Restoration with Image Degradation Similarity
Shuo Cao, Yihao Liu, Wenlong Zhang et al.
Multi-Label Cluster Discrimination for Visual Representation Learning
Xiang An, Kaicheng Yang, Xiangzi Dai et al.
Mono-ViFI: A Unified Learning Framework for Self-supervised Single- and Multi-frame Monocular Depth Estimation
Jinfeng Liu, Lingtong Kong, Bo Li et al.
DGD: Dynamic 3D Gaussians Distillation
Isaac Labe, Noam Issachar, Itai Lang et al.
Kalman-Inspired Feature Propagation for Video Face Super-Resolution
Ruicheng Feng, Chongyi Li, Chen Change Loy
UniCode : Learning a Unified Codebook for Multimodal Large Language Models
Sipeng Zheng, Bohan Zhou, Yicheng Feng et al.
Foster Adaptivity and Balance in Learning with Noisy Labels
Mengmeng Sheng, Zeren Sun, Tao Chen et al.
Long-term Temporal Context Gathering for Neural Video Compression
Linfeng Qi, Zhaoyang Jia, Jiahao Li et al.
Non-Exemplar Domain Incremental Learning via Cross-Domain Concept Integration
Qiang Wang, Yuhang He, Songlin Dong et al.
Free-VSC: Free Semantics from Visual Foundation Models for Unsupervised Video Semantic Compression
Yuan Tian, Guo Lu, Guangtao Zhai
SIGMA: Sinkhorn-Guided Masked Video Modeling
Mohammadreza Salehi, Michael Dorkenwald, Fida Mohammad Thoker et al.
X-Pose: Detecting Any Keypoints
Jie Yang, AILING ZENG, Ruimao Zhang et al.
Adaptive Compressed Sensing with Diffusion-Based Posterior Sampling
Noam Elata, Tomer Michaeli, Michael Elad
FineMatch: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction
Hang Hua, Jing Shi, Kushal Kafle et al.
AttnZero: Efficient Attention Discovery for Vision Transformers
Lujun Li, Zimian Wei, Peijie Dong et al.
Referring Atomic Video Action Recognition
Kunyu Peng, Jia Fu, Kailun Yang et al.
Where am I? Scene Retrieval with Language
Jiaqi Chen, Daniel Barath, Iro Armeni et al.
MoVideo: Motion-Aware Video Generation with Diffusion Models
Jingyun Liang, Yuchen Fan, Kai Zhang et al.
Finding Visual Task Vectors
Alberto Hojel, Yutong Bai, Trevor Darrell et al.
Vision-Language Action Knowledge Learning for Semantic-Aware Action Quality Assessment
Huangbiao Xu, Xiao Ke, Yuezhou Li et al.
Parrot Captions Teach CLIP to Spot Text
Yiqi Lin, Conghui He, Alex Jinpeng Wang et al.
Free-Editor: Zero-shot Text-driven 3D Scene Editing
Md Nazmul Karim, Hasan Iqbal, Umar Khalid et al.
Robust Multimodal Learning via Representation Decoupling
Shicai Wei, Yang Luo, Yuji Wang et al.
Tree-D Fusion: Simulation-Ready Tree Dataset from Single Images with Diffusion Priors
Jae Joong Lee, Bosheng Li, Sara Beery et al.
Reinforcement Learning Meets Visual Odometry
Nico Messikommer, Giovanni Cioffi, Mathias Gehrig et al.
Iterative Ensemble Training with Anti-Gradient Control for Mitigating Memorization in Diffusion Models
Xiao Liu, Xiaoliu Guan, Yu Wu et al.
Find n' Propagate: Open-Vocabulary 3D Object Detection in Urban Environments
Djamahl Etchegaray, Zi Helen Huang, Tatsuya Harada et al.
3D Open-Vocabulary Panoptic Segmentation with 2D-3D Vision-Language Distillation
Zihao Xiao, Longlong Jing, Shangxuan Wu et al.
ReALFRED: An Embodied Instruction Following Benchmark in Photo-Realistic Environments
Taewoong Kim, Cheolhong Min, Byeonghwi Kim et al.
IMMA: Immunizing text-to-image Models against Malicious Adaptation
Amber Yijia Zheng, Raymond Yeh
LEIA: Latent View-invariant Embeddings for Implicit 3D Articulation
Archana Swaminathan, Anubhav Anubhav, Kamal Gupta et al.
ConceptExpress: Harnessing Diffusion Models for Single-image Unsupervised Concept Extraction
Shaozhe Hao, Kai Han, Zhengyao Lv et al.
LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer
Ning Yu, Chia-Chih Chen, Zeyuan Chen et al.
ControlCap: Controllable Region-level Captioning
Yuzhong Zhao, Liu Yue, Zonghao Guo et al.
HUMOS: Human Motion Model Conditioned on Body Shape
Shashank Tripathi, Omid Taheri, Christoph Lassner et al.
Rate-Distortion-Cognition Controllable Versatile Neural Image Compression
Jinming Liu, Ruoyu Feng, Yunpeng Qi et al.
Norface: Improving Facial Expression Analysis by Identity Normalization
Hanwei Liu, Rudong An, Zhimeng Zhang et al.
RoDUS: Robust Decomposition of Static and Dynamic Elements in Urban Scenes
Thang-Anh-Quan Nguyen, Luis G Roldao Jimenez, Nathan Piasco et al.
PointNeRF++: A multi-scale, point-based Neural Radiance Field
Weiwei Sun, Eduard Trulls, Yang-Che Tseng et al.
Empowering Embodied Visual Tracking with Visual Foundation Models and Offline RL
Fangwei Zhong, Kui Wu, Hai Ci et al.
Neural Volumetric World Models for Autonomous Driving
Zanming Huang, Jimuyang Zhang, Eshed Ohn-Bar
Event-Adapted Video Super-Resolution
Zeyu Xiao, Dachun Kai, Yueyi Zhang et al.
Bridge Past and Future: Overcoming Information Asymmetry in Incremental Object Detection
QIJIE MO, Yipeng Gao, Shenghao Fu et al.
SA-DVAE: Improving Zero-Shot Skeleton-Based Action Recognition by Disentangled Variational Autoencoders
Sheng-Wei Li, Zi-Xiang Wei, Wei-Jie Jack Chen et al.
Grounding Language Models for Visual Entity Recognition
Zilin Xiao, Ming Gong, Paola Cascante-Bonilla et al.
PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects
Junyi Li, Junfeng Wu, Weizhi Zhao et al.
Representing Topological Self-Similarity Using Fractal Feature Maps for Accurate Segmentation of Tubular Structures
Jiaxing Huang, Yanfeng Zhou, Yaoru Luo et al.
ChEX: Interactive Localization and Region Description in Chest X-rays
Philip Müller, Georgios Kaissis, Daniel Rueckert
DreamMover: Leveraging the Prior of Diffusion Models for Image Interpolation with Large Motion
Liao Shen, Tianqi Liu, Huiqiang Sun et al.
ViC-MAE: Self-Supervised Representation Learning from Images and Video with Contrastive Masked Autoencoders
Jefferson Hernandez, Ruben Villegas, Vicente Ordonez
Turbo: Informativity-Driven Acceleration Plug-In for Vision-Language Large Models
Chen Ju, Haicheng Wang, Haozhe Cheng et al.
Learning Representations of Satellite Images From Metadata Supervision
Jules Bourcier, Gohar Dashyan, Karteek Alahari et al.
Rethinking Features-Fused-Pyramid-Neck for Object Detection
Hulin Li
How to Train the Teacher Model for Effective Knowledge Distillation
Shayan Mohajer Hamidi, Xizhen Deng, Renhao Tan et al.
MagicEraser: Erasing Any Objects via Semantics-Aware Control
FAN LI, Zixiao Zhang, Yi Huang et al.
Improving Feature Stability during Upsampling -- Spectral Artifacts and the Importance of Spatial Context
Shashank Agnihotri, Julia Grabinski, Margret Keuper
Projecting Points to Axes: Oriented Object Detection via Point-Axis Representation
Zeyang Zhao, Qilong Xue, Yifan Bai et al.
On the Utility of 3D Hand Poses for Action Recognition
Md Salman Shamil, Dibyadip Chatterjee, Fadime Sener et al.
Semi-supervised Segmentation of Histopathology Images with Noise-Aware Topological Consistency
Meilong Xu, Xiaoling Hu, Saumya Gupta et al.
Teddy: Efficient Large-Scale Dataset Distillation via Taylor-Approximated Matching
Ruonan Yu, Songhua Liu, Jingwen Ye et al.
3iGS: Factorised Tensorial Illumination for 3D Gaussian Splatting
Zhe Jun Tang, Tat-Jen Cham
Editable Image Elements for Controllable Synthesis
Jiteng Mu, Michael Gharbi, Richard Zhang et al.
Early Preparation Pays Off: New Classifier Pre-tuning for Class Incremental Semantic Segmentation
Zhengyuan Xie, Haiquan Lu, Jia-wen Xiao et al.
Few-shot Defect Image Generation based on Consistency Modeling
Qingfeng Shi, Jing Wei, Fei Shen et al.
OTSeg: Multi-prompt Sinkhorn Attention for Zero-Shot Semantic Segmentation
Kwanyoung Kim, Yujin Oh, Jong Chul Ye
Boosting 3D Single Object Tracking with 2D Matching Distillation and 3D Pre-training
qiangqiang wu, Yan Xia, Jia Wan et al.
BKDSNN: Enhancing the Performance of Learning-based Spiking Neural Networks Training with Blurred Knowledge Distillation
Zekai Xu, Kang You, Qinghai Guo et al.
InstructGIE: Towards Generalizable Image Editing
Zichong Meng, Changdi Yang, Jun Liu et al.
Brain-ID: Learning Contrast-agnostic Anatomical Representations for Brain Imaging
Peirong Liu, Oula Puonti, Xiaoling Hu et al.
Local Action-Guided Motion Diffusion Model for Text-to-Motion Generation
Peng Jin, Hao Li, Zesen Cheng et al.
Forest2Seq: Revitalizing Order Prior for Sequential Indoor Scene Synthesis
Qi Sun, Hang Zhou, Wengang Zhou et al.
Unlocking Textual and Visual Wisdom: Open-Vocabulary 3D Object Detection Enhanced by Comprehensive Guidance from Text and Image
Pengkun Jiao, Na Zhao, Jingjing Chen et al.
Fairness-aware Vision Transformer via Debiased Self-Attention
Yao Qiang, Chengyin Li, Prashant Khanduri et al.
Open-Vocabulary Camouflaged Object Segmentation
Youwei Pang, Xiaoqi Zhao, JiaMing Zuo et al.
Geospecific View Generation - Geometry-Context Aware High-resolution Ground View Inference from Satellite Views
Ningli Xu, Rongjun Qin
STAMP: Outlier-Aware Test-Time Adaptation with Stable Memory Replay
Yu Yongcan, Lijun Sheng, Ran He et al.
WildRefer: 3D Object Localization in Large-scale Dynamic Scenes with Multi-modal Visual Data and Natural Language
Zhenxiang Lin, Xidong Peng, peishan cong et al.
Towards Image Ambient Lighting Normalization
Florin-Alexandru Vasluianu, Tim Seizinger, Zongwei Wu et al.
Harnessing Text-to-Image Diffusion Models for Category-Agnostic Pose Estimation
Duo Peng, Zhengbo Zhang, Ping Hu et al.
ReSyncer: Rewiring Style-based Generator for Unified Audio-Visually Synced Facial Performer
Jiazhi Guan, Zhiliang Xu, Hang Zhou et al.
GENIXER: Empowering Multimodal Large Language Models as a Powerful Data Generator
Hengyuan Zhao, Pan Zhou, Mike Zheng Shou
BAFFLE: A Baseline of Backpropagation-Free Federated Learning
Haozhe Feng, Tianyu Pang, Chao Du et al.
UMG-CLIP: A Unified Multi-Granularity Vision Generalist for Open-World Understanding
Bowen Shi, Peisen Zhao, Zichen Wang et al.
TrackNeRF: Bundle Adjusting NeRF from Sparse and Noisy Views via Feature Tracks
Jinjie Mai, Wenxuan Zhu, Sara Rojas Martinez et al.
Can OOD Object Detectors Learn from Foundation Models?
Jiahui Liu, Xin Wen, Shizhen Zhao et al.
3DGazeNet: Generalizing Gaze Estimation with Weak Supervision from Synthetic Views
Evangelos Ververas, Polydefkis Gkagkos, Jiankang Deng et al.
Rethinking Data Augmentation for Robust LiDAR Semantic Segmentation in Adverse Weather
Junsung Park, Kyungmin Kim, Hyunjung Shim
MultiDelete for Multimodal Machine Unlearning
Jiali Cheng, Hadi Amiri
DoughNet: A Visual Predictive Model for Topological Manipulation of Deformable Objects
Dominik Bauer, Zhenjia Xu, Shuran Song
Missing Modality Prediction for Unpaired Multimodal Learning via Joint Embedding of Unimodal Models
Taesup Kim, Donggeun Kim
Can Textual Semantics Mitigate Sounding Object Segmentation Preference?
Yaoting Wang, Peiwen Sun, Yuanchao Li et al.
Audio-driven Talking Face Generation with Stabilized Synchronization Loss
Dogucan Yaman, Fevziye Irem Eyiokur Yaman, Leonard Bärmann et al.
Guide-and-Rescale: Self-Guidance Mechanism for Effective Tuning-Free Real Image Editing
Vadim Titov, Madina Khalmatova, Alexandra Ivanova et al.
OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models
Zijian Zhou, Zheng Zhu, Holger Caesar et al.
DiffuMatting: Synthesizing Arbitrary Objects with Matting-level Annotation
Xiaobin Hu, Xu Peng, Donghao Luo et al.
Every Pixel Has its Moments: Ultra-High-Resolution Unpaired Image-to-Image Translation via Dense Normalization
Ming-Yang Ho, Che-Ming Wu, Min-Sheng Wu et al.
SAMFusion: Sensor-Adaptive Multimodal Fusion for 3D Object Detection in Adverse Weather
Edoardo Palladin, Roland Dietze, Praveen Narayanan et al.
DualBEV: Unifying Dual View Transformation with Probabilistic Correspondences
Peidong Li, Wancheng Shen, Qihao Huang et al.
Leveraging Near-Field Lighting for Monocular Depth Estimation from Endoscopy Videos
Akshay Paruchuri, Samuel Ehrenstein, Shuxian Wang et al.
ScatterFormer: Efficient Voxel Transformer with Scattered Linear Attention
Chenhang He, Ruihuang Li, Guowen Zhang et al.
CrossGLG: LLM Guides One-shot Skeleton-based 3D Action Recognition in a Cross-level Manner
Tingbing Yan, Wenzheng Zeng, Yang Xiao et al.
Self-Guided Generation of Minority Samples Using Diffusion Models
Soobin Um, Jong Chul Ye
BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues
Sara Sarto, Marcella Cornia, Lorenzo Baraldi et al.
Embedding-Free Transformer with Inference Spatial Reduction for Efficient Semantic Segmentation
Hyunwoo Yu, Yubin Cho, Beoungwoo Kang et al.
Training-Free Model Merging for Multi-target Domain Adaptation
Wenyi Li, Huan-ang Gao, Mingju Gao et al.
Adapting Fine-Grained Cross-View Localization to Areas without Fine Ground Truth
Zimin Xia, Yujiao Shi, HONGDONG LI et al.
SNeRV: Spectra-preserving Neural Representation for Video
Jina Kim, Jihoo Lee, Jewon Kang
Monocular Occupancy Prediction for Scalable Indoor Scenes
Hongxiao Yu, Yuqi Wang, Yuntao Chen et al.
DiffFAS: Face Anti-Spoofing via Generative Diffusion Models
Xinxu Ge, Xin Liu, Zitong Yu et al.
Which Model Generated This Image? A Model-Agnostic Approach for Origin Attribution
Fengyuan Liu, Haochen Luo, Yiming Li et al.
FreeCompose: Generic Zero-Shot Image Composition with Diffusion Prior
Zhekai Chen, Wen Wang, Zhen Yang et al.
RGNet: A Unified Clip Retrieval and Grounding Network for Long Videos
Tanveer Hannan, Mohaiminul Islam, Thomas Seidl et al.
Global Counterfactual Directions
Bartlomiej Sobieski, Przemyslaw Biecek
Real Appearance Modeling for More General Deepfake Detection
Jiahe Tian, Yu Cai, Xi Wang et al.
PanoFree: Tuning-Free Holistic Multi-view Image Generation with Cross-view Self-Guidance
Aoming Liu, Zhong Li, Zhang Chen et al.
Kernel Diffusion: An Alternate Approach to Blind Deconvolution
Yash Sanghvi, Yiheng Chi, Stanley Chan
C2C: Component-to-Composition Learning for Zero-Shot Compositional Action Recognition
Rongchang Li, Zhenhua Feng, Tianyang Xu et al.
Dense Hand-Object(HO) GraspNet with Full Grasping Taxonomy and Dynamics
Woojin Cho, Jihyun Lee, Minjae Yi et al.
UpFusion: Novel View Diffusion from Unposed Sparse View Observations
Bharath Raj Nagoor Kani, Hsin-Ying Lee, Sergey Tulyakov et al.
Deciphering the Role of Representation Disentanglement: Investigating Compositional Generalization in CLIP Models
Reza Abbasi, Mohammad Rohban, Mahdieh Soleymani Baghshah
SINDER: Repairing the Singular Defects of DINOv2
Haoqi Wang, Tong Zhang, Mathieu Salzmann
CarFormer: Self-Driving with Learned Object-Centric Representations
Shadi Hamdan, Fatma Guney
DeTra: A Unified Model for Object Detection and Trajectory Forecasting
Sergio Casas, Ben T Agro, Jiageng Mao et al.
OPEN: Object-wise Position Embedding for Multi-view 3D Object Detection
Jinghua Hou, Tong Wang, Xiaoqing Ye et al.
Towards Real-World Adverse Weather Image Restoration: Enhancing Clearness and Semantics with Vision-Language Models
Jiaqi Xu, Mengyang Wu, Xiaowei Hu et al.
Mitigating Background Shift in Class-Incremental Semantic Segmentation
gilhan Park, WonJun Moon, SuBeen Lee et al.
Learning Video Context as Interleaved Multimodal Sequences
Qinghong Lin, Pengchuan Zhang, Difei Gao et al.
Skeleton-based Group Activity Recognition via Spatial-Temporal Panoramic Graph
Zhengcen Li, Xinle Chang, Yueran Li et al.
UCIP: A Universal Framework for Compressed Image Super-Resolution using Dynamic Prompt
Xin Li, Bingchen Li, Yeying Jin et al.
∞-Brush: Controllable Large Image Synthesis with Diffusion Models in Infinite Dimensions
Minh Quan Le, Alexandros Graikos, Srikar Yellapragada et al.
CamoTeacher: Dual-Rotation Consistency Learning for Semi-Supervised Camouflaged Object Detection
Xunfa Lai, Zhiyu Yang, Jie Hu et al.
Closed-Loop Unsupervised Representation Disentanglement with $\beta$-VAE Distillation and Diffusion Probabilistic Feedback
Xin Jin, Bohan Li, Baao Xie et al.
DPA-Net: Structured 3D Abstraction from Sparse Views via Differentiable Primitive Assembly
Fenggen Yu, Yiming Qian, Xu Zhang et al.
Unleashing the Potential of the Semantic Latent Space in Diffusion Models for Image Dehazing
Zizheng Yang, Hu Yu, Bing Li et al.
RICA^2: Rubric-Informed, Calibrated Assessment of Actions
Abrar Majeedi, Viswanatha Reddy Gajjala, Satya Sai Srinath Namburi GNVV et al.
Modeling and Driving Human Body Soundfields through Acoustic Primitives
Chao Huang, Dejan Markovic, Chenliang Xu et al.
COHO: Context-Sensitive City-Scale Hierarchical Urban Layout Generation
Liu He, Daniel Aliaga
Multi-modal Crowd Counting via a Broker Modality
Haoliang Meng, Xiaopeng Hong, Chenhao Wang et al.
Lego: Learning to Disentangle and Invert Personalized Concepts Beyond Object Appearance in Text-to-Image Diffusion Models
Saman Motamed, Danda Pani Paudel, Luc Van Gool
Bridging the Gap Between Human Motion and Action Semantics via Kinematics Phrases
Xinpeng Liu, Yong-Lu Li, AILING ZENG et al.
Eliminating Warping Shakes for Unsupervised Online Video Stitching
Lang Nie, Chunyu Lin, Kang Liao et al.
MICDrop: Masking Image and Depth Features via Complementary Dropout for Domain-Adaptive Semantic Segmentation
Linyan Yang, Lukas Hoyer, Mark Weber et al.
Mixture of Efficient Diffusion Experts Through Automatic Interval and Sub-Network Selection
Alireza Ganjdanesh, Yan Kang, Yuchen Liu et al.
Realistic Human Motion Generation with Cross-Diffusion Models
Zeping Ren, Shaoli Huang, Xiu Li
3x2: 3D Object Part Segmentation by 2D Semantic Correspondences
Anh Thai, Weiyao Wang, Hao Tang et al.
Learn to Preserve and Diversify: Parameter-Efficient Group with Orthogonal Regularization for Domain Generalization
Jiajun Hu, Jian Zhang, Lei Qi et al.
CMTA: Cross-Modal Temporal Alignment for Event-guided Video Deblurring
Taewoo Kim, Hoonhee Cho, Kuk-Jin Yoon
PanoVOS: Bridging Non-panoramic and Panoramic Views with Transformer for Video Segmentation
Shilin Yan, Xiaohao Xu, Renrui Zhang et al.
Benchmarking Spurious Bias in Few-Shot Image Classifiers
Guangtao Zheng, Wenqian Ye, Aidong Zhang
Towards Reliable Evaluation and Fast Training of Robust Semantic Segmentation Models
Francesco Croce, Naman D. Singh, Matthias Hein
The Lottery Ticket Hypothesis in Denoising: Towards Semantic-Driven Initialization
Jiafeng Mao, Xueting Wang, Kiyoharu Aizawa
Multi-Sentence Grounding for Long-term Instructional Video
Zeqian Li, QIRUI CHEN, Tengda Han et al.
Temporally Consistent Stereo Matching
Jiaxi Zeng, Chengtang Yao, Yuwei Wu et al.
Contrastive ground-level image and remote sensing pre-training improves representation learning for natural world imagery
Andy V Huynh, Lauren Gillespie, Jael Lopez-Saucedo et al.
ReLoo: Reconstructing Humans Dressed in Loose Garments from Monocular Video in the Wild
Chen Guo, Tianjian Jiang, Manuel Kaufmann et al.
FutureDepth: Learning to Predict the Future Improves Video Depth Estimation
Rajeev Yasarla, Manish Kumar Singh, Hong Cai et al.
LabelDistill: Label-guided Cross-modal Knowledge Distillation for Camera-based 3D Object Detection
Sanmin Kim, Youngseok Kim, Sihwan Hwang et al.
DreamDiffusion: High-Quality EEG-to-Image Generation with Temporal Masked Signal Modeling and CLIP Alignment
Yunpeng Bai, Xintao Wang, Yanpei Cao et al.
Pseudo-RIS: Distinctive Pseudo-supervision Generation for Referring Image Segmentation
Seonghoon Yu, Paul Hongsuck Seo, Jeany Son
Explorative Inbetweening of Time and Space
Haiwen Feng, Zheng Ding, Zhihao Xia et al.
Physical-Based Event Camera Simulator
Haiqian Han, Jiacheng Lyu, Jianing Li et al.
Weakly-supervised Camera Localization by Ground-to-satellite Image Registration
Yujiao Shi, HONGDONG LI, Akhil Perincherry et al.
Salience-Based Adaptive Masking: Revisiting Token Dynamics for Enhanced Pre-training
Hyesong Choi, Hyejin Park, Kwang Moo Yi et al.
Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models
Xiaoyu Zhu, Hao Zhou, Pengfei Xing et al.
FreeDiff: Progressive Frequency Truncation for Image Editing with Diffusion Models
Wei WU, Qingnan Fan, Shuai Qin et al.
MacDiff: Unified Skeleton Modeling with Masked Conditional Diffusion
Lehong Wu, Lilang Lin, Jiahang Zhang et al.
TimeCraft: Navigate Weakly-Supervised Temporal Grounded Video Question Answering via Bi-directional Reasoning
Huabin Liu, Xiao Ma, Cheng Zhong et al.