Josef Sivic
48
Papers
1,144
Total Citations
Papers (48)
Is Object Localization for Free? - Weakly-Supervised Learning With Convolutional Neural Networks
CVPR 2015
922
citations
Efficient Neighbourhood Consensus Networks via Submanifold Sparse Convolutions
ECCV 2020
192
citations
Learning to design protein-protein interactions with enhanced generalization
ICLR 2024
25
citations
Learning to engineer protein flexibility
ICLR 2025
4
citations
Improving Personalized Search with Regularized Low-Rank Parameter Updates
CVPR 2025
1
citation
24/7 Place Recognition by View Synthesis
CVPR 2015
0
citations
On Pairwise Costs for Network Flow Multi-Object Tracking
CVPR 2015
0
citations
Unsupervised Learning From Narrated Instruction Videos
CVPR 2016
0
citations
NetVLAD: CNN Architecture for Weakly Supervised Place Recognition
CVPR 2016
0
citations
ActionVLAD: Learning Spatio-Temporal Aggregation for Action Classification
CVPR 2017
0
citations
Are Large-Scale 3D Models Really Necessary for Accurate Visual Localization?
CVPR 2017
0
citations
Convolutional Neural Network Architecture for Geometric Matching
CVPR 2017
0
citations
End-to-End Weakly-Supervised Semantic Alignment
CVPR 2018
0
citations
InLoc: Indoor Visual Localization With Dense Matching and View Synthesis
CVPR 2018
0
citations
Benchmarking 6DOF Outdoor Visual Localization in Changing Conditions
CVPR 2018
0
citations
Cross-Task Weakly Supervised Learning From Instructional Videos
CVPR 2019
0
citations
D2-Net: A Trainable CNN for Joint Description and Detection of Local Features
CVPR 2019
0
citations
Estimating 3D Motion and Forces of Person-Object Interactions From Monocular Video
CVPR 2019
0
citations
End-to-End Learning of Visual Representations From Uncurated Instructional Videos
CVPR 2020
0
citations
Single-View Robot Pose and Joint Angle Estimation via Render & Compare
CVPR 2021
0
citations
Thinking Fast and Slow: Efficient Text-to-Visual Retrieval With Transformers
CVPR 2021
0
citations
Look for the Change: Learning Object States and State-Modifying Actions From Untrimmed Web Videos
CVPR 2022
0
citations
Focal Length and Object Pose Estimation via Render and Compare
CVPR 2022
0
citations
TubeDETR: Spatio-Temporal Video Grounding With Transformers
CVPR 2022
0
citations
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning
CVPR 2023
0
citations
Meta-Personalizing Vision-Language Models To Find Named Instances in Video
CVPR 2023
0
citations
Language-Guided Music Recommendation for Video via Prompt Analogies
CVPR 2023
0
citations
Joint Discovery of Object States and Manipulation Actions
ICCV 2017
0
citations
Weakly-Supervised Learning of Visual Relations
ICCV 2017
0
citations
Learning From Video and Text via Large-Scale Discriminative Clustering
ICCV 2017
0
citations
Localizing Moments in Video With Natural Language
ICCV 2017
0
citations
ShowHowTo: Generating Scene-Conditioned Step-by-Step Visual Instructions
CVPR 2025
0
citations
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips
ICCV 2019
0
citations
Is This the Right Place? Geometric-Semantic Pose Verification for Indoor Visual Localization
ICCV 2019
0
citations
Just Ask: Learning To Answer Questions From Millions of Narrated Videos
ICCV 2021
0
citations
Weakly Supervised Human-Object Interaction Detection in Video via Contrastive Spatiotemporal Regions
ICCV 2021
0
citations
CosyPose: Consistent multi-view multi-object 6D pose estimation
ECCV 2020
0
citations
Learning Actionness via Long-range Temporal Order Verification
ECCV 2020
0
citations
Drive&Segment: Unsupervised Semantic Segmentation of Urban Scenes via Cross-Modal Distillation
ECCV 2022
0
citations
Detecting Unseen Visual Relations Using Analogies
ICCV 2019
0
citations
Discovering Divergent Representations between Text-to-Image Models
ICCV 2025
0
citations
Large-scale Pre-training for Grounded Video Caption Generation
ICCV 2025
0
citations
ResidualViT for Efficient Temporally Dense Video Encoding
ICCV 2025
0
citations
GenHowTo: Learning to Generate Actions and State Transformations from Instructional Videos
CVPR 2024
0
citations
Neighbourhood Consensus Networks
NeurIPS 2018
0
citations
Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
NeurIPS 2022
0
citations
VidChapters-7M: Video Chapters at Scale
NeurIPS 2023
0
citations
POP-3D: Open-Vocabulary 3D Occupancy Prediction from Images
NeurIPS 2023
0
citations