Josef Sivic
26 papers, 228 total citations
papers (26)
Efficient Neighbourhood Consensus Networks via Submanifold Sparse Convolutions (ECCV 2020): 192 citations
Learning to design protein-protein interactions with enhanced generalization (ICLR 2024, arXiv): 25 citations
ShowHowTo: Generating Scene-Conditioned Step-by-Step Visual Instructions (CVPR 2025, arXiv): 6 citations
Learning to engineer protein flexibility (ICLR 2025, arXiv): 4 citations
Improving Personalized Search with Regularized Low-Rank Parameter Updates (CVPR 2025): 1 citation
Discovering Divergent Representations between Text-to-Image Models (ICCV 2025, arXiv): 0 citations
Large-scale Pre-training for Grounded Video Caption Generation (ICCV 2025, arXiv): 0 citations
ResidualViT for Efficient Temporally Dense Video Encoding (ICCV 2025): 0 citations
GenHowTo: Learning to Generate Actions and State Transformations from Instructional Videos (CVPR 2024): 0 citations
End-to-End Learning of Visual Representations From Uncurated Instructional Videos (CVPR 2020, arXiv): 0 citations
Single-View Robot Pose and Joint Angle Estimation via Render & Compare (CVPR 2021, arXiv): 0 citations
Thinking Fast and Slow: Efficient Text-to-Visual Retrieval With Transformers (CVPR 2021, arXiv): 0 citations
Look for the Change: Learning Object States and State-Modifying Actions From Untrimmed Web Videos (CVPR 2022): 0 citations
Focal Length and Object Pose Estimation via Render and Compare (CVPR 2022): 0 citations
TubeDETR: Spatio-Temporal Video Grounding With Transformers (CVPR 2022, arXiv): 0 citations
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning (CVPR 2023, arXiv): 0 citations
Meta-Personalizing Vision-Language Models To Find Named Instances in Video (CVPR 2023): 0 citations
Language-Guided Music Recommendation for Video via Prompt Analogies (CVPR 2023): 0 citations
Just Ask: Learning To Answer Questions From Millions of Narrated Videos (ICCV 2021, arXiv): 0 citations
Weakly Supervised Human-Object Interaction Detection in Video via Contrastive Spatiotemporal Regions (ICCV 2021, arXiv): 0 citations
CosyPose: Consistent multi-view multi-object 6D pose estimation (ECCV 2020): 0 citations
Learning Actionness via Long-range Temporal Order Verification (ECCV 2020): 0 citations
Drive&Segment: Unsupervised Semantic Segmentation of Urban Scenes via Cross-Modal Distillation (ECCV 2022): 0 citations
Zero-Shot Video Question Answering via Frozen Bidirectional Language Models (NeurIPS 2022, arXiv): 0 citations
VidChapters-7M: Video Chapters at Scale (NeurIPS 2023, arXiv): 0 citations
POP-3D: Open-Vocabulary 3D Occupancy Prediction from Images (NeurIPS 2023, arXiv): 0 citations