Hilde Kuehne
31
Papers
103
Total Citations
Papers (31)
Grounding Everything: Emerging Localization Properties in Vision-Language Transformers
CVPR 2024
74
citations
LeGrad: An Explainability Method for Vision Transformers via Feature Formation Sensitivity
ICCV 2025
24
citations
Teaching VLMs to Localize Specific Objects from In-context Examples
ICCV 2025
2
citations
CAV-MAE Sync: Improving Contrastive Audio-Visual Mask Autoencoders via Fine-Grained Alignment
CVPR 2025
2
citations
Unbiasing through Textual Descriptions: Mitigating Representation Bias in Video Benchmarks
CVPR 2025
1
citations
NeuralNetwork-Viterbi: A Framework for Weakly Supervised Video Learning
CVPR 2018, arXiv
0
citations
Unsupervised Learning of Action Classes With Continuous Temporal Embedding
CVPR 2019
0
citations
Found a Reason for me? Weakly-supervised Grounded Visual Question Answering using Capsules
CVPR 2021, arXiv
0
citations
Everything at Once - Multi-Modal Fusion Transformer for Video Retrieval
CVPR 2022, arXiv
0
citations
Unsupervised Domain Generalization by Learning a Bridge Across Domains
CVPR 2022, arXiv
0
citations
Video Test-Time Adaptation for Action Recognition
CVPR 2023, arXiv
0
citations
Detector-Free Weakly Supervised Grounding by Separation
ICCV 2021, arXiv
0
citations
Generalized and Incremental Few-Shot Learning by Explicit Learning and Calibration Without Forgetting
ICCV 2021, arXiv
0
citations
Multimodal Clustering Networks for Self-Supervised Learning From Unlabeled Videos
ICCV 2021, arXiv
0
citations
MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge
ICCV 2023, arXiv
0
citations
Learning by Sorting: Self-supervised Learning with Group Ordering Constraints
ICCV 2023, arXiv
0
citations
Preserving Modality Structure Improves Multi-Modal Learning
ICCV 2023, arXiv
0
citations
In-Style: Bridging Text and Uncurated Videos with Style Transfer for Text-Video Retrieval
ICCV 2023
0
citations
CycDA: Unsupervised Cycle Domain Adaptation to Learn from Image to Video
ECCV 2022
0
citations
Weakly Supervised Grounding for VQA in Vision-Language Transformers
ECCV 2022
0
citations
Learning Situation Hyper-Graphs for Video Question Answering
CVPR 2023, arXiv
0
citations
VideoGEM: Training-free Action Grounding in Videos
CVPR 2025, arXiv
0
citations
What When and Where? Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions
CVPR 2024
0
citations
Weakly Supervised Action Learning With RNN Based Fine-To-Coarse Modeling
CVPR 2017, arXiv
0
citations
Action Sets: Weakly Supervised Action Segmentation Without Ordering Constraints
CVPR 2018, arXiv
0
citations
More Is Less: Learning Efficient Video Representations by Big-Little Network and Depthwise Temporal Aggregation
NeurIPS 2019
0
citations
Learning with Algorithmic Supervision via Continuous Relaxations
NeurIPS 2021
0
citations
Deep Differentiable Logic Gate Networks
NeurIPS 2022
0
citations
How Transferable are Video Representations Based on Synthetic Data?
NeurIPS 2022
0
citations
Learning Human Action Recognition Representations Without Real Humans
NeurIPS 2023
0
citations
What a MESS: Multi-Domain Evaluation of Zero-Shot Semantic Segmentation
NeurIPS 2023
0
citations