Hilde Kuehne

31 Papers
103 Total Citations

Papers (31)

Grounding Everything: Emerging Localization Properties in Vision-Language Transformers

CVPR 2024
74 citations

LeGrad: An Explainability Method for Vision Transformers via Feature Formation Sensitivity

ICCV 2025
24 citations

Teaching VLMs to Localize Specific Objects from In-context Examples

ICCV 2025
2 citations

CAV-MAE Sync: Improving Contrastive Audio-Visual Mask Autoencoders via Fine-Grained Alignment

CVPR 2025
2 citations

Unbiasing through Textual Descriptions: Mitigating Representation Bias in Video Benchmarks

CVPR 2025
1 citation

NeuralNetwork-Viterbi: A Framework for Weakly Supervised Video Learning

CVPR 2018 · arXiv
0 citations

Unsupervised Learning of Action Classes With Continuous Temporal Embedding

CVPR 2019
0 citations

Found a Reason for me? Weakly-supervised Grounded Visual Question Answering using Capsules

CVPR 2021 · arXiv
0 citations

Everything at Once - Multi-Modal Fusion Transformer for Video Retrieval

CVPR 2022 · arXiv
0 citations

Unsupervised Domain Generalization by Learning a Bridge Across Domains

CVPR 2022 · arXiv
0 citations

Video Test-Time Adaptation for Action Recognition

CVPR 2023 · arXiv
0 citations

Detector-Free Weakly Supervised Grounding by Separation

ICCV 2021 · arXiv
0 citations

Generalized and Incremental Few-Shot Learning by Explicit Learning and Calibration Without Forgetting

ICCV 2021 · arXiv
0 citations

Multimodal Clustering Networks for Self-Supervised Learning From Unlabeled Videos

ICCV 2021 · arXiv
0 citations

MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge

ICCV 2023 · arXiv
0 citations

Learning by Sorting: Self-supervised Learning with Group Ordering Constraints

ICCV 2023 · arXiv
0 citations

Preserving Modality Structure Improves Multi-Modal Learning

ICCV 2023 · arXiv
0 citations

In-Style: Bridging Text and Uncurated Videos with Style Transfer for Text-Video Retrieval

ICCV 2023
0 citations

CycDA: Unsupervised Cycle Domain Adaptation to Learn from Image to Video

ECCV 2022
0 citations

Weakly Supervised Grounding for VQA in Vision-Language Transformers

ECCV 2022
0 citations

Learning Situation Hyper-Graphs for Video Question Answering

CVPR 2023 · arXiv
0 citations

VideoGEM: Training-free Action Grounding in Videos

CVPR 2025 · arXiv
0 citations

What When and Where? Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions

CVPR 2024
0 citations

Weakly Supervised Action Learning With RNN Based Fine-To-Coarse Modeling

CVPR 2017 · arXiv
0 citations

Action Sets: Weakly Supervised Action Segmentation Without Ordering Constraints

CVPR 2018 · arXiv
0 citations

More Is Less: Learning Efficient Video Representations by Big-Little Network and Depthwise Temporal Aggregation

NeurIPS 2019
0 citations

Learning with Algorithmic Supervision via Continuous Relaxations

NeurIPS 2021
0 citations

Deep Differentiable Logic Gate Networks

NeurIPS 2022
0 citations

How Transferable are Video Representations Based on Synthetic Data?

NeurIPS 2022
0 citations

Learning Human Action Recognition Representations Without Real Humans

NeurIPS 2023
0 citations

What a MESS: Multi-Domain Evaluation of Zero-Shot Semantic Segmentation

NeurIPS 2023
0 citations