ICML 2024 "spatial-temporal reasoning" Papers
2 papers found
DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models (Exemplified as A Video Agent)
Zongxin Yang, Guikun Chen, Xiaodi Li et al.
ICML 2024, oral, arXiv:2401.08392
Multi-Factor Adaptive Vision Selection for Egocentric Video Question Answering
Haoyu Zhang, Meng Liu, Zixin Liu et al.
ICML 2024, oral