Papers by Tim Rocktaeschel
4 papers found
BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games
Davide Paglieri, Bartłomiej Cupiał, Samuel Coward et al.
ICLR 2025 poster
70 citations
Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models
Laura Ruis, Maximilian Mozes, Juhan Bae et al.
ICLR 2025 poster
H-GAP: Humanoid Control with a Generalist Planner
Zhengyao Jiang, Yingchen Xu, Nolan Wagener et al.
ICLR 2024 spotlight
Mechanistically analyzing the effects of fine-tuning on procedurally defined tasks
Samyak Jain, Robert Kirk, Ekdeep Singh Lubana et al.
ICLR 2024 poster
89 citations