"exploration-exploitation tradeoff" Papers
5 papers found
Feel-Good Thompson Sampling for Contextual Bandits: a Markov Chain Monte Carlo Showdown
Emile Anand, Sarah Liaw
NeurIPS 2025 poster · arXiv:2507.15290
1 citation
PlanU: Large Language Model Reasoning through Planning under Uncertainty
Ziwei Deng, Mian Deng, Chenjing Liang et al.
NeurIPS 2025 poster · arXiv:2510.18442
Entropy-Reinforced Planning with Large Language Models for Drug Discovery
Xuefeng Liu, Chih-chan Tien, Peng Ding et al.
ICML 2024 poster
Optimal Batched Linear Bandits
Xuanfei Ren, Tianyuan Jin, Pan Xu
ICML 2024 poster
Stochastic Bandits with ReLU Neural Networks
Kan Xu, Hamsa Bastani, Surbhi Goel et al.
ICML 2024 poster