"thompson sampling" Papers
17 papers found
Contextual Thompson Sampling via Generation of Missing Data
Kelly W Zhang, Tianhui Cai, Hongseok Namkoong et al.
Efficient Model-Based Reinforcement Learning Through Optimistic Thompson Sampling
Jasmine Bayrooti, Carl Ek, Amanda Prorok
Feel-Good Thompson Sampling for Contextual Bandits: a Markov Chain Monte Carlo Showdown
Emile Anand, Sarah Liaw
Generator-Mediated Bandits: Thompson Sampling for GenAI-Powered Adaptive Interventions
Marc Brooks, Gabriel Durham, Kihyuk Hong et al.
Geometry-Aware Approaches for Balancing Performance and Theoretical Guarantees in Linear Bandits
Yuwei Luo, Mohsen Bayati
No-Regret Thompson Sampling for Finite-Horizon Markov Decision Processes with Gaussian Processes
Jasmine Bayrooti, Sattar Vakili, Amanda Prorok et al.
Variance-Aware Feel-Good Thompson Sampling for Contextual Bandits
Xuheng Li, Quanquan Gu
$\mathtt{VITS}$: Variational Inference Thompson Sampling for contextual bandits
Pierre Clavier, Tom Huix, Alain Oliviero Durmus
A Bayesian Approach to Online Planning
Nir Greshler, David Ben Eli, Carmel Rabinovitz et al.
Adaptive Anytime Multi-Agent Path Finding Using Bandit-Based Large Neighborhood Search
Thomy Phan, Taoan Huang, Bistra Dilkina et al.
Efficient Exploration for LLMs
Vikranth Dwaracherla, Seyed Mohammad Asghari, Botao Hao et al.
Feel-Good Thompson Sampling for Contextual Dueling Bandits
Xuheng Li, Heyang Zhao, Quanquan Gu
Finite-Time Frequentist Regret Bounds of Multi-Agent Thompson Sampling on Sparse Hypergraphs
Tianyuan Jin, Hao-Lun Hsu, William Chang et al.
Posterior Sampling-Based Bayesian Optimization with Tighter Bayesian Regret Bounds
Shion Takeno, Yu Inatsu, Masayuki Karasuyama et al.
ReLU to the Rescue: Improve Your On-Policy Actor-Critic with Positive Advantages
Andrew Jesson, Christopher Lu, Gunshi Gupta et al.
The Choice of Noninformative Priors for Thompson Sampling in Multiparameter Bandit Models
Jongyeong Lee, Chao-Kai Chiang, Masashi Sugiyama
Two-sided Competing Matching Recommendation Markets With Quota and Complementary Preferences Constraints
Yuantong Li, Guang Cheng, Xiaowu Dai