"contextual bandits" Papers
25 papers found
Contextual Thompson Sampling via Generation of Missing Data
Kelly W Zhang, Tianhui Cai, Hongseok Namkoong et al.
Cross-Domain Off-Policy Evaluation and Learning for Contextual Bandits
Yuta Natsubori, Masataka Ushiku, Yuta Saito
Feel-Good Thompson Sampling for Contextual Bandits: a Markov Chain Monte Carlo Showdown
Emile Anand, Sarah Liaw
Geometry Meets Incentives: Sample-Efficient Incentivized Exploration with Linear Contexts
Ben Schiffer, Mark Sellke
MultiScale Contextual Bandits for Long Term Objectives
Richa Rastogi, Yuta Saito, Thorsten Joachims
Second Order Bounds for Contextual Bandits with Function Approximation
Aldo Pacchiano
Sharp Analysis for KL-Regularized Contextual Bandits and RLHF
Heyang Zhao, Chenlu Ye, Quanquan Gu et al.
Statistical Parity with Exponential Weights
Stephen Pasteris, Chris Hicks, Vasilios Mavroudis
True Impact of Cascade Length in Contextual Cascading Bandits
Hyun-jun Choi, Joongkyu Lee, Min-hwan Oh
Variance-Aware Feel-Good Thompson Sampling for Contextual Bandits
Xuheng Li, Quanquan Gu
$\mathtt{VITS}$ : Variational Inference Thompson Sampling for contextual bandits
Pierre Clavier, Tom Huix, Alain Oliviero Durmus
A Contextual Combinatorial Bandit Approach to Negotiation
Yexin Li, Zhancun Mu, Siyuan Qi
Adaptively Learning to Select-Rank in Online Platforms
Jingyuan Wang, Perry Dong, Ying Jin et al.
Borda Regret Minimization for Generalized Linear Dueling Bandits
Yue Wu, Tao Jin, Qiwei Di et al.
Efficient Contextual Bandits with Uninformed Feedback Graphs
Mengxiao Zhang, Yuheng Zhang, Haipeng Luo et al.
Fast and Sample Efficient Multi-Task Representation Learning in Stochastic Contextual Bandits
Jiabin Lin, Shana Moothedath, Namrata Vaswani
Federated Contextual Cascading Bandits with Asynchronous Communication and Heterogeneous Users
Hantao Yang, Xutong Liu, Zhiyong Wang et al.
High-dimensional Linear Bandits with Knapsacks
Wanteng Ma, Dong Xia, Jiashuo Jiang
Incentivized Learning in Principal-Agent Bandit Games
Antoine Scheid, Daniil Tiapkin, Etienne Boursier et al.
In-Context Reinforcement Learning for Variable Action Spaces
Viacheslav Sinii, Alexander Nikulin, Vladislav Kurenkov et al.
Low-Rank Bandits via Tight Two-to-Infinity Singular Subspace Recovery
Yassir Jedra, William Réveillard, Stefan Stojanovic et al.
More Benefits of Being Distributional: Second-Order Bounds for Reinforcement Learning
Kaiwen Wang, Owen Oertell, Alekh Agarwal et al.
Prospective Side Information for Latent MDPs
Jeongyeol Kwon, Yonathan Efroni, Shie Mannor et al.
Randomized Confidence Bounds for Stochastic Partial Monitoring
Maxime Heuillet, Ola Ahmad, Audrey Durand
The Non-linear $F$-Design and Applications to Interactive Learning
Alekh Agarwal, Jian Qian, Alexander Rakhlin et al.