ICLR "preference feedback" Papers
3 papers found
Finally Rank-Breaking Conquers MNL Bandits: Optimal and Efficient Algorithms for MNL Assortment
Aadirupa Saha, Pierre Gaillard
ICLR 2025, poster
1 citation
Non-Stationary Dueling Bandits Under a Weighted Borda Criterion
Joe Suk, Arpit Agarwal
ICLR 2025, poster, arXiv:2403.12950
2 citations
Reward Learning from Multiple Feedback Types
Yannick Metz, Andras Geiszl, Raphaël Baur et al.
ICLR 2025, poster, arXiv:2502.21038
4 citations