Yunhao Tang

Papers: 6

Total Citations: 16

Papers (6)

Asymmetric REINFORCE for off-Policy Reinforcement Learning: Balancing positive and negative rewards

Learning Uncertainty-Aware Temporally-Extended Actions

Nash Learning from Human Feedback

A Distributional Analogue to the Successor Representation

Generalized Preference Optimization: A Unified Approach to Offline Alignment

Human Alignment of Large Language Models through Online Preference Optimisation