Yunhao Tang
6
Papers
16
Total Citations
Papers (6)
Asymmetric REINFORCE for off-Policy Reinforcement Learning: Balancing positive and negative rewards
NeurIPS 2025
13
citations
Learning Uncertainty-Aware Temporally-Extended Actions
AAAI 2024, arXiv
3
citations
Nash Learning from Human Feedback
ICML 2024
0
citations
A Distributional Analogue to the Successor Representation
ICML 2024
0
citations
Generalized Preference Optimization: A Unified Approach to Offline Alignment
ICML 2024
0
citations
Human Alignment of Large Language Models through Online Preference Optimisation
ICML 2024
0
citations