Papers by Qing-Shan Jia
3 papers found
CLARIFY: Contrastive Preference Reinforcement Learning for Untangling Ambiguous Queries
Ni Mu, Hao Hu, Xiao Hu et al.
ICML 2025, poster, arXiv:2506.00388
STAIR: Addressing Stage Misalignment through Temporal-Aligned Preference Reinforcement Learning
Yao Luan, Ni Mu, Yiqin Yang et al.
NeurIPS 2025, oral
Query-Policy Misalignment in Preference-Based Reinforcement Learning
Xiao Hu, Jianxiong Li, Xianyuan Zhan et al.
ICLR 2024, spotlight, arXiv:2305.17400