Tian Xu
Papers: 4
Total Citations: 47
Papers (4)
Preserving Diversity in Supervised Fine-Tuning of Large Language Models
ICLR 2025, arXiv
33 citations
Reward-Consistent Dynamics Models are Strongly Generalizable for Offline Reinforcement Learning
ICLR 2024, arXiv
14 citations
Limited Preference Aided Imitation Learning from Imperfect Demonstrations
ICML 2024
0 citations
ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models
ICML 2024, arXiv
0 citations