Rishabh Joshi

Papers: 2

Total Citations: 51

Papers (2)

RRM: Robust Reward Model Training Mitigates Reward Hacking

Learning from negative feedback, or positive feedback or both