Alexander Bukharin

Papers: 3

Total Citations: 133

Papers (3)

HelpSteer2-Preference: Complementing Ratings with Preferences

HelpSteer3-Preference: Open Human-Annotated Preference Data across Diverse Tasks and Languages

Robust Multi-Agent Reinforcement Learning via Adversarial Regularization: Theoretical Foundation and Stable Algorithms