Dmitrii Krasheninnikov
4 papers
746 total citations
papers (4)
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
ICLR 2025, arXiv
733 citations
Detecting High-Stakes Interactions with Activation Probes
NeurIPS 2025, arXiv
13 citations
Implicit meta-learning may lead language models to trust more reliable sources
ICML 2024, arXiv
0 citations
Defining and Characterizing Reward Gaming
NeurIPS 2022
0 citations