He He
Papers: 8
Total Citations: 98
Papers (8)
Language Models Learn to Mislead Humans via RLHF
ICLR 2025 (arXiv)
73
citations
A Credit Assignment Compiler for Joint Prediction
NeurIPS 2016 (arXiv)
20
citations
Predicting Empirical AI Research Outcomes with Language Models
NeurIPS 2025
5
citations
Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations
ICML 2024
0
citations
Opponent Modeling in Deep Reinforcement Learning
ICML 2016
0
citations
IRM—when it works and when it doesn't: A test case of natural language inference
NeurIPS 2021
0
citations
SeqPATE: Differentially Private Text Generation via Knowledge Distillation
NeurIPS 2022
0
citations
Testing the General Deductive Reasoning Capacity of Large Language Models Using OOD Examples
NeurIPS 2023
0
citations