Tom Bewley
4 Papers
9 Total Citations
Papers (4)
Interpreting Language Reward Models via Contrastive Explanations
ICLR 2025
5 citations
Representation Consistency for Accurate and Coherent LLM Answer Aggregation
NeurIPS 2025, arXiv
2 citations
To Steer or Not to Steer? Mechanistic Error Reduction with Abstention for Language Models
ICML 2025
2 citations
Counterfactual Metarules for Local and Global Recourse
ICML 2024
0 citations