Tom Bewley

Papers: 4

Total Citations: 9

Papers (4)

Interpreting Language Reward Models via Contrastive Explanations

Representation Consistency for Accurate and Coherent LLM Answer Aggregation

NeurIPS 2025, arXiv

To Steer or Not to Steer? Mechanistic Error Reduction with Abstention for Language Models

Counterfactual Metarules for Local and Global Recourse