Victor Veitch

Papers: 4

Total Citations: 1

Papers (4)

RATE: Causal Explainability of Reward Models with Imperfect Counterfactuals

On the Origins of Linear Representations in Large Language Models

The Linear Representation Hypothesis and the Geometry of Large Language Models

Transforming and Combining Rewards for Aligning Large Language Models