Zhengxuan Wu

Papers: 5

Total Citations: 100

Papers (5)

AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders

In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation

ZeroC: A Neuro-Symbolic Model for Zero-shot Concept Recognition and Acquisition at Inference Time

CEBaB: Estimating the Causal Effects of Real-World Concepts on NLP Model Behavior

Interpretability at Scale: Identifying Causal Mechanisms in Alpaca