Zhengxuan Wu
5
Papers
100
Total Citations
Papers (5)
AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders
ICML 2025, 100 citations
In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation
ICML 2024, 0 citations
ZeroC: A Neuro-Symbolic Model for Zero-shot Concept Recognition and Acquisition at Inference Time
NeurIPS 2022, 0 citations
CEBaB: Estimating the Causal Effects of Real-World Concepts on NLP Model Behavior
NeurIPS 2022, 0 citations
Interpretability at Scale: Identifying Causal Mechanisms in Alpaca
NeurIPS 2023, 0 citations