In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation

#10 of 2635 papers in ICML 2024
7 authors

Abstract

Large language models (LLMs) frequently hallucinate, e.g., by making factual errors, yet our understanding of why they make these errors remains limited. In this study, we aim to understand the underlying mechanisms of LLM hallucinations from the perspective of inner representations. We discover a pattern associated with hallucinations: correct generations tend to have sharper context activations in the hidden states of the in-context tokens than incorrect generations do. Leveraging this signal, we propose an entropy-based metric to quantify the sharpness among the in-context hidden states and incorporate it into the decoding process, i.e., we use the entropy value to adjust the next-token prediction distribution to improve the factuality and overall quality of the generated text. Experiments on knowledge-seeking datasets (Natural Questions, HotpotQA, TriviaQA) and a hallucination benchmark (TruthfulQA) demonstrate the consistent effectiveness of our approach, e.g., an improvement of up to 8.6 absolute points on TruthfulQA. We believe this study can improve our understanding of hallucinations and serve as a practical solution for hallucination mitigation.
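
To make the idea of entropy-adjusted decoding concrete, the following is a minimal sketch and not the authors' implementation. It assumes each candidate next token comes with a list of non-negative activation scores, one per in-context token. It further assumes sharpness is measured as the Shannon entropy of those scores after normalization, and that the entropy is subtracted from the candidate's logit with a weight alpha. The function names, the penalty form, and alpha are all hypothetical; the abstract does not specify them.

```python
import math


def context_entropy(activations):
    """Shannon entropy (in nats) of non-negative activation scores.

    Assumption: `activations` holds one score per in-context token for a
    single candidate next token. Lower entropy means a sharper pattern.
    """
    if not activations:
        raise ValueError("need at least one in-context activation")
    total = sum(activations)
    if total <= 0:
        # No signal at all: treat as maximally diffuse (uniform).
        return math.log(len(activations))
    probs = [a / total for a in activations]
    return -sum(p * math.log(p) for p in probs if p > 0)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    norm = sum(exps)
    return [e / norm for e in exps]


def adjust_next_token_probs(logits, activations_per_candidate, alpha=1.0):
    """Penalize each candidate's logit by alpha times its context entropy.

    Hypothetical adjustment rule: adjusted = logit - alpha * entropy.
    """
    entropies = [context_entropy(a) for a in activations_per_candidate]
    adjusted = [l - alpha * h for l, h in zip(logits, entropies)]
    return softmax(adjusted)


# Toy example: two candidates with identical logits.
logits = [2.0, 2.0]
activations = [
    [0.97, 0.01, 0.01, 0.01],  # sharp: concentrated on one in-context token
    [0.25, 0.25, 0.25, 0.25],  # flat: spread evenly over the context
]
probs = adjust_next_token_probs(logits, activations, alpha=1.0)
```

In this toy case both candidates start with equal probability, and the adjustment shifts mass toward the candidate with the sharper (lower-entropy) activation pattern. Setting alpha to 0 recovers the unadjusted softmax.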
