Peter Henderson
10
Papers
420
Total Citations
Papers (10)
Safety Alignment Should be Made More Than Just a Few Tokens Deep
ICLR 2025
277
citations
SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal
ICLR 2025, arXiv
141
citations
Position: In-House Evaluation Is Not Enough. Towards Robust Third-Party Evaluation and Flaw Disclosure for General-Purpose AI
ICML 2025
2
citations
Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
ICML 2024
0
citations
Position: A Safe Harbor for AI Evaluation and Red Teaming
ICML 2024
0
citations
Position: On the Societal Impact of Open Foundation Models
ICML 2024
0
citations
Pile of Law: Learning Responsible Data Filtering from the Law and a 256GB Open-Source Legal Dataset
NeurIPS 2022
0
citations
LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models
NeurIPS 2023
0
citations
Cheaply Estimating Inference Efficiency Metrics for Autoregressive Transformer Models
NeurIPS 2023
0
citations
Separating value functions across time-scales
ICML 2019
0
citations