2025 Poster "training data attribution" Papers
2 papers found
Better Training Data Attribution via Better Inverse Hessian-Vector Products
Andrew Wang, Elisa Nguyen, Runshi Yang et al.
NEURIPS 2025, poster, arXiv:2507.14740
3 citations
Explainable Reinforcement Learning from Human Feedback to Improve Alignment
Shicheng Liu, Siyuan Xu, Wenjie Qiu et al.
NEURIPS 2025, poster, arXiv:2512.13837