"policy updates" Papers
2 papers found
A Snapshot of Influence: A Local Data Attribution Framework for Online Reinforcement Learning
Yuzheng Hu, Fan Wu, Haotian Ye et al.
NeurIPS 2025 (oral), arXiv:2505.19281
2 citations
Enhancing the Outcome Reward-based RL Training of MLLMs with Self-Consistency Sampling
Jiahao Wang, Weiye Xu, Aijun Yang et al.
NeurIPS 2025 (poster), arXiv:2511.10648