"reward optimization" Papers
3 papers found
Alignment of Large Language Models with Constrained Learning
Botong Zhang, Shuo Li, Ignacio Hounie et al.
NeurIPS 2025, poster, arXiv:2505.19387
2 citations
Understanding Data Influence in Reinforcement Finetuning
Haoru Tan, Xiuzhe Wu, Sitong Wu et al.
NeurIPS 2025, oral
GFlowNet Training by Policy Gradients
Puhua Niu, Shili Wu, Mingzhou Fan et al.
ICML 2024, poster