Haipeng Luo
7
Papers
632
Total Citations
Papers (7)
WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct
ICLR 2025
629
citations
Contextual Linear Bandits with Delay as Payoff
ICML 2025
2
citations
Improved Bounds for Swap Multicalibration and Swap Omniprediction
NeurIPS 2025
1
citations
Improved Regret and Contextual Linear Extension for Pandora's Box and Prophet Inequality
NeurIPS 2025 (arXiv)
0
citations
Efficient Contextual Bandits with Uninformed Feedback Graphs
ICML 2024
0
citations
ACPO: A Policy Optimization Algorithm for Average MDPs with Constraints
ICML 2024
0
citations
Near-Optimal Regret in Linear MDPs with Aggregate Bandit Feedback
ICML 2024
0
citations