Haipeng Luo

38 papers · 775 total citations

Papers (38)

WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct
ICLR 2025 · 629 citations

Efficient Second Order Online Learning by Sketching
NeurIPS 2016 (arXiv) · 100 citations

Improved Regret Bounds for Oracle-Based Adversarial Contextual Bandits
NeurIPS 2016 (arXiv) · 43 citations

Contextual Linear Bandits with Delay as Payoff
ICML 2025 · 2 citations

Improved Bounds for Swap Multicalibration and Swap Omniprediction
NeurIPS 2025 · 1 citation

Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval?
CVPR 2023 (arXiv) · 0 citations

Online Gradient Boosting
NeurIPS 2015 · 0 citations

Fast Convergence of Regularized Learning in Games
NeurIPS 2015 · 0 citations

Bidirectional Cross-Modal Knowledge Exploration for Video Recognition With Pre-Trained Vision-Language Models
CVPR 2023 (arXiv) · 0 citations

Efficient Contextual Bandits with Uninformed Feedback Graphs
ICML 2024 · 0 citations

ACPO: A Policy Optimization Algorithm for Average MDPs with Constraints
ICML 2024 · 0 citations

Near-Optimal Regret in Linear MDPs with Aggregate Bandit Feedback
ICML 2024 · 0 citations

Improved Regret and Contextual Linear Extension for Pandora's Box and Prophet Inequality
NeurIPS 2025 (arXiv) · 0 citations

Follow-the-Perturbed-Leader for Adversarial Markov Decision Processes with Bandit Feedback
NeurIPS 2022 · 0 citations

Near-Optimal Regret for Adversarial MDP with Delayed Bandit Feedback
NeurIPS 2022 · 0 citations

Near-Optimal Goal-Oriented Reinforcement Learning in Non-Stationary Environments
NeurIPS 2022 · 0 citations

Near-Optimal No-Regret Learning Dynamics for General Convex Games
NeurIPS 2022 · 0 citations

Practical Contextual Bandits with Feedback Graphs
NeurIPS 2023 · 0 citations

Improved Best-of-Both-Worlds Guarantees for Multi-Armed Bandits: FTRL with General Regularizers and Multiple Optimal Arms
NeurIPS 2023 · 0 citations

Uncoupled and Convergent Learning in Two-Player Zero-Sum Markov Games with Bandit Feedback
NeurIPS 2023 · 0 citations

No-Regret Online Reinforcement Learning with Adversarial Losses and Transitions
NeurIPS 2023 · 0 citations

Regret Matching+: (In)Stability and Fast Convergence in Games
NeurIPS 2023 · 0 citations

Optimal and Adaptive Algorithms for Online Boosting
ICML 2015 · 0 citations

Variance-Reduced and Projection-Free Stochastic Optimization
ICML 2016 · 0 citations

Practical Contextual Bandits with Regression Oracles
ICML 2018 (arXiv) · 0 citations

Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously
ICML 2019 · 0 citations

Efficient Online Portfolio with Logarithmic Regret
NeurIPS 2018 · 0 citations

Hypothesis Set Stability and Generalization
NeurIPS 2019 · 0 citations

Equipping Experts/Bandits with Long-term Memory
NeurIPS 2019 · 0 citations

Model Selection for Contextual Bandits
NeurIPS 2019 · 0 citations

Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs
NeurIPS 2020 · 0 citations

Simultaneously Learning Stochastic and Adversarial Episodic MDPs with Known Transition
NeurIPS 2020 · 0 citations

Comparator-Adaptive Convex Bandits
NeurIPS 2020 · 0 citations

Implicit Finite-Horizon Approximation and Efficient Optimal Algorithms for Stochastic Shortest Path
NeurIPS 2021 · 0 citations

Last-iterate Convergence in Extensive-Form Games
NeurIPS 2021 · 0 citations

The best of both worlds: stochastic and adversarial episodic MDPs with unknown transition
NeurIPS 2021 · 0 citations

Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses
NeurIPS 2021 · 0 citations

Uncoupled Learning Dynamics with $O(\log T)$ Swap Regret in Multiplayer Games
NeurIPS 2022 · 0 citations