Haipeng Luo
38 Papers · 775 Total Citations

Papers (38)
WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct (ICLR 2025) · 629 citations
Efficient Second Order Online Learning by Sketching (NeurIPS 2016, arXiv) · 100 citations
Improved Regret Bounds for Oracle-Based Adversarial Contextual Bandits (NeurIPS 2016, arXiv) · 43 citations
Contextual Linear Bandits with Delay as Payoff (ICML 2025) · 2 citations
Improved Bounds for Swap Multicalibration and Swap Omniprediction (NeurIPS 2025) · 1 citation
Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval? (CVPR 2023, arXiv) · 0 citations
Online Gradient Boosting (NeurIPS 2015) · 0 citations
Fast Convergence of Regularized Learning in Games (NeurIPS 2015) · 0 citations
Bidirectional Cross-Modal Knowledge Exploration for Video Recognition With Pre-Trained Vision-Language Models (CVPR 2023, arXiv) · 0 citations
Efficient Contextual Bandits with Uninformed Feedback Graphs (ICML 2024) · 0 citations
ACPO: A Policy Optimization Algorithm for Average MDPs with Constraints (ICML 2024) · 0 citations
Near-Optimal Regret in Linear MDPs with Aggregate Bandit Feedback (ICML 2024) · 0 citations
Improved Regret and Contextual Linear Extension for Pandora's Box and Prophet Inequality (NeurIPS 2025, arXiv) · 0 citations
Follow-the-Perturbed-Leader for Adversarial Markov Decision Processes with Bandit Feedback (NeurIPS 2022) · 0 citations
Near-Optimal Regret for Adversarial MDP with Delayed Bandit Feedback (NeurIPS 2022) · 0 citations
Near-Optimal Goal-Oriented Reinforcement Learning in Non-Stationary Environments (NeurIPS 2022) · 0 citations
Near-Optimal No-Regret Learning Dynamics for General Convex Games (NeurIPS 2022) · 0 citations
Practical Contextual Bandits with Feedback Graphs (NeurIPS 2023) · 0 citations
Improved Best-of-Both-Worlds Guarantees for Multi-Armed Bandits: FTRL with General Regularizers and Multiple Optimal Arms (NeurIPS 2023) · 0 citations
Uncoupled and Convergent Learning in Two-Player Zero-Sum Markov Games with Bandit Feedback (NeurIPS 2023) · 0 citations
No-Regret Online Reinforcement Learning with Adversarial Losses and Transitions (NeurIPS 2023) · 0 citations
Regret Matching+: (In)Stability and Fast Convergence in Games (NeurIPS 2023) · 0 citations
Optimal and Adaptive Algorithms for Online Boosting (ICML 2015) · 0 citations
Variance-Reduced and Projection-Free Stochastic Optimization (ICML 2016) · 0 citations
Practical Contextual Bandits with Regression Oracles (ICML 2018, arXiv) · 0 citations
Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously (ICML 2019) · 0 citations
Efficient Online Portfolio with Logarithmic Regret (NeurIPS 2018) · 0 citations
Hypothesis Set Stability and Generalization (NeurIPS 2019) · 0 citations
Equipping Experts/Bandits with Long-term Memory (NeurIPS 2019) · 0 citations
Model Selection for Contextual Bandits (NeurIPS 2019) · 0 citations
Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs (NeurIPS 2020) · 0 citations
Simultaneously Learning Stochastic and Adversarial Episodic MDPs with Known Transition (NeurIPS 2020) · 0 citations
Comparator-Adaptive Convex Bandits (NeurIPS 2020) · 0 citations
Implicit Finite-Horizon Approximation and Efficient Optimal Algorithms for Stochastic Shortest Path (NeurIPS 2021) · 0 citations
Last-iterate Convergence in Extensive-Form Games (NeurIPS 2021) · 0 citations
The best of both worlds: stochastic and adversarial episodic MDPs with unknown transition (NeurIPS 2021) · 0 citations
Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses (NeurIPS 2021) · 0 citations
Uncoupled Learning Dynamics with $O(\log T)$ Swap Regret in Multiplayer Games (NeurIPS 2022) · 0 citations