Han Zhong
14 Papers
8 Total Citations
Papers (14)
BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning
ICML 2025
8 citations
A3S: A General Active Clustering Method with Pairwise Constraints
ICML 2024
0 citations
Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment
ICML 2024
0 citations
Provably Efficient Exploration in Quantum Reinforcement Learning with Logarithmic Worst-Case Regret
ICML 2024
0 citations
Combinatorial Multivariant Multi-Armed Bandits with Applications to Episodic Reinforcement Learning and Beyond
ICML 2024
0 citations
Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-constraint
ICML 2024
0 citations
Breaking the Moments Condition Barrier: No-Regret Algorithm for Bandits with Super Heavy-Tailed Payoffs
NeurIPS 2021
0 citations
Why Robust Generalization in Deep Learning is Difficult: Perspective of Expressive Power
NeurIPS 2022
0 citations
Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration
NeurIPS 2023
0 citations
Posterior Sampling for Competitive RL: Function Approximation and Partial Observation
NeurIPS 2023
0 citations
A Reduction-based Framework for Sequential Decision Making with Delayed Feedback
NeurIPS 2023
0 citations
Tackling Heavy-Tailed Rewards in Reinforcement Learning with Function Approximation: Minimax Optimal and Instance-Dependent Regret Bounds
NeurIPS 2023
0 citations
Double Pessimism is Provably Efficient for Distributionally Robust Offline Reinforcement Learning: Generic Algorithm and Robust Partial Coverage
NeurIPS 2023
0 citations
A Theoretical Analysis of Optimistic Proximal Policy Optimization in Linear Markov Decision Processes
NeurIPS 2023
0 citations