Han Zhong

Papers: 14
Total Citations: 8

Papers (14)

BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning

ICML 2025 · 8 citations

A3S: A General Active Clustering Method with Pairwise Constraints

ICML 2024 · 0 citations

Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment

ICML 2024 · 0 citations

Provably Efficient Exploration in Quantum Reinforcement Learning with Logarithmic Worst-Case Regret

ICML 2024 · 0 citations

Combinatorial Multivariant Multi-Armed Bandits with Applications to Episodic Reinforcement Learning and Beyond

ICML 2024 · 0 citations

Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-constraint

ICML 2024 · 0 citations

Breaking the Moments Condition Barrier: No-Regret Algorithm for Bandits with Super Heavy-Tailed Payoffs

NeurIPS 2021 · 0 citations

Why Robust Generalization in Deep Learning is Difficult: Perspective of Expressive Power

NeurIPS 2022 · 0 citations

Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration

NeurIPS 2023 · 0 citations

Posterior Sampling for Competitive RL: Function Approximation and Partial Observation

NeurIPS 2023 · 0 citations

A Reduction-based Framework for Sequential Decision Making with Delayed Feedback

NeurIPS 2023 · 0 citations

Tackling Heavy-Tailed Rewards in Reinforcement Learning with Function Approximation: Minimax Optimal and Instance-Dependent Regret Bounds

NeurIPS 2023 · 0 citations

Double Pessimism is Provably Efficient for Distributionally Robust Offline Reinforcement Learning: Generic Algorithm and Robust Partial Coverage

NeurIPS 2023 · 0 citations

A Theoretical Analysis of Optimistic Proximal Policy Optimization in Linear Markov Decision Processes

NeurIPS 2023 · 0 citations