Han Zhong

Papers: 14

Total Citations: 8

Papers (14)

BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning

A3S: A General Active Clustering Method with Pairwise Constraints

Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment

Provably Efficient Exploration in Quantum Reinforcement Learning with Logarithmic Worst-Case Regret

Combinatorial Multivariant Multi-Armed Bandits with Applications to Episodic Reinforcement Learning and Beyond

Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-constraint

Breaking the Moments Condition Barrier: No-Regret Algorithm for Bandits with Super Heavy-Tailed Payoffs

Why Robust Generalization in Deep Learning is Difficult: Perspective of Expressive Power

Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration

Posterior Sampling for Competitive RL: Function Approximation and Partial Observation

A Reduction-based Framework for Sequential Decision Making with Delayed Feedback

Tackling Heavy-Tailed Rewards in Reinforcement Learning with Function Approximation: Minimax Optimal and Instance-Dependent Regret Bounds

Double Pessimism is Provably Efficient for Distributionally Robust Offline Reinforcement Learning: Generic Algorithm and Robust Partial Coverage

A Theoretical Analysis of Optimistic Proximal Policy Optimization in Linear Markov Decision Processes