Han Zhong

Papers: 6

Total Citations: 8

Papers (6)

BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning

A3S: A General Active Clustering Method with Pairwise Constraints

Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment

Provably Efficient Exploration in Quantum Reinforcement Learning with Logarithmic Worst-Case Regret

Combinatorial Multivariant Multi-Armed Bandits with Applications to Episodic Reinforcement Learning and Beyond

Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-constraint