Shuang Qiu
6
Papers
47
Total Citations
Papers (6)
Online Preference Alignment for Language Models via Count-based Exploration
ICLR 2025, 19 citations
Segment Policy Optimization: Effective Segment-Level Credit Assignment in RL for Large Language Models
NeurIPS 2025, 15 citations
Forward KL Regularized Preference Optimization for Aligning Diffusion Policies
AAAI 2025, 7 citations
ROPO: Robust Preference Optimization for Large Language Models
ICML 2025, 6 citations
Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment
ICML 2024, 0 citations
Pessimism Meets Risk: Risk-Sensitive Offline Reinforcement Learning
ICML 2024, 0 citations