Zhiwei He
4 Papers
29 Total Citations
Papers (4)
Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards
NeurIPS 2025
15 citations
Weak-to-Strong Preference Optimization: Stealing Reward from Weak Aligned Model
ICLR 2025 (arXiv)
14 citations
UAWTrack: Universal 3D Single Object Tracking in Adverse Weather
AAAI 2025
0 citations
Improving Open-Ended Text Generation via Adaptive Decoding
ICML 2024
0 citations