Ganqu Cui

6 Papers

704 Total Citations

Papers (6)

RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback

Advancing LLM Reasoning Generalists with Preference Trees

TTRL: Test-Time Reinforcement Learning

NeurIPS 2025, arXiv

RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness

Scaling Physical Reasoning with the PHYSICS Dataset

UltraFeedback: Boosting Language Models with Scaled AI Feedback