2025 "group relative policy optimization" Papers

9 papers found

A2Seek: Towards Reasoning-Centric Benchmark for Aerial Anomaly Understanding

Mengjingcheng Mo, Xinyang Tong, Mingpi Tan et al.

NEURIPS 2025posterarXiv:2505.21962
2
citations

ALTo: Adaptive-Length Tokenizer for Autoregressive Mask Generation

Lingfeng Wang, Hualing Lin, Senda Chen et al.

NEURIPS 2025posterarXiv:2505.16495
2
citations

CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models

Zhihang Lin, Mingbao Lin, Yuan Xie et al.

NEURIPS 2025posterarXiv:2503.22342
47
citations

DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO

Jinyoung Park, Jeehye Na, Jinyoung Kim et al.

NEURIPS 2025posterarXiv:2506.07464
23
citations

Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO

Chengzhuo Tong, Ziyu Guo, Renrui Zhang et al.

NEURIPS 2025posterarXiv:2505.17017
25
citations

Fact-R1: Towards Explainable Video Misinformation Detection with Deep Reasoning

Fanrui Zhang, Dian Li, Qiang Zhang et al.

NEURIPS 2025posterarXiv:2505.16836
4
citations

JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent

Yunlong Lin, Zixu Lin, Kunjie Lin et al.

NEURIPS 2025posterarXiv:2506.17612
9
citations

ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models

Hongbo Liu, Jingwen He, Yi Jin et al.

NEURIPS 2025posterarXiv:2506.21356
7
citations

VisualQuality-R1: Reasoning-Induced Image Quality Assessment via Reinforcement Learning to Rank

Tianhe Wu, Jian Zou, Jie Liang et al.

NEURIPS 2025spotlightarXiv:2505.14460
30
citations