2025 "reinforcement learning" Papers

177 papers found • Page 3 of 4

OrbitZoo: Real Orbital Systems Challenges for Reinforcement Learning

Alexandre Oliveira, Katarina Dyreby, Francisco Caldas et al.

NEURIPS 2025 poster, arXiv:2504.04160

Parameter Efficient Fine-tuning via Explained Variance Adaptation

Fabian Paischer, Lukas Hauzenberger, Thomas Schmied et al.

NEURIPS 2025 poster, arXiv:2410.07170, 4 citations

Pareto Prompt Optimization

Guang Zhao, Byung-Jun Yoon, Gilchan Park et al.

ICLR 2025 poster, 1 citation

Pass@K Policy Optimization: Solving Harder Reinforcement Learning Problems

Christian Walder, Deep Tejas Karkhanis

NEURIPS 2025 spotlight, arXiv:2505.15201, 21 citations

Periodic Skill Discovery

Jonghae Park, Daesol Cho, Jusuk Lee et al.

NEURIPS 2025 oral, arXiv:2511.03187

Personalized Exercise Recommendation with Semantically-Grounded Knowledge Tracing

Yilmazcan Ozyurt, Tunaberk Almaci, Stefan Feuerriegel et al.

NEURIPS 2025 poster, arXiv:2507.11060

Policy Gradient with Kernel Quadrature

Tetsuro Morimura, Satoshi Hayakawa

ICLR 2025 poster, arXiv:2310.14768, 1 citation

Preference Distillation via Value based Reinforcement Learning

Minchan Kwon, Junwon Ko, Kangil Kim et al.

NEURIPS 2025 poster, arXiv:2509.16965

Progress Reward Model for Reinforcement Learning via Large Language Models

Xiuhui Zhang, Ning Gao, Xingyu Jiang et al.

NEURIPS 2025 poster

ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models

Mingjie Liu, Shizhe Diao, Ximing Lu et al.

NEURIPS 2025 poster, arXiv:2505.24864, 99 citations

Proxy Target: Bridging the Gap Between Discrete Spiking Neural Networks and Continuous Control

Zijie Xu, Tong Bu, Zecheng Hao et al.

NEURIPS 2025 poster, arXiv:2505.24161, 3 citations

RAST: Reasoning Activation in LLMs via Small-model Transfer

Siru Ouyang, Xinyu Zhu, Zilin Xiao et al.

NEURIPS 2025 poster, arXiv:2506.15710, 1 citation

Raw2Drive: Reinforcement Learning with Aligned World Models for End-to-End Autonomous Driving (in CARLA v2)

Zhenjie Yang, Xiaosong Jia, Qifeng Li et al.

NEURIPS 2025 poster, arXiv:2505.16394, 18 citations

ReAgent-V: A Reward-Driven Multi-Agent Framework for Video Understanding

Yiyang Zhou, Yangfan He, Yaofeng Su et al.

NEURIPS 2025 poster, arXiv:2506.01300, 28 citations

Real-World Reinforcement Learning of Active Perception Behaviors

Edward Hu, Jie Wang, Xingfang Yuan et al.

NEURIPS 2025 poster, arXiv:2512.01188

Reasoning as an Adaptive Defense for Safety

Taeyoun Kim, Fahim Tajwar, Aditi Raghunathan et al.

NEURIPS 2025 poster, arXiv:2507.00971, 9 citations

Reducing the Probability of Undesirable Outputs in Language Models Using Probabilistic Inference

Stephen Zhao, Aidan Li, Rob Brekelmans et al.

NEURIPS 2025 poster, arXiv:2510.21184

Reinforced Active Learning for Large-Scale Virtual Screening with Learnable Policy Model

Yicong Chen, Jiahua Rao, Jiancong Xie et al.

NEURIPS 2025 poster

Reinforcement Learning for Out-of-Distribution Reasoning in LLMs: An Empirical Study on Diagnosis-Related Group Coding

Hanyin Wang, Zhenbang Wu, Gururaj Kolar et al.

NEURIPS 2025 spotlight, arXiv:2505.21908, 3 citations

Reinforcement Learning from Imperfect Corrective Actions and Proxy Rewards

Zhaohui Jiang, Xuening Feng, Paul Weng et al.

ICLR 2025 poster, arXiv:2410.05782, 3 citations

Reinforcement Learning-Guided Data Selection via Redundancy Assessment

Suorong Yang, Peijia Li, Furao Shen et al.

ICCV 2025 poster, arXiv:2506.21037, 1 citation

Reinforcement learning with combinatorial actions for coupled restless bandits

Lily Xu, Bryan Wilder, Elias Khalil et al.

ICLR 2025 poster, arXiv:2503.01919, 5 citations

Reinforcement Learning with Imperfect Transition Predictions: A Bellman-Jensen Approach

Chenbei Lu, Zaiwei Chen, Tongxin Li et al.

NEURIPS 2025 spotlight, arXiv:2510.18687, 1 citation

Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models

Zemin Huang, Zhiyang Chen, Zijun Wang et al.

NEURIPS 2025 poster, arXiv:2505.10446, 28 citations

REM: A Scalable Reinforced Multi-Expert Framework for Multiplex Influence Maximization

Huyen Nguyen, Hieu Dam, Nguyen Hoang Khoi Do et al.

AAAI 2025 paper, arXiv:2501.00779, 1 citation

Rendering-Aware Reinforcement Learning for Vector Graphics Generation

Juan Rodriguez, Haotian Zhang, Abhay Puri et al.

NEURIPS 2025 poster, arXiv:2505.20793, 6 citations

ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning

Mingyang Chen, Linzhuang Sun, Tianpeng Li et al.

NEURIPS 2025 poster, arXiv:2503.19470, 56 citations

Retro-R1: LLM-based Agentic Retrosynthesis

Wei Liu, Jiangtao Feng, Hongli Yu et al.

NEURIPS 2025 poster

Reverse Engineering Human Preferences with Reinforcement Learning

Lisa Alazraki, Yi-Chern Tan, Jon Ander Campos et al.

NEURIPS 2025 spotlight, arXiv:2505.15795

Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective

Jorge (Zhoujun) Cheng, Shibo Hao, Tianyang Liu et al.

NEURIPS 2025 poster, arXiv:2506.14965, 35 citations

REvolve: Reward Evolution with Large Language Models using Human Feedback

Rishi Hazra, Alkis Sygkounas, Andreas Persson et al.

ICLR 2025 poster, arXiv:2406.01309, 8 citations

RLGF: Reinforcement Learning with Geometric Feedback for Autonomous Driving Video Generation

Tianyi Yan, Wencheng Han, Xia Zhou et al.

NEURIPS 2025 poster, arXiv:2509.16500

Robot-R1: Reinforcement Learning for Enhanced Embodied Reasoning in Robotics

Dongyoung Kim, Huiwon Jang, Sumin Park et al.

NEURIPS 2025 poster, arXiv:2506.00070, 9 citations

Router-R1: Teaching LLMs Multi-Round Routing and Aggregation via Reinforcement Learning

Haozhen Zhang, Tao Feng, Jiaxuan You

NEURIPS 2025 poster, arXiv:2506.09033, 9 citations

Safety Representations for Safer Policy Learning

Kaustubh Mani, Vincent Mai, Charlie Gauthier et al.

ICLR 2025 poster, arXiv:2502.20341, 1 citation

SAM-R1: Leveraging SAM for Reward Feedback in Multimodal Segmentation via Reinforcement Learning

Jiaqi Huang, Zunnan Xu, Jun Zhou et al.

NEURIPS 2025 poster, arXiv:2505.22596, 8 citations

Scaling RL to Long Videos

Yukang Chen, Wei Huang, Baifeng Shi et al.

NEURIPS 2025 poster, arXiv:2507.07966, 38 citations

Schedule On the Fly: Diffusion Time Prediction for Faster and Better Image Generation

Zilyu Ye, Zhiyang Chen, Tiancheng Li et al.

CVPR 2025 poster, arXiv:2412.01243, 17 citations

Segment Policy Optimization: Effective Segment-Level Credit Assignment in RL for Large Language Models

Yiran Guo, Lijie Xu, Jie Liu et al.

NEURIPS 2025 poster, arXiv:2505.23564, 15 citations

Self-Challenging Language Model Agents

Yifei Zhou, Sergey Levine, Jason Weston et al.

NEURIPS 2025 poster, arXiv:2506.01716, 20 citations

Selftok-Zero: Reinforcement Learning for Visual Generation via Discrete and Autoregressive Visual Tokens

Bohan Wang, Mingze Zhou, Zhongqi Yue et al.

NEURIPS 2025 poster

Semantic Temporal Abstraction via Vision-Language Model Guidance for Efficient Reinforcement Learning

Tian-Shuo Liu, Xu-Hui Liu, Ruifeng Chen et al.

ICLR 2025 oral

Sequential Attention-based Sampling for Histopathological Analysis

Tarun Gogisetty, Naman Malpani, Gugan Chandrashekhar Mallika Thoppe et al.

NEURIPS 2025 poster, arXiv:2507.05077

SeRL: Self-play Reinforcement Learning for Large Language Models with Limited Data

Wenkai Fang, Shunyu Liu, Yang Zhou et al.

NEURIPS 2025 poster, arXiv:2505.20347, 19 citations

Shift Before You Learn: Enabling Low-Rank Representations in Reinforcement Learning

Bastien Dubail, Stefan Stojanovic, Alexandre Proutiere

NEURIPS 2025 spotlight, arXiv:2509.05193

Solver-Informed RL: Grounding Large Language Models for Authentic Optimization Modeling

Yitian Chen, Jingfan Xia, Siyu Shao et al.

NEURIPS 2025 poster, arXiv:2505.11792, 11 citations

SORREL: Suboptimal-Demonstration-Guided Reinforcement Learning for Learning to Branch

Shengyu Feng, Yiming Yang

AAAI 2025 paper, arXiv:2412.15534, 5 citations

SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning

Jiaqi Chen, Bang Zhang, Ruotian Ma et al.

NEURIPS 2025 poster, arXiv:2504.19162, 21 citations

SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning

Peixian Ma, Xialie Zhuang, Chengjin Xu et al.

NEURIPS 2025 poster, arXiv:2504.08600, 46 citations

Stabilizing Reinforcement Learning in Differentiable Multiphysics Simulation

Eliot Xing, Vernon Luk, Jean Oh

ICLR 2025 poster, arXiv:2412.12089, 11 citations