NeurIPS Poster "reinforcement learning" Papers

50 papers found

$\texttt{G1}$: Teaching LLMs to Reason on Graphs with Reinforcement Learning

Xiaojun Guo, Ang Li, Yifei Wang et al.

NeurIPS 2025 poster | 4 citations

A Differential and Pointwise Control Approach to Reinforcement Learning

Minh Nguyen, Chandrajit Bajaj

NeurIPS 2025 poster | arXiv:2404.15617 | 1 citation

A Generalized Bisimulation Metric of State Similarity between Markov Decision Processes: From Theoretical Propositions to Applications

Zhenyu Tao, Wei Xu, Xiaohu You

NeurIPS 2025 poster | arXiv:2509.18714 | 2 citations

Continual Knowledge Adaptation for Reinforcement Learning

Jinwu Hu, ZiHao Lian, Zhiquan Wen et al.

NeurIPS 2025 poster | arXiv:2510.19314 | 1 citation

Cypher-RI: Reinforcement Learning for Integrating Schema Selection into Cypher Generation

Hanchen Su, Xuyuan Li, Yan Zhou et al.

NeurIPS 2025 poster

Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO

Chengzhuo Tong, Ziyu Guo, Renrui Zhang et al.

NeurIPS 2025 poster | arXiv:2505.17017 | 25 citations

Dynamic Configuration for Cutting Plane Separators via Reinforcement Learning on Incremental Graph

Mingxuan Ye, Jie Wang, Fangzhou et al.

NeurIPS 2025 poster

Equilibrium Policy Generalization: A Reinforcement Learning Framework for Cross-Graph Zero-Shot Generalization in Pursuit-Evasion Games

Runyu Lu, Peng Zhang, Ruochuan Shi et al.

NeurIPS 2025 poster | arXiv:2511.00811 | 2 citations

EvolvedGRPO: Unlocking Reasoning in LVLMs via Progressive Instruction Evolution

Zhebei Shen, Qifan Yu, Juncheng Li et al.

NeurIPS 2025 poster

From Kolmogorov to Cauchy: Shallow XNet Surpasses KANs

Xin Li, Xiaotao Zheng, Zhihong Xia

NeurIPS 2025 poster

Globally Optimal Policy Gradient Algorithms for Reinforcement Learning with PID Control Policies

Vipul Sharma, Wesley Suttle, S Sivaranjani

NeurIPS 2025 poster

GoalLadder: Incremental Goal Discovery with Vision-Language Models

Alexey Zakharov, Shimon Whiteson

NeurIPS 2025 poster | arXiv:2506.16396 | 1 citation

GraphChain: Large Language Models for Large-scale Graph Analysis via Tool Chaining

Chunyu Wei, Wenji Hu, Xingjia Hao et al.

NeurIPS 2025 poster | arXiv:2511.00457

HCRMP: An LLM-Hinted Contextual Reinforcement Learning Framework for Autonomous Driving

Zhiwen Chen, Hanming Deng, Zhuoren Li et al.

NeurIPS 2025 poster | arXiv:2505.15793 | 3 citations

Improving Monte Carlo Tree Search for Symbolic Regression

Zhengyao Huang, Daniel Huang, Tiannan Xiao et al.

NeurIPS 2025 poster | arXiv:2509.15929

Iterative Foundation Model Fine-Tuning on Multiple Rewards

Pouya M. Ghari, Simone Sciabola, Ye Wang

NeurIPS 2025 poster | arXiv:2511.00220

Janus-Pro-R1: Advancing Collaborative Visual Comprehension and Generation via Reinforcement Learning

Kaihang Pan, Yang Wu, Wendong Bu et al.

NeurIPS 2025 poster | arXiv:2506.01480 | 6 citations

MiCo: Multi-image Contrast for Reinforcement Visual Reasoning

Xi Chen, Mingkang Zhu, Shaoteng Liu et al.

NeurIPS 2025 poster | arXiv:2506.22434

Modelling the control of offline processing with reinforcement learning

Eleanor Spens, Neil Burgess, Tim Behrens

NeurIPS 2025 poster

MRO: Enhancing Reasoning in Diffusion Language Models via Multi-Reward Optimization

Chenglong Wang, Yang Gan, Hang Zhou et al.

NeurIPS 2025 poster | arXiv:2510.21473

Multi-Agent Collaboration via Evolving Orchestration

Yufan Dang, Chen Qian, Xueheng Luo et al.

NeurIPS 2025 poster | arXiv:2505.19591 | 25 citations

NoisyGRPO: Incentivizing Multimodal CoT Reasoning via Noise Injection and Bayesian Estimation

Longtian Qiu, Shan Ning, Jiaxuan Sun et al.

NeurIPS 2025 poster | arXiv:2510.21122

Online Estimation and Inference for Robust Policy Evaluation in Reinforcement Learning

Weidong Liu, Jiyuan Tu, Xi Chen et al.

NeurIPS 2025 poster | arXiv:2310.02581 | 5 citations

On the Convergence of Projected Policy Gradient for Any Constant Step Sizes

Jiacai Liu, Wenye Li, Dachao Lin et al.

NeurIPS 2025 poster | arXiv:2311.01104 | 4 citations

Optimal Dynamic Regret by Transformers for Non-Stationary Reinforcement Learning

Baiyuan Chen, Shinji Ito, Masaaki Imaizumi

NeurIPS 2025 poster | arXiv:2508.16027

OrbitZoo: Real Orbital Systems Challenges for Reinforcement Learning

Alexandre Oliveira, Katarina Dyreby, Francisco Caldas et al.

NeurIPS 2025 poster | arXiv:2504.04160

Parameter Efficient Fine-tuning via Explained Variance Adaptation

Fabian Paischer, Lukas Hauzenberger, Thomas Schmied et al.

NeurIPS 2025 poster | arXiv:2410.07170 | 4 citations

Personalized Exercise Recommendation with Semantically-Grounded Knowledge Tracing

Yilmazcan Ozyurt, Tunaberk Almaci, Stefan Feuerriegel et al.

NeurIPS 2025 poster | arXiv:2507.11060

Progress Reward Model for Reinforcement Learning via Large Language Models

Xiuhui Zhang, Ning Gao, Xingyu Jiang et al.

NeurIPS 2025 poster

ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models

Mingjie Liu, Shizhe Diao, Ximing Lu et al.

NeurIPS 2025 poster | arXiv:2505.24864 | 99 citations

Proxy Target: Bridging the Gap Between Discrete Spiking Neural Networks and Continuous Control

Zijie Xu, Tong Bu, Zecheng Hao et al.

NeurIPS 2025 poster | arXiv:2505.24161 | 3 citations

Raw2Drive: Reinforcement Learning with Aligned World Models for End-to-End Autonomous Driving (in CARLA v2)

Zhenjie Yang, Xiaosong Jia, Qifeng Li et al.

NeurIPS 2025 poster | arXiv:2505.16394 | 18 citations

ReAgent-V: A Reward-Driven Multi-Agent Framework for Video Understanding

Yiyang Zhou, Yangfan He, Yaofeng Su et al.

NeurIPS 2025 poster | arXiv:2506.01300 | 28 citations

Real-World Reinforcement Learning of Active Perception Behaviors

Edward Hu, Jie Wang, Xingfang Yuan et al.

NeurIPS 2025 poster | arXiv:2512.01188

Reasoning as an Adaptive Defense for Safety

Taeyoun Kim, Fahim Tajwar, Aditi Raghunathan et al.

NeurIPS 2025 poster | arXiv:2507.00971 | 9 citations

Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models

Zemin Huang, Zhiyang Chen, Zijun Wang et al.

NeurIPS 2025 poster | arXiv:2505.10446 | 28 citations

ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning

Mingyang Chen, Linzhuang Sun, Tianpeng Li et al.

NeurIPS 2025 poster | arXiv:2503.19470 | 56 citations

Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective

Jorge (Zhoujun) Cheng, Shibo Hao, Tianyang Liu et al.

NeurIPS 2025 poster | arXiv:2506.14965 | 35 citations

RLGF: Reinforcement Learning with Geometric Feedback for Autonomous Driving Video Generation

Tianyi Yan, Wencheng Han, Xia Zhou et al.

NeurIPS 2025 poster | arXiv:2509.16500

Robot-R1: Reinforcement Learning for Enhanced Embodied Reasoning in Robotics

Dongyoung Kim, Huiwon Jang, Sumin Park et al.

NeurIPS 2025 poster | arXiv:2506.00070 | 9 citations

Router-R1: Teaching LLMs Multi-Round Routing and Aggregation via Reinforcement Learning

Haozhen Zhang, Tao Feng, Jiaxuan You

NeurIPS 2025 poster | arXiv:2506.09033 | 9 citations

SeRL: Self-play Reinforcement Learning for Large Language Models with Limited Data

Wenkai Fang, Shunyu Liu, Yang Zhou et al.

NeurIPS 2025 poster | arXiv:2505.20347 | 19 citations

Solver-Informed RL: Grounding Large Language Models for Authentic Optimization Modeling

Yitian Chen, Jingfan Xia, Siyu Shao et al.

NeurIPS 2025 poster | arXiv:2505.11792 | 11 citations

SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond

Junteng Liu, Yuanxiang Fan, Jiang Zhuo et al.

NeurIPS 2025 poster | arXiv:2505.19641 | 21 citations

T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT

Dongzhi Jiang, Ziyu Guo, Renrui Zhang et al.

NeurIPS 2025 poster | arXiv:2505.00703 | 91 citations

The Promise of RL for Autoregressive Image Editing

Saba Ahmadi, Rabiul Awal, Ankur Sikarwar et al.

NeurIPS 2025 poster | arXiv:2508.01119 | 2 citations

The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning

Xinyu Zhu, Mengzhou Xia, Zhepei Wei et al.

NeurIPS 2025 poster | arXiv:2506.01347 | 74 citations

Uni-RL: Unifying Online and Offline RL via Implicit Value Regularization

Haoran Xu, Liyuan Mao, Hui Jin et al.

NeurIPS 2025 poster

Unlocking Multimodal Mathematical Reasoning via Process Reward Model

Ruilin Luo, Zhuofan Zheng, Lei Wang et al.

NeurIPS 2025 poster | arXiv:2501.04686 | 29 citations

WebDancer: Towards Autonomous Information Seeking Agency

Jialong Wu, Baixuan Li, Runnan Fang et al.

NeurIPS 2025 poster | arXiv:2505.22648 | 81 citations