Poster Papers: "reinforcement learning"

220 papers found • Page 2 of 5

Improving Monte Carlo Tree Search for Symbolic Regression

Zhengyao Huang, Daniel Huang, Tiannan Xiao et al.

NeurIPS 2025 · arXiv:2509.15929

Inference-Aware Fine-Tuning for Best-of-N Sampling in Large Language Models

Yinlam Chow, Guy Tennenholtz, Izzeddin Gur et al.

ICLR 2025 · arXiv:2412.15287
49 citations

Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models

Cong Lu, Shengran Hu, Jeff Clune

ICLR 2025 · arXiv:2405.15143
27 citations

Iterative Foundation Model Fine-Tuning on Multiple Rewards

Pouya M. Ghari, Simone Sciabola, Ye Wang

NeurIPS 2025 · arXiv:2511.00220

Janus-Pro-R1: Advancing Collaborative Visual Comprehension and Generation via Reinforcement Learning

Kaihang Pan, Yang Wu, Wendong Bu et al.

NeurIPS 2025 · arXiv:2506.01480
7 citations

Kinetix: Investigating the Training of General Agents through Open-Ended Physics-Based Control Tasks

Michael Matthews, Michael Beukman, Chris Lu et al.

ICLR 2025 · arXiv:2410.23208
21 citations

Learning Diverse Attacks on Large Language Models for Robust Red-Teaming and Safety Tuning

Seanie Lee, Minsu Kim, Lynn Cherif et al.

ICLR 2025 · arXiv:2405.18540
47 citations

Learning mirror maps in policy mirror descent

Carlo Alfano, Sebastian Towers, Silvia Sapora et al.

ICLR 2025 · arXiv:2402.05187
2 citations

Learning to Clean: Reinforcement Learning for Noisy Label Correction

Marzi Heidari, Hanping Zhang, Yuhong Guo

NeurIPS 2025 · arXiv:2511.19808

Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration

Max Wilcoxson, Qiyang Li, Kevin Frans et al.

ICML 2025 · arXiv:2410.18076
7 citations

LICORICE: Label-Efficient Concept-Based Interpretable Reinforcement Learning

Zhuorui Ye, Stephanie Milani, Geoff Gordon et al.

ICLR 2025 · arXiv:2407.15786
5 citations

MallowsPO: Fine-Tune Your LLM with Preference Dispersions

Haoxian Chen, Hanyang Zhao, Henry Lam et al.

ICLR 2025 · arXiv:2405.14953
15 citations

MetaUrban: An Embodied AI Simulation Platform for Urban Micromobility

Wayne Wu, Honglin He, Jack He et al.

ICLR 2025 · arXiv:2407.08725
11 citations

MiCo: Multi-image Contrast for Reinforcement Visual Reasoning

Xi Chen, Mingkang Zhu, Shaoteng Liu et al.

NeurIPS 2025 · arXiv:2506.22434

Modelling the control of offline processing with reinforcement learning

Eleanor Spens, Neil Burgess, Tim Behrens

NeurIPS 2025

MRO: Enhancing Reasoning in Diffusion Language Models via Multi-Reward Optimization

Chenglong Wang, Yang Gan, Hang Zhou et al.

NeurIPS 2025 · arXiv:2510.21473
1 citation

Multi-Agent Collaboration via Evolving Orchestration

Yufan Dang, Chen Qian, Xueheng Luo et al.

NeurIPS 2025 · arXiv:2505.19591
35 citations

MURKA: Multi-Reward Reinforcement Learning with Knowledge Alignment for Optimization Tasks

Wantong Xie, Yi-Xiang Hu, Jieyang Xu et al.

NeurIPS 2025

Neural Motion Simulator: Pushing the Limit of World Models in Reinforcement Learning

Chenjie Hao, Weyl Lu, Yifan Xu et al.

CVPR 2025 · arXiv:2504.07095
5 citations

Neuroplastic Expansion in Deep Reinforcement Learning

Jiashun Liu, Johan S. Obando Ceron, Aaron Courville et al.

ICLR 2025 · arXiv:2410.07994
13 citations

NoisyGRPO: Incentivizing Multimodal CoT Reasoning via Noise Injection and Bayesian Estimation

Longtian Qiu, Shan Ning, Jiaxuan Sun et al.

NeurIPS 2025 · arXiv:2510.21122
1 citation

Normalizing Flows are Capable Models for Continuous Control

Raj Ghugare, Benjamin Eysenbach

NeurIPS 2025

Online Estimation and Inference for Robust Policy Evaluation in Reinforcement Learning

Weidong Liu, Jiyuan Tu, Xi Chen et al.

NeurIPS 2025 · arXiv:2310.02581
5 citations

Online Reinforcement Learning in Non-Stationary Context-Driven Environments

Pouya Hamadanian, Arash Nasr-Esfahany, Malte Schwarzkopf et al.

ICLR 2025 · arXiv:2302.02182
3 citations

Online-to-Offline RL for Agent Alignment

Xu Liu, Haobo Fu, Stefano V. Albrecht et al.

ICLR 2025

On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback

Marcus Williams, Micah Carroll, Adhyyan Narang et al.

ICLR 2025 · arXiv:2411.02306
43 citations

On the Convergence of Projected Policy Gradient for Any Constant Step Sizes

Jiacai Liu, Wenye Li, Dachao Lin et al.

NeurIPS 2025 · arXiv:2311.01104
4 citations

On the Sample Complexity of Differentially Private Policy Optimization

Yi He, Xingyu Zhou

NeurIPS 2025 · arXiv:2510.21060

Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model

Jingcheng Hu, Yinmin Zhang, Qi Han et al.

NeurIPS 2025 · arXiv:2503.24290
347 citations

OpenVLThinker: Complex Vision-Language Reasoning via Iterative SFT-RL Cycles

Yihe Deng, Hritik Bansal, Fan Yin et al.

NeurIPS 2025 · arXiv:2503.17352
16 citations

Open-World Drone Active Tracking with Goal-Centered Rewards

Haowei Sun, Jinwu Hu, Zhirui Zhang et al.

NeurIPS 2025 · arXiv:2412.00744
2 citations

Optimal Dynamic Regret by Transformers for Non-Stationary Reinforcement Learning

Baiyuan Chen, Shinji Ito, Masaaki Imaizumi

NeurIPS 2025 · arXiv:2508.16027

OptionZero: Planning with Learned Options

Po-Wei Huang, Pei-Chiun Peng, Hung Guei et al.

ICLR 2025 · arXiv:2502.16634
2 citations

OrbitZoo: Real Orbital Systems Challenges for Reinforcement Learning

Alexandre Oliveira, Katarina Dyreby, Francisco Caldas et al.

NeurIPS 2025 · arXiv:2504.04160

Parameter Efficient Fine-tuning via Explained Variance Adaptation

Fabian Paischer, Lukas Hauzenberger, Thomas Schmied et al.

NeurIPS 2025 · arXiv:2410.07170
6 citations

Pareto Prompt Optimization

Guang Zhao, Byung-Jun Yoon, Gilchan Park et al.

ICLR 2025
1 citation

Personalized Exercise Recommendation with Semantically-Grounded Knowledge Tracing

Yilmazcan Ozyurt, Tunaberk Almaci, Stefan Feuerriegel et al.

NeurIPS 2025 · arXiv:2507.11060

Policy Gradient with Kernel Quadrature

Tetsuro Morimura, Satoshi Hayakawa

ICLR 2025 · arXiv:2310.14768
1 citation

Preference Distillation via Value based Reinforcement Learning

Minchan Kwon, Junwon Ko, Kangil Kim et al.

NeurIPS 2025 · arXiv:2509.16965

Progress Reward Model for Reinforcement Learning via Large Language Models

Xiuhui Zhang, Ning Gao, Xingyu Jiang et al.

NeurIPS 2025

ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models

Mingjie Liu, Shizhe Diao, Ximing Lu et al.

NeurIPS 2025 · arXiv:2505.24864
104 citations

Proxy Target: Bridging the Gap Between Discrete Spiking Neural Networks and Continuous Control

Zijie Xu, Tong Bu, Zecheng Hao et al.

NeurIPS 2025 · arXiv:2505.24161
4 citations

RAST: Reasoning Activation in LLMs via Small-model Transfer

Siru Ouyang, Xinyu Zhu, Zilin Xiao et al.

NeurIPS 2025 · arXiv:2506.15710
2 citations

Raw2Drive: Reinforcement Learning with Aligned World Models for End-to-End Autonomous Driving (in CARLA v2)

Zhenjie Yang, Xiaosong Jia, Qifeng Li et al.

NeurIPS 2025 · arXiv:2505.16394
21 citations

ReAgent-V: A Reward-Driven Multi-Agent Framework for Video Understanding

Yiyang Zhou, Yangfan He, Yaofeng Su et al.

NeurIPS 2025 · arXiv:2506.01300
29 citations

Real-World Reinforcement Learning of Active Perception Behaviors

Edward Hu, Jie Wang, Xingfang Yuan et al.

NeurIPS 2025 · arXiv:2512.01188

Reasoning as an Adaptive Defense for Safety

Taeyoun Kim, Fahim Tajwar, Aditi Raghunathan et al.

NeurIPS 2025 · arXiv:2507.00971
11 citations

Reducing the Probability of Undesirable Outputs in Language Models Using Probabilistic Inference

Stephen Zhao, Aidan Li, Rob Brekelmans et al.

NeurIPS 2025 · arXiv:2510.21184

Reinforced Active Learning for Large-Scale Virtual Screening with Learnable Policy Model

Yicong Chen, Jiahua Rao, Jiancong Xie et al.

NeurIPS 2025

Reinforcement Learning from Imperfect Corrective Actions and Proxy Rewards

Zhaohui Jiang, Xuening Feng, Paul Weng et al.

ICLR 2025 · arXiv:2410.05782
3 citations