"reinforcement learning" Papers

299 papers found • Page 1 of 6

$\texttt{G1}$: Teaching LLMs to Reason on Graphs with Reinforcement Learning

Xiaojun Guo, Ang Li, Yifei Wang et al.

NEURIPS 2025
4 citations

A Causal Lens for Learning Long-term Fair Policies

Jacob Lear, Lu Zhang

ICLR 2025 • arXiv:2506.11242
1 citation

Accelerating RL for LLM Reasoning with Optimal Advantage Regression

Kianté Brantley, Mingyu Chen, Zhaolin Gao et al.

NEURIPS 2025 • arXiv:2505.20686
12 citations

Action abstractions for amortized sampling

Oussama Boussif, Léna Ezzine, Joseph Viviano et al.

ICLR 2025 • arXiv:2410.15184
3 citations

ActSafe: Active Exploration with Safety Constraints for Reinforcement Learning

Yarden As, Bhavya, Lenart Treven et al.

ICLR 2025 • arXiv:2410.09486
15 citations

AdaptiveAE: An Adaptive Exposure Strategy for HDR Capturing in Dynamic Scenes

Tianyi Xu, Fan Zhang, Boxin Shi et al.

ICCV 2025 • arXiv:2508.13503
1 citation

Adding Conditional Control to Diffusion Models with Reinforcement Learning

Yulai Zhao, Masatoshi Uehara, Gabriele Scalia et al.

ICLR 2025 • arXiv:2406.12120
13 citations

A Differential and Pointwise Control Approach to Reinforcement Learning

Minh Nguyen, Chandrajit Bajaj

NEURIPS 2025 • arXiv:2404.15617
1 citation

Afterburner: Reinforcement Learning Facilitates Self-Improving Code Efficiency Optimization

Mingzhe Du, Anh Tuan Luu, Yue Liu et al.

NEURIPS 2025 • arXiv:2505.23387
6 citations

A Generalized Bisimulation Metric of State Similarity between Markov Decision Processes: From Theoretical Propositions to Applications

Zhenyu Tao, Wei Xu, Xiaohu You

NEURIPS 2025 • arXiv:2509.18714
2 citations

ALINE: Joint Amortization for Bayesian Inference and Active Data Acquisition

Daolang Huang, Xinyi Wen, Ayush Bharti et al.

NEURIPS 2025 • spotlight • arXiv:2506.07259
2 citations

AlphaZero Neural Scaling and Zipf's Law: a Tale of Board Games and Power Laws

Oren Neumann, Claudius Gros

NEURIPS 2025 • spotlight • arXiv:2412.11979
9 citations

An Optimal Discriminator Weighted Imitation Perspective for Reinforcement Learning

Haoran Xu, Shuozhe Li, Harshit Sikchi et al.

ICLR 2025 • arXiv:2504.13368
2 citations

AREAL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning

Wei Fu, Jiaxuan Gao, Xujie Shen et al.

NEURIPS 2025 • arXiv:2505.24298
108 citations

Asymmetric REINFORCE for off-Policy Reinforcement Learning: Balancing positive and negative rewards

Charles Arnal, Gaëtan Narozniak, Vivien Cabannes et al.

NEURIPS 2025 • arXiv:2506.20520
17 citations

AutoEdit: Automatic Hyperparameter Tuning for Image Editing

Chau Pham, Quan Dao, Mahesh Bhosale et al.

NEURIPS 2025 • arXiv:2509.15031
1 citation

BindGPT: A Scalable Framework for 3D Molecular Design via Language Modeling and Reinforcement Learning

Artem Zholus, Maksim Kuznetsov, Roman Schutski et al.

AAAI 2025 • paper • arXiv:2406.03686
15 citations

CAD-Coder: Text-to-CAD Generation with Chain-of-Thought and Geometric Reward

Yandong Guan, Xilin Wang, XiMing Xing et al.

NEURIPS 2025 • arXiv:2505.19713
10 citations

Can a MISL Fly? Analysis and Ingredients for Mutual Information Skill Learning

Chongyi Zheng, Jens Tuyls, Joanne Peng et al.

ICLR 2025 • arXiv:2412.08021
9 citations

ChartSketcher: Reasoning with Multimodal Feedback and Reflection for Chart Understanding

Muye Huang, Lingling Zhang, Jie Ma et al.

NEURIPS 2025 • arXiv:2505.19076
5 citations

Checklists Are Better Than Reward Models For Aligning Language Models

Vijay Viswanathan, Yanchao Sun, Xiang Kong et al.

NEURIPS 2025 • spotlight • arXiv:2507.18624
32 citations

Complexity Scaling Laws for Neural Models using Combinatorial Optimization

Lowell Weissman, Michael Krumdick, A. Abbott

NEURIPS 2025 • arXiv:2506.12932

Computational Hardness of Reinforcement Learning with Partial $q^{\pi}$-Realizability

Shayan Karimi, Xiaoqi Tan

NEURIPS 2025

Computationally Efficient RL under Linear Bellman Completeness for Deterministic Dynamics

Runzhe Wu, Ayush Sekhari, Akshay Krishnamurthy et al.

ICLR 2025 • arXiv:2406.11810
3 citations

Conformal Prediction Beyond the Horizon: Distribution-Free Inference for Policy Evaluation

Feichen Gan, Lu Youcun, Yingying Zhang et al.

NEURIPS 2025 • oral • arXiv:2510.26026

Continual Knowledge Adaptation for Reinforcement Learning

Jinwu Hu, ZiHao Lian, Zhiquan Wen et al.

NEURIPS 2025 • arXiv:2510.19314
2 citations

CORE: Collaborative Optimization with Reinforcement Learning and Evolutionary Algorithm for Floorplanning

Pengyi Li, Shixiong Kai, Jianye Hao et al.

NEURIPS 2025

Co-Reinforcement Learning for Unified Multimodal Understanding and Generation

Jingjing Jiang, Chongjie Si, Jun Luo et al.

NEURIPS 2025 • spotlight • arXiv:2505.17534
5 citations

Cross-Embodiment Dexterous Grasping with Reinforcement Learning

Haoqi Yuan, Bohan Zhou, Yuhui Fu et al.

ICLR 2025 • arXiv:2410.02479
20 citations

CURE: Co-Evolving Coders and Unit Testers via Reinforcement Learning

Yinjie Wang, Ling Yang, Ye Tian et al.

NEURIPS 2025 • spotlight

Cypher-RI: Reinforcement Learning for Integrating Schema Selection into Cypher Generation

Hanchen Su, Xuyuan Li, Yan Zhou et al.

NEURIPS 2025

d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning

Siyan Zhao, Devaansh Gupta, Qinqing Zheng et al.

NEURIPS 2025 • spotlight • arXiv:2504.12216
87 citations

DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search

Huajian Xin, Z.Z. Ren, Junxiao Song et al.

ICLR 2025 • arXiv:2408.08152
142 citations

Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO

Chengzhuo Tong, Ziyu Guo, Renrui Zhang et al.

NEURIPS 2025 • arXiv:2505.17017
26 citations

DisCO: Reinforcing Large Reasoning Models with Discriminative Constrained Optimization

Gang Li, Ming Lin, Tomer Galanti et al.

NEURIPS 2025 • arXiv:2505.12366
12 citations

Discrete Codebook World Models for Continuous Control

Aidan Scannell, Mohammadreza Nakhaeinezhadfard, Kalle Kujanpää et al.

ICLR 2025 • arXiv:2503.00653
9 citations

Disentangled World Models: Learning to Transfer Semantic Knowledge from Distracting Videos for Reinforcement Learning

Qi Wang, Zhipeng Zhang, Baao Xie et al.

ICCV 2025 • arXiv:2503.08751
5 citations

Doubly Optimal Policy Evaluation for Reinforcement Learning

Shuze Liu, Claire Chen, Shangtong Zhang

ICLR 2025 • arXiv:2410.02226
5 citations

Dynamic Configuration for Cutting Plane Separators via Reinforcement Learning on Incremental Graph

Mingxuan Ye, Jie Wang, Fangzhou et al.

NEURIPS 2025

Dynamic Contrastive Skill Learning with State-Transition Based Skill Clustering and Dynamic Length Adjustment

Jinwoo Choi, Seung-Woo Seo

ICLR 2025 • oral • arXiv:2504.14805
2 citations

DynamicRAG: Leveraging Outputs of Large Language Model as Feedback for Dynamic Reranking in Retrieval-Augmented Generation

Jiashuo Sun, Xianrui Zhong, Sizhe Zhou et al.

NEURIPS 2025 • arXiv:2505.07233
6 citations

Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining

Rosie Zhao, Alexandru Meterez, Sham M. Kakade et al.

COLM 2025 • paper • arXiv:2504.07912
87 citations

EDELINE: Enhancing Memory in Diffusion-based World Models via Linear-Time Sequence Modeling

Jia-Hua Lee, Bor-Jiun Lin, Wei-Fang Sun et al.

NEURIPS 2025 • spotlight • arXiv:2502.00466
2 citations

Efficient Policy Evaluation with Safety Constraint for Reinforcement Learning

Claire Chen, Shuze Liu, Shangtong Zhang

ICLR 2025 • arXiv:2410.05655
1 citation

Efficient Reinforcement Learning in Probabilistic Reward Machines

Xiaofeng Lin, Xuezhou Zhang

AAAI 2025 • paper • arXiv:2408.10381
2 citations

Efficient Reinforcement Learning Through Adaptively Pretrained Visual Encoder

Yuhan Zhang, Guoqing Ma, Guangfu Hao et al.

AAAI 2025 • paper • arXiv:2502.05555
1 citation

Efficient Reinforcement Learning with Large Language Model Priors

Xue Yan, Yan Song, Xidong Feng et al.

ICLR 2025 • arXiv:2410.07927
21 citations

Enhancing Diversity In Parallel Agents: A Maximum State Entropy Exploration Story

Vincenzo De Paola, Riccardo Zamboni, Mirco Mutti et al.

ICML 2025 • arXiv:2505.01336
3 citations

Enhancing Online Reinforcement Learning with Meta-Learned Objective from Offline Data

Shilong Deng, Zetao Zheng, Hongcai He et al.

AAAI 2025 • paper • arXiv:2501.07346

Equilibrium Policy Generalization: A Reinforcement Learning Framework for Cross-Graph Zero-Shot Generalization in Pursuit-Evasion Games

Runyu Lu, Peng Zhang, Ruochuan Shi et al.

NEURIPS 2025 • arXiv:2511.00811
2 citations