Poster "reinforcement learning" Papers

220 papers found • Page 1 of 5

$\texttt{G1}$: Teaching LLMs to Reason on Graphs with Reinforcement Learning

Xiaojun Guo, Ang Li, Yifei Wang et al.

NEURIPS 2025 • 4 citations

A Causal Lens for Learning Long-term Fair Policies

Jacob Lear, Lu Zhang

ICLR 2025 • arXiv:2506.11242 • 1 citation

Accelerating RL for LLM Reasoning with Optimal Advantage Regression

Kianté Brantley, Mingyu Chen, Zhaolin Gao et al.

NEURIPS 2025 • arXiv:2505.20686 • 12 citations

Action abstractions for amortized sampling

Oussama Boussif, Léna Ezzine, Joseph Viviano et al.

ICLR 2025 • arXiv:2410.15184 • 3 citations

ActSafe: Active Exploration with Safety Constraints for Reinforcement Learning

Yarden As, Bhavya, Lenart Treven et al.

ICLR 2025 • arXiv:2410.09486 • 15 citations

AdaptiveAE: An Adaptive Exposure Strategy for HDR Capturing in Dynamic Scenes

Tianyi Xu, Fan Zhang, Boxin Shi et al.

ICCV 2025 • arXiv:2508.13503 • 1 citation

Adding Conditional Control to Diffusion Models with Reinforcement Learning

Yulai Zhao, Masatoshi Uehara, Gabriele Scalia et al.

ICLR 2025 • arXiv:2406.12120 • 13 citations

A Differential and Pointwise Control Approach to Reinforcement Learning

Minh Nguyen, Chandrajit Bajaj

NEURIPS 2025 • arXiv:2404.15617 • 1 citation

Afterburner: Reinforcement Learning Facilitates Self-Improving Code Efficiency Optimization

Mingzhe Du, Anh Tuan Luu, Yue Liu et al.

NEURIPS 2025 • arXiv:2505.23387 • 7 citations

A Generalized Bisimulation Metric of State Similarity between Markov Decision Processes: From Theoretical Propositions to Applications

Zhenyu Tao, Wei Xu, Xiaohu You

NEURIPS 2025 • arXiv:2509.18714 • 2 citations

An Optimal Discriminator Weighted Imitation Perspective for Reinforcement Learning

Haoran Xu, Shuozhe Li, Harshit Sikchi et al.

ICLR 2025 • arXiv:2504.13368 • 3 citations

AREAL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning

Wei Fu, Jiaxuan Gao, Xujie Shen et al.

NEURIPS 2025 • arXiv:2505.24298 • 117 citations

Asymmetric REINFORCE for off-Policy Reinforcement Learning: Balancing positive and negative rewards

Charles Arnal, Gaëtan Narozniak, Vivien Cabannes et al.

NEURIPS 2025 • arXiv:2506.20520 • 17 citations

AutoEdit: Automatic Hyperparameter Tuning for Image Editing

Chau Pham, Quan Dao, Mahesh Bhosale et al.

NEURIPS 2025 • arXiv:2509.15031 • 1 citation

CAD-Coder: Text-to-CAD Generation with Chain-of-Thought and Geometric Reward

Yandong Guan, Xilin Wang, XiMing Xing et al.

NEURIPS 2025 • arXiv:2505.19713 • 10 citations

Can a MISL Fly? Analysis and Ingredients for Mutual Information Skill Learning

Chongyi Zheng, Jens Tuyls, Joanne Peng et al.

ICLR 2025 • arXiv:2412.08021 • 9 citations

ChartSketcher: Reasoning with Multimodal Feedback and Reflection for Chart Understanding

Muye Huang, Lingling Zhang, Jie Ma et al.

NEURIPS 2025 • arXiv:2505.19076 • 5 citations

Complexity Scaling Laws for Neural Models using Combinatorial Optimization

Lowell Weissman, Michael Krumdick, A. Abbott

NEURIPS 2025 • arXiv:2506.12932

Computational Hardness of Reinforcement Learning with Partial $q^{\pi}$-Realizability

Shayan Karimi, Xiaoqi Tan

NEURIPS 2025

Computationally Efficient RL under Linear Bellman Completeness for Deterministic Dynamics

Runzhe Wu, Ayush Sekhari, Akshay Krishnamurthy et al.

ICLR 2025 • arXiv:2406.11810 • 3 citations

Continual Knowledge Adaptation for Reinforcement Learning

Jinwu Hu, ZiHao Lian, Zhiquan Wen et al.

NEURIPS 2025 • arXiv:2510.19314 • 2 citations

CORE: Collaborative Optimization with Reinforcement Learning and Evolutionary Algorithm for Floorplanning

Pengyi Li, Shixiong Kai, Jianye Hao et al.

NEURIPS 2025

Cross-Embodiment Dexterous Grasping with Reinforcement Learning

Haoqi Yuan, Bohan Zhou, Yuhui Fu et al.

ICLR 2025 • arXiv:2410.02479 • 20 citations

Cypher-RI: Reinforcement Learning for Integrating Schema Selection into Cypher Generation

Hanchen Su, Xuyuan Li, Yan Zhou et al.

NEURIPS 2025

DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search

Huajian Xin, Z.Z. Ren, Junxiao Song et al.

ICLR 2025 • arXiv:2408.08152 • 142 citations

Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO

Chengzhuo Tong, Ziyu Guo, Renrui Zhang et al.

NEURIPS 2025 • arXiv:2505.17017 • 26 citations

DisCO: Reinforcing Large Reasoning Models with Discriminative Constrained Optimization

Gang Li, Ming Lin, Tomer Galanti et al.

NEURIPS 2025 • arXiv:2505.12366 • 12 citations

Discrete Codebook World Models for Continuous Control

Aidan Scannell, Mohammadreza Nakhaeinezhadfard, Kalle Kujanpää et al.

ICLR 2025 • arXiv:2503.00653 • 9 citations

Disentangled World Models: Learning to Transfer Semantic Knowledge from Distracting Videos for Reinforcement Learning

Qi Wang, Zhipeng Zhang, Baao Xie et al.

ICCV 2025 • arXiv:2503.08751 • 5 citations

Doubly Optimal Policy Evaluation for Reinforcement Learning

Shuze Liu, Claire Chen, Shangtong Zhang

ICLR 2025 • arXiv:2410.02226 • 5 citations

Dynamic Configuration for Cutting Plane Separators via Reinforcement Learning on Incremental Graph

Mingxuan Ye, Jie Wang, Fangzhou et al.

NEURIPS 2025

DynamicRAG: Leveraging Outputs of Large Language Model as Feedback for Dynamic Reranking in Retrieval-Augmented Generation

Jiashuo Sun, Xianrui Zhong, Sizhe Zhou et al.

NEURIPS 2025 • arXiv:2505.07233 • 6 citations

Efficient Policy Evaluation with Safety Constraint for Reinforcement Learning

Claire Chen, Shuze Liu, Shangtong Zhang

ICLR 2025 • arXiv:2410.05655 • 1 citation

Efficient Reinforcement Learning with Large Language Model Priors

Xue Yan, Yan Song, Xidong Feng et al.

ICLR 2025 • arXiv:2410.07927 • 21 citations

Enhancing Diversity In Parallel Agents: A Maximum State Entropy Exploration Story

Vincenzo De Paola, Riccardo Zamboni, Mirco Mutti et al.

ICML 2025 • arXiv:2505.01336 • 3 citations

Equilibrium Policy Generalization: A Reinforcement Learning Framework for Cross-Graph Zero-Shot Generalization in Pursuit-Evasion Games

Runyu Lu, Peng Zhang, Ruochuan Shi et al.

NEURIPS 2025 • arXiv:2511.00811 • 2 citations

EvolvedGRPO: Unlocking Reasoning in LVLMs via Progressive Instruction Evolution

Zhebei Shen, Qifan Yu, Juncheng Li et al.

NEURIPS 2025

From Kolmogorov to Cauchy: Shallow XNet Surpasses KANs

Xin Li, Xiaotao Zheng, Zhihong Xia

NEURIPS 2025

Generalizing Verifiable Instruction Following

Valentina Pyatkin, Saumya Malik, Victoria Graf et al.

NEURIPS 2025 • arXiv:2507.02833 • 38 citations

General-Reasoner: Advancing LLM Reasoning Across All Domains

Xueguang Ma, Qian Liu, Dongfu Jiang et al.

NEURIPS 2025 • arXiv:2505.14652 • 86 citations

GeoExplorer: Active Geo-localization with Curiosity-Driven Exploration

Li Mi, Manon Béchaz, Zeming Chen et al.

ICCV 2025 • arXiv:2508.00152

Globally Optimal Policy Gradient Algorithms for Reinforcement Learning with PID Control Policies

Vipul Sharma, Wesley Suttle, S Sivaranjani

NEURIPS 2025

GoalLadder: Incremental Goal Discovery with Vision-Language Models

Alexey Zakharov, Shimon Whiteson

NEURIPS 2025 • arXiv:2506.16396 • 1 citation

GraphChain: Large Language Models for Large-scale Graph Analysis via Tool Chaining

Chunyu Wei, Wenji Hu, Xingjia Hao et al.

NEURIPS 2025 • arXiv:2511.00457 • 1 citation

GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based VLM Agent Training

Tong Wei, Yijun Yang, Junliang Xing et al.

ICCV 2025 • arXiv:2503.08525 • 8 citations

HCRMP: An LLM-Hinted Contextual Reinforcement Learning Framework for Autonomous Driving

Zhiwen Chen, Hanming Deng, Zhuoren Li et al.

NEURIPS 2025 • arXiv:2505.15793 • 3 citations

Hierarchical World Models as Visual Whole-Body Humanoid Controllers

Nick Hansen, Jyothir S V, Vlad Sobal et al.

ICLR 2025 • arXiv:2405.18418 • 23 citations

How Ensembles of Distilled Policies Improve Generalisation in Reinforcement Learning

Max Weltevrede, Moritz Zanger, Matthijs Spaan et al.

NEURIPS 2025 • arXiv:2505.16581

Hybrid Latent Reasoning via Reinforcement Learning

Zhenrui Yue, Bowen Jin, Huimin Zeng et al.

NEURIPS 2025 • arXiv:2505.18454 • 8 citations

HYPRL: Reinforcement Learning of Control Policies for Hyperproperties

Tzu-Han Hsu, Arshia Rafieioskouei, Borzoo Bonakdarpour

NEURIPS 2025 • arXiv:2504.04675 • 2 citations