"reinforcement learning" Papers

119 papers found • Page 1 of 3

AlphaZero Neural Scaling and Zipf's Law: a Tale of Board Games and Power Laws

Oren Neumann, Claudius Gros

NeurIPS 2025spotlightarXiv:2412.11979
9
citations

An Optimal Discriminator Weighted Imitation Perspective for Reinforcement Learning

Haoran Xu, Shuozhe Li, Harshit Sikchi et al.

ICLR 2025posterarXiv:2504.13368
2
citations

Continual Knowledge Adaptation for Reinforcement Learning

Jinwu Hu, ZiHao Lian, Zhiquan Wen et al.

NeurIPS 2025posterarXiv:2510.19314
1
citations

GraphChain: Large Language Models for Large-scale Graph Analysis via Tool Chaining

Chunyu Wei, Wenji Hu, Xingjia Hao et al.

NeurIPS 2025posterarXiv:2511.00457

GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based VLM Agent Training

Tong Wei, Yijun Yang, Junliang Xing et al.

ICCV 2025posterarXiv:2503.08525
8
citations

Kinetix: Investigating the Training of General Agents through Open-Ended Physics-Based Control Tasks

Michael Matthews, Michael Beukman, Chris Lu et al.

ICLR 2025posterarXiv:2410.23208
20
citations

Meta-learning how to Share Credit among Macro-Actions

Ionel-Alexandru Hosu, Traian Rebedea, Razvan Pascanu

NeurIPS 2025oralarXiv:2506.13690

MiCo: Multi-image Contrast for Reinforcement Visual Reasoning

Xi Chen, Mingkang Zhu, Shaoteng Liu et al.

NeurIPS 2025posterarXiv:2506.22434

Multi-Agent Collaboration via Evolving Orchestration

Yufan Dang, Chen Qian, Xueheng Luo et al.

NeurIPS 2025posterarXiv:2505.19591
25
citations

On the Convergence of Projected Policy Gradient for Any Constant Step Sizes

Jiacai Liu, Wenye Li, Dachao Lin et al.

NeurIPS 2025posterarXiv:2311.01104
4
citations

OrbitZoo: Real Orbital Systems Challenges for Reinforcement Learning

Alexandre Oliveira, Katarina Dyreby, Francisco Caldas et al.

NeurIPS 2025posterarXiv:2504.04160

Personalized Exercise Recommendation with Semantically-Grounded Knowledge Tracing

Yilmazcan Ozyurt, Tunaberk Almaci, Stefan Feuerriegel et al.

NeurIPS 2025posterarXiv:2507.11060

Proxy Target: Bridging the Gap Between Discrete Spiking Neural Networks and Continuous Control

Zijie Xu, Tong Bu, Zecheng Hao et al.

NeurIPS 2025posterarXiv:2505.24161
3
citations

Raw2Drive: Reinforcement Learning with Aligned World Models for End-to-End Autonomous Driving (in CARLA v2)

Zhenjie Yang, Xiaosong Jia, Qifeng Li et al.

NeurIPS 2025posterarXiv:2505.16394
18
citations

Reasoning as an Adaptive Defense for Safety

Taeyoun Kim, Fahim Tajwar, Aditi Raghunathan et al.

NeurIPS 2025posterarXiv:2507.00971
9
citations

Reinforcement learning with combinatorial actions for coupled restless bandits

Lily Xu, Bryan Wilder, Elias Khalil et al.

ICLR 2025posterarXiv:2503.01919
5
citations

Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models

Zemin Huang, Zhiyang Chen, Zijun Wang et al.

NeurIPS 2025posterarXiv:2505.10446
28
citations

RLGF: Reinforcement Learning with Geometric Feedback for Autonomous Driving Video Generation

Tianyi Yan, Wencheng Han, xia zhou et al.

NeurIPS 2025posterarXiv:2509.16500

Semantic Temporal Abstraction via Vision-Language Model Guidance for Efficient Reinforcement Learning

Tian-Shuo Liu, Xu-Hui Liu, Ruifeng Chen et al.

ICLR 2025oral

SeRL: Self-play Reinforcement Learning for Large Language Models with Limited Data

Wenkai Fang, Shunyu Liu, Yang Zhou et al.

NeurIPS 2025posterarXiv:2505.20347
19
citations

Stable Hadamard Memory: Revitalizing Memory-Augmented Agents for Reinforcement Learning

Hung Le, Dung Nguyen, Kien Do et al.

ICLR 2025posterarXiv:2410.10132
6
citations

The Promise of RL for Autoregressive Image Editing

Saba Ahmadi, Rabiul Awal, Ankur Sikarwar et al.

NeurIPS 2025posterarXiv:2508.01119
2
citations

The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning

Xinyu Zhu, Mengzhou Xia, Zhepei Wei et al.

NeurIPS 2025posterarXiv:2506.01347
74
citations

Unlocking Multimodal Mathematical Reasoning via Process Reward Model

Ruilin Luo, Zhuofan Zheng, Lei Wang et al.

NeurIPS 2025posterarXiv:2501.04686
29
citations

VideoChat-R1.5: Visual Test-Time Scaling to Reinforce Multimodal Reasoning by Iterative Perception

Ziang Yan, Yinan He, Xinhao Li et al.

NeurIPS 2025oralarXiv:2509.21100
13
citations

WebDancer: Towards Autonomous Information Seeking Agency

Jialong Wu, Baixuan Li, Runnan Fang et al.

NeurIPS 2025posterarXiv:2505.22648
81
citations

Accelerated Policy Gradient: On the Convergence Rates of the Nesterov Momentum for Reinforcement Learning

Yen-Ju Chen, Nai-Chieh Huang, Ching-pei Lee et al.

ICML 2024poster

Activation-Descent Regularization for Input Optimization of ReLU Networks

Hongzhan Yu, Sicun Gao

ICML 2024poster

A Hierarchical Adaptive Multi-Task Reinforcement Learning Framework for Multiplier Circuit Design

Zhihai Wang, Jie Wang, Dongsheng Zuo et al.

ICML 2024poster

A Neural-Guided Dynamic Symbolic Network for Exploring Mathematical Expressions from Data

Wenqiang Li, Weijun Li, Lina Yu et al.

ICML 2024poster

An Improved Finite-time Analysis of Temporal Difference Learning with Deep Neural Networks

Zhifa Ke, Zaiwen Wen, Junyu Zhang

ICML 2024oral

An Information Theoretic Approach to Interaction-Grounded Learning

Xiaoyan Hu, Farzan Farnia, Ho-fung Leung

ICML 2024poster

Augmenting Decision with Hypothesis in Reinforcement Learning

Nguyen Minh Quang, Hady Lauw

ICML 2024poster

Boosting Reinforcement Learning with Strongly Delayed Feedback Through Auxiliary Short Delays

Qingyuan Wu, Simon Zhan, Yixuan Wang et al.

ICML 2024poster

Building Minimal and Reusable Causal State Abstractions for Reinforcement Learning

Zizhao Wang, Caroline Wang, Xuesu Xiao et al.

AAAI 2024paperarXiv:2401.12497
9
citations

Code as Reward: Empowering Reinforcement Learning with VLMs

David Venuto, Mohammad Sami Nur Islam, Martin Klissarov et al.

ICML 2024spotlight

ConsistentEE: A Consistent and Hardness-Guided Early Exiting Method for Accelerating Language Models Inference

Ziqian Zeng, Yihuai Hong, Hongliang Dai et al.

AAAI 2024paperarXiv:2312.11882

Cross-Domain Policy Adaptation by Capturing Representation Mismatch

Jiafei Lyu, Chenjia Bai, Jing-Wen Yang et al.

ICML 2024poster

Dealing With Unbounded Gradients in Stochastic Saddle-point Optimization

Gergely Neu, Nneka Okolo

ICML 2024poster

DGPO: Discovering Multiple Strategies with Diversity-Guided Policy Optimization

Wenze Chen, Shiyu Huang, Yuan Chiang et al.

AAAI 2024paperarXiv:2207.05631
9
citations

DiffAIL: Diffusion Adversarial Imitation Learning

Bingzheng Wang, Guoqiang Wu, Teng Pang et al.

AAAI 2024paperarXiv:2312.06348
20
citations

Discerning Temporal Difference Learning

AAAI 2024paperarXiv:2310.08091

DISCRET: Synthesizing Faithful Explanations For Treatment Effect Estimation

Yinjun Wu, Mayank Keoliya, Kan Chen et al.

ICML 2024spotlight

Dynamic Knowledge Injection for AIXI Agents

Samuel Yang-Zhao, Kee Siong Ng, Marcus Hutter

AAAI 2024paperarXiv:2312.16184

Efficient Policy Evaluation with Offline Data Informed Behavior Policy Design

Shuze Liu, Shangtong Zhang

ICML 2024poster

Efficient World Models with Context-Aware Tokenization

Vincent Micheli, Eloi Alonso, François Fleuret

ICML 2024poster

EfficientZero V2: Mastering Discrete and Continuous Control with Limited Data

Shengjie Wang, Shaohuai Liu, Weirui Ye et al.

ICML 2024spotlight

Episodic Return Decomposition by Difference of Implicitly Assigned Sub-trajectory Reward

Haoxin Lin, Hongqiu Wu, Jiaji Zhang et al.

AAAI 2024paperarXiv:2312.10642
3
citations

EvoRainbow: Combining Improvements in Evolutionary Reinforcement Learning for Policy Search

Pengyi Li, Yan Zheng, Hongyao Tang et al.

ICML 2024poster

Explore 3D Dance Generation via Reward Model from Automatically-Ranked Demonstrations

Zilin Wang, Haolin Zhuang, Lu Li et al.

AAAI 2024paperarXiv:2312.11442
5
citations
← PreviousNext →