"reinforcement learning" Papers
119 papers found • Page 1 of 3
AlphaZero Neural Scaling and Zipf's Law: a Tale of Board Games and Power Laws
Oren Neumann, Claudius Gros
An Optimal Discriminator Weighted Imitation Perspective for Reinforcement Learning
Haoran Xu, Shuozhe Li, Harshit Sikchi et al.
Continual Knowledge Adaptation for Reinforcement Learning
Jinwu Hu, ZiHao Lian, Zhiquan Wen et al.
GraphChain: Large Language Models for Large-scale Graph Analysis via Tool Chaining
Chunyu Wei, Wenji Hu, Xingjia Hao et al.
GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based VLM Agent Training
Tong Wei, Yijun Yang, Junliang Xing et al.
Kinetix: Investigating the Training of General Agents through Open-Ended Physics-Based Control Tasks
Michael Matthews, Michael Beukman, Chris Lu et al.
Meta-learning how to Share Credit among Macro-Actions
Ionel-Alexandru Hosu, Traian Rebedea, Razvan Pascanu
MiCo: Multi-image Contrast for Reinforcement Visual Reasoning
Xi Chen, Mingkang Zhu, Shaoteng Liu et al.
Multi-Agent Collaboration via Evolving Orchestration
Yufan Dang, Chen Qian, Xueheng Luo et al.
On the Convergence of Projected Policy Gradient for Any Constant Step Sizes
Jiacai Liu, Wenye Li, Dachao Lin et al.
OrbitZoo: Real Orbital Systems Challenges for Reinforcement Learning
Alexandre Oliveira, Katarina Dyreby, Francisco Caldas et al.
Personalized Exercise Recommendation with Semantically-Grounded Knowledge Tracing
Yilmazcan Ozyurt, Tunaberk Almaci, Stefan Feuerriegel et al.
Proxy Target: Bridging the Gap Between Discrete Spiking Neural Networks and Continuous Control
Zijie Xu, Tong Bu, Zecheng Hao et al.
Raw2Drive: Reinforcement Learning with Aligned World Models for End-to-End Autonomous Driving (in CARLA v2)
Zhenjie Yang, Xiaosong Jia, Qifeng Li et al.
Reasoning as an Adaptive Defense for Safety
Taeyoun Kim, Fahim Tajwar, Aditi Raghunathan et al.
Reinforcement learning with combinatorial actions for coupled restless bandits
Lily Xu, Bryan Wilder, Elias Khalil et al.
Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models
Zemin Huang, Zhiyang Chen, Zijun Wang et al.
RLGF: Reinforcement Learning with Geometric Feedback for Autonomous Driving Video Generation
Tianyi Yan, Wencheng Han, Xia Zhou et al.
Semantic Temporal Abstraction via Vision-Language Model Guidance for Efficient Reinforcement Learning
Tian-Shuo Liu, Xu-Hui Liu, Ruifeng Chen et al.
SeRL: Self-play Reinforcement Learning for Large Language Models with Limited Data
Wenkai Fang, Shunyu Liu, Yang Zhou et al.
Stable Hadamard Memory: Revitalizing Memory-Augmented Agents for Reinforcement Learning
Hung Le, Dung Nguyen, Kien Do et al.
The Promise of RL for Autoregressive Image Editing
Saba Ahmadi, Rabiul Awal, Ankur Sikarwar et al.
The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning
Xinyu Zhu, Mengzhou Xia, Zhepei Wei et al.
Unlocking Multimodal Mathematical Reasoning via Process Reward Model
Ruilin Luo, Zhuofan Zheng, Lei Wang et al.
VideoChat-R1.5: Visual Test-Time Scaling to Reinforce Multimodal Reasoning by Iterative Perception
Ziang Yan, Yinan He, Xinhao Li et al.
WebDancer: Towards Autonomous Information Seeking Agency
Jialong Wu, Baixuan Li, Runnan Fang et al.
Accelerated Policy Gradient: On the Convergence Rates of the Nesterov Momentum for Reinforcement Learning
Yen-Ju Chen, Nai-Chieh Huang, Ching-pei Lee et al.
Activation-Descent Regularization for Input Optimization of ReLU Networks
Hongzhan Yu, Sicun Gao
A Hierarchical Adaptive Multi-Task Reinforcement Learning Framework for Multiplier Circuit Design
Zhihai Wang, Jie Wang, Dongsheng Zuo et al.
A Neural-Guided Dynamic Symbolic Network for Exploring Mathematical Expressions from Data
Wenqiang Li, Weijun Li, Lina Yu et al.
An Improved Finite-time Analysis of Temporal Difference Learning with Deep Neural Networks
Zhifa Ke, Zaiwen Wen, Junyu Zhang
An Information Theoretic Approach to Interaction-Grounded Learning
Xiaoyan Hu, Farzan Farnia, Ho-fung Leung
Augmenting Decision with Hypothesis in Reinforcement Learning
Nguyen Minh Quang, Hady Lauw
Boosting Reinforcement Learning with Strongly Delayed Feedback Through Auxiliary Short Delays
Qingyuan Wu, Simon Zhan, Yixuan Wang et al.
Building Minimal and Reusable Causal State Abstractions for Reinforcement Learning
Zizhao Wang, Caroline Wang, Xuesu Xiao et al.
Code as Reward: Empowering Reinforcement Learning with VLMs
David Venuto, Mohammad Sami Nur Islam, Martin Klissarov et al.
ConsistentEE: A Consistent and Hardness-Guided Early Exiting Method for Accelerating Language Models Inference
Ziqian Zeng, Yihuai Hong, Hongliang Dai et al.
Cross-Domain Policy Adaptation by Capturing Representation Mismatch
Jiafei Lyu, Chenjia Bai, Jing-Wen Yang et al.
Dealing With Unbounded Gradients in Stochastic Saddle-point Optimization
Gergely Neu, Nneka Okolo
DGPO: Discovering Multiple Strategies with Diversity-Guided Policy Optimization
Wenze Chen, Shiyu Huang, Yuan Chiang et al.
DiffAIL: Diffusion Adversarial Imitation Learning
Bingzheng Wang, Guoqiang Wu, Teng Pang et al.
Discerning Temporal Difference Learning
DISCRET: Synthesizing Faithful Explanations For Treatment Effect Estimation
Yinjun Wu, Mayank Keoliya, Kan Chen et al.
Dynamic Knowledge Injection for AIXI Agents
Samuel Yang-Zhao, Kee Siong Ng, Marcus Hutter
Efficient Policy Evaluation with Offline Data Informed Behavior Policy Design
Shuze Liu, Shangtong Zhang
Efficient World Models with Context-Aware Tokenization
Vincent Micheli, Eloi Alonso, François Fleuret
EfficientZero V2: Mastering Discrete and Continuous Control with Limited Data
Shengjie Wang, Shaohuai Liu, Weirui Ye et al.
Episodic Return Decomposition by Difference of Implicitly Assigned Sub-trajectory Reward
Haoxin Lin, Hongqiu Wu, Jiaji Zhang et al.
EvoRainbow: Combining Improvements in Evolutionary Reinforcement Learning for Policy Search
Pengyi Li, Yan Zheng, Hongyao Tang et al.
Explore 3D Dance Generation via Reward Model from Automatically-Ranked Demonstrations
Zilin Wang, Haolin Zhuang, Lu Li et al.