Poster "reinforcement learning" Papers
$\texttt{G1}$: Teaching LLMs to Reason on Graphs with Reinforcement Learning
Xiaojun Guo, Ang Li, Yifei Wang et al.
A Causal Lens for Learning Long-term Fair Policies
Jacob Lear, Lu Zhang
Accelerating RL for LLM Reasoning with Optimal Advantage Regression
Kianté Brantley, Mingyu Chen, Zhaolin Gao et al.
Action abstractions for amortized sampling
Oussama Boussif, Léna Ezzine, Joseph Viviano et al.
ActSafe: Active Exploration with Safety Constraints for Reinforcement Learning
Yarden As, Bhavya, Lenart Treven et al.
AdaptiveAE: An Adaptive Exposure Strategy for HDR Capturing in Dynamic Scenes
Tianyi Xu, Fan Zhang, Boxin Shi et al.
Adding Conditional Control to Diffusion Models with Reinforcement Learning
Yulai Zhao, Masatoshi Uehara, Gabriele Scalia et al.
A Differential and Pointwise Control Approach to Reinforcement Learning
Minh Nguyen, Chandrajit Bajaj
Afterburner: Reinforcement Learning Facilitates Self-Improving Code Efficiency Optimization
Mingzhe Du, Anh Tuan Luu, Yue Liu et al.
A Generalized Bisimulation Metric of State Similarity between Markov Decision Processes: From Theoretical Propositions to Applications
Zhenyu Tao, Wei Xu, Xiaohu You
An Optimal Discriminator Weighted Imitation Perspective for Reinforcement Learning
Haoran Xu, Shuozhe Li, Harshit Sikchi et al.
AREAL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning
Wei Fu, Jiaxuan Gao, Xujie Shen et al.
Asymmetric REINFORCE for off-Policy Reinforcement Learning: Balancing positive and negative rewards
Charles Arnal, Gaëtan Narozniak, Vivien Cabannes et al.
AutoEdit: Automatic Hyperparameter Tuning for Image Editing
Chau Pham, Quan Dao, Mahesh Bhosale et al.
CAD-Coder: Text-to-CAD Generation with Chain-of-Thought and Geometric Reward
Yandong Guan, Xilin Wang, XiMing Xing et al.
Can a MISL Fly? Analysis and Ingredients for Mutual Information Skill Learning
Chongyi Zheng, Jens Tuyls, Joanne Peng et al.
ChartSketcher: Reasoning with Multimodal Feedback and Reflection for Chart Understanding
Muye Huang, Lingling Zhang, Jie Ma et al.
Complexity Scaling Laws for Neural Models using Combinatorial Optimization
Lowell Weissman, Michael Krumdick, A. Abbott
Computational Hardness of Reinforcement Learning with Partial $q^{\pi}$-Realizability
Shayan Karimi, Xiaoqi Tan
Computationally Efficient RL under Linear Bellman Completeness for Deterministic Dynamics
Runzhe Wu, Ayush Sekhari, Akshay Krishnamurthy et al.
Continual Knowledge Adaptation for Reinforcement Learning
Jinwu Hu, ZiHao Lian, Zhiquan Wen et al.
CORE: Collaborative Optimization with Reinforcement Learning and Evolutionary Algorithm for Floorplanning
Pengyi Li, Shixiong Kai, Jianye Hao et al.
Cross-Embodiment Dexterous Grasping with Reinforcement Learning
Haoqi Yuan, Bohan Zhou, Yuhui Fu et al.
Cypher-RI: Reinforcement Learning for Integrating Schema Selection into Cypher Generation
Hanchen Su, Xuyuan Li, Yan Zhou et al.
DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search
Huajian Xin, Z.Z. Ren, Junxiao Song et al.
Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO
Chengzhuo Tong, Ziyu Guo, Renrui Zhang et al.
DisCO: Reinforcing Large Reasoning Models with Discriminative Constrained Optimization
Gang Li, Ming Lin, Tomer Galanti et al.
Discrete Codebook World Models for Continuous Control
Aidan Scannell, Mohammadreza Nakhaeinezhadfard, Kalle Kujanpää et al.
Disentangled World Models: Learning to Transfer Semantic Knowledge from Distracting Videos for Reinforcement Learning
Qi Wang, Zhipeng Zhang, Baao Xie et al.
Doubly Optimal Policy Evaluation for Reinforcement Learning
Shuze Liu, Claire Chen, Shangtong Zhang
Dynamic Configuration for Cutting Plane Separators via Reinforcement Learning on Incremental Graph
Mingxuan Ye, Jie Wang, Fangzhou et al.
DynamicRAG: Leveraging Outputs of Large Language Model as Feedback for Dynamic Reranking in Retrieval-Augmented Generation
Jiashuo Sun, Xianrui Zhong, Sizhe Zhou et al.
Efficient Policy Evaluation with Safety Constraint for Reinforcement Learning
Claire Chen, Shuze Liu, Shangtong Zhang
Efficient Reinforcement Learning with Large Language Model Priors
Xue Yan, Yan Song, Xidong Feng et al.
Enhancing Diversity In Parallel Agents: A Maximum State Entropy Exploration Story
Vincenzo De Paola, Riccardo Zamboni, Mirco Mutti et al.
Equilibrium Policy Generalization: A Reinforcement Learning Framework for Cross-Graph Zero-Shot Generalization in Pursuit-Evasion Games
Runyu Lu, Peng Zhang, Ruochuan Shi et al.
EvolvedGRPO: Unlocking Reasoning in LVLMs via Progressive Instruction Evolution
Zhebei Shen, Qifan Yu, Juncheng Li et al.
From Kolmogorov to Cauchy: Shallow XNet Surpasses KANs
Xin Li, Xiaotao Zheng, Zhihong Xia
Generalizing Verifiable Instruction Following
Valentina Pyatkin, Saumya Malik, Victoria Graf et al.
General-Reasoner: Advancing LLM Reasoning Across All Domains
Xueguang Ma, Qian Liu, Dongfu Jiang et al.
GeoExplorer: Active Geo-localization with Curiosity-Driven Exploration
Li Mi, Manon Béchaz, Zeming Chen et al.
Globally Optimal Policy Gradient Algorithms for Reinforcement Learning with PID Control Policies
Vipul Sharma, Wesley Suttle, S Sivaranjani
GoalLadder: Incremental Goal Discovery with Vision-Language Models
Alexey Zakharov, Shimon Whiteson
GraphChain: Large Language Models for Large-scale Graph Analysis via Tool Chaining
Chunyu Wei, Wenji Hu, Xingjia Hao et al.
GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based VLM Agent Training
Tong Wei, Yijun Yang, Junliang Xing et al.
HCRMP: An LLM-Hinted Contextual Reinforcement Learning Framework for Autonomous Driving
Zhiwen Chen, Hanming Deng, Zhuoren Li et al.
Hierarchical World Models as Visual Whole-Body Humanoid Controllers
Nick Hansen, Jyothir S V, Vlad Sobal et al.
How Ensembles of Distilled Policies Improve Generalisation in Reinforcement Learning
Max Weltevrede, Moritz Zanger, Matthijs Spaan et al.
Hybrid Latent Reasoning via Reinforcement Learning
Zhenrui Yue, Bowen Jin, Huimin Zeng et al.
HYPRL: Reinforcement Learning of Control Policies for Hyperproperties
Tzu-Han Hsu, Arshia Rafieioskouei, Borzoo Bonakdarpour