"offline reinforcement learning" Papers
56 papers found • Page 1 of 2
ADG: Ambient Diffusion-Guided Dataset Recovery for Corruption-Robust Offline Reinforcement Learning
Zeyuan Liu, Zhihe Yang, Jiawei Xu et al.
Adversarial Policy Optimization for Offline Preference-based Reinforcement Learning
Hyungkyu Kang, Min-hwan Oh
Energy-Weighted Flow Matching for Offline Reinforcement Learning
Shiyuan Zhang, Weitong Zhang, Quanquan Gu
Fewer May Be Better: Enhancing Offline Reinforcement Learning with Reduced Dataset
Yiqin Yang, Quanwei Wang, Chenghao Li et al.
Model-Free Offline Reinforcement Learning with Enhanced Robustness
Chi Zhang, Zain Ulabedeen Farhat, George Atia et al.
MOSDT: Self-Distillation-Based Decision Transformer for Multi-Agent Offline Safe Reinforcement Learning
Yuchen Xia, Yunjian Xu
OGBench: Benchmarking Offline Goal-Conditioned RL
Seohong Park, Kevin Frans, Benjamin Eysenbach et al.
RLZero: Direct Policy Inference from Language Without In-Domain Supervision
Harshit Sushil Sikchi, Siddhant Agarwal, Pranaya Jajoo et al.
Scaling Offline Model-Based RL via Jointly-Optimized World-Action Model Pretraining
Jie Cheng, Ruixi Qiao, Yingwei Ma et al.
Semantic Temporal Abstraction via Vision-Language Model Guidance for Efficient Reinforcement Learning
Tian-Shuo Liu, Xu-Hui Liu, Ruifeng Chen et al.
Value-aligned Behavior Cloning for Offline Reinforcement Learning via Bi-level Optimization
Xingyu Jiang, Ning Gao, Xiuhui Zhang et al.
Adaptive Advantage-Guided Policy Regularization for Offline Reinforcement Learning
Tenglong Liu, Yang Li, Yixing Lan et al.
A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning
Yinmin Zhang, Jie Liu, Chuming Li et al.
A Primal-Dual Algorithm for Offline Constrained Reinforcement Learning with Linear MDPs
Kihyuk Hong, Ambuj Tewari
ATraDiff: Accelerating Online Reinforcement Learning with Imaginary Trajectories
Qianlan Yang, Yu-Xiong Wang
Bayesian Design Principles for Offline-to-Online Reinforcement Learning
Hao Hu, Yiqin Yang, Jianing Ye et al.
Beyond OOD State Actions: Supported Cross-Domain Offline Reinforcement Learning
Jinxin Liu, Ziqi Zhang, Zhenyu Wei et al.
Causal Action Influence Aware Counterfactual Data Augmentation
Núria Armengol Urpí, Marco Bagatella, Marin Vlastelica et al.
Confidence Aware Inverse Constrained Reinforcement Learning
Sriram Ganapathi Subramanian, Guiliang Liu, Mohammed Elmahgiubi et al.
Contrastive Representation for Data Filtering in Cross-Domain Offline Reinforcement Learning
Xiaoyu Wen, Chenjia Bai, Kang Xu et al.
CUDC: A Curiosity-Driven Unsupervised Data Collection Method with Adaptive Temporal Distances for Offline Reinforcement Learning
Chenyu Sun, Hangwei Qian, Chunyan Miao
Debiased Offline Representation Learning for Fast Online Adaptation in Non-stationary Dynamics
Xinyu Zhang, Wenjie Qiu, Yi-Chen Li et al.
DiffStitch: Boosting Offline Reinforcement Learning with Diffusion-based Trajectory Stitching
Guanghe Li, Yixiang Shan, Zhengbang Zhu et al.
Discovering Multiple Solutions from a Single Task in Offline Reinforcement Learning
Takayuki Osa, Tatsuya Harada
Efficient Policy Evaluation with Offline Data Informed Behavior Policy Design
Shuze Liu, Shangtong Zhang
Enhancing Value Function Estimation through First-Order State-Action Dynamics in Offline Reinforcement Learning
Yun-Hsuan Lien, Ping-Chun Hsieh, Tzu-Mao Li et al.
Exploration and Anti-Exploration with Distributional Random Network Distillation
Kai Yang, Jian Tao, Jiafei Lyu et al.
Federated Offline Reinforcement Learning: Collaborative Single-Policy Coverage Suffices
Jiin Woo, Laixi Shi, Gauri Joshi et al.
HarmoDT: Harmony Multi-Task Decision Transformer for Offline Reinforcement Learning
Shengchao Hu, Ziqing Fan, Li Shen et al.
Improving Generalization in Offline Reinforcement Learning via Adversarial Data Splitting
Da Wang, Lin Li, Wei Wei et al.
In-Context Decision Transformer: Reinforcement Learning via Hierarchical Chain-of-Thought
Sili Huang, Jifeng Hu, Hechang Chen et al.
Inferring the Long-Term Causal Effects of Long-Term Treatments from Short-Term Experiments
Allen Tran, Aurelien Bibaut, Nathan Kallus
Information-Directed Pessimism for Offline Reinforcement Learning
Alec Koppel, Sujay Bhatt, Jiacheng Guo et al.
Is Inverse Reinforcement Learning Harder than Standard Reinforcement Learning? A Theoretical Perspective
Lei Zhao, Mengdi Wang, Yu Bai
Learning a Diffusion Model Policy from Rewards via Q-Score Matching
Michael Psenka, Alejandro Escontrela, Pieter Abbeel et al.
Listwise Reward Estimation for Offline Preference-based Reinforcement Learning
Heewoong Choi, Sangwon Jung, Hongjoon Ahn et al.
Model-based Reinforcement Learning for Confounded POMDPs
Mao Hong, Zhengling Qi, Yanxun Xu
Model-Free Robust $\phi$-Divergence Reinforcement Learning Using Both Offline and Online Data
Kishan Panaganti, Adam Wierman, Eric Mazumdar
More Benefits of Being Distributional: Second-Order Bounds for Reinforcement Learning
Kaiwen Wang, Owen Oertell, Alekh Agarwal et al.
Neural Network Approximation for Pessimistic Offline Reinforcement Learning
Di Wu, Yuling Jiao, Li Shen et al.
Offline Actor-Critic Reinforcement Learning Scales to Large Models
Jost Tobias Springenberg, Abbas Abdolmaleki, Jingwei Zhang et al.
Offline Transition Modeling via Contrastive Energy Learning
Ruifeng Chen, Chengxing Jia, Zefang Huang et al.
Optimistic Model Rollouts for Pessimistic Offline Policy Optimization
Yuanzhao Zhai, Yiying Li, Zijian Gao et al.
Pessimism Meets Risk: Risk-Sensitive Offline Reinforcement Learning
Dake Zhang, Boxiang Lyu, Shuang Qiu et al.
PlanDQ: Hierarchical Plan Orchestration via D-Conductor and Q-Performer
Chang Chen, Junyeob Baek, Fei Deng et al.
Position: Reinforcement Learning in Dynamic Treatment Regimes Needs Critical Reexamination
Zhiyao Luo, Yangchen Pan, Peter Watkinson et al.
Q-value Regularized Transformer for Offline Reinforcement Learning
Shengchao Hu, Ziqing Fan, Chaoqin Huang et al.
ReDiffuser: Reliable Decision-Making Using a Diffuser with Confidence Estimation
Nantian He, Shaohui Li, Zhi Li et al.
Reinforcement Learning and Data Generation for Syntax-Guided Synthesis
Reinformer: Max-Return Sequence Modeling for Offline RL
Zifeng Zhuang, Dengyun Peng, Jinxin Liu et al.