🧬 Reinforcement Learning

Offline Reinforcement Learning

Learning from fixed datasets

100 papers · 1,729 total citations
Papers over time (Mar '24 to Feb '26): 593 papers
Also includes: offline reinforcement learning, offline rl, batch rl, logged data
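The one-line definition above ("learning from fixed datasets") is the defining constraint of this topic: the agent is trained entirely from previously logged transitions, with no further environment interaction. The snippet below is a minimal illustrative sketch of that setup, not taken from any paper listed here; the toy MDP, the randomly generated logged dataset, and the count-based pessimism penalty are assumptions chosen purely for illustration.

```python
# Sketch: offline / batch RL on a tiny tabular MDP.
# All learning happens from a fixed, logged dataset of (s, a, r, s') tuples;
# no new rollouts are collected at any point during training.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma, lr = 5, 2, 0.95, 0.1

# Hypothetical logged dataset from an unknown behavior policy (toy data).
dataset = [(int(rng.integers(n_states)), int(rng.integers(n_actions)),
            float(rng.normal()), int(rng.integers(n_states)))
           for _ in range(500)]

# Count dataset coverage of each (state, action) pair and penalize rarely
# seen pairs: a simple stand-in for the pessimism used by many offline-RL methods.
counts = np.zeros((n_states, n_actions))
for s, a, _, _ in dataset:
    counts[s, a] += 1
penalty = 1.0 / np.sqrt(counts + 1e-6)

# Fitted tabular Q-learning: repeated sweeps over the same fixed batch.
Q = np.zeros((n_states, n_actions))
for _ in range(200):
    for s, a, r, s_next in dataset:
        target = r + gamma * np.max(Q[s_next] - penalty[s_next])
        Q[s, a] += lr * (target - Q[s, a])

# Deployment policy: greedy with respect to the pessimistic value estimate.
policy = np.argmax(Q - penalty, axis=1)
print("greedy actions per state:", policy)
```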

Top Papers

#1

Real-IAD: A Real-World Multi-View Dataset for Benchmarking Versatile Industrial Anomaly Detection

Chengjie Wang, Wenbing Zhu, Bin-Bin Gao et al.

CVPR 2024
120 citations
#2

EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World

Yifei Huang, Guo Chen, Jilan Xu et al.

CVPR 2024
84 citations
#3

OGBench: Benchmarking Offline Goal-Conditioned RL

Seohong Park, Kevin Frans, Benjamin Eysenbach et al.

ICLR 2025 · arXiv:2410.20092
offline reinforcement learning, goal-conditioned rl, benchmark evaluation, offline gcrl algorithms (+3 more)
74 citations
#4

A Decade's Battle on Dataset Bias: Are We There Yet?

Zhuang Liu, Kaiming He

ICLR 2025
52 citations
#5

SubT-MRS Dataset: Pushing SLAM Towards All-weather Environments

Shibo Zhao, Yuanjun Gao, Tianhao Wu et al.

CVPR 2024
49 citations
#6

SDDGR: Stable Diffusion-based Deep Generative Replay for Class Incremental Object Detection

Junsu Kim, Hoseong Cho, Jihyeon Kim et al.

CVPR 2024
47 citations
#7

OMNI-EPIC: Open-endedness via Models of human Notions of Interestingness with Environments Programmed in Code

Maxence Faldor, Jenny Zhang, Antoine Cully et al.

ICLR 2025
44 citations
#8

Class-Incremental Learning with CLIP: Adaptive Representation Adjustment and Parameter Fusion

Linlan Huang, Xusheng Cao, Haori Lu et al.

ECCV 2024 · arXiv:2407.14143
class-incremental learning, vision-language pre-training, representation adjustment, parameter fusion (+3 more)
41 citations
#9

Towards Robust Event-guided Low-Light Image Enhancement: A Large-Scale Real-World Event-Image Dataset and Novel Approach

Guoqiang Liang, Kanghao Chen, Hangyu Li et al.

CVPR 2024
41 citations
#10

Gradient Reweighting: Towards Imbalanced Class-Incremental Learning

Jiangpeng He

CVPR 2024
39 citations
#11

Provable Offline Preference-Based Reinforcement Learning

Wenhao Zhan, Masatoshi Uehara, Nathan Kallus et al.

ICLR 2024
39 citations
#12

Reasoning Gym: Reasoning Environments for Reinforcement Learning with Verifiable Rewards

Zafir Stojanovski, Oliver Stanley, Joe Sharratt et al.

NeurIPS 2025
39 citations
#13

Dataset Distillation by Automatic Training Trajectories

Dai Liu, Jindong Gu, Hu Cao et al.

ECCV 2024
29 citations
#14

Policy Decorator: Model-Agnostic Online Refinement for Large Policy Model

Xiu Yuan, Tongzhou Mu, Stone Tao et al.

ICLR 2025 · arXiv:2412.13630
imitation learning, residual policy, online refinement, robot learning (+3 more)
27 citations
#15

Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data

Zhiyuan Zhou, Andy Peng, Qiyang Li et al.

ICLR 2025 · arXiv:2412.07762
reinforcement learning fine-tuning, offline reinforcement learning, online reinforcement learning, distribution mismatch (+4 more)
27 citations
#16

Beyond OOD State Actions: Supported Cross-Domain Offline Reinforcement Learning

Jinxin Liu, Ziqi Zhang, Zhenyu Wei et al.

AAAI 2024 · arXiv:2306.12755
offline reinforcement learning, out-of-distribution state actions, cross-domain learning, transition dynamics mismatch (+4 more)
26 citations
#17

Revisiting Plasticity in Visual Reinforcement Learning: Data, Modules and Training Stages

Guozheng Ma, Lu Li, Sen Zhang et al.

ICLR 2024
25 citations
#18

RLIF: Interactive Imitation Learning as Reinforcement Learning

Jianlan Luo, Perry Dong, Yuexiang Zhai et al.

ICLR 2024
25 citations
#19

A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning

Yinmin Zhang, Jie Liu, Chuming Li et al.

AAAI 2024 · arXiv:2312.07685
offline reinforcement learning, q-value estimation, online finetuning, offline-to-online rl (+3 more)
25 citations
#20

Decoupled Spatio-Temporal Consistency Learning for Self-Supervised Tracking

Yaozong Zheng, Bineng Zhong, Qihua Liang et al.

AAAI 2025
24 citations
#21

Non-exemplar Online Class-Incremental Continual Learning via Dual-Prototype Self-Augment and Refinement

Fushuo Huo, Wenchao Xu, Jingcai Guo et al.

AAAI 2024 · arXiv:2303.10891
class-incremental learning, continual learning, catastrophic forgetting, prototype alignment (+4 more)
23 citations
#22

Latent Reward: LLM-Empowered Credit Assignment in Episodic Reinforcement Learning

Yun Qu, Yuhang Jiang, Boyuan Wang et al.

AAAI 2025
23 citations
#23

From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks

Clementine Domine, Nicolas Anguita, Alexandra M Proca et al.

ICLR 2025
22 citations
#24

SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents

Ibragim Badertdinov, Alexander Golubev, Maksim Nekrashevich et al.

NeurIPS 2025
22 citations
#25

Summarizing Stream Data for Memory-Constrained Online Continual Learning

Jianyang Gu, Kai Wang, Wei Jiang et al.

AAAI 2024 · arXiv:2305.16645
online continual learning, replay-based methods, memory buffer optimization, knowledge distillation (+3 more)
22 citations
#26

Domain Randomization via Entropy Maximization

Gabriele Tiboni, Pascal Klink, Jan Peters et al.

ICLR 2024
20 citations
#27

DiffAIL: Diffusion Adversarial Imitation Learning

Bingzheng Wang, Guoqiang Wu, Teng Pang et al.

AAAI 2024 · arXiv:2312.06348
imitation learning, adversarial imitation learning, diffusion models, reward function learning (+4 more)
20 citations
#28

ReinboT: Amplifying Robot Visual-Language Manipulation with Reinforcement Learning

Hongyin Zhang, Zifeng Zhuang, Han Zhao et al.

ICML 2025
20 citations
#29

Towards Better Alignment: Training Diffusion Models with Reinforcement Learning Against Sparse Rewards

Zijing Hu, Fengda Zhang, Long Chen et al.

CVPR 2025
19 citations
#30

SeRL: Self-play Reinforcement Learning for Large Language Models with Limited Data

Wenkai Fang, Shunyu Liu, Yang Zhou et al.

NeurIPS 2025 · arXiv:2505.20347
reinforcement learning, large language models, self-instruction generation, self-rewarding mechanisms (+4 more)
19 citations
#31

OVOR: OnePrompt with Virtual Outlier Regularization for Rehearsal-Free Class-Incremental Learning

Wei-Cheng Huang, Chun-Fu Chen, Hsiang Hsu

ICLR 2024
18 citations
#32

Towards Adversarially Robust Dataset Distillation by Curvature Regularization

Eric Xue, Yijiang Li, Haoyang Liu et al.

AAAI 2025
18 citations
#33

Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models

Uladzislau Sobal, Wancong Zhang, Kyunghyun Cho et al.

NeurIPS 2025 · arXiv:2502.14819
reward-free offline learning, latent dynamics models, model-based planning, goal-conditioned rl (+4 more)
18 citations
#34

Temporally and Distributionally Robust Optimization for Cold-Start Recommendation

Xinyu Lin, Wenjie Wang, Jujia Zhao et al.

AAAI 2024 · arXiv:2312.09901
cold-start recommendation, collaborative filtering, temporal feature shifts, distributionally robust optimization (+2 more)
18 citations
#35

Bridging Distributional and Risk-sensitive Reinforcement Learning with Provable Regret Bounds

Hao Liang, Zhiquan Luo

NeurIPS 2025
18 citations
#36

Self-Evolved Reward Learning for LLMs

Chenghua Huang, Zhizhen Fan, Lu Wang et al.

ICLR 2025 · arXiv:2411.00418
reinforcement learning from human feedback, reward model training, self-evolved learning, language model alignment (+3 more)
18 citations
#37

Revisiting Adversarial Training Under Long-Tailed Distributions

Xinli Yue, Ningping Mou, Qian Wang et al.

CVPR 2024
17 citations
#38

Stitching Sub-trajectories with Conditional Diffusion Model for Goal-Conditioned Offline RL

Sungyoon Kim, Yunseon Choi, Daiki Matsunaga et al.

AAAI 2024 · arXiv:2402.07226
offline reinforcement learning, goal-conditioned rl, conditional diffusion models, sub-trajectory stitching (+4 more)
17 citations
#39

Learning to Optimize Permutation Flow Shop Scheduling via Graph-Based Imitation Learning

Longkang Li, Siyuan Liang, Zihao Zhu et al.

AAAI 2024 · arXiv:2210.17178
permutation flow shop scheduling, graph-based imitation learning, manufacturing systems optimization, large-scale scheduling problems (+4 more)
16 citations
#40

An Empirical Study of Autoregressive Pre-training from Videos

Jathushan Rajasegaran, Ilija Radosavovic, Rahul Ravishankar et al.

ICCV 2025
15 citations
#41

Horizon Reduction Makes RL Scalable

Seohong Park, Kevin Frans, Deepinder Mann et al.

NeurIPS 2025
15 citations
#42

Reward-Consistent Dynamics Models are Strongly Generalizable for Offline Reinforcement Learning

Fan-Ming Luo, Tian Xu, Xingchen Cao et al.

ICLR 2024
14 citations
#43

BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation

Yunhao Ge, Yihe Tang, Jiashu Xu et al.

CVPR 2024
14 citations
#44

SwS: Self-aware Weakness-driven Problem Synthesis in Reinforcement Learning for LLM Reasoning

Xiao Liang, Zhong-Zhi Li, Yeyun Gong et al.

NeurIPS 2025
14 citations
#45

Generalizable Task Representation Learning for Offline Meta-Reinforcement Learning with Data Limitations

Renzhe Zhou, Chen-Xiao Gao, Zongzhang Zhang et al.

AAAI 2024 · arXiv:2312.15909
offline meta-reinforcement learning, task representation learning, data limitations, task auto-encoder (+4 more)
14 citations
#46

ReALFRED: An Embodied Instruction Following Benchmark in Photo-Realistic Environments

Taewoong Kim, Cheolhong Min, Byeonghwi Kim et al.

ECCV 2024
13 citations
#47

Asymmetric REINFORCE for off-Policy Reinforcement Learning: Balancing positive and negative rewards

Charles Arnal, Gaëtan Narozniak, Vivien Cabannes et al.

NeurIPS 2025
13 citations
#48

Coreset Selection via Reducible Loss in Continual Learning

Ruilin Tong, Yuhang Liu, Javen Qinfeng Shi et al.

ICLR 2025
coreset selection, continual learning, rehearsal memory, bilevel optimization (+3 more)
12 citations
#49

VTDexManip: A Dataset and Benchmark for Visual-tactile Pretraining and Dexterous Manipulation with Reinforcement Learning

Qingtao Liu, Yu Cui, Zhengnan Sun et al.

ICLR 2025
vision-tactile dataset, dexterous manipulation, reinforcement learning, multimodal pretraining (+2 more)
11 citations
#50

Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation

Zhuohang Dang, Minnan Luo, Chengyou Jia et al.

AAAI 2024 · arXiv:2312.16478
noisy correspondence learning, cross-modal retrieval, energy uncertainty, hard negatives (+3 more)
11 citations
#51

XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning

Alexander Nikulin, Ilya Zisman, Alexey Zemtsov et al.

ICLR 2025
11 citations
#52

Measuring memorization in RLHF for code completion

Jamie Hayes, I Shumailov, Billy Porter et al.

ICLR 2025 · arXiv:2406.11715
rlhf alignment process, training data memorization, direct preference optimization, preference learning methods (+4 more)
10 citations
#53

Open-World Reinforcement Learning over Long Short-Term Imagination

Jiajian Li, Qi Wang, Yunbo Wang et al.

ICLR 2025
10 citations
#54

Budgeted Online Continual Learning by Adaptive Layer Freezing and Frequency-based Sampling

Minhyuk Seo, Hyunseo Koh, Jonghyun Choi

ICLR 2025
10 citations
#55

MAD-TD: Model-Augmented Data stabilizes High Update Ratio RL

Claas Voelcker, Marcel Hussing, Eric Eaton et al.

ICLR 2025
10 citations
#56

Pareto Set Learning for Multi-Objective Reinforcement Learning

Erlong Liu, Yu-Chang Wu, Xiaobin Huang et al.

AAAI 2025
10 citations
#57

General Scene Adaptation for Vision-and-Language Navigation

Haodong Hong, Yanyuan Qiao, Sen Wang et al.

ICLR 2025 · arXiv:2501.17403
vision-and-language navigation, scene adaptation, instruction orchestration, out-of-distribution generalization (+4 more)
10 citations
#58

Learning from Sparse Offline Datasets via Conservative Density Estimation

Zhepeng Cen, Zuxin Liu, Zitong Wang et al.

ICLR 2024
10 citations
#59

Robot-R1: Reinforcement Learning for Enhanced Embodied Reasoning in Robotics

Dongyoung Kim, Huiwon Jang, Sumin Park et al.

NeurIPS 2025 · arXiv:2506.00070
reinforcement learning, embodied reasoning, robot control, vision-language models (+4 more)
9 citations
#60

Improved Active Learning via Dependent Leverage Score Sampling

Atsushi Shimizu, Xiaoou Cheng, Christopher Musco et al.

ICLR 2024
9 citations
#61

BASKET: A Large-Scale Video Dataset for Fine-Grained Skill Estimation

Yulu Pan, Ce Zhang, Gedas Bertasius

CVPR 2025
9 citations
#62

Fast and Robust: Task Sampling with Posterior and Diversity Synergies for Adaptive Decision-Makers in Randomized Environments

Yun Qu, Cheems Wang, Yixiu Mao et al.

ICML 2025
8 citations
#63

Offline-to-Online Hyperparameter Transfer for Stochastic Bandits

Dravyansh Sharma, Arun Suggala

AAAI 2025
8 citations
#64

LoRID: Low-Rank Iterative Diffusion for Adversarial Purification

Geigh Zollicoffer, Minh N. Vu, Ben Nebgen et al.

AAAI 2025
8 citations
#65

Scaling Off-Policy Reinforcement Learning with Batch and Weight Normalization

Daniel Palenicek, Florian Vogt, Joe Watson et al.

NeurIPS 2025
8 citations
#66

Offline Multi-Agent Reinforcement Learning via In-Sample Sequential Policy Optimization

Zongkai Liu, Qian Lin, Chao Yu et al.

AAAI 2025
8 citations
#67

What Makes Math Problems Hard for Reinforcement Learning: A Case Study

Ali Shehper, Anibal Medina-Mardones, Lucas Fagan et al.

NeurIPS 2025
7 citations
#68

Finite-Sample Analysis of Policy Evaluation for Robust Average Reward Reinforcement Learning

Yang Xu, Washim Mondal, Vaneet Aggarwal

NeurIPS 2025
7 citations
#69

CAPrompt: Cyclic Prompt Aggregation for Pre-Trained Model Based Class Incremental Learning

Qiwei Li, Jiahuan Zhou

AAAI 2025
7 citations
#70

Regret Analysis of Repeated Delegated Choice

Suho Shin, Keivan Rezaei, Mohammad Hajiaghayi et al.

AAAI 2024 · arXiv:2310.04884
repeated delegated choice, online learning variant, regret analysis, strategic agent behavior (+4 more)
7 citations
#71

Towards Stable and Storage-efficient Dataset Distillation: Matching Convexified Trajectory

Wenliang Zhong, Haoyu Tang, Qinghai Zheng et al.

CVPR 2025
7 citations
#72

DrS: Learning Reusable Dense Rewards for Multi-Stage Tasks

Tongzhou Mu, Minghua Liu, Hao Su

ICLR 2024
7 citations
#73

Stable Hadamard Memory: Revitalizing Memory-Augmented Agents for Reinforcement Learning

Hung Le, Dung Nguyen, Kien Do et al.

ICLR 2025 · arXiv:2410.10132
memory-augmented agents, partially observable environments, reinforcement learning, hadamard product (+4 more)
6 citations
#74

Robust Offline Reinforcement Learning with Linearly Structured $f$-Divergence Regularization

Cheng Tang, Zhishuai Liu, Pan Xu

ICML 2025
6 citations
#75

No-Regret is not enough! Bandits with General Constraints through Adaptive Regret Minimization

Martino Bernasconi, Matteo Castiglioni, Andrea Celli

ICML 2025
6 citations
#76

Non-Adversarial Inverse Reinforcement Learning via Successor Feature Matching

Arnav Kumar Jain, Harley Wiltzer, Jesse Farebrother et al.

ICLR 2025 · arXiv:2411.07007
inverse reinforcement learning, successor feature matching, policy gradient descent, state-only imitation (+4 more)
6 citations
#77

TempSamp-R1: Effective Temporal Sampling with Reinforcement Fine-Tuning for Video LLMs

Yunheng Li, Jing Cheng, Shaoyong Jia et al.

NeurIPS 2025
6 citations
#78

CL-LoRA: Continual Low-Rank Adaptation for Rehearsal-Free Class-Incremental Learning

Jiangpeng He, Zhihao Duan, Fengqing Zhu

CVPR 2025
6 citations
#79

Dual-Enhanced Coreset Selection with Class-wise Collaboration for Online Blurry Class Incremental Learning

Yutian Luo, Shiqi Zhao, Haoran Wu et al.

CVPR 2024
6 citations
#80

The Curse of Diversity in Ensemble-Based Exploration

Zhixuan Lin, Pierluca D'Oro, Evgenii Nikishin et al.

ICLR 2024
6 citations
#81

Revisiting a Design Choice in Gradient Temporal Difference Learning

Xiaochi Qian, Shangtong Zhang

ICLR 2025
6 citations
#82

Inverse Reinforcement Learning by Estimating Expertise of Demonstrators

Mark Beliaev, Ramtin Pedarsani

AAAI 2025
6 citations
#83

DualCP: Rehearsal-Free Domain-Incremental Learning via Dual-Level Concept Prototype

Qiang Wang, Yuhang He, Songlin Dong et al.

AAAI 2025
6 citations
#84

Alice Benchmarks: Connecting Real World Re-Identification with the Synthetic

Xiaoxiao Sun, Yue Yao, Shengjin Wang et al.

ICLR 2024
6 citations
#85

Model-Free Offline Reinforcement Learning with Enhanced Robustness

Chi Zhang, Zain Ulabedeen Farhat, George Atia et al.

ICLR 2025
offline reinforcement learning, model-free algorithms, robustness enhancement, double-pessimism principle (+3 more)
5 citations
#86

Model-based Offline Reinforcement Learning with Lower Expectile Q-Learning

Kwanyoung Park, Youngwoon Lee

ICLR 2025
5 citations
#87

Autonomous Goal Detection and Cessation in Reinforcement Learning: A Case Study on Source Term Estimation

Yiwei Shi, Muning Wen, Qi Zhang et al.

AAAI 2025
5 citations
#88

Detect Any Mirrors: Boosting Learning Reliability on Large-Scale Unlabeled Data with an Iterative Data Engine

Zhaohu Xing, Lihao Liu, Yijun Yang et al.

CVPR 2025
5 citations
#89

Advancing Prompt-Based Methods for Replay-Independent General Continual Learning

Zhiqi Kang, Liyuan Wang, Xingxing Zhang et al.

ICLR 2025
5 citations
#90

Real-Time Recurrent Reinforcement Learning

Julian Lemmel, Radu Grosu

AAAI 2025
5 citations
#91

KAC: Kolmogorov-Arnold Classifier for Continual Learning

Yusong Hu, Zichen Liang, Fei Yang et al.

CVPR 2025
5 citations
#92

Graph Assisted Offline-Online Deep Reinforcement Learning for Dynamic Workflow Scheduling

Yifan Yang, Gang Chen, Hui Ma et al.

ICLR 2025
5 citations
#93

RoboSense: Large-scale Dataset and Benchmark for Egocentric Robot Perception and Navigation in Crowded and Unstructured Environments

Haisheng Su, Feixiang Song, Cong Ma et al.

CVPR 2025 · arXiv:2408.15503
egocentric robot perception, autonomous navigation, near-field scene understanding, multimodal dataset (+4 more)
5 citations
#94

Learning Graph Invariance by Harnessing Spuriosity

Tianjun Yao, Yongqiang Chen, Kai Hu et al.

ICLR 2025
graph invariant learning, out-of-distribution generalization, graph representation learning, invariant risk minimization (+1 more)
5 citations
#95

Efficient Multi-agent Offline Coordination via Diffusion-based Trajectory Stitching

Lei Yuan, Yuqi Bian, Lihe Li et al.

ICLR 2025
multi-agent reinforcement learning, offline coordination, diffusion models, trajectory stitching (+3 more)
5 citations
#96

Revisit Event Generation Model: Self-Supervised Learning of Event-to-Video Reconstruction with Implicit Neural Representations

Zipeng Wang, Yunfan Lu, Lin Wang

ECCV 2024
5 citations
#97

AutoData: A Multi-Agent System for Open Web Data Collection

Tianyi Ma, Yiyue Qian, Zheyuan Zhang et al.

NeurIPS 2025
4 citations
#98

Dataset Ownership Verification in Contrastive Pre-trained Models

Yuechen Xie, Jie Song, Mengqi Xue et al.

ICLR 2025
4 citations
#99

Tackling Data Corruption in Offline Reinforcement Learning via Sequence Modeling

Jiawei Xu, Rui Yang, Shuang Qiu et al.

ICLR 2025
4 citations
#100

Behavioral Entropy-Guided Dataset Generation for Offline Reinforcement Learning

Wesley Suttle, Aamodh Suresh, Carlos Nieto-Granda

ICLR 2025
4 citations