🧬 Reinforcement Learning

Offline Reinforcement Learning

Learning from fixed datasets

100 papers · 1,729 total citations
Papers over time (Mar '24 to Feb '26): 593 papers
Also includes: offline reinforcement learning, offline rl, batch rl, logged data
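The one-line definition above ("learning from fixed datasets") is the defining constraint of this topic: the agent is trained entirely from previously logged transitions, with no further environment interaction. The snippet below is a minimal illustrative sketch of that setup, not taken from any paper listed here; the toy MDP, the randomly generated logged dataset, and the count-based pessimism penalty are assumptions chosen purely for illustration.

```python
# Sketch: offline / batch RL on a tiny tabular MDP.
# All learning happens from a fixed, logged dataset of (s, a, r, s') tuples;
# no new rollouts are collected at any point during training.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma, lr = 5, 2, 0.95, 0.1

# Hypothetical logged dataset from an unknown behavior policy (toy data).
dataset = [(int(rng.integers(n_states)), int(rng.integers(n_actions)),
            float(rng.normal()), int(rng.integers(n_states)))
           for _ in range(500)]

# Count dataset coverage of each (state, action) pair and penalize rarely
# seen pairs: a simple stand-in for the pessimism used by many offline-RL methods.
counts = np.zeros((n_states, n_actions))
for s, a, _, _ in dataset:
    counts[s, a] += 1
penalty = 1.0 / np.sqrt(counts + 1e-6)

# Fitted tabular Q-learning: repeated sweeps over the same fixed batch.
Q = np.zeros((n_states, n_actions))
for _ in range(200):
    for s, a, r, s_next in dataset:
        target = r + gamma * np.max(Q[s_next] - penalty[s_next])
        Q[s, a] += lr * (target - Q[s, a])

# Deployment policy: greedy with respect to the pessimistic value estimate.
policy = np.argmax(Q - penalty, axis=1)
print("greedy actions per state:", policy)
```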

Top Papers

#1

Real-IAD: A Real-World Multi-View Dataset for Benchmarking Versatile Industrial Anomaly Detection

Chengjie Wang, Wenbing Zhu, Bin-Bin Gao et al.

CVPR 2024
120 citations
#2

EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World

Yifei Huang, Guo Chen, Jilan Xu et al.

CVPR 2024
84 citations
#3

OGBench: Benchmarking Offline Goal-Conditioned RL

Seohong Park, Kevin Frans, Benjamin Eysenbach et al.

ICLR 2025 · arXiv:2410.20092
offline reinforcement learning, goal-conditioned rl, benchmark evaluation, offline gcrl algorithms (+3 more)
74 citations
#4

A Decade's Battle on Dataset Bias: Are We There Yet?

Zhuang Liu, Kaiming He

ICLR 2025
52 citations
#5

SubT-MRS Dataset: Pushing SLAM Towards All-weather Environments

Shibo Zhao, Yuanjun Gao, Tianhao Wu et al.

CVPR 2024
49 citations
#6

SDDGR: Stable Diffusion-based Deep Generative Replay for Class Incremental Object Detection

Junsu Kim, Hoseong Cho, Jihyeon Kim et al.

CVPR 2024
47 citations
#7

OMNI-EPIC: Open-endedness via Models of human Notions of Interestingness with Environments Programmed in Code

Maxence Faldor, Jenny Zhang, Antoine Cully et al.

ICLR 2025
44 citations
#8

Class-Incremental Learning with CLIP: Adaptive Representation Adjustment and Parameter Fusion

Linlan Huang, Xusheng Cao, Haori Lu et al.

ECCV 2024 · arXiv:2407.14143
class-incremental learning, vision-language pre-training, representation adjustment, parameter fusion (+3 more)
41 citations
#9

Towards Robust Event-guided Low-Light Image Enhancement: A Large-Scale Real-World Event-Image Dataset and Novel Approach

Guoqiang Liang, Kanghao Chen, Hangyu Li et al.

CVPR 2024
41 citations
#10

Gradient Reweighting: Towards Imbalanced Class-Incremental Learning

Jiangpeng He

CVPR 2024
39 citations
#11

Provable Offline Preference-Based Reinforcement Learning

Wenhao Zhan, Masatoshi Uehara, Nathan Kallus et al.

ICLR 2024
39 citations
#12

Reasoning Gym: Reasoning Environments for Reinforcement Learning with Verifiable Rewards

Zafir Stojanovski, Oliver Stanley, Joe Sharratt et al.

NeurIPS 2025
39 citations
#13

Dataset Distillation by Automatic Training Trajectories

Dai Liu, Jindong Gu, Hu Cao et al.

ECCV 2024
29 citations
#14

Policy Decorator: Model-Agnostic Online Refinement for Large Policy Model

Xiu Yuan, Tongzhou Mu, Stone Tao et al.

ICLR 2025 · arXiv:2412.13630
imitation learning, residual policy, online refinement, robot learning (+3 more)
27 citations
#15

Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data

Zhiyuan Zhou, Andy Peng, Qiyang Li et al.

ICLR 2025 · arXiv:2412.07762
reinforcement learning fine-tuning, offline reinforcement learning, online reinforcement learning, distribution mismatch (+4 more)
27 citations
#16

Beyond OOD State Actions: Supported Cross-Domain Offline Reinforcement Learning

Jinxin Liu, Ziqi Zhang, Zhenyu Wei et al.

AAAI 2024 · arXiv:2306.12755
offline reinforcement learning, out-of-distribution state actions, cross-domain learning, transition dynamics mismatch (+4 more)
26 citations
#17

Revisiting Plasticity in Visual Reinforcement Learning: Data, Modules and Training Stages

Guozheng Ma, Lu Li, Sen Zhang et al.

ICLR 2024
25 citations
#18

RLIF: Interactive Imitation Learning as Reinforcement Learning

Jianlan Luo, Perry Dong, Yuexiang Zhai et al.

ICLR 2024
25 citations
#19

A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning

Yinmin Zhang, Jie Liu, Chuming Li et al.

AAAI 2024 · arXiv:2312.07685
offline reinforcement learning, q-value estimation, online finetuning, offline-to-online rl (+3 more)
25 citations
#20

Decoupled Spatio-Temporal Consistency Learning for Self-Supervised Tracking

Yaozong Zheng, Bineng Zhong, Qihua Liang et al.

AAAI 2025
24 citations
#21

Non-exemplar Online Class-Incremental Continual Learning via Dual-Prototype Self-Augment and Refinement

Fushuo Huo, Wenchao Xu, Jingcai Guo et al.

AAAI 2024 · arXiv:2303.10891
class-incremental learning, continual learning, catastrophic forgetting, prototype alignment (+4 more)
23 citations
#22

Latent Reward: LLM-Empowered Credit Assignment in Episodic Reinforcement Learning

Yun Qu, Yuhang Jiang, Boyuan Wang et al.

AAAI 2025
23 citations
#23

From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks

Clementine Domine, Nicolas Anguita, Alexandra M Proca et al.

ICLR 2025
22 citations
#24

SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents

Ibragim Badertdinov, Alexander Golubev, Maksim Nekrashevich et al.

NeurIPS 2025
22 citations
#25

Summarizing Stream Data for Memory-Constrained Online Continual Learning

Jianyang Gu, Kai Wang, Wei Jiang et al.

AAAI 2024 · arXiv:2305.16645
online continual learning, replay-based methods, memory buffer optimization, knowledge distillation (+3 more)
22 citations
#26

Domain Randomization via Entropy Maximization

Gabriele Tiboni, Pascal Klink, Jan Peters et al.

ICLR 2024
20 citations
#27

DiffAIL: Diffusion Adversarial Imitation Learning

Bingzheng Wang, Guoqiang Wu, Teng Pang et al.

AAAI 2024 · arXiv:2312.06348
imitation learning, adversarial imitation learning, diffusion models, reward function learning (+4 more)
20 citations
#28

ReinboT: Amplifying Robot Visual-Language Manipulation with Reinforcement Learning

Hongyin Zhang, Zifeng Zhuang, Han Zhao et al.

ICML 2025
20 citations
#29

Towards Better Alignment: Training Diffusion Models with Reinforcement Learning Against Sparse Rewards

Zijing Hu, Fengda Zhang, Long Chen et al.

CVPR 2025
19 citations
#30

SeRL: Self-play Reinforcement Learning for Large Language Models with Limited Data

Wenkai Fang, Shunyu Liu, Yang Zhou et al.

NeurIPS 2025 · arXiv:2505.20347
reinforcement learning, large language models, self-instruction generation, self-rewarding mechanisms (+4 more)
19 citations
#31

OVOR: OnePrompt with Virtual Outlier Regularization for Rehearsal-Free Class-Incremental Learning

Wei-Cheng Huang, Chun-Fu Chen, Hsiang Hsu

ICLR 2024
18 citations
#32

Towards Adversarially Robust Dataset Distillation by Curvature Regularization

Eric Xue, Yijiang Li, Haoyang Liu et al.

AAAI 2025
18 citations
#33

Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models

Uladzislau Sobal, Wancong Zhang, Kyunghyun Cho et al.

NeurIPS 2025 · arXiv:2502.14819
reward-free offline learning, latent dynamics models, model-based planning, goal-conditioned rl (+4 more)
18 citations
#34

Temporally and Distributionally Robust Optimization for Cold-Start Recommendation

Xinyu Lin, Wenjie Wang, Jujia Zhao et al.

AAAI 2024 · arXiv:2312.09901
cold-start recommendation, collaborative filtering, temporal feature shifts, distributionally robust optimization (+2 more)
18 citations
#35

Bridging Distributional and Risk-sensitive Reinforcement Learning with Provable Regret Bounds

Hao Liang, Zhiquan Luo

NeurIPS 2025
18 citations
#36

Self-Evolved Reward Learning for LLMs

Chenghua Huang, Zhizhen Fan, Lu Wang et al.

ICLR 2025 · arXiv:2411.00418
reinforcement learning from human feedback, reward model training, self-evolved learning, language model alignment (+3 more)
18 citations
#37

Revisiting Adversarial Training Under Long-Tailed Distributions

Xinli Yue, Ningping Mou, Qian Wang et al.

CVPR 2024
17 citations
#38

Stitching Sub-trajectories with Conditional Diffusion Model for Goal-Conditioned Offline RL

Sungyoon Kim, Yunseon Choi, Daiki Matsunaga et al.

AAAI 2024 · arXiv:2402.07226
offline reinforcement learning, goal-conditioned rl, conditional diffusion models, sub-trajectory stitching (+4 more)
17 citations
#39

Learning to Optimize Permutation Flow Shop Scheduling via Graph-Based Imitation Learning

Longkang Li, Siyuan Liang, Zihao Zhu et al.

AAAI 2024 · arXiv:2210.17178
permutation flow shop scheduling, graph-based imitation learning, manufacturing systems optimization, large-scale scheduling problems (+4 more)
16 citations
#40

An Empirical Study of Autoregressive Pre-training from Videos

Jathushan Rajasegaran, Ilija Radosavovic, Rahul Ravishankar et al.

ICCV 2025
15 citations
#41

Horizon Reduction Makes RL Scalable

Seohong Park, Kevin Frans, Deepinder Mann et al.

NeurIPS 2025
15 citations
#42

Reward-Consistent Dynamics Models are Strongly Generalizable for Offline Reinforcement Learning

Fan-Ming Luo, Tian Xu, Xingchen Cao et al.

ICLR 2024
14 citations
#43

BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation

Yunhao Ge, Yihe Tang, Jiashu Xu et al.

CVPR 2024
14 citations
#44

SwS: Self-aware Weakness-driven Problem Synthesis in Reinforcement Learning for LLM Reasoning

Xiao Liang, Zhong-Zhi Li, Yeyun Gong et al.

NeurIPS 2025
14 citations
#45

Generalizable Task Representation Learning for Offline Meta-Reinforcement Learning with Data Limitations

Renzhe Zhou, Chen-Xiao Gao, Zongzhang Zhang et al.

AAAI 2024 · arXiv:2312.15909
offline meta-reinforcement learning, task representation learning, data limitations, task auto-encoder (+4 more)
14 citations
#46

ReALFRED: An Embodied Instruction Following Benchmark in Photo-Realistic Environments

Taewoong Kim, Cheolhong Min, Byeonghwi Kim et al.

ECCV 2024
13 citations
#47

Asymmetric REINFORCE for off-Policy Reinforcement Learning: Balancing positive and negative rewards

Charles Arnal, Gaëtan Narozniak, Vivien Cabannes et al.

NeurIPS 2025
13 citations
#48

Coreset Selection via Reducible Loss in Continual Learning

Ruilin Tong, Yuhang Liu, Javen Qinfeng Shi et al.

ICLR 2025
coreset selection, continual learning, rehearsal memory, bilevel optimization (+3 more)
12 citations
#49

VTDexManip: A Dataset and Benchmark for Visual-tactile Pretraining and Dexterous Manipulation with Reinforcement Learning

Qingtao Liu, Yu Cui, Zhengnan Sun et al.

ICLR 2025
vision-tactile dataset, dexterous manipulation, reinforcement learning, multimodal pretraining (+2 more)
11 citations
#50

Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation

Zhuohang Dang, Minnan Luo, Chengyou Jia et al.

AAAI 2024 · arXiv:2312.16478
noisy correspondence learning, cross-modal retrieval, energy uncertainty, hard negatives (+3 more)
11 citations
#51

XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning

Alexander Nikulin, Ilya Zisman, Alexey Zemtsov et al.

ICLR 2025
11 citations
#52

Measuring memorization in RLHF for code completion

Jamie Hayes, I Shumailov, Billy Porter et al.

ICLR 2025 · arXiv:2406.11715
rlhf alignment process, training data memorization, direct preference optimization, preference learning methods (+4 more)
10 citations
#53

Open-World Reinforcement Learning over Long Short-Term Imagination

Jiajian Li, Qi Wang, Yunbo Wang et al.

ICLR 2025
10 citations
#54

Budgeted Online Continual Learning by Adaptive Layer Freezing and Frequency-based Sampling

Minhyuk Seo, Hyunseo Koh, Jonghyun Choi

ICLR 2025
10 citations
#55

MAD-TD: Model-Augmented Data stabilizes High Update Ratio RL

Claas Voelcker, Marcel Hussing, Eric Eaton et al.

ICLR 2025
10 citations
#56

Pareto Set Learning for Multi-Objective Reinforcement Learning

Erlong Liu, Yu-Chang Wu, Xiaobin Huang et al.

AAAI 2025
10 citations
#57

General Scene Adaptation for Vision-and-Language Navigation

Haodong Hong, Yanyuan Qiao, Sen Wang et al.

ICLR 2025 · arXiv:2501.17403
vision-and-language navigation, scene adaptation, instruction orchestration, out-of-distribution generalization (+4 more)
10 citations
#58

Learning from Sparse Offline Datasets via Conservative Density Estimation

Zhepeng Cen, Zuxin Liu, Zitong Wang et al.

ICLR 2024
10 citations
#59

Robot-R1: Reinforcement Learning for Enhanced Embodied Reasoning in Robotics

Dongyoung Kim, Huiwon Jang, Sumin Park et al.

NeurIPS 2025 · arXiv:2506.00070
reinforcement learning, embodied reasoning, robot control, vision-language models (+4 more)
9 citations
#60

Improved Active Learning via Dependent Leverage Score Sampling

Atsushi Shimizu, Xiaoou Cheng, Christopher Musco et al.

ICLR 2024
9 citations
#61

BASKET: A Large-Scale Video Dataset for Fine-Grained Skill Estimation

Yulu Pan, Ce Zhang, Gedas Bertasius

CVPR 2025
9 citations
#62

Fast and Robust: Task Sampling with Posterior and Diversity Synergies for Adaptive Decision-Makers in Randomized Environments

Yun Qu, Cheems Wang, Yixiu Mao et al.

ICML 2025
8 citations
#63

Offline-to-Online Hyperparameter Transfer for Stochastic Bandits

Dravyansh Sharma, Arun Suggala

AAAI 2025
8 citations
#64

LoRID: Low-Rank Iterative Diffusion for Adversarial Purification

Geigh Zollicoffer, Minh N. Vu, Ben Nebgen et al.

AAAI 2025
8 citations
#65

Scaling Off-Policy Reinforcement Learning with Batch and Weight Normalization

Daniel Palenicek, Florian Vogt, Joe Watson et al.

NeurIPS 2025
8 citations
#66

Offline Multi-Agent Reinforcement Learning via In-Sample Sequential Policy Optimization

Zongkai Liu, Qian Lin, Chao Yu et al.

AAAI 2025
8 citations
#67

What Makes Math Problems Hard for Reinforcement Learning: A Case Study

Ali Shehper, Anibal Medina-Mardones, Lucas Fagan et al.

NeurIPS 2025
7 citations
#68

Finite-Sample Analysis of Policy Evaluation for Robust Average Reward Reinforcement Learning

Yang Xu, Washim Mondal, Vaneet Aggarwal

NeurIPS 2025
7 citations
#69

CAPrompt: Cyclic Prompt Aggregation for Pre-Trained Model Based Class Incremental Learning

Qiwei Li, Jiahuan Zhou

AAAI 2025
7 citations
#70

Regret Analysis of Repeated Delegated Choice

Suho Shin, Keivan Rezaei, Mohammad Hajiaghayi et al.

AAAI 2024 · arXiv:2310.04884
repeated delegated choice, online learning variant, regret analysis, strategic agent behavior (+4 more)
7 citations
#71

Towards Stable and Storage-efficient Dataset Distillation: Matching Convexified Trajectory

Wenliang Zhong, Haoyu Tang, Qinghai Zheng et al.

CVPR 2025
7 citations
#72

DrS: Learning Reusable Dense Rewards for Multi-Stage Tasks

Tongzhou Mu, Minghua Liu, Hao Su

ICLR 2024
7 citations
#73

Stable Hadamard Memory: Revitalizing Memory-Augmented Agents for Reinforcement Learning

Hung Le, Dung Nguyen, Kien Do et al.

ICLR 2025 · arXiv:2410.10132
memory-augmented agents, partially observable environments, reinforcement learning, hadamard product (+4 more)
6 citations
#74

Robust Offline Reinforcement Learning with Linearly Structured $f$-Divergence Regularization

Cheng Tang, Zhishuai Liu, Pan Xu

ICML 2025
6 citations
#75

No-Regret is not enough! Bandits with General Constraints through Adaptive Regret Minimization

Martino Bernasconi, Matteo Castiglioni, Andrea Celli

ICML 2025
6 citations
#76

Non-Adversarial Inverse Reinforcement Learning via Successor Feature Matching

Arnav Kumar Jain, Harley Wiltzer, Jesse Farebrother et al.

ICLR 2025 · arXiv:2411.07007
inverse reinforcement learning, successor feature matching, policy gradient descent, state-only imitation (+4 more)
6 citations
#77

TempSamp-R1: Effective Temporal Sampling with Reinforcement Fine-Tuning for Video LLMs

Yunheng Li, Jing Cheng, Shaoyong Jia et al.

NeurIPS 2025
6 citations
#78

CL-LoRA: Continual Low-Rank Adaptation for Rehearsal-Free Class-Incremental Learning

Jiangpeng He, Zhihao Duan, Fengqing Zhu

CVPR 2025
6 citations
#79

Dual-Enhanced Coreset Selection with Class-wise Collaboration for Online Blurry Class Incremental Learning

Yutian Luo, Shiqi Zhao, Haoran Wu et al.

CVPR 2024
6 citations
#80

The Curse of Diversity in Ensemble-Based Exploration

Zhixuan Lin, Pierluca D'Oro, Evgenii Nikishin et al.

ICLR 2024
6 citations
#81

Revisiting a Design Choice in Gradient Temporal Difference Learning

Xiaochi Qian, Shangtong Zhang

ICLR 2025
6 citations
#82

Inverse Reinforcement Learning by Estimating Expertise of Demonstrators

Mark Beliaev, Ramtin Pedarsani

AAAI 2025
6 citations
#83

DualCP: Rehearsal-Free Domain-Incremental Learning via Dual-Level Concept Prototype

Qiang Wang, Yuhang He, Songlin Dong et al.

AAAI 2025
6 citations
#84

Alice Benchmarks: Connecting Real World Re-Identification with the Synthetic

Xiaoxiao Sun, Yue Yao, Shengjin Wang et al.

ICLR 2024
6 citations
#85

Model-Free Offline Reinforcement Learning with Enhanced Robustness

Chi Zhang, Zain Ulabedeen Farhat, George Atia et al.

ICLR 2025
offline reinforcement learning, model-free algorithms, robustness enhancement, double-pessimism principle (+3 more)
5 citations
#86

Model-based Offline Reinforcement Learning with Lower Expectile Q-Learning

Kwanyoung Park, Youngwoon Lee

ICLR 2025
5 citations
#87

Autonomous Goal Detection and Cessation in Reinforcement Learning: A Case Study on Source Term Estimation

Yiwei Shi, Muning Wen, Qi Zhang et al.

AAAI 2025
5 citations
#88

Detect Any Mirrors: Boosting Learning Reliability on Large-Scale Unlabeled Data with an Iterative Data Engine

Zhaohu Xing, Lihao Liu, Yijun Yang et al.

CVPR 2025
5 citations
#89

Advancing Prompt-Based Methods for Replay-Independent General Continual Learning

Zhiqi Kang, Liyuan Wang, Xingxing Zhang et al.

ICLR 2025
5 citations
#90

Real-Time Recurrent Reinforcement Learning

Julian Lemmel, Radu Grosu

AAAI 2025
5 citations
#91

KAC: Kolmogorov-Arnold Classifier for Continual Learning

Yusong Hu, Zichen Liang, Fei Yang et al.

CVPR 2025
5 citations
#92

Graph Assisted Offline-Online Deep Reinforcement Learning for Dynamic Workflow Scheduling

Yifan Yang, Gang Chen, Hui Ma et al.

ICLR 2025
5 citations
#93

RoboSense: Large-scale Dataset and Benchmark for Egocentric Robot Perception and Navigation in Crowded and Unstructured Environments

Haisheng Su, Feixiang Song, Cong Ma et al.

CVPR 2025 · arXiv:2408.15503
egocentric robot perception, autonomous navigation, near-field scene understanding, multimodal dataset (+4 more)
5 citations
#94

Learning Graph Invariance by Harnessing Spuriosity

Tianjun Yao, Yongqiang Chen, Kai Hu et al.

ICLR 2025
graph invariant learning, out-of-distribution generalization, graph representation learning, invariant risk minimization (+1 more)
5 citations
#95

Efficient Multi-agent Offline Coordination via Diffusion-based Trajectory Stitching

Lei Yuan, Yuqi Bian, Lihe Li et al.

ICLR 2025
multi-agent reinforcement learning, offline coordination, diffusion models, trajectory stitching (+3 more)
5 citations
#96

Revisit Event Generation Model: Self-Supervised Learning of Event-to-Video Reconstruction with Implicit Neural Representations

Zipeng Wang, Yunfan Lu, Lin Wang

ECCV 2024
5 citations
#97

AutoData: A Multi-Agent System for Open Web Data Collection

Tianyi Ma, Yiyue Qian, Zheyuan Zhang et al.

NeurIPS 2025
4 citations
#98

Dataset Ownership Verification in Contrastive Pre-trained Models

Yuechen Xie, Jie Song, Mengqi Xue et al.

ICLR 2025
4 citations
#99

Tackling Data Corruption in Offline Reinforcement Learning via Sequence Modeling

Jiawei Xu, Rui Yang, Shuang Qiu et al.

ICLR 2025
4 citations
#100

Behavioral Entropy-Guided Dataset Generation for Offline Reinforcement Learning

Wesley Suttle, Aamodh Suresh, Carlos Nieto-Granda

ICLR 2025
4 citations