Papers matching "reinforcement learning"

39 papers found

BindGPT: A Scalable Framework for 3D Molecular Design via Language Modeling and Reinforcement Learning

Artem Zholus, Maksim Kuznetsov, Roman Schutski et al.

AAAI 2025, arXiv:2406.03686, 15 citations

Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining

Rosie Zhao, Alexandru Meterez, Sham M. Kakade et al.

COLM 2025, arXiv:2504.07912, 87 citations

Efficient Reinforcement Learning in Probabilistic Reward Machines

Xiaofeng Lin, Xuezhou Zhang

AAAI 2025, arXiv:2408.10381, 2 citations

Efficient Reinforcement Learning Through Adaptively Pretrained Visual Encoder

Yuhan Zhang, Guoqing Ma, Guangfu Hao et al.

AAAI 2025, arXiv:2502.05555, 1 citation

Enhancing Online Reinforcement Learning with Meta-Learned Objective from Offline Data

Shilong Deng, Zetao Zheng, Hongcai He et al.

AAAI 2025, arXiv:2501.07346

Extragradient Preference Optimization (EGPO): Beyond Last-Iterate Convergence for Nash Learning from Human Feedback

Runlong Zhou, Maryam Fazel, Simon Shaolei Du

COLM 2025, arXiv:2503.08942, 13 citations

FFCG: Effective and Fast Family Column Generation for Solving Large-Scale Linear Program

Yi-Xiang Hu, Feng Wu, Shaoang Li et al.

AAAI 2025, arXiv:2412.19066

GRSN: Gated Recurrent Spiking Neurons for POMDPs and MARL

Lang Qin, Ziming Wang, Runhao Jiang et al.

AAAI 2025, arXiv:2404.15597, 3 citations

Intelligent OPC Engineer Assistant for Semiconductor Manufacturing

Guojin Chen, Haoyu Yang, Bei Yu et al.

AAAI 2025, arXiv:2408.12775, 2 citations

Learning to Reason for Long-Form Story Generation

Alexander Gurung, Mirella Lapata

COLM 2025, 19 citations

MALT: Improving Reasoning with Multi-Agent LLM Training

Sumeet Ramesh Motwani, Chandler Smith, Rocktim Jyoti Das et al.

COLM 2025, arXiv:2412.01928, 37 citations

Noise-Resilient Symbolic Regression with Dynamic Gating Reinforcement Learning

Chenglu Sun, Shuo Shen, Wenzhi Tao et al.

AAAI 2025, arXiv:2501.01085, 5 citations

Off-Policy Corrected Reward Modeling for Reinforcement Learning from Human Feedback

Johannes Ackermann, Takashi Ishida, Masashi Sugiyama

COLM 2025, arXiv:2507.15507

On Shallow Planning Under Partial Observability

Randy Lefebvre, Audrey Durand

AAAI 2025, arXiv:2407.15820, 2 citations

REM: A Scalable Reinforced Multi-Expert Framework for Multiplex Influence Maximization

Huyen Nguyen, Hieu Dam, Nguyen Hoang Khoi Do et al.

AAAI 2025, arXiv:2501.00779, 1 citation

RRO: LLM Agent Optimization Through Rising Reward Trajectories

Zilong Wang, Jingfeng Yang, Sreyashi Nag et al.

COLM 2025, arXiv:2505.20737, 1 citation

Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

Bowen Jin, Hansi Zeng, Zhenrui Yue et al.

COLM 2025, arXiv:2503.09516, 694 citations

SORREL: Suboptimal-Demonstration-Guided Reinforcement Learning for Learning to Branch

Shengyu Feng, Yiming Yang

AAAI 2025, arXiv:2412.15534, 5 citations

Teaching Models to Improve on Tape

Liat Bezalel, Eyal Orgad, Amir Globerson

AAAI 2025, arXiv:2411.01483

Tulu 3: Pushing Frontiers in Open Language Model Post-Training

Nathan Lambert, Jacob Morrison, Valentina Pyatkin et al.

COLM 2025, arXiv:2411.15124, 494 citations

Universal Post-Processing Networks for Joint Optimization of Modules in Task-Oriented Dialogue Systems

Atsumoto Ohashi, Ryuichiro Higashinaka

AAAI 2025, arXiv:2502.00747

Weak-for-Strong: Training Weak Meta-Agent to Harness Strong Executors

Fan Nie, Lan Feng, Haotian Ye et al.

COLM 2025, arXiv:2504.04785, 11 citations

Building Minimal and Reusable Causal State Abstractions for Reinforcement Learning

Zizhao Wang, Caroline Wang, Xuesu Xiao et al.

AAAI 2024, arXiv:2401.12497, 9 citations

ConsistentEE: A Consistent and Hardness-Guided Early Exiting Method for Accelerating Language Models Inference

Ziqian Zeng, Yihuai Hong, Hongliang Dai et al.

AAAI 2024, arXiv:2312.11882, 17 citations

DGPO: Discovering Multiple Strategies with Diversity-Guided Policy Optimization

Wenze Chen, Shiyu Huang, Yuan Chiang et al.

AAAI 2024, arXiv:2207.05631, 9 citations

DiffAIL: Diffusion Adversarial Imitation Learning

Bingzheng Wang, Guoqiang Wu, Teng Pang et al.

AAAI 2024, arXiv:2312.06348, 22 citations

Discerning Temporal Difference Learning

Jianfei Ma

AAAI 2024, arXiv:2310.08091, 1 citation

Dynamic Knowledge Injection for AIXI Agents

Samuel Yang-Zhao, Kee Siong Ng, Marcus Hutter

AAAI 2024, arXiv:2312.16184

Episodic Return Decomposition by Difference of Implicitly Assigned Sub-trajectory Reward

Haoxin Lin, Hongqiu Wu, Jiaji Zhang et al.

AAAI 2024, arXiv:2312.10642, 3 citations

Explore 3D Dance Generation via Reward Model from Automatically-Ranked Demonstrations

Zilin Wang, Haolin Zhuang, Lu Li et al.

AAAI 2024, arXiv:2312.11442, 5 citations

Learning Diverse Risk Preferences in Population-Based Self-Play

Yuhua Jiang, Qihan Liu, Xiaoteng Ma et al.

AAAI 2024, arXiv:2305.11476, 8 citations

Learning Uncertainty-Aware Temporally-Extended Actions

Joongkyu Lee, Seung Joon Park, Yunhao Tang et al.

AAAI 2024, arXiv:2402.05439, 3 citations

OVD-Explorer: Optimism Should Not Be the Sole Pursuit of Exploration in Noisy Environments

Jinyi Liu, Zhi Wang, Yan Zheng et al.

AAAI 2024, arXiv:2312.12145, 13 citations

Parameterized Projected Bellman Operator

Théo Vincent, Alberto Maria Metelli, Boris Belousov et al.

AAAI 2024, arXiv:2312.12869, 4 citations

Prompt to Transfer: Sim-to-Real Transfer for Traffic Signal Control with Prompt Learning

Longchao Da, Minquan Gao, Hua Wei et al.

AAAI 2024, arXiv:2308.14284, 52 citations

Rating-Based Reinforcement Learning

Devin White, Mingkang Wu, Ellen Novoseller et al.

AAAI 2024, arXiv:2307.16348, 14 citations

RewriteLM: An Instruction-Tuned Large Language Model for Text Rewriting

Lei Shu, Liangchen Luo, Jayakumar Hoskere et al.

AAAI 2024, arXiv:2305.15685, 78 citations

Sample Efficient Reinforcement Learning with Partial Dynamics Knowledge

Meshal Alharbi, Mardavij Roozbehani, Munther Dahleh

AAAI 2024, arXiv:2312.12558, 4 citations

UNEX-RL: Reinforcing Long-Term Rewards in Multi-Stage Recommender Systems with UNidirectional EXecution

Gengrui Zhang, Xiaoshuang Chen, Yao WANG et al.

AAAI 2024, arXiv:2401.06470, 11 citations