2025 Poster "reinforcement learning" Papers

80 papers found • Page 1 of 2

$\texttt{G1}$: Teaching LLMs to Reason on Graphs with Reinforcement Learning

Xiaojun Guo, Ang Li, Yifei Wang et al.

NeurIPS 2025poster
4
citations

A Causal Lens for Learning Long-term Fair Policies

Jacob Lear, Lu Zhang

ICLR 2025posterarXiv:2506.11242

AdaptiveAE: An Adaptive Exposure Strategy for HDR Capturing in Dynamic Scenes

Tianyi Xu, Fan Zhang, Boxin Shi et al.

ICCV 2025posterarXiv:2508.13503
1
citations

Adding Conditional Control to Diffusion Models with Reinforcement Learning

Yulai Zhao, Masatoshi Uehara, Gabriele Scalia et al.

ICLR 2025posterarXiv:2406.12120
13
citations

A Differential and Pointwise Control Approach to Reinforcement Learning

Minh Nguyen, Chandrajit Bajaj

NeurIPS 2025posterarXiv:2404.15617
1
citations

A Generalized Bisimulation Metric of State Similarity between Markov Decision Processes: From Theoretical Propositions to Applications

Zhenyu Tao, Wei Xu, Xiaohu You

NeurIPS 2025posterarXiv:2509.18714
2
citations

An Optimal Discriminator Weighted Imitation Perspective for Reinforcement Learning

Haoran Xu, Shuozhe Li, Harshit Sikchi et al.

ICLR 2025posterarXiv:2504.13368
2
citations

Can a MISL Fly? Analysis and Ingredients for Mutual Information Skill Learning

Chongyi Zheng, Jens Tuyls, Joanne Peng et al.

ICLR 2025posterarXiv:2412.08021
8
citations

Computationally Efficient RL under Linear Bellman Completeness for Deterministic Dynamics

Runzhe Wu, Ayush Sekhari, Akshay Krishnamurthy et al.

ICLR 2025posterarXiv:2406.11810
3
citations

Continual Knowledge Adaptation for Reinforcement Learning

Jinwu Hu, ZiHao Lian, Zhiquan Wen et al.

NeurIPS 2025posterarXiv:2510.19314
1
citations

Cypher-RI: Reinforcement Learning for Integrating Schema Selection into Cypher Generation

Hanchen Su, Xuyuan Li, Yan Zhou et al.

NeurIPS 2025poster

DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search

Huajian Xin, Z.Z. Ren, Junxiao Song et al.

ICLR 2025posterarXiv:2408.08152
134
citations

Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO

Chengzhuo Tong, Ziyu Guo, Renrui Zhang et al.

NeurIPS 2025posterarXiv:2505.17017
25
citations

Discrete Codebook World Models for Continuous Control

Aidan Scannell, Mohammadreza Nakhaeinezhadfard, Kalle Kujanpää et al.

ICLR 2025posterarXiv:2503.00653
9
citations

Dynamic Configuration for Cutting Plane Separators via Reinforcement Learning on Incremental Graph

Mingxuan Ye, Jie Wang, Fangzhou et al.

NeurIPS 2025poster

Equilibrium Policy Generalization: A Reinforcement Learning Framework for Cross-Graph Zero-Shot Generalization in Pursuit-Evasion Games

Runyu Lu, Peng Zhang, Ruochuan Shi et al.

NeurIPS 2025posterarXiv:2511.00811
2
citations

EvolvedGRPO: Unlocking Reasoning in LVLMs via Progressive Instruction Evolution

Zhebei Shen, Qifan Yu, Juncheng Li et al.

NeurIPS 2025poster

From Kolmogorov to Cauchy: Shallow XNet Surpasses KANs

Xin Li, Xiaotao Zheng, Zhihong Xia

NeurIPS 2025poster

GeoExplorer: Active Geo-localization with Curiosity-Driven Exploration

Li Mi, Manon Béchaz, Zeming Chen et al.

ICCV 2025posterarXiv:2508.00152

Globally Optimal Policy Gradient Algorithms for Reinforcement Learning with PID Control Policies

Vipul Sharma, Wesley Suttle, S Sivaranjani

NeurIPS 2025poster

GoalLadder: Incremental Goal Discovery with Vision-Language Models

Alexey Zakharov, Shimon Whiteson

NeurIPS 2025posterarXiv:2506.16396
1
citations

GraphChain: Large Language Models for Large-scale Graph Analysis via Tool Chaining

Chunyu Wei, Wenji Hu, Xingjia Hao et al.

NeurIPS 2025posterarXiv:2511.00457

GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based VLM Agent Training

Tong Wei, Yijun Yang, Junliang Xing et al.

ICCV 2025posterarXiv:2503.08525
8
citations

HCRMP: An LLM-Hinted Contextual Reinforcement Learning Framework for Autonomous Driving

Zhiwen Chen, Hanming Deng, Zhuoren Li et al.

NeurIPS 2025posterarXiv:2505.15793
3
citations

Hierarchical World Models as Visual Whole-Body Humanoid Controllers

Nick Hansen, Jyothir S V, Vlad Sobal et al.

ICLR 2025posterarXiv:2405.18418
20
citations

Improving Monte Carlo Tree Search for Symbolic Regression

Zhengyao Huang, Daniel Huang, Tiannan Xiao et al.

NeurIPS 2025posterarXiv:2509.15929

Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models

Cong Lu, Shengran Hu, Jeff Clune

ICLR 2025posterarXiv:2405.15143
26
citations

Iterative Foundation Model Fine-Tuning on Multiple Rewards

Pouya M. Ghari, simone sciabola, Ye Wang

NeurIPS 2025posterarXiv:2511.00220

Janus-Pro-R1: Advancing Collaborative Visual Comprehension and Generation via Reinforcement Learning

Kaihang Pan, Yang Wu, Wendong Bu et al.

NeurIPS 2025posterarXiv:2506.01480
6
citations

Kinetix: Investigating the Training of General Agents through Open-Ended Physics-Based Control Tasks

Michael Matthews, Michael Beukman, Chris Lu et al.

ICLR 2025posterarXiv:2410.23208
20
citations

Learning Diverse Attacks on Large Language Models for Robust Red-Teaming and Safety Tuning

Seanie Lee, Minsu Kim, Lynn Cherif et al.

ICLR 2025posterarXiv:2405.18540
44
citations

MallowsPO: Fine-Tune Your LLM with Preference Dispersions

Haoxian Chen, Hanyang Zhao, Henry Lam et al.

ICLR 2025posterarXiv:2405.14953
15
citations

MetaUrban: An Embodied AI Simulation Platform for Urban Micromobility

Wayne Wu, Honglin He, Jack He et al.

ICLR 2025posterarXiv:2407.08725
11
citations

MiCo: Multi-image Contrast for Reinforcement Visual Reasoning

Xi Chen, Mingkang Zhu, Shaoteng Liu et al.

NeurIPS 2025posterarXiv:2506.22434

Modelling the control of offline processing with reinforcement learning

Eleanor Spens, Neil Burgess, Tim Behrens

NeurIPS 2025poster

MRO: Enhancing Reasoning in Diffusion Language Models via Multi-Reward Optimization

Chenglong Wang, Yang Gan, Hang Zhou et al.

NeurIPS 2025posterarXiv:2510.21473

Multi-Agent Collaboration via Evolving Orchestration

Yufan Dang, Chen Qian, Xueheng Luo et al.

NeurIPS 2025posterarXiv:2505.19591
25
citations

NoisyGRPO: Incentivizing Multimodal CoT Reasoning via Noise Injection and Bayesian Estimation

Longtian Qiu, Shan Ning, Jiaxuan Sun et al.

NeurIPS 2025posterarXiv:2510.21122

Online Estimation and Inference for Robust Policy Evaluation in Reinforcement Learning

Weidong Liu, Jiyuan Tu, Xi Chen et al.

NeurIPS 2025posterarXiv:2310.02581
5
citations

On the Convergence of Projected Policy Gradient for Any Constant Step Sizes

Jiacai Liu, Wenye Li, Dachao Lin et al.

NeurIPS 2025posterarXiv:2311.01104
4
citations

On the Sample Complexity of Differentially Private Policy Optimization

Yi He, Xingyu Zhou

NeurIPS 2025posterarXiv:2510.21060

Optimal Dynamic Regret by Transformers for Non-Stationary Reinforcement Learning

Baiyuan Chen, Shinji Ito, Masaaki Imaizumi

NeurIPS 2025posterarXiv:2508.16027

OrbitZoo: Real Orbital Systems Challenges for Reinforcement Learning

Alexandre Oliveira, Katarina Dyreby, Francisco Caldas et al.

NeurIPS 2025posterarXiv:2504.04160

Parameter Efficient Fine-tuning via Explained Variance Adaptation

Fabian Paischer, Lukas Hauzenberger, Thomas Schmied et al.

NeurIPS 2025posterarXiv:2410.07170
4
citations

Personalized Exercise Recommendation with Semantically-Grounded Knowledge Tracing

Yilmazcan Ozyurt, Tunaberk Almaci, Stefan Feuerriegel et al.

NeurIPS 2025posterarXiv:2507.11060

Policy Gradient with Kernel Quadrature

Tetsuro Morimura, Satoshi Hayakawa

ICLR 2025posterarXiv:2310.14768
1
citations

Progress Reward Model for Reinforcement Learning via Large Language Models

Xiuhui Zhang, Ning Gao, Xingyu Jiang et al.

NeurIPS 2025poster

ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models

Mingjie Liu, Shizhe Diao, Ximing Lu et al.

NeurIPS 2025posterarXiv:2505.24864
99
citations

Proxy Target: Bridging the Gap Between Discrete Spiking Neural Networks and Continuous Control

Zijie Xu, Tong Bu, Zecheng Hao et al.

NeurIPS 2025posterarXiv:2505.24161
3
citations

Raw2Drive: Reinforcement Learning with Aligned World Models for End-to-End Autonomous Driving (in CARLA v2)

Zhenjie Yang, Xiaosong Jia, Qifeng Li et al.

NeurIPS 2025posterarXiv:2505.16394
18
citations
← PreviousNext →