"reinforcement learning fine-tuning" Papers

12 papers found

Filters:reinforcement learning fine-tuning Clear all

Conference

AAAI 2025 (3,028)CVPR 2025 (2,873)ICCV 2025 (2,701)ICLR 2025 (3,827)ICML 2025 (3,340)ISMAR 2025 (229)NeurIPS 2025 (5,858)AAAI 2024 (2,289)CVPR 2024 (2,716)ECCV 2024 (2,387)ICLR 2024 (2,297)ICML 2024 (2,635)

Paper Type

poster (24,624)paper (8,140)oral (1,594)spotlight (1,421)highlight (975)

AToM: Aligning Text-to-Motion Model at Event-Level with GPT-4Vision Reward

Haonan Han, Xiangzuo Wu, Huan Liao et al.

CVPR 2025posterarXiv:2411.18654

citations

Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data

Zhiyuan Zhou, Andy Peng, Qiyang Li et al.

ICLR 2025posterarXiv:2412.07762

citations

Moral Alignment for LLM Agents

Elizaveta Tennant, Stephen Hailes, Mirco Musolesi

ICLR 2025oralarXiv:2410.01639

citations

Regulatory DNA Sequence Design with Reinforcement Learning

Zhao Yang, Bing Su, Chuan Cao et al.

ICLR 2025posterarXiv:2503.07981

citations

ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning

Tonghe Zhang, Chao Yu, Sichang Su et al.

NeurIPS 2025posterarXiv:2505.22094

citations

SceneDesigner: Controllable Multi-Object Image Generation with 9-DoF Pose Manipulation

Zhenyuan Qin, Xincheng Shuai, Henghui Ding

NeurIPS 2025spotlightarXiv:2511.16666

citations

ShiQ: Bringing back Bellman to LLMs

Pierre Clavier, Nathan Grinsztajn, Raphaël Avalos et al.

NeurIPS 2025posterarXiv:2505.11081

citations

Teaching Language Models to Reason with Tools

Chengpeng Li, Zhengyang Tang, Ziniu Li et al.

NeurIPS 2025posterarXiv:2510.20342

citations

VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations on Synthetic Video Understanding

Zongxia Li, Xiyang Wu, Guangyao Shi et al.

NeurIPS 2025posterarXiv:2505.01481

citations

Degeneration-free Policy Optimization: RL Fine-Tuning for Language Models without Degeneration

Youngsoo Jang, Geon-Hyeong Kim, Byoungjip Kim et al.

ICML 2024poster

Improving Agent Behaviors with RL Fine-tuning for Autonomous Driving

Zhenghao Peng, Wenjie Luo, Yiren Lu et al.

ECCV 2024posterarXiv:2409.18343

citations

Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment

Rui Yang, Xiaoman Pan, Feng Luo et al.

ICML 2024poster