2025 "supervised fine-tuning" Papers

15 papers found

InstructRAG: Instructing Retrieval-Augmented Generation via Self-Synthesized Rationales

Zhepei Wei, Wei-Lin Chen, Yu Meng

ICLR 2025, poster, arXiv:2406.13629
70 citations

Logical Consistency of Large Language Models in Fact-Checking

Bishwamittra Ghosh, Sarah Hasan, Naheed Anjum Arafat et al.

ICLR 2025, poster, arXiv:2412.16100
15 citations

Longer Context, Deeper Thinking: Uncovering the Role of Long-Context Ability in Reasoning

Wang Yang, Zirui Liu, Hongye Jin et al.

NeurIPS 2025, poster, arXiv:2505.17315
3 citations

Multi-Token Prediction Needs Registers

Anastasios Gerontopoulos, Spyridon Gidaris, Nikos Komodakis

NeurIPS 2025, poster, arXiv:2505.10518
4 citations

Preserving Diversity in Supervised Fine-Tuning of Large Language Models

Ziniu Li, Congliang Chen, Tian Xu et al.

ICLR 2025, poster, arXiv:2408.16673
33 citations

Reinforcement Learning with Backtracking Feedback

Bilgehan Sel, Vaishakh Keshava, Phillip Wallis et al.

NeurIPS 2025, poster

RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics

Enshen Zhou, Jingkun An, Cheng Chi et al.

NeurIPS 2025, poster, arXiv:2506.04308
51 citations

Robot-R1: Reinforcement Learning for Enhanced Embodied Reasoning in Robotics

Dongyoung Kim, Huiwon Jang, Sumin Park et al.

NeurIPS 2025, poster, arXiv:2506.00070
9 citations

Steering Information Utility in Key-Value Memory for Language Model Post-Training

Chunyuan Deng, Ruidi Chang, Hanjie Chen

NeurIPS 2025, poster, arXiv:2507.05158

The Promise of RL for Autoregressive Image Editing

Saba Ahmadi, Rabiul Awal, Ankur Sikarwar et al.

NeurIPS 2025, poster, arXiv:2508.01119
2 citations

Topology of Reasoning: Understanding Large Reasoning Models through Reasoning Graph Properties

Gouki Minegishi, Hiroki Furuta, Takeshi Kojima et al.

NeurIPS 2025, poster, arXiv:2506.05744
13 citations

Transformer Copilot: Learning from The Mistake Log in LLM Fine-tuning

Jiaru Zou, Yikun Ban, Zihao Li et al.

NeurIPS 2025, spotlight, arXiv:2505.16270
10 citations

Unveiling the Secret Recipe: A Guide For Supervised Fine-Tuning Small LLMs

Aldo Pareja, Nikhil Shivakumar Nayak, Hao Wang et al.

ICLR 2025, poster, arXiv:2412.13337
30 citations

WebDancer: Towards Autonomous Information Seeking Agency

Jialong Wu, Baixuan Li, Runnan Fang et al.

NeurIPS 2025, poster, arXiv:2505.22648
81 citations

Words or Vision: Do Vision-Language Models Have Blind Faith in Text?

Ailin Deng, Tri Cao, Zhirui Chen et al.

CVPR 2025, poster, arXiv:2503.02199
33 citations