NEURIPS 2025 "supervised fine-tuning" Papers

21 papers found

Can Large Language Models Help Multimodal Language Analysis? MMLA: A Comprehensive Benchmark

Hanlei Zhang, Zhuohang Li, Hua Xu et al.

NEURIPS 2025 | poster | arXiv:2504.16427
2 citations

Complexity Scaling Laws for Neural Models using Combinatorial Optimization

Lowell Weissman, Michael Krumdick, A. Abbott

NEURIPS 2025 | poster | arXiv:2506.12932

EvoLM: In Search of Lost Language Model Training Dynamics

Zhenting Qi, Fan Nie, Alexandre Alahi et al.

NEURIPS 2025 | oral | arXiv:2506.16029
3 citations

GUI Exploration Lab: Enhancing Screen Navigation in Agents via Multi-Turn Reinforcement Learning

Haolong Yan, Yeqing Shen, Xin Huang et al.

NEURIPS 2025 | poster | arXiv:2512.02423

Longer Context, Deeper Thinking: Uncovering the Role of Long-Context Ability in Reasoning

Wang Yang, Zirui Liu, Hongye Jin et al.

NEURIPS 2025 | poster | arXiv:2505.17315
3 citations

Multi-Token Prediction Needs Registers

Anastasios Gerontopoulos, Spyridon Gidaris, Nikos Komodakis

NEURIPS 2025 | poster | arXiv:2505.10518
4 citations

OpenVLThinker: Complex Vision-Language Reasoning via Iterative SFT-RL Cycles

Yihe Deng, Hritik Bansal, Fan Yin et al.

NEURIPS 2025 | poster | arXiv:2503.17352
15 citations

Principled Fine-tuning of LLMs from User-Edits: A Medley of Preference, Supervision, and Reward

Dipendra Misra, Aldo Pacchiano, Ta-Chung Chi et al.

NEURIPS 2025 | poster | arXiv:2601.19055

Reinforcement Learning with Backtracking Feedback

Bilgehan Sel, Vaishakh Keshava, Phillip Wallis et al.

NEURIPS 2025 | poster

RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics

Enshen Zhou, Jingkun An, Cheng Chi et al.

NEURIPS 2025 | poster | arXiv:2506.04308
55 citations

Robot-R1: Reinforcement Learning for Enhanced Embodied Reasoning in Robotics

Dongyoung Kim, Huiwon Jang, Sumin Park et al.

NEURIPS 2025 | poster | arXiv:2506.00070
10 citations

Scalable Fingerprinting of Large Language Models

Anshul Nasery, Jonathan Hayase, Creston Brooks et al.

NEURIPS 2025 | spotlight | arXiv:2502.07760
8 citations

ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models

Hongbo Liu, Jingwen He, Yi Jin et al.

NEURIPS 2025 | poster | arXiv:2506.21356
7 citations

Steering Information Utility in Key-Value Memory for Language Model Post-Training

Chunyuan Deng, Ruidi Chang, Hanjie Chen

NEURIPS 2025 | poster | arXiv:2507.05158

TANDEM: Bi-Level Data Mixture Optimization with Twin Networks

Jiaxing Wang, Deping Xiang, Jin Xu et al.

NEURIPS 2025 | poster

The Best Instruction-Tuning Data are Those That Fit

Dylan Zhang, Qirun Dai, Hao Peng

NEURIPS 2025 | spotlight | arXiv:2502.04194
22 citations

The Promise of RL for Autoregressive Image Editing

Saba Ahmadi, Rabiul Awal, Ankur Sikarwar et al.

NEURIPS 2025 | poster | arXiv:2508.01119
2 citations

Topology of Reasoning: Understanding Large Reasoning Models through Reasoning Graph Properties

Gouki Minegishi, Hiroki Furuta, Takeshi Kojima et al.

NEURIPS 2025 | poster | arXiv:2506.05744
13 citations

Transformer Copilot: Learning from The Mistake Log in LLM Fine-tuning

Jiaru Zou, Yikun Ban, Zihao Li et al.

NEURIPS 2025 | spotlight | arXiv:2505.16270
10 citations

Transforming Generic Coder LLMs to Effective Binary Code Embedding Models for Similarity Detection

Litao Li, Leo Song, Steven Ding et al.

NEURIPS 2025 | poster

WebDancer: Towards Autonomous Information Seeking Agency

Jialong Wu, Baixuan Li, Runnan Fang et al.

NEURIPS 2025 | poster | arXiv:2505.22648
81 citations