NEURIPS 2025 "supervised fine-tuning" Papers
21 papers found
Can Large Language Models Help Multimodal Language Analysis? MMLA: A Comprehensive Benchmark
Hanlei Zhang, zhuohang li, Hua Xu et al.
Complexity Scaling Laws for Neural Models using Combinatorial Optimization
Lowell Weissman, Michael Krumdick, A. Abbott
EvoLM: In Search of Lost Language Model Training Dynamics
Zhenting Qi, Fan Nie, Alexandre Alahi et al.
GUI Exploration Lab: Enhancing Screen Navigation in Agents via Multi-Turn Reinforcement Learning
Haolong Yan, Yeqing Shen, Xin Huang et al.
Longer Context, Deeper Thinking: Uncovering the Role of Long-Context Ability in Reasoning
Wang Yang, Zirui Liu, Hongye Jin et al.
Multi-Token Prediction Needs Registers
Anastasios Gerontopoulos, Spyridon Gidaris, Nikos Komodakis
OpenVLThinker: Complex Vision-Language Reasoning via Iterative SFT-RL Cycles
Yihe Deng, Hritik Bansal, Fan Yin et al.
Principled Fine-tuning of LLMs from User-Edits: A Medley of Preference, Supervision, and Reward
Dipendra Misra, Aldo Pacchiano, Ta-Chung Chi et al.
Reinforcement Learning with Backtracking Feedback
Bilgehan Sel, Vaishakh Keshava, Phillip Wallis et al.
RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics
Enshen Zhou, Jingkun An, Cheng Chi et al.
Robot-R1: Reinforcement Learning for Enhanced Embodied Reasoning in Robotics
Dongyoung Kim, Huiwon Jang, Sumin Park et al.
Scalable Fingerprinting of Large Language Models
Anshul Nasery, Jonathan Hayase, Creston Brooks et al.
ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models
Hongbo Liu, Jingwen He, Yi Jin et al.
Steering Information Utility in Key-Value Memory for Language Model Post-Training
Chunyuan Deng, Ruidi Chang, Hanjie Chen
TANDEM: Bi-Level Data Mixture Optimization with Twin Networks
Jiaxing Wang, Deping Xiang, Jin Xu et al.
The Best Instruction-Tuning Data are Those That Fit
Dylan Zhang, Qirun Dai, Hao Peng
The Promise of RL for Autoregressive Image Editing
Saba Ahmadi, Rabiul Awal, Ankur Sikarwar et al.
Topology of Reasoning: Understanding Large Reasoning Models through Reasoning Graph Properties
Gouki Minegishi, Hiroki Furuta, Takeshi Kojima et al.
Transformer Copilot: Learning from The Mistake Log in LLM Fine-tuning
Jiaru Zou, Yikun Ban, Zihao Li et al.
Transforming Generic Coder LLMs to Effective Binary Code Embedding Models for Similarity Detection
Litao Li, Leo Song, Steven Ding et al.
WebDancer: Towards Autonomous Information Seeking Agency
Jialong Wu, Baixuan Li, Runnan Fang et al.