Poster "supervised fine-tuning" Papers

45 papers found

Advancing Mathematical Reasoning in Language Models: The Impact of Problem-Solving Data, Data Synthesis Methods, and Training Stages

Zui Chen, Tianqiao Liu, Tongqing et al.

ICLR 2025 poster · arXiv:2501.14002
11 citations

Bi-Factorial Preference Optimization: Balancing Safety-Helpfulness in Language Models

Wenxuan Zhang, Philip Torr, Mohamed Elhoseiny et al.

ICLR 2025 poster · arXiv:2408.15313
23 citations

Can Large Language Models Help Multimodal Language Analysis? MMLA: A Comprehensive Benchmark

Hanlei Zhang, Zhuohang Li, Hua Xu et al.

NeurIPS 2025 poster · arXiv:2504.16427
2 citations

CompCap: Improving Multimodal Large Language Models with Composite Captions

Xiaohui Chen, Satya Narayan Shukla, Mahmoud Azab et al.

ICCV 2025 poster · arXiv:2412.05243
6 citations

Complexity Scaling Laws for Neural Models using Combinatorial Optimization

Lowell Weissman, Michael Krumdick, A. Abbott

NeurIPS 2025 poster · arXiv:2506.12932

GUI Exploration Lab: Enhancing Screen Navigation in Agents via Multi-Turn Reinforcement Learning

Haolong Yan, Yeqing Shen, Xin Huang et al.

NeurIPS 2025 poster · arXiv:2512.02423

Have the VLMs Lost Confidence? A Study of Sycophancy in VLMs

Shuo Li, Tao Ji, Xiaoran Fan et al.

ICLR 2025 poster · arXiv:2410.11302
7 citations

InstructRAG: Instructing Retrieval-Augmented Generation via Self-Synthesized Rationales

Zhepei Wei, Wei-Lin Chen, Yu Meng

ICLR 2025 poster · arXiv:2406.13629
74 citations

L3Ms — Lagrange Large Language Models

Guneet Singh Dhillon, Xingjian Shi, Yee Whye Teh et al.

ICLR 2025 poster · arXiv:2410.21533
1 citation

Logical Consistency of Large Language Models in Fact-Checking

Bishwamittra Ghosh, Sarah Hasan, Naheed Anjum Arafat et al.

ICLR 2025 poster · arXiv:2412.16100
15 citations

Longer Context, Deeper Thinking: Uncovering the Role of Long-Context Ability in Reasoning

Wang Yang, Zirui Liu, Hongye Jin et al.

NeurIPS 2025 poster · arXiv:2505.17315
3 citations

Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing

Zhangchen Xu, Fengqing Jiang, Luyao Niu et al.

ICLR 2025 poster · arXiv:2406.08464
261 citations

MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs

Yusu Qian, Hanrong Ye, Jean-Philippe Fauconnier et al.

ICLR 2025 poster · arXiv:2407.01509
41 citations

Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN

Pengxiang Li, Lu Yin, Shiwei Liu

ICLR 2025 poster · arXiv:2412.13795
25 citations

Multi-Token Prediction Needs Registers

Anastasios Gerontopoulos, Spyridon Gidaris, Nikos Komodakis

NeurIPS 2025 poster · arXiv:2505.10518
4 citations

OpenVLThinker: Complex Vision-Language Reasoning via Iterative SFT-RL Cycles

Yihe Deng, Hritik Bansal, Fan Yin et al.

NeurIPS 2025 poster · arXiv:2503.17352
15 citations

OptiBench Meets ReSocratic: Measure and Improve LLMs for Optimization Modeling

Zhicheng Yang, Yiwei Wang, Yinya Huang et al.

ICLR 2025 poster · arXiv:2407.09887
27 citations

Preserving Diversity in Supervised Fine-Tuning of Large Language Models

Ziniu Li, Congliang Chen, Tian Xu et al.

ICLR 2025 poster · arXiv:2408.16673
35 citations

Principled Fine-tuning of LLMs from User-Edits: A Medley of Preference, Supervision, and Reward

Dipendra Misra, Aldo Pacchiano, Ta-Chung Chi et al.

NeurIPS 2025 poster · arXiv:2601.19055

Prompt-A-Video: Prompt Your Video Diffusion Model via Preference-Aligned LLM

Yatai Ji, Jiacheng Zhang, Jie Wu et al.

ICCV 2025 poster · arXiv:2412.15156
10 citations

Q-SFT: Q-Learning for Language Models via Supervised Fine-Tuning

Joey Hong, Anca Dragan, Sergey Levine

ICLR 2025 poster · arXiv:2411.05193
8 citations

Reinforcement Learning with Backtracking Feedback

Bilgehan Sel, Vaishakh Keshava, Phillip Wallis et al.

NeurIPS 2025 poster

Repetition Improves Language Model Embeddings

Jacob Springer, Suhas Kotha, Daniel Fried et al.

ICLR 2025 poster · arXiv:2402.15449
59 citations

RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics

Enshen Zhou, Jingkun An, Cheng Chi et al.

NeurIPS 2025 poster · arXiv:2506.04308
55 citations

Robot-R1: Reinforcement Learning for Enhanced Embodied Reasoning in Robotics

Dongyoung Kim, Huiwon Jang, Sumin Park et al.

NeurIPS 2025 poster · arXiv:2506.00070
10 citations

Semantic Loss Guided Data Efficient Supervised Fine Tuning for Safe Responses in LLMs

Yuxiao Lu, Arunesh Sinha, Pradeep Varakantham

ICLR 2025 poster · arXiv:2412.06843
2 citations

ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models

Hongbo Liu, Jingwen He, Yi Jin et al.

NeurIPS 2025 poster · arXiv:2506.21356
7 citations

Steering Information Utility in Key-Value Memory for Language Model Post-Training

Chunyuan Deng, Ruidi Chang, Hanjie Chen

NeurIPS 2025 poster · arXiv:2507.05158

Style Outweighs Substance: Failure Modes of LLM Judges in Alignment Benchmarking

Benjamin Feuer, Micah Goldblum, Teresa Datta et al.

ICLR 2025 poster · arXiv:2409.15268
27 citations

TANDEM: Bi-Level Data Mixture Optimization with Twin Networks

Jiaxing Wang, Deping Xiang, Jin Xu et al.

NeurIPS 2025 poster

The Promise of RL for Autoregressive Image Editing

Saba Ahmadi, Rabiul Awal, Ankur Sikarwar et al.

NeurIPS 2025 poster · arXiv:2508.01119
2 citations

Topology of Reasoning: Understanding Large Reasoning Models through Reasoning Graph Properties

Gouki Minegishi, Hiroki Furuta, Takeshi Kojima et al.

NeurIPS 2025 poster · arXiv:2506.05744
13 citations

Training Language Models to Self-Correct via Reinforcement Learning

Aviral Kumar, Vincent Zhuang, Rishabh Agarwal et al.

ICLR 2025 poster · arXiv:2409.12917
318 citations

Transforming Generic Coder LLMs to Effective Binary Code Embedding Models for Similarity Detection

Litao Li, Leo Song, Steven Ding et al.

NeurIPS 2025 poster

Unveiling the Secret Recipe: A Guide For Supervised Fine-Tuning Small LLMs

Aldo Pareja, Nikhil Shivakumar Nayak, Hao Wang et al.

ICLR 2025 poster · arXiv:2412.13337
30 citations

VPO: Aligning Text-to-Video Generation Models with Prompt Optimization

Jiale Cheng, Ruiliang Lyu, Xiaotao Gu et al.

ICCV 2025 poster · arXiv:2503.20491
13 citations

WebDancer: Towards Autonomous Information Seeking Agency

Jialong Wu, Baixuan Li, Runnan Fang et al.

NeurIPS 2025 poster · arXiv:2505.22648
81 citations

Words or Vision: Do Vision-Language Models Have Blind Faith in Text?

Ailin Deng, Tri Cao, Zhirui Chen et al.

CVPR 2025 poster · arXiv:2503.02199
33 citations

Can AI Assistants Know What They Don't Know?

Qinyuan Cheng, Tianxiang Sun, Xiangyang Liu et al.

ICML 2024 poster · arXiv:2401.13275

Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation

Haoran Xu, Amr Sharaf, Yunmo Chen et al.

ICML 2024 poster · arXiv:2401.08417

Exploring the LLM Journey from Cognition to Expression with Linear Representations

Yuzi Yan, Jialian Li, Yipin Zhang et al.

ICML 2024 poster · arXiv:2405.16964

Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch

Le Yu, Bowen Yu, Haiyang Yu et al.

ICML 2024 poster · arXiv:2311.03099

Privacy-Preserving Instructions for Aligning Large Language Models

Da Yu, Peter Kairouz, Sewoong Oh et al.

ICML 2024 poster · arXiv:2402.13659

Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment

Rui Yang, Xiaoman Pan, Feng Luo et al.

ICML 2024 poster · arXiv:2402.10207

Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models

Zixiang Chen, Yihe Deng, Huizhuo Yuan et al.

ICML 2024 poster · arXiv:2401.01335