Poster "supervised fine-tuning" Papers

45 papers found

Advancing Mathematical Reasoning in Language Models: The Impact of Problem-Solving Data, Data Synthesis Methods, and Training Stages

Zui Chen, Tianqiao Liu, Tongqing et al.

ICLR 2025 poster · arXiv:2501.14002
11 citations

Bi-Factorial Preference Optimization: Balancing Safety-Helpfulness in Language Models

Wenxuan Zhang, Philip Torr, Mohamed Elhoseiny et al.

ICLR 2025 poster · arXiv:2408.15313
23 citations

Can Large Language Models Help Multimodal Language Analysis? MMLA: A Comprehensive Benchmark

Hanlei Zhang, Zhuohang Li, Hua Xu et al.

NeurIPS 2025 poster · arXiv:2504.16427
2 citations

CompCap: Improving Multimodal Large Language Models with Composite Captions

Xiaohui Chen, Satya Narayan Shukla, Mahmoud Azab et al.

ICCV 2025 poster · arXiv:2412.05243
6 citations

Complexity Scaling Laws for Neural Models using Combinatorial Optimization

Lowell Weissman, Michael Krumdick, A. Abbott

NeurIPS 2025 poster · arXiv:2506.12932

GUI Exploration Lab: Enhancing Screen Navigation in Agents via Multi-Turn Reinforcement Learning

Haolong Yan, Yeqing Shen, Xin Huang et al.

NeurIPS 2025 poster · arXiv:2512.02423

Have the VLMs Lost Confidence? A Study of Sycophancy in VLMs

Shuo Li, Tao Ji, Xiaoran Fan et al.

ICLR 2025 poster · arXiv:2410.11302
7 citations

InstructRAG: Instructing Retrieval-Augmented Generation via Self-Synthesized Rationales

Zhepei Wei, Wei-Lin Chen, Yu Meng

ICLR 2025 poster · arXiv:2406.13629
74 citations

L3Ms — Lagrange Large Language Models

Guneet Singh Dhillon, Xingjian Shi, Yee Whye Teh et al.

ICLR 2025 poster · arXiv:2410.21533
1 citation

Logical Consistency of Large Language Models in Fact-Checking

Bishwamittra Ghosh, Sarah Hasan, Naheed Anjum Arafat et al.

ICLR 2025 poster · arXiv:2412.16100
15 citations

Longer Context, Deeper Thinking: Uncovering the Role of Long-Context Ability in Reasoning

Wang Yang, Zirui Liu, Hongye Jin et al.

NeurIPS 2025 poster · arXiv:2505.17315
3 citations

Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing

Zhangchen Xu, Fengqing Jiang, Luyao Niu et al.

ICLR 2025 poster · arXiv:2406.08464
261 citations

MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs

Yusu Qian, Hanrong Ye, Jean-Philippe Fauconnier et al.

ICLR 2025 poster · arXiv:2407.01509
41 citations

Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN

Pengxiang Li, Lu Yin, Shiwei Liu

ICLR 2025 poster · arXiv:2412.13795
25 citations

Multi-Token Prediction Needs Registers

Anastasios Gerontopoulos, Spyridon Gidaris, Nikos Komodakis

NeurIPS 2025 poster · arXiv:2505.10518
4 citations

OpenVLThinker: Complex Vision-Language Reasoning via Iterative SFT-RL Cycles

Yihe Deng, Hritik Bansal, Fan Yin et al.

NeurIPS 2025 poster · arXiv:2503.17352
15 citations

OptiBench Meets ReSocratic: Measure and Improve LLMs for Optimization Modeling

Zhicheng Yang, Yiwei Wang, Yinya Huang et al.

ICLR 2025 poster · arXiv:2407.09887
27 citations

Preserving Diversity in Supervised Fine-Tuning of Large Language Models

Ziniu Li, Congliang Chen, Tian Xu et al.

ICLR 2025 poster · arXiv:2408.16673
35 citations

Principled Fine-tuning of LLMs from User-Edits: A Medley of Preference, Supervision, and Reward

Dipendra Misra, Aldo Pacchiano, Ta-Chung Chi et al.

NeurIPS 2025 poster · arXiv:2601.19055

Prompt-A-Video: Prompt Your Video Diffusion Model via Preference-Aligned LLM

Yatai Ji, Jiacheng Zhang, Jie Wu et al.

ICCV 2025 poster · arXiv:2412.15156
10 citations

Q-SFT: Q-Learning for Language Models via Supervised Fine-Tuning

Joey Hong, Anca Dragan, Sergey Levine

ICLR 2025 poster · arXiv:2411.05193
8 citations

Reinforcement Learning with Backtracking Feedback

Bilgehan Sel, Vaishakh Keshava, Phillip Wallis et al.

NeurIPS 2025 poster

Repetition Improves Language Model Embeddings

Jacob Springer, Suhas Kotha, Daniel Fried et al.

ICLR 2025 poster · arXiv:2402.15449
59 citations

RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics

Enshen Zhou, Jingkun An, Cheng Chi et al.

NeurIPS 2025 poster · arXiv:2506.04308
55 citations

Robot-R1: Reinforcement Learning for Enhanced Embodied Reasoning in Robotics

Dongyoung Kim, Huiwon Jang, Sumin Park et al.

NeurIPS 2025 poster · arXiv:2506.00070
10 citations

Semantic Loss Guided Data Efficient Supervised Fine Tuning for Safe Responses in LLMs

Yuxiao Lu, Arunesh Sinha, Pradeep Varakantham

ICLR 2025 poster · arXiv:2412.06843
2 citations

ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models

Hongbo Liu, Jingwen He, Yi Jin et al.

NeurIPS 2025 poster · arXiv:2506.21356
7 citations

Steering Information Utility in Key-Value Memory for Language Model Post-Training

Chunyuan Deng, Ruidi Chang, Hanjie Chen

NeurIPS 2025 poster · arXiv:2507.05158

Style Outweighs Substance: Failure Modes of LLM Judges in Alignment Benchmarking

Benjamin Feuer, Micah Goldblum, Teresa Datta et al.

ICLR 2025 poster · arXiv:2409.15268
27 citations

TANDEM: Bi-Level Data Mixture Optimization with Twin Networks

Jiaxing Wang, Deping Xiang, Jin Xu et al.

NeurIPS 2025 poster

The Promise of RL for Autoregressive Image Editing

Saba Ahmadi, Rabiul Awal, Ankur Sikarwar et al.

NeurIPS 2025 poster · arXiv:2508.01119
2 citations

Topology of Reasoning: Understanding Large Reasoning Models through Reasoning Graph Properties

Gouki Minegishi, Hiroki Furuta, Takeshi Kojima et al.

NeurIPS 2025 poster · arXiv:2506.05744
13 citations

Training Language Models to Self-Correct via Reinforcement Learning

Aviral Kumar, Vincent Zhuang, Rishabh Agarwal et al.

ICLR 2025 poster · arXiv:2409.12917
318 citations

Transforming Generic Coder LLMs to Effective Binary Code Embedding Models for Similarity Detection

Litao Li, Leo Song, Steven Ding et al.

NeurIPS 2025 poster

Unveiling the Secret Recipe: A Guide For Supervised Fine-Tuning Small LLMs

Aldo Pareja, Nikhil Shivakumar Nayak, Hao Wang et al.

ICLR 2025 poster · arXiv:2412.13337
30 citations

VPO: Aligning Text-to-Video Generation Models with Prompt Optimization

Jiale Cheng, Ruiliang Lyu, Xiaotao Gu et al.

ICCV 2025 poster · arXiv:2503.20491
13 citations

WebDancer: Towards Autonomous Information Seeking Agency

Jialong Wu, Baixuan Li, Runnan Fang et al.

NeurIPS 2025 poster · arXiv:2505.22648
81 citations

Words or Vision: Do Vision-Language Models Have Blind Faith in Text?

Ailin Deng, Tri Cao, Zhirui Chen et al.

CVPR 2025 poster · arXiv:2503.02199
33 citations

Can AI Assistants Know What They Don't Know?

Qinyuan Cheng, Tianxiang Sun, Xiangyang Liu et al.

ICML 2024 poster · arXiv:2401.13275

Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation

Haoran Xu, Amr Sharaf, Yunmo Chen et al.

ICML 2024 poster · arXiv:2401.08417

Exploring the LLM Journey from Cognition to Expression with Linear Representations

Yuzi Yan, Jialian Li, Yipin Zhang et al.

ICML 2024 poster · arXiv:2405.16964

Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch

Le Yu, Bowen Yu, Haiyang Yu et al.

ICML 2024 poster · arXiv:2311.03099

Privacy-Preserving Instructions for Aligning Large Language Models

Da Yu, Peter Kairouz, Sewoong Oh et al.

ICML 2024 poster · arXiv:2402.13659

Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment

Rui Yang, Xiaoman Pan, Feng Luo et al.

ICML 2024 poster · arXiv:2402.10207

Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models

Zixiang Chen, Yihe Deng, Huizhuo Yuan et al.

ICML 2024 poster · arXiv:2401.01335