2025 Poster Papers matching "reasoning tasks"

22 papers found

Accelerating RL for LLM Reasoning with Optimal Advantage Regression

Kianté Brantley, Mingyu Chen, Zhaolin Gao et al.

NeurIPS 2025 · poster · arXiv:2505.20686 · 12 citations

Advancing LLM Reasoning Generalists with Preference Trees

Lifan Yuan, Ganqu Cui, Hanbin Wang et al.

ICLR 2025 · poster · arXiv:2404.02078 · 179 citations

Analyzing the Power of Chain of Thought through Memorization Capabilities

Lijia Yu, Xiao-Shan Gao, Lijun Zhang

NeurIPS 2025 · poster · arXiv:2511.01190

AREAL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning

Wei Fu, Jiaxuan Gao, Xujie Shen et al.

NeurIPS 2025 · poster · arXiv:2505.24298 · 95 citations

Asymmetric REINFORCE for off-Policy Reinforcement Learning: Balancing positive and negative rewards

Charles Arnal, Gaëtan Narozniak, Vivien Cabannes et al.

NeurIPS 2025 · poster · arXiv:2506.20520 · 15 citations

Bag of Tricks for Inference-time Computation of LLM Reasoning

Fan Liu, Wen-Shuo Chao, Naiqiang Tan et al.

NeurIPS 2025 · poster · arXiv:2502.07191 · 12 citations

Balancing Act: Diversity and Consistency in Large Language Model Ensembles

Ahmed Abdulaal, Chen Jin, Nina Montaña-Brown et al.

ICLR 2025 · poster

Benchmarking Agentic Workflow Generation

Shuofei Qiao, Runnan Fang, Zhisong Qiu et al.

ICLR 2025 · poster · arXiv:2410.07869 · 19 citations

Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning

Jiayu Wang, Yifei Ming, Zixuan Ke et al.

NeurIPS 2025 · poster · arXiv:2506.04723 · 1 citation

C3PO: Optimized Large Language Model Cascades with Probabilistic Cost Constraints for Reasoning

Antonios Valkanas, Soumyasundar Pal, Pavel Rumiantsev et al.

NeurIPS 2025 · poster · arXiv:2511.07396

Decoder-Hybrid-Decoder Architecture for Efficient Reasoning with Long Generation

Liliang Ren, Congcong Chen, Haoran Xu et al.

NeurIPS 2025 · poster · arXiv:2507.06607 · 6 citations

Enhancing Language Model Agents using Diversity of Thoughts

Vijay Chandra Lingam, Behrooz Tehrani, Sujay Sanghavi et al.

ICLR 2025 · poster

Fast attention mechanisms: a tale of parallelism

Jingwen Liu, Hantao Yu, Clayton Sanford et al.

NeurIPS 2025 · poster · arXiv:2509.09001

Generalization v.s. Memorization: Tracing Language Models’ Capabilities Back to Pretraining Data

Xinyi Wang, Antonis Antoniades, Yanai Elazar et al.

ICLR 2025 · poster · arXiv:2407.14985 · 77 citations

InfiGFusion: Graph-on-Logits Distillation via Efficient Gromov-Wasserstein for Model Fusion

Yuanyi Wang, Zhaoyi Yan, Yiming Zhang et al.

NeurIPS 2025 · poster · arXiv:2505.13893 · 2 citations

LASeR: Learning to Adaptively Select Reward Models with Multi-Arm Bandits

Duy Nguyen, Archiki Prasad, Elias Stengel-Eskin et al.

NeurIPS 2025 · poster · arXiv:2410.01735 · 5 citations

Multipole Attention for Efficient Long Context Reasoning

Coleman Hooper, Sebastian Zhao, Luca Manolache et al.

NeurIPS 2025 · poster · arXiv:2506.13059 · 3 citations

On the self-verification limitations of large language models on reasoning and planning tasks

Kaya Stechly, Karthik Valmeekam, Subbarao Kambhampati

ICLR 2025 · poster · arXiv:2402.08115 · 100 citations

PID-controlled Langevin Dynamics for Faster Sampling on Generative Models

Hongyi Chen, Jianhai Shu, Jingtao Ding et al.

NeurIPS 2025 · poster · arXiv:2511.12603

The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning

Xinyu Zhu, Mengzhou Xia, Zhepei Wei et al.

NeurIPS 2025 · poster · arXiv:2506.01347 · 74 citations

ThinkBench: Dynamic Out-of-Distribution Evaluation for Robust LLM Reasoning

Shulin Huang, Linyi Yang, Yan Song et al.

NeurIPS 2025 · poster · arXiv:2502.16268 · 14 citations

TTRL: Test-Time Reinforcement Learning

Yuxin Zuo, Kaiyan Zhang, Li Sheng et al.

NeurIPS 2025 · poster · arXiv:2504.16084 · 122 citations