2025 Poster "inference efficiency" Papers
9 papers found
Can LLMs Outshine Conventional Recommenders? A Comparative Evaluation
Qijiong Liu, Jieming Zhu, Lu Fan et al.
NeurIPS 2025 poster · arXiv:2503.05493 · 4 citations
DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models
Saeed Ranjbar Alvar, Gursimran Singh, Mohammad Akbari et al.
CVPR 2025 poster · arXiv:2503.02175 · 48 citations
Mixture Compressor for Mixture-of-Experts LLMs Gains More
Wei Huang, Yue Liao, Jianhui Liu et al.
ICLR 2025 poster · arXiv:2410.06270 · 22 citations
PACT: Pruning and Clustering-Based Token Reduction for Faster Visual Language Models
Mohamed Dhouib, Davide Buscaldi, Sonia Vanier et al.
CVPR 2025 poster · arXiv:2504.08966 · 15 citations
Skip-Vision: Efficient and Scalable Acceleration of Vision-Language Models via Adaptive Token Skipping
Weili Zeng, Ziyuan Huang, Kaixiang Ji et al.
ICCV 2025 poster · arXiv:2503.21817 · 4 citations
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning
Zayne Sprague, Fangcong Yin, Juan Rodriguez et al.
ICLR 2025 poster · arXiv:2409.12183 · 239 citations
Týr-the-Pruner: Structural Pruning LLMs via Global Sparsity Distribution Optimization
Guanchen Li, Yixing Xu, Zeping Li et al.
NeurIPS 2025 poster · arXiv:2503.09657 · 6 citations
Value-Guided Search for Efficient Chain-of-Thought Reasoning
Kaiwen Wang, Jin Zhou, Jonathan Chang et al.
NeurIPS 2025 poster · arXiv:2505.17373 · 7 citations
Variational Best-of-N Alignment
Afra Amini, Tim Vieira, Elliott Ash et al.
ICLR 2025 poster · arXiv:2407.06057 · 37 citations