"inference efficiency" Papers
9 papers found
Can LLMs Outshine Conventional Recommenders? A Comparative Evaluation
Qijiong Liu, Jieming Zhu, Lu Fan et al.
NeurIPS 2025 · poster · arXiv:2503.05493
4 citations
DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models
Saeed Ranjbar Alvar, Gursimran Singh, Mohammad Akbari et al.
CVPR 2025 · poster · arXiv:2503.02175
48 citations
PACT: Pruning and Clustering-Based Token Reduction for Faster Visual Language Models
Mohamed Dhouib, Davide Buscaldi, Sonia Vanier et al.
CVPR 2025 · poster · arXiv:2504.08966
15 citations
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning
Zayne Sprague, Fangcong Yin, Juan Rodriguez et al.
ICLR 2025 · poster · arXiv:2409.12183
239 citations
Variational Best-of-N Alignment
Afra Amini, Tim Vieira, Elliott Ash et al.
ICLR 2025 · poster · arXiv:2407.06057
37 citations
APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference
Bowen Zhao, Hannaneh Hajishirzi, Qingqing Cao
ICML 2024 · poster
Deciphering RNA Secondary Structure Prediction: A Probabilistic K-Rook Matching Perspective
Cheng Tan, Zhangyang Gao, Hanqun Cao et al.
ICML 2024 · poster
Efficient Denoising Diffusion via Probabilistic Masking
Weizhong Zhang, Zhiwei Zhang, Renjie Pi et al.
ICML 2024 · poster
Tandem Transformers for Inference Efficient LLMs
Aishwarya P S, Pranav Nair, Yashas Samaga et al.
ICML 2024 · poster