2024 "efficient inference" Papers
6 papers found
ConsistentEE: A Consistent and Hardness-Guided Early Exiting Method for Accelerating Language Models Inference
Ziqian Zeng, Yihuai Hong, Hongliang Dai et al.
AAAI 2024paperarXiv:2312.11882
Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression
Junyuan Hong, Jinhao Duan, Chenhui Zhang et al.
ICML 2024posterarXiv:2403.15447
Deep Feature Surgery: Towards Accurate and Efficient Multi-Exit Networks
Cheng Gong, Yao Chen, Qiuyang Luo et al.
ECCV 2024posterarXiv:2407.13986
3
citations
FrameQuant: Flexible Low-Bit Quantization for Transformers
Harshavardhan Adepu, Zhanpeng Zeng, Li Zhang et al.
ICML 2024posterarXiv:2403.06082
MobileDiffusion: Instant Text-to-Image Generation on Mobile Devices
Yang Zhao, Zhisheng Xiao, Yanwu Xu et al.
ECCV 2024posterarXiv:2311.16567
35
citations
Switchable Decision: Dynamic Neural Generation Networks
Shujian Zhang, Korawat Tanwisuth, Chengyue Gong et al.
ICML 2024posterarXiv:2405.04513