Beidi Chen
10 Papers · 114 Total Citations

Papers (10)
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
ICML 2025 · 56 citations
JoMA: Demystifying Multilayer Transformers via Joint Dynamics of MLP and Attention
ICLR 2024 · 46 citations
Speculative Prefill: Turbocharging TTFT with Lightweight and Training-Free Token Importance Estimation
ICML 2025 · 7 citations
Zeroth-Order Fine-Tuning of LLMs with Transferable Static Sparsity
ICLR 2025 · 5 citations
LoCoCo: Dropping In Convolutions for Long Context Compression
ICML 2024 · 0 citations
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
ICML 2024 · 0 citations
HexGen: Generative Inference of Large Language Model over Heterogeneous Environment
ICML 2024 · 0 citations
Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference
ICML 2024 · 0 citations
KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
ICML 2024 · 0 citations
Soft Prompt Recovers Compressed LLMs, Transferably
ICML 2024 · 0 citations