Beidi Chen

18 Papers · 114 Total Citations

Papers (18)

ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
ICML 2025 · 56 citations

JoMA: Demystifying Multilayer Transformers via Joint Dynamics of MLP and Attention
ICLR 2024 · 46 citations

Speculative Prefill: Turbocharging TTFT with Lightweight and Training-Free Token Importance Estimation
ICML 2025 · 7 citations

Zeroth-Order Fine-Tuning of LLMs with Transferable Static Sparsity
ICLR 2025 · 5 citations

LoCoCo: Dropping In Convolutions for Long Context Compression
ICML 2024 · 0 citations

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
ICML 2024 · 0 citations

HexGen: Generative Inference of Large Language Model over Heterogeneous Environment
ICML 2024 · 0 citations

Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference
ICML 2024 · 0 citations

KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
ICML 2024 · 0 citations

Soft Prompt Recovers Compressed LLMs, Transferably
ICML 2024 · 0 citations

Fast and Accurate Stochastic Gradient Estimation
NeurIPS 2019 · 0 citations

Scatterbrain: Unifying Sparse and Low-rank Attention
NeurIPS 2021 · 0 citations

Locality Sensitive Teaching
NeurIPS 2021 · 0 citations

Fine-tuning Language Models over Slow Networks using Activation Quantization with Guarantees
NeurIPS 2022 · 0 citations

Decentralized Training of Foundation Models in Heterogeneous Environments
NeurIPS 2022 · 0 citations

Laughing Hyena Distillery: Extracting Compact Recurrences From Convolutions
NeurIPS 2023 · 0 citations

H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models
NeurIPS 2023 · 0 citations

Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer
NeurIPS 2023 · 0 citations