Beidi Chen
18 papers, 114 total citations

Papers (18)
- ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference. ICML 2025. 56 citations.
- JoMA: Demystifying Multilayer Transformers via Joint Dynamics of MLP and Attention. ICLR 2024. 46 citations.
- Speculative Prefill: Turbocharging TTFT with Lightweight and Training-Free Token Importance Estimation. ICML 2025. 7 citations.
- Zeroth-Order Fine-Tuning of LLMs with Transferable Static Sparsity. ICLR 2025. 5 citations.
- LoCoCo: Dropping In Convolutions for Long Context Compression. ICML 2024. 0 citations.
- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection. ICML 2024. 0 citations.
- HexGen: Generative Inference of Large Language Model over Heterogeneous Environment. ICML 2024. 0 citations.
- Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference. ICML 2024. 0 citations.
- KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache. ICML 2024. 0 citations.
- Soft Prompt Recovers Compressed LLMs, Transferably. ICML 2024. 0 citations.
- Fast and Accurate Stochastic Gradient Estimation. NeurIPS 2019. 0 citations.
- Scatterbrain: Unifying Sparse and Low-rank Attention. NeurIPS 2021. 0 citations.
- Locality Sensitive Teaching. NeurIPS 2021. 0 citations.
- Fine-tuning Language Models over Slow Networks using Activation Quantization with Guarantees. NeurIPS 2022. 0 citations.
- Decentralized Training of Foundation Models in Heterogeneous Environments. NeurIPS 2022. 0 citations.
- Laughing Hyena Distillery: Extracting Compact Recurrences From Convolutions. NeurIPS 2023. 0 citations.
- H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models. NeurIPS 2023. 0 citations.
- Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer. NeurIPS 2023. 0 citations.