Poster "attention heads" Papers
2 papers found
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
Guangxuan Xiao, Jiaming Tang, Jingwei Zuo et al.
ICLR 2025posterarXiv:2410.10819
165
citations
Decomposing and Editing Predictions by Modeling Model Computation
Harshay Shah, Andrew Ilyas, Aleksander Madry
ICML 2024poster