"feed-forward layers" Papers
2 papers found
Distributional Associations vs In-Context Reasoning: A Study of Feed-forward and Attention Layers
Lei Chen, Joan Bruna, Alberto Bietti
ICLR 2025, poster, arXiv:2406.03068
7 citations
Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks
Rahul Ramesh, Ekdeep Singh Lubana, Mikail Khona et al.
ICML 2024, poster, arXiv:2311.12997