Daniel Hsu
4 Papers
0 Total Citations
Papers (4)
Fast attention mechanisms: a tale of parallelism
NeurIPS 2025, arXiv
0 citations
Multi-group Learning for Hierarchical Groups
ICML 2024
0 citations
Transformers Provably Learn Sparse Token Selection While Fully-Connected Nets Cannot
ICML 2024
0 citations
Transformers, parallel computation, and logarithmic depth
ICML 2024
0 citations