Banggu Wu

Papers: 4

Total Citations: 46

Papers (4)

Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling

Hyper-Connections

ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks

What Deep CNNs Benefit From Global Covariance Pooling: An Optimization Perspective