Pierre Ablin

13 Papers

62 Total Citations

Papers (13)

Theory, Analysis, and Best Practices for Sigmoid Self-Attention

The AdEMAMix Optimizer: Better, Faster, Older

Shielded Diffusion: Generating Novel and Diverse Images using Sparse Repellency

Soup-of-Experts: Pretraining Specialist Models via Parameters Averaging

Careful with that Scalpel: Improving Gradient Surgery with an EMA

How Smooth Is Attention?

Optimization without Retraction on the Random Generalized Stiefel Manifold

Modeling Shared Responses in Neuroimaging Studies through MultiView ICA

NeurIPS 2020

Shared Independent Component Analysis for Multi-Subject Neuroimaging

NeurIPS 2021

Benchopt: Reproducible, efficient and collaborative optimization benchmarks

NeurIPS 2022

A framework for bilevel optimization that enables stochastic and global variance reduction algorithms

NeurIPS 2022

Do Residual Neural Networks discretize Neural Ordinary Differential Equations?

NeurIPS 2022

How to Scale Your EMA

NeurIPS 2023