Taiji Suzuki

18 Papers · 51 Total Citations

Papers (18)

Symmetric Mean-field Langevin Dynamics for Distributional Minimax Problems

ICLR 2024 · 13 citations

Flow matching achieves almost minimax optimal convergence

ICLR 2025 (arXiv) · 12 citations

On the Optimization and Generalization of Two-layer Transformers with Sign Gradient Descent

ICLR 2025 · 9 citations

State Space Models are Provably Comparable to Transformers in Dynamic Token Selection

ICLR 2025 (arXiv) · 6 citations

Koopman-based generalization bound: New aspect for full-rank weights

ICLR 2024 · 6 citations

Propagation of Chaos for Mean-Field Langevin Dynamics and its Application to Model Ensemble

ICML 2025 · 2 citations

Weighted Point Set Embedding for Multimodal Contrastive Learning Toward Optimal Similarity Metric

ICLR 2025 · 1 citation

Quantifying Memory Utilization with Effective State-Size

ICML 2025 · 1 citation

Trained Mamba Emulates Online Gradient Descent in In-Context Linear Regression

NeurIPS 2025 · 1 citation

Mean-field Analysis on Two-layer Neural Networks from a Kernel Perspective

ICML 2024 · 0 citations

Mean Field Langevin Actor-Critic: Faster Convergence and Global Optimality beyond Lazy Learning

ICML 2024 · 0 citations

Provably Neural Active Learning Succeeds via Prioritizing Perplexing Samples

ICML 2024 · 0 citations

High-Dimensional Kernel Methods under Covariate Shift: Data-Dependent Implicit Regularization

ICML 2024 · 0 citations

Transformers Learn Nonlinear Features In Context: Nonconvex Mean-field Dynamics on the Attention Landscape

ICML 2024 · 0 citations

State-Free Inference of State-Space Models: The Transfer Function Approach

ICML 2024 · 0 citations

Mechanistic Design and Scaling of Hybrid Architectures

ICML 2024 · 0 citations

SILVER: Single-loop variance reduction and application to federated learning

ICML 2024 · 0 citations

How do Transformers Perform In-Context Autoregressive Learning?

ICML 2024 · 0 citations