- Stabilizing Transformer Training by Preventing Attention Entropy Collapse • arXiv:2303.06296 • Published Mar 11, 2023
- The Role of Entropy and Reconstruction in Multi-View Self-Supervised Learning • arXiv:2307.10907 • Published Jul 20, 2023
- Position Prediction as an Effective Pretraining Strategy • arXiv:2207.07611 • Published Jul 15, 2022
- Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models • arXiv:2501.12370 • Published Jan 2025
- Theory, Analysis, and Best Practices for Sigmoid Self-Attention • arXiv:2409.04431 • Published Sep 6, 2024