Cluster Scheduling with Iris
Training a frontier-level LLM requires significant resources across a variety of providers and accelerator types. Iris, our scheduling system, uses a custom global scheduler to make effective use of these scarce resources; in the months since rollout, our sustained concurrent TPU usage has roughly doubled.
Preparing for the AI Future with Ethics in Mind
At a recent panel discussion, our COO and CSO Jared Crooks explained why openness is key to understanding AI, and dug into the importance of embedding ethics in this new technology.
Improving our LLM Pretraining Efficiency
How Marin pretraining became more efficient through Mixture of Experts, higher expert sparsity, MuonH, PKO, and routed expert normalization.
Scaling Laws That Extrapolate 300× Past the Fit
Delphi is an open scaling suite ranging from 3e18 to 1e23 FLOPs. A pre-registered forecast from its scaling law predicted the loss of the largest run within 0.2%, extrapolating 300× past the largest run used in the fit.
Mixture of Experts Quantile Balancing: Validated at 32B-A5B (1e22 FLOPs) Scale
Quantile Balancing (QB) is a hyperparameter-free load balancer for Mixture of Experts models, introduced by Jianlin Su. We validated it on a 32B-A5B (1e22 FLOPs) Marin run over 326B tokens: zero hyperparameters, zero loss spikes, and no need for leading dense layers, auxiliary losses, or capacity overload factors.
Problems with Chinchilla Approach 2
The Chinchilla paper's "Approach 2," which fits parabolas to IsoFLOP curves, has subtle biases that can add up. We show these can lead to non-trivial errors in compute-optimal allocation (around 6.5% of total compute for Llama 3, worth over $1M in GPU time), especially when IsoFLOP grids aren't perfectly centered or symmetric. Our paper also proposes a reparameterization of "Approach 3" that makes direct parametric fitting simple and stable. You can even run it in 70 lines of JavaScript.
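To make the parabola-fitting step concrete, here is a minimal Python sketch of an Approach 2 style fit on synthetic data. It is an illustration under stated assumptions, not the post's code and not the reparameterized Approach 3: the loss is assumed to follow the parametric form L(N, D) = E + A/N^alpha + B/D^beta with D = C/(6N), the constants are illustrative placeholders, and the function names (loss, true_optimum, parabola_optimum) are hypothetical. Because the true optimum is known in closed form for this synthetic loss, the sketch can compare it with the vertex of a parabola fitted over a centered grid and over a shifted one.

```python
import numpy as np

# Illustrative constants for the assumed loss L(N, D) = E + A/N^alpha + B/D^beta.
# These are placeholders for the sketch, not values taken from the post.
E, A, B, ALPHA, BETA = 1.69, 406.4, 410.7, 0.34, 0.28


def loss(n_params, flops, alpha=ALPHA, beta=BETA):
    """Synthetic loss at a fixed compute budget, assuming C = 6 * N * D."""
    tokens = flops / (6.0 * n_params)
    return E + A / n_params**alpha + B / tokens**beta


def true_optimum(flops, alpha=ALPHA, beta=BETA):
    """Closed-form minimizer of the synthetic loss over N at fixed compute."""
    g = (alpha * A / (beta * B)) ** (1.0 / (alpha + beta))
    return g * (flops / 6.0) ** (beta / (alpha + beta))


def parabola_optimum(flops, log10_offsets, alpha=ALPHA, beta=BETA):
    """Approach 2 style estimate: fit a parabola to loss vs. log10(N).

    log10_offsets places the model-size grid relative to the true optimum,
    so a symmetric set of offsets gives a centered IsoFLOP grid.
    """
    offsets = np.asarray(log10_offsets, dtype=float)
    n_grid = true_optimum(flops, alpha, beta) * 10.0**offsets
    x = np.log10(n_grid)
    x0 = x.mean()  # center x before fitting for better conditioning
    a, b, _ = np.polyfit(x - x0, loss(n_grid, flops, alpha, beta), 2)
    return 10.0 ** (x0 - b / (2.0 * a))  # vertex of the fitted parabola


flops = 1e21
centered = np.linspace(-0.5, 0.5, 9)  # one decade wide, centered on the optimum
shifted = centered + 0.3              # same width, grid moved off center
n_star = true_optimum(flops)
for name, grid in [("centered", centered), ("shifted", shifted)]:
    n_hat = parabola_optimum(flops, grid)
    print(f"{name}: relative error in N_opt = {(n_hat - n_star) / n_star:+.4%}")
```

The size and sign of the error depend on the exponents, the grid width, and where the grid sits, so the printed numbers only describe this synthetic setup. In the special case alpha = beta, the loss is symmetric in log N around the optimum and a centered grid recovers it up to floating-point error.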