BLOG
MARIN

Scaling Laws That Extrapolate 300× Past the Fit

Delphi is an open scaling suite spanning 3e18 to 1e23 FLOPs. A pre-registered forecast from its scaling law predicted the loss of the largest run to within 0.2%, extrapolating 300× past the largest run used in the fit.

April 14, 2026 · Will Held
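
To make the idea of extrapolating past the fit concrete, here is a minimal sketch. It is not Delphi's actual scaling law or data. It assumes a pure power law `loss = a * compute**(-b)`, uses synthetic, noise-free points generated from made-up constants (`TRUE_A`, `TRUE_B`), and all function names are hypothetical. The fit is ordinary least squares in log-log space, and the forecast is evaluated 300× beyond the largest compute budget used in the fit.

```python
import math

def fit_power_law(compute, loss):
    """Least-squares fit of log(loss) = log(a) - b * log(compute)."""
    xs = [math.log(c) for c in compute]
    ys = [math.log(v) for v in loss]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return math.exp(intercept), -slope  # (a, b)

def predict(a, b, compute):
    """Evaluate the fitted power law at a given compute budget."""
    return a * compute ** (-b)

# Synthetic stand-in data (assumption): generated from a known power law.
TRUE_A, TRUE_B = 50.0, 0.05
fit_compute = [3e18, 1e19, 3e19, 1e20, 3e20]
fit_loss = [TRUE_A * c ** (-TRUE_B) for c in fit_compute]

a, b = fit_power_law(fit_compute, fit_loss)

# Write the forecast down first, then compare against the "held-out" run.
target_compute = 300 * max(fit_compute)
forecast = predict(a, b, target_compute)
actual = TRUE_A * target_compute ** (-TRUE_B)
rel_error = abs(forecast - actual) / actual
print(f"forecast={forecast:.4f} actual={actual:.4f} rel_error={rel_error:.2e}")
```

Because the synthetic data has no noise and matches the assumed functional form exactly, the error here is essentially zero. Real runs are noisy and rarely follow a pure power law, which is what makes a tight forecast at 300× notable.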

MARIN

Mixture of Experts Quantile Balancing: Validated at 32B-A5B (1e22 FLOPs) Scale

Quantile Balancing (QB) is a hyperparameter-free load balancer for Mixture of Experts models, introduced by Jianlin Su. We validated it on a 32B-A5B (1e22 FLOPs) Marin run over 326B tokens: zero hyperparameters, zero loss spikes, and no need for leading dense layers, auxiliary losses, or capacity overload factors.

April 10, 2026 · Larry Dial

Problems with Chinchilla Approach 2

The Chinchilla paper's "Approach 2," which fits parabolas to IsoFLOP curves, has subtle biases that compound. We show that these can lead to non-trivial errors in compute-optimal allocation (around 6.5% of total compute for Llama 3, worth over $1M in GPU time), especially when IsoFLOP grids aren't perfectly centered or symmetric. Our paper also proposes a reparameterization of "Approach 3" that makes direct parametric fitting simple and stable. You can even run it in 70 lines of JavaScript.

March 27, 2026 · Eric Czech
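
The off-center effect described above can be reproduced in a toy setting. The sketch below is not the post's analysis. It assumes a loss surface of the form `L = E + A/N^alpha + B/D^beta` with the common approximation `C = 6 * N * D`, and the constants are illustrative rather than fitted. Setting `A == B` and `alpha == beta` makes the IsoFLOP curve symmetric in `log N`, so the true optimum is `sqrt(C / 6)` and any error comes from grid placement alone. All function names are hypothetical.

```python
import math

def loss(n_params, compute, E=1.7, A=400.0, B=400.0, alpha=0.3, beta=0.3):
    """Illustrative loss surface with tokens D implied by C = 6 * N * D."""
    n_tokens = compute / (6.0 * n_params)
    return E + A * n_params ** -alpha + B * n_tokens ** -beta

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def parabola_optimum(n_values, losses):
    """Fit loss = c0 + c1*u + c2*u^2 with u = log N - mean(log N).

    Returns the model size at the parabola's vertex.
    """
    xs = [math.log(n) for n in n_values]
    x_mean = sum(xs) / len(xs)
    us = [x - x_mean for x in xs]
    s = [sum(u ** p for u in us) for p in range(5)]
    t = [sum(u ** p * y for u, y in zip(us, losses)) for p in range(3)]
    m = [[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]]

    def with_column(col):
        # Cramer's rule: swap one column of the normal matrix for t.
        return [[t[r] if c == col else m[r][c] for c in range(3)]
                for r in range(3)]

    d = det3(m)
    c1 = det3(with_column(1)) / d
    c2 = det3(with_column(2)) / d
    return math.exp(x_mean - c1 / (2.0 * c2))

COMPUTE = 1e21
true_opt = math.sqrt(COMPUTE / 6.0)  # valid only because A == B, alpha == beta

def relative_error(exponents):
    """Relative error of the parabola's optimum on the grid true_opt * 2**k."""
    grid = [true_opt * 2.0 ** k for k in exponents]
    fitted = parabola_optimum(grid, [loss(n, COMPUTE) for n in grid])
    return abs(fitted - true_opt) / true_opt

centered_error = relative_error(range(-3, 4))  # grid centered on the optimum
shifted_error = relative_error(range(-1, 6))   # same width, shifted two octaves
print(f"centered: {centered_error:.2e}  shifted: {shifted_error:.2e}")
```

On the centered grid the parabola recovers the optimum up to floating-point error, because the curve is symmetric around it. Shifting the same grid toward larger models leaves a clearly nonzero error, since the true curve is not a parabola and the fit now weights one side of the minimum more heavily.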

© 2026 Open Athena.
All rights reserved.
