Open Athena

BLOG

Cluster Scheduling with Iris

Training a frontier-level LLM requires significant compute spread across a variety of providers and accelerator types. Iris, our custom global scheduler, lets us make effective use of these scarce resources; in the months since rollout, our sustained concurrent TPU usage has roughly doubled.

June 25, 2026 · Russell Power

Preparing for the AI Future with Ethics in Mind

At a recent panel discussion, our COO and CSO Jared Crooks explained why openness is key to understanding AI, and dug into the importance of embedding ethics in this new technology.

June 17, 2026 · Jared Crooks

MARIN

Improving our LLM Pretraining Efficiency

How Marin pretraining became more efficient through Mixture of Experts, higher expert sparsity, MuonH, PKO, and routed expert normalization.

June 3, 2026 · Larry Dial

MARIN

Scaling Laws That Extrapolate 300× Past the Fit

Delphi is an open scaling suite ranging from 3e18 to 1e23 FLOPs. A pre-registered forecast from its scaling law predicted the loss of the largest run to within 0.2%, extrapolating 300× past the largest run used in the fit.

May 11, 2026 · Will Held

MARIN

Mixture of Experts Quantile Balancing: Validated at 32B-A5B (1e22 FLOPs) Scale

Quantile Balancing (QB) is a hyperparameter-free load balancer for Mixture of Experts models, introduced by Jianlin Su. We validated it on a 32B-A5B (1e22 FLOPs) Marin run over 326B tokens: zero hyperparameters, zero loss spikes, and no need for leading dense layers, auxiliary losses, or capacity overload factors.

April 10, 2026 · Larry Dial

Problems with Chinchilla Approach 2

The Chinchilla paper's "Approach 2," which fits parabolas to IsoFLOP curves, has subtle biases that can add up. We show that these can lead to non-trivial errors in compute-optimal allocation (around 6.5% of total compute for Llama 3, worth over $1M in GPU time), especially when IsoFLOP grids aren't perfectly centered or symmetric. The paper also proposes a reparameterization of "Approach 3" that makes direct parametric fitting simple and stable. You can even run it in 70 lines of JavaScript.

March 27, 2026 · Eric Czech
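
For readers unfamiliar with the parabola fit mentioned in the Chinchilla entry above, here is a minimal Python sketch of the general idea, not the post's code. It assumes a hypothetical loss surface of the form L(N, D) = E + A/N^α + B/D^β with made-up constants and the common approximation C = 6ND. The function names and grid choices are also illustrative. It fits a parabola in log N to one synthetic IsoFLOP curve and compares the vertex with the closed-form optimum for that surface, once with a grid centered on the optimum and once with a shifted grid.

```python
import numpy as np

# Hypothetical loss surface L(N, D) = E + A / N**ALPHA + B / D**BETA.
# All constants are illustrative assumptions, not values from the post.
E, A, B, ALPHA, BETA = 1.7, 400.0, 400.0, 0.34, 0.28


def loss(n_params, compute):
    """Loss along one IsoFLOP curve, assuming C = 6 * N * D."""
    tokens = compute / (6.0 * n_params)
    return E + A / n_params**ALPHA + B / tokens**BETA


def true_optimal_params(compute):
    """Closed-form minimizer of loss(N, compute) over N for this surface."""
    ratio = (ALPHA * A) / (BETA * B)
    return (ratio * (compute / 6.0) ** BETA) ** (1.0 / (ALPHA + BETA))


def parabola_optimal_params(compute, log_n_grid):
    """Fit a parabola to loss vs. log N and return its vertex as N."""
    x0 = log_n_grid.mean()  # shift x so the polynomial fit is well conditioned
    x = log_n_grid - x0
    c2, c1, _ = np.polyfit(x, loss(np.exp(log_n_grid), compute), 2)
    return float(np.exp(x0 - c1 / (2.0 * c2)))


compute = 1e21
n_star = true_optimal_params(compute)
offsets = np.linspace(-1.0, 1.0, 5)  # five model sizes, evenly spaced in log N

# Grid centered on the true optimum vs. a grid shifted by one unit of log N.
centered = parabola_optimal_params(compute, np.log(n_star) + offsets)
shifted = parabola_optimal_params(compute, np.log(n_star) + offsets + 1.0)

print(f"true optimum N*: {n_star:.3e}")
print(f"centered grid, relative error: {centered / n_star - 1:+.2%}")
print(f"shifted grid, relative error:  {shifted / n_star - 1:+.2%}")
```

Because the synthetic curve is not an exact parabola in log N, the fitted vertex need not coincide with the true optimum even when the grid is centered. Changing `offsets` or the shift shows how sensitive the estimate is to grid placement under these assumed constants.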