Problems with Chinchilla Approach 2
The Chinchilla paper's "Approach 2," which fits parabolas to IsoFLOP curves, has subtle biases that can add up. We show these can lead to non-trivial errors in compute-optimal allocation (around 6.5% of total compute for Llama 3, worth over $1M in GPU time), especially when IsoFLOP grids aren't perfectly centered or symmetric. We also propose a reparameterization of "Approach 3" that makes direct parametric fitting simple and stable. You can even run it in 70 lines of JavaScript.
March 27, 2026 · Eric Czech
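
To make the parabola-fitting step concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the code from the paper. It assumes a synthetic loss surface L(N, D) = E + A/N^alpha + B/D^beta with constants close to the published Chinchilla fit, and the common approximation C = 6ND. The helper names (`parabola_vertex`, `approach2_optimal_params`) and the grid settings are hypothetical. It covers only the per-budget parabola fit, not the power-law fit across budgets that follows it.

```python
import math

# Illustrative constants (assumption): close to the published Chinchilla fit.
E, A, B, ALPHA, BETA = 1.69, 406.4, 410.7, 0.34, 0.28

def loss(n_params, n_tokens):
    """Synthetic loss surface L(N, D) = E + A / N^alpha + B / D^beta."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA

def true_optimal_params(compute):
    """Closed-form minimizer of loss(N, C / (6 N)) over N."""
    g = (ALPHA * A / (BETA * B)) ** (1.0 / (ALPHA + BETA))
    return g * (compute / 6.0) ** (BETA / (ALPHA + BETA))

def det3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def parabola_vertex(xs, ys):
    """Least-squares fit y ~ a*u^2 + b*u + c with u = x - mean(x); return the vertex x."""
    mean_x = sum(xs) / len(xs)
    us = [x - mean_x for x in xs]
    s = [sum(u**k for u in us) for k in range(5)]
    t = [sum(y * u**k for u, y in zip(us, ys)) for k in range(3)]
    m = [[s[4], s[3], s[2]], [s[3], s[2], s[1]], [s[2], s[1], s[0]]]
    rhs = [t[2], t[1], t[0]]
    d = det3(m)
    coeffs = []
    for col in range(3):  # Cramer's rule on the 3x3 normal equations
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]
        coeffs.append(det3(mc) / d)
    a, b, _ = coeffs
    return mean_x - b / (2.0 * a)

def approach2_optimal_params(compute, log_offset=0.0, half_width=math.log(4), n_points=9):
    """Fit a parabola to loss vs. log N on one IsoFLOP curve; return the vertex as N.

    log_offset shifts the grid center away from the true optimum (in log N).
    """
    center = math.log(true_optimal_params(compute)) + log_offset
    xs = [center + half_width * (2 * i / (n_points - 1) - 1) for i in range(n_points)]
    ys = [loss(math.exp(x), compute / (6.0 * math.exp(x))) for x in xs]
    return math.exp(parabola_vertex(xs, ys))

if __name__ == "__main__":
    budget = 1e21  # FLOPs, arbitrary illustrative budget
    n_true = true_optimal_params(budget)
    for offset in (-math.log(2), 0.0, math.log(2)):
        n_hat = approach2_optimal_params(budget, log_offset=offset)
        print(f"log offset {offset:+.2f}: estimated / true N = {n_hat / n_true:.4f}")
```

Because the synthetic curve is not an exact parabola in log N when alpha and beta differ, the fitted vertex generally does not land exactly on the true optimum, and the ratio shifts as the grid center moves. The numbers this toy prints depend entirely on the assumed constants and grid, and should not be read as the paper's results.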