I should start by saying that “Generative AI is the defining technology of our time. It is transforming how we work, how we learn, how we create and consume knowledge,” etc., etc.
So much is obvious. What is less obvious is how frontier AI is built.
By “how it is built,” I do not mean the high-level story you get in a popular science article. I do not even mean the knowledge you learn from classes like the CS336 lectures at Stanford, which are painstakingly excavated from technical reports released by commercial AI companies and the handful of open-source labs.
I mean the process knowledge that leads to the discovery of those insights you might learn from that class. Process knowledge is the body of techniques for improving the model building process itself: how you decide which architectural changes make it into your most expensive training runs, or how you determine what data is included or excluded from training.
Frontier AI models are built from this process knowledge, and it is a closely guarded secret. When leading AI labs disclose bits of that knowledge, the disclosure is always limited, tactical, or contingent: a move in a status game, a credibility signal, or an attempt to capture mindshare. Once these labs achieve those goals, they taper the transparency, disclosing less and less until they tell us almost nothing about how they got their results.
Unfortunately, this diminishing transparency makes it harder and harder for non-frontier labs, academia, and the broader public to understand how frontier AI is actually built or to contest the choices being made on their behalf. It is relatively easy for researchers to download web data and a training toolkit and train a few-billion-parameter model. But without the process knowledge behind frontier systems, researchers’ efforts are unlikely to yield conclusions that apply at any relevant scale.
What is needed, then, is an organization whose purpose is the discovery and promulgation of the process knowledge needed to create frontier-level artificial intelligence, rather than organizations that merely disclose it for publicity or signaling.
But first, if you will permit me, a story.
Knowledge as weapon
In 16th-century Renaissance Italy, mathematical knowledge was rarely shared freely. Instead, it was hoarded, because it could serve as a kind of munition. Scholars kept discoveries secret so they could deploy them in public contests called “disputations,” where two intellectuals would compete to see who was the superior thinker. Victory would mean prestige, university positions, and patronage from wealthy nobles. Defeat could bring professional ruin.
So when Scipione del Ferro, a professor in Bologna, discovered a method for solving one form of the depressed cubic (what we would now write as $x^3+px=q$), he did not publish it. He kept it as both sword and shield, revealing it only near the end of his life to his student Antonio Fior.
Fior, convinced he now possessed a secret weapon, challenged Niccolò Tartaglia to a disputation, a “math duel.” He posed a series of cubic equations, assuming Tartaglia would have no general method for solving them. Instead, Tartaglia derived his own method during the contest and defeated Fior decisively.
Naturally, Tartaglia did not publish his new method. It was now his weapon. He kept it secret for years, revealing it only to another man, Girolamo Cardano, after Cardano swore a solemn oath of secrecy.[1] Cardano later published the method anyway, leading to a dispute between the two men and eventually, perhaps inevitably, another disputation.
The story of the depressed cubic’s solution is usually presented as a kind of farce. It is comical to imagine mathematicians stockpiling their secret knowledge, ready to unsheathe it in a “math duel” at a moment’s notice. It borders on the outright silly.
And yet, for me, this story has an undercurrent of sadness, of waste. The solution to the depressed cubic led to the development of complex numbers. Without this cloak-and-dagger skulking, how much sooner could we have discovered them? How many other ideas were delayed, or lost entirely, to that culture of secrecy?
What is secret in modern AI?
Much of the debate around open-weight releases has focused on what they do and do not reveal. They reveal a model’s architecture and parameters. A technical report may gesture at the data and training recipe. But the most important knowledge usually remains outside the artifact: the process knowledge behind the model, accumulated through thousands of small decisions, experiments, failures, and corrections.
The process knowledge of frontier AI consists of many parts: data sourcing, curation, and preprocessing; data mixtures and curricula; scaling experiments; optimizer and initialization choices; and kernels and systems tricks. It also consists of what doesn’t work: methods that don’t scale, or aren’t stable enough, or otherwise exhibit unacceptable pathologies.
Perhaps most important is the meta-process knowledge, the techniques for making these choices: how you should design experiments to assess data quality; which key evaluations and metrics to track; how to determine whether a new model or technique is worth scaling. These techniques take the form of quality gates, scaling suites and ablation protocols, and the like.
This is the real “secret sauce” of modern AI labs. And it is precisely this process knowledge that rarely, if ever, enters the scientific record.
Transparency as tactic
Instead, when an AI lab releases something other than a new API endpoint, it is most often a set of model weights. These open-weight (often erroneously called “open source”) releases don’t reveal process knowledge, except very indirectly. Today, when an established lab releases an open-weight model, there is usually a press release with evaluation numbers and some vague gestures at technical details, and that is typically all we get. Google’s Gemma 4, OpenAI’s GPT-OSS, Meta’s Llama 4, and Alibaba’s more recent Qwen models all fit this mold.
Some commercial labs do release more detailed reports. In fact, the exceptions to this rule are instructive. Generally speaking, it is newer labs—or labs looking to regain the spotlight—that release detailed reports of how they built their systems. Recently, among American efforts, Arcee, Cursor, Nvidia, and Microsoft’s MAI have all released detailed reports. Many Chinese and European labs have also released reports.
However, for companies, this transparency is tactical. The goal is to generate attention, excitement, funding, and above all, credibility. Like the challenger Fior, up-and-coming labs put their knowledge on display to project power and to claim their place among the “real” frontier labs. Of course, the ritual is different from that of Renaissance Italy. Rather than challenge and disputation, we get the press release, the LaTeX tech report, and the accompanying social media fanfare. But the underlying motivations are the same.
As those goals are achieved, subsequent releases from those same labs contain fewer and fewer details. Witness the decline of substance in the tech reports from OpenAI for its GPT series. (I will resist making the joke.) Meta’s Llama series follows this decline as well: Llama 1 and Llama 2 have lots of details, Llama 3 fewer, and Llama 4 no tech report at all. Now even Alibaba’s Qwen has seemingly followed the same path. The more credibility and reputation (or sometimes just capital) an organization has, the less it needs to disclose. “Open weights labs” have repeatedly proven to be only contingently open.
This is why disclosure always stops short of the full details. The training system’s software architecture is described in just enough detail to establish credibility. We learn which open-source kernels are used, but (usually) not the details of the ones developed in-house.[2] We learn that X% of the pretraining data is code, but we do not know how the code is presented to the model: is it a single file at a time, an entire repo, the commit history as a sequence of diffs?
We do not know and, from the company’s perspective, we do not need to. This is correct: businesses manufacture knowledge for business ends. If transparency no longer advances those ends, the disclosure stops.
Academia’s role
But academia's ends are different. Academia’s purpose is the creation and dissemination of knowledge. However, academia’s role in the development of artificial intelligence is under threat. As we have seen, the manufacture of knowledge increasingly happens inside closed AI labs.
Some of this is necessary, since the cost of developing frontier-level AI is prohibitive for typical academic budgets. But academia should still be able to contribute new ideas and techniques, just as it has in so many commercialized technical domains, as when UC Berkeley began developing the RISC-V instruction set architecture in 2010.
How can we expect researchers to develop useful optimizers or scalable architectures if the protocol for a well-behaved scaling suite (which many frontier labs have developed) is a secret? How can academia contribute to the development of safe AI, be it filtering for pretraining or alignment at posttraining, if the recipes it can study are not the ones frontier labs actually use?
Open development and process knowledge
So if process knowledge is the secret sauce of frontier AI, and if smaller actors like academia are to have a role in its creation, then what is needed is an organization whose strategic objective is to turn frontier AI process knowledge into a public good: an organization dedicated to open development.
Marin is our attempt to build such an institution. While we of course aim to build and release high-quality open models, Marin’s primary goal is to make the process of developing frontier AI legible, reusable, and public.
A key part of this effort is the creation of reliable scaling suites like Delphi. Delphi consists of models of increasing size, trained according to a consistent recipe that makes performance predictable across scale. Using this recipe, we were able to accurately predict the performance of the largest model from runs trained with 300x less compute.
Scaling suites like Delphi chart a path toward a world where researchers can test new ideas at small scale and have some confidence that they will continue to work at industry-relevant scales.
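To make “predictable across scale” concrete, here is a minimal sketch of the general idea, not Delphi's actual recipe. It fits a pure power law, loss ≈ a · C^(−b), to a handful of small runs in log-log space, then extrapolates to a run with 300x the compute of the largest one. Everything in it is an assumption for illustration: the function names, the compute and loss values (synthesized from a made-up law with small fixed perturbations), and the choice of a power law with no irreducible-loss term.

```python
import math

def fit_power_law(compute, loss):
    """Fit loss ~ a * compute**(-b) by ordinary least squares in log-log space."""
    xs = [math.log(c) for c in compute]
    ys = [math.log(l) for l in loss]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # log(loss) = intercept + slope * log(compute), so a = e^intercept, b = -slope
    return math.exp(intercept), -slope

def predict_loss(a, b, compute):
    """Evaluate the fitted power law at a given compute budget."""
    return a * compute ** (-b)

# Hypothetical ground truth, used only to synthesize example data.
TRUE_A, TRUE_B = 20.0, 0.05

# Five made-up small runs (training FLOPs). The fixed multipliers stand in
# for run-to-run noise so the fit is not trivially exact.
small_compute = [1e17, 3e17, 1e18, 3e18, 1e19]
wobble = [1.002, 0.998, 1.001, 0.999, 1.000]
small_loss = [TRUE_A * c ** (-TRUE_B) * w for c, w in zip(small_compute, wobble)]

a, b = fit_power_law(small_compute, small_loss)

# Extrapolate to a run with 300x the compute of the largest small run.
target_compute = 300 * max(small_compute)
predicted = predict_loss(a, b, target_compute)
actual = TRUE_A * target_compute ** (-TRUE_B)
relative_error = abs(predicted - actual) / actual

print(f"fitted exponent b = {b:.4f}")
print(f"predicted loss at 300x compute = {predicted:.4f} "
      f"(relative error {relative_error:.2%})")
```

The extrapolation only works here because the synthetic data really does follow one law at every scale. In practice, much of the process knowledge lies in finding a training recipe for which that holds, which is why a recipe that fails to scale is itself worth documenting.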
Just as importantly, we try to build our tools and methods in the open. Our goals are public from the beginning, and progress is documented as it happens. Even aborted attempts are not quietly buried. Experiment logs, including failures, remain visible. Our Delphi release included a “v1” recipe that failed to scale. Releasing and describing that failure helps researchers avoid repeating the same mistake.
The result is a legible trail of documented process knowledge, as well as the artifacts it produces. We aim to do the same with data curation in pretraining, environment creation and curriculum design in posttraining, and the many smaller pieces of process knowledge that rarely make it into a polished release.
Academia and open development
More broadly, we believe that academic AI should likewise adopt stronger norms around open development. Academia rightly prides itself on the dissemination of knowledge: we publish our ideas, present them at conferences, and share them with the world. But academic practice is still largely organized around disclosure at publication time. Results and methodology are often withheld until a paper is accepted or posted on arXiv. Experimental artifacts may be held back even longer to preserve follow-on work. Negative results often never see the light of day, only to be rediscovered again and again.
Instead, code should be developed in the open where possible, rather than made public only at release time (if at all). Replications, scaling protocols, evaluation suites, and negative results should be treated as meaningful scientific outputs. Failures should be acknowledged as part of the research process, not hidden from it.
Some constraints are real. Data may need to remain private for licensing, privacy, or security reasons. Some projects cannot be conducted fully in public. But where openness is possible, it should be the default. And where full transparency is impossible, careful public documentation should still be the norm.
Scooping is a real concern. But academia has strong norms around plagiarism and priority. Open development, if anything, creates a clearer public record of both. More importantly, in an environment of uncertain and often insufficient funding, it is wasteful for hard-won knowledge to remain locked inside private lab notebooks, repos, or Slack threads.
Frontier AI is built from process knowledge: the accumulated body of experimental technique, evaluation discipline, systems judgment, and hard-won operational understanding that makes large-scale model development possible. If that knowledge remains largely private, then academia cannot meaningfully contribute to the field’s development.
To avoid that outcome, more of this process knowledge has to enter the scientific record. Not every detail can be public. But the default should move toward legibility, reproducibility, and public accumulation, rather than private inheritance.
[2] DeepSeek is an important exception, but one that proves the rule. The kernel releases after DeepSeek-V3 helped answer skepticism about the company’s cost-efficiency claims and established its systems credibility. The later releases around V4 look even more tactical: they make third-party inference providers better at serving DeepSeek’s architecture, thereby increasing adoption while weakening the API and serving moats of closed-weight labs.
Cite this post
@misc{hall2026_open_development_of_frontier_ai,
author = {Hall, David},
title = {Open Development of Frontier AI},
year = {2026},
month = {jun},
howpublished = {\url{https://www.openathena.ai/blog/open-development-of-frontier-ai/}},
note = {Open Athena Blog}
}