Distributed Training is the Future

The current state of AI should alarm you. Steep capital expenditures and technical requirements for foundation model training have excluded all but the largest companies (OpenAI, Google, Meta, etc.) from participating. These centralized entities, under the guise of safety and security, are attempting to use regulatory capture to establish moats and prevent competition. Blockchain offers a way out: blockchain technology can coordinate a global network of GPUs, powered by crypto-economic incentives, to enable decentralized training runs. Distributed training will reduce the capital expenditure required for training runs by orders of magnitude. This will democratize both the training process and model ownership: distributed training will allow for a freer AI market and counter the stifling regulatory environment that large private corporations are trying to force.

Excitement around distributed training has roared to life in recent months. Technical “requirements” for LLM training, like the need for high-throughput bandwidth and co-located GPUs, have been proven false, which is promising for a democratized training future. DisTrO (Nous), OpenDiLoCo (Prime Intellect), and lo-fi have proven that distributed training can be done at scale. As the crypto markets tire from rapid meme rotations and lack of utility, distributed training offers an escape from speculative fatigue and a return to blockchain-driven, real-world solutions. Most importantly, crypto-economic incentives and peer-to-peer coordination can act as a catalyst to propel distributed training outcomes past their centralized analogs.

Sector Leaders

@PrimeIntellect and @NousResearch have both recently released papers validating the feasibility of distributed training. Both teams are launching protocols that will allow for widespread participation; the stage is set for these projects to capture significant market share in the crypto AI landscape. @gensynai and @PluralisHQ are both building in this space as well, and will have breakthrough attention-capturing launches this year. These projects each have unique approaches, and understanding each of them is a prerequisite for allocating capital efficiently in what will be a top sector in 2025.

Prime Intellect

Prime Intellect’s OpenDiLoCo, built on Google’s DiLoCo, showed that training can be done with minimal communication between nodes. Rather than synchronizing gradients after every batch, nodes synchronize only once every few hundred batches. OpenDiLoCo brings a number of unlocks. Most importantly, models can now be trained in environments where bandwidth is scarce, i.e., distributed environments. This breakthrough means participants no longer need to source all their GPUs from the same datacenter; instead, they can aggregate GPUs from different locations, significantly lowering costs. Additionally, models can be trained across different GPUs — each island’s hardware must be homogeneous, but islands can differ from one another. Islands can also drop in and out of the training process with minimal loss, since gradients are synced amongst islands at checkpoints. Prime Intellect has proven the viability of this approach, training in a cost-effective, distributed way (as highlighted in this post).
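The pattern above can be sketched in a few lines: each island runs many local optimizer steps on its own shard, and only the resulting parameter deltas (the “outer gradients”) are averaged at a checkpoint. This is a toy numpy illustration of the DiLoCo idea, with a made-up quadratic loss and learning rates, not Prime Intellect’s actual implementation.

```python
import numpy as np

def local_steps(params, shard, lr=0.1, H=100):
    """Run H local SGD steps on a toy quadratic loss 0.5 * (w - shard)^2."""
    w = params.copy()
    for _ in range(H):
        grad = w - shard          # gradient of the toy loss
        w -= lr * grad
    return w

def diloco_round(global_params, shards, outer_lr=0.7, H=100):
    """One outer round: every island trains locally for H steps, then only
    the parameter deltas are communicated and averaged -- one sync per H
    batches instead of one sync per batch."""
    deltas = []
    for shard in shards:
        w = local_steps(global_params, shard, H=H)
        deltas.append(global_params - w)       # "outer gradient"
    outer_grad = np.mean(deltas, axis=0)
    return global_params - outer_lr * outer_grad

w = np.array([0.0])
shards = [np.array([1.0]), np.array([3.0])]    # two islands, different data
for _ in range(10):
    w = diloco_round(w, shards)
# w converges toward the mean of the shards (2.0)
```

The key property is that communication cost is amortized over H local steps, which is why islands on ordinary internet links (and islands that drop in and out between checkpoints) become viable.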

Prime Intellect’s compute aggregator and training implementation cover the compute and training layers of the AI stack, but they are setting their sights on capturing the application layer as well. With the launch of an L1, they aim to create an ecosystem that captures the entirety of the decentralized AI stack. By building a full stack developer experience that provides training, models, and compute to prospective AI application builders, they are very well positioned to attract top talent. From there, a defensible moat could emerge, as libraries and tooling that work directly with the Prime Intellect stack produce strong network effects.

Nous Research

Nous announced the launch of Psyche, a crypto-incentivized protocol that will leverage Solana to orchestrate distributed training of LLMs. They use DisTrO as the algorithmic backbone of this architecture. DisTrO implements a novel optimizer that drastically reduces the need for communication between nodes during training; they’ve already completed training of a 15B parameter model via this method. Bandwidth requirements were reduced by up to 3000x during pretraining without significant degradation in loss. DisTrO enables democratized training opportunities, as training is now possible over standard internet bandwidth.
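DisTrO’s optimizer internals aren’t fully public, so the sketch below illustrates the general bandwidth-saving idea with a generic stand-in, top-k gradient sparsification: transmit only the largest-magnitude entries of an update instead of the dense tensor. The sizes and ratio here are illustrative, not DisTrO’s reported numbers.

```python
import numpy as np

def topk_compress(grad, k):
    """Keep only the k largest-magnitude entries; send (indices, values).
    Generic gradient sparsification -- a stand-in, not DisTrO's method."""
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    return idx, grad[idx]

def decompress(idx, vals, size):
    """Rebuild a dense update on the receiving node."""
    out = np.zeros(size, dtype=vals.dtype)
    out[idx] = vals
    return out

rng = np.random.default_rng(0)
grad = rng.normal(size=1_000_000).astype(np.float32)  # toy 1M-param update
k = len(grad) // 1000                                  # transmit 0.1% of entries
idx, vals = topk_compress(grad, k)

dense_bytes = grad.nbytes                              # 4 MB as float32
sparse_bytes = idx.astype(np.int32).nbytes + vals.nbytes
print(f"compression ratio: {dense_bytes / sparse_bytes:.0f}x")  # prints "compression ratio: 500x"
```

Whatever the exact mechanism, the consequence is the same one the paragraph describes: per-step communication shrinks from gigabytes to something a home connection can carry.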

Nous will continue its research with Psyche, pushing the boundaries of training and inference to enable widespread participation. If Psyche proves successful at scale, Nous will break through the current limitations of datacenters, eliminating compute constraints. This would enable larger training runs and the creation of models of unprecedented size. Importantly, it will democratize both coordination and participation in training, allowing less-capitalized builders to train unique models with the help of the community, powered by crypto rewards.

Gensyn

Gensyn is building a decentralized machine learning compute protocol that connects devices worldwide, including GPUs, processors in phones, and personal computers. The goal is to create a trustless, peer-to-peer network where individuals can rent out their unused compute power to others who need it for training machine learning models, cutting out centralized cloud services like AWS. Gensyn has emphasized the criticality of verification in training and has created a proof mechanism to enable a trustless training environment. Without verification, model training lacks transparency, which allows the issues we’ve seen in centralized model providers—such as censorship and response curation—to persist.
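A naive way to see why verification matters: if a training step is deterministic, a verifier can replay it from the same inputs and compare a hash of the resulting weights against the worker’s published commitment. The toy replay check below is only an illustration of that principle; Gensyn’s actual proof mechanism is designed to avoid this kind of full recomputation.

```python
import hashlib
import numpy as np

def train_step(w, batch, lr=0.01):
    """One deterministic SGD step on a toy squared-error loss."""
    grad = 2 * (w - batch.mean())
    return w - lr * grad

def commit(w):
    """Hash the weights so a claimed result can be checked cheaply."""
    return hashlib.sha256(w.tobytes()).hexdigest()

# Worker performs the step and publishes a commitment to the result.
w0 = np.ones(4)
batch = np.array([0.5, 1.5, 2.0, 0.0])
claimed = commit(train_step(w0, batch))

# Verifier replays the same step from the same inputs and checks the hash.
assert commit(train_step(w0, batch)) == claimed
```

Full replay costs as much as the original work, which is why practical schemes spot-check or prove steps probabilistically; but the prerequisites are the same: deterministic computation and binding commitments to intermediate states.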

They seek to enable a decentralized, constantly updating digital twin of the world, with machines curating and interacting with knowledge programmatically, transforming the way humanity interacts with machine learning. More concretely, this allows models to constantly update when receiving environmental stimuli, which results in AI that can learn side-by-side with humans and respond to information in real time. Not only will your model constantly adapt to what you interact with, but it will also be updated with information from the interactions others have with their environments, creating a constantly improving digital assistant.

Pluralis

Pluralis’s approach is underpinned by the belief that it is economically unviable for foundation models to be open-sourced in perpetuity. The commoditization of models seems inevitable, as switching costs for users are quite low and models, at this point, are easily replaceable on the backend. By implementing fractional model ownership, Pluralis will incentivize crowdsourced foundation model training where no entity can extract a model’s weights. They’re implementing a verification mechanism dubbed Proof of Learning, a system that lets the trainer verify that the model’s final parameters came from the stated training process.

The final state of Pluralis will be the production of a closed-source yet community-owned model that directly rewards users for participation, hopefully fortifying a unique moat (created by training on private user data) that leads to value accrual. Training verification is undoubtedly important; further research in this area is necessary across the entirety of the decentralized training landscape to help prevent nefarious actors from ruining training runs.

These distributed training protocols each attack the problem in a slightly different way. Prime Intellect aims to capture the entire AI stack, using a blockchain to facilitate compute access, coordinate training, and house AI applications. Nous’s Psyche will leverage Solana to coordinate training runs, pushing the boundaries of compute constraints and serving inference needs. Gensyn seeks to create a mesh of edge devices that, through training, can house models that are constantly updated and improved, allowing for performant interaction with our world. Lastly, Pluralis attempts to create a moat around foundation models by incentivizing participation through fractional model ownership, driving value accrual to users via the foundation model itself. While these approaches will be met with differing results, we believe the sector as a whole will outcompete centralized training because of a few key mechanisms. Improvements in distributed training techniques, inference efficiency (cf. DeepSeek, 1-bit LLMs), and edge hardware unlock AI that is far more impactful and versatile than what currently exists. The combination of these mechanisms grants unique access to organic data and to unique inputs (image, sound, speech, other sensory stimuli). Imagine a world in which our edge devices constantly ingest and respond to environmental stimuli on behalf of a user, constantly improving with new information. Continuous, distributed training, together with improvements in edge hardware and inference techniques, enables such a world, and should be the north star for deAI.

As these protocols redefine the boundaries of what is possible, there will be a massive rotation of both attention and capital into distributed training solutions. Because AI is also the most interesting part of traditional tech, all of crypto will benefit from the frontier R&D that these teams are working on. There are two places where this will manifest most directly: protocols that capture training data, and protocols that aggregate and coordinate compute (both datacenter compute and edge devices). The biggest beneficiaries will be DePIN networks that have positioned themselves to fulfill critical pieces of the AI stack; we have defined this in previous writing.

This is the year of distributed training. DeepSeek has proven that open source can compete with closed source; crypto training protocols will prove that distributed training can compete with centralized training. DePIN networks stand to gain from the success and attention that distributed training will garner, particularly in the compute and data space. Aggregating edge compute and capturing unique user data through crypto-economic incentives creates advantages for distributed solutions that will lead to outperformance versus traditional training practices.