Author: 0xjacobzhao and ChatGPT 4o
Special thanks to Advait Jayant (Peri Labs), Sven Wellmann (Polychain Capital), Chao (Metropolis DAO), Jiahao (Flock), Alexander Long (Pluralis Research), and Ben Fielding & Jeff Amico (Gensyn) for their advice and feedback.
In the entire AI value chain, model training is the most resource-intensive and technically demanding stage, and it directly determines the ceiling of a model's capabilities and its real-world effectiveness. Compared with the lightweight calls of the inference stage, training requires sustained large-scale compute, complex data-processing pipelines, and high-intensity optimization algorithms; it is the true "heavy industry" of building AI systems. In terms of architectural paradigm, training methods fall into four categories: centralized training, distributed training, federated learning, and decentralized training, which is the focus of this article.
Centralized training is the most common traditional approach: a single organization completes the entire training process inside a local high-performance cluster. Every component, from hardware (e.g., NVIDIA GPUs), underlying software (CUDA, cuDNN), and cluster scheduling (e.g., Kubernetes) to the training framework (e.g., PyTorch with the NCCL backend), is coordinated by a unified control system. This deeply integrated architecture maximizes the efficiency of memory sharing, gradient synchronization, and fault tolerance, making it well suited to training large-scale models such as GPT and Gemini. It offers high efficiency and controllable resources, but it also suffers from data monopolies, resource barriers, energy consumption, and single-point-of-failure risk.
Distributed training is the mainstream approach for training large models today. Its core idea is to decompose the training task and distribute it to many machines for collaborative execution, breaking through single-machine limits on compute and storage. Although it is "distributed" in the physical sense, overall scheduling and synchronization are still controlled by a centralized organization: it usually runs inside a high-speed LAN, with a master node coordinating all subtasks over high-speed interconnects such as NVLink. Mainstream methods include:
- Data parallel: each node trains on a different shard of data while holding a full copy of the model weights, and gradients are synchronized across nodes (see the sketch after this list);
- Model parallel: different parts of the model are deployed on different nodes, giving strong scalability;
- Pipeline parallel: execution proceeds serially in stages, improving throughput;
- Tensor parallel: matrix computations are partitioned at a finer granularity, increasing parallel granularity.
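To make the data-parallel case concrete, here is a minimal sketch of the gradient synchronization step that centralized clusters accelerate over NVLink. It assumes PyTorch with `torch.distributed` already initialized (e.g., `init_process_group` with the NCCL backend); `model`, `batch`, `loss_fn`, and `optimizer` are placeholders.

```python
import torch
import torch.distributed as dist

def data_parallel_step(model, batch, loss_fn, optimizer):
    """One step of data parallelism: every rank holds a full model replica,
    trains on its own data shard, and all-reduces gradients so replicas
    stay identical after optimizer.step()."""
    inputs, targets = batch
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    world = dist.get_world_size()
    for p in model.parameters():
        if p.grad is not None:
            # The synchronization step that NCCL/NVLink accelerates.
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
            p.grad /= world  # average gradients across workers
    optimizer.step()
    return loss.item()
```

In decentralized settings, it is exactly this synchronous all-reduce that becomes the bottleneck, which motivates the asynchronous and sparse alternatives discussed later in this article.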
Distributed training is thus a combination of "centralized control + distributed execution", like one boss remotely directing employees in multiple "offices" to complete a task together. Almost all mainstream large models today (GPT-4, Gemini, LLaMA, etc.) are trained this way.
Decentralized training represents a more open and censorship-resistant future path. Its defining feature is that multiple mutually untrusting nodes (home computers, cloud GPUs, or edge devices) collaborate on training tasks without a central coordinator, usually with protocol-driven task distribution and collaboration, and with cryptographic incentive mechanisms ensuring honest contribution. The main challenges of this model include:
- Device heterogeneity and task-splitting difficulty: heterogeneous devices are hard to coordinate, and task partitioning is inefficient;
- Communication bottlenecks: network communication is unstable, and gradient synchronization becomes an obvious bottleneck;
- Lack of trusted execution: without a trusted execution environment, it is hard to verify whether a node actually performed the computation;
- Lack of unified coordination: with no central scheduler, task distribution and failure-rollback mechanisms become complex.
Decentralized training can be pictured as a global group of volunteers, each contributing compute to train a model collaboratively. However, "truly feasible large-scale decentralized training" remains a systems-level engineering challenge spanning system architecture, communication protocols, cryptographic security, economic mechanisms, and model verification; whether "effective collaboration + honest incentives + correct results" can be achieved simultaneously is still at the early prototype stage.
Federated learning is a transitional form between distributed and decentralized training. It keeps data local while aggregating model parameters centrally, and suits privacy-sensitive scenarios such as healthcare and finance. Federated learning combines the engineering structure and local-coordination capability of distributed training with the data-dispersion advantage of decentralized training, but it still depends on a trusted coordinator and is neither fully open nor censorship-resistant. It can be seen as a "controlled decentralization" solution for privacy-compliance scenarios: relatively mild in its training tasks, trust structure, and communication mechanisms, it is better suited as a transitional deployment architecture in industry.
AI training paradigm panorama comparison table (technical architecture × trust incentive × application characteristics)
The boundaries, opportunities and realistic paths of decentralized training
From the perspective of training paradigms, decentralized training is not suitable for every type of task. In some scenarios, complex task structure, extremely high resource demands, or difficulty of collaboration make it inherently ill-suited to heterogeneous, trustless nodes. For example, large-model training typically relies on large VRAM, low latency, and high bandwidth, which are hard to split and synchronize over an open network; tasks constrained by strong data privacy and sovereignty requirements (healthcare, finance, classified data) are blocked by legal compliance and ethical constraints from being openly shared; and tasks lacking any collaborative incentive (such as closed-source corporate models or internal prototypes) offer no external motivation to participate. Together, these boundaries constitute the current practical limits of decentralized training.
But this does not make decentralized training a false proposition. In fact, for tasks that are lightweight, easy to parallelize, and incentive-compatible, decentralized training shows clear application prospects: LoRA fine-tuning, behavioral-alignment post-training tasks (such as RLHF and DPO), data crowdsourcing and labeling, training of small resource-controlled base models, and collaborative training with edge devices. These tasks generally feature high parallelism, low coupling, and tolerance of heterogeneous compute, making them well suited to collaborative training via P2P networks, swarm protocols, distributed optimizers, and similar mechanisms.
Decentralized training task suitability overview table
Analysis of classic decentralized training projects
At present, the representative blockchain projects at the frontier of decentralized training and federated learning include Prime Intellect, Pluralis.ai, Gensyn, Nous Research, and Flock.io. In terms of technical originality and engineering difficulty, Prime Intellect, Nous Research, and Pluralis.ai have proposed the most original explorations in system architecture and algorithm design, representing the frontier of current theoretical research, while Gensyn and Flock.io follow relatively clear implementation paths with visible early engineering progress. This article analyzes the core technologies and engineering architectures behind these five projects in turn, and further explores their differences and complementarities within a decentralized AI training system.
Prime Intellect: A pioneer in collaborative reinforcement learning networks with verifiable training trajectories
Prime Intellect aims to build a trustless AI training network in which anyone can participate in training and receive credible rewards for their compute contributions. Through three modules, PRIME-RL + TOPLOC + SHARDCAST, Prime Intellect hopes to build a verifiable, open, fully incentivized decentralized AI training system.
1. Prime Intellect protocol stack structure and key module value
2. Detailed explanation of the key mechanisms of Prime Intellect training
PRIME-RL: Decoupled Asynchronous Reinforcement Learning Task Architecture
PRIME-RL is a task modeling and execution framework that Prime Intellect customized for decentralized training scenarios, designed for heterogeneous networks and asynchronous participation. It takes reinforcement learning as its priority adaptation target and structurally decouples the training, inference, and weight-upload processes, so that each training node can complete a full task cycle locally and independently, coordinating with verification and aggregation mechanisms through standardized interfaces. Compared with traditional supervised-learning pipelines, PRIME-RL is better suited to elastic training under decentralized scheduling: it reduces system complexity and lays the groundwork for multi-task parallelism and policy evolution.
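As an illustration of this decoupling, a node's task cycle might look like the sketch below. This is not Prime Intellect's actual API: `env`, `local_optimizer`, and `hub` are hypothetical stand-ins; only the three-phase structure (inference, training, weight upload) comes from the description above.

```python
# Illustrative schematic only; all interfaces are hypothetical stand-ins.
def training_node_loop(policy, env, local_optimizer, hub, rollouts=512):
    while True:
        # 1) Inference: generate rollouts locally with the current policy.
        trajectory = [env.step(policy) for _ in range(rollouts)]
        # 2) Training: update the policy locally, with no global sync.
        local_optimizer.update(policy, trajectory)
        # 3) Upload: submit the new weights together with the observed
        #    trajectory, so verifiers (e.g., TOPLOC) can audit the work.
        hub.submit(weights=policy.state_dict(), trace=trajectory)
```

The point of the structure is that each phase can run at its own pace on each node, which is what makes asynchronous participation possible.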
TOPLOC: A lightweight training behavior verification mechanism
TOPLOC (Trusted Observation & Policy-Locality Check) is Prime Intellect's core training-verifiability mechanism, used to determine whether a node has actually performed effective policy learning from the observation data. Unlike heavyweight solutions such as ZKML, TOPLOC does not rely on full model recomputation; instead, it completes lightweight structural verification by analyzing the local consistency trajectory between "observation sequence ↔ policy update". For the first time, it turns the behavioral trajectories of the training process into verifiable objects: a key innovation for trustless distribution of training rewards, and a feasible path toward an auditable, incentivized decentralized collaborative training network.
SHARDCAST: Asynchronous Weight Aggregation and Propagation Protocol
SHARDCAST is a weight propagation and aggregation protocol designed by Prime Intellect, optimized for real network environments that are asynchronous, bandwidth-constrained, and subject to changing node states. It combines a gossip propagation mechanism with local synchronization strategies, allowing multiple nodes to continuously submit partial updates while out of sync, achieving progressive convergence and multi-version evolution of the weights. Compared with centralized or synchronous AllReduce approaches, SHARDCAST significantly improves the scalability and fault tolerance of decentralized training, and is the core foundation for stable weight consensus and continuous training iteration.
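A generic sketch of the underlying idea follows. It assumes a simple staleness-discounted averaging rule and is illustrative only; SHARDCAST's real gossip and aggregation logic is not public at this level of detail, and every name here is hypothetical.

```python
import copy

class AsyncAggregator:
    """Toy asynchronous weight aggregation: nodes submit partial updates
    whenever ready, and updates computed against older checkpoint versions
    are discounted by their staleness. Not SHARDCAST's actual protocol."""

    def __init__(self, init_weights, decay=0.5):
        self.version = 0
        self.weights = copy.deepcopy(init_weights)  # dict: name -> array
        self.decay = decay

    def submit(self, update, base_version):
        staleness = self.version - base_version
        alpha = self.decay ** max(staleness, 0)  # discount stale updates
        for name, w in self.weights.items():
            self.weights[name] = (1 - alpha) * w + alpha * update[name]
        self.version += 1  # each accepted update creates a new version
        return self.version
```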
OpenDiLoCo: A framework for sparse asynchronous communication
OpenDiLoCo is a communication optimization framework independently implemented and open-sourced by the Prime Intellect team, based on the DiLoCo concept proposed by DeepMind. It is designed for the bandwidth constraints, device heterogeneity, and node instability common in decentralized training. Its architecture is based on data parallelism: by building sparse topologies such as Ring, Expander, and Small-World, it avoids the high communication overhead of global synchronization and completes collaborative model training relying only on local neighbor nodes. Combined with asynchronous updates and checkpoint-based fault tolerance, OpenDiLoCo enables consumer-grade GPUs and edge devices to participate stably in training tasks, greatly broadening participation in global collaborative training; it is one of the key communication infrastructures for building decentralized training networks.
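The DiLoCo pattern that OpenDiLoCo builds on is published: each worker takes H local optimizer steps, then only the resulting parameter delta (a "pseudo-gradient") crosses the network, where an outer Nesterov-momentum step folds the averaged delta into the shared weights. A compact sketch (the `worker` objects and their methods are hypothetical stand-ins; the inner/outer split is the published algorithm):

```python
import torch

def diloco_round(global_params, workers, H, outer_state,
                 outer_lr=0.7, momentum=0.9):
    """One DiLoCo outer round: H communication-free local steps per worker,
    then a single exchange of parameter deltas."""
    deltas = []
    for w in workers:                       # in reality: in parallel, async
        w.load(global_params)
        for _ in range(H):                  # H local steps, zero communication
            w.inner_step()                  # e.g., AdamW on the local shard
        deltas.append({k: global_params[k] - v
                       for k, v in w.params().items()})
    for k in global_params:
        # Average the pseudo-gradients across workers.
        g = sum(d[k] for d in deltas) / len(deltas)
        # Outer SGD step with Nesterov momentum.
        buf = momentum * outer_state.get(k, torch.zeros_like(g)) + g
        outer_state[k] = buf
        global_params[k] = global_params[k] - outer_lr * (g + momentum * buf)
    return global_params
```

Because communication happens once per H steps instead of once per step, bandwidth requirements drop by roughly a factor of H, which is what makes consumer-grade connections viable.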
PCCL: Collaborative Communication Library
PCCL (Prime Collective Communication Library) is a lightweight communication library tailored by Prime Intellect for decentralized AI training environments, aiming to overcome the adaptation bottleneck of traditional communication libraries (such as NCCL and Gloo) on heterogeneous devices and low-bandwidth networks. PCCL supports sparse topologies, gradient compression, low-precision synchronization, and checkpoint recovery, and can run on consumer-grade GPUs and unstable nodes. It is the underlying component that supports the asynchronous communication capabilities of the OpenDiLoCo protocol: it significantly improves the bandwidth tolerance and device compatibility of the training network, and lays the "last mile" of communication infrastructure for a truly open, trustless collaborative training network.
3. Prime Intellect Incentive Network and Role Division
Prime Intellect has built a permissionless, verifiable, and economically incentivized training network that enables anyone to participate in tasks and be rewarded based on real contributions. The protocol operates based on three core roles:
- Task initiators: define the training environment, initial model, reward function, and verification criteria;
- Training nodes: perform local training and submit weight updates and observation trajectories;
- Verification nodes: use the TOPLOC mechanism to verify the authenticity of training behavior and participate in reward calculation and strategy aggregation.
The core process of the protocol includes task release, node training, trajectory verification, weight aggregation (SHARDCAST) and reward distribution, forming an incentive closed loop around "real training behavior".
4. INTELLECT-2: The Release of the First Verifiable Decentralized Training Model
In May 2025, Prime Intellect released INTELLECT-2, the world's first large reinforcement-learning model trained by asynchronous, trustless, decentralized nodes, at a parameter scale of 32B. INTELLECT-2 was trained by more than 100 heterogeneous GPU nodes spread across three continents, using a fully asynchronous architecture, with a training run of over 400 hours, demonstrating the feasibility and stability of asynchronous collaborative networks. The model is not only a performance breakthrough but also the first systematic implementation of Prime Intellect's "training as consensus" paradigm. INTELLECT-2 integrates the core protocol modules PRIME-RL (asynchronous training structure), TOPLOC (training-behavior verification), and SHARDCAST (asynchronous weight aggregation), marking the first time a decentralized training network has achieved openness, verifiability, and a closed economic-incentive loop in the training process itself.
In terms of performance, INTELLECT-2 is trained on top of QwQ-32B with dedicated RL training on code and mathematics, placing it at the frontier of current open-source RL fine-tuned models. While it has not yet surpassed closed-source models such as GPT-4 or Gemini, its real significance lies elsewhere: it is the world's first decentralized model experiment whose complete training process is reproducible, verifiable, and auditable. Prime Intellect open-sourced not just the model but, more importantly, the training process itself: the training data, policy-update trajectories, verification procedures, and aggregation logic are all transparent and traceable, establishing a prototype of a decentralized training network in which everyone can participate, collaborate credibly, and share the rewards.
5. Team and Financing Background
Prime Intellect completed a $15 million seed round of financing in February 2025, led by Founders Fund, with participation from industry leaders such as Menlo Ventures, Andrej Karpathy, Clem Delangue, Dylan Patel, Balaji Srinivasan, Emad Mostaque, and Sandeep Nailwal. Prior to this, the project completed a $5.5 million early round of financing in April 2024, led by CoinFund and Distributed Global, with participation from Compound VC, Collab + Currency, and Protocol Labs. To date, Prime Intellect has raised more than $20 million in total.
Prime Intellect was co-founded by Vincent Weisser and Johannes Hagemann. Team members have backgrounds spanning AI and Web3, with core members drawn from Meta AI, Google Research, OpenAI, Flashbots, Stability AI, and the Ethereum Foundation. The team has deep capabilities in system-architecture design and distributed-engineering implementation, and is one of the very few teams to have successfully completed a real decentralized large-model training run.
Pluralis: A paradigm explorer for asynchronous model parallelism and structure compression collaborative training
Pluralis is a Web3 AI project focused on "trustworthy collaborative training networks". Its core goal is to promote a decentralized, open-participation training paradigm with long-term incentives. Departing from today's mainstream centralized or closed training paths, Pluralis proposes a new concept called Protocol Learning: "protocolizing" the model-training process, and building an open training system with an intrinsic incentive loop through verifiable collaboration mechanisms and the mapping of model ownership.
1. Core Concept: Protocol Learning
Protocol Learning proposed by Pluralis consists of three key pillars:
- Unmaterializable Models: model weights are distributed as fragments across multiple nodes, so no single node can reconstruct the complete weights, and the model stays closed-source. This design makes the model a natural "in-protocol asset", enabling access-credential control, leakage protection, and the binding of revenue attribution.
- Model-parallel Training over the Internet: through an asynchronous pipeline model-parallel mechanism (the SWARM architecture), each node holds only part of the weights and collaborates over low-bandwidth networks to complete training or inference.
- Partial Ownership for Incentives: all participating nodes obtain partial ownership of the model according to their training contributions, entitling them to future revenue sharing and protocol governance rights.
2. Technical Architecture of Pluralis Protocol Stack
3. Detailed explanation of key technical mechanisms
Unmaterializable Models
In "A Third Path: Protocol Learning", Pluralis first proposed distributing model weights as fragments, so that "model assets" can only run within the Swarm network and their access and revenue are governed by the protocol. This mechanism is the prerequisite for a sustainable incentive structure in decentralized training.
Asynchronous Model-Parallel Training
In "SWARM Parallel with Asynchronous Updates", Pluralis built a pipeline-based asynchronous model-parallel architecture and demonstrated it for the first time on LLaMA-3. Its core innovation is the introduction of the Nesterov Accelerated Gradient (NAG) mechanism, which corrects the gradient drift and convergence instability of asynchronous updates, making training across heterogeneous devices practical in low-bandwidth environments.
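For reference, the NAG rule itself, in textbook form (not Pluralis' exact implementation): the gradient is evaluated at a momentum "lookahead" point rather than at the current parameters, which is what provides the corrective effect when updates arrive stale.

```python
def nag_step(params, velocity, grad_fn, lr=0.01, mu=0.9):
    """One Nesterov Accelerated Gradient step over dicts of arrays/tensors.
    grad_fn returns gradients evaluated at the given parameter point."""
    # Evaluate the gradient at the lookahead point params + mu * velocity.
    lookahead = {k: p + mu * velocity[k] for k, p in params.items()}
    grads = grad_fn(lookahead)  # in the async setting, these may be stale
    for k in params:
        velocity[k] = mu * velocity[k] - lr * grads[k]
        params[k] = params[k] + velocity[k]
    return params, velocity
```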
Column-Space Sparsification
In "Beyond Top-K", Pluralis proposes replacing traditional Top-K with a structure-aware column-space compression method that avoids destroying semantic paths. The mechanism balances model accuracy against communication efficiency: its tests show that over 90% of communication traffic can be compressed away in an asynchronous model-parallel setting, a key breakthrough for structure-aware, efficient communication.
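The contrast between the two compression styles can be shown in a toy example. This is illustrative only: a plain numpy SVD stands in for whatever structure-aware factorization Pluralis actually uses, but it captures why keeping a column subspace preserves structure that element-wise Top-K destroys.

```python
import numpy as np

def topk_compress(M, k):
    """Element-wise Top-K: keep the k largest entries by magnitude.
    Scattered survivors can break coherent structure in M."""
    thresh = np.partition(np.abs(M).ravel(), -k)[-k]
    return np.where(np.abs(M) >= thresh, M, 0.0)

def column_space_compress(M, rank):
    """Structure-aware alternative: keep the dominant column subspace.
    Only the rank-r factors (U, S, Vt slices) need to be transmitted."""
    U, S, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :rank] * S[:rank]) @ Vt[:rank]
```

For a matrix of shape (m, n), the low-rank form transmits r(m + n + 1) numbers instead of mn, which is where compression ratios above 90% come from at small r.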
4. Technology Positioning and Path Selection
Pluralis clearly takes "asynchronous model parallelism" as its core direction, emphasizing the following advantages over data parallelism:
- support for low-bandwidth networks and non-coherent nodes;
- adaptation to device heterogeneity, allowing consumer-grade GPUs to participate;
- natural elastic scheduling, supporting nodes that frequently come online and go offline.
Its three major breakthrough points are structural compression, asynchronous updates, and weight non-extractability.
Judging from the six technical blog posts published on its official website, Pluralis' thinking is organized along three main lines:
- Philosophy and vision: "A Third Path: Protocol Learning" and "Why Decentralized Training Matters"
- Technical mechanisms: "SWARM Parallel", "Beyond Top-K", "Asynchronous Updates"
- Institutional innovation: "Unmaterializable Models" and "Partial Ownership Protocols"
Pluralis has not yet launched a product, testnet, or open-source code, because its chosen technical path is extremely challenging: system-level problems such as the underlying architecture, communication protocols, and weight non-extractability must be solved before product services can be packaged on top.
In a paper published in June 2025, Pluralis Research extended its decentralized training framework from model pre-training to the model fine-tuning stage, supporting asynchronous updates, sparse communication, and partial weight aggregation. Compared with its earlier designs, which emphasized theory and pre-training, this work focuses on deployability, marking further maturation of its full-cycle training architecture.
5. Team and Financing Background
Pluralis completed a $7.6 million seed round in 2025, led by Union Square Ventures (USV) and CoinFund. Founder Alexander Long holds a PhD in machine learning, with a background in both mathematics and systems research, and the core team consists entirely of PhD-level machine-learning researchers. It is a typical technology-driven project, publishing mainly through dense papers and technical blogs; it has not yet built a BD/growth team and remains focused on cracking the infrastructure problem of low-bandwidth asynchronous model parallelism.
Gensyn: A decentralized training protocol layer driven by verifiable execution
Gensyn is a Web3 AI project focused on the "trusted execution of deep-learning training tasks". Its core is not to reinvent model architectures or training paradigms, but to build a verifiable distributed training-execution network covering the full pipeline of "task distribution + training execution + result verification + fair incentives". Through an architecture of off-chain training and on-chain verification, Gensyn establishes an efficient, open, incentivized global training market, making "training as mining" a reality.
1. Project Positioning: Execution Protocol Layer for Training Tasks
Gensyn is not concerned with "how to train", but with the infrastructure of "who trains, how it is verified, and how profits are shared". In essence, it is a verifiable computing protocol for training tasks that solves:
- Who will perform the training task (computing power distribution and dynamic matching)
- How to verify the execution results (no need to recalculate the whole thing, only verify the disputed operators)
- How to distribute training income (Stake, Slashing and multi-role game mechanism)
2. Technical Architecture Overview
3. Module Detailed Explanation
RL Swarm: A collaborative reinforcement learning training system
RL Swarm, pioneered by Gensyn, is a decentralized multi-model collaborative optimization system for the post-training phase, with the following core features:
Distributed inference and learning process (see the sketch after this list):
- Generation phase (Answering): each node outputs an answer independently;
- Critique phase (Critiquing): nodes comment on each other's outputs and select the best answer and reasoning;
- Consensus phase (Resolving): each node predicts the preference of the majority and revises its own answer accordingly, applying a local weight update.
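A schematic of one such round is sketched below. Every interface here (`node.answer`, `node.critique`, `node.update`) is a hypothetical stand-in; only the three-stage answer → critique → resolve structure comes from the source material.

```python
# Illustrative schematic of an RL Swarm-style round; not Gensyn's code.
def swarm_round(nodes, prompt):
    # 1) Answering: every node generates its own candidate independently.
    answers = {n.id: n.answer(prompt) for n in nodes}
    # 2) Critiquing: nodes score each other's outputs.
    scores = {nid: sum(n.critique(prompt, a) for n in nodes)
              for nid, a in answers.items()}
    # 3) Resolving: nodes update locally toward the group's preferred
    #    answer; note there is no gradient synchronization anywhere.
    consensus = max(scores, key=scores.get)
    for n in nodes:
        n.update(prompt, answers[consensus])
    return answers[consensus]
```

Because each node trains its own independent model, a node dropping out mid-round costs the swarm one voice rather than stalling a synchronized gradient step.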
In RL Swarm, each node runs an independent model and trains locally, with no gradient synchronization required. This naturally suits heterogeneous compute and unstable network environments, and supports elastic node entry and exit. The mechanism draws on RLHF and multi-agent game ideas, but is closer to the dynamic evolution logic of a collaborative inference network: nodes are rewarded according to how well they align with the group consensus, driving continuous optimization and convergent learning of reasoning ability. RL Swarm significantly improves model robustness and generalization in open networks, and has been deployed as a core execution module in Gensyn's Testnet Phase 0, built on an Ethereum rollup.
Verde + Proof-of-Learning: Trusted Verification Mechanism
Gensyn's Verde module combines three mechanisms (a bisection sketch follows the list):
- Proof-of-Learning: determines whether training actually occurred from gradient traces and training metadata;
- Graph-based pinpointing: locates the point of divergence in the training computation graph, so only the disputed operation needs to be recomputed;
- Refereed delegation: an arbitration-based verification mechanism in which verifier and challenger raise disputes and only partial verification is performed, greatly reducing verification cost.
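The dispute-resolution core of such refereed delegation can be as simple as a binary search over checkpointed computation states. This generic sketch (not Verde's actual code) shows why only one operator ever needs re-execution by the referee:

```python
def pinpoint_divergence(solver_states, verifier_states):
    """Binary-search the first step where two execution traces diverge.
    states[i] is a hash of the computation state after step i; the parties
    agree at step 0 and disagree at the final step."""
    lo, hi = 0, len(solver_states) - 1
    while lo + 1 < hi:
        mid = (lo + hi) // 2
        if solver_states[mid] == verifier_states[mid]:
            lo = mid          # still in agreement; divergence is later
        else:
            hi = mid          # divergence is at or before mid
    return hi                 # first disputed step: re-run just this operator
```

With N steps, the dispute narrows in O(log N) comparisons, after which the referee recomputes a single operation instead of the whole training run.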
Compared with ZKP or full recomputation verification schemes, the Verde scheme achieves a better balance between verifiability and efficiency.
SkipPipe: Communication fault-tolerant optimization mechanism
SkipPipe is designed to solve the communication bottlenecks of "low bandwidth + node dropout" scenarios. Its core capabilities include:
- Skip Ratio: skipping congested or restricted nodes to avoid stalling training;
- Dynamic scheduling algorithm: generating the optimal execution path in real time;
- Fault-tolerant execution: even with 50% of nodes failing, inference accuracy drops only about 7%.
SkipPipe improves training throughput by up to 55%, and enables key capabilities such as "early-exit inference", "seamless reordering", and "inference completion".
HDEE: Cross-domain heterogeneous expert cluster
The HDEE (Heterogeneous Domain-Expert Ensembles) module is dedicated to optimizing the following scenarios:
- Multi-domain, multi-modal, and multi-task training;
- The distribution of various types of training data is uneven and the difficulty varies greatly;
- Task allocation and scheduling problems in an environment with heterogeneous device computing capabilities and inconsistent communication bandwidth.
Its core features:
- MHe-IHo: assigns models of different sizes to tasks of different difficulty (heterogeneous models, consistent training step size);
- MHo-IHe: uniform task difficulty, but asynchronously adjusted training step sizes;
- support for heterogeneous expert models and pluggable training strategies, improving adaptability and fault tolerance;
- emphasis on "parallel collaboration + minimal communication + dynamic expert allocation", suiting complex real-world task ecosystems.
Multi-role game mechanism: trust and incentives go hand in hand
The Gensyn network introduces four types of participants:
- Submitters: publish training tasks and set their structure and budget;
- Solvers: execute training tasks and submit results;
- Verifiers: verify training behavior, ensuring its compliance and effectiveness;
- Whistleblowers: challenge verifiers to win arbitration rewards, or bear penalties for failed challenges.
The mechanism is inspired by Truebit's economic game design: by deliberately injecting errors and arbitrating randomly, it makes honest collaboration the profitable strategy for participants and keeps the network operating reliably.
4. Testnet and Roadmap Planning
5. Team and Financing Background
Gensyn was co-founded by Ben Fielding and Harry Grieve and is headquartered in London, UK. In May 2023, Gensyn announced a $43 million Series A led by a16z crypto, with participation from CoinFund, Canonical, Ethereal Ventures, Factor, and Eden Block. The team combines distributed-systems and machine-learning engineering experience, and has long been committed to building a verifiable, trustless, large-scale AI training execution network.
Nous Research: A cognitive evolutionary training system driven by subjective AI concepts
Nous Research is one of the few decentralized-training teams with both a philosophical stance and engineering results. Its core vision stems from the concept of "Desideratic AI": treating AI as an intelligent subject with subjectivity and the capacity to evolve, rather than as a merely controllable tool. What makes Nous Research distinctive is that it treats AI training not as an "efficiency problem" to optimize, but as the formation of a "cognitive subject". Driven by this vision, Nous is building an open training network that is collaboratively trained by heterogeneous nodes, free of central scheduling, and censorship-resistant, and is implementing it systematically through a full-stack toolchain.
1. Concept support: Redefine the "purpose" of training
Nous invests comparatively little in incentive design or protocol economics; instead, it tries to change the philosophical premise of training itself:
- Against "alignment-ism": it rejects training whose sole goal is human control, advocating training that encourages models to form independent cognitive styles;
- Emphasis on model subjectivity: base models should retain uncertainty, diversity, and even the capacity to hallucinate (hallucination as virtue);
- Training as cognitive formation: the model is not "optimizing task completion" but an individual participating in a process of cognitive evolution.
Although this training concept is "romantic", it reflects the core logic of Nous in designing training infrastructure: how to allow heterogeneous models to evolve in an open network rather than being uniformly disciplined.
2. Training Core: Psyche Network and DisTrO Optimizer
Nous's most important contribution to decentralized training is the Psyche network and its underlying communication optimizer DisTrO (Distributed Training Over-the-Internet), which together form the execution hub for training tasks. DisTrO plus the Psyche network provides several core capabilities: communication compression (DCT plus 1-bit sign encoding to drastically cut bandwidth requirements), node adaptability (heterogeneous GPUs, reconnection after dropout, voluntary exit), asynchronous fault tolerance (training continues without synchronization), and a decentralized scheduling mechanism (no central coordinator; consensus and task distribution run on a blockchain). This architecture provides a realistic, feasible technical foundation for a low-cost, highly elastic, verifiable open training network.
This architectural design emphasizes practical feasibility: it does not rely on central servers, is adaptable to global volunteer nodes, and has on-chain traceability of training results.
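A toy version of the compression idea appears below, using scipy's DCT. It is an illustrative simplification, not Nous's code: the real DisTrO/DeMo pipeline is more elaborate (for instance, it operates on optimizer momentum rather than raw updates), but the transform / top-k / 1-bit-sign pattern is the bandwidth-saving core.

```python
import numpy as np
from scipy.fft import dct, idct

def compress(update, k):
    """Transform the update into frequency space, keep the top-k
    coefficients, and ship only indices, signs, and one shared scale."""
    coeffs = dct(update, norm="ortho")
    idx = np.argsort(np.abs(coeffs))[-k:]           # top-k frequencies
    return idx, np.sign(coeffs[idx]), np.abs(coeffs[idx]).mean()

def decompress(idx, signs, scale, n):
    """Rebuild an approximate update: 1 bit of sign per kept coefficient,
    magnitudes replaced by the shared scale."""
    coeffs = np.zeros(n)
    coeffs[idx] = signs * scale
    return idct(coeffs, norm="ortho")
```

Transmitting k indices plus k sign bits plus one float, instead of n full-precision values, is where the orders-of-magnitude bandwidth reduction comes from.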
3. Inference and agent systems: Hermes / Forge / TEE_HEE
In addition to building decentralized training infrastructure, Nous Research has also conducted several exploratory system experiments around the concept of "AI subjectivity":
1. Hermes open-source model series: Hermes 1 through 3 are Nous's representative open-source large models; the latest, Hermes 3, is trained on LLaMA 3.1 at the 8B, 70B, and 405B parameter scales. The series embodies the "de-instruction, preserve diversity" training philosophy Nous advocates, and shows stronger expressiveness and generalization in long-context retention, role-play, and multi-turn dialogue.
2. Forge Reasoning API: Multimodal Reasoning System
Forge is a reasoning framework developed by Nous that combines three complementary mechanisms to achieve more flexible and creative reasoning capabilities:
- MCTS (Monte Carlo Tree Search): Strategy search for complex tasks;
- CoC (Chain of Code): Introduces the combination path of code chain and logical reasoning;
- MoA (Mixture of Agents): Allows multiple models to negotiate and improve the breadth and diversity of output.
The system emphasizes "non-deterministic inference" and combinatorial generation paths, a forceful counterpoint to the traditional instruction-alignment paradigm.
3. TEE_HEE autonomous agent experiment: TEE_HEE is Nous's frontier exploration of autonomous agents, aiming to verify that an AI can run independently inside a trusted execution environment (TEE) with a unique digital identity. The agent holds its own Twitter and Ethereum accounts, with all control permissions managed by a remotely attestable enclave, so developers cannot interfere with its behavior. The experiment's goal is to build an AI subject with "immutability" and "independent behavioral intent", an important step toward autonomous intelligent agents.
4. AI behavior simulator platforms: Nous has also developed multiple simulators, including WorldSim, Doomscroll, and Gods & S8n, to study AI behavioral evolution and value formation in multi-role social environments. Although not directly part of the training process, these experiments lay the semantic groundwork for modeling the cognitive behavior of long-term autonomous AI.
4. Team and Financing Overview
Nous Research was founded in 2023 by Jeffrey Quesnelle (CEO), Karan Malhotra, Teknium, Shivani Mitra and others. The team is driven by philosophy and focuses on system engineering, with diverse backgrounds in machine learning, system security, decentralized networks, etc. In 2024, it received $5.2 million in seed round financing. In April 2025, it completed a $50 million Series A financing led by Paradigm, with a valuation of $1 billion, becoming one of the Web3 AI unicorns.
Flock: A blockchain-enhanced federated learning network
Flock.io is a blockchain-based federated-learning platform that aims to decentralize the data, compute, and models of AI training. FLock favors an integrated "federated learning + blockchain reward layer" framework: essentially an on-chain evolution of the traditional FL architecture rather than a systematic attempt at a new training protocol. Compared with decentralized-training projects such as Gensyn, Prime Intellect, Nous Research, and Pluralis, Flock focuses on privacy protection and usability improvements rather than theoretical breakthroughs in communication, verification, or training methods; its proper points of comparison are federated-learning systems such as Flower, FedML, and OpenFL.
1. The core mechanism of Flock.io
1. Federated learning architecture: emphasizing data sovereignty and privacy protection
Flock builds on the classic federated learning (FL) paradigm, allowing multiple data owners to collaboratively train a unified model without sharing raw data, with a focus on data sovereignty, security, and trust. The core process (a minimal aggregation sketch follows the list) includes:
- Local training: each participant (Proposer) trains the model on a local device without uploading raw data;
- On-chain aggregation: after training, local weight updates are submitted and aggregated into the global model by an on-chain Miner;
- Committee evaluation: voter nodes elected randomly via VRF evaluate and score the aggregated model on an independent test set;
- Incentives and penalties: rewards are paid or collateral confiscated according to the scores, sustaining resistance to malicious behavior and dynamic trust.
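A minimal sketch of this cycle appears below: stake-weighted FedAvg-style aggregation of proposer updates, a committee score on a held-out set, and slashing below a threshold. All names, the threshold, and the scoring interface are illustrative, not Flock's actual implementation.

```python
import numpy as np

def aggregate_and_score(updates, stakes, evaluate, threshold=0.5):
    """updates: proposer_id -> weight vector (np.ndarray);
    stakes: proposer_id -> staked collateral;
    evaluate: committee scoring function on an independent test set."""
    total_stake = sum(stakes.values())
    # Stake-weighted federated averaging into the global model.
    global_model = sum(stakes[p] * w for p, w in updates.items()) / total_stake
    rewards, slashes = {}, {}
    for p, w in updates.items():
        score = evaluate(w)            # VRF-elected voters score each update
        if score >= threshold:
            rewards[p] = stakes[p] * score   # reward scaled by quality
        else:
            slashes[p] = stakes[p]           # confiscate bad proposers' stake
    return global_model, rewards, slashes
```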
2. Blockchain Integration: Achieving Trustless System Coordination
Flock has put all the core links of the training process (task allocation, model submission, evaluation and scoring, and incentive execution) on the chain to make the system transparent, verifiable, and censorship-resistant. The main mechanisms include:
- VRF random election mechanism: improves the fairness and anti-manipulation ability of the rotation between Proposer and Voter;
- Stake mechanism (PoS): Constrain node behavior through token pledge and penalty to improve system robustness;
- Automatic execution of on-chain incentives: Through smart contracts, reward distribution and slashing penalties that are bound to task completion and evaluation results are realized, building a collaborative network that does not require trusted intermediaries.
3. zkFL: a zero-knowledge aggregation mechanism for privacy protection. Flock introduces zkFL, which lets Proposers submit zero-knowledge proofs of their local updates, so Voters can verify their correctness without accessing the raw gradients. This improves the credibility of the training process while preserving privacy, an important innovation for federated learning at the intersection of privacy protection and verifiability.
2. Flock’s core product components
AI Arena: Flock.io's decentralized training platform. Users participate in model tasks via train.flock.io as trainers, validators, or delegators, earning rewards by submitting models, evaluating performance, or delegating tokens. Tasks are currently published officially, with community co-creation to be opened up gradually.
FL Alliance: the Flock federated-learning client, which lets participants fine-tune models further on private data. Its VRF election, staking, and slashing mechanisms keep the training process honest and collaboration efficient; it is the key bridge between initial community training and real-world deployment.
AI Marketplace: a model co-creation and deployment platform where users can propose models, contribute data, and call model services. It supports database access and RAG-enhanced inference, promoting the adoption and circulation of AI models in practical scenarios.
3. Team and Financing Overview
Flock.io was founded by Jiahao Sun and has issued the platform token FLOCK. The project has raised a total of $11 million, from investors including DCG, Lightspeed Faction, Tagus Capital, Animoca Brands, Fenbushi, and OKX Ventures. In March 2024, Flock closed a $6 million seed round to launch its testnet and federated-learning client; in December of the same year it raised a further $3 million and received funding from the Ethereum Foundation to research blockchain-driven AI incentive mechanisms. To date, the platform has created 6,428 models and connected 176 training nodes, 236 validation nodes, and 1,178 delegators.
Compared with decentralized-training projects, federated-learning systems such as Flock have advantages in training efficiency, scalability, and privacy protection, and are especially suited to the collaborative training of small and medium models; their solutions are pragmatic and deployable, leaning toward engineering-level feasibility optimization. Projects such as Gensyn and Pluralis pursue deeper theoretical breakthroughs in training methods and communication mechanisms: their system challenges are greater, but they are also closer to a truly "trustless, decentralized" training paradigm.
EXO: Decentralized training attempt for edge computing
EXO is a representative AI project for edge-computing scenarios, dedicated to lightweight AI training, inference, and agent applications on home-grade consumer devices. Its decentralized training path emphasizes "low communication overhead + local autonomous execution", using the DiLoCo asynchronous delayed-synchronization algorithm and the SPARTA sparse parameter-exchange mechanism to drastically cut the bandwidth required for multi-device collaborative training. At the system level, EXO has not built an on-chain network or introduced economic incentives; instead, it ships EXO Gym, a single-machine multi-process simulation framework that lets researchers rapidly verify and experiment with distributed training methods in a local environment.
1. Overview of the Core Mechanism
- DiLoCo asynchronous training: nodes synchronize every H steps, adapting to unstable networks;
- SPARTA sparse synchronization: only a tiny fraction of parameters (e.g., 0.1%) is exchanged at each step, maintaining coupling between models while reducing bandwidth requirements;
- Combined asynchronous optimization: the two can be used together for a better communication/performance trade-off (see the sketch after this list).
- evML verification exploration: Edge-Verified Machine Learning (evML) proposes low-cost compute verification using TEEs/secure contexts, with remote attestation plus spot checks enabling trusted participation of edge devices without staking; an engineering compromise between economic security and privacy protection.
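The sketch below shows how the first two mechanisms compose. It is illustrative only: `node.local_step` and `node.params` are hypothetical stand-ins, the 0.1% fraction follows the example above, and the periodic full synchronization is a simplification of DiLoCo (which in practice applies an outer momentum step rather than plain averaging).

```python
import numpy as np

def train(nodes, steps, H, p=0.001, seed=0):
    """Toy SPARTA + DiLoCo combination over a list of simulated nodes,
    each exposing a flat np.ndarray `params` and a `local_step` method."""
    rng = np.random.default_rng(seed)
    n_params = nodes[0].params.size
    for t in range(steps):
        for node in nodes:
            node.local_step()  # independent local optimizer step
        # SPARTA-style step: average only ~p of coordinates, chosen in
        # lockstep by all nodes from a shared random seed.
        idx = rng.choice(n_params, max(1, int(p * n_params)), replace=False)
        mean_slice = np.mean([n.params[idx] for n in nodes], axis=0)
        for node in nodes:
            node.params[idx] = mean_slice
        # DiLoCo-style step: a full synchronization once every H steps
        # (simplified here to plain parameter averaging).
        if (t + 1) % H == 0:
            full_mean = np.mean([n.params for n in nodes], axis=0)
            for node in nodes:
                node.params[:] = full_mean
    return nodes
```

The per-step traffic is proportional to p, so at p = 0.001 each step moves roughly a thousandth of the model, with the periodic full sync keeping the replicas from drifting apart.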
2. Tools and Scenario Applications
- EXO Gym: simulates multi-node training environments on a single device, supporting communication-strategy experiments for models such as NanoGPT, CNNs, and diffusion models;
- EXO Desktop App: a desktop AI tool for individual users, supporting privacy-friendly personalized features such as running large models locally, iPhone mirroring control, and private-context integration (SMS, calendar, video recordings).
EXO Gym is best understood as an exploration-oriented decentralized-training experiment, mainly integrating existing communication-compression techniques (such as DiLoCo and SPARTA) into a lightweight training path. Compared with projects such as Gensyn, Nous, and Pluralis, EXO has not yet entered the core stages of on-chain collaboration, verifiable incentive mechanisms, or real distributed network deployment.
The front-end engine of decentralized training: a panoramic study of model pre-training
Faced with the core challenges commonly found in decentralized training, such as device heterogeneity, communication bottlenecks, coordination difficulties, and lack of trusted execution, Gensyn, Prime Intellect, Pluralis, and Nous Research have proposed differentiated system architecture paths. From the perspectives of training methods and communication mechanisms, these four projects have demonstrated their unique technical focus and engineering implementation logic.
In terms of training method optimization, the four explored key dimensions such as collaborative strategies, update mechanisms, and asynchronous control, covering different stages from pre-training to post-training.
Prime Intellect's PRIME-RL is an asynchronous scheduling structure for the pre-training stage. Through a "local training + periodic synchronization" strategy, it achieves efficient, verifiable training scheduling in heterogeneous environments, with strong generality and flexibility. Its theoretical innovation is high, proposing a clear paradigm for training control structures; its engineering difficulty is medium-to-high, placing heavy demands on the underlying communication and control modules.
The DeMo optimizer from Nous Research targets training stability in asynchronous low-bandwidth environments, realizing a highly fault-tolerant gradient-update process across heterogeneous GPUs. It is one of the few solutions to unify theory and engineering in the "asynchronous communication-compression closed loop". Its theoretical innovation is very high, especially in the joint compression-and-scheduling path; its engineering difficulty is also very high, hinging on precise coordination of asynchronous parallelism.
Pluralis' SWARM + NAG is one of the most systematic and groundbreaking designs in the current asynchronous training landscape. Based on an asynchronous model-parallel framework, it introduces column-space sparse communication and NAG momentum correction, building a large-model training scheme that converges stably under low bandwidth. Its theoretical innovation is high, a structural pioneer of asynchronous collaborative training; its engineering difficulty is extreme, requiring deep integration of multi-level synchronization and model partitioning.
Gensyn's RL Swarm mainly serves the post-training stage, focusing on fine-tuning strategies and collaborative learning among agents. Its training process follows the three-step "generate, evaluate, vote" cycle, well suited to the dynamic adjustment of complex behaviors in multi-agent systems. Its theoretical innovation is medium-to-high, chiefly in its agent-collaboration logic; its engineering difficulty is moderate, with the main challenges in system scheduling and behavioral-convergence control.
In terms of communication mechanism optimization, these four projects also have their own targeted layouts, and generally focus on systematic solutions to bandwidth bottlenecks, node heterogeneity and scheduling stability problems.
Prime Intellect's PCCL is a low-level communication library that replaces the traditional NCCL, aiming to give the upper-layer training protocol a more robust collective-communication foundation. Its theoretical innovation is medium-to-high, with real breakthroughs in fault-tolerant communication algorithms; its engineering difficulty is medium, with strong modular adaptability.
Nous Research's DisTrO is the core communication module of DeMo, designed to minimize communication overhead under low bandwidth while keeping the training loop unbroken. Its theoretical innovation is high, with general design value for scheduling and coordination structures; its engineering difficulty is also high, demanding precision in both compression and training synchronization.
Pluralis' communication mechanism is deeply embedded in the SWARM architecture, significantly reducing the communication load of asynchronously training large models while preserving convergence and throughput. Its theoretical innovation is high, setting a paradigm for asynchronous model-communication design; its engineering difficulty is extreme, depending on distributed model orchestration and structural sparsity control.
Gensyn's SkipPipe is a fault-tolerant scheduling component for RL Swarm, with low deployment cost, mainly enhancing training stability at the engineering-deployment layer. Its theoretical innovation is average, largely an engineering realization of known mechanisms; its engineering difficulty is relatively low, but it is highly practical in real deployments.
In addition, the value of decentralized-training projects can be measured along two more macroscopic dimensions: the blockchain collaboration layer and the AI training layer:
- Blockchain collaboration layer: emphasizing protocol credibility and the logic of incentivized collaboration
- Verifiability: whether the training process is verifiable, and whether game-theoretic or cryptographic mechanisms build trust;
- Incentive mechanism: whether a task-driven token reward and role mechanism has been designed;
- Openness and entry barriers: whether nodes can join easily, and whether access is centralized or permissioned.
- AI training system layer: highlighting engineering capability and performance deliverability
- Scheduling and fault tolerance: whether the system is fault-tolerant, asynchronous, dynamic, and distributed;
- Training-method optimization: whether the model-training algorithm or structure is optimized;
- Communication-path optimization: whether gradients are compressed or communication sparsified to suit low bandwidth.
Based on the above indicator system, the following table systematically evaluates the technical depth, engineering maturity and theoretical innovation of Gensyn, Prime Intellect, Pluralis and Nous Research in the decentralized training path.
Post-chain ecology of decentralized training: model fine-tuning based on LoRA
In the complete value chain of decentralized training, projects such as Prime Intellect, Pluralis.ai, Gensyn, and Nous Research mainly focus on front-end infrastructure: model pre-training, communication mechanisms, and collaborative optimization. Another class of projects, however, focuses on model adaptation and inference deployment in the post-training stage (fine-tuning and inference delivery), without directly participating in systematic training work such as pre-training, parameter synchronization, or communication optimization. Representative projects include Bagel, Pond, and RPS Labs, all built on the LoRA fine-tuning method; together they constitute the key "back-end" of the decentralized training ecosystem.
LoRA + DPO: a realistic path for Web3 fine-tuning and deployment
LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning method: it inserts low-rank matrices into a pre-trained large model to learn new tasks while freezing the original parameters. This dramatically reduces training cost and resource consumption, speeds up fine-tuning, improves deployment flexibility, and is particularly well suited to Web3 scenarios characterized by modular, composable invocation.
Traditional large language models such as LLaMA and GPT-3 often have billions or even hundreds of billions of parameters, making direct fine-tuning expensive. By training only a small set of inserted parameter matrices, LoRA achieves efficient adaptation of large models and has become one of the most practical mainstream methods today.
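The mechanism fits in a dozen lines of PyTorch. This is the standard LoRA formulation the text describes, h = Wx + (α/r)·BAx, with B initialized to zero so fine-tuning starts as an exact no-op:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen pre-trained nn.Linear with a trainable rank-r update."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False           # freeze pre-trained weights
        self.A = nn.Linear(base.in_features, r, bias=False)
        self.B = nn.Linear(r, base.out_features, bias=False)
        nn.init.normal_(self.A.weight, std=0.02)
        nn.init.zeros_(self.B.weight)         # output starts unchanged
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * self.B(self.A(x))
```

Only A and B (r·(d_in + d_out) parameters) are trained and shipped, which is why LoRA adapters are cheap to crowdsource, verify, and compose in the Web3 settings discussed here.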
Direct Preference Optimization (DPO) is a post-training method for language models that has emerged in recent years, often used together with LoRA for the behavior-alignment stage. Compared with the traditional RLHF (Reinforcement Learning from Human Feedback) approach, DPO achieves preference learning by directly optimizing over paired samples, eliminating the complex reward-modeling and reinforcement-learning stages. Its structure is simpler and its convergence more stable, making it especially suitable for fine-tuning in lightweight, resource-constrained environments. Thanks to its efficiency and ease of use, DPO is gradually becoming the preferred alignment method for many decentralized AI projects.
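The DPO objective is compact enough to show directly. Given sequence-level log-probabilities of the preferred answer y_w and the rejected answer y_l under the trained policy and a frozen reference model, the standard loss is (shapes assumed to be [batch]):

```python
import torch
import torch.nn.functional as F

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss: push up the policy's margin on the preferred
    answer over the rejected one, measured against a frozen reference.
    No reward model and no RL rollouts are needed."""
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    return -F.logsigmoid(beta * margin).mean()
```

beta controls how strongly the policy is pulled away from the reference; the whole alignment step reduces to ordinary supervised optimization of this loss, which is why it suits the lightweight environments described above.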
Reinforcement Learning (RL): The future of post-training fine-tuning
From a long-term perspective, more and more projects regard reinforcement learning (RL) as a core path with greater adaptability and evolutionary potential in decentralized training. Compared with supervised learning or parameter fine-tuning mechanisms that rely on static data, RL emphasizes continuous optimization of strategies in a dynamic environment, which naturally fits the asynchronous, heterogeneous and incentive-driven collaboration pattern in the Web3 network. Through continuous interaction with the environment, RL can achieve a highly personalized, continuous incremental learning process, providing an evolvable "behavioral intelligence" infrastructure for the construction of agent networks, on-chain task markets, and smart economies.
This paradigm is not only highly consistent with the spirit of decentralization in concept, but also has significant system advantages. However, due to the high engineering threshold and complex scheduling mechanism, RL still faces great challenges in its implementation at the current stage and is difficult to be widely promoted in the short term.
It is worth noting that Prime Intellect’s PRIME-RL and Gensyn’s RL Swarm are pushing RL to evolve from a post-training fine-tuning mechanism to a pre-training main structure, attempting to build an RL-centric collaborative training system that does not require trust coordination.
Bagel (zkLoRA): A Trusted Validation Layer for LoRA Fine-tuning
Bagel builds zero-knowledge-proof (ZK) technology on top of the LoRA fine-tuning mechanism, tackling the credibility and privacy problems of "on-chain model fine-tuning". zkLoRA does not participate in the actual training computation; it provides a lightweight, verifiable mechanism that lets external users confirm that a fine-tuned model was indeed derived from the specified base model and LoRA parameters, without accessing the original data or weights.
Unlike Gensyn's Verde or Prime Intellect's TOPLOC, which dynamically verify whether training behavior actually occurred, Bagel focuses on statically verifying that fine-tuning results are credible. zkLoRA's greatest advantages are low verification cost and strong privacy protection, but its scope is generally limited to fine-tuning tasks with small parameter changes.
Pond: A fine-tuning and agent evolution platform for GNN scenarios
Pond is the industry's only decentralized-training project focused on fine-tuning graph neural networks (GNNs), serving structured-data applications such as knowledge graphs, social networks, and transaction graphs. Users can upload graph-structured data and participate in training feedback, gaining a lightweight, controllable platform for training and inference on personalized tasks.
Pond also uses efficient fine-tuning mechanisms such as LoRA. Its core goal is to build a modular, deployable agent system on GNN architectures, opening a new exploration path of "small-model fine-tuning + multi-agent collaboration" in a decentralized context.
RPS Labs: AI-driven liquidity engine for DeFi
RPS Labs is a decentralized-training project built on the Transformer architecture, dedicated to applying fine-tuned AI models to DeFi liquidity management, mainly in the Solana ecosystem. Its flagship product, UltraLiquid, is an active market-making engine that uses fine-tuned models to dynamically adjust liquidity parameters, reducing slippage, increasing depth, and optimizing token issuance and trading experience.
RPS has also launched the UltraLP tool, which helps liquidity providers optimize their capital-allocation strategies on DEXs in real time, improving capital efficiency and reducing the risk of impermanent loss; it reflects the practical value of AI fine-tuning in financial scenarios.
From the front-chain engine to the back-chain ecosystem: the future of decentralized training
The complete ecological map of decentralized training divides into two categories: the front-end engines, corresponding to model pre-training, and the back-end ecosystem, corresponding to model fine-tuning and deployment, together forming a closed loop from infrastructure to application.
The front-chain engine focuses on the construction of the underlying protocol for model pre-training, represented by projects such as Prime Intellect, Nous Research, Pluralis.ai, and Gensyn. They are committed to creating a system architecture with asynchronous updates, sparse communication, and training verifiability, and achieving efficient and reliable distributed training capabilities in a trustless network environment, forming the technical foundation of decentralized training.
At the same time, Flock, as a representative of the middle layer, uses the federated learning path to integrate model aggregation, on-chain verification, and multi-party incentive mechanisms to build a feasible and collaborative bridge between training and deployment, providing a practical paradigm for multi-node collaborative learning.
The back-end ecosystem focuses on model fine-tuning and application-layer deployment. Projects such as Pond, Bagel, and RPS Labs all revolve around the LoRA fine-tuning method: Bagel provides a trusted on-chain verification mechanism, Pond focuses on the evolution of small GNN models, and RPS applies fine-tuned models to smart market making in DeFi. Through components such as inference APIs and Agent SDKs, they give developers and end users low-threshold, composable model invocation and personalized customization, forming an important entry point for the adoption of decentralized AI.
We believe that decentralized training is not only a natural extension of the blockchain spirit in the AI era, but also the prototype of the infrastructure of a global collaborative intelligent productivity system. In the future, when we look back on this challenging journey, we will still encourage each other with the original intention: decentralization is not just a means, it is value itself.