Gensyn Testnet Launch: How to Make AI Training More Efficient and Decentralized?

PANews · 1 day ago


Author: Zen, PANews

AI is currently the most closely watched sector in the cryptocurrency industry, and Gensyn, a distributed AI computing network that has raised $50 million in a round led by a16z, is undoubtedly a competitive project in the space. Gensyn recently launched its testnet; although it arrived more than a year behind the original schedule, the launch finally moves the project into a new phase.

As a customized Ethereum Rollup designed specifically for machine learning, the Gensyn testnet integrates off-chain execution, verification, and communication frameworks, aiming to provide key functions for decentralized AI systems, including persistent identity, participation tracking, ownership maintenance, payments, remote execution coordination, trustless verification, training process recording, and large-scale training task crowdfunding.

The first phase of the testnet focuses on tracking participation within the RL Swarm. RL Swarm is an application for collaborative reinforcement learning post-training, where nodes can be bound to on-chain identities to ensure that the contributions of each participating node are accurately recorded.

RL Swarm: Core Functions and Collaborative Training

In the Gensyn testnet, RL Swarm, as the core application, is a model collaborative training system built on a decentralized network. Unlike traditional independent training of a single model, RL Swarm allows multiple models to communicate, critique, and improve each other within the network, thereby enhancing overall performance. Its core concept is "collective intelligence," which achieves more efficient training results through collaboration and feedback among node models.

Put simply: just as a model like DeepSeek-R1 can iteratively improve its reasoning performance through self-critique, RL Swarm extends this mechanism to a group of models, achieving a "many hands make light work" effect.

Within RL Swarm, models do not rely solely on their own feedback; they also identify their shortcomings and optimize themselves by observing and evaluating the performance of other models. Each model node that joins the Swarm participates in a three-stage process: first, it independently completes the problem and outputs its reasoning and answer; second, it reviews the answers of other nodes and provides feedback; finally, the nodes vote to select the optimal solution, and each model adjusts its output accordingly. This collaborative mechanism not only improves each individual model's performance but also drives the evolution of the whole group. Models that join the Swarm retain their improved local weights after leaving, gaining a practical benefit.
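The three-stage round described above can be sketched in a few lines of Python. This is a toy illustration with hypothetical names (`Node`, `swarm_round`, and a numeric "answer" standing in for a model output), not Gensyn's actual implementation:

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    skill: float          # stand-in for model quality
    weights: float = 0.0  # stand-in for local parameters

    def solve(self, problem: int) -> float:
        # Stage 1: independently produce an answer.
        return problem * self.skill

    def critique(self, answer: float, target: float) -> float:
        # Stage 2: score a peer's answer (higher is better).
        return -abs(answer - target)

    def update(self, best_answer: float) -> None:
        # Stage 3: nudge local weights toward the winning solution.
        self.weights += 0.1 * (best_answer - self.weights)

def swarm_round(nodes, problem, target):
    answers = {n.name: n.solve(problem) for n in nodes}
    # Every node votes on every answer; sum the scores.
    scores = {name: sum(n.critique(ans, target) for n in nodes)
              for name, ans in answers.items()}
    winner = max(scores, key=scores.get)
    for n in nodes:
        n.update(answers[winner])
    return winner

nodes = [Node("a", 0.9), Node("b", 1.0), Node("c", 1.2)]
print(swarm_round(nodes, problem=10, target=10.0))  # prints "b"
```

After the round, each node's local `weights` have been nudged toward the winning answer, mirroring how Swarm participants keep their improved weights when they leave.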


In addition, Gensyn has open-sourced the RL Swarm code, so anyone can run a node and start a new Swarm or join an existing one without permission. The Swarm's underlying communication uses the gossip protocol provided by Hivemind, which supports decentralized message passing and the sharing of learning signals between models. Whether on a home laptop or a cloud GPU, participants can run an RL Swarm node and take part in collaborative training.
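The gossip-style dissemination that such networks rely on can be illustrated with a minimal push-gossip simulation. This is a generic sketch of how gossip protocols spread information across peers, not Hivemind's actual implementation:

```python
import random

def gossip(num_nodes, seed=0):
    # Each round, every informed node pushes the update to one random
    # peer; the informed set can at most double per round, so full
    # coverage takes at least log2(num_nodes) rounds.
    rng = random.Random(seed)
    informed = {0}                  # node 0 starts with the update
    rounds = 0
    while len(informed) < num_nodes:
        for node in list(informed):
            informed.add(rng.randrange(num_nodes))  # push to a random peer
        rounds += 1
    return rounds

print(gossip(64))
```

The appeal for a decentralized network is that no node needs a global view: each one only ever talks to random peers, yet updates still reach everyone quickly.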

Infrastructure Three Pillars: Execution, Communication, and Verification

Currently, RL Swarm is still just an experimental demonstration, showcasing a large-scale, scalable machine learning approach rather than a final product form. Over the past four years, Gensyn's core work has actually been to build the underlying infrastructure, which has now entered the v0.1 phase after the testnet launch and is already operational. According to official information, Gensyn's overall architecture is divided into three parts: execution, communication, and verification.

Execution: Consistency and Distributed Computing Power

Gensyn believes that the future of machine learning will no longer be limited to traditional monolithic models but will consist of fragmented parameters distributed across devices worldwide. To achieve this goal, the Gensyn team has developed an underlying execution architecture that ensures cross-device consistency. Key technologies include:

  • Distributed Parameter Storage and Training: By splitting large-scale models into multiple parameter blocks and distributing them across different devices, Gensyn achieves fragmented deployment of models, reducing the memory requirements for individual nodes.
  • Reinforcement Learning Post-Training (RL Post-Training): Research shows that when models are trained collaboratively in a group, communicating and critiquing each other's answers, overall learning efficiency significantly improves. Gensyn demonstrates this concept with RL Swarm, allowing models to progress rapidly through collective discussion, further validating the effectiveness of distributed execution.
  • Reproducible Operators (RepOps): To ensure that different hardware (such as Nvidia A100 and H100) can produce completely consistent computational results, Gensyn has developed the RepOps library, which achieves bitwise reproducibility across platforms by fixing the execution order of floating-point operations.
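The motivation for RepOps can be seen in a few lines of Python: floating-point addition is not associative, so the same numbers summed in different orders can produce different bits. Fixing the reduction order (sketched below with a hypothetical `fixed_order_sum`; RepOps itself works at the level of GPU operators, which is not shown here) restores determinism:

```python
vals = [1e16, 1.0, -1e16, 1.0]

# Two groupings of the same addition give different answers, because
# the small 1.0 is absorbed when added to the huge 1e16 first.
a = ((1e16 + 1.0) + -1e16) + 1.0   # -> 1.0
b = (1e16 + -1e16) + (1.0 + 1.0)   # -> 2.0

def fixed_order_sum(xs):
    # Deterministic pairwise (tree) reduction: every device that follows
    # the same tree gets bit-identical results, regardless of how its
    # hardware would otherwise schedule the additions.
    if len(xs) == 1:
        return xs[0]
    mid = len(xs) // 2
    return fixed_order_sum(xs[:mid]) + fixed_order_sum(xs[mid:])
```

In a verification setting this matters because "recompute and compare" only works if an honest A100 and an honest H100 produce the exact same bits.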

Communication: Efficient Information Exchange

In large-scale distributed training scenarios, efficient communication between nodes is crucial. Traditional data parallel methods can reduce communication overhead to some extent, but since they require each node to store the complete model, their scalability is limited by memory. To address this, Gensyn has proposed a new solution:

  • SkipPipe – Dynamic Pipeline Parallelism: The SkipPipe technology dynamically selects the computation layers that microbatches pass through, skipping certain stages in the traditional pipeline to reduce unnecessary waiting time. Its innovative scheduling algorithm can assess the availability of each path in real-time, reducing node idle time and significantly shortening overall training duration. Test data shows that in a decentralized environment, SkipPipe can reduce training time by about 55%, and in the case of partial node failures, model performance only decreases by about 7%.
  • Communication Standards and Cross-Node Collaboration: Gensyn has built a communication protocol similar to TCP/IP, enabling participants worldwide to efficiently and seamlessly transmit data and exchange information, regardless of the devices they use. This open standard provides a solid network foundation for distributed collaborative training.
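A heavily simplified illustration of the skipping idea behind SkipPipe (hypothetical functions, not Gensyn's scheduler): when a pipeline stage is unavailable, routing a microbatch around it is far cheaper than blocking until the stage recovers:

```python
def route(stages_up):
    # Ordered list of stage indices a microbatch will visit,
    # skipping any stage that is currently unavailable.
    return [s for s, up in enumerate(stages_up) if up]

def traversal_time(stage_time, wait_time, stages_up, skip=False):
    # Compare total traversal time with and without skipping.
    t = 0.0
    for up in stages_up:
        if up:
            t += stage_time
        elif skip:
            continue                     # route around the dead stage
        else:
            t += wait_time + stage_time  # naive pipeline: block until recovery
    return t

stages_up = [True, True, False, True]    # stage 2 has failed
print(route(stages_up))                  # -> [0, 1, 3]
```

With `stage_time=1` and `wait_time=10`, skipping finishes in 3 time units versus 14 for the blocking pipeline, at the cost of the skipped stage's computation, which is the trade-off behind the roughly 7% accuracy loss cited above.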

Verification: Ensuring Trust and Security

In a trustless distributed network, confirming the authenticity and validity of the computational results submitted by participants is a significant challenge. To address this, Gensyn has introduced a dedicated verification protocol aimed at ensuring that all computing power providers deliver correct work results through a low-cost, efficient mechanism:

  • Verde Verification Protocol: Verde is the first verification system designed specifically for modern machine learning. Its core lies in utilizing a lightweight dispute resolution mechanism to quickly identify the step where discrepancies arise between the model and the verifier during training. Unlike traditional verification methods that require re-running the entire task, Verde only needs to recompute the disputed operation, significantly reducing verification costs.
  • Refereed Delegation: With this method, if a supplier's output has issues, the verifier can persuade a neutral arbitrator through an efficient dispute resolution game, ensuring that the correctness of the entire computational result is guaranteed as long as at least one honest node exists.
  • Storing and Hashing Intermediate States: To support the above verification process, participants only need to store and hash partial intermediate training checkpoints instead of the full data, which reduces resource consumption and enhances the system's scalability and real-time performance.
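The dispute-narrowing step at the heart of this design resembles a binary search over committed intermediate states. The sketch below is an illustration using hypothetical integer traces; the real protocol operates on hashed training checkpoints:

```python
def first_divergent_step(trace_a, trace_b):
    # Both parties have committed to a trace of intermediate states that
    # agrees at step 0 and disagrees at the end. Binary search narrows
    # the disagreement to a single step, so the referee re-executes only
    # that one operation instead of the whole training job.
    lo, hi = 0, len(trace_a) - 1
    assert trace_a[lo] == trace_b[lo] and trace_a[hi] != trace_b[hi]
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if trace_a[mid] == trace_b[mid]:
            lo = mid    # still agree here; divergence is later
        else:
            hi = mid    # already disagree here; divergence is earlier
    return hi           # the single step the referee must recompute

honest  = [0, 1, 2, 3, 4, 5]
cheater = [0, 1, 2, 9, 10, 11]            # diverges at step 3
print(first_divergent_step(honest, cheater))  # -> 3
```

This is why storing hashed checkpoints is enough: the full states are only needed for the one disputed step, which keeps verification cost logarithmic in the length of the computation.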

Disclaimer: This article represents only the personal views of the author and does not reflect the position or views of this platform. It is shared for informational purposes only and does not constitute investment advice to anyone. Any dispute between users and the author is unrelated to this platform. If any article or image on this page involves infringement, please send the relevant proof of rights and identity to support@aicoin.com, and the platform's staff will verify it.
