Gate Ventures Research Insights: Parallel Execution Breaks Through Bottlenecks, Performance Challenges of Ethereum EVM and the Path of Parallel Execution

8 hours ago

The Ethereum EVM faces serious performance issues due to its single-threaded architecture, especially under high transaction concurrency. Serial execution caps the network's throughput at a level far below growing user demand. By walking through the EVM's serial execution process step by step, we identify two main software problems that must be solved to achieve parallel execution: (1) state identification and (2) data architecture and IOPS consumption. Each project solves these problems in its own way and faces multiple trade-offs: EVM compatibility, optimistic versus pessimistic parallelism, database architecture, fast memory versus highly concurrent disk channels, developer experience, hardware requirements, and modular versus monolithic chain architecture. All of these deserve careful consideration.

We also noted that while parallel execution seems to exponentially scale blockchain, it still has bottlenecks that are inherent to distributed systems, including issues with P2P networks and the throughput of the database itself. Blockchain computing still has a long way to go to reach the computational capabilities of traditional computers. Currently, parallel execution itself faces some challenges, including increasingly high hardware requirements, centralization risks due to the specialization of nodes, and the risk of downtime associated with reliance on memory, central clusters, or nodes. Meanwhile, cross-chain communication between parallel blockchains can also pose a problem. This still warrants further exploration by more teams.

From the architectural considerations and EVM compatibility discussions of various entities, we understand that disruptive innovation comes from an uncompromising stance on the past. Blockchain technology is still in its early stages, and performance improvements have yet to reach their final form. We eagerly anticipate seeing more entrepreneurs who are unwilling to compromise and who think outside the box to build stronger and more interesting products.

The Single-Threaded Problem of Ethereum

The Ethereum EVM has long been criticized for its performance, primarily due to its outdated architectural design, with the lack of support for parallelization being the most severe issue. Transaction parallelization is akin to widening a single-lane road into multiple lanes, which multiplies the amount of traffic the road can carry.

Delving into Ethereum, in the context of the current separation of the execution layer and the consensus layer, we find that all transaction processing in the EVM is executed serially.

Ethereum's Transaction Execution Process

In an Ethereum transaction, the user first sends the transaction to the mempool via RPC. The builder then selects the most profitable transactions and orders them, at which point the transaction order is fixed. The proposer broadcasts the block to the consensus layer, which verifies the validity of the block and ensures it comes from a valid sender. The transactions in the block are sent as an execution payload to the execution layer, where they are executed. The simplified process of executing a transaction (for example, Alice sending 1 ETH to Bob) is as follows:

  1. The EVM on the execution node converts the instructions of transaction No. 0 into opcodes that the EVM can recognize.

  2. Based on the hardcoded cost of each opcode, it determines the gas required for this transaction.

  3. While executing the transaction, it accesses the database to read the balance states of Alice and Bob, runs the transaction's opcodes to determine that Alice's balance decreases by 1 ETH and Bob's increases by 1 ETH, and writes the updated balances back to the database.

  4. Transaction No. 1 is then taken from the block, and the loop continues sequentially until all transactions in the block have been executed.

  5. The data in the database is primarily stored in the form of a Merkle tree. After all transactions in the block have been executed sequentially, the state root (storing account states), transaction root (storing the order of transactions), and receipt root (storing transaction-related information, such as success status and Gas Fees) are submitted to the consensus layer.

  6. The consensus layer can easily prove the authenticity of the transactions upon receiving the state root, and the execution layer returns the verification data to the consensus layer, at which point the block is considered validated.

  7. The consensus layer adds the block to the head of its own blockchain and proves it, broadcasting the proof through the network.

This is the entire process. Transactions within a block are executed sequentially: each transaction has its own number, and each may need to read state from the database and write state back. Without sequential execution, state conflicts could arise, because some transactions depend on the state produced by others. This is why the EVM chooses serial execution.
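The order dependence described above can be illustrated with a minimal sketch. It is not the EVM's actual implementation: the `Transfer` type, the `execute_serially` function, and the plain dictionary standing in for the state database are all hypothetical simplifications, and gas, opcodes, and Merkle roots are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    sender: str
    recipient: str
    amount: int


def execute_serially(state, block):
    """Apply transfers one by one; each one sees the state left by its predecessor."""
    receipts = []
    for index, tx in enumerate(block):
        if state.get(tx.sender, 0) >= tx.amount:
            state[tx.sender] = state.get(tx.sender, 0) - tx.amount
            state[tx.recipient] = state.get(tx.recipient, 0) + tx.amount
            receipts.append((index, True))
        else:
            # Insufficient balance: the transfer fails and state is unchanged.
            receipts.append((index, False))
    return receipts


# Bob can only pay Carol after Alice has paid Bob.
block = [Transfer("alice", "bob", 1), Transfer("bob", "carol", 1)]

state = {"alice": 1, "bob": 0}
receipts = execute_serially(state, block)

# The same transactions in the opposite order produce a different final state.
reordered_state = {"alice": 1, "bob": 0}
reordered_receipts = execute_serially(reordered_state, list(reversed(block)))
```

In block order both transfers succeed, while in the reversed order Bob's payment fails. That is the kind of state dependency a parallel executor has to detect or avoid.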

The simplicity of the EVM also means that serial execution results in extremely low performance, which is one of the main reasons Ethereum only achieves double-digit TPS. As consumer-focused applications are developed, the EVM, as an outdated design paradigm, faces urgent performance issues that need improvement. Although Ethereum has moved to a fully Layer 2-centric roadmap, issues such as the MPT structure and the inefficiency of databases like LevelDB still need to be addressed.

As this problem has become more pressing, many projects have begun building high-performance parallel EVMs to move past Ethereum's outdated design paradigm. From the EVM execution process above, we can see two main issues that high-performance EVMs aim to solve:

  1. Transaction state separation: Due to the interdependencies of states in transactions, they need to run serially. It is necessary to separate transactions with independent states, while those that depend on each other still need to be executed sequentially. This allows for distribution to multiple cores for parallel processing.

  2. Database architecture improvement: Ethereum uses an MPT, and a single transaction involves a large number of read operations, so the database must sustain extremely high read/write rates. The resulting IOPS (I/O operations per second) requirements are very high, beyond what typical consumer-grade SSDs can deliver.
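The first point, separating transactions by the state they touch, can be sketched as a grouping problem. The example below is only an illustration under simplifying assumptions: the function name `independent_groups` and the input format (a transaction id plus the accounts it touches) are hypothetical, and real chains use more refined schemes. Transactions that share an account, directly or through a chain of other transactions, land in the same group and stay in block order, while different groups could be handed to different cores.

```python
def independent_groups(transactions):
    """Group transactions that touch overlapping accounts using union-find.

    `transactions` is a list of (tx_id, accounts) pairs, where `accounts`
    is a non-empty list of the account names the transaction touches.
    """
    parent = {}

    def find(account):
        parent.setdefault(account, account)
        while parent[account] != account:
            parent[account] = parent[parent[account]]  # path halving
            account = parent[account]
        return account

    def union(a, b):
        parent[find(a)] = find(b)

    # Accounts touched by the same transaction belong to one component.
    for _, accounts in transactions:
        for other in accounts[1:]:
            union(accounts[0], other)

    # Transactions whose accounts share a component must stay sequential.
    groups = {}
    for tx_id, accounts in transactions:
        groups.setdefault(find(accounts[0]), []).append(tx_id)
    return list(groups.values())


txs = [
    (0, ["alice", "bob"]),
    (1, ["carol", "dave"]),
    (2, ["bob", "erin"]),    # shares "bob" with transaction 0
    (3, ["dave", "frank"]),  # shares "dave" with transaction 1
]
groups = independent_groups(txs)
```

Here the four transactions fall into two independent groups, `[0, 2]` and `[1, 3]`, each of which preserves its internal order.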

Overall, in the current mainstream blockchain construction solutions, optimizations in software often focus on separating transaction states to build parallel transactions and optimizing databases to support high-concurrency transaction state reads. With the demand for high performance, the hardware requirements of most projects are also increasing. Ethereum has always been very cautious about performance expansion on Layer 1, as this implies centralization and instability. However, current high-performance EVMs often abandon these self-imposed constraints, introducing extreme software improvements, P2P network optimizations, database restructuring, and enterprise-grade specialized hardware.

The Resulting Database IOPS Problem

The identification and separation of parallel execution are not difficult to understand, but the database IOPS problem is less frequently discussed in public. In this article, we will also use practical examples to help readers appreciate the complex challenges faced by blockchain databases.

In Ethereum, full nodes run a virtual machine, which can be thought of as an ordinary computer whose data is stored in specialized software: a database. This software manages vast amounts of data. Different industries need different types of databases; for example, in the AI field, which has a wide variety of data types, vector databases are popular. In the blockchain field, Ethereum uses a relatively simple key-value database.

Illustration of Ethereum's Data Organization, Source: Github

Typically, data in a database is organized in an abstract manner, and Ethereum organizes it in the form of an MPT tree. The final data state of the MPT tree forms a root node, and if any data changes, the root node will also change, making it convenient to verify data integrity using MPT.

Let’s take an example to understand the current resource consumption of the database:

In a k-ary Merkle Patricia Tree (MPT) with n leaf nodes, updating a single key-value pair requires O(k log_k n) read operations and O(log_k n) write operations. For instance, a binary MPT with 16 billion keys (i.e., 1 TB of blockchain state) has a depth of log_2(16 × 10^9) ≈ 34, which translates to approximately 68 read operations and 34 write operations per update. Suppose our blockchain wants to handle 10,000 transfer transactions per second, and each transfer updates three key-value pairs in the state tree. That totals 10,000 × 3 × 68 = 2,040,000 read operations per second, or about 2M IOPS (in practice, compression and caching can reduce this by an order of magnitude, to roughly 200,000 IOPS, which we will not elaborate on). Mainstream consumer-grade SSDs currently cannot sustain this load (the Intel Optane DC P5800X 800GB only provides 100,000 IOPS).
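The arithmetic above can be reproduced with a few lines of code. This is a back-of-the-envelope sketch: the function name `mpt_update_cost` is hypothetical, and it treats the asymptotic bounds as exact counts (k reads per level, one write per level), which is the simplification the example in the text relies on.

```python
import math


def mpt_update_cost(arity: int, num_keys: int) -> tuple[int, int]:
    """Return (reads, writes) for updating one key in a k-ary tree.

    Assumes a balanced tree, `arity` reads per level and one write per level.
    """
    depth = math.ceil(math.log(num_keys, arity))
    return arity * depth, depth


reads, writes = mpt_update_cost(arity=2, num_keys=16_000_000_000)

transfers_per_second = 10_000
keys_updated_per_transfer = 3
read_iops = transfers_per_second * keys_updated_per_transfer * reads
```

With these inputs the tree depth is 34, giving 68 reads and 34 writes per update and 2,040,000 read operations per second, which matches the figures in the text.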

The MPT tree currently faces several issues, including:

● Performance issues: Due to its hierarchical tree structure, MPT often requires traversing multiple nodes when updating data. Each state update (such as a transfer or smart contract execution) necessitates a large number of hash calculations, and the need to access many layers of nodes results in low performance.

● State bloat: Over time, state data such as smart contracts and account balances on the blockchain continuously increase, causing the size of the MPT tree to expand. This leads to nodes needing to store more and more data, posing challenges for storage space and node synchronization.

Ethereum State Bloat Situation (involving state pruning), Source: Etherscan

● Inefficient handling of partial updates: When a leaf node in the tree structure changes, all nodes along the path are affected. The Ethereum MPT needs to recalculate all hash values from the leaf node to the root node, which means that even a local state change requires updating a large number of nodes, thereby impacting performance.
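The cost of a partial update can be made concrete with a small sketch. It uses a plain binary Merkle tree with SHA-256 rather than Ethereum's hexary MPT with Keccak-256, so it only illustrates the leaf-to-root rehashing pattern; the `MerkleTree` class and its methods are hypothetical names.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


class MerkleTree:
    """Binary Merkle tree; the number of leaves is assumed to be a power of two."""

    def __init__(self, leaves):
        self.levels = [[h(leaf) for leaf in leaves]]
        while len(self.levels[-1]) > 1:
            prev = self.levels[-1]
            self.levels.append(
                [h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)]
            )

    @property
    def root(self) -> bytes:
        return self.levels[-1][0]

    def update(self, index: int, new_leaf: bytes) -> int:
        """Replace one leaf, rehash its path to the root, return the hash count."""
        self.levels[0][index] = h(new_leaf)
        hashes = 1
        for depth in range(1, len(self.levels)):
            index //= 2
            left = self.levels[depth - 1][2 * index]
            right = self.levels[depth - 1][2 * index + 1]
            self.levels[depth][index] = h(left + right)
            hashes += 1
        return hashes


tree = MerkleTree([b"alice:1", b"bob:0", b"carol:5", b"dave:7"])
old_root = tree.root
rehashed = tree.update(1, b"bob:1")
```

Changing a single balance in this four-leaf tree recomputes three hashes (the leaf, its parent, and the root) and changes the root. In a tree with billions of leaves the same path is dozens of levels deep, and each level is typically a separate database access.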

We can see that Ethereum faces various issues under the current MPT, and multiple solutions have been proposed, such as state pruning to address state bloat and a new Verkle Tree to tackle the MPT's performance problems. A database structure based on the Verkle Tree reduces the depth of the tree, which decreases the number of database accesses during partial updates, and uses vector commitments (primarily KZG commitments) to shrink proof sizes while further minimizing database access.

In summary, the MPT is outdated and faces many new challenges, so a change in the data storage method, such as adopting the Verkle Tree, has been included in Ethereum's roadmap. However, this is only a minor modification and does not solve the problems of parallel execution or the high IOPS demands under high concurrency.

New Public Chains Standardize Parallel Execution

As mentioned in the previous section, parallel execution means transitioning from a single lane to multiple lanes. At a multi-lane intersection, a middleware like traffic lights is often needed to coordinate and send information so that each lane can pass smoothly. The same applies to blockchains; in parallel transactions, we need to use a state identification middleware to separate transactions that do not interfere with each other, thus achieving parallel execution. The high TPS brought by parallel execution also implies a significant demand for IOPS from the underlying database. Almost all new Layer 1 chains have parallel execution as a standard feature.

There are various ways to classify the implementation solutions for parallel execution. Based on the underlying virtual machine, they can be divided into EVM (Sei, MegaETH, Monad) and Non-EVM (Solana, Aptos, Sui). According to the method of transaction state separation, they can be categorized into optimistic execution (assuming all transactions do not interfere, and if there is a state conflict, the conflicting transactions are rolled back and re-executed) and pre-declaration (developers need to declare the state data they will access in their programs). These classifications also imply trade-offs.

Next, we will compare various public chains based on improvements in state separation and database aspects, rather than distinguishing them by whether they are EVM or not.

MegaETH

MegaETH is strictly a heterogeneous blockchain primarily aimed at performance, relying on Ethereum's security while using EigenDA as the consensus and data availability layer. As the execution layer, it aims to maximize hardware performance to enhance TPS.

The optimization methods for transaction processing are divided into three types:

  1. State separation: It adopts a streaming block construction method based on transaction priority. This model is similar to Solana's PoS; in fact, Solana's transactions are also constructed in a streaming manner, but Solana does not have priorities and relies solely on speed for competition. MegaETH hopes to establish a priority algorithm for transactions.

  2. Database: To address the MPT's issues, it has constructed a new data structure to provide higher IOPS. Upon reviewing MegaETH's codebase, we found that it also references the design of the Verkle Trie.

  3. Hardware specialization: By centralizing and specializing the sequencer, it achieves in-memory computation, significantly improving I/O efficiency.

In fact, as a Layer 2, MegaETH hopes to delegate security and censorship resistance to Ethereum, which lets it optimize nodes as far as possible while ensuring the good behavior of the sequencer through the economic security of PoS. There are many challenging points here, but we will not elaborate. Although MegaETH is building toward parallel transaction execution, it has not yet achieved it. It aims first to push a single sequencer node to the limit of its hardware, and then to expand performance further through parallel execution.

Monad

Unlike MegaETH, Monad is a standalone chain that meets the two important points we introduced for optimizing parallel execution: state separation and database reconstruction. Here’s a brief overview of the specific methods used by Monad:

● Optimistic parallel execution: For transaction identification, it adopts the classic assumption that all transaction states are independent. When a state conflict occurs, the transaction is re-executed. This approach has so far worked well in Aptos's Block-STM mechanism.

● Database reconstruction: To improve database IOPS, Monad has rebuilt the MPT (Merkle Patricia Tree), the data structure required for EVM compatibility. Monad implements a Patricia Tree-compatible data structure and supports the latest asynchronous I/O in its database, allowing it to read one piece of state without waiting for a pending write operation to finish.

● Asynchronous execution: Although Ethereum formally distinguishes a consensus layer from an execution layer (the execution layer here means the execution nodes that run transactions on Ethereum itself, not Layer 2 as an execution layer), the two remain coupled. The execution layer updates the Merkle root after execution and provides it to the consensus layer, which then votes to reach consensus. Monad holds that the state is already determined once the transaction order is fixed, so nodes only need to reach consensus on the ordered transactions, without waiting for execution to reveal the result. This idea lets Monad decouple the consensus layer from the execution layer so that consensus and execution proceed simultaneously: nodes can execute the transactions of block N-1 while voting on block N.
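The optimistic approach in the first bullet can be sketched as follows. This is a deliberately simplified illustration of optimistic concurrency control, not Monad's implementation or the real Block-STM algorithm: the function names are hypothetical, transactions are plain `(sender, recipient, amount)` tuples with distinct sender and recipient, and the speculative phase runs in an ordinary loop where a real engine would use multiple cores.

```python
def run_transfer(tx, view):
    """Execute a transfer against a read-only view; return (read_set, write_set)."""
    sender, recipient, amount = tx
    reads = {sender: view.get(sender, 0), recipient: view.get(recipient, 0)}
    writes = {}
    if reads[sender] >= amount:
        writes[sender] = reads[sender] - amount
        writes[recipient] = reads[recipient] + amount
    return reads, writes


def execute_optimistically(state, block):
    """Speculate on a shared snapshot, then validate and commit in block order."""
    snapshot = dict(state)
    # Phase 1: every transaction assumes it is independent of the others.
    speculative = [run_transfer(tx, snapshot) for tx in block]

    # Phase 2: commit in order; re-execute if an earlier commit changed a read.
    re_executed = []
    for index, tx in enumerate(block):
        reads, writes = speculative[index]
        if any(state.get(key, 0) != value for key, value in reads.items()):
            reads, writes = run_transfer(tx, state)
            re_executed.append(index)
        state.update(writes)
    return re_executed


state = {"alice": 1, "bob": 0, "dave": 5}
block = [("alice", "bob", 1), ("bob", "carol", 1), ("dave", "erin", 2)]
re_executed = execute_optimistically(state, block)
```

Only the second transaction is re-executed, because it read Bob's balance before the first transaction's write was committed. The final state is the same as strict serial execution would produce.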

Of course, Monad has many other technologies, including a new consensus algorithm called MonadBFT, which together construct a high-performance parallel EVM Layer 1.

Aptos

Aptos was split from Facebook's Diem team and, along with Sui, is regarded as the dual powerhouse of Move. However, due to differing technical philosophies, the Move languages of the two have diverged significantly. Overall, Aptos adheres more closely to the original Diem design, while Sui has made substantial modifications to that design.

To address the issues that need to be solved for parallel execution:

● State identification: Optimistic parallel execution. Aptos developed the Block-STM parallel execution engine, which defaults to optimistically executing transactions. If a state conflict occurs, it re-executes the transaction. This technology has been widely accepted, with Polygon, Monad, Sei, and StarkNet all utilizing it.

● I/O improvements: Block-STM uses a multi-version data structure to avoid state conflicts. For example, if someone else is writing to a database, we normally cannot access it to avoid data conflicts. However, a multi-version data structure allows us to access past versions. The problem with this solution is that it can lead to significant resource consumption, as you need to generate a visible version for each thread.

● Asynchronous execution: Similar to Monad, transaction propagation, transaction sorting, transaction execution, state storage, and block validation all occur simultaneously.
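The multi-version idea in the second bullet can be sketched as a store that keeps one entry per writing transaction instead of a single current value. This is a minimal single-threaded illustration, not Aptos's actual data structure; the `MultiVersionStore` class and its methods are hypothetical names. A read by transaction i returns the value written by the highest-numbered transaction below i, or the pre-block value if there is none.

```python
import bisect


class MultiVersionStore:
    """Keeps every write tagged with the index of the transaction that made it."""

    def __init__(self, base):
        self.base = dict(base)   # state before the block
        self.versions = {}       # key -> sorted list of (tx_index, value)

    def write(self, key, tx_index, value):
        entries = self.versions.setdefault(key, [])
        pos = bisect.bisect_left(entries, (tx_index,))
        if pos < len(entries) and entries[pos][0] == tx_index:
            entries[pos] = (tx_index, value)   # overwrite on re-execution
        else:
            entries.insert(pos, (tx_index, value))

    def read(self, key, tx_index):
        """Return the latest value written by a transaction before `tx_index`."""
        entries = self.versions.get(key, [])
        pos = bisect.bisect_left(entries, (tx_index,))
        if pos == 0:
            return self.base.get(key, 0)
        return entries[pos - 1][1]


store = MultiVersionStore({"bob": 0})
store.write("bob", 0, 1)   # transaction 0 sets bob to 1
store.write("bob", 2, 5)   # transaction 2 sets bob to 5
```

With these writes, transaction 1 reads 1 for `bob`, transaction 3 reads 5, and transaction 0 still reads the base value 0. The extra entries per key are the resource cost the bullet mentions.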

Block-STM has now been adopted by most public chains, and Monad claims that this technology effectively relieves pressure on developers. However, Aptos faces the issue that the sophistication of Block-STM places very high demands on nodes, which calls for specialized hardware and more centralized solutions.

Sui

Sui, like Aptos, is also derived from the Diem project. In contrast to Aptos, which aims to reduce the burden on developers, Sui uses pessimistic parallelization, strictly verifying the state dependencies between transactions and employing a locking mechanism to prevent conflicts during execution.

● State identification: Unlike Aptos, Sui adopts pessimistic parallelization, requiring developers to declare their state access rather than leaving parallel state identification to the system. This increases the burden on developers, but it reduces the complexity of system design and enhances parallelization capabilities.

● I/O improvements: The current I/O improvements mainly involve improving the model. Ethereum uses an account-based model where each account maintains its data, but Sui replaces the account model with an Objects structure. This architectural improvement significantly affects the difficulty and peak performance of implementing parallelism.

Since Sui does not carry the EVM's historical legacy and compatibility constraints, it has made substantial changes on top of the Move framework. It replaces the account model with the innovative Objects abstraction, a level of innovation that is indeed more challenging. However, the Objects model brings many benefits for parallel processing, and it has allowed Sui to construct a unique network architecture that can, in theory, scale without limit.

Solana - FireDancer

Solana is regarded as a pioneer in parallel computing. Solana's philosophy has always been that blockchain systems should evolve with hardware advancements. Here are the current parallel processing methods of Solana:

● State identification: Like Sui, Solana also uses a deterministic parallel approach, which stems from Anatoly's past experience with embedded systems. In embedded systems, all states are typically declared in advance. This allows the CPU to know all dependencies, enabling it to preload the necessary parts into memory. The result is optimized system execution, but once again, it requires developers to do extra work in advance. On Solana, all memory dependencies of a program are required and declared in the constructed transactions (i.e., access lists), allowing the runtime to efficiently schedule and execute multiple transactions in parallel.

● Database: Solana has built Cloudbreak, its own custom account database, which employs an account-based model. Account data is distributed across multiple "shards," similar to dividing a library into several floors, and the number of floors can be increased or decreased as needed for load balancing. The data on SSDs is memory-mapped, so a pipelined design can operate on it quickly, and the database supports parallel access, processing 32 I/O operations simultaneously.
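The declared-access scheduling described in the first bullet can be sketched as follows. This is an illustrative batching scheme under simplifying assumptions, not Solana's actual scheduler: the function `schedule_batches` and the `(name, reads, writes)` transaction format are hypothetical. Two transactions conflict if one writes an account the other reads or writes. Each transaction is placed in the batch right after the last batch it conflicts with, so conflicting transactions keep their original order while every batch is free of conflicts.

```python
def schedule_batches(transactions):
    """Place declared-access transactions into conflict-free batches.

    `transactions` is a list of (name, read_accounts, write_accounts).
    Batches run one after another; transactions inside a batch could
    run in parallel.
    """
    batches = []
    for name, reads, writes in transactions:
        reads, writes = set(reads), set(writes)
        # Find the first batch that comes after every conflicting batch.
        target = 0
        for i, batch in enumerate(batches):
            write_conflict = writes & (batch["reads"] | batch["writes"])
            read_conflict = reads & batch["writes"]
            if write_conflict or read_conflict:
                target = i + 1
        if target == len(batches):
            batches.append({"txs": [], "reads": set(), "writes": set()})
        batches[target]["txs"].append(name)
        batches[target]["reads"] |= reads
        batches[target]["writes"] |= writes
    return [batch["txs"] for batch in batches]


txs = [
    ("t0", [], ["alice", "bob"]),
    ("t1", [], ["carol", "dave"]),
    ("t2", [], ["bob", "erin"]),       # writes "bob", conflicts with t0
    ("t3", ["oracle"], ["frank"]),     # read-only access to "oracle"
    ("t4", ["oracle"], ["grace"]),     # shared reads do not conflict
]
batches = schedule_batches(txs)
```

Here `t2` is pushed to a second batch because it writes an account that `t0` also writes, while `t3` and `t4` share the first batch because both only read `oracle`.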

By requiring developers to declare the state they need to access, Solana's deterministic parallel approach resembles traditional programming. In terms of program construction, developers need to build parallel-friendly applications themselves, while program scheduling and the asynchronous, pipelined runtime are the responsibility of the project. In terms of the database, Solana improves IOPS by parallelizing data access in its own database.

Meanwhile, the subsequent Solana iteration client, FireDancer, is driven by Jump Trading, a quantitative giant with strong engineering capabilities, and shares the same vision as Solana, aiming to eliminate software inefficiencies and push performance to the limits of hardware. Its improvements mainly focus on underlying hardware enhancements, including P2P propagation and hardware SIMD data parallel processing, which is very similar to MegaETH's ideas.

Sei

Sei currently uses:

● Optimistic parallel execution: Referencing Aptos's Block-STM design. Sei V1 used a pessimistic parallel scheme similar to Sui and Solana, requiring developers to declare the objects they use. With Sei V2, it switched to Aptos's optimistic parallel scheme. This change may benefit Sei, which may lack developers, by making it easier to migrate contracts from the EVM ecosystem.

SeiDB Design, Image Source: Sei

● SeiDB: The database solution builds on Cosmos's ADR-065 proposal and uses PebbleDB as its storage backend. The design divides data into active data and historical data. Active state is kept in MemIAVL, a memory-mapped variant of the IAVL tree (MemIAVL originated at Cronos), which is used for state commitments and provides the state root that consensus relies on, while historical data resides on SSDs. The idea behind MemIAVL is that each time a new block is committed, all change sets are extracted from that block's transactions and applied to the current IAVL tree in memory, generating a new version of the tree for the latest block and yielding the Merkle root hash for the block commit. In effect, memory is used for hot updates, so most state accesses are served from memory rather than from SSDs, which improves IOPS.

The main issue with SeiDB is that keeping the latest active data in memory risks data loss during a crash. MemIAVL therefore introduces WAL (write-ahead log) files and tree snapshots. In-memory data is snapshotted to local disk at regular intervals, and the snapshot interval must be tuned so that data growth does not cause OOM (Out of Memory) problems.
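The interplay between in-memory state, a write-ahead log, and periodic snapshots can be sketched in a few lines. This is a toy simulation, not SeiDB or MemIAVL code: `DurableDisk` and `InMemoryState` are hypothetical names, the "disk" is just a Python object that survives a simulated crash, and a flat dictionary replaces the IAVL tree.

```python
class DurableDisk:
    """Stands in for local disk: survives a crash of the in-memory state."""

    def __init__(self):
        self.snapshot = {}
        self.snapshot_version = 0
        self.wal = []  # list of (version, change_set) since the last snapshot


class InMemoryState:
    def __init__(self, disk, snapshot_interval):
        self.disk = disk
        self.snapshot_interval = snapshot_interval
        # Recovery: load the last snapshot, then replay the write-ahead log.
        self.data = dict(disk.snapshot)
        self.version = disk.snapshot_version
        for version, change_set in disk.wal:
            self.data.update(change_set)
            self.version = version

    def commit_block(self, change_set):
        self.version += 1
        # Log the change set durably before applying it in memory.
        self.disk.wal.append((self.version, dict(change_set)))
        self.data.update(change_set)
        if self.version % self.snapshot_interval == 0:
            # Snapshot the full state and truncate the log.
            self.disk.snapshot = dict(self.data)
            self.disk.snapshot_version = self.version
            self.disk.wal.clear()


disk = DurableDisk()
state = InMemoryState(disk, snapshot_interval=2)
state.commit_block({"alice": 5})
state.commit_block({"bob": 3})     # version 2: snapshot taken, log truncated
state.commit_block({"alice": 4})   # version 3: only in memory and in the log

del state                          # simulate a crash that wipes memory
recovered = InMemoryState(disk, snapshot_interval=2)
```

After the simulated crash, the recovered state is rebuilt from the version-2 snapshot plus the single logged change set. A shorter snapshot interval keeps the log small and recovery fast, at the price of more frequent full-state writes.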

Parallel Comparison

Full-node Requirement

Full Node Operation Requirements

FireDancer has the highest operational requirements for nodes, making it a performance monster. MegaETH's main performance requirement is that the sequencer needs 100+ cores; because sequencing is handled by a centralized node, the requirements for other full nodes are not high. SSD prices are currently relatively low, so we mainly look at CPU and memory requirements. We rank full node performance requirements from high to low as follows: FireDancer > Sei > Monad > Sui > Aptos > MegaETH > Ethereum.

Solutions

Current parallel processing solutions generally focus on three main software-level optimizations: 1. Database IOPS consumption issues 2. State identification issues 3. Pipeline asynchronous issues. State identification is currently divided into two camps: optimistic parallel execution and declarative programming. Both have their pros and cons, with optimistic parallel execution primarily based on Aptos's Block-STM solution, which is mainly adopted by Monad and Sei V2, while Sui, Solana, and Sei V1 use declarative programming, which is somewhat similar to traditional concurrent or asynchronous programming paradigms. Regarding the IOPS consumption issues of databases, the solutions vary among different projects:

Solution Comparison

One interesting point regarding data structures is that Monad tries to keep data on SSDs, whose read speed is much lower than that of memory, although they are much cheaper. Monad's choice of SSDs takes into account price, hardware thresholds, and the degree of parallelism, as SSDs can now support 32-channel I/O operations, which enhances parallel capabilities. In contrast, Solana and Sei choose memory mapping because memory is far faster than SSDs. One approach expands parallel channels horizontally, while the other reduces I/O cost vertically. This is also why Monad's node requirement is 32GB of memory, while Sei and Solana require more.

In addition, Ethereum's data structure evolved from the Patricia Tree to the Merkle Patricia Tree, so EVM-compatible public chains need to remain compatible with the Merkle Patricia Trie. As a result, they cannot redesign the way assets are abstracted as Aptos, Sui, and Solana have. Ethereum uses an account model, while Sui uses an Objects model, and Solana separates data from code in its account model, whereas in Ethereum the two are coupled.

Disruptive innovation comes from an uncompromising stance on the past. From a business perspective, it is also necessary to consider the developer community and past compatibility, as compatibility with EVM has its pros and cons.

Outlook

The main optimization targets for current parallel execution are clear: how to identify states and how to improve the speed of reading and storing data. The way data is stored can add overhead to every read and write, especially since the root verification introduced by the Merkle trie incurs extremely high I/O costs.

Although parallel execution seems to exponentially expand blockchain capabilities, it still has bottlenecks, which are inherent to distributed systems, including P2P networks, databases, and other issues. Blockchain computing still has a long way to go to reach the computational power of traditional computers. Currently, parallel execution itself faces some problems, including increasingly high hardware requirements for parallel execution, the centralization and censorship risks brought by increasingly specialized nodes, and the reliance on memory, centralized clusters, or nodes in mechanism design, which can also lead to crash risks. Meanwhile, cross-chain communication between parallel blockchains will also be a challenge.

Although there are still certain issues with parallel execution, various projects are exploring the best engineering practices, including the modular design architecture led by MegaETH and the monolithic chain design architecture led by Monad. The optimization solutions for parallel execution indeed prove that blockchain technology is continuously moving closer to traditional computer optimization solutions, with increasingly detailed optimizations at the lower levels, especially in data storage, hardware, and pipeline technologies. However, there are still bottlenecks and problems, leaving a vast exploration space for entrepreneurs.

Disruptive innovation comes from an uncompromising stance on the past. We eagerly anticipate seeing more entrepreneurs who are unwilling to compromise build stronger and more interesting products.

Disclaimer:

The above content is for reference only and should not be considered as any advice. Please seek professional advice before making any investments.

About Gate Ventures

Gate Ventures is the venture capital arm of Gate.io, focusing on investments in decentralized infrastructure, ecosystems, and applications that will reshape the world in the Web 3.0 era. Gate Ventures collaborates with global industry leaders to empower teams and startups with innovative thinking and capabilities to redefine the interaction patterns of society and finance.

Official Website: https://ventures.gate.io/ Twitter: https://x.com/gateventures Medium: https://medium.com/gateventures

