Storage Explosion in Blockchain Technology: Problems, Analysis, and Optimization
  • joint
  • 2022-09-21

The boom in decentralized applications such as DeFi and GameFi has greatly increased the demand for high-performance blockchain technology with low transaction fees. However, a key challenge in building high-performance blockchains is storage explosion. Take an Ethereum archive node, which holds roughly 9TB of data. Breaking down its storage usage, block data accounts for only about 300GB (block heights 0 to 13.6M), far less than 9TB. So where do the remaining 8.7TB come from?

In fact, the archive node executes all blocks and retains all historical data, including: blocks, states, and transaction receipts. State is the main component of this 8.7TB. So sometimes we refer to storage explosions as "state explosions." But why is the state so big?

What is the Ethereum state?

The Ethereum state is a Merkle Patricia trie (MPT). Its leaf nodes hold the mapping from address (0x...) to account, where the account stores, among other fields, the balance and nonce associated with the address. The internal nodes maintain the tree structure so that the root hash of the entire tree can be computed quickly.
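To make the idea of a root hash over account data concrete, here is a minimal sketch. It is a deliberate simplification, not Ethereum's actual encoding: it uses a plain binary Merkle tree instead of a Patricia trie, SHA-256 instead of Keccak-256, and a hypothetical `Account` type holding only the nonce and balance. It also recomputes the whole tree on each call, whereas a real MPT reuses internal nodes to avoid that.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Account:
    # Hypothetical, reduced account record: only the two fields named in the text.
    nonce: int
    balance: int


def h(data: bytes) -> bytes:
    # SHA-256 stands in for Keccak-256 in this sketch.
    return hashlib.sha256(data).digest()


def leaf_hash(address: str, account: Account) -> bytes:
    # Ad hoc leaf encoding chosen for illustration only.
    return h(f"{address}:{account.nonce}:{account.balance}".encode())


def merkle_root(state: dict) -> bytes:
    # Hash the leaves in sorted address order, then combine pairs level by level.
    level = [leaf_hash(addr, state[addr]) for addr in sorted(state)]
    if not level:
        return h(b"")
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last hash to get an even count
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


state = {"0xaa": Account(nonce=0, balance=100), "0xbb": Account(nonce=3, balance=50)}
root_before = merkle_root(state)

# A single balance/nonce change on one account changes the root hash.
state["0xaa"] = Account(nonce=1, balance=90)
root_after = merkle_root(state)
print(root_before.hex() != root_after.hex())
```

The point of the sketch is only that the root commits to every account: changing any leaf changes the root.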

Because an archive node keeps the historical state at every block, any update to the MPT creates O(log(N)) new internal nodes (N being the number of leaves), and the old internal nodes are never deleted.
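The growth described here can be illustrated with a small persistent (copy-on-write) tree. This is a sketch under simplifying assumptions: a fixed-depth binary tree keyed by small integers rather than an MPT, with hypothetical names `Node`, `update`, and `get`. Each update copies only the path from the root to the changed leaf, and every older root remains readable.

```python
from dataclasses import dataclass
from typing import Optional

DEPTH = 4  # the tree covers 2**4 = 16 key slots


@dataclass(frozen=True)
class Node:
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    value: Optional[int] = None  # set on leaves only


def update(root, key, value, depth=DEPTH):
    """Return (new_root, nodes_created). The old root is left untouched."""
    if depth == 0:
        return Node(value=value), 1
    left = root.left if root is not None else None
    right = root.right if root is not None else None
    # Walk down by the key's bits, most significant bit first.
    if (key >> (depth - 1)) & 1 == 0:
        left, created = update(left, key, value, depth - 1)
    else:
        right, created = update(right, key, value, depth - 1)
    # Copy this node; the untouched subtree is shared with the old version.
    return Node(left=left, right=right), created + 1


def get(root, key, depth=DEPTH):
    node = root
    for d in range(depth, 0, -1):
        if node is None:
            return None
        node = node.right if (key >> (d - 1)) & 1 else node.left
    return node.value if node is not None else None


# An "archive node" keeps the root of every version.
roots = []
root = None
total_created = 0
for key, value in [(3, 100), (9, 50), (3, 70)]:
    root, created = update(root, key, value)
    roots.append(root)
    total_created += created

print(total_created)                          # DEPTH + 1 nodes per update
print(get(roots[0], 3), get(roots[2], 3))     # old and new values both readable
```

Each update adds `DEPTH + 1` nodes, where the depth plays the role of log(N), and none are ever freed. That accumulation is what an archive node pays for.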

Geth's full node

To solve the problem of exploding archive node state, the genius engineers at Geth created a new mode called "pruning" mode, which stores the MPT only periodically. Take a simplified example in which the node stores the MPT only for every third block. To obtain the state at a block for which no MPT is stored, the node must load the most recent stored state before that block and replay the subsequent transactions. Storing the MPT periodically reduces the storage size of the state significantly. According to Etherscan, a full Geth node currently holds about 1TB of blockchain data.
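The simplified "every third block" example can be sketched as follows. This mirrors only the simplified description above, not Geth's actual implementation. The `PrunedNode` class, the balance-only state, and the transfer-tuple block format are all assumptions made for illustration.

```python
SNAPSHOT_INTERVAL = 3  # store the state only at every third block


def apply_block(state, block):
    """Apply a block of (sender, receiver, amount) transfers; return a new state."""
    new_state = dict(state)
    for sender, receiver, amount in block:
        new_state[sender] = new_state.get(sender, 0) - amount
        new_state[receiver] = new_state.get(receiver, 0) + amount
    return new_state


class PrunedNode:
    def __init__(self, genesis):
        self.blocks = []                      # every block is kept
        self.snapshots = {0: dict(genesis)}   # height -> stored state
        self.latest = dict(genesis)

    def add_block(self, block):
        self.blocks.append(block)
        self.latest = apply_block(self.latest, block)
        height = len(self.blocks)
        if height % SNAPSHOT_INTERVAL == 0:
            self.snapshots[height] = self.latest

    def state_at(self, height):
        # Load the most recent stored state at or before `height`,
        # then replay the blocks in between.
        base = max(h for h in self.snapshots if h <= height)
        state = self.snapshots[base]
        for block in self.blocks[base:height]:
            state = apply_block(state, block)
        return state


node = PrunedNode({"alice": 100})
for _ in range(5):
    node.add_block([("alice", "bob", 10)])

print(sorted(node.snapshots))  # heights that have a stored state
print(node.state_at(4))        # rebuilt from the height-3 snapshot plus one replayed block
```

The trade-off is visible in `state_at`: storage shrinks by roughly the snapshot interval, while a historical query may have to replay up to `SNAPSHOT_INTERVAL - 1` blocks.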

Geth's full node for fast synchronization

One problem with running a node from the genesis block is that replaying all transactions takes a long time. Typically, such a node needs weeks to catch up with the latest state of the network. To speed up node startup, Geth also provides a fast synchronization mode, which downloads the MPT of the latest stable block without replaying the history or maintaining the MPTs of earlier blocks. After downloading the MPT, the node replays new blocks just as a full node does (with periodic state storage).

The problem

With the current Ethereum storage size of 447GB and a throughput of 15 TPS, we expect that a typically configured computer with a 1TB SSD should be able to run an Ethereum node for quite some time (say, years). So does storage explosion, or state explosion, really exist? Maybe not in the next few years, but what if we could scale the Ethereum Virtual Machine (EVM) to hundreds or thousands of TPS?

Let's turn our attention to another EVM-based chain, Binance Smart Chain (BSC). As of December 8, 2021, BSC had approximately 984GB of on-chain data, of which approximately 550GB is block data and 400GB is state, and approximately 2,066.23 million transactions at 100 TPS. Using the number of transactions to project the data size: at 100 TPS, the chain processes about 3,153M transactions per year (TPY).
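The projection can be reproduced with a few lines of arithmetic. This is a rough sketch under two assumptions not stated in the text: that the transaction count reads as 2,066.23 million, and that on-chain data grows linearly with the number of transactions. The function and variable names are illustrative.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def transactions_per_year(tps):
    return tps * SECONDS_PER_YEAR


# Figures quoted for BSC as of December 8, 2021.
total_bytes = 984 * 10**9   # ~984 GB of on-chain data
total_txs = 2066.23e6       # assumed reading of the quoted transaction count

tpy = transactions_per_year(100)
bytes_per_tx = total_bytes / total_txs   # average on-chain footprint per transaction
yearly_growth = bytes_per_tx * tpy       # assumes growth linear in transaction count

print(f"TPY at 100 TPS: {tpy:,}")
print(f"average bytes per transaction: {bytes_per_tx:.0f}")
print(f"projected growth per year: {yearly_growth / 1e12:.2f} TB")
```

Under these assumptions, the estimate comes out on the order of a terabyte or more of new data per year at 100 TPS.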

To sum up, if BSC maintains its current rate or goes even higher, it will soon reach the same storage size as the Ethereum archive node, which is almost impossible for an ordinary computer to run.

Block storage optimization

With snapshot blocks, we can further reduce the amount of block data required in a node by storing only the following data: the pre-execution state snapshot of the latest snapshot block, that is, the post-execution state of the (latest - 1) snapshot block. We can do some simple math on the storage cost. Assuming an epoch duration of 2 weeks, and reading the factor of 100 as bytes per transaction, the block replay size is 2 (epochs) * 14 (days) * 24 (hours) * 3600 (seconds) * 100 (bytes per transaction) * 1000 (TPS) ≈ 242GB (about 225GiB). This figure does not grow over time.
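The storage estimate is a plain product, which the following sketch evaluates. Treating the factor of 100 as bytes per transaction is an assumption about the formula, and the function name `replay_bytes` is illustrative.

```python
def replay_bytes(epochs, epoch_days, bytes_per_tx, tps):
    """Bytes of block data needed to replay `epochs` epochs of transactions."""
    seconds = epochs * epoch_days * 24 * 3600
    return seconds * bytes_per_tx * tps


# 2 epochs of 14 days each, 100 bytes per transaction (assumed), 1000 TPS.
size = replay_bytes(epochs=2, epoch_days=14, bytes_per_tx=100, tps=1000)

print(f"{size / 1e9:.2f} GB (decimal)")
print(f"{size / 2**30:.2f} GiB (binary)")
```

Because the window is bounded by a fixed number of epochs, the result depends only on throughput and transaction size, not on the age of the chain.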

Conclusion

We analyzed the current storage usage of Ethereum: not only blocks but also state consume a lot of space. When TPS > 1000, the storage usage becomes prohibitively high. We propose optimizing both block and state storage, so that an ordinary computer with 2TB of storage should be able to run a node over the long term.
