Querying Ethereum State with the Ethereum Data Warehouse
The Ethereum Data Warehouse (EDW) brings lots of new possibilities for blockchain data analysis. Its major and breakthrough feature is a detailed history of state and storage changes. It enables new types of queries that answer some basic questions in the simplest and most reliable way possible – by directly using variables of smart contracts. Let’s look at a few examples.
With the way that Ethereum storage is organized it is almost impossible to query storage directly. It is because memory locations are actually hashes of variable slots and keys. This is why so many rookies ask the same question over and over again: ‘How can I iterate over mappings (e.g. balanceOf of any ERC20 token)?’ and are so surprised to hear: ‘You cannot!’.
But you need it – so what are your options? Most people following advice similar to this suggestion https://stackoverflow.com/a/68863130, they build local databases or they use one of existing data providers like The Graph or Dune Analytics to query the history of events. If you are a blockchain data analyst you know this pattern well: “I want a history of state so I have to aggregate historical events to recreate smart contract values at a given point in time”.
This is equivalent to you logging in to your bank to check your balance but you only see the history of your transactions, not your balance. Of course you can go through all transactions and calculate your current balance, but is it really the most efficient and reliable way?
With EDW you can use the full history of events and calls in case you are actually interested in the history of ACTIONS but you also have a history of smart contract variables in case you need the history of STATE. This totally changes the approach to some typical problems as shown in the four examples below.
Problem 1 - What is the full list of token holders at a point in time?
- Analyze the contract code to fully understand its logic.
- Identify all types of events that can be used to identify new holders (typically it is enough to use
Transferevents but for example WETH may emit a
Depositevent instead of
Transferso you have to use both of them).
- Make sure that new holders cannot be added without emitting any event (in such case you probably should also aggregate relevant internal calls).
- Select all identified events since the contract deployment and aggregate them to get a distinct list of holders.
- Identify the variable that acts as the register of token holders and their balances (typically
- Query all distinct keys used in this mapping.
A sample query to get the full list of holders of DAI would be:
select distinct substr(location, 3, 42) as holder from edw.raw.storage_diffs where contract = '0x6b175474e89094c44da98b954eedeac495271d0f' and location like '2[%' and status;
In the raw EDW layer we store memory locations as strings following the pattern:
slot is a memory slot used for storage,
key1 is an optional first mapping key,
field1 is an index of structure field or 0,
key2 is an optional second mapping key and
field2 is an optional index of structure field or 0.
For DAI the contract address is
0x6b17...d0f and the holders register is defined as:
mapping (address => uint) public balanceOf and it is stored in slot number
2. This means that in EDW relevant memory locations are for example
2[0x2faf...ad2].0 (slot number 2, holder address 0x2faf…ad2 and no structure used). Additionally we add
status to the WHERE clause to only allow for successful storage changes.
This obviously requires some knowledge about a storage layout of a given contract but is quite easy to get and with the next decoded layer of EDW (to be released soon) it will be translated to human readable format and split into separate fields.
Problem 2 - What is the full status of a given contract?
1. Well… 😓
1. Get the list of memory locations used by a contract (e.g. UNI2 pool) with a simple query:
select distinct location from edw.raw.storage_diffs where contract = '0xd0671e35d25d82ec6b75e3de43975c89968d1c91' and status order by location;
2. If you additionally want to get the snapshot of storage variables at a given point in time you should look for the latest value written to this location before or at a given block. This requires using window functions which enable many new constructs not available with a simple GROUP BY clause. For example to get the latest values written to memory before or at a given block you should use
last_value function over a partition defined by each memory location with a proper ordering:
select distinct location, last_value(curr_value) over (partition by location order by block) as value from edw.raw.storage_diffs where contract = '0xd0671e35d25d82ec6b75e3de43975c89968d1c91' and block <= 14000000 and status order by location;
Problem 3 - Which contracts are allowed to call user proxies in the OpenSea Registry?
- Analyze the contract code to fully understand its logic.
- Find out that you can grant and revoke access by calling a few different functions (
- Realize that granting is actually a two step process with a delay period.
- Additionally realize that these functions do not emit any events.
- Write a non-trivial aggregation logic using the history of internal calls.
- Identify the variable that acts as the register of allowed contracts – in this case it is
mapping(address => bool) public contractsoccupying slot number 4.
- Query the keys with their current mapping values.
select distinct substr(location, 3, 42) as contract, last_value(curr_value) over (partition by location order by block) as allowance from edw.raw.storage_diffs where contract = '0xa5409ec958c83c3f309868babaca7c86dcb077c1' and location like '4[%' and status;
Problem 4 - Which contracts store or use my address?
- Well… 😓
- Simply look for all locations that use my address as a key and storage values that are equal to my address.
select distinct contract, location from edw.raw.storage_diffs where location like '%0x0035fc5208ef989c28d47e552e92b0c507d2b318%' or curr_value = '0x0035fc5208ef989c28d47e552e92b0c507d2b318';
What comes NextThe sample problems presented above are just the tip of the iceberg.
With the full history of state and storage diffs the Ethereum Data Warehouse creates endless new possibilities. It even provides the history of storage reads for use cases like e.g. oracle usage analysis.
With the next edition of EDW (coming soon) we will be adding a next decoded layer of data that will greatly simplify the queries by using variable names instead of slots and casting storage values to the proper types.
AnyTx transaction decoder reveals the full spectrum of blockchain interactions to get better understanding of what’s going on chain.
Argent will use Token Flow’s fully queryable blockchain history for its own analysis of on-chain activities.
First close of Series A with $12m committed to expand Token Flow’s unique state processing technology which takes blockchain data analytics to a whole new level.