Querying Ethereum State with the Ethereum Data Warehouse
Token Flow
Token Flow

The Ethereum Data Warehouse (EDW) brings lots of new possibilities for blockchain data analysis. Its major and breakthrough feature is a detailed history of state and storage changes. It enables new types of queries that answer some basic questions in the simplest and most reliable way possible – by directly using variables of smart contracts. Let’s look at a few examples.
With the way that Ethereum storage is organized it is almost impossible to query storage directly. It is because memory locations are actually hashes of variable slots and keys. This is why so many rookies ask the same question over and over again: ‘How can I iterate over mappings (e.g. balanceOf of any ERC20 token)?’ and are so surprised to hear: ‘You cannot!’.
But you need it – so what are your options? Most people following advice similar to this suggestion https://stackoverflow.com/a/68863130, they build local databases or they use one of existing data providers like The Graph or Dune Analytics to query the history of events. If you are a blockchain data analyst you know this pattern well: “I want a history of state so I have to aggregate historical events to recreate smart contract values at a given point in time”.
This is equivalent to you logging in to your bank to check your balance but you only see the history of your transactions, not your balance. Of course you can go through all transactions and calculate your current balance, but is it really the most efficient and reliable way?
With EDW you can use the full history of events and calls in case you are actually interested in the history of ACTIONS but you also have a history of smart contract variables in case you need the history of STATE. This totally changes the approach to some typical problems as shown in the four examples below.
Problem 1 - What is the full list of token holders at a point in time?
- Analyze the contract code to fully understand its logic.
- Identify all types of events that can be used to identify new holders (typically it is enough to use
Transfer
events but for example WETH may emit aDeposit
event instead ofTransfer
so you have to use both of them). - Make sure that new holders cannot be added without emitting any event (in such case you probably should also aggregate relevant internal calls).
- Select all identified events since the contract deployment and aggregate them to get a distinct list of holders.
New approach:
- Identify the variable that acts as the register of token holders and their balances (typically
balanceOf
). - Query all distinct keys used in this mapping.
A sample query to get the full list of holders of DAI would be:
select distinct substr(location, 3, 42) as holder
from edw.raw.storage_diffs
where contract = '0x6b175474e89094c44da98b954eedeac495271d0f' and
location like '2[%' and status;
In the raw EDW layer we store memory locations as strings following the pattern:
slot[key1].field1[key2].field2
where slot
is a memory slot used for storage, key1
is an optional first mapping key, field1
is an index of structure field or 0, key2
is an optional second mapping key and field2
is an optional index of structure field or 0.
For DAI the contract address is 0x6b17...d0f
and the holders register is defined as: mapping (address => uint) public balanceOf
and it is stored in slot number 2
. This means that in EDW relevant memory locations are for example 2[0x2faf...ad2].0
(slot number 2, holder address 0x2faf…ad2 and no structure used). Additionally we add status
to the WHERE clause to only allow for successful storage changes.
This obviously requires some knowledge about a storage layout of a given contract but is quite easy to get and with the next decoded layer of EDW (to be released soon) it will be translated to human readable format and split into separate fields.
Problem 2 - What is the full status of a given contract?
Traditional approach:
1. Well… 😓
New approach:
1. Get the list of memory locations used by a contract (e.g. UNI2 pool) with a simple query:
select distinct location
from edw.raw.storage_diffs
where contract = '0xd0671e35d25d82ec6b75e3de43975c89968d1c91' and status
order by location;
2. If you additionally want to get the snapshot of storage variables at a given point in time you should look for the latest value written to this location before or at a given block. This requires using window functions which enable many new constructs not available with a simple GROUP BY clause. For example to get the latest values written to memory before or at a given block you should use last_value function
over a partition defined by each memory location with a proper ordering:
select distinct location,
last_value(curr_value)
over (partition by location order by block) as value
from edw.raw.storage_diffs
where contract = '0xd0671e35d25d82ec6b75e3de43975c89968d1c91' and
block <= 14000000 and status
order by location;
Problem 3 - Which contracts are allowed to call user proxies in the OpenSea Registry?
- Analyze the contract code to fully understand its logic.
- Find out that you can grant and revoke access by calling a few different functions (
startGrantAuthentication
,endGrantAuthentication
,revokeAuthentication
). - Realize that granting is actually a two step process with a delay period.
- Additionally realize that these functions do not emit any events.
- Write a non-trivial aggregation logic using the history of internal calls.
New approach:
- Identify the variable that acts as the register of allowed contracts – in this case it is
mapping(address => bool) public contracts
occupying slot number 4. - Query the keys with their current mapping values.
Sample query:
select distinct substr(location, 3, 42) as contract,
last_value(curr_value)
over (partition by location order by block) as allowance
from edw.raw.storage_diffs
where contract = '0xa5409ec958c83c3f309868babaca7c86dcb077c1' and
location like '4[%' and status;
Problem 4 - Which contracts store or use my address?
- Well… 😓
New approach:
- Simply look for all locations that use my address as a key and storage values that are equal to my address.
select distinct contract, location
from edw.raw.storage_diffs
where location like '%0x0035fc5208ef989c28d47e552e92b0c507d2b318%' or
curr_value = '0x0035fc5208ef989c28d47e552e92b0c507d2b318';
What comes Next
The sample problems presented above are just the tip of the iceberg.With the full history of state and storage diffs the Ethereum Data Warehouse creates endless new possibilities. It even provides the history of storage reads for use cases like e.g. oracle usage analysis.
With the next edition of EDW (coming soon) we will be adding a next decoded layer of data that will greatly simplify the queries by using variable names instead of slots and casting storage values to the proper types.
Stay tuned…
More articles

Discover AnyTx – Token Flow’s advanced transaction decoder
AnyTx transaction decoder reveals the full spectrum of blockchain interactions to get better understanding of what’s going on chain.

Token Flow and Argent announce data partnership for StarkNet and Ethereum
Argent will use Token Flow’s fully queryable blockchain history for its own analysis of on-chain activities.

Token Flow raises $12m in Series A funding to expand its blockchain data platform
First close of Series A with $12m committed to expand Token Flow’s unique state processing technology which takes blockchain data analytics to a whole new level.