Raw blockchain data is a public good: objective and true. It's only natural to start building your analytics on it, but you quickly realise it falls short for any meaningful analysis. What do you need instead? Decoded data. Let's dive in!
First things first: what is decoding? It simplifies smart contract analysis by converting raw, hard-to-understand data into human-readable tables, applying "semantics" derived from the specific smart contract's Application Binary Interface (ABI) and/or from generic signature libraries.
Let's look at an example using WETH transfers.
This is the data in its raw form:
select block_date,
    -- strip the '0x' prefix and zero-padding to recover the 20-byte addresses
    '0x' || LTRIM(SUBSTR(event_topics[1], 3), '0') AS from_address,
    '0x' || LTRIM(SUBSTR(event_topics[2], 3), '0') AS to_address,
    -- log_data holds the amount as hex; convert and scale by WETH's 18 decimals
    common.common.hex_to_int(log_data) / pow(10,18) as value
from mode.core.RECEIPT_EVENTS
where contract_address = '0x4200000000000000000000000000000000000006'
    -- topic 0 is the keccak256 hash of the Transfer(address,address,uint256) signature
    and event_topics[0] = '0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef'
    and block_date <= '2024-08-01';
…and this is the same query, but over decoded data:
select block_date,
    parameters['src'] as from_address,
    parameters['dst'] as to_address,
    -- scale by WETH's 18 decimals to get a human-readable amount
    parameters['value'] :: NUMBER(38,0) / pow(10,18) as amount
from mode.core.RECEIPT_EVENTS
where contract_address = '0x4200000000000000000000000000000000000006'
    and event_name = 'Transfer'
    and block_date <= '2024-08-01';
We can see in this simple WETH example that the decoded version exposes human-readable parameters (src, dst, value) in place of numbered topics, and the event name is visible as well.
Building analytics with raw data requires a great deal of understanding of the contract structure and is much more prone to errors. In the example given above, the analyst must:
- know that event_topics[0] holds the keccak256 hash of the event signature, just to filter for the right rows
- understand how the topics are ordered
- know where to substring
- know how much to substring
In the decoded version, they only need to understand the general behaviour of the event they want to look at. The advantage of decoded over raw data is most apparent for contracts that aren't verified in block explorers. In this scenario, there is no way for the analyst to learn the structure of the contract from its source. Even though the ABI may not be available, the contract can often still be decoded using signatures from similar contracts.
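As a sketch of what this unlocks (reusing the same mode.core.RECEIPT_EVENTS table and the WETH address from above as a stand-in for any contract), you can survey everything a contract emits without ever reading its source:

select event_name,
    count(*) as occurrences
from mode.core.RECEIPT_EVENTS
where contract_address = '0x4200000000000000000000000000000000000006'
group by event_name
order by occurrences desc;

Every event the signature library recognises comes back already named, whether or not the contract was ever verified.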
Besides this, decoded data is clearly labelled and converted to a usable format. Let's take the example of function calls to IONIC pools. With decoded data, we are able to get all function calls with just one query.
select block_date,
    function_name,
    count(*)
from mode.core.CALLS
where to_address = lower('0x2BE717340023C9e14C1Bb12cb3ecBcfd3c3fB038')
    -- call_path = '' restricts to top-level calls, matching the raw-transaction query below
    and call_path = ''
group by block_date, function_name
order by block_date, function_name;
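As a sketch of a natural next step (this assumes mode.core.CALLS exposes a decoded parameters map like the RECEIPT_EVENTS table above does; check your schema), the decoded arguments of each call are just as queryable as event parameters:

-- assumes a decoded parameters column on CALLS, mirroring RECEIPT_EVENTS
select block_date,
    parameters
from mode.core.CALLS
where to_address = lower('0x2BE717340023C9e14C1Bb12cb3ecBcfd3c3fB038')
    and function_name = 'borrow'
    and call_path = ''
limit 10;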
Without decoded data, we would have to find and list every function selector manually. If we miss any, the results will likely be incorrect.
select block_date,
    -- the first 4 bytes of calldata (10 hex chars incl. '0x') identify the function
    case substr(call_data, 1, 10)
        when '0xc2998238' then 'enterMarkets'
        when '0xede4edd0' then 'exitMarket'
        when '0xa0712d68' then 'mint'
        when '0x852a12e3' then 'redeemUnderlying'
        when '0x0e752702' then 'repayBorrow'
        when '0xc5ebeaec' then 'borrow'
        when '0xdb006a75' then 'redeem'
        else 'Other' end as function_name,
    count(*)
from mode.core.transactions
where to_address = lower('0x2BE717340023C9e14C1Bb12cb3ecBcfd3c3fB038')
group by block_date, function_name
order by block_date, function_name;
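The fragility of the manual mapping is easy to demonstrate. A sketch against the same raw table and selector list as above: any selector left out of the CASE simply hides inside 'Other', and you have to go hunting for it yourself.

-- surface selectors the CASE above does not map, so 'Other' does not
-- silently swallow real activity
select substr(call_data, 1, 10) as selector,
    count(*) as calls
from mode.core.transactions
where to_address = lower('0x2BE717340023C9e14C1Bb12cb3ecBcfd3c3fB038')
    and substr(call_data, 1, 10) not in (
        '0xc2998238', '0xede4edd0', '0xa0712d68', '0x852a12e3',
        '0x0e752702', '0xc5ebeaec', '0xdb006a75')
group by selector
order by calls desc;

With decoded data none of this bookkeeping exists: new functions simply show up in function_name.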
Of course, having fewer stipulations to make in the query keeps it shorter and simpler. It also reduces the possibility of error, making your analytics more reliable.
Decoded by Token Flow
Thanks to our extensive semantics and signature library, Token Flow automatically decodes the contents of the chain from genesis to head. There is no need to provide an ABI, and no delay waiting for the specific contract you want to analyze to be decoded. Even contracts without verified code or a published ABI can often still be decoded!
Since our tables in Studio and Snowflake already include decoded columns, there is no need to decode raw data in-query, or to look across raw and decoded tables separately: everything is in one place, ready to be analyzed.