WTF is Data Availability?
Lots of data availability discussion these days, but minimal understanding. Let's change that (with pictures).
Short Answer
Remember during school when your math teacher would exclaim "show your work!" when you turned in homework (meaning you couldn't just give the answers)?
That's basically what Data Availability, in the context of blockchains, is referring to – L2s "showing their work" to the rest of the ecosystem for how they got to the new account balances each batch.
If you want to know how or why that's the case, read on. If not – congrats, you've completed this lesson!
Long Answer
If you're here, you've elected to read me nerd-ramble on a niche property within a niche category for a niche asset class. I salute you.
We'll approach this similarly to the Base Rollup explainer, which means light vocabulary, an overview that builds up the topic piece by piece, and a conclusion where I give some thoughts on the market direction for the subject. Enjoy!
NOTE: to keep things simple, we'll focus on a "traditional" L2 construction within the Ethereum ecosystem. That is to say, the example L2s will all be connected to Ethereum for settlement, but may use alternative solutions for DA.
GLOSSARY
Data Availability Layer: The location used to store the data which is to be made available (tx history). These can be other blockchains (both general and purpose-built), other decentralized services like IPFS/AVSs or completely centralized services like AWS.
Blobs: Short for "Binary Large Object," these are the typical format of the compressed, batched L2 transaction history posted to a Data Availability Layer. Literally just giant, dumb blobs/chunks of data that, when decoded, let people know what happened on the L2 for a period.
Settlement Layer: The location where an L2 posts its proofs. Typically an L1 which can guarantee, through decentralization and stake, some level of security that those proofs won't change once finalized.
Proofs: Using our analogy from above, these are the "answers" of the new account balances on the L2, given by an L2 to the L1. These can come in two forms: optimistic and ZK (we'll cover distinctions later) and wherever these are posted is referred to as the Settlement Layer of the L2.
That should cover it.
Now that we have that, we can look at what information L2s make available, why and what their options are.
L2 POSTING OVERVIEW
Let's start with a picture answering the question "wtf does an L2 even send to other layers?" so that we can separate what goes where.
Again, calling back to our analogy, you can expect an L2 to send:
A proof, which is the "answer" to the question "who owns what in your ecosystem," and
The transaction history since the last time they reported (aka their "work" to get to their proof/answer)
That's really it. Only optimistic L2s are strictly required to send both, because the proof they use (fraud) only works if others can re-check the work. ZK rollups don't technically need to send the tx data, because their proof (validity) can be mathematically verified without needing to see the work.
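To make the "answer" vs. "work" split concrete, here's a minimal sketch in Python. Everything in it is a toy assumption for illustration: the `Transfer` type, the SHA-256 hash of sorted balances standing in for a real state commitment, and the JSON + zlib encoding standing in for a real batch format. No actual L2 works exactly like this.

```python
import hashlib
import json
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    """Hypothetical L2 transaction: move `amount` from sender to recipient."""
    sender: str
    recipient: str
    amount: int


def apply_batch(balances, batch):
    """Return the new balances after applying each transfer in order."""
    new_balances = dict(balances)
    for tx in batch:
        if new_balances.get(tx.sender, 0) < tx.amount:
            raise ValueError(f"insufficient funds for {tx.sender}")
        new_balances[tx.sender] = new_balances.get(tx.sender, 0) - tx.amount
        new_balances[tx.recipient] = new_balances.get(tx.recipient, 0) + tx.amount
    return new_balances


def state_commitment(balances):
    """Toy stand-in for a state root: a hash over the sorted balances."""
    encoded = json.dumps(sorted(balances.items())).encode()
    return hashlib.sha256(encoded).hexdigest()


def build_posting(balances, batch):
    """Split one batch into the 'answer' (settlement) and the 'work' (DA)."""
    new_balances = apply_batch(balances, batch)
    tx_bytes = json.dumps(
        [[tx.sender, tx.recipient, tx.amount] for tx in batch]
    ).encode()
    return {
        "to_settlement_layer": state_commitment(new_balances),  # the answer
        "to_da_layer": zlib.compress(tx_bytes),                 # the work
    }


posting = build_posting(
    {"alice": 10, "bob": 0},
    [Transfer("alice", "bob", 4)],
)
```

The thing to notice is that the two outputs are independent: the commitment is tiny and fixed-size, while the compressed history grows with activity, which is why where the second one lives becomes its own decision.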
Now let's talk about which of those two datapoints qualifies for our article topic.
SO, WHAT GOES WHERE?
To keep it short, a picture:
Proof: Goes to Ethereum so that it knows which accounts own what, and how much. This is what lets users take their bridged funds out of the L2 contract (because it knows the balances).
TX History: Doesn't technically have to go anywhere, but L2s choose different providers because each has a unique value prop that might benefit the L2 sending the data.
Those value props include:
Capacity: Can the Data Availability Layer even consume as much data as the L2 is producing? (@EclipseFND and @megaeth_labs have opted for @CelestiaOrg and @eigen_da, respectively, because ETH can't handle their throughput)
Cost: Each Data Availability Layer charges a different price per kB of storage, and that price may rise or fall with usage relative to capacity. Money spent posting for DA reduces an L2's profit margins.
Security: You want to be sure that the location you're putting this data isn't corruptible so that it can be accessed with assurances that a malicious actor or system hasn't manipulated or withheld the data to perform an attack on the L2.
Integration Ease: What information from the DA Layer is available to the L2's smart contracts on the L1?
The combination of the above properties, and their implications, makes each DA layer better suited to some L2s than others. There is also a conversation to be had about read speeds of each of these layers, but that's a more nascent discussion right now.
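The Capacity and Cost properties above lend themselves to a quick back-of-the-envelope sketch. The layer names, prices and capacities below are made-up placeholders in arbitrary units, not real quotes for any DA layer, and Security and Integration Ease are deliberately left out because they don't reduce to a single number.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass(frozen=True)
class DALayer:
    """Hypothetical DA layer described only by price and capacity."""
    name: str
    price_per_kb: float       # placeholder price, arbitrary units
    max_kb_per_second: float  # placeholder capacity


def daily_da_cost(layer, l2_kb_per_second):
    """Amount spent per day posting the L2's data to this layer."""
    return layer.price_per_kb * l2_kb_per_second * SECONDS_PER_DAY


def cheapest_viable(layers, l2_kb_per_second):
    """Capacity is a hard filter; cost picks among whatever survives it."""
    viable = [l for l in layers if l.max_kb_per_second >= l2_kb_per_second]
    if not viable:
        return None
    return min(viable, key=lambda l: daily_da_cost(l, l2_kb_per_second))


layers = [
    DALayer("layer_a", price_per_kb=2.0, max_kb_per_second=50),
    DALayer("layer_b", price_per_kb=0.5, max_kb_per_second=1_000),
    DALayer("layer_c", price_per_kb=0.25, max_kb_per_second=10),
]

low_throughput_choice = cheapest_viable(layers, 5)
high_throughput_choice = cheapest_viable(layers, 40)
```

With these placeholder numbers, a low-throughput L2 can take the cheapest layer, while a higher-throughput one gets pushed to whichever layer can actually keep up. Same logic as the Capacity bullet: throughput narrows the options before price even enters the picture.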
At present, I'd give them something like the below in terms of the four categories scaled to one another:
NOTE: Security isn't one-dimensional or 100% calculable, so I'm basing the above on Ethereum being the most decentralized network, Celestia being live (but with less economic security) and EigenDA being a subset of ETH validators and younger than the other two.
OKAY, BUT WHO NEEDS THIS INFO ANYWAYS?
The next and final question in our DA overview. We know what we're storing (answers + work) and where each of those things is being stored (Ethereum + DA Layers), but who even cares about it and why?
Answer: you do.
Well, maybe not you specifically (right now), but having this data available gives individual users and a whole host of ecosystem participants the ability to:
Check L2 activity for fraudulent or malicious behavior (in order to prevent it)
Verify the current state of the chain
Retrieve chain history
It's actually quite impactful, albeit infrequently used (which is why it's a hot topic in L1<>L2 debate land).
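The first two items in the list above boil down to the same move: pull the "work" from the DA layer, replay it, and see whether you land on the "answer" the L2 posted. Here's a minimal sketch of that check. The JSON + zlib batch encoding and the SHA-256-of-balances commitment are toy assumptions for illustration, not how any specific rollup encodes things.

```python
import hashlib
import json
import zlib


def commitment(balances):
    """Toy stand-in for a state root: a hash over the sorted balances."""
    encoded = json.dumps(sorted(balances.items())).encode()
    return hashlib.sha256(encoded).hexdigest()


def replay(prev_balances, da_blob):
    """Decode the posted tx history and re-apply it to the previous state."""
    txs = json.loads(zlib.decompress(da_blob))
    state = dict(prev_balances)
    for sender, recipient, amount in txs:
        if state.get(sender, 0) < amount:
            raise ValueError("invalid transfer in batch")
        state[sender] = state.get(sender, 0) - amount
        state[recipient] = state.get(recipient, 0) + amount
    return state


def is_claim_honest(prev_balances, da_blob, claimed_commitment):
    """True only if replaying the work reproduces the claimed answer."""
    try:
        recomputed = commitment(replay(prev_balances, da_blob))
    except ValueError:
        return False
    return recomputed == claimed_commitment


prev = {"alice": 10, "bob": 0}
blob = zlib.compress(json.dumps([["alice", "bob", 4]]).encode())

honest_claim = commitment({"alice": 6, "bob": 4})
dishonest_claim = commitment({"alice": 0, "bob": 10})
```

Note that `is_claim_honest` can't even start without `da_blob`. If the data is withheld, nobody can run this check, which is the whole reason the availability part matters.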
CONCLUSION
Whew - we made it. Let's summarize:
Data Availability is like L2s "showing their work" to the rest of the ecosystem
This "work" is the history of transactions that took place since the last time the L2 broadcast information, and it can be sent to one or several Data Availability Layers depending on the L2's needs
Users and ecosystem participants access the info to make sure no one is lying or being malicious on the L2
What do I ultimately think will happen in the DA landscape? I've been pretty public about thinking that most L2s will migrate to alternative DA solutions like Celestia and EigenDA, because they're given both the incentives and the excuses to.
Ethereum as a DA Layer won't scale fast enough to meet the demands of the most dominant L2s, and those L2s will want to increase margins and reduce mean user tx cost while giving up ~nothing technically.
The two elements Ethereum DA has over alternative DA layers (Security and Native Integration) are either over-indexed on (Security) or can be coded away (Integration). The latter is already happening with Blobstream (Celestia) and EigenDA (albeit with trust assumptions, and again, how big of a hurdle that is gets over-indexed on).
We'll see it play out over the next year or so though.
Thanks for coming to my Bread Talks™️ - hope you learned something!
References, Interesting Discussions around DA:
[1] DA Read Assurances
https://x.com/sreeramkannan/status/1827908768887353429
[2] Inherited Security
https://x.com/gluk64/status/1840485157167628354
[3] Is DA a Commodity?
https://x.com/_weidai/status/1836163922849890390
[4] EthDA vs the Competition
https://x.com/0xBreadguy/status/1833986081714270523