Data Structures | Lesson 3 of 5

The decentralized web: Content addressing

As we've seen, the centralized web relies on trustworthy authorities to host our data and uses location-based URLs to access it. But there's another option. On the decentralized web, we can all host each other's data, with a different kind of linking that's more secure, making it easy to trust our neighbors.

Cryptographic hashing

Cryptographic hashing is the most important tool in the toolbox of decentralized data structures. It opens the door to a new form of linking, known as content addressing, that liberates us from reliance on central authorities.

Hashing takes data of any size and type and returns to you a single, fixed-size "hash" that represents it. A hash is a string of characters that looks like gobbledygook, but you can think of it as a unique name for the data. It might look something like this:

zdpuAsHkamdCQgrDrNSwJVgjMkQWoLxdrccxV6qe9htipNein

To be honest, these names aren't currently very friendly to humans (beagle.jpg was so much more descriptive!), but they're a lot more secure. Here's why:

Cryptographic hashes can be derived from the content of the data itself, meaning that anyone using the same algorithm on the same data will arrive at the same hash. If Ada and Grace are both using the same decentralized web protocol, such as IPFS, to share the exact same photo of a kitten, both images will have exactly the same hash. By comparing those hashes and confirming that they're the same, we can guarantee that every single pixel of those two photos is identical.

Cryptographic hashes are unique. If Grace uses Photoshop to remove a single whisker from that kitty, the updated image will have a new hash. Simply by looking at that hash, even without access to the file itself, it will be easy to tell that the file now contains different data.

Trust on the decentralized web

On the centralized web, we've learned to trust certain authorities and not others. We do our best with the clues we have from URLs, but there are some malicious actors who use the shortcomings of location addressing to trick us.

On the decentralized web, though, we all pitch in and host each other's data, and content addressing enables us to trust the information that's shared. We may not know much about the peers who are hosting data, but hashes can prevent malicious actors from deceiving us about the content of files. That's what makes cryptographic hashing so important to the decentralized web.

Asking peers for content

With traditional location addressing, we knew that we needed to visit the domain puppies.com to find the content stored as beagle.jpg. If the puppies.com domain were broken for some reason, we'd lose access to that image.

The decentralized web works differently. When we want a specific photo of an adorable pet, we ask for it by its content address (hash). Who do we ask? The whole network! If Ada is online, we'll see that she has the content we're looking for, and we'll know that it's exactly the file we need because it has a matching hash. If she goes offline, we may still be able to get the same photo from Grace or another peer.

Since we use hashes to request data on the decentralized web, we can think of a hash as a link, not just a name.