Hashing · 08

Hashes In The Wild

Every idea you have seen in this track shows up in real systems. Git stores every file by its hash. Docker pulls images by SHA-256 digest. Bitcoin chains blocks together through SHA-256. TLS certificates are identified and pinned by their fingerprints. Browsers verify CDN-hosted scripts via Subresource Integrity. This page traces those uses through the same lens: a single cryptographic primitive, deployed wherever a system needs to know whether two things are byte-identical.

01

The Hash Iceberg

When you say "I want to verify this download," you reach for a hash. But hashes do far more than that. They are the addressing scheme of distributed systems, the integrity proof of blockchains, the version identifier of source control, the deduplication key of content stores, and the structural backbone of cryptographic protocols.

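The recurring pattern is the same: same bytes in, same hash out; different bytes in, a different hash out. The second half is not guaranteed in principle, but for a collision-resistant function nobody can find a counterexample in practice. From that pair of properties, system designers build content addressing, tamper detection, proof of work, fingerprinting, and structural integrity for arbitrarily large data sets.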
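Both properties can be seen in a few lines of Python using only the standard library's hashlib. This is a minimal sketch. The two inputs are arbitrary examples chosen so that they differ in exactly one bit, and the helper names are illustrative.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of data."""
    return hashlib.sha256(data).hexdigest()

def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions where two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

original = b"hello world"
altered = b"hello worle"  # 'd' (0x64) -> 'e' (0x65): exactly one input bit flipped

# Same bytes in, same hash out.
print(sha256_hex(original) == sha256_hex(original))  # True

# One flipped input bit...
print(differing_bits(original, altered))  # 1
# ...changes a large share of the 256 output bits (typically close to half).
print(differing_bits(hashlib.sha256(original).digest(),
                     hashlib.sha256(altered).digest()))
```

The second number is the avalanche effect in action. No particular count is promised, but a value near 128 is what a well-behaved hash function produces.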

02

Git: Every Object Is Addressed By Its Hash

git is a content-addressed object store with a graph layer on top. Every file (blob), every directory listing (tree), and every commit (commit) is stored as an object whose identifier is the SHA-1 of a short header (the object type and its size in bytes) followed by the object's content.

git is migrating from SHA-1 to SHA-256, and the transition has been in progress since 2018. Most git deployments still default to SHA-1 because the ecosystem of tools that assume it is enormous, but new repositories can opt into SHA-256 with git init --object-format=sha256. The interoperability layer between the two formats is still maturing.
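The object-ID rule can be sketched in a few lines of Python. This sketch only covers blobs (raw file contents). Trees and commits use the same header scheme but first serialize their own structured content, which is not attempted here. The algo="sha256" option assumes that SHA-256 repositories hash the same header-plus-content bytes, as git's hash-function transition design describes. The function name git_object_id is illustrative.

```python
import hashlib

def git_object_id(content: bytes, kind: str = "blob", algo: str = "sha1") -> str:
    """Hash of the header '<kind> <size>' plus a NUL byte, followed by the content."""
    header = f"{kind} {len(content)}\0".encode("ascii")
    return hashlib.new(algo, header + content).hexdigest()

# Matches `echo hello | git hash-object --stdin` in a SHA-1 repository.
print(git_object_id(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a

# The well-known ID of the empty blob.
print(git_object_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

# Same bytes, hashed under the assumed SHA-256 object format: a 64-hex-digit ID.
print(git_object_id(b"hello\n", algo="sha256"))
```

Because the size is part of the hashed header, git can reject a truncated object even before comparing content.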

Why git still uses SHA-1

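git was designed for collaboration, not adversarial integrity. The published SHA-1 collision attacks (SHAttered in 2017, a chosen-prefix collision in 2020) cost tens of thousands of dollars or more in compute and require the attacker to craft both colliding inputs in advance. They cannot forge a match for an existing, arbitrary commit, because that would need a second-preimage attack, which remains out of reach. git has also shipped a collision-detecting SHA-1 implementation since 2017 that rejects inputs bearing the fingerprints of these attacks. Linus Torvalds famously argued that git is robust enough in practice. The SHA-256 migration is happening anyway, because betting against improving attacks is a poor long-term strategy.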

03

Bitcoin: Hashes Everywhere

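Bitcoin relies on hashes for everything that needs an identifier or a proof. Block hashes and transaction IDs are SHA-256 applied twice, and classic addresses are derived from the RIPEMD-160 of the SHA-256 of a public key.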

The proof-of-work hash race is also why Bitcoin mining ASICs exist. They do nothing but compute double SHA-256 over candidate block headers as fast as possible. Modern ASIC miners run at 100+ trillion hashes per second per machine.
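A toy proof-of-work loop shows what the race looks like. This is a simplified sketch and does not use Bitcoin's real block format. header_prefix is arbitrary bytes standing in for the 80-byte block header, and difficulty is expressed as a hypothetical count of required leading zero bits instead of Bitcoin's compact target encoding. The sketch does follow two Bitcoin conventions: the hash is SHA-256 applied twice, and the digest is compared against the target as a little-endian integer.

```python
import hashlib

def sha256d(data: bytes) -> bytes:
    """Double SHA-256, as Bitcoin uses for block and transaction hashes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def mine(header_prefix: bytes, zero_bits: int, max_nonce: int = 2**32):
    """Search for a 4-byte nonce whose double hash falls below the target.

    Returns (nonce, digest), or None if the nonce space is exhausted.
    """
    target = 1 << (256 - zero_bits)  # hash must have `zero_bits` leading zero bits
    for nonce in range(max_nonce):
        digest = sha256d(header_prefix + nonce.to_bytes(4, "little"))
        if int.from_bytes(digest, "little") < target:
            return nonce, digest
    return None

found = mine(b"toy block header", zero_bits=16)
if found:
    nonce, digest = found
    # Bitcoin displays hashes byte-reversed, so the zeros show up at the front.
    print(nonce, digest[::-1].hex())
```

Each extra zero bit doubles the expected number of attempts. Checking a solution still takes only a single double hash, and that asymmetry is what makes the work a usable proof.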

04

The Merkle Tree

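A Merkle tree hashes each item in a list, then hashes adjacent pairs of those hashes, then pairs of the results, all the way up to a single root hash. Any change to any input changes the root. Anyone who knows the root and is given the right "Merkle path" (the sibling hash at each level, from the item up to the root) can verify that a single item is included without needing the whole tree. For n items, that path holds only about log2(n) hashes.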
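Below is a minimal Python sketch of both the construction and Merkle-path verification. One assumption should be flagged: when a level has an odd number of nodes, this version duplicates the last node, which is Bitcoin's convention. Other designs handle this differently. For example, Certificate Transparency (RFC 6962) also prefixes leaf and interior hashes with distinct bytes to keep the two kinds apart. The function names are illustrative.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_levels(leaves: list[bytes]) -> list[list[bytes]]:
    """All tree levels, from the leaf hashes (index 0) up to the root level."""
    if not leaves:
        raise ValueError("a Merkle tree needs at least one leaf")
    level = [h(leaf) for leaf in leaves]
    levels = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]  # odd count: duplicate the last node
        levels.append(level)
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    levels.append(level)  # the single-element root level
    return levels

def merkle_root(leaves: list[bytes]) -> bytes:
    return build_levels(leaves)[-1][0]

def merkle_path(leaves: list[bytes], index: int) -> list[tuple[bytes, bool]]:
    """Sibling hashes from leaf to root, each tagged with whether it sits on the left."""
    if not 0 <= index < len(leaves):
        raise IndexError("leaf index out of range")
    path = []
    for level in build_levels(leaves)[:-1]:
        sibling = index ^ 1  # the other member of this node's pair
        path.append((level[sibling], sibling < index))
        index //= 2
    return path

def verify(leaf: bytes, path: list[tuple[bytes, bool]], root: bytes) -> bool:
    """Recompute the root from one leaf and its path; no other leaves needed."""
    node = h(leaf)
    for sibling, sibling_is_left in path:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root

leaves = [f"tx{i}".encode() for i in range(8)]
root = merkle_root(leaves)
path = merkle_path(leaves, 5)
print(len(path))                            # 3 sibling hashes for 8 leaves
print(verify(b"tx5", path, root))           # True
print(verify(b"tx5 (edited)", path, root))  # False
```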

Merkle trees are everywhere: Bitcoin transactions, certificate transparency logs, ZFS filesystem checksums, BitTorrent piece verification, IPFS objects. The interactive below shows the construction with real SHA-256.

Interactive · Merkle Tree

Edit any leaf and watch the root hash change

Eight leaf nodes hold pieces of data (think of them as files, transactions, or log entries). The widget computes SHA-256 of each leaf, then pairs them and hashes the concatenated hashes upward until a single root hash remains. Edit any leaf to tamper with the data. Every node from that leaf up to the root will turn red. Even one bit changed at the bottom propagates all the way to the top.

Once any leaf is modified, every node on the path from that leaf to the root changes. The Merkle root no longer matches the original, so any verifier comparing against the published root knows the data is no longer authentic.

05

Docker Image Digests

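A Docker image is a stack of filesystem layers plus a manifest that lists them. Everything is content-addressed by SHA-256: each layer is identified by the digest of its bytes, and the manifest, which records those layer digests, is identified by the digest of its own bytes.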

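Careful production deployments pin images to digests rather than tags. A tag such as latest can be moved to a different image at any time, but a digest cannot drift. Producing different content with the same sha256:abc123 digest would require a SHA-256 collision, which is computationally infeasible. Because the manifest digest covers the layer digests, pinning the manifest pins every layer too.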
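Here is a minimal Python sketch of digest pinning. The "sha256:<hex>" string form matches how OCI and Docker registries write digests. The layer bytes and the two-field manifest are hypothetical stand-ins and are far simpler than a real OCI manifest. A real client also hashes the manifest bytes exactly as the registry serves them, rather than re-serializing JSON as this sketch does.

```python
import hashlib
import json

def digest(blob: bytes) -> str:
    """Content digest in the 'sha256:<hex>' form."""
    return "sha256:" + hashlib.sha256(blob).hexdigest()

def verify(blob: bytes, expected: str) -> bool:
    """Check fetched bytes against a pinned digest (SHA-256 only in this sketch)."""
    algorithm, sep, hex_value = expected.partition(":")
    if algorithm != "sha256" or not sep:
        raise ValueError(f"unsupported digest: {expected!r}")
    return hashlib.sha256(blob).hexdigest() == hex_value

def build_manifest(layers: list[bytes]) -> bytes:
    """Hypothetical, simplified manifest: the ordered layer digests and sizes."""
    entries = [{"digest": digest(layer), "size": len(layer)} for layer in layers]
    return json.dumps({"layers": entries}, separators=(",", ":")).encode()

layers = [b"base filesystem layer", b"application layer"]
manifest = build_manifest(layers)
pinned = digest(manifest)  # the value you would reference as image@sha256:...

print(verify(manifest, pinned))  # True
# Swapping a layer changes its digest, hence the manifest bytes, hence the digest.
print(verify(build_manifest([layers[0], b"swapped layer"]), pinned))  # False
```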

06

IPFS: A Content-Addressed Web

The InterPlanetary File System (IPFS) is a peer-to-peer, content-addressed store for files on the web. Every file has an address that is a function of its contents (typically SHA-256 or BLAKE3, wrapped in a self-describing multihash). Identical content has identical addresses regardless of where it is stored, as long as it is chunked and encoded the same way.


IPFS sees adoption in academic data publishing, NFT metadata, and Filecoin-backed storage. Its content-addressing model is the broader idea you'll see in any system that wants to identify data by what it is rather than where it lives.
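Below is a minimal Python sketch of multihash wrapping, along with the base58btc text encoding that older (CIDv0) IPFS addresses use. Treat it as an illustration of the format only. ipfs add hashes a UnixFS/DAG-PB encoding of the file (split into chunks for large files), so it will not print the same address for the same raw bytes. The helper names are illustrative.

```python
import hashlib

BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def sha256_multihash(data: bytes) -> bytes:
    """<hash-function code><digest length><digest>; both prefixes are unsigned varints.

    sha2-256 has code 0x12 and a 32-byte digest, and both values fit in one varint byte.
    """
    digest = hashlib.sha256(data).digest()
    return bytes([0x12, len(digest)]) + digest

def base58btc(raw: bytes) -> str:
    """Bitcoin-style base58: the bytes as a big-endian integer in base 58."""
    n = int.from_bytes(raw, "big")
    chars = []
    while n:
        n, rem = divmod(n, 58)
        chars.append(BASE58_ALPHABET[rem])
    # Each leading zero byte is written as a leading '1'.
    leading_zeros = len(raw) - len(raw.lstrip(b"\x00"))
    return "1" * leading_zeros + "".join(reversed(chars))

address = base58btc(sha256_multihash(b"hello, IPFS"))
print(address)  # 46 characters starting with "Qm", the familiar CIDv0 shape
```

The self-describing prefix is the point of the design. A reader can tell which hash function produced an address without any outside context, so a system can later add new hash functions without breaking existing addresses.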

07

TLS Certificate Fingerprints

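An X.509 certificate's fingerprint is a hash of its DER-encoded bytes, usually SHA-256 today (older tools also show SHA-1). Two certificates either have the same fingerprint (byte-identical) or they do not. That makes fingerprints useful for identifying a certificate unambiguously and for pinning the exact certificate a client expects.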

The fingerprint is not the same as the certificate's signature. The fingerprint is just a hash that anyone can compute. The signature is produced by the issuing CA over the certificate's contents using the CA's private key, so only the CA can create it.
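Computing a fingerprint needs nothing beyond the standard library. Here is a minimal sketch, assuming the certificate arrives as PEM text. The colon-separated uppercase hex output mirrors how many tools display fingerprints. cert_fingerprint and matches_pin are illustrative names. As the paragraph above stresses, hashing a certificate says nothing about whether its signature is valid.

```python
import hashlib
import ssl

def cert_fingerprint(pem_cert: str, algorithm: str = "sha256") -> str:
    """Hash the DER bytes behind a PEM certificate; return colon-separated hex."""
    der = ssl.PEM_cert_to_DER_cert(pem_cert)  # strips the PEM armor, decodes base64
    return ":".join(f"{byte:02X}" for byte in hashlib.new(algorithm, der).digest())

def matches_pin(pem_cert: str, pinned: str) -> bool:
    """Compare against a pinned fingerprint, ignoring colons and letter case."""
    expected = pinned.replace(":", "").upper()
    return cert_fingerprint(pem_cert).replace(":", "") == expected
```

Against a live server, ssl.get_server_certificate(("example.com", 443)) returns the server's certificate as PEM text, and that text can be passed straight to cert_fingerprint. That call does not validate the certificate, which fits here because the fingerprint is only an identity check.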

08

Subresource Integrity (SRI)

When a website loads a script from a CDN, it implicitly trusts the CDN to serve the right bytes. If the CDN is compromised or has its files swapped, the website silently executes whatever the attacker placed there. Subresource Integrity (a W3C standard, deployed in all major browsers) fixes this by pinning the expected hash directly in the HTML:

<script
  src="https://cdn.example.com/lib/jquery-3.7.1.min.js"
  integrity="sha384-1H217gwSVyLSIfaLxHbE7dRb3v4mYCKbpQvzx0cegeju1MVsGrX5xXxAvs/HgeFs"
  crossorigin="anonymous">
</script>

The browser fetches the script, computes its SHA-384, base64-encodes the digest, and compares the result against the integrity attribute. On a mismatch, the script is rejected and never runs. If the CDN is hacked tomorrow, your page runs the exact jQuery you pinned today or nothing at all.
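The integrity value is easy to reproduce. Here is a minimal Python sketch, assuming you have the exact file bytes you intend to serve. Note that SRI encodes the raw digest in base64, not hex. The helper names are illustrative. The checker handles a single algorithm-hash token, whereas browsers accept a space-separated list and use the strongest algorithm present.

```python
import base64
import hashlib

SRI_ALGORITHMS = ("sha256", "sha384", "sha512")

def sri_integrity(content: bytes, algorithm: str = "sha384") -> str:
    """Build an integrity attribute value: '<algorithm>-<base64 of raw digest>'."""
    if algorithm not in SRI_ALGORITHMS:
        raise ValueError(f"SRI does not define {algorithm!r}")
    digest = hashlib.new(algorithm, content).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"

def sri_matches(content: bytes, integrity: str) -> bool:
    """The browser-side check for one token: recompute the value and compare."""
    algorithm, sep, _ = integrity.partition("-")
    if not sep or algorithm not in SRI_ALGORITHMS:
        return False
    return sri_integrity(content, algorithm) == integrity

script = b"console.log('pinned');"  # hypothetical script contents
pinned = sri_integrity(script)
print(sri_matches(script, pinned))                      # True
print(sri_matches(b"console.log('swapped');", pinned))  # False
```

On the command line, openssl dgst -sha384 -binary FILE | openssl base64 -A produces the same base64 value for a local file.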

SRI is one of the simplest practical examples of hash-based integrity protection on the modern web. A single attribute in your HTML protects against a CDN serving a different file.

09

Wrapping The Track

Eight pages, one primitive. Every system on this page solves a different problem (version control, payment networks, image distribution, file sharing, server identity, script integrity) with the same answer: identify content by its cryptographic hash. The properties from the Foundations page (determinism, collision resistance, the avalanche effect) directly enable every use case in the wild.

You now know enough about hashing to recognize this one primitive at work in each of these systems.

The next track on the Codex (symmetric and asymmetric crypto) builds on these primitives. Hash functions appear in nearly every protocol there too. The math you learned here is permanent; it just keeps showing up.