The Hash Iceberg
When you say "I want to verify this download," you reach for a hash. But hashes do far more than that. They are the addressing scheme of distributed systems, the integrity proof of blockchains, the version identifier of source control, the deduplication key of content stores, and the structural backbone of cryptographic protocols.
The recurring pattern is the same: same bytes in, same hash out. Different bytes in, a different hash out with overwhelming probability. From that single property, engineers build content addressing, tamper detection, proof of work, fingerprinting, and structural integrity for arbitrarily large data sets.
Git: Every Object Is Addressed By Its Hash
git is a content-addressed object store with a graph layer on top. Every file (blob), every directory listing (tree), and every commit (commit) is stored as an object whose identifier is the SHA-1 of its content, prefixed with a short header giving the object's type and size. In practice:
- When you git add a file, git computes its SHA-1, names the blob after that hash, and stores it under .git/objects/<first 2 chars>/<rest>.
- A directory tree is a list of (mode, name, hash) entries, also hashed. Two directories with identical contents have identical tree hashes.
- A commit object references a tree hash, parent commit hashes, author, committer, and message. The commit's own hash covers all of those.
- Cloning a repo is downloading all the objects whose hashes you do not have. Pushing is uploading new objects. Hashes make the protocol stateless.
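The blob addressing described in the list above can be sketched in a few lines of Python. This is a minimal illustration, not git's own code: the function names git_blob_id and loose_object_path are made up for this example. The one detail it relies on is that git hashes a short header ("blob", the size in bytes, and a NUL byte) followed by the file's raw bytes.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID git would assign to a blob with this content."""
    # git hashes "blob <size>\0" followed by the raw bytes, not the bytes alone.
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(object_id: str) -> str:
    """Path of a loose object: first 2 hex chars as directory, the rest as file name."""
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


empty_id = git_blob_id(b"")
print(empty_id)                     # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(loose_object_path(empty_id))  # .git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

The empty-file ID printed here is the same one git hash-object reports for an empty file. The file stored at that path is zlib-compressed, which this sketch does not reproduce.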
git is migrating from SHA-1 to SHA-256 (the SHA-256 transition has been in progress since 2018). Most git deployments still default to SHA-1 because the network of compatible tools is enormous, but a new repo can opt into SHA-256 when it is created. The interoperability layer between the two is still maturing.
git was designed for collaboration, not adversarial integrity. The known SHA-1 collision attacks require tens of thousands of dollars of compute and specially crafted inputs; they do not let an attacker forge a match for an arbitrary existing commit. Linus Torvalds famously argued that git is robust enough in practice. The SHA-256 migration is happening anyway, because betting against improving attacks is a poor long-term strategy.
Bitcoin: Hashes Everywhere
Bitcoin uses SHA-256 (and sometimes RIPEMD-160) for everything that needs an identifier or a proof:
- Block chaining. Each block contains the SHA-256 hash of the previous block. Modifying any historical block invalidates every subsequent block's hash. That is what "the chain" in "blockchain" means.
- Proof of work. A miner must find a nonce such that SHA-256(SHA-256(block_header)) falls below a target set by the network difficulty (informally, the hash must begin with enough zero bits). There is no shortcut: miners just try nonces until one works.
- Merkle root. All transactions in a block are summarized by a single Merkle root hash (next section). Light clients can verify a transaction is in a block by receiving just the Merkle path, not the full block.
- Addresses. A classic Bitcoin address is an encoding of a hash of a public key: RIPEMD-160(SHA-256(public_key)).
- Transaction IDs. Every transaction is identified by the double-SHA-256 of its serialization.
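The nonce search from the proof-of-work bullet can be sketched as a toy miner. This is a simplified illustration under stated assumptions: the header is an arbitrary byte string rather than a real 80-byte block header, difficulty is expressed as a count of leading zero bits rather than Bitcoin's compact target encoding, and the names double_sha256, leading_zero_bits, and mine are invented for this example.

```python
import hashlib


def double_sha256(data: bytes) -> bytes:
    """SHA-256 applied twice, as Bitcoin does for block and transaction hashes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the front of a digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def mine(header: bytes, difficulty_bits: int) -> int:
    """Try nonces in order until the double hash has enough leading zero bits."""
    nonce = 0
    while True:
        candidate = header + nonce.to_bytes(8, "little")
        if leading_zero_bits(double_sha256(candidate)) >= difficulty_bits:
            return nonce
        nonce += 1


# Toy difficulty: 16 zero bits takes about 65,000 attempts on average.
header = b"toy block header"
nonce = mine(header, 16)
print(nonce, double_sha256(header + nonce.to_bytes(8, "little")).hex())
```

Each extra bit of difficulty doubles the expected number of attempts, while checking a claimed nonce always costs a single double hash. That asymmetry is what makes the work provable.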
The proof-of-work hash race is also why Bitcoin mining ASICs exist. They do nothing but compute SHA-256 as fast as possible. Modern ASIC miners run at 100+ trillion hashes per second per machine.
The Merkle Tree
A Merkle tree hashes a list of items by pairs, then hashes the pairs by pairs, all the way to a single root hash. Any change to any input changes the root. Anyone who knows the root and is given the right "Merkle path" can verify a single item is included without needing the whole tree.
Merkle trees are everywhere: Bitcoin transactions, certificate transparency logs, ZFS filesystem checksums, BitTorrent piece verification, IPFS objects.
Consider eight leaves holding pieces of data (think of them as files, transactions, or log entries). Compute the SHA-256 of each leaf, then pair the hashes and hash each concatenated pair, repeating upward until a single root hash remains. Tampering with any leaf changes every node on the path from that leaf up to the root. Even one bit changed at the bottom propagates all the way to the top.
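The construction and the Merkle path check can be sketched as follows. This is an illustrative implementation, not the exact scheme of any system named above: all function names are made up for this example, and duplicating the last hash on an odd-sized level is one convention among several (Bitcoin uses it; other systems pad or promote differently).

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(leaves: list[bytes]) -> list[list[bytes]]:
    """Return every level of the tree, from the leaf hashes up to the root."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = [sha256(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]  # assumption: duplicate the last hash
            levels[-1] = level
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def merkle_root(leaves: list[bytes]) -> bytes:
    return build_levels(leaves)[-1][0]


def merkle_path(leaves: list[bytes], index: int) -> list[tuple[bytes, bool]]:
    """Sibling hashes from leaf to root; the flag is True if the sibling sits on the left."""
    path = []
    for level in build_levels(leaves)[:-1]:
        sibling = index ^ 1  # the other member of this node's pair
        path.append((level[sibling], sibling < index))
        index //= 2
    return path


def verify(leaf: bytes, path: list[tuple[bytes, bool]], root: bytes) -> bool:
    """Recompute the root from one leaf and its path, then compare."""
    node = sha256(leaf)
    for sibling, sibling_is_left in path:
        node = sha256(sibling + node) if sibling_is_left else sha256(node + sibling)
    return node == root


leaves = [f"item-{i}".encode() for i in range(8)]
root = merkle_root(leaves)
path = merkle_path(leaves, 5)
print(len(path), verify(leaves[5], path, root))  # 3 True
```

With eight leaves the path holds three sibling hashes, so a verifier needs 96 bytes of hashes plus the root rather than all eight items. The path grows with the logarithm of the leaf count.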
Docker Image Digests
A Docker image is a stack of filesystem layers plus a manifest that lists them. Everything is content-addressed by SHA-256:
- Each layer is a tarball. Its SHA-256 is its identifier in the registry.
- The manifest is a JSON document listing layer digests, config digest, and metadata. The manifest's SHA-256 is the image digest.
- You can pull by tag (docker pull nginx:1.25) or by digest (docker pull nginx@sha256:abc123...). The digest form is immutable and cannot be silently swapped under you. The tag form can.
Security-conscious production deployments pin to digests. Tags can be retagged, but a digest cannot drift: short of a SHA-256 collision, which nobody knows how to produce, the content that produces sha256:abc123 is the only content that produces sha256:abc123.
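What pinning to a digest buys you can be sketched as a check a client runs on the bytes it received. This is a minimal illustration: the manifest bytes are a made-up placeholder rather than a real image manifest, and digest_of and matches_pinned_digest are hypothetical names, not part of any Docker API.

```python
import hashlib
import hmac


def digest_of(content: bytes) -> str:
    """Format a SHA-256 digest the way registries write it: 'sha256:<hex>'."""
    return "sha256:" + hashlib.sha256(content).hexdigest()


def matches_pinned_digest(content: bytes, pinned: str) -> bool:
    """True only if the received bytes hash to the digest we pinned."""
    return hmac.compare_digest(digest_of(content), pinned)


# Placeholder bytes standing in for a manifest fetched from a registry.
manifest = b'{"schemaVersion": 2, "layers": []}'
pinned = digest_of(manifest)

print(matches_pinned_digest(manifest, pinned))         # True
print(matches_pinned_digest(manifest + b" ", pinned))  # False
```

The digest covers the exact bytes served, so even re-serializing the same JSON with different whitespace yields a different digest.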
IPFS: A Content-Addressed Web
The InterPlanetary File System (IPFS) applies content addressing to the web as a whole. Every file has an address that is a function of its contents (typically multihash-wrapped SHA-256 or BLAKE3). Identical content has identical addresses regardless of where it is stored.
Consequences:
- Two servers serving the same file serve it under the same address. The network deduplicates implicitly.
- Mutability is awkward (any change creates a new address), so IPFS adds a mutable name layer called IPNS on top.
- Verification is automatic: the act of looking up content by hash means the data must hash to that address or you reject it.
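The first and third consequences can be seen in a toy content-addressed store. This sketch is not IPFS: the ContentStore class is invented for this example, and it uses a bare SHA-256 hex string as the address where IPFS wraps the hash in a multihash-based content identifier.

```python
import hashlib


class ContentStore:
    """A toy in-memory store whose keys are the SHA-256 of the stored bytes."""

    def __init__(self) -> None:
        self._blocks: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        """Store data and return its address. Identical data maps to one entry."""
        address = hashlib.sha256(data).hexdigest()
        self._blocks[address] = data
        return address

    def get(self, address: str) -> bytes:
        """Fetch data and reject it if it no longer hashes to its address."""
        data = self._blocks[address]
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content does not match its address")
        return data

    def __len__(self) -> int:
        return len(self._blocks)


store = ContentStore()
first = store.put(b"hello, content addressing")
second = store.put(b"hello, content addressing")  # same bytes, same address
print(first == second, len(store))  # True 1
```

Because the address is derived from the bytes, the reader does not need to trust whoever served them: a wrong answer fails the hash check in get.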
IPFS sees adoption in academic data publishing, NFT metadata, and Filecoin-backed permanent storage. Its content addressing model is the broader idea you'll see in any system that wants to identify data by what it is rather than where it came from.
TLS Certificate Fingerprints
An X.509 certificate's fingerprint is the SHA-256 of its DER-encoded bytes. Two certificates either have the same fingerprint (byte-identical) or they do not. Use cases:
- Certificate pinning: mobile apps embed the expected SHA-256 fingerprint of their server's cert. Any cert with a different fingerprint, even one signed by the same CA, is rejected.
- Certificate Transparency logs: certs are indexed by their fingerprint, allowing fast lookup of "has this exact cert been logged?"
- SSH host keys (analogous): on first connection, SSH shows you the fingerprint of the host key for verification. After that, the local known_hosts file pins it.
The fingerprint is not the same as the cert's signature. The fingerprint is just a hash that anyone can compute. The signature is the CA's signature over the cert's content, which only the CA can produce.
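Computing and checking a fingerprint takes only a hash over the DER bytes. The sketch below assumes you already hold the certificate in DER form; the bytes used here are a placeholder, not a real certificate, and fingerprint and matches_pin are hypothetical helper names. The colon-separated uppercase hex layout is the one many tools display.

```python
import hashlib
import hmac


def fingerprint(der_bytes: bytes) -> str:
    """SHA-256 fingerprint of DER-encoded bytes, as colon-separated uppercase hex."""
    digest = hashlib.sha256(der_bytes).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def matches_pin(der_bytes: bytes, pinned: str) -> bool:
    """Compare the presented certificate's fingerprint with the pinned one."""
    return hmac.compare_digest(fingerprint(der_bytes), pinned.upper())


# Placeholder bytes standing in for a DER-encoded certificate.
presented = b"\x30\x82\x01\x0a placeholder certificate bytes"
pinned = fingerprint(presented)

print(pinned[:11])                            # first four byte pairs of the fingerprint
print(matches_pin(presented, pinned))         # True
print(matches_pin(presented + b"x", pinned))  # False
```

In Python, the standard library's ssl.PEM_cert_to_DER_cert converts a PEM certificate to the DER bytes this function expects, and SSLSocket.getpeercert(binary_form=True) returns the peer's certificate in DER form.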
Subresource Integrity (SRI)
When a website loads a script from a CDN, it implicitly trusts the CDN to serve the right bytes. If the CDN is compromised or has its files swapped, the website silently executes whatever the attacker placed there. Subresource Integrity (a W3C standard, deployed in all major browsers) fixes this by pinning the expected hash directly in the HTML:
<script src="https://cdn.example.com/lib/jquery-3.7.1.min.js"
        integrity="sha384-1H217gwSVyLSIfaLxHbE7dRb3v4mYCKbpQvzx0cegeju1MVsGrX5xXxAvs/HgeFs"
        crossorigin="anonymous"></script>
The browser fetches the script, computes its SHA-384, and compares the result against the integrity attribute. On a mismatch, the browser refuses to execute the script. The CDN can be hacked tomorrow; your page will run the exact jQuery you pinned today or no jQuery at all.
SRI is one of the simplest practical examples of hash-based integrity protection on the modern web. Three lines of HTML protect every script that carries the attribute against CDN substitution.
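The value that goes into the integrity attribute is the algorithm name, a hyphen, and the base64 encoding of the raw digest. A minimal sketch of generating and checking it follows. The function names are invented for this example, the script bytes are a stand-in for a real library file, and the check handles a single hash value only, although the attribute may list several.

```python
import base64
import hashlib

ALLOWED = ("sha256", "sha384", "sha512")


def sri_value(content: bytes, algorithm: str = "sha384") -> str:
    """Build an integrity value: '<algorithm>-<base64 of the raw digest>'."""
    if algorithm not in ALLOWED:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    digest = hashlib.new(algorithm, content).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def check_integrity(content: bytes, integrity: str) -> bool:
    """Recompute the hash of the fetched bytes and compare, as a browser would."""
    algorithm, _, _ = integrity.partition("-")
    return algorithm in ALLOWED and sri_value(content, algorithm) == integrity


# Stand-in for the bytes of the file you intend to pin.
script = b"console.log('hello');\n"
integrity = sri_value(script)

print(integrity.split("-", 1)[0])                     # sha384
print(check_integrity(script, integrity))             # True
print(check_integrity(script + b"evil", integrity))   # False
```

Generate the value from a copy of the file you have reviewed, not from whatever the CDN serves at build time, or you will pin a file that was already tampered with.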
Wrapping The Track
Eight pages, one primitive. Every system on this page solves a different problem (version control, payment networks, image distribution, file sharing, server identity, script integrity) with the same answer: identify content by its cryptographic hash. The properties from the Foundations page (deterministic, collision-resistant, avalanche) directly enable every use case in the wild.
You now know enough about hashing to:
- Pick the right hash for a given use case (Foundations, Hash Functions, Password Hashing).
- Reason about collision and birthday attacks (Collisions).
- Verify file integrity correctly and recognize where authentication is also required (Integrity, HMAC).
- Store passwords without putting your users in the next LinkedIn-style breach (Password Hashing, Salting and KDFs).
- Read TLS, git, blockchain, and Docker documentation and understand the role of hashes throughout (this page).
The next track on the Codex (symmetric and asymmetric crypto) builds on these primitives. Hash functions appear in nearly every protocol there too. The math you learned here is permanent; it just keeps showing up.