Why Integrity Matters
Networks corrupt data. Disks corrupt data. Memory corrupts data. People modify data. Sometimes you want to know whether the file you just downloaded is exactly the file the publisher sent. Hashes give you that answer in 256 bits.
A hash mismatch can mean several things:
- Network transmission errors corrupted bytes in flight.
- A mirror server is serving an old or different version.
- Disk corruption flipped bits during storage.
- An attacker substituted a malicious version (this is the threat case).
A hash match almost certainly means the file is byte-identical to what the publisher hashed. With a 256-bit output, an accidental match with a different file is astronomically unlikely, and the collision and second-preimage resistance of a good hash make a deliberate match computationally infeasible.
The Pattern: Publish, Download, Hash, Compare
- The publisher computes the hash of the file and posts the hash alongside the download link.
- You download the file.
- You compute the hash of your downloaded copy locally.
- You compare the two hashes character by character.
If they match, the file is intact. If not, redownload or investigate.
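The hash-and-compare steps can be sketched in a few lines of Python. This is a minimal illustration, not a hardened tool: the function names are made up for this example, and it assumes the publisher used SHA-256 and posted the digest as a hex string.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Read fixed-size chunks so large downloads do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, published_hex: str) -> bool:
    """Compare the local digest with the published one, ignoring case and surrounding whitespace."""
    expected = published_hex.strip().lower()
    actual = sha256_of_file(path)
    # compare_digest checks the whole string instead of stopping at the first difference.
    return hmac.compare_digest(actual, expected)
```

Letting the program do the comparison avoids the human failure mode of checking only the first and last few characters of a 64-character string.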
The crucial detail: the published hash must come from a channel you trust. If an attacker can replace the download AND replace the published hash on the same page, the hash gives you nothing. This is why serious software projects post hashes on HTTPS sites, or sign the hashes with GPG (turning a plain hash into a signed manifest, which we'll see on the Signatures page).
| Use case | What is hashed | How the hash is published |
|---|---|---|
| Linux ISO download | The .iso file | SHA-256 file on the project's HTTPS site, often signed with GPG. |
| apt / dpkg packages | Each .deb file | Hashes in the Release file, which is itself signed by the repo's GPG key. |
| Docker image | Each layer and the manifest | Image digest is a SHA-256 of the manifest. docker pull image@sha256:... is content-addressed. |
| npm packages | The .tgz tarball | SHA-512 stored in package-lock.json under integrity:. |
| git commits | The commit object (tree + parents + author + message) | The commit hash IS the commit's identifier. |
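As an illustration of the npm row, the integrity value in package-lock.json follows the Subresource Integrity format: the algorithm name, a hyphen, then the base64-encoded digest. The sketch below rebuilds and checks such a string. The function names are hypothetical, and it handles only a single sha512 entry, whereas real lockfiles may carry other algorithms.

```python
import base64
import hashlib
import hmac


def sri_sha512(data: bytes) -> str:
    """Build an SRI-style integrity string: 'sha512-' followed by base64(digest)."""
    digest = hashlib.sha512(data).digest()
    return "sha512-" + base64.b64encode(digest).decode("ascii")


def matches_integrity(data: bytes, integrity: str) -> bool:
    """Check data against a single 'sha512-...' integrity value."""
    algorithm, _, _ = integrity.partition("-")
    if algorithm != "sha512":
        # Assumption for this sketch: anything other than sha512 is rejected.
        raise ValueError(f"unsupported algorithm: {algorithm}")
    return hmac.compare_digest(sri_sha512(data), integrity)
```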
Live Integrity Check
The interactive below simulates the download-and-verify pattern. The left panel holds an "original" file with a fixed hash. The right panel holds the "received" file, which you can edit. Any change at all to the right panel breaks the integrity check.
Try to tamper with a file without breaking its hash
The publisher's hash on the left is fixed. Edit the received content on the right. The verdict updates with every keystroke. There is no way to change the content and keep the hash matching: that is the integrity guarantee.
Two takeaways from the demo. First, even a single invisible character flips the hash entirely (the avalanche effect from the Foundations page). Second, the integrity check is binary: it either matches or it does not. There is no partial credit, no proximity score, no "almost matches." A hash either is right or is wrong.
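The first takeaway is easy to reproduce outside the demo. The sketch below hashes a string with and without a trailing zero-width space and counts how many of the 256 output bits differ. The sample text and the helper name are invented for illustration. For a good hash the count typically lands near half the bits.

```python
import hashlib


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions where the SHA-256 digests of a and b differ."""
    da = int.from_bytes(hashlib.sha256(a).digest(), "big")
    db = int.from_bytes(hashlib.sha256(b).digest(), "big")
    return bin(da ^ db).count("1")


original = "release-notes v1.0".encode("utf-8")
# Append a zero-width space (U+200B): invisible on screen, but it changes the bytes.
tampered = "release-notes v1.0\u200b".encode("utf-8")

print(hashlib.sha256(original).hexdigest())
print(hashlib.sha256(tampered).hexdigest())
print(differing_bits(original, tampered), "of 256 bits differ")
```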
What Plain Hashes Cannot Detect
An unauthenticated hash protects only against changes between when the publisher hashed and when you verified. It says nothing about who published the hash. Consider this scenario:
- An attacker takes over the download mirror.
- They replace the legitimate file with a malicious one.
- They also replace the published SHA256SUMS file with the hash of the malicious file.
- You download both. Your local hash matches the published hash. The check "passes."
Plain hashes solve integrity, not authenticity. To prove that the published hash came from the legitimate publisher (and not the attacker), you need a signature on the hash, which means the publisher used their private key to sign the SHA256SUMS file. That is where the Signatures page picks up.
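The scenario can be simulated with in-memory byte strings. Everything here (the mirror dictionaries, the file contents, the function names) is invented for illustration. The point is only that an unauthenticated check compares the file against whatever hash happens to be published.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def plain_check(downloaded: bytes, published_hash: str) -> bool:
    """The unauthenticated check: does the file match whatever hash was published?"""
    return sha256_hex(downloaded) == published_hash


legit_file = b"legitimate installer bytes"
malicious_file = b"malicious installer bytes"

# Honest mirror: file and hash both come from the publisher.
honest = {"file": legit_file, "SHA256SUMS": sha256_hex(legit_file)}
# Compromised mirror: the attacker swaps the file AND recomputes the hash.
compromised = {"file": malicious_file, "SHA256SUMS": sha256_hex(malicious_file)}
# Careless attacker: swaps only the file and leaves the old hash in place.
half_compromised = {"file": malicious_file, "SHA256SUMS": sha256_hex(legit_file)}

for name, mirror in [
    ("honest", honest),
    ("compromised", compromised),
    ("half-compromised", half_compromised),
]:
    print(name, plain_check(mirror["file"], mirror["SHA256SUMS"]))
```

The check passes for both the honest and the fully compromised mirror, and fails only when the attacker forgot to update the hash. Nothing in the plain check can tell the first two cases apart.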
Most serious software distributions ship a SHA256SUMS file and a SHA256SUMS.asc file (the PGP signature on it). The right verification flow is:
1. Fetch SHA256SUMS and SHA256SUMS.asc.
2. Verify the signature with the publisher's public PGP key.
3. Then compute the file's hash and check it against SHA256SUMS.
Skipping step 2 is a common reason hash-based integrity checks fail to prevent attacks in practice.
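A rough sketch of that flow in Python follows. It assumes gpg is installed, that the publisher's public key is already in the local keyring (obtained through a channel you trust), and that SHA256SUMS uses the usual sha256sum line format. The function names are illustrative, and step 1 (fetching the two files) is left out.

```python
import hashlib
import subprocess
from pathlib import Path


def signature_is_valid(sums_path: Path, sig_path: Path) -> bool:
    """Step 2: ask gpg to verify the detached signature over SHA256SUMS."""
    # Exit code 0 means gpg found a good signature from a key in the keyring.
    # A stricter check would also confirm WHICH key signed; this sketch does not.
    result = subprocess.run(
        ["gpg", "--verify", str(sig_path), str(sums_path)],
        capture_output=True,
    )
    return result.returncode == 0


def parse_sha256sums(text: str) -> dict[str, str]:
    """Parse '<hex digest> <mode char><filename>' lines into {filename: digest}."""
    entries = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        digest, _, rest = line.partition(" ")
        # After the first space comes a mode character: ' ' (text) or '*' (binary).
        name = rest[1:] if rest[:1] in (" ", "*") else rest
        entries[name] = digest.lower()
    return entries


def file_matches(path: Path, sums: dict[str, str]) -> bool:
    """Step 3: hash the file and compare against its entry in SHA256SUMS."""
    expected = sums.get(path.name)
    if expected is None:
        return False
    return hashlib.sha256(path.read_bytes()).hexdigest() == expected


def verify_download(file_path: Path, sums_path: Path, sig_path: Path) -> bool:
    # Check the signature first: an unverified SHA256SUMS proves nothing.
    if not signature_is_valid(sums_path, sig_path):
        return False
    return file_matches(file_path, parse_sha256sums(sums_path.read_text()))
```

The order inside verify_download mirrors the numbered steps: the hash comparison runs only after the manifest itself has been authenticated.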
Content-Addressed Storage: git, Docker, IPFS
A more sophisticated use of hashes for integrity is to use the hash as the file's identifier. Instead of storing files by name or path, you store them by their content hash. Same content, same name. Different content, different name. This pattern is called content addressing.
- git: Every commit, tree, and blob is named by its SHA-1 (transitioning to SHA-256). When git receives objects from a remote, it recomputes their hashes, and git fsck re-verifies every object already in the repository. Any single-bit corruption is detected.
- Docker: Image layers are content-addressed by SHA-256. Pulling nginx:1.25 resolves to a specific manifest hash, and you can pin it: nginx@sha256:1234....
- IPFS: Every file's address is its content hash. Two different servers serving the same file serve it under the same address.
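To make the git case concrete, here is a small sketch of how a blob's SHA-1 identifier is derived: git hashes a short header (the object type, the content length in bytes, and a NUL byte) followed by the content. The function name is made up for this example, and it covers blobs in SHA-1 repositories only.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the SHA-1 object ID git assigns to a blob with this content."""
    # Header: the word "blob", a space, the length in bytes, then a NUL byte.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# The empty blob has a well-known ID in every SHA-1 git repository.
print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

The result should agree with what git hash-object reports for a file with the same bytes.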
Content addressing combines integrity and deduplication. Two identical files automatically share storage. Tampering is automatically detected on retrieval. The downside is that any change to a file produces a new identifier, which is fine for immutable artifacts (release tags, signed commits) but awkward for mutable data.
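Both properties show up even in a toy implementation. The class below is an in-memory sketch written for this page, not how git, Docker, or IPFS actually store data. It keys each blob by its SHA-256 and re-hashes on every read.

```python
import hashlib


class ContentStore:
    """A toy in-memory content-addressed store keyed by SHA-256."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        # Identical content maps to the same address, so it is stored once.
        self._blobs[address] = data
        return address

    def get(self, address: str) -> bytes:
        data = self._blobs[address]
        # Re-hash on retrieval: corruption or tampering changes the digest.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError(f"integrity check failed for {address}")
        return data

    def __len__(self) -> int:
        return len(self._blobs)
```

Storing the same bytes twice returns the same address and leaves a single entry in the store. Changing one byte yields a different address, which is exactly the "new identifier on every change" trade-off described above.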
When A Hash Alone Is Enough
Hashes without signatures are still useful when:
- The threat is accidental corruption, not malicious tampering. Network glitches, disk bit-rot, RAM errors. Any unauthenticated hash detects these with overwhelming probability.
- The hash comes from an out-of-band channel you already trust. A SHA-256 sum sent by a trusted colleague over Signal serves, in practice, the same purpose as a signature on the value.
- The integrity check is part of a larger signed manifest. A Linux package manager downloads a signed Release file containing hashes of every individual package. The signature is on the Release file; the integrity of each package falls out for free.
If none of those conditions holds and an attacker could replace both the file and its hash, you have authenticity questions to answer, not integrity ones.