Hashing · 04

Hashes For Integrity

The most common use of cryptographic hashes is also the simplest: detecting whether a file or message changed in transit or at rest. Compute the hash before. Compute it after. If they match, nothing changed. If they differ, something did, and even a single bit of difference will reveal itself.

01

Why Integrity Matters

Networks corrupt data. Disks corrupt data. Memory corrupts data. People modify data. Sometimes you want to know whether the file you just downloaded is exactly the file the publisher sent. A SHA-256 hash gives you that answer in 256 bits.

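A hash mismatch can mean several things: the file was corrupted in transit, it was damaged on disk or in memory after it arrived, or someone modified it.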

A hash match almost certainly means the file is byte-identical to what the publisher hashed. With a 256-bit output, an accidental match with the wrong file is astronomically unlikely, and the collision resistance of a good hash makes a deliberately crafted match computationally infeasible.

02

The Pattern: Publish, Download, Hash, Compare

  1. The publisher computes the hash of the file and posts the hash alongside the download link.
  2. You download the file.
  3. You compute the hash of your downloaded copy locally.
  4. You compare the two hashes character by character.

If they match, the file is intact. If not, redownload or investigate.
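Steps 3 and 4 can be sketched in Python as follows. This is a minimal illustration rather than a hardened tool: the function names `sha256_file` and `matches_published` are ours, and the published hash is assumed to be a SHA-256 hex string copied from the publisher's page.

```python
import hashlib


def sha256_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large downloads do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_published(path, published_hex):
    """Compare the local hash with the published one, ignoring case and surrounding whitespace."""
    expected = published_hex.strip().lower()
    return sha256_file(path) == expected
```

Calling `matches_published("download.iso", "<hash from the publisher's page>")` returns `True` only when every hex character agrees; the file name here is a placeholder.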

The crucial detail: the published hash must come from a channel you trust. If an attacker can replace the download AND replace the published hash on the same page, the hash gives you nothing. This is why serious software projects post hashes on HTTPS sites, or sign the hashes with GPG (turning a plain hash into a signed manifest, which we'll see on the Signatures page).

| Use case | What is hashed | How the hash is published |
| --- | --- | --- |
| Linux ISO download | The .iso file | SHA-256 file on the project's HTTPS site, often signed with GPG. |
| apt / dpkg packages | Each .deb file | Hashes in the Packages index, whose own hash is listed in the Release file, which is itself signed by the repo's GPG key. |
| Docker image | Each layer and the manifest | Image digest is a SHA-256 of the manifest. docker pull image@sha256:... is content-addressed. |
| npm packages | The .tgz tarball | SHA-512 stored in package-lock.json under integrity:. |
| git commits | The commit object (tree + parents + author + message) | The commit hash IS the commit's identifier. |

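Two of the formats in the table are easy to reproduce. The sketch below builds an npm-style integrity string (the algorithm name, a hyphen, then the Base64-encoded digest) and a Docker-style digest (`sha256:` followed by the hex digest). The function names are ours, and the inputs are placeholder bytes rather than a real tarball or manifest.

```python
import base64
import hashlib


def npm_style_integrity(tarball_bytes: bytes) -> str:
    """Format a SHA-512 digest the way an integrity field does: sha512-<base64>."""
    digest = hashlib.sha512(tarball_bytes).digest()
    return "sha512-" + base64.b64encode(digest).decode("ascii")


def docker_style_digest(manifest_bytes: bytes) -> str:
    """Format a SHA-256 digest as sha256:<hex>."""
    return "sha256:" + hashlib.sha256(manifest_bytes).hexdigest()


# Placeholder inputs, for illustration only.
print(npm_style_integrity(b"example tarball bytes"))
print(docker_style_digest(b"example manifest bytes"))
```

Both strings carry the algorithm name alongside the digest, so a verifier knows which hash function to recompute.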
03

Live Integrity Check

The interactive below simulates the download-and-verify pattern. The left panel holds an "original" file with a fixed hash. The right panel holds the "received" file, which you can edit. Any change at all to the right panel breaks the integrity check.

Interactive · Integrity Verification

Try to tamper with a file without breaking its hash

The publisher's hash on the left is fixed. Edit the received content on the right. The verdict updates with every keystroke. There is no practical way to change the content and keep the hash matching: that is the integrity guarantee.

[Interactive widget: a "Publisher's original file" panel with its SHA-256 (published on HTTPS site), a "Your downloaded copy" panel with its SHA-256 (computed locally), and a verdict line. While the two hashes match, the verdict reads "✓ INTEGRITY OK: The downloaded copy is byte-identical to the publisher's original. SHA-256 hashes match exactly."]

Two takeaways from the demo. First, even a single invisible character changes the hash completely, flipping about half of its bits on average (the avalanche effect from the Foundations page). Second, the integrity check is binary: it either matches or it does not. There is no partial credit, no proximity score, no "almost matches."
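The first takeaway can be checked without the interactive. The sketch below appends one invisible character (a zero-width space) to some sample bytes and counts how many of the 256 output bits change. The sample content and the helper name `bit_difference` are ours.

```python
import hashlib


def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bits that differ between the SHA-256 digests of a and b."""
    da = int.from_bytes(hashlib.sha256(a).digest(), "big")
    db = int.from_bytes(hashlib.sha256(b).digest(), "big")
    return bin(da ^ db).count("1")


original = b"release-1.0.tar.gz contents"           # sample bytes, not a real file
tampered = original + "\u200b".encode("utf-8")      # append a zero-width space

print(hashlib.sha256(original).hexdigest())
print(hashlib.sha256(tampered).hexdigest())
print(bit_difference(original, tampered), "of 256 bits differ")
```

The count should land near 128. The comparison that matters for integrity is still plain equality: the number of differing bits tells you nothing about how much of the file changed.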

04

What Plain Hashes Cannot Detect

An unauthenticated hash protects only against changes between when the publisher hashed and when you verified. It says nothing about who published the hash. Consider this scenario:

  1. An attacker takes over the download mirror.
  2. They replace the legitimate file with a malicious one.
  3. They also replace the published SHA256SUMS file with the hash of the malicious file.
  4. You download both. Your local hash matches the published hash. The check "passes."

Plain hashes solve integrity, not authenticity. To prove that the published hash came from the legitimate publisher (and not the attacker), you need a signature on the hash, which means the publisher used their private key to sign the SHA256SUMS file. That is where the Signatures page picks up.
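The scenario can be simulated in a few lines. In this sketch the "mirror" is just a dictionary holding a file and its published hash; all names and byte strings are hypothetical stand-ins.

```python
import hashlib


def plain_hash_check(file_bytes: bytes, published_hex: str) -> bool:
    """The unauthenticated check: does the local hash equal the published one?"""
    return hashlib.sha256(file_bytes).hexdigest() == published_hex


# The mirror before the takeover.
legit_file = b"legitimate installer"
trusted_hex = hashlib.sha256(legit_file).hexdigest()
mirror = {"file": legit_file, "sha256": trusted_hex}

# Steps 2 and 3: the attacker replaces the file AND its published hash.
malicious_file = b"malicious installer"
mirror["file"] = malicious_file
mirror["sha256"] = hashlib.sha256(malicious_file).hexdigest()

# Step 4: the victim downloads both from the same mirror.
print(plain_hash_check(mirror["file"], mirror["sha256"]))  # True: the check "passes"

# A hash obtained through a channel the attacker does not control still catches it.
print(plain_hash_check(mirror["file"], trusted_hex))       # False
```

The hash function did its job in both cases. What differs is where the reference hash came from.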

The "hash file with a signature" pattern

Most serious software distributions ship a SHA256SUMS file and a SHA256SUMS.asc file (the PGP signature on it). The right verification flow is:

  1. Fetch SHA256SUMS and SHA256SUMS.asc.
  2. Verify the signature with the publisher's public PGP key.
  3. Then compute the file's hash and check it against SHA256SUMS.

Skipping step 2 is a common reason hash-based integrity checks fail to prevent attacks in practice: without it, nothing ties SHA256SUMS to the publisher.
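A rough sketch of steps 2 and 3 in Python follows. It rests on several assumptions: the `gpg` command is installed, the publisher's public key is already in your keyring and you have confirmed its fingerprint through a separate channel, and SHA256SUMS lists the file by its base name. The function names `parse_sums` and `verify_release` are ours.

```python
import hashlib
import os
import subprocess


def parse_sums(text):
    """Parse checksum lines ("<hex>  <name>" or "<hex> *<name>") into {name: hex}."""
    sums = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        digest, name = line.split(maxsplit=1)
        if name.startswith("*"):        # "*" marks binary mode in sha256sum output
            name = name[1:]
        sums[name] = digest.lower()
    return sums


def verify_release(file_path, sums_path="SHA256SUMS", sig_path="SHA256SUMS.asc"):
    # Step 2: verify the signature first. gpg exits non-zero on failure, so
    # check=True raises CalledProcessError and step 3 is never reached.
    subprocess.run(["gpg", "--verify", sig_path, sums_path], check=True)

    # Step 3: only now are the hashes in SHA256SUMS worth comparing against.
    with open(sums_path, encoding="utf-8") as f:
        expected = parse_sums(f.read())
    h = hashlib.sha256()
    with open(file_path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return expected.get(os.path.basename(file_path)) == h.hexdigest()
```

Note that `gpg --verify` succeeds for a valid signature from any key in the keyring, so this sketch is only as trustworthy as the keys you have imported.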

05

Content-Addressed Storage: git, Docker, IPFS

A more sophisticated use of hashes for integrity is to use the hash as the file's identifier. Instead of storing files by name or path, you store them by their content hash. Same content, same name. Different content, different name. This pattern is called content addressing.

Content addressing combines integrity and deduplication. Two identical files automatically share storage. Tampering is detectable on retrieval, because re-hashing the content must reproduce the identifier it was requested by. The downside is that any change to a file produces a new identifier, which is fine for immutable artifacts (release tags, signed commits) but awkward for mutable data.
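A toy in-memory store makes both properties concrete. This is a sketch of the general idea only, not how git, Docker, or IPFS lay out their storage; the class name `ContentStore` is ours.

```python
import hashlib


class ContentStore:
    """A toy content-addressed store: the key is the SHA-256 of the value."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        # Identical content maps to the same key, so it is stored only once.
        self._blobs.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        data = self._blobs[key]
        # Re-hash on retrieval: corrupted or tampered content no longer matches its key.
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("stored content does not match its identifier")
        return data


store = ContentStore()
k1 = store.put(b"version 1")
k2 = store.put(b"version 1")   # same content, same identifier
k3 = store.put(b"version 2")   # different content, different identifier
print(k1 == k2, k1 == k3, len(store._blobs))  # True False 2
```

The last line also shows the awkward side for mutable data: "version 2" did not update anything, it simply became a second object with its own identifier.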

06

When A Hash Alone Is Enough

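Hashes without signatures are still useful when the threat is accidental corruption rather than a deliberate attacker, or when the reference hash reaches you over a channel you trust and the attacker cannot modify.

If neither condition holds and an attacker could replace both the file and its hash, you have authenticity questions to answer, not integrity ones.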