What A Hash Function Is
A hash function takes an input of arbitrary length and returns an output of fixed length. With SHA-256, "hello" becomes a 256-bit number. A 5 GB video file becomes a 256-bit number too. Whether the input is one byte or one terabyte, SHA-256 always returns exactly 256 bits.
Fixed-length output alone is not enough. Plenty of hash functions are unsuitable for security: CRC32 maps anything to 32 bits, but it was designed to catch accidental errors, and you can easily find two inputs that collide. A cryptographic hash has stronger properties that make it useful for security work.
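Both claims above can be checked in a few lines. The following is a minimal sketch using Python's standard `hashlib` and `zlib` modules; the helper names `sha256_hex` and `find_crc32_collision`, the fixed seed, and the 8-byte random inputs are choices made for this illustration only.

```python
import hashlib
import random
import zlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as 64 hex characters."""
    return hashlib.sha256(data).hexdigest()


def find_crc32_collision(seed: int = 0, max_tries: int = 2_000_000):
    """Birthday search: try random 8-byte inputs until two share a CRC32."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    seen = {}  # CRC32 value -> first input that produced it
    for _ in range(max_tries):
        candidate = rng.getrandbits(64).to_bytes(8, "big")
        checksum = zlib.crc32(candidate)
        earlier = seen.get(checksum)
        if earlier is not None and earlier != candidate:
            return earlier, candidate
        seen[checksum] = candidate
    raise RuntimeError("no collision found within max_tries")


# Inputs of very different sizes all yield 64 hex characters (256 bits).
for sample in (b"", b"hello", b"x" * 1_000_000):
    print(len(sample), "bytes ->", len(sha256_hex(sample)), "hex chars")

# Two different inputs with the same CRC32 turn up almost immediately.
first, second = find_crc32_collision()
print(first.hex(), second.hex(), hex(zlib.crc32(first)))
```

With only 32 output bits, a collision is expected after roughly a hundred thousand random inputs, which takes well under a second. The same search against a 256-bit output would need on the order of 2^128 attempts.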
The Five Properties
A cryptographic hash function must satisfy all five of these. Drop any one and the function becomes unsuitable for security use.
| Property | What it means |
|---|---|
| Deterministic | The same input always produces the same output. No randomness, no timestamps, no surprises. |
| Fast to compute | Hashing a megabyte should take milliseconds, not minutes. Hardware acceleration is common. |
| One-way (preimage resistant) | Given a hash output, finding any input that produces it should be computationally infeasible. |
| Collision resistant | Finding two different inputs that produce the same hash output should be computationally infeasible. |
| Avalanche effect | Changing a single bit of the input should change roughly half the bits of the output, unpredictably. |
The first two are about utility: hashes need to be consistent and usable. The last three are about security: they make it computationally hard for an attacker to do useful things with hash outputs, like find inputs that match a given target.
Watch A Hash In Real Time
The interactive below computes a real SHA-256 hash as you type. Notice how the output is always exactly 64 hex characters (256 bits) regardless of how much you type, and how it looks like random noise even though it is fully deterministic.
Type anything. See its hash update with every keystroke.
This uses the browser's built-in Web Crypto API, so the output is real SHA-256, the same algorithm that is widely used in TLS certificates. Try typing one character. Then erase it. The output looks completely different even though only one byte changed.
The Avalanche Effect
The avalanche effect is the single most surprising property of a cryptographic hash. Two inputs that differ by a single bit produce outputs that look completely unrelated. About half of the output bits will be different. There is no pattern; no proximity in input space corresponds to proximity in output space.
Change one character and watch every output bit scatter.
Two side-by-side inputs. They start identical. Edit either one. The right panel highlights every hex character that differs. The bit-difference count tells you what fraction of the output flipped.
The "fox" vs "fpx" pair differs by a single character: the letter o is 0x6F and p is 0x70, so the two inputs differ in just five bits. The hashes share essentially no structure. This unpredictability is what makes hashes useful for fingerprinting, signing, and detecting tampering.
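The bit counts can be reproduced outside the browser. This is a small sketch using Python's `hashlib`; the helper names `bit_difference` and `avalanche` are made up for this example.

```python
import hashlib


def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bits that differ between two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def avalanche(msg_a: bytes, msg_b: bytes):
    """Return (input bits changed, output bits changed) under SHA-256."""
    digest_a = hashlib.sha256(msg_a).digest()
    digest_b = hashlib.sha256(msg_b).digest()
    return bit_difference(msg_a, msg_b), bit_difference(digest_a, digest_b)


changed_in, changed_out = avalanche(b"fox", b"fpx")
print(f"input bits changed:  {changed_in}")
print(f"output bits changed: {changed_out} of 256")
```

The output count should land near 128, but the exact figure varies from pair to pair, so treat "about half" as a statistical expectation and not a guarantee for any single input.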
Hashes Are Not Encryption
This is one of the most common confusions in introductory security: hashes look like ciphertext, but they are not. The differences are fundamental.
| | Encryption | Hashing |
|---|---|---|
| Has a key? | Yes | No |
| Reversible? | Yes, with the right key | No, never, by design |
| Output size | Same as input (roughly) | Fixed, regardless of input |
| Provides confidentiality? | Yes | No |
| Provides integrity? | Only with an AEAD mode or a separate MAC | Against accidental changes, yes; against an attacker, only if the hash itself is protected |
People say things like "I encrypted the password with SHA-256." That sentence has no meaning. SHA-256 is not encryption. There is no key. There is no decryption. Once you hash something, the original cannot be recovered from the output. This is actually the point for password storage, but it has nothing to do with encryption.
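The "no key" row has a practical consequence for the integrity row: anyone can compute a plain hash, including an attacker. The sketch below uses Python's `hashlib` and `hmac` modules to contrast an unkeyed digest with a keyed tag. The messages, the key, and the helper names `plain_digest` and `keyed_tag` are hypothetical.

```python
import hashlib
import hmac


def plain_digest(message: bytes) -> bytes:
    """Unkeyed SHA-256: anyone can compute this for any message."""
    return hashlib.sha256(message).digest()


def keyed_tag(key: bytes, message: bytes) -> bytes:
    """HMAC-SHA-256: computing a valid tag requires the secret key."""
    return hmac.new(key, message, hashlib.sha256).digest()


tampered = b"pay mallory 10"  # attacker's replacement for "pay alice 10"

# Unkeyed: an attacker who can swap the message can also swap the digest.
forged_digest = plain_digest(tampered)
accepted_plain = hmac.compare_digest(plain_digest(tampered), forged_digest)

# Keyed: without the key (hypothetical value), the attacker's tag will not verify.
key = b"hypothetical shared secret"
attacker_tag = keyed_tag(b"attacker guess", tampered)
accepted_keyed = hmac.compare_digest(keyed_tag(key, tampered), attacker_tag)

print("plain hash accepts forgery:", accepted_plain)  # True
print("HMAC accepts forgery:      ", accepted_keyed)  # False
```

A bare hash still catches accidental corruption, and it catches deliberate tampering when the expected hash arrives over a channel the attacker cannot modify.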
Where Hashes Show Up
Almost everywhere in security, and many places outside it. A short tour of the rest of this track:
- Integrity verification: compute the hash of a file before and after transmission to confirm it arrived intact. Covered on the Integrity page.
- Digital signatures: nobody signs raw messages. They hash the message first, then sign the hash. Already covered on the Signatures page.
- Password storage: well-designed servers never store passwords. They store hashes of passwords. Covered on Password Hashing.
- Message authentication (MAC): hashes combined with a key produce HMAC, a primitive for proving messages came from a specific party. Covered on HMAC.
- Content addressing: in git, IPFS, and Docker, files are identified by the hash of their content. Same content, same address. Different content, different address. Covered on In The Wild.
- Blockchains: each block's hash is part of the next block. Tamper with any historical block and every subsequent hash breaks. Covered on In The Wild.
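Two of the uses above, content addressing and hash chaining, fit in a short sketch. It is a toy illustration built on Python's `hashlib`, not the format that git, IPFS, Docker, or any real blockchain uses. The function names, the dictionary layout, and the all-zero starting hash are assumptions made for this example.

```python
import hashlib

GENESIS = "0" * 64  # placeholder "previous hash" for the first block (assumption)


def content_address(data: bytes) -> str:
    """Identify data by the SHA-256 hash of its content."""
    return hashlib.sha256(data).hexdigest()


def build_chain(payloads):
    """Link payloads so each block's hash covers the previous block's hash."""
    chain = []
    prev_hash = GENESIS
    for payload in payloads:
        block_hash = hashlib.sha256(prev_hash.encode() + payload).hexdigest()
        chain.append({"prev": prev_hash, "payload": payload, "hash": block_hash})
        prev_hash = block_hash
    return chain


def first_broken_block(chain):
    """Return the index of the first block that fails verification, or None."""
    prev_hash = GENESIS
    for index, block in enumerate(chain):
        expected = hashlib.sha256(prev_hash.encode() + block["payload"]).hexdigest()
        if block["prev"] != prev_hash or block["hash"] != expected:
            return index
        prev_hash = block["hash"]
    return None


# Content addressing: identical content maps to the same key.
store = {}
for blob in (b"readme", b"readme", b"license"):
    store[content_address(blob)] = blob
print(len(store), "distinct objects stored")

# Hash chain: editing an old block is detected at that block.
ledger = build_chain([b"block one", b"block two", b"block three"])
print(first_broken_block(ledger))
ledger[1]["payload"] = b"rewritten history"
print(first_broken_block(ledger))
```

To hide the edit, an attacker would have to recompute the hash of the modified block and of every block after it, which is why real systems add further protection on top of the chain.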