Hashing · 03

Collisions and the Birthday Attack

Every hash function has collisions. There are infinitely many possible inputs and only finitely many possible outputs. The pigeonhole principle guarantees it. The whole game in cryptographic hash design is to make collisions exist mathematically while being computationally impossible to find. When that game has been lost, we have called the hash function broken.

01

The Pigeonhole Principle

If you put 11 pigeons into 10 holes, at least one hole has two pigeons in it. That is the pigeonhole principle. It is so obvious that it sounds like a joke, but its implication for hashing is profound.

SHA-256 produces 2^256 possible outputs. That is a 78-digit number. But the input space is unbounded: there are infinitely many possible files, messages, and data values. By pigeonhole, infinitely many pairs of inputs must map to the same output. Collisions exist.

The question is not whether collisions exist. It is whether anyone can find one. For a good cryptographic hash, the answer should be: only by brute-force search over astronomical numbers of attempts.

02

The Birthday Paradox

Brute-force search is not as expensive as you might think. The intuition trap is the birthday paradox.

In a room of how many people is there a 50% chance that two share a birthday? Most people guess about 180 (half of 365). The correct answer is 23. The reason is that you are not comparing each person to one specific birthday; you are comparing every pair. With 23 people, you have 253 pairs, and the probability that any of them matches reaches 50%.

The same math applies to hash collisions. To find any collision in an N-bit hash, you do not need 2^N attempts. You need about 2^(N/2). That square-root is the birthday-attack advantage.

Interactive · Birthday Collision Calculator

How many hash outputs do you need before a collision becomes likely?

Drag the slider to set the hash output size in bits. The widget computes the number of items you'd need to hash to have a 50% chance of finding a collision, and how long that would take on a fast machine.

Items needed for 50% collision probability
264
computing...
03

Birthday Attacks On Hash Functions

The birthday math defines the actual security level of a hash function against collision attacks:

Hash outputBrute-force preimageBirthday collision
MD5: 128 bits2^128 (impossible)2^64 (within reach but mostly theoretical)
SHA-1: 160 bits2^160 (impossible)2^80 (was the boundary in 2005)
SHA-256: 256 bits2^256 (impossible)2^128 (still infeasible)
SHA-512: 512 bits2^512 (impossible)2^256 (utterly infeasible)

When you read that a hash provides "n-bit security against collisions," it almost always means n/2 actual bits, because of the birthday bound. This is why crypto people insist on 256-bit hashes when they want "128-bit security": the collision bound is half the output size.

Why this matters for hash function design

To stand up to a 128-bit attacker, your hash needs at least a 256-bit output. To stand up to a 256-bit attacker (post-quantum-style), you need 512 bits. This is exactly why SHA-512 and SHA-3-512 exist. The output doubles to compensate for the square-root advantage of birthday attacks.

04

Find A Collision Yourself

The interactive below truncates SHA-256 to a small number of bits, then hashes random strings until two of them collide. With 16 bits of output, the expected collision count is about 256 attempts. With 24 bits, about 4,096. You can watch the birthday math happen in real time.

Interactive · Live Collision Hunt

Hash random strings, see when two collide

Pick a truncation size with the slider. Click Start to begin hashing random strings. The log shows each one. When two hashes match, the colliding pair appears in red. The expected attempt count is about 2^(N/2) where N is the truncation size.

0
attempts
0
unique hashes seen
2^8
expected for collision
idle
status
Click "Start" to begin hashing.
05

MD5 Collisions, And Why They Matter

The first practical MD5 collision was demonstrated by Wang et al. in 2004. Two distinct 128-byte messages with the same MD5 hash. The attack ran in about an hour on a desktop machine.

Bare collisions sound abstract until someone weaponizes them:

Two flavors of collision attack matter in practice:

06

SHAttered: SHA-1 Falls (2017)

On February 23, 2017, Google and CWI Amsterdam published the first SHA-1 collision: two PDF files with the same SHA-1 hash, different visible content. They named the attack SHAttered.

By 2017, the browser industry had already been preparing for this. Chrome, Firefox, Edge, and Safari had all announced SHA-1 certificate rejection schedules between 2014 and 2017. The CA/Browser Forum had banned issuing new SHA-1 certificates after 2015. SHAttered was the long-predicted event for which everyone was ready.

In 2020, Leurent and Peyrin extended the attack to a chosen-prefix collision on SHA-1, costing about $45,000 of GPU time. This is the variant that breaks signature schemes. Anyone still depending on SHA-1 in 2026 has done so against persistent warnings.

07

Defending Against Collision Attacks