Collisions and the Birthday Attack · Rolling Thunder Security Codex

The Pigeonhole Principle

If you put 11 pigeons into 10 holes, at least one hole has two pigeons in it. That is the pigeonhole principle. It is so obvious that it sounds like a joke, but its implication for hashing is profound.

SHA-256 produces 2^256 possible outputs. That is a 78-digit number. But the input space is unbounded: there are infinitely many possible files, messages, and data values. By pigeonhole, infinitely many pairs of inputs must map to the same output. Collisions exist.

The question is not whether collisions exist. It is whether anyone can find one. For a good cryptographic hash, the answer should be: only by brute-force search over astronomical numbers of attempts.

The Birthday Paradox

Brute-force search is not as expensive as you might think. The intuition trap is the birthday paradox.

In a room of how many people is there a 50% chance that two share a birthday? Most people guess about 180 (half of 365). The correct answer is 23. The reason is that you are not comparing each person to one specific birthday; you are comparing every pair. With 23 people, you have 253 pairs, and the probability that any of them matches reaches 50%.

The same math applies to hash collisions. To find any collision in an N-bit hash, you do not need 2^N attempts. You need about 2^(N/2). That square-root is the birthday-attack advantage.

Interactive · Birthday Collision Calculator

How many hash outputs do you need before a collision becomes likely?

Drag the slider to set the hash output size in bits. The widget computes the number of items you'd need to hash to have a 50% chance of finding a collision, and how long that would take on a fast machine.

Hash output size: 128 bits

Items needed for 50% collision probability

2⁶⁴

computing...

Birthday Attacks On Hash Functions

The birthday math defines the actual security level of a hash function against collision attacks:

Hash output	Brute-force preimage	Birthday collision
MD5: 128 bits	2^128 (impossible)	2^64 (within reach but mostly theoretical)
SHA-1: 160 bits	2^160 (impossible)	2^80 (was the boundary in 2005)
SHA-256: 256 bits	2^256 (impossible)	2^128 (still infeasible)
SHA-512: 512 bits	2^512 (impossible)	2^256 (utterly infeasible)

When you read that a hash provides "n-bit security against collisions," it almost always means n/2 actual bits, because of the birthday bound. This is why crypto people insist on 256-bit hashes when they want "128-bit security": the collision bound is half the output size.

Why this matters for hash function design

To stand up to a 128-bit attacker, your hash needs at least a 256-bit output. To stand up to a 256-bit attacker (post-quantum-style), you need 512 bits. This is exactly why SHA-512 and SHA-3-512 exist. The output doubles to compensate for the square-root advantage of birthday attacks.

Find A Collision Yourself

The interactive below truncates SHA-256 to a small number of bits, then hashes random strings until two of them collide. With 16 bits of output, the expected collision count is about 256 attempts. With 24 bits, about 4,096. You can watch the birthday math happen in real time.

Interactive · Live Collision Hunt

Hash random strings, see when two collide

Pick a truncation size with the slider. Click Start to begin hashing random strings. The log shows each one. When two hashes match, the colliding pair appears in red. The expected attempt count is about 2^(N/2) where N is the truncation size.

Truncate hash to: 16 bits

attempts

unique hashes seen

2^8

expected for collision

idle

status

Click "Start" to begin hashing.

MD5 Collisions, And Why They Matter

The first practical MD5 collision was demonstrated by Wang et al. in 2004. Two distinct 128-byte messages with the same MD5 hash. The attack ran in about an hour on a desktop machine.

Bare collisions sound abstract until someone weaponizes them:

2008 rogue CA certificate: Sotirov, Stevens, Appelbaum, Lenstra, Molnar, Osvik, and de Weger generated a pair of certificate signing requests that produced colliding MD5 hashes when one was for a normal website and the other was for a rogue intermediate CA. They got RapidSSL to sign the first. Because the MD5s collided, the signature was valid on both. They now had a CA cert browsers trusted.
2012 Flame malware: Suspected to be a nation-state operation. Used a chosen-prefix MD5 collision against the Microsoft Terminal Server Licensing Service certificate to forge a code-signing certificate trusted by Windows. The malware then masqueraded as a Microsoft update.
Many file integrity scanners still use MD5 in their default configurations. An attacker who can place a file in scanned storage can craft a "good" file and a "bad" file with the same MD5, then swap them after the scan.

Two flavors of collision attack matter in practice:

Identical-prefix: find two messages M1 and M2 that produce the same hash. Both messages are entirely under attacker control. Easy for MD5, expensive but feasible for SHA-1.
Chosen-prefix: given two arbitrary prefixes P1 and P2 (chosen by the victim), find suffixes S1 and S2 such that hash(P1||S1) = hash(P2||S2). This is what makes attacks on certificates work, because you cannot control the prefix; the CA does. Both MD5 and SHA-1 have practical chosen-prefix attacks.

SHAttered: SHA-1 Falls (2017)

On February 23, 2017, Google and CWI Amsterdam published the first SHA-1 collision: two PDF files with the same SHA-1 hash, different visible content. They named the attack SHAttered.

The collision used 9 quintillion (9.2 × 10^18) SHA-1 computations.
Cost on Google Cloud: about $110,000.
Compared to brute-force (2^80), the attack was 100,000x faster.
Compared to running on a single CPU, the attack would have taken 6,500 years.

By 2017, the browser industry had already been preparing for this. Chrome, Firefox, Edge, and Safari had all announced SHA-1 certificate rejection schedules between 2014 and 2017. The CA/Browser Forum had banned issuing new SHA-1 certificates after 2015. SHAttered was the long-predicted event for which everyone was ready.

In 2020, Leurent and Peyrin extended the attack to a chosen-prefix collision on SHA-1, costing about $45,000 of GPU time. This is the variant that breaks signature schemes. Anyone still depending on SHA-1 in 2026 has done so against persistent warnings.

Defending Against Collision Attacks

Use bigger hashes. SHA-256 gives 128 bits of collision resistance, far beyond what any conceivable attacker can do. SHA-512 gives 256.
Use families with different internal structures. SHA-3 was standardized as a backup precisely so that a structural break in SHA-2 would not affect SHA-3.
Use hashes with chosen-prefix resistance proofs. The newest hash designs are evaluated against chosen-prefix attacks as a baseline requirement.
Pin the algorithm in protocol design. Many MD5 / SHA-1 attacks succeeded because the protocol allowed the hash function to be negotiated, and downgrade attacks selected the weakest option. Modern protocols hardcode SHA-256 or SHA-3.
Add a counter-collision step. Git, faced with the SHA-1 collision problem and unable to fully migrate yet, uses a tool called SHA-1DC (detect-collision) that watches for the specific block patterns used in known SHA-1 attacks and rejects them.