The Rainbow Table Problem
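Suppose a site hashes passwords with a fast, general-purpose hash such as SHA-256 and uses no salt. (bcrypt would not fit this scenario, because it generates and embeds a salt by design.) Two users with the same password get the same hash. An attacker who steals the database immediately sees which accounts share passwords by grouping identical hashes.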
Worse, the attacker can precompute a giant table of (password, hash) pairs offline using the same hash function. The precomputation might take a year. Once done, looking up any leaked hash is nearly instant. The best-known form of this precomputed structure is the rainbow table, which stores chains of hashes instead of every pair and so trades a little lookup time for far less storage. Free precomputed tables for unsalted MD5, SHA-1, and SHA-256 covering the most common passwords are easy to find online today.
The attack:
- Adversary builds (or downloads) a table mapping every common password to its hash.
- Adversary steals a database of unsalted hashes.
- For each row, look up the hash in the table.
- Every password that appears in the table is recovered in microseconds.
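The steps above can be sketched in a few lines of Python. This is a minimal illustration, assuming unsalted SHA-256 as the site's hash; the wordlist, user names, and passwords are hypothetical. It uses a plain dictionary as the lookup structure, which is simpler than a true chain-based rainbow table but shows the same effect: all hashing happens before the breach.

```python
import hashlib


def unsalted_hash(password: str) -> str:
    """Fast, unsalted hash: the vulnerable scheme described above."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


# Offline precomputation: done once, reusable against any unsalted SHA-256 dump.
# Hypothetical tiny wordlist; a real attacker would use millions of entries.
COMMON_PASSWORDS = ["123456", "password", "qwerty", "summer2024", "letmein"]
LOOKUP = {unsalted_hash(p): p for p in COMMON_PASSWORDS}

# Stolen database (hypothetical rows): user -> unsalted hash.
stolen = {
    "alice": unsalted_hash("summer2024"),
    "bob": unsalted_hash("correct horse battery staple"),
    "carol": unsalted_hash("summer2024"),
}

# One dictionary lookup per row; no hashing happens at attack time.
cracked = {user: LOOKUP[h] for user, h in stolen.items() if h in LOOKUP}
print(cracked)
```

Alice and Carol fall to the same table entry because their hashes are identical; Bob survives only because his password is not in the wordlist.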
What A Salt Does
A salt is a random value, unique per user, combined with the password before hashing. The salt is stored next to the hash. It does not need to be secret.
Storage shape:
user_id  salt           hash
-------  ----           ----
alice    R3#x9!fL2qmKp  8f2a3...
bob      z@nVj8H4tAcU   c91e7...
carol    Hq4&bT9wLs2Y   d75b2...
dave     kP$rW7eXz5Bn   4b0e6...
What the salt accomplishes:
- Precomputation breaks. A rainbow table indexed by hash no longer works, because the attacker would have to precompute a separate table for every possible salt. With 16-byte (128-bit) random salts, that is 2^128 tables. Infeasible.
- Same password no longer collides. Alice and Carol both use "summer2024" but they have different salts so their hashes are different. The attacker cannot tell which accounts share passwords by inspection.
- Each password must be cracked individually. The attacker still has to brute-force each leaked row, but they cannot amortize across the database.
If the slow-hash work factor is set correctly (per the Password Hashing page), and every user has a unique salt, even a leaked database of 100 million rows costs the attacker about 100 million password cracks, not 1. The economics flip.
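A small sketch makes the change in economics concrete. It assumes PBKDF2-HMAC-SHA256 from Python's standard library as a stand-in for whichever slow hash the site uses, with a deliberately tiny iteration count so the demo runs quickly; the users and the candidate list are hypothetical.

```python
import hashlib
import secrets

ITERATIONS = 1_000  # demo value only; far too low for real password storage


def salted_hash(password: str, salt: bytes) -> str:
    """Salted slow hash (PBKDF2 stands in for the site's real algorithm)."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    ).hex()


# Three users choose the same password, each with a fresh 16-byte salt.
rows = {}
for user in ("alice", "bob", "carol"):
    salt = secrets.token_bytes(16)
    rows[user] = (salt, salted_hash("summer2024", salt))

distinct_hashes = {h for _, h in rows.values()}
print(len(distinct_hashes))  # three different hashes for one password

# The attacker has the salts, but must redo the work for every row:
candidates = ["123456", "password", "summer2024"]
guesses = 0
cracked = {}
for user, (salt, target) in rows.items():
    for candidate in candidates:
        guesses += 1  # one full slow-hash computation per (row, candidate)
        if salted_hash(candidate, salt) == target:
            cracked[user] = candidate
            break

print(guesses)  # work scales with rows x candidates, not candidates alone
```

The salts are in plain view here, yet nothing computed for Alice's row helps with Bob's or Carol's.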
Live: Rainbow Table Defeated By Salt
The interactive shows three users on the left who all chose the same weak password, with no salt. Their hashes are identical, so a single rainbow-table hit cracks all three. On the right, the same three users with random per-user salts produce three different hashes, so each must be attacked separately.
[Interactive demo: same password, three users, two scenarios.] Regenerating the random salts changes only the salted panel. The unsalted panel never changes: without a salt, the three identical passwords produce three identical hashes, and one rainbow lookup wins them all. With per-user salts, the hashes diverge entirely even though the password is the same.
Salt Storage: It Goes Right Next To The Hash
A common misconception is that salts should be kept secret. They should not. Salts work even when the attacker has them. The whole defense is precomputation-resistance, not secrecy. Hide a salt and you've added complexity for no benefit.
Modern password hash formats encode the salt inline with the hash output:
$argon2id$v=19$m=19456,t=2,p=1$IEvxhuekN+8B5RxRMz6XnQ$cMtq8VqcFiRrLeRfb+5gZQ
                               ^                      ^
                               |                      |
                               salt (base64)          hash (base64)
The library reads the salt from the stored string when verifying. No separate column, no separate storage. The format is self-describing: algorithm, version, parameters, salt, and hash all in one field.
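To illustrate how self-describing the format is, the sketch below splits the example string into its fields by hand. The helper names are hypothetical and the parser is for illustration only; in practice the password-hashing library does this internally during verification.

```python
import base64

ENCODED = "$argon2id$v=19$m=19456,t=2,p=1$IEvxhuekN+8B5RxRMz6XnQ$cMtq8VqcFiRrLeRfb+5gZQ"


def b64decode_nopad(text: str) -> bytes:
    """Decode base64 that was stored without trailing '=' padding."""
    return base64.b64decode(text + "=" * (-len(text) % 4))


def parse_argon2_string(encoded: str) -> dict:
    """Split '$algorithm$v=N$m=..,t=..,p=..$salt$hash' into its fields."""
    # The leading '$' yields an empty first element, which is discarded.
    _, algorithm, version, params, salt_b64, hash_b64 = encoded.split("$")
    return {
        "algorithm": algorithm,
        "version": int(version.split("=")[1]),
        "params": {
            key: int(value)
            for key, value in (item.split("=") for item in params.split(","))
        },
        "salt": b64decode_nopad(salt_b64),
        "hash": b64decode_nopad(hash_b64),
    }


fields = parse_argon2_string(ENCODED)
print(fields["algorithm"], fields["params"], len(fields["salt"]))
```

Everything needed to recompute the hash for a login attempt, including the 16-byte salt, comes out of the single stored field.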
Implementation rules:
- Random salt per user. Generate with a cryptographic RNG (secrets.token_bytes(16) in Python, crypto.randomBytes(16) in Node).
- At least 16 bytes (128 bits). Less than that and birthday collisions among salts become non-negligible at large user counts.
- Generated server-side. Never let the client supply the salt.
- Stored alongside the hash. Standard formats handle this automatically.
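The rules above could look like this in practice. This is a sketch, not a recommended production scheme: it assumes PBKDF2-HMAC-SHA256 from the standard library so that it runs without third-party packages, the iteration count is an assumed value to be tuned per the Password Hashing page, and the "pbkdf2-sha256$..." record layout is a hypothetical format invented for this example. A real deployment would normally let an Argon2 or bcrypt library produce its own standard string.

```python
import base64
import hashlib
import hmac
import secrets

ITERATIONS = 600_000  # assumed work factor; tune for your own hardware


def hash_password(password: str, iterations: int = ITERATIONS) -> str:
    """Return one self-describing record: algorithm$iterations$salt$hash."""
    salt = secrets.token_bytes(16)  # server-side, cryptographic RNG, 128 bits
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    return "$".join([
        "pbkdf2-sha256",
        str(iterations),
        base64.b64encode(salt).decode("ascii"),
        base64.b64encode(digest).decode("ascii"),
    ])


def verify_password(password: str, stored: str) -> bool:
    """Recompute the hash using the salt read back from the stored record."""
    _, iterations, salt_b64, digest_b64 = stored.split("$")
    salt = base64.b64decode(salt_b64)
    expected = base64.b64decode(digest_b64)
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, int(iterations)
    )
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(candidate, expected)


record = hash_password("summer2024")
print(verify_password("summer2024", record))  # correct password
print(verify_password("winter2024", record))  # wrong password
```

The caller never handles the salt directly: it is generated inside hash_password and travels with the hash in a single column.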
Pepper: The Debated Optional Extra
A pepper is an additional secret value, the same for every user, combined with the password before hashing. Unlike a salt, the pepper is kept in a different location: ideally a hardware security module, a separate vault, or an environment variable that never appears in the database backup.
The argument for pepper:
- If only the database leaks (the most common breach pattern), the password hashes are useless to the attacker because the pepper is not in the dump. Without the pepper, the attacker cannot even test guesses for the most common passwords.
- It is a low-cost defense-in-depth measure.
The argument against:
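- Once an attacker gets both the database and the pepper (plausible in a full application-server compromise, since the application needs access to both), the pepper adds nothing.
- Operational overhead: pepper rotation is hard. Stored hashes cannot be recomputed under a new pepper without the plaintext passwords, so rotation means re-hashing each account as its user next logs in, or forcing password resets.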
- Increased complexity, more places things can go wrong.
OWASP's current position is that pepper is optional and "useful for additional defense-in-depth," but it is not a substitute for strong password hashing with proper salts.
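One way to combine a pepper with a salted hash is to key the password with HMAC before the slow hash runs. The sketch below assumes that construction and uses PBKDF2-HMAC-SHA256 with a tiny iteration count as a stand-in for the real slow hash; the function name is hypothetical, and the pepper is generated inline only so the example runs. In a deployment it would be loaded from an HSM, a vault, or an environment variable.

```python
import hashlib
import hmac
import secrets


def peppered_hash(password: str, salt: bytes, pepper: bytes,
                  iterations: int = 1_000) -> bytes:
    """HMAC the password with the pepper, then apply the salted slow hash."""
    # Step 1: mix in the secret pepper (same for every user).
    keyed = hmac.new(pepper, password.encode("utf-8"), hashlib.sha256).digest()
    # Step 2: salted slow hash (unique salt per user). Demo iteration count.
    return hashlib.pbkdf2_hmac("sha256", keyed, salt, iterations)


pepper = secrets.token_bytes(32)  # stand-in for a secret held outside the DB
salt = secrets.token_bytes(16)    # stored next to the hash, as usual
stored = peppered_hash("summer2024", salt, pepper)

# An attacker with the database dump knows the salt but not the pepper,
# so even the correct password guess does not reproduce the stored hash.
attacker_guess = peppered_hash("summer2024", salt, secrets.token_bytes(32))
print(hmac.compare_digest(stored, attacker_guess))
```

Because the pepper enters as an HMAC key, changing it invalidates every stored hash, which is the rotation problem noted in the arguments against.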
Key Derivation Functions (KDFs)
A password hash is a special case of a more general primitive: a key derivation function, or KDF. The general problem is: take an input that has some entropy but is not directly usable as a cryptographic key, and stretch or shape it into one or more uniform random-looking keys.
Two flavors:
| KDF type | Input | Examples | Use case |
|---|---|---|---|
| Password-based KDF | Human password (low entropy) | PBKDF2, scrypt, Argon2, bcrypt | Storing password hashes, deriving disk-encryption keys from passphrases (LUKS, VeraCrypt, age) |
| Extract-and-expand KDF | Cryptographic input (high entropy) | HKDF, KMAC, KMAC-XOF | Deriving session keys from a shared secret in TLS 1.3, Signal, WireGuard |
Password-based KDFs are slow on purpose, because their input is low-entropy and an attacker who steals the output can brute-force the input. Extract-and-expand KDFs are fast, because their input is already cryptographically strong; their job is just to format the bits.
HKDF: The Modern Workhorse
HKDF, defined in RFC 5869, is the standard extract-and-expand KDF used in modern protocols. It is built entirely on HMAC. The construction is two steps:
- Extract: PRK = HMAC(salt, IKM), where IKM is the input key material and PRK is a pseudorandom intermediate key.
- Expand: iteratively apply HMAC with PRK as the key to produce as many output bytes as needed, with an info string mixed in to support multiple derived keys from the same IKM.
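The two steps can be written out directly with Python's hmac module. This is a minimal sketch following the RFC 5869 definitions with SHA-256 fixed as the hash; the shared secret and the info labels in the usage lines are hypothetical. Production code would normally call a vetted library implementation instead.

```python
import hashlib
import hmac

HASH = hashlib.sha256
HASH_LEN = HASH().digest_size  # 32 bytes for SHA-256


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """Extract: PRK = HMAC(salt, IKM). An empty salt means HASH_LEN zero bytes."""
    if not salt:
        salt = bytes(HASH_LEN)
    return hmac.new(salt, ikm, HASH).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """Expand: T(i) = HMAC(PRK, T(i-1) || info || i), concatenated and truncated."""
    if length > 255 * HASH_LEN:
        raise ValueError("requested output too long for HKDF")
    okm = b""
    block = b""  # T(0) is the empty string
    counter = 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), HASH).digest()
        okm += block
        counter += 1
    return okm[:length]


# Usage: derive two independent keys from one shared secret (hypothetical values).
shared_secret = bytes(range(32))
prk = hkdf_extract(b"", shared_secret)
enc_key = hkdf_expand(prk, b"example encryption key", 32)
mac_key = hkdf_expand(prk, b"example mac key", 32)
print(enc_key != mac_key)
```

The info string is what separates the derived keys: the same PRK with a different label yields an unrelated output.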
Where you encounter HKDF:
- TLS 1.3: the entire key schedule (early secrets, handshake secrets, application secrets, traffic keys, IVs, exporter secrets) is HKDF chained. See the TLS Handshake page.
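- Signal protocol: the Double Ratchet uses HKDF to derive a new root key and chain key at each Diffie-Hellman ratchet step. Per-message keys then come from the chain key through a keyed hash step, for which the specification recommends HMAC.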
- WireGuard: session keys derived from ECDH outputs via HKDF.
- QUIC: same key schedule mechanics as TLS 1.3.
- HPKE (RFC 9180): the modern hybrid public-key encryption standard. HKDF for every derived key.
The reason HKDF appears in so many designs: it is simple, fast, formally analyzed, and its underlying primitive (HMAC) is universally available. When a new protocol needs "turn this shared secret into N independent keys," reaching for HKDF is the standard move.