Why Key Management Is The Hard Part
AES has not been broken in 25 years. The same cannot be said for the systems that hold AES keys. The vast majority of real-world cryptographic failures happen not in the algorithm but in the surrounding plumbing.
Every symmetric cipher assumes the key is secret, random, and known only to the parties who need it. Every one of those three properties is something an engineer has to actively enforce. A key written to a config file, checked into Git, mailed in plaintext, or generated from the system clock fails the assumption before AES even runs.
Key management is the discipline of answering five questions for every key in a system:
- How was it generated, and is the source of randomness trustworthy?
- How is it distributed to the parties that need it?
- Where is it stored while in use, and who can read it?
- When and how is it rotated or revoked?
- How is it destroyed when it is no longer needed?
Every modern compliance framework or standard that touches cryptography (FIPS 140-3, PCI DSS, HIPAA, SOC 2, NIST SP 800-57) is, at its core, a checklist for these five questions.
Generating Keys
A symmetric key is just a string of random bits. AES-128 needs 128 of them, AES-256 needs 256. The hard part is the word random.
Keys must come from a cryptographically secure pseudo-random number generator (CSPRNG). The operating system provides one. On Linux it is /dev/urandom or the getrandom() syscall. On Windows it is BCryptGenRandom. Most language standard libraries wrap these: Python secrets.token_bytes(32), Go crypto/rand, Node crypto.randomBytes(32).
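As a minimal sketch of the point above, the snippet below draws an AES-sized key from the operating system CSPRNG through Python's `secrets` module. The helper name `generate_aes_key` is hypothetical and exists only for illustration.

```python
import secrets


def generate_aes_key(bits: int = 256) -> bytes:
    """Return a fresh random key of a valid AES size from the OS CSPRNG."""
    if bits not in (128, 192, 256):
        raise ValueError("AES keys are 128, 192, or 256 bits long")
    return secrets.token_bytes(bits // 8)


key = generate_aes_key(256)
# Report only the length; the key bytes themselves should never be printed or logged.
print(f"generated a {len(key) * 8}-bit key")
```

In a real system the returned bytes would go straight into a key store rather than a variable that outlives the call.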
What never works:
- The non-cryptographic RNG built into most languages (Python random, Java Math.random, C rand()). These are predictable.
- Hashing a timestamp, a username, or anything else with low entropy.
- Asking the user to type the key, unless that string is then fed through a proper key derivation function (see next section).
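In 2006, Debian shipped an OpenSSL package with a patch that crippled its random number generator, leaving the process ID as the only real source of entropy. The flaw was not disclosed until May 2008. For a given key type and architecture the generator could produce only 32,767 distinct values, so every SSH key, TLS certificate, and DNSSEC key generated on an affected Debian system for almost two years was one of about 32,000 possible keys. They were instantly enumerable. The cipher was fine. The randomness was not.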
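To make "instantly enumerable" concrete, here is a toy reconstruction of the failure mode, not of the actual OpenSSL bug. It assumes a key derived from a seed with only 32,767 possible values and shows that an attacker who sees a public fingerprint can simply try every seed. The names `weak_key`, `fingerprint`, and `recover_seed` are hypothetical.

```python
import hashlib
import random

SEED_SPACE = range(1, 32768)  # 32,767 possible seeds, mirroring the figure above


def weak_key(seed: int) -> bytes:
    """Derive a 128-bit 'key' from a tiny seed with a non-cryptographic RNG."""
    return random.Random(seed).getrandbits(128).to_bytes(16, "big")


def fingerprint(key: bytes) -> str:
    """Public identifier of a key, standing in for an SSH key fingerprint."""
    return hashlib.sha256(key).hexdigest()


def recover_seed(target_fingerprint: str):
    """Try every possible seed until one reproduces the observed fingerprint."""
    for seed in SEED_SPACE:
        if fingerprint(weak_key(seed)) == target_fingerprint:
            return seed
    return None


victim_seed = 20417  # arbitrary value chosen for the demo
observed = fingerprint(weak_key(victim_seed))
recovered = recover_seed(observed)
print("attacker regenerated the victim key:", weak_key(recovered) == weak_key(victim_seed))
```

The loop finishes in well under a second on ordinary hardware, which is the whole problem: key length is irrelevant when the seed space is this small.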
Key Derivation Functions
Users do not pick 256-bit random keys. They pick passwords. The job of a key derivation function (KDF) is to convert a low-entropy human password into a high-entropy, fixed-length symmetric key.
A KDF takes three inputs: the password, a salt (random bytes stored alongside the output), and a cost parameter that controls how expensive the derivation is. The expense is the point. If deriving a key takes 100 milliseconds for a legitimate user, it also takes 100 milliseconds for an attacker testing each password from a dictionary.
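The three inputs map directly onto a function signature. The sketch below uses PBKDF2-HMAC-SHA256 from Python's standard library only because it needs no third-party package; it is not a recommendation over Argon2id. The helper name `derive_key` and the sample password are illustrative assumptions.

```python
import hashlib
import secrets
import time


def derive_key(password: str, salt: bytes, iterations: int = 600_000) -> bytes:
    """Stretch a password into a 32-byte key; `iterations` is the cost parameter."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=32
    )


salt = secrets.token_bytes(16)  # random, stored next to the output, not secret
start = time.perf_counter()
key = derive_key("correct horse battery staple", salt)
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"derived a {len(key)}-byte key in {elapsed_ms:.0f} ms")
```

Whatever time this prints on your machine is also the price an attacker pays per guess on comparable hardware, which is why the cost parameter is tuned upward as hardware gets faster.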
| KDF | Year | Cost knob | Status |
|---|---|---|---|
| PBKDF2 | 2000 | Iteration count | Acceptable, but GPU-friendly. Use only with 600,000+ iterations (for HMAC-SHA-256). |
| bcrypt | 1999 | Work factor (log2) | Still solid for passwords. Limited to 72-byte input. |
| scrypt | 2009 | N, r, p (CPU and memory) | Memory-hard. Resists custom hardware better than PBKDF2. |
| Argon2id | 2015 | Time, memory, parallelism | The current recommendation. Winner of the Password Hashing Competition. |
Argon2id is the default to reach for in any new system. PBKDF2 still appears in protocols and tools (WPA2, many disk encryption tools) because it predates the others and is built into countless standards.
The salt is not secret. It exists to make two users with the same password produce two different stored values. Without a unique salt per user, an attacker can compute a single dictionary against all users at once. The salt should be at least 16 random bytes per password and stored alongside the derived key.
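A short sketch of per-user salting, again using standard-library PBKDF2 as a stand-in for whichever KDF a system actually uses. The function names are hypothetical, and the demo calls pass a deliberately low iteration count so the example runs quickly.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 600_000  # production-style default for PBKDF2-HMAC-SHA256


def hash_password(password: str, iterations: int = ITERATIONS) -> tuple[bytes, bytes]:
    """Return (salt, derived) for storage; a fresh 16-byte salt is drawn every call."""
    salt = secrets.token_bytes(16)
    derived = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, derived


def verify_password(password: str, salt: bytes, expected: bytes,
                    iterations: int = ITERATIONS) -> bool:
    """Re-derive with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(candidate, expected)


DEMO_ITERATIONS = 10_000  # far too low for real use; keeps the demo fast
alice_salt, alice_hash = hash_password("hunter2", DEMO_ITERATIONS)
bob_salt, bob_hash = hash_password("hunter2", DEMO_ITERATIONS)
print("same password, different stored values:", alice_hash != bob_hash)
print("alice can still log in:", verify_password("hunter2", alice_salt, alice_hash, DEMO_ITERATIONS))
```

Because the two stored values differ, a precomputed dictionary built for one salt is useless against the other user.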
Storing Keys
Once a key exists, the question becomes where to put it. The wrong answers are obvious in retrospect and embarrassingly common in practice.
Source code. Configuration files committed to version control. Environment variables logged by error trackers. Database tables alongside the data they protect. Shared network drives. Slack messages. Email attachments. Screenshots in tickets.
The right answers all share a property: the key is held in a place that exposes only the operations you can perform with it, never the bytes themselves.
| Mechanism | What it is | When to use it |
|---|---|---|
| HSM | Hardware Security Module. A tamper-resistant box that generates and uses keys but refuses to export them. | Root keys, certificate authority signing keys, anything regulated (PCI, FIPS). |
| Cloud KMS | AWS KMS, Azure Key Vault, Google Cloud KMS. Managed HSM-backed services with an API. | Application-level encryption keys, envelope encryption schemes. |
| TPM | Trusted Platform Module. A small chip on the motherboard that holds keys tied to the machine. | Disk encryption keys (BitLocker, LUKS with TPM), platform attestation. |
| Secure enclave | Apple Secure Enclave, Android StrongBox, Intel SGX. A separate processor, or an isolated execution mode of the main CPU, with its own protected memory. | Mobile device keys, biometric template protection, Touch ID and Face ID. |
| Secrets manager | HashiCorp Vault, AWS Secrets Manager, 1Password Service Accounts. | Application secrets, database credentials, API tokens. Not a replacement for a true HSM. |
For a typical application: cloud KMS holds the root key, the secrets manager holds the day-to-day credentials, and individual disk volumes use TPM-sealed keys. No human reads any of those bytes directly.
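The "operations, not bytes" property can be pictured as an interface. The class below is purely illustrative: Python cannot enforce real isolation, and an HSM or KMS does this in hardware or behind a network API. The class name `OpaqueSigningKey` is hypothetical, and HMAC is used only because it is available in the standard library.

```python
import hashlib
import hmac
import secrets


class OpaqueSigningKey:
    """Exposes sign and verify, and deliberately offers no method that returns the key."""

    def __init__(self) -> None:
        # Name-mangled attribute: a convention in Python, not a security boundary.
        self.__key = secrets.token_bytes(32)

    def sign(self, message: bytes) -> bytes:
        """Return an HMAC-SHA256 tag over the message."""
        return hmac.new(self.__key, message, hashlib.sha256).digest()

    def verify(self, message: bytes, tag: bytes) -> bool:
        """Check a tag in constant time."""
        return hmac.compare_digest(self.sign(message), tag)


signer = OpaqueSigningKey()
tag = signer.sign(b"transfer 100 to account 42")
print("tag verifies:", signer.verify(b"transfer 100 to account 42", tag))
```

Callers can ask the object to do things with the key, and every such request is a natural point for logging and access control, but nothing in the public interface hands the bytes back.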
The Key Lifecycle
Every key goes through the same six phases from creation to destruction. NIST SP 800-57 formalizes them; the table below shows the practical version.
| Phase | What happens |
|---|---|
| Generate | A CSPRNG produces the key. It is registered in a key store and tagged with an ID, an algorithm, and a creation timestamp. |
| Distribute | The key, or a wrapped copy of it, is delivered to the systems that need it. Usually over an authenticated channel like TLS, or via a KMS that exposes the operation rather than the bytes. |
| Use | The key encrypts and decrypts data. Usage is logged. The key is held only in protected memory. |
| Rotate | A new key is generated and takes over for new operations. The old key stays available to decrypt anything it previously encrypted. |
| Revoke | The old key is marked as no longer trusted. It cannot be used for new operations. Existing data may still need to be decrypted and re-encrypted under the new key. |
| Destroy | The key bytes are securely erased from every store. Auditable. Irreversible. After this point, anything still encrypted under the key is unrecoverable. |
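One way to picture the lifecycle is as a small state machine that a key store enforces. The sketch below is an assumption-laden illustration: the state names follow the table, but the allowed transitions and the `may_encrypt` / `may_decrypt` rules are one plausible reading of it, not a rendering of NIST SP 800-57.

```python
from enum import Enum


class KeyState(Enum):
    GENERATED = "generated"
    DISTRIBUTED = "distributed"
    IN_USE = "in_use"
    ROTATED = "rotated"      # superseded by a newer key: decrypt only
    REVOKED = "revoked"      # untrusted: decrypt only in order to re-encrypt
    DESTROYED = "destroyed"  # bytes erased: nothing is possible


# Assumed forward-only transitions; a real key store may permit others.
TRANSITIONS = {
    KeyState.GENERATED: {KeyState.DISTRIBUTED, KeyState.DESTROYED},
    KeyState.DISTRIBUTED: {KeyState.IN_USE, KeyState.REVOKED},
    KeyState.IN_USE: {KeyState.ROTATED, KeyState.REVOKED},
    KeyState.ROTATED: {KeyState.REVOKED, KeyState.DESTROYED},
    KeyState.REVOKED: {KeyState.DESTROYED},
    KeyState.DESTROYED: set(),
}


def advance(current: KeyState, target: KeyState) -> KeyState:
    """Move to the target state, or raise if the lifecycle forbids it."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target


def may_encrypt(state: KeyState) -> bool:
    return state is KeyState.IN_USE


def may_decrypt(state: KeyState) -> bool:
    return state in {KeyState.IN_USE, KeyState.ROTATED, KeyState.REVOKED}


state = KeyState.GENERATED
for step in (KeyState.DISTRIBUTED, KeyState.IN_USE, KeyState.ROTATED):
    state = advance(state, step)
print(state.name, "encrypt:", may_encrypt(state), "decrypt:", may_decrypt(state))
```

Encoding the rules this way makes "a destroyed key can never come back" a property the code enforces rather than a policy people have to remember.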
Key Hierarchies: KEK and DEK
A system encrypting a million files does not use one key for all of them, and it does not put a million keys in the HSM. It uses a hierarchy.
The pattern is called envelope encryption. One master key, the Key Encryption Key (KEK), lives in the HSM or KMS and never leaves. It is used only to encrypt other keys. The keys it encrypts are called Data Encryption Keys (DEKs), and the DEKs are the ones that actually encrypt user data.
The payoff is enormous:
- Rotation is cheap. To rotate, you re-wrap each DEK under the new KEK. The actual user data is never re-encrypted.
- Scale is unbounded. The HSM only ever sees the small KEK and tiny DEK wrap or unwrap calls. The DEKs do the heavy lifting elsewhere.
- Blast radius is limited. A leaked DEK exposes only the data it protected, not the whole system.
- Compliance is auditable. The HSM logs every KEK operation. That log is the audit trail.
AWS S3 server-side encryption, Google Drive, Apple FileVault, and almost every modern database encryption feature use this pattern. When AWS says "encryption is on by default," what they mean is that a service-managed root key is wrapping per-object DEKs.
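The envelope pattern, including cheap rotation by re-wrapping, can be sketched in a few dozen lines. Two loud assumptions: `FakeKMS` is a hypothetical in-process stand-in for an HSM or cloud KMS, and `toy_encrypt` / `toy_decrypt` are a toy HMAC-based cipher used only so the example runs on the standard library. Real code would use AES-GCM or AES key wrap from a vetted library.

```python
import hashlib
import hmac
import secrets


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: HMAC-SHA256 in counter mode."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        block_input = b"enc" + nonce + counter.to_bytes(8, "big")
        out += hmac.new(key, block_input, hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """TOY stand-in for AES-GCM. Not for production use."""
    nonce = secrets.token_bytes(16)
    body = bytes(a ^ b for a, b in zip(plaintext, _keystream(key, nonce, len(plaintext))))
    tag = hmac.new(key, b"mac" + nonce + body, hashlib.sha256).digest()
    return nonce + body + tag


def toy_decrypt(key: bytes, blob: bytes) -> bytes:
    """Verify the tag, then strip the keystream."""
    nonce, body, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(key, b"mac" + nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    return bytes(a ^ b for a, b in zip(body, _keystream(key, nonce, len(body))))


class FakeKMS:
    """Hypothetical KMS: the KEK stays inside and only wrap/unwrap are exposed."""

    def __init__(self) -> None:
        self._kek = secrets.token_bytes(32)

    def wrap(self, dek: bytes) -> bytes:
        return toy_encrypt(self._kek, dek)

    def unwrap(self, wrapped_dek: bytes) -> bytes:
        return toy_decrypt(self._kek, wrapped_dek)

    def rotate_and_rewrap(self, wrapped_deks: list) -> list:
        """Switch to a new KEK and re-wrap the DEKs; user data is never touched."""
        new_kek = secrets.token_bytes(32)
        rewrapped = [toy_encrypt(new_kek, toy_decrypt(self._kek, w)) for w in wrapped_deks]
        self._kek = new_kek
        return rewrapped


def encrypt_record(kms: FakeKMS, plaintext: bytes) -> tuple:
    """Encrypt data under a fresh DEK and return (wrapped DEK, ciphertext)."""
    dek = secrets.token_bytes(32)
    return kms.wrap(dek), toy_encrypt(dek, plaintext)


def decrypt_record(kms: FakeKMS, wrapped_dek: bytes, ciphertext: bytes) -> bytes:
    return toy_decrypt(kms.unwrap(wrapped_dek), ciphertext)


kms = FakeKMS()
wrapped, ciphertext = encrypt_record(kms, b"quarterly payroll export")
(rewrapped,) = kms.rotate_and_rewrap([wrapped])
print(decrypt_record(kms, rewrapped, ciphertext))
```

Note that `ciphertext` is byte-for-byte identical before and after rotation; only the small wrapped DEK stored beside it changed.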
Rotation In Practice
Rotating a key means generating a new one and shifting future operations to it. The old key sticks around long enough to decrypt anything it previously encrypted, then is destroyed.
Why rotate at all?
- Limit blast radius. If a key leaks, only the data encrypted during that key's active window is exposed.
- Limit ciphertext volume per key. AES-GCM with random 96-bit nonces, for example, has a limit of about 2^32 messages per key; beyond that, the chance of a nonce collision exceeds what NIST considers acceptable.
- Compliance. PCI DSS requires keys to be changed at the end of their defined cryptoperiod, which many organizations set to one year, and immediately on suspected compromise.
- Personnel change. When someone with access to a key leaves the team, the key rotates.
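The AES-GCM limit in the list above comes from the birthday bound on random nonces. The sketch below computes the standard approximation n(n-1)/2 divided by 2^96; the function name is hypothetical and the formula is only meaningful while the result is small.

```python
import math


def nonce_collision_probability(messages: int, nonce_bits: int = 96) -> float:
    """Birthday-bound approximation for at least one repeated random nonce."""
    return messages * (messages - 1) / 2 / 2 ** nonce_bits


for exponent in (24, 32, 40):
    p = nonce_collision_probability(2 ** exponent)
    print(f"2^{exponent} messages: collision probability about 2^{math.log2(p):.1f}")
```

At 2^32 messages the probability is roughly 2^-33, just inside the 2^-32 ceiling NIST sets for GCM. Every additional factor of two in message count costs a factor of four in collision probability, so the margin disappears quickly.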
Modern KMS systems automate this. AWS KMS supports automatic rotation of KMS keys (formerly called customer master keys), yearly by default. The application code does not need to know the rotation happened; the KMS keeps every version of the key material and picks the appropriate one based on key metadata embedded in the ciphertext.
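The idea of keeping every version and selecting by an identifier carried with the output can be sketched as a small key ring. This is a hypothetical illustration, not how any particular KMS is implemented, and it uses HMAC tags rather than encryption so that it runs on the standard library alone; the version lookup works the same way for ciphertexts.

```python
import hashlib
import hmac
import secrets


class KeyRing:
    """Keeps all key versions; new tokens use the current one, old tokens still verify."""

    def __init__(self) -> None:
        self._versions: dict[int, bytes] = {}
        self._current = 0
        self.rotate()

    def rotate(self) -> int:
        """Generate a new key version and make it current."""
        self._current += 1
        self._versions[self._current] = secrets.token_bytes(32)
        return self._current

    def tag(self, message: bytes) -> bytes:
        """Return a 4-byte version header followed by an HMAC under the current key."""
        mac = hmac.new(self._versions[self._current], message, hashlib.sha256).digest()
        return self._current.to_bytes(4, "big") + mac

    def verify(self, message: bytes, token: bytes) -> bool:
        """Read the version header and check the MAC under that version's key."""
        key = self._versions.get(int.from_bytes(token[:4], "big"))
        if key is None:
            return False  # unknown or destroyed version
        expected = hmac.new(key, message, hashlib.sha256).digest()
        return hmac.compare_digest(expected, token[4:])

    def destroy(self, version: int) -> None:
        """Erase an old version; anything produced under it stops verifying."""
        if version == self._current:
            raise ValueError("cannot destroy the current key version")
        self._versions.pop(version, None)


ring = KeyRing()
old_token = ring.tag(b"invoice 1001")
ring.rotate()
new_token = ring.tag(b"invoice 1001")
print("old still verifies after rotation:", ring.verify(b"invoice 1001", old_token))
```

The caller never names a key version. That is what lets rotation happen without any application change.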
Where Keys Actually Live
The abstract lifecycle becomes concrete in a few well-known places. Each of these is worth being able to picture when you read the term.
| System | Where the key lives | How it gets there |
|---|---|---|
| TLS 1.3 session | Ephemeral. In RAM only, for the duration of the connection. | Derived from an ECDHE handshake; thrown away when the connection closes (forward secrecy). |
| BitLocker / LUKS | Sealed to the TPM on the local machine. | Released to the OS only when the boot measurements match the expected values. |
| iMessage / Signal | Per-device keys in device-protected storage (backed by the Secure Enclave on Apple hardware); in Signal, per-message keys are derived via the Double Ratchet. | Generated on device enrollment. The server never sees them. |
| AWS S3 SSE-KMS | Per-object DEK encrypted under a KEK in AWS KMS. | Generated at upload time; the wrapped DEK is stored next to the object. |
| SSH server host key | On disk in /etc/ssh/, owned by root, mode 0600. | Generated once at install time. Rotated only on compromise or migration. |
| Application secrets | HashiCorp Vault or AWS Secrets Manager. | Fetched at startup over an authenticated channel; kept in memory only. |
Notice what is missing from every row of that table: source code, environment files, and Git history. If you can name the file path on a developer's laptop where a production key lives, the key is in the wrong place.
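For keys that do live on disk, such as the SSH host key in the table, file permissions are the last line of defense. The sketch below checks that a file is readable by its owner only, in the spirit of mode 0600. It assumes a POSIX system, the helper name `is_owner_only` is hypothetical, and the demo uses a temporary file rather than a real key path.

```python
import os
import stat
import tempfile


def is_owner_only(path: str) -> bool:
    """True if the file grants no permissions at all to group or other."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0


# Demo on a throwaway file standing in for a private key.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    demo_path = handle.name
os.chmod(demo_path, 0o600)
print("0600 is owner-only:", is_owner_only(demo_path))
os.chmod(demo_path, 0o644)
print("0644 is owner-only:", is_owner_only(demo_path))
os.remove(demo_path)
```

A check like this fits naturally into a deployment or startup script that refuses to proceed when a key file is readable by anyone other than its owner.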