Definition
Confidentiality is the property that information is disclosed only to parties authorized to see it.
Two parts of that definition matter. The first is disclosure, which means any path by which data reaches an unauthorized observer: a stolen file, a glance over a shoulder, an intercepted packet, a leaked screenshot. The second is authorization, which is a policy decision separate from any technical mechanism. A system can be perfectly secure against attackers and still violate confidentiality if it discloses data to a user who should not have access under the organization's rules.
Confidentiality is therefore a relationship between three things: an asset, a population of viewers, and a policy that distinguishes the authorized from the unauthorized. The mechanisms in the rest of this page are how we enforce that policy.
Classifications and Levels
Most organizations of any size assign data to classification levels. The levels are an organizing tool: they let policy authors talk about whole categories of information instead of every individual file, and they let security controls be applied uniformly within a level.
| Level | Typical examples | Disclosure impact |
|---|---|---|
| Public | Marketing collateral, published research, press releases | None. Intended for unrestricted release. |
| Internal | Org charts, internal procedures, non-sensitive emails | Embarrassment, competitive disadvantage. |
| Confidential | HR records, customer PII, business plans | Regulatory penalty, customer trust loss, financial harm. |
| Restricted / Secret | Trade secrets, source code, security keys, M&A plans | Existential. Disclosure could threaten the organization. |
U.S. government classification uses a parallel four-level system (Unclassified, Confidential, Secret, Top Secret) with additional compartments and caveats. Commercial classifications vary by industry. The labels matter less than the distinction: not all data deserves the same protection, and treating it all alike wastes money on the low end and underprotects the high end.
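Because the levels are ordered, they map naturally onto an ordered type, and "apply controls uniformly within a level" becomes a lookup. The sketch below is a minimal illustration, not a standard: the level names follow the table, but the control names in `BASELINE` are hypothetical placeholders, and the rule that each level inherits the controls of every level below it is an assumption.

```python
from enum import IntEnum


class Level(IntEnum):
    """Classification levels, ordered by disclosure impact (higher = worse)."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Hypothetical control baseline. Each level adds controls on top of the
# levels below it; the names are illustrative, not a recommended standard.
BASELINE: dict[Level, set[str]] = {
    Level.PUBLIC: set(),
    Level.INTERNAL: {"authenticated access"},
    Level.CONFIDENTIAL: {"encryption at rest", "access logging"},
    Level.RESTRICTED: {"hardware-backed keys", "named-individual approval"},
}


def required_controls(level: Level) -> set[str]:
    """Return the union of the baselines for `level` and every lower level."""
    controls: set[str] = set()
    for lvl in Level:
        if lvl <= level:
            controls |= BASELINE[lvl]
    return controls
```

With this shape, `required_controls(Level.CONFIDENTIAL)` returns the Internal and Confidential controls together, and policy authors change one entry in `BASELINE` rather than every file at that level.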
Mechanisms
The toolkit for enforcing confidentiality is the most mature of the three pillars. The core mechanisms divide into four families.
Encryption transforms data into ciphertext readable only with the correct key. Encryption is applied at rest (full-disk encryption, encrypted databases, encrypted backups), in transit (TLS, IPsec, SSH, WireGuard), and in use (confidential computing, homomorphic encryption, secure enclaves). The full treatment of encryption is in the Cryptography module.
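As a small illustration of encryption in transit, the sketch below uses Python's standard `ssl` module to build a TLS client context. It assumes every peer supports TLS 1.2 or newer; the function name `make_client_context` is ours. The important property for confidentiality is that the context verifies the server's certificate and hostname, since encryption to an unauthenticated endpoint can be encryption to the eavesdropper.

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    """Build a TLS client context that refuses unauthenticated or outdated peers."""
    # create_default_context() loads the system CA store and turns on
    # certificate verification (CERT_REQUIRED) and hostname checking.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
    # Assumption: all servers we talk to support TLS 1.2 or later.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx
```

The context is then used to wrap an ordinary socket, for example `ctx.wrap_socket(sock, server_hostname="example.com")`; passing `server_hostname` is what lets the hostname check run.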
Access controls decide who is allowed to read what. The two dominant models are discretionary access control (DAC), where the data owner sets the permissions, and role-based access control (RBAC), where permissions are attached to organizational roles. Mandatory access control (MAC) is a more rigid alternative: a central authority assigns labels such as classification levels, and owners cannot override them. Attribute-based access control (ABAC) is a more fine-grained one, deciding each request from attributes of the subject, the object, and the context. All access control systems share a core idea: the answer to "may this subject read this object?" is decided by a policy, not by the subject.
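A minimal RBAC check makes the "policy, not the subject" idea concrete. In this sketch the role names and resource types in `ROLE_PERMISSIONS` are hypothetical, and the policy is a static in-memory table rather than a real directory or policy engine. Anything not explicitly granted is denied.

```python
from dataclasses import dataclass, field

# Hypothetical policy: role -> set of (action, resource type) permissions.
ROLE_PERMISSIONS: dict[str, set[tuple[str, str]]] = {
    "support_agent": {("read", "ticket"), ("read", "customer_contact")},
    "hr_partner": {("read", "hr_record")},
    "auditor": {("read", "ticket"), ("read", "hr_record")},
}


@dataclass
class Subject:
    name: str
    roles: set[str] = field(default_factory=set)


def may(subject: Subject, action: str, resource_type: str) -> bool:
    """Allow only if some role assigned to the subject grants the permission.

    Default deny: unknown roles and ungranted permissions both return False.
    The subject influences the answer only through the roles policy gave it.
    """
    return any(
        (action, resource_type) in ROLE_PERMISSIONS.get(role, set())
        for role in subject.roles
    )
```

For example, a `Subject("alice", {"support_agent"})` may read tickets but not HR records, and granting her access means changing her roles, not editing the check.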
Data masking and minimization reduce confidentiality risk by removing or obscuring sensitive data when it is not needed. A test environment that uses synthetic data instead of production records cannot leak production records. A customer service screen that shows only the last four digits of a credit card cannot disclose the rest.
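Both techniques are simple enough to sketch. The functions below are illustrative (the names are ours): `mask_card_number` keeps only the last four digits, and `minimize` drops every field a consumer has not been approved to receive. The assumption is that these run on the server, before data reaches the screen or the test environment; masking in the browser still ships the full number.

```python
def mask_card_number(pan: str, visible: int = 4, mask_char: str = "*") -> str:
    """Return the card number's digits with all but the last `visible` masked."""
    digits = [c for c in pan if c.isdigit()]
    if len(digits) <= visible:
        raise ValueError("card number too short to mask")
    return mask_char * (len(digits) - visible) + "".join(digits[-visible:])


def minimize(record: dict, allowed_fields: set[str]) -> dict:
    """Return a copy of `record` containing only explicitly allowed fields."""
    return {key: value for key, value in record.items() if key in allowed_fields}
```

`mask_card_number("4111 1111 1111 1234")` yields twelve asterisks followed by `1234`. Using an allow-list in `minimize`, rather than a deny-list of sensitive fields, means a newly added column stays hidden until someone decides it is needed.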
Physical controls protect confidentiality against direct observation and theft. Locked cabinets, badged-only zones, privacy screens, clean-desk policies, and shredders fall into this category. Physical confidentiality is often forgotten in network-focused training, but a tailgater with a phone camera bypasses every cryptographic control on the laptop they walk past.
Failure Modes
Confidentiality fails when an unauthorized party obtains access to data. The path matters less than the outcome, but in incident response the path determines the lessons learned.
- Eavesdropping. Unencrypted traffic captured on a network. A 2014 study of unsecured WiFi networks at airports found that more than a third of observed sessions transmitted credentials in cleartext.
- Misconfiguration. The single most common cloud-era cause: publicly readable S3 buckets, public databases, exposed dashboards, default credentials. The 2019 Capital One breach exploited a server-side request forgery (SSRF) vulnerability in a misconfigured web application firewall to exfiltrate data on more than 100 million people.
- Credential compromise. Stolen, phished, or reused passwords used to log in as a legitimate user. The system did exactly what it was told to do; the policy was correct; the authentication was the weak link.
- Insider misuse. Authorized access used for unauthorized purposes. A DBA who reads sensitive records out of curiosity, or copies them on the way out the door.
- Physical loss. Lost laptops, stolen backup tapes, photographed screens. The 2006 Veterans Affairs breach disclosed records on 26.5 million veterans because an analyst took home an unencrypted laptop that was then stolen.
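One defense against the SSRF pattern is to refuse server-side requests to non-public addresses; SSRF attacks in cloud environments commonly target the instance metadata address `169.254.169.254` to obtain credentials. The sketch below is a minimal illustration using the standard `ipaddress` module. It assumes the caller has already resolved the hostname and will connect to exactly the IP it checked; checking one resolution and connecting via another opens a DNS-rebinding gap.

```python
import ipaddress


def is_forbidden_destination(ip: str) -> bool:
    """Return True if a server-side request to `ip` should be refused.

    Blocks loopback, private (RFC 1918), link-local (which includes the
    169.254.169.254 metadata address), reserved, multicast, and unspecified
    addresses. Accepts IPv4 or IPv6 literals.
    """
    addr = ipaddress.ip_address(ip)
    return (
        addr.is_private
        or addr.is_loopback
        or addr.is_link_local
        or addr.is_reserved
        or addr.is_multicast
        or addr.is_unspecified
    )
```

An allow-list of known partner hosts is stronger still where the set of legitimate destinations is small; a block-list like this one is the fallback when it is not.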
Case Study: Equifax 2017
The Equifax breach of 2017 is the standard case study for confidentiality failure at scale, because nearly every defensive control that should have stopped it was either missing or broken.
The vulnerability was CVE-2017-5638, a remote code execution flaw in Apache Struts disclosed in March 2017. A patch was released the same day. Equifax did not apply it. In May, attackers exploited the unpatched Struts instance in a consumer-facing web portal, gained code execution on the server, and discovered they could pivot to other internal systems.
What turned a server compromise into a breach of 147 million consumer records was a chain of confidentiality failures inside the perimeter:
- Network segmentation between the public portal and internal data stores was insufficient. Once the attackers were on one server, they could reach databases on many others.
- Credentials for those databases were stored in plaintext on the compromised server. The attackers used legitimate logins to query for data.
- The device meant to inspect encrypted outbound traffic was not working, because the TLS certificate it needed to decrypt that traffic had expired about ten months earlier and had not been renewed. The monitoring tool that would have detected mass exfiltration was therefore blind.
- The attackers exfiltrated data over 76 days before detection. Outbound traffic anomaly detection was either absent or not tuned to catch the pattern.
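The last two failures are detection failures, and both have simple mechanical checks. The sketch below shows two, under stated assumptions: `cert_days_remaining` reads a certificate's `notAfter` string in the format Python's `ssl` module reports, so an expiry alert can fire before monitoring goes blind; `egress_is_anomalous` flags an outbound byte count that exceeds a host's historical mean by `k` standard deviations. The function names, the threshold `k = 3.0`, and the idea of per-host daily totals are all illustrative; production egress monitoring is considerably more sophisticated.

```python
import ssl
import statistics


def cert_days_remaining(not_after: str, now: float) -> float:
    """Days until a certificate's notAfter time; negative once it has expired.

    `not_after` uses the format seen in ssl.SSLSocket.getpeercert(),
    e.g. "Jan 05 09:34:43 2018 GMT". `now` is a Unix timestamp.
    """
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def egress_is_anomalous(history: list[float], today: float, k: float = 3.0) -> bool:
    """Flag today's outbound volume if it exceeds mean + k * stdev of history."""
    mean = statistics.mean(history)
    spread = statistics.pstdev(history)
    return today > mean + k * spread
```

A scheduled job might alert when `cert_days_remaining` drops below 30, and treat any negative value as an outage of the monitoring it feeds. A flat threshold like the egress check is easy to evade by exfiltrating slowly, which is one reason 76 days of low-and-slow transfer can go unnoticed.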
The disclosed records included Social Security numbers, birth dates, addresses, and in some cases driver's license numbers and credit card numbers. The cost to Equifax in fines, settlements, and remediation exceeded 1.4 billion dollars. The lesson is not that one control failed. The lesson is that confidentiality at scale requires layered controls, and when every layer is weak, an attacker who gets past the first meets little resistance from the rest.
Patching, segmentation, secrets management, encryption at rest, and exfiltration detection are not redundant. Each one blocks or exposes a different stage of an attack: initial entry, lateral movement, access to credentials, readable data, and data leaving the network. The breaches that make headlines are usually the ones where most of those controls were absent at the same time.
The Hard Question
Confidentiality has a counterintuitive trap. The most confidential data is the data you do not have. Every record you collect, every backup you keep, every log you retain becomes an asset to defend and a liability if disclosed.
The GDPR encoded this insight in its data minimization and storage limitation principles: collect only what you need, keep it only as long as you need it, and delete it when you no longer do. Organizations that resist the principle on the grounds that "we might need this data later" are accumulating obligations they have not budgeted for. The 2017 Equifax breach disclosed records on people who had never been Equifax customers and had no way to opt out of being in the database. They were collateral damage to a business model that assumed data was always an asset.
When you propose a security control for confidentiality, also ask whether the data exists at all. The cheapest way to protect a record is to not create it.