Definition
Integrity is the property that information has not been modified except by parties authorized to modify it, and that any unauthorized modification can be detected.
The definition has two halves that are often conflated. The first half is prevention: stop unauthorized writes from happening. The second half is detection: when prevention fails, notice that the data has changed. Both halves matter because prevention is never perfect, and a control that cannot detect its own failures can be worse than no control at all, because it invites confidence in data that nobody is actually checking.
Integrity also covers accidental change, not just malicious change. A bit flip in storage, a buggy script, a truncated network transfer, and a human typing the wrong account number all threaten integrity. The earliest techniques in this family, checksums and parity codes, were developed to detect transmission errors; cryptographic hashes, MACs, and signatures extend the same idea to intentional tampering.
Two Flavors: Data Integrity and System Integrity
It is useful to separate two related concepts that share the word integrity.
Data integrity applies to specific information: a file, a database row, a message. It answers the question "is this data the same as when it was authored?"
System integrity applies to the platform that handles the data: an operating system, an application, a configuration. It answers the question "is this system behaving as its designers intended?" A compromised operating system can produce correct-looking output from corrupted internal state, which is why system integrity is often a prerequisite for trusting data integrity.
The SolarWinds case study later on this page is fundamentally a system integrity failure that was used to deliver downstream data integrity failures. Many large breaches involve both flavors.
Mechanisms
The integrity toolkit divides into mathematical mechanisms (which detect change) and procedural mechanisms (which constrain who can change what).
Cryptographic hashing produces a fixed-size fingerprint of arbitrary input. Any change to the input produces an almost completely different output. If you publish a hash alongside a file, anyone can recompute the hash and confirm the file is unchanged. SHA-256 is the modern default. Hashes are covered in depth on the Hashing track.
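The publish-and-recompute check described above can be sketched in a few lines. This is a minimal illustration using Python's standard `hashlib`; the helper names and the sample messages are hypothetical, and the sketch assumes the published hash reaches the verifier through a channel the attacker cannot alter.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()


def matches_published(data: bytes, published_hex: str) -> bool:
    """Recompute the digest and compare it with the published value."""
    return sha256_hex(data) == published_hex.strip().lower()


# Hypothetical example data: the two messages differ by a single character.
original = b"transfer 100 to account 12345"
tampered = b"transfer 900 to account 12345"

published = sha256_hex(original)  # what the author would publish

print(matches_published(original, published))  # True
print(matches_published(tampered, published))  # False
```

Printing `sha256_hex(original)` and `sha256_hex(tampered)` side by side shows two unrelated-looking 64-character strings, which is the "almost completely different output" property in practice.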
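Message authentication codes (MACs) extend hashing with a shared secret key, so that only parties who know the key can produce a valid MAC for a given message. A hash alone shows only that the data matches a reference value, and anyone can compute one, so it proves integrity only if the reference hash arrived through a trusted channel. A MAC shows that the data is intact and that the tag was produced by someone holding the key. HMAC-SHA-256 is the workhorse construction.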
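A minimal sketch of HMAC-SHA-256 tagging and verification with Python's standard `hmac` module follows. The key and message are hypothetical placeholders; a real deployment would generate the key randomly and store it securely.

```python
import hashlib
import hmac


def make_tag(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA-256 tag over message with the shared key."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify_tag(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    expected = make_tag(key, message)
    return hmac.compare_digest(expected, tag)


# Hypothetical demo values, not suitable for production use.
shared_key = b"demo-key-do-not-use-in-production"
message = b"amount=100;to=12345"
tag = make_tag(shared_key, message)

print(verify_tag(shared_key, message, tag))                 # True
print(verify_tag(shared_key, b"amount=900;to=12345", tag))  # False
print(verify_tag(b"some-other-key", message, tag))          # False
```

The comparison uses `hmac.compare_digest` rather than `==` so that the time taken does not reveal how many leading bytes of a forged tag were correct.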
Digital signatures use asymmetric cryptography to produce a tag that anyone with the public key can verify, but only the holder of the private key could have produced. Signatures give you integrity, authenticity, and non-repudiation in one mechanism. They are how code signing, certificate authorities, and signed Git commits all work.
Version control and audit logs attack integrity from a different angle. Instead of trying to detect change, they record every change with a timestamp, an author, and a reason. The data is allowed to change; what cannot change is the history of who changed it. Git is the canonical example.
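One way to make a change history itself tamper-evident is to chain entries by hash, loosely similar to the way a Git commit ID covers its parent commit. The sketch below is an illustrative toy, not Git's actual object format; the field names, the `GENESIS` constant, and the sample entries are all assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, author: str, timestamp: str, change: str) -> str:
    """Hash an entry's fields together with the previous entry's hash."""
    payload = json.dumps(
        {"prev": prev_hash, "author": author, "timestamp": timestamp, "change": change},
        sort_keys=True,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_entry(log: list, author: str, timestamp: str, change: str) -> None:
    """Append a new entry linked to the current tail of the log."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({
        "prev": prev_hash,
        "author": author,
        "timestamp": timestamp,
        "change": change,
        "hash": entry_hash(prev_hash, author, timestamp, change),
    })


def verify_log(log: list) -> bool:
    """Check every link and every entry hash from the start of the log."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        recomputed = entry_hash(entry["prev"], entry["author"],
                                entry["timestamp"], entry["change"])
        if entry["hash"] != recomputed:
            return False
        prev_hash = entry["hash"]
    return True


log = []
append_entry(log, "alice", "2024-01-01T09:00:00Z", "set limit to 100")
append_entry(log, "bob", "2024-01-02T10:30:00Z", "set limit to 250")
print(verify_log(log))  # True

log[0]["change"] = "set limit to 900"  # rewrite history
print(verify_log(log))  # False
```

An attacker who can rewrite the entire log can also recompute every hash, so a chain like this is only useful if the latest hash is periodically recorded somewhere the attacker cannot reach.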
Separation of duties is the procedural counterpart. No single individual should be able to both make a change and approve it. Two-person rule for high-impact production changes, code review before merge, and four-eyes principle for financial transactions all fall in this category. The control works by forcing collusion: a single insider cannot compromise integrity alone.
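In software, the rule usually reduces to a check that the approver is not the author. The following is a minimal sketch under that assumption; `ChangeRequest`, `approve`, and `can_apply` are hypothetical names, not the API of any particular review tool.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeRequest:
    author: str
    description: str
    approvers: set = field(default_factory=set)


def approve(change: ChangeRequest, reviewer: str) -> None:
    """Record an approval, refusing self-approval by the author."""
    if reviewer == change.author:
        raise PermissionError("author cannot approve their own change")
    change.approvers.add(reviewer)


def can_apply(change: ChangeRequest, required_approvals: int = 1) -> bool:
    """A change is applicable once enough people other than the author approve."""
    independent = change.approvers - {change.author}
    return len(independent) >= required_approvals


change = ChangeRequest(author="alice", description="raise wire limit")
print(can_apply(change))  # False: no independent approval yet
approve(change, "bob")
print(can_apply(change))  # True
```

The check is only as strong as the identities behind it: if one person controls both the `alice` and `bob` accounts, the control is bypassed without any collusion.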
Write-once and immutable storage physically or logically prevents modification of records after they are written. WORM optical media, append-only logs, S3 Object Lock, and immutable backups all serve this purpose. The guarantee holds for as long as the storage layer itself enforces it, which is why immutable backups are among the most effective defenses against ransomware that encrypts backup copies.
Failure Modes
- Tampering in transit. A message is altered between sender and receiver. TLS and IPsec defend against this with integrity-protected envelopes; raw HTTP and unencrypted protocols do not.
- Tampering at rest. Stored data is modified after the fact. Defenses include block-level and filesystem-level integrity checking (dm-verity, fs-verity), database transaction logs, and periodic hash verification of critical files. Disk encryption alone is a confidentiality control and does not by itself detect modification.
- Unauthorized modification by authorized users. A user with write access, or an attacker using that user's credentials, changes data outside the permitted scope. The 2014 Sony Pictures breach included the deletion and corruption of internal records by attackers using stolen administrator credentials.
- Accidental corruption. A buggy migration script overwrites a column. A backup restore is applied to the wrong environment. These failures are integrity failures even when no attacker is involved, and they are usually more frequent than malicious ones.
- Repudiation. A user denies having performed an action they actually performed. Strong authentication plus signed audit logs make such denials hard to sustain; weak authentication and editable logs invite them.
- Supply-chain tampering. Code or hardware is modified before it reaches the customer. The SolarWinds case study later on this page is the textbook example.
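The "periodic hash verification of critical files" defense listed under tampering at rest can be sketched as a baseline manifest that is rebuilt and compared on a schedule. This is an illustrative outline only: the function names and the demo files are hypothetical, and the sketch assumes the baseline manifest is stored somewhere the attacker cannot modify.

```python
import hashlib
import tempfile
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): file_sha256(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_manifests(baseline: dict, current: dict) -> dict:
    """Report files that were modified, removed, or added since the baseline."""
    return {
        "modified": sorted(n for n in baseline if n in current and baseline[n] != current[n]),
        "missing": sorted(n for n in baseline if n not in current),
        "added": sorted(n for n in current if n not in baseline),
    }


# Demo on a throwaway directory with hypothetical config files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "app.conf").write_text("limit=100")
    (root / "users.conf").write_text("admin=alice")
    baseline = build_manifest(root)

    (root / "app.conf").write_text("limit=900")  # simulated tampering
    report = compare_manifests(baseline, build_manifest(root))
    print(report["modified"])  # ['app.conf']
```

A check like this detects change but cannot say whether the change was authorized, so in practice it is paired with a change-management record that explains expected differences.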
Case Study: SolarWinds Orion, 2020
The SolarWinds Orion compromise is the modern case study for integrity failure because the attackers did not steal data directly. They corrupted the software supply chain of a network monitoring product used by tens of thousands of organizations, and then waited.
The Orion platform was a network management tool installed inside customer environments with broad visibility and trusted credentials. Beginning around September 2019, attackers (later attributed to Russian foreign intelligence) gained access to SolarWinds' software build system. They inserted a malicious component, later named SUNBURST, into the source code path of Orion in a way that did not appear in version control but did appear in compiled releases. The malicious build was then signed with SolarWinds' legitimate code-signing certificate and distributed through the normal update channel.
Between March and June 2020, as many as 18,000 customers installed the compromised update. SUNBURST lay dormant for up to about two weeks after installation, then phoned home to a command-and-control server. For a small subset of high-value targets (including U.S. federal agencies, FireEye, and Microsoft), the attackers used SUNBURST as a foothold to deploy additional tooling and conduct extended espionage.
The attack succeeded because every standard integrity check passed:
- The download came from the official SolarWinds update server.
- The binary was signed with SolarWinds' valid certificate.
- The hash matched the hash published by SolarWinds.
- The vendor's own build pipeline produced the binary, so to any reasonable observer it was an authentic SolarWinds release.
The integrity failure was upstream of every defensive check that customers were doing. The vendor's build system was compromised, and downstream every signature and hash was, by any technical definition, valid. The defense is not better hash verification. The defense is reproducible builds, software bills of materials, build-system isolation, and behavioral detection on installed software that does not assume signed code is safe.
Integrity controls based on cryptographic verification only protect against tampering after a known-good reference point. If the reference point is itself compromised, every downstream check confirms the wrong answer with mathematical certainty. The chain of trust is only as strong as its origin.
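Reproducible builds address the compromised-reference problem by giving the customer more than one origin to compare: if the same source yields bit-identical artifacts on independent builders, a single tampered build system stands out. The sketch below illustrates only the comparison step; the builder names and artifact bytes are hypothetical, and it assumes the build is already deterministic, which is the hard part in practice.

```python
import hashlib
from collections import Counter


def artifact_digest(artifact: bytes) -> str:
    """SHA-256 digest of a build artifact."""
    return hashlib.sha256(artifact).hexdigest()


def builds_agree(artifacts_by_builder: dict) -> bool:
    """True only if at least two builders produced bit-identical artifacts."""
    digests = {artifact_digest(a) for a in artifacts_by_builder.values()}
    return len(artifacts_by_builder) >= 2 and len(digests) == 1


def disagreeing_builders(artifacts_by_builder: dict) -> list:
    """Name the builders whose digest differs from the most common one."""
    if not artifacts_by_builder:
        return []
    digests = {name: artifact_digest(a) for name, a in artifacts_by_builder.items()}
    majority_digest, _ = Counter(digests.values()).most_common(1)[0]
    return sorted(name for name, d in digests.items() if d != majority_digest)


# Hypothetical scenario: the vendor's pipeline injects extra bytes at build time.
clean = b"release-1.0-binary"
builds = {
    "vendor-ci": clean + b"+implant",
    "rebuilder-a": clean,
    "rebuilder-b": clean,
}

print(builds_agree(builds))          # False
print(disagreeing_builders(builds))  # ['vendor-ci']
```

A majority vote is itself a trust assumption: it helps only when the builders are independent enough that one compromise does not reach all of them.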
The Hard Question
Trust is transitive in a way that integrity assumptions usually are not. When you install software, you are trusting not just the vendor, but every dependency the vendor included, every developer with commit access to those dependencies, every build server that compiled them, and every distribution path between you and the binary. A modern software product can easily depend on a thousand upstream packages from hundreds of maintainers.
SolarWinds was a single compromised vendor; the open-source ecosystem has seen a steady stream of similar compromises, with event-stream in 2018 and the xz-utils backdoor in 2024 among the best known. Asking "did this file change?" is a question with a clear answer. Asking "should I trust this file in the first place?" is not. The next decade of integrity work is largely about that second question.