Defaults are the only security control that scales

Richard Thaler and Cass Sunstein, in Nudge (2008), built an entire policy framework on one observation: defaults are sticky. At the firm studied by Madrian and Shea (2001), 401(k) participation was 37.4% under opt-in enrollment and 85.9% under opt-out. The underlying population did not change. The default did. Thaler’s broader body of work earned him the 2017 Nobel in economics and extended the loss-aversion research that Daniel Kahneman and Amos Tversky pioneered in the late 1970s. It shows that the same mechanism (inertia, friction, and a slight preference for not making a choice) quietly dominates behavior in domains ranging from organ donation to insurance plan selection to every piece of infrastructure your engineers touch each day.

xychart-beta
    title "401(k) participation by enrollment default"
    x-axis ["Opt-in default", "Opt-out default"]
    y-axis "Participation rate (%)" 0 --> 100
    bar [37.4, 85.9]

This is the most underappreciated fact in platform security. The paved road works because it is the default, not because it is better. Every friction you add to the paved road (one more opt-in flag, one more “enterprise” tier, one more manual approval) is a tax that routes traffic off it. The platform team’s real product is not the platform. It is the default.

Why “mandatory security” is the wrong framing

Most security programs try to produce behavior change through a combination of mandate and education. The mandate says “you must use X.” The education says “here is why X is important.” This is the opt-in 401(k) model: it assumes engineers, when presented with a choice, will rationally evaluate the tradeoff and comply with the policy.

This does not work at scale, and the reason is behavioral, not cultural. An engineer under deadline pressure does not run a policy-compliance analysis before every technical decision. They reach for whatever is closest, fastest, and easiest to reason about. If the paved road has three extra steps, they take the unpaved one. If the unpaved one has a tripwire that files a ticket against them two weeks later, they get to the ticket after the feature ships.

This is not a failure of engineering discipline. It is an entirely predictable consequence of how humans navigate environments under cognitive load. Thaler and Sunstein’s core insight in Nudge is that you cannot beat this by asking people to try harder. You have to change what happens when they do nothing.

Microsoft is the standing illustration of this at scale. The company has spent more than a decade telling enterprise customers to enable MFA: awareness campaigns, prescriptive guidance, the 99.9% account-compromise-blocked statistic reprinted in every product blog, Security Defaults shipped for new tenants in October 2019, Conditional Access prompts, and a long roadmap of partial nudges. The 2022 Cyber Signals report disclosed that only 22% of Azure AD enterprise customers used any MFA solution as of 2021. Two and a half years later, the 2024 Digital Defense Report put broader MFA adoption at 41%.

xychart-beta
    title "Microsoft enterprise MFA adoption"
    x-axis ["2021 Azure AD customers", "2024 reported adoption"]
    y-axis "MFA adoption (%)" 0 --> 100
    bar [22, 41]

The comparison is directional rather than a clean same-cohort measurement, but the shape is still the point: years of awareness and partial defaults left most accounts on the unsafe default. That is the structural ceiling on voluntary security at scale, even when the vendor running the awareness campaign owns the platform. Microsoft is now mandating MFA for admin and Azure resource-management operations, which is what flipping the default actually looks like once you accept that the awareness campaign was never going to finish the job.

Every checkbox on the paved road is a tax

The specific pattern I see most often in platform security is death by well-intentioned checkbox. The platform team ships the secure-by-default deployment pipeline. It is excellent. It handles secret rotation, identity propagation, network egress restrictions, SBOM generation, audit logging, all the things.

Then the audit team asks for a “data classification tag” field. Then the compliance team asks for a “business owner” field. Then the FinOps team asks for a “cost center” field. Then the platform team adds an “optional” performance tier selector. Each addition is individually reasonable. Each takes ten seconds to fill out. Each is a small tax.

At some threshold (and the threshold is lower than most platform teams think), engineers start routing around the pipeline. They spin up services on a team-owned Kubernetes cluster “just for this POC.” The POC ships to production. The platform’s coverage rate, which is the first metric that matters, quietly drops by tens of percentage points over a year or two. Nobody notices the change, because nobody is tracking coverage, only adoption of each new feature.

The inverse of the same mechanism recently played out in public on GitHub. In May 2022, only 16.5% of active GitHub users had 2FA enabled: the slice of users who ranked account security high enough to set up TOTP by hand. GitHub then flipped the default for code contributors: 2FA went from “available” to “required for continued code contribution.” By the end of 2023, GitHub reported a 95% opt-in rate across the contributors who received the requirement.

xychart-beta
    title "GitHub 2FA enrollment"
    x-axis ["Active users, May 2022", "Required contributors, 2023"]
    y-axis "2FA enrollment (%)" 0 --> 100
    bar [16.5, 95]

Those are not identical denominators: the 16.5% figure is a broad active-user baseline, while the 95% figure is the contributor cohort that received the requirement. The useful comparison is the mechanism, not a clean before-and-after population study. GitHub made not enrolling more expensive than enrolling by tying continued code contribution to it. The 16.5% on the left is what voluntary security adoption looked like at GitHub scale. The 95% on the right is what happened inside the cohort where the platform made the secure path the path of least resistance. Madrian and Shea found this in 401(k)s in 2001. GitHub completed the first large rollout for 2FA in 2023. Microsoft is finding it for MFA in 2024. You are about to rediscover it for whatever your next “engineering teams should…” policy proposes.

Coverage is the first metric that matters for a security-oriented platform. Adoption numbers are vanity unless they tell you how much of the real production surface is actually flowing through the control. If your coverage is 70%, your security story is the 30%, not the 70%. And coverage falls the moment the friction on the paved road exceeds the friction of going around it.
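
To make the distinction between coverage and adoption concrete, here is a minimal sketch in Python. It assumes you can build an inventory of workloads and flag which ones run in production and which ones deployed through the platform pipeline; the `Workload` type, its fields, and the sample inventory are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    in_production: bool
    via_paved_road: bool  # hypothetical flag: deployed through the platform pipeline


def coverage(workloads: list[Workload]) -> float:
    """Fraction of production workloads that flow through the paved road."""
    prod = [w for w in workloads if w.in_production]
    if not prod:
        return 0.0
    return sum(w.via_paved_road for w in prod) / len(prod)


def uncovered(workloads: list[Workload]) -> list[str]:
    """Names of production workloads that bypass the paved road."""
    return sorted(
        w.name for w in workloads if w.in_production and not w.via_paved_road
    )


# Illustrative inventory only; real data would come from a service catalog
# or cluster inventory, which this sketch does not model.
inventory = [
    Workload("checkout", in_production=True, via_paved_road=True),
    Workload("search", in_production=True, via_paved_road=True),
    Workload("billing", in_production=True, via_paved_road=True),
    Workload("poc-recs", in_production=True, via_paved_road=False),
    Workload("sandbox-demo", in_production=False, via_paved_road=False),
]

print(f"coverage: {coverage(inventory):.0%}")
print("security story:", uncovered(inventory))
```

The denominator is every production workload, not every workload the platform knows about. A side-cluster service that never registered with the platform still has to appear in the inventory, or the number flatters you.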

Insight

Coverage erosion is the silent failure mode

No single team’s decision to spin up a side-cluster is visible. The aggregate effect, a slow decline in the percentage of production workloads that actually flow through your security controls, is invisible until you measure it on purpose. This is why platform teams that only track adoption of their newest feature are structurally blind to their most important metric.

The paved road as a nudge architecture

The mental model that works is to treat the paved road as nudge architecture, not as enforcement architecture. The platform’s job is to make the secure path the path of least resistance, and the insecure paths either impossible or obviously more expensive to travel. Thaler and Sunstein call this “choice architecture”: the design of the environment in which decisions are made, as distinct from the content of the decisions themselves. In platform terms, it means:

The default path must require fewer decisions than the alternatives. Every decision is a fork where the engineer can choose the unpaved road. If the paved road is “run this one command,” the alternative needs to be at least “find the internal wiki page, set up your own IAM role, write your own Terraform module.” If the paved road is “fill out this 20-field form and wait for three approvals,” the alternative is “use the team’s AWS account,” and the alternative wins.

Security controls belong in the substrate, not in the UX. The engineer should not have to become a security decision-maker for every deploy. A correctly designed platform presents a product-shaped UX and handles secret rotation, network segmentation, egress policy, identity, and audit invisibly. The zero trust without automation essay makes the same point from a different angle: any security control that requires a human in the loop for routine operation is already losing to the humans who route around it.

New controls must be added without new UX. The single most destructive move a platform team can make is to add a new required field every time a new compliance requirement appears. The correct move is to derive the field from existing context (the service name, the repo, the team membership) or to make a sane default with an opt-out rather than an opt-in. The SBOM example is canonical: shipping an SBOM with every deploy is a default. Asking each team to “opt into SBOM generation” is a broken UX that half the teams will skip.
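
One way to picture “derive the field from existing context” is the sketch below. It assumes the platform already knows which team owns a service; the lookup tables, field names, and the conservative default classification are all hypothetical stand-ins for whatever service catalog and labels your organization actually has.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical lookup tables; in practice these would be read from an
# existing service catalog or team directory.
TEAM_COST_CENTER = {"payments": "CC-1001", "search": "CC-2002"}
TEAM_OWNER = {"payments": "payments-leads", "search": "search-leads"}


@dataclass(frozen=True)
class DeployRequest:
    # The only things the engineer supplies are things they already know.
    service: str
    repo: str
    team: str
    # Default-on control: skipping SBOM generation is the explicit act.
    sbom: bool = True
    # Optional override; nobody has to fill this in to deploy.
    data_classification: Optional[str] = None


def resolve_metadata(req: DeployRequest) -> dict:
    """Fill compliance fields from context instead of asking for them."""
    return {
        "cost_center": TEAM_COST_CENTER.get(req.team, "UNASSIGNED"),
        "business_owner": TEAM_OWNER.get(req.team, "UNASSIGNED"),
        # Assumed policy: fall back to the more restrictive label.
        "data_classification": req.data_classification or "confidential",
        "sbom": req.sbom,
    }


meta = resolve_metadata(
    DeployRequest(service="checkout", repo="org/checkout", team="payments")
)
print(meta)
```

The audit, compliance, and FinOps teams still get their fields. The engineer never sees a new form input, and an `UNASSIGNED` value becomes a data-quality task for the platform rather than a blocker for the deploy.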

Insight

The platform’s customer is the median engineer under deadline

Not the enthusiastic early adopter who loves your new feature. Not the staff engineer on the security team who will do whatever you ask. The median engineer under deadline pressure. Design the paved road for that persona and your coverage number goes up. Design it for anyone else and your coverage number goes down.

Why this is especially important for AI-era infrastructure

The point generalizes, and it generalizes with unusual force to the current wave of AI-adjacent infrastructure. Every team is spinning up LLM integrations, RAG pipelines, agentic tools. The attack surface is new (see prompt injection, webMCP attack surfaces, and threat modeling an AI agent with shell access). The controls are not.

Here is the version that tends to fail: security team publishes an “AI usage policy.” Engineering teams read it, mostly agree, and then ship anyway, because the policy is a document and the deadline is a deliverable.

Here is the version that tends to work: platform team ships a paved-road SDK for LLM calls. The SDK handles egress policy, prompt logging, PII redaction, tool-call auditing, and model selection. It is one import. It is faster to use than reaching for the vendor SDK directly, because it is pre-wired into the organization’s auth and logging infrastructure. Engineers use it because it is easier, not because they were told to. The policy becomes the default. The default becomes the behavior.
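
As a rough sketch of what “one import” could look like, the wrapper below applies redaction, logging, and model selection before anything leaves the process. Everything here is an assumption for illustration: the model names are invented, the email-only regex is a toy stand-in for real PII detection, and `send` is a placeholder for whatever vendor client the organization has already wired into its auth and egress setup.

```python
import logging
import re
from typing import Callable

logger = logging.getLogger("paved_llm")

# Toy PII pattern (emails only); a real redactor would cover far more.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

# Hypothetical model names; the platform, not the caller, owns this list.
DEFAULT_MODEL = "org-default-model"
ALLOWED_MODELS = {DEFAULT_MODEL, "org-large-model"}


def redact(text: str) -> str:
    """Replace email addresses before the prompt is logged or sent."""
    return EMAIL.sub("[REDACTED_EMAIL]", text)


def complete(
    prompt: str,
    *,
    user: str,
    send: Callable[[str, str], str],
    model: str = DEFAULT_MODEL,
) -> str:
    """Paved-road LLM call: the controls run whether or not the caller cares."""
    if model not in ALLOWED_MODELS:
        raise ValueError(f"model not approved: {model}")
    safe_prompt = redact(prompt)
    # Prompt logging happens in the substrate, after redaction.
    logger.info("llm_call user=%s model=%s prompt=%r", user, model, safe_prompt)
    return send(model, safe_prompt)


# Usage with a stand-in transport; a real deployment would pass the
# organization's pre-authenticated vendor client here.
def fake_send(model: str, prompt: str) -> str:
    return f"[{model}] echo: {prompt}"


print(complete("Email alice@example.com the Q3 report", user="bob", send=fake_send))
```

The caller makes no security decisions at all: no redaction flag, no logging flag, and a model default that works without being specified.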

The same shape is starting to play out, badly, with MCP. The failure mode is easy to see: every team that wants to expose internal data to an agent stands up its own MCP server, which means N different auth models, N different tool-call audit gaps, N different egress postures, and N different opportunities for a prompt injection to reach a privileged tool. That is the unpaved road, and in many organizations it is the first road teams discover. The paved-road answer is a default-on MCP gateway: a single ingress point that handles identity propagation from the calling user to the downstream tool, records every tool call against the existing audit infrastructure, enforces egress policy on tool inputs and outputs, and exposes a registration UX that is faster than rolling your own server. Done well, the team that wanted to expose a tool fills out one form and gets the gateway for free; the team that wants to roll their own has to justify why. That asymmetry is the entire control. Without it, the MCP layer becomes the next shadow-IT substrate, and the coverage number on agentic traffic never reaches the level the security program thinks it has.
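
The gateway idea can be sketched without any real MCP library. The code below is plain Python and does not use or imitate the API of an actual MCP SDK; the `Gateway` class, its method names, and the blocked-term egress check are hypothetical. It shows only the three behaviors described above: the caller’s identity reaches the tool, every call is audited, and inputs and outputs pass an egress check.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# A tool handler receives the calling user's identity plus its arguments.
Handler = Callable[[str, dict], Any]


@dataclass
class Gateway:
    tools: dict[str, Handler] = field(default_factory=dict)
    # Toy egress policy: block payloads containing these markers.
    blocked_terms: tuple[str, ...] = ("BEGIN PRIVATE KEY",)
    audit_log: list[dict] = field(default_factory=list)

    def register(self, name: str, handler: Handler) -> None:
        """Registration is one call; auth and audit come for free."""
        self.tools[name] = handler

    def _violates_egress(self, payload: Any) -> bool:
        return any(term in str(payload) for term in self.blocked_terms)

    def call(self, user: str, name: str, args: dict) -> Any:
        # Record first, so denied and failed calls are audited too.
        record = {"user": user, "tool": name, "args": args, "status": "denied"}
        self.audit_log.append(record)
        if name not in self.tools:
            raise KeyError(f"tool not registered: {name}")
        if self._violates_egress(args):
            record["status"] = "blocked_egress"
            raise PermissionError("egress policy blocked tool input")
        # Identity propagation: the tool sees the calling user, not the gateway.
        result = self.tools[name](user, args)
        if self._violates_egress(result):
            record["status"] = "blocked_egress"
            raise PermissionError("egress policy blocked tool output")
        record["status"] = "ok"
        return result


gateway = Gateway()
gateway.register("lookup_order", lambda user, args: f"{user} may see order {args['id']}")
print(gateway.call("alice", "lookup_order", {"id": 7}))
print(gateway.audit_log)
```

A team that registers a tool here gets one auth model, one audit trail, and one egress posture without writing any of it, which is the asymmetry that makes rolling your own the harder path.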

The hard part

The hard part, and the reason most platform teams do not ship paved roads that scale, is that “making the secure path easier than the alternative” requires the platform team to out-compete every engineer’s natural inclination to roll their own. That is a product design problem, and most platform teams are staffed as infrastructure teams. The skillset overlap is partial. The engineers who can out-design an engineer-under-deadline are a scarce hiring pool.

This is also why the productizing infrastructure frame matters so much. A platform that treats its users like customers, with explicit contracts and support rotations and actual UX investment, can produce defaults that hold. A platform that treats its users like colleagues-who-should-read-the-wiki cannot. The policy layer and the default layer have to be the same layer, and that only happens when platform engineering is organized as product engineering with infrastructure as the domain.

The short version

Security at scale is a property of defaults. Policy is weak; education is weak; mandate is weak; adoption curves of new features are vanity. Coverage of the paved road is the first real metric, and it is set almost entirely by whether the paved road is the path of least resistance for the median engineer under deadline.

The platform team’s actual product is the default. Every control that lives in a checkbox rather than in the substrate is a tax, and the tax drives traffic off the road. Ship controls in the substrate. Track coverage, not adoption. Design for the median engineer. The organizations that have internalized this have the best-looking security programs I have seen. The organizations that have not are producing heroic policy documents and coverage numbers they are not measuring.