15. When AI Enters the Organization: From Tool to Capability Foundation

A team has been using AI coding for two years. Sit them down to inventory their own assets and a slightly unsettling thing surfaces. They have far more assets in hand than the traditional era ever produced. The fraction that is actually in use, still applicable, and understood well enough to use is much smaller.

If you read this only as "governance hasn't caught up," you are underestimating the problem. Pull the lens back. How have organizations accumulated cognition for the last few hundred years? Experience lives in people's heads, and walks out with them when they leave. That is what an organization, as a learner, has fundamentally looked like. AI is the first time this has had a chance of changing. This chapter is about that chance: when AI enters the organization, the cognition of the organization itself can move from an implicit asset attached to people into an explicit asset that can be engineered, versioned, and continuously iterated.

That is not free. To make it real, the organization first has to handle four things — assets, metrics, the middle layer, and accountability.

15.1 Hundreds of Skills, and No One Remembers Which Ones Still Work

Start with a concrete scene.

A team has been using AI coding for two years. Sit them down to inventory their own assets. There are a few things on the desk: hundreds of Skills, covering the logic that keeps recurring in this codebase, and a sizable knowledge base holding architecture docs, post-incident reviews, API how-tos, and migration guides.

By the numbers, that is real accumulation. Compared to the empty working directory of two years ago when they first opened Cursor, this is another world.

Now ask a few specific questions. Of those few hundred Skills, how many were called this month? How many were written for one project that ended last year, and that nobody has touched since? Inside them, how many call internal APIs that have already been deprecated, reference modules that have been renamed, encode a code style from two versions back? Of the post-mortems in the knowledge base, how many describe an architecture that has already been refactored away — how many still reflect what the team is actually worried about today?

Nobody on the team can give you a clean answer. Someone will say, "I think most of them are still being used," and the uncertainty inside that "I think" is larger than what was said.

This is very different from traditional assets. When traditional code assets go stale, you can feel it. An old function calls a deleted endpoint, and the build fails. An old script depends on an environment variable that no longer exists, and it does not run. A stale config file gets loaded, and the service refuses to start. Traditional assets advertise their state in a blunt way: it runs, or it does not. The middle band is narrow.

These new assets in the AI era have no such mechanism. A stale Skill does not throw. It still gets loaded, still gets called by the agent — only the code it produces references an API that should not be used anymore, follows a style the project abandoned, and silently misses several constraints that were added in the last six months. The output looks fine. It compiles. It runs. It even passes tests. But it no longer represents what the team currently believes.

That is the most distinctive failure mode of organizational assets in the AI era: silent rot. Unlike code, when one of these assets breaks, nothing fires. It just gradually drifts further from reality.
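One way to make silent rot audible is to turn the inventory questions above into a periodic check. The sketch below is illustrative only: the `Skill` record, its fields, the 30-day idle window, and the API names are all assumptions made for the example, not the format of any real tool.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Optional


@dataclass
class Skill:
    # Hypothetical metadata record; these fields are assumed for illustration.
    name: str
    last_called: Optional[date]  # None means no call was ever recorded
    referenced_apis: set = field(default_factory=set)


def audit(skills, deprecated_apis, today, max_idle_days=30):
    """Return {skill name: [reasons the Skill looks stale]}."""
    findings = {}
    cutoff = today - timedelta(days=max_idle_days)
    for skill in skills:
        reasons = []
        # Question 1: was it called recently?
        if skill.last_called is None or skill.last_called < cutoff:
            reasons.append(f"not called in the last {max_idle_days} days")
        # Question 2: does it still point at APIs the team has deprecated?
        stale_refs = sorted(skill.referenced_apis & deprecated_apis)
        if stale_refs:
            reasons.append("references deprecated APIs: " + ", ".join(stale_refs))
        if reasons:
            findings[skill.name] = reasons
    return findings


skills = [
    Skill("refactor-billing", date(2024, 1, 10), {"billing.v1.charge"}),
    Skill("api-style-guide", date(2025, 5, 28), {"http.client.get"}),
]
report = audit(skills, deprecated_apis={"billing.v1.charge"}, today=date(2025, 6, 1))
for name, reasons in report.items():
    print(name, "->", "; ".join(reasons))
```

A check like this only catches the rot that leaves a trace in metadata. A Skill that is called daily and references only live APIs can still encode a judgment the team no longer holds, which is why the argument that follows is about culture rather than tooling.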

One step further: is this something a particular team failed to govern, or does it happen structurally? If it is the latter, the old asset-management playbook is not just insufficient — it is wrong at the root.

15.2 Silent Rot Is Goodhart Playing Out at a Different Scale

Sit with the question and the conclusion is unavoidable: this is not the result of a lazy team. It is what the mechanism guarantees.

Earlier we walked through Goodhart at the judgment-layer scale: once a metric is used to manage some output, the output starts being shaped specifically to clear that metric, and the metric itself slowly decouples from what it was supposed to measure. At the code level, that shows up as test cases written specifically to pass CI, evaluation sets used to train a model that specifically passes them — the judgment layer is being judged, while the thing it judges quietly reshapes it.

That same mainline plays out at the asset layer, with the participants swapped from code to assets. Skills, prompts, specs, knowledge bases — every category of asset was built at a particular moment to serve a particular goal. This Skill exists to deposit the experience from that refactor. This Skill exists to keep the agent from walking down that specific dead end on this project. At the moment of creation, the asset and the goal it served were aligned.

Goals move. Projects end. APIs evolve. The team's read on what counts as good code shifts as the business shifts. The goal the asset was serving is moving. The asset is not moving with it. It stays at the moment it was created and slowly diverges from the goal it was meant to serve. The divergence happens quietly. Each day's drift is small. There is no day on which a clear "this Skill is now invalid" signal goes off. By the time the drift is large enough to notice, months, sometimes a quarter or two, have passed.

This is the same mechanism as the judgment-layer corrosion from earlier in the book, playing out at a different scale. There the scale was a single PR, a single test case, a single eval set; the time constant was weeks. Here the scale is the organization's entire pile of assets, and the time constant is quarters or years. Same mechanism. Different scope.

What do you do about it?

The intuitive answer is to build a process: regular Skill-library cleanups, version every asset, write an asset lifecycle policy. None of those moves is wrong. None of them touches the core. The genuinely counter-intuitive judgment is this: the heart of organizational governance in the AI era is not just standing new assets up; it is letting old assets retire gracefully.

That cuts against traditional organizational behavior. The traditional story is "accumulate, deposit, pass it on," and it treats any reduction of assets as a loss. The AI era flips it. Standing up new assets is not the hard part. Letting old assets leave is. A stale spec sits there because nobody dares to delete it: "what if someone needs it later?" An old Skill is never retired because "it still runs, doesn't it?" Every asset that should be gone has its own specific reason for staying. The reasons add up, and the asset layer of the organization gets murkier every year.

Letting assets retire gracefully does not need a lifecycle document. It needs the organization to admit something at the cultural level: the value of an asset is not in how long it has existed; it is in how well it fits the present. A spec that has lived for three years is not necessarily worth more than one written last week. Old is not stable; more often, old is stale.

These assets differ from traditional code assets in another way. They are not neutral. Beyond the task description, a Skill, a long prompt, or a spec carries the author's judgment about this project, about this category of problem, about how edge cases should be handled. Given the same requirement, a Skill written by Person A and a Skill written by Person B generate different code. The code the agent produces while obeying that asset is written with the original author's judgment baked in. That is not bad in itself. The direction governance has to go is to use shared templates, joint maintenance, and regular review to grind individual judgment into team judgment, that is, to de-personalize the assets.

That covers assets. The next layer is metrics. The two look like different problems. The mechanism behind them is the same — Goodhart playing out at different scales.

Goodhart at the metric layer is faster and more visible than at the asset layer. Every organization that starts using AI wants to measure how well it is being used. The usual metrics are easy to list: human-takeover rate, code generation volume, AI-usage rate, PR pass rate, average iteration time. Take those and use them to manage the team, and within days the behavior starts to deform.

Set "takeover rate must be low," and the team quickly notices that taking over the agent's output makes the number worse. Their next move is to take over less, even when the output is obviously wrong; letting a wrong output through costs them less than leaving a takeover record. Set "code generation volume must be high," and the team has the agent produce more: what could be one function gets written as three. The number looks good; the codebase puts on weight. Set "AI-usage rate must be high," and the team will force AI into places that did not need it: a three-line fix now requires opening Cursor, writing a prompt, waiting for the agent, reviewing the output. Usage rate goes up; actual delivery tempo goes down.

What these three deformations share is that none of them is anyone deliberately gaming the numbers. A metric has gravity. Once defined, the action being measured leans toward the metric. The lean does not need anyone to push it.

What organizational governance is acting on in the AI era — assets, metrics — is not a neutral object. It pushes back. It reshapes the organization that defined it.

15.3 Half the Case for Middle Management Has Collapsed

Middle management looks like a layer organizations naturally have. It is not natural. It was built, and it rests on physical premises. To see what happened to those premises in the AI era, you have to step back and look at how the layer originally got constructed.

Two thousand years ago the Roman legion was already solving this problem: how do you coordinate a thousand-plus people into a coherent action far from the battlefield. Their answer was nested hierarchy. Eight soldiers shared a tent and a mule, run by an officer. Ten of those squads formed a century of eighty men under a centurion. Six centuries made a cohort of four hundred and eighty. Ten cohorts made a legion of roughly five thousand. Eight to eighty, eighty to four hundred and eighty, four hundred and eighty to five thousand: every layer with one explicit commander, aggregating information up, distributing decisions down. Today we call this span of control. It exists because the number of people any one person can directly run is somewhere between three and eight; outside that range, they can no longer process the information coming up at them.

The next major step in organizational evolution after the legion was the Prussian General Staff. After the disaster at Jena in 1806, when Napoleon dismantled the Prussian army, reformers like Scharnhorst and Gneisenau, in rebuilding it, admitted something: you cannot count on the supreme commander being a genius every time; you need a system. They built the General Staff — a body of specifically trained officers responsible for planning, information processing, and cross-unit coordination. That is the first time the middle-management role was systematically constructed. The work of those people was not to fight on the front line; it was to do information routing in the middle layer.

The role then seeped from armies into railroads, from railroads into large corporations, was tuned by Taylor's scientific management, refined by McKinsey's matrix structures, and walked all the way into modern enterprises. The forms got more and more elaborate, but the underlying function did not move: the essence of middle management is information routing. Aggregating what is happening below, distributing intent from above, coordinating across, aligning tempo. Those four are the day job of middle management.

This rule has very specific physical premises. Once an organization grows past a few dozen people, the top can no longer directly track what is happening in every corner; the bottom can no longer directly know where the organization is heading; the sides cannot align by every individual talking to every other individual. A layer in the middle is required, aggregating up, distributing down, aligning sideways, so the organization as a whole can move. For two thousand years, that layer could only be staffed by humans — because nothing else could do it.

The AI era is the first time that has had a chance of changing.

This is not the cheap version of the claim, that some AI tool will replace middle managers. It is the larger background. For two thousand years, coordination work has had no non-human option. AI is the first non-human coordination mechanism to appear in that time. When a system can continuously maintain the global picture of what the organization is doing, route decisions from the top down to the bottom, and align states across the sides, then the layer that used to be doable only by humans has, for the first time, become partly absorbable.

The middle-management role carries two kinds of work. One is information routing: weekly status rollups, transmitting decisions, scheduling alignment meetings, cleaning up state. This work has clear inputs and outputs, can be covered by rules and templates, and is increasingly within reach of AI. This half of the case is collapsing.

The other is judgment backstop. Calls that have to be made on incomplete information. Trade-offs across multiple stakeholders. Things where the data all looks fine but your gut says no. Pushes that need physical presence and emotional labor to land. Those judgments are not knowledge problems; they are a mix of experience, instinct, and accountability. AI cannot do them on any short horizon. That half of the case still holds.

Put it together: middle management in the AI era is not vanishing. It is being thinned. The concrete shape of "thinned" is that a middle manager spends a larger fraction of their week on judgment and backstop. Want to know whether your organization's middle layer will get eaten by AI? Look at how a middle manager actually spends a week. How much is information shuttling? How much is judgment and decision? The higher the share of the first, the faster the role gets thinned.
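The week-level question can be made concrete with a rough tally. The sketch below assumes a manager logs hours against task labels; the labels, the two category sets, and the sample hours are all invented for illustration and follow the two kinds of work described above.

```python
# Hypothetical task labels, grouped by the two kinds of work named in the text.
ROUTING = {"status rollup", "relaying decisions", "alignment meetings", "state cleanup"}
JUDGMENT = {"calls on incomplete information", "stakeholder trade-offs", "in-person pushes"}


def routing_share(week_hours):
    """Fraction (0.0 to 1.0) of categorized hours spent on information routing."""
    routing = sum(h for task, h in week_hours.items() if task in ROUTING)
    judgment = sum(h for task, h in week_hours.items() if task in JUDGMENT)
    total = routing + judgment
    if total == 0:
        raise ValueError("no routing or judgment hours logged")
    return routing / total


# Invented sample week: 24 routing hours, 16 judgment hours.
week = {
    "status rollup": 6,
    "relaying decisions": 4,
    "alignment meetings": 10,
    "state cleanup": 4,
    "calls on incomplete information": 8,
    "stakeholder trade-offs": 6,
    "in-person pushes": 2,
}
share = routing_share(week)
print(f"routing share: {share:.0%}")  # prints "routing share: 60%"
```

The number itself is crude, and real work rarely splits this cleanly. Its use is as a prompt for the conversation, not as a metric to manage against, for the Goodhart reasons covered in 15.2.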

Earlier we said the direction of Conway's Law has flipped — that was a team-level claim, time constant in months. This one is at the organizational level, time constant in years. As AI compresses coordination cost, the structures the organization built in order to bear high coordination cost get re-examined layer by layer. Designs that have existed for two thousand years specifically to support coordination need, for the first time, to re-justify themselves.

This is not happening someday. It is happening. Some remote-first, engineering-heavy companies have already pushed flattening surprisingly far; the share of middle management is markedly lower than in traditional firms.

15.4 Accountability Governance Has Stopped Working

Middle management thinned. Assets carrying personal judgment. Metrics being eaten by themselves. Layered together, those changes hit a problem nobody in the organization can dodge: when something breaks, who is on the hook?

At the execution level, AI-written code still gets merged by a specific person. git blame still resolves to that person. The PR has a reviewer; someone pressed the merge button. The records are there. The principle that the committer is responsible for the code they commit has not become invalid, and is not going to. A piece of agent-generated code blowing up is, in terms of accountability, no different from a piece of code copied from Stack Overflow blowing up.

The form holds. Its effectiveness as a governance instrument is dropping.

In the traditional era, the committer had a roughly clear sense of every line they committed: I wrote this, this is why, this is what it ripples into. They could explain it. The AI era is different. A single piece of code is the joint output of a Skill, a spec, a specific model version, and several iterations of prompt. Before pressing merge, the committer can read the output, run the tests, do a review, but they cannot achieve the "I wrote it myself" level of grip on every line that they used to have. They carry the responsibility, but their command over the artifact is weaker than before.

The side effect of "find the committer when it breaks" starts to surface. In the short term, every incident finds a name; it looks tidy. In the long term, every role in the chain becomes over-defensive. Skill authors stop writing anything specific. Reviewers refuse to approve anything that looks remotely risky. Agent users attach long disclaimers to every output. Mergers ask for one more review before pressing the button. Each role is protecting itself; the organization as a whole stops moving. This is not a moral failing of any individual. It is the structurally rational response. When a person is held fully accountable for an artifact they cannot fully control, the only self-protective move available to them is to produce less.

Once you accept that, the question changes from "how do we assign responsibility more cleanly" to "how does the organization keep moving in a reality where accountability does not bite cleanly." Organizations that admit accountability has stopped working go a different route. When a piece of code breaks, the primary action is not to find the most-on-the-hook person. It is to look at which layer of backstop failed to catch this. CI did not block it: fix CI. Review did not see it: re-tier review. Monitoring did not warn: extend the observability layer. Every backstop layer has its own owner; what those owners carry is not "take the blame next time" but "make this layer catch it next time." The committer is still the named owner of the specific incident, but the way the incident is handled is not a performance ding or a mea culpa; the committer sits down with the backstop owners and post-mortems which layer can be hardened.
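The backstop-first post-mortem can be sketched as a small data transformation: one incident in, one hardening action per layer that missed it out. Everything named here is hypothetical, including the three layers, their owner names, the `Incident` fields, and the sample incident; a real organization would have its own layers and its own tracker.

```python
from dataclasses import dataclass

# Hypothetical backstop layers and owners, in the order a change passes through them.
BACKSTOPS = [("ci", "build-team"), ("review", "review-leads"), ("monitoring", "sre")]


@dataclass
class Incident:
    summary: str
    committer: str
    # Layers that, in hindsight, could have caught the problem but did not.
    missed_by: set


def hardening_actions(incident):
    """Turn one incident into per-layer follow-ups instead of a single blame entry."""
    actions = []
    for layer, owner in BACKSTOPS:
        if layer in incident.missed_by:
            actions.append({
                "layer": layer,
                "owner": owner,
                "goal": f"make {layer} catch this next time",
                "with": incident.committer,  # the committer joins the post-mortem
            })
    return actions


incident = Incident(
    summary="agent-generated retry loop hammered the payments API",
    committer="dana",
    missed_by={"review", "monitoring"},
)
actions = hardening_actions(incident)
for action in actions:
    print(action["owner"], "->", action["goal"])
```

The design choice worth noticing is what the output does not contain: there is no severity score attached to the committer. The committer appears only as a participant in each layer's follow-up, which mirrors the shift in emphasis the paragraph describes.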

When every incident gets metabolized as which backstop to harden, instead of which person to mark, the organization starts genuinely growing through incidents. Accountability still attaches to a specific person. The center of gravity of governance has moved from accountability to backstop. The first is settling a single incident. The second is maintaining a long-term mechanism. The second is harder than the first. It is also more important.

15.5 The Organization as a System That Can Learn

Assets rot. Metrics get eaten by Goodhart. Middle management is thinned. Accountability governance is failing. Lined up, those four point at the same thing: what's changing in the AI era is not the rules of organizational governance — it is the nature of the object being governed.

How did the traditional organization accumulate cognition? Through people. A senior employee who has held the same role for ten years carries, in their head, the kind of experience the organization actually runs on — this kind of bug usually shows up here, this kind of customer needs to be paced like that, this kind of architecture choice will hurt three years from now. The way the organization retained that experience was by retaining the person, with documentation, training, and apprenticeship as side channels. That mechanism has a few persistent weaknesses. Slow — turning a new hire into someone with that experience takes years. Lossy — even the most diligent mentor only transmits a fraction of what is in their own head. Fragile — one critical departure usually means ten years of experience walking out the door, and the organization restarts the accumulation.

The mechanism worked for centuries because there was no other option. The best the organization could do was retain people, standardize processes, document experience. The underlying physical fact — experience lives in heads and walks with the person — was unbreachable.

The AI era is the first time it has been breachable.

Every PR's review comments. Every post-incident write-up. Every agent failure and how it was corrected. Every spec revision and the reasoning behind it. In the past, those things happened, got discussed, and quietly faded out of view. In the AI era they can be deposited into Skills, into knowledge bases, called automatically the next time a similar action happens. Deposit existed before — write docs, write wikis, write retros. The old kind of deposit had a foundational problem: written things had to be actively found, read, and applied by people, and the loss between those steps was massive. AI-era deposits are pulled in by the agent at the right moment, automatically; the loss is close to zero.

That is a foundational shift. An organization's cognition can, for the first time, move from an implicit asset attached to people into an explicit asset that can be engineered, versioned, and continuously iterated. A codebase can be versioned, walked back, continuously refactored. An organization's experience can now be too.

The judgment sounds nice. It is not free.

For an organization to actually become a learning system, every action covered in the previous four sections has to land. Assets need owners and need retirement, otherwise they rot faster than they accumulate. Metrics have to dodge Goodhart, otherwise everything learned is learned to game the metric. Middle management has to shift from information routing to judgment backstop, otherwise the learning gets stuck in the middle layer. The backstop mechanism in gray zones has to exist, otherwise the organization stalls in accountability arguments.

An organization's core competitive advantage is not which model it is using, not who it has hired — it is how much individual experience has been converted into organizational asset. The first gets flattened by model iteration. The second walks out with people. Only the third compounds with time.

In the traditional era, scale was the most stable moat an organization had. Bigger meant more capital, more talent, more customers, more bargaining power. Combined, those advantages made it nearly impossible for a small organization to win against a large one over the long run. That regularity had a specific reason. Traditional organizational cognition and capability scale linearly with headcount. A ten-person organization accumulates one unit of experience; a thousand-person organization accumulates a hundred. Scale was the volume of cognitive accumulation.

The AI era can break the linearity. An organization that genuinely deposits experience into reusable assets can let cognitive accumulation come unstuck from headcount. A thirty-person team with a healthy metabolism, healthy assets, and a stable owner regime can plausibly accumulate more effective experience than a three-hundred-person team with a broken metabolism, sick assets, and a heavy blame culture. Under that comparison, scale is no longer decisive.

This is not a prediction that big companies get beaten by small ones tomorrow. That is too tidy. The structural read is: a class of "small team, large output" companies will appear, and their output does not come from headcount. It comes from the organization itself being a system that can learn. The scale dividend of traditional incumbents will be eaten by these organizations from below. How fast depends on how good the incumbents' own metabolism is.

15.6 Closing: From Tool to Capability Foundation

Team assets, in themselves, are not wrong. Every asset had a reason at the moment it was built. The problem is not with any individual asset's existence. The problem is whether the organization has the capacity to keep them moving together, to keep them metabolized together, to keep them aligned with the present together. Without that capacity, the asset list becomes an old account that grows year by year. With that capacity, the same list becomes a snapshot of what the organization currently knows.

The advantage in the AI era does not come from who picked up AI first. It comes from who first turned it into their own capability foundation. Tools, anyone can buy. A capability foundation, only you can build.