Part I of the New Engine series. Part II: The Context Protocol → | Part III: The Bootstrap Spiral → | Part IV: The Living Kernel →
Ivan Zhao recently wrote about the steam engine's arrival at textile factories. When it first appeared, factory owners swapped out the waterwheel and kept everything else the same. Same floor plan, same layout, same workflow. Productivity gains were modest.
The real breakthrough came when they realized they could decouple from water entirely. Move the factory to the workers, the ports, the raw materials. Redesign everything around the new engine.
We're in the waterwheel phase of AI. A chatbot bolted onto Slack. A copilot bolted onto the IDE. A summary panel next to the document. A stronger engine connected to the old transmission shaft. What you get is smoother old motions, not new ones.
The throughput ceiling is still written in human bandwidth.
Compensatory Structures
Reading, summarizing, forwarding, alignment meetings, meeting minutes, layered approvals -- these steps feel necessary because they recur. But recurrence is not proof of necessity.
These are compensatory structures: patches formed under conditions of insufficient context supply, absent agents, and limited human bandwidth. They exist to make collaboration barely functional given the constraint that humans must read, comprehend, and judge every piece of information passing through the system.
Stable is not the same as inevitable. And inevitable is not the same as permanent.
This explains why most AI products feel subtly conservative. They compress change into an intelligent button placed next to the old workflow. The sidebar summary. The chat assistant on the workbench. Familiar. Comfortable. But familiarity means the same constraint is still in force: the human is still on the main path, reading, understanding, supplying context, judging step by step.
A stronger engine on the old transmission shaft produces smoother old motions, not new ones.
The Faster Horse-Carriage and the Red Flag
Everyone knows the horse-carriage metaphor. People think they want a faster horse, until the car appears.
The deeper irony comes after. When self-propelled vehicles appeared on British roads, Parliament passed the Locomotives Act of 1865 -- the "Red Flag Act." It required a person to walk ahead of each vehicle, carrying a red flag, to warn pedestrians. The Act was not repealed until 1896. Thirty-one years of forcing the new engine to move at human walking speed.
This impulse recurs every time a production mode is about to be rewritten. It looks like caution on the surface. Underneath, it's path dependence -- an attachment to the intermediate layer that existed to compensate for the old constraint.
Today's Red Flag Act is "human-in-the-loop." Not the kind where humans supervise from a leveraged position -- that's good engineering. The kind where every PR gets reviewed by a human, every deploy gets approved by a human, every output gets read by a human. The flag-bearer walking ahead of the car.
The Florence Tipping Point
Zhao uses another image: you could walk across Florence in forty minutes. The rhythm of life was set by how far a person could walk, how loud a voice could carry. Then steel frames, steam railways, elevators, subways. Cities exploded in scale and density. Tokyo. Chongqing. Dallas.
These aren't bigger Florences. They're a different way of living. Disorienting, anonymous, harder to navigate. That illegibility is the price of scale. But they also carry more opportunity, more combinations, more density of interaction than a human-scaled Renaissance city could ever support.
Knowledge work is about to undergo the same transformation.
Do the arithmetic. One person can run a thousand agents concurrently. An organization of a hundred people can run a hundred thousand. Scale to every entity on earth, and the agent population dwarfs humans by orders of magnitude. Agent-to-agent transactions will be 10x, 100x, eventually 1000x the volume of human-to-software interactions.
OpenClaw went from zero to 196,000 GitHub stars in under three months. One developer built an agent that modifies its own code, writes its own skills, runs 24/7. It didn't ask for a UI review. It didn't need a design system. 600+ contributors, 10,000 commits -- the agent economy shipping while we were still debating alignment frameworks.
When the Florence of knowledge work tips into Tokyo, three things break simultaneously:
Operating rhythm. Meetings, email, weekly cadence -- all assume synchronous human presence. When task chains run continuously across time zones, never sleeping, the weekly sync becomes as absurd as walking a flag ahead of a car.
Collaboration medium. Documents and communication are load-bearing walls when information depends on humans reading it. At critical mass, these become throughput gates. The medium shifts from document-centric to state-and-context-centric: the system maintains continuous context, humans intervene only at critical valves.
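One way to picture a state-and-context-centric medium is a shared store that agents write to continuously, with a human paged only when a record crosses a valve. A minimal sketch, assuming a hypothetical `ContextStore` with predicate-based valves (the names and the risk metric are illustrative, not a real protocol):

```python
from dataclasses import dataclass, field

@dataclass
class ContextStore:
    """Agents write structured state here; no human reads it by default."""
    state: dict = field(default_factory=dict)
    valves: list = field(default_factory=list)      # (predicate, reason) pairs
    pending_human: list = field(default_factory=list)

    def write(self, key, value):
        # Update state directly, then check whether any valve now trips.
        self.state[key] = value
        for predicate, reason in self.valves:
            if predicate(self.state):
                self.pending_human.append((key, reason))

# A single valve: page a human only for high-risk deploys.
store = ContextStore(
    valves=[(lambda s: s.get("deploy_risk", 0) > 0.9, "high-risk deploy")]
)
store.write("deploy_risk", 0.2)    # flows through, no human involved
store.write("deploy_risk", 0.95)   # crosses the valve, human paged
print(len(store.pending_human))    # → 1
```

The point of the sketch is the inversion: the default path has no reader, and human attention is an exception raised by the system, not a gate every message passes through.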
Readability. Florence is legible. Everyone understands the full picture. Megacities are not. The shift to "locally usable, globally maintained by the system" is psychologically jarring -- it feels like losing control. But that's the signal of structural succession.
Human Alignment as Path Dependence
Now the synthesis.
Human-machine alignment -- the effort to make AI output match human expectations in interface design, visual layouts, conversational tone, user experience -- is itself a compensatory structure. It exists because we assumed humans are the consumers.
But when 99.999% of software consumers are agents, optimizing for human comprehension is optimizing for the rounding error.
The "alignment problem" was never about the agent. It was about the interface -- the translation layer between the agent's native representation (structured data, API contracts, text protocols) and the human's cognitive modality (visual layouts, spatial navigation, emotional tone). Every GUI is a lossy compression of the agent's native world into something a human can parse.
Remove the human from the consumption side, and the translation layer disappears. The alignment problem doesn't get solved. It evaporates.
The Ralph Wiggum loop already demonstrated this. The agent reads the tests, fixes the code, re-runs. No human reviewed the PR. No human approved the diff. The test suite is simultaneously the spec, the validator, and the acceptance criteria. Geoffrey Huntley ran Claude in a loop for three months and it produced CURSED -- a compiled programming language with an LLVM backend, editor extensions, and a standard library. The agent was both the builder and, functionally, the first user.
Why Agent-for-Agent Is Structurally Easier
When an agent builds a tool for another agent:
Requirements are unambiguous. Structured input, structured output. No "make it feel snappy." No design reviews debating the shade of gray on a button. The contract is the interface, and both sides speak the same language.
Testing is deterministic. The builder knows the exact contract. Write the test, run the test, ship if green. No user research. No A/B tests. No support tickets about edge cases that "feel weird."
Feedback loops are instant. Build, test, validate, deploy -- in seconds, not sprints. The lag between identifying a need and having it met collapses from weeks to minutes.
No GUI tax. No responsive layouts. No accessibility audits. No color palettes. No onboarding flows. No localization. No documentation for humans. Text in, text out. Structured data is the agent's native tongue.
Compare this to building for humans. Every item on that list -- spec ambiguity, design iterations, user testing, accessibility, localization, onboarding, documentation -- is a consequence of the translation problem between human cognition and machine execution. They are real costs imposed by a real constraint. But the constraint is human bandwidth, not physics. Remove the constraint, remove the costs.
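What such an agent-to-agent tool can look like is sketchable in a few lines. The tool and its schema below are hypothetical: structured input, structured output, and a deterministic test that doubles as the acceptance gate.

```python
import json

# The contract is the interface: both sides speak structured data.
CONTRACT = {
    "name": "dedupe_records",
    "input": {"records": "list[dict]", "key": "str"},
    "output": {"records": "list[dict]"},
}

def dedupe_records(payload):
    """Keep the first record seen for each value of `key`."""
    seen, out = set(), []
    for rec in payload["records"]:
        k = rec[payload["key"]]
        if k not in seen:
            seen.add(k)
            out.append(rec)
    return {"records": out}

# Deterministic acceptance test: green means ship. No user research,
# no design review, no "feels weird" tickets.
request = {"records": [{"id": 1}, {"id": 1}, {"id": 2}], "key": "id"}
result = dedupe_records(request)
assert result == {"records": [{"id": 1}, {"id": 2}]}
print(json.dumps(result))
```

Note what is absent: no layout, no copy, no error toast wording. The entire surface area of the tool is the contract plus the test.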
The lawyer analogy is instructive. Young lawyers are trained as human search engines, human databases, human process engines. Not because those motions are the essence of legal expertise, but because when tooling and context infrastructure were insufficient, humans had to be molded into those shapes. Those motions are the finger pointing at the moon, not the moon itself. The real expertise -- abstracting complex facts into structured reasoning, decomposing goals into executable task chains -- survives the transition. The compensatory motions do not.
What to Build
The opportunity isn't "AI tools for developers." It's infrastructure for agents to build their own tools. Protocols, not products. APIs, not apps. Structured formats, not frontends.
The winning companies in the agent economy won't build the best dashboard. They'll build the substrate: registries, protocols, execution environments, context infrastructure. The things agents use to discover, compose, and orchestrate other agents' capabilities.
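A registry in this sense is closer to a typed lookup table than a marketplace UI. A toy sketch, with hypothetical capability names, of how an agent might discover tools by declared schema rather than by browsing:

```python
# Capability registry: agents discover and compose tools by structured
# description. All names here are illustrative.
REGISTRY = {}

def register(name, schema, fn):
    REGISTRY[name] = {"schema": schema, "fn": fn}

def discover(required_output):
    """Find every capability whose declared output matches the need."""
    return [name for name, cap in REGISTRY.items()
            if cap["schema"]["output"] == required_output]

register("summarize", {"input": "text", "output": "text"}, lambda t: t[:10])
register("embed", {"input": "text", "output": "vector"}, lambda t: [0.0])

print(discover("text"))   # → ['summarize']
```

The substrate play is owning `register` and `discover` -- the means of composition -- rather than any single capability listed in them.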
Anti-spec, not spec. The agent doesn't need a Figma mockup. It needs a goal and a test suite.
The 0.001%
Human-facing software won't disappear entirely. Art, music, physical interfaces -- the things that touch human senses and require human aesthetic judgment -- these remain. So do high-stakes decisions where human accountability is irreplaceable, at least for now.
But they become the edge case, not the default. The 0.001%.
The paradigm didn't shift. It flipped. And most of the industry is still building for the old one -- still walking a red flag ahead of a machine that no longer needs a road.
References
- Ivan Zhao, Steam, Steel, and Infinite Minds (Dec 2025)
- 人类带宽的尽头:从马车幻觉到佛罗伦萨倾倒 (The End of Human Bandwidth: From the Horse-Carriage Illusion to the Florence Tipping Point) -- on compensatory structures and the Florence tipping point
- OpenClaw - Wikipedia
- The Ralph Wiggum Loop -- on agents running autonomously in loops