Dangerously Skip Reading Code: Shifting Rigor in the Age of AI Agents

The traditional role of the software engineer has long been centered on the ability to read, understand, and maintain source code. In this paradigm, the code is the primary proxy for understanding the software system. However, as Large Language Models (LLMs) and autonomous agents begin to generate code at a velocity and volume that far exceed human reading capacity, this model is reaching a breaking point.

When an agent can pump out thousands of lines of code in minutes, the traditional pull request (PR) review process becomes a bottleneck. If we insist on reviewing every line of generated code, we negate the productivity gains promised by AI. This leads to a provocative question: What happens if we stop reading the code entirely?

The Code-as-Machine-Code Hypothesis

There is a growing argument that we should treat LLM-generated code not as a human-readable source, but as a form of "machine code"—similar to how developers treat assembly, bytecode, or transpiled JavaScript. In this view, the high-level language (Python, TypeScript, Rust) is no longer the primary artifact of human intent; it is merely the implementation detail produced by the AI.

For this to work, the industry would need to shift its definition of "rigor." Instead of verifying the how (the code), engineers would focus exclusively on the what (the specification) and the result (the tests).

Shifting the Unit of Knowledge

To implement this shift, the "unit of knowledge" for a project would move from the source code to a standardized specification—potentially written in Markdown. In this proposed workflow:

Collaborative Specification: Product owners and engineers collaborate on a detailed Markdown spec and a set of test cases that enforce business rules.
Automated Implementation: An AI agent implements the code based on the spec.
Verification: Automated checks verify that tests pass and that the code conforms to the spec.
Accountability: The team is held accountable for the specification and the test suite, not the resulting code.
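As a rough illustration of this workflow, the sketch below expresses business rules as executable cases and checks an implementation against them without inspecting its body. The discount rules, `SpecCase`, `verify`, and `generated_discount` are all hypothetical names invented for this example; a real project would more likely use its existing test framework.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SpecCase:
    rule: str                 # the business rule, worded as in the (hypothetical) spec
    order_total: float        # input to the implementation
    expected_discount: float  # output the spec requires


# Hypothetical business rules that the team owns and is accountable for.
SPEC_CASES = [
    SpecCase("Orders under 100 get no discount", 99.0, 0.0),
    SpecCase("Orders of 100 or more get 10% off", 100.0, 10.0),
    SpecCase("Orders of 100 or more get 10% off", 250.0, 25.0),
]


def verify(implementation: Callable[[float], float]) -> list[str]:
    """Return a description of every violated rule; an empty list means conformance."""
    failures = []
    for case in SPEC_CASES:
        actual = implementation(case.order_total)
        if abs(actual - case.expected_discount) > 1e-9:
            failures.append(
                f"{case.rule}: total={case.order_total}, "
                f"expected {case.expected_discount}, got {actual}"
            )
    return failures


# Stand-in for agent-generated code. In this workflow nobody reviews its body;
# only the result of verify() decides whether it is accepted.
def generated_discount(order_total: float) -> float:
    return order_total * 0.10 if order_total >= 100 else 0.0


print(verify(generated_discount))  # prints []
```

The gate is only as strong as the cases it contains: an implementation that passes `verify` can still be wrong on any input the spec cases never exercise.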

The Organizational Mandate

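This shift cannot be a decision made by a single developer or team; it must be an organizational mandate. Amdahl's Law explains why: the overall speedup of a process is capped by the portion of it that is not accelerated. If writing code is only a fraction of total delivery time, maximizing code generation speed without rearranging the organizational structures around it yields little tangible productivity gain.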
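To make the Amdahl's Law point concrete, here is a minimal sketch. The 20% share of delivery time spent writing code and the 50x agent speedup are illustrative assumptions, not measured figures.

```python
def amdahl_speedup(accelerated_fraction: float, factor: float) -> float:
    """Overall speedup when only `accelerated_fraction` of the work runs `factor` times faster."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if factor <= 0:
        raise ValueError("factor must be positive")
    untouched = 1.0 - accelerated_fraction
    return 1.0 / (untouched + accelerated_fraction / factor)


# Assumed split: coding is 20% of delivery time and agents make it 50x faster.
print(round(amdahl_speedup(0.2, 50), 2))            # prints 1.24

# Even infinitely fast code generation cannot beat 1 / 0.8 under that split.
print(round(amdahl_speedup(0.2, float("inf")), 2))  # prints 1.25
```

Under these assumed numbers, the remaining 80% of the process (coordination, review, approvals) dominates, which is why the surrounding organization has to change for the gain to show up.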

If the unit of work remains "add a new endpoint to the API," the friction of coordination and bureaucracy will still limit throughput. To truly leverage agents, organizations would need to:

Remove the human in the loop for trivial implementations.
Transition engineers into "pseudo-product designers" who own entire streams of work.
Accept that rework is "almost free," reducing the effort spent preventing every single incorrect line of code from being written.

Critical Counterpoints and Risks

While the prospect of extreme velocity is alluring, the community has raised significant concerns regarding the long-term viability of this approach.

The "Cognitive Debt" Trap

A primary concern is the accumulation of cognitive debt. If developers stop reading the code, they lose the ability to debug complex, systemic failures. As one commentator noted, if an LLM makes a fundamental architectural error—such as denormalizing a database for short-term ease—the system may pass all tests and conform to the spec, but become impossible to evolve later because no human understands the underlying structure.

The Specification Paradox

There is a recurring argument in software engineering that "a sufficiently precise specification is code." If a spec must be detailed enough to ensure an LLM doesn't hallucinate or drift, the effort required to write that spec may equal or exceed the effort of writing the code itself. In this case, the shift isn't a productivity gain, but simply a migration to a new, more verbose programming language.

The Loss of "Taste"

Some argue that code review is not just about finding bugs, but about applying "taste"—ensuring the code is maintainable, elegant, and follows architectural patterns. By treating code as a build artifact, we risk creating systems that are functionally correct but structurally incoherent, leading to a future where we have "code full of unknown bugs that is unfixable."

Practical Implementations

Despite the risks, some developers are already adopting a "Plan-First" workflow to mitigate these issues:

"Always get the agent to create a plan file (spec)... Get agents to iterate on the plan file until it's complete and thorough... Once the plan is final, have an agent implement it. Check the plan in with the impl commit. The plan is the unit of work really since it encodes intent."

Others suggest using the requirement-level keywords defined in RFC 2119 (MUST, SHOULD, MAY) within specifications to give the LLM clearer constraints, moving the specification closer to a formal contract.
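One way such keywords could be put to work is to extract them mechanically, so that every MUST can be mapped to at least one test. The sketch below is an assumption about how a team might do this, not a described practice from the quoted developers; `classify_requirements` and the sample spec text are hypothetical.

```python
import re
from collections import defaultdict

# Longer phrases come first so "MUST NOT" is not matched as plain "MUST".
# Matching is case-sensitive: only the uppercase forms count as requirement keywords.
KEYWORD_PATTERN = re.compile(r"\b(MUST NOT|SHOULD NOT|MUST|SHOULD|MAY)\b")


def classify_requirements(spec_text: str) -> dict[str, list[str]]:
    """Group spec lines by the first RFC 2119 keyword each one contains."""
    by_level: dict[str, list[str]] = defaultdict(list)
    for raw_line in spec_text.splitlines():
        line = raw_line.strip()
        match = KEYWORD_PATTERN.search(line)
        if match:
            by_level[match.group(1)].append(line)
    return dict(by_level)


# Hypothetical spec fragment, for illustration only.
SPEC = """
The API MUST reject requests without an auth token.
Responses SHOULD be returned within 200 ms.
The client MAY cache responses for up to 60 seconds.
The service MUST NOT log raw passwords.
This sentence is background and must be ignored.
"""

for level, lines in classify_requirements(SPEC).items():
    print(level, len(lines))
```

A list like this gives reviewers something concrete to audit: each MUST and MUST NOT line either has a corresponding test or is an acknowledged gap.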

Conclusion

Treating code as a disposable artifact is a high-risk, high-reward strategy. While it offers a path to unprecedented development speed, it threatens the very foundation of system understanding. The challenge for the modern engineer is determining where the line falls between "productive speed" and "irresponsible negligence," and whether the rigor of a specification can truly replace the rigor of a deep dive into the source code.
