Working With AI: A Concrete Example of the Sorcerer's Apprentice Problem

AI is a powerful tool for investigation and testing, but a liability for architectural design

Using a real-world bug fix in the hyperscript parser, Carson Gross demonstrates that while AI (specifically Claude) excels at root-cause analysis and generating focused test cases, it often fails to propose clean, architecturally sound solutions. Without a knowledgeable human in the loop to reject "hacky" suggestions, AI can inadvertently introduce technical debt by prioritizing immediate fixes over long-term system coherence.

The Case Study: A Hyperscript Parser Regression

The Bug

In hyperscript version 0.9.91, a regression caused the command fetch {% url 'trade:get_symbol_data' %}?symbol=${symbol} as JSON to be parsed incorrectly. The as JSON modifier, which tells the fetch command how to handle the response, was instead consumed as a type-conversion expression applied to the URL string literal, so it never reached the fetch command.

Root Cause Analysis

Gross used Claude to investigate, and it quickly identified the cause: a refactor intended to share logic between the go and fetch commands via a new parseURLOrExpression() method had accidentally expanded the grammar. As a result, the parser treated the as keyword as the start of a general conversion expression rather than as a modifier belonging to the fetch command.
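The root cause described above can be illustrated with a toy recursive-descent sketch. This is not hyperscript's actual JavaScript parser: the function names, the whitespace tokenizer, and the assumption that the pre-refactor URL was a single token are all hypothetical, chosen only to show how routing the URL through a general expression grammar lets the expression swallow as JSON.

```python
# Toy model only: hypothetical names, not hyperscript's real parser code.

def parse_expression(tokens, pos):
    """Parse one primary token, then an optional `as <Type>` conversion."""
    node = {"type": "value", "value": tokens[pos]}
    pos += 1
    if pos + 1 < len(tokens) and tokens[pos] == "as":
        node = {"type": "conversion", "target": tokens[pos + 1], "operand": node}
        pos += 2
    return node, pos


def parse_fetch(tokens, use_general_expression):
    """Parse `fetch <url> [as <Type>]` from a list of tokens."""
    if len(tokens) < 2 or tokens[0] != "fetch":
        raise ValueError("expected: fetch <url> [as <Type>]")
    if use_general_expression:
        # After the refactor: the URL goes through the full expression grammar.
        url, pos = parse_expression(tokens, 1)
    else:
        # Before the refactor (assumed here): the URL is a single token.
        url, pos = {"type": "value", "value": tokens[1]}, 2
    response_type = None
    if pos + 1 < len(tokens) and tokens[pos] == "as":
        response_type = tokens[pos + 1]
    return {"command": "fetch", "url": url, "response_type": response_type}


tokens = "fetch /symbols as JSON".split()
before = parse_fetch(tokens, use_general_expression=False)
after = parse_fetch(tokens, use_general_expression=True)
```

In this sketch, before keeps JSON as the command's response type, while after ends up with a conversion node wrapped around the URL and no response type at all, which mirrors the shape of the reported regression.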

The Failure of AI-Driven Solutions

While the AI successfully diagnosed the problem, its proposed solutions lacked the necessary architectural foresight:

  1. The Hack: The first proposal suggested parsing "string-like" leaves first. This would have fixed the specific reported bug but would have failed in the general case, for example when a variable is the fetch target (fetch $url as JSON).
  2. Unnecessary Complexity: The second proposal suggested adding a noConversions flag to the parser. While functional, this introduced new state and complexity that was unnecessary given the existing infrastructure.
  3. The Overly Broad Fix: After Gross pointed the AI toward the existing "follows" mechanism (which allows commands to claim keywords so expressions ignore them), the AI implemented a fix in parseURLOrExpression(). However, because this method was shared by both fetch and go, the fix broke valid as conversion expressions within go commands.
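The third proposal's failure mode can be sketched in a few lines. The code below is a toy model with hypothetical names (a follows set standing in for hyperscript's "follows" mechanism, and a go command that simply takes one expression), not the real implementation. It shows what happens when the shared helper claims as on behalf of every caller.

```python
# Toy model only: hypothetical names, not hyperscript's real parser code.

def parse_expression(tokens, pos, follows):
    """Parse one token plus an optional `as <Type>`, unless `as` is claimed."""
    node = {"type": "value", "value": tokens[pos]}
    pos += 1
    if pos + 1 < len(tokens) and tokens[pos] == "as" and "as" not in follows:
        node = {"type": "conversion", "target": tokens[pos + 1], "operand": node}
        pos += 2
    return node, pos


def parse_url_or_expression(tokens, pos):
    # Overly broad: `as` is claimed here for every caller of the shared helper.
    return parse_expression(tokens, pos, follows={"as"})


def parse_fetch(tokens):
    url, pos = parse_url_or_expression(tokens, 1)
    response_type = None
    if pos + 1 < len(tokens) and tokens[pos] == "as":
        response_type = tokens[pos + 1]
    return {"command": "fetch", "url": url, "response_type": response_type}


def parse_go(tokens):
    target, pos = parse_url_or_expression(tokens, 1)
    # Anything left over is input the go command never asked for.
    return {"command": "go", "target": target, "unparsed": tokens[pos:]}


fetch_result = parse_fetch("fetch $url as JSON".split())
go_result = parse_go("go $dest as String".split())
```

Here fetch_result is correct, but go_result is left with as String unparsed: the conversion that go should have accepted is suppressed because the claim lives in the shared helper rather than in the command that owns the keyword.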

The Human Resolution

Gross implemented the final fix manually by moving the pushFollow("as") logic specifically into FetchCommand#parse(). This ensured the fetch command behaved correctly without affecting the go command's parsing logic.
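A minimal sketch of that scoping decision follows. It is a toy Python model, not hyperscript's JavaScript: the Parser class, push_follow, pop_follow, and the simplified go syntax are hypothetical stand-ins for the names mentioned above. The point it illustrates is that the command claims the keyword itself, while the shared helper stays neutral.

```python
# Toy model only: hypothetical names, not hyperscript's real parser code.

class Parser:
    def __init__(self, source):
        self.tokens = source.split()
        self.pos = 0
        self.follows = []  # keywords currently claimed by an enclosing command

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def next(self):
        token = self.peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return token

    def push_follow(self, keyword):
        self.follows.append(keyword)

    def pop_follow(self):
        self.follows.pop()

    def parse_expression(self):
        node = {"type": "value", "value": self.next()}
        # Only treat `as` as a conversion when no command has claimed it.
        if self.peek() == "as" and "as" not in self.follows:
            self.next()
            node = {"type": "conversion", "target": self.next(), "operand": node}
        return node

    def parse_url_or_expression(self):
        # Shared by fetch and go; deliberately does not touch the follows.
        return self.parse_expression()

    def parse_fetch(self):
        self.next()  # consume `fetch`
        self.push_follow("as")  # the claim is scoped to fetch only
        try:
            url = self.parse_url_or_expression()
        finally:
            self.pop_follow()
        response_type = None
        if self.peek() == "as":
            self.next()
            response_type = self.next()
        return {"command": "fetch", "url": url, "response_type": response_type}

    def parse_go(self):
        self.next()  # consume `go`
        return {"command": "go", "target": self.parse_url_or_expression()}


fetch_result = Parser("fetch $url as JSON").parse_fetch()
go_result = Parser("go $dest as String").parse_go()
```

With the claim made inside parse_fetch, fetch_result carries JSON as its response type and go_result still parses as String as a conversion, so both commands behave as intended in this sketch.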

The "Sorcerer's Apprentice" Problem

Gross defines the "Sorcerer's Apprentice" problem as a state where a developer becomes so reliant on AI that they are unable to understand or properly address issues within the systems they build.

  • Vibe Coding vs. Engineering: Gross contrasts "vibe coding," in which developers pride themselves on not understanding the underlying mechanics, with a disciplined approach in which the human acts as the "sorcerer" and demands solutions that fit the existing architecture.
  • Technical Debt: Gross asserts that technical debt grows exponentially. Blindly accepting AI suggestions often results in "hacky" corner cases and unnecessary state, which accelerates that growth.

AI and the Evolution of the Developer

Gross notes that AI provides significant benefits for experienced developers, particularly in offsetting age-related declines in memory and stamina:

  • Rapid Context Switching: AI helps developers re-understand complex projects quickly through prompting.
  • Extensive Testing: AI can "grind" through the creation of comprehensive test suites more efficiently than a human, providing a level of verification that might otherwise be skipped due to energy constraints.

Community Perspectives and Counterpoints

Discussion among the developer community highlights several key tensions regarding AI integration:

  • Lack of World Models: Some argue that LLMs lack a "world model," meaning they cannot form a mental map of a design to find the most elegant solution, leading them to jump to immediate, suboptimal fixes.
  • The Testing Paradox: One critic noted that if the AI were truly proficient at generating tests, it should have used those tests to catch its own flawed architectural proposals before presenting them to the human.
  • The Plasticity of Intellect: While Gross worries about the "dulling of intellects," some developers argue that AI functions like any other tool or language shift; the brain remains plastic, and skills are simply retrieved "when needed" rather than maintained constantly.

"I think the debt in that code is going to be a millstone around our necks for a long time to come." — @jdlshore
