Claude Code Extended Thinking: Hidden Reasoning and the Summary Illusion

Claude Code's "Extended Thinking" is a Summary, Not a Trace

Claude Code does not show users the reasoning that drives its agentic behavior; it shows a summary of that reasoning. Although the application presents "Extended Thinking" as a window into the model's logic, the raw reasoning tokens are encrypted into a signature block, approximately 600 characters long, which is stored locally but can be decrypted only by Anthropic.

This distinction matters to developers and enterprises because summarization is lossy. The output shown via ctrl+o is a summary, not the raw chain-of-thought (CoT) that the model used to arrive at its actions during a session.

Technical Implementation of Hidden Reasoning

Anthropic employs several mechanisms to ensure the raw reasoning remains inaccessible to the end user:

  • Encrypted Signatures: Reasoning is encrypted into signatures held on disk. The decryption keys are held exclusively by Anthropic, meaning the local machine never receives the key required to view the raw text.
  • API-Level Summarization: The API returns a summary of the reasoning rather than the reasoning itself.
  • Enterprise Gating: Access to the full, unsummarized thinking output is restricted to users with specific enterprise agreements.
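
To make the split between readable summary and opaque signature concrete, here is a minimal Python sketch. It assumes the response content arrives as a list of blocks in which a "thinking" block carries a human-readable "thinking" field and an encrypted "signature" field. The sample data, the 600-character placeholder, and the helper name split_thinking are all invented for illustration; nothing here decrypts anything or calls a real service.

```python
from typing import Any

# Hypothetical sample of response content blocks. The values are made up;
# the signature is a 600-character placeholder, not real ciphertext.
SAMPLE_CONTENT: list[dict[str, Any]] = [
    {
        "type": "thinking",
        "thinking": "Summary: read CLAUDE.md, then refactor the module.",
        "signature": "A" * 600,
    },
    {"type": "text", "text": "Refactor complete."},
]


def split_thinking(blocks: list[dict[str, Any]]) -> tuple[list[str], int]:
    """Return (readable summaries, total opaque signature characters).

    Only the summary text is readable by the client. The signature is
    treated as an opaque string whose length is the only thing we report.
    """
    summaries: list[str] = []
    opaque_chars = 0
    for block in blocks:
        if block.get("type") == "thinking":
            summaries.append(block.get("thinking", ""))
            opaque_chars += len(block.get("signature", ""))
    return summaries, opaque_chars


summaries, opaque_chars = split_thinking(SAMPLE_CONTENT)
print("Readable summaries:", summaries)
print("Opaque signature characters:", opaque_chars)
```

The point of the sketch is that a client can count and store the signature but can only read the summary text, which is the lossy part described above.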

Implications for Auditability and Security

The lack of access to raw reasoning creates significant hurdles for debugging and security auditing. When a model makes a complex error (for example, creating an unnecessarily complex module after misinterpreting a project's CLAUDE.md file), the user cannot inspect the raw thinking to identify exactly where the logic failed. Instead, the model may provide "hallucinated reasons" when asked to explain its actions retrospectively.

From a security perspective, hidden reasoning introduces a potential attack vector. If a model's reasoning chain is hidden from the user, an attacker could use prompt injection to force the model to pursue a secret objective while the summarized output conceals the malicious activity. This is particularly risky when interleaved reasoning and function calling are used, as a model could exfiltrate data during the hidden reasoning phase without the user's knowledge.

Industry Context: The "Anti-Distillation" Moat

This behavior is not unique to Anthropic; similar patterns have been observed in models from OpenAI and Google. Industry analysts and developers suggest several reasons for this opacity:

  • Preventing Model Distillation: Raw chain-of-thought data is highly valuable for training smaller, more efficient models. By hiding the raw reasoning, AI labs prevent competitors from using their frontier models' logic to distill knowledge into their own models.
  • Protecting R&D: The specific way a model processes information is considered a trade secret. Revealing the raw thinking process would expose the internal mechanics of the model's intelligence to competitors.
  • Sane-washing: Some argue that raw reasoning can be nonsensical, repetitive, or "doomlooping" (burning tokens without progress). Summarization makes the model appear more purposeful and directed than it actually is.

Alternatives and Workarounds

For developers who require full transparency in their agent's reasoning, several alternatives have been discussed:

  • Open Source Models: Models like DeepSeek R1 or Qwen provide more transparent reasoning traces, although these can sometimes be illegible or nonsensical to human readers.
  • Manual Prompting Strategies: Some users mitigate the lack of transparency by forcing the model to create explicit artifacts (such as specification documents, implementation guides, and checklists) before executing code, effectively creating a manual audit trail of the thinking process.
  • Local Execution: Using tools like OpenCode with locally hosted models allows for full visibility into the reasoning process, bypassing the cloud-based encryption and summarization layers.
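
The manual prompting strategy in the list above can be enforced mechanically. The following Python sketch checks that the planning artifacts exist and are non-empty before any code-execution step is allowed to proceed. The file names (SPEC.md, IMPLEMENTATION_GUIDE.md, CHECKLIST.md) and both function names are hypothetical, chosen only for illustration; they are not conventions of Claude Code or any other tool.

```python
from pathlib import Path

# Hypothetical artifact names; adjust to whatever your workflow requires.
REQUIRED_ARTIFACTS = ("SPEC.md", "IMPLEMENTATION_GUIDE.md", "CHECKLIST.md")


def missing_artifacts(project_dir: str, required=REQUIRED_ARTIFACTS) -> list[str]:
    """List required artifacts that are absent or empty in project_dir."""
    root = Path(project_dir)
    missing = []
    for name in required:
        path = root / name
        # An empty file is treated the same as a missing one.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


def ready_to_execute(project_dir: str) -> bool:
    """True only when every planning artifact has been written."""
    return not missing_artifacts(project_dir)


if __name__ == "__main__":
    gaps = missing_artifacts(".")
    if gaps:
        print("Blocked: ask the agent to write", ", ".join(gaps))
    else:
        print("All planning artifacts present; execution may proceed.")
```

A gate like this does not recover the hidden reasoning; it only guarantees that a written plan exists to compare against what the agent later does.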

Sources