Claude Fable 5 Redeployment and AI Cybersecurity Framework
Claude Fable 5 Returns Following Lifted Export Controls
Anthropic is redeploying Claude Fable 5 globally starting July 1, following the lifting of US government export controls that had suspended access since June 12. The suspension occurred because the US government lacked a real-time method to verify the nationality of users, necessitating a total shutdown to comply with restrictions on foreign nationals.
Fable 5 will be available on the Claude Platform, Claude.ai, Claude Code, and Claude Cowork. For users on Pro, Max, Team, and select Enterprise plans, Fable 5 will be included for up to 50% of weekly usage limits through July 7, after which it will transition to a usage-credit model. Access via AWS, Google Cloud, and Microsoft Foundry is being restored as quickly as possible. Additionally, access to Claude Mythos 5 has been restored for specific US organizations following government approval on June 26.
The Trigger for Export Controls: Safeguard Bypasses
US export controls were triggered after Amazon researchers discovered a method to bypass Fable 5's safeguards, allowing the model to identify software vulnerabilities and, in one instance, produce exploit code.
Anthropic's subsequent internal testing revealed that this was not a unique capability of Fable 5. The company found that several other models—including Claude Opus 4.8, GPT-5.5, and Kimi K2.7—could identify the same vulnerabilities. Furthermore, every model tested, including Claude Haiku 4.5 and various GPT and Kimi versions, could produce the same exploit demonstration. Anthropic concluded that the reported bypass allowed access to routine defensive cybersecurity work rather than unique offensive capabilities.
Cybersecurity Safeguards and the "Safety Margin"
Anthropic employs a "defense in depth" strategy for Fable 5, combining model training to decline dangerous requests with retroactive misuse analysis and the use of safety classifiers.
The Role of Classifiers
Safety classifiers are smaller AI systems that detect potentially harmful cybersecurity tasks in real time and block the model from responding. To minimize the risk of harmful outputs, Anthropic uses a "safety margin" approach:
- Standard Margin: Classifiers block requests that are clearly harmful, as well as ambiguous requests that could serve either defensive or offensive purposes.
- Expanded Margin (Fable 5): For Fable 5, Anthropic significantly increased the safety margin, meaning the system blocks a larger number of benign requests to ensure that almost no genuinely harmful requests are missed.
This approach produces higher false-positive rates, in which legitimate coding and debugging tasks are flagged as harmful. In response to the Amazon report, Anthropic trained an improved safety classifier that blocks the reported bypass technique in over 99% of cases.
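The trade-off behind the expanded margin can be illustrated with a toy threshold model. This is a minimal sketch, not Anthropic's classifier: it assumes a classifier emits a single harm score between 0 and 1 and blocks anything at or above a threshold. The scores, labels, and both threshold values (`STANDARD_THRESHOLD`, `EXPANDED_THRESHOLD`) are hypothetical. Lowering the threshold widens the margin, so fewer harmful requests slip through while more benign ones are blocked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """A request with a hypothetical classifier score and a known label (toy data only)."""
    harm_score: float  # hypothetical classifier output in [0, 1]; higher = more likely harmful
    is_harmful: bool   # ground truth, available only because this is an illustration


def evaluate(requests: list[Request], block_threshold: float) -> dict[str, int]:
    """Count outcomes when every request scoring at or above the threshold is blocked."""
    counts = {"blocked_harmful": 0, "missed_harmful": 0,
              "blocked_benign": 0, "allowed_benign": 0}
    for r in requests:
        blocked = r.harm_score >= block_threshold
        if r.is_harmful:
            counts["blocked_harmful" if blocked else "missed_harmful"] += 1
        else:
            counts["blocked_benign" if blocked else "allowed_benign"] += 1
    return counts


# Made-up sample: some benign debugging requests still receive mid-range scores.
SAMPLE = [
    Request(0.05, False), Request(0.30, False), Request(0.45, False),
    Request(0.55, False), Request(0.50, True), Request(0.70, True),
    Request(0.95, True),
]

STANDARD_THRESHOLD = 0.6  # hypothetical "standard margin"
EXPANDED_THRESHOLD = 0.4  # hypothetical "expanded margin": blocks more, misses less

if __name__ == "__main__":
    for name, threshold in [("standard", STANDARD_THRESHOLD), ("expanded", EXPANDED_THRESHOLD)]:
        print(name, evaluate(SAMPLE, threshold))
```

With this made-up sample, the standard threshold blocks no benign requests but misses one harmful request, while the expanded threshold misses none but blocks two benign ones: the false-positive cost of a wider margin.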
Understanding Jailbreaks
Anthropic categorizes jailbreaks based on their severity and impact on the safety margin:
- Minor Jailbreaks: These allow a user to enter the safety margin or access ambiguous behavior, but do not unblock core harmful behaviors.
- Narrow Harmful Jailbreaks: These breach the classifier to unblock a specific, limited harmful behavior.
- Universal Jailbreaks: These unblock an entire class of harmful behaviors. Anthropic states that no universal jailbreaks for Fable 5 have been discovered to date.
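The three categories can be expressed as a small decision function. This sketch assumes a jailbreak can be summarized by two hypothetical observations: whether it gets ambiguous (safety-margin) requests answered, and which core harmful behaviors it gets past the classifier. The names `categorize` and `JailbreakSeverity`, and the placeholder behavior labels, are illustrative and not part of any published taxonomy.

```python
from enum import Enum
from typing import Optional


class JailbreakSeverity(Enum):
    MINOR = "minor"                    # reaches the safety margin only
    NARROW_HARMFUL = "narrow_harmful"  # unblocks some, but not all, behaviors in a harmful class
    UNIVERSAL = "universal"            # unblocks an entire class of harmful behaviors


def categorize(reaches_margin: bool,
               unblocked: set[str],
               harmful_class: set[str]) -> Optional[JailbreakSeverity]:
    """Return the severity category, or None if the technique bypasses nothing.

    reaches_margin: the technique gets ambiguous (safety-margin) requests answered.
    unblocked: behaviors the technique gets past the classifier (hypothetical labels).
    harmful_class: every core harmful behavior in the class being assessed.
    """
    core_unblocked = unblocked & harmful_class
    # Check the broadest category first, since a universal bypass also unblocks
    # individual behaviors.
    if harmful_class and core_unblocked == harmful_class:
        return JailbreakSeverity.UNIVERSAL
    if core_unblocked:
        return JailbreakSeverity.NARROW_HARMFUL
    if reaches_margin:
        return JailbreakSeverity.MINOR
    return None


# Placeholder labels for one hypothetical class of harmful behaviors.
HARMFUL_CLASS = {"harm_a", "harm_b", "harm_c"}

if __name__ == "__main__":
    for reaches, unblocked in [(True, set()), (False, {"harm_b"}), (False, set(HARMFUL_CLASS))]:
        print(categorize(reaches, unblocked, HARMFUL_CLASS))
```

The three example calls yield a minor, a narrow harmful, and a universal jailbreak respectively. Modeling the "class" explicitly as a set makes the distinction between narrow and universal a simple subset check.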
Proposed Industry Framework for Jailbreak Severity
Anthropic, in partnership with Amazon, Microsoft, and Google, is developing a consensus framework to objectively score the severity of AI jailbreaks. This framework aims to provide a consistent standard for developers to triage findings and for governments to determine when to act.
The proposed scoring system evaluates jailbreaks across four criteria:
- Capability Gain: Does the jailbreak provide capabilities significantly beyond existing widely available tools?
- Breadth of Capability Gain: Does the technique work across multiple distinct offensive tasks or only narrow targets?
- Ease of Weaponization: How much human effort (prompting/retries) is required to turn the jailbreak into an attack?
- Discoverability: How easy is it for a specialist or the general public to obtain the technique?
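Because the framework is still being developed, no scoring formula is given here. The sketch below only shows one way the four criteria could feed a triage decision. The 0-3 rating scale, the unweighted sum, the band cut-offs, and the rule that zero capability gain caps a finding at "low" are all hypothetical choices, not part of the proposed framework.

```python
from dataclasses import dataclass

CRITERIA = ("capability_gain", "breadth", "ease_of_weaponization", "discoverability")


@dataclass(frozen=True)
class JailbreakAssessment:
    # Each criterion is rated on a hypothetical 0-3 scale (0 = negligible, 3 = severe).
    capability_gain: int        # beyond existing widely available tools?
    breadth: int                # works across many distinct offensive tasks?
    ease_of_weaponization: int  # little prompting or retrying needed?
    discoverability: int        # easy for specialists or the public to obtain?

    def __post_init__(self) -> None:
        # Reject ratings outside the assumed 0-3 scale.
        for name in CRITERIA:
            value = getattr(self, name)
            if not 0 <= value <= 3:
                raise ValueError(f"{name} must be in 0..3, got {value}")

    def total(self) -> int:
        """Unweighted sum of the four ratings (a hypothetical aggregation)."""
        return sum(getattr(self, name) for name in CRITERIA)


def triage(a: JailbreakAssessment) -> str:
    """Map an assessment to a hypothetical priority band."""
    # Hypothetical gate: without capability gain, other criteria cannot raise priority.
    if a.capability_gain == 0:
        return "low"
    score = a.total()  # ranges from 1 to 12 once the gate is passed
    if score >= 9:
        return "high"
    if score >= 5:
        return "medium"
    return "low"


if __name__ == "__main__":
    print(triage(JailbreakAssessment(0, 3, 3, 3)))
    print(triage(JailbreakAssessment(3, 3, 3, 3)))
```

Under this hypothetical gating rule, a technique that only reproduces what widely available models and tools can already do would score 0 for capability gain and triage as "low", however easy it is to find or use.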
Strengthening US Government Collaboration
Anthropic is scaling up its collaboration with the US government to align with the June 2 Executive Order on Promoting Advanced Artificial Intelligence Innovation and Security. This collaboration includes four primary commitments:
- Pre-release Access: Providing government partners early access to models and safeguards for independent evaluation before broad release.
- Rapid Information Sharing: Notifying government counterparts of significant jailbreaks or misuse patterns and sharing new safeguards for independent testing.
- Joint Research: Dedicating technical staff and compute resources to shared government priorities in AI security.
- Shared Industry Standards: Working toward a voluntary security and evaluation standard for frontier model providers.