Guardrails: what it is, what problem it solves & why it's gaining traction

Guardrails: what it is, what problem it solves & why it's gaining traction

What it solves

NeMo Guardrails provides a programmable layer between an application and a Large Language Model (LLM) to ensure conversations remain safe, secure, and on-topic. It prevents LLMs from discussing unwanted topics, protects against vulnerabilities like jailbreaks and prompt injections, and ensures the model follows predefined conversational paths rather than deviating into unpredictable behavior.

How it works

The toolkit implements "rails"—programmable controls that intercept and modify the flow of information at five key stages:

  1. Input Rails: Filter or alter user input before it reaches the LLM.
  2. Dialog Rails: Use a specialized modeling language called Colang to steer the conversation along predefined paths.
  3. Retrieval Rails: In RAG scenarios, these filter or modify retrieved document chunks before they are used for prompting.
  4. Execution Rails: Control the input and output of custom tools or actions called by the LLM.
  5. Output Rails: Review and modify the LLM's response before it is delivered to the user.

Developers can configure these rails via a config.yml file and .co (Colang) files, then deploy them using a Python API or a dedicated guardrails server.

Who it’s for

  • LLM Application Developers: Those building chatbots or domain-specific assistants who need strict control over model behavior.
  • RAG Implementers: Developers needing to enforce fact-checking and output moderation on retrieved data.
  • Enterprise AI Teams: Organizations requiring standard operating procedures (e.g., authentication) and safety guarantees for customer-facing AI.

Highlights

  • Colang: A dedicated modeling language for designing flexible yet controllable dialogue flows.
  • Comprehensive Protection: Built-in support for jailbreak detection, hallucination detection, and content safety.
  • Flexible Integration: Works with various LLMs (GPT-4, LLaMa-2, Falcon, etc.) and integrates optionally with LangChain.
  • Evaluation Tools: Includes a CLI tool (nemoguardrails evaluate) to test topical rails, moderation, and fact-checking.

Sources