forge: what it is, what problem it solves & why it's gaining traction

forge: what it is, what problem it solves & why it's gaining traction

What it solves

Forge provides a reliability layer for self-hosted Large Language Models (LLMs) when they are performing tool-calling. It addresses the common issue where smaller local models often fail to follow tool-calling formats, call unknown tools, or struggle to choose between generating text and calling a tool, which typically leads to crashes or incorrect behavior in agentic workflows.

How it works

Forge acts as a guardrail system that sits between the LLM and the application. It employs several techniques to ensure reliability:

  • Response Validation: Checks tool calls against a defined list of available tools and validates their parameters.
  • Rescue Parsing: Extracts structured tool calls from malformed responses (e.g., JSON in code fences or specific model formats like Mistral or Qwen) and converts them into a canonical format.
  • Retry Loops: If a tool call is invalid, Forge automatically retries the inference with a corrective "nudge" to the model.
  • Synthetic Respond Tool: Injects a hidden respond tool that forces the model to use a tool call even when it wants to provide a text response, preventing the model from mixing text and tool calls incorrectly.
  • Workflow Constraints: Allows developers to define required steps, prerequisites, and terminal tools to constrain the model's path through a task.

Who it’s for

It is designed for developers building agentic applications using local LLMs (via Ollama, vLLM, llama.cpp, etc.) or hybrid setups using Anthropic. It is particularly useful for those who want to improve the tool-calling accuracy of 8B-class models without rewriting their existing orchestration logic.

Highlights

  • Proxy Mode: A drop-in proxy server that makes any OpenAI-compatible client (like aider or Continue) think it is talking to a more capable model by applying guardrails transparently.
  • High Performance Lift: Claims to increase the reliability of 8B local models from single digits to 84% on its evaluation suite.
  • Backend Agnostic: Supports a wide range of backends including Ollama, llama-server, Llamafile, vLLM, and Anthropic.
  • Flexible Integration: Can be used as a full WorkflowRunner, a transparent proxy, or as standalone middleware for existing loops.

Sources