Kiln: a local-first AI development workbench for prompt optimization, evaluations, and agent orchestration

What it solves

Kiln provides a unified workbench for the full AI development lifecycle, so teams do not have to switch between separate tools for prompt engineering, evaluation, RAG, and fine-tuning. It targets regressions in AI products, where improving one part of a prompt or upgrading a model can silently break other behaviors, by tracking quality across multiple dimensions against a single dataset.
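To make the regression idea concrete, here is a minimal sketch of comparing two configurations across several quality dimensions measured on the same dataset. It is not Kiln's API: the `Regression` class, the `find_regressions` function, the `tolerance` parameter, and the scores are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Regression:
    dimension: str
    baseline: float
    candidate: float


def find_regressions(baseline, candidate, tolerance=0.02):
    """Return dimensions where the candidate scores worse than the baseline.

    Both arguments map a quality dimension name to a mean score in [0, 1],
    computed on the same dataset. Only drops larger than `tolerance` count.
    """
    regressions = []
    for dimension, base_score in sorted(baseline.items()):
        new_score = candidate.get(dimension)
        if new_score is None:
            continue  # dimension was not measured for the candidate
        if base_score - new_score > tolerance:
            regressions.append(Regression(dimension, base_score, new_score))
    return regressions


# Illustrative (made-up) scores for two prompt versions on one shared dataset.
baseline_scores = {"accuracy": 0.91, "tone": 0.88, "safety": 0.99}
candidate_scores = {"accuracy": 0.95, "tone": 0.80, "safety": 0.99}

regressions = find_regressions(baseline_scores, candidate_scores)
for r in regressions:
    print(f"{r.dimension}: {r.baseline:.2f} -> {r.candidate:.2f}")
```

In this example the candidate improves accuracy but drops on tone, which is the kind of trade-off a single headline metric would hide.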

How it works

Kiln pairs a desktop application for non-technical collaborators (PMs, QA, and subject matter experts) with an MIT-licensed Python library for engineers. It is local-first: users bring their own API keys or run models entirely offline via Ollama. Projects sync through Git for team collaboration. A task is defined once, and optimization techniques such as automatic prompt mutation, RAG integration, or fine-tuning are then applied against the same dataset.

Who it’s for

It is designed for AI product teams: engineers who need a production-ready library, data scientists working in notebooks, and non-technical stakeholders who need to rate outputs and add training data without writing code.

Highlights

  • Auto-Optimize: Automatically searches through prompt mutations and model selections to find the best configuration for a specific task.
  • Eval Builder: Quickly generates synthetic evaluation datasets and judges to align AI outputs with user preferences.
  • Multi-Agent Orchestration: Supports the composition of multi-agent hierarchies where each agent runs in its own focused context window.
  • Zero-Code Fine-Tuning: Enables fine-tuning across 60+ models on providers like Fireworks, Together, and Vertex without writing code.
  • Local-First Privacy: Runs on the user's machine, keeping data under the user's control, with Git-native synchronization.
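As a rough illustration of the Auto-Optimize idea, the sketch below does an exhaustive search over prompt variants and models, scoring each pair on a fixed dataset. This is an assumption about the general shape of such a search, not Kiln's actual algorithm or API: `search_configurations`, `fake_run_model`, `exact_match`, and the toy data are all hypothetical stand-ins so the example runs offline.

```python
from itertools import product


def search_configurations(prompts, models, dataset, run_model, score):
    """Try every (prompt, model) pair and return the best pair with its mean score."""
    best_config, best_score = None, float("-inf")
    for prompt, model in product(prompts, models):
        total = 0.0
        for example in dataset:
            output = run_model(model, prompt, example["input"])
            total += score(output, example["expected"])
        mean = total / len(dataset)
        if mean > best_score:
            best_config, best_score = (prompt, model), mean
    return best_config, best_score


# Stand-in for a model call; a real setup would call a provider or local model.
def fake_run_model(model, prompt, text):
    # Pretend only the "large" model with the stricter prompt uppercases correctly.
    if model == "large" and "exactly" in prompt:
        return text.upper()
    return text


def exact_match(output, expected):
    return 1.0 if output == expected else 0.0


dataset = [
    {"input": "hello", "expected": "HELLO"},
    {"input": "kiln", "expected": "KILN"},
]
prompts = ["Uppercase the text.", "Return exactly the text in uppercase."]
models = ["small", "large"]

best_config, best_score = search_configurations(
    prompts, models, dataset, fake_run_model, exact_match
)
print(best_config, best_score)
```

A full grid is the simplest possible strategy; a real optimizer would likely generate prompt mutations iteratively and prune weak candidates instead of evaluating every combination.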
