Workweave Router: Smart Model Routing for Agentic Systems

Workweave Router: Smart Model Routing for Agentic Systems

Workweave Router is a drop-in proxy for Anthropic, OpenAI, and Gemini that automatically selects the optimal model for every request. By utilizing a cluster scorer derived from the Avengers-Pro research, the router aims to reduce LLM operational costs by 40-70% without requiring changes to the application logic beyond a simple endpoint update.

Automated Model Selection via Cluster Scoring

The Workweave Router does not rely on "vibes-based" prompting for routing; instead, it uses a tiny on-box embedder to route requests in under 50ms. This system is based on the Avengers-Pro framework, which optimizes for the balance between performance and efficiency.

According to the RouterArena leaderboard, the Workweave Router ranks #1 in the Acc-Cost Arena with a score of 76.09.

Integration and Tool Support

The router acts as a proxy that "speaks" multiple APIs, allowing it to be integrated into various agentic systems and IDEs. It supports streaming, tools, and vision across several providers:

  • Supported APIs: Anthropic Messages, OpenAI Chat Completions, and Gemini native.
  • OSS Model Support: Integration with DeepSeek, Kimi, GLM, Qwen, Llama, and Mistral via OpenRouter or other OpenAI-compatible endpoints.
  • Tool Integration:
    • Claude Code: Can be wired via make install-cc or npx @workweave/router --claude.
    • Codex (OpenAI CLI): Patches config.toml to use the router as the model provider.
    • opencode: Merges a provider.weave entry into the configuration JSON.
    • Cursor: Supports overriding the OpenAI Base URL to point to the router's local endpoint (http://localhost:8080/v1).

Deployment and Architecture

Users can deploy the router in two primary ways:

  1. Hosted: Using npx @workweave/router, which handles the installation and configuration for specific tools like Claude Code or Codex.
  2. Self-Hosted: Running the full stack (including a Postgres database and dashboard) via make full-setup. This allows provider keys to remain on the local machine, encrypted at rest.

API Endpoints

Endpoint Format Function
POST /v1/messages Anthropic Messages Routed request
POST /v1/chat/completions OpenAI Chat Completions Routed request
POST /v1beta/models/:action Gemini generateContent Routed request
POST /v1/route Custom Returns routing decision without calling upstream
GET /v1/models Anthropic Passthrough

Observability and Roadmap

The router provides out-of-the-box OTLP traces, allowing users to monitor routing decisions via a built-in dashboard at http://localhost:8080/ui/dashboard or by integrating with external tools like Honeycomb, Datadog, or Grafana.

Future developments include:

  • Token-aware rate limiting using a Redis sliding window.
  • Sub-installations for tenant hierarchies.
  • Speculative dispatch and hedging to reduce tail latency.

Community Perspectives and Technical Trade-offs

While the router promises significant cost savings, some developers have raised concerns regarding the prompt-model relationship and caching efficiency:

"the way I prompt already changes based upon what model I am using. I'm not convinced it would route to the right model based on my diction or whatever."

Other users pointed out that routing between different models mid-execution could potentially lead to more cache misses, which may offset some of the cost savings provided by cheaper models:

"The thing I do not get with these routers is that you will have more cache misses... using the cache is crucial. How does this router translate to $$$ when developing?"

Conversely, some see this as a necessary evolution for managing token budgets as LLM pricing increases:

"As prices increase we will see more of these tools to optimise and make the best use of token budget"

Sources