Workweave Router: Smart Model Routing for Agentic Systems
Workweave Router: Smart Model Routing for Agentic Systems
Workweave Router is a drop-in proxy for Anthropic, OpenAI, and Gemini that automatically selects the optimal model for every request. By utilizing a cluster scorer derived from the Avengers-Pro research, the router aims to reduce LLM operational costs by 40-70% without requiring changes to the application logic beyond a simple endpoint update.
Automated Model Selection via Cluster Scoring
The Workweave Router does not rely on "vibes-based" prompting for routing; instead, it uses a tiny on-box embedder to route requests in under 50ms. This system is based on the Avengers-Pro framework, which optimizes for the balance between performance and efficiency.
According to the RouterArena leaderboard, the Workweave Router ranks #1 in the Acc-Cost Arena with a score of 76.09.
Integration and Tool Support
The router acts as a proxy that "speaks" multiple APIs, allowing it to be integrated into various agentic systems and IDEs. It supports streaming, tools, and vision across several providers:
- Supported APIs: Anthropic Messages, OpenAI Chat Completions, and Gemini native.
- OSS Model Support: Integration with DeepSeek, Kimi, GLM, Qwen, Llama, and Mistral via OpenRouter or other OpenAI-compatible endpoints.
- Tool Integration:
- Claude Code: Can be wired via
make install-ccornpx @workweave/router --claude. - Codex (OpenAI CLI): Patches
config.tomlto use the router as the model provider. - opencode: Merges a
provider.weaveentry into the configuration JSON. - Cursor: Supports overriding the OpenAI Base URL to point to the router's local endpoint (
http://localhost:8080/v1).
- Claude Code: Can be wired via
Deployment and Architecture
Users can deploy the router in two primary ways:
- Hosted: Using
npx @workweave/router, which handles the installation and configuration for specific tools like Claude Code or Codex. - Self-Hosted: Running the full stack (including a Postgres database and dashboard) via
make full-setup. This allows provider keys to remain on the local machine, encrypted at rest.
API Endpoints
| Endpoint | Format | Function |
|---|---|---|
POST /v1/messages |
Anthropic Messages | Routed request |
POST /v1/chat/completions |
OpenAI Chat Completions | Routed request |
POST /v1beta/models/:action |
Gemini generateContent |
Routed request |
POST /v1/route |
Custom | Returns routing decision without calling upstream |
GET /v1/models |
Anthropic | Passthrough |
Observability and Roadmap
The router provides out-of-the-box OTLP traces, allowing users to monitor routing decisions via a built-in dashboard at http://localhost:8080/ui/dashboard or by integrating with external tools like Honeycomb, Datadog, or Grafana.
Future developments include:
- Token-aware rate limiting using a Redis sliding window.
- Sub-installations for tenant hierarchies.
- Speculative dispatch and hedging to reduce tail latency.
Community Perspectives and Technical Trade-offs
While the router promises significant cost savings, some developers have raised concerns regarding the prompt-model relationship and caching efficiency:
"the way I prompt already changes based upon what model I am using. I'm not convinced it would route to the right model based on my diction or whatever."
Other users pointed out that routing between different models mid-execution could potentially lead to more cache misses, which may offset some of the cost savings provided by cheaper models:
"The thing I do not get with these routers is that you will have more cache misses... using the cache is crucial. How does this router translate to $$$ when developing?"
Conversely, some see this as a necessary evolution for managing token budgets as LLM pricing increases:
"As prices increase we will see more of these tools to optimise and make the best use of token budget"