sie: a self-hosted inference cluster for agents that serves 85+ models through a single unified API
What it solves
SIE (Superlinked Inference Engine) eliminates the need to manage multiple separate model servers for different AI agent tasks. Instead of a patchwork of servers for embedding, reranking, OCR, and generation, it provides a single, self-hosted open-source cluster that serves over 85 pre-configured models through one unified API.
How it works
SIE runs as a server (via Docker or native install) that manages a library of quality-verified models from Hugging Face. It loads models on demand and evicts the least recently used (LRU) ones, so many models can be served from the same deployment without exhausting its resources. A unified SDK for Python and TypeScript exposes functions such as encode, score, extract, and generate across different model architectures.
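The combination of on-demand loading and LRU eviction can be illustrated with a minimal sketch. This is not SIE's implementation: the class name `LRUModelCache`, the `loader` callback, and the slot-count `capacity` are all hypothetical, and a real server would more likely budget by memory than by model count.

```python
from collections import OrderedDict
from typing import Any, Callable, List


class LRUModelCache:
    """Hypothetical sketch: keep at most `capacity` models loaded at once."""

    def __init__(self, capacity: int, loader: Callable[[str], Any]) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.loader = loader  # assumed callback that loads a model by name
        self._models: "OrderedDict[str, Any]" = OrderedDict()
        self.evicted: List[str] = []  # eviction log, kept for illustration

    def get(self, name: str) -> Any:
        # Cache hit: mark the model as most recently used and return it.
        if name in self._models:
            self._models.move_to_end(name)
            return self._models[name]
        # Cache miss: free a slot first by evicting the least recently used model.
        if len(self._models) >= self.capacity:
            old_name, _ = self._models.popitem(last=False)
            self.evicted.append(old_name)
        # Load on demand, only when the model is first requested.
        model = self.loader(name)
        self._models[name] = model
        return model

    def loaded(self) -> List[str]:
        # Names ordered from least to most recently used.
        return list(self._models)
```

With a capacity of two, requesting models "a", "b", "a", and then "c" evicts "b", because "a" was touched more recently. Evicting before loading keeps the number of resident models from ever exceeding the limit.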
Who it’s for
Developers building AI agents who want to self-host their inference stack in their own cloud (GKE, EKS) and avoid the complexity of deploying and maintaining individual model servers for every specialized task.
Highlights
- Unified API: One interface for search/retrieval, document-to-markdown conversion, structured output, and agent loops.
- Extensive Model Library: 85+ pre-configured models including Stella, SPLADE, Qwen3, and GLiNER.
- Production-Ready Stack: Includes a load-balancing gateway, KEDA autoscaling, Grafana dashboards, and Terraform modules for GKE/EKS.
- Broad Integration: Compatible with LangChain, LlamaIndex, Haystack, DSPy, CrewAI, and popular vector databases like Chroma, Qdrant, and Weaviate.
- OpenAI Compatible: Offers a /v1/embeddings endpoint for easy migration.
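As a rough sketch of what migrating to the /v1/embeddings endpoint might look like, the helper below builds an OpenAI-style embeddings request with only the Python standard library. The `model` and `input` body fields follow the OpenAI embeddings convention. The base URL, port, and model identifier are placeholders rather than values documented by SIE, and any authentication a deployment requires is not shown.

```python
import json
import urllib.request
from typing import List


def build_embeddings_request(
    base_url: str, model: str, texts: List[str]
) -> urllib.request.Request:
    """Build (but do not send) a POST request for an OpenAI-style endpoint."""
    # Body shape follows the OpenAI embeddings convention: "model" and "input".
    payload = {"model": model, "input": texts}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical usage: the host, port, and model id below are assumptions.
# req = build_embeddings_request("http://localhost:8080", "my-embedding-model", ["hello"])
# with urllib.request.urlopen(req) as resp:
#     body = json.load(resp)
```

Because the path and body match the OpenAI convention, an existing OpenAI client could in principle be pointed at the self-hosted server by changing only its base URL, assuming the deployment accepts the same request shape.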
Sources
- superlinked/sie