ODS: a one-command local AI server stack that automates GPU detection and service orchestration
What it solves
ODS (Osmantic Deployment System) simplifies setting up a private, local AI server. Instead of manually configuring separate tools for inference, chat interfaces, and automation, ODS provides a single-command installation that wires together a complete AI stack on your own hardware, so your data and prompts remain private.
How it works
ODS uses a modular installer that detects your GPU (NVIDIA, AMD, Intel Arc, or Apple Silicon) and automatically selects the best-fitting LLM based on your available VRAM or RAM. It deploys a suite of pre-configured services using Docker and native binaries (such as llama-server for Metal acceleration on macOS).
To minimize wait times, it employs a "bootstrap mode" that downloads a tiny model first so you can start chatting immediately while the full-sized model downloads in the background.
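The selection and bootstrap steps described above can be sketched roughly as follows. This is a minimal illustration, not ODS's actual code: the tier thresholds, the model file names, and the functions `select_model` and `plan_downloads` are all hypothetical, chosen only to show how a memory figure could map to a model and a download order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    min_memory_gb: float  # minimum VRAM (or unified RAM) for this tier
    model: str            # placeholder GGUF file name, not a real ODS model


# Hypothetical tiers, ordered from the largest requirement to the smallest.
TIERS = [
    Tier(24.0, "large-model.gguf"),
    Tier(12.0, "medium-model.gguf"),
    Tier(6.0, "small-model.gguf"),
    Tier(0.0, "tiny-model.gguf"),
]

# Assumed: the bootstrap model is simply the smallest tier's model.
BOOTSTRAP_MODEL = "tiny-model.gguf"


def select_model(memory_gb: float) -> str:
    """Return the model of the first tier whose requirement is met."""
    for tier in TIERS:
        if memory_gb >= tier.min_memory_gb:
            return tier.model
    # Fall back to the smallest model for nonsensical (negative) inputs.
    return TIERS[-1].model


def plan_downloads(memory_gb: float) -> list[str]:
    """Return the download order: bootstrap model first, then the target."""
    target = select_model(memory_gb)
    if target == BOOTSTRAP_MODEL:
        return [target]
    return [BOOTSTRAP_MODEL, target]


if __name__ == "__main__":
    print(plan_downloads(16.0))
```

In this sketch a machine reporting 16 GB would chat against the tiny model first while the medium model downloads, and a machine below the smallest threshold would never fetch a second model at all.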
Who it’s for
It is designed for individuals who want a private AI homelab or workstation without needing a computer science degree or extensive experience with CUDA drivers and Docker configurations.
Highlights
- One-Command Setup: Automated GPU detection and service orchestration for Linux, macOS, and Windows.
- Full-Service Stack: Includes Open WebUI for chat, llama-server for inference, n8n for workflows, Qdrant for RAG, and ComfyUI for image generation.
- Hardware-Aware: Automatically maps hardware tiers to specific GGUF models to optimize performance.
- Extensible Architecture: Services are treated as extensions, so users can add, enable, or disable tools through a manifest system.
- Privacy-First: Runs entirely locally by default, though optional cloud/hybrid modes are available via LiteLLM.
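To make the manifest idea in the list above more concrete, here is a small sketch of how services could be toggled through a manifest. The JSON layout, the `enabled` field, and the helper names are assumptions made for illustration; the source does not describe the actual ODS manifest schema.

```python
import json

# Hypothetical manifest layout; the real ODS schema may differ entirely.
MANIFEST_JSON = """
{
  "services": [
    {"name": "open-webui", "enabled": true},
    {"name": "n8n", "enabled": true},
    {"name": "comfyui", "enabled": false}
  ]
}
"""


def load_manifest(text: str) -> dict:
    """Parse the manifest text into a dictionary."""
    return json.loads(text)


def enabled_services(manifest: dict) -> list[str]:
    """List the names of services that are switched on."""
    return [s["name"] for s in manifest["services"] if s.get("enabled", False)]


def set_enabled(manifest: dict, name: str, enabled: bool) -> None:
    """Toggle an existing service, or register a new one if it is unknown."""
    for service in manifest["services"]:
        if service["name"] == name:
            service["enabled"] = enabled
            return
    manifest["services"].append({"name": name, "enabled": enabled})


if __name__ == "__main__":
    manifest = load_manifest(MANIFEST_JSON)
    set_enabled(manifest, "comfyui", True)   # enable an existing extension
    set_enabled(manifest, "qdrant", True)    # add a new extension
    print(enabled_services(manifest))
```

Keeping each service as a self-describing entry means that adding a tool is a data change rather than a change to the installer itself, which is the property the extension model is meant to provide.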
Sources
- Light-Heart-Labs/ODS