ODS: a one-command local AI server stack that automates GPU detection and service orchestration

What it solves

ODS (Osmantic Deployment System) sets up a private, local AI server with a single command. Instead of manually configuring separate tools for inference, chat interfaces, and automation, you run one installer that wires together a complete AI stack on your own hardware, so your data and prompts remain private.

How it works

ODS uses a modular installer that detects your GPU (NVIDIA, AMD, Intel Arc, or Apple Silicon) and automatically selects the best-fitting LLM based on your available VRAM or RAM. It deploys a suite of pre-configured services using Docker and native binaries (like llama-server for macOS Metal acceleration).
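The hardware-to-model mapping described above can be pictured as a simple threshold lookup. The following is a minimal sketch, not the ODS implementation: the tier boundaries, the model file names, and the select_model function are all hypothetical, chosen only to illustrate picking the largest model that fits the detected memory.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Tier:
    """One hardware tier: the minimum memory needed and the model it maps to."""
    min_memory_gb: float
    model: str


# Hypothetical tiers, ordered from most to least demanding.
# The thresholds and file names are placeholders, not the ones ODS ships.
TIERS: List[Tier] = [
    Tier(24.0, "large-q4.gguf"),
    Tier(12.0, "medium-q4.gguf"),
    Tier(6.0, "small-q4.gguf"),
    Tier(0.0, "tiny-q4.gguf"),
]


def select_model(memory_gb: float) -> str:
    """Return the model of the first (largest) tier that fits in memory_gb."""
    if memory_gb < 0:
        raise ValueError("memory_gb must be non-negative")
    for tier in TIERS:
        if memory_gb >= tier.min_memory_gb:
            return tier.model
    # Unreachable while the last tier has a 0 GB threshold; kept as a safe default.
    return TIERS[-1].model


if __name__ == "__main__":
    for detected in (32, 16, 8, 4):
        print(f"{detected} GB -> {select_model(detected)}")
```

Keeping the tiers in a plain ordered list means a new tier can be added without touching the selection logic. The memory figure passed in would be VRAM on a discrete GPU or RAM otherwise, as the text describes.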

To minimize wait times, it employs a "bootstrap mode" that downloads a tiny model first so you can start chatting immediately while the full-sized model downloads in the background.
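One way to picture bootstrap mode is as a blocking fetch of the tiny model followed by a background fetch of the full one, with the active model swapped once the larger download finishes. The sketch below assumes that shape. The BootstrapModelManager class, its method names, and the injected download callable are hypothetical and are not taken from ODS.

```python
import threading
from typing import Callable, Optional


class BootstrapModelManager:
    """Serve a tiny model immediately, then switch to the full model when ready.

    `download` is an injected callable that blocks until the named model is
    available locally; how it fetches the file is outside this sketch.
    """

    def __init__(self, tiny_model: str, full_model: str,
                 download: Callable[[str], None]) -> None:
        self._tiny = tiny_model
        self._full = full_model
        self._download = download
        self._active: Optional[str] = None
        self._lock = threading.Lock()
        self._thread: Optional[threading.Thread] = None

    def start(self) -> None:
        # The tiny model is small, so fetching it synchronously is acceptable.
        self._download(self._tiny)
        with self._lock:
            self._active = self._tiny
        # The full model downloads in the background while chat is usable.
        self._thread = threading.Thread(target=self._fetch_full, daemon=True)
        self._thread.start()

    def _fetch_full(self) -> None:
        self._download(self._full)
        with self._lock:
            self._active = self._full  # swap once the full model is available

    def active_model(self) -> Optional[str]:
        with self._lock:
            return self._active

    def wait_until_full(self, timeout: Optional[float] = None) -> None:
        if self._thread is not None:
            self._thread.join(timeout)
```

In a real deployment the swap would also need to reload the inference server with the new model file; that step is omitted here.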

Who it’s for

ODS is designed for individuals who want a private AI homelab or workstation without needing a computer science degree or extensive experience with CUDA drivers and Docker configurations.

Highlights

  • One-Command Setup: Automated GPU detection and service orchestration for Linux, macOS, and Windows.
  • Full-Service Stack: Includes Open WebUI for chat, llama-server for inference, n8n for workflows, Qdrant for RAG, and ComfyUI for image generation.
  • Hardware-Aware: Automatically maps hardware tiers to specific GGUF models to optimize performance.
  • Extensible Architecture: Services are treated as extensions, so users can add new tools or enable and disable existing ones through a manifest system.
  • Privacy-First: Runs entirely locally by default, though optional cloud/hybrid modes are available via LiteLLM.
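
The manifest-based extension system mentioned in the highlights could work roughly as follows. This is a sketch under stated assumptions: the JSON layout, the "extensions", "name", and "enabled" keys, and both helper functions are hypothetical and do not reflect the actual ODS manifest format. Only the service names come from the list above.

```python
import json
from typing import List

# Hypothetical manifest: one entry per service, each with an on/off flag.
MANIFEST_JSON = """
{
  "extensions": [
    {"name": "open-webui",   "enabled": true},
    {"name": "llama-server", "enabled": true},
    {"name": "n8n",          "enabled": false},
    {"name": "qdrant",       "enabled": true},
    {"name": "comfyui",      "enabled": false}
  ]
}
"""


def enabled_services(manifest_text: str) -> List[str]:
    """Return the names of all extensions flagged as enabled, in manifest order."""
    manifest = json.loads(manifest_text)
    return [ext["name"] for ext in manifest["extensions"] if ext.get("enabled", False)]


def set_enabled(manifest_text: str, name: str, enabled: bool) -> str:
    """Return a new manifest string with the named extension switched on or off."""
    manifest = json.loads(manifest_text)
    for ext in manifest["extensions"]:
        if ext["name"] == name:
            ext["enabled"] = enabled
            return json.dumps(manifest, indent=2)
    raise KeyError(f"unknown extension: {name}")


if __name__ == "__main__":
    print(enabled_services(MANIFEST_JSON))
    updated = set_enabled(MANIFEST_JSON, "comfyui", True)
    print(enabled_services(updated))
```

Treating each service as a manifest entry means an orchestrator only needs to read the list of enabled names to decide what to start, and adding a tool becomes a matter of appending an entry.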

Sources