dstack: a unified control plane for GPU provisioning and orchestration across multiple clouds and on-prem clusters

What it solves

dstack gives teams a single control plane for GPU provisioning and orchestration. Instead of managing compute separately across GPU clouds, Kubernetes clusters, and on-premises servers, users get one consistent way to handle development, training, and inference.

How it works

Users run a dstack server and manage their infrastructure through the CLI. They configure "backends" that connect the server to GPU clouds or clusters, then describe what they need in YAML configuration files for fleets, development environments, and tasks. Running dstack apply on a configuration makes dstack handle provisioning, job queuing, auto-scaling, networking, and volume management across the connected environments.
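
As a rough illustration of this YAML-driven workflow, a task configuration might look like the sketch below. The file name, the run name, the script train.py, and the resource values are placeholders chosen for the example, and the field names should be checked against the current dstack configuration reference before use.

```yaml
# train.dstack.yml (hypothetical file name)
# Apply with: dstack apply -f train.dstack.yml
type: task
name: train-example          # placeholder run name
python: "3.11"               # assumed Python version for the run
commands:
  - pip install -r requirements.txt
  - python train.py          # placeholder training script
resources:
  gpu: 24GB                  # assumed requirement: a GPU with 24 GB of memory
```

The point of the sketch is that the file states what the workload needs, not where it runs; which backend actually supplies the GPU is left to the server.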

Who it’s for

AI developers and ML engineers who need to scale their workloads from local development to distributed training and model deployment across multiple hardware accelerators (NVIDIA, AMD, Google TPU, and Tenstorrent).

Highlights

Multi-cloud and Hybrid Support: Works across GPU clouds, Kubernetes clusters, and on-prem clusters.
Detailed Resource Management: Supports fleets, dev environments, tasks, and services for different stages of the ML lifecycle.
AI Agent Integration: Provides "skills" that allow AI agents (like Claude or Cursor) to manage fleets and submit workloads via the CLI.
Broad Hardware Compatibility: Out-of-the-box support for NVIDIA, AMD, Google TPU, and Tenstorrent accelerators.
