dstack: a unified control plane for GPU provisioning and orchestration across multiple clouds and on-prem clusters
What it solves
dstack is a unified control plane for GPU provisioning and orchestration. It gives teams one consistent way to manage compute for development, training, and inference across GPU clouds, Kubernetes clusters, and on-premise servers, instead of handling each environment separately.
How it works
Users set up a dstack server and use a CLI to manage their infrastructure. They configure "backends" that connect the server to GPU clouds or clusters, then describe what they need in YAML configuration files for fleets, development environments, and tasks. When a user runs dstack apply, the system handles provisioning, job queuing, auto-scaling, networking, and volume management across the connected environments.
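The configure-then-apply flow described above can be sketched in a few lines of Python. This is an illustration under assumptions, not dstack's own tooling: the field names (`type`, `name`, `commands`, `resources.gpu`) and the `-f` flag are not given in this summary and should be checked against the dstack documentation, and the file name `train.dstack.yml` and both helper functions are hypothetical. The script only builds the text and the command; it does not call dstack.

```python
import shlex


def render_task_config(name, commands, gpu):
    """Render a minimal task configuration as YAML text.

    Hand-rolled formatting, fine for simple values only; a real script
    would use a YAML library. Field names are assumptions (see lead-in).
    """
    lines = ["type: task", f"name: {name}", "commands:"]
    lines += [f"  - {cmd}" for cmd in commands]
    lines += ["resources:", f"  gpu: {gpu}"]
    return "\n".join(lines) + "\n"


def apply_command(config_path):
    """Build the argv for applying a configuration file (assumed -f flag)."""
    return ["dstack", "apply", "-f", config_path]


# Hypothetical task: run a training script on a GPU with 24GB of memory.
config_text = render_task_config("train", ["python train.py"], "24GB")
argv = apply_command("train.dstack.yml")

print(config_text)
print(shlex.join(argv))
```

Keeping the configuration in a file and passing it to a single command is what lets the same definition be reused across whichever backends the server is connected to.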
Who it’s for
AI developers and ML engineers who need to scale their workloads from local development to distributed training and model deployment across multiple hardware accelerators (NVIDIA, AMD, Google TPU, and Tenstorrent).
Highlights
- Multi-cloud and Hybrid Support: Works across any GPU cloud, Kubernetes, and on-prem clusters.
- Detailed Resource Management: Supports fleets, dev environments, tasks, and services for different stages of the ML lifecycle.
- AI Agent Integration: Provides "skills" that allow AI agents (like Claude or Cursor) to manage fleets and submit workloads via the CLI.
- Broad Hardware Compatibility: Out-of-the-box support for NVIDIA, AMD, Google TPU, and Tenstorrent accelerators.
Sources
- dstackai/dstack