spiceai: an accelerated SQL and LLM-inference engine for data-grounded AI agents with a cluster-sidecar architecture

spiceai: an accelerated SQL and LLM-inference engine for data-grounded AI agents with a cluster-sidecar architecture

What it solves

Spice is designed to eliminate the complex data pipelines and "glue code" typically required to build data-grounded AI applications and agents. It provides a unified engine for SQL queries, search, and LLM inference, allowing developers to access federated data sources with millisecond latency on localhost, whether running as a standalone binary, a Kubernetes sidecar, or a distributed cluster.

How it works

Spice operates using a "cluster-sidecar" architecture. A lightweight sidecar runs alongside the application on localhost, serving data from a scoped working set. For larger queries, it transparently delegates to a central Spice cluster powered by Apache Ballista for distributed execution and the Spice Cayenne accelerator for high-performance columnar data access. It integrates with over 30 data connectors (e.g., Postgres, Snowflake, S3) and supports native CDC for real-time updates. AI capabilities are integrated directly into the SQL engine, enabling vector search, reranking, and text-to-SQL (NSQL) within a single query plan.

Who it’s for

It is built for developers creating AI agents and data-intensive applications that need high-performance access to diverse, federated data sources without the operational overhead of managing complex ETL pipelines.

Highlights

  • Cluster-Sidecar Architecture: Combines local results caching, local working sets, and distributed cluster delegation for tiered latency.
  • AI-Native Runtime: Integrates LLM inference, vector search, and text-to-SQL directly into the SQL engine via OpenAI-compatible APIs and MCP support.
  • High-Performance Acceleration: Uses the Spice Cayenne accelerator and Vortex columnar format to outperform DuckDB and Parquet in specific workloads.
  • Federated Querying: Connects to 30+ data sources and supports writing to Apache Iceberg tables using standard SQL.
  • Real-time CDC: Native support for PostgreSQL WAL and DynamoDB Streams for low-latency data synchronization.
  • Enterprise Ready: Includes mTLS, HashiCorp Vault/Azure Key Vault integration, and OpenTelemetry observability.

Sources