executorch: a unified on-device AI inference engine for deploying PyTorch models to mobile and embedded hardware
What it solves
ExecuTorch provides a unified way to deploy PyTorch AI models on-device, on anything from smartphones to microcontrollers. It removes the need for manual C++ rewrites and intermediate format conversions (such as ONNX or TFLite), avoids vendor lock-in, and lets developers move from research to production with the same PyTorch APIs.
How it works
ExecuTorch uses ahead-of-time (AOT) compilation to prepare models for the edge. The process involves three main steps:
- Export: The PyTorch model graph is captured using torch.export().
- Compile: The model is quantized, optimized, and partitioned for specific hardware backends, producing a .pte file.
- Execute: The lightweight C++ runtime (with a base footprint of 50KB) loads and runs the .pte file on the device.
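The three steps above can be sketched in Python as follows. This is a minimal sketch assuming recent torch and executorch releases: torch.export.export, to_edge_transform_and_lower, XnnpackPartitioner and the .buffer property of the resulting program reflect the ExecuTorch Python API as we understand it and may differ between versions. The helper name export_to_pte and the output path model.pte are hypothetical. Imports are deferred into the function so the module loads even where those packages are absent.

```python
# Hypothetical helper illustrating Export -> Compile -> save a .pte file.
# Requires torch and executorch at call time (imports are deferred below).


def export_to_pte(model, example_inputs, path="model.pte"):
    """Export an eager PyTorch model and write an ExecuTorch .pte file.

    Supported subgraphs are delegated to the XNNPACK backend; operators the
    partitioner does not claim are left for the runtime's CPU kernels.
    """
    import torch
    from executorch.exir import to_edge_transform_and_lower
    from executorch.backends.xnnpack.partition.xnnpack_partitioner import (
        XnnpackPartitioner,
    )

    # Step 1 (Export): capture the model graph ahead of time.
    exported_program = torch.export.export(model.eval(), example_inputs)

    # Step 2 (Compile): lower to the Edge dialect, partition for XNNPACK,
    # then convert to an ExecuTorch program.
    program = to_edge_transform_and_lower(
        exported_program,
        partitioner=[XnnpackPartitioner()],
    ).to_executorch()

    # Serialize the program; the on-device C++ runtime loads this file (Step 3).
    with open(path, "wb") as f:
        f.write(program.buffer)
    return path
```

For example, export_to_pte(torch.nn.Linear(4, 2), (torch.randn(1, 4),)) would write model.pte for a single linear layer. Swapping XnnpackPartitioner for another backend's partitioner is how the same exported graph would be retargeted.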
ExecuTorch targets the standardized Core ATen operator set and uses backend partitioners to delegate subgraphs to specialized hardware such as NPUs or GPUs, falling back to the CPU for operators a backend does not support.
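The delegation idea described above can be illustrated with a toy partitioner. This is a hypothetical sketch, not ExecuTorch's partitioner API: it assumes a simple linear chain of operators with simplified names (real exported graphs are DAGs of Core ATen ops), and an imaginary "npu" backend whose supported-op set is made up. Consecutive supported ops are grouped into one delegated segment; everything else falls back to the CPU.

```python
from dataclasses import dataclass

# Hypothetical set of ops supported by an imaginary accelerator backend.
NPU_SUPPORTED = {"conv2d", "relu", "linear", "add"}


@dataclass
class Segment:
    target: str  # "npu" for a delegated subgraph, "cpu" for fallback
    ops: list


def partition(ops, supported=NPU_SUPPORTED):
    """Group consecutive ops into delegated or CPU-fallback segments."""
    segments = []
    for op in ops:
        target = "npu" if op in supported else "cpu"
        if segments and segments[-1].target == target:
            # Extend the current segment while the target stays the same.
            segments[-1].ops.append(op)
        else:
            segments.append(Segment(target, [op]))
    return segments


graph = ["conv2d", "relu", "softmax", "linear", "add"]
for seg in partition(graph):
    print(seg.target, seg.ops)
```

Here softmax is unsupported, so the chain splits into two delegated segments with a CPU segment between them. Fewer, larger delegated segments are generally preferable because each boundary implies a handoff between backends.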
Who it’s for
AI developers and engineers who need to deploy LLMs, vision, speech, and multimodal models to mobile devices (Android/iOS) and embedded systems (Linux/Windows/MCU) across various hardware backends (Apple, Qualcomm, ARM, MediaTek, etc.).
Highlights
- Native PyTorch Export: Direct export from PyTorch without intermediate formats.
- Tiny Runtime: Minimal 50KB base footprint for extreme portability.
- Broad Hardware Support: 12+ open-source acceleration backends including CoreML, Vulkan, and XNNPACK.
- Production-Proven: Powers on-device AI for Meta's Instagram, WhatsApp, and Quest 3.
- Advanced Deployment Tools: Built-in support for quantization (via torchao), memory planning, and dynamic shapes.
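Memory planning, listed above, means assigning each intermediate tensor an offset in a preallocated arena ahead of time so that tensors whose lifetimes do not overlap can share the same bytes. The sketch below is a generic greedy-by-size planner written for illustration only; it is not ExecuTorch's implementation, and the Tensor type and plan function are hypothetical. Lifetimes are given as the indices of the first and last ops that touch each tensor.

```python
from dataclasses import dataclass


@dataclass
class Tensor:
    name: str
    size: int   # bytes
    start: int  # index of the op that produces the tensor
    end: int    # index of the last op that reads the tensor


def lifetimes_overlap(a, b):
    return a.start <= b.end and b.start <= a.end


def plan(tensors):
    """Greedily assign arena offsets (largest tensors first), reusing memory
    of tensors that are never live at the same time.
    Returns (offsets by name, total arena size)."""
    placed = []  # (tensor, offset) pairs already assigned
    offsets = {}
    for t in sorted(tensors, key=lambda t: t.size, reverse=True):
        # Byte ranges held by tensors live at the same time as t, by start.
        busy = sorted(
            (off, off + p.size) for p, off in placed if lifetimes_overlap(p, t)
        )
        offset = 0
        for lo, hi in busy:
            if offset + t.size <= lo:
                break  # t fits in the gap before this busy range
            offset = max(offset, hi)
        placed.append((t, offset))
        offsets[t.name] = offset
    arena = max((off + t.size for t, off in placed), default=0)
    return offsets, arena


tensors = [
    Tensor("a", 400, 0, 1),
    Tensor("b", 400, 1, 2),
    Tensor("c", 400, 2, 3),
]
offsets, arena = plan(tensors)
print(offsets, arena)  # {'a': 0, 'b': 400, 'c': 0} 800
```

Because a is dead before c is produced, c reuses a's bytes and the arena needs 800 bytes instead of the 1200 a naive one-buffer-per-tensor layout would use.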
Sources
- pytorch/executorch