FUTO Swipe: Open-Source Swipe Typing Models and Dataset
FUTO Swipe is a family of open-source models and algorithms designed to provide high-accuracy swipe typing without requiring privacy-invasive keyboard apps. By combining a layout-agnostic encoder, a language-specific decoder, and a small language model for context, FUTO Swipe achieves a top-4 fail rate of approximately 4% on its test set, matching the performance of major proprietary keyboard solutions.
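For reference, a top-k fail rate counts a swipe as a failure when the intended word is missing from the first k suggestions. The sketch below computes that metric on made-up data. The function name and the sample predictions are illustrative assumptions and are not taken from FUTO's evaluation code.

```python
def top_k_fail_rate(predictions, targets, k=4):
    """Fraction of samples whose target word is missing from the top-k candidates."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        raise ValueError("need at least one sample")
    failures = sum(
        1 for ranked, target in zip(predictions, targets)
        if target not in ranked[:k]
    )
    return failures / len(targets)


# Hypothetical ranked candidates for three swipes (best first).
sample_predictions = [
    ["hello", "help", "hell", "held", "helm"],
    ["word", "ward", "wood", "world", "work"],
    ["type", "tape", "tyre", "tile", "typo"],
]
sample_targets = ["hello", "work", "typo"]

# "work" and "typo" sit in fifth place, so two of the three swipes fail at k=4.
rate = top_k_fail_rate(sample_predictions, sample_targets, k=4)
```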
Model Architecture and Components
FUTO Swipe uses a three-tiered model architecture to balance broad applicability with high accuracy:
- Encoder Model: A universal model, agnostic to both layout and language, used for general swipe typing predictions. It is versatile but does not provide the highest level of accuracy.
- Decoder Model: A language-specific and layout-specific model that accounts for the peculiarities of a particular keyboard layout and provides the highest accuracy. Currently, a QWERTY English decoder is available.
- ContextLM: A small language model trained on text data for a single language. It filters out nonsensical word predictions based on the preceding words in a sentence to improve overall quality.
With all three models and a beam width of 300, the system achieves an error rate below 1%, excluding out-of-vocabulary cases.
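One common way to combine a swipe model with a context model is to add a weighted context log-probability to each candidate's swipe log-probability and then re-rank. The sketch below illustrates that idea with a toy bigram table standing in for ContextLM. The probabilities, the weight, and every name here are assumptions for illustration, not FUTO's actual scoring rule.

```python
import math

# Toy stand-in for a context model: P(word | previous word). Hypothetical values.
BIGRAM_PROBS = {
    ("good", "morning"): 0.30,
    ("good", "mourning"): 0.001,
    ("good", "moaning"): 0.002,
}
UNSEEN_PROB = 1e-6  # floor for pairs missing from the toy table


def rerank(candidates, previous_word, context_weight=1.0):
    """Re-rank (word, swipe_log_prob) pairs by swipe score plus weighted context score."""
    scored = []
    for word, swipe_log_prob in candidates:
        context_prob = BIGRAM_PROBS.get((previous_word, word), UNSEEN_PROB)
        total = swipe_log_prob + context_weight * math.log(context_prob)
        scored.append((word, total))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical swipe-only scores: the path alone slightly favors "mourning".
swipe_candidates = [
    ("mourning", math.log(0.40)),
    ("morning", math.log(0.35)),
    ("moaning", math.log(0.25)),
]
reranked = rerank(swipe_candidates, previous_word="good")
best_word = reranked[0][0]
```

In this toy case the swipe scores alone would pick "mourning", while the context term moves "morning" to the top after "good". Setting `context_weight` to 0 recovers the swipe-only ranking.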
Dataset and Training
To train and evaluate these models, FUTO collected a dataset of over 1 million QWERTY English swipes. The data was gathered in August 2024 through voluntary user contributions on swipe.futo.org, where participants swiped Wikipedia sentences word by word.
In March 2025, FUTO released this 1-million-swipe dataset under the MIT license on HuggingFace, providing a public resource for the development of swipe typing systems.
Performance and Resource Footprint
FUTO Swipe is designed for on-device execution with minimal latency and low hardware requirements:
- Parameter Count: The system consists of 1,364,271 active parameters and 2,494,767 total parameters. The ContextLM is the largest component at 1.5 million parameters (including 1.1 million for embeddings).
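The gap between the two parameter counts can be checked with simple arithmetic. It comes to roughly 1.1 million, which lines up with the stated embedding size. One plausible reading, not confirmed by the source, is that embedding parameters are left out of the active count because only a few embedding rows are looked up per prediction.

```python
TOTAL_PARAMS = 2_494_767   # total parameters, as stated above
ACTIVE_PARAMS = 1_364_271  # active parameters, as stated above

# Parameters counted in the total but not in the active figure.
inactive_params = TOTAL_PARAMS - ACTIVE_PARAMS  # 1,130,496
inactive_share = inactive_params / TOTAL_PARAMS

print(f"inactive parameters: {inactive_params:,}")
print(f"share of total: {inactive_share:.1%}")
```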
- Hardware Efficiency: The models are small enough to run in milliseconds on low-end devices. Training required no more than a single workstation GPU.
Implementation and Integration
To convert raw swipe paths into word predictions, FUTO provides the swipe-library, a C++ library that handles inference, decoding, and dictionary-constrained beam search.
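To give a rough idea of what dictionary-constrained beam search means, the sketch below walks a trie of valid words while keeping only the best few partial hypotheses at each step. It is written in Python for brevity and is not the swipe-library API. The per-step character probabilities, the `$` end-of-word marker, and all function names are assumptions made for illustration.

```python
import math


def build_trie(words):
    """Build a nested-dict trie; the key "$" marks the end of a complete word."""
    root = {}
    for word in words:
        node = root
        for char in word:
            node = node.setdefault(char, {})
        node["$"] = {}
    return root


def constrained_beam_search(step_log_probs, trie, beam_width=3):
    """Beam search that only follows characters forming a dictionary prefix.

    step_log_probs: list of dicts mapping a character (or "$" for end of word)
    to its log-probability at that decoding step.
    Returns finished (word, score) pairs, best first.
    """
    beams = [("", 0.0, trie)]  # (prefix, cumulative log-prob, trie node)
    finished = []
    for log_probs in step_log_probs:
        expanded = []
        for prefix, score, node in beams:
            for char, child in node.items():
                if char not in log_probs:
                    continue  # no probability given for this character
                new_score = score + log_probs[char]
                if char == "$":
                    finished.append((prefix, new_score))
                else:
                    expanded.append((prefix + char, new_score, child))
        # Keep only the best partial hypotheses.
        expanded.sort(key=lambda beam: beam[1], reverse=True)
        beams = expanded[:beam_width]
    return sorted(finished, key=lambda item: item[1], reverse=True)


# Hypothetical model output: "y" is the likeliest third letter, but "cay"
# is not in the dictionary, so the search never produces it.
dictionary = build_trie(["cat", "car", "cut", "cab"])
steps = [
    {"c": math.log(0.9), "v": math.log(0.1)},
    {"a": math.log(0.6), "u": math.log(0.4)},
    {"y": math.log(0.45), "t": math.log(0.35), "r": math.log(0.20)},
    {"$": math.log(1.0)},
]
results = constrained_beam_search(steps, dictionary, beam_width=3)
words = [word for word, _ in results]
```

A wider beam keeps more partial words alive and so recovers more candidates at higher cost; with `beam_width=1` the same input yields only a single word.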
Licensing and Availability
- Models: Available under the FUTO Model License (requires attribution to end-users).
- Inference Library: Released under the GPL.
- Dataset: Available under the MIT license on HuggingFace.