FUTO Swipe: Open-Source Swipe Typing Models and Dataset

FUTO Swipe is a family of open-source models and algorithms designed to provide high-accuracy swipe typing without requiring privacy-invasive keyboard apps. By combining a layout-agnostic encoder, a language-specific decoder, and a small language model for context, FUTO Swipe achieves a top-4 fail rate of approximately 4% on its test set (for about 4% of test swipes, the intended word is missing from the top four suggestions), matching the performance of major proprietary keyboard solutions.

Model Architecture and Components

FUTO Swipe utilizes a three-tiered model architecture to balance universal applicability with high precision:

  • Encoder Model: A universal, layout-agnostic, and language-agnostic model used for general swipe typing predictions. While versatile, it does not provide the highest level of accuracy.
  • Decoder Model: A language-specific and layout-specific model that accounts for the peculiarities of a particular keyboard layout, which gives it the highest accuracy. Currently, a QWERTY English decoder is available.
  • ContextLM: A small language model trained on text data for a single language. It filters out nonsensical word predictions based on the preceding words in a sentence to improve overall quality.
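
How the ContextLM's verdict is merged with the decoder's candidates is not described above. One common way to do it is log-linear interpolation: add a weighted language-model log-probability to each candidate's decoder score and reorder. The sketch below shows that swapped-in technique; the function name rerank, the lm_weight parameter, and the candidate format are all hypothetical, and FUTO's actual filtering may work differently.

```python
import math

def rerank(candidates, context_lm_logprob, lm_weight=0.5):
    """Reorder decoder candidates using a context language model.

    candidates: list of (word, decoder_logprob) pairs (hypothetical format).
    context_lm_logprob: callable returning log P(word | preceding words).
    lm_weight: hypothetical interpolation weight that would need tuning.
    """
    scored = [
        (word, decoder_lp + lm_weight * context_lm_logprob(word))
        for word, decoder_lp in candidates
    ]
    # Highest combined log-score first; list.sort stays stable with reverse=True,
    # so ties keep the decoder's original order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in scored]


# Toy example: "in" and "on" trace nearly identical QWERTY paths, so the decoder
# alone slightly prefers the wrong word. All probabilities are made up.
decoder_out = [("in", math.log(0.55)), ("on", math.log(0.45))]
toy_lm = {"in": math.log(0.01), "on": math.log(0.20)}  # after "turn the lights"
print(rerank(decoder_out, lambda w: toy_lm[w]))  # ['on', 'in']
```

Setting lm_weight to 0 recovers the decoder's own ranking, which makes the weight a single knob for trading path fidelity against sentence plausibility.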

When all three models are used together with a beam width of 300, the error rate falls below 1%, excluding out-of-vocabulary cases (target words that are not in the vocabulary).
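
The exact evaluation code is not given here, so the following is a hedged sketch of how a top-k fail rate with optional out-of-vocabulary exclusion can be computed. The function name, the (target, ranked predictions) input format, and the vocabulary argument are assumptions for illustration.

```python
def top_k_fail_rate(examples, k=4, vocabulary=None):
    """Fraction of examples whose target word is absent from the top-k predictions.

    examples: iterable of (target_word, ranked_predictions) pairs (hypothetical format).
    vocabulary: optional set of known words; targets outside it (out-of-vocabulary
    cases) are skipped instead of being counted as failures.
    """
    total = 0
    failures = 0
    for target, predictions in examples:
        if vocabulary is not None and target not in vocabulary:
            continue  # OOV target: excluded from the metric entirely
        total += 1
        if target not in predictions[:k]:
            failures += 1
    return failures / total if total else 0.0


examples = [
    ("hello", ["hello", "jello", "hells", "help"]),
    ("world", ["would", "word", "wired", "worlds", "world"]),  # ranked 5th: a top-4 miss
    ("futo", ["fury", "fit", "fort", "foto"]),
]
print(top_k_fail_rate(examples, k=4))  # 2 of 3 missed, about 0.667
print(top_k_fail_rate(examples, k=4, vocabulary={"hello", "world"}))  # 0.5
```

Excluding OOV targets shrinks the denominator rather than counting those swipes as failures, which is why such figures should always be reported together with the exclusion rule.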

Dataset and Training

To train and evaluate these models, FUTO collected a dataset of over 1 million QWERTY English swipes. The data was gathered in August 2024 from volunteers at swipe.futo.org, where participants swiped Wikipedia sentences word by word.
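
Because each word was swiped inside a full sentence, a sample naturally pairs a touch path with its target word and the words before it (the context a model like ContextLM needs). The sketch below shows one hypothetical in-memory layout for such a sample, plus arc-length resampling, a common preprocessing step that turns paths of varying length into a fixed number of points. The SwipeSample fields and the resample helper are assumptions, not the released dataset's schema or FUTO's pipeline.

```python
import math
from dataclasses import dataclass


@dataclass
class SwipeSample:
    """One word-level swipe (hypothetical schema)."""
    target_word: str
    preceding_words: list[str]          # sentence context before the word
    points: list[tuple[float, float]]   # (x, y) touch positions along the path


def resample(points, n):
    """Return n points spaced evenly by arc length along the polyline."""
    if n < 2 or len(points) < 2:
        raise ValueError("need n >= 2 and at least 2 input points")
    # Cumulative path length at each input point.
    cumulative = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        cumulative.append(cumulative[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cumulative[-1]
    if total == 0:
        return [points[0]] * n  # the finger never moved
    out = []
    seg = 0
    for i in range(n):
        target = total * i / (n - 1)
        # Advance to the segment that contains the target distance.
        while seg < len(points) - 2 and cumulative[seg + 1] < target:
            seg += 1
        seg_len = cumulative[seg + 1] - cumulative[seg]
        t = 0.0 if seg_len == 0 else (target - cumulative[seg]) / seg_len
        (x0, y0), (x1, y1) = points[seg], points[seg + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out


sample = SwipeSample("hello", ["say"], [(0, 0), (1, 0), (1, 1)])
print(resample(sample.points, 5))
# [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (1.0, 0.5), (1.0, 1.0)]
```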

In March 2025, FUTO released this 1-million-swipe dataset under the MIT license on HuggingFace, providing a public resource for the development of swipe typing systems.

Performance and Resource Footprint

FUTO Swipe is designed for on-device execution with minimal latency and low hardware requirements:

  • Parameter Count: The system consists of 1,364,271 active parameters and 2,494,767 total parameters. The ContextLM is the largest component at 1.5 million parameters (including 1.1 million for embeddings).
  • Hardware Efficiency: The models are small enough to run in milliseconds on low-end devices. The training process was highly efficient, requiring no more than a single workstation GPU.
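
The two parameter counts differ by 1,130,496, which lines up with the roughly 1.1 million ContextLM embedding parameters. One plausible reading (an inference, not stated above) is that embedding-table lookups are excluded from the "active" count because each prediction reads only a few rows of the table. The same arithmetic gives a rough weight-storage estimate; the storage precision is not stated, so the byte widths below are assumptions.

```python
ACTIVE_PARAMS = 1_364_271
TOTAL_PARAMS = 2_494_767

# Parameters counted in the total but not as active.
non_active = TOTAL_PARAMS - ACTIVE_PARAMS
print(non_active)  # 1130496, close to the ~1.1M ContextLM embedding parameters


def weight_megabytes(n_params: int, bytes_per_param: int) -> float:
    """Raw weight storage in megabytes (10**6 bytes), ignoring file overhead."""
    return n_params * bytes_per_param / 1e6


# Assumed precisions: the actual storage format is not specified.
for dtype, width in [("float32", 4), ("float16", 2), ("int8", 1)]:
    print(f"{dtype}: {weight_megabytes(TOTAL_PARAMS, width):.1f} MB")
# float32: 10.0 MB
# float16: 5.0 MB
# int8: 2.5 MB
```

Even at full 32-bit precision the weights fit in about 10 MB, which is consistent with the claim that the models run comfortably on low-end devices.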

Implementation and Integration

To convert raw swipe paths into word predictions, FUTO provides swipe-library, a C++ library that handles model inference, decoding, and dictionary-constrained beam search.
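
swipe-library's own interface is not described here, so the following Python sketch only illustrates the general technique of dictionary-constrained beam search. Every partial hypothesis must remain a prefix of some dictionary word, which a trie tracks, and only complete words can finish. The next_char_logprobs callable is a hypothetical stand-in for the decoder model, the "$" end-of-word marker is an assumption, and the default beam_width=300 simply mirrors the figure quoted earlier.

```python
import heapq

END = "$"  # hypothetical end-of-word marker


def build_trie(words):
    """Nested-dict trie; an END key means the path so far spells a full word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = {}
    return root


def constrained_beam_search(next_char_logprobs, dictionary,
                            beam_width=300, max_len=20, top_n=4):
    """Character-level beam search that only ever keeps dictionary prefixes.

    next_char_logprobs: maps a prefix string to {character: log-probability},
    where END ends the word. Returns up to top_n dictionary words, best first.
    """
    trie = build_trie(dictionary)
    beams = [(0.0, "", trie)]  # (cumulative log-prob, prefix, trie node)
    finished = []
    for _ in range(max_len + 1):
        candidates = []
        for score, prefix, node in beams:
            logprobs = next_char_logprobs(prefix)
            # Only characters that keep the prefix inside the trie are tried.
            for ch, child in node.items():
                if ch not in logprobs:
                    continue
                if ch == END:
                    finished.append((score + logprobs[ch], prefix))
                else:
                    candidates.append((score + logprobs[ch], prefix + ch, child))
        if not candidates:
            break
        # Prune to the beam_width highest-scoring partial words.
        beams = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
    finished.sort(key=lambda f: f[0], reverse=True)
    return [word for _, word in finished[:top_n]]


def toy_model(prefix):
    """Made-up per-position distributions, as if the path passed near these keys."""
    table = [
        {"h": -0.1, "j": -2.5},
        {"e": -0.2, "r": -1.8},
        {"l": -0.1, "k": -2.5},
        {"l": -0.3, "p": -1.5},
        {"o": -0.2, "p": -1.9},
    ]
    step = len(prefix)
    if step < len(table):
        return {**table[step], END: -3.0}
    return {END: -0.05}


words = ["hello", "help", "hell", "jello", "herb"]
print(constrained_beam_search(toy_model, words))  # ['hello', 'jello', 'hell', 'help']
```

"herb" never appears because the toy model gives "r" no probability after "he". More generally, a search constrained this way can only return dictionary words, which is why out-of-vocabulary targets are reported separately from the error rate.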

Licensing and Availability

  • Models: Available under the FUTO Model License (requires attribution to end-users).
  • Inference Library: Released under the GPL license.
  • Dataset: Available under the MIT license on HuggingFace.
