xlstm: a recurrent neural network architecture that extends LSTM to compete with Transformers in language modeling

What it solves

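xLSTM (Extended Long Short-Term Memory) is designed to overcome key limitations of the original LSTM architecture so that it can compete with Transformers and State Space Models (SSMs) in language modeling. Those limitations are the LSTM's inability to revise storage decisions, the limited capacity of its scalar cell state, and its lack of parallelizability during training. xLSTM aims to provide a recurrent neural network (RNN) alternative that is efficient in both training and inference, particularly for large-scale language models.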

How it works

The architecture introduces two primary enhancements over the original LSTM:

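  1. Exponential Gating: Uses exponential activations for the input gate (and optionally the forget gate), combined with normalization and stabilization techniques that keep values numerically bounded, so the network can revise earlier storage decisions.
  2. Matrix Memory: In the mLSTM variant, the scalar cell state is replaced by a matrix memory updated with a covariance (outer-product) rule, which increases storage capacity and makes training fully parallelizable.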
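Both enhancements meet in the mLSTM cell. The sketch below is a minimal pure-Python version of a single recurrent step, written from the xLSTM paper's update rules as we understand them: a matrix memory C updated by a covariance rule, a normalizer vector n, an exponential input gate, a sigmoid forget gate (the paper allows sigmoid or exponential), a sigmoid output gate, and a stabilizer state m that keeps the exponentials in floating-point range. It is not the repository's implementation, which uses fused Triton/CUDA kernels. The function names are hypothetical, keys are assumed to be already scaled, and the floor max(|n·q|, exp(-m)) is our rescaling of the paper's max(|n·q|, 1) to the stabilized states.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # rows indexed by value dim, columns by key dim


def sigmoid(x: float) -> float:
    # overflow-safe logistic function
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def log_sigmoid(x: float) -> float:
    # log(sigmoid(x)) computed without overflow
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def _update(C, n, q, k, v, o_pre, i_gate, f_gate, floor):
    # covariance update C <- f*C + i * v k^T, normalizer n <- f*n + i*k
    C_new = [[f_gate * c + i_gate * vr * kc for c, kc in zip(row, k)]
             for row, vr in zip(C, v)]
    n_new = [f_gate * nj + i_gate * kj for nj, kj in zip(n, k)]
    # read out with the query, divide by the normalizer, apply the output gate
    denom = max(abs(dot(n_new, q)), floor)
    h = [sigmoid(o) * dot(row, q) / denom for o, row in zip(o_pre, C_new)]
    return C_new, n_new, h


def mlstm_step(C, n, m, q, k, v, i_pre, f_pre, o_pre):
    """Stabilized step: C and n are stored scaled by exp(-m)."""
    log_f = log_sigmoid(f_pre)        # sigmoid forget gate, in log space
    m_new = max(log_f + m, i_pre)     # stabilizer: running max of log gate scales
    C_new, n_new, h = _update(C, n, q, k, v, o_pre,
                              i_gate=math.exp(i_pre - m_new),   # exp input gate, rescaled
                              f_gate=math.exp(log_f + m - m_new),
                              floor=math.exp(-m_new))           # floor of 1, rescaled
    return C_new, n_new, m_new, h


def mlstm_step_naive(C, n, q, k, v, i_pre, f_pre, o_pre):
    """Unstabilized reference: math.exp(i_pre) overflows for large i_pre."""
    return _update(C, n, q, k, v, o_pre,
                   i_gate=math.exp(i_pre), f_gate=sigmoid(f_pre), floor=1.0)


d = 2
C, n, m = [[0.0] * d for _ in range(d)], [0.0] * d, 0.0
C_ref, n_ref = [[0.0] * d for _ in range(d)], [0.0] * d
steps = [  # (q, k, v, i_pre, f_pre, o_pre); arbitrary illustrative inputs
    ([1.0, 0.5], [0.3, -0.2], [1.0, 2.0], 0.5, 2.0, [0.0, 1.0]),
    ([0.2, 1.0], [0.7, 0.1], [-1.0, 0.5], -1.0, 1.0, [0.5, 0.5]),
    ([1.0, 1.0], [0.5, 0.5], [0.3, 0.3], 2.0, 3.0, [1.0, -1.0]),
]
outs, refs = [], []
for q, k, v, i_pre, f_pre, o_pre in steps:
    C, n, m, h = mlstm_step(C, n, m, q, k, v, i_pre, f_pre, o_pre)
    C_ref, n_ref, h_ref = mlstm_step_naive(C_ref, n_ref, q, k, v, i_pre, f_pre, o_pre)
    outs.append(h)
    refs.append(h_ref)
```

The stabilized and naive steps produce the same hidden states, because every stored quantity is simply rescaled by exp(-m); the difference is that only the stabilized version survives very large input-gate pre-activations. The state per step is a fixed d x d matrix plus a vector, independent of sequence length, which is what makes recurrent inference cheap.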

The project provides two main components: xLSTMBlockStack (a backbone for general applications) and xLSTMLMModel (a wrapper for token-based language modeling). It also includes a specialized xLSTMLarge architecture optimized for training throughput and stability, which has been used to train a 7B parameter model on 2.3 trillion tokens.
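To make the split between the two components concrete, here is a toy sketch under stated assumptions: the backbone maps a sequence of embedding vectors to a sequence of the same shape, and the language-model wrapper adds a token-embedding lookup before it and a projection to vocabulary logits after it. TinyBackbone, TinyLM and causal_mean_block are hypothetical names, not the repository's API, and the running-mean block only stands in for a causal sequence-mixing xLSTM block; it does not implement one.

```python
import random
from typing import Callable, List

Vector = List[float]
Block = Callable[[List[Vector]], List[Vector]]


def causal_mean_block(xs: List[Vector]) -> List[Vector]:
    """Placeholder sequence mixer: output t is the mean of inputs 0..t (causal)."""
    out: List[Vector] = []
    running = [0.0] * len(xs[0])
    for t, x in enumerate(xs, start=1):
        running = [r + v for r, v in zip(running, x)]
        out.append([r / t for r in running])
    return out


class TinyBackbone:
    """Embedding sequence in, embedding sequence of the same shape out."""

    def __init__(self, blocks: List[Block]):
        self.blocks = blocks

    def __call__(self, xs: List[Vector]) -> List[Vector]:
        for block in self.blocks:
            mixed = block(xs)
            # residual connection around each block
            xs = [[a + b for a, b in zip(x, y)] for x, y in zip(xs, mixed)]
        return xs


class TinyLM:
    """Token ids in, one row of vocabulary logits per position out."""

    def __init__(self, vocab_size: int, dim: int, backbone: TinyBackbone, seed: int = 0):
        rng = random.Random(seed)
        self.embedding = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(vocab_size)]
        self.head = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(vocab_size)]
        self.backbone = backbone

    def __call__(self, token_ids: List[int]) -> List[Vector]:
        xs = [self.embedding[t] for t in token_ids]  # token ids -> embeddings
        ys = self.backbone(xs)                       # backbone preserves the shape
        # project each position onto the vocabulary
        return [[sum(w * y for w, y in zip(row, yv)) for row in self.head] for yv in ys]


model = TinyLM(vocab_size=10, dim=4, backbone=TinyBackbone([causal_mean_block] * 2))
logits = model([3, 1, 4, 1, 5])
print(len(logits), len(logits[0]))  # 5 10
```

Keeping the backbone token-agnostic is what lets it be dropped into other projects in place of a stack of Transformer blocks, while the wrapper owns everything vocabulary-specific.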

Who it’s for

  • AI Researchers: Those looking for alternatives to the Transformer architecture for sequence modeling.
  • ML Engineers: Developers implementing recurrent models for fast and efficient inference.
  • Developers: Users wanting to integrate an xLSTM backbone into existing projects as an alternative to Transformer blocks.

Highlights

  • 7B Parameter Model: A large-scale recurrent LLM trained on 2.3T tokens.
  • Optimized Kernels: Support for Triton and CUDA kernels via the mlstm_kernels package for high-performance execution on NVIDIA and AMD GPUs.
  • Flexible Architecture: Supports both sLSTM and mLSTM blocks, which can be combined to balance state-tracking and memorization capabilities.
  • Hardware Compatibility: Native PyTorch implementations are available for platforms such as Apple Metal, and a community port exists for MLX.
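Combining the two block types amounts to choosing, for each position in the stack, whether it holds an sLSTM or an mLSTM block. The sketch below assumes the layout is specified as a list of sLSTM positions, with every remaining block being mLSTM; the parameter name slstm_at follows the style of the repository's block-stack configuration, but block_layout itself is a hypothetical helper.

```python
from typing import Iterable, List


def block_layout(num_blocks: int, slstm_at: Iterable[int]) -> List[str]:
    """Assign 'sLSTM' to the listed block indices and 'mLSTM' to all others."""
    positions = set(slstm_at)
    out_of_range = sorted(p for p in positions if not 0 <= p < num_blocks)
    if out_of_range:
        raise ValueError(f"slstm_at indices out of range: {out_of_range}")
    return ["sLSTM" if i in positions else "mLSTM" for i in range(num_blocks)]


layout = block_layout(7, [1])
print(layout)
# ['mLSTM', 'sLSTM', 'mLSTM', 'mLSTM', 'mLSTM', 'mLSTM', 'mLSTM']
```

Where sLSTM blocks go is a design choice: they keep a scalar memory with memory mixing, which supports state tracking, whereas mLSTM blocks supply matrix-memory capacity and parallel training.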
