xlstm: a recurrent neural network architecture that extends LSTM to compete with Transformers in language modeling

What it solves

xLSTM (Extended Long Short-Term Memory) is designed to overcome the limitations of the original LSTM architecture to compete with Transformers and State Space Models (SSMs) in language modeling. It aims to provide a recurrent neural network (RNN) alternative that is efficient for both training and inference, particularly for large-scale language models.

How it works

The architecture introduces two primary enhancements over the original LSTM:

Exponential Gating: Uses normalization and stabilization techniques to improve how the network manages information flow.
Matrix Memory: A new memory structure that allows the network to better handle complex data patterns.

The project provides two main components: xLSTMBlockStack (a backbone for general applications) and xLSTMLMModel (a wrapper for token-based language modeling). It also includes a specialized xLSTMLarge architecture optimized for training throughput and stability, which has been used to train a 7B parameter model on 2.3 trillion tokens.

Who it’s for

AI Researchers: Those looking for alternatives to the Transformer architecture for sequence modeling.
ML Engineers: Developers implementing recurrent models for fast and efficient inference.
Developers: Users wanting to integrate an xLSTM backbone into existing projects as an alternative to Transformer blocks.

Highlights

7B Parameter Model: A large-scale recurrent LLM trained on 2.3T tokens.
Optimized Kernels: Support for Triton and CUDA kernels via the mlstm_kernels package for high-performance execution on NVIDIA and AMD GPUs.
Flexible Architecture: Supports both sLSTM and mLSTM blocks, which can be combined to balance state-tracking and memorization capabilities.
Hardware Compatibility: Native PyTorch implementations available for platforms like Apple Metal (with a community port for MLX).

xlstm: a recurrent neural network architecture that extends LSTM to compete with Transformers in language modeling

xlstm: a recurrent neural network architecture that extends LSTM to compete with Transformers in language modeling

What it solves

How it works

Who it’s for

Highlights

Sources