xlstm: a recurrent neural network architecture that extends LSTM to compete with Transformers in language modeling
What it solves
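xLSTM (Extended Long Short-Term Memory) is designed to overcome the main limitations of the original LSTM architecture so that a recurrent model can compete with Transformers and State Space Models (SSMs) in language modeling. Those limitations are that it cannot revise storage decisions once made, its scalar cell state limits storage capacity, and its hidden-to-hidden recurrence prevents parallel training. xLSTM aims to provide a recurrent neural network (RNN) alternative that is efficient for both training and inference, particularly for large-scale language models.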
How it works
The architecture introduces two primary enhancements over the original LSTM:
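- Exponential Gating: The input gate (and optionally the forget gate) uses an exponential activation instead of a sigmoid. A normalizer state and a stabilizer state keep values numerically bounded. This lets the network revise earlier storage decisions when more relevant input arrives.
- Matrix Memory: In the mLSTM variant, a matrix memory updated with a covariance (outer-product) rule replaces the scalar cell state. This increases storage capacity. Because the mLSTM's gates depend only on the current input and not on the previous hidden state, it can be computed in parallel over the sequence.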
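Both enhancements meet in the mLSTM recurrence. The following is a minimal pure-Python sketch of one recurrent mLSTM step for a single head. It follows the update equations as given in the xLSTM paper, as we understand them: C_t = f_t C_{t-1} + i_t v_t k_t^T, n_t = f_t n_{t-1} + i_t k_t, and h_t = o_t * C_t q_t / max(|n_t^T q_t|, 1).

The input gate here is exponential and the forget gate is a sigmoid, one of the options the paper allows. The stabilizer m_t rescales both gates so that exp() cannot overflow, and the normalizer's lower bound becomes exp(-m_t) to compensate. The function name, the argument layout, and the omission of the q/k/v input projections are assumptions for illustration. This is not the xlstm package's API.

```python
import math

def log_sigmoid(x):
    # Numerically stable log(sigmoid(x))
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))

def mlstm_step(state, q, k, v, i_pre, f_pre, o_pre):
    """One recurrent mLSTM step for a single head (hypothetical sketch).

    state: (C, n, m) with C a d_v x d_k matrix (list of lists),
           n a length-d_k normalizer vector, m a scalar stabilizer.
    q, k, v: query/key/value vectors for this time step (projections omitted).
    i_pre, f_pre, o_pre: scalar gate pre-activations.
    """
    C, n, m_prev = state
    d_k, d_v = len(k), len(v)
    k = [kj / math.sqrt(d_k) for kj in k]  # scale keys by 1/sqrt(d_k)

    log_i = i_pre                     # exponential input gate: i = exp(i_pre)
    log_f = log_sigmoid(f_pre)        # sigmoid forget gate (assumed variant)
    m = max(log_f + m_prev, log_i)    # stabilizer state
    i_s = math.exp(log_i - m)         # stabilized gates never overflow
    f_s = math.exp(log_f + m_prev - m)

    # Matrix memory, covariance (outer-product) update: C <- f*C + i * v k^T
    C = [[f_s * C[a][b] + i_s * v[a] * k[b] for b in range(d_k)] for a in range(d_v)]
    # Normalizer state: n <- f*n + i*k
    n = [f_s * n[b] + i_s * k[b] for b in range(d_k)]

    # Retrieve with the query; exp(-m) replaces the bound 1 after rescaling
    Cq = [sum(C[a][b] * q[b] for b in range(d_k)) for a in range(d_v)]
    denom = max(abs(sum(n[b] * q[b] for b in range(d_k))), math.exp(-m))
    o = 1.0 / (1.0 + math.exp(-o_pre))  # sigmoid output gate
    h = [o * x / denom for x in Cq]
    return h, (C, n, m)

# Usage: d_k = d_v = 2, zero-initialised memory
state = ([[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0], 0.0)
q, k, v = [1.0, 0.0], [1.0, 0.0], [2.0, 3.0]
h1, state = mlstm_step(state, q, k, v, i_pre=0.0, f_pre=5.0, o_pre=50.0)
# A very large input-gate pre-activation is handled by the stabilizer
h2, state = mlstm_step(state, q, k, v, i_pre=1000.0, f_pre=5.0, o_pre=50.0)
```

A naive version that computed exp(i_pre) directly would raise OverflowError on the second call. With the stabilizer, the state stays finite and the output is dominated by the newest value vector.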
The project provides two main components: xLSTMBlockStack (a backbone for general applications) and xLSTMLMModel (a wrapper for token-based language modeling). It also includes a specialized xLSTMLarge architecture optimized for training throughput and stability, which has been used to train a 7B-parameter model on 2.3 trillion tokens.
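The following is a hypothetical, library-free sketch of what a token-level wrapper of this kind generally does. It looks up an embedding per token, runs the backbone over the sequence, and projects each output vector to vocabulary logits for next-token prediction. The names `make_token_lm` and `log_softmax`, the Gaussian initialization, and the identity stand-in for the backbone are illustrative assumptions. They do not describe the xlstm package's actual xLSTMLMModel API.

```python
import math
import random

def make_token_lm(vocab_size, dim, backbone, seed=0):
    """Hypothetical token-level LM wrapper (illustrative names only).

    backbone: callable mapping a list of `dim`-sized vectors to a list of
    `dim`-sized vectors of the same length, e.g. a stack of xLSTM blocks.
    """
    rng = random.Random(seed)
    # Token embedding table: one vector per vocabulary entry
    emb = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(vocab_size)]
    # Output projection ("LM head"): one weight row per vocabulary entry
    head = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(vocab_size)]

    def forward(token_ids):
        x = [emb[t] for t in token_ids]  # (seq_len, dim)
        y = backbone(x)                  # (seq_len, dim)
        # Logits: (seq_len, vocab_size), one score per candidate next token
        return [[sum(w * u for w, u in zip(row, y_t)) for row in head] for y_t in y]

    return forward

def log_softmax(logits):
    # Subtract the max before exponentiating for numerical stability
    mx = max(logits)
    lse = mx + math.log(sum(math.exp(z - mx) for z in logits))
    return [z - lse for z in logits]

# Usage: identity function as a stand-in backbone
lm = make_token_lm(vocab_size=10, dim=4, backbone=lambda xs: xs)
tokens = [1, 5, 2, 7]
logits = lm(tokens)
# Next-token cross-entropy: position t is scored on predicting tokens[t + 1]
n_pred = len(tokens) - 1
loss = -sum(log_softmax(logits[t])[tokens[t + 1]] for t in range(n_pred)) / n_pred
```

With near-zero initial weights, the loss starts close to log(10), which is the cross-entropy of a uniform guess over 10 tokens. Swapping the identity function for a real sequence model changes only the `backbone` argument, which mirrors the backbone/wrapper split described above.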
Who it’s for
- AI Researchers: Those looking for alternatives to the Transformer architecture for sequence modeling.
- ML Engineers: Developers implementing recurrent models for fast and efficient inference.
- Developers: Users wanting to integrate an xLSTM backbone into existing projects as an alternative to Transformer blocks.
Highlights
- 7B Parameter Model: A large-scale recurrent LLM trained on 2.3T tokens.
- Optimized Kernels: Support for Triton and CUDA kernels via the mlstm_kernels package for high-performance execution on NVIDIA and AMD GPUs.
- Flexible Architecture: Supports both sLSTM and mLSTM blocks, which can be combined to balance state-tracking and memorization capabilities.
- Hardware Compatibility: Native PyTorch implementations are available for platforms such as Apple Metal, and a community port exists for MLX.
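The sLSTM side of the sLSTM/mLSTM trade-off comes from how the sLSTM feeds the previous hidden state h_{t-1} into every gate through recurrent weights R ("memory mixing"). The xLSTM paper credits this for the sLSTM's state-tracking ability, and it is also why the sLSTM must run step by step.

The following is a minimal pure-Python sketch under stated assumptions:
- a single head with a full recurrent matrix per gate (the paper uses block-diagonal, per-head matrices);
- a tanh cell input, an exponential input gate, and a sigmoid forget gate;
- the same max-based stabilizer as in the mLSTM.

The function name and the dict-based argument layout are hypothetical.

```python
import math

def log_sigmoid(x):
    # Numerically stable log(sigmoid(x))
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))

def slstm_step(state, x_proj, R):
    """One sLSTM step for d hidden units (hypothetical sketch, single head).

    state:  (h, c, n, m), each a list of length d.
    x_proj: dict mapping gate name 'z', 'i', 'f', 'o' to W_g x_t + b_g.
    R:      dict mapping the same gate names to d x d recurrent matrices.
    """
    h_prev, c_prev, n_prev, m_prev = state
    d = len(h_prev)

    def pre(g):
        # Memory mixing: every gate sees the whole previous hidden state
        return [x_proj[g][j] + sum(R[g][j][k] * h_prev[k] for k in range(d))
                for j in range(d)]

    z_pre, i_pre, f_pre, o_pre = pre("z"), pre("i"), pre("f"), pre("o")
    h, c, n, m = [], [], [], []
    for j in range(d):
        log_i = i_pre[j]                          # exponential input gate
        log_f = log_sigmoid(f_pre[j])             # sigmoid forget gate (assumed)
        m_j = max(log_f + m_prev[j], log_i)       # stabilizer state
        i_s = math.exp(log_i - m_j)
        f_s = math.exp(log_f + m_prev[j] - m_j)
        c_j = f_s * c_prev[j] + i_s * math.tanh(z_pre[j])  # cell state
        n_j = f_s * n_prev[j] + i_s                        # normalizer state
        o_j = 1.0 / (1.0 + math.exp(-o_pre[j]))            # sigmoid output gate
        h.append(o_j * c_j / n_j)
        c.append(c_j)
        n.append(n_j)
        m.append(m_j)
    return h, (h, c, n, m)

# Usage: two units, zero-initialised state
d = 2
state = ([0.0] * d, [0.0] * d, [0.0] * d, [0.0] * d)
x_proj = {"z": [0.5, -1.0], "i": [0.0, 1000.0], "f": [3.0, 3.0], "o": [50.0, 50.0]}
R_none = {g: [[0.0] * d for _ in range(d)] for g in "zifo"}
R_mix = {g: [[0.0, 0.5], [0.5, 0.0]] for g in "zifo"}
h1, s1 = slstm_step(state, x_proj, R_none)
h2_none, _ = slstm_step(s1, x_proj, R_none)
h2_mix, _ = slstm_step(s1, x_proj, R_mix)  # gates now depend on h1
```

Because c_t / n_t is a positively weighted average of past tanh inputs, |h_t| never exceeds 1. Switching from R_none to R_mix changes unit 0's second output, because its gates now read unit 1's previous hidden state. That cross-unit, cross-time dependency is what prevents this recurrence from being parallelized over the sequence the way the mLSTM's can.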
Sources
- NX-AI/xlstm