maxtext: a high-performance JAX reference implementation for scalable LLM pre-training and post-training
What it solves
MaxText provides a high-performance, scalable reference implementation for training Large Language Models (LLMs). It targets high Model FLOPs Utilization (MFU) and throughput (tokens/second) on clusters ranging from a single host to tens of thousands of chips, without requiring complex manual optimization.
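To make the MFU metric concrete, here is a minimal sketch of how it is commonly estimated for a dense transformer. It assumes the widely used approximation of about 6 FLOPs per parameter per token (forward plus backward pass, ignoring attention FLOPs). The function name, the parameter names, and the hardware numbers in the usage line are hypothetical illustrations, not MaxText's own accounting.

```python
def model_flops_utilization(num_params, tokens_per_second, num_chips, peak_flops_per_chip):
    """Estimate MFU as achieved model FLOPs/s divided by peak hardware FLOPs/s.

    Assumption: a dense transformer costs roughly 6 FLOPs per parameter per
    token for one training step (forward + backward), attention excluded.
    """
    achieved_flops_per_second = 6.0 * num_params * tokens_per_second
    peak_flops_per_second = num_chips * peak_flops_per_chip
    return achieved_flops_per_second / peak_flops_per_second


# Hypothetical run: 8B parameters, 1M tokens/s, 256 chips at 4e14 peak FLOPs/s each.
mfu = model_flops_utilization(8e9, 1e6, 256, 4e14)
print(f"MFU: {mfu:.1%}")
```

Because throughput appears in the numerator, raising tokens/second on fixed hardware raises MFU in direct proportion, which is why the two metrics are usually reported together.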
How it works
Written in pure Python and JAX, MaxText targets Google Cloud TPUs and GPUs. It leverages the JAX AI stack, including Flax for neural networks, Tunix for post-training, Orbax for checkpointing, Optax for optimization, and Grain for data loading. The library provides a set of high-performance model architectures (such as Gemma, Llama, DeepSeek, and Qwen) and supports both pre-training from scratch and scalable post-training using techniques like Supervised Fine-Tuning (SFT) and Reinforcement Learning (GRPO and GSPO).
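As an illustration of the idea behind GRPO mentioned above, the sketch below computes group-relative advantages: each sampled response is scored against the other responses to the same prompt rather than against a learned value function. This is a simplified rendering of the published GRPO formulation, not the Tunix or MaxText implementation. The function name, the `eps` parameter, and the choice of population standard deviation are assumptions made for the example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of responses to the same prompt.

    Assumption: advantage = (reward - group mean) / (group std + eps),
    using the population standard deviation; eps avoids division by zero
    when every response in the group receives the same reward.
    """
    group_mean = mean(rewards)
    group_std = pstdev(rewards)
    return [(r - group_mean) / (group_std + eps) for r in rewards]


# Hypothetical rewards for four sampled responses to a single prompt.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Responses scoring above the group mean receive positive advantages and are reinforced, while those below the mean are discouraged. A group with identical rewards yields all-zero advantages and contributes no learning signal.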
Who it’s for
MaxText is intended for AI researchers and production engineers building ambitious LLM projects who need a scalable, "optimization-free" framework to experiment with model design, pre-train large-scale models, or post-train existing open-source or proprietary models.
Highlights
- Broad Model Support: Includes reference implementations for Gemma 4, Llama 4, DeepSeek V3, Qwen 3.5, and Kimi K2, among others.
- High Scalability: Supports pre-training across tens of thousands of chips.
- Post-Training Framework: Integrated support for SFT and RL (GRPO/GSPO) via Tunix, with vLLM used for sampling in RL workflows.
- Hardware Optimization: Specifically optimized for Google Cloud TPUs and GPUs using the XLA compiler.
- Multimodal Capabilities: Supports multi-modal training for Gemma 3, Gemma 4, and Llama 4 VLMs.
Sources
- AI-Hypercomputer/maxtext