AReaL: a scalable asynchronous RL infrastructure for training large-scale reasoning and agentic models


What it solves

AReaL is a reinforcement learning (RL) infrastructure designed to bridge the gap between foundation model training and the creation of complex, agentic AI applications. It addresses the challenges of efficiency and scalability when training large-scale reasoning models and AI agents, making the process more accessible and cost-effective for researchers and developers.

How it works

AReaL uses a fully asynchronous RL training paradigm, in which rollout generation and model training run concurrently instead of waiting on each other. This allows substantially faster training than synchronous systems. It supports a wide array of RL algorithms (such as GRPO, PPO, and DPO) and integrates with multiple training backends (Megatron, PyTorch FSDP, and PyTorch Archon) and inference backends (vLLM and SGLang). Its modular design lets developers customize agentic RL for black-box applications by replacing the base_url.
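The asynchronous idea can be illustrated with a small, self-contained Python sketch. It is not AReaL's API: VersionedPolicy, rollout_worker, trainer, and max_staleness are hypothetical names chosen for illustration. Rollout workers generate continuously and tag each sample with the policy version that produced it, while the trainer consumes samples and skips any that are more than max_staleness versions behind. Dropping over-stale samples is a simplifying assumption; a production system may instead throttle generation or correct for off-policyness in the loss.

```python
import queue
import random
import threading
from dataclasses import dataclass


@dataclass
class Rollout:
    policy_version: int  # version of the weights that generated this sample
    reward: float        # stand-in for a real trajectory and its reward


class VersionedPolicy:
    """Hypothetical stand-in for model weights shared by rollout and trainer."""

    def __init__(self) -> None:
        self._version = 0
        self._lock = threading.Lock()

    @property
    def version(self) -> int:
        with self._lock:
            return self._version

    def update(self) -> None:
        # A real trainer would apply gradients and publish new weights here.
        with self._lock:
            self._version += 1


def rollout_worker(policy, buffer, stop, seed):
    """Generates continuously; never waits for a training step to finish."""
    rng = random.Random(seed)
    while not stop.is_set():
        sample = Rollout(policy.version, rng.random())
        try:
            buffer.put(sample, timeout=0.05)
        except queue.Full:
            pass  # buffer full: loop again, re-checking the stop flag


def trainer(policy, buffer, num_steps, batch_size, max_staleness):
    """Consumes rollouts, skipping any older than max_staleness versions."""
    staleness_seen = []
    for _ in range(num_steps):
        batch = []
        while len(batch) < batch_size:
            sample = buffer.get()
            staleness = policy.version - sample.policy_version
            if staleness > max_staleness:
                continue  # too off-policy for this sketch: drop it
            batch.append(sample)
            staleness_seen.append(staleness)
        policy.update()  # placeholder for the gradient step
    return staleness_seen


if __name__ == "__main__":
    policy = VersionedPolicy()
    buffer = queue.Queue(maxsize=64)
    stop = threading.Event()
    workers = [
        threading.Thread(target=rollout_worker, args=(policy, buffer, stop, i))
        for i in range(2)
    ]
    for w in workers:
        w.start()
    seen = trainer(policy, buffer, num_steps=5, batch_size=8, max_staleness=1)
    stop.set()
    for w in workers:
        w.join()
    print(f"final version={policy.version}, max staleness used={max(seen)}")
```

Bounding staleness is the central trade-off in this kind of design: a larger max_staleness keeps rollout workers busier but makes the training data more off-policy.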

Who it’s for

This project is intended for AI researchers and developers building large-scale reasoning models, multi-turn agentic workflows, and specialized agents for tasks like mathematics, coding, search, and customer service.

Highlights

  • Asynchronous Training: Achieves industry-leading speed and stability through a fully asynchronous RL paradigm.
  • Broad Algorithm Support: Includes implementations of GRPO, GSPO, PPO, DAPO, LitePPO, REINFORCE++, RLOO, and more.
  • Broad Hardware and Backend Support: Compatible with NVIDIA GPUs and Huawei Ascend NPUs, supporting Megatron, FSDP, and SGLang/vLLM.
  • Agentic Flexibility: Seamlessly integrates with agentic frameworks and supports multi-turn tool calling and reward discounting.
  • Lightweight Version: Offers AReaL-lite for rapid prototyping and algorithm-first development.
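Several of the listed algorithms differ mainly in how they turn per-response rewards into advantages. The sketch below, written independently of AReaL's code, contrasts a GRPO-style group-normalized advantage with RLOO's leave-one-out baseline for a group of responses sampled from the same prompt. The function names and the eps constant are illustrative assumptions, and real implementations vary (for example, in whether the standard-deviation normalization is applied at all).

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style: normalize each reward by its group's mean and std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some variants use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


def rloo_advantages(rewards):
    """RLOO: baseline for each response is the mean of the *other* rewards."""
    n = len(rewards)
    if n < 2:
        raise ValueError("RLOO needs at least two samples per prompt")
    total = sum(rewards)
    return [r - (total - r) / (n - 1) for r in rewards]


if __name__ == "__main__":
    # Four responses to one prompt; 1.0 = correct answer, 0.0 = wrong.
    group = [1.0, 0.0, 0.0, 1.0]
    print(group_relative_advantages(group))
    print(rloo_advantages(group))
```

Both estimators need no learned value network: the other samples in the group act as the baseline, which is one reason these methods pair well with high-throughput rollout generation.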
