OpenManus-RL: an RL-based agent tuning framework for enhancing LLM reasoning and decision-making

What it solves

OpenManus-RL provides a framework for enhancing the reasoning and decision-making capabilities of LLM agents using reinforcement learning (RL). It aims to move beyond simple supervised fine-tuning by exploring how RL can optimize an agent's ability to plan, use tools, and recover from errors in complex environments.

How it works

The project integrates the verl RL framework to implement several training paradigms. It uses Supervised Fine-Tuning (SFT) for initialization, then refines agent behavior with PPO, DPO, and GRPO. To improve reasoning, it experiments with rollout strategies such as Tree-of-Thoughts (ToT), Graph-of-Thoughts (GoT), and Monte Carlo Tree Search (MCTS). Training data comes from a combined dataset of over 50,000 agent trajectories across six domains (OS, DB, Web, KG, Household, E-commerce), and the system is evaluated on benchmarks such as GAIA, AgentBench, and WebShop.
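As a rough illustration of the GRPO step mentioned above, the sketch below computes group-relative advantages: several rollouts are sampled for the same task, and each rollout's reward is normalized against the mean and standard deviation of its own group, so no learned value function is needed. This is not code from OpenManus-RL or verl. The function name `group_relative_advantages`, the `eps` constant, and the use of the population standard deviation are assumptions made for the example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each rollout's reward against its own group.

    Hypothetical helper for illustration only; eps and the choice of
    population standard deviation are assumptions, not project settings.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every rollout gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four rollouts for one task: two succeeded (reward 1.0), two failed (0.0).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Successful rollouts end up with positive advantages and failed ones with negative advantages, which is the signal the policy update then uses to favor the better trajectories within each group.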

Who it’s for

This framework is designed for AI researchers and developers working on autonomous agents, specifically those looking to integrate RL-based tuning to improve the reasoning-action chains of LLMs.

Highlights

  • Comprehensive RL Toolkit: Supports PPO, DPO, and GRPO with both format-based and outcome-based rewards.
  • Diverse Reasoning Strategies: Implements advanced rollout techniques like MCTS and Depth-First Search-based Decision Tree (DFSDT).
  • Large-scale Trajectory Dataset: Includes a combined dataset of over 50,000 agent trajectories using the ReAct framework.
  • Environment Integration: Built-in support for agent environments like ALFWorld and WebShop for online RL tuning.
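
To make the distinction between format-based and outcome-based rewards in the highlights concrete, here is a minimal sketch that combines the two. It assumes a ReAct-style response with a "Thought:" line followed by an "Action:" line, and a boolean success signal from the environment. The names `format_reward`, `outcome_reward`, `total_reward`, the regular expression, and the 0.2 weight are all hypothetical and are not taken from the OpenManus-RL codebase.

```python
import re

# Assumed ReAct-style layout: a non-empty "Thought:" followed by a
# non-empty "Action:" on a later line. The exact pattern is illustrative.
REACT_STEP = re.compile(r"Thought:\s*\S.*?\nAction:\s*\S", re.DOTALL)


def format_reward(response: str) -> float:
    """Return 1.0 if the response follows the assumed Thought/Action layout."""
    return 1.0 if REACT_STEP.search(response) else 0.0


def outcome_reward(task_succeeded: bool) -> float:
    """Return 1.0 if the environment reports that the task was completed."""
    return 1.0 if task_succeeded else 0.0


def total_reward(response: str, task_succeeded: bool,
                 format_weight: float = 0.2) -> float:
    """Weighted sum of both signals; the 0.2 weight is an arbitrary choice."""
    return (format_weight * format_reward(response)
            + (1.0 - format_weight) * outcome_reward(task_succeeded))


well_formed = "Thought: I should search for the item.\nAction: search[red shoes]"
score = total_reward(well_formed, task_succeeded=True)
```

A format reward of this kind gives the policy a dense signal early in training, when few rollouts succeed, while the outcome reward keeps the objective tied to actual task completion.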
