mobilerun: an open-source framework for controlling Android and iOS devices with LLM agents

What it solves

Mobilerun lets users control Android and iOS devices with natural-language commands. Instead of hand-scripting mobile interactions, users hand the task to LLM agents that navigate apps, perform multi-step workflows, and extract data from mobile interfaces.

How it works

The framework uses a "Portal" app installed on the device to connect the LLM to the mobile OS. It combines the accessibility tree (structured UI state) with screenshots (visual context) so that agents can tap, swipe, and type. Users can run it via a CLI, a Python API, or Docker, and can choose among several LLM providers (such as OpenAI, Anthropic, and Gemini). For complex tasks, a "reasoning mode" uses a manager-executor planning architecture.
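
The manager-executor idea described above can be illustrated with a small, self-contained sketch. This is not mobilerun's API: every name here (UiState, Action, FakeDevice, manager, executor, run) is hypothetical, and the LLM calls are replaced with simple string rules so the example runs without a device or an API key. It only shows the general shape of the loop: a manager splits the goal into subgoals, and an executor looks at the current UI state and picks one action per subgoal.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class UiState:
    """Hypothetical observation: accessibility-tree nodes plus an optional screenshot."""
    nodes: List[dict]  # e.g. {"id": 1, "text": "Settings", "clickable": True}
    screenshot: Optional[bytes] = None


@dataclass
class Action:
    """Hypothetical device action such as tap, swipe, or type."""
    kind: str
    args: dict = field(default_factory=dict)


class FakeDevice:
    """Stand-in for a real device bridge; records actions instead of performing them."""

    def __init__(self) -> None:
        self.log: List[Action] = []

    def observe(self) -> UiState:
        # A real bridge would return the live accessibility tree and a screenshot.
        return UiState(nodes=[{"id": 1, "text": "Settings", "clickable": True}])

    def perform(self, action: Action) -> None:
        self.log.append(action)


def manager(goal: str) -> List[str]:
    """Split a goal into subgoals. A real manager would ask an LLM to plan."""
    return [step.strip() for step in goal.split(", then ") if step.strip()]


def executor(subgoal: str, state: UiState) -> Action:
    """Pick one action for a subgoal. A real executor would ask an LLM to choose."""
    for node in state.nodes:
        if node["clickable"] and node["text"].lower() in subgoal.lower():
            return Action("tap", {"node_id": node["id"]})
    # No matching clickable node: fall back to typing the subgoal text.
    return Action("type", {"text": subgoal})


def run(goal: str, device: FakeDevice) -> List[Action]:
    """Plan once, then observe and act once per subgoal."""
    for subgoal in manager(goal):
        state = device.observe()
        device.perform(executor(subgoal, state))
    return device.log


if __name__ == "__main__":
    for action in run("open Settings, then search wifi", FakeDevice()):
        print(action)
```

Separating planning from action selection keeps each model call small: the planner never needs the full UI state, and the action picker only needs the current subgoal and screen.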

Who it’s for

QA Engineers: For mobile app testing and regression checks.
Developers: To build custom mobile automation workflows via Python.
Automation Enthusiasts: To automate repetitive mobile tasks or extract data from native apps.
Non-technical Users: To execute guided mobile workflows through simple prompts.

Highlights

Cross-Platform: Supports both Android and iOS devices.
Multimodal Input: Combines accessibility trees with vision-based screenshot analysis.
Flexible Execution: Offers a CLI for quick tasks and a Python API for deeper integration.
Reasoning Mode: Enables complex, multi-step planning for sophisticated automations.
Broad Model Support: Compatible with OpenAI, Anthropic, Gemini, Ollama, DeepSeek, and OpenRouter.
