mobilerun: an open-source framework for controlling Android and iOS devices with LLM agents
What it solves
Mobilerun lets you control Android and iOS devices with natural-language commands. Instead of hand-scripting each mobile interaction, you describe the task and an LLM agent navigates apps, carries out multi-step workflows, and extracts data from mobile interfaces.
How it works
A companion "Portal" app installed on the device acts as the bridge between the LLM and the mobile OS. The agent perceives the screen through two channels, the accessibility tree (structured UI state) and screenshots (visual context), and acts on it by tapping, swiping, and typing. You can drive the system from a CLI, a Python API, or Docker, and choose from several LLM providers (such as OpenAI, Anthropic, and Gemini). For complex tasks, a "reasoning mode" uses a manager-executor planning architecture, in which a manager plans the task and an executor carries out the individual steps.
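The perception-and-action cycle described above can be sketched in Python. This is a minimal illustration under stated assumptions, not mobilerun's Python API: `UINode`, `build_observation`, `to_action`, and the `tap 2` / `type hello` reply format are all hypothetical, and no device or Portal app is contacted. It shows one plausible way to flatten an accessibility tree into indexed text, pair it with screenshot bytes for a multimodal prompt, and map the model's reply to a tap, swipe, or type action.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union

# Hypothetical types for illustration only; these are NOT mobilerun's API.

@dataclass
class UINode:
    """One element from an accessibility tree (hypothetical shape)."""
    index: int
    text: str
    bounds: tuple  # (left, top, right, bottom) in screen pixels
    children: List["UINode"] = field(default_factory=list)

@dataclass
class Tap:
    x: int
    y: int

@dataclass
class Swipe:
    x1: int
    y1: int
    x2: int
    y2: int

@dataclass
class TypeText:
    text: str

Action = Union[Tap, Swipe, TypeText]

def flatten(node: UINode) -> List[UINode]:
    """Depth-first flattening so every element can be listed for the model."""
    out = [node]
    for child in node.children:
        out.extend(flatten(child))
    return out

def describe(root: UINode) -> str:
    """Render the UI state as indexed text lines the LLM can refer to."""
    return "\n".join(f"[{n.index}] {n.text!r} {n.bounds}" for n in flatten(root))

def build_observation(root: UINode, screenshot_png: Optional[bytes]) -> dict:
    """Pair structured UI state with the raw screenshot for a multimodal prompt."""
    return {"ui_text": describe(root), "image": screenshot_png}

def to_action(reply: str, root: UINode) -> Action:
    """Parse a hypothetical reply like 'tap 2', 'type hello', or 'swipe x1 y1 x2 y2'."""
    verb, _, arg = reply.strip().partition(" ")
    if verb == "tap":
        node = next((n for n in flatten(root) if n.index == int(arg)), None)
        if node is None:
            raise ValueError(f"no element with index {arg}")
        left, top, right, bottom = node.bounds
        return Tap((left + right) // 2, (top + bottom) // 2)  # tap the element's center
    if verb == "type":
        return TypeText(arg)
    if verb == "swipe":
        x1, y1, x2, y2 = map(int, arg.split())
        return Swipe(x1, y1, x2, y2)
    raise ValueError(f"unknown action: {reply!r}")

# Usage with a hand-built tree standing in for real device state.
screen = UINode(0, "Settings", (0, 0, 1080, 2400), [
    UINode(1, "Wi-Fi", (0, 200, 1080, 320)),
    UINode(2, "Bluetooth", (0, 320, 1080, 440)),
])
obs = build_observation(screen, screenshot_png=None)
print(obs["ui_text"])
print(to_action("tap 2", screen))  # Tap(x=540, y=380)
```

Indexing elements lets the model name a target by number rather than guess pixel coordinates, while the screenshot remains available for elements the tree describes poorly. Both choices are illustrative assumptions, not a description of how Portal encodes the screen.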
Who it’s for
- QA Engineers: To run mobile app tests and regression checks.
- Developers: To build custom mobile automation workflows via Python.
- Automation Enthusiasts: To automate repetitive mobile tasks or extract data from native apps.
- Non-technical Users: To execute guided mobile workflows through simple prompts.
Highlights
- Cross-Platform: Supports both Android and iOS devices.
- Multimodal Input: Combines accessibility trees with vision-based screenshot analysis.
- Flexible Execution: Offers a CLI for quick tasks and a Python API for deeper integration.
- Reasoning Mode: Uses manager-executor planning for complex, multi-step automations.
- Broad Model Support: Compatible with OpenAI, Anthropic, Gemini, Ollama, DeepSeek, and OpenRouter.
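The reasoning mode's manager-executor split can be illustrated with a small planning loop. This is a hypothetical sketch, not mobilerun's implementation: `run_with_reasoning`, `toy_planner`, and `flaky_executor` are invented names, and the planner and executor are plain Python stubs standing in for what would be LLM calls in a real agent. The manager produces a list of subgoals, the executor attempts each one, and a failure triggers a bounded re-plan that takes already-completed steps into account.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical sketch of a manager-executor loop; not mobilerun's implementation.
# In a real agent, `plan` (manager) and `execute` (executor) would be backed by LLM calls.

@dataclass
class RunLog:
    completed: List[str] = field(default_factory=list)
    failed: List[str] = field(default_factory=list)

def run_with_reasoning(
    goal: str,
    plan: Callable[[str, List[str]], List[str]],
    execute: Callable[[str], bool],
    max_replans: int = 2,
) -> RunLog:
    """Manager plans subgoals; executor runs them; manager re-plans after a failure."""
    log = RunLog()
    replans = 0
    subgoals = plan(goal, log.completed)
    while subgoals:
        step = subgoals.pop(0)
        if execute(step):
            log.completed.append(step)
            continue
        log.failed.append(step)
        if replans >= max_replans:
            break  # give up instead of re-planning forever
        replans += 1
        subgoals = plan(goal, log.completed)  # fresh plan given progress so far

    return log

# Usage with stub components.
def toy_planner(goal: str, done: List[str]) -> List[str]:
    steps = ["open Settings", "tap Wi-Fi", "toggle Wi-Fi on"]
    return [s for s in steps if s not in done]  # skip steps already completed

attempts = {"tap Wi-Fi": 0}

def flaky_executor(step: str) -> bool:
    # Simulate "tap Wi-Fi" failing once, e.g. because the screen was still loading.
    if step == "tap Wi-Fi":
        attempts[step] += 1
        return attempts[step] > 1
    return True

log = run_with_reasoning("turn on Wi-Fi", toy_planner, flaky_executor)
print(log.completed)  # ['open Settings', 'tap Wi-Fi', 'toggle Wi-Fi on']
print(log.failed)     # ['tap Wi-Fi']
```

Capping re-plans keeps a stuck task from looping indefinitely; the cap of two is an arbitrary choice for the example.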
Sources
- droidrun/mobilerun