page-agent: a client-side GUI agent that enables natural language control of web interfaces via text-based DOM manipulation

What it solves

Page Agent embeds an AI copilot directly in a webpage so that users can control the interface in natural language. Basic in-page automation then needs no backend rewrite, browser extension, or headless browser, which makes it easier to build AI-driven user interfaces for SaaS products, ERPs, and accessibility tools.

How it works

Unlike many web agents that rely on screenshots and multimodal LLMs, Page Agent uses text-based DOM manipulation. It runs as a JavaScript library inside the webpage itself, so it can read and act on the page's elements directly. Users bring their own LLM via API, and the agent executes a command such as "Click the login button" by translating the natural language into actions on the DOM.
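The text-based loop described above can be sketched roughly as follows. This is a minimal illustration, not Page Agent's actual API: every name here (ElementLike, Action, LlmClient, describePage, parseAction, applyAction, runCommand) is hypothetical, and the JSON reply format is an assumption. ElementLike stands in for a real DOM element so the sketch also runs outside a browser.

```typescript
// Hypothetical sketch: none of these names come from Page Agent itself.

/** Minimal stand-in for an interactive DOM element. */
interface ElementLike {
  tag: string;
  label: string;
  value?: string;
  click(): void;
}

/** An action the model may request, addressing elements by index. */
type Action =
  | { type: "click"; index: number }
  | { type: "input"; index: number; text: string };

/** Any LLM backend: takes a prompt, resolves to the model's raw reply. */
type LlmClient = (prompt: string) => Promise<string>;

/** Serialize the page as indexed text lines instead of a screenshot. */
function describePage(elements: ElementLike[]): string {
  return elements.map((el, i) => `[${i}] <${el.tag}> ${el.label}`).join("\n");
}

/** Parse the model's JSON reply (assumed format) and validate it. */
function parseAction(reply: string, elementCount: number): Action {
  const raw: unknown = JSON.parse(reply);
  if (typeof raw !== "object" || raw === null) {
    throw new Error("Reply is not a JSON object");
  }
  const { type, index, text } = raw as Record<string, unknown>;
  if (
    typeof index !== "number" ||
    !Number.isInteger(index) ||
    index < 0 ||
    index >= elementCount
  ) {
    throw new Error(`Invalid element index: ${String(index)}`);
  }
  if (type === "click") return { type: "click", index };
  if (type === "input" && typeof text === "string") {
    return { type: "input", index, text };
  }
  throw new Error(`Unsupported action: ${String(type)}`);
}

/** Carry out a validated action on the target element. */
function applyAction(elements: ElementLike[], action: Action): void {
  const target = elements[action.index];
  if (!target) throw new Error(`No element at index ${action.index}`);
  if (action.type === "click") {
    target.click();
  } else {
    target.value = action.text;
  }
}

/** One agent step: describe the page, ask the model, apply its answer. */
async function runCommand(
  command: string,
  elements: ElementLike[],
  llm: LlmClient
): Promise<Action> {
  const prompt =
    `Page elements:\n${describePage(elements)}\n\n` +
    `User command: ${command}\nReply with one JSON action.`;
  const action = parseAction(await llm(prompt), elements.length);
  applyAction(elements, action);
  return action;
}
```

In a browser, the element list would presumably be built from the live DOM (for example with document.querySelectorAll) rather than from plain objects. Because LlmClient is just a function, any provider reachable over an API could be plugged in.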

Who it’s for

  • SaaS Developers: Those wanting to add an AI copilot to their product with minimal code.
  • Enterprise Software Users: People using complex admin, CRM, or ERP systems who want to collapse multi-click workflows into a single sentence.
  • Accessibility Specialists: Developers creating tools to make web apps more accessible via voice commands or screen readers.
  • Web Agent Developers: Those building multi-page agents that can extend their reach across browser tabs using an optional Chrome extension.

Highlights

  • Client-side integration: Works via a simple script tag or NPM package without requiring a headless browser or Python.
  • Text-based interaction: Operates on the DOM rather than relying on visual screenshots.
  • LLM-agnostic: Supports bringing your own LLM provider.
  • Extended capabilities: Offers an optional Chrome extension for multi-page tasks and an MCP Server (Beta) for external control.
