Generative-Media-Skills: a multimodal toolset for AI agents to generate and edit professional-grade media via a schema-driven architecture
What it solves
This project provides a comprehensive toolset for AI agents (such as Claude Code, Cursor, and Gemini CLI) to generate and edit professional-grade images, videos, and audio. It bridges the gap between high-level creative intent and the technical API calls required to produce high-quality multimodal media using a wide array of AI models.
How it works
The system is built on a Core/Library architecture powered by the muapi-cli:
- Core Primitives: Thin wrappers around the CLI for raw API access, handling file uploads, basic editing, and authentication.
- Expert Library: Domain-specific skills (e.g., Cinema Director, UI Designer, Logo Creator) that translate creative goals into technical directives.
- Recipe Pack: Over 40 LLM-orchestrated workflow recipes (e.g., converting a photo to a 3D action figure or creating a cinematic product ad) that agents can follow as step-by-step instructions.
- MCP Server: A Model Context Protocol server that exposes 19 structured tools directly to compatible agents, removing the need for shell scripts.
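The schema-driven idea behind the MCP Server can be made concrete with a minimal Python sketch of how one tool might be described and how its arguments could be checked before a call. MCP servers advertise each tool with a name, a description, and a JSON Schema under inputSchema. The tool name generate_image, its fields (prompt, model, num_images), and the helper validate_arguments are hypothetical and do not reflect the project's actual 19 tools. The validator only checks required keys, unknown keys, and top-level types, so a production server would more likely rely on a full JSON Schema library.

```python
# Hypothetical MCP-style tool descriptor: MCP tools are advertised with a
# name, a description, and a JSON Schema under "inputSchema". The tool name
# and fields below are illustrative, not the project's real tools.
GENERATE_IMAGE_TOOL = {
    "name": "generate_image",
    "description": "Generate an image from a text prompt.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "prompt": {"type": "string"},
            "model": {"type": "string"},
            "num_images": {"type": "integer"},
        },
        "required": ["prompt"],
        "additionalProperties": False,  # reject arguments the schema does not declare
    },
}

# Map JSON Schema primitive type names to Python types.
_JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def validate_arguments(tool: dict, arguments: dict) -> list[str]:
    """Return a list of problems; an empty list means the call looks valid.

    Hypothetical helper: checks only 'required', undeclared keys, and
    top-level property types, not the full JSON Schema vocabulary.
    """
    schema = tool["inputSchema"]
    errors = []
    for key in schema.get("required", []):
        if key not in arguments:
            errors.append(f"missing required argument: {key}")
    for key, value in arguments.items():
        spec = schema["properties"].get(key)
        if spec is None:
            if schema.get("additionalProperties", True) is False:
                errors.append(f"unknown argument: {key}")
            continue
        expected = _JSON_TYPES[spec["type"]]
        # bool is a subclass of int in Python, so reject it for numeric types.
        if isinstance(value, bool) and spec["type"] in ("integer", "number"):
            errors.append(f"{key} should be {spec['type']}")
        elif not isinstance(value, expected):
            errors.append(f"{key} should be {spec['type']}")
    return errors
```

For example, validate_arguments(GENERATE_IMAGE_TOOL, {"num_images": "two"}) reports both the missing prompt and the wrong type for num_images, which an agent can use to correct its call before anything is sent to a model.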
Who it’s for
Developers and AI agent users who want to integrate professional multimodal generation capabilities into their agentic workflows, specifically those using MCP-compatible tools like Claude Desktop or Cursor.
Highlights
- Agent-Native Design: Uses structured JSON outputs and semantic exit codes for seamless pipeline integration.
- Extensive Model Support: Access to 100+ models including Midjourney v7, Flux, Kling 3.0, and Veo3.
- Direct Media Display: Includes a --view flag to automatically open generated media in the system viewer.
- Specialized Workflows: Dedicated pipelines for AI clipping (long video to vertical shorts), fashion try-on, and architectural rendering.
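The agent-native design (structured JSON output plus semantic exit codes) can be illustrated with a small Python sketch of how an agent-side script might call a CLI that prints JSON and then branch on its exit code. The exit-code table EXIT_MEANINGS, the CliResult type, and the helper run_json_command are hypothetical. The real muapi-cli subcommands, flags, and exit-code meanings are not documented here, so the demo runs a stand-in Python command instead.

```python
import json
import subprocess
import sys
from dataclasses import dataclass
from typing import Any, Optional

# Hypothetical exit-code meanings for illustration only; the real CLI's
# mapping should be taken from its own documentation.
EXIT_MEANINGS = {0: "ok", 1: "error", 2: "usage_error"}


@dataclass
class CliResult:
    status: str              # semantic label derived from the exit code
    returncode: int
    payload: Optional[Any]   # parsed JSON from stdout, or None
    stderr: str


def run_json_command(argv: list[str], timeout: float = 600.0) -> CliResult:
    """Run a command that prints JSON on stdout and classify its exit code.

    Raises subprocess.TimeoutExpired if the command runs longer than timeout.
    """
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    payload = None
    if proc.stdout.strip():
        try:
            payload = json.loads(proc.stdout)
        except json.JSONDecodeError:
            payload = None  # non-JSON output: caller falls back on returncode/stderr
    status = EXIT_MEANINGS.get(proc.returncode, "unknown")
    return CliResult(status, proc.returncode, payload, proc.stderr)


if __name__ == "__main__":
    # Stand-in for a JSON-emitting CLI call; not a real muapi-cli invocation.
    demo = run_json_command(
        [sys.executable, "-c", "import json; print(json.dumps({'url': 'out.png'}))"]
    )
    print(demo.status, demo.payload)
```

Keeping the parsed payload and the exit-code label separate lets a pipeline decide on retry or abort from the status alone, while still reading fields such as a result URL from the JSON when the call succeeds.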
Sources
- SamurAIGPT/Generative-Media-Skills