caveman: what it is, what problem it solves & why it's gaining traction

What it solves

Caveman is a plugin/skill for AI agents (such as Claude Code, Cursor, Gemini, and Copilot) that reduces the number of output tokens used in responses. It eliminates filler words and verbose phrasing while maintaining full technical accuracy, which leads to faster response times and lower API costs.

How it works

It functions as a set of instructions (a "skill") that tells the AI agent to drop filler, use fragments, and keep only the essential substance of the answer. It supports multiple compression levels—lite, full, ultra, and wenyan (classical Chinese)—and can be auto-activated via session flags or rule files. Additionally, it includes a tool called caveman-compress to rewrite memory files (like CLAUDE.md) into a compressed format to reduce input tokens for every session.

Who it’s for

Developers and users of AI coding agents who want to reduce token consumption, increase the speed of AI responses, and avoid verbose AI "chatter" without losing technical precision.

Highlights

Significant Token Reduction: Benchmarks show an average of 65% reduction in output tokens.
Multi-Agent Support: Compatible with 30+ agents including Claude Code, Codex, Gemini, Cursor, Windsurf, Cline, and Copilot.
Language Agnostic: Compresses the style of the response regardless of the language (e.g., Portuguese, Spanish, French).
Session Stats: Includes a /caveman-stats command to track real-time token usage and lifetime savings.
MCP Middleware: Provides caveman-shrink to compress tool descriptions for MCP servers.
Context Compression: caveman-compress reduces the size of project memory files to save input tokens.

caveman: what it is, what problem it solves & why it's gaining traction

caveman: what it is, what problem it solves & why it's gaining traction

What it solves

How it works

Who it’s for

Highlights

Sources