caveman: what it is, what problem it solves & why it's gaining traction
caveman: what it is, what problem it solves & why it's gaining traction
What it solves
Caveman is a plugin/skill for AI agents (such as Claude Code, Cursor, Gemini, and Copilot) that reduces the number of output tokens used in responses. It eliminates filler words and verbose phrasing while maintaining full technical accuracy, which leads to faster response times and lower API costs.
How it works
It functions as a set of instructions (a "skill") that tells the AI agent to drop filler, use fragments, and keep only the essential substance of the answer. It supports multiple compression levels—lite, full, ultra, and wenyan (classical Chinese)—and can be auto-activated via session flags or rule files. Additionally, it includes a tool called caveman-compress to rewrite memory files (like CLAUDE.md) into a compressed format to reduce input tokens for every session.
Who it’s for
Developers and users of AI coding agents who want to reduce token consumption, increase the speed of AI responses, and avoid verbose AI "chatter" without losing technical precision.
Highlights
- Significant Token Reduction: Benchmarks show an average of 65% reduction in output tokens.
- Multi-Agent Support: Compatible with 30+ agents including Claude Code, Codex, Gemini, Cursor, Windsurf, Cline, and Copilot.
- Language Agnostic: Compresses the style of the response regardless of the language (e.g., Portuguese, Spanish, French).
- Session Stats: Includes a
/caveman-statscommand to track real-time token usage and lifetime savings. - MCP Middleware: Provides
caveman-shrinkto compress tool descriptions for MCP servers. - Context Compression:
caveman-compressreduces the size of project memory files to save input tokens.
Sources
- undefinedJuliusBrussee/caveman