pxpipe: Reducing LLM Input Tokens by Rendering Text as Images
pxpipe is a local proxy designed to reduce the input token costs of Large Language Models (LLMs), specifically targeting Claude Code and Fable 5. By converting dense text context into images, pxpipe exploits a gap in how vision-capable models bill for tokens: image token costs are fixed by pixel dimensions rather than the amount of text contained within the image.
The Core Mechanism: Text-to-Image Token Arbitrage
pxpipe operates by intercepting /v1/messages requests and rewriting eligible bulk history and context into compact PNGs. The system identifies "token-dense" content—such as code, JSON, and tool outputs—where the character-to-token ratio is low (approximately 1 character per token). By rendering this text into a 1928x1928 pixel image, pxpipe can pack up to 92,000 characters into a single image that costs approximately 4,761 vision tokens.
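The 4,761-token figure is consistent with a billing model that charges one vision token per fixed-size pixel patch. The sketch below assumes 28x28-pixel patches; that patch size is an assumption (the source does not state it), chosen only because it reproduces the quoted number, and `estimateImageTokens` is a hypothetical helper rather than part of pxpipe. It illustrates why the cost depends on image dimensions and not on how much text is rendered inside.

```typescript
// Assumption: one vision token is billed per 28x28-pixel patch.
// The patch size is NOT given by the source; it is picked because it
// reproduces the quoted 4,761-token cost of a 1928x1928 image.
const ASSUMED_PATCH_PX = 28;

// Hypothetical helper: estimate vision tokens from pixel dimensions alone.
function estimateImageTokens(widthPx: number, heightPx: number): number {
  const cols = Math.ceil(widthPx / ASSUMED_PATCH_PX);
  const rows = Math.ceil(heightPx / ASSUMED_PATCH_PX);
  return cols * rows;
}

// A full page costs the same whether it holds 100 or 92,000 characters.
const pageTokens = estimateImageTokens(1928, 1928);
console.log(pageTokens); // 69 * 69 = 4761
```

Under this assumption, the only lever for lowering cost per character is packing more legible text into the same pixel area.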
This creates a significant token reduction: dense content packs approximately 3.1 characters per image-token compared to 1.0 characters per text-token. In a real-world example, 48,000 characters of system prompts and tool documentation (approximately 25,000 text tokens) were reduced to 2,700 image tokens.
Performance and Cost Savings
According to the project's benchmarks, the end-to-end cost reduction for Fable 5 is typically between 59% and 70%.
End-to-End Cost Analysis
- Total Bill Reduction: In a snapshot of 13,709 requests, the total bill was reduced from $100 to approximately $41.
- Workload Dependency: Savings are highest on token-dense content (code, JSON) and negative on sparse English prose, which is more efficient as text.
- Comparison: In a demo comparing plain Claude to pxpipe, session totals dropped from $42.21 (with 96% context window usage) to $6.06.
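The workload dependency in the list above follows from comparing the two densities quoted earlier: roughly 3.1 characters per image token against the text tokenizer's characters per token. A minimal sketch of that break-even check is shown below. The function name is hypothetical, and the value of 4 characters per token for English prose is a common rule of thumb, not a number from the source.

```typescript
// Density quoted in the article: ~3.1 characters per image token.
const CHARS_PER_IMAGE_TOKEN = 3.1;

// Hypothetical helper: estimated tokens saved by imaging a block.
// A negative result means the block is cheaper when left as text.
function estimatedTokensSaved(charCount: number, charsPerTextToken: number): number {
  const textTokens = charCount / charsPerTextToken;
  const imageTokens = charCount / CHARS_PER_IMAGE_TOKEN;
  return Math.round(textTokens - imageTokens);
}

// Dense content (code, JSON): ~1.0 characters per text token, per the article.
const denseSaving = estimatedTokensSaved(10000, 1.0);

// Sparse prose: ~4 characters per text token (assumed rule of thumb).
const proseSaving = estimatedTokensSaved(10000, 4.0);

console.log(denseSaving > 0); // imaging pays off for dense content
console.log(proseSaving < 0); // imaging costs more for sparse prose
```

In this simplified model, imaging only pays off when the text tokenizer yields fewer than about 3.1 characters per token.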
Task Quality and Accuracy
- SWE-bench Lite: 10/10 instances resolved on both plain text and pxpipe arms.
- SWE-bench Pro: 14/19 resolved with pxpipe ON, compared to 15/19 with pxpipe OFF. The authors state that the single difference was due to agentic variance rather than compression loss.
- Novel Arithmetic: Fable 5 achieved 100% accuracy on novel arithmetic problems using imaged context, matching the text-based baseline.
Limitations and Fidelity Risks
pxpipe is described as a "gist tier" tool rather than a lossless store. It is inherently lossy because verbatim recall of exact strings is unreliable.
The Verbatim Gap
- Silent Confabulation: The primary failure mode is not an error message. Instead, the model returns a plausible but wrong value (e.g., the wrong person's name or a hex string with a few characters changed).
- Hex Recall: In tests of 12-character hex strings, Opus 4.8 scored 0/15, and Fable 5 scored 13/15.
- Safeguards: To avoid these risks, the authors recommend keeping IDs, hashes, and secrets as text. The tool provides an options.keepSharp(block) feature to pin specific blocks as text.
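One way to act on that recommendation is a predicate that flags blocks containing strings that must survive verbatim. The sketch below is illustrative only. The source names `options.keepSharp(block)` but does not document its signature, so the `TextBlock` shape, the boolean return value, and the regular expressions are all assumptions.

```typescript
// Hypothetical block shape; pxpipe's real block type is not described in the source.
interface TextBlock {
  text: string;
}

// Patterns for content where a single misread character is a failure.
const HEX_RUN = /\b[0-9a-fA-F]{12,}\b/; // hashes, commit IDs (12+ hex chars)
const UUID = /\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b/;
const SECRET_HINT = /(api[_-]?key|secret|token|password)\s*[:=]/i;

// Assumed contract: return true to keep the block as text instead of imaging it.
function keepSharp(block: TextBlock): boolean {
  return (
    HEX_RUN.test(block.text) ||
    UUID.test(block.text) ||
    SECRET_HINT.test(block.text)
  );
}

// How the options object reaches pxpipe is not shown in the source.
const options = { keepSharp };

console.log(options.keepSharp({ text: "fixed in commit 3f9a1c0b7d2e" })); // true
console.log(options.keepSharp({ text: "The build finished without warnings." })); // false
```

The patterns err toward keeping blocks as text: a false positive only forfeits some savings, while a false negative risks a silently wrong identifier.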
Technical Implementation
pxpipe runs as a local proxy (started via npx pxpipe-proxy), and the Claude Code client is then pointed at it. It provides a live dashboard for monitoring token savings and text-to-image conversions.
Compression Targets
pxpipe targets three specific types of input blocks:
- Large tool_result bodies: File reads, command outputs, and logs exceeding ~6k characters.
- Older collapsed history: Older turns in the conversation are imaged, while recent turns remain text.
- Static system prompts and tool documentation: These are rendered as dense image pages.
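The three targets above can be sketched as a simple selection rule. This is a minimal illustration, not pxpipe's actual logic: the block types, field names, and the recent-turn cutoff are hypothetical, and only the ~6k-character threshold comes from the list above.

```typescript
// Hypothetical types; field names are illustrative, not pxpipe's real schema.
type BlockKind = "tool_result" | "history" | "system" | "user";

interface ContextBlock {
  kind: BlockKind;
  text: string;
  turnsAgo: number; // 0 = current turn; ignored for system blocks
}

const MIN_TOOL_RESULT_CHARS = 6000; // "~6k characters" from the list above
const RECENT_TURNS_KEPT_AS_TEXT = 3; // assumed cutoff; the source gives no number

// Returns true when a block matches one of the three compression targets.
function shouldImage(block: ContextBlock): boolean {
  switch (block.kind) {
    case "tool_result":
      // Only large file reads, command outputs, and logs are imaged.
      return block.text.length > MIN_TOOL_RESULT_CHARS;
    case "history":
      // Older turns are imaged; recent turns stay as text.
      return block.turnsAgo >= RECENT_TURNS_KEPT_AS_TEXT;
    case "system":
      // Static system prompts and tool documentation become image pages.
      return true;
    default:
      // Fresh user input is left untouched.
      return false;
  }
}

console.log(shouldImage({ kind: "tool_result", text: "x".repeat(7000), turnsAgo: 0 })); // true
console.log(shouldImage({ kind: "history", text: "earlier reply", turnsAgo: 1 })); // false
```

A real implementation would presumably combine a rule like this with a verbatim-content check, so that blocks holding IDs or hashes stay as text even when they match a target.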
Model Compatibility
- Fable 5: The primary target and 100/100 reader. Optimized for the default configuration.
- GPT-5.6: Supported, though tool definitions are kept in native JSON to ensure reliable tool-calling.
- Opus 4.8: Disabled by default due to a ~7% misread rate of rendered content.
Community Insights and Counterpoints
Community discussion on Hacker News emphasizes that this technique is essentially a "pricing hack" or a loophole in token accounting.
"This seems like a pricing hack that burns resources, that when the loophole gets closed the price of OCR will have to rise?"
Other users noted that similar techniques have been explored with OpenAI models in the past, but resulted in higher completion token costs and slower performance. Some also questioned whether the approach is efficient in information-theoretic terms, arguing that it is more a workaround for a flaw in the model's pricing than a technical breakthrough in data representation.