A Tool That Reduces AI Agent Output by 65% with "Caveman Speak"

"Caveman," developed by Julius Brussee, converts AI coding agent responses into "caveman speak," cutting output tokens by about 65%. It supports over 30 agents, including Claude Code and Codex.

Reviewed & edited by the SINGULISM Editorial Team

Julius Brussee has developed a tool called “Caveman” that reduces the number of output tokens generated by AI coding agents by approximately 65% by converting their responses into “caveman speak.” The tool, available on GitHub, is compatible with over 30 agents, including Claude Code, Codex, Gemini, Cursor, Windsurf, Cline, and Copilot.

The repository’s description features the phrase, “why use many token when few do trick.” The concept may seem humorous, but the tool is functional and practical. Its core idea is to strip the verbose parts from AI agents’ explanations while keeping the technical content, such as code, commands, and error messages, accurate and concisely worded.

How It Works

Caveman is installed into each agent as a plugin, extension, rule file, or npx skill. Installation is a one-liner: a curl command on macOS, Linux, WSL, and Git Bash, or a script on Windows PowerShell. It takes about 30 seconds and requires Node.js version 18 or later. Unsupported agents are skipped automatically, and the installer can safely be re-run.
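The Node.js requirement can be checked before running the installer. The following is a minimal sketch, not part of Caveman itself: the helper names `node_major` and `node_is_new_enough` are made up for illustration, and it assumes only that `node --version` prints a string such as `v18.17.0`.

```python
import re
import shutil
import subprocess
from typing import Optional

MIN_NODE_MAJOR = 18  # minimum version stated in the article


def node_major(version_output: str) -> Optional[int]:
    """Parse the major version from output such as 'v18.17.0'."""
    match = re.match(r"v?(\d+)\.", version_output.strip())
    return int(match.group(1)) if match else None


def node_is_new_enough() -> bool:
    """Return True if a `node` binary on PATH reports a major version >= 18."""
    node_path = shutil.which("node")
    if node_path is None:
        return False  # Node.js is not installed or not on PATH
    result = subprocess.run(
        [node_path, "--version"], capture_output=True, text=True
    )
    major = node_major(result.stdout)
    return major is not None and major >= MIN_NODE_MAJOR


if __name__ == "__main__":
    print("Node.js requirement met:", node_is_new_enough())
```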

Caveman is activated by typing /caveman or asking the agent to “talk like caveman,” and deactivated by saying “normal mode.” For agents like Claude Code, Codex, and Gemini, it is enabled automatically from the first message. Six levels of “caveman speak” are available and can be selected with /caveman <level>. However, detailed descriptions of these levels are not yet publicly available.
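The switching behavior described above can be pictured as a small piece of state. This is a hypothetical sketch of that behavior only, not Caveman's implementation: in practice the agent interprets these phrases itself, the names `CavemanState` and `apply_message` are invented here, and because the level names are not documented, the level is treated as an opaque string.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class CavemanState:
    enabled: bool = False
    level: Optional[str] = None  # level names are not documented; opaque string


def apply_message(state: CavemanState, message: str) -> CavemanState:
    """Return the state after a user message, mimicking the documented triggers."""
    text = message.strip().lower()
    if "normal mode" in text:
        return replace(state, enabled=False)
    if "talk like caveman" in text:
        return replace(state, enabled=True)
    if text == "/caveman" or text.startswith("/caveman "):
        parts = text.split(maxsplit=1)
        # "/caveman <level>" selects a level; bare "/caveman" keeps the current one
        level = parts[1] if len(parts) == 2 else state.level
        return CavemanState(enabled=True, level=level)
    return state  # any other message leaves the mode unchanged


if __name__ == "__main__":
    state = CavemanState()
    # "some-level" is a placeholder, not a real Caveman level name
    for msg in ["/caveman", "/caveman some-level", "normal mode"]:
        state = apply_message(state, msg)
        print(f"{msg!r} -> {state}")
```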

Token Reduction Examples

The repository provides examples comparing responses from a regular AI agent to those after applying Caveman.

For instance, when asked about React component re-rendering, a typical AI agent uses 69 tokens to deliver a verbose explanation: “The reason your React component is re-rendering is likely because…”. In contrast, an agent with Caveman enabled condenses the response to just 19 tokens: “New object ref each render. Inline object prop = new ref = re-render. Wrap in useMemo.”

In another example about token expiry checking in authentication middleware, a regular agent prefaces its explanation with, “Sure! I’d be happy to help you with that…” and follows with a detailed response. Caveman, however, responds tersely: “Bug in auth middleware. Token expiry check use < not <=. Fix:”. In both cases, no technical information is lost, and the key details required for code correction are preserved.
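For the React example, the 69-token and 19-token figures work out to a reduction of about 72%, somewhat above the 65% headline number. The arithmetic, as a short sketch (the function name `reduction_rate` is chosen here for illustration):

```python
def reduction_rate(before_tokens: int, after_tokens: int) -> float:
    """Fraction of output tokens removed (0.5 means half as many tokens)."""
    if before_tokens <= 0:
        raise ValueError("before_tokens must be positive")
    return (before_tokens - after_tokens) / before_tokens


# Token counts from the React re-rendering example in the repository
rate = reduction_rate(69, 19)
print(f"{rate:.1%}")  # 72.5%
```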

Benchmark results provided in the repository include the following:

  • Output token reduction rate: 65%
  • Input token reduction rate: 0%
  • Technical accuracy: 100%
  • Style: OOG (Primitive “caveman speak” vibe)

Note that input tokens are not reduced. Caveman affects only the text the agent outputs; the prompts and code the agent reads remain unchanged. This allows for potential reductions in API costs without compromising response quality.

Technical Significance and Limitations

Reducing output tokens directly cuts costs under API billing models that charge for output tokens. This can be especially beneficial in development environments that require extensive code reviews or continuous interaction with AI agents. However, the size of the saving depends on the usage scenario, because input tokens, which Caveman does not reduce, are typically billed as well.
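How much of a bill this actually saves depends on the mix of input and output tokens. The sketch below assumes the 65% output reduction and 0% input reduction reported in the repository; the per-million-token prices and the workload sizes are hypothetical placeholders, not any provider's actual rates.

```python
def monthly_cost(input_tokens: float, output_tokens: float,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost when input and output tokens are billed separately, per million."""
    return (input_tokens / 1e6 * input_price_per_m
            + output_tokens / 1e6 * output_price_per_m)


def savings_fraction(input_tokens: float, output_tokens: float,
                     input_price_per_m: float, output_price_per_m: float,
                     output_reduction: float = 0.65) -> float:
    """Share of the total bill saved when only output tokens shrink."""
    before = monthly_cost(input_tokens, output_tokens,
                          input_price_per_m, output_price_per_m)
    after = monthly_cost(input_tokens, output_tokens * (1 - output_reduction),
                         input_price_per_m, output_price_per_m)
    return (before - after) / before


# Hypothetical workload: 10M input and 2M output tokens per month,
# at placeholder prices of 3 and 15 currency units per million tokens.
saved = savings_fraction(10_000_000, 2_000_000, 3.0, 15.0)
print(f"{saved:.1%}")  # 32.5%
```

In this made-up scenario the input side accounts for half of the bill, so a 65% cut in output tokens lowers the total by only about a third. The overall saving approaches 65% only when output tokens dominate the cost.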

Caveman is designed solely to modify the style of output text, with no impact on the inference capabilities or code generation quality of the agents. Code, commands, and error messages are preserved byte for byte. While the tool may enhance productivity for developers who prefer concise responses, it may not be suitable for educational contexts or other scenarios that require detailed explanations.

Editorial Opinion

In the short term, Caveman could serve as a practical cost-saving measure for development teams that frequently use AI agents. Projects consuming millions of tokens monthly could see significant savings (potentially in the tens of thousands) by adopting this tool. However, overly concise responses could make it harder for reviewers or subsequent developers to understand the content, posing a risk to project clarity. Teams may need to establish uniform style guidelines or toggle Caveman on and off depending on the context.

In the long term, this tool represents a tangible step towards empowering users to control the output style of AI agents. While current large language models default to verbose, detailed responses, many development scenarios demand only the essentials. Tools like Caveman could pave the way for a new market focused on customizing AI communication styles. Nevertheless, excessive simplification of outputs may hinder error interpretation and debugging, a factor that cannot be ignored. The editorial team believes that the proliferation of token-reduction tools could spark a broader discussion about redefining what constitutes “quality responses” from AI agents.

Frequently Asked Questions

Does Caveman work with all AI agents?
Currently, it supports over 30 agents, including Claude Code, Codex, Gemini, Cursor, Windsurf, Cline, and Copilot. The installation script automatically detects supported agents on your system and skips unsupported ones.

Does this tool really reduce output tokens by 65%?
Benchmarks demonstrate a 65% reduction, though the actual rate may vary depending on the type of question and the agent's response style. Higher reduction rates are expected for queries with verbose explanations. The accuracy of technical information is maintained at 100%.

Will using this tool lead to incomplete code responses?
Code, commands, and error messages are preserved at the byte level. The tool only modifies the style of explanatory text. However, detailed explanations may be omitted, making it less suitable for educational purposes.

What are the prerequisites for installation?
Node.js version 18 or later is required. The installation script works on macOS, Linux, WSL, Git Bash, and Windows PowerShell 5.1 or later. The process takes approximately 30 seconds, with automatic configuration for each agent.

Can I revert to the original response style?
Yes, you can deactivate Caveman by saying "normal mode" or using the agent-specific command. For agents like Claude Code, Codex, and Gemini, it is enabled automatically at the start but can be disabled anytime.

References

  • JuliusBrussee/caveman (https://github.com/JuliusBrussee/caveman), published on 2026-07-04
Source: GitHub Trending
