Codebase Memory MCP: High-Speed Engine for AI Agents

Introducing "codebase-memory-mcp," a tool that indexes typical repositories in milliseconds and answers structured queries in under 1 ms. It can index the Linux kernel (28M LOC) in three minutes.

June 18, 2026. Reviewed and edited by the SINGULISM Editorial Team.

Photo by Mohamed Nohassi on Unsplash

“codebase-memory-mcp,” a code intelligence engine designed specifically for AI coding agents, has surfaced on GitHub Trending and is drawing attention in the developer community. Distributed as a single static binary, it indexes an entire repository so that AI agents can run fast queries about code structure.

Speed That Surpasses Traditional Methods

The standout feature of codebase-memory-mcp is its indexing speed. It fully indexes a standard repository in milliseconds and a massive codebase such as the Linux kernel (approximately 28 million lines of code across 75,000 files) in three minutes. Structured queries return in less than one millisecond.

This performance comes from a RAM-first pipeline design that combines LZ4 compression, in-memory SQLite, and Aho-Corasick pattern matching, and that releases memory once indexing is complete. The structured index also cuts token usage: five structural queries consume approximately 3,400 tokens, compared with around 412,000 tokens previously, a reduction of roughly 120 times (412,000 / 3,400 ≈ 121).
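To make the in-memory SQLite idea concrete, here is a minimal sketch using Python's standard `sqlite3` module. The `nodes` and `edges` tables and the sample function names are assumptions made for illustration; the article does not describe the tool's actual schema.

```python
import sqlite3

# Assumed, illustrative schema; the real tool's tables are not documented here.
conn = sqlite3.connect(":memory:")  # the database lives only in RAM
conn.executescript("""
    CREATE TABLE nodes (id INTEGER PRIMARY KEY, kind TEXT, name TEXT);
    CREATE TABLE edges (src INTEGER, dst INTEGER, kind TEXT);
""")
conn.executemany(
    "INSERT INTO nodes (id, kind, name) VALUES (?, ?, ?)",
    [(1, "Function", "handle_request"),
     (2, "Function", "load_user"),
     (3, "Function", "query_db")],
)
conn.executemany(
    "INSERT INTO edges (src, dst, kind) VALUES (?, ?, ?)",
    [(1, 2, "CALLS"), (2, 3, "CALLS")],
)

def callees(name: str) -> list[str]:
    """Return every function reachable from `name` through CALLS edges."""
    rows = conn.execute("""
        WITH RECURSIVE reach(id) AS (
            SELECT id FROM nodes WHERE name = ?
            UNION
            SELECT e.dst FROM edges e JOIN reach r ON e.src = r.id
            WHERE e.kind = 'CALLS'
        )
        SELECT n.name FROM nodes n JOIN reach r ON n.id = r.id
        WHERE n.name != ?
        ORDER BY n.id
    """, (name, name)).fetchall()
    return [row[0] for row in rows]

print(callees("handle_request"))  # ['load_user', 'query_db']
```

A structural query of this kind returns a handful of names instead of whole files, which is the general reason a graph index can answer with far fewer tokens.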

Support for 158 Languages and Hybrid Analysis

The tool’s code analysis combines tree-sitter AST parsing with hybrid LSP-based semantic type resolution. Tree-sitter parsing covers 158 programming languages, and all grammars are embedded directly in the binary, so no additional installation is needed and the tool behaves the same across environments.

In addition, hybrid LSP semantic type resolution is applied to 13 major languages: Python, TypeScript, JavaScript, JSX, TSX, PHP, C#, Go, C, C++, Java, Kotlin, and Rust. Together, these analyses produce persistent knowledge graphs consisting of functions, classes, call chains, HTTP routes, and cross-service links.
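The idea of deriving functions and call chains from a syntax tree can be sketched in a few lines. The example below is a stand-in, not the tool's method: it uses Python's built-in `ast` module instead of tree-sitter, handles only Python source, and records only direct calls by plain name. The sample source and helper names are hypothetical.

```python
import ast

SOURCE = '''
def parse(text):
    return text.split()

def count(text):
    return len(parse(text))

def main():
    print(count("a b c"))
'''

def build_call_graph(source: str) -> dict[str, set[str]]:
    """Map each top-level function to the plain names it calls directly."""
    graph: dict[str, set[str]] = {}
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef):
            calls = graph.setdefault(node.name, set())
            for inner in ast.walk(node):
                # Attribute calls such as text.split() are skipped in this sketch.
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    calls.add(inner.func.id)
    return graph

def call_chain(graph: dict[str, set[str]], start: str) -> list[str]:
    """Depth-first list of project functions reachable from `start`."""
    seen: list[str] = []
    stack = [start]
    while stack:
        name = stack.pop()
        if name in seen or name not in graph:
            continue  # already visited, or a builtin such as print/len
        seen.append(name)
        stack.extend(sorted(graph[name], reverse=True))
    return seen

graph = build_call_graph(SOURCE)
print(call_chain(graph, "main"))  # ['main', 'count', 'parse']
```

Resolving calls made through attributes or imported names is where semantic type resolution becomes necessary, which this sketch deliberately leaves out.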

14 MCP Tools and Their Features

Through the Model Context Protocol (MCP), codebase-memory-mcp exposes 14 tools to AI agents. They cover search, tracing, architecture analysis, impact analysis, Cypher queries, dead code detection, cross-service HTTP linking, and Architecture Decision Record (ADR) management.
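For readers unfamiliar with MCP, an agent invokes a server tool by sending a JSON-RPC 2.0 request with the method `tools/call`. The sketch below only builds such a message; the tool name `search_graph` and its `name_pattern` argument are assumptions for illustration, since the article does not list the 14 tools' names or parameters.

```python
import json

def make_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)

# Hypothetical tool name and arguments, for illustration only.
request = make_tool_call(1, "search_graph", {"name_pattern": "handle_.*"})
print(request)
```

In practice the agent's MCP client constructs and sends these messages, so users rarely write them by hand.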

One particularly noteworthy feature is its support for infrastructure as code. It indexes Dockerfiles, Kubernetes manifests, and Kustomize overlays as graph nodes and preserves the relationships among them. The tool automatically generates Resource nodes for Kubernetes resource types, Module nodes for Kustomize overlays, and IMPORTS edges that connect them, so application code and infrastructure configuration can be analyzed together.
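A rough sketch of how an overlay might map onto such a graph is shown below. Only the labels `Resource`, `Module`, and `IMPORTS` come from the article; the `Node` and `Edge` classes, the `index_overlay` helper, the naming scheme, and the sample manifests are all assumptions, and the manifests are given as already-parsed dictionaries to avoid a YAML dependency.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    label: str  # "Resource" or "Module", following the article's terms
    name: str

@dataclass(frozen=True)
class Edge:
    kind: str
    src: Node
    dst: Node

def index_overlay(overlay: str, kustomization: dict,
                  manifests: dict) -> tuple[list[Node], list[Edge]]:
    """Create one Module node per overlay and one Resource node per manifest."""
    module = Node("Module", overlay)
    nodes: list[Node] = [module]
    edges: list[Edge] = []
    for path in kustomization.get("resources", []):
        manifest = manifests[path]
        name = f'{manifest["kind"]}/{manifest["metadata"]["name"]}'
        resource = Node("Resource", name)
        nodes.append(resource)
        edges.append(Edge("IMPORTS", module, resource))
    return nodes, edges

# Hypothetical, pre-parsed inputs.
kustomization = {"resources": ["deployment.yaml", "service.yaml"]}
manifests = {
    "deployment.yaml": {"kind": "Deployment", "metadata": {"name": "web"}},
    "service.yaml": {"kind": "Service", "metadata": {"name": "web"}},
}

nodes, edges = index_overlay("overlays/prod", kustomization, manifests)
for edge in edges:
    print(f"{edge.src.name} -[{edge.kind}]-> {edge.dst.name}")
```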

Zero Dependencies, Plug-and-Play

The tool is available as a single static binary for macOS (arm64/amd64), Linux (arm64/amd64), and Windows (amd64). It does not require Docker, runtime dependencies, or API keys. Users can get started with just three steps: download, execute, and install.

The installation command auto-detects installed agents and supports 11 of them: Claude Code, Codex CLI, Gemini CLI, Zed, OpenCode, Antigravity, Aider, KiloCode, VS Code, OpenClaw, and Kiro. It automatically configures MCP entries, instruction files, and pre-tool hooks for each agent.

Security and Transparency

The developer, DeusData, emphasizes security. All processing runs locally, so no code leaves the user’s machine. Release binaries are signed, checksum-verified, and scanned using over 70 antivirus engines. The source code is also openly available, and users are encouraged to audit it before use.
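Checksum verification is something users can reproduce themselves before running a downloaded binary. The sketch below assumes the published checksum is a SHA-256 hex digest, which the article does not state; the helper names are hypothetical, and a temporary file stands in for the real download.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large binaries are not loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_checksum(path: Path, expected_hex: str) -> bool:
    """Compare the file's digest with the published one (assumed SHA-256 hex)."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())

# Demonstration with a temporary file standing in for a downloaded binary.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"hello")
    expected = hashlib.sha256(b"hello").hexdigest()
    print(matches_checksum(sample, expected))  # True
```

A matching checksum shows only that the file is the one the publisher listed; signature verification is a separate step.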

Benchmark Results and Research Background

The design and benchmarks of codebase-memory-mcp are detailed in the preprint paper, “Codebase-Memory: Tree-Sitter-Based Knowledge Graphs for LLM Code Exploration via MCP” (arXiv:2603.27277). Evaluations on 31 real-world repositories showed 83% query accuracy, a tenfold reduction in token consumption compared to file-based exploration, and a 2.1-fold reduction in the number of tool calls.

Editorial Opinion

In the short term, codebase-memory-mcp could make AI coding agents significantly more practical. Until now, AI assistants have had to load numerous files to comprehend an entire codebase. Structured indexing through knowledge graphs makes this process far more efficient. Organizations managing large monorepos stand to benefit the most.

In the long term, this approach could change how AI agents understand code. Moving from sequential file-based exploration to graph-based lookups improves tool efficiency and could also redefine what it means for an agent to “understand” a codebase. However, static analysis via tree-sitter and LSP has limits: dynamic behavior and runtime dependencies require other methods. Expanding applicability to proprietary codebases and integrating with CI/CD pipelines will be crucial for broader adoption. The editorial team expects the integration of AI agents with development environments through MCP to accelerate, starting with this tool.

References

GitHub Trending — Published June 18, 2026

Frequently Asked Questions

How does codebase-memory-mcp work?: The tool analyzes entire repositories using tree-sitter AST parsing and hybrid LSP semantic type resolution, creating knowledge graphs consisting of functions, classes, and call chains. Queries can be executed on these graphs via MCP tools for fast, structured code searches. All processing is done locally, ensuring that no code is sent externally.
Which AI agents are supported?: The tool supports 11 agents: Claude Code, Codex CLI, Gemini CLI, Zed, OpenCode, Antigravity, Aider, KiloCode, VS Code, OpenClaw, and Kiro. It automatically configures settings for each agent during installation, eliminating the need for manual setup.
Are there any security concerns?: All processing is conducted locally, ensuring that no code is sent to external servers. Release binaries are signed, checksum-verified, and scanned using over 70 antivirus engines. The source code is also publicly available for auditing purposes.

References

codebase-memory-mcp - GitHub (https://github.com/DeusData/codebase-memory-mcp), published June 18, 2026

Codebase-Memory: Tree-Sitter-Based Knowledge Graphs for LLM Code Exploration via MCP (arXiv), research preprint (https://arxiv.org/abs/2603.27277)

Source: GitHub Trending

Written by Yuka Suzuki

Edited & reviewed by Kenichiro Yamamoto

Last updated: June 17, 2026