AI coding agents have improved a lot at generating code, refactoring and analysing projects, but they still face a very basic problem: finding useful context inside large repositories. Reading file by file, running repeated grep searches or loading code fragments without a structural view works in small projects. In a monorepo, a legacy codebase or a system with thousands of files, that method starts to break down.
codebase-memory-mcp tries to solve that problem with an idea that is becoming increasingly common in AI-assisted development tools: turning the repository into a persistent knowledge graph. Instead of treating code as a pile of text, the project analyses it with Tree-sitter, extracts symbols, relationships, calls, HTTP routes, classes, functions, packages and links between services, and exposes that information through an MCP server so agents such as Claude Code, Codex CLI, Gemini CLI, Zed, OpenCode, Aider or VS Code can query it.
According to its maintainers, the result is a code intelligence backend capable of indexing the Linux kernel, with 28 million lines of code and 75,000 files, in around three minutes, answering structural queries in under one millisecond and sharply reducing token consumption compared with traditional file-by-file exploration. The figure is striking, but speed is not the only important point. The real shift is the model itself: moving from text search to relationship queries.
From text-based RAG to a code graph
Many AI and development tools have relied on approaches close to traditional RAG: splitting files into chunks, generating embeddings, searching for semantically similar fragments and passing them to the model. That approach can be useful for documentation, manuals or broad semantic questions, but it falls short when the question depends on structure.
Knowing “what calls this function”, “which endpoints depend on this service”, “what impact this change has” or “where dead code is located” is not just a text similarity problem. It is a relationship problem. A function can be far away from the file that invokes it. An HTTP route can link to an internal service. A class can inherit behaviour from another one. A small change can affect an entire module if it sits in a highly connected part of the graph.
codebase-memory-mcp proposes an architecture better suited to those questions. It first analyses the repository with Tree-sitter grammars. It then builds a persistent SQLite graph with nodes such as projects, packages, folders, files, modules, classes, functions, methods, interfaces, types, routes or resources. Edges represent relationships such as CALLS, IMPORTS, DEFINES, IMPLEMENTS, INHERITS, HTTP_CALLS, ASYNC_CALLS, EMITS, LISTENS_ON or DATA_FLOWS.
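The index-then-query idea can be shown at toy scale. This is a minimal sketch under stated assumptions, not the project's implementation. It uses Python's built-in `ast` module as a stand-in for Tree-sitter, so it only handles Python source. The two-table schema and the `index_module` and `callers_of` helpers are hypothetical illustrations, not codebase-memory-mcp's actual SQLite layout.

```python
import ast
import sqlite3

# Hypothetical minimal schema: typed nodes plus typed, de-duplicated edges.
SCHEMA = """
CREATE TABLE nodes (id INTEGER PRIMARY KEY, kind TEXT, name TEXT UNIQUE);
CREATE TABLE edges (src INTEGER, dst INTEGER, type TEXT, UNIQUE (src, dst, type));
"""

def node_id(db, kind, name):
    """Insert a node if it is new and return its id either way."""
    db.execute("INSERT OR IGNORE INTO nodes (kind, name) VALUES (?, ?)", (kind, name))
    return db.execute("SELECT id FROM nodes WHERE name = ?", (name,)).fetchone()[0]

def index_module(db, filename, source):
    """Parse one Python file and record DEFINES and CALLS edges."""
    tree = ast.parse(source)
    file_id = node_id(db, "File", filename)
    funcs = [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]
    defined = {f.name for f in funcs}
    for f in funcs:
        fid = node_id(db, "Function", f.name)
        db.execute("INSERT OR IGNORE INTO edges VALUES (?, ?, 'DEFINES')", (file_id, fid))
        for call in ast.walk(f):
            # Only plain-name calls to functions defined in this file are kept;
            # builtins such as print() and method calls such as .split() are skipped.
            if (isinstance(call, ast.Call) and isinstance(call.func, ast.Name)
                    and call.func.id in defined):
                cid = node_id(db, "Function", call.func.id)
                db.execute("INSERT OR IGNORE INTO edges VALUES (?, ?, 'CALLS')", (fid, cid))

def callers_of(db, name):
    """Structural query: which functions have a CALLS edge into `name`?"""
    rows = db.execute("""
        SELECT caller.name FROM edges e
        JOIN nodes caller ON caller.id = e.src
        JOIN nodes callee ON callee.id = e.dst
        WHERE e.type = 'CALLS' AND callee.name = ?
        ORDER BY caller.name""", (name,))
    return [r[0] for r in rows]

SOURCE = '''
def parse(text):
    return text.split()

def load(path):
    return parse(open(path).read())

def main():
    print(load("data.txt"))
'''

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
index_module(db, "example.py", SOURCE)  # the source is parsed, never executed
print(callers_of(db, "parse"))  # ['load']
print(callers_of(db, "load"))   # ['main']
```

Once the edges are stored, "who calls X" is a single join rather than a text search. That is the property the graph approach relies on.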
| Approach | How it works | Advantage | Main limitation |
|---|---|---|---|
| File-by-file search | The agent reads and searches through files one after another | Simple and compatible with any repository | Consumes many tokens and can miss relationships |
| Text-based RAG | Splits code into fragments and searches by semantic similarity | Useful for locating concepts or documentation | Does not always understand calls, dependencies or impact |
| Repo map | Summarises symbols and basic structure | Reduces context compared with reading everything | Can fall short in deeper analysis |
| Persistent AST graph | Indexes symbols and code relationships | Enables fast structural queries | Requires up-front indexing and depends on parsing quality |
The practical difference is clear. An agent asking about a function does not need to scan dozens of files to discover who calls it. It can query the graph. If it needs to analyse the impact of a change, it can combine Git diff information with relationships between symbols. If it wants to detect functions with no inbound calls, it can run a structural query over the graph instead of improvising a text search.
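Both the dead-code query and the impact query translate naturally into SQL over a call graph. This is a minimal sketch that assumes a hypothetical two-table layout (`functions` and `calls`) rather than the project's real schema. Dead-code candidates are functions with no inbound call. The impact of changing a function is the set of its transitive callers, computed with a recursive CTE. Mapping an actual Git diff onto changed functions is left out, so the changed function is simply given by name.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE functions (name TEXT PRIMARY KEY);
CREATE TABLE calls (caller TEXT, callee TEXT);
""")
# Hypothetical example graph: main -> handler -> {validate, save}; legacy_export is never called.
db.executemany("INSERT INTO functions VALUES (?)",
               [("main",), ("handler",), ("validate",), ("save",), ("legacy_export",)])
db.executemany("INSERT INTO calls VALUES (?, ?)",
               [("main", "handler"), ("handler", "validate"), ("handler", "save")])

# Dead-code candidates: functions with no inbound call edge.
DEAD_CODE = """
SELECT name FROM functions f
WHERE NOT EXISTS (SELECT 1 FROM calls c WHERE c.callee = f.name)
ORDER BY name
"""

# Change impact: every transitive caller of the given function.
# UNION (not UNION ALL) discards repeats, so the recursion also terminates on cycles.
IMPACT = """
WITH RECURSIVE affected(name) AS (
    SELECT caller FROM calls WHERE callee = ?
    UNION
    SELECT c.caller FROM calls c JOIN affected a ON c.callee = a.name
)
SELECT name FROM affected ORDER BY name
"""

dead = [r[0] for r in db.execute(DEAD_CODE)]
impacted = [r[0] for r in db.execute(IMPACT, ("save",))]
print(dead)      # ['legacy_export', 'main']
print(impacted)  # ['handler', 'main']
```

Entry points such as `main` appear in the dead-code list because nothing in the indexed code calls them. For that reason, results like these are best treated as candidates for review rather than verdicts.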
Performance: the Linux kernel as a scale test
The project uses the Linux kernel as one of its most visible benchmarks. According to the README, codebase-memory-mcp can index the full repository, with 28 million lines of code and 75,000 files, in three minutes on an Apple M3 Pro. In fast mode, the same benchmark drops to 1 minute and 12 seconds. The full index generates 4.81 million nodes and 7.72 million edges, while fast indexing produces 1.88 million nodes.
| Operation | Published result |
|---|---|
| Full Linux kernel index | 3 min |
| Repository size used in the test | 28 MLOC / 75,000 files |
| Nodes generated in full index | 4.81 million |
| Edges generated in full index | 7.72 million |
| Fast Linux kernel index | 1 min 12 s |
| Full Django index | ~6 s |
| Structural Cypher query | <1 ms |
| Regex name search | <10 ms |
| Dead code detection | ~150 ms |
| Call tracing up to depth 5 | <10 ms |
These figures should be read as project benchmarks, not as a universal performance guarantee. Real-world results will depend on hardware, repository size, programming languages, code structure, exclusion settings, storage and query type. Even so, they help explain the goal: the agent should not have to spend context reconstructing an architecture that can already be indexed.
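The "call tracing up to depth 5" row in the table is, conceptually, a bounded graph traversal. This is a minimal sketch that assumes an in-memory adjacency map rather than the project's SQLite store. It uses a breadth-first search that stops expanding at a hop limit, and the same function handles inbound tracing ("who calls this") once the graph is inverted. The `trace` and `invert` helpers and the example graph are hypothetical.

```python
from collections import deque

def trace(graph, start, max_depth=5):
    """Breadth-first walk from start, returning {function: hop count} up to max_depth hops."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depths[node] == max_depth:
            continue  # depth budget exhausted: keep the node but do not expand it
        for neighbour in graph.get(node, ()):
            if neighbour not in depths:  # the visited check also makes cycles safe
                depths[neighbour] = depths[node] + 1
                queue.append(neighbour)
    return depths

def invert(graph):
    """Turn a caller -> callees map into a callee -> callers map for inbound tracing."""
    reverse = {}
    for caller, callees in graph.items():
        for callee in callees:
            reverse.setdefault(callee, []).append(caller)
    return reverse

# Hypothetical call graph with a cycle (f calls back into a).
CALLS = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["e"], "e": ["f"], "f": ["g", "a"]}

print(trace(CALLS, "a"))                       # {'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4, 'f': 5}
print(trace(invert(CALLS), "c", max_depth=2))  # {'c': 0, 'b': 1, 'a': 2}
```

Here `g` is six hops from `a`, so the depth limit excludes it. The limit is what keeps traces cheap on a densely connected graph.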
Token efficiency is just as relevant. The README states that five structural queries consumed around 3,400 tokens using codebase-memory-mcp, compared with around 412,000 tokens through grep-based and file-reading exploration, a 99.2% reduction. The associated arXiv preprint, titled “Codebase-Memory: Tree-Sitter-Based Knowledge Graphs for LLM Code Exploration via MCP”, describes an evaluation across 31 real-world repositories in which the system achieved 83% answer quality compared with 92% for a file-exploration agent, while using ten times fewer tokens and 2.1 times fewer tool calls.
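The README's 99.2% figure follows directly from its two token counts. Working it through also shows why it should not be confused with the preprint's "ten times fewer tokens". The README comparison amounts to roughly 121 times fewer tokens, but it comes from a different, smaller experiment (five queries versus 31 repositories).

```python
# Figures quoted from the README (both approximate: "around" 3,400 and 412,000 tokens).
graph_tokens = 3_400
exploration_tokens = 412_000

reduction = 1 - graph_tokens / exploration_tokens
ratio = exploration_tokens / graph_tokens

print(f"reduction: {reduction:.1%}")  # reduction: 99.2%
print(f"ratio: {ratio:.0f}x")         # ratio: 121x
```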
That nuance matters. This is not about claiming that the graph always answers better than reading files. The study itself points to a trade-off: slightly lower average answer quality on some tasks in exchange for much greater efficiency. For teams using AI agents daily on large repositories, that reduction in cost, latency and noise may be worth more than the last few points of answer quality gained by reading code in full.
A local binary for sensitive environments
Another strength of codebase-memory-mcp is its local-first approach. The project is distributed as a static binary for macOS, Linux and Windows, without Docker, runtime dependencies or API keys. Indexes are stored in SQLite under the user’s cache directory, and code processing happens on the local machine.
For companies working with private code, this matters. Its maintainers stress that all code remains local, that release binaries are published with SHA-256 checksums, signatures and antivirus scans, and that the source code is available for auditing. Even so, an MCP server of this kind reads the repository and writes configuration into agent tools. In corporate environments, it should therefore go through the usual security review before it is installed.
| Feature | Implication for technical teams |
|---|---|
| Static binary | Simple installation and fewer external dependencies |
| Local processing | Code does not need to leave the machine |
| Persistent SQLite | The graph survives restarts and new sessions |
| MCP | Integrates with compatible agents through structured tools |
| Optional UI | Allows graph exploration at localhost:9749 |
| Multi-agent support | Configures different AI clients with a single command |
| Shared artifact | Allows a compressed graph snapshot to be versioned in the repository |
The tool also includes an interesting option for teams: a compressed .codebase-memory/graph.db.zst artifact. The idea is that a repository can include a compact copy of the graph so other developers do not have to reindex from scratch. In large projects, this can save startup time, although each team must decide whether to version that file or keep indexes purely local.
What it gives the agent
codebase-memory-mcp does not include a language model. It is a structural analysis backend. The agent remains the component that interprets the user’s question and decides which tool to call. The difference is that, instead of asking the model to search blindly, the MCP server offers operations such as index_repository, search_graph, trace_path, detect_changes, query_graph, get_architecture, search_code, get_code_snippet or manage_adr.
| MCP tool | Typical use |
|---|---|
| index_repository | Index a project into the graph |
| search_graph | Search symbols by name, type, file or degree |
| trace_path | See who calls a function and what that function calls |
| detect_changes | Map Git changes to affected symbols |
| query_graph | Run read-only Cypher-like queries |
| get_architecture | Get a high-level view of the project |
| search_code | Search text inside indexed files |
| get_code_snippet | Retrieve the code for a specific symbol |
| manage_adr | Manage persistent architecture decision records |
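An agent invokes these tools through MCP's JSON-RPC 2.0 `tools/call` method. This is a minimal sketch that only builds a single request message, as it would be written to a server connected over the stdio transport. It skips the `initialize` handshake that a real client must complete first. The argument names passed to `trace_path` are hypothetical, because the real input schemas are whatever the server advertises through `tools/list`.

```python
import json
from itertools import count

_request_ids = count(1)

def tool_call(name, arguments):
    """Build one MCP 'tools/call' JSON-RPC 2.0 request, serialised for the stdio transport."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    # Over stdio, each message is one line of JSON with no embedded newlines.
    return json.dumps(request) + "\n"

# Hypothetical arguments: check the server's tools/list output for the real parameter names.
line = tool_call("trace_path", {"function_name": "handler", "direction": "inbound", "depth": 3})
print(line, end="")
```

The server's reply carries the same `id`, which lets the agent match results to requests. The model never needs to read the underlying files to obtain the call structure.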
Beyond code analysis, the project indexes infrastructure elements such as Dockerfiles, Kubernetes manifests and Kustomize overlays. This makes it broader than a simple function search tool: it can relate application components, configuration and infrastructure inside the same graph.
Language support is also broad. The repository claims to integrate Tree-sitter grammars for 158 languages and a Hybrid LSP layer to improve semantic resolution in Python, TypeScript, JavaScript, PHP, C#, Go, C, C++, Java, Kotlin and Rust. The academic preprint describes an earlier version evaluated with 66 languages, which suggests that coverage has expanded since publication.
Why this matters for AI-assisted development
The rise of AI coding agents has popularised a risky idea: assuming that giving a model access to a repository is enough for it to understand the codebase. In reality, context remains expensive, limited and fragile. An LLM can generate a convincing answer from incomplete fragments, but that does not mean it has understood the real dependencies of the system.
Graph-based tools do not remove that risk, but they reduce it in one specific area: structural retrieval. When the agent needs to understand relationships, routes, calls or impact, it can rely on data extracted from the code instead of reconstructing it through repeated reads. This does not replace tests, human review, traditional static analysis or production observability, but it improves the starting point.
codebase-memory-mcp fits into a wider trend: AI agents need external memory, specialised tools and structured context. Expanding context windows is not enough. In large repositories, putting more text into the prompt often increases both cost and noise. The alternative is for the agent to read less, but read better. A persistent code graph points exactly in that direction.
The project still needs to prove sustained adoption, consistent quality across more languages, stability in large teams and real value compared with established tools such as IDE indexers, LSIF, Sourcegraph, OpenGrok, Semgrep, CodeQL, ctags or the repo maps used by assistants such as Aider. Even so, its proposal is timely: a layer of structural intelligence inside the MCP workflow that agents can query locally and cheaply.
The technical conclusion is cautious but clear. For small repositories, a graph infrastructure may not be necessary. For large systems, monorepos, microservices or legacy codebases, forcing AI to read files as if it were a developer lost in a directory tree is starting to look inefficient. codebase-memory-mcp does not promise that the model will stop making mistakes, but it does give it a smarter way to look at the code before answering.
Frequently asked questions
What is codebase-memory-mcp?
It is a code intelligence MCP server that indexes repositories into a persistent graph so AI agents can query functions, classes, calls, routes, dependencies and changes in a structured way.
Does it replace a vector search engine or traditional RAG?
Not exactly. It can complement them, but its strength lies in structural code queries: calls, dependencies, change impact, HTTP routes and architecture.
Does it run locally?
Yes. The project is distributed as a static binary for macOS, Linux and Windows, uses SQLite for persistence and does not require API keys, Docker or external services to run.
Is it recommended for private repositories?
It may be a good fit because it processes code locally, but any team should audit the binary, review the source code and check it against its security policies before integrating it into internal workflows.
