semble: what it is, what problem it solves & why it's gaining traction
What it solves
Semble provides fast, token-efficient code search for AI agents. It solves the problem of agents needing to explore large codebases without consuming massive amounts of tokens by reading entire files or relying on imprecise grep-style searches. It allows agents to retrieve only the exact code snippets relevant to a natural-language query, reducing token usage by up to 98% compared to the traditional grep-and-read approach.
How it works
Semble uses a hybrid retrieval system that runs entirely on CPU without requiring GPUs or API keys. It processes codebases by splitting files into code-aware chunks using tree-sitter and then employs two complementary search methods:
- Semantic Search: Uses static Model2Vec embeddings (via the potion-code-16M model) for semantic similarity.
- Lexical Search: Uses BM25 for exact matches on identifiers and API names.
These results are fused using Reciprocal Rank Fusion (RRF) and then refined by a set of code-aware ranking signals, such as boosting definitions over references and penalizing noise (e.g., test files or legacy shims).
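The fusion and re-ranking step can be illustrated with a minimal Python sketch. It is not Semble's actual code: the constant k=60 is the value commonly used for RRF, and the boost and penalty factors, the chunk IDs, and the metadata fields (path, is_definition) are all hypothetical, chosen only to show the idea.

```python
from collections import defaultdict

# Hypothetical tuning constants; Semble's real values are not documented here.
RRF_K = 60
DEFINITION_BOOST = 1.2
TEST_PENALTY = 0.5


def reciprocal_rank_fusion(rankings, k=RRF_K):
    """Fuse ranked lists of chunk IDs: each list adds 1 / (k + rank) per chunk."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    return dict(scores)


def rerank(scores, chunks):
    """Apply simple code-aware signals, then sort by adjusted score (best first)."""
    adjusted = {}
    for chunk_id, score in scores.items():
        meta = chunks[chunk_id]
        if meta["is_definition"]:
            score *= DEFINITION_BOOST  # prefer definitions over references
        if meta["path"].startswith("tests/"):
            score *= TEST_PENALTY  # push test files down as likely noise
        adjusted[chunk_id] = score
    return sorted(adjusted, key=adjusted.get, reverse=True)


# Hypothetical chunk metadata and per-retriever rankings (best match first).
chunks = {
    "auth.py:login": {"path": "auth.py", "is_definition": True},
    "tests/test_auth.py:test_login": {"path": "tests/test_auth.py", "is_definition": True},
    "session.py:refresh": {"path": "session.py", "is_definition": False},
    "utils.py:hash_password": {"path": "utils.py", "is_definition": True},
}
semantic = ["auth.py:login", "tests/test_auth.py:test_login", "session.py:refresh"]
lexical = ["tests/test_auth.py:test_login", "auth.py:login", "utils.py:hash_password"]

fused = reciprocal_rank_fusion([semantic, lexical])
ranked = rerank(fused, chunks)
print(ranked)
```

In this example the two retrievers disagree on the top hit, so RRF alone leaves the definition and its test tied. The test-file penalty breaks the tie in favor of the definition.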
Who it’s for
Developers building or using AI coding agents (such as Claude Code, Cursor, Codex, or OpenCode) who want to provide their agents with instant, low-cost access to any local or remote repository.
Highlights
- Extreme Speed: Indexes repositories in ~250ms and answers queries in ~1.5ms on CPU.
- Token Efficiency: Returns only relevant chunks, significantly reducing the context window load for agents.
- Zero Setup: No API keys, GPUs, or external services required.
- MCP Server Support: Integrates as a Model Context Protocol (MCP) server, allowing agents to call search as a native tool.
- Flexible Indexing: Supports local paths and git URLs, with automatic cache invalidation based on file changes.
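One plausible way to invalidate a cache based on file changes is to fingerprint the repository and rebuild the index when the fingerprint differs. The sketch below is an assumption for illustration only: the function names repo_fingerprint and needs_reindex are hypothetical, and Semble's actual invalidation scheme may work differently.

```python
import hashlib
from pathlib import Path


def repo_fingerprint(root):
    """Hash the relative path, size, and modification time of every file under root."""
    digest = hashlib.sha256()
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            stat = path.stat()
            entry = f"{path.relative_to(root)}:{stat.st_size}:{stat.st_mtime_ns}"
            digest.update(entry.encode("utf-8"))
    return digest.hexdigest()


def needs_reindex(root, cached_fingerprint):
    """Return True when the files under root no longer match the cached fingerprint."""
    return repo_fingerprint(root) != cached_fingerprint
```

Using file metadata rather than file contents keeps the check cheap on large repositories. The trade-off is that an edit preserving both size and modification time would go unnoticed.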
Sources
- MinishLab/semble