sem: a semantic version control tool that tracks code changes at the entity level instead of lines
What it solves
Traditional version control systems like Git track changes by lines of text, which often obscures the actual logic changes. sem provides semantic version control by tracking changes at the entity level (functions, classes, and methods) rather than lines, making it easier to understand what actually changed in the code.
How it works
sem operates as a layer on top of Git. It uses tree-sitter to parse code across 32 programming languages and structured data formats, extracting entities as distinct objects. It then matches entities between versions in three phases: exact ID match, structural hashing (which ignores whitespace and comments), and fuzzy similarity. This lets it detect renamed and moved logic instead of reporting only additions and deletions.
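The three matching phases can be sketched roughly as follows. This is a minimal illustration, not sem's actual implementation: the `Entity` class, the `path::name` ID format, the 0.8 similarity threshold, and the helper names are all assumptions, and Python's `tokenize` module stands in for tree-sitter, so the sketch only handles Python source.

```python
import difflib
import hashlib
import io
import tokenize
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    entity_id: str  # hypothetical ID format: "path::name"
    source: str     # raw source text of the entity


def structural_hash(source: str) -> str:
    """Hash the token stream, ignoring comments, blank lines, and spacing."""
    parts = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in (tokenize.COMMENT, tokenize.NL,
                        tokenize.NEWLINE, tokenize.ENDMARKER):
            continue
        if tok.type == tokenize.INDENT:
            parts.append("<INDENT>")   # normalize indentation width
        elif tok.type == tokenize.DEDENT:
            parts.append("<DEDENT>")
        else:
            parts.append(tok.string)
    return hashlib.sha256("\x00".join(parts).encode()).hexdigest()


def match_entities(old, new, threshold=0.8):
    """Return (old_id, new_id, phase) tuples for entities matched across versions."""
    matches = []
    new_left = {e.entity_id: e for e in new}

    # Phase 1: exact ID match.
    old_left = []
    for e in old:
        if e.entity_id in new_left:
            matches.append((e.entity_id, e.entity_id, "exact_id"))
            del new_left[e.entity_id]
        else:
            old_left.append(e)

    # Phase 2: identical structure under a different ID (for example, a move).
    by_hash = {}
    for e in new_left.values():
        by_hash.setdefault(structural_hash(e.source), []).append(e)
    unmatched = []
    for e in old_left:
        candidates = by_hash.get(structural_hash(e.source))
        if candidates:
            target = candidates.pop(0)
            matches.append((e.entity_id, target.entity_id, "structural_hash"))
            del new_left[target.entity_id]
        else:
            unmatched.append(e)

    # Phase 3: fuzzy text similarity (for example, a rename with small edits).
    for e in unmatched:
        best_id, best_ratio = None, threshold
        for cand in new_left.values():
            ratio = difflib.SequenceMatcher(None, e.source, cand.source).ratio()
            if ratio >= best_ratio:
                best_id, best_ratio = cand.entity_id, ratio
        if best_id is not None:
            matches.append((e.entity_id, best_id, "fuzzy"))
            del new_left[best_id]

    return matches
```

In this sketch, an entity left unmatched on the old side would be reported as deleted and one left over on the new side as added. Running the cheap, strict phases first keeps the more expensive fuzzy comparison limited to the entities that are still unexplained.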
Who it’s for
- Developers who want more meaningful diffs and impact analysis of their changes.
- AI Coding Agents that need precise, token-efficient context (via the Model Context Protocol) to understand dependencies and refactor code without reading entire files.
- CI/CD Pipelines that require structured JSON output for automated code analysis.
Highlights
- Entity-level Diffing: See exactly which functions or classes were modified, renamed, or moved.
- Impact Analysis: A cross-file dependency graph that identifies what will break if a specific entity is changed.
- LLM Context Optimization: A `context` command that provides token-budgeted snippets of an entity and its dependencies for AI agents.
- MCP Server: Built-in support for the Model Context Protocol, allowing AI agents to use `sem` tools as native skills.
- Wide Language Support: Full entity extraction for 32 languages, including TypeScript, Python, Rust, Go, and Java.
- Git Integration: Can be configured as the default `git diff` output via `sem setup`.
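To make the impact analysis and token-budgeted context ideas concrete, here is a small sketch over a toy dependency graph. It is an illustration only and does not reflect sem's internals or output format: the entity names, the `DEPENDS_ON` and `SOURCES` tables, both function names, and the whitespace-based token count are hypothetical.

```python
from collections import deque

# Hypothetical edges: entity -> entities it calls or references.
DEPENDS_ON = {
    "api.handle_order": ["billing.charge", "db.save_order"],
    "billing.charge": ["billing.tax_rate"],
    "reports.monthly": ["billing.tax_rate"],
    "db.save_order": [],
    "billing.tax_rate": [],
}

# Hypothetical source snippets for each entity.
SOURCES = {
    "api.handle_order": "def handle_order(order): charge(order); save_order(order)",
    "billing.charge": "def charge(order): return order.total * tax_rate()",
    "reports.monthly": "def monthly(orders): return sum(o.total for o in orders) * tax_rate()",
    "db.save_order": "def save_order(order): db.insert(order)",
    "billing.tax_rate": "def tax_rate(): return 0.2",
}


def impacted_by(changed, depends_on):
    """Return every entity that depends on `changed`, directly or transitively."""
    # Invert the edges so we can walk from an entity to its dependents.
    dependents = {}
    for src, targets in depends_on.items():
        for target in targets:
            dependents.setdefault(target, set()).add(src)

    seen, queue = set(), deque([changed])
    while queue:
        current = queue.popleft()
        for dep in sorted(dependents.get(current, ())):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


def build_context(entity, depends_on, sources, budget):
    """Pick the entity, then its dependencies breadth-first, within a token budget."""
    picked, used = [], 0
    seen, queue = {entity}, deque([entity])
    while queue:
        current = queue.popleft()
        cost = len(sources[current].split())  # crude token estimate (assumption)
        if used + cost > budget:
            continue  # snippet does not fit; its own dependencies are not expanded
        picked.append(current)
        used += cost
        for dep in depends_on.get(current, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return picked


if __name__ == "__main__":
    print(sorted(impacted_by("billing.tax_rate", DEPENDS_ON)))
    print(build_context("api.handle_order", DEPENDS_ON, SOURCES, budget=10))
```

Impact analysis walks the graph backwards (who depends on this entity), while context building walks it forwards (what this entity depends on). The breadth-first order means the nearest dependencies claim the budget before more distant ones.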
Sources
- Ataraxy-Labs/sem