Cross Canon: Implementing the Bible as a RAG Database
Cross Canon enables semantic scripture search via RAG
Cross Canon is a technical implementation that treats the Bible as a Retrieval-Augmented Generation (RAG) database, allowing users to perform semantic searches across biblical texts. Unlike traditional keyword search, this approach enables the retrieval of passages based on conceptual meaning, such as finding references to "government" that include both theological discussions in Romans 13 and historical decrees in Daniel and Ezra.
Core Functionality and User Experience
Cross Canon provides a specialized interface for querying indexed scripture. The system allows users to:
- Filter by Book: Users can specify a particular book of the Bible (e.g., Genesis, Matthew, Revelation) to narrow the search scope or leave the field blank to search the entire indexed corpus.
- Semantic Retrieval: The engine matches passages by the meaning of the query rather than by exact wording. Users report that this is effective for uncovering less obvious references, such as "giants" other than the well-known Nephilim and Goliath.
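The two features above can be illustrated with a minimal sketch. This is not Cross Canon's actual code: the `Verse` record, the `search` function, and the two-dimensional vectors are all assumptions made up for illustration (the axes loosely stand for "government" and "giants"), and the `summary` fields are paraphrases rather than quotations. A real system would obtain high-dimensional vectors from an embedding model.

```python
import math
from dataclasses import dataclass


@dataclass
class Verse:
    book: str
    ref: str
    summary: str            # paraphrase for illustration, not a quotation
    embedding: list[float]  # toy 2-d vector: (government, giants)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_embedding, verses, book=None, top_k=3):
    """Rank verses by similarity to the query, optionally within one book."""
    candidates = [
        v for v in verses
        if book is None or v.book.lower() == book.lower()
    ]
    ranked = sorted(
        candidates,
        key=lambda v: cosine(query_embedding, v.embedding),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical index with hand-made vectors (assumption, not real embeddings).
INDEX = [
    Verse("Romans", "Romans 13:1", "subjection to governing authorities", [0.9, 0.1]),
    Verse("Ezra", "Ezra 1:1", "proclamation of Cyrus king of Persia", [0.8, 0.0]),
    Verse("Daniel", "Daniel 6:9", "royal decree signed by Darius", [0.85, 0.05]),
    Verse("Genesis", "Genesis 6:4", "the Nephilim on the earth", [0.05, 0.9]),
    Verse("1 Samuel", "1 Samuel 17:4", "Goliath of Gath", [0.1, 0.95]),
    Verse("Deuteronomy", "Deuteronomy 3:11", "Og king of Bashan", [0.2, 0.8]),
]

# A query vector pointing along the "giants" axis; no book filter.
for verse in search([0.0, 1.0], INDEX):
    print(verse.ref, "-", verse.summary)
```

Passing `book="Romans"` restricts the candidates before ranking, which mirrors leaving the book field filled in or blank in the interface.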
Technical Considerations and Community Feedback
Community discussion surrounding the project highlights several technical opportunities and limitations inherent in applying RAG to religious texts:
Canon Scope and Inclusivity
Users have suggested that for a comprehensive implementation, the database should include various versions of the biblical canon. Specifically, recommendations include adding the Deuterocanonical books and expanding the index to include Ethiopian, Catholic, and Orthodox canons to provide a more complete scholarly tool.
Performance and Implementation
Initial user feedback indicates that while the semantic results are conceptually accurate, the system can be slow. Technical contributors in the community have suggested several optimizations for RAG pipelines, including:
- Embedding Models: The use of GTR-T5 for generating fast and free embeddings.
- Hybrid Retrieval: Implementing hybrid modes, which typically combine keyword search with embedding-based search, to improve speed and accuracy.
- Entity Extraction: Using small local models to extract entities for graph-based retrieval.
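As one possible reading of the hybrid retrieval suggestion, the sketch below merges a keyword ranking with a vector ranking using reciprocal rank fusion (RRF). The community discussion does not specify a fusion method, so RRF is a swapped-in technique chosen here for illustration. The function names, the sample summaries, and the hard-coded `vector_ranking` (standing in for the output of an embedding search) are all assumptions.

```python
from collections import defaultdict


def keyword_rank(query, docs):
    """Lexical leg: rank doc ids by how many query terms appear in the text."""
    terms = set(query.lower().split())
    scores = {
        doc_id: len(terms & set(text.lower().split()))
        for doc_id, text in docs.items()
    }
    hits = [doc_id for doc_id, score in scores.items() if score > 0]
    return sorted(hits, key=lambda d: (-scores[d], d))


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists: each list adds 1 / (k + rank) to a document's score."""
    fused = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    # Ties are broken by doc id so the output is deterministic.
    return sorted(fused, key=lambda d: (-fused[d], d))


# Hypothetical corpus of paraphrased summaries (not quotations).
DOCS = {
    "Romans 13:1": "governing authorities instituted by God",
    "Ezra 1:1": "decree of Cyrus king of Persia",
    "Genesis 6:4": "Nephilim on the earth",
}

keyword_ranking = keyword_rank("king decree", DOCS)

# Assumed output of an embedding search for the same query (hard-coded here).
vector_ranking = ["Romans 13:1", "Ezra 1:1", "Genesis 6:4"]

fused_ranking = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
print(fused_ranking)
```

A passage that appears in both lists accumulates score from each, so it rises above passages found by only one retriever. The constant `k=60` is a commonly used default for RRF and can be tuned.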
Comparative Religious RAG Projects
Cross Canon is part of a broader trend of applying RAG to sacred texts. Similar projects mentioned by the community include:
- Reminder.dev: An open-source RAG implementation for the Quran, which also indexes the Hadith and the names of Allah using OpenAI embeddings.
- Crazy.church: A project using Cloudflare Vectorize for embeddings to compare verses across the "big three" religions.