Architecture
The pipeline
A verifiable_rag.Pipeline is a strict left-to-right composition of pluggable components:
flowchart LR
    PDF([PDF / DOCX]) --> Parser
    Parser --> Document([Document<br/>spans preserved])
    Document --> Chunker
    Chunker --> Embedder
    Embedder --> Indexer
    Query([Query]) --> Retriever
    Indexer --> Retriever
    Retriever --> Reranker
    Reranker --> Generator
    Generator --> Verifier
    Verifier --> Abstention
    Abstention --> Answer([Answer<br/>cited + verified])
Every step is a Protocol — swap any component by passing a different implementation:
from verifiable_rag import Pipeline
from verifiable_rag.parsers import DoclingParser
from verifiable_rag.chunkers import ParentChildChunker
from verifiable_rag.embedders import CohereEmbedder
from verifiable_rag.indexers import HybridIndex, LanceDBIndex, BM25Index
from verifiable_rag.rerankers import CohereReranker
from verifiable_rag.generators import ConstrainedCitedGenerator
from verifiable_rag.verifiers import DualNLIVerifier, HHEMVerifier, MiniCheckVerifier

pipeline = Pipeline(
    parser=DoclingParser(),
    chunker=ParentChildChunker(max_child_tokens=400),
    embedder=CohereEmbedder(),
    indexer=HybridIndex(dense=LanceDBIndex(uri="my_idx"), sparse=BM25Index()),
    reranker=CohereReranker(),
    generator=ConstrainedCitedGenerator(model="anthropic/claude-haiku-4-5"),
    verifier=DualNLIVerifier(HHEMVerifier(), MiniCheckVerifier()),
    strictness="balanced",
    top_k_retrieve=100,
    top_k_rerank=10,
)
For most use cases you'll never write this — call hybrid_balanced() or Pipeline.from_yaml() instead.
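As a rough illustration of what "every step is a Protocol" means in practice, the sketch below defines a structural interface and a toy component that satisfies it without inheriting from anything. The `Reranker` protocol, its `rerank` signature, and the `ScoredChunk` type are assumptions made for this example, not the library's actual definitions.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence, runtime_checkable


@dataclass(frozen=True)
class ScoredChunk:
    """Hypothetical stand-in for a retrieved chunk with a relevance score."""
    chunk_id: str
    text: str
    score: float


@runtime_checkable
class Reranker(Protocol):
    """Assumed shape of a reranker interface; the real method name may differ."""

    def rerank(
        self, query: str, chunks: Sequence[ScoredChunk], top_k: int
    ) -> list[ScoredChunk]: ...


class KeywordOverlapReranker:
    """Toy reranker: scores each chunk by how many query words it contains."""

    def rerank(
        self, query: str, chunks: Sequence[ScoredChunk], top_k: int
    ) -> list[ScoredChunk]:
        query_words = set(query.lower().split())
        rescored = [
            ScoredChunk(
                c.chunk_id,
                c.text,
                float(len(query_words & set(c.text.lower().split()))),
            )
            for c in chunks
        ]
        # sorted() is stable, so ties keep their retrieval order.
        return sorted(rescored, key=lambda c: c.score, reverse=True)[:top_k]


candidates = [
    ScoredChunk("c1", "spans are preserved by the parser", 0.2),
    ScoredChunk("c2", "the verifier checks every cited sentence", 0.9),
    ScoredChunk("c3", "the parser keeps character offsets", 0.5),
]
reranker = KeywordOverlapReranker()
top = reranker.rerank("which parser keeps offsets", candidates, top_k=2)
```

Because the check is structural, any object with a matching `rerank` method would be accepted wherever a `Reranker` is expected, which is the property that makes components swappable.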
The data model
The library has one rigid hierarchy that every component honors:
Document
├── Section(id, title)
│   ├── Paragraph(id)
│   │   ├── Sentence(id, text, span)
│   │   └── Sentence(id, text, span)
│   └── Paragraph(id)
└── Section(id, title)
    └── ...
Every Sentence has:
- A stable id (e.g. "paper::s142")
- The text content
- A Span with (doc_id, char_start, char_end, optional bboxes)
Sentences are the atomic unit of citation. Every generated CitedSentence references Sentence.id values from the source Document, and each of those resolves to exact character offsets in the parsed text.
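The hierarchy can be pictured with plain dataclasses. This is a minimal sketch, not the library's own classes: the container field names (`sections`, `paragraphs`, `sentences`), the `text` field on `Document`, and the `iter_sentences` helper are assumptions, and only `id`, `title`, `text`, `span`, and the `Span` fields come from the description above.

```python
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass(frozen=True)
class Span:
    doc_id: str
    char_start: int
    char_end: int
    bboxes: Optional[tuple] = None  # optional bounding boxes


@dataclass(frozen=True)
class Sentence:
    id: str
    text: str
    span: Span


@dataclass
class Paragraph:
    id: str
    sentences: list[Sentence] = field(default_factory=list)


@dataclass
class Section:
    id: str
    title: str
    paragraphs: list[Paragraph] = field(default_factory=list)


@dataclass
class Document:
    doc_id: str
    text: str  # assumed: the full parsed text that spans index into
    sections: list[Section] = field(default_factory=list)

    def iter_sentences(self) -> Iterator[Sentence]:
        for section in self.sections:
            for paragraph in section.paragraphs:
                yield from paragraph.sentences

    def sentence_by_id(self, sid: str) -> Sentence:
        for sentence in self.iter_sentences():
            if sentence.id == sid:
                return sentence
        raise KeyError(sid)


source = "Spans are preserved. Citations point at sentences."
parts = ["Spans are preserved.", "Citations point at sentences."]
sentences = []
for i, part in enumerate(parts):
    start = source.index(part)  # record where the sentence sits in the source
    sentences.append(
        Sentence(f"paper::s{i}", part, Span("paper", start, start + len(part)))
    )

doc = Document(
    "paper", source, [Section("sec_1", "Intro", [Paragraph("sec_1::p1", sentences)])]
)
hit = doc.sentence_by_id("paper::s1")
```

Looking a sentence up by ID and slicing the source with its span should give back the same text, which is what lets a citation resolve to an exact location.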
Span preservation invariant
The core rule that everything else builds on:
If at any point a generated sentence in the answer cannot be traced back to (doc_id, page, char_start, char_end) in the source, the system is broken.
Every parser, chunker, retriever, and reranker must preserve character offsets. This is enforced by round-trip tests in CI: parse a document, walk through every transformation, and verify that the final sentence_by_id(sid).text exactly matches the source span content.
This invariant is what makes the library's auditability claim real. The HTML audit report turns every cite ID into an anchor link to the source passage — that link works because the spans never lost their offsets.
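A round-trip check of this kind might look roughly like the following. It is a sketch under stated assumptions: the naive regex splitter stands in for a real parser, and `SpannedSentence`, `split_with_offsets`, and `broken_spans` are hypothetical names, not part of the library.

```python
import re
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SpannedSentence:
    id: str
    text: str
    doc_id: str
    char_start: int
    char_end: int


def split_with_offsets(doc_id: str, source: str) -> list[SpannedSentence]:
    """Naive splitter that records where each sentence sits in the source."""
    sentences = []
    # A sentence starts at a non-space character and runs to the next
    # '.', '!' or '?' (or to the end of the string).
    for i, match in enumerate(re.finditer(r"[^.!?\s][^.!?]*[.!?]?", source)):
        sentences.append(
            SpannedSentence(
                f"{doc_id}::s{i}", match.group(), doc_id, match.start(), match.end()
            )
        )
    return sentences


def broken_spans(source: str, sentences: list[SpannedSentence]) -> list[str]:
    """Return the ids of sentences whose text no longer matches their span."""
    return [s.id for s in sentences if source[s.char_start:s.char_end] != s.text]


source = "Spans survive parsing.  Offsets stay exact! Do they?"
sentences = split_with_offsets("paper", source)

# Simulate a transformation that rewrites text without updating offsets.
damaged = [replace(sentences[0], text=sentences[0].text.upper())] + sentences[1:]

intact_failures = broken_spans(source, sentences)
damaged_failures = broken_spans(source, damaged)
```

A CI test in this style would assert that the list of broken spans is empty after every transformation in the pipeline.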
Citation granularity ≠ chunking granularity
A common confusion:
| Question | Answer |
|---|---|
| What's the retrieval atom? | A Chunk (~400 tokens by default) |
| What's the citation atom? | A Sentence (one human sentence) |
| Why decoupled? | Retrieval works better at chunk granularity; citations are more useful at sentence granularity. |
When the chunker emits a chunk, it carries sentence_ids — the list of every sentence inside that chunk. The retriever returns chunks; the generator emits cited sentences referencing the source sentences inside those chunks; the verifier checks each generated sentence against the cited source sentences (not the full chunk).
You can chunk at 512 tokens for retrieval and still emit sentence-level citations. The IDs travel through.
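The decoupling can be sketched as follows: chunks carry `sentence_ids`, and the evidence handed to a verifier is looked up by sentence ID rather than taken from the chunk text. The `cites` field on `CitedSentence` and the helper functions are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    id: str
    text: str
    sentence_ids: tuple[str, ...]  # every source sentence inside this chunk


@dataclass(frozen=True)
class CitedSentence:
    text: str
    cites: tuple[str, ...]  # assumed field name; holds Sentence.id values


def citable_ids(retrieved: list[Chunk]) -> set[str]:
    """Sentence ids the generator may cite for this query."""
    return {sid for chunk in retrieved for sid in chunk.sentence_ids}


def evidence_for(
    cited: CitedSentence, sentences: dict[str, str], retrieved: list[Chunk]
) -> list[str]:
    """Return only the cited source sentences, not the full chunk text."""
    allowed = citable_ids(retrieved)
    unknown = [sid for sid in cited.cites if sid not in allowed]
    if unknown:
        raise ValueError(f"cites sentences outside the retrieved chunks: {unknown}")
    return [sentences[sid] for sid in cited.cites]


sentences = {
    "paper::s1": "The chunker emits child chunks.",
    "paper::s2": "Each chunk lists its sentence ids.",
    "paper::s3": "The verifier reads cited sentences only.",
}
chunk = Chunk(
    "c1",
    sentences["paper::s1"] + " " + sentences["paper::s2"],
    ("paper::s1", "paper::s2"),
)
claim = CitedSentence("Chunks carry sentence ids.", ("paper::s2",))
evidence = evidence_for(claim, sentences, [chunk])
```

Changing the chunk size only changes how many IDs each chunk lists; the citation lookup itself is unaffected.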
Parent-child chunking
The default chunker (ParentChildChunker) produces small child chunks (~400 tokens) for retrieval precision, but stamps section_id and paragraph_id metadata on each so a parent context can be expanded at generation time:
chunk.metadata = {
    "section_id": "sec_3",
    "paragraph_id": "sec_3::p1",
    "paragraph_ids": ("sec_3::p1",),
    "page_first": 4,
    "page_last": 4,
    # ...
}
The ParentExpander consumes that metadata to compute the larger context the LLM should actually see — typically the full section or a window of nearby paragraphs.
End-to-end:
- Retrieve with small precise chunks → tight semantic match
- Generate with expanded parent context → LLM has surrounding text
- Cite by sentence_id → exact source location
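One way to picture the expansion step is a function that reads `section_id` and `paragraph_id` from the chunk metadata and returns a window of neighbouring paragraphs. This is only a sketch: `expand_to_window`, `ChildChunk`, and the `paragraphs_by_section` lookup are hypothetical, and the actual ParentExpander interface may look quite different.

```python
from dataclasses import dataclass, field


@dataclass
class ChildChunk:
    id: str
    text: str
    metadata: dict = field(default_factory=dict)


def expand_to_window(
    chunk: ChildChunk,
    paragraphs_by_section: dict[str, list[tuple[str, str]]],
    window: int = 1,
) -> str:
    """Return the chunk's paragraph plus `window` neighbours on each side."""
    section = paragraphs_by_section[chunk.metadata["section_id"]]
    ids = [pid for pid, _ in section]
    center = ids.index(chunk.metadata["paragraph_id"])
    lo = max(0, center - window)
    hi = min(len(section), center + window + 1)
    return "\n\n".join(text for _, text in section[lo:hi])


# Ordered (paragraph_id, text) pairs per section; made-up sample content.
paragraphs_by_section = {
    "sec_3": [
        ("sec_3::p0", "Setup paragraph."),
        ("sec_3::p1", "The matched paragraph."),
        ("sec_3::p2", "Follow-up paragraph."),
        ("sec_3::p3", "Unrelated tail."),
    ]
}
chunk = ChildChunk(
    "c7",
    "The matched paragraph.",
    {"section_id": "sec_3", "paragraph_id": "sec_3::p1"},
)
context = expand_to_window(chunk, paragraphs_by_section, window=1)
```

Passing a large enough `window` returns the full section, which corresponds to the other expansion strategy mentioned above.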
Hard rules
These are tested in CI and called out explicitly because they're the contract the library makes:
- Span preservation is non-negotiable. Every transformation preserves character offsets. Round-trip tests verify this.
- Citation granularity ≠ chunking granularity. Don't let a "simplification" collapse sentence_ids into chunk-level metadata.
- Faithfulness verification is mandatory in strict and paranoid modes. A strict-mode answer that didn't actually run verification is a bug, not a feature.
- No silent failures in the citation layer. If the generator can't find supporting sentences for a claim, the claim is flagged or stripped, never returned silently as if cited.
- Eval before optimization. Don't tune thresholds, swap embedders, or "improve" anything without running the benchmark suite first.
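Two of the rules above, mandatory verification and no silent citation failures, can be sketched as a small guard at the end of the pipeline. The `finalize` function, the `Claim` and `Answer` types, and the `verify` callback are hypothetical, and this sketch only flags unsupported claims where the library may flag or strip them.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Claim:
    text: str
    cites: tuple[str, ...]


@dataclass(frozen=True)
class Answer:
    kept: tuple[Claim, ...]
    flagged: tuple[Claim, ...]
    verified: bool


def finalize(
    claims: list[Claim],
    strictness: str,
    verify: Optional[Callable[[Claim], bool]] = None,
) -> Answer:
    # Rule: strict and paranoid answers must actually run verification.
    if strictness in {"strict", "paranoid"} and verify is None:
        raise RuntimeError(f"{strictness} mode requires a verifier")
    kept, flagged = [], []
    for claim in claims:
        if not claim.cites:
            # Rule: an uncited claim is never returned as if it were cited.
            flagged.append(claim)
        elif verify is not None and not verify(claim):
            flagged.append(claim)
        else:
            kept.append(claim)
    return Answer(tuple(kept), tuple(flagged), verified=verify is not None)


claims = [
    Claim("Spans are preserved.", ("paper::s1",)),
    Claim("The library is fast.", ()),
]
# The lambda is a placeholder verifier that accepts every cited claim.
answer = finalize(claims, "strict", verify=lambda claim: True)
```

Calling `finalize(claims, "strict")` without a verifier raises instead of returning an unverified answer, which is the behaviour the third rule asks for.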
These rules are why the library exists. They're also why it's the right choice for use cases where wrong answers are worse than no answer.