Models
The library's core data model. Every component honors this hierarchy.
Answer
dataclass
Answer(query: str, sentences: list[CitedSentence], faithfulness_score: float, faithfulness_components: FaithfulnessComponents, unsupported_claims: list[str], retrieved_chunks: list[RetrievedChunk], verification_results: list[VerificationResult], strictness: Strictness = 'balanced', was_refused: bool = False, refusal_reason: str | None = None)
The complete output of a pipeline.ask() call.
supported_sentences
property
Sentences the verifier marked is_supported=True, plus any sentence it never checked.
A sentence with no matching VerificationResult counts as
supported — verification didn't run, so we don't penalize it.
Use answer.unsupported_sentences for the strict complement.
unsupported_sentences
property
Sentences explicitly flagged is_supported=False by the verifier.
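A minimal sketch of how the two properties partition answer.sentences, assuming results are matched to sentences by cited_sentence_index. The CitedSentence and VerificationResult classes below are stripped-down stand-ins rather than the library's classes, and partition() is a hypothetical helper:

```python
from dataclasses import dataclass


# Hypothetical stand-ins holding only the fields this sketch needs.
@dataclass(frozen=True)
class CitedSentence:
    text: str


@dataclass(frozen=True)
class VerificationResult:
    cited_sentence_index: int
    is_supported: bool


def partition(
    sentences: list[CitedSentence],
    results: list[VerificationResult],
) -> tuple[list[CitedSentence], list[CitedSentence]]:
    """Split sentences into (supported, unsupported)."""
    # Map each checked sentence index to its verdict.
    verdict = {r.cited_sentence_index: r.is_supported for r in results}
    # No result for index i means no check ran, so the sentence counts as supported.
    supported = [s for i, s in enumerate(sentences) if verdict.get(i, True)]
    # Only an explicit is_supported=False marks a sentence unsupported.
    unsupported = [s for i, s in enumerate(sentences) if verdict.get(i) is False]
    return supported, unsupported
```

Every sentence lands in exactly one of the two lists, which is what makes unsupported_sentences the strict complement of supported_sentences.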
cited_sentence_ids
property
Union of every source sentence_id cited across all sentences.
Useful for "which sentences from the corpus did this answer pull
from?" — without re-walking answer.sentences.
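The union can be sketched as below. The stand-in CitedSentence keeps only the supporting_sentence_ids field named in the CitedSentence description; returning a set is an assumption, since the property's return type isn't shown in this reference:

```python
from dataclasses import dataclass


# Hypothetical stand-in: CitedSentence reduced to its citation field.
@dataclass(frozen=True)
class CitedSentence:
    text: str
    supporting_sentence_ids: tuple[str, ...] = ()


def cited_sentence_ids(sentences: list[CitedSentence]) -> set[str]:
    """Union of every source sentence ID cited anywhere in the answer."""
    ids: set[str] = set()
    for sent in sentences:
        # An empty tuple (no citations) contributes nothing.
        ids.update(sent.supporting_sentence_ids)
    return ids
```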
nli_scores
property
Per-sentence NLI scores in verification_results order.
min_nli_score
property
The worst-case sentence-level NLI score.
Returns 1.0 when no verifier ran — there's no evidence of unfaithfulness in the absence of a check.
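Assuming nli_scores is a plain list of floats, the 1.0 fallback falls out of min() with a default. A sketch, not the library's code:

```python
def min_nli_score(nli_scores: list[float]) -> float:
    """Worst-case sentence-level NLI score; 1.0 when no verifier ran."""
    # default= covers the empty case: no checks means no evidence of unfaithfulness.
    return min(nli_scores, default=1.0)
```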
verification_for
Return the VerificationResult for sentence_idx, or None.
Lets callers map a CitedSentence index back to its NLI check
without manually walking verification_results:
    import logging

    log = logging.getLogger(__name__)

    for i, sent in enumerate(answer.sentences):
        vr = answer.verification_for(i)
        if vr is not None and not vr.is_supported:
            log.warning("unsupported: %r", sent.text)
Source code in src/verifiable_rag/models/answer.py
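A plausible lookup behind verification_for(), sketched as a free function over a stand-in VerificationResult whose fields are copied from the VerificationResult signature in this reference (the optional supporting_span is omitted). Whether the library scans, builds an index, or handles duplicate indices differently is not stated:

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical stand-in mirroring the documented VerificationResult fields.
@dataclass(frozen=True)
class VerificationResult:
    cited_sentence_index: int
    claim_text: str
    is_supported: bool
    nli_score: float


def verification_for(
    results: list[VerificationResult], sentence_idx: int
) -> VerificationResult | None:
    """Return the first result whose cited_sentence_index matches, else None."""
    # Linear scan; a dict keyed by index would suit repeated lookups better.
    return next(
        (r for r in results if r.cited_sentence_index == sentence_idx), None
    )
```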
audit_trail
Structured audit trail as a JSON-serializable dict.
Drop-in for observability stacks — emit this on every answer to track faithfulness over time, alert on unsupported claims, or slice metrics by upstream model. All fields are primitives.
Source code in src/verifiable_rag/models/answer.py
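Because every value is a primitive, the trail should survive a json.dumps round-trip unchanged. A hedged sketch of a pre-emit check; is_json_primitive_tree() and emit() are hypothetical helpers, and the actual audit_trail() keys aren't listed in this reference:

```python
import json
from typing import Any

# Leaf types that JSON represents natively.
_PRIMITIVES = (str, int, float, bool, type(None))


def is_json_primitive_tree(value: Any) -> bool:
    """True if value is built only from dicts, lists and JSON primitives."""
    if isinstance(value, _PRIMITIVES):
        return True
    if isinstance(value, list):
        return all(is_json_primitive_tree(v) for v in value)
    if isinstance(value, dict):
        # JSON object keys must be strings.
        return all(
            isinstance(k, str) and is_json_primitive_tree(v)
            for k, v in value.items()
        )
    # Tuples, Paths, dataclasses, etc. are rejected.
    return False


def emit(trail: dict[str, Any]) -> str:
    """Serialize one audit trail as a single JSON log line."""
    if not is_json_primitive_tree(trail):
        raise TypeError("audit trail contains a non-primitive value")
    return json.dumps(trail, sort_keys=True)
```

For example, emit() on a dict of strings, floats and lists returns a one-line JSON string, while a dict holding a pathlib.Path raises TypeError before anything reaches the log.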
to_html
Render the full audit-trail HTML report for this Answer.
Returns a self-contained HTML document string: inline CSS, no JavaScript, no external dependencies. Write it to a file and open it in any browser:
    from pathlib import Path

    Path("report.html").write_text(answer.to_html(), encoding="utf-8")
Shows the query, the answer with per-sentence verification color
coding, the faithfulness components, the per-sentence NLI scores,
and every reranked passage the generator saw. See
verifiable_rag.report.to_html() for details.
Source code in src/verifiable_rag/models/answer.py
CitedSentence
dataclass
One sentence of generated output, grounded in source sentence IDs.
supporting_sentence_ids references Sentence.id values in the source Document. An empty tuple means no citations — the verifier treats this as unsupported; the abstention layer decides whether to flag or refuse.
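To illustrate the rule that an empty citation tuple is unsupported, a sketch in which the uncited case short-circuits before any entailment call. The nli_check callable is hypothetical, standing in for whatever NLI model the verifier actually uses:

```python
from collections.abc import Callable
from dataclasses import dataclass


# Hypothetical stand-in: CitedSentence reduced to text plus citations.
@dataclass(frozen=True)
class CitedSentence:
    text: str
    supporting_sentence_ids: tuple[str, ...] = ()


def is_supported(
    sent: CitedSentence,
    nli_check: Callable[[str, tuple[str, ...]], bool],
) -> bool:
    """Uncited sentences fail immediately; cited ones go to the NLI check."""
    if not sent.supporting_sentence_ids:
        # No citations: nothing can entail the claim, so it is unsupported.
        return False
    return nli_check(sent.text, sent.supporting_sentence_ids)
```

Whether an unsupported sentence is then flagged or triggers a refusal is left to the abstention layer, as described above.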
VerificationResult
dataclass
VerificationResult(cited_sentence_index: int, claim_text: str, is_supported: bool, nli_score: float, supporting_span: Span | None = None)
NLI-based faithfulness check for one CitedSentence.
FaithfulnessComponents
dataclass
FaithfulnessComponents(retrieval_score: float, nli_score: float, generation_logprob: float | None = None)
Decomposed faithfulness signal — exposed for auditability.
Document
dataclass
Document(doc_id: str, source_path: Path, sections: list[Section], page_breaks: list[int] = list(), full_text: str | None = None, metadata: dict[str, Any] = dict(), parser_name: str | None = None)
A parsed source document.
Spans throughout the pipeline use char offsets into Document.full_text (when supplied) so they can cross page boundaries. page_breaks maps those offsets back to page numbers via Document.page_for_offset().
page_breaks[i] is the char offset where page i begins, so page_breaks[0] must be 0.
page_for_offset
Return the page index (0-indexed) containing offset.
Source code in src/verifiable_rag/models/document.py
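Since page_breaks is sorted by construction, page lookup reduces to a binary search. A sketch using the standard-library bisect module, written as a free function over the raw list rather than as the Document method:

```python
from bisect import bisect_right


def page_for_offset(page_breaks: list[int], offset: int) -> int:
    """0-indexed page containing `offset`.

    page_breaks[i] is the char offset where page i begins; page_breaks[0] is 0.
    """
    if not page_breaks or page_breaks[0] != 0:
        raise ValueError("page_breaks must start at 0")
    if offset < 0:
        raise ValueError("offset must be non-negative")
    # bisect_right counts the breaks <= offset; the last of those starts our page.
    return bisect_right(page_breaks, offset) - 1
```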
pages_for_span
pages_for_span(span: Span) -> tuple[int, int]
Return (first_page, last_page) inclusive — the page range a span touches.
Source code in src/verifiable_rag/models/document.py
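Building on the same binary search, a sketch of pages_for_span(). It assumes char_end is exclusive (the Python slice convention), which this reference doesn't state; with inclusive ends, drop the - 1. The Span stand-in omits bboxes:

```python
from bisect import bisect_right
from dataclasses import dataclass


# Hypothetical stand-in: Span without bboxes.
@dataclass(frozen=True)
class Span:
    doc_id: str
    char_start: int
    char_end: int  # assumed exclusive, like a Python slice end


def _page_for_offset(page_breaks: list[int], offset: int) -> int:
    return bisect_right(page_breaks, offset) - 1


def pages_for_span(page_breaks: list[int], span: Span) -> tuple[int, int]:
    """(first_page, last_page), inclusive, for the characters in span."""
    first = _page_for_offset(page_breaks, span.char_start)
    # Last character is at char_end - 1; max() keeps an empty span on its start page.
    last = _page_for_offset(page_breaks, max(span.char_start, span.char_end - 1))
    return first, last
```

A sentence that straddles a page break, such as Span("d", 90, 120) with page_breaks = [0, 100, 250], reports (0, 1).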
Section
Paragraph
Sentence
dataclass
Sentence(id: str, text: str, span: Span)
Atomic citable unit. Every sentence has a globally unique id and a Span.
Sentences may cross page boundaries — Span uses document-level char offsets. The page(s) a sentence touches are looked up via Document.pages_for_span().
Chunk
dataclass
Chunk(chunk_id: str, text: str, doc_id: str, sentence_ids: tuple[str, ...], span: Span, metadata: dict[str, Any] = dict())
Retrieval unit.
Chunks carry the sentence_ids of every source sentence they contain so the citation layer can map retrieved chunks back to exact spans. This is the mechanism that decouples chunking granularity from citation granularity.
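A sketch of the chunk-to-citation mapping, assuming the caller holds a dict from Sentence.id to Sentence. spans_for_chunk() is a hypothetical helper, and the stand-in classes omit the metadata and bboxes fields:

```python
from dataclasses import dataclass


# Hypothetical stand-ins with fields taken from the documented signatures.
@dataclass(frozen=True)
class Span:
    doc_id: str
    char_start: int
    char_end: int


@dataclass(frozen=True)
class Sentence:
    id: str
    text: str
    span: Span


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    text: str
    doc_id: str
    sentence_ids: tuple[str, ...]
    span: Span


def spans_for_chunk(chunk: Chunk, sentences_by_id: dict[str, Sentence]) -> list[Span]:
    """Map a retrieved chunk back to the exact span of each sentence it holds."""
    # KeyError here would mean a chunk references an unknown sentence ID.
    return [sentences_by_id[sid].span for sid in chunk.sentence_ids]
```

The chunk can be as large as retrieval needs, while citations still resolve to sentence-level spans.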
RetrievedChunk
dataclass
RetrievedChunk(chunk: Chunk, score: float, retrieval_method: str)
A chunk returned from the index with its retrieval score.
Span
dataclass
Span(doc_id: str, char_start: int, char_end: int, bboxes: tuple[PageBBox, ...] = ())
Exact source location of a piece of text.
Offsets are character positions into the full document text, so spans naturally support text that crosses page boundaries. When bboxes are not populated, recover page numbers with Document.page_for_offset() or Document.pages_for_span().
Invariant: every object in the pipeline that wraps text from a source document must carry a Span. Losing a Span is a bug.
pages
property
Page numbers this span touches, derived from bboxes. Empty if unknown.
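A sketch of deriving pages from bboxes. The PageBBox shape (a page number plus a BBox) and the BBox field names x0, y0, x1, y1 are assumptions inferred from the (page, y0, x0) sort key mentioned under merge; returning a sorted list is also an assumption:

```python
from dataclasses import dataclass


# Hypothetical stand-ins for the bbox types.
@dataclass(frozen=True)
class BBox:
    x0: float
    y0: float
    x1: float
    y1: float


@dataclass(frozen=True)
class PageBBox:
    page: int
    bbox: BBox


def span_pages(bboxes: tuple[PageBBox, ...]) -> list[int]:
    """Sorted, de-duplicated page numbers touched by the bboxes; [] if none."""
    return sorted({pb.page for pb in bboxes})
```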
merge
classmethod
Return a bounding Span covering all spans in the list.
The input spans' bboxes are unioned and re-sorted by (page, y0, x0); duplicate (page, bbox) pairs are dropped.
Source code in src/verifiable_rag/models/span.py
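A sketch of merge() under the same assumed PageBBox and BBox shapes. It also assumes all input spans share one doc_id and rejects mixed or empty input; how the library handles either case isn't documented here:

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical stand-ins; frozen dataclasses are hashable, so a set can dedupe them.
@dataclass(frozen=True)
class BBox:
    x0: float
    y0: float
    x1: float
    y1: float


@dataclass(frozen=True)
class PageBBox:
    page: int
    bbox: BBox


@dataclass(frozen=True)
class Span:
    doc_id: str
    char_start: int
    char_end: int
    bboxes: tuple[PageBBox, ...] = ()

    @classmethod
    def merge(cls, spans: list[Span]) -> Span:
        """Bounding Span covering every input span."""
        if not spans:
            raise ValueError("cannot merge an empty list of spans")
        doc_ids = {s.doc_id for s in spans}
        if len(doc_ids) != 1:
            raise ValueError(f"spans come from different documents: {sorted(doc_ids)}")
        # The set drops duplicate (page, bbox) pairs.
        unique = {pb for s in spans for pb in s.bboxes}
        # Sort by (page, y0, x0); trailing y1, x1 only break ties deterministically.
        ordered = sorted(
            unique,
            key=lambda pb: (pb.page, pb.bbox.y0, pb.bbox.x0, pb.bbox.y1, pb.bbox.x1),
        )
        return cls(
            doc_id=spans[0].doc_id,
            char_start=min(s.char_start for s in spans),
            char_end=max(s.char_end for s in spans),
            bboxes=tuple(ordered),
        )
```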
BBox
dataclass
Page-coordinate bounding box (PDF user-space units).
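PDF user space defaults to 1/72 inch per unit, so overlaying a bbox on a page image rendered at a given DPI is a uniform scale. A sketch, assuming the BBox field names x0, y0, x1, y1, no /UserUnit override in the PDF, and an image that shares the bbox's origin and y-direction (no axis flip is performed); bbox_to_pixels() is a hypothetical helper:

```python
from dataclasses import dataclass

PDF_UNITS_PER_INCH = 72.0  # default PDF user-space unit is 1/72 inch


# Hypothetical stand-in for BBox with assumed field names.
@dataclass(frozen=True)
class BBox:
    x0: float
    y0: float
    x1: float
    y1: float


def bbox_to_pixels(bbox: BBox, dpi: float) -> BBox:
    """Scale a user-space bbox to pixel coordinates of an image rendered at dpi."""

    def scale(v: float) -> float:
        return v * dpi / PDF_UNITS_PER_INCH

    return BBox(scale(bbox.x0), scale(bbox.y0), scale(bbox.x1), scale(bbox.y1))
```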