Quickstart
The fastest path from pip install to a verified, cited answer.
1. Install
The [all] extra brings in everything (parser, embedder, index, reranker, verifiers). For lighter installs see the installation guide.
2. Set an API key
The default generator is Claude Haiku 4.5. You need an Anthropic API key:
To switch to OpenAI, Gemini, Groq, Ollama, etc., see Swap LLM provider.
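Before the first call, it can help to confirm that the key is visible to the Python process. The check below is a minimal sketch: it assumes the key is read from the ANTHROPIC_API_KEY environment variable (the name Anthropic's own SDK uses; this page does not state it), and missing_keys is a hypothetical helper, not part of verifiable_rag.

```python
import os


def missing_keys(required, env=None):
    """Return the names in `required` that are unset or blank in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


# ANTHROPIC_API_KEY is an assumed variable name (see the note above).
missing = missing_keys(["ANTHROPIC_API_KEY"])
if missing:
    print("Set these before calling ask():", ", ".join(missing))
else:
    print("API key found.")
```

The same helper can take additional names, such as COHERE_API_KEY, if the preset you use needs them.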
3. Ask a question
The library ships with a small public-domain demo document (a 3-page overview of penicillin) so you can verify everything works without finding a PDF yourself:
import verifiable_rag
from verifiable_rag.demo import sample_paper_path

answer = verifiable_rag.ask(
    "What is the mechanism of action of penicillin?",
    docs=sample_paper_path(),
)
print(answer.text)
That's it. The ask() helper:
- Builds the default hybrid_balanced pipeline (Cohere retrieval + Dual NLI + constrained Haiku generator)
- Parses, chunks, embeds, and indexes the document
- Runs the query through retrieval → reranking → generation → verification
- Returns an Answer object
First-run model downloads
On the first call, the verifier downloads HHEM-2.1-open (~600 MB) and MiniCheck-Flan-T5-Large (~770 MB) from Hugging Face and caches them in ~/.cache/huggingface/hub/. Subsequent calls reuse the cache and skip the download.
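To see whether the models are already cached, you can measure the cache directory. This is a rough sketch using only the standard library: cache_size_bytes is a hypothetical helper, and the path is the default location named above (it will differ if your Hugging Face cache has been relocated). Symlinks are skipped so that linked files are not counted twice.

```python
import os
from pathlib import Path


def cache_size_bytes(cache_dir):
    """Sum the sizes of regular files under cache_dir, skipping symlinks."""
    total = 0
    # os.walk does not descend into symlinked directories by default,
    # and yields nothing if cache_dir does not exist.
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            path = os.path.join(root, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


hub_cache = Path.home() / ".cache" / "huggingface" / "hub"
print(f"{hub_cache}: {cache_size_bytes(hub_cache) / 1e6:.0f} MB")
```

A result near zero suggests the first call will still need to download both models.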
4. Use your own document
Swap the demo path for your own PDF:
Or a list of documents:
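As an illustration of the list form, the sketch below gathers every PDF in a folder with pathlib. collect_pdfs and the papers folder name are hypothetical, and passing a list of path strings to docs is an assumption based on the single-path form used elsewhere on this page; the ask() call is left as a comment for that reason.

```python
from pathlib import Path


def collect_pdfs(folder):
    """Return sorted path strings for the .pdf files directly inside folder."""
    return sorted(
        str(p)
        for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


# Hypothetical usage; assumes docs accepts a list of paths:
# answer = verifiable_rag.ask(
#     "What did the authors find?",
#     docs=collect_pdfs("papers"),
# )
```

Sorting keeps the ingest order stable between runs, which makes results easier to compare.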
5. See the audit trail
The headline feature. Get a self-contained HTML page showing the answer with per-sentence verification color coding, faithfulness scores, and every reranked passage the generator saw:
verifiable_rag.ask(
    "What is the mechanism of action of penicillin?",
    docs=sample_paper_path(),
    output_html="audit.html",
)
Open audit.html in any browser. Citations are anchored links into the passage list; unsupported sentences get a red dashed underline; faithfulness scores show at a glance.
For the programmatic version of the same data:
answer = verifiable_rag.ask("...", docs=...)
for sentence in answer.unsupported_sentences:
    print(f"⚠ unsupported: {sentence.text}")

# Structured dump for logging / metrics emit:
import json
print(json.dumps(answer.audit_trail(), indent=2))
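For logging, one option is to append each audit trail to a JSON Lines file, one record per answer. This is a sketch under assumptions: append_jsonl, read_jsonl, and the audit.jsonl file name are hypothetical, and the record in the usage comment is a stand-in dictionary, since the structure returned by audit_trail() is not shown here. The only property relied on is that the value is JSON-serializable, as the json.dumps call above implies.

```python
import json


def append_jsonl(path, record):
    """Append one JSON-serializable record to path as a single line."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Load every record back, one per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Stand-in record; in practice this would be answer.audit_trail().
# append_jsonl("audit.jsonl", {"question": "...", "note": "placeholder"})
```

One record per line keeps the file appendable and easy to process with line-oriented tools.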
6. Pick a preset
The default hybrid_balanced preset uses Cohere for embedding and reranking (requires COHERE_API_KEY). For zero-API-key retrieval:
answer = verifiable_rag.ask(
    "What is the mechanism of action of penicillin?",
    docs=sample_paper_path(),
    preset="local_minimal",  # BGE embed, no reranker, no verifier
)
For stricter refusal behavior:
answer = verifiable_rag.ask(
    "What did the authors prove?",
    docs="paper.pdf",
    preset="hybrid_strict",  # refuses below faithfulness 0.7
)
if answer.was_refused:
    print(f"Refused: {answer.refusal_reason}")
See the configuration concept page for the full preset list with cost-vs-quality tradeoffs.
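If the same script runs on machines with and without a Cohere key, the preset can be chosen at runtime. The sketch below is one way to do that; choose_preset is a hypothetical helper, and it uses only the preset names and the COHERE_API_KEY variable mentioned in this section. Note that falling back to local_minimal also drops the reranker and verifier, so answers from that path are not verified.

```python
import os


def choose_preset(env=None):
    """Pick a preset name based on whether a Cohere key is available."""
    env = os.environ if env is None else env
    if env.get("COHERE_API_KEY", "").strip():
        return "hybrid_balanced"  # Cohere embedding and reranking
    return "local_minimal"  # BGE embed, no reranker, no verifier


print(choose_preset())
```

The returned string would then be passed as the preset argument shown in the examples above.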
7. Multi-question pattern
verifiable_rag.ask() is single-shot — it ingests on every call. For multiple questions over the same corpus, build a Pipeline directly so you only pay the ingest cost once:
from verifiable_rag import hybrid_balanced
pipeline = hybrid_balanced()
pipeline.ingest("paper.pdf")
a1 = pipeline.ask("What did the authors find?")
a2 = pipeline.ask("What methodology did they use?")
a3 = pipeline.ask("What are the limitations?")
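For a longer list of questions, the repeated calls can be wrapped in a small loop. A minimal sketch: ask_all is a hypothetical helper that relies only on the pipeline object having the ask() method used above, and it assumes ingest has already been called.

```python
def ask_all(pipeline, questions):
    """Ask each question against an already-ingested pipeline.

    Returns a dict mapping each question to its answer, in input order.
    """
    return {question: pipeline.ask(question) for question in questions}


# Hypothetical usage with the pipeline built above:
# answers = ask_all(pipeline, [
#     "What did the authors find?",
#     "What are the limitations?",
# ])
# for question, answer in answers.items():
#     print(question, "->", answer.text)
```

Because the helper never calls ingest, the corpus is processed once regardless of how many questions are asked.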
Next steps
- Understand what's happening: Architecture, Citation flow, Verification
- Tune the verifier on your domain: Calibrate threshold
- Build a custom pipeline: YAML config, Configuration
- Integrate with your stack: Observability, Local-only setup