Installation

Python version

verifiable-rag requires Python 3.11 or newer (3.11, 3.12, 3.13). The library is pure-Python; the heavy dependencies (torch, transformers, etc.) are opt-in extras.
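To fail fast in a script or CI job that runs on an unsupported interpreter, a standard-library check is enough. This is a minimal sketch; `check_python` and `require_python` are hypothetical helpers, not part of verifiable-rag.

```python
import sys

# Minimum interpreter version stated in this guide.
MIN_VERSION = (3, 11)


def check_python(version_info=sys.version_info, minimum=MIN_VERSION) -> bool:
    """Return True if (major, minor) of version_info meets the minimum."""
    # Tuples compare element by element, so (3, 12) >= (3, 11) is True.
    return tuple(version_info[:2]) >= minimum


def require_python(minimum=MIN_VERSION) -> None:
    """Exit with a readable message when the running interpreter is too old."""
    if not check_python(minimum=minimum):
        found = ".".join(map(str, sys.version_info[:3]))
        raise SystemExit(
            f"verifiable-rag needs Python {minimum[0]}.{minimum[1]}+, found {found}"
        )
```

Call `require_python()` at the top of an entry-point script before importing anything from the library.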

Install everything

For the recommended hybrid_balanced preset and the audit HTML report:

pip install "verifiable-rag[all]"

The [all] extra combines every optional extra listed below.

Lighter installs

The core wheel is intentionally tiny — every heavy dependency is opt-in. Pick the extras that match what you're going to use:

# Just the core data model & helpers — no parsing, no embedding, no LLMs
pip install verifiable-rag

# Parsing
pip install "verifiable-rag[docling]"   # Docling — best PDF quality, slow, OCR-capable
pip install "verifiable-rag[pymupdf]"   # PyMuPDF — fast, text-only, no OCR

# Sentence segmentation
pip install "verifiable-rag[wtpsplit]"  # wtpsplit SaT — used by the default segmenter

# Embedding
pip install "verifiable-rag[bge]"       # BGE-small via sentence-transformers (local, free)
pip install "verifiable-rag[cohere]"    # Cohere embed-english-v3 (hosted)
pip install "verifiable-rag[voyage]"    # Voyage embeddings (hosted)

# Indexing
pip install "verifiable-rag[lancedb]"   # LanceDB dense store
pip install "verifiable-rag[bm25]"      # BM25 sparse store via bm25s

# LLM generators / judges
pip install "verifiable-rag[litellm]"   # LiteLLM — routes Anthropic, OpenAI, Gemini, Groq, Ollama, ...

# Verifiers
pip install "verifiable-rag[hhem]"      # HHEM-2.1-open NLI verifier (~600 MB at runtime)
pip install "verifiable-rag[minicheck]" # MiniCheck-Flan-T5-Large (~770 MB at runtime)

# YAML pipeline config
pip install "verifiable-rag[yaml]"      # Pipeline.from_yaml()

# Modal-hosted GPU verifiers (advanced)
pip install "verifiable-rag[modal]"
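To see which extras are already present in an environment, you can probe for one top-level module per extra without importing it. The extra-to-module mapping below is an assumption for illustration (for example, that [pymupdf] provides `fitz` and [voyage] provides `voyageai`), not read from the package metadata, and `installed_extras` is a hypothetical helper.

```python
import importlib.util

# Assumed mapping from each extra to one top-level module it installs.
# Illustrative guesses only; adjust to match your installed versions.
EXTRA_MODULES = {
    "docling": "docling",
    "pymupdf": "fitz",
    "wtpsplit": "wtpsplit",
    "bge": "sentence_transformers",
    "cohere": "cohere",
    "voyage": "voyageai",
    "lancedb": "lancedb",
    "bm25": "bm25s",
    "litellm": "litellm",
    "hhem": "transformers",
    "minicheck": "transformers",
    "yaml": "yaml",
    "modal": "modal",
}


def installed_extras(mapping=EXTRA_MODULES) -> dict[str, bool]:
    """Report, per extra, whether its probe module can be found."""
    # find_spec locates a top-level module without importing it,
    # so heavy packages such as torch are not loaded by this check.
    return {
        extra: importlib.util.find_spec(module) is not None
        for extra, module in mapping.items()
    }


for extra, present in installed_extras().items():
    print(f"{extra:10} {'yes' if present else 'no'}")
```

Because the probe only locates modules, it stays fast even when the torch-backed extras are installed.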

API keys

The default hybrid_balanced preset needs two API keys:

Service	Used for	Env var	Get one
Anthropic	Generator (Claude Haiku 4.5)	`ANTHROPIC_API_KEY`	console.anthropic.com
Cohere	Embedding + reranking	`COHERE_API_KEY`	dashboard.cohere.com
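Before running the default preset, checking that both variables are set can save a failed first call. A minimal sketch; `missing_keys` is a hypothetical helper, and it only checks that the variables are non-empty, not that the keys are valid.

```python
import os
from collections.abc import Mapping

# Variables the hybrid_balanced preset needs, per the table above.
REQUIRED_KEYS = ("ANTHROPIC_API_KEY", "COHERE_API_KEY")


def missing_keys(env: Mapping[str, str] = os.environ,
                 required=REQUIRED_KEYS) -> list[str]:
    """Return the required variables that are unset or empty."""
    return [name for name in required if not env.get(name)]


missing = missing_keys()
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All API keys set")
```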

For zero-API-key retrieval, use the local_minimal preset (BGE + PyMuPDF; the generator still needs an Anthropic key), or swap in a fully local LLM via Ollama. See Local-only setup.

First-run model downloads

Verifier model weights are not bundled in the wheel. They are downloaded lazily from the Hugging Face Hub on first use and then cached (by default in ~/.cache/huggingface/hub/), so later runs reuse them.

Verifier	Model	Size
`HHEMVerifier`	`vectara/hallucination_evaluation_model`	~600 MB
`MiniCheckVerifier`	`lytang/MiniCheck-Flan-T5-Large`	~770 MB
`LLMJudgeVerifier`	(hosted API, no local model)	0

The standard Hugging Face progress bar is shown while the weights download.
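To check whether the weights are already cached, you can compute the folder each repo lands in. This sketch assumes huggingface_hub's `models--<org>--<name>` folder layout and a simplified version of its cache-location rules (HF_HUB_CACHE, then HF_HOME/hub, then the default path above); `hub_cache_root` and `model_cache_dir` are hypothetical helpers.

```python
import os
from pathlib import Path
from typing import Optional


def hub_cache_root() -> Path:
    """Approximate huggingface_hub's cache directory (simplified rules)."""
    # Assumed resolution order; other overrides such as XDG_CACHE_HOME are ignored.
    if os.environ.get("HF_HUB_CACHE"):
        return Path(os.environ["HF_HUB_CACHE"]).expanduser()
    if os.environ.get("HF_HOME"):
        return Path(os.environ["HF_HOME"]).expanduser() / "hub"
    return Path.home() / ".cache" / "huggingface" / "hub"


def model_cache_dir(repo_id: str, root: Optional[Path] = None) -> Path:
    """Folder a model repo is stored under, e.g. models--org--name."""
    root = hub_cache_root() if root is None else root
    return root / ("models--" + repo_id.replace("/", "--"))


for repo in ("vectara/hallucination_evaluation_model",
             "lytang/MiniCheck-Flan-T5-Large"):
    path = model_cache_dir(repo)
    status = "cached" if path.is_dir() else "not cached yet"
    print(f"{repo}: {status} ({path})")
```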

Editable / development install

git clone https://github.com/firish/rag-rack.git
cd rag-rack
python -m venv .venv && source .venv/bin/activate
pip install -e ".[all,dev]"

The [dev] extra adds pytest, mypy, ruff, and pre-commit.

Run the smoke tests:

pytest -m smoke

Verify the install

python -c "import verifiable_rag; print(verifiable_rag.__version__)"
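An alternative that reads the installed distribution's metadata instead of importing the package (useful when an import fails partway through) is sketched below. It assumes the distribution name is `verifiable-rag`, as in the pip commands above; `installed_version` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist: str = "verifiable-rag") -> Optional[str]:
    """Return the installed distribution's version, or None if absent."""
    # Reads package metadata only; no code from the package is imported.
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


print(installed_version() or "verifiable-rag is not installed")
```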

A 5-second sanity check that the bundled demo helpers work:

python -c "from verifiable_rag.demo import load_sample_document; \
           d = load_sample_document(); \
           print(f'{len(d.sections)} sections, {sum(len(p.sentences) for s in d.sections for p in s.paragraphs)} sentences')"

Expected output: 1 sections, 84 sentences.
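The nested count in the one-liner is easier to follow as a small function. The sketch below relies only on the attribute names the command uses (`sections`, `paragraphs`, `sentences`) and runs against a stand-in document built with `SimpleNamespace`, so it works without the library; with verifiable-rag installed you would pass `load_sample_document()` instead. `count_sentences` is a hypothetical helper.

```python
from types import SimpleNamespace


def count_sentences(doc) -> int:
    """Count sentences across doc.sections -> .paragraphs -> .sentences."""
    return sum(
        len(paragraph.sentences)
        for section in doc.sections
        for paragraph in section.paragraphs
    )


# Stand-in document: one section holding two paragraphs (2 + 1 sentences).
doc = SimpleNamespace(sections=[
    SimpleNamespace(paragraphs=[
        SimpleNamespace(sentences=["First.", "Second."]),
        SimpleNamespace(sentences=["Third."]),
    ]),
])

print(len(doc.sections), count_sentences(doc))  # prints: 1 3
```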

Troubleshooting

ModuleNotFoundError: No module named 'sentence_transformers' (or 'cohere', 'transformers')

You haven't installed the extra that pulls in the missing dependency. Re-install with the right extra (see Lighter installs above), or just use pip install "verifiable-rag[all]".
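If you'd rather have a missing extra fail with an actionable message, you can wrap imports so the error names the extra to install. This is a sketch: `import_or_hint` is hypothetical, and the module-to-extra table is assembled from the extras list above, falling back to [all] for any module it doesn't know.

```python
import importlib

# Hint table assembled from the extras list; "transformers" is mapped to
# [hhem] here, although [minicheck] may also provide it.
EXTRA_FOR_MODULE = {
    "sentence_transformers": "bge",
    "cohere": "cohere",
    "transformers": "hhem",
}


def import_or_hint(module: str):
    """Import a module; on failure, re-raise with a pip install hint."""
    try:
        return importlib.import_module(module)
    except ModuleNotFoundError as exc:
        # exc.name is the module that was actually missing, which can be
        # a transitive dependency rather than the one requested.
        extra = EXTRA_FOR_MODULE.get(exc.name or module, "all")
        raise ModuleNotFoundError(
            f'{exc}. Try: pip install "verifiable-rag[{extra}]"',
            name=exc.name,
        ) from exc
```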

AuthenticationError from LiteLLM

The generator model isn't getting the right API key. Confirm the variable is set with echo $ANTHROPIC_API_KEY (or the variable for whichever provider you use). If it's set but calls still fail, check the provider's dashboard for rate-limit or billing issues.
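Note that echo prints the entire key to your terminal. If you'd prefer not to expose it, a sketch like the following reports whether the variable is set and shows only its last four characters; `describe_key` is a hypothetical helper.

```python
import os
from typing import Optional


def describe_key(name: str, env=os.environ) -> str:
    """Say whether an env var is set, revealing at most its last 4 characters."""
    value: Optional[str] = env.get(name)
    if not value:
        return f"{name} is not set"
    # Only show a tail when the value is long enough that 4 chars leak little.
    shown = "..." + value[-4:] if len(value) > 8 else "(too short to show safely)"
    return f"{name} is set: {shown} ({len(value)} chars)"


print(describe_key("ANTHROPIC_API_KEY"))
```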

First call hangs for a long time

The verifier models are downloading from Hugging Face (~1.4 GB combined for the dual NLI default). This happens once; subsequent runs use the cache. Progress bars are shown by default; set HF_HUB_VERBOSITY=info for more detailed download logging.
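To avoid the first-call delay (for example, when building a container image), you could download the verifier weights ahead of time with huggingface_hub's `snapshot_download`, which fetches a repo into the same cache. A sketch, assuming the repo IDs from the download table above; `prefetch` is a hypothetical helper, and the verifiers may still fetch other files (such as tokenizers) at load time that this does not cover.

```python
from collections.abc import Callable, Iterable
from typing import Optional

# Verifier repos from the first-run download table.
VERIFIER_REPOS = (
    "vectara/hallucination_evaluation_model",  # HHEMVerifier
    "lytang/MiniCheck-Flan-T5-Large",          # MiniCheckVerifier
)


def prefetch(repo_ids: Iterable[str] = VERIFIER_REPOS,
             download: Optional[Callable[..., str]] = None) -> list[str]:
    """Download each repo into the local Hugging Face cache; return local paths."""
    if download is None:
        # Imported lazily so this module loads even without huggingface_hub.
        from huggingface_hub import snapshot_download
        download = snapshot_download
    # snapshot_download returns the local folder of the cached snapshot.
    return [download(repo_id=repo) for repo in repo_ids]
```

Call `prefetch()` once, for example in a Dockerfile RUN step. The `download` parameter exists so the function can be exercised without network access.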

Out of memory on Apple Silicon MPS

Long premises can exhaust MPS memory in the verifier's attention layers. Either lower the batch size at the call site, or set PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0, which disables PyTorch's upper limit on MPS allocations (read the PyTorch MPS notes first; this can cause system instability).
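Lowering the batch size can also be automated. The sketch below assumes you have a callable that scores a batch of items (say, premise/claim pairs) and that an out-of-memory failure surfaces as PyTorch's usual RuntimeError whose message contains "out of memory"; `run_with_backoff` and `score_batch` are hypothetical names, not verifiable-rag APIs.

```python
from collections.abc import Callable, Sequence
from typing import TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_with_backoff(items: Sequence[T],
                     score_batch: Callable[[Sequence[T]], list[R]],
                     batch_size: int = 32) -> list[R]:
    """Score items in batches, halving the batch size after an OOM error."""
    results: list[R] = []
    start = 0
    while start < len(items):
        batch = items[start:start + batch_size]
        try:
            results.extend(score_batch(batch))
        except RuntimeError as exc:
            # Re-raise anything that isn't an OOM, or an OOM at batch size 1.
            if "out of memory" not in str(exc) or batch_size == 1:
                raise
            batch_size //= 2  # retry the same start offset with a smaller batch
            continue
        start += len(batch)
    return results
```

Halving on each failure needs at most log2(batch_size) retries, and the reduced size sticks for the rest of the run.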