Schematron-8B is an instruction-tuned, Llama-architecture model published as inference-net/Schematron-8B.
This repo provides GGUF builds for llama.cpp and packaged tags for Ollama.
Architecture: llama (Llama 3.1 family).

Recommended default is Q4_K_M (best quality/size balance). Use IQ4_XS if you need a smaller download.
| Tag | Size | Approx RAM* | Description |
|---|---|---|---|
| IQ4_XS | ~4.5 GB | ~7–10 GB + KV cache | Smaller / faster |
| Q4_K_M | ~4.9 GB | ~7–10 GB + KV cache | Recommended |
*KV cache RAM depends heavily on your configured context window (num_ctx). See “System Requirements”.
```shell
# Recommended
ollama pull richardyoung/schematron-8b:Q4_K_M
ollama run richardyoung/schematron-8b:Q4_K_M "Summarize this text in 5 bullets: ..."

# Smaller
ollama pull richardyoung/schematron-8b:iq4_xs
ollama run richardyoung/schematron-8b:iq4_xs "Explain this error and propose a fix: ..."
```

```shell
ollama run richardyoung/schematron-8b:Q4_K_M "Read this and answer questions:\n\n[paste doc here]"
ollama run richardyoung/schematron-8b:Q4_K_M "Create a step-by-step plan to refactor this module:\n\n[paste code here]"
ollama run richardyoung/schematron-8b:Q4_K_M --format json "Return a JSON object with keys {title, summary, risks} for:\n\n[paste text here]"
```
This model is packaged with a Llama 3-style chat template (special tokens such as `<|start_header_id|>` / `<|eot_id|>`).
If you create your own Ollama Modelfile, use the templates in:
- `modelfiles/Schematron-8B-Q4_K_M.Modelfile`
- `modelfiles/Schematron-8B-IQ4_XS.Modelfile`

Suggested sampling (from GGUF metadata):

- `temperature`: 0.6
- `top_p`: 0.9

Q4_K_M / IQ4_XS need ~5 GB of storage for weights, and typically ~8 GB+ RAM once you include runtime overhead. KV cache memory grows roughly linearly with `num_ctx`; very large contexts can require tens of GB of additional RAM.
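As a sketch of how those sampling defaults fit into a custom Modelfile (the `FROM` path here is hypothetical — point it at your own local GGUF file, and prefer the packaged Modelfiles above, which also carry the chat template):

```
FROM ./Schematron-8B-Q4_K_M.gguf
PARAMETER temperature 0.6
PARAMETER top_p 0.9
PARAMETER num_ctx 8192
```

Build a local tag from it with `ollama create my-schematron -f Modelfile`.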
If you don’t need extreme context lengths, keep `num_ctx` modest (e.g. 8K–32K) for a much lower RAM footprint.
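For a rough back-of-the-envelope check, assuming Llama-3.1-8B-class shapes (32 layers, 8 KV heads, head dim 128, fp16 cache — verify against the GGUF metadata for your build), the KV-cache cost per context length can be estimated as:

```shell
# Approximate fp16 KV-cache size for an assumed 32-layer / 8-KV-head /
# 128-head-dim model: 2 (K and V) * layers * kv_heads * head_dim * 2 bytes.
NUM_CTX=8192
BYTES_PER_TOKEN=$((2 * 32 * 8 * 128 * 2))
echo "KV cache at num_ctx=$NUM_CTX: ~$((BYTES_PER_TOKEN * NUM_CTX / 1024 / 1024)) MiB"
```

Under these assumed shapes the cache costs ~128 KiB per token, so an 8K context adds about 1 GiB and a 128K context about 16 GiB.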
This model is governed by the upstream licensing and terms of use.
The GGUF quantizations are derivative artifacts; you must comply with all upstream terms before redistribution or commercial use.