## Overview
Schematron-3B is an instruction-tuned, Llama-architecture model published as inference-net/Schematron-3B.
This repo provides GGUF builds for llama.cpp and packaged tags for Ollama.
## Key Features
- Llama 3.2-family architecture (`llama`), instruction-tuned, 3B parameters
- GGUF quantizations for llama.cpp: `IQ4_XS` and `Q4_K_M`
- Packaged Ollama tags for one-command `ollama pull` / `ollama run`
## Available Versions
Recommended default is `Q4_K_M` (best quality/size balance). Use `IQ4_XS` if you need a smaller download.

| Tag | Size | Approx. RAM* | Description |
|---|---:|---:|---|
| `IQ4_XS` | ~1.8 GB | ~3–4 GB + KV cache | Smaller / faster |
| `Q4_K_M` | ~2.0 GB | ~3–4 GB + KV cache | Recommended |

\*KV cache RAM depends heavily on your configured context window (`num_ctx`). See “System Requirements”.
## Quick Start
```bash
# Recommended
ollama pull richardyoung/schematron-3b:Q4_K_M
ollama run richardyoung/schematron-3b:Q4_K_M "Summarize this text in 5 bullets: ..."

# Smaller
ollama pull richardyoung/schematron-3b:iq4_xs
ollama run richardyoung/schematron-3b:iq4_xs "Explain this error and propose a fix: ..."
```
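You can also call the model through the local Ollama REST API. A minimal sketch, assuming `ollama serve` is listening on the default port 11434:

```bash
# Minimal sketch: the same call through the local Ollama REST API.
# Assumes `ollama serve` is running on the default port 11434.
curl http://localhost:11434/api/chat -d '{
  "model": "richardyoung/schematron-3b:Q4_K_M",
  "messages": [{"role": "user", "content": "Summarize this text in 5 bullets: ..."}],
  "stream": false
}'
```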
## Example Use Cases
### Long document Q&A
```bash
ollama run richardyoung/schematron-3b:Q4_K_M "Read this and answer questions:\n\n[paste doc here]"
```
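For real documents, substituting a file's contents is usually more practical than pasting. A sketch using shell command substitution, where `doc.txt` is a hypothetical plain-text file:

```bash
# Sketch: substitute a local file into the prompt via command substitution.
# doc.txt is a hypothetical plain-text file; keep it well inside the context window.
ollama run richardyoung/schematron-3b:Q4_K_M "Read this and answer questions:

$(cat doc.txt)"
```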
### Planning and analysis
```bash
ollama run richardyoung/schematron-3b:Q4_K_M "Create a step-by-step plan to refactor this module:\n\n[paste code here]"
```
### Structured outputs
```bash
ollama run richardyoung/schematron-3b:Q4_K_M --format json "Return a JSON object with keys {title, summary, risks} for:\n\n[paste text here]"
```
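The same JSON-constrained decoding is available over the REST API via `"format": "json"`. A sketch, assuming the default local endpoint; the keys `{title, summary, risks}` simply mirror the prompt above:

```bash
# Sketch: JSON-constrained output via the REST API instead of the CLI flag.
curl http://localhost:11434/api/generate -d '{
  "model": "richardyoung/schematron-3b:Q4_K_M",
  "format": "json",
  "stream": false,
  "prompt": "Return a JSON object with keys {title, summary, risks} for: ..."
}'
```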
## Prompt Format / Templates
This model is packaged with a Llama 3-style chat template (special tokens like `<|start_header_id|>` / `<|eot_id|>`). If you create your own Ollama Modelfile, match this template; a sketch follows below.
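A minimal Modelfile sketch. The template body below is the common Llama 3 chat pattern, not copied from this repo's metadata, and the `num_ctx` value is illustrative; verify the packaged template with `ollama show richardyoung/schematron-3b:Q4_K_M --modelfile` before relying on it:

```bash
# Sketch: derive a custom Ollama model with a Llama 3-style template.
# Template body is the standard Llama 3 pattern (an assumption, not pulled
# from this repo); check the real one with `ollama show ... --modelfile`.
cat > Modelfile <<'EOF'
FROM richardyoung/schematron-3b:Q4_K_M
TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>

{{ .Response }}<|eot_id|>"""
PARAMETER num_ctx 8192
EOF
ollama create schematron-custom -f Modelfile
```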
Suggested sampling defaults are carried in the GGUF metadata; inspect them with `ollama show richardyoung/schematron-3b:Q4_K_M`.
## System Requirements (Practical)
### Weights (minimum)
The quantized weights are ~1.8–2.0 GB on disk depending on the tag (see “Available Versions”); expect roughly 3–4 GB of RAM to load and run them, before the KV cache.
### Context window (the real memory driver)
KV cache memory grows roughly linearly with `num_ctx`. Even on a 3B model, very large contexts can require tens of GB of additional RAM. If you don’t need extreme context lengths, keep `num_ctx` modest (e.g. 8K–32K) for a much lower RAM footprint.
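To put rough numbers on that, here is a back-of-the-envelope estimate. The shape figures below are the published Llama 3.2 3B config (28 layers, 8 KV heads, head dim 128) and are assumptions about this particular build, as is the fp16 cache type:

```bash
# Back-of-the-envelope KV-cache size for a given num_ctx.
# Shape figures assume the published Llama 3.2 3B config:
# 28 layers, 8 KV heads, head_dim 128, fp16 (2-byte) cache entries.
n_layers=28; n_kv_heads=8; head_dim=128; bytes_per_elem=2
num_ctx=32768
# K and V each store n_layers * n_kv_heads * head_dim elements per token.
kv_bytes=$(( 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * num_ctx ))
echo "KV cache at num_ctx=$num_ctx: $(( kv_bytes / 1024 / 1024 )) MiB"   # ~7168 MiB
```

Under those assumptions the cache costs about 224 KiB per token: roughly 1.75 GiB at 8K context and ~7 GiB at 32K, which is why keeping `num_ctx` modest matters even on a 3B model.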
## License
This model is governed by the upstream licensing and terms of use.
The GGUF quantizations are derivative artifacts; you must comply with all upstream terms before redistribution or commercial use.
## Acknowledgments
Thanks to inference-net for the upstream Schematron-3B model, and to the llama.cpp and Ollama projects for the GGUF tooling and packaging.