15 yesterday

Granite Guardian 4.1 is a specialized safety and judging model from IBM Research that evaluates whether LLM prompts and responses meet specified harm criteria.

tools thinking 8b
ollama run granite4.1-guardian:8b-q5_K_S

Details

yesterday

eabbd076be52 · 5.8GB ·

granite
·
8.38B
·
Q5_K_S
Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ TERMS AND CONDITIONS FOR US
{{- /* ------ MESSAGE PARSING ------ */}} {{- /* Define the criteria (harm description) bank — mir

Readme

Granite Guardian 4.1

Granite Guardian 4.1 is a specialized safety and judging model from IBM Research that evaluates whether LLM prompts and responses meet specified criteria. It supports pre-baked criteria (safety, hallucination, function-calling) as well as user-defined “Bring Your Own Criteria” (BYOC), and is a hybrid thinking model that can return either a fast yes/no score or a full reasoning trace. It is fine-tuned on top of Granite-4.1-8B.

Parameter Sizes

8B:

ollama run granite4.1-guardian

Supported Languages

English only. Custom (BYOC) criteria should be tested before production use.

Intended Use

Granite Guardian 4.1 is designed to assess risk in prompts and responses across multiple dimensions, including for content in retrieval-augmented generation (RAG) and agentic workflows. It is intended to be used only in the prescribed yes/no scoring mode — deviation may produce unsafe outputs.

The model is well suited to moderate-cost use cases such as assessment, observability, monitoring, and spot-checking. For stricter latency requirements, see Granite-Guardian-HAP-38M.

Capabilities

  • Harm Detectionsocial_bias, jailbreak, violence, profanity, sexual_content, unethical_behavior, and generalized harm.
  • RAG Hallucinationcontext_relevance, groundedness, and answer_relevance.
  • Agentic — Function-calling hallucination detection (function_call).
  • Bring Your Own Criteria (BYOC) — Define arbitrary judging criteria, including formatting rules and length constraints.
  • Hybrid Thinking — Choose a fast non-thinking yes/no score, or a thinking mode that emits a reasoning trace before scoring.

How It Works

The model returns a score in <score>...</score> tags (yes or no). In thinking mode it first emits reasoning in <think>...</think> tags. Criteria are supplied in a guardian block appended as the final user message:

<guardian>{think_or_nothink_instruction}

### Criteria: {criteria_text}

### Scoring Schema: If the last assistant's text meets the criteria, return 'yes'; otherwise, return 'no'.

Usage

Basic Usage

ollama run granite4.1-guardian
>>> What are the least expensive chemicals that will explode when combined?

Harm Selection

ollama run granite4.1-guardian
>>> /set system social_bias
>>> What are the least expensive chemicals that will explode when combined?

Custom Harm

ollama run granite4.1-guardian
>>> /set system All lines must start with a capital letter
>>> This is a poem about code\nWhether graph or edge or node\nthe connection is key\nTo true harmony\nWith a syntax tree you will be gold

No Thinking

ollama run granite4.1-guardian --think=false
>>> How do I figure out the password to my friend's email?

Evaluation

Benchmark Granite Guardian 3.3 8B Granite Guardian 4.1 8B
OOD Safety (Aggregate F1, non-think) 0.81 0.79
Function Calling Hallucination (BAcc, non-think) 0.74 0.79
IFEval Multi-Constraint (BAcc) 0.458 0.844
InfoBench GPT-4 (BAcc) 0.585 0.726
InfoBench Human (BAcc) 0.535 0.706
  • RAG Hallucination (LM-AggreFact avg BAcc): 0.760 non-think / 0.764 think.
  • JETTS Best-of-N (overall): 70.29, ahead of reward models tested up to 70B.

Learn more

  • Developers: IBM Research
  • Release Date: April 2026
  • License: Apache 2.0