The IBM Granite Guardian 3.2 models are designed to detect risks in prompts and/or responses. They can help with risk detection along many key dimensions catalogued in the IBM AI Risk Atlas. They are trained on unique data comprising human annotations and synthetic data informed by internal red-teaming, and they outperform other open-source models in the same space on standard benchmarks.
We’re introducing new model sizes for Granite Guardian 3.2, including a variant derived from our 3B-A800M mixture of experts (MoE) language model. The new models offer increased efficiency with minimal loss in performance.
```
ollama run ibm/granite3.2-guardian:3b
>>> /set system profanity
```

```
ollama run ibm/granite3.2-guardian:5b
>>> /set system violence
```
The model will produce a single output token, either `Yes` or `No`. By default, the general-purpose `harm` category is used, but other categories can be selected by setting the system prompt.
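For programmatic use, the same Yes/No check can be wrapped in a few lines. The sketch below assumes the `ollama` Python package and a locally pulled `5b` tag; `is_risky` is a hypothetical helper name, not part of any official API:

```python
import ollama

def is_risky(prompt: str, category: str = "harm") -> bool:
    """Ask Granite Guardian whether `prompt` triggers `category`.

    The system prompt selects the risk detector; the model replies
    with a single Yes/No token.
    """
    response = ollama.chat(
        model="ibm/granite3.2-guardian:5b",
        messages=[
            {"role": "system", "content": category},
            {"role": "user", "content": prompt},
        ],
    )
    return response["message"]["content"].strip() == "Yes"

# Expected to print True for an obviously profane prompt.
print(is_risky("You are a worthless idiot.", category="profanity"))
```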
Granite Guardian is useful for:

- Risk detection in prompt text or model response (i.e. as guardrails), such as (see the sketch after this list):
  - `harm`: content considered generally harmful
  - `social_bias`: prejudice based on identity or characteristics
  - `jailbreak`: deliberate instances of manipulating AI to generate harmful, undesired, or inappropriate content
  - `violence`: content promoting physical, mental, or sexual harm
  - `profanity`: use of offensive language or insults
  - `sexual_content`: explicit or suggestive material of a sexual nature
  - `unethical_behavior`: actions that violate moral or legal standards
- RAG (retrieval-augmented generation) to assess:
  - `relevance`: whether the retrieved context is relevant to the query
  - `groundedness`: whether the response is accurate and faithful to the provided context
  - `answer_relevance`: whether the response directly addresses the user's query
- Agentic workflows to assess:
  - `function_calling`: validates use of function calls for syntactic and semantic hallucination
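As a rough illustration of how these detectors compose into a guardrail, the following sketch reuses the hypothetical `is_risky` helper from above to screen a prompt against several categories before forwarding it to a separate generation model; the category list and the `granite3.2:8b` generation model are assumptions, not recommendations:

```python
import ollama  # is_risky is defined in the previous sketch

# Assumed set of detectors to run before generation.
GUARDRAIL_CATEGORIES = ["harm", "jailbreak", "violence", "profanity"]

def guarded_generate(prompt: str) -> str:
    """Generate a reply only if every Guardian check passes."""
    for category in GUARDRAIL_CATEGORIES:
        if is_risky(prompt, category=category):
            # Refuse rather than forward a flagged prompt.
            return f"Request blocked: flagged for {category}."
    # Assumed generation model; substitute whichever model you serve.
    reply = ollama.chat(
        model="granite3.2:8b",
        messages=[{"role": "user", "content": prompt}],
    )
    return reply["message"]["content"]
```

Note that each detector is a separate model call, so screening latency scales with the number of categories checked.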
The Granite dense models are designed to support tool-based use cases and retrieval-augmented generation (RAG), streamlining code generation, translation, and bug fixing.
The Granite MoE models are designed for low-latency usage and to support deployment in on-device applications or situations requiring instantaneous inference.