
Efficient, intelligent, and tiny models from IBM


Notes

Uploaded as Unsloth DynamicQuant2 (DQ2) versions. These are enterprise-focused models, built for tool-use and structured output over creative chat.

All models here are Q4_0. The Unsloth DQ2 method may preserve most of the Q8 model's performance, but this isn't guaranteed; test against your own use case before relying on it.

Temperature: A range of 0.4–0.6 works well for the models' intended instruction-following and tool-use tasks.
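As a sketch of how that temperature range might be applied, assuming the models are served through a local Ollama instance (the model tag and prompt below are illustrative, not prescriptive):

```python
import json

# Build a request payload for Ollama's /api/chat endpoint, pinning the
# temperature into the recommended 0.4-0.6 range. Adjust the model tag
# to whichever variant you actually pulled.
payload = {
    "model": "granite-4.0:3.4b",
    "messages": [
        {"role": "user", "content": "Summarize the Granite 4.0 model lineup."}
    ],
    "options": {"temperature": 0.5},
    "stream": False,
}

# Serialized as it would be POSTed to http://localhost:11434/api/chat
body = json.dumps(payload)
```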


Description

IBM’s Granite 4.0 — a series of lightweight, open foundation models (Apache 2.0) designed for enterprise applications. They excel at tool-calling, structured JSON output, multilingual tasks, and fill-in-the-middle (FIM) code completion.
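To illustrate the tool-calling use case, here is a minimal sketch of a request with an OpenAI-style function definition, as accepted by Ollama's `tools` field. The `get_weather` tool, its parameters, and the model tag are hypothetical examples, not part of the model card:

```python
# Hypothetical tool definition: a tool-capable model is expected to
# respond with a structured tool call (message.tool_calls) naming the
# function and its JSON arguments, rather than free-form text.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"}
                },
                "required": ["city"],
            },
        },
    }
]

request = {
    "model": "granite-4.0:7b",  # illustrative tag
    "messages": [{"role": "user", "content": "What's the weather in Boston?"}],
    "tools": tools,
    "stream": False,
}
```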

The key difference between these models is their architecture and context length:

Granite 4.0 Series:

  • granite-4.0:3.4b (micro): A dense model with a 128K context window.

  • granite-4.0:3.2b (h-micro): A dense-hybrid model with a 1M context window.

  • granite-4.0:7b (h-tiny): A larger dense-hybrid model, also with a 1M context window.

Granite 4.0 Nano Series (for on-device/IoT use):

  • granite-4.0:350m (nano): A dense model with a 128K context window.

  • granite-4.0:1b (nano): A dense model with a 128K context window.

  • granite-4.0:350m-h (h-nano): A dense-hybrid model with a 1M context window.

  • granite-4.0:1b-h (h-nano): A larger dense-hybrid model, also with a 1M context window.

Ideal for:

  • Agentic workflows requiring tool-calling

  • Structured data extraction into JSON

  • Code completion, especially Fill-in-the-Middle (FIM)

  • Enterprise RAG and other instruction-following tasks

  • On-device or IoT applications (Nano series)
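For the structured-extraction use case, one common approach with Ollama is to constrain output via the `format` field, which accepts a JSON schema. A minimal sketch, assuming a local Ollama server; the contact schema and model tag are illustrative:

```python
import json

# Illustrative JSON schema for a contact-extraction task. Passing it as
# the "format" value asks the server to constrain generation so the
# reply parses as an object matching this shape.
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "email": {"type": "string"},
    },
    "required": ["name", "email"],
}

request = {
    "model": "granite-4.0:1b",  # illustrative tag
    "messages": [
        {
            "role": "user",
            "content": "Extract the contact from: 'Reach Ana at ana@example.com'.",
        }
    ],
    "format": schema,
    "stream": False,
}

# The payload serializes cleanly for POSTing to /api/chat
body = json.dumps(request)
```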


References

Granite 4.0 HuggingFace Collection

Granite 4.0 Nano HuggingFace Collection

IBM Granite Docs

GitHub Discussions Page