```
ollama run jewelzufo/granite4-350m-h-Distill-Gemini-Thinking
```
This model is a fine-tuned version of ibm-granite/granite-4.0-h-350m, trained on high-reasoning conversational data distilled from Gemini 3 Pro.
The model produces its reasoning inside <think> tags before giving a final answer.

🔗 GGUF versions available here: granite-4.0-h-350m-DISTILL-gemini-think-GGUF
| Format | Size | Use Case |
|---|---|---|
| Q2_K | Smallest | Low memory, reduced quality |
| Q4_K_M | Medium | Recommended: best balance of size and quality |
| Q5_K_M | Larger | Higher quality |
| Q8_0 | Large | Near lossless |
| F16 | Largest | Original precision |
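To grab a single quantization without pulling the whole repo, something along these lines should work (a minimal sketch using huggingface_hub; the Q4_K_M filename is the one referenced in the llama-cli example below):

```python
from huggingface_hub import hf_hub_download

# Download only the recommended Q4_K_M file from the GGUF repo.
gguf_path = hf_hub_download(
    repo_id="glogwa68/granite-4.0-h-350m-DISTILL-gemini-think-GGUF",
    filename="granite-4.0-h-350m-distill-gemini-think-q4_k_m.gguf",
)
print(gguf_path)  # local cache path, usable with llama.cpp
```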
Use with Transformers:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("glogwa68/granite-4.0-h-350m-DISTILL-gemini-think")
tokenizer = AutoTokenizer.from_pretrained("glogwa68/granite-4.0-h-350m-DISTILL-gemini-think")

# Build the prompt with the model's chat template.
messages = [{"role": "user", "content": "Hello, how are you?"}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True)

# Generate a response (the reasoning appears inside <think> tags).
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
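Because the reasoning arrives inside <think> tags, you will often want to separate it from the final answer. A minimal sketch, assuming a single <think>...</think> block per response (if the tags are registered as special tokens, decode with skip_special_tokens=False so they survive):

```python
import re

def split_thinking(text: str) -> tuple[str, str]:
    """Split generated text into (reasoning, answer), assuming one <think> block."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()  # no reasoning block found
    return match.group(1).strip(), text[match.end():].strip()

reasoning, answer = split_thinking(tokenizer.decode(outputs[0], skip_special_tokens=False))
```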
Run the GGUF build directly from Hugging Face with Ollama:

```
ollama run hf.co/glogwa68/granite-4.0-h-350m-DISTILL-gemini-think-GGUF:Q4_K_M
```
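Once the model is pulled, the official ollama Python client can call it programmatically (a sketch; assumes `pip install ollama` and a running Ollama server):

```python
import ollama

# Chat against the GGUF build pulled by the command above.
response = ollama.chat(
    model="hf.co/glogwa68/granite-4.0-h-350m-DISTILL-gemini-think-GGUF:Q4_K_M",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
)
print(response["message"]["content"])  # includes the <think> block, if emitted
```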
Or with llama.cpp:

```
llama-cli --hf-repo glogwa68/granite-4.0-h-350m-DISTILL-gemini-think-GGUF --hf-file granite-4.0-h-350m-distill-gemini-think-q4_k_m.gguf -p "Hello"
```
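The llama-cpp-python bindings can also load the downloaded GGUF file directly (a sketch; assumes `pip install llama-cpp-python` and the gguf_path from the huggingface_hub example above):

```python
from llama_cpp import Llama

# Load the local Q4_K_M file (path from the hf_hub_download sketch above).
llm = Llama(model_path=gguf_path, n_ctx=4096, verbose=False)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello, how are you?"}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```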
License: Apache 2.0