```
ollama run jewelzufo/granite4-350m-h-Distill-Gemini-Thinking
```
This model is a fine-tuned version of ibm-granite/granite-4.0-h-350m, trained on high-reasoning conversational data distilled from Gemini 3 Pro.
The model produces its reasoning inside <think> tags before giving a final answer.

🔗 GGUF versions available here: granite-4.0-h-350m-DISTILL-gemini-think-GGUF
| Format | Size | Use Case |
|---|---|---|
| Q2_K | Smallest | Low memory, reduced quality |
| Q4_K_M | Medium | Recommended: best balance of size and quality |
| Q5_K_M | Larger | Higher quality |
| Q8_0 | Large | Near lossless |
| F16 | Largest | Original precision |
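To grab a single quantization without pulling the whole repo, something along these lines should work (a minimal sketch using huggingface_hub; the Q4_K_M filename is the one referenced in the llama-cli example below):

```python
from huggingface_hub import hf_hub_download

# Download only the recommended Q4_K_M file from the GGUF repo.
gguf_path = hf_hub_download(
    repo_id="glogwa68/granite-4.0-h-350m-DISTILL-gemini-think-GGUF",
    filename="granite-4.0-h-350m-distill-gemini-think-q4_k_m.gguf",
)
print(gguf_path)  # local cache path, usable with llama.cpp
```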
Use with Transformers:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("glogwa68/granite-4.0-h-350m-DISTILL-gemini-think")
tokenizer = AutoTokenizer.from_pretrained("glogwa68/granite-4.0-h-350m-DISTILL-gemini-think")

# Build the prompt with the model's chat template.
messages = [{"role": "user", "content": "Hello, how are you?"}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True)

# Generate a response (the reasoning appears inside <think> tags).
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
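Because the reasoning arrives inside <think> tags, you will often want to separate it from the final answer. A minimal sketch, assuming a single <think>...</think> block per response (if the tags are registered as special tokens, decode with skip_special_tokens=False so they survive):

```python
import re

def split_thinking(text: str) -> tuple[str, str]:
    """Split generated text into (reasoning, answer), assuming one <think> block."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()  # no reasoning block found
    return match.group(1).strip(), text[match.end():].strip()

reasoning, answer = split_thinking(tokenizer.decode(outputs[0], skip_special_tokens=False))
```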
Run the GGUF build directly from Hugging Face with Ollama:

```
ollama run hf.co/glogwa68/granite-4.0-h-350m-DISTILL-gemini-think-GGUF:Q4_K_M
```
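Once the model is pulled, the official ollama Python client can call it programmatically (a sketch; assumes `pip install ollama` and a running Ollama server):

```python
import ollama

# Chat against the GGUF build pulled by the command above.
response = ollama.chat(
    model="hf.co/glogwa68/granite-4.0-h-350m-DISTILL-gemini-think-GGUF:Q4_K_M",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
)
print(response["message"]["content"])  # includes the <think> block, if emitted
```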
Or with llama.cpp:

```
llama-cli --hf-repo glogwa68/granite-4.0-h-350m-DISTILL-gemini-think-GGUF --hf-file granite-4.0-h-350m-distill-gemini-think-q4_k_m.gguf -p "Hello"
```
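The llama-cpp-python bindings can also load the downloaded GGUF file directly (a sketch; assumes `pip install llama-cpp-python` and the gguf_path from the huggingface_hub example above):

```python
from llama_cpp import Llama

# Load the local Q4_K_M file (path from the hf_hub_download sketch above).
llm = Llama(model_path=gguf_path, n_ctx=4096, verbose=False)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello, how are you?"}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```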
License: Apache 2.0