308 Downloads Updated yesterday
ollama run kaelri/hy-mt2:1.8b
Updated yesterday
yesterday
874d0fdbfc33 · 1.5GB ·
Run Tencent HY-MT2 multilingual translation models in Ollama using official GGUF model files, without modifying model weights, tokenizer, or inference behavior.
Default Ollama quantization: the default
kaelri/hy-mt2:1.8bandkaelri/hy-mt2:7btags use Q6_K. The Q4_K_M, Q8_0, and official 1.8B low-bit GGUF variants are available from the Ollama model page under View all.
The underlying HY-MT2 1.8B and HY-MT2 7B GGUF model files are official Tencent releases. The Ollama tags below package those files with Ollama-compatible runtime templates.
| Model size | Default tag (Q6_K) | Q4_K_M | Q8_0 |
|---|---|---|---|
| HY-MT2 1.8B | kaelri/hy-mt2:1.8b |
kaelri/hy-mt2:1.8b-q4_K_M |
kaelri/hy-mt2:1.8b-q8_0 |
| HY-MT2 7B | kaelri/hy-mt2:7b |
kaelri/hy-mt2:7b-q4_K_M |
kaelri/hy-mt2:7b-q8_0 |
This project does not modify or reimplement HY-MT2 in any form.
It provides an Ollama compatibility layer that renders the official HY-MT2 control-token format from Ollama prompts or chat messages. The Ollama templates in this repository are derived from the official Hugging Face Jinja/control-token structure, but they are not separate model implementations.
In practical terms:
The recommended default is the minimal assistant-prefix template:
<|hy_begin▁of▁sentence|>{{ if .System }}{{ .System }}<|hy_place▁holder▁no▁3|>{{ end }}{{ if .Prompt }}<|hy_User|>{{ .Prompt }}{{ end }}<|hy_Assistant|>
This template intentionally ends at <|hy_Assistant|>. Ollama then generates the assistant response after that prefix.
Replace FROM with the local GGUF file you want to package.
FROM ./Hy-MT2-7B-Q6_K.gguf
TEMPLATE """<|hy_begin▁of▁sentence|>{{ if .System }}{{ .System }}<|hy_place▁holder▁no▁3|>{{ end }}{{ if .Prompt }}<|hy_User|>{{ .Prompt }}{{ end }}<|hy_Assistant|>"""
PARAMETER stop "<|"
PARAMETER stop "<fin|hy-"
PARAMETER stop "<hy-"
PARAMETER stop "<コ|hy-"
PARAMETER stop "<b|hy-"
PARAMETER stop "<suggested_response"
PARAMETER stop "</suggested_response"
PARAMETER temperature 0.7
PARAMETER top_p 0.6
PARAMETER top_k 20
PARAMETER repeat_penalty 1.05
PARAMETER num_predict 4096
Earlier versions of this README assumed that the full chat-style template would provide stronger multi-turn behavior or better runtime characteristics than the minimal translate-style template.
Full chat-style templating improves HY-MT2 multi-turn behavior, instruction following, or inference speed in Ollama compared with a minimal translate-style template.
Corrected assumptions:
.Messages template is better suited for history-aware HY-MT2 inference than the minimal template.Benchmark testing did not support those assumptions:
/api/chat, 500 random FLORES-200 zho_Hans-eng_Latn samples, COMET (Unbabel/wmt22-comet-da)0.8954For this reason, the default recommendation is the minimal assistant-prefix template above. The full .Messages template is kept only as an alternative reference for users who need explicit role/history serialization.
The original HY-MT2 Jinja chat template is provided by the official Hugging Face model documentation. It is included here because it is the authoritative source for the HY-MT2 control-token structure and because it documents the path that led to the Ollama template decision above.
{% if messages[0]['role'] == 'system' %}
{% set loop_messages = messages[1:] %}
{% set system_message = messages[0]['content'] %}
<|hy_begin▁of▁sentence|>{{ system_message }}<|hy_place▁holder▁no▁3|>
{% else %}
{% set loop_messages = messages %}
<|hy_begin▁of▁sentence|>
{% endif %}
{% for message in loop_messages %}
{% if message['role'] == 'user' %}
<|hy_User|>{{ message['content'] }}
{% elif message['role'] == 'assistant' %}
<|hy_Assistant|>{{ message['content'] }}<|hy_place▁holder▁no▁2|>
{% endif %}
{% endfor %}
{% if add_generation_prompt %}
<|hy_Assistant|>
{% else %}
<|hy_place▁holder▁no▁8|>
{% endif %}
The full .Messages loop below follows the same official control-token structure more explicitly. It is useful as a reference, but benchmark testing did not show measurable quality, speed, or context-behavior gains over the minimal template.
FROM ./Hy-MT2-7B-Q6_K.gguf
TEMPLATE """{{- if .Messages -}}{{- $firstIsSystem := and .Messages (eq (index .Messages 0).Role "system") -}}{{- if $firstIsSystem -}}<|hy_begin▁of▁sentence|>{{ (index .Messages 0).Content }}<|hy_place▁holder▁no▁3|>{{- else -}}<|hy_begin▁of▁sentence|>{{- end -}}{{- range $i, $message := .Messages -}}{{- if and $firstIsSystem (eq $i 0) -}}{{- else if eq $message.Role "user" -}}<|hy_User|>{{ $message.Content }}{{- else if eq $message.Role "assistant" -}}<|hy_Assistant|>{{ $message.Content }}<|hy_place▁holder▁no▁2|>{{- end -}}{{- end -}}<|hy_Assistant|>{{- else -}}<|hy_begin▁of▁sentence|>{{- if .System -}}{{ .System }}<|hy_place▁holder▁no▁3|>{{- end -}}{{- if .Prompt -}}<|hy_User|>{{ .Prompt }}{{- end -}}<|hy_Assistant|>{{- end -}}"""
PARAMETER stop "<|"
PARAMETER stop "<fin|hy-"
PARAMETER stop "<hy-"
PARAMETER stop "<コ|hy-"
PARAMETER stop "<b|hy-"
PARAMETER stop "<suggested_response"
PARAMETER stop "</suggested_response"
PARAMETER temperature 0.7
PARAMETER top_p 0.6
PARAMETER top_k 20
PARAMETER repeat_penalty 1.05
PARAMETER num_predict 4096
add_generation_prompt. The templates approximate this behavior by ending generation prompts at <|hy_Assistant|>.In some GGUF/Ollama runs, the model may occasionally output extra or slightly malformed HY control tokens, for example:
<|hy-Assistant|>
<hy-Assistant}>
<suggested_response>
This is an occasional decoding artifact that can appear depending on prompt structure and sampling settings.
Stop sequences are used to filter such cases when they occur.
The configured stop tokens help prevent control-token leakage, but in rare cases they may also affect outputs that contain similar structured patterns.
This is a standard trade-off in token-level output control.
https://arxiv.org/abs/2605.22064
| Model | Link |
|---|---|
| HY-MT2 1.8B | https://huggingface.co/tencent/Hy-MT2-1.8B |
| HY-MT2 1.8B FP8 | https://huggingface.co/tencent/Hy-MT2-1.8B-FP8 |
| HY-MT2 1.8B GGUF | https://huggingface.co/tencent/Hy-MT2-1.8B-GGUF |
| HY-MT2 1.8B 2bit GGUF | https://huggingface.co/tencent/Hy-MT2-1.8B-2bit-GGUF |
| HY-MT2 1.8B 1.25bit GGUF | https://huggingface.co/tencent/Hy-MT2-1.8B-1.25bit-GGUF |
| HY-MT2 7B | https://huggingface.co/tencent/Hy-MT2-7B |
| HY-MT2 7B FP8 | https://huggingface.co/tencent/Hy-MT2-7B-FP8 |
| HY-MT2 7B GGUF | https://huggingface.co/tencent/Hy-MT2-7B-GGUF |
| HY-MT2 30B-A3B | https://huggingface.co/tencent/Hy-MT2-30B-A3B |
| HY-MT2 30B-A3B FP8 | https://huggingface.co/tencent/Hy-MT2-30B-A3B-FP8 |
@misc{zheng2026hymt2familyfastefficient,
title={Hy-MT2: A Family of Fast, Efficient and Powerful Multilingual Translation Models in the Wild},
author={Mao Zheng et al.},
year={2026},
eprint={2605.22064},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2605.22064},
}
This repository references the Tencent HY Community License Agreement.
This project does not modify model weights and provides only a runtime input compatibility layer for inference execution.