Official Tencent HY-MT2 (1.8B / 7B GGUF) models with an Ollama-compatible prompt template for correct and consistent local translation behavior.

Details

Updated yesterday

yesterday

874d0fdbfc33 · 1.5GB ·

model

archhunyuan-dense

parameters1.79B

quantizationQ6_K

1.5GB

template

<｜hy_begin▁of▁sentence｜>{{ if .System }}{{ .System }}<｜hy_place▁holder▁no▁3｜>{{ en

177B

license

TENCENT HY COMMUNITY LICENSE AGREEMENT Tencent Hy-MT2 Release Date: May 21, 2026 THIS LICENSE AGREEM

17kB

params

{ "num_predict": 4096, "repeat_penalty": 1.05, "stop": [ "<｜", "<ｆin

230B

HY-MT2 Ollama Integration

Run Tencent HY-MT2 multilingual translation models in Ollama using official GGUF model files, without modifying model weights, tokenizer, or inference behavior.

Default Ollama quantization: the default kaelri/hy-mt2:1.8b and kaelri/hy-mt2:7b tags use Q6_K. The Q4_K_M, Q8_0, and official 1.8B low-bit GGUF variants are available from the Ollama model page under View all.

Official Model Tags

The underlying HY-MT2 1.8B and HY-MT2 7B GGUF model files are official Tencent releases. The Ollama tags below package those files with Ollama-compatible runtime templates.

Model size	Default tag (Q6_K)	Q4_K_M	Q8_0
HY-MT2 1.8B	`kaelri/hy-mt2:1.8b`	`kaelri/hy-mt2:1.8b-q4_K_M`	`kaelri/hy-mt2:1.8b-q8_0`
HY-MT2 7B	`kaelri/hy-mt2:7b`	`kaelri/hy-mt2:7b-q4_K_M`	`kaelri/hy-mt2:7b-q8_0`

Project Scope

This project does not modify or reimplement HY-MT2 in any form.

It provides an Ollama compatibility layer that renders the official HY-MT2 control-token format from Ollama prompts or chat messages. The Ollama templates in this repository are derived from the official Hugging Face Jinja/control-token structure, but they are not separate model implementations.

In practical terms:

Model weights are unchanged
Tokenizer behavior is unchanged
Runtime inference behavior is unchanged
Only prompt rendering for Ollama compatibility is provided

Recommended Template

The recommended default is the minimal assistant-prefix template:

<｜hy_begin▁of▁sentence｜>{{ if .System }}{{ .System }}<｜hy_place▁holder▁no▁3｜>{{ end }}{{ if .Prompt }}<｜hy_User｜>{{ .Prompt }}{{ end }}<｜hy_Assistant｜>

This template intentionally ends at <｜hy_Assistant｜>. Ollama then generates the assistant response after that prefix.

Recommended Ollama Modelfile

Replace FROM with the local GGUF file you want to package.

FROM ./Hy-MT2-7B-Q6_K.gguf

TEMPLATE """<｜hy_begin▁of▁sentence｜>{{ if .System }}{{ .System }}<｜hy_place▁holder▁no▁3｜>{{ end }}{{ if .Prompt }}<｜hy_User｜>{{ .Prompt }}{{ end }}<｜hy_Assistant｜>"""

PARAMETER stop "<｜"
PARAMETER stop "<ｆin｜hy-"
PARAMETER stop "<ｈy-"
PARAMETER stop "<ｺ｜hy-"
PARAMETER stop "<ｂ｜hy-"
PARAMETER stop "<suggested_response"
PARAMETER stop "</suggested_response"

PARAMETER temperature 0.7
PARAMETER top_p 0.6
PARAMETER top_k 20
PARAMETER repeat_penalty 1.05
PARAMETER num_predict 4096

Template Benchmark Note

Earlier versions of this README assumed that the full chat-style template would provide stronger multi-turn behavior or better runtime characteristics than the minimal translate-style template.

~~Full chat-style templating improves HY-MT2 multi-turn behavior, instruction following, or inference speed in Ollama compared with a minimal translate-style template.~~

Corrected assumptions:

~~The full .Messages template is better suited for history-aware HY-MT2 inference than the minimal template.~~
~~The full chat-style template provides more stable multi-turn behavior, stronger assistant style continuation, or better instruction following.~~
~~The full chat-style template improves latency, tokens/s, or prompt-token structure in Ollama.~~

Benchmark testing did not support those assumptions:

Scope: HY-MT2 1.8B Q6_K, Ollama /api/chat, 500 random FLORES-200 zho_Hans-eng_Latn samples, COMET (Unbabel/wmt22-comet-da)
Quality: both templates produced the same COMET score: 0.8954
Runtime and behavior: latency, tokens/s, prompt-token structure, multi-turn context behavior, instruction override behavior, and assistant style continuation showed no meaningful difference beyond normal runtime noise

For this reason, the default recommendation is the minimal assistant-prefix template above. The full .Messages template is kept only as an alternative reference for users who need explicit role/history serialization.

Official Jinja Template

The original HY-MT2 Jinja chat template is provided by the official Hugging Face model documentation. It is included here because it is the authoritative source for the HY-MT2 control-token structure and because it documents the path that led to the Ollama template decision above.

{% if messages[0]['role'] == 'system' %}
{% set loop_messages = messages[1:] %}
{% set system_message = messages[0]['content'] %}
<｜hy_begin▁of▁sentence｜>{{ system_message }}<｜hy_place▁holder▁no▁3｜>
{% else %}
{% set loop_messages = messages %}
<｜hy_begin▁of▁sentence｜>
{% endif %}

{% for message in loop_messages %}
{% if message['role'] == 'user' %}
<｜hy_User｜>{{ message['content'] }}
{% elif message['role'] == 'assistant' %}
<｜hy_Assistant｜>{{ message['content'] }}<｜hy_place▁holder▁no▁2｜>
{% endif %}
{% endfor %}

{% if add_generation_prompt %}
<｜hy_Assistant｜>
{% else %}
<｜hy_place▁holder▁no▁8｜>
{% endif %}

Alternative: Full Messages Template

The full .Messages loop below follows the same official control-token structure more explicitly. It is useful as a reference, but benchmark testing did not show measurable quality, speed, or context-behavior gains over the minimal template.

FROM ./Hy-MT2-7B-Q6_K.gguf

TEMPLATE """{{- if .Messages -}}{{- $firstIsSystem := and .Messages (eq (index .Messages 0).Role "system") -}}{{- if $firstIsSystem -}}<｜hy_begin▁of▁sentence｜>{{ (index .Messages 0).Content }}<｜hy_place▁holder▁no▁3｜>{{- else -}}<｜hy_begin▁of▁sentence｜>{{- end -}}{{- range $i, $message := .Messages -}}{{- if and $firstIsSystem (eq $i 0) -}}{{- else if eq $message.Role "user" -}}<｜hy_User｜>{{ $message.Content }}{{- else if eq $message.Role "assistant" -}}<｜hy_Assistant｜>{{ $message.Content }}<｜hy_place▁holder▁no▁2｜>{{- end -}}{{- end -}}<｜hy_Assistant｜>{{- else -}}<｜hy_begin▁of▁sentence｜>{{- if .System -}}{{ .System }}<｜hy_place▁holder▁no▁3｜>{{- end -}}{{- if .Prompt -}}<｜hy_User｜>{{ .Prompt }}{{- end -}}<｜hy_Assistant｜>{{- end -}}"""

PARAMETER stop "<｜"
PARAMETER stop "<ｆin｜hy-"
PARAMETER stop "<ｈy-"
PARAMETER stop "<ｺ｜hy-"
PARAMETER stop "<ｂ｜hy-"
PARAMETER stop "<suggested_response"
PARAMETER stop "</suggested_response"

PARAMETER temperature 0.7
PARAMETER top_p 0.6
PARAMETER top_k 20
PARAMETER repeat_penalty 1.05
PARAMETER num_predict 4096

Template Behavior Notes

The minimal and full templates showed equivalent translation quality, runtime behavior, and context behavior in the HY-MT2 1.8B Q6_K benchmark described above.
HY-MT2 remains a translation-oriented model; neither template produced reliable instruction override behavior in testing.
Ollama does not expose Hugging Face add_generation_prompt. The templates approximate this behavior by ending generation prompts at <｜hy_Assistant｜>.

Known Limitations

Control-token leakage

In some GGUF/Ollama runs, the model may occasionally output extra or slightly malformed HY control tokens, for example:

<｜hy-Assistant｜>
<ｈy-Assistant｝>
<suggested_response>

This is an occasional decoding artifact that can appear depending on prompt structure and sampling settings.

Stop sequences are used to filter such cases when they occur.

Stop-token trade-off

The configured stop tokens help prevent control-token leakage, but in rare cases they may also affect outputs that contain similar structured patterns.

This is a standard trade-off in token-level output control.

Model Sources

Official References

Paper

https://arxiv.org/abs/2605.22064

Model Links

Model	Link
HY-MT2 1.8B	https://huggingface.co/tencent/Hy-MT2-1.8B
HY-MT2 1.8B FP8	https://huggingface.co/tencent/Hy-MT2-1.8B-FP8
HY-MT2 1.8B GGUF	https://huggingface.co/tencent/Hy-MT2-1.8B-GGUF
HY-MT2 1.8B 2bit GGUF	https://huggingface.co/tencent/Hy-MT2-1.8B-2bit-GGUF
HY-MT2 1.8B 1.25bit GGUF	https://huggingface.co/tencent/Hy-MT2-1.8B-1.25bit-GGUF
HY-MT2 7B	https://huggingface.co/tencent/Hy-MT2-7B
HY-MT2 7B FP8	https://huggingface.co/tencent/Hy-MT2-7B-FP8
HY-MT2 7B GGUF	https://huggingface.co/tencent/Hy-MT2-7B-GGUF
HY-MT2 30B-A3B	https://huggingface.co/tencent/Hy-MT2-30B-A3B
HY-MT2 30B-A3B FP8	https://huggingface.co/tencent/Hy-MT2-30B-A3B-FP8

Citation

@misc{zheng2026hymt2familyfastefficient,
  title={Hy-MT2: A Family of Fast, Efficient and Powerful Multilingual Translation Models in the Wild},
  author={Mao Zheng et al.},
  year={2026},
  eprint={2605.22064},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2605.22064},
}

License

This repository references the Tencent HY Community License Agreement.

This project does not modify model weights and provides only a runtime input compatibility layer for inference execution.