kaelri/ hy-mt2:1.8b

308 yesterday

Official Tencent HY-MT2 (1.8B / 7B GGUF) models with an Ollama-compatible prompt template for correct and consistent local translation behavior.

1.8b 7b
ollama run kaelri/hy-mt2:1.8b

Details

yesterday

874d0fdbfc33 · 1.5GB ·

hunyuan-dense
·
1.79B
·
Q6_K
<|hy_begin▁of▁sentence|>{{ if .System }}{{ .System }}<|hy_place▁holder▁no▁3|>{{ en
TENCENT HY COMMUNITY LICENSE AGREEMENT Tencent Hy-MT2 Release Date: May 21, 2026 THIS LICENSE AGREEM
{ "num_predict": 4096, "repeat_penalty": 1.05, "stop": [ "<|", "<fin

Readme

HY-MT2 Ollama Integration

Run Tencent HY-MT2 multilingual translation models in Ollama using official GGUF model files, without modifying model weights, tokenizer, or inference behavior.

Default Ollama quantization: the default kaelri/hy-mt2:1.8b and kaelri/hy-mt2:7b tags use Q6_K. The Q4_K_M, Q8_0, and official 1.8B low-bit GGUF variants are available from the Ollama model page under View all.

Official Model Tags

The underlying HY-MT2 1.8B and HY-MT2 7B GGUF model files are official Tencent releases. The Ollama tags below package those files with Ollama-compatible runtime templates.

Model size Default tag (Q6_K) Q4_K_M Q8_0
HY-MT2 1.8B kaelri/hy-mt2:1.8b kaelri/hy-mt2:1.8b-q4_K_M kaelri/hy-mt2:1.8b-q8_0
HY-MT2 7B kaelri/hy-mt2:7b kaelri/hy-mt2:7b-q4_K_M kaelri/hy-mt2:7b-q8_0

Project Scope

This project does not modify or reimplement HY-MT2 in any form.

It provides an Ollama compatibility layer that renders the official HY-MT2 control-token format from Ollama prompts or chat messages. The Ollama templates in this repository are derived from the official Hugging Face Jinja/control-token structure, but they are not separate model implementations.

In practical terms:

  • Model weights are unchanged
  • Tokenizer behavior is unchanged
  • Runtime inference behavior is unchanged
  • Only prompt rendering for Ollama compatibility is provided

Recommended Template

The recommended default is the minimal assistant-prefix template:

<|hy_begin▁of▁sentence|>{{ if .System }}{{ .System }}<|hy_place▁holder▁no▁3|>{{ end }}{{ if .Prompt }}<|hy_User|>{{ .Prompt }}{{ end }}<|hy_Assistant|>

This template intentionally ends at <|hy_Assistant|>. Ollama then generates the assistant response after that prefix.

Recommended Ollama Modelfile

Replace FROM with the local GGUF file you want to package.

FROM ./Hy-MT2-7B-Q6_K.gguf

TEMPLATE """<|hy_begin▁of▁sentence|>{{ if .System }}{{ .System }}<|hy_place▁holder▁no▁3|>{{ end }}{{ if .Prompt }}<|hy_User|>{{ .Prompt }}{{ end }}<|hy_Assistant|>"""

PARAMETER stop "<|"
PARAMETER stop "<fin|hy-"
PARAMETER stop "<hy-"
PARAMETER stop "<コ|hy-"
PARAMETER stop "<b|hy-"
PARAMETER stop "<suggested_response"
PARAMETER stop "</suggested_response"

PARAMETER temperature 0.7
PARAMETER top_p 0.6
PARAMETER top_k 20
PARAMETER repeat_penalty 1.05
PARAMETER num_predict 4096

Template Benchmark Note

Earlier versions of this README assumed that the full chat-style template would provide stronger multi-turn behavior or better runtime characteristics than the minimal translate-style template.

Full chat-style templating improves HY-MT2 multi-turn behavior, instruction following, or inference speed in Ollama compared with a minimal translate-style template.

Corrected assumptions:

  • The full .Messages template is better suited for history-aware HY-MT2 inference than the minimal template.
  • The full chat-style template provides more stable multi-turn behavior, stronger assistant style continuation, or better instruction following.
  • The full chat-style template improves latency, tokens/s, or prompt-token structure in Ollama.

Benchmark testing did not support those assumptions:

  • Scope: HY-MT2 1.8B Q6_K, Ollama /api/chat, 500 random FLORES-200 zho_Hans-eng_Latn samples, COMET (Unbabel/wmt22-comet-da)
  • Quality: both templates produced the same COMET score: 0.8954
  • Runtime and behavior: latency, tokens/s, prompt-token structure, multi-turn context behavior, instruction override behavior, and assistant style continuation showed no meaningful difference beyond normal runtime noise

For this reason, the default recommendation is the minimal assistant-prefix template above. The full .Messages template is kept only as an alternative reference for users who need explicit role/history serialization.


Official Jinja Template

The original HY-MT2 Jinja chat template is provided by the official Hugging Face model documentation. It is included here because it is the authoritative source for the HY-MT2 control-token structure and because it documents the path that led to the Ollama template decision above.

{% if messages[0]['role'] == 'system' %}
{% set loop_messages = messages[1:] %}
{% set system_message = messages[0]['content'] %}
<|hy_begin▁of▁sentence|>{{ system_message }}<|hy_place▁holder▁no▁3|>
{% else %}
{% set loop_messages = messages %}
<|hy_begin▁of▁sentence|>
{% endif %}

{% for message in loop_messages %}
{% if message['role'] == 'user' %}
<|hy_User|>{{ message['content'] }}
{% elif message['role'] == 'assistant' %}
<|hy_Assistant|>{{ message['content'] }}<|hy_place▁holder▁no▁2|>
{% endif %}
{% endfor %}

{% if add_generation_prompt %}
<|hy_Assistant|>
{% else %}
<|hy_place▁holder▁no▁8|>
{% endif %}

Alternative: Full Messages Template

The full .Messages loop below follows the same official control-token structure more explicitly. It is useful as a reference, but benchmark testing did not show measurable quality, speed, or context-behavior gains over the minimal template.

FROM ./Hy-MT2-7B-Q6_K.gguf

TEMPLATE """{{- if .Messages -}}{{- $firstIsSystem := and .Messages (eq (index .Messages 0).Role "system") -}}{{- if $firstIsSystem -}}<|hy_begin▁of▁sentence|>{{ (index .Messages 0).Content }}<|hy_place▁holder▁no▁3|>{{- else -}}<|hy_begin▁of▁sentence|>{{- end -}}{{- range $i, $message := .Messages -}}{{- if and $firstIsSystem (eq $i 0) -}}{{- else if eq $message.Role "user" -}}<|hy_User|>{{ $message.Content }}{{- else if eq $message.Role "assistant" -}}<|hy_Assistant|>{{ $message.Content }}<|hy_place▁holder▁no▁2|>{{- end -}}{{- end -}}<|hy_Assistant|>{{- else -}}<|hy_begin▁of▁sentence|>{{- if .System -}}{{ .System }}<|hy_place▁holder▁no▁3|>{{- end -}}{{- if .Prompt -}}<|hy_User|>{{ .Prompt }}{{- end -}}<|hy_Assistant|>{{- end -}}"""

PARAMETER stop "<|"
PARAMETER stop "<fin|hy-"
PARAMETER stop "<hy-"
PARAMETER stop "<コ|hy-"
PARAMETER stop "<b|hy-"
PARAMETER stop "<suggested_response"
PARAMETER stop "</suggested_response"

PARAMETER temperature 0.7
PARAMETER top_p 0.6
PARAMETER top_k 20
PARAMETER repeat_penalty 1.05
PARAMETER num_predict 4096

Template Behavior Notes

  • The minimal and full templates showed equivalent translation quality, runtime behavior, and context behavior in the HY-MT2 1.8B Q6_K benchmark described above.
  • HY-MT2 remains a translation-oriented model; neither template produced reliable instruction override behavior in testing.
  • Ollama does not expose Hugging Face add_generation_prompt. The templates approximate this behavior by ending generation prompts at <|hy_Assistant|>.

Known Limitations

Control-token leakage

In some GGUF/Ollama runs, the model may occasionally output extra or slightly malformed HY control tokens, for example:

<|hy-Assistant|>
<hy-Assistant}>
<suggested_response>

This is an occasional decoding artifact that can appear depending on prompt structure and sampling settings.

Stop sequences are used to filter such cases when they occur.


Stop-token trade-off

The configured stop tokens help prevent control-token leakage, but in rare cases they may also affect outputs that contain similar structured patterns.

This is a standard trade-off in token-level output control.


Model Sources


Official References

Paper

https://arxiv.org/abs/2605.22064

Model Links

Model Link
HY-MT2 1.8B https://huggingface.co/tencent/Hy-MT2-1.8B
HY-MT2 1.8B FP8 https://huggingface.co/tencent/Hy-MT2-1.8B-FP8
HY-MT2 1.8B GGUF https://huggingface.co/tencent/Hy-MT2-1.8B-GGUF
HY-MT2 1.8B 2bit GGUF https://huggingface.co/tencent/Hy-MT2-1.8B-2bit-GGUF
HY-MT2 1.8B 1.25bit GGUF https://huggingface.co/tencent/Hy-MT2-1.8B-1.25bit-GGUF
HY-MT2 7B https://huggingface.co/tencent/Hy-MT2-7B
HY-MT2 7B FP8 https://huggingface.co/tencent/Hy-MT2-7B-FP8
HY-MT2 7B GGUF https://huggingface.co/tencent/Hy-MT2-7B-GGUF
HY-MT2 30B-A3B https://huggingface.co/tencent/Hy-MT2-30B-A3B
HY-MT2 30B-A3B FP8 https://huggingface.co/tencent/Hy-MT2-30B-A3B-FP8

Citation

@misc{zheng2026hymt2familyfastefficient,
  title={Hy-MT2: A Family of Fast, Efficient and Powerful Multilingual Translation Models in the Wild},
  author={Mao Zheng et al.},
  year={2026},
  eprint={2605.22064},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2605.22064},
}

License

This repository references the Tencent HY Community License Agreement.

This project does not modify model weights and provides only a runtime input compatibility layer for inference execution.