This is a merge of pre-trained language models created using mergekit.
The model was merged using the passthrough merge method, with a sliding window approach.
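Because each slice spans four layers and starts two layers after the previous one, consecutive slices overlap by two layers. Reading `layer_range` as half-open (start inclusive, end exclusive), layers 2 through 33 of the base model each appear in two slices, while layers 0, 1, 34, and 35 appear only once. The 17 slices therefore stack 68 layers in total, against 36 in the base model.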
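The overlap can be checked with a short sketch. It assumes that `layer_range` is half-open, and it reads the window width (4), stride (2), and source depth (36) off the configuration in this card. The helper name `sliding_windows` is hypothetical and is not part of mergekit.

```python
from collections import Counter


def sliding_windows(num_layers, window, stride):
    """Return half-open (start, end) layer ranges that slide across num_layers.

    Hypothetical helper for illustration; not a mergekit function.
    """
    return [(start, start + window)
            for start in range(0, num_layers - window + 1, stride)]


# Values read off the configuration in this card (assumed, not queried
# from the model): 36 source layers, 4-layer windows, stride of 2.
windows = sliding_windows(num_layers=36, window=4, stride=2)

# Count how many slices each source layer lands in.
usage = Counter(layer for start, end in windows for layer in range(start, end))

# Layers that appear in more than one slice.
duplicated = sorted(layer for layer, count in usage.items() if count > 1)

# Depth of the merged model: every slice contributes all of its layers.
merged_depth = sum(end - start for start, end in windows)

print(f"{len(windows)} slices, {merged_depth} layers after merging")
print(f"duplicated layers: {duplicated[0]}..{duplicated[-1]}")
```

Under these assumptions the two layers at each end of the base model are the only ones used once, because no neighbouring window extends past the ends.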
The following models were included in the merge:
* Qwen/Qwen2.5-3B-Instruct
The following YAML configuration was used to produce this model:
slices:
- sources:
  - model: Qwen/Qwen2.5-3B-Instruct
    layer_range: [0, 4]
- sources:
  - model: Qwen/Qwen2.5-3B-Instruct
    layer_range: [2, 6]
- sources:
  - model: Qwen/Qwen2.5-3B-Instruct
    layer_range: [4, 8]
- sources:
  - model: Qwen/Qwen2.5-3B-Instruct
    layer_range: [6, 10]
- sources:
  - model: Qwen/Qwen2.5-3B-Instruct
    layer_range: [8, 12]
- sources:
  - model: Qwen/Qwen2.5-3B-Instruct
    layer_range: [10, 14]
- sources:
  - model: Qwen/Qwen2.5-3B-Instruct
    layer_range: [12, 16]
- sources:
  - model: Qwen/Qwen2.5-3B-Instruct
    layer_range: [14, 18]
- sources:
  - model: Qwen/Qwen2.5-3B-Instruct
    layer_range: [16, 20]
- sources:
  - model: Qwen/Qwen2.5-3B-Instruct
    layer_range: [18, 22]
- sources:
  - model: Qwen/Qwen2.5-3B-Instruct
    layer_range: [20, 24]
- sources:
  - model: Qwen/Qwen2.5-3B-Instruct
    layer_range: [22, 26]
- sources:
  - model: Qwen/Qwen2.5-3B-Instruct
    layer_range: [24, 28]
- sources:
  - model: Qwen/Qwen2.5-3B-Instruct
    layer_range: [26, 30]
- sources:
  - model: Qwen/Qwen2.5-3B-Instruct
    layer_range: [28, 32]
- sources:
  - model: Qwen/Qwen2.5-3B-Instruct
    layer_range: [30, 34]
- sources:
  - model: Qwen/Qwen2.5-3B-Instruct
    layer_range: [32, 36]
tokenizer_source: Qwen/Qwen2.5-3B-Instruct
merge_method: passthrough
dtype: bfloat16
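A configuration with this many near-identical slices is easier to generate than to write by hand. The sketch below is one way that could be done, not necessarily how this card's author produced it. The function `build_config` and its parameters are hypothetical, and the output is built with plain string formatting so that no YAML library is needed.

```python
def build_config(model, num_layers, window, stride, dtype="bfloat16"):
    """Return mergekit-style YAML text for a sliding-window passthrough merge.

    Hypothetical helper for illustration; not part of mergekit.
    """
    lines = ["slices:"]
    # One slice per window position; each range is written as [start, end].
    for start in range(0, num_layers - window + 1, stride):
        lines += [
            "- sources:",
            f"  - model: {model}",
            f"    layer_range: [{start}, {start + window}]",
        ]
    lines += [
        f"tokenizer_source: {model}",
        "merge_method: passthrough",
        f"dtype: {dtype}",
    ]
    return "\n".join(lines) + "\n"


# Parameters assumed from the configuration shown in this card.
config_text = build_config(
    "Qwen/Qwen2.5-3B-Instruct", num_layers=36, window=4, stride=2
)
print(config_text)
```

The printed text can be saved to a file and passed to mergekit's `mergekit-yaml` command together with an output directory.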