ALIA-40b

ALIA-40B is a 40B parameter base language model developed by the Barcelona Supercomputing Center (BSC), quantized and made available for use in Ollama.

Original model and details here: https://huggingface.co/BSC-LT/ALIA-40b

This model is released under a permissive Apache 2.0 license. Along with the open weights, all training scripts and configuration files are publicly available in the project's GitHub repository, linked from the Hugging Face model card.


Model Details

Description

ALIA-40B is a transformer-based, decoder-only language model pre-trained from scratch on 9.37 trillion tokens of highly curated data. The pre-training corpus contains text in 35 European languages, as well as code.

Hyperparameters

The full list of hyperparameters can be found in the Hugging Face model card linked above.

Architecture

Total parameters: 40,433,885,184
Embedding parameters: 2,097,152,000
Layers: 48
Hidden size: 8,192
Attention heads: 64
Context length: 32,768
Vocabulary size: 256,000
Precision: bfloat16
Embedding type: RoPE
Activation function: SwiGLU
Layer normalization: RMSNorm
Flash attention: yes
Grouped Query Attention: yes
Num. query groups: 8
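
As a quick sanity check on the table above, the embedding parameter count is exactly the vocabulary size times the hidden size (a minimal shell check; both numbers are taken from the table):

# 256,000 (vocabulary size) x 8,192 (hidden size) = 2,097,152,000
echo $((256000 * 8192))   # prints 2097152000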

Usage Instructions

ollama run csala/ALIA-40B:q4_k
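
Since this is a base (non-instruct) model, it responds best to completion-style prompts. A minimal one-shot example (the Catalan prompt is just an illustration):

ollama run csala/ALIA-40B:q4_k "La intel·ligència artificial és"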

Quantization Process

These are the steps that were followed to convert the weights to GGUF format and quantize the model.

1. Download from HuggingFace

Requirement: huggingface_hub

huggingface-cli download --cache-dir $HF_CACHE_DIR BSC-LT/ALIA-40b

HF_CACHE_DIR points to the directory in which the weights will be stored in raw format. It can simply be the current directory (.).

This command downloads the model into the directory $HF_CACHE_DIR/models--BSC-LT--ALIA-40b/

The safetensors files end up inside $HF_CACHE_DIR/models--BSC-LT--ALIA-40b/snapshots/<snapshot-id>/, where <snapshot-id> is the latest snapshot available in HuggingFace.
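
The snapshot id is not known in advance, but it can be resolved from the shell (a sketch that assumes a single downloaded snapshot):

SNAPSHOT_DIR=$(ls -d $HF_CACHE_DIR/models--BSC-LT--ALIA-40b/snapshots/*/ | head -n 1)
echo $SNAPSHOT_DIR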

2. Convert safetensors to GGUF without quantization using llama.cpp

Requirement: llama.cpp repository cloned and its Python requirements installed.

cd $LLAMA_PATH
python convert_hf_to_gguf.py $HF_CACHE_DIR/models--BSC-LT--ALIA-40b/snapshots/<snapshot-id>/ --outfile $ALIA_PATH/ALIA-40B.gguf

LLAMA_PATH is the root of the llama.cpp directory. ALIA_PATH is the directory where we want to store the ALIA-40B GGUF files and Modelfile.

This creates the file $ALIA_PATH/ALIA-40B.gguf, which we will use as source to derive the different quantized versions.
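
To verify the conversion before quantizing, the GGUF metadata can be inspected with llama.cpp's gguf-py tooling (assuming the gguf Python package is installed, which provides the gguf-dump command):

gguf-dump $ALIA_PATH/ALIA-40B.gguf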

3. Quantize the model

Requirement: llama.cpp built and installed.

For each quantized version $QUANTIZATION that we want to generate (e.g. Q4_K), run:

cd $ALIA_PATH
llama-quantize ALIA-40B.gguf ALIA-40B.$QUANTIZATION.gguf $QUANTIZATION

This generates the file ALIA-40B.$QUANTIZATION.gguf (e.g. ALIA-40B.Q4_K.gguf) within the same directory, with the weights quantized to the indicated level.
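
To produce several quantized variants in one go, the same command can be wrapped in a loop (the quantization levels below are illustrative):

cd $ALIA_PATH
for QUANTIZATION in Q4_K Q5_K_M Q8_0; do
    llama-quantize ALIA-40B.gguf ALIA-40B.$QUANTIZATION.gguf $QUANTIZATION
done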

4. Create a Modelfile

For each quantized version that we want to import into Ollama, create a Modelfile_$QUANTIZATION with the following contents (replace $QUANTIZATION with the actual value):

FROM ./ALIA-40B.$QUANTIZATION.gguf

For example, for the Q4_K quantization level we will create the file Modelfile_Q4_K with contents:

FROM ./ALIA-40B.Q4_K.gguf
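
Generating the Modelfiles can also be scripted (same illustrative levels as above):

for QUANTIZATION in Q4_K Q5_K_M Q8_0; do
    echo "FROM ./ALIA-40B.$QUANTIZATION.gguf" > Modelfile_$QUANTIZATION
done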

5. Import the model in Ollama

For each quantized version, import the model into Ollama using the command:

ollama create ALIA-40B:$LOWERCASE_QUANTIZATION -f Modelfile_$QUANTIZATION

NOTE: Tags are created in lowercase, following Ollama's naming convention.

For example, for the Q4_K quantization level we will run:

ollama create ALIA-40B:q4_k -f Modelfile_Q4_K
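
The lowercase tag can be derived automatically with standard tr (a sketch):

LOWERCASE_QUANTIZATION=$(echo $QUANTIZATION | tr '[:upper:]' '[:lower:]')
ollama create ALIA-40B:$LOWERCASE_QUANTIZATION -f Modelfile_$QUANTIZATION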

6. Push the model to Ollama

For each quantized version, push the model to the Ollama registry using the commands:

ollama cp ALIA-40B:$LOWERCASE_QUANTIZATION csala/ALIA-40B:$LOWERCASE_QUANTIZATION
ollama push csala/ALIA-40B:$LOWERCASE_QUANTIZATION
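
Steps 5 and 6 can be combined into a single loop over all variants (illustrative levels; pushing requires an ollama.com account with write access to the csala namespace):

for QUANTIZATION in Q4_K Q5_K_M Q8_0; do
    LOWERCASE_QUANTIZATION=$(echo $QUANTIZATION | tr '[:upper:]' '[:lower:]')
    ollama create ALIA-40B:$LOWERCASE_QUANTIZATION -f Modelfile_$QUANTIZATION
    ollama cp ALIA-40B:$LOWERCASE_QUANTIZATION csala/ALIA-40B:$LOWERCASE_QUANTIZATION
    ollama push csala/ALIA-40B:$LOWERCASE_QUANTIZATION
done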