ALIA-40b

ALIA-40B is a 40B parameter base language model developed by the Barcelona Supercomputing Center (BSC), quantized and made available for use in Ollama.

Original model and details here: https://huggingface.co/BSC-LT/ALIA-40b

This model is released under a permissive Apache 2.0 license. Along with the open weights, all training scripts and configuration files are publicly available in the project's GitHub repository, linked from the Hugging Face model card.


Model Details

Description

ALIA-40B is a transformer-based, decoder-only language model pre-trained from scratch on 9.37 trillion tokens of highly curated data. The pre-training corpus contains text in 35 European languages, as well as code.

Hyperparameters

The full list of hyperparameters can be found in the Hugging Face model card linked above.

Architecture

Total parameters: 40,433,885,184
Embedding parameters: 2,097,152,000
Layers: 48
Hidden size: 8,192
Attention heads: 64
Context length: 32,768
Vocabulary size: 256,000
Precision: bfloat16
Embedding type: RoPE
Activation function: SwiGLU
Layer normalization: RMSNorm
Flash attention: yes
Grouped Query Attention: yes
Num. query groups: 8
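
As a quick sanity check on the table above, the embedding parameter count is exactly the vocabulary size times the hidden size (a minimal shell check; both numbers are taken from the table):

# 256,000 (vocabulary size) x 8,192 (hidden size) = 2,097,152,000
echo $((256000 * 8192))   # prints 2097152000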

Usage Instructions

ollama run csala/ALIA-40B:q4_k
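
Since this is a base (non-instruct) model, it responds best to completion-style prompts. A minimal one-shot example (the Catalan prompt is just an illustration):

ollama run csala/ALIA-40B:q4_k "La intel·ligència artificial és"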

Quantization Process

These are the steps that were followed to convert the weights to GGUF format and quantize the model.

1. Download from HuggingFace

Requirement: huggingface_hub

huggingface-cli download --cache-dir $HF_CACHE_DIR BSC-LT/ALIA-40b

HF_CACHE_DIR points to the directory in which the weights will be stored in raw format. It can simply be the current directory (.).

This command downloads the model into the directory $HF_CACHE_DIR/models--BSC-LT--ALIA-40b/

The safetensors files end up inside $HF_CACHE_DIR/models--BSC-LT--ALIA-40b/snapshots/<snapshot-id>/, where <snapshot-id> is the latest snapshot available in HuggingFace.
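
The snapshot id is not known in advance, but it can be resolved from the shell (a sketch that assumes a single downloaded snapshot):

SNAPSHOT_DIR=$(ls -d $HF_CACHE_DIR/models--BSC-LT--ALIA-40b/snapshots/*/ | head -n 1)
echo $SNAPSHOT_DIR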

2. Convert safetensors to GGUF without quantization using llama.cpp

Requirement: llama.cpp repository cloned and its Python requirements installed.

cd $LLAMA_PATH
python convert_hf_to_gguf.py $HF_CACHE_DIR/models--BSC-LT--ALIA-40b/snapshots/<snapshot-id>/ --outfile $ALIA_PATH/ALIA-40B.gguf

LLAMA_PATH is the root of the llama.cpp directory. ALIA_PATH is the directory where we want to store the ALIA-40B GGUF files and Modelfile.

This creates the file $ALIA_PATH/ALIA-40B.gguf, which we will use as source to derive the different quantized versions.
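
To verify the conversion before quantizing, the GGUF metadata can be inspected with llama.cpp's gguf-py tooling (assuming the gguf Python package is installed, which provides the gguf-dump command):

gguf-dump $ALIA_PATH/ALIA-40B.gguf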

3. Quantize the model

Requirement: llama.cpp built and installed.

For each quantized version $QUANTIZATION that we want to generate (e.g. Q4_K), run:

cd $ALIA_PATH
llama-quantize ALIA-40B.gguf ALIA-40B.$QUANTIZATION.gguf $QUANTIZATION

This generates the file ALIA-40B.$QUANTIZATION.gguf (e.g. ALIA-40B.Q4_K.gguf) within the same directory, with the weights quantized to the indicated level.
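
To produce several quantized variants in one go, the same command can be wrapped in a loop (the quantization levels below are illustrative):

cd $ALIA_PATH
for QUANTIZATION in Q4_K Q5_K_M Q8_0; do
    llama-quantize ALIA-40B.gguf ALIA-40B.$QUANTIZATION.gguf $QUANTIZATION
done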

4. Create a Modelfile

For each quantized version that we want to import into Ollama, create a Modelfile_$QUANTIZATION with the following contents (replace $QUANTIZATION with the actual value):

FROM ./ALIA-40B.$QUANTIZATION.gguf

For example, for the Q4_K quantization level we will create the file Modelfile_Q4_K with contents:

FROM ./ALIA-40B.Q4_K.gguf
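
Generating the Modelfiles can also be scripted (same illustrative levels as above):

for QUANTIZATION in Q4_K Q5_K_M Q8_0; do
    echo "FROM ./ALIA-40B.$QUANTIZATION.gguf" > Modelfile_$QUANTIZATION
done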

5. Import the model in Ollama

For each quantized version, import the model into Ollama using the command:

ollama create ALIA-40B:$LOWERCASE_QUANTIZATION -f Modelfile_$QUANTIZATION

NOTE: Tags are created in lowercase, following Ollama's naming convention.

For example, for the Q4_K quantization level we will run:

ollama create ALIA-40B:q4_k -f Modelfile_Q4_K
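
The lowercase tag can be derived automatically with standard tr (a sketch):

LOWERCASE_QUANTIZATION=$(echo $QUANTIZATION | tr '[:upper:]' '[:lower:]')
ollama create ALIA-40B:$LOWERCASE_QUANTIZATION -f Modelfile_$QUANTIZATION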

6. Push the model to Ollama

For each quantized version, push the model to the Ollama registry using the commands:

ollama cp ALIA-40B:$LOWERCASE_QUANTIZATION csala/ALIA-40B:$LOWERCASE_QUANTIZATION
ollama push csala/ALIA-40B:$LOWERCASE_QUANTIZATION
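
Steps 5 and 6 can be combined into a single loop over all variants (illustrative levels; pushing requires an ollama.com account with write access to the csala namespace):

for QUANTIZATION in Q4_K Q5_K_M Q8_0; do
    LOWERCASE_QUANTIZATION=$(echo $QUANTIZATION | tr '[:upper:]' '[:lower:]')
    ollama create ALIA-40B:$LOWERCASE_QUANTIZATION -f Modelfile_$QUANTIZATION
    ollama cp ALIA-40B:$LOWERCASE_QUANTIZATION csala/ALIA-40B:$LOWERCASE_QUANTIZATION
    ollama push csala/ALIA-40B:$LOWERCASE_QUANTIZATION
done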