118 Downloads Updated 1 year ago
OpenLLaMA 3B V2 finetuned on EverythingLM Data V2 for 2 epochs
The example below uses Marx-3B-V2, a 3B-parameter chat model.
ollama serve
curl -X POST http://localhost:11434/api/generate -d '{
"model": "acrastt_marx-3b-v2:latest",
"prompt":"Why is the sky blue?"
}'
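The generate endpoint streams its reply as one JSON object per line, each carrying a fragment of text in a `response` field and a `done` flag on the final chunk. As a minimal sketch, assuming that response shape (the sample lines below are illustrative, not real model output), the fragments can be reassembled like this:

```python
import json

# Illustrative sample of the newline-delimited JSON the API streams back
# (field names here are an assumption about the response format).
sample = (
    '{"model":"acrastt_marx-3b-v2:latest","response":"The sky ","done":false}\n'
    '{"model":"acrastt_marx-3b-v2:latest","response":"is blue because...","done":false}\n'
    '{"model":"acrastt_marx-3b-v2:latest","response":"","done":true}\n'
)

def collect_response(ndjson_text):
    """Concatenate 'response' fragments until a chunk reports done."""
    parts = []
    for line in ndjson_text.splitlines():
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)

print(collect_response(sample))
```

In a real client you would read the HTTP response line by line and apply the same accumulation as each chunk arrives.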
ollama run acrastt_marx-3b-v2:latest
Note: The ollama run command performs an ollama pull if the model is not already downloaded. To download the model without running it, use ollama pull acrastt_marx-3b-v2:latest
By default, Ollama uses 4-bit quantization. To try other quantization levels, use the other tags. The number after the q is the number of bits used for quantization (e.g. q4 means 4-bit quantization). The higher the number, the more accurate the model is, but the slower it runs and the more memory it requires.
If you run into issues with a higher quantization level, try the q4 model, or shut down any other programs that are using a lot of memory.
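The memory trade-off follows directly from the bit width: the weights alone take roughly parameters × bits / 8 bytes. A rough back-of-the-envelope sketch (this ignores runtime overhead such as the KV cache and context buffers, so actual usage is higher):

```python
def approx_weight_memory_gb(n_params, bits):
    """Approximate size of the quantized weights alone, in GB."""
    return n_params * bits / 8 / 1e9

# Marx-3B-V2 has roughly 3 billion parameters.
for bits in (4, 8):
    print(f"q{bits}: ~{approx_weight_memory_gb(3e9, bits):.1f} GB")
```

So the default q4 tag needs on the order of 1.5 GB for the weights, while q8 needs about twice that.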
Marx-3B-V2 source on Ollama
3B parameters. Source: acrastt