Model Description
mitya is a model based on Dostoevsky's renowned works, including The Brothers Karamazov, Crime and Punishment, and Demons, among others. It is fine-tuned from the open-source model Mistral-7b-v0.3 and comes in 6 distinct sizes and quantizations.
- Developed by: tri282
- Funded by: my sincere adoration for his ideals
- Model type: question-answering
- Language(s) (NLP): English (primary), though it can interpret other languages and answer in English
- Finetuned from model: mistral-7b-v0.3
- Repository: https://github.com/tri282/mitya-0.0
- Inference/Download: https://huggingface.co/tri282/dostoevskyGPT_merged
Intended Use
before everyone, for everyone and everything
Bias, Risks, and Limitations
this model was initially trained on 7,000 question-answer pairs with LoRA, and the adapter was later merged into its base model. given the limited number of training examples it was fine-tuned on, expect minor errors, if any (as i spitefully claim), in its syntax and so on
Other Usage
- Inference:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# load the merged model and its tokenizer from the Hub
path = "tri282/dostoevskyGPT_merged"
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(path)

# generate up to 250 new tokens for the prompt
input_text = "your text here"
inputs = tokenizer(input_text, return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=250)
output_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(output_text)
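generate decodes greedily as written above; sampling parameters such as do_sample, temperature, and top_p can also be passed to generate for more varied prose (the values below are only illustrative):

outputs = model.generate(**inputs, max_new_tokens=250, do_sample=True, temperature=0.8, top_p=0.95)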
- Download:
from huggingface_hub import snapshot_download

path = "tri282/dostoevskyGPT_merged"
snapshot_download(repo_id=path, local_dir="./your_directory_here")
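once downloaded, the local directory can be passed to from_pretrained in place of the repo id (the path below is the placeholder from the snippet above):

model = AutoModelForCausalLM.from_pretrained("./your_directory_here")
tokenizer = AutoTokenizer.from_pretrained("./your_directory_here")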
Training Data
currently proprietary
Training Hyperparameters
- Training regime: fp16 mixed precision
- Epochs: 3
- Learning Rate: 2e-4
- Batch Size: 16
- Rank, LoRA Alpha, LoRA Dropout: 64, 96, 0.1
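for reference, a minimal sketch of how these settings might map onto a peft LoraConfig and transformers TrainingArguments; the target modules, output directory, and exact base checkpoint id are assumptions, as the card only lists the numbers above:

from transformers import AutoModelForCausalLM, TrainingArguments
from peft import LoraConfig, get_peft_model

# base checkpoint (assumed to be the Hub release of mistral-7b-v0.3)
base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.3")

# adapter settings from the list above; target_modules is an assumption,
# the card does not state which projections were adapted
lora_config = LoraConfig(
    r=64,
    lora_alpha=96,
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)

# training regime from the list: fp16 mixed precision, 3 epochs, lr 2e-4, batch size 16
training_args = TrainingArguments(
    output_dir="./mitya-lora",  # placeholder
    per_device_train_batch_size=16,
    num_train_epochs=3,
    learning_rate=2e-4,
    fp16=True,
)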
Speeds, Sizes, Times
this model was trained for 6 hours on a Tesla L4 GPU. it is roughly 27GB in float32 precision
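since the merged weights are roughly 27GB in float32, loading them in half precision roughly halves the memory footprint; a minimal sketch, assuming a GPU and the accelerate package are available:

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "tri282/dostoevskyGPT_merged",
    torch_dtype=torch.float16,
    device_map="auto",  # spreads layers across available devices; needs accelerate
)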
Evaluation
Summary
i hold firm awareness of my model's current limitations; that being said, i had a great time testing it out. i ask nothing but your great expectations for future optimizations and versions
Citations
special thanks to Dostoevsky himself, cordially