Advancing Open-source Large Language Models in Medical Domain

Imported from https://hf-mirror.com/aaditya/Llama3-OpenBioLLM-70B
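
The upstream weights live on that mirror; below is a minimal sketch of fetching them with the `huggingface_hub` package, assuming the `HF_ENDPOINT` mirror override documented by hf-mirror.com (the local directory name is a placeholder, and conversion to an Ollama model is not shown):

```python
# Sketch: pull the upstream OpenBioLLM-70B checkpoint from hf-mirror.com.
# Requires `pip install huggingface_hub`.
import os

# Point huggingface_hub at the mirror; must be set before the library is imported.
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"

from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="aaditya/Llama3-OpenBioLLM-70B",
    local_dir="Llama3-OpenBioLLM-70B",  # placeholder target directory
)
print("Checkpoint downloaded to:", local_dir)
```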

For multi-language support, visit https://ollama.com/taozhiyuai/openbiollm-llama-3-chinese

Introduction

A top performer in the biomedical domain, built on Llama 3.

Introducing OpenBioLLM-70B: A State-of-the-Art Open Source Biomedical Large Language Model

OpenBioLLM-70B is an advanced open source language model designed specifically for the biomedical domain. Developed by Saama AI Labs, this model leverages cutting-edge techniques to achieve state-of-the-art performance on a wide range of biomedical tasks.

🏥 Biomedical Specialization: OpenBioLLM-70B is tailored for the unique language and knowledge requirements of the medical and life sciences fields. It was fine-tuned on a vast corpus of high-quality biomedical data, enabling it to understand and generate text with domain-specific accuracy and fluency.

🎓 Superior Performance: With 70 billion parameters, OpenBioLLM-70B outperforms other open source biomedical language models of similar scale. It has also demonstrated better results compared to larger proprietary & open-source models like GPT-4, Gemini, Meditron-70B, Med-PaLM-1 & Med-PaLM-2 on biomedical benchmarks.

🧠 Advanced Training Techniques: OpenBioLLM-70B builds upon the powerful foundation of the Meta-Llama-3-70B-Instruct model. It incorporates a DPO dataset and fine-tuning recipe together with a custom, diverse medical instruction dataset; a minimal sketch of the DPO objective is shown below.
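
The exact Saama AI Labs training recipe is not reproduced in this README. As a rough illustration of the Direct Preference Optimization (DPO) objective referenced above, the sketch below computes the DPO loss from per-sequence log-probabilities; the tensor shapes and the `beta` value are illustrative assumptions, not the actual configuration.

```python
# Minimal sketch of the DPO loss on per-sequence log-probabilities.
# Inputs are 1-D tensors of summed token log-probs for the preferred ("chosen")
# and dispreferred ("rejected") completions under the policy being trained and
# under a frozen reference model.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Implicit rewards are beta-scaled log-ratios of policy vs. reference.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Logistic loss that pushes the chosen reward above the rejected one.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy usage with random log-probs for a batch of 4 preference pairs.
if __name__ == "__main__":
    fake_logps = lambda: -10 * torch.rand(4)
    loss = dpo_loss(fake_logps(), fake_logps(), fake_logps(), fake_logps())
    print(loss.item())
```

In practice this objective is usually trained with existing tooling such as TRL's `DPOTrainer`, fed with pairs of preferred and rejected completions.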

Screen samples

The 70B model generates longer, higher-quality output.

(Screenshot: sample 70B output)

The 8B model generates shorter output and its Chinese quality is unstable; heavier quantization adds more perplexity loss, so the 70B model is recommended. (A usage sketch follows the quant table below.)

| Model                       | Quant  | Size  | Bits | Perplexity increase |
|-----------------------------|--------|-------|------|---------------------|
| llama3-openbiollm-8b:Q4_0   | Q4_0   | 4.7GB | 4    | +0.2166 ppl         |
| llama3-openbiollm-8b:Q4_K_M | Q4_K_M | 4.9GB | 4    | +0.0532 ppl         |
| llama3-openbiollm-8b:Q5_K_M | Q5_K_M | 5.7GB | 5    | +0.0122 ppl         |
| llama3-openbiollm-8b:Q6_K   | Q6_K   | 6.6GB | 6    | +0.0008 ppl         |
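
Once one of these quants has been pulled with `ollama pull`, it can be queried through Ollama's local REST API. A minimal sketch follows; the model tag is taken from the table above and should be replaced with the exact tag published on the Ollama model page.

```python
# Sketch: query a locally pulled quant via Ollama's /api/generate endpoint.
# Assumes the Ollama server is running on its default port (11434).
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL_TAG = "llama3-openbiollm-8b:Q6_K"  # assumed tag; use the one you actually pulled

payload = json.dumps({
    "model": MODEL_TAG,
    "prompt": "List three common causes of community-acquired pneumonia.",
    "stream": False,  # return a single JSON object instead of a token stream
}).encode("utf-8")

request = urllib.request.Request(
    OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
)
with urllib.request.urlopen(request) as response:
    print(json.loads(response.read())["response"])
```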

(Screenshot: sample 8B output)

Benchmark (medical model evaluation)

(Benchmark result charts)

WeChat ID: TAOZHIYUAI