A series of Llama and Mistral models of various sizes tailored for document validation and management. The models output only valid JSON, including a standardized filename for both academic and non-academic texts.
35 Pulls Updated 8 weeks ago
405f83568355 · 7.1GB
Document Validation & Management Assistant
This configuration file defines a Document Validation and Management Assistant built on the LLaVA framework for the Ollama @Web environment. It is designed to process documents and generate standardized filenames according to strict formatting rules. The configuration is flexible, supporting multiple model sizes (1b, 3b, 7b, 8b) depending on your deployment needs.
Overview
The assistant is purpose-built to:
- Output Valid JSON: Every response is a well-formed JSON object.
- Standardize Filenames:
  - Academic/Research Papers:
    - Filename format: `{lastname}_{year}_{seven_word_summary}.pdf`
    - **Example:** `smith_2023_neural_networks_improve_medical_image_classification_accuracy.pdf`
    - **Components:**
      - `lastname`: Lowercase first author’s last name (without initials)
      - `year`: Publication year (YYYY)
      - `seven_word_summary`: Exactly seven keywords (from the title) separated by underscores
  - Non-Academic Documents:
    - Filename format: `{specific_type}_{date}_{seven_word_summary}.pdf`
    - **Examples:**
      - `assessment_20240219_patient_speech_and_language_evaluation_initial_session.pdf`
      - `guideline_20240219_aphasia_treatment_protocol_for_clinical_practice_implementation.pdf`
    - **Components:**
      - `specific_type`: One of `assessment`, `guideline`, `case_study`, `progress_note`, `protocol`, `evaluation`, `manual`, `template`, `report`, `training`
      - `date`: In YYYYMMDD format
      - `seven_word_summary`: Exactly seven descriptive words separated by underscores
- Enforce Strict Filename Rules:
  - All output is in lowercase with underscores as the only separator.
  - Filenames must never exceed 100 characters.
  - The summary must always consist of exactly seven words.
  - Every JSON response includes an error field to capture any processing issues.
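The filename rules above are mechanical enough to check in code before a file is renamed. The following is a minimal sketch, not part of the modelfile itself: the function names `validate_filename` and `build_academic_filename` are hypothetical, and it assumes that last names contain only the letters a-z and that summary words contain only lowercase letters and digits.

```python
import re

# Allowed non-academic prefixes, copied from the component list in this README.
SPECIFIC_TYPES = (
    "assessment", "guideline", "case_study", "progress_note", "protocol",
    "evaluation", "manual", "template", "report", "training",
)
MAX_LENGTH = 100

# Exactly seven words joined by underscores (assumption: words are a-z and 0-9 only).
SUMMARY = r"[a-z0-9]+(?:_[a-z0-9]+){6}"

# {lastname}_{year}_{seven_word_summary}.pdf (assumption: lastname is a-z only).
ACADEMIC_RE = re.compile(r"[a-z]+_\d{4}_" + SUMMARY + r"\.pdf")
# {specific_type}_{date}_{seven_word_summary}.pdf
NON_ACADEMIC_RE = re.compile(
    "(?:" + "|".join(SPECIFIC_TYPES) + r")_\d{8}_" + SUMMARY + r"\.pdf"
)


def validate_filename(name: str) -> list[str]:
    """Return a list of rule violations; an empty list means the name passes."""
    problems = []
    if len(name) > MAX_LENGTH:
        problems.append("longer than 100 characters")
    if name != name.lower():
        problems.append("contains uppercase characters")
    if not (ACADEMIC_RE.fullmatch(name) or NON_ACADEMIC_RE.fullmatch(name)):
        problems.append("does not match either filename format")
    return problems


def build_academic_filename(lastname: str, year: int, keywords: list[str]) -> str:
    """Assemble an academic filename and reject it if any rule is broken."""
    if len(keywords) != 7:
        raise ValueError("summary must contain exactly seven words")
    parts = [lastname.lower(), str(year), *(word.lower() for word in keywords)]
    name = "_".join(parts) + ".pdf"
    problems = validate_filename(name)
    if problems:
        raise ValueError("; ".join(problems))
    return name


if __name__ == "__main__":
    print(build_academic_filename(
        "Smith", 2023,
        ["neural", "networks", "improve", "medical",
         "image", "classification", "accuracy"],
    ))
```

Running a check like this on the `filename` field of each reply gives a second line of defense if the model drifts from the rules.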
File Structure
The modelfile is divided into several key sections:
FROM Clause:
- Specifies the base model to use. This file is configured to work with LLaVA; adjust the model size (e.g., 1b, 3b, 7b, 8b) as needed.
- Example: `FROM llava:7b`
PARAMETER Definitions:
- These lines define runtime settings such as temperature, context length, and stopping tokens. For example:
PARAMETER temperature 0.1
PARAMETER top_p 0.1
PARAMETER num_ctx 2048
PARAMETER stop "</s>"
PARAMETER stop "}"
PARAMETER stop "\n"
PARAMETER repeat_penalty 1.2
PARAMETER num_predict 512
SYSTEM Prompt:
- The system prompt details the assistant’s role and lays out the filename formatting rules along with the expected JSON response structure. It enforces:
- Valid JSON Output
- Strict adherence to naming conventions
- Inclusion of all required metadata and error reporting
Usage Instructions
Environment Setup
- Ollama @Web: Ensure that your Ollama environment is properly configured.
- Model Size: Modify the FROM clause to switch between LLaVA model sizes (e.g., `llava:1b`, `llava:3b`, etc.) based on available resources.
Sending Documents
- Academic Documents: Provide details such as title, authors, and publication year.
- Non-Academic Documents: Provide the document type and date (YYYYMMDD) along with a descriptive title.
- The assistant processes the document and returns a JSON response that includes a standardized filename and additional metadata.
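As an illustration of sending a document, the sketch below posts the details to a locally running Ollama server through its `/api/generate` endpoint. Several things here are assumptions rather than part of this README: the endpoint URL is Ollama's default local address, `doc-validator` is a hypothetical model name (use whatever name you created the model under), and the plain-text prompt layout is only one possible way to pass the details.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumption: default local endpoint
MODEL_NAME = "doc-validator"  # hypothetical: replace with your model's name


def build_prompt(document: dict) -> str:
    """Describe one document as plain text lines (layout is an assumption)."""
    if "authors" in document:  # treated as an academic paper
        lines = [
            "Document kind: academic",
            f"Title: {document['title']}",
            f"Authors: {', '.join(document['authors'])}",
            f"Year: {document['year']}",
        ]
    else:
        lines = [
            "Document kind: non-academic",
            f"Type: {document['specific_type']}",
            f"Date: {document['date']}",
            f"Title: {document['title']}",
        ]
    return "\n".join(lines)


def build_request(document: dict) -> urllib.request.Request:
    """Build the POST request without sending it."""
    payload = {
        "model": MODEL_NAME,
        "prompt": build_prompt(document),
        "stream": False,   # ask for one complete reply instead of chunks
        "format": "json",  # ask Ollama to constrain the reply to JSON
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_document(document: dict) -> dict:
    """Send the request and parse the assistant's JSON reply."""
    with urllib.request.urlopen(build_request(document)) as resp:
        body = json.load(resp)
    # The generated text is returned in the "response" field.
    return json.loads(body["response"])
```

Calling `send_document({"title": "...", "specific_type": "report", "date": "20240219"})` requires a running server; `build_request` can be inspected on its own without one.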
Expected JSON Response Format
Every response will conform to the following structure:
{
  "document_type": "academic | specific_type",
  "metadata": {
    "title": "string",
    "authors": ["string"],      // For academic papers only
    "year": "YYYY",             // For academic papers only
    "date": "YYYYMMDD",         // For non-academic documents
    "specific_type": "string"   // For non-academic documents
  },
  "filename": "string",         // Generated filename following strict rules
  "confidence": float,          // Value between 0.0 and 1.0
  "error": "string|null"        // Contains error message if any issue occurred
}
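A client may want to confirm that a reply really follows this structure before trusting it. The sketch below is one way to do that, under stated assumptions: `check_response` is a hypothetical helper, and it reads `document_type` as either the literal `academic` or one of the non-academic types, which is an interpretation of the schema above.

```python
import json
import re

REQUIRED_KEYS = {"document_type", "metadata", "filename", "confidence", "error"}


def check_response(raw: str) -> list[str]:
    """Return a list of schema problems; an empty list means the reply conforms."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not a JSON object"]

    missing = sorted(REQUIRED_KEYS - set(data))
    if missing:
        return [f"missing key: {key}" for key in missing]

    problems = []
    meta = data["metadata"] if isinstance(data["metadata"], dict) else {}
    if not isinstance(meta.get("title"), str):
        problems.append("metadata.title must be a string")

    if data["document_type"] == "academic":
        # Academic papers carry authors and a four-digit year.
        if not isinstance(meta.get("authors"), list):
            problems.append("metadata.authors must be a list")
        if not re.fullmatch(r"\d{4}", str(meta.get("year", ""))):
            problems.append("metadata.year must be YYYY")
    else:
        # Everything else carries an eight-digit date and a specific type.
        if not re.fullmatch(r"\d{8}", str(meta.get("date", ""))):
            problems.append("metadata.date must be YYYYMMDD")
        if not isinstance(meta.get("specific_type"), str):
            problems.append("metadata.specific_type must be a string")

    if not isinstance(data["filename"], str):
        problems.append("filename must be a string")
    conf = data["confidence"]
    if isinstance(conf, bool) or not isinstance(conf, (int, float)) or not 0.0 <= conf <= 1.0:
        problems.append("confidence must be a number between 0.0 and 1.0")
    if data["error"] is not None and not isinstance(data["error"], str):
        problems.append("error must be a string or null")
    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to log everything that is wrong with a single reply.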