A series of llama and mistral models of various sizes tailored for Document Validation and Management. It exclusively generates valid JSON output with standardized filenames for both academic and non-academic texts.

Available sizes: 1b, 3b, 7b, 8b, 12b, 14b, 22b, 24b, 32b, 35b

Document Validation & Management Assistant

This configuration file defines a Document Validation and Management Assistant built on the LLaVA framework for the Ollama @Web environment. It is designed to process documents and generate standardized filenames according to strict formatting rules. The configuration is flexible, supporting multiple model sizes (1b, 3b, 7b, 8b) depending on your deployment needs.

Overview

The assistant is purpose-built to:

  • Output Valid JSON: Every response is a well-formed JSON object.
  • Standardize Filenames:

    • Academic/Research Papers:

      • Filename format:

        `{lastname}_{year}_{seven_word_summary}.pdf`

      • Example:

        `smith_2023_neural_networks_improve_medical_image_classification_accuracy.pdf`

      • Components:

        • `lastname`: Lowercase last name of the first author (without initials)
        • `year`: Publication year (YYYY)
        • `seven_word_summary`: Exactly seven keywords (from the title) separated by underscores
    • Non-Academic Documents:

      • Filename format:

        `{specific_type}_{date}_{seven_word_summary}.pdf`

      • Examples:

        • `assessment_20240219_patient_speech_and_language_evaluation_initial_session.pdf`
        • `guideline_20240219_aphasia_treatment_protocol_for_clinical_practice_implementation.pdf`
      • Components:

        • `specific_type`: One of (assessment, guideline, case_study, progress_note, protocol, evaluation, manual, template, report, training)
        • `date`: In YYYYMMDD format
        • `seven_word_summary`: Exactly seven descriptive words separated by underscores
  • Enforce Strict Filename Rules:

    • All output is in lowercase with underscores as the only separator.
    • Filenames must never exceed 100 characters.
    • The summary must always consist of exactly seven words.
    • Every JSON response includes an error field to capture any processing issues.
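
The naming rules above can also be applied outside the model, for instance to double-check what it returns. The following is a minimal Python sketch, not part of the modelfile. The function names are hypothetical, and two points are assumptions the rules leave open: the 100-character limit is taken to include the `.pdf` extension, and accented letters are reduced to plain ASCII.

```python
import re
import unicodedata
from datetime import datetime

# Non-academic document types listed in this README.
SPECIFIC_TYPES = {
    "assessment", "guideline", "case_study", "progress_note", "protocol",
    "evaluation", "manual", "template", "report", "training",
}
MAX_FILENAME_LENGTH = 100  # assumption: the limit includes the ".pdf" extension
SUMMARY_WORD_COUNT = 7


def slug_words(text: str) -> list[str]:
    """Lowercase the text, strip accents, and split it into alphanumeric words."""
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    return re.findall(r"[a-z0-9]+", ascii_text.lower())


def finish(parts: list[str], summary: str) -> str:
    """Append the seven-word summary and extension, then enforce the length cap."""
    words = slug_words(summary)
    if len(words) != SUMMARY_WORD_COUNT:
        raise ValueError(f"summary must have exactly {SUMMARY_WORD_COUNT} words, got {len(words)}")
    filename = "_".join(parts + words) + ".pdf"
    if len(filename) > MAX_FILENAME_LENGTH:
        raise ValueError(f"filename is {len(filename)} characters, limit is {MAX_FILENAME_LENGTH}")
    return filename


def academic_filename(lastname: str, year: int, summary: str) -> str:
    """Build {lastname}_{year}_{seven_word_summary}.pdf."""
    name = "".join(slug_words(lastname))  # "O'Brien" becomes "obrien"
    if not name:
        raise ValueError("lastname is empty")
    if not 1000 <= int(year) <= 9999:
        raise ValueError("year must have four digits")
    return finish([name, str(int(year))], summary)


def non_academic_filename(specific_type: str, date: str, summary: str) -> str:
    """Build {specific_type}_{date}_{seven_word_summary}.pdf."""
    if specific_type not in SPECIFIC_TYPES:
        raise ValueError(f"unknown specific_type: {specific_type!r}")
    if not re.fullmatch(r"\d{8}", date):
        raise ValueError("date must be in YYYYMMDD format")
    datetime.strptime(date, "%Y%m%d")  # raises ValueError for impossible dates
    return finish([specific_type, date], summary)
```

Both builders raise `ValueError` rather than silently truncating or padding the summary, so a rule violation is visible to the caller.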

File Structure

The modelfile is divided into several key sections:

  1. FROM Clause:

    • Specifies the base model to use. This file is configured to work with LLaVA; adjust the model size (e.g., 1b, 3b, 7b, 8b) as needed.
    • Example:
      
      FROM llava:7b
      
  2. PARAMETER Definitions:

    • These lines define runtime settings such as temperature, context length, and stopping tokens. For example:
      
      PARAMETER temperature 0.1
      PARAMETER top_p 0.1
      PARAMETER num_ctx 2048
      PARAMETER stop "</s>"
      PARAMETER stop "}"
      PARAMETER stop "\n"
      PARAMETER repeat_penalty 1.2
      PARAMETER num_predict 512
      
  3. SYSTEM Prompt:

    • The system prompt details the assistant’s role and lays out the filename formatting rules along with the expected JSON response structure. It enforces:
      • Valid JSON Output
      • Strict adherence to naming conventions
      • Inclusion of all required metadata and error reporting
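
To show how the three sections fit together, here is a minimal Python sketch that renders a modelfile as text. The helper name `render_modelfile` and the one-line system prompt are illustrative assumptions, not the actual prompt shipped with this model. The parameter values are the scalar ones listed above.

```python
def render_modelfile(base_model: str, parameters: dict[str, object], system_prompt: str) -> str:
    """Render FROM, PARAMETER, and SYSTEM sections in that order."""
    lines = [f"FROM {base_model}"]
    for name, value in parameters.items():
        lines.append(f"PARAMETER {name} {value}")
    # Triple quotes allow a multi-line prompt; the prompt itself must not contain '"""'.
    lines.append(f'SYSTEM """{system_prompt}"""')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = render_modelfile(
        "llava:7b",
        {
            "temperature": 0.1,
            "top_p": 0.1,
            "num_ctx": 2048,
            "repeat_penalty": 1.2,
            "num_predict": 512,
        },
        # Placeholder prompt for illustration only.
        "You are a document validation assistant. Respond with a single JSON object.",
    )
    print(text)
```

The rendered text can be saved as `Modelfile` and registered with `ollama create <name> -f Modelfile`, where `<name>` is whatever local model name you choose.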

Usage Instructions

Environment Setup

  • Ollama @Web: Ensure that your Ollama environment is properly configured.
  • Model Size: Modify the FROM clause to switch between LLaVA model sizes (e.g., llava:1b, llava:3b, etc.) based on available resources.

Sending Documents

  • Academic Documents: Provide details such as title, authors, and publication year.
  • Non-Academic Documents: Provide the document type and date (YYYYMMDD) along with a descriptive title.
  • The assistant processes the document and returns a JSON response that includes a standardized filename and additional metadata.
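
As an illustration of sending a document, the sketch below posts a prompt to a local Ollama server through its `/api/generate` endpoint. Several things here are assumptions: the model name `doc-validator` is hypothetical, the server is taken to run on the default local port, and the plain-text prompt layout is one possible way to pass the details listed above, since this README does not prescribe an input format.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local endpoint (assumed)
MODEL_NAME = "doc-validator"  # hypothetical name given to the model at creation time


def describe_academic(title: str, authors: list[str], year: int) -> str:
    """Format the details of an academic document as a prompt (layout is an assumption)."""
    return f"Document type: academic\nTitle: {title}\nAuthors: {', '.join(authors)}\nYear: {year}"


def describe_non_academic(specific_type: str, date: str, title: str) -> str:
    """Format the details of a non-academic document as a prompt (layout is an assumption)."""
    return f"Document type: {specific_type}\nDate: {date}\nTitle: {title}"


def build_payload(prompt: str, model: str = MODEL_NAME) -> dict:
    """Request body: a single non-streamed reply, constrained to JSON."""
    return {"model": model, "prompt": prompt, "stream": False, "format": "json"}


def send(prompt: str, url: str = OLLAMA_URL) -> dict:
    """POST the prompt and parse the model's reply text as JSON."""
    body = json.dumps(build_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    return json.loads(reply["response"])  # the generated text is in the "response" field


if __name__ == "__main__":
    prompt = describe_academic(
        "Neural Networks Improve Medical Image Classification Accuracy",
        ["Jane Smith"],
        2023,
    )
    print(send(prompt))
```

Setting `"stream": False` returns the whole reply in one object, which keeps the JSON parsing to a single step.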

Expected JSON Response Format

Every response will conform to the following structure:

{
  "document_type": "academic | specific_type",
  "metadata": {
    "title": "string",
    "authors": ["string"],  // For academic papers only
    "year": "YYYY",         // For academic papers only
    "date": "YYYYMMDD",     // For non-academic documents
    "specific_type": "string"  // For non-academic documents
  },
  "filename": "string",       // Generated filename following strict rules
  "confidence": float,        // Value between 0.0 and 1.0
  "error": "string|null"      // Contains error message if any issue occurred
}
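
A caller may want to verify that a reply really has this shape before trusting the filename. The following Python sketch is one possible check, written for this README rather than taken from the modelfile. The function name `check_response` is hypothetical, and the filename is only inspected when `error` is null, on the assumption that a failed run may not produce a usable filename.

```python
import json
import re

REQUIRED_KEYS = {"document_type", "metadata", "filename", "confidence", "error"}
# Lowercase words joined by single underscores, ending in ".pdf".
FILENAME_PATTERN = re.compile(r"[a-z0-9]+(?:_[a-z0-9]+)*\.pdf")


def check_response(raw: str) -> list[str]:
    """Return a list of problems found in a reply; an empty list means it passed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level value is not a JSON object"]

    problems = []
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        problems.append("missing keys: " + ", ".join(sorted(missing)))

    if "metadata" in data and not isinstance(data["metadata"], dict):
        problems.append("metadata is not an object")

    if "confidence" in data:
        value = data["confidence"]
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not 0.0 <= value <= 1.0:
            problems.append("confidence is not a number between 0.0 and 1.0")

    error = data.get("error")
    if error is not None and not isinstance(error, str):
        problems.append("error is neither a string nor null")

    # Assumption: the filename rules only apply when no error was reported.
    if error is None and "filename" in data:
        filename = data["filename"]
        if not isinstance(filename, str) or not FILENAME_PATTERN.fullmatch(filename):
            problems.append("filename is not lowercase words joined by underscores plus .pdf")
        elif len(filename) > 100:
            problems.append("filename exceeds 100 characters")

    return problems
```

Returning a list of problems instead of raising on the first one makes it easier to log everything that is wrong with a single reply.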