Deterministic multimodal layer that unifies text, images, PDFs, UI and code into one reproducible JSON schema for downstream evaluation in the S.L.A.V.K.O.™ stack.

🌐 SlavkoFusion 1.0 – Multimodal Integration Layer

Extract → Normalise → Unify all modalities into a single JSON object.

📜 Philosophy

Multimodal AI must be deterministic and reproducible. Fusion normalises text, images, PDFs, UI mock-ups, and code snippets into a canonical feature set that any downstream evaluator can consume.
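
One way to check the reproducibility claim in practice is to fingerprint the extractor output and compare repeated runs. The sketch below is an illustration only: `canonical_json`, `fingerprint`, and `is_reproducible` are hypothetical helper names, not part of the SlavkoFusion API, and the sketch assumes the output is plain JSON-serialisable data.

```python
import hashlib
import json
from typing import Any, Callable, Mapping


def canonical_json(obj: Any) -> str:
    """Serialise with sorted keys and fixed separators so equal objects give equal strings."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=True)


def fingerprint(obj: Any) -> str:
    """Return the SHA-256 hex digest of the canonical JSON form."""
    return hashlib.sha256(canonical_json(obj).encode("utf-8")).hexdigest()


def is_reproducible(
    extract: Callable[[Mapping[str, Any]], Any],
    payload: Mapping[str, Any],
    runs: int = 3,
) -> bool:
    """Call `extract` several times and report whether every result has the same fingerprint."""
    digests = {fingerprint(extract(payload)) for _ in range(runs)}
    return len(digests) == 1
```

With the library installed, `is_reproducible(Fusion().extract, payload)` would be the natural call, assuming `extract` returns a JSON-serialisable dict as the examples in this README suggest.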

Core Principles

Unified Schema: All modalities produce the same JSON structure
Deterministic Extraction: Same input always yields same output
Modality Detection: Automatic detection of input type
Plugin Architecture: Extensible extractors for new modalities
Audit Checkpoint: Second checkpoint in the audit chain
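
The modality detection principle can be sketched as a lookup over payload keys. The key names (`pdf_base64`, `image_base64`, `code`, `text`) are taken from the usage examples in this README; the function name `detect_modality` and the precedence order are assumptions made for illustration, and the `ui` modality is left out because the README does not show how it is signalled.

```python
from typing import Any, Mapping

# Payload keys come from the README examples. The precedence order is an
# assumption for this sketch, not documented SlavkoFusion behaviour.
_KEY_TO_MODALITY = (
    ("pdf_base64", "pdf"),
    ("image_base64", "image"),
    ("code", "code"),
    ("text", "text"),
)


def detect_modality(payload: Mapping[str, Any]) -> str:
    """Return the modality of the first key that is present and non-empty."""
    for key, modality in _KEY_TO_MODALITY:
        if payload.get(key):
            return modality
    raise ValueError("payload contains no recognised modality key")
```

Checking `text` last means an accompanying instruction such as "Review this dashboard" does not mask the image or PDF it refers to.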

✨ Core Features

Feature	Description
Automatic modality detection	Classifies each input as text, image, pdf, ui, or code
Feature extraction	Extracts objects, layout, OCR text, and syntax trees
Deterministic output	The same input always produces the same JSON shape
Audit checkpoint #2	Adds `fusion` to the audit chain
Plug-in extractor framework	Add custom parsers without touching core code
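
A plug-in extractor framework of this kind is often built around a registry. The sketch below shows the general pattern, not SlavkoFusion's actual plug-in API: `register_extractor`, `run_extractor`, the `csv` modality, and the top-level `modality` key are all hypothetical. Only the nested `features` key mirrors the output shown in the usage examples.

```python
from typing import Any, Callable, Dict, Mapping

Extractor = Callable[[Mapping[str, Any]], Dict[str, Any]]

# Hypothetical registry mapping a modality name to its extractor function.
_EXTRACTORS: Dict[str, Extractor] = {}


def register_extractor(modality: str) -> Callable[[Extractor], Extractor]:
    """Decorator that registers an extractor under a modality name."""
    def decorator(func: Extractor) -> Extractor:
        if modality in _EXTRACTORS:
            raise ValueError(f"extractor already registered for {modality!r}")
        _EXTRACTORS[modality] = func
        return func
    return decorator


def run_extractor(modality: str, payload: Mapping[str, Any]) -> Dict[str, Any]:
    """Dispatch the payload to the extractor registered for the modality."""
    return _EXTRACTORS[modality](payload)


@register_extractor("csv")  # "csv" is an invented modality for this example
def extract_csv(payload: Mapping[str, Any]) -> Dict[str, Any]:
    rows = [line.split(",") for line in payload["csv"].splitlines() if line]
    columns = len(rows[0]) if rows else 0
    return {"modality": "csv", "features": {"rows": len(rows), "columns": columns}}
```

New modalities are added by decorating a function, so the dispatch code never changes.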

📦 Installation

git clone https://github.com/FormatDisc/slavko-fusion
cd slavko-fusion
pip install -e .

Dependencies

python>=3.11
pillow>=10.0.0
pytesseract>=0.3.10
pdfplumber>=0.10.0
opencv-python>=4.8.0
transformers>=4.35.0
torch>=2.0.0

🚀 Quick Start

from slavko_fusion import Fusion
import json

fusion = Fusion()

payload = {
    "image_base64": "<BASE64-PNG-IMAGE-DATA>",
    "text": "Review this dashboard"
}

features = fusion.extract(payload)
print(json.dumps(features, indent=2))
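
The placeholder in `image_base64` has to be filled with real data. A minimal sketch of building such a payload from a file on disk follows. It assumes the field expects plain base64 text without a `data:` URI prefix, which this README does not confirm. `encode_bytes` and `build_image_payload` are hypothetical helper names.

```python
import base64
from pathlib import Path
from typing import Dict


def encode_bytes(data: bytes) -> str:
    """Base64-encode raw bytes and return the result as an ASCII string."""
    return base64.b64encode(data).decode("ascii")


def build_image_payload(image_path: str, text: str) -> Dict[str, str]:
    """Read an image file and wrap it in the payload shape used in the Quick Start."""
    data = Path(image_path).read_bytes()
    return {"image_base64": encode_bytes(data), "text": text}
```

The resulting dict can be passed to `fusion.extract(...)` in place of the hand-written payload.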

📚 Usage Examples

Text Extraction

from slavko_fusion import Fusion

fusion = Fusion()

text_payload = {
    "text": "This is a sample text document for analysis."
}

features = fusion.extract(text_payload)
print(features)

Image Analysis

from slavko_fusion import Fusion

fusion = Fusion()

image_payload = {
    "image_base64": "<BASE64-IMAGE>",
    "text": "Analyze this UI screenshot"
}

features = fusion.extract(image_payload)

for obj in features["features"]["objects"]:
    print(f"Found {obj['label']} at {obj['bbox']}")

print(f"Aspect ratio: {features['features']['layout']['aspectRatio']}")

PDF Processing

from slavko_fusion import Fusion

fusion = Fusion()

pdf_payload = {
    "pdf_base64": "",  # base64-encoded PDF data goes here
    "text": "Extract content from this PDF"
}

features = fusion.extract(pdf_payload)

print(f"Text: {features['features']['text']}")

Code Analysis

from slavko_fusion import Fusion

fusion = Fusion()

code_payload = {
    "code": """
def calculate_risk(data):
    if data['risk_factor'] > 0.8:
        return 'HIGH'
    return 'LOW'
""",
    "language": "python",
    "text": "Analyze this code"
}

features = fusion.extract(code_payload)

print(f"Functions: {features['features']['functions']}")
print(f"Complexity: {features['features']['complexity']}")
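
To give a feel for what `functions` and `complexity` could contain, here is a rough sketch using Python's standard `ast` module. It is an assumption about the kind of analysis involved, not SlavkoFusion's implementation: `analyse_python` is a hypothetical name, and the complexity figure is a simple branch count plus one, which only approximates cyclomatic complexity.

```python
import ast
from typing import Any, Dict

# Node types counted as decision points in this simplified sketch.
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.BoolOp, ast.IfExp)


def analyse_python(source: str) -> Dict[str, Any]:
    """Return sorted function names and a branch-count complexity estimate."""
    tree = ast.parse(source)
    functions = sorted(
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    )
    branches = sum(isinstance(node, _BRANCH_NODES) for node in ast.walk(tree))
    return {"functions": functions, "complexity": 1 + branches}


SAMPLE = """
def calculate_risk(data):
    if data['risk_factor'] > 0.8:
        return 'HIGH'
    return 'LOW'
"""

print(analyse_python(SAMPLE))
# prints {'functions': ['calculate_risk'], 'complexity': 2}
```

Sorting the function names keeps the output independent of traversal order, in line with the deterministic-output goal.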

📊 Performance

Modality	Avg. Latency	Memory	Model / Backend
Text	5–10 ms	< 100 MB	N/A
Image	300–700 ms	4–5 GB	phi3-vision
PDF	500–1000 ms	2–3 GB	pdfplumber + OCR
UI	400–800 ms	4–5 GB	phi3-vision
Code	10–20 ms	< 200 MB	AST parser
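
Latency figures like these depend on hardware, so it can be useful to measure them locally. The sketch below is a generic timing harness, not a benchmark shipped with SlavkoFusion. `measure_latency_ms` is a hypothetical name, and it accepts any callable so that it can be tried without the library installed.

```python
import statistics
import time
from typing import Any, Callable, Dict, Mapping


def measure_latency_ms(
    extract: Callable[[Mapping[str, Any]], Any],
    payload: Mapping[str, Any],
    runs: int = 10,
    warmup: int = 1,
) -> Dict[str, float]:
    """Time repeated calls and return min, median, and max latency in milliseconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    for _ in range(warmup):
        extract(payload)  # warm-up calls are not timed (e.g. lazy model loading)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        extract(payload)
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }
```

Passing `Fusion().extract` and one payload per modality would give numbers comparable to the table, assuming `extract` is synchronous as in the examples.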

📜 License

BSD-3-Clause – see LICENSE for details.

📞 Contact — Formatdisc

Company: Formatdisc – Computer Programming & Advanced Software Systems
Founder & System Architect: Mladen Gertner
Website: https://formatdisc.hr
Email: mladen@formatdisc.hr
Phone: +385 91 542 1014
Location: Zagreb, Croatia
OIB: 18915075854

GitHub: https://github.com/FormatDisc
LinkedIn: https://linkedin.com/company/formatdisc

Built with S.L.A.V.K.O.™ – Unified. Deterministic. Extensible.