Updated 9 hours ago
27f827fe8cd5 · 4.7GB
Finance + Compliance + Cloud Control Mapping Copilot (Local → Ollama Publish)
Turn a requirement (e.g., a “NIST 800-53 style control”) into a concrete, audit-ready plan: an AWS control design, evidence artifacts, automation hooks, and gaps/assumptions.
This repo is a demo accelerator: a tight persona model in Ollama + a minimal RAG wrapper so the model stays grounded in your curated corpus.
- Modelfile: creates an Ollama model (fincomp-control-mapper) that outputs strict JSON.
- mappings/seed_controls.json: a starter mapping dataset (25 controls) you can expand.
- Minimal RAG API (Node.js + SQLite):
  - scripts/ingest.js chunks docs in ./knowledge/, embeds them with Ollama embeddings, and stores the vectors in SQLite.
  - server.js exposes POST /map (retrieve top-K context → ask the model → return JSON).

User Request (control_id + requirement text + workload)
|
v
[Retriever] embed(query) -> topK chunks from SQLite
|
v
Prompt: requirement + workload + CONTEXT(topK chunks)
|
v
Ollama Chat Model (JSON-only schema)
|
v
Response: AWS design + evidence + automation + gaps
Why this works: RAG keeps answers grounded in your sources (policies, AWS references, control catalog excerpts you’re allowed to store).
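The retrieve-then-prompt flow in the diagram can be sketched in plain Node.js. This is a hypothetical illustration, not the code in server.js: it assumes chunks are already loaded from SQLite as { source_path, text, embedding } objects, scores them with cosine similarity (this README does not say which metric is used), and labels context chunks C1, C2, … in the prompt. The names cosineSimilarity, topK, and buildPrompt are made up for this example.

```javascript
// Hypothetical sketch of retrieve -> prompt (not the actual server.js).
// Assumes chunks were loaded from SQLite as { source_path, text, embedding }.

// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // guard against zero vectors
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every chunk against the query embedding and keep the k best.
function topK(queryEmbedding, chunks, k) {
  return chunks
    .map((chunk) => ({ ...chunk, score: cosineSimilarity(queryEmbedding, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Prompt = requirement + workload + CONTEXT(topK chunks), chunks labeled C1, C2, ...
function buildPrompt(request, hits) {
  const context = hits
    .map((hit, i) => `[C${i + 1}] (${hit.source_path})\n${hit.text}`)
    .join("\n\n");
  return [
    `Control: ${request.control_id}`,
    `Requirement: ${request.requirement_text}`,
    `Workload: ${JSON.stringify(request.workload)}`,
    "CONTEXT:",
    context,
  ].join("\n");
}
```

In the real flow the query embedding would come from the Ollama embedding model; here it is just an array of numbers.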
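Prerequisites:
- Ollama installed and running (ollama serve on most systems)
- Node.js and npm
- jq for pretty JSON in the terminal

Pull the chat and embedding models:
ollama pull qwen2.5:7b-instruct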
ollama pull nomic-embed-text:latest
From this repo root:
ollama create fincomp-control-mapper -f ./Modelfile
npm install
cp .env.example .env
Drop .md or .txt files into:
./knowledge/
Recommended starting content:
- Your internal standards (logging, access reviews, change management, incident response runbooks)
- AWS implementation notes you authored
- Short control text excerpts or summaries you are allowed to store
Tip: keep the first corpus small (10–30 pages total) so retrieval stays high-signal.
Ingest the corpus, then start the API:
npm run ingest
npm start
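The chunking step performed during ingest could look roughly like the sketch below. The window size, the overlap, and the chunkText/toRows helpers are assumptions for illustration; this README does not document how scripts/ingest.js actually splits files, and the Ollama embedding call and SQLite insert are omitted.

```javascript
// Hypothetical chunking sketch; sizes and splitting rules are assumptions,
// not what scripts/ingest.js necessarily does.
function chunkText(text, chunkSize = 1000, overlap = 200) {
  if (overlap >= chunkSize) {
    throw new Error("overlap must be smaller than chunkSize");
  }
  const normalized = text.replace(/\r\n/g, "\n").trim();
  const chunks = [];
  // Advance by (chunkSize - overlap) so consecutive chunks share `overlap` chars.
  for (let start = 0; start < normalized.length; start += chunkSize - overlap) {
    chunks.push(normalized.slice(start, start + chunkSize));
    if (start + chunkSize >= normalized.length) break; // this window reached the end
  }
  return chunks;
}

// Turn one knowledge file into rows ready to be embedded and stored.
function toRows(sourcePath, text) {
  return chunkText(text).map((chunk, i) => ({
    source_path: sourcePath,
    chunk_index: i,
    text: chunk,
  }));
}
```

Overlap keeps a sentence that straddles a boundary retrievable from either neighboring chunk, at the cost of storing some text twice.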
curl -s http://localhost:7070/map \
-H "Content-Type: application/json" \
-d '{
"control_id": "AU-2",
"requirement_text": "Identify and log auditable events and retain them for investigations and reporting.",
"workload": {
"cloud": "aws",
"account_model": "multi-account",
"data_sensitivity": "financial",
"regions": ["us-east-1"]
}
}' | jq .
GET /health
Returns basic config:
{ "ok": true, "ollama": "...", "chat_model": "...", "embed_model": "..." }
POST /map
Body:
{
"control_id": "AU-2",
"requirement_text": "…",
"workload": {
"cloud": "aws",
"account_model": "single|multi-account",
"data_sensitivity": "low|medium|high|financial|pii",
"regions": ["us-east-1"]
}
}
Response:
- result: JSON mapping (model output)
- retrieval: which chunks were used (score + preview)
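A minimal, hypothetical check of the POST /map body could reject malformed requests before any embedding or model call. The allowed values are copied from the placeholders in the Body example above; whether server.js validates them this strictly is not stated, and validateMapRequest is an illustrative name.

```javascript
// Hypothetical request check; allowed values mirror the documented placeholders.
const ACCOUNT_MODELS = ["single", "multi-account"];
const SENSITIVITY = ["low", "medium", "high", "financial", "pii"];

// Returns a list of error strings; an empty list means the body looks usable.
function validateMapRequest(body) {
  if (!body || typeof body !== "object") return ["body must be a JSON object"];
  const errors = [];
  if (typeof body.control_id !== "string" || body.control_id.trim() === "") {
    errors.push("control_id is required");
  }
  if (typeof body.requirement_text !== "string" || body.requirement_text.trim() === "") {
    errors.push("requirement_text is required");
  }
  const w = body.workload ?? {};
  if (w.account_model !== undefined && !ACCOUNT_MODELS.includes(w.account_model)) {
    errors.push(`account_model must be one of: ${ACCOUNT_MODELS.join(", ")}`);
  }
  if (w.data_sensitivity !== undefined && !SENSITIVITY.includes(w.data_sensitivity)) {
    errors.push(`data_sensitivity must be one of: ${SENSITIVITY.join(", ")}`);
  }
  if (w.regions !== undefined && !Array.isArray(w.regions)) {
    errors.push("regions must be an array");
  }
  return errors;
}
```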
The model is instructed to output ONLY valid JSON in this shape:
{
"control_id": "AU-2",
"requirement_summary": "…",
"intent_plain_english": "…",
"aws_control_design": {
"services": ["CloudTrail", "CloudWatch Logs", "S3"],
"patterns": ["…"]
},
"evidence_artifacts": ["…"],
"automation_hooks": ["…"],
"gaps_assumptions": ["…"],
"confidence": "high"
}
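Even with a JSON-only instruction, a caller should parse and check the model's reply before trusting it. The sketch below checks the documented fields; the accepted confidence values beyond "high" are an assumption, and parseMapping is a hypothetical helper, not the contents of src/schema.js.

```javascript
// Hypothetical output check; field list taken from the shape above.
const STRING_FIELDS = ["control_id", "requirement_summary", "intent_plain_english"];
const ARRAY_FIELDS = ["evidence_artifacts", "automation_hooks", "gaps_assumptions"];
const CONFIDENCE = ["high", "medium", "low"]; // assumption: only "high" appears above

function parseMapping(raw) {
  let obj;
  try {
    obj = JSON.parse(raw);
  } catch {
    return { ok: false, errors: ["model output is not valid JSON"] };
  }
  if (obj === null || typeof obj !== "object" || Array.isArray(obj)) {
    return { ok: false, errors: ["model output must be a JSON object"] };
  }
  const errors = [];
  for (const key of STRING_FIELDS) {
    if (typeof obj[key] !== "string") errors.push(`${key} must be a string`);
  }
  const design = obj.aws_control_design;
  if (!design || !Array.isArray(design.services) || !Array.isArray(design.patterns)) {
    errors.push("aws_control_design needs services[] and patterns[]");
  }
  for (const key of ARRAY_FIELDS) {
    if (!Array.isArray(obj[key])) errors.push(`${key} must be an array`);
  }
  if (!CONFIDENCE.includes(obj.confidence)) {
    errors.push(`confidence must be one of: ${CONFIDENCE.join(", ")}`);
  }
  return { ok: errors.length === 0, value: obj, errors };
}
```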
Use these to showcase “finance + compliance + cloud” value.
The first version returned the retrieved chunks from the API, but the model output did not embed citations.
The planned upgrade: add citations: [{source, chunk_id, quote}] to the schema and instruct the model to reference retrieval IDs (e.g., [#1]).
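The citation check this upgrade implies can be sketched as follows. It assumes the { ref, source, quote } shape and the rules stated later in this README (ref is a retrieved chunk id such as C1, source matches the chunk's source_path, quote is at most 200 characters copied from the chunk); validateCitations and the retrieved-chunk object shape are hypothetical.

```javascript
// Hypothetical citation check against the chunks that were actually retrieved.
// retrieved: [{ ref: "C1", source_path: "./knowledge/...", text: "..." }, ...]
function validateCitations(citations, retrieved) {
  const byRef = new Map(retrieved.map((chunk) => [chunk.ref, chunk]));
  const errors = [];
  citations.forEach((c, i) => {
    const chunk = byRef.get(c.ref);
    if (!chunk) {
      errors.push(`citation ${i}: unknown ref ${c.ref}`);
      return; // nothing else can be checked without the chunk
    }
    if (c.source !== chunk.source_path) {
      errors.push(`citation ${i}: source does not match ${c.ref}`);
    }
    if (typeof c.quote !== "string" || c.quote.length === 0 || c.quote.length > 200) {
      errors.push(`citation ${i}: quote must be 1-200 chars`);
    } else if (!chunk.text.includes(c.quote)) {
      errors.push(`citation ${i}: quote not found in ${c.ref}`);
    }
  });
  return errors;
}
```

Checking that the quote is a verbatim substring is what makes a citation auditable: a paraphrased or invented quote fails even if the ref is real.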
After validating locally:
ollama signin
ollama cp fincomp-control-mapper bharathreddyjanumpally/fincomp-control-mapper
ollama push bharathreddyjanumpally/fincomp-control-mapper
Roadmap:
- citations[] field
- POST /matrix endpoint (multi-control mapping)

This project provides general technical guidance and audit-prep structure. It is not legal advice.
The model now returns:
"citations": [{ "ref": "C1", "source": "./knowledge/...", "quote": "..." }]
Citation rules:
- ref must be a retrieved chunk id (C1, C2, …)
- source must match the chunk’s source_path
- quote must be a short excerpt (<= 200 chars) copied from that chunk

Validation (src/schema.js) fails when required content is missing (e.g., evidence_artifacts empty) or citations are invalid.

Pass a framework field in requests:
- nist (default control-language emphasis)
- soc2 (monitoring cadence, policies/procedures/evidence)
- pci (segmentation, encryption, logging, vuln mgmt, scope boundaries)
- generic
Example:
curl -s http://localhost:7070/map \
-H "Content-Type: application/json" \
-d '{
"framework": "pci",
"control_id": "SC-7",
"requirement_text": "Protect system boundaries and control communications.",
"workload": {"cloud":"aws","data_sensitivity":"financial"}
}' | jq .
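One hypothetical way a server could turn the framework field into prompt guidance, reusing the emphasis text from the framework list above. Falling back to nist when framework is omitted is an assumption based on the "(default ...)" note, and rejecting unknown values is a design choice of this sketch; frameworkInstruction is an illustrative name.

```javascript
// Hypothetical framework -> prompt emphasis table, text taken from the list above.
const FRAMEWORK_EMPHASIS = {
  nist: "control language",
  soc2: "monitoring cadence, policies/procedures/evidence",
  pci: "segmentation, encryption, logging, vuln mgmt, scope boundaries",
  generic: "",
};

function frameworkInstruction(framework) {
  const key = (framework ?? "nist").toLowerCase(); // assumed default: nist
  if (!Object.prototype.hasOwnProperty.call(FRAMEWORK_EMPHASIS, key)) {
    throw new Error(`unsupported framework: ${framework}`);
  }
  const emphasis = FRAMEWORK_EMPHASIS[key];
  // "generic" adds no extra emphasis line.
  return emphasis ? `Framework: ${key}. Emphasize ${emphasis}.` : `Framework: ${key}.`;
}
```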
POST /matrix generates mappings for multiple items in one request:
curl -s http://localhost:7070/matrix \
-H "Content-Type: application/json" \
-d '{
"framework":"nist",
"workload":{"cloud":"aws","account_model":"multi-account"},
"items":[
{"control_id":"AU-2","requirement_text":"Identify and log auditable events."},
{"control_id":"AC-2","requirement_text":"Manage accounts through lifecycle processes."}
]
}' | jq .
Response includes:
- mappings: list of full mapping JSON objects
- matrix: compact view (control_id → services → top evidence)
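The /matrix flow can be sketched as a loop over items that reuses a single-control mapping and then builds the compact view. mapOne stands in for whatever performs one /map-style mapping, and "top evidence" is assumed to mean the first N evidence_artifacts in model order; both are assumptions, not documented behavior.

```javascript
// Hypothetical /matrix sketch: map each item, then build the compact view.

// Compact row per control: control_id -> services -> first N evidence artifacts.
function buildMatrix(mappings, topEvidence = 2) {
  return mappings.map((m) => ({
    control_id: m.control_id,
    services: (m.aws_control_design && m.aws_control_design.services) || [],
    top_evidence: (m.evidence_artifacts || []).slice(0, topEvidence),
  }));
}

// mapOne is a stand-in for a single /map-style mapping call (hypothetical).
async function mapItems(request, mapOne) {
  const mappings = [];
  for (const item of request.items) {
    // Each item inherits the request-level framework and workload.
    mappings.push(await mapOne({ framework: request.framework, workload: request.workload, ...item }));
  }
  return { mappings, matrix: buildMatrix(mappings) };
}
```

Processing items one at a time is a choice of this sketch: it keeps load on a single local Ollama instance predictable, while a Promise.all version would send all items to the same model concurrently.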