shb/legal-embed:latest

4 pulls · updated 4 days ago

embedding
ollama pull shb/legal-embed

Details

b66e1c4b96f1 · 274MB · nomic-bert · 137M · F16

Readme

legal-embed

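Fine-tuned nomic-embed-text-v1.5 for retrieval over Indian criminal law: the Bharatiya Nyaya Sanhita (BNS) 2023, the Indian Penal Code (IPC) 1860, the Bharatiya Nagarik Suraksha Sanhita (BNSS) 2023, and the Bharatiya Sakshya Adhiniyam (BSA) 2023. Built for RAG: it maps a legal question to the correct statutory section and separates that section from look-alike sections.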

Dataset

11,587 contrastive triplets synthesised from the bare Acts:

- anchor: a query (`search_query: ...`)
- positive: the correct section (`search_document: ...`)
- negative: a hard negative, i.e. a confusingly similar section from the same chapter (e.g. Theft vs Robbery vs Extortion)
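As a rough illustration of the triplet layout described above, the sketch below builds one record with the two task prefixes. The `Triplet` class, the `make_triplet` helper, and the angle-bracket placeholder texts are hypothetical, and giving the negative the `search_document: ` prefix is an assumption (the source only states the prefix for the positive).

```python
from dataclasses import dataclass

QUERY_PREFIX = "search_query: "
DOC_PREFIX = "search_document: "


@dataclass(frozen=True)
class Triplet:
    """One contrastive training record (hypothetical container)."""
    anchor: str    # prefixed question
    positive: str  # prefixed text of the correct section
    negative: str  # prefixed text of a look-alike section


def make_triplet(question: str, correct_section: str, confusable_section: str) -> Triplet:
    # Assumption: negatives are documents too, so they share the document prefix.
    return Triplet(
        anchor=QUERY_PREFIX + question.strip(),
        positive=DOC_PREFIX + correct_section.strip(),
        negative=DOC_PREFIX + confusable_section.strip(),
    )


# Placeholder strings only; no statutory text is reproduced here.
example = make_triplet(
    "<question about taking property by force>",
    "<text of the Robbery section>",
    "<text of the Theft section>",
)
```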

Training

| Hyperparameter | Value |
|---|---|
| Loss | TripletLoss, cosine, margin 0.5 |
| Epochs | 5 |
| Batch size | 16 |
| Max seq length | 512 |
| Learning rate | 2e-5 (AdamW) |
| Warmup ratio | 0.1 |
| Precision | fp32 |
| Eval split | 5% (580 triplets) |
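To make the loss row concrete, here is a minimal sketch of the standard triplet objective with cosine distance and a margin of 0.5, written in plain Python on toy vectors. It assumes the usual definition max(d(a, p) - d(a, n) + margin, 0) with d = 1 - cosine similarity; the function names are illustrative, not taken from the training code.

```python
import math


def cosine_distance(u, v):
    """Cosine distance: 1 minus the cosine similarity of u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def triplet_loss(anchor, positive, negative, margin=0.5):
    """Zero once the positive is closer than the negative by at least `margin`."""
    d_ap = cosine_distance(anchor, positive)
    d_an = cosine_distance(anchor, negative)
    return max(d_ap - d_an + margin, 0.0)


# Toy 2-d vectors: a well-separated triplet and a fully inverted one.
easy = triplet_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
hard = triplet_loss([1.0, 0.0], [0.0, 1.0], [1.0, 0.0])
```

Under this definition a triplet stops contributing gradient once the negative sits at least 0.5 further from the anchor (in cosine distance) than the positive, which is why hard negatives from the same chapter matter.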

Result

Triplet accuracy: base 0.9552 → fine-tuned 0.9983
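Triplet accuracy is commonly computed as the share of triplets in which the anchor is closer to the positive than to the negative. The sketch below assumes that definition and takes precomputed distance pairs; `triplet_accuracy` is an illustrative name. The final lines check that the reported figures are consistent with 554 and 579 correct triplets out of the 580 in the eval split, which is an inference from the numbers above rather than something the source states.

```python
def triplet_accuracy(distance_pairs):
    """distance_pairs: iterable of (d_anchor_positive, d_anchor_negative)."""
    pairs = list(distance_pairs)
    correct = sum(1 for d_ap, d_an in pairs if d_ap < d_an)
    return correct / len(pairs)


# Toy example: three of four triplets are ranked correctly.
toy = triplet_accuracy([(0.1, 0.6), (0.2, 0.9), (0.5, 0.4), (0.3, 0.7)])

# Consistency check against the reported scores (inferred counts, see lead-in).
base_reported = round(554 / 580, 4)
tuned_reported = round(579 / 580, 4)
```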

Usage

Always prefix inputs: `search_query: ` for questions, `search_document: ` for indexed sections.

```bash
ollama pull shb/legal-embed
curl http://localhost:11434/api/embed -d '{"model":"shb/legal-embed","input":"search_query: What constitutes murder under the new criminal code?"}'
```
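For a RAG pipeline, the same endpoint can be called from Python and the results ranked by cosine similarity. This is a minimal sketch using only the standard library; it assumes a local Ollama server on the default port and relies on `/api/embed` accepting a list under `input` and returning vectors under `embeddings`. The helper names (`build_payload`, `embed`, `rank`) are hypothetical.

```python
import json
import math
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/embed"
MODEL = "shb/legal-embed"


def build_payload(texts, prefix):
    """Attach the task prefix to every input before embedding."""
    return {"model": MODEL, "input": [prefix + t for t in texts]}


def embed(texts, prefix):
    """POST to the local Ollama server; returns one vector per input text."""
    body = json.dumps(build_payload(texts, prefix)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["embeddings"]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def rank(query_vec, doc_vecs):
    """Indices of doc_vecs, most similar to query_vec first."""
    scores = [cosine(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)
```

With the server running, `embed(sections, "search_document: ")` would index the section texts once, and `rank(embed([question], "search_query: ")[0], doc_vecs)` would order them for a given question.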