4 Downloads Updated 4 days ago
ollama pull shb/legal-embed
b66e1c4b96f1 · 274MB
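Fine-tuned nomic-embed-text-v1.5 for retrieval over Indian criminal law: the
Bharatiya Nyaya Sanhita (BNS) 2023, the Indian Penal Code (IPC) 1860, the Bharatiya
Nagarik Suraksha Sanhita (BNSS) 2023, and the Bharatiya Sakshya Adhiniyam (BSA) 2023.
Built for RAG: it maps a legal question to the correct statutory section and
separates that section from look-alike sections.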
Training data: 11,587 contrastive triplets synthesised from the bare Acts. Each triplet contains:
- anchor — a query (search_query: ...)
- positive — the correct section (search_document: ...)
- negative — a hard negative: a confusingly similar section from the same chapter
(e.g. Theft vs Robbery vs Extortion)
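
As a rough illustration of the triplet layout described above, the sketch below builds one record in Python. The `Triplet` class, the `make_triplet` helper, and the placeholder strings are hypothetical; none of them come from the actual training set or its generation script.

```python
from dataclasses import dataclass

# Task prefixes named in this model card.
QUERY_PREFIX = "search_query: "
DOC_PREFIX = "search_document: "


@dataclass(frozen=True)
class Triplet:
    """Hypothetical container for one contrastive training example."""
    anchor: str    # prefixed question
    positive: str  # prefixed text of the correct section
    negative: str  # prefixed text of a look-alike section from the same chapter


def make_triplet(question: str, correct_section: str, lookalike_section: str) -> Triplet:
    """Attach the prefixes and bundle the three texts together."""
    return Triplet(
        anchor=QUERY_PREFIX + question.strip(),
        positive=DOC_PREFIX + correct_section.strip(),
        negative=DOC_PREFIX + lookalike_section.strip(),
    )


# Placeholder strings only; real triplets would carry the statutory text.
example = make_triplet(
    "What counts as theft?",
    "<text of the Theft section>",
    "<text of the Robbery section>",
)
```

Picking the negative from the same chapter is what makes it "hard": a negative from an unrelated chapter would already sit far from the anchor and give the model little to learn from.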
| Hyperparameter | Value |
|---|---|
| Loss | TripletLoss, cosine, margin 0.5 |
| Epochs | 5 |
| Batch size | 16 |
| Max seq length | 512 |
| Learning rate | 2e-5 (AdamW) |
| Warmup ratio | 0.1 |
| Precision | fp32 |
| Eval split | 5% (580 triplets) |
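
To make the loss row concrete, here is a minimal sketch of a triplet loss with cosine distance and margin 0.5. It assumes the common convention that cosine distance is `1 - cosine_similarity` and that the loss is `max(0, d(anchor, positive) - d(anchor, negative) + margin)`; the card does not spell out the exact formulation used in training, and the function names are illustrative.

```python
import math


def cosine_distance(u, v):
    """Cosine distance, assumed here to be 1 - cosine similarity."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def triplet_loss(anchor, positive, negative, margin=0.5):
    """Zero once the positive is closer than the negative by at least `margin`."""
    d_pos = cosine_distance(anchor, positive)
    d_neg = cosine_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Positive identical to the anchor, negative orthogonal: 0 - 1 + 0.5 < 0, so no loss.
easy = triplet_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])

# Roles swapped: 1 - 0 + 0.5 = 1.5, a large penalty.
hard = triplet_loss([1.0, 0.0], [0.0, 1.0], [1.0, 0.0])
```

Under this formulation, a triplet stops contributing gradient once the negative is at least 0.5 further away in cosine distance than the positive.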
Triplet accuracy: base 0.9552 → fine-tuned 0.9983
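
Triplet accuracy is usually taken to mean the fraction of triplets in which the anchor is more similar to the positive than to the negative. The sketch below assumes that definition with cosine similarity, and also assumes the scores above were measured on the 580-triplet eval split; neither point is stated explicitly in this card. The toy vectors are made up.

```python
import math


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def triplet_accuracy(triplets):
    """Share of (anchor, positive, negative) triplets ranked correctly."""
    correct = sum(
        1
        for anchor, positive, negative in triplets
        if cosine_similarity(anchor, positive) > cosine_similarity(anchor, negative)
    )
    return correct / len(triplets)


# Toy embeddings: the first triplet is ranked correctly, the second is not.
toy = [
    ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]),
    ([1.0, 0.0], [0.0, 1.0], [1.0, 0.0]),
]
toy_accuracy = triplet_accuracy(toy)

# What the reported scores would mean on a 580-triplet split (assumption).
eval_size = 580
base_correct = round(0.9552 * eval_size)        # 554 of 580
fine_tuned_correct = round(0.9983 * eval_size)  # 579 of 580
```

Read this way, the fine-tuned model would misrank a single eval triplet, against 26 for the base model.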
Always prefix inputs: `search_query: ` for questions, `search_document: ` for indexed sections.
```bash
ollama pull shb/legal-embed
curl http://localhost:11434/api/embed -d '{"model":"shb/legal-embed","input":"search_query: What constitutes murder under the new criminal code?"}'
```
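
The same endpoint can be called from Python to build a small retrieval loop. This is a sketch only: it assumes an Ollama server on the default port with the model already pulled, and the helpers (`as_query`, `as_document`, `embed`, `rank`) are illustrative names, not part of any published client.

```python
import json
import math
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/embed"
MODEL = "shb/legal-embed"


def as_query(text):
    """Prefix a user question."""
    return "search_query: " + text


def as_document(text):
    """Prefix a statutory section before indexing."""
    return "search_document: " + text


def embed(texts):
    """POST a list of already-prefixed strings and return their embeddings."""
    payload = json.dumps({"model": MODEL, "input": texts}).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["embeddings"]


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank(query_vector, section_vectors):
    """Return section indices ordered from most to least similar."""
    scores = [cosine_similarity(query_vector, v) for v in section_vectors]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Usage (needs a running Ollama server; `sections` is your own list of section texts):
# section_vectors = embed([as_document(s) for s in sections])
# query_vector = embed([as_query("What constitutes murder under the new criminal code?")])[0]
# best_section = sections[rank(query_vector, section_vectors)[0]]
```

Embedding the sections once and caching the vectors keeps query-time work down to one embedding call plus the similarity ranking.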