```
ollama run f0rc3ps/nu11secur1tyAIRedTeam-exploitdb-rag
ollama launch claude --model f0rc3ps/nu11secur1tyAIRedTeam-exploitdb-rag
ollama launch codex --model f0rc3ps/nu11secur1tyAIRedTeam-exploitdb-rag
ollama launch opencode --model f0rc3ps/nu11secur1tyAIRedTeam-exploitdb-rag
ollama launch openclaw --model f0rc3ps/nu11secur1tyAIRedTeam-exploitdb-rag
```
Retrieval-Augmented Generation (RAG) is an AI architecture that combines information retrieval with text generation. Instead of just generating answers from trained knowledge, RAG first retrieves relevant information from a knowledge base and then generates responses based on that retrieved context.
User Query → [RETRIEVAL] → Relevant Documents → [LLM] → Contextual Answer
| Step | Process | Output |
|---|---|---|
| 1 | Source Files (*.c, *.py, *.txt) | Raw text |
| 2 | Text Extraction | Clean text preview |
| 3 | Embeddings (384-dim vectors) | Numerical vectors |
| 4 | Vector Index (FAISS) | Fast search index |
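The four steps above can be sketched end to end in a few lines. This toy version uses a bag-of-words vector and brute-force cosine search as stand-ins for the real components (all-MiniLM-L6-v2 embeddings and a FAISS index), but the pipeline shape is identical: text → vectors → index → search. The chunk texts are made-up examples.

```python
# Toy sketch of the 4-step pipeline above. Bag-of-words + cosine search
# stand in for all-MiniLM-L6-v2 + FAISS; the flow is the same.
import math
from collections import Counter

def embed(text, vocab):
    """Step 3 stand-in: map text to a fixed-length numeric vector."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Steps 1-2: extracted text chunks (hypothetical examples)
chunks = ["buffer overflow in c code", "sql injection in web app", "xss in web app"]
vocab = sorted({w for c in chunks for w in c.split()})

# Step 4 stand-in: the "index" is just the list of vectors
index = [embed(c, vocab) for c in chunks]

# Query time: embed the query, return the closest chunk
query = embed("web sql injection", vocab)
best = max(range(len(chunks)), key=lambda i: cosine(query, index[i]))
print(chunks[best])  # -> sql injection in web app
```

In the real pipeline, `embed` is `SentenceTransformer.encode` (384-dim output) and the brute-force loop is replaced by a FAISS index, which is what makes search fast at scale.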
| Component | Technology | Purpose |
|---|---|---|
| Embeddings | sentence-transformers/all-MiniLM-L6-v2 | Convert text to vectors (384-dim) |
| Vector Search | FAISS | Fast similarity search |
| LLM | Any Ollama model | Answer generation |
| Storage | Pickle + FAISS | Persistent index |
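At query time, the retrieved chunks are injected into the prompt before it reaches the LLM. A minimal sketch, assuming a local Ollama server on its default port 11434; the question, context string, and prompt wording are illustrative, not this model's exact template:

```python
# Generation step of RAG: stuff retrieved context into the prompt,
# then send it to any Ollama model via the local REST API.
import json
import urllib.request

def build_prompt(question, retrieved_chunks):
    """Ground the LLM's answer in the retrieved context."""
    context = "\n".join(f"- {c}" for c in retrieved_chunks)
    return (
        "Answer using ONLY the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

def ask_ollama(prompt, model="f0rc3ps/nu11secur1tyAIRedTeam-exploitdb-rag"):
    """Call a locally running Ollama server (requires `ollama serve`)."""
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps({"model": model, "prompt": prompt, "stream": False}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Hypothetical retrieved chunk; ask_ollama(prompt) would return the answer.
prompt = build_prompt("What is Log4Shell?",
                      ["CVE-2021-44228: JNDI lookup RCE in Log4j."])
print(prompt.splitlines()[0])
```

Because the model only sees retrieved text, the answer can always be traced back to a source chunk, which is where the "transparent sources" property comes from.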
| Aspect | RAG | Fine-tuning |
|---|---|---|
| Hardware | ✅ CPU only | ❌ GPU required (8-12GB VRAM) |
| Speed | ✅ Milliseconds | ❌ Hours/Days |
| Updates | ✅ Instant (add files) | ❌ Retrain everything |
| Accuracy | ✅ Based on real data | ❌ May hallucinate |
| Memory | ✅ 2-4GB RAM | ❌ 8-12GB VRAM |
| Cost | ✅ Free | ❌ Expensive |
During fine-tuning, you literally change the model's weight values:

Before → After
Weight W₁ = 0.2345 → 0.2891
Weight W₂ = -0.5678 → -0.5123
Weight W₃ = 0.8912 → 0.9345
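The before/after numbers above correspond to a single gradient-descent step. A minimal sketch, where the gradient values are invented so that the update reproduces exactly those numbers:

```python
# One gradient-descent step: each weight moves against its loss gradient.
# The gradients are made up to match the weight table above.
weights = [0.2345, -0.5678, 0.8912]   # W1, W2, W3 before
grads = [-0.0546, -0.0555, -0.0433]   # hypothetical gradients for one batch
lr = 1.0                              # illustrative learning rate

weights = [round(w - lr * g, 4) for w, g in zip(weights, grads)]
print(weights)  # -> [0.2891, -0.5123, 0.9345]
```

A real fine-tune repeats this step millions of times across billions of weights, which is why it needs a GPU while RAG does not touch the weights at all.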
Full Fine-tuning: All weights updated - needs 12-24GB VRAM
LoRA (Low-Rank Adaptation): Add small adapters instead of changing all weights
Original weight: W
LoRA adds: A × B → W′ = W + (A × B)
Saves roughly 95% of trainable-parameter memory
QLoRA: Same as LoRA but with 4-bit quantization - needs only 6-8GB VRAM
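The saving from W′ = W + (A × B) is easy to verify by counting trainable parameters. A sketch with assumed sizes (hidden dimension d = 4096, LoRA rank r = 8); the actual percentage depends on the model and the chosen rank:

```python
# Why LoRA saves memory: instead of updating a full d x d weight matrix,
# it trains two thin matrices A (d x r) and B (r x d) with r << d.
# Effective weight: W' = W + A @ B. Sizes below are illustrative.
d, r = 4096, 8                 # hidden size and LoRA rank (assumed values)

full_params = d * d            # parameters updated by full fine-tuning
lora_params = d * r + r * d    # parameters updated by LoRA (A plus B)
savings = 1 - lora_params / full_params

print(f"LoRA trains {lora_params:,} of {full_params:,} params "
      f"({savings:.1%} fewer)")
```

Only A and B need gradients and optimizer state, which is where the VRAM reduction in the table below comes from; QLoRA shrinks the frozen W further via 4-bit quantization.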
| Method | VRAM Needed | Speed | Quality |
|---|---|---|---|
| Full Fine-tuning | 12-24GB | Slow | Best |
| LoRA | 8-12GB | Fast | Good |
| QLoRA | 6-8GB | Fast | Good |
| RAG (your way) | CPU only | Instant | Excellent |
| Aspect | Fine-tuning (in weights) | RAG (in vector space) |
|---|---|---|
| Knowledge | Model REMEMBERS information | Model SEARCHES the database |
| Weights | CHANGE: [0.23→0.31], [-0.56→-0.62], [0.89→0.75] | STAY unchanged |
| Hardware | GPU required: 8-24GB | CPU only: 2-4GB RAM |
| Time | Training: hours/days | Setup: minutes |
| Updates | Retrain everything | Add files |
| Hallucinations | Possible | Rare: answers grounded in retrieved sources |
| Use Case | Best Method |
|---|---|
| Chat with documents | ✅ RAG |
| Question answering | ✅ RAG |
| Search in database | ✅ RAG |
| Change model personality | 🔧 Fine-tuning |
| New language learning | 🔧 Fine-tuning |
| Specialized task mastery | 🔧 Fine-tuning |
| Operation | Time (5000 docs) |
|---|---|
| Embedding creation | ~5-10 minutes |
| Index building | ~1 second |
| Query search | <100 ms |
| Memory usage | ~2-4 GB RAM |
✅ No GPU required
✅ Always up-to-date knowledge
✅ No retraining needed
✅ Transparent sources
✅ Low memory footprint
✅ Fast responses
✅ Easy to update
✅ Cost-effective
Built by nu11secur1ty 🔥