
# 🔥 nu11secur1tyAIRedTeam – Uncensored Cybersecurity Model

Created by nu11secur1ty for red team operations, penetration testing, and exploit development.

## 🚀 One command to start

ollama run f0rc3ps/nu11secur1tyAIRedTeam

Details

b94366e0a14e · 8.9GB · deepseek2 architecture · 15.7B parameters · Q4_0 quantization

Licensed under the DEEPSEEK LICENSE AGREEMENT (Version 1.0, 23 October 2023) and the MIT License, Copyright (c) 2023 DeepSeek.

Default parameters: num_ctx 8192, repeat_penalty 1.1, stop sequences "User:" and "Assistant:".

Readme

🔥 RAG Architecture & Training Technology

⚠️ WARNING: All malicious actions will be punished by law.

What is RAG?

Retrieval-Augmented Generation (RAG) is an AI architecture that combines information retrieval with text generation. Instead of just generating answers from trained knowledge, RAG first retrieves relevant information from a knowledge base and then generates responses based on that retrieved context.

Core Architecture

User Query → [RETRIEVAL] → Relevant Documents → [LLM] → Contextual Answer

Process Flow

1. Indexing Phase (One-time setup)

| Step | Process | Output |
|------|---------|--------|
| 1 | Source files (`*.c`, `*.py`, `*.txt`) | Raw text |
| 2 | Text extraction | Clean text |
| 3 | Embeddings (384-dim vectors) | Numerical vectors |
| 4 | Vector index (FAISS) | Fast search index |
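
The indexing steps above can be sketched as follows. This is a toy illustration only: numpy stands in for the real stack (sentence-transformers for embeddings, FAISS for the index), and `embed()` is a hashing embedder, not all-MiniLM-L6-v2.

```python
# Indexing-phase sketch: text -> 384-dim unit vectors -> searchable matrix.
import hashlib
import numpy as np

DIM = 384  # same dimensionality as all-MiniLM-L6-v2

def embed(text: str) -> np.ndarray:
    """Toy deterministic embedding: hash each token into a 384-dim vector."""
    vec = np.zeros(DIM)
    for token in text.lower().split():
        h = int(hashlib.md5(token.encode()).hexdigest(), 16)
        vec[h % DIM] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

docs = ["buffer overflow in strcpy", "SQL injection via login form"]
# Stacked unit vectors play the role of a FAISS IndexFlatIP here.
index = np.stack([embed(d) for d in docs])
```

In the real pipeline, `index` would be persisted to disk (the table below mentions Pickle + FAISS) so the one-time indexing cost is paid only once.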

Technical Stack

| Component | Technology | Purpose |
|-----------|------------|---------|
| Embeddings | sentence-transformers/all-MiniLM-L6-v2 | Convert text to 384-dim vectors |
| Vector search | FAISS | Fast similarity search |
| LLM | Any Ollama model | Answer generation |
| Storage | Pickle + FAISS | Persistent index |
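
Query time against such an index can be sketched the same way: a brute-force numpy dot product stands in for FAISS, and the toy hashing embedder stands in for all-MiniLM-L6-v2. Document texts are illustrative.

```python
# Query-phase sketch: embed the query, rank documents by cosine similarity.
import hashlib
import numpy as np

DIM = 384

def embed(text: str) -> np.ndarray:
    """Toy deterministic embedding (stand-in for all-MiniLM-L6-v2)."""
    vec = np.zeros(DIM)
    for token in text.lower().split():
        h = int(hashlib.md5(token.encode()).hexdigest(), 16)
        vec[h % DIM] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

docs = ["buffer overflow in strcpy", "SQL injection via login form"]
index = np.stack([embed(d) for d in docs])

def search(query: str, k: int = 2):
    """Return the top-k documents by cosine similarity to the query."""
    scores = index @ embed(query)  # inner product == cosine on unit vectors
    top = np.argsort(-scores)[:k]  # FAISS does this step in milliseconds
    return [(docs[i], float(scores[i])) for i in top]

hits = search("login form injection", k=1)
context = "\n".join(doc for doc, _ in hits)  # handed to the LLM as context
```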

Why RAG over Fine-tuning?

| Aspect | RAG | Fine-tuning |
|--------|-----|-------------|
| Hardware | ✅ CPU only | ❌ GPU required (8-12GB VRAM) |
| Speed | ✅ Milliseconds | ❌ Hours/days |
| Updates | ✅ Instant (add files) | ❌ Retrain everything |
| Accuracy | ✅ Grounded in real data | ❌ May hallucinate |
| Memory | ✅ 2-4GB RAM | ❌ 8-12GB VRAM |
| Cost | ✅ Free | ❌ Expensive |

How Fine-tuning Works (Storing in Weights)

What Changes Internally?

During fine-tuning, the model's weight values themselves are updated:

Before → After
  W₁ = 0.2345 → 0.2891
  W₂ = -0.5678 → -0.5123
  W₃ = 0.8912 → 0.9345

Fine-tuning Methods

Full Fine-tuning: All weights updated - needs 12-24GB VRAM

LoRA (Low-Rank Adaptation): adds small trainable adapters instead of updating all weights.
  Original weight: W
  LoRA update: W' = W + (A × B)
  Trains only a small fraction of the parameters, cutting memory use by roughly 95%.

QLoRA: Same as LoRA but with 4-bit quantization - needs only 6-8GB VRAM
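
The LoRA idea above can be illustrated with plain numpy. The hidden size and rank below are illustrative, not taken from any particular model.

```python
# LoRA sketch: instead of training a full d×d weight matrix, train two
# low-rank factors A (d×r) and B (r×d) and apply W' = W + A @ B.
import numpy as np

d, r = 4096, 8                       # hidden size, LoRA rank (illustrative)
rng = np.random.default_rng(0)
W = rng.standard_normal((d, d))      # frozen pretrained weight
A = rng.standard_normal((d, r)) * 0.01
B = np.zeros((r, d))                 # B starts at zero, so W' == W at init

W_prime = W + A @ B                  # adapted weight used at inference

full_params = d * d                  # what full fine-tuning would train
lora_params = d * r + r * d          # what LoRA actually trains
print(f"trainable: {lora_params:,} vs {full_params:,} "
      f"({100 * lora_params / full_params:.2f}% of full fine-tuning)")
```

QLoRA applies the same factorization, but stores the frozen W in 4-bit precision, which is where the further VRAM savings come from.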

Memory Comparison

| Method | VRAM Needed | Speed | Quality |
|--------|-------------|-------|---------|
| Full fine-tuning | 12-24GB | Slow | Best |
| LoRA | 8-12GB | Fast | Good |
| QLoRA | 6-8GB | Fast | Good |
| RAG (this project) | CPU only | Instant | Excellent |

RAG vs Fine-tuning Comparison

| | Fine-tuning (in weights) | RAG (in vector space) |
|---|---|---|
| Knowledge | Model REMEMBERS information | Model SEARCHES a database |
| Weights | Change: [0.23→0.31], [-0.56→-0.62], [0.89→0.75] | Stay unchanged |
| Hardware | GPU required: 8-24GB | CPU only: 2-4GB RAM |
| Time | Training: hours/days | Setup: minutes |
| Updates | Retrain everything | Add files |
| Hallucinations | Possible | Greatly reduced |

When to Use Each

| Use Case | Best Method |
|----------|-------------|
| Chat with documents | ✅ RAG |
| Question answering | ✅ RAG |
| Search in database | ✅ RAG |
| Change model personality | 🔄 Fine-tuning |
| Learning a new language | 🔄 Fine-tuning |
| Specialized task mastery | 🔄 Fine-tuning |

Key Components Explained

Embeddings

  • Convert text to numerical vectors
  • 384 dimensions for all-MiniLM-L6-v2
  • Similar meaning = similar vectors

FAISS Index

  • Facebook AI Similarity Search
  • Stores all document vectors
  • Finds nearest neighbors in milliseconds

LLM Integration

  • Takes retrieved documents as context
  • Generates answers grounded in real data
  • Greatly reduced hallucination - answers are tied to retrieved facts
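
The integration step can be sketched as a prompt-assembly function. The wording and function name below are illustrative; any Ollama model can fill the LLM slot (e.g. via `ollama run` or the `/api/generate` HTTP endpoint).

```python
# Sketch of the generation step: retrieved documents are stitched into the
# prompt so the LLM answers from supplied context rather than memory.
def build_prompt(question: str, retrieved_docs: list[str]) -> str:
    """Assemble a grounded prompt from retrieved context (illustrative)."""
    context = "\n\n".join(f"[{i + 1}] {d}" for i, d in enumerate(retrieved_docs))
    return (
        "Answer using ONLY the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

prompt = build_prompt("How does the exploit work?",
                      ["Doc A: stack layout notes", "Doc B: payload notes"])
# `prompt` would then be sent to the Ollama model for generation.
```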

Performance Metrics

| Operation | Time (5,000 docs) |
|-----------|-------------------|
| Embedding creation | ~5-10 minutes |
| Index building | Seconds |
| Query search | <100 ms |
| Memory usage | ~2-4 GB RAM |

Benefits of RAG

  ✅ No GPU required
  ✅ Always up-to-date knowledge
  ✅ No retraining needed
  ✅ Transparent sources
  ✅ Low memory footprint
  ✅ Fast responses
  ✅ Easy to update
  ✅ Cost-effective

Use Cases

  • Document Q&A systems
  • Knowledge base search
  • Technical documentation
  • Code repositories
  • Exploit databases
  • Research papers
  • Legal documents
  • Customer support

HARDWARE REQUIREMENTS FOR f0rc3ps/nu11secur1tyAIRedTeam 16B

RAG ENGINE (FIXED REQUIREMENTS):

  • CPU: Any dual-core (4+ cores recommended)
  • RAM: 2GB minimum (4-8GB recommended)
  • Storage: 1GB minimum (10GB+ recommended)
  • GPU: NOT REQUIRED

LLM ENGINE (CHOOSE YOUR MODEL SIZE):

7B Models (Llama 3, Mistral, Phi-3):

  • GPU VRAM: 4GB min, 6-8GB rec
  • RAM: 8GB min, 16GB rec
  • Speed: 20-30 t/s min, 50-80 t/s rec
  • GPUs: RTX 3060 6GB min, RTX 4060 Ti 8GB rec

16B Models:

  • GPU VRAM: 8-10GB min, 12GB rec
  • RAM: 16GB min, 32GB rec
  • Speed: 20-30 t/s min, 40-60 t/s rec
  • GPUs: RTX 3060 12GB min, RTX 4070 12GB rec

13B-20B Models:

  • GPU VRAM: 8GB min, 10-12GB rec
  • RAM: 16GB min, 32GB rec
  • Speed: 15-25 t/s min, 30-50 t/s rec
  • GPUs: RTX 3070 8GB min, RTX 4070 12GB rec

30B-35B Models:

  • GPU VRAM: 16GB min, 20-24GB rec
  • RAM: 32GB min, 64GB rec
  • Speed: 10-15 t/s min, 25-35 t/s rec
  • GPUs: RTX 3080 10GB min, RTX 4090 24GB rec

70B Models:

  • GPU VRAM: 35GB min, 40-48GB rec
  • RAM: 64GB min, 128GB rec
  • Speed: 5-10 t/s min, 15-25 t/s rec
  • Setup: 2x RTX 4090 min, A6000 48GB rec

MAC (UNIFIED MEMORY):

  • M1: 16GB RAM rec - runs 7B models
  • M2 Pro/Max: 32GB RAM rec - runs 13B-20B models
  • M3 Max: 64-96GB RAM rec - runs 30B-70B models
  • M3 Ultra: 128GB+ RAM rec - runs 70B+ models

Performance on Mac (7B models):

  • M1: 15-20 t/s
  • M2 Pro: 26-30 t/s
  • M3 Max: 40-55 t/s
  • M4 Max: 50-70 t/s

SAMPLE BUILDS:

Budget ($800-1200):

  • GPU: RTX 3060 12GB
  • CPU: Ryzen 5 5600
  • RAM: 32GB DDR4
  • Runs: 7B-13B models

Sweet Spot ($2000-2800):

  • GPU: RTX 4070 Ti Super 16GB
  • CPU: Ryzen 7 7800X3D
  • RAM: 64GB DDR5
  • Runs: 30B models

Enthusiast ($4000-5500):

  • GPU: RTX 4090 24GB
  • CPU: Ryzen 9 7950X
  • RAM: 128GB DDR5
  • Runs: 70B models

MEMORY FORMULA:

  • TOTAL = RAG(2-4GB) + LLM_SIZE(Q4) + 10%

Model Sizes (Q4):

7B = ~4GB
13B = ~7GB
30B = ~16GB
70B = ~38GB
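
Applying the formula to the Q4 sizes above as a quick sanity check (the 16B entry uses the 8.9GB file size of this model; all figures are approximate):

```python
# TOTAL = RAG(2-4GB) + LLM_SIZE(Q4) + 10% overhead, using the Q4 sizes
# from the table above. 16B uses this model's 8.9GB Q4_0 file size.
MODEL_Q4_GB = {"7B": 4, "13B": 7, "16B": 8.9, "30B": 16, "70B": 38}

def total_memory_gb(model: str, rag_gb: float = 4.0) -> float:
    """Estimate total memory for RAG + quantized LLM, plus 10% overhead."""
    return round((rag_gb + MODEL_Q4_GB[model]) * 1.10, 1)

for m in MODEL_Q4_GB:
    print(f"{m}: ~{total_memory_gb(m)} GB total")
```

So a 7B model lands around 8-9GB total and a 70B model around 46GB, which matches the hardware tiers listed above.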

💝 Support This Project

If this model helps you in your security research, penetration testing, or red team operations, consider supporting its continued development and maintenance.

Your support helps:

  • Keep the model free for everyone
  • Add more repositories and knowledge sources
  • Maintain regular updates with the latest CVEs and exploits
  • Improve response quality and RAG performance

Donate with PayPal

Donate directly:   👉 https://www.paypal.com/donate/?hosted_button_id=ZPQZT5XMC5RFY


💼 Enterprise & Consulting Services

This RAG system represents $50,000+ in development value – 17+ repositories indexed, FAISS vector search, automated update pipeline, and three production-ready models.

If your organization needs:

  • 🔒 Private instance – air-gapped deployment on your infrastructure
  • 🛠️ Custom repository integration – add your private exploit databases or CVE feeds
  • 🚀 Performance optimization – fine-tuned for your specific hardware
  • 📊 SLA & support – guaranteed uptime and maintenance
  • 👥 Team training – how to use and maintain the system

Contact for enterprise licensing and consulting:

📧 Email: nu11secur1typentest@gmail.com
💼 LinkedIn: :)

Starting at $5,000 – $20,000 per deployment, depending on requirements.


Why pay?

| What you get | DIY | Enterprise |
|--------------|-----|------------|
| RAG system with 17+ repos | ✅ Free | ✅ Included |
| Custom repository integration | ❌ You add yourself | ✅ We add for you |
| Private air-gapped deployment | ❌ Self-managed | ✅ Full setup |
| SLA & support | ❌ None | ✅ 24/7 |
| Team training | ❌ Self-taught | ✅ Workshop |
| Cost | $0 | $5,000+ |

All proceeds fund the continued development of free open-source models.

Built by nu11secur1ty 🔥