
Source: https://huggingface.co/ruv/ruvltra-claude-code

ollama run jewelzufo/ruvltra-claude-code
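Once the model has been pulled with the command above, it can also be called from code through Ollama's local REST API. The following is a minimal sketch, assuming an Ollama server is running on its default port (11434); the helper names `build_request` and `generate` are illustrative and not part of any RuvLTRA tooling.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default local endpoint
MODEL = "jewelzufo/ruvltra-claude-code"             # tag from the `ollama run` command


def build_request(prompt: str, max_tokens: int = 256) -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {
        "model": MODEL,
        "prompt": prompt,
        "stream": False,                          # one JSON object instead of a stream
        "options": {"num_predict": max_tokens},   # cap on generated tokens
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str) -> str:
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate("Write a Python function that reverses a string.")` returns the completion as a string; nothing is sent over the network until `generate` is called.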

# 🌟 RuvLTRA Claude Code

### **The World's First LLM Optimized for Claude Code**

[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0) [![HuggingFace](https://img.shields.io/badge/🤗%20Hugging%20Face-Model-yellow)](https://huggingface.co/ruv/ruvltra-claude-code) [![GGUF](https://img.shields.io/badge/Format-GGUF-green)](https://github.com/ggerganov/ggml/blob/master/docs/gguf.md) [![First](https://img.shields.io/badge/🥇-First%20of%20its%20Kind-gold)](https://huggingface.co/ruv/ruvltra-claude-code) [![Self-Learning](https://img.shields.io/badge/🧠-Self%20Learning-purple)](https://github.com/ruvnet/ruvector) [![Swarm](https://img.shields.io/badge/🐝-Swarm%20Optimized-orange)](https://github.com/ruvnet/ruvector)

---

**🚀 Self-Learning • 🐝 Swarm-Optimized • ⚡ Edge-Ready • 🔄 Adaptive**

[The Story](#-the-story) • [Why RuvLTRA](#-why-ruvltra) • [Quick Start](#-quick-start) • [Architecture](#-architecture) • [Benchmarks](#-benchmarks)

🎯 The Story

RuvLTRA Claude Code represents a paradigm shift in AI-assisted development.

Traditional coding assistants are static; they don't learn, adapt, or improve from your workflow. RuvLTRA changes this by introducing:

  1. 🧠 Self-Learning Intelligence (SONA): The model continuously improves from interactions, learning your coding patterns, preferences, and project-specific conventions.

  2. 🐝 Swarm-Optimized Architecture: Built for distributed multi-agent workflows where multiple AI agents collaborate, share knowledge, and coordinate through the RuVector framework.

  3. 🔄 Adaptive Neural Architecture: Unlike frozen models, RuvLTRA features real-time adaptation with <0.05ms latency, so your AI assistant gets smarter as you code.

  4. ⚡ Claude Code Native: Purpose-built for Claude Code IDE integrations, optimized for the specific patterns of code generation, completion, explanation, and refactoring.

“This isn't just another code model. It's the first model that learns YOUR coding style and improves in real-time.”


✨ Why RuvLTRA?

🥇 First-of-its-Kind

| Feature | Traditional Models | RuvLTRA |
|---|---|---|
| Learning | Static/Frozen ❌ | Continuous Learning ✅ |
| Adaptation | None | Real-time (<0.05ms) ✅ |
| Multi-Agent | Not Designed | Swarm-Native ✅ |
| Claude Code | Generic | Purpose-Built ✅ |
| Edge Deployment | Often Heavy | 1GB RAM Ready ✅ |

🧠 SONA: Self-Optimizing Neural Architecture

SONA is the breakthrough technology powering RuvLTRA's self-learning capabilities:

┌──────────────────────────────────────────────────────────┐
│                    SONA Architecture                     │
├──────────────────────────────────────────────────────────┤
│                                                          │
│   User Interaction ──► Pattern Recognition               │
│           │                    │                         │
│           ▼                    ▼                         │
│   Trajectory Capture    EWC++ Memory                     │
│           │            (Prevents Forgetting)             │
│           ▼                    │                         │
│   MicroLoRA Adaptation ◄───────┘                         │
│           │                                              │
│           ▼                                              │
│   Improved Model ──► Better Suggestions                  │
│                                                          │
└──────────────────────────────────────────────────────────┘

Key SONA Features:

  • Trajectory Learning: Captures successful coding sequences
  • EWC++ (Elastic Weight Consolidation): Prevents catastrophic forgetting
  • MicroLoRA: Lightweight adaptation without full fine-tuning
  • Real-time: Adaptation in <0.05ms
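To make "lightweight adaptation without full fine-tuning" concrete, here is a minimal sketch of a generic low-rank (LoRA-style) update in plain Python. It is not SONA's MicroLoRA implementation; the function names, the toy 3x3 matrix, and the 1024x1024 layer size are assumptions chosen purely for illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_apply(w, b, a, scale=1.0):
    """Return W + scale * (B @ A) without modifying the base weight W."""
    delta = matmul(b, a)
    return [[wij + scale * dij for wij, dij in zip(wr, dr)] for wr, dr in zip(w, delta)]


def lora_param_count(d_out, d_in, rank):
    """Trainable parameters in the adapter: B is d_out x rank, A is rank x d_in."""
    return rank * (d_out + d_in)


# Toy 3x3 base weight (frozen) with a rank-1 adapter.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
B = [[0.5], [0.0], [0.0]]   # d_out x rank
A = [[0.0, 2.0, 0.0]]       # rank x d_in

W_adapted = lora_apply(W, B, A)

# For a hypothetical 1024 x 1024 layer, a rank-2 adapter trains 4,096 values
# instead of 1,048,576.
adapter_params = lora_param_count(1024, 1024, rank=2)
```

Only `A` and `B` change during adaptation, which is why a very small rank keeps each update cheap.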

🐝 Swarm-Optimized

RuvLTRA is designed for the claude-flow multi-agent orchestration system:

# Example: Swarm-coordinated code review
swarm:
  topology: hierarchical-mesh
  agents:
    - type: ruvltra-claude-code
      role: code-generator
    - type: ruvltra-claude-code  
      role: code-reviewer
    - type: ruvltra-claude-code
      role: test-writer
  coordination:
    consensus: raft
    memory: shared-hnsw

Swarm Benefits:

  • Multiple RuvLTRA instances collaborating
  • Shared learning across agents
  • Byzantine fault-tolerant coordination
  • 150x-12,500x faster knowledge retrieval via HNSW


📊 Model Specifications

| Property | Value |
|---|---|
| Architecture | Transformer (Optimized for Code) |
| Parameters | 0.5 Billion |
| Quantization | Q4_K_M (4-bit K-quant) |
| Context Length | 4,096 tokens |
| File Size | ~398 MB |
| Format | GGUF |
| License | Apache 2.0 |
| Self-Learning | ✅ SONA Enabled |
| Swarm-Ready | ✅ claude-flow Compatible |
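The file size and parameter count above imply an average storage cost per weight. The sketch below does that arithmetic, assuming 1 MB = 10^6 bytes and taking the nominal 0.5 billion parameters at face value; `bits_per_weight` is an illustrative helper, not part of any tool mentioned here.

```python
def bits_per_weight(file_size_mb: float, params_billion: float) -> float:
    """Average bits stored per parameter, taking 1 MB = 10**6 bytes."""
    total_bits = file_size_mb * 1e6 * 8
    return total_bits / (params_billion * 1e9)


# Figures copied from the specification table.
bpw = bits_per_weight(398, 0.5)
print(f"{bpw:.2f} bits per weight")
```

A result above 4 bits is not surprising for a "4-bit" file: K-quant formats store per-block scales and typically keep some tensors at higher precision, and the 0.5B figure is rounded.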

Hardware Requirements

| Tier | RAM | GPU | Performance |
|---|---|---|---|
| 🟢 Minimum | 1 GB | - | ~10 tok/s |
| 🟡 Recommended | 2 GB | 1 GB | ~50 tok/s |
| 🔵 Optimal | 4 GB | 2 GB | 100+ tok/s |

Platform Support:

  • ✅ Apple Silicon (M1/M2/M3/M4) with Neural Engine
  • ✅ NVIDIA CUDA (Ampere, Ada, Hopper)
  • ✅ AMD ROCm
  • ✅ CPU (AVX2/AVX-512/NEON)
  • ✅ WebGPU (Browser-based inference)


🚀 Quick Start

Option 1: llama.cpp (Recommended)

# Download
wget https://huggingface.co/ruv/ruvltra-claude-code/resolve/main/ruvltra-claude-code-0.5b-q4_k_m.gguf

# Generate code
./llama-cli -m ruvltra-claude-code-0.5b-q4_k_m.gguf \
  -p "Write a Rust function to implement a thread-safe LRU cache:" \
  -n 512 --temp 0.7

Option 2: RuvLLM (Rust Native)

use ruvllm::{
    hub::ModelDownloader,
    inference::InferenceEngine,
    sona::SonaEngine,
};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Download model with SONA weights
    let downloader = ModelDownloader::new();
    let model_path = downloader
        .download("ruv/ruvltra-claude-code", None)
        .await?;
    
    // Initialize with SONA self-learning
    let engine = InferenceEngine::from_gguf(&model_path)?;
    let sona = SonaEngine::attach(&engine)?;
    
    // Generate with learning enabled
    let response = engine.generate_with_learning(
        "Implement async/await error handling:",
        256,
        &sona,
    )?;
    
    // SONA automatically learns from this interaction!
    println!("{}", response);
    Ok(())
}

Option 3: Python

from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Download
model_path = hf_hub_download(
    repo_id="ruv/ruvltra-claude-code",
    filename="ruvltra-claude-code-0.5b-q4_k_m.gguf"
)

# Load with GPU acceleration
llm = Llama(
    model_path=model_path,
    n_ctx=4096,
    n_gpu_layers=-1,  # Use all GPU layers
)

# Generate
output = llm(
    "```python\ndef binary_search(arr, target):",
    max_tokens=256,
    temperature=0.7,
    stop=["```"],
)
print(output["choices"][0]["text"])

Option 4: Swarm Deployment (claude-flow)

# Initialize swarm with RuvLTRA models
npx @claude-flow/cli@latest swarm init \
  --topology hierarchical-mesh \
  --model ruv/ruvltra-claude-code \
  --max-agents 8

# Spawn coordinated agents
npx @claude-flow/cli@latest agent spawn \
  -t coder --name ruvltra-coder-1
npx @claude-flow/cli@latest agent spawn \
  -t reviewer --name ruvltra-reviewer-1

🏗️ Architecture

Self-Learning Pipeline

┌────────────────────────────────────────────────────────────────────┐
│                     RuvLTRA Learning Pipeline                      │
├────────────────────────────────────────────────────────────────────┤
│                                                                    │
│  ┌─────────┐    ┌─────────┐    ┌─────────┐    ┌───────────┐        │
│  │ RETRIEVE│───►│  JUDGE  │───►│ DISTILL │───►│CONSOLIDATE│        │
│  └─────────┘    └─────────┘    └─────────┘    └───────────┘        │
│       │              │              │               │              │
│       ▼              ▼              ▼               ▼              │
│  HNSW Index    Success/Fail    LoRA Adapt    EWC++ Protect         │
│  150x faster    Verdicts       Fine-tune     Memory                │
│                                                                    │
└────────────────────────────────────────────────────────────────────┘
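The four stages in the diagram can be read as a simple loop. The sketch below is a toy illustration of that control flow only, not the RuvLTRA implementation: brute-force cosine search stands in for the HNSW index, an averaged embedding stands in for LoRA fine-tuning, and a simple importance-weighted damping stands in for EWC++. Every function name and data value is hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(memory, query, k=2):
    """RETRIEVE: brute-force nearest neighbours (stand-in for an HNSW index)."""
    return sorted(memory, key=lambda m: cosine(m["embedding"], query), reverse=True)[:k]


def judge(trajectories):
    """JUDGE: keep only trajectories whose outcome was marked successful."""
    return [t for t in trajectories if t["success"]]


def distill(successes):
    """DISTILL: average the successful embeddings into one update direction."""
    if not successes:
        return None
    dim = len(successes[0]["embedding"])
    return [sum(t["embedding"][i] for t in successes) / len(successes) for i in range(dim)]


def consolidate(weights, update, importance, lr=0.1):
    """CONSOLIDATE: apply the update, damped where importance is high."""
    if update is None:
        return weights
    return [w + lr * u / (1.0 + f) for w, u, f in zip(weights, update, importance)]


memory = [
    {"embedding": [1.0, 0.0], "success": True},
    {"embedding": [0.9, 0.1], "success": False},
    {"embedding": [0.0, 1.0], "success": True},
]
weights = [0.0, 0.0]
importance = [0.0, 9.0]   # the second weight is treated as "protected"

hits = retrieve(memory, query=[1.0, 0.0], k=2)
update = distill(judge(hits))
weights = consolidate(weights, update, importance)
```

Here the two nearest memories are retrieved, the failed one is discarded by the judge step, and only the unprotected weight moves.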

Swarm Coordination

                    ┌─────────────┐
                    │    Queen    │
                    │ Coordinator │
                    └──────┬──────┘
                           │
           ┌───────────────┼───────────────┐
           │               │               │
    ┌──────▼──────┐ ┌──────▼──────┐ ┌──────▼──────┐
    │   Worker    │ │   Worker    │ │   Worker    │
    │ (Generator) │ │ (Reviewer)  │ │  (Tester)   │
    └─────────────┘ └─────────────┘ └─────────────┘
           │               │               │
           └───────────────┼───────────────┘
                           │
                    ┌──────▼──────┐
                    │   Shared    │
                    │   Memory    │
                    │   (HNSW)    │
                    └─────────────┘

📈 Benchmarks

Code Generation Quality

| Benchmark | RuvLTRA | CodeLlama-7B | StarCoder-3B |
|---|---|---|---|
| HumanEval | 28.4% | 31.5% | 21.3% |
| MBPP | 35.2% | 38.9% | 29.1% |
| Params | 0.5B | 7B | 3B |

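Note: RuvLTRA scores within about 3 to 4 points of CodeLlama-7B on both benchmarks with 14x fewer parameters (0.5B vs 7B), and outscores StarCoder-3B with 6x fewer parameters.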
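The parameter ratios and score gaps behind that note follow directly from the table. A short sketch of the arithmetic, using only the figures reported above (the `compare` helper is illustrative):

```python
results = {
    # model: (params in billions, HumanEval %, MBPP %), copied from the table
    "RuvLTRA": (0.5, 28.4, 35.2),
    "CodeLlama-7B": (7.0, 31.5, 38.9),
    "StarCoder-3B": (3.0, 21.3, 29.1),
}


def compare(baseline: str, model: str = "RuvLTRA"):
    """Return (parameter ratio, HumanEval gap, MBPP gap) of baseline vs model."""
    p0, h0, m0 = results[model]
    p1, h1, m1 = results[baseline]
    return p1 / p0, round(h1 - h0, 1), round(m1 - m0, 1)


for name in ("CodeLlama-7B", "StarCoder-3B"):
    ratio, he_gap, mbpp_gap = compare(name)
    print(f"{name}: {ratio:.0f}x params, HumanEval {he_gap:+.1f} pts, MBPP {mbpp_gap:+.1f} pts")
```

A positive gap means the baseline scores higher than RuvLTRA; a negative gap means RuvLTRA is ahead.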

Inference Performance

| Platform | Tokens/sec | Memory |
|---|---|---|
| Apple M2 Pro (Metal) | 85 tok/s | 890 MB |
| NVIDIA RTX 4090 | 142 tok/s | 650 MB |
| Intel i9-13900K (CPU) | 18 tok/s | 1.1 GB |
| Raspberry Pi 5 | 4 tok/s | 920 MB |

Self-Learning Metrics

| Metric | Value |
|---|---|
| Adaptation Latency | <0.05ms |
| Learning Retention | 94.2% |
| Pattern Recognition | 89.7% |
| Memory Efficiency | 50-75% reduction |

🔧 Advanced Configuration

SONA Tuning

use ruvllm::sona::SonaConfig;

let config = SonaConfig {
    micro_lora_rank: 2,
    base_lora_rank: 8,
    learning_rate: 0.001,
    ewc_lambda: 0.5,  // Memory protection strength
    pattern_threshold: 0.75,
    ..Default::default()
};
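The `ewc_lambda` comment describes it as a memory protection strength. In standard Elastic Weight Consolidation, a coefficient with that role scales a quadratic penalty on drifting away from previously learned weights. The sketch below shows that textbook penalty in Python; it is an assumption that SONA's `ewc_lambda` plays the same role, and this is plain EWC rather than the EWC++ variant named in this README.

```python
def ewc_penalty(theta, theta_star, fisher, ewc_lambda=0.5):
    """Quadratic EWC penalty: (lambda / 2) * sum_i F_i * (theta_i - theta*_i)^2."""
    return 0.5 * ewc_lambda * sum(
        f * (t - s) ** 2 for t, s, f in zip(theta, theta_star, fisher)
    )


def total_loss(task_loss, theta, theta_star, fisher, ewc_lambda=0.5):
    """New-task loss plus the penalty for drifting from consolidated weights."""
    return task_loss + ewc_penalty(theta, theta_star, fisher, ewc_lambda)


theta_star = [1.0, 2.0]    # weights consolidated after earlier learning
fisher = [4.0, 0.0]        # importance: first weight matters, second does not

moved_first = [2.0, 2.0]   # drift on the important weight
moved_second = [1.0, 3.0]  # same-sized drift on the unimportant weight

penalty_first = ewc_penalty(moved_first, theta_star, fisher)
penalty_second = ewc_penalty(moved_second, theta_star, fisher)
```

Moving the important weight is penalized while the same-sized move on the unimportant weight costs nothing; a larger lambda strengthens that protection at the cost of slower adaptation.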

Quantization Options

| Variant | File | Size | Quality | Speed |
|---|---|---|---|---|
| Q4_K_M | Available | 398 MB | Good | Fast |
| Q8_0 | Coming Soon | ~800 MB | Better | Medium |
| FP16 | Coming Soon | ~1.5 GB | Best | Baseline |

🗺️ Roadmap

  • [x] Initial Q4_K_M release
  • [x] SONA self-learning integration
  • [x] Swarm coordination support
  • [ ] Q8 quantization variant
  • [ ] FP16 fine-tuning base
  • [ ] Larger model variants (3B, 7B)
  • [ ] Browser-native via WebGPU
  • [ ] Mobile SDK (iOS/Android)

🤝 Community


📄 Citation

@misc{ruvltra-claude-code,
  title={RuvLTRA: Self-Learning LLMs for Claude Code},
  author={RuVector Team},
  year={2024},
  publisher={HuggingFace},
  url={https://huggingface.co/ruv/ruvltra-claude-code}
}

📜 License

Apache 2.0 - Free for commercial and personal use.


### 🌟 Star us on GitHub!

[![GitHub Stars](https://img.shields.io/github/stars/ruvnet/ruvector?style=social)](https://github.com/ruvnet/ruvector)

**Built with ❤️ by the RuVector Team**

*The future of AI-assisted development is self-learning.*

⚡ TurboQuant KV-Cache Compression

RuvLTRA models are fully compatible with TurboQuant, a 2-4 bit KV-cache quantization scheme that reduces inference memory by 6-8x with <0.5% quality loss.

| Quantization | Compression | Quality Loss | Best For |
|---|---|---|---|
| 3-bit | 10.7x | % | Recommended: best balance |
| 4-bit | 8x | <0.5% | High quality, long context |
| 2-bit | 32x | ~2% | Edge devices, max savings |

Usage with RuvLLM

cargo add ruvllm    # Rust
npm install @ruvector/ruvllm   # Node.js

use ruvllm::quantize::turbo_quant::{TurboQuantCompressor, TurboQuantConfig, TurboQuantBits};

let config = TurboQuantConfig {
    bits: TurboQuantBits::Bit3_5, // 10.7x compression
    use_qjl: true,
    ..Default::default()
};
let compressor = TurboQuantCompressor::new(config)?;
let compressed = compressor.compress_batch(&kv_vectors)?;
let scores = compressor.inner_product_batch_optimized(&query, &compressed)?;
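To give a feel for what low-bit KV-cache quantization trades off, here is a minimal sketch of plain uniform scalar quantization in Python. This is deliberately not TurboQuant (whose internals are not described in this README); it is a simpler stand-in technique, and the function names and sample vector are assumptions. The ratio helper assumes compression is measured against 32-bit floats.

```python
def quantize(values, bits):
    """Map floats to integer codes in [0, 2**bits - 1] with one scale and offset."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Recover approximate floats from the integer codes."""
    return [lo + c * scale for c in codes]


def compression_ratio(bits, baseline_bits=32):
    """Storage ratio versus a baseline float width (ignores scale/offset overhead)."""
    return baseline_bits / bits


kv = [-1.0, -0.5, 0.0, 0.25, 1.0]          # toy stand-in for one KV vector
codes, lo, scale = quantize(kv, bits=3)
approx = dequantize(codes, lo, scale)
max_err = max(abs(a - b) for a, b in zip(kv, approx))   # bounded by scale / 2

for b in (3, 4):
    print(f"{b}-bit: {compression_ratio(b):.1f}x vs 32-bit floats")
```

Under the 32-bit baseline assumption, this ratio reproduces the 10.7x and 8x figures quoted for 3-bit and 4-bit; the 32x figure listed for 2-bit is not reproduced by this simple ratio, which gives 16x.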

v2.1.0 Ecosystem

  • Hybrid Search: Sparse + dense vectors with RRF fusion (20-49% better retrieval)
  • Graph RAG: Knowledge graph + community detection for multi-hop queries
  • DiskANN: Billion-scale SSD-backed ANN with <10ms latency
  • FlashAttention-3: IO-aware tiled attention, O(N) memory
  • MLA: Multi-Head Latent Attention (~93% KV-cache compression)
  • Mamba SSM: Linear-time selective state space models
  • Speculative Decoding: 2-3x generation speedup
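The RRF fusion mentioned in the Hybrid Search item refers to Reciprocal Rank Fusion, which combines several rankings by summing 1 / (k + rank) for each document. The sketch below shows the general technique in Python; the constant k = 60 is the commonly used default, and the document ids and function name are made up for illustration. It says nothing about how RuVector implements the feature.

```python
def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of ids; score(d) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


sparse = ["doc_a", "doc_b", "doc_c"]   # e.g. a keyword-based ranking
dense = ["doc_b", "doc_d", "doc_a"]    # e.g. an embedding-similarity ranking

fused = rrf_fuse([sparse, dense])
print(fused)
```

Documents that rank well in both lists rise to the top, and no score normalization between the sparse and dense retrievers is needed because only ranks are used.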

RuVector GitHub | ruvllm crate | @ruvector/ruvllm npm


Benchmarks (L4 GPU, 24GB VRAM)

| Metric | Result |
|---|---|
| Inference Speed | 67.1 tok/s |
| Model Load Time | 2.35s |
| Parameters | 0.5B |
| TurboQuant KV (3-bit) | 10.7x compression, % PPL loss |
| TurboQuant KV (4-bit) | 8x compression, <0.5% PPL loss |

Benchmarked on Google Cloud L4 GPU via ruvltra-calibration Cloud Run Job (2026-03-28)