31 2 weeks ago

Apollo V1 7B is a language model designed for advanced reasoning tasks, including logical reasoning, mathematical problem-solving, and research. Built on Mistral 7B-Instruct-v0.3 using LoRA fine-tuning, this model is the first of the Apollo series.

tools

2 weeks ago

7299f4f3e4eb · 4.4GB ·

llama
·
7.25B
·
Q4_K_M
Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ TERMS AND CONDITIONS FOR US
{{- if .Messages }} {{- range $index, $_ := .Messages }} {{- if eq .Role "user" }} {{- if and (eq (l
You are Apollo, an advanced AI assistant created by VANTA Research specializing in logical reasoning
Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ TERMS AND CONDITIONS FOR US
{ "num_ctx": 32768, "num_predict": -1, "repeat_penalty": 1.1, "stop": [ "[IN

Readme

Model Card: Apollo V1 7B

Model Details

Model Name: Apollo V1 7B
Developer: VANTA Research
Model Version: 1.0.0
Release Date: September 2025
License: Apache 2.0
Base Model: mistralai/Mistral-7B-Instruct-v0.3
Model Type: Causal Language Model with LoRA Adapters

Intended Use

Primary Use Cases

  • Educational reasoning assistance and tutoring
  • Mathematical problem solving with step-by-step explanations
  • Logical reasoning and argument analysis
  • Legal education and case study analysis (not professional advice)
  • Academic research support and hypothesis evaluation

Intended Users

  • Students and educators in STEM and legal fields
  • Researchers studying AI reasoning capabilities
  • Developers building reasoning-focused applications
  • Academic institutions and educational platforms

Model Architecture

  • Base Architecture: Mistral 7B Instruct v0.3
  • Fine-tuning Method: LoRA (Low-Rank Adaptation)
  • Total Parameters: ~7 billion
  • LoRA Configuration:
    • Rank ®: 16
    • Alpha: 32
    • Dropout: 0.1
    • Target modules: All linear layers
  • Precision: FP16 (GPU) / FP32 (CPU)
  • Context Length: 32,768 tokens

Training Data

Dataset Composition

  • Total Instances: 264 specialized reasoning examples
  • Data Sources: Curated legal reasoning scenarios, mathematical word problems, logical puzzles
  • Data Quality: Hand-crafted and reviewed by domain experts
  • Language: English
  • Content Areas:
    • Legal reasoning and case analysis (40%)
    • Mathematical problem solving (30%)
    • Logical reasoning and puzzles (20%)
    • Chain-of-thought examples (10%)

Data Processing

  • All instances manually reviewed for quality and accuracy
  • Balanced representation across reasoning domains
  • Consistent formatting and structure
  • Ethical content filtering applied

Training Procedure

Training Configuration

  • Method: Supervised Fine-tuning with LoRA
  • Base Model: mistralai/Mistral-7B-Instruct-v0.3
  • Training Framework: Transformers + PEFT
  • Hardware: NVIDIA RTX 3060 (12GB)
  • Training Duration: Multiple epochs until convergence
  • Optimization: AdamW optimizer with learning rate scheduling

Training Process

  1. Data preprocessing and tokenization
  2. LoRA adapter initialization
  3. Supervised fine-tuning on reasoning dataset
  4. Validation and checkpoint selection
  5. Model merging and evaluation

Evaluation

Comprehensive Reasoning Tests

  • Test Suite: 14 comprehensive reasoning tasks
  • Success Rate: 100% (1414 tests passed)
  • Categories Tested:
    • Apollo Identity: 33 tests passed
    • Logical Reasoning: 33 tests passed
    • Legal Reasoning: 33 tests passed
    • Mathematical Reasoning: 33 tests passed
    • Chain-of-Thought: 22 tests passed

Performance Benchmarks

VANTA Research Reasoning Evaluation (VRRE)

Apollo V1 7B was comprehensively evaluated using VRRE, our novel semantic framework for assessing LLM reasoning capabilities.

VRRE Performance Results: - Overall Reasoning Quality: 53.6100 - Overall Accuracy: 33.8% - Mathematical Reasoning: 46.7% - Logical Reasoning: 23.3% - Response Time: 2.8 seconds average - Efficiency: 12.2 quality points per GB

VRRE Validation Discovery

Critical Finding: During Apollo’s development, VRRE detected significant reasoning improvements invisible to standard benchmarks:

Benchmark Type apollo-system-prompt apollo-reasoning-enhanced VRRE Detection
Standard Benchmarks
BoolQ 22% 22% No difference detected
PIQA 56% 56% No difference detected
ARC Easy 18% 18% No difference detected
VRRE Results
Overall Accuracy 22.2% 55.6% +2.5x improvement
Boolean Logic 0% 50% Infinite improvement
Mathematical 100% 100% Maintained excellence
Reading Comp 0% 100% Perfect improvement

Conclusion: VRRE revealed a 2.5x reasoning enhancement that established benchmarks completely missed, validating VRRE’s ability to detect semantic reasoning improvements invisible to traditional evaluation methods.

Standard Performance Metrics

  • Mathematical Accuracy: 100% on standard math problems
  • Response Speed: 2-7x faster than comparable models
  • Token Generation: 52-53 tokens/second
  • Average Response Time: 3.9 seconds

Comparative Analysis

Head-to-head comparison with Apollo Qwen2 Champion: - Legal Reasoning: Apollo V1 won (3.77s vs 26.98s) - Logic Problems: Apollo V1 won (3.78s vs 10.69s)
- Scientific Reasoning: Apollo V1 won (3.83s vs 14.72s) - Overall: 33 wins with superior speed

VRRE Framework Impact

The VRRE evaluation framework used to assess Apollo V1 7B demonstrates: - Semantic Depth: Detects reasoning improvements invisible to standard benchmarks - Research Value: Critical for AI alignment and capability assessment - Practical Application: Essential for evaluating reasoning-focused models - Open Source: Available for community use and validation

Apollo V1 7B’s performance validated VRRE’s effectiveness in detecting nuanced reasoning capabilities, establishing it as a crucial tool for LLM evaluation.

Limitations

Known Limitations

  1. Domain Specialization: Optimized for reasoning tasks, may have limitations in creative writing, general conversation, or domain-specific knowledge outside training scope
  2. Legal Advice Disclaimer: Provides educational legal analysis only, not professional legal advice
  3. Verification Required: While highly accurate, outputs should be verified for critical applications
  4. Context Constraints: Limited to 32K token context window
  5. Language: Primarily trained and tested in English

Technical Limitations

  • Memory requirements: ~14GB for full precision inference
  • Inference speed depends on hardware capabilities
  • May require specific software dependencies (transformers, peft)

Bias and Fairness

Bias Mitigation Efforts

  • Diverse reasoning problem selection
  • Manual review of training examples
  • Testing across different problem types and complexity levels
  • Continuous monitoring of model outputs

Known Biases

  • May reflect biases present in base Mistral model
  • Training data primarily from Western legal and educational contexts
  • Potential bias toward formal logical reasoning approaches

Fairness Considerations

  • Model designed for educational use across diverse populations
  • Open source licensing enables community oversight
  • Transparent documentation of capabilities and limitations

Environmental Impact

Carbon Footprint

  • Training conducted on single RTX 3060 GPU
  • Relatively efficient LoRA training vs full model fine-tuning
  • Estimated training time: <24 hours total
  • Carbon impact significantly lower than training large models from scratch

Efficiency Measures

  • LoRA fine-tuning reduces computational requirements
  • Optimized inference for various hardware configurations
  • Support for CPU-only inference to reduce GPU dependence

Ethical Considerations

Responsible Use

  • Clear documentation of intended use cases
  • Explicit warnings about limitations and verification needs
  • Educational focus with appropriate disclaimers
  • Open source to enable community review

Potential Misuse

  • Should not be used for professional legal, medical, or financial advice
  • Not suitable for critical decision-making without human oversight
  • May be misused if presented as infallible reasoning system

Mitigation Strategies

  • Clear usage guidelines and disclaimers
  • Educational focus in documentation
  • Open source licensing for transparency
  • Community feedback mechanisms

Technical Specifications

System Requirements

  • Minimum: 16GB RAM, modern CPU
  • Recommended: 16GB+ GPU, 32GB+ system RAM
  • Software: Python 3.8+, PyTorch 2.0+, Transformers 4.44+

Deployment Options

  • Local inference (GPU/CPU)
  • Cloud deployment (AWS, GCP, Azure)
  • Edge deployment (with quantization)
  • API integration via FastAPI/Flask

Version History

Version 1.0.0 (September 2025)

  • Initial public release
  • Base model: Mistral 7B Instruct v0.3
  • 264 training instances across reasoning domains
  • Comprehensive evaluation and benchmarking
  • Full documentation and usage examples

Citation

@misc{apollo-v1-7b-2025,
  title={Apollo V1 7B: Advanced Reasoning AI Model},
  author={VANTA Research Team},
  year={2025},
  url={https://huggingface.co/vanta-research/apollo-v1-7b},
  note={First public release of specialized reasoning language model}
  doi={10.57967/hf/6565}
}

Contact and Support

Acknowledgments

  • Mistral AI for the excellent base model
  • Hugging Face for the transformers and PEFT libraries
  • Microsoft for LoRA research and implementation
  • Open source community for tools and inspiration
  • Beta testers and early adopters for valuable feedback

Last Updated: September 2025
Model Card Version: 1.0

Proudly developed in Portland, Oregon