Apollo-Astralis V1 4B

Apollo-Astralis V1 4B is an advanced conversational reasoning model that combines rigorous logical thinking with warm, enthusiastic, and empathetic communication. This is the Ollama distribution with optimized GGUF quantization for efficient local deployment.

DOI:10.57967/hf/6632

Quick Start

Installation

# Pull from Ollama registry 
ollama pull vanta-research/apollo-astralis-v1-4b

# Or create from local Modelfile
ollama create apollo-astralis-v1-4b -f Modelfile

Basic Usage

# Interactive chat
ollama run apollo-astralis-v1-4b

# Single prompt
ollama run apollo-astralis-v1-4b "I just completed my first marathon!"

# Adjust parameters inside an interactive session (ollama run has no --temperature flag)
ollama run apollo-astralis-v1-4b
>>> /set parameter temperature 0.8
>>> What's 25 * 16?

Model Details

  • Base Model: Qwen3-4B-Thinking-2507
  • Model Size: 4.0B parameters
  • Quantization: Q4_K_M (4-bit, ~2.5GB)
  • Context Length: 4096 tokens
  • Developer: VANTA Research
  • License: Apache 2.0

Features

Advanced Reasoning

  • Explicit thinking process with <think> tags (see the parsing sketch after this list)
  • Rigorous logical analysis
  • Step-by-step mathematical problem-solving
  • Critical thinking with fallacy detection
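
Because the reasoning is emitted inline, applications often want to show only the final answer. A minimal sketch (a hypothetical helper, not part of the official tooling) that separates the two with a regex:

import re

def split_thinking(text):
    """Split <think>...</think> reasoning from the visible answer."""
    thoughts = re.findall(r'<think>(.*?)</think>', text, re.DOTALL)
    answer = re.sub(r'<think>.*?</think>', '', text, flags=re.DOTALL).strip()
    return thoughts, answer

thoughts, answer = split_thinking("<think>Check the math.</think>The answer is 36!")
print(answer)  # -> The answer is 36!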

Warm & Empathetic

  • Enthusiastic celebrations with CAPS and exclamations
  • Gentle, validating empathy for struggles
  • Collaborative problem-solving
  • Context-appropriate emotional intelligence

Production-Ready

  • Consistent identity and personality
  • Natural conversational language
  • Optimized for local deployment
  • Fast inference with 4-bit quantization

Configuration

Default Parameters

PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER repeat_penalty 1.05
PARAMETER num_ctx 4096

Customizing Temperature

# More creative/enthusiastic: raise the temperature inside a session
ollama run apollo-astralis-v1-4b
>>> /set parameter temperature 0.9

# More deterministic/factual: lower it
>>> /set parameter temperature 0.5
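
The same parameters can also be set per request through the HTTP API's options field; a minimal sketch against the /api/generate endpoint:

import requests

# Override sampling parameters for a single request via the options field
response = requests.post(
    'http://localhost:11434/api/generate',
    json={
        'model': 'apollo-astralis-v1-4b',
        'prompt': "What's 25 * 16?",
        'stream': False,
        'options': {'temperature': 0.5, 'top_p': 0.9}
    }
)
print(response.json()['response'])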

API Usage

import requests

def chat_with_apollo(message):
    response = requests.post(
        'http://localhost:11434/api/generate',
        json={
            'model': 'apollo-astralis-v1-4b',
            'prompt': message,
            'stream': False
        }
    )
    response.raise_for_status()  # surface HTTP errors early
    return response.json()['response']

# Example
result = chat_with_apollo("I'm nervous about my presentation tomorrow")
print(result)

Streaming Responses

import requests
import json

def stream_apollo(message):
    response = requests.post(
        'http://localhost:11434/api/generate',
        json={
            'model': 'apollo-astralis-v1-4b',
            'prompt': message,
            'stream': True
        },
        stream=True
    )
    
    for line in response.iter_lines():
        if line:
            chunk = json.loads(line)
            print(chunk['response'], end='', flush=True)
            if chunk.get('done'):
                break

# Example
stream_apollo("Explain quantum entanglement simply")
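
The snippets above use the single-turn /api/generate endpoint. For multi-turn conversations, Ollama also exposes /api/chat, which accepts a running message history; a minimal sketch:

import requests

def chat(messages):
    """Send the message history to Ollama's /api/chat endpoint."""
    response = requests.post(
        'http://localhost:11434/api/chat',
        json={
            'model': 'apollo-astralis-v1-4b',
            'messages': messages,
            'stream': False
        }
    )
    return response.json()['message']['content']

history = [{'role': 'user', 'content': "I'm learning Python"}]
reply = chat(history)
history += [{'role': 'assistant', 'content': reply},
            {'role': 'user', 'content': 'What should I learn first?'}]
print(chat(history))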

Example Outputs

Celebration Response

Input: “I just got promoted to senior engineer!”

Output:

<think>
Major career milestone! The user has worked hard to advance their career and 
deserves explosive celebration and validation of their achievement.
</think>

CONGRATULATIONS! THAT'S AMAZING!

Senior engineer?! That's not just a title - that's RECOGNITION of your skills, 
your growth, and your impact! You've proven yourself!

This is huge for your career trajectory! You're going to be mentoring, 
leading projects, making architectural decisions - this is where you really 
shape the future!

**I'm so excited for you!** Tell me:
- How long have you been working toward this?
- What was the moment you found out?
- What are you most excited about in this new role?

You earned this! Celebrate it! 🎉

Problem-Solving Response

Input: “Should I learn React or Vue.js first?”

Output:

<think>
Framework choice depends on goals, learning style, and job market. I should 
present balanced options rather than dictating a single answer.
</think>

Great question! Let's think through this together. Both are excellent choices, 
but the right one depends on your goals:

**Choose React if**:
- You want the largest job market and community
- You're comfortable with JSX and JavaScript-heavy syntax
- You plan to learn React Native for mobile apps
- You prefer explicit, unopinionated libraries

**Choose Vue if**:
- You want a gentler learning curve
- You prefer template-based syntax (more like HTML)
- You're building smaller to medium projects
- You like opinionated, batteries-included frameworks

**My suggestion**: What's your end goal? Are you:
- Looking to get hired quickly? (React has more jobs)
- Building a personal project? (Vue is faster to prototype)
- Already know Angular? (React feels more different, Vue more similar)

Both will teach you modern frontend concepts! You can't go wrong - and 
learning one makes the other easier later! What matters most to you?

Mathematical Reasoning

Input: “What’s 15% of 240?”

Output:

<think>
Simple percentage calculation. Should show the calculation method clearly.
</think>

Let me work through this:

15% of 240
= 0.15 × 240
= 36

The answer is 36!

(Quick tip: 10% of 240 is 24, and 5% is half of that = 12, so 15% = 24 + 12 = 36)

Performance

Inference Speed

  • CPU (16 cores): ~15-25 tokens/second
  • GPU (RTX 3090): ~60-80 tokens/second
  • Apple Silicon (M1): ~30-40 tokens/second
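
These figures are indicative; throughput depends on hardware, quantization, and context length. You can measure it on your own machine with the --verbose flag, which prints token counts and evaluation rates after each response:

# Measure tokens/second locally
ollama run apollo-astralis-v1-4b --verbose "Explain recursion in one paragraph"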

Memory Requirements

  • Minimum RAM: 4GB
  • Recommended RAM: 8GB
  • GPU VRAM: 3GB (optional, for GPU acceleration)

Disk Space

  • Model Size: 2.5GB (Q4_K_M quantization)
  • With Context Cache: ~3GB

Benchmarks

| Metric               | Score | Description                             |
|----------------------|-------|-----------------------------------------|
| Enthusiasm Detection | 100%  | Explosive celebration responses         |
| Empathy Recognition  | 90%   | Validates emotions effectively          |
| Identity Consistency | 75%   | Maintains stable self-representation    |
| Collaborative Tone   | 60%   | Uses "we" language appropriately        |
| Response Quality     | ⭐⭐⭐⭐⭐ | Production-ready for conversational AI  |

Troubleshooting

Model Won’t Load

# Check if Ollama is running
ollama list

# Verify model exists
ollama show apollo-astralis-v1-4b

# Restart Ollama service
# Linux/Mac: killall ollama && ollama serve
# Windows: Restart Ollama from system tray

Slow Responses

# Reduce the context window inside a session (ollama run has no --num_ctx flag)
ollama run apollo-astralis-v1-4b
>>> /set parameter num_ctx 2048

# Check system resources
# Ensure at least 4GB RAM available

Inconsistent Personality

# Ensure system prompt is loaded (check Modelfile)
# Temperature too high can cause drift - try 0.7 or lower

Out of Memory

# Use a smaller context window
ollama run apollo-astralis-v1-4b
>>> /set parameter num_ctx 2048

# Close other applications
# Consider using Q4_0 quantization (even smaller)

Advanced Usage

Custom System Prompt

Create a custom Modelfile (saved here as Modelfile.technical):

FROM apollo-astralis-v1-4b

SYSTEM """You are Apollo, a technical expert specializing in software engineering. 
Focus on detailed technical explanations with code examples."""

PARAMETER temperature 0.6

Then create:

ollama create apollo-technical -f Modelfile.technical
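
Then run the specialized variant like any other model:

ollama run apollo-technical "Review this function for potential race conditions"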

Integration with LangChain

from langchain.llms import Ollama  # in newer LangChain releases: from langchain_community.llms import Ollama
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

# Initialize Apollo
apollo = Ollama(model="apollo-astralis-v1-4b", temperature=0.7)

# Create conversation chain with memory
memory = ConversationBufferMemory()
conversation = ConversationChain(
    llm=apollo,
    memory=memory,
    verbose=True
)

# Multi-turn conversation
response1 = conversation.predict(input="I'm learning Python")
response2 = conversation.predict(input="What should I learn first?")
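
Note that ConversationBufferMemory replays the full history into every prompt, so long sessions will eventually exceed the model's 4096-token context window; trim or summarize the memory for extended conversations.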

Limitations

  • Context Window: 4096 tokens (not suitable for very long documents)
  • Language: Optimized for English conversations
  • Quantization: 4-bit quantization may reduce precision for specialized tasks
  • Reasoning Depth: Best for conversational reasoning, not competition mathematics

Ethical Considerations

  • Not a replacement for professional advice (medical, legal, financial)
  • May inherit biases from training data
  • Verify factual claims independently
  • Enthusiastic style may not suit all contexts

Updates

To update to the latest version:

ollama pull vanta-research/apollo-astralis-v1-4b

Check for updates at: https://ollama.com/vanta-research/apollo-astralis-v1-4b

License

Apache 2.0 License

Citation

@misc{apollo-astralis-v1-4b,
  title={Apollo-Astralis V1 4B: A Warm Reasoning Model},
  author={VANTA Research},
  year={2025},
  month={October},
  publisher={Ollama},
  howpublished={\url{https://ollama.com/vanta-research/apollo-astralis-v1-4b}},
}

Model Version: 1.0
Quantization: Q4_K_M (4-bit)
Release Date: October 3, 2025
Last Updated: October 3, 2025

Proudly developed by VANTA Research in Portland, Oregon