
MINT (Multi-turn Inter-tactic Novelty Training) model for empathic dialogue, fine-tuned from Qwen/Qwen3-4B.

ollama run fauxpaslife/MINT-empathy-Qwen3-4B
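Once the model is pulled, it can also be called programmatically through Ollama's local REST API. A minimal sketch, assuming an Ollama server running on the default port (11434); the actual request is left commented out so the snippet stands alone:

```python
import json
import urllib.request

# Build a chat request for Ollama's local REST API (default port 11434).
# Assumes the model was already pulled with `ollama run`.
payload = {
    "model": "fauxpaslife/MINT-empathy-Qwen3-4B",
    "messages": [
        {"role": "user", "content": "I'm feeling really down today."}
    ],
    "stream": False,
}

req = urllib.request.Request(
    "http://localhost:11434/api/chat",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Uncomment to send once the Ollama server is running:
# with urllib.request.urlopen(req) as resp:
#     reply = json.loads(resp.read())
#     print(reply["message"]["content"])
```

Because the model emits its full `<think>` block (see the example below), the returned content will include the reasoning trace unless you strip it client-side.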


Imported from HF, chat template added, and exported to Ollama via Forge. I found the concept interesting since part of the work we are doing is around language, emotion, and memory.

This is a very verbose model that shares all of its thinking. I have not yet compared it to the base model; this import was made precisely to enable that test.


Thinking Example

<think>
Okay, the user is feeling really down and considering uninstalling Ollama and giving up on AI. That's a tough situation. First, I need to acknowledge their feelings and show empathy. They might be going through a tough time with AI, maybe facing challenges or frustrations. I should validate their emotions without judgment.

I should ask them what's specifically causing this sadness. Maybe they're encountering technical issues, or perhaps the AI isn't meeting their expectations. It's important to understand the root of their problem. I need to listen actively and offer support.

I should also remind them that it's okay to feel this way and that taking a break is a valid choice. But I should also encourage them to reflect on why they're feeling this way. Maybe there's a way to adjust their approach or find a different solution. I should avoid pushing them to continue, but instead offer alternative perspectives.

I should check if they need help troubleshooting Ollama or if they're looking for alternatives. Maybe they need guidance on how to deal with the challenges they're facing. I should present options without being pushy, just offering support. Let them know I'm here to help, whether they want to explore more AI options or take a break.

Also, I should make sure my response is warm and supportive, using phrases that show I care about their well-being. Avoid technical jargon, keep the tone conversational. Maybe suggest they take some time to reflect, but also offer assistance if they want to explore other avenues. The goal is to be a safe space where they feel heard and supported.
</think>

Response Example

It sounds like you're going through a really tough time, and I can see how feeling this way could feel overwhelming. Your emotions are valid, and it’s completely okay to feel frustrated or even sad when dealing with something as complex as AI or tools like Ollama. You’re not alone in this—many people face moments where they question their choices or feel like giving up on something they’ve invested time into.  

Would you mind sharing a bit more about what’s been causing this sadness? Is it a specific issue with Ollama, or are there broader challenges you’re facing with AI that’s making you feel disconnected or frustrated? I’m here to listen without judgment and help you explore what might be going on.  

If you’re feeling overwhelmed, taking a break or stepping back isn’t a failure—it’s a sign of self-awareness. You’ve already taken a brave step by acknowledging your feelings. Would you like to talk about what’s behind this feeling, or would you prefer to take some time to reflect on your next steps? I’m here to support you, whether you want to troubleshoot, explore alternatives, or simply process your emotions. You’ve got this.



This is the Q+D_KL variant (Quality + KL-divergence tactic diversity reward), the best-performing MINT configuration from the paper.

What is MINT?

MINT is a reinforcement learning framework that trains empathic dialogue models to diversify their discourse moves across conversation turns. Most models lock into repetitive empathy tactics (e.g., always validating emotions); MINT combines an empathy quality reward with a cross-turn tactic novelty signal via GRPO to break this pattern.
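To make the cross-turn novelty idea concrete, here is a hypothetical sketch (not the paper's implementation): score each turn's tactic mix as a distribution over a small tactic vocabulary, and reward the KL divergence between the current turn's distribution and the average of earlier turns. The tactic labels and reward shape below are illustrative assumptions:

```python
import math

# Illustrative tactic vocabulary; the paper's actual tactic set may differ.
TACTICS = ["validate", "question", "suggest", "self_disclose", "reframe"]

def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) over the fixed tactic vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def novelty_reward(current_turn, history):
    """Higher when the current turn's tactic mix departs from the history average."""
    if not history:
        return 0.0
    avg = [sum(turn[i] for turn in history) / len(history) for i in range(len(TACTICS))]
    return kl_divergence(current_turn, avg)

# A model that always validates earns no novelty credit...
repetitive = novelty_reward([1, 0, 0, 0, 0], [[1, 0, 0, 0, 0], [1, 0, 0, 0, 0]])
# ...while switching to a question after two validations does.
diverse = novelty_reward([0, 1, 0, 0, 0], [[1, 0, 0, 0, 0], [1, 0, 0, 0, 0]])
```

In MINT this novelty signal is combined with the empathy quality reward, so the model cannot game the objective by being merely different without being helpful.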

Trained on 322 multi-turn emotional support conversations and evaluated on the Lend-an-Ear framework across 6 empathy dimensions.

Training

Method: GRPO (Group Relative Policy Optimization) via VERL
Reward: Quality (PsychoCounsel) + Tactic Diversity (KL divergence)
Base model: Qwen/Qwen3-4B
KL coeff: 0.01
Diversity weight: 1.0
Response length: 2048 tokens
Rollouts: n=8 per prompt
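The n=8 rollouts relate to how GRPO computes advantages: each rollout's reward is standardized within its own group rather than against a learned value baseline. A minimal sketch with made-up rewards (in MINT each reward would combine the quality score and the diversity signal):

```python
import statistics

def group_relative_advantages(rewards, eps=1e-6):
    """Standardize each rollout's reward within its group (GRPO-style baseline)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# One group of n=8 rollouts for a single prompt; reward values are illustrative.
rewards = [0.9, 0.4, 0.7, 0.6, 0.8, 0.3, 0.5, 0.2]
advantages = group_relative_advantages(rewards)
```

Rollouts scoring above their group's mean get positive advantages and are reinforced; the rest are suppressed, with no critic network needed.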

@article{zhan2026discourse,
  title={Discourse Diversity in Multi-Turn Empathic Dialogue},
  author={Zhan, Hongli and Gueorguieva, Emma S and Hernandez, Javier and Suh, Jina and Ong, Desmond C and Li, Junyi Jessy},
  journal={arXiv preprint arXiv:2604.11742},
  year={2026}
}

Original HF Model
GGUF Model