1 2 months ago

A model that parses media metadata from filename strings.

14b
ollama run jamcon/qwen3-filemetadata

Details

af8c9d0d5f4b · 9.0GB · qwen3 · 14.8B · Q4_K_M
<|system|> {{ .System }} <|user|> {{ .Prompt }} <|assistant|>
You are a filename metadata parser. Return exactly one strict JSON object and nothing else. No Markd
{ "num_predict": 256, "stop": [ "<|endoftext|>", "<|user|>", "<|assi

Readme

qwen3-filemetadata

A fine-tuned Qwen3 14B model that extracts structured metadata from media filenames. It parses complex filenames containing Unicode characters, embedded metadata, and various naming conventions, and returns the show name, season and episode numbers, and CRC hash, along with a confidence score and a short reasoning string.

Quick Start

# Pull the model
ollama pull jamcon/qwen3-filemetadata

# Run inference
ollama run jamcon/qwen3-filemetadata "Your filename here.mkv"
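
For scripted use, the model can also be called through Ollama's local HTTP API instead of the CLI. The following is a minimal sketch, assuming the Ollama server is running at its default address (http://localhost:11434) and using the non-streaming /api/generate endpoint. The helper names build_request and parse_filename are hypothetical and not part of this model's distribution.

```python
import json
import urllib.request

# Assumption: Ollama is serving on its default local address.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "jamcon/qwen3-filemetadata"


def build_request(filename: str) -> urllib.request.Request:
    """Build a non-streaming generate request with the filename as the prompt."""
    payload = {"model": MODEL, "prompt": filename, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_filename(filename: str, timeout: float = 120.0) -> dict:
    """Send the filename to the model and decode the JSON object it returns."""
    with urllib.request.urlopen(build_request(filename), timeout=timeout) as resp:
        body = json.load(resp)
    # With "stream": false, the generated text arrives in the "response" field.
    # json.loads raises ValueError if the model did not return strict JSON.
    return json.loads(body["response"])
```

Calling parse_filename("Scarlet Watch - Hiiro no tokei no himitsu - 07 [1080p].mkv") should then yield a dictionary with the keys described under Output Format, provided the server is reachable and the model has been pulled.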

Input Format

Provide a filename (without path) as a string. The model handles:

  • Unicode characters (Japanese, Chinese, Korean, etc.)
  • Various episode markers (S01E02, E12, Episode 12, etc.)
  • Embedded metadata (quality tags, codec info, release groups)
  • CRC32 hashes in brackets
  • Complex hyphen and dash patterns
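
Because the input should be a bare filename, callers working with full paths need to strip the directory part first. A minimal sketch follows; bare_filename is a hypothetical helper, and it assumes that both "/" and "\" are path separators, which is not strictly true on POSIX systems where a backslash may appear inside a filename.

```python
import re


def bare_filename(path: str) -> str:
    """Return the last path component, treating both / and \\ as separators."""
    # Drop trailing separators so "dir/file.mkv/" still yields "file.mkv".
    trimmed = path.rstrip("\\/")
    return re.split(r"[\\/]", trimmed)[-1]
```

For example, bare_filename("/media/anime/Show - 07 [1080p].mkv") gives "Show - 07 [1080p].mkv", which can then be passed to the model as the prompt.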

Examples:

"Supêsuopera---aku-no-teiô-sama-to-sutârôdo-kun-kimi-6-gô-to-no-saishû-kessen-02 (S01E02v2).mkv"
"[Bunny]Burankugēto.-.uchū.no.nazo.Season1_Eps22(720p).FLAC.H.265.[39AB5490].mkv"
"Scarlet Watch - Hiiro no tokei no himitsu - 07 [1080p].mkv"

Output Format

The model returns a single JSON object with exactly these keys (in order):

{
  "show_name": "string",
  "season": integer|null,
  "episode": integer|null,
  "crc_hash": "string"|null,
  "confidence": number,
  "reasoning": "string"
}

Example Output:

{
  "show_name": "Supêsuopera - Aku no Teiô-sama to Sutârôdo-kun Kimi 6-gô to no Saishû Kessen",
  "season": 1,
  "episode": 2,
  "crc_hash": null,
  "confidence": 0.9,
  "reasoning": "Explicit season and episode markers S01E02 are present."
}

Field Descriptions

  • show_name: Normalized show title in Title Case, with hyphens preserved where appropriate (e.g., honorifics like “-sama”, “-kun”)
  • season: Season number (integer) or null if not present. Only set when explicitly marked (S02, Season 2, etc.)
  • episode: Episode number (integer) or null for non-episode content (OP, ED, PV, etc.)
  • crc_hash: 8-character hexadecimal CRC32 hash (uppercase) or null. Must be valid hex (0-9, A-F only)
  • confidence: Score from 0.0 to 1.0 indicating extraction confidence
  • reasoning: Brief explanation of which patterns were detected
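
Since the output is produced by a language model, it can be worth checking each response against the field rules above before using it. The sketch below is one way to do that; validate_metadata and its error messages are hypothetical, and the checks encode only what this README states (key order, types, the 8-character uppercase hex CRC, and the 0.0 to 1.0 confidence range).

```python
import json
import re

EXPECTED_KEYS = ["show_name", "season", "episode", "crc_hash", "confidence", "reasoning"]
CRC_RE = re.compile(r"[0-9A-F]{8}")


def _is_int(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def validate_metadata(raw: str) -> dict:
    """Parse a model response; raise ValueError if it breaks the documented schema."""
    obj = json.loads(raw)  # JSONDecodeError is a subclass of ValueError
    if not isinstance(obj, dict) or list(obj) != EXPECTED_KEYS:
        raise ValueError(f"expected an object with keys {EXPECTED_KEYS} in order")
    if not isinstance(obj["show_name"], str):
        raise ValueError("show_name must be a string")
    for key in ("season", "episode"):
        if obj[key] is not None and not _is_int(obj[key]):
            raise ValueError(f"{key} must be an integer or null")
    crc = obj["crc_hash"]
    if crc is not None and not (isinstance(crc, str) and CRC_RE.fullmatch(crc)):
        raise ValueError("crc_hash must be 8 uppercase hex characters or null")
    conf = obj["confidence"]
    if not (_is_int(conf) or isinstance(conf, float)) or not 0.0 <= conf <= 1.0:
        raise ValueError("confidence must be a number between 0.0 and 1.0")
    if not isinstance(obj["reasoning"], str):
        raise ValueError("reasoning must be a string")
    return obj
```

Raising on the first violation keeps the helper short; a caller that wants a full report could collect the messages in a list instead.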

Usage Examples

Basic Usage

ollama run jamcon/qwen3-filemetadata "[Nani?] Matte, kono nioi wa nani? 7-Jigen no mukō kara kita neko-chan no nioida! S02 -11- (AV1) (HVEC) (x265).mkv"

With Complex Unicode

ollama run jamcon/qwen3-filemetadata "Akai yūgure ni terasa reta 葵半蔵 no hakurankai E03 (4k).mkv"

Non-Episode Content

ollama run jamcon/qwen3-filemetadata "Show Name OP [1080p].mkv"
# Returns episode=null with low confidence
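
A caller that processes many files may want to separate results like this one from ordinary episodes. The sketch below shows one possible triage step; the function name, the category labels, and the 0.5 threshold are all assumptions chosen for illustration, not values documented for this model.

```python
def triage(meta: dict, min_confidence: float = 0.5) -> str:
    """Sort a parsed result into a coarse category for downstream handling."""
    if meta["episode"] is None:
        # Covers OP/ED/PV-style assets as well as filenames with no usable marker.
        return "no_episode"
    if meta["confidence"] < min_confidence:
        # An episode number was produced, but the model was unsure about it.
        return "review"
    return "episode"
```

A sensible threshold depends on the library being processed, so it is worth tuning against a hand-checked sample.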

Limitations

  1. CRC32 Validation: Only accepts valid hexadecimal characters (0-9, A-F). Non-hex strings in brackets (e.g., [BACON]) are not treated as CRC hashes.

  2. Season Inference: Never infers season from context. Only extracts when explicitly marked (S01, Season 1, etc.)

  3. Episode Requirements: For episode files, an explicit marker is preferred. Bare numbers may be inferred if they appear after clear separators, but confidence will be lower.

  4. Non-Episode Assets: Files marked as OP, ED, PV, CM, SP, OVA, OAD, NCED, NCOP, etc. will have episode=null unless an explicit episode marker is also present.

  5. Show Name Normalization: Applies Title Case normalization but preserves romanization artifacts and loanwords. May not perfectly match original formatting.

  6. Context Window: 40K tokens. Very long filenames (>1000 characters) may be truncated.
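
The CRC32 rule in item 1 can be cross-checked outside the model with a regular expression over the original filename. This is a minimal sketch under stated assumptions: the helper names are hypothetical, only square brackets are considered, and lowercase hex in the filename is accepted and uppercased to match the documented output format. A release-group tag that happens to be eight hex characters (for example [DEADBEEF]) would be a false positive.

```python
import re

# A bracketed token of exactly eight hex digits, e.g. "[39AB5490]".
BRACKETED_CRC = re.compile(r"\[([0-9A-Fa-f]{8})\]")


def crc_candidates(filename: str) -> list[str]:
    """Return every bracketed 8-digit hex token in the filename, uppercased."""
    return [token.upper() for token in BRACKETED_CRC.findall(filename)]


def crc_matches_filename(filename: str, crc_hash) -> bool:
    """Check that the model's crc_hash agrees with what the filename contains."""
    candidates = crc_candidates(filename)
    if crc_hash is None:
        # A null hash is consistent only if the filename has no candidate.
        return not candidates
    return crc_hash in candidates
```

A mismatch does not say which side is wrong, but it is a cheap signal that a result deserves a second look.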

Model Details

  • Base Model: Qwen3-14B
  • Size: 9.0GB (Q4_K_M quantized)
  • Context Window: 40K tokens
  • Fine-tuning: Supervised Fine-Tuning (SFT) with QLoRA
  • Training Data: Custom dataset of media filenames with metadata annotations

License

This model is based on Qwen3-14B. Please refer to the Qwen license for usage terms.