oamazonasgabriel

Data & AI Engineering - Open Source, Private AI Enthusiast

qwen3.6-35b-a3b

A memory-efficient configuration of Qwen3.6-35B-A3B that uses an upstream imatrix-calibrated IQ4_XS quantization and a q4_0 KV cache. Designed for 24 GB of VRAM.

tools thinking

829 Pulls 1 Tag Updated 1 month ago
qwen3.5-9b

A coding-optimized configuration of Qwen3.5-9B designed for 16 GB single-GPU hardware. The model uses the official Q4_K_M quantization (~6.6 GB weights), leaving ~9 GB headroom for KV cache — enabling 32K+ context windows comfortably.

vision tools thinking

704 Pulls 1 Tag Updated 1 month ago
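
The headroom figure quoted for qwen3.5-9b is simple subtraction, and the same check applies to any entry on this page that states a weight size and a target GPU. The sketch below is only illustrative: the function name `vram_headroom_gb` is hypothetical, and it assumes the whole GPU is available to the model, ignoring runtime overhead such as compute buffers and the display server.

```python
def vram_headroom_gb(total_vram_gb: float, weights_gb: float) -> float:
    """Return the VRAM left for the KV cache after loading the weights, in GB.

    Assumption: nothing else occupies the GPU (no runtime buffers, no desktop).
    """
    if weights_gb > total_vram_gb:
        raise ValueError("weights do not fit in the available VRAM")
    return total_vram_gb - weights_gb


# Figures from the qwen3.5-9b entry: 16 GB GPU, ~6.6 GB of Q4_K_M weights.
headroom = vram_headroom_gb(16.0, 6.6)
print(f"{headroom:.1f} GB")  # prints "9.4 GB", in line with the quoted ~9 GB
```

How much context that headroom actually buys depends on the model's layer count, KV head count, and the KV cache precision, none of which are stated here.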
lfm2.5-230m

LFM2.5-230M is a hybrid language model by Liquid AI, built to run anywhere. An ideal lightweight AI companion for: 4 GB laptops · 8 GB desktops · Edge devices · Raspberry Pi 5

212 Pulls 1 Tag Updated 4 days ago
qwen3.5-4b

A balanced configuration of Qwen3.5-4B, intended for FIM (Fill-In-the-Middle) and tool calling in capacity-restricted environments. It uses the Q3_K_S GGUF quantization from Hugging Face (unsloth).

208 Pulls 2 Tags Updated 4 days ago
qwen2.5-coder-0.5b

A lightweight, FIM (Fill-In-the-Middle) optimized variant of Qwen2.5-Coder-0.5B-Instruct using the F16 GGUF weights from Hugging Face. At only ~1 GB, it fits comfortably on a single 8 GB GPU with headroom for 8K context.

100 Pulls 1 Tag Updated 4 days ago
qwen2.5-coder.1.5b-mlx

Code-specialized 1.5B model in F16 precision, optimized for MLX workflows. 32K context, fill-in-the-middle support, and fast inference on GPUs with 8 GB+ memory. Ideal for code generation, completion, and bug fixing.

79 Pulls 1 Tag Updated 4 days ago
bonsai-27b

67 Pulls 1 Tag Updated 3 days ago
qwen3.5-2b

Qwen3.5 2B in Q8_0 quantization. Strong balance of capability and efficiency with 262K context, vision, tool use, and thinking. Ideal for local deployment on consumer hardware.

vision tools thinking

61 Pulls 1 Tag Updated 5 days ago
lfm2-1.2b-tool

LFM2-1.2B-Tool is a specialized 1.2B tool-calling model from Liquid AI, fine-tuned exclusively for concise and precise function calling. It uses a hybrid LIV+GQA architecture and outperforms thinking models of similar size without chain-of-thought overhead.

39 Pulls 1 Tag Updated 2 weeks ago
qwen3.5-0.8b

Qwen3.5 0.8B in Q8_0 quantization. Small, fast model with 262K context, vision, tool use, and thinking capabilities. Optimized for local/edge deployment on constrained hardware.

vision tools thinking

34 Pulls 1 Tag Updated 5 days ago
lfm2.5-8b-a1b

A general-purpose 8.3B Mixture-of-Experts model from Liquid AI that activates only ~1.5B parameters per token — delivering strong reasoning, tool calling, and multilingual support while fitting comfortably in 8 GB VRAM.

1 Pull 1 Tag Updated 18 hours ago