
Tencent WeDLM-7B-Base converted to GGUF (Q4_K_M). A text-diffusion model based on Qwen2.5 architecture, optimized for efficient parallel decoding.

ollama run doitmagic/wedlm-7b-base

WeDLM-7B-Base (GGUF Quantized)

This model is a GGUF conversion of tencent/WeDLM-7B-Base, quantized to Q4_K_M for efficient local inference via Ollama.

Model Details

WeDLM (Web-enhanced Diffusion Language Model), developed by Tencent, reconciles diffusion language modeling with standard causal attention, enabling fast parallel decoding at inference time.

  • Original Repo: tencent/WeDLM-7B-Base
  • Base Architecture: Qwen2.5-7B
  • Quantization: Q4_K_M (4-bit medium; balanced quality/speed)
  • Context Length: 16k native; effective context depends on available system memory.
  • License: Apache 2.0

Usage

You can run this model directly with Ollama:

ollama run doitmagic/wedlm-7b-base
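
You can also call the model through Ollama's local HTTP API (it listens on localhost:11434 by default). A minimal sketch in Python using only the standard library; the prompt and the `num_ctx` value are illustrative choices, not requirements of this model:

```python
import json
import urllib.request

# Ollama's default local endpoint for one-shot generation
URL = "http://localhost:11434/api/generate"

# Request payload: "stream": False returns a single JSON object
# instead of a stream of partial responses. num_ctx of 16384
# matches the model's native 16k context (assumed to fit in RAM).
payload = {
    "model": "doitmagic/wedlm-7b-base",
    "prompt": "Explain diffusion language models in one sentence.",
    "stream": False,
    "options": {"num_ctx": 16384},
}

def generate(url: str = URL) -> str:
    """POST the payload to Ollama and return the generated text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Calling generate() requires a running Ollama server (start one with ollama serve if it is not already running).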

Setup & Conversion

This model was converted and quantized by doITmagic using llama.cpp. It uses the qwen2 architecture definition to ensure compatibility with standard inference engines like Ollama.
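
If you want to package a local GGUF file the same way yourself, a minimal Modelfile sketch is enough; the filename below is hypothetical and should point at wherever your quantized file lives:

```
# Modelfile (hypothetical path to the quantized weights)
FROM ./WeDLM-7B-Base-Q4_K_M.gguf
```

Running ollama create wedlm-7b-base -f Modelfile then registers the model locally so it can be launched with ollama run as shown above.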