
ollama run demonbyron/qwen3-4b-CriminalLaw-cn:q4

Details

2 months ago

3eca8fb4d5ea · 2.5GB · qwen3 · 4.02B · Q4_K_M
Template (truncated preview): {{ if .System }}<|im_start|>system {{ .System }}<|im_end|> {{ end }}{{ range .Messages }}{{ if eq .R…
System (truncated preview, translated): You are a professional, rigorous, and objective AI criminal-trial assistance system. Your task is to, based on the information the user pro…
Parameters: { "num_ctx": 4096, "stop": [ "<|im_end|>" ], "temperature": 0.2, "top_p"…

Readme

qwen3-4b-CriminalLaw-cn

📊 Project Overview

qwen3-4b-CriminalLaw-cn is a personal experimental model built by deep supervised instruction fine-tuning (SFT) on the Qwen3-4B-Instruct architecture.

The core goal of this project is to explore the sentencing-prediction capability of large language models (LLMs) in the domain of Chinese criminal law. By feeding the model a large volume of real judicial judgment documents, the experiment examines whether it can learn judicial logic, standard legal phrasing, and concrete sentencing scales from unstructured legal text.

To strengthen the model's logical rigor in practical use, training adopted a mixed-data strategy: the model should not only learn the sentencing of routine cases, but also correctly handle complex situations such as "refusal to confess" and "zero-confession" cases, thereby validating the potential of small-parameter models in vertical domains.

💿 Training Data

Dataset name: legal_finetune_v3

  1. Core Dataset

    • Source: criminal first-instance judgments published on China Judgments Online in 2021.
    • Scale: over 50,000 carefully selected high-quality real samples.
    • Purpose: to build the model's foundational legal world model, so it masters the complete reasoning chain from the finding of criminal facts to the sentencing outcome.
  2. Reinforcement Data

    • Content: roughly 3,000 purpose-built adversarial samples (e.g., refusal to confess, ambiguous attitude).
    • Purpose: to correct statistical biases picked up from the bulk data, preventing the model from over-fitting the "leniency for confession" pattern and improving its ability to distinguish different confession attitudes.

📈 Training Results

This experiment was run on an NVIDIA A10 (24GB), training for 2.0 epochs.

Training Loss

training_loss.png

The convergence curve is smooth, with the final loss stabilizing around 0.33, indicating that the model has effectively fit the textual features and logical structure of the judgment documents.

Key Hyperparameters

  • Base Model: Qwen3-4B-Instruct
  • Learning Rate: 1e-4 (Cosine Decay)
  • Batch Size: Total 16
  • Quantization: 4-bit QLoRA (rslora enabled)
  • LoRA Rank: 64 / Alpha: 128
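
The settings above correspond roughly to a LLaMA-Factory SFT config. A hypothetical sketch (the file name, dataset key, and the per-device/accumulation split of the global batch of 16 are assumptions, not the project's actual config):

```yaml
# Hypothetical LLaMA-Factory qlora_sft.yaml reflecting the hyperparameters listed above
model_name_or_path: Qwen/Qwen3-4B-Instruct
stage: sft
finetuning_type: lora
quantization_bit: 4              # 4-bit QLoRA
use_rslora: true
lora_rank: 64
lora_alpha: 128
dataset: legal_finetune_v3
learning_rate: 1.0e-4
lr_scheduler_type: cosine        # cosine decay
num_train_epochs: 2.0
per_device_train_batch_size: 2
gradient_accumulation_steps: 8   # 2 × 8 = global batch size 16 (split is an assumption)
```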

✨ Capability Evaluation

Testing shows the model has experimental value in the following respects:

  1. Judicial-logic fitting: it mimics a judge's reasoning path, deriving a "legal characterization" from the "criminal facts" and finally producing a "sentencing outcome".
  2. Document conventions: output closely follows the drafting conventions of Chinese criminal judgments, accurately using terms of art such as "punished in accordance with the law" and "given a discretionary lighter punishment".
  3. Circumstance discrimination: aided by the adversarial samples, the model can distinguish "candid confession" from "resistance" and gives differentiated sentencing predictions for different defendant conduct.

⚖️ Disclaimer

  1. Experimental nature: this model is a personal AI experiment intended to validate fine-tuning techniques in a specific domain; it does not represent any official position.
  2. Not legal advice: the model's output is purely a statistical imitation of patterns in historical data and does not constitute legal opinion or advice.
  3. Result divergence: although the model was trained on real court documents, sentencing in actual cases is affected by many complex factors (e.g., regional differences, changes in judicial policy); predictions may deviate significantly from actual judgments.
  4. No liability: users bear all risks of using this model; the developer accepts no legal responsibility for the accuracy or reliability of generated content or for any consequences arising from its use.

Fine-tuned with LLaMA-Factory.


qwen3-4b-CriminalLaw-cn

📖 Project Overview

qwen3-4b-CriminalLaw-cn is a personal experimental model designed to explore the capabilities of Large Language Models (LLMs) in criminal sentencing prediction under Chinese law. It is fine-tuned based on the Qwen3-4B-Instruct architecture.

The primary objective of this project is to evaluate whether an LLM can learn judicial logic and sentencing standards by ingesting a large corpus of real-world legal documents. The experiment aims to verify if a smaller-scale model (4B parameters) can master the complex reasoning chain—from fact-finding to legal characterization and finally to sentencing—after being trained on unstructured legal texts.

To enhance the model’s logical rigor in real-world scenarios, the training process also incorporated adversarial data strategies to ensure the model can correctly handle diverse case situations, not just the most common ones.

💿 Training Data

Dataset: legal_finetune_v3

  1. Real-world Corpus (Core):

    • Source: Over 50,000 criminal first-instance judgments from China (Year 2021).
    • Purpose: To establish the model’s fundamental legal understanding, enabling it to learn the format of judgments and the correlation between crime facts and sentences.
  2. Reinforcement Data (Patch):

    • Content: ~3,000 synthesized adversarial samples (e.g., cases involving refusal to confess).
    • Purpose: To correct statistical biases learned from the majority data, ensuring the model accurately differentiates between various defendant attitudes and applies appropriate sentencing logic.
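
The two-part data strategy above amounts to a simple mixing step before training. A minimal sketch, assuming the corpora are JSONL files with instruction/output fields (the file names are hypothetical):

```python
import json
import random

def mix_datasets(core_path: str, patch_path: str, seed: int = 42) -> list[dict]:
    """Concatenate the core corpus with the adversarial patch and shuffle,
    so the ~3k hard samples are interleaved with the ~50k bulk samples."""
    def load(path: str) -> list[dict]:
        with open(path, encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]

    mixed = load(core_path) + load(patch_path)
    random.Random(seed).shuffle(mixed)  # fixed seed for reproducibility
    return mixed
```

Interleaving rather than appending matters here: if the adversarial samples were concentrated at the end of training, they would dominate the final gradient updates instead of acting as a corrective signal throughout.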

📈 Experiment Results

The model was trained on an NVIDIA A10 (24GB) for 2.0 epochs.

Training Loss

training_loss.png

The loss curve shows stable convergence around 0.33, indicating that the model has effectively learned the textual patterns and logical structures of the judgment documents.

Key Hyperparameters

  • Method: 4-bit QLoRA
  • Learning Rate: 1e-4
  • Rank: 64 (rslora enabled)
  • Batch Size: 16 (Global)

🚀 Capabilities

Based on preliminary testing, the model demonstrates the following capabilities:

  1. Judicial Logic Simulation: Capable of mimicking a judge’s reasoning process, deriving legal conclusions from factual descriptions.
  2. Terminological Accuracy: Generates outputs that adhere to the stylistic and terminological standards of Chinese criminal judgments.
  3. Contextual Awareness: With the help of adversarial training, the model shows improved ability to distinguish between mitigating factors (e.g., confession) and aggravating factors (e.g., denial).
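
The template shown in the model details is Qwen's ChatML format, so a case description reaches the model rendered between `<|im_start|>`/`<|im_end|>` markers. A sketch of that rendering for a single turn (the prompt strings are illustrative; Ollama performs this step automatically from its template):

```python
def render_chatml(system: str, user: str) -> str:
    """Render a one-turn ChatML prompt matching the model's template."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )
```

The `<|im_end|>` token also appears in the model's `stop` parameter above, which is what terminates generation at the end of the assistant turn.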

⚠️ Disclaimer

  1. Experimental Nature: This project is a personal experiment aimed at exploring AI technology in the legal domain. It does not represent any official institution.
  2. No Legal Advice: The output of this model is based on statistical learning from historical data and does not constitute legal advice.
  3. Inaccuracy Warning: Sentencing in real life is influenced by numerous factors (e.g., regional differences, policy changes). The model’s predictions may deviate significantly from actual judicial outcomes.
  4. No Liability: The author assumes no responsibility for the accuracy of the content or any consequences arising from the use of this model. For legal matters, please consult a qualified attorney.

Fine-tuned using LLaMA-Factory.