qwen3-4b-CriminalLaw-cn
📊 Project Overview
qwen3-4b-CriminalLaw-cn is a personal experimental model built by supervised fine-tuning (SFT) on the Qwen3-4B-Instruct architecture.
The core goal of this project is to explore the sentencing-prediction capability of large language models (LLMs) in the domain of Chinese criminal law. By training on a large corpus of real judicial judgments, it examines whether a model can acquire judicial reasoning, standard legal phrasing, and concrete sentencing scales from unstructured legal text.
To improve the model's logical rigor in practical use, training adopted a mixed-data strategy: the model should not only learn sentencing for routine cases but also correctly handle complex scenarios such as refusal to confess or zero-confession cases, thereby testing the potential of small-parameter models in a vertical domain.
💿 Training Data
Dataset: legal_finetune_v3
Core Dataset:
- Source: criminal first-instance judgments published on China Judgments Online in 2021.
- Scale: over 50,000 carefully selected high-quality real-world samples.
- Purpose: to build the model's foundational legal world view, teaching it the full reasoning chain from fact-finding to sentencing outcome.
Reinforcement Data (Patch):
- Content: ~3,000 targeted adversarial samples (e.g., refusal to confess, ambiguous attitude).
- Purpose: to correct statistical biases learned from the bulk data, preventing the model from over-fitting the "leniency for confession" pattern and improving its ability to distinguish different defendant attitudes.
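The exact schema of legal_finetune_v3 is not published; as a sketch, one training record in the Alpaca-style instruction format that LLaMA-Factory commonly consumes might look like the following. All field contents here are invented placeholders, not real data from the dataset.

```python
import json

# Hypothetical record: the "instruction" / "input" / "output" field names
# follow the common Alpaca-style SFT format; the actual schema of
# legal_finetune_v3 may differ.
record = {
    "instruction": "根据以下犯罪事实,预测被告人的量刑结果。",
    "input": "被告人王某盗窃他人财物,到案后拒不认罪。",
    "output": "被告人王某犯盗窃罪,依法予以惩处,判处有期徒刑……",
}

# SFT datasets are often stored as one JSON object per line (JSONL).
line = json.dumps(record, ensure_ascii=False)
print(line)
```

The adversarial patch samples would use the same schema, differing only in the attitude described in `input` and the correspondingly adjusted sentence in `output`.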
📈 Training Results
Training was run on an NVIDIA A10 (24 GB) for 2.0 epochs.
Training Loss

The loss curve converged smoothly, with the final loss stabilizing around 0.33, indicating that the model has effectively fitted the textual features and logical structure of the judgment documents.
Key Hyperparameters
- Base Model: Qwen3-4B-Instruct
- Learning Rate: 1e-4 (Cosine Decay)
- Batch Size: Total 16
- Quantization: 4-bit QLoRA (rslora enabled)
- LoRA Rank: 64 / Alpha: 128
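As a sketch of the cosine-decay schedule listed above, assuming decay from the 1e-4 peak toward zero with no warmup (the card does not state warmup or a floor LR, so those are assumptions):

```python
import math

def cosine_lr(step: int, total_steps: int, peak_lr: float = 1e-4,
              final_lr: float = 0.0) -> float:
    """Cosine decay from peak_lr to final_lr over total_steps (no warmup)."""
    progress = min(step / total_steps, 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return final_lr + (peak_lr - final_lr) * cosine

# The LR starts at 1e-4, is halved at the midpoint, and decays to ~0 at the end.
```

Mid-training the rate sits at roughly 5e-5, which is why late-epoch updates are gentle and the loss curve flattens smoothly.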
✨ Capability Evaluation
Preliminary testing shows experimental value in the following areas:
- Judicial logic fitting: mimics a judge's reasoning path, moving from "criminal facts" to "legal characterization" and finally to a "sentencing result".
- Document conventions: output closely follows the drafting style of Chinese criminal judgments, correctly using formulaic legal phrases such as "punished in accordance with the law" (依法予以惩处) and "given a discretionary lighter punishment" (酌情从轻处罚).
- Circumstance discrimination: aided by the adversarial samples, the model can distinguish candid confession from refusal to confess and produces differentiated sentencing predictions for different defendant attitudes.
⚖️ Disclaimer
- Experimental nature: this model is a personal AI experiment intended to validate fine-tuning techniques in a specific domain; it does not represent any official position.
- Not legal advice: the model's output is purely an imitation of statistical patterns in historical data and does not constitute legal opinion or advice.
- Result divergence: although trained on real judgments, actual sentencing is shaped by many complex factors (regional differences, changes in judicial policy, etc.), so predictions may deviate significantly from real outcomes.
- No liability: users bear all risks of using this model; the developer accepts no legal responsibility for the accuracy or reliability of generated content or for any consequences of its use.
Fine-tuned with LLaMA-Factory.
qwen3-4b-CriminalLaw-cn
📖 Project Overview
qwen3-4b-CriminalLaw-cn is a personal experimental model designed to explore the capabilities of Large Language Models (LLMs) in criminal sentencing prediction under Chinese law. It is fine-tuned based on the Qwen3-4B-Instruct architecture.
The primary objective of this project is to evaluate whether an LLM can learn judicial logic and sentencing standards by ingesting a large corpus of real-world legal documents. The experiment aims to verify if a smaller-scale model (4B parameters) can master the complex reasoning chain—from fact-finding to legal characterization and finally to sentencing—after being trained on unstructured legal texts.
To enhance the model’s logical rigor in real-world scenarios, the training process also incorporated adversarial data strategies to ensure the model can correctly handle diverse case situations, not just the most common ones.
💿 Training Data
Dataset: legal_finetune_v3
Real-world Corpus (Core):
- Source: Over 50,000 criminal first-instance judgments from China (Year 2021).
- Purpose: To establish the model’s fundamental legal understanding, enabling it to learn the format of judgments and the correlation between crime facts and sentences.
Reinforcement Data (Patch):
- Content: ~3,000 synthesized adversarial samples (e.g., cases involving refusal to confess).
- Purpose: To correct statistical biases learned from the majority data, ensuring the model accurately differentiates between various defendant attitudes and applies appropriate sentencing logic.
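The mixing procedure for the two data sources is not documented; a minimal, hypothetical sketch is to concatenate the core corpus with the adversarial patch and shuffle with a fixed seed so every epoch interleaves the two sources reproducibly:

```python
import random

def mix_datasets(core, patch, seed=42):
    """Concatenate core and adversarial samples, then shuffle reproducibly."""
    mixed = list(core) + list(patch)
    rng = random.Random(seed)  # fixed seed -> reproducible training order
    rng.shuffle(mixed)
    return mixed

# Toy stand-ins for the ~50,000 core and ~3,000 adversarial records.
core = [{"id": i, "src": "core"} for i in range(50)]
patch = [{"id": i, "src": "patch"} for i in range(3)]
mixed = mix_datasets(core, patch)
```

At the real ratio (~6% adversarial), the patch is large enough to counteract the "leniency for confession" bias without drowning out the core distribution.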
📈 Experiment Results
The model was trained on an NVIDIA A10 (24 GB) for 2.0 epochs.
Training Loss

The loss curve shows stable convergence around 0.33, indicating that the model has effectively learned the textual patterns and logical structures of the judgment documents.
Key Hyperparameters
- Method: 4-bit QLoRA
- Learning Rate: 1e-4
- Rank: 64 (rslora enabled)
- Batch Size: 16 (Global)
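One design note on the `rslora enabled` setting: standard LoRA scales the adapter update by alpha/r, while rank-stabilized LoRA (rsLoRA) uses alpha/sqrt(r), which keeps the update magnitude from shrinking as rank grows. With the rank 64 / alpha 128 pair listed in the hyperparameters earlier in this card:

```python
import math

r, alpha = 64, 128  # LoRA rank and alpha from the hyperparameters above

standard_scale = alpha / r           # classic LoRA scaling factor
rslora_scale = alpha / math.sqrt(r)  # rank-stabilized LoRA scaling factor

print(standard_scale, rslora_scale)  # → 2.0 16.0
```

At rank 64 the rsLoRA factor is 8x larger than the classic one, so the high-rank adapter contributes meaningfully to the forward pass instead of being scaled down toward zero.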
🚀 Capabilities
Based on preliminary testing, the model demonstrates the following capabilities:
- Judicial Logic Simulation: Capable of mimicking a judge’s reasoning process, deriving legal conclusions from factual descriptions.
- Terminological Accuracy: Generates outputs that adhere to the stylistic and terminological standards of Chinese criminal judgments.
- Contextual Awareness: With the help of adversarial training, the model shows improved ability to distinguish between mitigating factors (e.g., confession) and aggravating factors (e.g., denial).
⚠️ Disclaimer
- Experimental Nature: This project is a personal experiment aimed at exploring AI technology in the legal domain. It does not represent any official institution.
- No Legal Advice: The output of this model is based on statistical learning from historical data and does not constitute legal advice.
- Inaccuracy Warning: Sentencing in real life is influenced by numerous factors (e.g., regional differences, policy changes). The model’s predictions may deviate significantly from actual judicial outcomes.
- No Liability: The author assumes no responsibility for the accuracy of the content or any consequences arising from the use of this model. For legal matters, please consult a qualified attorney.
Fine-tuned using LLaMA-Factory.