Details

Updated 2 months ago

7b6c52a11d3d · 4.7GB

model (4.7GB)
Architecture: qwen2
Parameters: 7.62B
Quantization: Q4_K_M

template (133B, preview truncated)
### Instruction: rename the following variables to make the following javascript program more readab

params (74B)
{ "stop": [ "### Instruction:", "<|endoftext|>" ], "temperature": 0.1 }
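The stop sequences and temperature listed above can also be passed explicitly when calling the model through Ollama's HTTP API. The following is a minimal sketch, not the author's client code: it only builds the JSON body for a non-streaming `/api/generate` request. The model tag `variables-renaming` is a hypothetical placeholder, and the sketch assumes the server-side template wraps the raw prompt with the instruction text.

```python
import json

# Values copied from the params shown above.
STOP_SEQUENCES = ["### Instruction:", "<|endoftext|>"]
TEMPERATURE = 0.1


def build_generate_payload(model_name: str, obfuscated_code: str) -> dict:
    """Build a non-streaming request body for Ollama's /api/generate endpoint."""
    return {
        "model": model_name,        # hypothetical tag; use the name you actually pulled
        "prompt": obfuscated_code,  # assumed to be wrapped by the model's stored template
        "stream": False,
        "options": {"temperature": TEMPERATURE, "stop": STOP_SEQUENCES},
    }


payload = build_generate_payload("variables-renaming", "let _0x9c3e = a * b;")
print(json.dumps(payload, indent=2))
```

Because these params are already stored with the model, repeating them in `options` is redundant. It is shown here only to make the decoding settings visible at the call site.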

🔍 Obfuscated Variable Renaming with Qwen-Code

This repository hosts a Qwen-Code–based model fine-tuned to rename obfuscated variables in source code, improving readability while preserving program semantics.

The model is designed for use cases such as malware analysis, reverse engineering, digital forensics, and general program comprehension.

🚀 Task Overview

Task: Code Deobfuscation / Variable Renaming
Base Model: Qwen-Code
Input: Source code with obfuscated variable names
Output: Semantically equivalent source code with readable variable names

Example

Input

function _0x12af(a, b) {
  let _0x9c3e = a * b;
  return _0x9c3e + 10;
}

Output

function multiplyAndAdd(a, b) {
  let product = a * b;
  return product + 10;
}
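One way to inspect a result like the one above is to recover the rename mapping by aligning the two snippets token by token. The sketch below is illustrative only and is not part of the model's tooling: `extract_renames` is a hypothetical helper, and its regex tokenizer is a rough stand-in for a real JavaScript parser (it does not understand strings, comments, or keywords).

```python
import re

# Rough tokenizer: identifiers, numbers, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*|\d+(?:\.\d+)?|\S")
IDENT_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")


def extract_renames(obfuscated: str, renamed: str) -> dict:
    """Return {old_name: new_name}; raise if anything other than names changed."""
    old_tokens = TOKEN_RE.findall(obfuscated)
    new_tokens = TOKEN_RE.findall(renamed)
    if len(old_tokens) != len(new_tokens):
        raise ValueError("token counts differ; the structure was changed")
    renames = {}
    for old, new in zip(old_tokens, new_tokens):
        if old == new:
            continue
        if not (IDENT_RE.fullmatch(old) and IDENT_RE.fullmatch(new)):
            raise ValueError(f"non-identifier change: {old!r} -> {new!r}")
        # Each obfuscated name must map to exactly one readable name.
        if renames.setdefault(old, new) != new:
            raise ValueError(f"inconsistent rename for {old!r}")
    return renames


obfuscated = "function _0x12af(a, b) {\n  let _0x9c3e = a * b;\n  return _0x9c3e + 10;\n}"
renamed = "function multiplyAndAdd(a, b) {\n  let product = a * b;\n  return product + 10;\n}"
renames = extract_renames(obfuscated, renamed)
print(renames)  # {'_0x12af': 'multiplyAndAdd', '_0x9c3e': 'product'}
```

A failed alignment does not prove the output is wrong, and a successful one does not prove semantic equivalence (for example, two variables renamed to the same name would pass). It is a cheap first filter before a parser-based check.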

🧠 Model Description

Architecture: Qwen-Code (Transformer-based)
Fine-tuning Objective: Context-aware variable renaming
Approach: AST-guided identifier alignment + sequence generation
Languages: JavaScript (primary), extendable to others

The model is trained to infer meaningful variable names from how each variable is used in context, rather than from surface features of the obfuscated names.

🏗 Training Details

Dataset

Paired samples of:
- Obfuscated code
- Original / readable code
Variable mappings extracted using AST-based analysis
Realistic obfuscation patterns (minifiers, packers, name mangling)
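To make the shape of a training pair concrete, here is a small sketch that produces an obfuscated variant of a readable snippet together with its variable mapping. It is an assumption-laden illustration, not the pipeline used for this model: `obfuscate_names` is a hypothetical helper, the hex-style naming scheme is invented for the example, and it uses regex substitution where the dataset described above relies on AST-based analysis.

```python
import re


def obfuscate_names(source: str, names: list[str]) -> tuple[str, dict]:
    """Replace each listed identifier with a hex-style name; return code and mapping."""
    if not names:
        return source, {}
    # Invented naming scheme: _0x1a2b, _0x1a2c, ...
    mapping = {name: f"_0x{index + 0x1A2B:04x}" for index, name in enumerate(names)}
    # Match whole identifiers only, so "width" does not match inside "bandwidth".
    pattern = re.compile(
        r"(?<![\w$])(" + "|".join(map(re.escape, names)) + r")(?![\w$])"
    )
    obfuscated = pattern.sub(lambda match: mapping[match.group(1)], source)
    return obfuscated, mapping


readable = (
    "function area(width, height) {\n"
    "  let result = width * height;\n"
    "  return result;\n"
    "}"
)
obfuscated, mapping = obfuscate_names(readable, ["width", "height", "result"])

# One paired sample: obfuscated input, readable target, and the name mapping.
sample = {"obfuscated": obfuscated, "original": readable, "mapping": mapping}
print(sample["obfuscated"])
```

Regex substitution also rewrites matching text inside strings, comments, and property accesses, which is one reason an AST-based extractor is the safer choice for real data.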

Training Objectives

Identifier-aware sequence-to-sequence learning
Contextual name prediction
Syntax preservation

📦 Installation

pip install transformers torch accelerate

▶️ Usage

Inference Example

from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Neo111x/Variables-Renaming"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    trust_remote_code=True
)

code = '''
function _0x12af(a, b) {
  let _0x9c3e = a * b;
  return _0x9c3e + 10;
}
'''

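# Move the input tensors to the device the model was loaded on.
inputs = tokenizer(code, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=False  # greedy decoding for reproducible output
)

# Decode only the newly generated tokens, not the echoed prompt.
prompt_length = inputs["input_ids"].shape[1]
print(tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True))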

🧪 Evaluation

Identifier exact-match accuracy
AST equivalence checks
Manual readability assessment
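The first metric can be sketched as follows. The exact definition used for this model is not stated, so this assumes the simplest reading: the fraction of obfuscated identifiers whose predicted name equals the original name character for character. `identifier_exact_match` is a hypothetical helper name.

```python
def identifier_exact_match(predicted: dict, reference: dict) -> float:
    """Fraction of reference identifiers whose predicted name equals the original."""
    if not reference:
        return 0.0
    hits = sum(
        1 for old_name, original in reference.items()
        if predicted.get(old_name) == original
    )
    return hits / len(reference)


# Keys are obfuscated names; values are the original or predicted readable names.
reference = {"_0x12af": "multiplyAndAdd", "_0x9c3e": "product"}
predicted = {"_0x12af": "multiplyAndAdd", "_0x9c3e": "result"}
score = identifier_exact_match(predicted, reference)
print(score)  # 0.5
```

Exact match is strict: `result` is a reasonable name for `product` but scores zero, which is why it is paired with structural checks and manual readability review.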

⚠️ Limitations

Generated names are semantic approximations, not original identifiers
Performance degrades on:
- Extremely short contexts
- Heavy control-flow flattening
Single-file scope only

🔐 Ethical Considerations

This model is intended for:
- Malware and binary analysis
- Digital forensics and incident response (DFIR)
- Code maintenance and auditing

It should not be used to violate software licenses or intellectual property rights.

🧩 Future Work

Multi-language support (C/C++, Python)
Function and class renaming
Control-flow–aware modeling
Integration with decompilers and IR tools

📜 License

Specify the license here (e.g., Apache-2.0, MIT).

📖 Citation

@misc{qwen_code_variable_renamer,
  title={Context-Aware Variable Renaming for Obfuscated Code using Qwen-Code},
  author={Your Name},
  year={2026},
  url={https://huggingface.co/Neo111x/Variables-Renaming}
}
