7 2 months ago

Obfuscated Javascript Variables Renaming

ollama run bilel_cherif/Variables-Renaming:q4_k_m

Details

7b6c52a11d3d · 4.7GB · qwen2 · 7.62B · Q4_K_M
### Instruction: rename the following variables to make the following javascript program more readab
{ "stop": [ "### Instruction:", "<|endoftext|>" ], "temperature": 0.1 }
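The parameters above can also be passed explicitly when calling the model through Ollama's local HTTP API. The following is a minimal sketch, assuming an Ollama server is running on its default address (`http://localhost:11434`) and the model has already been pulled. The helper names `build_payload` and `rename_variables` are illustrative and not part of this repository. The prompt template shown above is truncated on this page, so the sketch sends only the raw code and relies on Ollama applying the template stored with the model.

```python
import json
import urllib.request

# Values copied from the parameters listed on the model page.
MODEL = "bilel_cherif/Variables-Renaming:q4_k_m"
OPTIONS = {"temperature": 0.1, "stop": ["### Instruction:", "<|endoftext|>"]}


def build_payload(code: str) -> dict:
    """Build a non-streaming request body for Ollama's /api/generate endpoint."""
    return {"model": MODEL, "prompt": code, "stream": False, "options": OPTIONS}


def rename_variables(code: str, host: str = "http://localhost:11434") -> str:
    """Send the code to a local Ollama server and return the generated text."""
    request = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(build_payload(code)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

Calling `rename_variables(obfuscated_source)` should return the model's rewritten code as a string, provided the server is reachable.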

Readme

🔍 Obfuscated Variable Renaming with Qwen-Code

This repository hosts a Qwen-Code–based model fine-tuned to rename obfuscated variables in source code, improving readability while preserving program semantics.

The model is designed for use cases such as malware analysis, reverse engineering, digital forensics, and general program comprehension.


🚀 Task Overview

Task: Code Deobfuscation / Variable Renaming
Base Model: Qwen-Code
Input: Source code with obfuscated variable names
Output: Semantically equivalent source code with readable variable names

Example

Input

function _0x12af(a, b) {
  let _0x9c3e = a * b;
  return _0x9c3e + 10;
}

Output

function multiplyAndAdd(a, b) {
  let product = a * b;
  return product + 10;
}
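Because the output is meant to differ from the input only in its names, a quick sanity check is to confirm that the two programs match token for token up to a one-to-one renaming. The sketch below is a rough token-level stand-in for a real AST comparison, not the project's own tooling. The function `is_consistent_renaming` and the small keyword set are assumptions made for illustration, and the tokenizer ignores string literals, comments, and property names.

```python
import re

TOKEN = re.compile(r"[A-Za-z_$][\w$]*|\d+(?:\.\d+)?|\S")
IDENT = re.compile(r"[A-Za-z_$][\w$]*")
# Small subset of JavaScript reserved words; extend as needed.
KEYWORDS = {"function", "let", "const", "var", "return", "if", "else",
            "for", "while", "new", "this"}


def is_consistent_renaming(before: str, after: str) -> bool:
    """Return True if `after` equals `before` up to a one-to-one identifier renaming."""
    old_tokens, new_tokens = TOKEN.findall(before), TOKEN.findall(after)
    if len(old_tokens) != len(new_tokens):
        return False
    forward, backward = {}, {}
    for old, new in zip(old_tokens, new_tokens):
        old_is_name = bool(IDENT.fullmatch(old)) and old not in KEYWORDS
        new_is_name = bool(IDENT.fullmatch(new)) and new not in KEYWORDS
        if old_is_name != new_is_name:
            return False
        if not old_is_name:
            # Keywords, literals, and punctuation must be unchanged.
            if old != new:
                return False
            continue
        # Each old name maps to exactly one new name, and vice versa.
        if forward.setdefault(old, new) != new or backward.setdefault(new, old) != old:
            return False
    return True
```

Applied to the input and output shown above, the check passes; it fails if a literal changes or if one obfuscated name is mapped to two different readable names.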

🧠 Model Description

  • Architecture: Qwen-Code (Transformer-based)
  • Fine-tuning Objective: Context-aware variable renaming
  • Approach: AST-guided identifier alignment + sequence generation
  • Languages: JavaScript (primary), extendable to others

The model is trained to infer meaningful variable names from how each variable is used in its surrounding code, rather than from surface patterns in the obfuscated name itself.


🏗 Training Details

Dataset

  • Paired samples of:
    • Obfuscated code
    • Original / readable code
  • Variable mappings extracted using AST-based analysis
  • Realistic obfuscation patterns (minifiers, packers, name mangling)
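To make the pairing concrete, the sketch below produces one (obfuscated, readable) sample together with its variable mapping. It is a simplified illustration only: it uses whole-identifier regex substitution in place of the AST-based analysis described above, so it would also rewrite matches inside strings and comments. The function `obfuscate` and the `_0x...` placeholder numbering are hypothetical and not taken from the actual data pipeline.

```python
import re


def obfuscate(code: str, names: list[str]) -> tuple[str, dict[str, str]]:
    """Replace each listed identifier with a hex-style placeholder name.

    Returns the obfuscated code and the obfuscated -> original mapping.
    """
    mapping = {}
    for index, name in enumerate(names):
        placeholder = f"_0x{index + 0x1000:x}"
        # Match whole identifiers only; '$' and word characters are identifier parts.
        pattern = rf"(?<![\w$]){re.escape(name)}(?![\w$])"
        code = re.sub(pattern, placeholder, code)
        mapping[placeholder] = name
    return code, mapping
```

The returned mapping is the supervision signal: it records which readable name each placeholder should be restored to.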

Training Objectives

  • Identifier-aware sequence-to-sequence learning
  • Contextual name prediction
  • Syntax preservation

📦 Installation

pip install transformers torch accelerate

▶️ Usage

Inference Example

from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Neo111x/Variables-Renaming"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    trust_remote_code=True
)

code = '''
function _0x12af(a, b) {
  let _0x9c3e = a * b;
  return _0x9c3e + 10;
}
'''

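# Move the input tensors to the device the model was loaded on.
inputs = tokenizer(code, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=False  # greedy decoding for deterministic output
)

# Decode only the newly generated tokens, skipping the echoed prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))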

🧪 Evaluation

  • Identifier exact-match accuracy
  • AST equivalence checks
  • Manual readability assessment
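The README does not define the exact-match metric precisely. One plausible reading, sketched below, is the fraction of obfuscated identifiers for which the predicted name equals the original name. The function `identifier_exact_match` and the mapping format (obfuscated name to readable name) are assumptions for illustration.

```python
def identifier_exact_match(predicted: dict[str, str], reference: dict[str, str]) -> float:
    """Fraction of reference identifiers whose predicted name equals the original name."""
    if not reference:
        return 0.0
    hits = sum(
        1 for obfuscated, original in reference.items()
        if predicted.get(obfuscated) == original
    )
    return hits / len(reference)
```

Under this definition a semantically reasonable but different name (for example `result` instead of `product`) counts as a miss, which is why the list above pairs the metric with a manual readability assessment.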

⚠️ Limitations

  • Generated names are semantic approximations, not original identifiers
  • Performance degrades on:
    • Extremely short contexts
    • Heavy control-flow flattening
  • Single-file scope only

🔐 Ethical Considerations

This model is intended for:

  • Malware and binary analysis
  • Digital forensics and incident response (DFIR)
  • Code maintenance and auditing

It should not be used to violate software licenses or intellectual property rights.


🧩 Future Work

  • Multi-language support (C/C++, Python)
  • Function and class renaming
  • Control-flow–aware modeling
  • Integration with decompilers and IR tools

📜 License

Specify the license here (e.g., Apache-2.0, MIT).


📖 Citation

@misc{qwen_code_variable_renamer,
  title={Context-Aware Variable Renaming for Obfuscated Code using Qwen-Code},
  author={Your Name},
  year={2026},
  url={https://huggingface.co/Neo111x/Variables-Renaming}
}