LLM4Decompile aims to decompile x86 assembly instructions into C. available in [F16, q8_0, q6_K, q4_K_S]
186 Pulls Updated 6 weeks ago
Updated 6 weeks ago
6 weeks ago
3f004b3ceba8 · 44GB
model
archllama
·
parameters22.2B
·
quantizationF16
44GB
license
MIT License
11B
template
{{ .Prompt }}
13B
params
{
"num_ctx": 4096,
"repeat_penalty": 1.1,
"stop": [
"</s>"
],
"temperatu
105B
system
You are a code decompilation assistant. You will be given x86 assembly code and your task is to out
204B
Readme
1. Introduction of LLM4Decompile
LLM4Decompile aims to decompile x86 assembly instructions into C. The newly released V2 series are trained with a larger dataset (2B tokens) and a maximum token length of 4,096, with remarkable performance (up to 100% improvement) compared to the previous model.
- Github Repository: LLM4Decompile
2. Evaluation Results
Metrics | Re-executability Rate | Edit Similarity | ||||||||
---|---|---|---|---|---|---|---|---|---|---|
Optimization Level | O0 | O1 | O2 | O3 | AVG | O0 | O1 | O2 | O3 | AVG |
LLM4Decompile-End-6.7B | 0.6805 | 0.3951 | 0.3671 | 0.3720 | 0.4537 | 0.1557 | 0.1292 | 0.1293 | 0.1269 | 0.1353 |
Ghidra | 0.3476 | 0.1646 | 0.1524 | 0.1402 | 0.2012 | 0.0699 | 0.0613 | 0.0619 | 0.0547 | 0.0620 |
+GPT-4o | 0.4695 | 0.3415 | 0.2866 | 0.3110 | 0.3522 | 0.0660 | 0.0563 | 0.0567 | 0.0499 | 0.0572 |
+LLM4Decompile-Ref-1.3B | 0.6890 | 0.3720 | 0.4085 | 0.3720 | 0.4604 | 0.1517 | 0.1325 | 0.1292 | 0.1267 | 0.1350 |
+LLM4Decompile-Ref-6.7B | 0.7439 | 0.4695 | 0.4756 | 0.4207 | 0.5274 | 0.1559 | 0.1353 | 0.1342 | 0.1273 | 0.1382 |
+LLM4Decompile-Ref-33B | 0.7073 | 0.4756 | 0.4390 | 0.4146 | 0.5091 | 0.1540 | 0.1379 | 0.1363 | 0.1307 | 0.1397 |
3. How to Use
- Install Ollama: Follow the instructions on the Ollama website.
- Pull a model:
ollama pull MHKetbi/llm4decompile-22b-v2
. - Install the Ollama Python library:
pip install ollama
- Save the script: Save the code as a Python file (e.g.,
decompiler.py
). - Create an input file: Create a file named
my_assembly_file_O0.pseudo
(replacemy_assembly_file
with your desired filename) containing the assembly code of the function you want to decompile. - Run the script:
python decompiler.py my_assembly_file
import ollama
import sys
MODEL_NAME = 'MHKetbi/llm4decompile-22b-v2'
def decompile_with_ollama(asm_func, max_tokens=2048):
"""
Decompiles an assembly function using Ollama.
Args:
asm_func: The assembly function as a string.
max_tokens: The maximum number of tokens to generate.
Returns:
The decompiled C code as a string, or None if an error occurred.
"""
try:
# Construct a prompt for the Ollama model. This is *crucial* for good results.
# We're using a system prompt and a user prompt. The system prompt sets the context.
# The user prompt provides the input and asks for the output.
messages = [
{
'role': 'system',
'content': 'You are a helpful assistant that decompiles assembly code to C code. '
'Provide only the C code, without any extra explanation or comments. '
'Do not include markdown formatting. Do not wrap the output in ```.'
},
{
'role': 'user',
'content': f'Decompile the following assembly code to C:\n\n{asm_func}'
}
]
response = ollama.chat(model=MODEL_NAME, messages=messages, stream=False)
# Check if the response is valid and contains the decompiled code.
if response and response.get('message') and response['message'].get('content'):
c_func_decompile = response['message']['content'].strip()
return c_func_decompile
else:
print("Error: Ollama did not return a valid response.", file=sys.stderr)
return None
except ollama.ResponseError as e:
print(f"Error during Ollama request: {e}", file=sys.stderr)
return None
except Exception as e:
print(f"An unexpected error occurred: {e}", file=sys.stderr)
return None
def decompile_with_ollama_streaming(asm_func, max_tokens=2048):
"""
Decompiles an assembly function using Ollama with streaming.
Args:
asm_func: The assembly function as a string.
max_tokens: (Not directly used in streaming, but kept for consistency).
Returns:
The decompiled C code as a string, or None if an error occurred.
"""
try:
messages = [
{
'role': 'system',
'content': 'You are a helpful assistant that decompiles assembly code to C code. '
'Provide only the C code, without any extra explanation or comments. '
'Do not include markdown formatting. Do not wrap the output in ```.'
},
{
'role': 'user',
'content': f'Decompile the following assembly code to C:\n\n{asm_func}'
}
]
stream = ollama.chat(model=MODEL_NAME, messages=messages, stream=True)
c_func_decompile = ""
for chunk in stream:
if chunk and chunk.get('message') and chunk['message'].get('content'):
c_func_decompile += chunk['message']['content']
return c_func_decompile.strip()
except ollama.ResponseError as e:
print(f"Error during Ollama request: {e}", file=sys.stderr)
return None
except Exception as e:
print(f"An unexpected error occurred: {e}", file=sys.stderr)
return None
def main():
if len(sys.argv) != 2:
print("Usage: python script.py <filename>")
sys.exit(1)
file_name = sys.argv[1]
opt = ['O0'] # Keep this for consistency with the original script's file naming.
try:
with open(f'{file_name}_{opt[0]}.pseudo', 'r') as f:
asm_func = f.read()
except FileNotFoundError:
print(f"Error: File '{file_name}_{opt[0]}.pseudo' not found.", file=sys.stderr)
sys.exit(1)
# Choose either streaming or non-streaming version
# c_func_decompile = decompile_with_ollama(asm_func)
c_func_decompile = decompile_with_ollama_streaming(asm_func)
if c_func_decompile:
try:
with open(f'{file_name}_{opt[0]}.pseudo', 'r') as f: # original file
func = f.read()
print(f'pseudo function:\n{func}')
print(f'refined function:\n{c_func_decompile}')
except FileNotFoundError:
print(f"Error: Could not read original pseudo file (for display only).", file=sys.stderr)
print(f'refined function:\n{c_func_decompile}') # Still print the decompiled output
else:
print("Decompilation failed.", file=sys.stderr)
if __name__ == "__main__":
main()
4. License
This code repository is licensed under the MIT License.
5. Contact
If you have any questions, please raise an issue.