How to prompt Code Llama

September 9, 2023

Two weeks ago Meta released the Code Llama model in three variations: Instruct, Code completion, and Python.

This guide walks through the different ways to structure prompts for Code Llama for its different variations and features.

Examples below use the 7 billion parameter model with 4-bit quantization, but 13 billion and 34 billion parameter models are available as well. This guide uses the open-source Ollama project to download and prompt Code Llama, but these prompts will work with other model providers and runtimes too.


Instruct

The instruct model was trained to output human-like answers to questions. This variation is the easiest to use and behaves most like ChatGPT, answering questions in both natural language and code:


ollama run codellama:7b-instruct 'You are an expert programmer that writes simple, concise code and explanations. Write a python function to generate the nth fibonacci number.'


Here is a Python function that generates the nth Fibonacci number:

def fib(n):
    if n <= 1:
        return n
    return fib(n-1) + fib(n-2)

This function uses the recursive formula for the Fibonacci sequence, which is:

fib(n) = fib(n-1) + fib(n-2)
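
The same prompt can also be sent from a script rather than the command line. The following is a minimal sketch, assuming a local Ollama server on its default port (11434) and its `/api/generate` endpoint; the helper names `build_request` and `generate` are illustrative, not part of Ollama.

```python
import json
import urllib.request

# Assumption: a local Ollama server listening on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="codellama:7b-instruct"):
    """Build the JSON body for a single, non-streaming generation request."""
    return {"model": model, "prompt": prompt, "stream": False}


def generate(prompt, model="codellama:7b-instruct"):
    """Send the prompt to the local server and return the generated text."""
    body = json.dumps(build_request(prompt, model)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

Calling `generate('Write a python function to generate the nth fibonacci number.')` should then return text similar to the response above, provided the server is running and the model has been downloaded.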

Code Review

Code reviews can often be time-consuming for teams. Code Llama can help by acting as a second set of eyes, checking a commit or pull request for simple bugs:


ollama run codellama '
Where is the bug in this code?

def fib(n):
    if n <= 0:
        return n
    return fib(n-1) + fib(n-2)
'


The bug in this code is that it does not handle the case where `n` is equal to 1.
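
The diagnosis is easy to confirm by running the code. A minimal sketch, where `fib_buggy` uses the base case from the prompt and `fib_fixed` (an illustrative name) changes it to `n <= 1`:

```python
def fib_buggy(n):
    # Base case from the prompt: only n <= 0 stops the recursion.
    if n <= 0:
        return n
    return fib_buggy(n - 1) + fib_buggy(n - 2)


def fib_fixed(n):
    # Corrected base case: fib(0) = 0 and fib(1) = 1 are returned directly.
    if n <= 1:
        return n
    return fib_fixed(n - 1) + fib_fixed(n - 2)


print([fib_buggy(n) for n in range(6)])  # [0, -1, -1, -2, -3, -5]
print([fib_fixed(n) for n in range(6)])  # [0, 1, 1, 2, 3, 5]
```

With the buggy base case, `fib(1)` falls through to `fib(0) + fib(-1)`, which evaluates to -1, and every later value inherits the wrong sign.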

Unit Tests

Writing unit tests often requires quite a bit of boilerplate code. Code Llama can help:


ollama run codellama "write a unit test for this function: $(cat"


Here's an example of how you could write a unit test for the `fib` function:

import unittest

class TestFib(unittest.TestCase):
    def test_fib(self):
        self.assertEqual(fib(0), 0)
        self.assertEqual(fib(1), 1)
        self.assertEqual(fib(2), 1)
        self.assertEqual(fib(3), 2)
        self.assertEqual(fib(4), 3)
        self.assertEqual(fib(5), 5)
        self.assertEqual(fib(6), 8)
        self.assertEqual(fib(7), 13)
        self.assertEqual(fib(8), 21)
        self.assertEqual(fib(9), 34)

if __name__ == '__main__':
    unittest.main()

This test case covers all the cases where the input is an integer between 0 and 9, inclusive.

Code completion

The standard code completion model will simply continue to generate subsequent tokens based on the provided prompt. This variation works best if you’re only looking to generate code, such as having the model write a function after seeing the function signature or comment:


ollama run codellama:7b-code '# A simple python function to remove whitespace from a string:'


def remove_whitespace(s):
    return ''.join(s.split())
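
Completions are worth a quick sanity check before use. A small sketch that exercises the generated function on a string mixing spaces, tabs, and newlines (the sample input is made up for illustration):

```python
def remove_whitespace(s):
    # str.split() with no arguments splits on any run of whitespace,
    # so joining the pieces drops spaces, tabs, and newlines alike.
    return ''.join(s.split())


print(remove_whitespace("  hello \t world\n"))  # helloworld
```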


Infill

Fill-in-the-middle (FIM), or more briefly, infill, is a special prompt format supported by the code completion model that lets it complete code between two already written code blocks. This is especially effective for inserting code within an existing function, such as the one below:

def compute_gcd(x, y):


    return result

With Code Llama, infill prompts must follow a special format that the model expects:

<PRE> {prefix} <SUF>{suffix} <MID>

To use this with existing code, split the code in the example above into two parts: the prefix and the suffix. For example, for our GCD example above:


ollama run codellama:7b-code '<PRE> def compute_gcd(x, y): <SUF>return result <MID>'

While results will vary, you should get something like this:


    if x == y:
        return x

    if x > y:
        x = x - y
    else:
        y = y - x

    result = compute_gcd(x, y)

Note: the model may return <EOT> at the end of the result. This is a special token that marks the end of the response, similar to <PRE>, <SUF> and <MID>.
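
When scripting infill, the prompt assembly and the <EOT> cleanup can be wrapped in small helpers. The following is a rough sketch, assuming the raw model response is already available as a string; the names `build_infill_prompt`, `clean_infill`, and `splice` are illustrative and not part of Code Llama or Ollama.

```python
EOT = "<EOT>"


def build_infill_prompt(prefix, suffix):
    """Fill in the infill template: <PRE> {prefix} <SUF>{suffix} <MID>."""
    return f"<PRE> {prefix} <SUF>{suffix} <MID>"


def clean_infill(middle):
    """Drop a trailing <EOT> marker, if the model emitted one."""
    middle = middle.rstrip()
    if middle.endswith(EOT):
        middle = middle[: -len(EOT)]
    return middle.rstrip()


def splice(prefix, middle, suffix):
    """Reassemble the full source: prefix, generated middle, then suffix."""
    return prefix + "\n" + clean_infill(middle) + "\n" + suffix


print(build_infill_prompt("def compute_gcd(x, y):", "return result"))
# <PRE> def compute_gcd(x, y): <SUF>return result <MID>
```

Only trailing whitespace is stripped from the generated middle, so its leading indentation survives and the spliced function keeps its structure.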


Python

As a thank you to the community and tooling that created the model, the authors of Code Llama included a Python variation that is fine-tuned on 100B additional Python tokens, making it a good choice for machine learning-related tooling or any other Python code:


ollama run codellama:7b-python '
# django view for rendering the current day and time without a template
def current_datetime(request):'


    now = datetime.datetime.now()
    html = "<html><body>It is now %s.</body></html>" % now
    return HttpResponse(html)

Tools built on Code Llama