Many people are unable to use a coding assistant LLM like Copilot or ChatGPT because of privacy concerns around a non-open codebase. This is true of our private WordPress.com codebase: we don’t want to be sending our secrets to OpenAI or Microsoft.
But now, with the release of Code Llama (and with a huge hat tip to llama.cpp), and thanks to the Continue VSCode extension, we can run these models directly on our own hardware.
Here’s how I did it:
- Download and install Ollama. It lets you run and serve these models in a way that Continue can use.
- Pick the model you want. 7B is the lightest, while 13B and 34B are heavier, and there are a bunch of quantized versions as well. These are from TheBloke; see for example CodeLlama-7B-GGUF and scroll down to the Provided Files table to see the size vs. performance tradeoffs.
- I chose one of the larger quantized 7B models, so I ran:
  ollama pull codellama:7b-instruct-q5_K_M
- While you’re waiting (that model is ~5GB), install the Continue VSCode extension.
- Follow the instructions on how to use Ollama in Continue. (The entire reason for this blog post is that those instructions are incomplete.) In my case, with config.py open, my Models line looks like:
  models=Models(default=Ollama(model="codellama:7b-instruct-q5_K_M"))
  (Note: Continue will add some extra stuff to it later, such as prompt_templates. There’s a fuller config.py sketch after this list.)
- Once your model is downloaded, you need to serve it. (This was my missing piece):
  ollama serve codellama:7b-instruct-q5_K_M
- You might need to reload VSCode, but you should be up and running! (There’s a quick way to check that the server is responding sketched below.)
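
For reference, here’s roughly how that Models line fits into Continue’s config.py. This is a minimal sketch based on my setup: the import paths are assumptions about the version of Continue I was using and may differ in yours, so treat the config.py that Continue generates for you as the source of truth.

```python
# Sketch of the relevant piece of Continue's config.py.
# NOTE: these import paths are assumptions; check the imports at the top of
# the config.py that Continue generated in your install.
from continuedev.src.continuedev.core.models import Models
from continuedev.src.continuedev.libs.llm.ollama import Ollama

# Point Continue's default model at the codellama model we pulled with Ollama.
models = Models(
    default=Ollama(model="codellama:7b-instruct-q5_K_M"),
)
```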
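
And here’s a quick sanity check once ollama serve is running: ask the local server to generate a few tokens over its HTTP API. This isn’t one of the steps above, just a sketch that assumes Ollama is listening on its default address, http://localhost:11434, and it uses only the Python standard library.

```python
# Ask the local Ollama server for a short completion to confirm it's serving
# the model. Assumes Ollama's default address of http://localhost:11434.
import json
import urllib.request

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps({
        "model": "codellama:7b-instruct-q5_K_M",
        "prompt": "Write a Python function that reverses a string.",
    }).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Ollama streams back one JSON object per line; print the text as it arrives.
with urllib.request.urlopen(req) as resp:
    for line in resp:
        if not line.strip():
            continue
        chunk = json.loads(line)
        print(chunk.get("response", ""), end="", flush=True)
        if chunk.get("done"):
            break
print()
```

If this prints a code suggestion, Continue should be able to reach the same model.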
