mattwiebe.blog

this blog is for testing fediverse integration from WordPress.com, but soon I can actually just use it!

  • The fediverse is sounding very cinematic today.

  • And the guy who works on federating WordPress posts to the fediverse with ActivityPub forgets to post.

  • Starting today, every site on WordPress.com can publish directly to the fediverse, powered by ActivityPub for WordPress.

  • But eventually, waiting ceases, and it’s action time.

  • My kinda rings

    Cassini’s Pale Blue Dot by NASA Goddard Photo and Video is licensed under CC-BY 2.0
  • Photo by samer daboul on Pexels.com
  • Anyway I tried to get our AI thing to give me an image, it’s too funny.

  • If this works

    then we’re really, really close 🤞🏻

  • After my last post about setting up CodeLlama, some colleagues have asked the million dollar question: how does CodeLlama compare to Copilot? My early answer is “I don’t know yet,” but here’s a useful comparison.

    I have a little shell script that lets me interface with Arcanist (arc), part of the Phabricator toolset we use internally. One of the (many) weird things about it vs something like GitHub is that your PRs (Diffs in Phabricator parlance) are independent from git branches. This is a problem when I need to update a Diff but can’t tell, from the current branch alone, which Diff I’ve been working on. git push just isn’t an option; I need to run arc diff --update DXXXXX instead. But what is the ID?

    There’s already a command for this: arc which. But its output is messy. I’m working on a Diff that adds this plugin PR to WordPress.com, fixing (inevitable) bugs along the way, and this is what arc which gives me:

    REPOSITORY
    To identify the repository associated with this working copy, arc followed this process:
    
        Configuration value "repository.callsign" is set to "WPGIT".
    
        Found a unique matching repository.
    
    This working copy is associated with the WordPress.com repository.
    
    COMMIT RANGE
    If you run 'arc diff', changes between the commit:
    
        212461c993d9d7ad  [redacted commit message]
    
    ...and the current working copy state will be sent to Differential, because
    it is the merge-base of 'origin/trunk' and HEAD, as specified by
    'git:merge-base(origin/trunk)' in your project 'base' configuration.
    
    You can see the exact changes that will be sent by running this command:
    
        $ git diff 212461c993d9d7ad..HEAD
    
    These commits will be included in the diff:
    
        60eb751ea35ecfe1  [redacted commit message]
        55dc4536770138b0  ActivityPub: add Follow Me block
    
    
    MATCHING REVISIONS
    These Differential revisions match the changes in this working copy:
    
        D120281 ActivityPub: add Follow Me block
            Reason: Commit message for '55dc4536770138b0' has explicit 'Differential Revision'.
    
    Since exactly one revision in Differential matches this working copy, it will
    be updated if you run 'arc diff'.
    

    (That last line is a lie.) I already have a shell command that cuts out the noise and just provides me with the URL and Diff ID that I want:

    arc which | grep -E 'D[0-9]+\ ' -o | awk '{ printf "https://redacted.a8c.com/%s  %s\n", $1, $1 }'
    

    Which for the above gives me:

    https://redacted.a8c.com/D120281  D120281
    

    Cool. But I decided I’d stripped out too much context: I wanted the Diff title as well as the ID, so that my output would look like this:

    ActivityPub: add Follow Me block
    https://redacted.a8c.com/D120281  D120281
    

    That was harder than expected, because awk splits fields on whitespace, so my first attempt only gave me the first word of the title:

    arc which | grep -E 'D[0-9]+\ (.*)' -o | awk '{ printf "%s\nhttps://redacted.a8c.com/%s  %s\n", $2, $1, $1 }'
    ActivityPub: 
    https://redacted.a8c.com/D120281  D120281
    

    So I decided to ask both Copilot and CodeLlama (7b-instruct-q5_K_M) for help. I kept my wording a bit vague to see how they would do.

    Spoiler: they both contributed something useful, and it was doubly interesting that CodeLlama told me I was doing it wrong.

    Prompt

    I have the following shell script, which parses an ID and the title out of arc which output.
    
    arc which | grep -E 'D[0-9]+ (.*)' -o | awk '{ printf "%s\nhttps://code.a8c.com/%s %s\n", $2, $1, $1 }'
    
    arc which line that concerns me looks like:
    
    ` D120281 ActivityPub: add Follow Me block`
    
    The trouble is that awk thinks $2 is only the first word of the remaining line, just "ActivityPub". I want the whole line in the output. How can I achieve that?
    

    Copilot response

    To include the entire title in the output, you can modify the awk command to print all fields starting from the second field. Here's the updated command:
    
    ```
    arc which | grep -E 'D[0-9]+ (.*)' -o | awk '{ printf "%s\nhttps://code.a8c.com/%s  ", $2, $1; for (i=2; i<=NF; i++) printf "%s ", $i; printf "\n" }'
    ```
    
    In this updated command, we're using a for loop to print all fields starting from the second field. The NF variable represents the total number of fields in the input, so we're looping from the second field (i=2) to the last field (i<=NF). We're then using printf to print each field separated by a space.
    
    This should output the entire title in the output, along with the ID and URL.
    

    CodeLlama response

    To get the whole line in the output, you need to use a different approach with awk. Instead of using $2 to print out the second word of the remaining line, you can use $0 to print out the entire line. Here's an updated version of your shell script that uses awk to print out the whole line:
    
    ```
    arc which | grep -E 'D[0-9]+\ (.*)' -o | awk '{ printf "https://code.a8c.com/%s  %s\n", $1, $0 }'
    ```
    
    In this version of the script, we use awk to print out the entire line ($0) as part of the URL that we want to output. We also use $1 to print out the identifier followed by one or more numbers. I hope this helps! Let me know if you have any questions.
    

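    For reference, here’s what those two awk programs print when fed the matched line D120281 ActivityPub: add Follow Me block (the # comments are my labels):

    ```
    # Copilot’s version: the full title survives, but the first line is still only its first word
    ActivityPub:
    https://code.a8c.com/D120281  ActivityPub: add Follow Me block

    # CodeLlama’s version: $0 keeps the whole line, so the ID rides along with the title
    https://code.a8c.com/D120281  D120281 ActivityPub: add Follow Me block
    ```

    Copilot’s loop recovers the whole title but still leads with just its first word; CodeLlama’s $0 keeps the entire line, ID included, glued onto the URL.
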
    Neither approach was exactly what I wanted, but I learned something from both and was able to use this to produce my desired outcome:

    arc which | grep -E 'D[0-9]+ (.*)' -o | awk '{ for (i=2; i<=NF; i++) printf "%s ", $i; printf "\n %s  https://code.a8c.com/%s\n", $1, $1; }'
    
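    Since the whole reason I want the ID is to feed it to arc diff --update, the natural last step is to wrap this up in a small function. Here’s a sketch of what that could look like; the arc_update name and the head -n 1 guard are mine, not part of the script above:

    ```
    # Hypothetical wrapper: take the first Diff ID that `arc which` reports
    # and hand it straight to `arc diff --update`.
    arc_update() {
      local id
      id=$(arc which | grep -oE 'D[0-9]+' | head -n 1)
      if [ -z "$id" ]; then
        echo "No matching Differential revision found." >&2
        return 1
      fi
      echo "Updating $id (https://code.a8c.com/$id)"
      arc diff --update "$id"
    }
    ```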

    In any case, googling for code help is probably pretty much dead. And this is the first generation of these tools, which will only get better.

  • Many people are unable to run a coding assistant LLM like Copilot or ChatGPT because of privacy concerns around a non-open codebase. This is true of our private WordPress.com codebase: we don’t want to be sending our secrets to OpenAI or Microsoft.

    But now, with the release of CodeLlama (and with a huge hat tip to llama.cpp), and thanks to the Continue VSCode extension, we can run these models directly on our own hardware.

    Here’s how I did it:

    1. Download and install Ollama. It lets you run and serve these models in a way that Continue can use.
    2. Pick the model you want. 7B is the lightest, 13B and 34B are heavier, and there are a bunch of quantized versions as well. The quantized builds come from TheBloke; see for example CodeLlama-7B-GGUF and scroll down to the Provided Files table for the size vs. performance tradeoffs.
    3. I chose the 7B instruct model with the larger q5_K_M quantization, so I ran:
      ollama pull codellama:7b-instruct-q5_K_M
    4. While you’re waiting (that model is ~5GB), install the Continue VSCode extension.
    5. Follow the instructions on how to use Ollama in Continue. (The entire reason for this blog post is that those instructions are incomplete.) In my case, with config.py open, my Models line looks like:
      models=Models(default=Ollama(model="codellama:7b-instruct-q5_K_M"))
      (Note: Continue will add some extra stuff to it later, adding prompt_templates etc.)
    6. Once your model is downloaded, you need to serve it (this was my missing piece):
      ollama serve codellama:7b-instruct-q5_K_M
    7. You might need to reload VSCode, but you should be up and running! (There’s a quick sanity check after these steps if you want to confirm the server is responding.)
    this took ~20 seconds on my M1 Pro with 16GB of RAM
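
    Here’s that quick sanity check to go with step 7. It isn’t part of the original steps, and it assumes Ollama is listening on its default port, 11434: hit the generate endpoint directly with curl, so you know the server side works before pointing any fingers at Continue. If your version of ollama serve doesn’t take a model argument, start it bare; the model named in the request gets loaded on demand, as long as it’s already been pulled.

    ```
    # Sanity check (my addition): ask the local Ollama server for a completion directly.
    # The prompt is just a placeholder.
    curl http://localhost:11434/api/generate -d '{
      "model": "codellama:7b-instruct-q5_K_M",
      "prompt": "Write a PHP function that slugifies a post title"
    }'
    ```

    If JSON starts streaming back, Ollama is doing its job and any remaining trouble is on the Continue/VSCode side.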