The fediverse is sounding very cinematic today.
This blog is for testing fediverse integration from WordPress.com, but soon I’ll actually be able to just use it!
And the guy who works on federating WordPress posts to the fediverse with ActivityPub forgets to post.
Starting today, every site on WordPress.com can publish directly to the fediverse, powered by ActivityPub for WordPress.
After my last post about setting up CodeLlama, some colleagues have asked the million-dollar question: how does CodeLlama compare to Copilot? My early answer is “I don’t know yet,” but here’s a useful comparison.
I have a little shell script that lets me interface with Arcanist (arc), part of the Phabricator toolset we use internally. One of the (many) weird things about it compared to something like GitHub is that your PRs (Diffs in Phabricator parlance) are independent of git branches. This is a problem when I need to update a Diff but don’t remember which one I’ve been working on from the current branch. A plain git push is not possible; I need to run arc diff --update DXXXXX instead. But what is the ID?
There’s already a command for this: arc which. But its output is messy. I’m working on a Diff that adds this plugin PR to WordPress.com, fixing (inevitable) bugs along the way, and this is what arc which gives me:
```
REPOSITORY
To identify the repository associated with this working copy, arc followed this process:
Configuration value "repository.callsign" is set to "WPGIT".
Found a unique matching repository.
This working copy is associated with the WordPress.com repository.

COMMIT RANGE
If you run 'arc diff', changes between the commit:
212461c993d9d7ad [redacted commit message]
...and the current working copy state will be sent to Differential, because
it is the merge-base of 'origin/trunk' and HEAD, as specified by
'git:merge-base(origin/trunk)' in your project 'base' configuration.
You can see the exact changes that will be sent by running this command:
$ git diff 212461c993d9d7ad..HEAD
These commits will be included in the diff:
60eb751ea35ecfe1 [redacted commit message]
55dc4536770138b0 ActivityPub: add Follow Me block

MATCHING REVISIONS
These Differential revisions match the changes in this working copy:
D120281 ActivityPub: add Follow Me block
Reason: Commit message for '55dc4536770138b0' has explicit 'Differential Revision'.
Since exactly one revision in Differential matches this working copy, it will
be updated if you run 'arc diff'.
```
(That last line is a lie.) I already have a shell command that cuts out the noise and just provides me with the URL and Diff ID that I want:
```
arc which | grep -E 'D[0-9]+\ ' -o | awk '{ printf "https://redacted.a8c.com/%s %s\n", $1, $1 }'
```
Which for the above gives me:
```
https://redacted.a8c.com/D120281 D120281
```
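Since the point of all this is to feed the ID to arc diff --update, the same extraction can be wrapped in a small function. This is a sketch under a few assumptions: the name current_diff_id is hypothetical, it takes the first match if arc which lists several revisions, and it is exercised here against a trimmed copy of the arc which output above rather than a live repository.

```shell
# Hypothetical helper: read `arc which` output on stdin and print the first
# Differential ID it finds (e.g. D120281), with the trailing space removed.
current_diff_id() {
  grep -Eo 'D[0-9]+ ' | head -n 1 | tr -d ' '
}

# Offline check against a trimmed copy of the output shown above.
sample='MATCHING REVISIONS
These Differential revisions match the changes in this working copy:
D120281 ActivityPub: add Follow Me block'

printf '%s\n' "$sample" | current_diff_id   # prints D120281

# Real usage, assuming arc is installed and you are on the Diff's branch:
# id=$(arc which | current_diff_id) && [ -n "$id" ] && arc diff --update "$id"
```

The [ -n "$id" ] guard avoids calling arc diff --update with an empty argument when no revision matches.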
Cool. But I decided I’d stripped too much context: I wanted the Diff title as well, so that my output would look like this:
```
ActivityPub: add Follow Me block
https://redacted.a8c.com/D120281 D120281
```
That was harder than it sounds, because awk splits fields on whitespace, so my first attempt only gave me the first word of the title:
```
arc which | grep -E 'D[0-9]+\ (.*)' -o | awk '{ printf "%s\nhttps://redacted.a8c.com/%s %s\n", $2, $1, $1 }'
```
```
ActivityPub:
https://redacted.a8c.com/D120281 D120281
```
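A quick way to see what awk is doing: by default it splits each record on runs of whitespace, so the title is spread across $2 through $NF rather than sitting in $2 alone. Using the sample line from above (the variable name line is just for illustration):

```shell
# The line as grep -o emits it, copied from the arc which output above.
line='D120281 ActivityPub: add Follow Me block'

# Print the field count, then every field on its own line.
# Prints NF=6, then $1=D120281, $2=ActivityPub:, $3=add, ... $6=block.
printf '%s\n' "$line" | awk '{
  printf "NF=%d\n", NF
  for (i = 1; i <= NF; i++) printf "$%d=%s\n", i, $i
}'
```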
So I decided to ask both Copilot and CodeLlama (7b-instruct-q5_K_M) for help. I kept my wording a bit vague to see how they would do.
Spoiler: they both contributed something useful, and it was doubly interesting that CodeLlama told me I was doing it wrong.
Here’s the prompt I gave both of them:
I have the following shell script, which parses an ID and the title out of arc which output.
```
arc which | grep -E 'D[0-9]+ (.*)' -o | awk '{ printf "%s\nhttps://code.a8c.com/%s %s\n", $2, $1, $1 }'
```
The arc which line that concerns me looks like:
` D120281 ActivityPub: add Follow Me block`
The trouble is that awk thinks $2 is only the first word of the remaining line, just "ActivityPub". I want the whole line in the output. How can I achieve that?
Copilot replied:
To include the entire title in the output, you can modify the awk command to print all fields starting from the second field. Here's the updated command:
```
arc which | grep -E 'D[0-9]+ (.*)' -o | awk '{ printf "%s\nhttps://code.a8c.com/%s ", $2, $1; for (i=2; i<=NF; i++) printf "%s ", $i; printf "\n" }'
```
In this updated command, we're using a for loop to print all fields starting from the second field. The NF variable represents the total number of fields in the input, so we're looping from the second field (i=2) to the last field (i<=NF). We're then using printf to print each field separated by a space.
This should output the entire title in the output, along with the ID and URL.
CodeLlama replied:
To get the whole line in the output, you need to use a different approach with awk. Instead of using $2 to print out the second word of the remaining line, you can use $0 to print out the entire line. Here's an updated version of your shell script that uses awk to print out the whole line:
```
arc which | grep -E 'D[0-9]+\ (.*)' -o | awk '{ printf "https://code.a8c.com/%s %s\n", $1, $0 }'
```
In this version of the script, we use awk to print out the entire line ($0) as part of the URL that we want to output. We also use $1 to print out the identifier followed by one or more numbers. I hope this helps! Let me know if you have any questions.
Neither approach was exactly what I wanted, but I learned something from both and used that to produce my desired outcome:
```
arc which | grep -E 'D[0-9]+ (.*)' -o | awk '{ for (i=2; i<=NF; i++) printf "%s ", $i; printf "\n %s https://code.a8c.com/%s\n", $1, $1; }'
```
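If you want exactly the two-line format from earlier (full title, then URL and ID), awk’s substr() can slice the title straight out of $0 instead of looping over fields, which also keeps the title’s original spacing rather than rejoining words with single spaces. This is a sketch, not the command above: it assumes each input line is the ID followed by a single space and then the title (which is what the grep -o step emits here), and the function name format_diff is hypothetical.

```shell
# Sketch: print the full title, then "URL ID", matching the desired format.
# Assumes input lines look like "D<digits> <title>" with one space after the ID.
format_diff() {
  awk '{
    id = $1
    # Everything after the ID and its single trailing space, spacing intact.
    title = substr($0, length(id) + 2)
    printf "%s\nhttps://redacted.a8c.com/%s %s\n", title, id, id
  }'
}

# Prints "ActivityPub: add Follow Me block", then
# "https://redacted.a8c.com/D120281 D120281".
printf '%s\n' 'D120281 ActivityPub: add Follow Me block' | format_diff

# Real usage, assuming arc is installed:
# arc which | grep -Eo 'D[0-9]+ .*' | format_diff
```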
In any case, googling for code help is probably pretty much dead. And this is only the first generation of these tools, which will only get better.
Many people can’t use a coding-assistant LLM like Copilot or ChatGPT because of privacy concerns about a non-open codebase. That’s true of our private WordPress.com codebase: we don’t want to be sending our secrets to OpenAI or Microsoft.
But now, with the release of CodeLlama (and a huge hat tip to llama.cpp), and thanks to the Continue VS Code extension, we can run these models directly on our own hardware.
Here’s how I did it:
Pull the model:
```
ollama pull codellama:7b-instruct-q5_K_M
```
With Continue’s config.py open, my Models line looks like:
```
models=Models(default=Ollama(model="codellama:7b-instruct-q5_K_M"))
```
(… prompt_templates etc.)
Then serve it:
```
ollama serve codellama:7b-instruct-q5_K_M
```