mattwiebe.blog

this blog is for testing fediverse integration from WordPress.com, but soon I can actually just use it!

Comparing Meta’s CodeLlama to GitHub’s Copilot

After my last post about setting up CodeLlama, some colleagues have asked the million-dollar question: how does CodeLlama compare to Copilot? My early answer is “I don’t know yet,” but here’s a useful comparison.

I have a little shell script that lets me interface with Arcanist (arc), part of the Phabricator toolset we use internally. One of the (many) weird things about it vs something like GitHub is that your PRs (Diffs in Phabricator parlance) are independent from git branches. This is a problem when I need to update a Diff but don’t know which Diff the current branch belongs to. git push is just not possible; I need to run arc diff --update DXXXXX instead. But what is the ID?

There’s already a command for this: arc which. But its output is messy. I’m working on a Diff that adds this plugin PR to WordPress.com, fixing (inevitable) bugs along the way, and this is what arc which gives me:

REPOSITORY
To identify the repository associated with this working copy, arc followed this process:

    Configuration value "repository.callsign" is set to "WPGIT".

    Found a unique matching repository.

This working copy is associated with the WordPress.com repository.

COMMIT RANGE
If you run 'arc diff', changes between the commit:

    212461c993d9d7ad  [redacted commit message]

...and the current working copy state will be sent to Differential, because
it is the merge-base of 'origin/trunk' and HEAD, as specified by
'git:merge-base(origin/trunk)' in your project 'base' configuration.

You can see the exact changes that will be sent by running this command:

    $ git diff 212461c993d9d7ad..HEAD

These commits will be included in the diff:

    60eb751ea35ecfe1  [redacted commit message]
    55dc4536770138b0  ActivityPub: add Follow Me block


MATCHING REVISIONS
These Differential revisions match the changes in this working copy:

    D120281 ActivityPub: add Follow Me block
        Reason: Commit message for '55dc4536770138b0' has explicit 'Differential Revision'.

Since exactly one revision in Differential matches this working copy, it will
be updated if you run 'arc diff'.

(That last line is a lie.) I already have a shell command that cuts out the noise and just provides me with the URL and Diff ID that I want:

arc which | grep -E 'D[0-9]+\ ' -o | awk '{ printf "https://redacted.a8c.com/%s  %s\n", $1, $1 }'

Which for the above gives me:

https://redacted.a8c.com/D120281  D120281

Cool. But I decided that I’d stripped too much context: I wanted the Diff title as well, so that my output would look like this:

ActivityPub: add Follow Me block
https://redacted.a8c.com/D120281  D120281

That was hard because awk splits fields on whitespace, so my first attempt gave me only the first word of the title:

arc which | grep -E 'D[0-9]+\ (.*)' -o | awk '{ printf "%s\nhttps://redacted.a8c.com/%s  %s\n", $2, $1, $1 }'
ActivityPub: 
https://redacted.a8c.com/D120281  D120281
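
To see why $2 stops at the first word, it can help to look at how awk splits the line. This is a minimal sketch, assuming the line has already been through the grep step above. The `show_fields` helper is a hypothetical name used only for illustration:

```shell
# Hypothetical helper: print the field count, then each field on its own line.
show_fields() {
  awk '{ print NF; for (i = 1; i <= NF; i++) print i ": " $i }'
}

# Sample line copied from the arc which output earlier in this post.
printf '%s\n' 'D120281 ActivityPub: add Follow Me block' | show_fields
```

awk's default field separator is whitespace, so the title is spread across fields 2 through 6 and $2 holds only `ActivityPub:`.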

So I decided to ask both Copilot and CodeLlama (7b-instruct-q5_K_M) for help. I kept my wording a bit vague to see how they would do.

Spoiler: they both contributed something useful, and it was doubly interesting that CodeLlama told me I was doing it wrong.

Prompt

I have the following shell script, which parses an ID and the title out of arc which output.

arc which | grep -E 'D[0-9]+ (.*)' -o | awk '{ printf "%s\nhttps://code.a8c.com/%s %s\n", $2, $1, $1 }'

arc which line that concerns me looks like:

` D120281 ActivityPub: add Follow Me block`

The trouble is that awk thinks $2 is only the first word of the remaining line, just "ActivityPub". I want the whole line in the output. How can I achieve that?

Copilot response

To include the entire title in the output, you can modify the awk command to print all fields starting from the second field. Here's the updated command:

```
arc which | grep -E 'D[0-9]+ (.*)' -o | awk '{ printf "%s\nhttps://code.a8c.com/%s  ", $2, $1; for (i=2; i<=NF; i++) printf "%s ", $i; printf "\n" }'
```

In this updated command, we're using a for loop to print all fields starting from the second field. The NF variable represents the total number of fields in the input, so we're looping from the second field (i=2) to the last field (i<=NF). We're then using printf to print each field separated by a space.

This should output the entire title in the output, along with the ID and URL.

CodeLlama response

To get the whole line in the output, you need to use a different approach with awk. Instead of using $2 to print out the second word of the remaining line, you can use $0 to print out the entire line. Here's an updated version of your shell script that uses awk to print out the whole line:

```
arc which | grep -E 'D[0-9]+\ (.*)' -o | awk '{ printf "https://code.a8c.com/%s  %s\n", $1, $0 }'
```

In this version of the script, we use awk to print out the entire line ($0) as part of the URL that we want to output. We also use $1 to print out the identifier followed by one or more numbers. I hope this helps! Let me know if you have any questions.
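
For reference, here is a sketch of just the awk stage of each suggestion, run against the sample line from the prompt. It assumes the grep step has already trimmed the line down to the ID and title. The function names and the `sample` variable are made up for this illustration:

```shell
# Assumption: this is what the grep -o step hands to awk for the sample Diff.
sample='D120281 ActivityPub: add Follow Me block'

# awk stage from the Copilot response, unchanged.
copilot_version() {
  awk '{ printf "%s\nhttps://code.a8c.com/%s  ", $2, $1; for (i=2; i<=NF; i++) printf "%s ", $i; printf "\n" }'
}

# awk stage from the CodeLlama response, unchanged.
codellama_version() {
  awk '{ printf "https://code.a8c.com/%s  %s\n", $1, $0 }'
}

printf '%s\n' "$sample" | copilot_version
printf '%s\n' "$sample" | codellama_version
```

The Copilot version still prints the lone first word `ActivityPub:` on its own line, then the URL followed by the full title with a trailing space. The CodeLlama version prints a single line: the URL followed by the whole matched line, ID included.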

Neither approach was exactly what I wanted, but I learned something from both and was able to combine them to produce my desired outcome:

arc which | grep -E 'D[0-9]+ (.*)' -o | awk '{ for (i=2; i<=NF; i++) printf "%s ", $i; printf "\n %s  https://code.a8c.com/%s\n", $1, $1; }'
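
Note that the command above prints the title with a trailing space, then the ID before the URL. If the exact layout sketched earlier is wanted (title on one line, then URL and ID), one possible variant strips the ID off $0 instead of looping over fields. This is a sketch under that assumption, not the command used above, and `diff_summary` is a hypothetical name:

```shell
# Hypothetical helper: expects lines shaped like "D120281 Some title here".
diff_summary() {
  awk '{
    id = $1
    title = $0
    sub(/^[^ ]+ +/, "", title)  # drop the ID and the spaces that follow it
    printf "%s\nhttps://code.a8c.com/%s  %s\n", title, id, id
  }'
}

# Sample line standing in for the output of: arc which | grep -E "D[0-9]+ (.*)" -o
printf '%s\n' 'D120281 ActivityPub: add Follow Me block' | diff_summary
```

In real use the sample line would be replaced by piping the grep output from the command above into `diff_summary`.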

In any case, Googling for code help is probably all but dead, and this is only the first generation of these tools. They will only get better.
