Notes from trying AI coding tools
I spent some time investigating how useful different tools (apart from gpt-engineer!) are for enhancing one’s personal coding workflow.
Below are my notes on what I learned from using them.
I tried the following tools:
Mentat
Aider
Sweep
Mentat
I didn’t get Mentat to work.
Generally, the --exclude flag didn’t seem to work as expected, so I couldn’t get the included files down to the “8k token limit”, and there was no support for the 32k GPT model either.
With some workarounds in place, I got a “No such file or directory: …” error and gave up.
I would love to see Mentat flourish, but it launched quite a while ago, activity is low, and obvious things don’t seem to get fixed, so I don’t have high hopes.
One nice thing about it: it prints the total cost after each run.
Conclusion: Many issues, would rate it 2/5
Aider
The first time I used Aider, it froze without giving me any explanation as to why. I believe it was the ctags indexing of a too-large codebase.
On a smaller codebase it worked better.
Aider features a nice REPL where Unix keyboard shortcuts work, and it autocompletes file names based on ctags indexing. The autocomplete stands out as stellar.
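For a sense of how ctags-backed autocomplete can work: ctags writes a plain-text `tags` file mapping symbols to the files they live in, so completion can be as simple as scanning that file for prefix matches. Here is a minimal sketch in Python, assuming a `tags` file generated with `ctags -R .`; this is my own illustration, not Aider’s actual implementation:

```python
# Minimal sketch of ctags-backed autocomplete -- not Aider's actual code.
# Assumes a `tags` file generated by e.g. `ctags -R .` in the repo root.

def load_tags(path="tags"):
    """Parse symbol and file names out of a ctags `tags` file."""
    names = set()
    with open(path) as f:
        for line in f:
            if line.startswith("!_TAG_"):  # skip the ctags metadata header
                continue
            fields = line.split("\t")
            if len(fields) >= 2:
                names.add(fields[0])  # the symbol name
                names.add(fields[1])  # the file it was found in
    return names

def complete(prefix, names):
    """Return every known name starting with the typed prefix."""
    return sorted(n for n in names if n.startswith(prefix))

# Usage: complete("mai", load_tags()) might return ["main", "main.py"]
```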
It generates diffs to your codebase and commits them to your git repo.
So far, I’ve found the “ask GPT-4 to produce a diff” part reliable. The actual reasoning, perhaps less so.
Since the diffs get applied as git commits, it’s easy to check what has changed and roll back any changes, for those who are proficient with git.
Aider is a sharp tool. It’s not as general as, for example, Sweep, but it’s intuitive to understand what it does.
When I used it, it didn’t fully work (it didn’t add the necessary import), but the nice UX goes a long way.
Conclusion: Haven’t proved that it enhances my workflow, but the UX is very polished; I would rate it 4/5
Sweep
Sweep monitors your GitHub issues and will submit a PR to fix any issue labeled “sweep”.
This works pretty well for super simple things, like adding type hints to a codebase.
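For illustration, this is the scale of change I mean; a toy example of my own, not actual Sweep output:

```python
# Before: an untyped helper.
def total(prices, tax_rate):
    return sum(prices) * (1 + tax_rate)

# After: the kind of mechanical, low-risk edit Sweep handles well.
def total(prices: list[float], tax_rate: float) -> float:
    return sum(prices) * (1 + tax_rate)
```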
For more complex things it doesn’t work so well, and in my experience the PRs will typically not pass CI either. I was already sceptical of generating PRs, simply because you want faster human <> AI feedback loops than a PR allows. The CI/CD problem discovered here makes this even clearer.
On the technical side: Sweep uses embedding search to find the right code snippets. It has a “parallel mode” that gets activated when it judges that things should run in parallel over basically all files, e.g. when “migrating” things, and for other “LLM-based find and replace” commands.
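To make the embedding-search idea concrete, here is a minimal sketch. The bag-of-tokens “embedding” is a toy stand-in for real LLM embeddings, and none of the names or logic reflect Sweep’s actual internals:

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: a bag-of-tokens count vector.
    Real tools use dense LLM embeddings instead."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values()))
    norm *= math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def search(query, snippets, k=3):
    """Rank code snippets by similarity to the issue text, keep the top k."""
    q = embed(query)
    return sorted(snippets, key=lambda s: cosine(q, embed(s)), reverse=True)[:k]

# The issue title/body is the query; chunked source files are the snippets.
# The top-k snippets become the context the LLM is asked to edit.
snippets = [
    "def parse_config(path): ...",
    "def send_email(to, body): ...",
    "class ConfigLoader: ...",
]
print(search("fix config parsing bug", snippets, k=2))
```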
Conclusion: It’s very simple to use, but most of the time it’s not useful. I would rate it 3/5
Other projects to mention
Cursor.io: I’ve tried this one and it’s very promising. Not open source, though.
Continue is also an AI-first IDE, and it is open source! I’m looking forward to investigating it.
CodeGPT: An IDE extension that I get a lot of use out of via the select text → “Custom prompt” functionality.
Takeaways
There are some nice features I liked while trying the tools out, especially on the UX side, such as auto-commits, autocomplete for files, and printing the total OpenAI cost.
Needless to say, the gpt-engineer community is already working on leveling up the UX there and will probably take inspiration from a few of these.
What I’m looking forward to most, though: all of these tools will ultimately become extremely potent once the LLMs that back them are inevitably upgraded again.