
After reading this Twitter thread and watching the talk it was referencing I can't help thinking that a chat/text interface to AI is lazy product design.

https://twitter.com/transitive_bs/status/1646778220052897792

https://www.youtube.com/watch?v=rd-J3hmycQs

Features like their "AI Fix" feel much more like the direction things should go: providing context-based actions without requiring user input.

E.g. use the database schema, existing code, etc. as context to suggest actions without the user having to type it out.



Problem is, generative AI sucks at this. I've had multiple instances where I'd tell it something like "you did this part of the code wrong, fix it like this", it would go "sorry for the confusion, here's the code with your suggestion", and then repeat the initial code or do something else entirely. GPT-4, Copilot X, JetBrains AI: I've seen this problem in all of them.

Personally, Copilot is a magical typing speedup for me, and that got me enthusiastic about the next step. But it looks like the next step is going to require another breakthrough, considering GPT-4's hardware requirements and actual performance.

A chat interface as a search bot is the only use case outside of Copilot where I've found it useful: regurgitating the relevant part of internal documentation, like search on steroids. Even 3.5 is relatively decent at this.
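A minimal sketch of that search-bot pattern, with naive term-overlap scoring standing in for real embedding search (all names here are made up for illustration):

```python
import re

def top_chunk(query: str, chunks: list[str]) -> str:
    """Return the documentation chunk sharing the most words with the query.

    Naive term overlap; a real system would use embeddings, but the shape
    is the same: retrieve a chunk, then paste it into the chat prompt.
    """
    q = set(re.findall(r"\w+", query.lower()))
    return max(chunks, key=lambda c: len(q & set(re.findall(r"\w+", c.lower()))))

docs = [
    "Deploy: run `make deploy` after tagging a release.",
    "Auth: API tokens are rotated every 90 days.",
]
# The selected chunk then becomes the context the model "regurgitates" from.
best = top_chunk("how do I deploy a release?", docs)
```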


I completely disagree. I've found that GPT-4 and Copilot Chat generate good suggestions most of the time. What kind of things are you trying to use codegen for?

You also need to provide adequate context. I don't think source code alone is good enough context most of the time (e.g. you often need the error message as well), unless you're pulling in multiple chunks of code.
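For example, a sketch of pairing code with its error message in the chat payload (illustrative only; the message shapes follow the OpenAI chat format, and no API call is made here):

```python
def make_messages(code: str, error: str) -> list[dict]:
    """Pair failing code with its error message, which usually gives the
    model far more to work with than the source alone."""
    return [
        {"role": "system",
         "content": "You are a debugging assistant. Explain the error, then give a fixed version."},
        {"role": "user",
         "content": f"```\n{code}\n```\n\nError:\n{error}"},
    ]

msgs = make_messages(
    "with open('config.json') as f:\n    cfg = f.read()",
    "FileNotFoundError: [Errno 2] No such file or directory: 'config.json'",
)
```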


To me it sounds like what's described is similar to my own interactions with ChatGPT, where:

1. Write a prompt, get a full script in response (good! as I expected)

2. Realize something is missing in the prompt, or want to improve a specific function.

3. Prompt like "For function foo_bar(), give me just an updated version of that function that adds exception handling for missing files".

4. ChatGPT ignores the (admittedly only implied in my language above) "just that function" part, and rewrites the entire script.

While usually it only modifies the part of the script I asked it to, it's annoying because (1) it's slower to give me the whole thing back and (2) I need to do additional checks in case the function I care about depends on things outside of it that might have changed.
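One way to automate check (2) is to diff everything except the function you asked about; any leftover diff lines mean the model changed code outside it. A sketch (assumes top-level functions and Python 3.8+ for `end_lineno`):

```python
import ast
import difflib

def changes_outside_function(old_src: str, new_src: str, func_name: str) -> list[str]:
    """Diff two versions of a script with the target function removed, so any
    remaining +/- lines mean code outside that function was touched.

    Sketch only: handles top-level functions, not methods or nested defs.
    """
    def strip_func(src: str) -> str:
        lines = src.splitlines()
        for node in ast.parse(src).body:
            if isinstance(node, ast.FunctionDef) and node.name == func_name:
                del lines[node.lineno - 1 : node.end_lineno]
                break
        return "\n".join(lines)

    return [
        line
        for line in difflib.unified_diff(
            strip_func(old_src).splitlines(),
            strip_func(new_src).splitlines(),
            lineterm="",
        )
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]

old = "x = 1\ndef foo_bar():\n    return x\n"
# Model edited only foo_bar(): nothing outside it changed.
ok = "x = 1\ndef foo_bar():\n    try:\n        return x\n    except FileNotFoundError:\n        return None\n"
# Model also rewrote the module-level code: should be flagged.
bad = "x = 2\ndef foo_bar():\n    return x\n"
```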

I've had this happen a few times, so I'm in the habit of not trying to iterate with ChatGPT when asking it for scripts, and instead just doing the rest myself.

(Disclaimers: I haven't tried too hard to find a solution to this, so there might be one; I also don't generally combine ChatGPT with Copilot, just one or the other on different projects; and I haven't tried other forms of using GPT-4...)


Are you using GPT-4? I've found it very responsive to follow-up suggestions. I don't re-specify the whole ask though, just ask for the changes like "can you add error handling for missing files?".

This is a small recent example: https://chat.openai.com/share/2aa979ca-c796-4dfa-ae13-b21d39...

And another example where I ask ChatGPT to remove some error handling: https://chat.openai.com/share/90ce6336-f35c-40fc-8fce-baefc5...

GitHub Copilot seems a bit more temperamental. If it doesn't give me something decent the first time, I don't bother trying to follow up.


It blurs together at this point. I've been using GPT-4 for a few months now, was just using 3.5 for months prior to that.

I'm also not doing too much, volume-wise; maybe about a script/week. I'll keep this in mind in future work!


"Fix this code" is very broad; it will likely do better with more of a prompt:

"You will be given a section of code wrapped in ```. Your task is to analyze this code for any possible issues. You should list any issues you find and why you believe each is an issue, then explain how to correct it."


But the context is there. E.g. I get a bad suggestion from the LLM, I suggest a fix, it acknowledges the correction, and then it ignores the instructions when generating code.


If you have the ability, you can try changing presence_penalty and frequency_penalty.
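Both are real parameters on OpenAI's chat completions API, ranging from -2.0 to 2.0; this sketch only builds the request body (no network call), and the 0.5 values are an arbitrary starting point:

```python
def build_request(messages: list[dict]) -> dict:
    """Chat request body with repetition-discouraging penalties.

    Positive values push the model away from tokens it has already emitted,
    which can help when it keeps echoing the original (wrong) code back.
    """
    return {
        "model": "gpt-4",
        "messages": messages,
        "presence_penalty": 0.5,   # penalize any token that has appeared at all
        "frequency_penalty": 0.5,  # penalize tokens in proportion to how often
    }

req = build_request([{"role": "user", "content": "Fix the bug we discussed."}])
```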


I had a similar thought, but I would like to maintain control over what the LLM has access to. It would be difficult to prevent it from seeing things you might temporarily have hardcoded or generally do not want to expose.

I started working on something for this purpose called j-dev [0]. It started as a fork of smol-dev [1], which basically gets GPT to write your entire project from scratch. And then you would have to iterate on the prompt, nuking and re-writing everything each time, filling in increasingly complicated statements like "oh, except in this function make sure you return a promise".

j-dev is a CLI with a prompt similar to the one in the parent article [2]. You start with a prompt, and the CLI fills in the directory contents (excluding anything in .gitignore). Then it requests access to the files it thinks it needs, and it can edit, delete, or add files, or ask for follow-up based on your response. It also does things like making sure a file exists, showing you a git diff of changes, and responding when the LLM is not following the system prompt.
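The gitignore filtering step might look roughly like this (a deliberately simplified sketch using fnmatch; real .gitignore semantics also handle anchoring and negation, and `visible_files` is a hypothetical name, not j-dev's actual code):

```python
import fnmatch

def visible_files(paths: list[str], gitignore_patterns: list[str]) -> list[str]:
    """Filter a file listing against .gitignore-style patterns.

    Simplified: a pattern can match the whole path or any path component.
    """
    def ignored(path: str) -> bool:
        for pat in gitignore_patterns:
            if fnmatch.fnmatch(path, pat):
                return True
            if any(fnmatch.fnmatch(part, pat) for part in path.split("/")):
                return True
        return False

    return [p for p in paths if not ignored(p)]

files = visible_files(
    ["src/index.ts", "node_modules/left-pad/index.js", "dist/bundle.js"],
    ["node_modules", "dist"],
)
```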

It also addresses the problem that a lot of these tools eat up way too many tokens, so a single prompt to something like smol-dev could cost a few dollars on every iteration.
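For a back-of-the-envelope sense of why, here's a rough sketch using the ~4 characters/token rule of thumb (real counts need a tokenizer such as tiktoken, and $0.03/1K prompt tokens was GPT-4's 8k-context input price at the time of writing):

```python
def estimate_prompt_cost(text: str, usd_per_1k_tokens: float = 0.03) -> float:
    """Rough prompt cost estimate: ~4 characters per token, priced per 1K.

    A tool that re-sends a whole repo on every iteration pays this each time.
    """
    tokens = len(text) / 4
    return tokens / 1000 * usd_per_1k_tokens

# Re-sending ~200 KB of source in one prompt is already about $1.50.
cost = estimate_prompt_cost("x" * 200_000)
```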

It's still very much a work in progress and I'll probably do a Show HN next week, but I would love some feedback.

[0] https://github.com/breeko/j-dev

[1] https://github.com/smol-ai/developer

[2] https://github.com/breeko/j-dev/blob/master/src/prompts/syst...


Sounds cool! I'm looking forward to the Show HN. Would definitely recommend recording a video of it in action. A video makes it much easier to understand what the tool can do.


> video makes it much easier to understand what the tool can do

For you.

Long before YouTube versus WikiHow, we had "visual learners", "practical learners", and those who prefer learning from clear writing.

So it's probably not just some generational or way-of-thinking divide: there's a cohort of personas who far prefer text to rapidly understand what things do, and a cohort who prefer someone to "show and tell"[^1].

That said, the balance does seem to have shifted in recent decades, perhaps for Americans around the same time (correlation not causation) as free play outside and big three over-the-air TV networks gave way to helicopter parenting of fully programmed days and/or the first 150 channel MTV generation.

[^1]: "show and tell" (1954?) - https://en.wikipedia.org/wiki/Show_and_tell


Have you seen a lot of non-lazy design in the last decade? It’s one trend after another and a general tendency for the UI to offer the bare minimum in functionality and call it design.


I agree - “fixes” are a cool opportunity. I’ve created a runtime analysis of code execution that spots certain types of flaws and anti-patterns (that static analyzers can't find). Then I’m using a combination of the execution trace, the code, and finding metadata (e.g. an OWASP URL) to create a prompt. The AI responds with a detailed description of the problem (in the style of a PR comment) and a suggested fix. Here’s a short video of it in action; lmk what you think.

https://www.loom.com/share/969562d3c0fd49518d0f64aecbddccd6?...


It looks quite powerful. I would focus on adoption and usability over adding any more features. I feel like there's a lot of value there already, but I'm not exactly sure how I'd integrate it into my workflow.

The CI integration sounds like the most interesting part to me, since I usually let things fail in CI then go back and fix them.

It's kind of in an interesting spot because it's not instant feedback like a linter/type checker, but only running it in CI feels like a waste of potential.

I hope it becomes a successful product!


Thanks for the advice! I agree with your characterization as somewhere between “instant” and “too late” (e.g. in prod) feedback. We are focusing on the code editor and GitHub Actions at the moment, for example figuring out what happens after the GitHub Action identifies a problem. Do you try to fix it directly in the browser, or go back to the code editor to inspect the issue and work on the AI-assisted fix? Fixing “in browser” feels awkward to me, but I have seen some videos of Copilot X doing this, so maybe it’s possible? Working with the code back in the editor is of course much more powerful, but it takes some work to set up the context locally to work on the fix. Wdyt?



