The author starts out with an excellent observation:
> Lately, I've been playing around with LLMs to write code. I find
> that they're great at generating small self-contained snippets.
> Unfortunately, anything more than that requires a human...
I have been working on this problem quite a bit lately. I put together a writeup describing the solution that's been working well for me:
The problem I am trying to solve is that it’s difficult to use GPT-4 to modify or extend a large, complex pre-existing codebase. To modify such code, GPT needs to understand the dependencies and APIs which interconnect its subsystems. Somehow we need to provide this “code context” to GPT when we ask it to accomplish a coding task. Specifically, we need to:
1. Help GPT understand the overall codebase, so that it can decipher the meaning of code with complex dependencies and generate new code that respects and utilizes existing abstractions.
2. Convey all of this “code context” to GPT in an efficient manner that fits within the 8k-token context window.
To address these issues, I send GPT a concise map of the whole codebase. The map includes all declared variables and functions with call signatures. This "repo map" is built automatically using ctags and enables GPT to better comprehend, navigate and edit code in larger repos.
The writeup linked above goes into more detail, and provides some examples of the actual map that I send to GPT as well as examples of how well it can work.
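For the curious, here's a minimal sketch of how a map like this can be built. The JSON output and signature field are universal-ctags features, but the map format below is illustrative and simpler than what aider actually sends:

```python
import json
import subprocess

# Ask universal-ctags for machine-readable tags, including signatures (+S).
out = subprocess.run(
    ["ctags", "-R", "--output-format=json", "--fields=+S", "."],
    capture_output=True, text=True, check=True,
).stdout

# Build a compact per-file map of "kind name(signature)" entries.
repo_map: dict[str, list[str]] = {}
for line in out.splitlines():
    tag = json.loads(line)
    if tag.get("_type") != "tag":
        continue
    entry = f"{tag.get('kind', '?')} {tag['name']}{tag.get('signature', '')}"
    repo_map.setdefault(tag["path"], []).append(entry)

# Print the map, which is what gets included in GPT's context.
for path, entries in sorted(repo_map.items()):
    print(path)
    for e in entries:
        print(f"  {e}")
```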
I've been hearing executives and "tech leaders" recently saying that 80% of new code is now written by ChatGPT, and that it will "10x" a developer, but that sure doesn't match my experience. I suspect there will be a lot of managers with much higher expectations than is reasonable, which won't be good.
It's actually 1.2x productivity. Writing code does not take most of the day. If GPT can be just as good at debugging as it is at writing code, maybe the speedup would increase a bit. The ultimate AI speed == human reading + thinking speed, not AI generation speed.
I’ve found it’s very useful for understanding snippets of code, like a 200-line function. But systems are more than “lots of 200-line functions” - there’s a lot of context hidden in flags, conditional blocks, data, Git histories.
Maybe one day, we’ll be able to run it over all the Git histories, Jira tickets, Confluence documentation, Slack conversations, emails, meeting transcripts, presentations, etc. Until then, the humans will need to stitch it all together as best they can.
I can't think of a reason you couldn't specifically train an AI on your own large code base. After all, current LLMs are trained on effectively the entire internet.
Unless your documentation is 20~100x the size of your codebase and written in a conversational tone, you won't be able to ask the LLM questions about it in English.
If your only aim is to use it like Copilot, sure, it's useful.
You might be able to fine-tune a model on pull requests if they have really high-quality descriptions, high-quality commit messages, and the code is well documented and organized.
I haven't yet seen anything that can scan an entire codebase and build, say, a data lineage to understand how a value in the UI was calculated. I'm sure it's coming, though.
but not all human thinking is worthwhile. I had it do a simple chrome extension to let me reply inline on HN, and it coughed out a manifest.json that worked first try. I didn't have to poke around the Internet to find a reference and then debug that via stack overflows. Easily saved me half an hour and gave me more mental bandwidth for the futzing with the DOM that I did need to do. (to your point tho, I didn't try feeding the html to it to see if it could do that part for me.)
so it's somewhere between 1.2x and 10x for me, depending on what I'm doing that day. Maybe 3x on a good day?
I don't mean to pick on you specifically, but this kind of approach doesn't fit the way I like to work.
For example, just because the manifest.json worked doesn't mean it is correct - is it free of issues (security or otherwise)?
I would argue that every system in production today seemed to "just work" when it was built and initially tested, and yet how many serious issues are in the wild today (security or otherwise)?
I prefer to take a little more time solving problems, gaining an understanding of WHY things are done certain ways, and some insight into potential problems that may arise in the future even if something seems to work at first glance.
Now I get that you are just talking about a small chrome extension that maybe you are only using for yourself... but scaling that up to anything beyond that seems like a ticking time bomb to me.
I feel like you would get more benefit out of GPT. You could ask it if it finds any vulnerabilities, common mistakes, other inconsistencies. Please provide comments on what each line does. What are some other common ways to write this line of code, etc.
What are some ways to handle this XYZ problem? I see you might have missed SQL injection attacks. Would that apply here?
Same goes for code you find on the internet.
I got this output for this line of code; what do you think the problem is?
Even OpenAI says clearly: you should not, by any means, ask the AI any questions you don't already know the answer to!
> more benefit out of GPT. You could ask it if it finds any vulnerabilities, common mistakes, other inconsistencies. Please provide comments on what each line does. What are some other common ways to write this line of code, etc.
And then it spits out some completely made-up bullshit…
How would you know if you don't understand what you're actually doing?
Every time I've tried chatgpt I've been shocked at the mistakes. It isn't a good tool to use if you care about correctness, and I care about correctness.
It may be able to regurgitate code for simple tasks, but that's all I've seen it get right.
Most managers don't care for people like you. Companies sell their product. Another successful fiscal year. Regardless of the absolute shit code base, wasteful architecture, and gaping security vulnerabilities.
Maybe, but I've always had lucrative jobs and my work has always been appreciated. Maybe you just have to find the right employer. I think longer term, employers that value high-quality work will have the upper hand.
And to be honest, I don't care for managers like that, so the feeling is mutual.
I think it is worthwhile to consider the multiplier relative to the alternatives.
That is, SERPs providing relevant discussion and exact or largely turnkey solutions.
On “easy” tasks in technical niches I’m not familiar with, I would take GPT over DDG + SO the majority of the time.
I’ve had situations where I wanted a tutorial or walkthrough on something and had to mix together various Substack and independent blog posts.
There is no consistency in quality or even correctness from those sources, while you must also deal with format and styling variation.
I block adtech within reason, but the problem of greyhat or SEO-focused filler also isn’t really a thing in GPT. You just get the fat of the land.
The biggest problem with GPT-4 is the cutoff date. The LLM needs to be updated regularly, the way Google initially seemed to crawl all the things and make them available to queries as they appeared.
Data recency and the ability to process excess tokens are going to show OP’s test to be but a toy example of what these systems can do.
In the absence of constant updates of a massive model and all that entails, I foresee companies temporarily dominating attention by providing pretrained LoRA-like add-ons to LLMs at key events.
For example, Apple could release a new model trained on all of the updated Swift libraries coming to market at WWDC shortly after the keynote.
It could contain all the docs and example code, and allow devs to really, really experiment and get the warm and fuzzies about the newest stuff.
It could even include the details on product announcements and largely handle questions of compatibility.
If the companies hosted the topic focused chatgpt-like bots, they could also own all the unexpected questions, and both clarify and retrain on what the most enthusiastic users want to know.
This is going in a bit of a different direction, but I think all of this is very exciting and will hasten the delivery of software for brand new SDKs.
> The biggest problem with GPT-4 is the cutoff date. The LLM needs to be updated regularly, the way Google initially seemed to crawl all the things and make them available to queries as they appeared.
Have you tried out Phind? It's essentially GPT-4 + internet access. It hasn't been perfect, but it's been a very useful tool for me.
Also, chances are great that the AI just spit out some code from that extension… (Of course without attribution. Which would make it a copyright violation.)
Definitely this. I spend about 10-15% of my time writing code, so a 20% increase really doesn't save me a lot of time. Also, AI-generated code requires more reading of code, which is harder and more cognitively expensive than writing code.
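A quick back-of-the-envelope check (Amdahl's law) bears this out; the 15% share and the 2x coding speedup below are assumptions, not measurements:

```python
# Amdahl's law: overall speedup when only part of the work is accelerated.
coding_fraction = 0.15  # assumed share of the day spent writing code
coding_speedup = 2.0    # assumed AI speedup on just that share

overall = 1 / ((1 - coding_fraction) + coding_fraction / coding_speedup)
print(f"{overall:.2f}x overall")  # ~1.08x, nowhere near 10x
```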
Depends on the code. If AI can quickly write even mediocre quality automated tests, that's a tremendous speedup, and, if I'm being totally honest, morale boost.
Throwing away and rewriting tests is work (== negative value).
Testing implementation details is always counterproductive. Have you watched the video? (I don't often recommend videos, as I think writing has more value per time unit, but this talk is a kind of classic on that topic.)
Sorry, I can't accept that a one hour video is the cost to participate in this conversation. I don't disagree that there is a cost associated with doing this, but the cost is so much smaller than it used to be that it can be economical now.
Implementation details are in the eye of the beholder IMO. I'm open to reasons why that's not the case here.
I use it pretty regularly for things like “write me a typed retry decorator in python, and write tests for it” or “parallelize this for loop using a ThreadPoolExecutor”
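For reference, a minimal sketch of the kind of typed retry decorator that first prompt tends to yield (names and defaults are illustrative; Python 3.10+ for ParamSpec):

```python
import functools
import time
from typing import Callable, ParamSpec, TypeVar

P = ParamSpec("P")
T = TypeVar("T")

def retry(tries: int = 3, delay: float = 0.5) -> Callable[[Callable[P, T]], Callable[P, T]]:
    """Retry a function up to `tries` times, sleeping `delay` seconds between attempts."""
    def decorator(func: Callable[P, T]) -> Callable[P, T]:
        @functools.wraps(func)
        def wrapper(*args: P.args, **kwargs: P.kwargs) -> T:
            for attempt in range(1, tries + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == tries:
                        raise
                    time.sleep(delay)
            raise AssertionError("unreachable")  # satisfies the type checker
        return wrapper
    return decorator

@retry(tries=5, delay=0.1)
def flaky_fetch() -> str:
    """Stand-in for a call that sometimes fails."""
    raise TimeoutError("simulated flakiness")
```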
I work with people who have used it almost daily since it came out. None of them have 10x’d anything. Even their open source ChatGPT related projects are stalled and not going anywhere. Not to say it can’t help but I’ve not yet encountered this mythical 10x boost.
As others have observed, maybe most have stopped using it.
Maybe Ray Kurzweil is kind of on the money about computer brain interfaces, that’s when the fun really starts.
No worry. As soon as 80%+ of those managers are replaced by ChatGPT, such nonsense claims will stop, as ChatGPT knows things better than most managers. /s
I did not know what ctags were; here is the explanation from Exuberant Ctags:
>Ctags generates an index (or tag) file of language objects found in source files that allows these items to be quickly and easily located by a text editor or other utility. A tag signifies a language object for which an index entry is available (or, alternatively, the index entry created for that object).
I'm a big fan of ctags. My old Emacs setup utilized Ctags when all else failed. So if I wanted to find a reference for something it would use LSP, then ctags if LSP returned nothing.
I've been thinking that there needs to be a LangChain or similar in-prompt Tool that allows the model to automatically query a Language Server Protocol server. Maybe your tool does everything that could be done that way anyhow. aider looks interesting, I will try it out.
Absolutely, LSP is strictly more capable than ctags for this purpose. I mention it in the future work section of the writeup I linked above.
The main reason I started with ctags is because the `universal ctags` tool supports a ton of languages right out of the box. Each LSP server implementation tends to only support a single language. So it would be more work for users to find, install and stand up the LSP server(s) they need for their particular projects.
I plan to try some experiments with LSP in the future.
I keep reading that you can upload data to one of the GPT interfaces, I'm not sure if it's a ChatGPT plugin or one of the OpenAI APIs, but I wonder if you can include some source code that you can then prompt based on...
I've been kind of tricking GPT, admittedly I'm currently limited to gpt-3.5-turbo, because with the API you can submit whole conversations and ask for the next reply from the assistant. I don't know how much of that whole conversation it re-reads, but I'll do things like:
- Give it a system prompt with guidelines.
- Give it my user prompt, which contains a description of what I want it to do.
- Give it an "assistant response" which I've faked which is like: "Can you tell me more about the database schema?"
- Give it my answer, which is the database schema.
- Give it another faked assistant response where it is asking me about the API endpoints I need.
- Give it my endpoints...
At this point, I then ask it to generate a response.
I've found that if I try to combine all the above information into a single prompt, I often run into token limits.
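In code, the trick looks roughly like this with the pre-1.0 openai Python package; the model, prompts, and the SCHEMA_SQL/ENDPOINT_SPECS placeholders are all illustrative:

```python
import openai  # pre-1.0 API with openai.ChatCompletion

# Hypothetical context blobs; in practice these are your real schema and specs.
SCHEMA_SQL = "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);"
ENDPOINT_SPECS = "POST /users creates a user; GET /users/{id} fetches one."

# The assistant turns below are faked: we script the "questions" so each
# chunk of context arrives as an answer instead of one giant prompt.
messages = [
    {"role": "system", "content": "You are a careful coding assistant."},
    {"role": "user", "content": "Write a handler for creating users."},
    {"role": "assistant", "content": "Can you tell me more about the database schema?"},
    {"role": "user", "content": SCHEMA_SQL},
    {"role": "assistant", "content": "Which API endpoints do you need?"},
    {"role": "user", "content": ENDPOINT_SPECS},
]

# Now ask for the next assistant reply, with all that context in place.
response = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
print(response.choices[0].message.content)
```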
How much time did it take to prepare all of that for ChatGPT? Won’t you have to redo all of that work every time you ask for more help since code bases are not static? Would it take less time and effort to just write the code on your own?
Ya, it would be tedious to do all of this manually. I guess it wasn't clear, but all of this is part of my open source GPT coding tool called "aider".
aider is a command-line chat tool that allows you to write and edit code with GPT-4.
As you are chatting, aider automatically manages all of this context and provides it to GPT as part of the chat. It all happens transparently behind the scenes.
The onus is on you to prove that they do understand.
One may not simply claim that a machine ‘understands’ because it appears to. Now we’re actually in the territory of the much-misunderstood Occam’s razor: the theory which introduces the fewest new assumptions is the most likely to be correct.
The assumption that a machine ‘understands’ — is capable of such a thing — is new, and requires extreme justification.
Ya, that’s not how proof works. People could demand the same of humans. Failing to understand something is not, in humans, a sign of lacking the capacity for understanding.
> Convey all of this “code context” to GPT in an efficient manner that fits within the 8k-token context window.
> To address these issues, I send GPT a concise map of the whole codebase. The map includes all declared variables and functions with call signatures.
Even if you could encode a function signature in only one token (which isn't possible, of course), this would only work for small single-developer programs, not large projects.
Mid-sized frameworks/libs often have tens of millions of lines of code and hundreds of thousands of function signatures. With 8k tokens to spend, you get nowhere close to a large code base… at, say, 10-20 tokens per signature, 8k tokens covers only a few hundred signatures.
Ya, vector search is certainly the most common hammer to reach for when you're trying to figure out what subset of your full dataset to share with an LLM. And you're right, it's probably a piece of the puzzle for coding with GPT against a large codebase.
But I think code has such a semantically useful structure that we should probably try and exploit that as much as possible before falling back to "just search for stuff that seems similar".
Check out the "future work" section near the end of the writeup I linked above. I have a few possible improvements to this basic "repo map" concept that I'm experimenting with now.
No, I agree the distilled map is most useful in context. However, I wonder if providing a vector store of the total code base amplifies the effect. You could also pull in vector stores of all dependencies as well. Regardless, amazing work, and I'm looking forward to seeing your future work as outlined.
LLMs can put their embeddings into a vector database, extending their short-term memory to encompass an entire codebase, every support ticket, PR, etc...
I haven't seen any turnkey solutions yet good enough to use for development, but it's coming within months, not years.
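The core loop is simple enough to sketch; here it uses the pre-1.0 OpenAI embeddings endpoint and brute-force cosine similarity over hypothetical file-sized chunks (a real setup would chunk smarter and use an actual vector database):

```python
import numpy as np
import openai  # pre-1.0 API with openai.Embedding

def embed(text: str) -> np.ndarray:
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return np.array(resp["data"][0]["embedding"])

# Index: embed every chunk. These chunks are placeholders for real code.
chunks = {
    "auth.py": "def login(user): ...",
    "db.py": "def connect(): ...",
}
index = {path: embed(code) for path, code in chunks.items()}

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Query: embed the question and rank chunks by similarity.
q = embed("Where is the login flow implemented?")
best = max(index, key=lambda path: cosine(q, index[path]))
print(best)  # the chunk worth stuffing into the LLM's context window
```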
https://aider.chat/docs/ctags.html