Not trying to be snarky, with all due respect... this is a skill issue.
It's a tool. It's a wildly effective and capable tool. I don't know how or why I have such a wildly different experience than so many that describe their experiences in a similar manner... but... nearly every time I come to the same conclusion that the input determines the output.
> If they implement something with a not-so-great approach, they'll keep adding workarounds or redundant code every time they run into limitations later.
Yes, when the prompt/instructions are overly broad and there's no set of guardrails or guidelines that indicate how things should be done... this will happen. If you're not using planning mode, skill issue. You have to get all this stuff wrapped up and sorted before the implementation begins. If the implementation ends up being done in a "not-so-great" approach - that's on you.
> If you tell them the code is slow
Whew. Ok. You don't tell it the code is slow. Do you tell your coworker "Hey, your code is slow" and expect great results? You ask it to benchmark the code and then you ask it how it might be optimized. Then you discuss those options with it (this is where you do the part from the previous paragraph, where you direct the approach so it doesn't take a "not-so-great approach") until you get to a point where you like the approach and the model has shown it understands what's going on.
Then you accept the plan and let the model start work. At this point you should have essentially directed the approach and ensured that it's not doing anything stupid. It will then just execute, it'll stay within the parameters/bounds of the plan you established (unless you take it off the rails with a bunch of open ended feedback like telling it that it's buggy instead of being specific about bugs and how you expect them to be resolved).
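To make "benchmark first, then discuss options" concrete, here is a minimal sketch in Python using the standard `timeit` module. Both functions are hypothetical stand-ins (nothing here comes from a real codebase); the point is only that a measured number gives you and the model something specific to plan around, instead of "it's slow".

```python
import timeit


def slow_sum_of_squares(n: int) -> int:
    """Hypothetical 'slow' code under discussion: a plain Python loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def fast_sum_of_squares(n: int) -> int:
    """Candidate optimization: closed form for the sum of i*i over 0 <= i < n."""
    return (n - 1) * n * (2 * n - 1) // 6


def best_time(func, arg, repeat: int = 5, number: int = 20) -> float:
    """Best observed per-call time in seconds across `repeat` runs."""
    timings = timeit.repeat(lambda: func(arg), repeat=repeat, number=number)
    return min(timings) / number


if __name__ == "__main__":
    n = 100_000
    # Check the candidate is still correct before comparing timings.
    assert slow_sum_of_squares(n) == fast_sum_of_squares(n)
    print(f"loop:        {best_time(slow_sum_of_squares, n):.6f} s per call")
    print(f"closed form: {best_time(fast_sum_of_squares, n):.6f} s per call")
```

The numbers will differ per machine; what matters is having a baseline and a correctness check in hand before any plan gets accepted.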
> you can have 10 bespoke tests for every bug. Plus a new mocking framework created every time the last one turns out to be unfit for purpose.
This is an area where I will agree that the models are wildly inept. Someone needs to study what it is about tests and testing environments and mocking things that just makes these things go off the rails. The solution to this is the same as the solution to the issue of it continuing to dig or chasing its tail in circles... Early in the prompt/conversation/message that sets the approach/intent/task you state your expectations for the final result. Define the output early, then describe/provide context/etc. The earlier in the prompt/conversation the "requirements" are set the more sticky they'll be.
And this is exactly the same for the tests. Either write your own tests and have the models build the feature from the test or have the model build the tests first as part of the planned output and then fill in the functionality from the pre-defined test. Be very specific about how your testing system/environment is set up, and any time you run into a testing-related issue have the model make a note about that and the solution in a TESTING.md document. In your AGENTS.md or CLAUDE.md or whatever, indicate that if the model is working with tests it should refer to the TESTING.md document for notes about the testing setup.
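A rough sketch of what "tests first, then fill in the functionality" can look like, in Python. The `parse_duration` feature and its rules are made up purely for illustration; the idea is that the test function is written and agreed on first, and the implementation is whatever makes it pass.

```python
import re


def check_parse_duration(parse_duration) -> None:
    """The pre-defined tests: written first, they pin down the expected behaviour."""
    assert parse_duration("45s") == 45
    assert parse_duration("2m") == 120
    assert parse_duration("1h30m") == 5400
    for bad in ("", "abc", "30m1h"):
        try:
            parse_duration(bad)
        except ValueError:
            continue
        raise AssertionError(f"expected ValueError for {bad!r}")


def parse_duration(text: str) -> int:
    """Implementation filled in afterwards (hypothetical feature): '1h30m' -> seconds."""
    match = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?", text)
    # The pattern also matches the empty string, so reject that explicitly.
    if not text or match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


if __name__ == "__main__":
    check_parse_duration(parse_duration)
    print("all pre-defined tests pass")
```

Because the tests exist before the implementation, there is much less room for the model to invent ten bespoke tests or a new mocking setup along the way.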
Personally, I focus on the functionality, get things integrated and working to the point I'm ready to push it to a staging or production (yolo) environment and _then_ have the model analyze that working system/solution/feature/whatever and write tests. Generally my notes on the testing environment to the model are something along the lines of a paragraph describing the basic testing flow/process/framework in use and how I'd like things to work.
The more you stick to convention the better off you'll be. And use planning mode.
> Whew. Ok. You don't tell it the code is slow. Do you tell your coworker "Hey, your code is slow" and expect great results?
Yes? Why don't you?
They are capable people who just didn't notice something. If I notice some telemetry and tell them "hey this is slow", they are expected to understand the reason(s).
So, you observed some telemetry - which would have been some sort of specific metric, right? Wouldn't you communicate that to them as well, not just "it's slow"?
"Hey, I saw that metric A was reporting 40% slower, are you aware already or have any ideas as to what might be causing that?"
Those two approaches are going to produce rather distinctly different results whether you're speaking to a human or typing to a GPU.
Yeah if my co-worker can't start figuring out why the code is slow, with a reasonable reference to what the code in question is, that is a knock against their skills. I would actually expect some ideas as to what the problem is just off the top of their heads, but that the coding agent can't do that isn't a hit against it specifically, this is now a good part of what needs to be done differently.
The suggestion to tell the agent to do performance analysis of the part of the code you think is problematic, and offer suggestions for improvements seems like the proper way to talk to a machine, whereas "hey your code is slow" feels like the proper way to talk to a human.
As someone who leads a team of engineers, telling someone their code is slow is not nice, helpful or something a good team member should do. It’s like telling them there’s a bug and not explaining what the bug is. Code can be slow for any number of reasons; maybe the input you gave is never expected and it’s plenty fast otherwise. Or the other dev is not senior enough to know where problems may be. It could be you, when I tell you your OOP code is super slow but you’ve only ever done OOP and have no idea how to put data in a memory layout that avoids CPU cache misses or whatever.
So no that’s not the proper way to talk to humans.
And AI is only as good as the quality of what you’re asking. It’s a bit like a genie: it will give you what you asked for, not what you actually wanted. Are you prepared for the AI to rewrite your Python code in C to speed it up? Can it just add fast libraries to replace the slow ones you had selected? Can it write advanced optimization techniques it learned about from a PhD thesis you would never even understand?
>As someone who leads a team of engineers, telling someone their code is slow is not nice, helpful or something a good team member should do
Right, I'm sure there are all sorts of scenarios where that is the case, and probably the phrasing would be something like "that seems slow" or "it seems to be taking longer than expected", or some other phrasing that is actually synonymous with "the code is slow". On the other hand, there are also people you can say "the code is slow" to, and they won't worry about it.
>So no that’s not the proper way to talk to humans
In my experience there are lots of proper ways to talk to humans, and part of the propriety depends on what your relationship with them is. So it may be the proper way to talk to a subset of humans, which is generally the only kind of humans one talks to - a subset. I certainly have friends that I have worked with for a long time who can say "what the fuck were you thinking here" or all sorts of things that would not be nice if they came from other people, but that are in fact a signifier of our closeness that we can talk in such a way. Evidently you have never led a team with people who enjoyed that relationship between them, which I think is a shame.
Finally, I'll note that when I hear a generalized description of a form of interaction I tend to give what used to be called "the benefit of a doubt" and assume that, because of the vagaries of human language and the necessity of keeping things not a big long harangue as every communication must otherwise become in order to make sure all bases of potential speech are covered, that the generalized description may in fact cover all potential forms of polite interaction in that kind of interaction, otherwise I should have to spend an inordinate amount of my time lecturing people I don't know on what moral probity in communication requires.
But hey, to each their own.
on edit: "the what the fuck were you thinking here" quote is also an example of a generalized form of communication that would be rude coming from other people but was absolutely fine given the source, and not an exact quote despite the use of quotation marks in the example.
A normal human conversation would specify which code/tasks/etc., how long it's currently taking, how much faster it needs to be, and why. And then potentially a much longer conversation about the tradeoffs involved in making it faster. E.g. a new index on the database that will make it gigabytes larger, a lookup table that will take up a ton more memory, etc. Does the feature itself need to be changed to be less capable in order to achieve the speed requirements?
If someone told me "hey your code is slow" and walked away, I'd just laugh, I think. It's not a serious or actionable statement.
Well, I would say something like "We seem to be having some performance issues the business has noticed in the XYZ stuff. Shall we sit down together and see if we can work out if we can improve things?"
There was a 20+ person team of well-paid, smart (mostly Java) programmers who dealt for months with a slow application they were building, one that everyone knew was slow. I nagged them for weeks to set up indexes, even for small, 100-row tables. Once they did, things started running orders of magnitude faster.
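For anyone who wants to see what adding an index changes, here is a small sketch using Python's built-in `sqlite3` module. The `orders` table, its columns and the index name are invented for the example, and it says nothing about what database that team actually used. It only shows the query plan switching from a full table scan to an index lookup, not how big the speedup is.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params=()) -> str:
    """Return SQLite's EXPLAIN QUERY PLAN detail text for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # Each row is (id, parent, notused, detail); the detail column is the readable part.
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
# A small, 100-row table (hypothetical data).
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 10, float(i)) for i in range(100)],
)

lookup = "SELECT total FROM orders WHERE customer_id = ?"

plan_before = query_plan(conn, lookup, (3,))
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, lookup, (3,))

# Exact wording varies by SQLite version, so print rather than hard-code it.
print("before index:", plan_before)
print("after index: ", plan_after)
```

On a single 100-row table the difference is tiny; it tends to add up when the same lookup runs inside joins or loops, which may be what was happening there.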
Your expectations for people (and LLMs) are way too high.
My comment was a summary of the situation, not literal prompts I use. I absolutely realize the work needs to be adequately described and agents must be steered in the right direction. The results also vary greatly depending on the task and the model, so devs see different rates of success.
On non-trivial tasks (like adding a new index type to a db engine, not oneshotting a landing page) I find that the time and effort required to guide an LLM and review its work can exceed the effort of implementing the code myself. Figuring out exactly what to do and how to do it is the hard part of the task. I don't find LLMs helpful in that phase - their assessments and plans are shallow and naive. They can create todo lists that seemingly check off every box, but miss the forest for the trees (and it's extra work for me to spot these problems).
Sometimes the obvious algorithm isn't the right one, or it turns out that the requirements were wrong. When I implement it myself, I have all the details in my head, so I can discover dead-ends and immediately backtrack. But when an LLM is doing the implementation, it takes much more time to spot problems in the mountains of code, and even more effort to tell when it's genuinely a wrong approach or merely poor execution.
If I feed it what I know before solving the problem myself, I just won't know all the gotchas yet myself. I can research the problem and think about it really hard in detail to give bulletproof guidance, but that's just programming without the typing.
And that's when the models actually behave sensibly. A lot of the time they go off the rails and I feel like a babysitter instructing them "no, don't eat the crayons!", and it's my skill issue for not knowing I must have "NO eating crayons" in AGENTS.md.
Great answer, and the reason some people have bad experiences is actually patently clear: they don’t work with the AI as a partner, but as a slave. But even for them, AI is getting better at automatically entering planning mode, asking for clarification (what exactly is slow, can you elaborate?), saying some idea is actually bad (I got that a few times), and so on… essentially, the AI is starting to force people to work as a partner and give it proper information, not just tell them “it’s broken, fix it” like they used to do on StackOverflow.
If I was on the "replace all the meatsacks AGI ftw" team then I would have referred to it as an oracle, by your own logic, wouldn't I have?
It's a tool. It's good for some things, not for others. Use the right tool for the job and know the job well enough to know which tools apply to which tasks.
More than anything it's a learning tool. It's also wildly effective at writing code, too. But, man... the things that it makes available to the curious mind are rather unreal.
I used it to help me turn a cat exercise wheel (think huge hamster wheel) into a generator that produces enough power to charge a battery that powers an ESP32 powered "CYD" touchscreen LCD that also utilizes a hall effect sensor to monitor, log and display the RPMs and "speed" (given we know the wheel circumference) in real time as well as historically.
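The RPM and "speed" part is just arithmetic, and a rough sketch of it looks like the following. This is plain Python rather than ESP32 firmware, and it is not the actual code from the project: it assumes one hall effect pulse per wheel revolution, and the circumference value is a placeholder.

```python
def rpm_from_pulses(pulse_times_s, pulses_per_rev: int = 1) -> float:
    """Average RPM over a window of hall sensor pulse timestamps (in seconds)."""
    if len(pulse_times_s) < 2:
        return 0.0
    elapsed = pulse_times_s[-1] - pulse_times_s[0]
    if elapsed <= 0:
        return 0.0
    # N timestamps span N - 1 intervals between pulses.
    revolutions = (len(pulse_times_s) - 1) / pulses_per_rev
    return revolutions / elapsed * 60.0


def speed_kmh(rpm: float, circumference_m: float) -> float:
    """Linear speed at the running surface: metres per minute converted to km/h."""
    return rpm * circumference_m * 60.0 / 1000.0


if __name__ == "__main__":
    # Assumed example: a pulse every 0.5 s and a 3.0 m wheel circumference.
    pulses = [0.0, 0.5, 1.0, 1.5]
    rpm = rpm_from_pulses(pulses)
    print(f"{rpm:.1f} RPM, {speed_kmh(rpm, 3.0):.1f} km/h")
```

With those assumed numbers, three revolutions in 1.5 seconds works out to 120 RPM, and 120 turns of a 3 m circumference per minute is 21.6 km/h.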
I didn't know anything about all this stuff before I started. I didn't AGI myself here. I used a learning tool.
But keep up with your schtick if that's what you want to do.
Oracles have their use too, but as long as you keep confusing "oracle" and "tool" you will get nowhere.
P.S. The real big deal is the democratization of oracles. Back in the day building an oracle was a megaproject accessible only to megacorps like Google. Today you can build one for nothing if you have a gaming GPU and use it for powering your kobold text adventure session.
>I used it to help me turn a cat exercise wheel (think huge hamster wheel) into a generator that produces enough power to charge a battery that powers an ESP32 powered "CYD" touchscreen LCD that also utilizes a hall effect sensor to monitor, log and display the RPMs and "speed" (given we know the wheel circumference) in real time as well as historically.
So what? That's honestly amateur hour. And the LLM derived all of it from things that have been done and posted about a thousand times before.
You could have achieved the same thing with a few google searches 15 years ago (obviously not with ESP32, but other microcontrollers).
Right - it's not a big deal and it LITERALLY is amateur hour. But I did it. I wouldn't have done it prior, sure I could have done a bunch of google searches but the time investment it would have taken to sift through all that information and distill it into actionable chunks would have far exceeded the benefit of doing so, in this case.
The whole point is that it is amateur hour and it's wildly effective as a learning tool.
The fact it derived everything from things that have been done... yea, that's also the point? What point are you trying to make here? I'm well aware it's not a great tool if you're trying to use it to create novel things... but I'm not a nuclear physicist. I'm a builder, fixer, tinkerer who happens to make a living writing code. I use it to teach me how to do things, I use it to analyze problems and recommend approaches that I can then delve into myself.
I'm not asking it to fold proteins. (I guess that's been done quite a bit too, so would be amateur as well)
>The whole point is that it is amateur hour and it's wildly effective as a learning tool.
You sound so proud of your accomplishment, and I question if there's really nothing to be proud of here. I doubt you really learned anything, a machine told you what to do and you did it, like coloring by numbers - it doesn't make you an artist. You won't be able to build upon it, without asking the machine to do more of the thinking for you. And I think that's kind of sad.
>I'm a builder, fixer, tinkerer who happens to make a living writing code
I have to doubt that. If you were all those things, you would have been able to complete that project with very little effort, and without a machine telling you what to do.
OP was writing how great the LLM is, and that he couldn't do this stuff as easily before LLMs. And that simply isn't true.
Instead of breaking down the task himself into achievable steps, the LLM did that "thinking" for him. This will inevitably lead to atrophy of the brain. If you don't exercise your brain, and let the tin-can tell you what to do, you're going to get pretty dull. It's well known that keeping your brain active, solving problems, will keep your mental abilities strong. Using LLMs is the opposite of that.
lmao - I'm not at all proud of what you called an accomplishment. I literally said it _is_ amateur hour, it's hacked together, not safe, not stylish, not well engineered. But it does work. And despite your assumption about me not learning anything - I had _no idea_ how generators worked. The realization that spinning an electric motor would result in electricity being produced blew my mind and got me asking Claude things related to that, then I wanted to interface a wheel against my wheel to spin a stepper motor to get a charge and had the harebrained idea to just make the whole thing the generator instead. None of this was stuff I knew.
Despite this thing I made being rather useless in the grand scheme of things it was _wildly_ illuminating in terms of my understanding of electricity and the various objects around me and how they function. Which has spurred another rabbit hole that is having _real measurable effect_ for a host of feral cats to live a more comfortable life. (Not the wheel generator thing)
> a machine told you what to do and you did it, like coloring by numbers - it doesn't make you an artist.
I never claimed to be an artist ;) And, maybe it's different for you, but someone or something showing me how to do something is quite literally the best way for me to learn. /shrug
> I have to doubt that. If you were all those things, you would have been able to complete that project with very little effort, and without a machine telling you what to do.
> Do you tell your coworker "Hey, your code is slow" and expect great results? You ask it to benchmark the code and then you ask it how it might be optimized.
...Really? I think 'hey we have a lot of customers reporting the app is laggy when they do X, could you take a look' is a very reasonable thing to tell your coworker who implemented X.