This article is a classic example of Facebook PR doing a wonderful job of selling the research, with the linked paper [1] claiming too much in its introduction. Please, please talk to actual researchers before you buy such claims.
If you go through the paper, check the evaluation section to see how they measured success. They used programs from GeeksforGeeks to evaluate their approach. Problems on GeeksforGeeks do not represent the vast majority of programming tasks encountered in daily life, which is very much at odds with the overarching claims in the paper's introduction.
Second issue with the evaluation: they use BLEU scores to judge how good their translations are. BLEU makes sense for natural-language translation (even that is widely debated in the NLP community these days). For programs there is no concept of "almost correct" based on surface similarity; a program is either correct or it is not. E.g., if I am asked to write a program to add two numbers and I write `x - y`, I am not almost correct, I am completely wrong. And in some ways that is exactly what their model does: it optimizes for BLEU scores.
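To make the `x - y` point concrete, here is a toy sketch (not the paper's actual evaluation code): a crude unigram-overlap score in the spirit of BLEU's modified precision rates the wrong program as nearly identical to the right one, even though they disagree on almost every input.

```python
# Toy illustration (hypothetical scoring, not the paper's metric):
# token overlap rewards programs that merely *look* similar.
from collections import Counter

def unigram_precision(candidate, reference):
    """Fraction of candidate tokens also found in the reference,
    with clipped counts (as in BLEU's modified unigram precision)."""
    cand, ref = Counter(candidate), Counter(reference)
    overlap = sum(min(n, ref[tok]) for tok, n in cand.items())
    return overlap / sum(cand.values())

reference = ["def", "add", "(", "x", ",", "y", ")", ":", "return", "x", "+", "y"]
candidate = ["def", "add", "(", "x", ",", "y", ")", ":", "return", "x", "-", "y"]

# 11 of 12 tokens match, so the score is ~0.92 -- "almost correct" by BLEU...
print(unigram_precision(candidate, reference))

# ...yet the two programs agree only when y == 0:
add_ref = lambda x, y: x + y
add_bad = lambda x, y: x - y
print(add_ref(2, 3), add_bad(2, 3))  # 5 -1
```

A real BLEU implementation uses higher-order n-grams and a brevity penalty, but the failure mode is the same: surface similarity is not semantic correctness.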
Third, the correctness of the programs is tested on just 10 random inputs. Are 10 random inputs enough to cover the entire input space a program can accept?
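Almost certainly not. A quick sketch (with a hypothetical bug, not one from the paper): a translation that is wrong on exactly one value in a large domain will essentially never be caught by 10 random probes.

```python
# Toy sketch: random testing vs. a single-point bug. The names and the
# bug here are invented for illustration.
import random

def reference_abs(x):
    return abs(x)

def buggy_abs(x):
    # Wrong for exactly one input, -2**31, in a 32-bit-style domain.
    return -x if x < 0 and x != -2**31 else x

DOMAIN = (-2**31, 2**31 - 1)

def passes_random_tests(trials=10, seed=0):
    """Return True if the buggy version survives `trials` random inputs."""
    rng = random.Random(seed)
    return all(
        reference_abs(v) == buggy_abs(v)
        for v in (rng.randint(*DOMAIN) for _ in range(trials))
    )

# The chance of drawing the one bad input is 10 / 2**32, so the buggy
# translation passes the random test suite while still being wrong:
print(passes_random_tests())
print(buggy_abs(-2**31) == reference_abs(-2**31))  # False: the bug is real
```

Randomized testing is useful, but with only 10 samples it says very little about edge cases, which is exactly where translated code tends to break.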
It is indeed a great advance in the application of ML technology, but it is nowhere close to the broader claims. One could even debate the ROI of gathering and curating data and then checking the correctness of translations from such a system versus the ROI of writing rules for a rule-based system, since all programming languages are easily expressible that way.
Because the commercially available tools aren't that good and because they kept working on it until it was better.
If they kept working on it I think they would run into an asymptote. Maybe they could get closer and closer to 90% accuracy on the task with real hardware, 92% by boiling the oceans, 93% with a Dyson sphere, and 93.5% if you can harness a quasar. At that point it probably passes a whiteboard interview, and the people who have to fix the bugs can console themselves that the last programmer had neither a brain nor a soul.
That system has an approximate, not an exact, model of the domain it works on, and that is why it has an asymptote. Turning a graph-structured program into a vector is like mapping the curved surface of the Earth onto a flat map -- except instead of a 3-dimensional space it is more like a 1000-dimensional one. Information is destroyed in that process and lost forever, so there will always be important characteristics of the problem that it will never "get".
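A tiny sketch of that information loss (using a deliberately dumb bag-of-tokens "embedding" as a stand-in, not the paper's actual model): any embedding that discards structure will map behaviorally different programs to the exact same point.

```python
# Toy sketch: a structure-free embedding cannot distinguish programs
# that are permutations of each other, even when their behavior differs.
from collections import Counter

def bag_of_tokens(tokens):
    """Order-free 'embedding': token -> count. Statement order vanishes."""
    return Counter(tokens)

def run(src, a=1, b=2):
    """Execute a tiny two-variable program and return the final (a, b)."""
    env = {"a": a, "b": b}
    exec(src, {}, env)
    return env["a"], env["b"]

prog_a = "a = b ; b = a"  # ends with a == b == original b
prog_b = "b = a ; a = b"  # ends with a == b == original a

# Identical embeddings...
print(bag_of_tokens(prog_a.split()) == bag_of_tokens(prog_b.split()))  # True

# ...different behavior:
print(run(prog_a))  # (2, 2)
print(run(prog_b))  # (1, 1)
```

Real sequence models keep far more than token counts, of course, but the principle stands: whatever the fixed-size vector fails to encode is gone, and the decoder can never recover it.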
If the message recipient is a person, they will meet you halfway and might even accept bullshit if it is presented with complete confidence and a lack of shame. The computer will interpret exactly what you said and reveal that you're a dog (i.e., a mute animal).