These models aren't capable of "doing" things. They can't necessarily calculate values, etc. They can assemble text based (more or less) on statistical analysis of all their input data, and they are incredible at it... but if you ask what 2+2 is, it's not doing math to get to 4.
They certainly can't deploy the code they generate etc.
I tried asking GPT-4 a math question and it does do math:
Me: "What (8 * 25) / 14 + 4 - 2?"
AI: "Let's break down the expression and solve it step by step:
Multiply 8 and 25: 8 * 25 = 200
Divide the result by 14: 200 / 14 ≈ 14.2857 (rounded to four decimal places)
Add 4 to the result: 14.2857 + 4 = 18.2857
Subtract 2 from the result: 18.2857 - 2 = 16.2857 (rounded to four decimal places)
So, the result of the expression (8 * 25) / 14 + 4 - 2 is approximately 16.2857."
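For anyone who wants to check the quoted answer independently, here is a minimal Python sketch that redoes the same steps with exact fractions. The function name `evaluate_steps` is made up for illustration. It only confirms that the numbers in the reply are right; it says nothing about how the model arrived at them.

```python
from fractions import Fraction

def evaluate_steps():
    """Recompute (8 * 25) / 14 + 4 - 2 in the same order as the quoted reply."""
    product = 8 * 25                  # step 1: multiply
    quotient = Fraction(product, 14)  # step 2: divide, kept exact as a fraction
    total = quotient + 4 - 2          # steps 3 and 4: add 4, then subtract 2
    return product, quotient, total

product, quotient, total = evaluate_steps()
print(product)                    # intermediate product
print(round(float(quotient), 4))  # quotient rounded to four decimal places
print(round(float(total), 4))     # final result rounded to four decimal places
```

The exact value works out to 114/7, which rounds to the 16.2857 given in the reply.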
They certainly can't deploy the code they generate etc.