cross-posted from: https://kbin.melroy.org/m/technology@lemmy.world/t/1067613
Coders spent more time prompting and reviewing AI generations than they saved on coding. On the surface, METR’s results seem to contradict other benchmarks and experiments that demonstrate increases in coding efficiency when AI tools are used. But those often also measure productivity in terms of total lines of code or the number of discrete tasks/code commits/pull requests completed, all of which can be poor proxies for actual coding efficiency. These factors lead the researchers to conclude that current AI coding tools may be particularly ill-suited to “settings with very high quality standards, or with many implicit requirements (e.g., relating to documentation, testing coverage, or linting/formatting) that take humans substantial time to learn.” While those factors may not apply in “many realistic, economically relevant settings” involving simpler code bases, they could limit the impact of AI tools in this study and similar real-world situations.
- These factors lead the researchers to conclude that current AI coding tools may be particularly ill-suited to "settings with very high quality standards, or with many implicit requirements (e.g., relating to documentation, testing coverage, or linting/formatting)" - What on earth? These are the projects that AI excels at, because the code base itself offers more and better examples to draw on. When your code doesn’t have high quality standards, you end up with multiple patterns to achieve the same thing instead. Similarly, heavy documentation or testing requirements mean that the AI has more context and more guardrails built in, whereas a repo without these requirements wouldn’t be able to verify changes as easily.


