Our first outage from LLM-written code

skip0110@lemmy.zip · 3 months ago


deegeese@lemmy.dbzer0.com · edited 3 months ago

I gasped when I saw this:

A bit of discussion indicated that the trigger for the CPU spikes both times was our CEO logging in. We re-deployed to get a clean start, permanently banned him from the service, and moved on.

This is like finding a live grenade under your bed and putting it under the rug.

They found a way to reproduce a system killing bug, and instead of taking the time to understand it, they threw away their test case.

BlazeDaley@lemmy.world · 3 months ago

They contained the impact. Root causing or “understanding” should come after impact mitigation. If needed, find a safe way to reproduce the bug without customer impact.

We reverted the refactoring, deployed, un-banned the CEO, and set about analysis.

FizzyOrange@programming.dev · 3 months ago

Yeah, me too, but if you keep reading, they didn’t actually “move on” in the way that it sounds.

Irdial@lemmy.sdf.org · 3 months ago

Well done. More and more companies are deploying LLM-written code in production environments. Might as well be honest about the results so we can learn what does and doesn’t work.

bookmeat@lemmynsfw.com · 3 months ago

It’s obvious that the LLM didn’t understand the code at all. It chose to refactor the way it did because of a silly comment.

Awkwardparticle@programming.dev · 3 months ago

It’s an inference model. It does not understand code no matter how much context it has. It can, however, output the most probable solution based on the context it has.

skip0110@lemmy.zip · 3 months ago

Why are we using tools that can’t parse the comment and code via syntax for refactoring?
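
For what it's worth, here is a rough sketch of what "parsing the comment and code via syntax" could look like, using Python's standard `tokenize` module on Python source. The helper names (`code_tokens`, `comments`, `same_code`) are made up for illustration and are not from the article or from any refactoring tool. The idea is that a tool can check whether a moved or reformatted snippet still has the same code tokens, and treat comment changes as a separate matter.

```python
import io
import tokenize

# Token types whose text is layout only; we keep the type but ignore the text.
LAYOUT = {tokenize.INDENT, tokenize.DEDENT, tokenize.NEWLINE}
# Token types dropped entirely: comments and blank/comment-only line breaks.
DROPPED = {tokenize.COMMENT, tokenize.NL}


def _tokens(source: str):
    """Yield tokenize.TokenInfo items for a string of Python source."""
    return tokenize.generate_tokens(io.StringIO(source).readline)


def code_tokens(source: str) -> list[tuple[int, str]]:
    """Hypothetical helper: the code-only token stream, comments removed."""
    result = []
    for tok in _tokens(source):
        if tok.type in DROPPED:
            continue
        text = "" if tok.type in LAYOUT else tok.string
        result.append((tok.type, text))
    return result


def comments(source: str) -> list[str]:
    """Hypothetical helper: just the comment strings, in order."""
    return [tok.string for tok in _tokens(source) if tok.type == tokenize.COMMENT]


def same_code(a: str, b: str) -> bool:
    """True when two snippets differ only in comments or blank lines."""
    return code_tokens(a) == code_tokens(b)


before = "def f(x):\n    # fast path\n    return x + 1\n"
after_comment_edit = "def f(x):\n    return x + 1  # slow path\n"
after_code_edit = "def f(x):\n    return x + 2\n"

print(same_code(before, after_comment_edit))  # comment changed, code did not
print(same_code(before, after_code_edit))     # code changed
print(comments(before), comments(after_comment_edit))
```

This only covers Python, and it says nothing about whether a comment is still true after the move, which is the harder problem.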

spartanatreyu@programming.dev · 3 months ago

The first problem is they’re letting AI touch their code.

The second problem is they’re relying on a human to pick up changes in moved code while using git’s built-in diff tools. There’s a whole bunch of studies that show how git’s diff algorithms are terrible, and how swapping to newer diff algos improves things considerably.

TL;DR on the studies:

Only supporting add/remove/move operations is really bad.
Adding syntax awareness to understand whether differences in indentation should be brought to a reviewer’s attention improves code and makes code reviews more accurate. (But this is hard because it’s language dependent.)
Adding extra operations (indent/deindent/move/rename-symbol/comment/un-comment/etc…) makes code review easier, faster and more accurate. (But again, most of this requires syntax awareness.)
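
To make the "extra operations" point concrete, here is a toy sketch built on Python's standard `difflib`. It is not one of the diff algorithms from the studies, and `classify_pair` and `classify_changes` are names invented for this example. It only shows the idea that a changed line can be labelled as an indent or deindent instead of a plain remove plus add.

```python
import difflib


def classify_pair(before: str, after: str) -> str:
    """Label one changed line pair as indent, deindent, whitespace, or edit."""
    if before.lstrip() != after.lstrip():
        return "edit"  # the non-whitespace content changed
    old_indent = len(before) - len(before.lstrip())
    new_indent = len(after) - len(after.lstrip())
    if new_indent > old_indent:
        return "indent"
    if new_indent < old_indent:
        return "deindent"
    return "whitespace"  # same width, e.g. tabs swapped for spaces


def classify_changes(old: str, new: str) -> list[tuple[str, str, str]]:
    """Return (operation, old_line, new_line) for every changed line."""
    old_lines = old.splitlines()
    new_lines = new.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    results = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        removed, added = old_lines[i1:i2], new_lines[j1:j2]
        if tag == "replace" and len(removed) == len(added):
            # Same number of lines on both sides: compare them pairwise.
            for before, after in zip(removed, added):
                results.append((classify_pair(before, after), before, after))
        else:
            # Fall back to the plain add/remove operations.
            results.extend(("remove", line, "") for line in removed)
            results.extend(("add", "", line) for line in added)
    return results


old = "def f(x):\n    log(x)\n    return x + 1\n"
new = "def f(x):\n        log(x)\n    return x + 2\n"

for op, before, after in classify_changes(old, new):
    print(f"{op:9} {before!r} -> {after!r}")
```

Pairing lines by position is a simplification. Deciding whether an indentation change actually matters (it does in Python, usually not in C) is the part that needs real syntax awareness.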

There’s also a bunch of alternative diff algos you can use, but the best ones are paid, and the free ones have fewer features. See: