Multiple things have gone wrong with AI for me, but these two pushed me over the edge. This is mainly about LLMs, though other AI hasn’t been particularly helpful for me either.
Case 1
I was trying to find the music video that a screenshot was taken from.
I gave o4-mini the image and asked where it was from. It refused, saying it does not discuss private details. Fair enough. I told it the screenshot was of xyz artist. It then listed three of their popular music videos, none of which was the correct answer to my question.
Then I started a new chat and described the screenshot in detail. It once again regurgitated similar suggestions.
I gave up. I did a simple reverse image search and found the answer in 30 seconds.
Case 2
I wanted it to create a spreadsheet for tracking investments with xyz columns.
It did give me the right columns and rows, but the formulae for the calculations were off. They were almost correct most of the time, and almost correct is useless when you’re working with money.
I gave up. I manually made the spreadsheet with all the required details.
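To give a sense of what “almost correct” means in practice, here is a rough sketch of the per-holding math such a sheet needs, with made-up column names (not my actual columns), where referencing one wrong cell quietly changes the answer:

```python
def position_summary(quantity: float, buy_price: float, current_price: float) -> dict:
    """Per-holding math for an investment tracker (hypothetical columns).

    Dividing the gain by market_value instead of cost_basis gives ~7.4%
    instead of 8% in the example below - plausible-looking, but wrong.
    """
    cost_basis = quantity * buy_price          # what was actually paid
    market_value = quantity * current_price    # what it is worth now
    unrealized_gain = market_value - cost_basis
    return_pct = unrealized_gain / cost_basis * 100  # return relative to cost
    return {
        "cost_basis": cost_basis,
        "market_value": market_value,
        "unrealized_gain": unrealized_gain,
        "return_pct": return_pct,
    }

# 10 shares bought at 100, now at 108 -> 8.0% return.
print(position_summary(10, 100.0, 108.0))
```

I’m not claiming that was the exact mistake it made; the point is that errors on that scale look plausible and are easy to miss.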
Why are LLMs so wrong most of the time? Aren’t they processing high-quality data from multiple sources? I just don’t understand the point of even making these tools if all they can do is sound smart while being wrong.
I assume by “thinking engine” you mean “Reasoning AI”.
Reasoning AI is just more bullshit. They produce the output the way they always do - by guessing at a sequence of words that is statistically adjacent to the input they’re given - but then they also produce a randomly generated “chain of thought”, which is invented the same way as the result: pure statistical word association. Essentially they create the output the same way a non-reasoning LLM does, then give themselves the prompt “Write a chain of thought for this output.” There’s a little extra stuff going on where they sort of check their own output, but in essence that’s just done by running the model multiple times and picking the output they converge on. So, just weighting the randomness, basically.
I’m simplifying a lot here obviously, but that’s pretty much what’s going on.
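If you want to see how thin that “pick the output they converge on” step is, here’s a toy sketch of the idea (self-consistency voting) in Python, with a hypothetical sample_completion() standing in for an actual model call:

```python
from collections import Counter

def sample_completion(prompt: str) -> str:
    # Hypothetical stand-in for one sampled model call at temperature > 0.
    # In reality this would hit an LLM API and return one completion.
    raise NotImplementedError

def self_consistent_answer(prompt: str, n_samples: int = 5) -> str:
    """Sample the model several times and return the answer it converges on."""
    answers = [sample_completion(prompt) for _ in range(n_samples)]
    # "Convergence" is nothing deeper than a majority vote over the samples.
    best_answer, _count = Counter(answers).most_common(1)[0]
    return best_answer
```

Nothing in there ever inspects the reasoning itself; it’s just repeated sampling plus a vote.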
Basically reworded what I was saying almost exactly, but yes.