Ok, say you had a moderately complex math problem you needed to solve. You gave the problem to 6 LLMs, all paid versions, and all 6 returned the same numbers. Would you trust the answer?

  • Denjin@feddit.uk · ↑28 · 2 days ago

    The whole point of maths is that there’s a formulaic approach to getting the answer. You go through a series of discrete steps and you get the answer. The steps will always be the same for the same problem and the answer will always be the same for the same inputs.

    This is something already solved by conventional computation; it doesn’t need a generative-AI token-matching and learning algorithm to work out the answer to a problem. This isn’t, and shouldn’t ever be, the use case for a chatbot.

    What you want is a calculator.
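The determinism Denjin describes can be made concrete: a calculator (or any ordinary program) is a pure function, so identical inputs always produce the identical answer. A minimal sketch in Python (the function name is mine, not from the thread), using exact rational arithmetic:

```python
from fractions import Fraction

def solve_linear(a: Fraction, b: Fraction) -> Fraction:
    """Solve a*x + b = 0 exactly: the same discrete steps, the same answer, every time."""
    if a == 0:
        raise ValueError("not a linear equation")
    return -b / a

# Deterministic: run it a million times and 3x - 6 = 0 still gives x = 2.
assert solve_linear(Fraction(3), Fraction(-6)) == Fraction(2)
```

There is no sampling, no temperature, no training data: the answer follows from the inputs and nothing else.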

    • Farmdude@lemmy.world (OP) · ↑1 ↓1 · 1 day ago

      Wow. I thought cringe was a feeling, not an actual person. Thanks, Lower Organism! My new friend the house plant

  • RiverRabbits@lemmy.blahaj.zone · ↑19 ↓1 · edited · 2 days ago

    All 6 answers will be wrong. If they are not wrong, then the correctness is purely coincidental and not a sign of future correctness.

    If you want to use technology to solve maths problems, use Wolfram Alpha or any open-source maths software, like the ones linked in technocrit’s reply to this post, if you must. They are not LLMs and therefore can actually solve maths problems.

    Also: this reads like a barely hidden AI booster post, OP. Why did you even post this?

  • bcovertigo@lemmy.world · ↑17 · 2 days ago

    No, because they don’t do math. If the LLM calls a script to do the math and just formats the input, it might get accurate results consistently… but at that point you’ve invented a machine that presses calculator buttons for you, which is hilariously energy-inefficient. It’s also unacceptable from a cost and reliability standpoint: anyone familiar with enterprise reliability metrics would weep at a multistage process where each step has a single 9 of reliability and no visibility into underlying model tuning that can change outputs in wildly unexpected ways.
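The “single 9” point can be put in numbers: if each stage of an independent multistage pipeline succeeds 90% of the time, overall reliability decays geometrically with the number of stages. A quick illustrative sketch (the function name is mine):

```python
def pipeline_reliability(stage_success: float, stages: int) -> float:
    """Probability that every stage of an independent multistage pipeline succeeds."""
    return stage_success ** stages

# Chaining stages with a single 9 each collapses fast:
# three 90% stages succeed only ~73% of the time, five only ~59%.
for n in (1, 3, 5):
    print(f"{n} stage(s): {pipeline_reliability(0.90, n):.3f}")
```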

  • JandroDelSol@lemmy.world · ↑6 · 2 days ago

    here, let’s try something.

    3 + 6 = search history on the internet play with the devil and the other than the other than that I just got a new phone and I don’t know what to do with the guys had to go to the store and I don’t know what to do with the devil and the other than that I just want to be a gooner for the bit of a way to get a little busy but I don’t know what to do with the guys had to do that but I don’t think I can do it but I don’t think I can do it but I don’t want to be a gooner but I don’t want to be a gooner but I don’t want to be a gooner

    I just tapped my phone’s next word predictor. LLMs are a slightly more coherent version of that
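The keyboard demo above really is a tiny version of the same idea. A phone’s suggestion strip is essentially a bigram model: count which word followed which, then keep picking a plausible next word. A toy sketch (names are my own), where no arithmetic ever happens:

```python
import random
from collections import defaultdict

def train_bigrams(text: str) -> dict:
    """Record which word follows which -- the whole 'model' behind keyboard suggestions."""
    follows = defaultdict(list)
    words = text.split()
    for a, b in zip(words, words[1:]):
        follows[a].append(b)
    return follows

def babble(follows: dict, start: str, n: int, seed: int = 0) -> str:
    """Repeatedly pick a word that was seen after the previous one."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(n):
        options = follows.get(out[-1])
        if not options:
            break
        out.append(rng.choice(options))
    return " ".join(out)
```

Feed it “3 + 6 =” and it can only emit whatever tokens tended to follow “=” in its training text; it has no concept of addition. An LLM is this, scaled up enormously.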

  • jjjalljs@ttrpg.network · ↑12 · 2 days ago

    No. If they showed their work and I could repeat the steps myself and get the same answer, maybe, but I’d still be wary of some subtle error.

  • technocrit@lemmy.dbzer0.com · ↑6 · edited · 2 days ago

    Would you trust the answer?

    No, because it’s a crapshoot where the odds are intentionally obscured. That’s real bad for math, but it’s perfect for the casino economy.

  • xia@lemmy.sdf.org · ↑2 · 2 days ago

    I remember there being tricks to get statistically more accurate math, like “show your work”, etc.

  • jordanlund@lemmy.world · ↑5 ↓1 · 2 days ago

    I don’t think they actually calculate anything.

    If you ask an LLM to calculate Pi to 14 digits, it’s not doing math, it’s looking at lists of Pi calculated out.

    Poison that feed, and all of them will give you the wrong numbers, because they aren’t actually doing math.
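For contrast, here is what actually doing the math looks like: a deterministic program that computes π digit by digit, so no amount of poisoned training text can change its output. A sketch using Machin’s formula with Python’s arbitrary-precision `decimal` module (structure and names are my own):

```python
from decimal import Decimal, getcontext

def arctan_inv(n: int) -> Decimal:
    """arctan(1/n) via its alternating Taylor series, in exact Decimal arithmetic."""
    x = Decimal(1) / n
    x2 = x * x
    term, total, k = x, Decimal(0), 0
    while abs(term) > Decimal(10) ** -getcontext().prec:
        total += term
        k += 1
        term = -term * x2 * (2 * k - 1) / (2 * k + 1)
    return total

def machin_pi(digits: int) -> str:
    """pi = 16*arctan(1/5) - 4*arctan(1/239), computed -- not recalled."""
    getcontext().prec = digits + 10          # extra guard digits
    pi = 16 * arctan_inv(5) - 4 * arctan_inv(239)
    return str(pi)[: digits + 2]             # "3." plus `digits` decimals

print(machin_pi(14))  # 3.14159265358979
```

Every run produces the same digits because the digits follow from the formula, not from whatever text happened to be in a training set.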