• JcbAzPx@lemmy.world
    English
    arrow-up
    5
    ·
    3 days ago

    I suppose answering “I don’t know” to every prompt is at least more accurate than what we have now, but I don’t think they’ll want to risk that.

    • skisnow@lemmy.ca
      English
      arrow-up
      1
      ·
      2 days ago

      Of course. What the paper is suggesting is that during training and evaluation you should reward correct answers, punish wrong answers, and score abstentions somewhere in between. Current benchmarks punish abstentions and wrong answers equally, so a model that guesses whenever it is unsure scores higher on average than one that abstains.
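To make the incentive concrete, here is a rough sketch of the expected-score arithmetic. The point values (+1 for correct, 0 or -1 for wrong, 0 for abstaining) and the function names are my own assumptions for illustration, not numbers taken from the paper.

```python
def expected_guess_score(p_correct: float, reward_correct: float, penalty_wrong: float) -> float:
    """Expected score of answering when the model is right with probability p_correct."""
    return p_correct * reward_correct + (1.0 - p_correct) * penalty_wrong


def best_action(p_correct: float, reward_correct: float,
                penalty_wrong: float, abstain_score: float) -> str:
    """Return whichever action has the higher expected score under a grading scheme."""
    guess = expected_guess_score(p_correct, reward_correct, penalty_wrong)
    return "guess" if guess > abstain_score else "abstain"


# Hypothetical binary grading: wrong answers and abstentions both score 0,
# so any nonzero chance of being right makes guessing the better strategy.
print(best_action(0.2, reward_correct=1.0, penalty_wrong=0.0, abstain_score=0.0))

# Hypothetical penalized grading: wrong answers cost a point, abstentions score 0,
# so a model that is only 20% sure does better by abstaining.
print(best_action(0.2, reward_correct=1.0, penalty_wrong=-1.0, abstain_score=0.0))
```

Under the penalized scheme with these made-up values, guessing only pays off when the model is more than 50% likely to be right; a harsher penalty would push that threshold higher.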