Em-dashes (—) come up a lot in online translations of books like the Bible and the Quran.

Normal keyboard “-” and “--” are different from “—”, but Microsoft Office auto-formats “--” to an em-dash.

I kinda assumed it was all the Microsoft Word data in the training set that caused AI to pick that up.

I am only now realizing AI stole even from religious texts and was influenced by them as well.

  • gedaliyah@lemmy.world
    14 hours ago

    Well, maybe a little. Em dashes and en dashes are pretty standard (and editorially enforced) in newspapers and academic journals. By length, every religious text is eclipsed by news and journal media on a daily basis.

  • AbouBenAdhem@lemmy.world
    19 hours ago

    Each dash has a different use case that all professionally-typeset books adhere to (not just religious texts).

    Hyphens are for compound words; en-dashes are for ranges or (in rare cases) to disambiguate multiple levels of hyphens; and em-dashes are for parenthetical dashes (for publishers who don’t use spaced en-dashes instead).
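
The three characters described above are distinct Unicode code points, which is why software (and training data) can tell them apart. A minimal Python sketch that prints each one's code point and official name; the `DASHES` table and the `describe` helper are illustrative names, not part of any library:

```python
import unicodedata

# The three characters discussed above, keyed by their informal names.
DASHES = {
    "hyphen": "-",        # the key on a normal keyboard
    "en-dash": "\u2013",  # used for ranges
    "em-dash": "\u2014",  # used for parenthetical breaks
}


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for label, ch in DASHES.items():
    print(f"{label}: {describe(ch)}")
```

Note that the keyboard character is formally named HYPHEN-MINUS, since it doubles as a minus sign.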

  • Norah (pup/it/she)@lemmy.blahaj.zone
    18 hours ago

    I am only now realizing AI stole even from religious texts and was influenced by them as well.

    As far as I’m aware, religious texts are all public domain. While I hate AI, it should have free access to the public domain just like anyone else.

    • Siru@discuss.tchncs.de
      6 hours ago

      And also, in my understanding, religions are supposed to help the general populace live a more fulfilled life and get to a better end result. So in this case it should be fair to invoke eminent domain (or whatever the text equivalent would be) for both the original texts and the translations.

    • Flax@feddit.uk
      6 hours ago

      Not all are; it’s just that many translations are old enough to be public domain. Some, like the English Standard Version of the Bible, aren’t public domain, unlike the Geneva Bible, which is.

    • DomeGuy@lemmy.world
      18 hours ago

      While you’re largely right, it is worth noting that each translation is a distinct work under copyright law, and any translation made after 1929 may still be protected.

      And that ignores really young religions, and the copyright status of high-authority extant religions such as Iranian Islam, Mormon and Roman Catholic Christianity, Ron Hubbard’s Scientology or state-atheist communism.

      (Whether or not Hubbard, Lenin, Stalin, and Mao count as “religious leaders” is a distinction without a difference in a discussion of the copyright status of their works.)

    • gedaliyah@lemmy.world
      14 hours ago

      Maybe it’s changed, but my experience with OCR is that it is not great at detecting nuances of punctuation.

  • A_norny_mousse@feddit.org
    12 hours ago

    Software converts human-typed comments to use fancy quotes, dashes and other punctuation. Even this platform does that with the Markdown extension Fancypants - look at the quotes in your post.

    That’s where LLMs get this from.

    E.g.: to get an em-dash here: --- => —
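
The kind of conversion described above can be sketched in a few lines of Python. This is a rough illustration, not the actual Fancypants implementation: the function name `smarten` is made up, and the rules (`---` to em-dash, `--` to en-dash, `...` to ellipsis, straight quotes to curly quotes) follow the common SmartyPants-style convention, which is an assumption about what any given platform really does.

```python
import re


def smarten(text: str) -> str:
    """Replace typewriter punctuation with typographic equivalents (sketch)."""
    # Longest dash run first, so "---" is not consumed as "--" plus "-".
    text = text.replace("---", "\u2014")  # em-dash
    text = text.replace("--", "\u2013")   # en-dash
    text = text.replace("...", "\u2026")  # ellipsis

    # A double quote at the start of the text, or right after whitespace or
    # an opening bracket, is treated as opening; every other one as closing.
    text = re.sub(r'(^|[\s(\[{])"', "\\1\u201c", text)
    text = text.replace('"', "\u201d")

    # Same heuristic for single quotes; the remaining ones, including
    # apostrophes inside words, become right single quotes.
    text = re.sub(r"(^|[\s(\[{])'", "\\1\u2018", text)
    text = text.replace("'", "\u2019")
    return text


print(smarten('He said "hi" --- it\'s fine...'))
```

Real converters handle many more edge cases (nested quotes, code spans, decades like '90s), so output from this sketch will differ from what a given client produces.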

    • 4am@lemmy.zip
      5 hours ago

      I think that highly depends on the client you use:

      But I see your point