The “em-dashes” (—) come up a lot in online translations of books like the Bible and the Quran.
The normal keyboard hyphen “-” and the en-dash “–” are different from “—”, but Microsoft Office auto-formats two typed hyphens (“--”) to that.
I kinda assumed ALL of it came from Microsoft Word documents in the training data.
I am only now realizing AI stole from even the religious texts and was influenced by them as well.
Maybe it’s changed, but my experience with OCR is that it is not great at detecting nuances of punctuation.