• Yaky@slrpnk.net
    21 hours ago

    How good is LLM training data for a language spoken by fewer than 10 million people? Keep in mind that most of those people are probably multilingual (so sorting text into languages based on who wrote it is harder), and the language itself is similar to its neighbors. And then, again, there are the terms.