“The room was messiestest I had ever seen”, “Bookmaal and Noknиŋɑɪ”[1]: Incorrect information, bad spelling and linguistic hallucinations have set off alarms for the Norwegian Language Council, as teachers and principals send concerned messages about a clearly AI-generated dictionary.

“Your complete resource for the Norwegian language” — these are the words at the top of the website ordlista.no. What the website actually provides, however, are made-up words, bad grammar, wholly nonexistent expressions, and an alternative history of the language, according to the Norwegian Language Council. They are concerned.

—“The information could easily seem to be of sound quality to an untrained reader, but once we took a closer look at it, we saw right away that there is absolutely no quality control here,” director Åse Wetås says.

—“Extremely harmful”

The website claims that it “uses freely-available AI models to create insightful articles providing guidance and useful resources for our readers.”

Wetås laughs as she presents examples from the website. Among other things, you can read about “the depth and usage of the phrase ‘to be like a fish out of water’”, or learn how the dialects of the former Sogn og Fjordane county are “more conservative and closer to what we could call pure Bokmål”[2].

Although Wetås can’t help but laugh at the examples, she is still worried. The website says it offers content on language learning and grammar, and claims that it “provides the tools you need to master the language.”

—“When you learn a language, you need resources you can trust, not resources containing egregiously incorrect facts and language.”

Wetås says the problem is that readers already need a fair amount of knowledge of the language to notice how poor websites like these are. She also believes the site does not state clearly enough on its front page that its content is AI-generated.

—“It could be extremely harmful for students and others trying to learn Norwegian if they end up using resources like these instead of resources that have been quality-checked,” Wetås says.

She points instead to the far more reliable ordbøkene.no, which is maintained by the Language Council itself.

“Messy, messierer and messiestest”

Ordbøkene.no[3] has Norwegian-language dictionaries approved for use in education.

But when principal Bjørn Wilhelmsen at the Eikefjord School for Children and Youth[4] searched the word “messy” on Google one day, he ended up on ordlista.no — and certainly found a “mess” there.

—“I found something mildly entertaining on that website, let’s put it that way,” Wilhelmsen says.

—“It seemed fine at first, but then they started doing degrees of comparison, and then they said that ‘chiaoens’[5] was a synonym for ‘mess’.”

Wilhelmsen’s curiosity was piqued, and he started looking through the website’s examples of unique dialect words, which included “cuddle”, “poop”, “nonsense”, “rowan”, and “party-down”. None of these are actually dialect words, yet the website insists that “these words can be difficult to understand for those who aren’t familiar with them, but they can also provide a fascinating insight into linguistic diversity”.

Blocked the site

Once Wilhelmsen had finished laughing at ordlista.no, he quickly notified the Language Council. Wetås states that several other teachers and principals have contacted her about the matter as well. Wilhelmsen himself has chosen to block the website on his school’s network.

This coming fall, the teachers at Eikefjord will use ordlista.no as a central example when they teach their students about critical thinking and skepticism toward online sources.

—“Use at your own risk”

Aftenposten has presented Wilhelmsen’s and the Language Council’s criticisms to ordlista.no. Nomedia Norge Limited, the company behind ordlista.no, provided only the following remark and declined to comment further:

—“All information about ordlista.no, including information about its use of AI-generated content, can already be found on the website through the top menu and footer.”

The site’s terms of service state that the information provided by ordlista.no should not be taken as professional advice, and that all use of the site is at the user’s own risk. Users are advised to contact qualified professionals if necessary.

—“A particular responsibility”

One of the Language Council’s responsibilities is approving dictionaries and word lists for use in public education. Wetås states that ordlista.no has not sought approval.

—“They’re very far from being approved. If they really want to help people learn Norwegian, they need to start by fixing their content.”

Aside from refusing to approve the word list section of ordlista.no, the Language Council cannot do much about the “educational content” on the AI-generated website.

—“But we can shine a spotlight on it,” Wetås says, while emphasizing that the Language Council does not want to put an end to AI initiatives.

In an era when websites like ordlista.no are becoming ever easier to spin up, Wetås says there is only one real solution to the problem these websites present:

—“It is more important than ever that schoolchildren get a good education in reading and writing, as well as in textual analysis and criticism.”

She believes this is the only way children and young people can learn to distinguish between good and bad quality.


  1. It’s supposed to say “Bokmål” and “Nynorsk”.

  2. Every single municipality in Sogn og Fjordane uses Nynorsk, not Bokmål. Even the county’s name is Nynorsk. So this is basically like saying that Mississippi has the closest living dialect to “pure” Received Pronunciation.

  3. The original article just says “dette”, meaning “this”. This seemed like odd phrasing to me, but in context it’s clear it’s referring to ordbøkene.no.

  4. I normally translate barneskole as “primary school” and ungdomsskole as “lower secondary school”, but I opted for a different and more elegant-sounding translation here.

  5. This is literally cat-walking-on-your-keyboard type gibberish.

  • Erika3sis [she/her, xe/xem]@hexbear.net (OP)
    2 days ago

    Language learning resources containing woefully inaccurate information are, of course, nothing new, but LLMs pose a particular problem in how easily they can drown out or blend in with good materials, and in how uneven LLM output quality can be from one language to another. I once tried using a chatbot as a Japanese conversation partner and gave up almost immediately because it was spewing complete nonsense. And that’s Japanese, the eighth-most spoken language in the world by number of speakers, and one of the most in-demand languages for learning resources.

    Norwegian is tiny in comparison, and on top of that, many Norwegians primarily navigate the Internet in English, which means that LLMs indiscriminately stealing content from the open Internet simply have less to work with in Norwegian. Things are worst of all, of course, for endangered Indigenous languages. If we put on our POSIWID caps, I might argue that AI slop serves to hinder people from learning both Norwegian and endangered Indigenous languages. In the latter case, it’s part of the cultural genocide. In the former case, it’s because many rights and privileges in Norway (not least naturalization) are tied to Norwegian proficiency, so the longer it takes immigrants to learn Norwegian, the harder it will be for them to acquire those rights and privileges. This is part of my critique of the role of the English language in Norway: if every Norwegian also knows English, and uses English when talking to immigrants and Norwegian when talking amongst themselves, then immigrants are denied even more opportunities to actively practice the language. I recently met a British immigrant who’s lived in this country since the '90s yet could hardly speak a word of Norwegian.

    We can also bring up Deaf issues here, as I recently shared an article about how some people are trying to use “AI” to make a Norwegian Sign Language translator… My friends, if “AI” gives us chiaoens in Norwegian — a language with like 2,000x more native speakers and probably 200,000x more training data and institutional support than Norwegian Sign Language — what the FUCK do you think it’s gonna do with Norwegian Sign Language‽ Deaf people criticize proposals to replace terps with AI for good reason. It’s abandonment.

    We can at least take solace in the fact that this shit is gonna collapse under its own weight sooner rather than later.