

Odd URL… Here’s the original: https://futurism.com/chatgpt-polluted-ruined-ai-development
Nice detail to use when searching the internet btw:
“But if you’re collecting data before 2022 you’re fairly confident that it has minimal, if any, contamination from generative AI,” he added. “Everything before the date is ‘safe, fine, clean,’ everything after that is ‘dirty.’”
Try restricting searches to pre-2022 results, at least for older info, to reduce the chance of hitting AI-generated noise.
Anyway, kinda funny to see that these generators may be producing enough noise to make producing more noise somewhat harder. Hopefully this doesn’t also hurt more productive AI development, like the kind used in scientific research, as that would genuinely suck.
Edit:
Revised from generators “have produced” to “may be producing” to better reflect the lack of concrete info on generative AI data pollution, as someone else pointed out. As they note:
“Now, it’s not clear to what extent model collapse will be a problem, but if it is a problem, and we’ve contaminated this data environment, cleaning is going to be prohibitively expensive, probably impossible,” he told The Register.
Sort of odd to see this again (from Vox as well, I think?). It seems to add more detail, but the bottom line remains the same: it’s largely because fewer people are trying to immigrate into the U.S. since the Trump admin entered office.
This all sucks, and another part that sucks is that, as usual, with fewer of the Republicans’/conservatives’ favorite scapegoats around, they turn inward and grab anyone and everyone who remotely resembles those scapegoats to abuse and deport in order to appeal to their base. Without more pushback, and as those deportation numbers continue to dwindle, you can expect them to start rounding up their detractors more widely (or at least attempting to).