Guess we can always rely on the good old fashioned ways to make money…
Honestly, I think it's pretty awful, but I'm not surprised.
Not anymore.
The new trend in ML is training on synthetic data, alongside more refined sets of curated data.
And honestly, the open base models we have now are ‘good enough’ with some finetuning, and maybe QAT.
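(QAT = quantization-aware training: you simulate int8 quantization during the finetune so the model learns weights that survive it. Rough toy sketch of the eager-mode workflow with PyTorch's torch.ao.quantization API — the tiny model and random-data loop are just my illustration, not anyone's actual setup:)

```python
import torch
import torch.nn as nn
from torch.ao import quantization as tq

# Toy model; QuantStub/DeQuantStub mark where tensors enter/leave the int8 domain.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()
        self.fc1 = nn.Linear(64, 64)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(64, 10)
        self.dequant = tq.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.fc1(x))
        x = self.fc2(x)
        return self.dequant(x)

model = TinyNet().train()
model.qconfig = tq.get_default_qat_qconfig("fbgemm")
tq.prepare_qat(model, inplace=True)  # inserts fake-quant observers into the modules

opt = torch.optim.SGD(model.parameters(), lr=1e-3)
for _ in range(100):  # short finetune with fake quantization in the loop
    x = torch.randn(32, 64)
    y = torch.randint(0, 10, (32,))
    loss = nn.functional.cross_entropy(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

model.eval()
int8_model = tq.convert(model)  # swap in real int8 ops for deployment
```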
Ah sweet model collapse.
That’s certainly something I’ve observed myself when training GANs on their own output. It’s definitely a problem for the careless (like Tech Bros).
But it doesn’t happen the way you’d think, as long as the augmentations are clever and their scope is narrow. Hence the success of several recent distillations and ‘augmented’ LLMs, and the failure of huge-dataset training runs like Llama 4.
…And synthetic data generation/augmentation is getting clever, and is already being used in newer training runs. See this, or newer papers if you search for them on arXiv: https://github.com/qychen2001/Awesome-Synthetic-Data
Or Nvidia’s HUGE focus on this, combining it with their work in computer graphics: https://www.nvidia.com/en-us/use-cases/synthetic-data-physical-ai/
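To make the collapse point above concrete, here's a toy numerical sketch (my own illustration, not from any of the linked work): repeatedly fit a Gaussian to its own samples and the fit drifts and shrinks over generations, but anchoring each generation with even a small fraction of real data keeps it stable — which is roughly why mixing synthetic with curated real data works while pure self-training doesn't.

```python
import numpy as np

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=10_000)  # the "real" distribution: N(0, 1)

def run_generations(real_fraction, n_gen=500, n_samples=100):
    """Repeatedly fit a Gaussian to its own samples, optionally mixing real data back in."""
    data = rng.choice(real, n_samples)
    for _ in range(n_gen):
        mu, sigma = data.mean(), data.std()
        synthetic = rng.normal(mu, sigma, n_samples)
        n_real = int(real_fraction * n_samples)
        # next generation trains on a blend of fresh real data and its own output
        data = np.concatenate([rng.choice(real, n_real),
                               synthetic[: n_samples - n_real]])
    return data.std()

# pure self-training: estimation noise compounds and the variance tends to shrink away
print("0% real data, final std:", run_generations(real_fraction=0.0))
# a small real-data anchor pulls each generation back toward N(0, 1)
print("10% real data, final std:", run_generations(real_fraction=0.1))
```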