It’s not brute-forcing our way to a better algorithm per se. It’s the same algorithm, exactly as “stupid,” just with more force (more numerous and more powerful GPUs) running it.
There are benchmarks to check if the model is “good” – for instance, how well the model does on standardized tests similar to the SAT (researchers are very careful to ensure that the questions do not appear anywhere on the internet, so that the model can’t just memorize the answers).