When people here call it a text extrusion machine, that's literally what it is. In fact it doesn't even look at text, it looks at tokens, and there is a fixed vocabulary of them (the original LLaMA and Llama 2 use 32,000 tokens; Llama 3 bumped that to about 128k). It takes all of the previous input and output, turns it into tokens, and then each token “attends” to every token before it: its representation gets mixed with theirs, weighted by learned similarity scores. Then it all goes through more gigantic layers of matrix multiplication, and at the end you get a probability for every token in the vocabulary. The next token is picked from that (either the single most likely one, or a random draw weighted by those probabilities). Then it appends that token and does the whole thing again, over and over, until it produces a special end-of-sequence token. It may also never produce one, in which case it has to be cut off at a length limit.
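If it helps, here's a toy sketch of the two pieces described above, in plain Python. It's a big simplification: the attention part is a single head where the query, key and value vectors are handed in directly (real models compute them from learned weight matrices and stack many heads and layers), and `toy_model` is a made-up stand-in for the whole network that just favours "last token + 1" in a 4-token vocabulary where token 3 plays the end-of-sequence role.

```python
import math
import random

def softmax(scores):
    # Subtract the max for numerical stability, then normalise to probabilities.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def causal_attention(queries, keys, values):
    """Single-head scaled dot-product attention with a causal mask.

    Position i only looks at positions 0..i (itself and earlier tokens).
    """
    d = len(queries[0])
    outputs = []
    for i, q in enumerate(queries):
        # Similarity between token i's query and each earlier key.
        scores = [sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(d)
                  for j in range(i + 1)]
        weights = softmax(scores)
        # Output is a weighted average of the earlier value vectors.
        out = [sum(w * values[j][k] for j, w in enumerate(weights))
               for k in range(len(values[0]))]
        outputs.append(out)
    return outputs

def toy_model(tokens):
    # Hypothetical stand-in for the network: returns a probability for every
    # token in a 4-token vocabulary, strongly favouring (last token + 1) % 4.
    vocab_size = 4
    scores = [0.0] * vocab_size
    scores[(tokens[-1] + 1) % vocab_size] = 2.0
    return softmax(scores)

def generate(model, prompt, eos_id, max_new_tokens, greedy=True, rng=None):
    """Autoregressive decoding: predict, append, repeat."""
    rng = rng or random.Random()
    tokens = list(prompt)
    for _ in range(max_new_tokens):        # hard length limit ("cut off")
        probs = model(tokens)              # distribution over the vocabulary
        if greedy:
            nxt = max(range(len(probs)), key=probs.__getitem__)
        else:
            nxt = rng.choices(range(len(probs)), weights=probs)[0]
        tokens.append(nxt)
        if nxt == eos_id:                  # model "decided" it's done
            break
    return tokens

print(generate(toy_model, [0], eos_id=3, max_new_tokens=10))    # [0, 1, 2, 3]
print(generate(toy_model, [0], eos_id=None, max_new_tokens=5))  # [0, 1, 2, 3, 0, 1]
```

The first call stops because the model produced the end token; the second never can, so it just runs until the length limit chops it off.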
So it's not really looking at the game. In a way it is, but it doesn't really know the rules; it's just producing the next most likely token, which is not necessarily the next best move or even a legal one.
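A tiny illustration of that last point. The probabilities and the "legal" set here are completely made up, and it pretends each move is a single token (in reality a move usually spans several). Picking the most likely option happily returns an illegal move, because nothing in plain decoding checks the rules; you'd need something outside the model, like filtering against a rules engine, to catch it.

```python
# Made-up probabilities a model might assign to candidate next moves in
# some position (hypothetical numbers, not from any real model).
move_probs = {"e4": 0.35, "Nf3": 0.20, "Qxf7": 0.40, "O-O": 0.05}

# Hypothetical set of moves that are legal in that position. This would have
# to come from a chess rules engine; the language model has no such list.
legal_moves = {"e4", "Nf3"}

def pick_most_likely(probs):
    """What plain greedy decoding does: take the highest-probability option."""
    return max(probs, key=probs.get)

def pick_most_likely_legal(probs, legal):
    """An external fix: drop illegal options before picking."""
    allowed = {move: p for move, p in probs.items() if move in legal}
    return max(allowed, key=allowed.get)

greedy = pick_most_likely(move_probs)
print(greedy, greedy in legal_moves)                    # Qxf7 False
print(pick_most_likely_legal(move_probs, legal_moves))  # e4
```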