In the way that AIXI is an abstracted mathematical formalism for (very roughly) "a program that maximizes the expected total rewards received from the environment", what is the equivalent formalism for an abstracted next-token predictor?

Does this exist in the literature? What's it called? Where can I read about it?

The predictor looks like this:

Training: 
[some long series of 0's and 1's] --> [training some ML model on this data to minimize loss for next-token prediction] --> [some set of final weights in the ML model.]

Inference:
[Some series of 0's and 1's] --> [our trained ML Model] --> [probability distribution over 0,1 for next token.]
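To make the training/inference pipeline above concrete, here is a minimal toy stand-in for the "ML model" box: a count-based n-gram predictor over bits. This is only an illustrative sketch of the interface being described (sequence in, distribution over {0,1} out); the question's model could equally be a neural net trained to minimize cross-entropy loss, and the class and parameter names here are invented for illustration.

```python
from collections import defaultdict

class NGramBitPredictor:
    """Toy next-bit predictor: counts which bit follows each length-k context.

    Illustrative only -- any learned model with the same train/predict
    interface fits the pipeline in the question.
    """

    def __init__(self, order=3):
        self.order = order  # context length in bits
        # Laplace-smoothed counts: counts[ctx] = [#times 0 followed, #times 1 followed]
        self.counts = defaultdict(lambda: [1, 1])

    def train(self, bits):
        # "Training": sweep the sequence once, tallying bit-after-context counts.
        for i in range(self.order, len(bits)):
            ctx = tuple(bits[i - self.order:i])
            self.counts[ctx][bits[i]] += 1

    def predict(self, bits):
        # "Inference": probability distribution over {0, 1} for the next token.
        ctx = tuple(bits[-self.order:])
        c0, c1 = self.counts[ctx]
        total = c0 + c1
        return c0 / total, c1 / total
```

For example, trained on the periodic sequence 0,1,0,1,..., the model assigns high probability to 1 after the context (1, 0).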

The training data should not be random, and should be 'correlated with the reality you want to predict.' (The binary output of a real-world sensor at discrete time steps is a good example of the kind of data that's suitable.)

Any pointers?

1 Answer

I think you're looking for Solomonoff induction, which is the prediction half of AIXI (AIXI combines it with expectimax action selection).
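For reference, Solomonoff induction can be stated compactly. Roughly, the universal prior weights every program that could have generated the observed bits, with shorter programs weighted more heavily, and prediction is by conditioning:

```latex
% Universal prior over bit strings x, summing over (minimal) programs p
% that cause a universal prefix machine U to output a string beginning with x:
M(x) = \sum_{p \,:\, U(p) = x*} 2^{-|p|}

% Predictive probability for the next bit, by conditioning:
M(x_{n+1} = 1 \mid x_{1:n}) = \frac{M(x_{1:n} 1)}{M(x_{1:n})}
```

This matches the question's setup: a sequence of 0s and 1s goes in, and a probability distribution over the next bit comes out, with no reward signal involved.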

The classic textbook on it if you want to read more is Li and Vitányi's An Introduction to Kolmogorov Complexity and Its Applications.