NoSuchPlace

If I understand this correctly, DeepMind is using each token in at most one update (they say they are training for less than one epoch), which makes it hard to say anything about the data efficiency of DL from this paper, since the models are not trained to convergence on the data they have seen.

They are probably doing this because they already have the data, and a new data point is more informative than an old one, even if your model is very slow to extract the available information with each update.

Searching his Twitter, he barely seems to have mentioned GPT at all in 2020. Maybe he deleted some of his tweets?

I remember vividly reading one of his tweets last year, enthusiastically talking about how he'd started chatting with GPT-3 and it was impressing him with its intelligence.

Are you thinking of this tweet? I believe that was meant to be a joke. His actual position at the time appeared to be that GPT-3 was impressive but overhyped.

Thank you, I fixed it. I think the same argument shows that that question is also undefined. I think the real takeaway is that physics doesn't deal well with some infinities.

As you point out later in the thread, the light can never touch any given sphere: no matter which one you pick, there will always be another sphere in front of it to block the light. At the same time, the light beam must eventually hit something, because the centre sphere is in its way. So the light beam must both eventually hit a sphere and never hit a sphere, which makes your system contradictory and thus ill-defined.

You could make the question answerable by instead asking for the limit of the light beam's behaviour as the number of packing steps goes to infinity, in which case the light reflects straight back at 180°, since it does that at every step of the packing. Alternatively, you could ask what happens to a light beam reflected off the shape which is the limit of the packing you described, in which case it will split in three, since the shape produced is a cube (it has no empty spaces). (Edit: no it doesn't; the answer to this question is again undefined, via the argument in the first paragraph, since the matter it bounced off had to belong to some sphere.)

Since I don't spend all my time inside avoiding every risk, hoping for someone to find the cure to aging, I probably value an infinite life a large but finite number of times more than a year of life. This means that I must discount in such a way that, after a finite number of button presses, Omega would need to grant me an infinite lifespan.

So I perform some Fermi calculations to obtain an upper bound on the number of button presses I need to obtain immortality, press the button that often, then leave.
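The discounting point can be made concrete with a toy calculation. Everything numeric here is a made-up assumption (the discount factor, the threshold, and the idea that each press delays the payoff by a year are all purely illustrative); the only point is that under exponential discounting, an infinite lifespan is worth finitely many year-equivalents, so only finitely many presses can be worthwhile:

```python
# Toy illustration (all numbers hypothetical): with exponential
# discounting, the discounted value of an infinite lifespan is finite.
gamma = 0.99  # assumed per-year discount factor

# Value of living forever, in discounted year-equivalents:
# sum over t >= 0 of gamma^t = 1 / (1 - gamma)
value_of_immortality = 1 / (1 - gamma)  # approx. 100 year-equivalents

# Assumption for illustration: each extra press pushes the payoff one
# more year into the future, so its discounted value shrinks by gamma.
# Count how many presses fit before the value drops below a threshold.
threshold = 0.5
presses = 0
remaining = value_of_immortality
while remaining > threshold:
    remaining *= gamma
    presses += 1
print(presses)
```

Under these made-up numbers the loop terminates after a few hundred presses, which is the shape of the Fermi bound the comment describes: finite valuation plus discounting forces a finite stopping point.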

They are different concepts: either you use statistical significance or you do Bayesian updating (i.e. using priors).

If you are using a 5% threshold, roughly speaking this means that you will accept a hypothesis if the chance of getting equally strong data, were your hypothesis false, is 5% or less.

If you are doing Bayesian updating, you start with a probability for how likely a statement is (this is your prior) and update it based on how likely your data would be if the statement were true or false.

Here is an xkcd which highlights the difference: https://xkcd.com/1132/
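A minimal numerical sketch of the two approaches, with entirely made-up numbers (a coin landing heads 16 times in 20 flips, and a hypothetical prior of 50% that the coin is biased 75% toward heads):

```python
from math import comb

# Toy data (made up): a coin lands heads 16 times in 20 flips.
n, k = 20, 16

# Significance testing: one-sided p-value, i.e. the probability of
# seeing at least this many heads if the coin were actually fair.
p_value = sum(comb(n, i) for i in range(k, n + 1)) / 2**n
reject_fair_coin = p_value < 0.05  # 5% significance threshold

# Bayesian updating: start from a prior and apply Bayes' rule.
prior_biased = 0.5                    # assumed prior that the coin is biased
lik_biased = 0.75**k * 0.25**(n - k)  # P(data | biased 75% toward heads)
lik_fair = 0.5**n                     # P(data | fair)
posterior_biased = (lik_biased * prior_biased) / (
    lik_biased * prior_biased + lik_fair * (1 - prior_biased)
)

print(p_value, reject_fair_coin, posterior_biased)
```

Both methods see the same data but answer different questions: the p-value conditions only on the fair-coin hypothesis, while the posterior depends on the prior you chose to start with.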

In particular, I intuitively believe that "my beliefs about the integers are consistent, because the integers exist". That's an uncomfortable situation to be in, because we know that a consistent theory can't assert its own consistency.

That is true; however, you don't appear to be asserting the consistency of all your beliefs. You are asserting the consistency of a particular subset of your beliefs, one which does not contain the assertion of its own consistency. This is not in conflict with Gödel's second incompleteness theorem, which implies that no sufficiently strong theory may consistently assert its *own* consistency. It does not forbid proofs of consistency by more powerful theories: for example, there are proofs of the consistency of Peano arithmetic in stronger theories such as ZFC.
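For reference, the two statements being appealed to here, in the usual notation (T a theory, Con(T) its arithmetized consistency statement):

```latex
% Gödel's second incompleteness theorem: for any consistent, recursively
% axiomatizable theory T interpreting enough arithmetic,
T \nvdash \mathrm{Con}(T)
% whereas a strictly stronger theory can prove the consistency of a
% weaker one, e.g.
\mathrm{ZFC} \vdash \mathrm{Con}(\mathrm{PA})
```

The first line is what blocks a theory from certifying itself; the second is why believing "the integers exist" (i.e. working in a stronger theory) can still license confidence in the consistency of the weaker subset.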

I think it would be more accurate to say that the test was meant to check whether the TV shows were effective, rather than whether the children had a maximal inherent tendency towards virtuousness.