Super interesting series! Your second post actually gave me some insights on a problem that was a big part of my PhD, but it will take me some time to think those through. So here is a simpler, unrelated comment.
Thompson sampling seems too perfectionistic to me: it wants to take an action that is optimal in some world, rather than one that is merely great in all worlds. For example, suppose you have a suitcase with a 6-digit lock. Each turn, you can make a guess at the correct combination. If you guess correctly, you get 10 utils; if you guess wrong, you get 1 util. If you don’t guess but instead pass, you get 9 utils.
(The model is supposed to be greedy and ignore future advantages, so information revealed by a wrong guess shouldn’t matter. If this bothers you, you can imagine that when you “pass”, you still get to make a guess at the correct combination, and you learn whether it is correct, but you get 9 utils either way.)
Each possible combination is considered a hypothesis, so each hypothesis is deterministic. Thompson sampling would never pass, because passing is not the optimal action under any particular hypothesis (although it is the optimal action when you don’t know which hypothesis is true). Instead it would keep guessing combinations, because each combination is optimal under some hypothesis.
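As a sanity check on the numbers, here is a minimal sketch (assuming a uniform prior over the 10^6 combinations and the utilities above, with hypothetical names) of why a greedy agent should pass, and why Thompson sampling never does:

```python
import random

N = 10 ** 6                       # possible 6-digit combinations (hypotheses)
U_RIGHT, U_WRONG, U_PASS = 10, 1, 9

# Expected utility of a single guess under a uniform prior over hypotheses:
eu_guess = U_RIGHT / N + U_WRONG * (N - 1) / N   # about 1.000009
eu_pass = U_PASS                                  # 9, under every hypothesis

# Thompson sampling draws one hypothesis from the posterior and then acts
# optimally *for that hypothesis*. Under any sampled combination c, guessing
# c is worth 10 > 9, so "pass" is never the sampled-optimal action.
def thompson_action():
    c = random.randrange(N)       # sample a hypothesis (a combination)
    return ("guess", c)           # the optimal action for c; never "pass"

print(eu_guess, eu_pass)          # passing dominates in expectation
```

So passing beats guessing by almost 8 utils in expectation, yet no single hypothesis ever recommends it.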
This is not an issue of AM vs. GM: the same thing happens in the plurality decision procedure when we only use AM. It is an issue of what we take the argmax over. If we instead consider the combination to be due to randomness within a single hypothesis, the decision procedures will correctly choose to pass until the correct combination has been revealed.
Possibly related question:
Is there any reason the AM-GM boundary is at the same level as where we take the argmax in the definition of m? Or could we have an arithmetic expectation outside the argmax (hence outside m) but inside the geometric expectation? Or even a geometric expectation inside the argmax in m, possibly over the arithmetic expectation?
My simple answer is: go for it! I did a PhD in mathematics myself (also in the UK), and although I have since changed career in a direction where it has not been that useful, I have never regretted it! And I’m sure I would have regretted not doing it.
Thinking a bit more about the question, I wonder if “Should I do it?” is really the question you wanted to ask. It is not quite well-defined when you don’t state any alternatives. Were you hoping for answers to “What other options do I have?”, “How do I get started on this path?”, “Will the change from CS to maths be too big?”, or something else?
I’m not the right person to answer what other options you have. As for the other two questions, I think you should focus on who you want as a supervisor rather than on whether the PhD will be in mathematics or computer science. (I actually thought my PhD was in both maths and CS while I was doing it, and only learned that it was in maths alone when my diploma arrived! One of the examiners who awarded my PhD later told me he thought he had awarded me a PhD in CS!) If you find the right supervisor, it doesn’t matter whether they are in CS, maths, or something else.
Yes, there typically isn’t an advantage in two particular different words sounding the same. Rather, that is a result of having many words and wanting to keep them short. My point is that as long as they are used in completely different situations (either grammatically or in different topics), it is not much of a problem.
I think it is a feature rather than a bug that words with similar meanings sound different. Imagine if “to”, “too” and “two” meant similar things and could be used in the same situation. Then it would be difficult to hear exactly what a speaker is saying, and it would take effort to learn the distinctions. Eventually, they would likely merge into a single word.
I can clearly recognize myself in this. I’m starting to wonder if it can be avoided. If I start on something similar to your project, will I look back on this in two years and think “that was just another one of those thoughts that seemed like it would change everything, but didn’t”?
I know this isn’t really the point of the post, but I don’t think the “rolling five d10 every day” or even the “1d00 a year” are good models of the probability of dying. They make sense statistically, when only considering your age and sex, but you yourself know, for example, whether you have been diagnosed with cancer. You might say that you get a cancer diagnosis if the first four d10 are all 1s and the fifth is a 2 or 3. Then, once you have the diagnosis, your probability is higher, especially when looking months or years ahead.
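For concreteness, a quick sketch of what the daily dice model implies over a year (this assumes “five d10 every day” means you die exactly when all five dice show a prescribed outcome, i.e. p = 10⁻⁵ per day):

```python
# Assumed reading of the dice model: you die iff all five d10 match a
# prescribed outcome, so the daily death probability is 10**-5.
p_day = 0.1 ** 5
p_year = 1 - (1 - p_day) ** 365   # at least one "fatal" day in the year
print(f"daily: {p_day:.0e}, yearly: {p_year:.3%}")
```

That comes out to roughly 0.36% per year, a single age-and-sex-style average; the point above is that conditioning on current health should move that number substantially in either direction.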
The usual mortality statistics don’t distinguish between unexpected and expected deaths. Does anyone know of a more accurate model of how it is revealed when you die? I’m not looking for exact probabilities, and not necessarily at the resolution of days; just something more accurate than the simple model that ignores current health.
Another possibility is that the command is executed (meaning we are in the kaboom scenario), but the US does not escalate immediately, instead saying there will be a response at a time and place of its choosing. This could give someone time to overthrow Putin and prevent escalation.
Sure, Russia used to be a technological and cultural superpower. I just can’t think of any similar examples from Putin’s time.
I just realized that I have never taken the expression “superpower” literally, as being only about military strength. I have always just assumed that it also involves cultural and technological influence, and in general “how much do you contribute to the world”. This is probably because I started from the assumption that the US was the only superpower, and then extrapolated from that.
If you take superpower to just mean the amount of military pressure you can put on other countries, it does make a bit more sense.