Alex Turner, Oregon State University PhD student working on AI alignment. Reach me at turneale[at]oregonstate[dot]edu.
The all-or-nothing vaccine hypothesis is:
But maybe the vaccine is 100% effective against all outcomes! So long as it's correctly transported and administered, that is. Except sometimes vaccines are left at high temperature for too long, the delicate proteins are damaged, and people receiving them are effectively not vaccinated. If this happens 5% of the time, then 95% of people are completely immune to Covid and 5% are effectively unvaccinated. Whatever chance they had of getting severe Covid before, it's the same now.
If all-or-nothing were true, you would expect the following equality of conditional probability distributions: P(outcome | vaccinated, infected) = P(outcome | unvaccinated, infected).
This is not what we see:
This was already shown in the mass Pfizer study, but several other sources indicate the ratio of asymptomatic-to-symptomatic cases is increased for vaccinated people. In other words, vaccination works better against symptomatic Covid (more severe) than asymptomatic Covid (less severe).
Therefore, all-or-nothing cannot be true.
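A quick simulation makes the argument concrete (a minimal sketch; the 5% failure rate and the infection/symptom probabilities are made-up numbers for illustration, not estimates). Under all-or-nothing, breakthrough infections occur only among vaccine failures, so the symptomatic fraction among vaccinated cases should match the unvaccinated baseline; under a "leaky" vaccine that also reduces severity, it shouldn't:

```python
import random

random.seed(0)

P_INFECT = 0.10   # assumed baseline infection probability (illustrative)
P_SYMPT = 0.60    # assumed P(symptomatic | infected), unvaccinated (illustrative)
FAIL = 0.05       # assumed all-or-nothing failure rate
N = 500_000

def symptomatic_fraction(all_or_nothing):
    """Fraction of vaccinated infections that are symptomatic."""
    infected = symptomatic = 0
    for _ in range(N):
        if all_or_nothing:
            # Vaccine either fails completely or protects completely.
            if random.random() < FAIL and random.random() < P_INFECT:
                infected += 1
                if random.random() < P_SYMPT:
                    symptomatic += 1
        else:
            # "Leaky" vaccine: halves infection risk AND halves severity.
            if random.random() < P_INFECT * 0.5:
                infected += 1
                if random.random() < P_SYMPT * 0.5:
                    symptomatic += 1
    return symptomatic / infected

# All-or-nothing: fraction ≈ P_SYMPT, same as the unvaccinated baseline.
# Leaky: fraction ≈ P_SYMPT * 0.5, i.e. cases shift toward asymptomatic.
print(symptomatic_fraction(True))
print(symptomatic_fraction(False))
```

The observed shift toward asymptomatic cases matches the leaky branch, not the all-or-nothing one.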
Am I missing something?
Do you think such humans would have a high probability of working on TAI alignment, compared to working on actually making TAI?
I think you are indeed making a mistake by letting unsourced FB claims worry you, given the known proliferation of antivax-driven misinformation. There is an extremely low probability that you're first hearing about a real issue via some random, unsourced FB comment.
For more evidence, look to the overreactions to J&J / AZ adverse effects. Regulatory bodies are clearly willing to make a public fuss over even small probabilities of things going wrong.
Evolution requires some amount of mutation, which is occasionally beneficial to the species. Species that were too good at preventing mutations would be unable to adapt to changing environmental conditions, and thus die out.
We're aware of many species which evolved to extinction. I guess I'm looking for why there's no plausible "path" in genome-space between this arrangement and one which makes fatal errors happen less frequently. E.g., why wouldn't it be locally beneficial to the individual genes to code for more robustness against spontaneous abortions? Or is there an argument that this just isn't possible for evolution to find (like wheels instead of legs, or machine guns instead of claws)?
I feel confused wrt the genetic mutation hypothesis for the spontaneous abortion phenomenon. Wouldn't genes which stop the baby from being born quickly exit the gene pool? Similarly for gamete-formation processes which allow such mutations to arise?
I agree. I've put it in my SuperMemo and very much look forward to going through it. Thanks Peter & Owen!
(midco developed this separately from our project last term, so this is actually my first read)
I have a lot of small questions.
What is your formal definition of the IEU u_i? What kinds of goals is it conditioning on (because IEU is what you compute after you view your type in a Bayesian game)?
Multi-agent "impact" seems like it should deal with the Shapley value. Do you have opinions on how this should fit in?
You note that your formalism has some EDT-like properties with respect to impact:
Well, in a sense, they do. The universes where player i shouts "heads" are exactly the universes in which everyone wins. The problem is that of agency: player i doesn't choose their action, the coin (ω) does. If we condition on the value of ω, then each player's action becomes deterministic, thus IEU is constant across each player's (trivial) action space.
This seems weird and not entailed by the definition of IEU, so I'm pretty surprised that IEU would tell you to shout 'heads.'
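To make the EDT flavor concrete, here's a toy version of the shouting game as I understand it (my reconstruction, not the post's formalism): ω is a fair coin, every player's shout simply echoes ω, and everyone wins exactly when the coin lands heads. Naively conditioning on player i's action then makes "shout heads" look like a great choice, even though the action has no causal influence on anything:

```python
# Toy reconstruction of the shouting game. Assumptions (mine): two
# equiprobable worlds, actions fully determined by the coin, utility 1
# iff the coin lands heads.

OMEGAS = ["heads", "tails"]  # equiprobable worlds

def action(omega):
    # The coin, not the player, determines the action.
    return omega

def utility(omega):
    return 1 if omega == "heads" else 0

def edt_value(a):
    """Naive EDT-style value: E[u | player i's action == a]."""
    worlds = [w for w in OMEGAS if action(w) == a]
    return sum(utility(w) for w in worlds) / len(worlds)

print(edt_value("heads"))  # 1.0 — "shouting heads" correlates with winning
print(edt_value("tails"))  # 0.0
```

Conditioning on ω instead collapses each player's action space to a single point, so any action-conditional comparison becomes vacuous, which is the agency problem described above.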
Given arbitrary R.V.s A, B, we define the estimate of A given B=b as e(A,B) := E_{B=b}[A]
Is this supposed to be e(A,B=b)? If so, this is more traditionally called the conditional expectation of A given B=b.
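For concreteness, the standard conditional expectation computed on a toy joint distribution (illustrative numbers of my own, not from the post):

```python
from fractions import Fraction

# Toy joint distribution over (A, B): maps (a, b) -> P(A=a, B=b).
joint = {
    (1, 0): Fraction(1, 4),
    (2, 0): Fraction(1, 4),
    (2, 1): Fraction(1, 4),
    (4, 1): Fraction(1, 4),
}

def conditional_expectation(b):
    """E[A | B=b] = sum_a a * P(A=a, B=b) / P(B=b)."""
    p_b = sum(p for (a, bb), p in joint.items() if bb == b)
    return sum(a * p for (a, bb), p in joint.items() if bb == b) / p_b

print(conditional_expectation(0))  # E[A | B=0] = (1 + 2) / 2 = 3/2
print(conditional_expectation(1))  # E[A | B=1] = (2 + 4) / 2 = 3
```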
I'm really excited about this project. I think that in general, there are many interesting convergence-related phenomena of cognition and rational action which seem wholly underexplored (see also instrumental convergence, convergent evolution, universality of features in the Circuits agenda (see also adversarial transferability), etc...).
My one note of unease is that an abstraction thermometer seems highly dual-use; if successful, this project could accelerate AI timelines. But that doesn't mean it isn't worth doing.
I still don't fully agree with OP but I do agree that I should weight this heuristic more.