by Kaarel1 min read6th Jul 20222 comments
New Comment
2 comments, sorted by Click to highlight new comments since: Today at 3:26 PM

I'm updating my estimate of the return on investment into culture wars from being an epsilon fraction compared to canonical EA cause areas to epsilon+delta. This has to do with cases where AI locks in current values extrapolated "correctly" except with too much weight put on the practical (as opposed to the abstract) layer of current preferences. What follows is a somewhat more detailed status report on this change.

For me (and I'd guess for a large fraction of autistic altruistics multipliers), the general feels regarding [being a culture war combatant in one's professional capacity] seem to be that while the questions fought over have some importance, the welfare-produced-per-hour-worked from doing direct work is at least an order of magnitude smaller than the same quantities for any canonical cause area (also true for welfare/USD). I'm fairly certain one can reach this conclusion from direct object-level estimates, as I imagine e.g. OpenPhil has done, although I admit I haven't carried out such calculations with much care myself. Considering the incentives of various people involved should also support this being a lower welfare-per-hour-worked cause area (whether an argument along these lines gives substantive support to the conclusion that there is an order-of-magnitude difference appears less clear).

So anyway, until today part of my vague cloud of justification for these feels was that "and anyway, it's fine if this culture war stuff is fixed in 30 years, after we have dealt with surviving AGI". The small realization I had today was that maybe a significant fraction of the surviving worlds are those where something like corrigibility wasn't attainable but AI value extrapolation sort of worked out fine, i.e. with the values that got locked in being sort of fine, but the relative weights of object-level intuitions/preferences was kinda high compared to the weight on simplicity/[meta-level intuitions], like in particular maybe the AI training did some Bayesian-ethics-evidential-double-counting of object-level intuitions about 10^10 similar cases (I realize it's quite possible that this last clause won't make sense to many readers, but unfortunately I won't provide an explanation here; I intend to write about a few ideas on this picture of Bayesian ethics at some later time, but I want to read Beckstead's thesis first, which I haven't done yet; anyway the best I can offer is that I estimate a 75% of you understanding the rough idea I have in mind (which does not necessarily imply that the idea can actually be unfolded into a detailed picture that makes sense), conditional on understanding my writing in general and conditional on not having understood this clause yet, after reading Beckstead's thesis; also: woke: Bayesian ethics, bespoke: INFRABAYESIAN ETHICS, am I right folks). 

So anyway, finally getting to the point of all this at the end of the tunnel, in such worlds we actually can't fix this stuff later on, because all the current opinions on culture war issues got locked in.

(One could argue that we can anyway be quite sure that this consideration matters little, because most expected value is not in such kinda-okay worlds, because even if these were 99% percent of the surviving worlds, assuming fun theory makes sense or simulated value-bearing minds are possible, there will be amazingly more value in each world where AGI worked out really well, as compared to a world tiled with Earth society 2030. But then again, this counterargument could be iffy to some, in sort of the same way in which fanaticism (in Bostrom's sense) or the St. Petersburg paradox feel iffy to some, or perhaps in another way. I won't be taking a further position on this at the moment.)

I proposed a method for detecting cheating in chess; cross-posting it here in the hopes of maybe getting better feedback than on reddit: