interstice


The "village idiot to Einstein" argument is about the relative length of the "subhuman--village idiot" and "village idiot--Einstein" transitions. It predicts that the latter should be vastly shorter than the former. Instead we usually see computers progressing fairly smoothly through the range of human abilities. The fact that the overall transition "subhuman--superhuman" sometimes doesn't take very long is not directly relevant here.

The AI-box experiment and this result are barely related at all -- the only connection between them is that they both involve deception in some manner. "There will exist future AI systems which sometimes behave deceptively" can hardly be considered a meaningful advance prediction.

Eliezer likes to characterize the success of deep learning as "being further than him on the Eliezer-Robin axis", but I consider that quite misleading. I would say that it is, rather, mostly orthogonal to that axis, contradicting both Eliezer's and Robin's prior models in different ways.

The facts, thus far, have seemed to support Eliezer’s predictions

In what sense? I don't think Eliezer has made any particularly successful advance predictions about the trajectory of AI.

I think the next logical step in this train of thought[1] is to discard the idea that there's a privileged "real world" at all. Rather, from your perspective as an agent, there are simply the sensory inputs you receive and the decisions you make. There is no fact of the matter about which of the infinitely many possible embeddings of your algorithm in various mathematical structures is "the real one". Instead, you can make decisions on the basis of which parts of mathematical reality you care about most and want to have influence over.


  1. which I don't necessarily fully endorse. But it is interesting! ↩︎

SBF had sociopathic personality traits and was clearly motivated by EA principles. If you look at people who commit heinous acts in the name of just about any ideology, they will likely have sociopathic personality traits, but some ideologies can make it easier to justify taking sociopathic actions (and to acquire the resources and followers needed to take them).

UDASSA works fine in level 2 and level 4 multiverses. Indeed, its domain of applicability is the set of all possible Turing machines, which can itself be seen as a model of the level 4 multiverse.

What if they cooperate acausally between themselves? Well there’s an infinite amount of humans vs an infinite amount of UFAIs, and this is an infinity vs infinity scenario

And how do you divide up that infinity between the infinite number of possible UFAIs and future-humanities that could exist? That this procedure gives undefined answers in infinite universes is a sign that it's probably not a good fit for reasoning about them. I think a better answer is something like UDASSA, which can assign different amounts of 'measure' to humanity and UFAIs, giving them potentially unequal bargaining power, even if there are an infinite number of instantiations of both.
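For concreteness, here is a rough sketch of the kind of measure UDASSA assigns (the notation is mine, not something the comment above commits to): an observer-moment $x$ gets weight from every program $p$ that outputs it on a universal Turing machine $U$, with shorter programs counting for more,

$$m(x) \;=\; \sum_{p \,:\, U(p) = x} 2^{-\ell(p)},$$

where $\ell(p)$ is the length of $p$ in bits. On this sketch, humanity and a UFAI can end up with different finite measures even if each has infinitely many instantiations, because the weight comes from description length rather than from counting copies.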

even if you think there’s a 20% chance we make it, that’s not the same as thinking that 20% of Everett branches starting in this position make it

Although worlds starting in this position are a tiny minority anyway, right? Most of the Everett branches containing "humanity" have histories very different from our own. And if alignment is neither easy nor impossible -- if it requires insights fitting "in a textbook from the future", per Eliezer -- I think we can say with reasonable (logical) confidence that a non-trivial fraction of worlds will see a successful humanity, because all that is required for success in such a scenario is having a competent alignment-aware world government. Looking at the history of Earth governments, I think we can say that while such a scenario may be unlikely, it is not so unlikely as to render us overwhelmingly likely to fail.

I think a more likely reason for preponderance of "failure" is that alignment in full generality may be intractable. But such a scenario would have its upsides, as well as making a hard binary of "failure/success" less meaningful.
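To spell out the quoted distinction above between a 20% credence and a 20% branch fraction (the notation and numbers here are mine, purely illustrative): subjective credence in success mixes logical uncertainty about how hard alignment is with the physical branching, roughly

$$P(\text{success}) \;\approx\; \sum_h P(h)\, f(h),$$

where $h$ ranges over hypotheses about alignment difficulty and $f(h)$ is the fraction of Everett branches starting from this position that succeed if $h$ is true. For instance, 50% credence in "tractable but textbook-hard" with $f \approx 0.4$, plus 50% credence in "intractable in full generality" with $f \approx 0$, gives a 20% credence in success even though under neither hypothesis do 20% of branches make it.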
