Stuart has worked on further developing the orthogonality thesis, which gave rise to a paper, a non-final version of which you can see here: http://lesswrong.com/lw/cej/general_purpose_intelligence_arguing_the/
This post won't make sense if you haven't been through that.
Today we spent some time going over it, and he accepted my suggestion of a minor amendment, which fits best here.
Besides all the other awkward things that a moral convergentist would have to argue for, namely (quoting the paper):
This argument generalises to other ways of producing the AI. Thus to deny the Orthogonality thesis is to assert that there is a goal system G, such that, among other things:
- There cannot exist any efficient real-world algorithm with goal G.
- If a being with arbitrarily high resources, intelligence, time and goal G were to try to design an efficient real-world algorithm with the same goal, it must fail.
- If a human society were highly motivated to design an efficient real-world algorithm with goal G, and were given a million years to do so along with huge amounts of resources, training and knowledge about AI, it must fail.
- If a high-resource human society were highly motivated to achieve the goals of G, then it could not do so (here the human society is seen as the algorithm).
- Same as above, for any hypothetical alien societies.
- There cannot exist any pattern of reinforcement learning that would train a highly efficient real-world intelligence to follow the goal G.
- There cannot exist any evolutionary or environmental pressures that would cause highly efficient real-world intelligences to evolve to follow goal G.
We can add:
8. If there were a threshold of intelligence above which any agent will converge towards the morality/goals asserted by the anti-orthogonalist, there cannot exist any system composed of a multitude of below-threshold intelligences that, as a whole, pursues a different goal (G) than the convergent one (C), without any individual agent reaching the threshold.
Notice that in this case each individual agent might still desire goal G. We can strengthen the claim by ruling out this case altogether:
9. There cannot be any superorganism-like group of agents, each with sub-threshold intelligence and with goals differing from G, which, if each acted towards its own goals, would as a whole achieve G.
This matters in cases where the threshold for convergence is i units of intelligence, or i − s units of intelligence plus the knowledge that goal C exists in goal space (C being the goal towards which agents allegedly converge), and where fully grasping G requires understanding C.
A separately interesting issue that came up is that there seem to be two distinct conceptions of why convergent goals would converge, and other people might be as unaware of the distinction as it seemed we were.
Case 1: Goals would converge because there is a right/correct/inescapable/imperative set of goals, and anything smart enough will notice that those are the right ones and start acting towards them.
(This could be moral realism, but needn't be, in particular because moral realism doesn't mean much in most cases.)
Case 2: It is a fact that any agent, upon reaching some particular amount of intelligence, will start to converge in its moral judgements and assessments, and regardless of whether those judgements are true/right/correct, agents will converge on them. So whichever values those happen to be, (a) moral convergence is the case, and (b) we should call them the Convergent Moral Values or some other fancy name.
The distinction between them is akin to the scope distinction between "of" and "that" (essentially de re versus de dicto). Group one believes, of the convergent moral values, that agents will converge to them. The other group believes that the convergent values, whichever they turn out to be, should be given distinct conceptual importance and a name.
Stuart and I were inclined to think that Case 2 is the more defensible/believable of the two, though both fail to survive the argument for the orthogonality thesis.