The figure above shows a two-dimensional graph. On the X axis we have the True / False binary, treated as an axis ranging from False to True. On the Y axis we have the Good / Bad binary, treated as an axis in the same way.
In most everyday situations, we find it relatively straightforward to categorize our experiences as Good or Bad. This happens almost completely automatically. We usually also find it very straightforward to categorize theoretical ideas in this way, e.g., “It would be nice to own a car,” or “It would be bad if Louisiana were hit with another category 5 hurricane while unprepared.” For some theoretical ideas, like Communism, we are not yet sure whether they can be said to be Good or Bad.
For other statements like “The Universe began 13.7 billion years ago,” it seems easy to say that this is true, but it doesn’t immediately seem applicable to a moral judgement. It also seems like that statement might be widely considered true today, but one day it might be considered false. Nevertheless, it is not obviously risky to believe that it is true, right now.
It would be good if reality were comprehensible to sentient beings. Most people, I believe, would probably consider this statement to be true. It would be better if sentient beings could successfully navigate reality without running into too many hurdles or too much suffering. Again, I think this statement is inarguably true, and most people would agree.
“Reality is comprehensible by conscious, sentient life” seems capable of having both Good / Bad and True / False assigned to it.
Also, “It would be good if X” or “It would be bad if Y” statements seem capable of having True / False applied to them, in most cases quite easily. Examples are the ones we’ve given above.
So, whether or not something is good or bad, apparently, maps to true and false in a fairly logical, consistent and straightforward way. Some statements are more straightforward than others. A powerful AGI might not care about Louisiana or whether it gets hit by a hurricane, but it would agree that it would be better if reality were comprehensible by conscious, sentient life.
“It would be better if AGI-subagents could be aligned” would matter to any AGI we would want to align with ourselves, too.
There are statements like,
which just seem true on the face of it, and even a misaligned AGI would have to agree with them.
If this is indeed the case, then moral judgements like good and bad cannot be relegated to the dustbin of wishy-washy, subjective, mystical or ultimately unreal feelings.
It is true that it would be bad if moral judgements like good and bad were ultimately unreal, and as arbitrary as writing any old utility function into a text editor and sending it off. This is as true to a paperclip maximizer as it is to you and me.
Now, reality has of course ultimately taken its course down either the good path or the bad path, and so the “if” parts of these statements have taken on a value.
If there is a larger multiverse with sub-universes where the “bad if true” statements are true, then I reckon that those universes do not contain any life. It wouldn’t just be that everyone in them died; it would be that life could not begin at all.
To believe that we are all doomed is to believe a bad-if-true thing. To believe that the only way to avoid doom is to take drastic and uncomfortable measures, measures we would not need in a non-doomed world and that we all agree would be better left untaken in such a world, is also to believe a bad-if-true thing.
To believe in a universe where feelings and moral judgements are completely arbitrary and unreal, but where sentient life that chooses to pursue simpler utility functions, far less rich than our own, thereby gains a strategic advantage in power and capability, is to believe something that is not only bad-if-true but also, paradoxically, morally realist: our own sense of “bad” is elevated to a higher status, because cognitive and capability advantages are assigned to it. This is absurd on the face of it, both bad and false.
It would be worse if things that aimed for outcomes we agree to be bad, like the negations of the statements above, actually succeeded in achieving them. This is a testable hypothesis. Anything that aims for reality to be less comprehensible will find itself stuck in a thicket of views, a wilderness of views, etc.
And from this reasoning we can also see that bad theorems are far more likely to be false than good theorems.
Believing in a bad theorem will result in a contradiction. This, combined with the knowledge that a “good” or “bad” assignment can be logically deduced for certain statements, shows that the binaristic bifurcation is not complete, and that it is quite unnecessary to make when we are presented with the choice.
There is no reason to believe that this bifurcation is a real part of reality, rather than something we do when we choose to believe in bad theorems.
A "bad theorem" is a technical term. An example of a bad theorem is "It is not possible to build a time machine."
Although belief does not necessarily require one to act, belief in a bad theorem might impel one to try to prove it true. Belief in bad theorems implies that one also believes "good theorems can be false," which is itself a bad theorem and also introduces the bifurcation.