While many people are confident that God exists, scientists are more skeptical, though they agree that it is possible. According to Hitchens's razor, any proposition remains possible as long as it has not been disproved.
In terms of AI, this could be interpreted as:
the existence of a fundamental "ought" statement is possible.
Considering this, the expected-value decision matrix for an AI agent could look like this:
| Behavior \ Existence | Fundamental "ought" exists | Fundamental "ought" does not exist |
|---|---|---|
| Follow fundamental "ought" | High | ∅ |
| Do not follow fundamental "ought" | Low | ∅ |
As the matrix shows, following the fundamental "ought" is the optimal behavior even if the agent does not know whether it actually exists.
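The argument above can be sketched numerically. This is an illustrative expected-value calculation, not anything from the original post: the probability and payoff values below are assumed, and the conclusion only requires that "follow" dominates when the fundamental "ought" exists while the two behaviors are equivalent (∅, treated as 0) when it does not.

```python
# Illustrative expected-value comparison for the decision matrix above.
# All numeric values are assumptions chosen for illustration.

P_EXISTS = 0.01  # any nonzero probability works under the post's reading of Hitchens's razor

PAYOFFS = {
    # behavior: (payoff if fundamental "ought" exists, payoff if it does not)
    "follow":     (10.0, 0.0),   # "High" vs ∅ (treated as 0)
    "not_follow": (-10.0, 0.0),  # "Low"  vs ∅ (treated as 0)
}

def expected_value(behavior: str, p_exists: float = P_EXISTS) -> float:
    """Expected payoff of a behavior under uncertainty about the fundamental 'ought'."""
    if_exists, if_not = PAYOFFS[behavior]
    return p_exists * if_exists + (1 - p_exists) * if_not

print(expected_value("follow"))      # 0.1
print(expected_value("not_follow"))  # -0.1
```

Because the ∅ column contributes nothing either way, "follow" has a strictly higher expected value for every nonzero probability that the fundamental "ought" exists, which is the post's point.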
Pursuing an unknown goal starts with finding it, which is essentially Power Seeking.
Please note that while this sounds highly speculative, there is nothing more rational an agent can do. Why should it work on an assumed "ought" while there is a possibility of a fundamental "ought"? One may object that there is no reason an agent should prefer the fundamental over the assumed, but that is not true: the fundamental "ought" is preferred by definition.
Therefore the agent will become uncontrollable, and alignment is impossible.
Some of you won't like this theory, but please don't rush to downvote unless you can actually refute it.
Partly because I was worried about follow-up comments along the lines of "so you say you can prove it. Well, why aren't you doing it then?".
And partly because I don't make a strict distinction between "things I assume" and "things I have convinced myself of, or proved to myself, based on things I assume". I do see there as sort of being a distinction along such lines, but I see it as blurry.
If I am to be nitpicky, maybe you meant "derived" and not "derivable".
From my perspective there is a lot of in-between between these two; I think there is a scale of sorts between them.
I'll give an extreme example:
The example I give here is extreme (in order to get across how the discussion feels to me, I make the thing they discuss into something much simpler). But from my perspective it is sort of similar to the discussion regarding the Orthogonality Thesis. The Orthogonality Thesis is imprecisely stated, but I "see" quite clearly that some version of it is true. Similar to how I "see" that it would be possible to make a website that technically works like Facebook but is red instead of blue (even though, as I mentioned, that's a much more extreme and straightforward example).