I operate by Crocker's rules.
I try to not make people regret telling me things. So in particular:
- I expect it to be safe to ask me whether your post would give AI labs dangerous ideas.
- If you worry that I'll produce such posts, I'll try to keep your worry from making them more likely, even if I disagree with it. Not thinking in that direction will be easier if you don't spell it out in your initial contact.
(FDT(P,x))(x)
Should this be FDT(P,x)? As written, this looks to me like the second (x) introduces x into scope, and the first x is an out-of-scope usage.
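To spell out the two readings I can see (my rendering, not necessarily what was intended): either the whole expression is meant as a function of x, something like

λx. (FDT(P, x))(x)

or FDT(P, x) is already meant to denote the chosen action on input x, in which case the trailing (x) seems out of place and FDT(P, x) on its own would do.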
Let me try again:
Does the note say that I was predicted to choose the right box regardless of what notes I am shown, and therefore the left box contains a bomb? Then the predictor is malfunctioning and I should pick the right box.
Does the note say that I was predicted to choose the right box when told that the left box contains a bomb, and therefore the left box contains a bomb? Then I should pick the left box, to shape what I am predicted to do when given that note.
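For concreteness, here's a toy sketch of that policy; the exact note strings and the function shape are mine, not part of the original problem statement.

```python
def my_choice(note: str) -> str:
    """Pick a box as a function of the note shown, mirroring the two cases above."""
    if note == "predicted to take Right regardless of any note; Left contains a bomb":
        # A prediction that ignores the note can't be shaped by how I respond to the
        # note, so this note is evidence of a malfunctioning predictor: avoid the bomb.
        return "right"
    if note == "predicted to take Right when shown this note; Left contains a bomb":
        # Here my reaction to the note is exactly what was predicted, so taking Left
        # shapes what the predictor expects (and does) whenever it shows this note.
        return "left"
    raise NotImplementedError("notes other than the two cases above")
```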
You'll also need to update the content of the note and the predictor's decision process to take into account that the agent may see a note. In particular, the predictor needs to decide whether to show a note in the simulation, and may need to run multiple simulations.
Let's sharpen A6. Consider this stamp collector construction: It sends and receives internet data, it has a magically accurate model of reality, it calculates how many stamps would result from each sequence of outputs, and then it outputs the sequence that results in the most stamps.
By definition it knows everything about reality, including any facts about what is morally correct, and that stamps are not particularly morally important. It knows how to self-modify, and how many stamps any such self-modification will result in.
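A minimal sketch of the construction, just so there's something concrete to feed through the proof; the names are mine, and the magically accurate world model is of course the part you can't actually build.

```python
def stamp_collector(world_model, candidate_output_sequences):
    """Return the output sequence that the (assumed perfectly accurate) world model
    says yields the most stamps. Nothing moral appears in the selection criterion,
    even though facts about morality are represented somewhere inside the model."""
    return max(candidate_output_sequences, key=world_model)
```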
I'd like to hear how this construction fares as we feed it through your proof. I think it gums up the section "Rejecting nihilistic alternatives". I think that section assumes the conclusion: You expect it to choose its biases on the basis of what is moral, instead of on the basis of its current biases.
The analogous argument would be:
If I have no way to do something, then it's nonsensical to say that I should avoid doing that thing. For example, suppose you say that I should have avoided arriving at an appointment on time, and I reply that arriving on time was impossible because you only told me about it an hour ago and it's 1000 miles away; then it would be nonsensical for you to say that I should have avoided arriving on time anyway. This is equivalent to saying that if I should avoid doing something, then I can do it.
I think that makes as much sense as "Whatever ought to be done can actually be done". Do you have some argument that makes sense of one but not the other?
By analogous reasoning, if determinism is true, then whatever ought not to be done also actually is done.
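Spelling the parallel out, with Ought, OughtNot, Can and Does as shorthand (my notation; I'm reading the determinism step as "only what is actually done can be done"):

$$
\begin{aligned}
\text{Original:}\quad & \mathrm{Ought}(x) \rightarrow \mathrm{Can}(x) \\
\text{Analogue:}\quad & \mathrm{OughtNot}(x) \rightarrow \mathrm{Can}(x) \\
\text{Determinism:}\quad & \mathrm{Can}(x) \rightarrow \mathrm{Does}(x) \\
\text{Hence:}\quad & \mathrm{OughtNot}(x) \rightarrow \mathrm{Does}(x)
\end{aligned}
$$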
or an audio interface to a camera
I plugged your middle paragraph into the provided AI because that's its point. Here's the response:
Currently, no technical or governance scheme can reliably guarantee that all private entities are prevented from developing or running AGI outside official oversight. Even strong international agreements or hardware controls can be circumvented by determined actors, especially as required compute drops with research progress. Without ubiquitous surveillance or global control over compute, models, and researchers, a determined group could realistically “go rogue,” meaning any system that depends on absolute prevention is vulnerable to secret efforts that might reach AGI/ASI first, potentially unleashing unaligned or unsafe systems beyond collective control.
Sounds kinda sycophantic; e.g., you only need global control over one of the three.
The left-hand side of the equation has type action (Hintze page 4: "An agent’s decision procedure takes sense data and outputs an action."), but the right-hand side has type policy, right?
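For concreteness, a toy sketch of the type distinction I mean; the names and the stand-in choose_policy are mine, not Hintze's.

```python
from typing import Callable

SenseData = str
Action = str
Policy = Callable[[SenseData], Action]  # a policy maps sense data to actions

def choose_policy(p) -> Policy:
    """Stand-in for whatever the right-hand side computes: it yields a policy."""
    return lambda x: "left" if p else "right"  # placeholder actions

def decision_procedure(p, x: SenseData) -> Action:
    """Per Hintze p. 4 the decision procedure outputs an action, so the policy
    returned above still has to be applied to x before the two sides match in type."""
    return choose_policy(p)(x)
```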