The AI can be adapted for other, less restricted, domains
That the ideas from a safe AI can be used to build an unsafe AI is a general argument against working on (or even talking about) any kind of AI whatsoever.
The AI adds code that will evolve into another AI into its output
The output is to contain only proofs of theorems. Specifically, a proof (or refutation) of the theorem in the input. The state of the system is to be reset after each run so as to not accumulate information.
The AI could self-modify incorrectly and result in unfriendly AI
Any correct or incorrect self-modification is still restricted to the math domain, and so cannot result in an unsafe AI.
bug in the environment itself
Guarding against software bugs is easy in this case. You design an abstract virtual machine environment for the AI, then design the software that implements this environment, then formally prove that the software is correct. Guarding against errors caused by cosmic rays is also easy. You estimate the probability of such an error, and then add redundancy until the probability is so low that it won't happen until the end of the universe.
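As a rough illustration of the "add redundancy" step, here is a minimal sketch. It assumes something the comment above does not specify: that the computation is run on n independent replicas, each corrupted with probability p, and that the result is taken by majority vote. The function name is hypothetical.

```python
from math import comb


def majority_failure_probability(p: float, n: int) -> float:
    """Probability that a majority of n independent replicas are corrupted.

    Assumptions (not from the original comment): replicas fail
    independently with the same probability p, and n is odd so that
    a majority vote is always decisive.
    """
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd number")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    needed = n // 2 + 1  # smallest number of bad replicas that outvotes the good ones
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(needed, n + 1)
    )
```

Under these assumptions, and for p below one half, each added pair of replicas lowers the failure probability, so one can keep increasing n until the estimate falls below whatever bound is wanted.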
Look up how difficult it is to sandbox untrusted code
Sandboxing untrusted code is easy. The difficult thing is sandboxing it while making it think that it runs normally. This is irrelevant here.
I don't believe that a system can work only in formal proofs
It is quite probable that a "pure math Oracle" system cannot work. The point was, it can be made safe to try.
Well, I liked the paper, but I'm not knowledgeable enough to judge its true merits. It deals heavily with Bayesian-related questions, somewhat in Jaynes's style, so I thought it could be relevant to this forum.
At least one of the authors is a well-known theoretical physicist with an awe-inspiring Hirsch index, so presumably the paper would not be trivially worthless. I think it merits a more careful read.
Regarding the "he's here... he is the end of the world" prophecy, in view of the recent events, it seems like it can become literally true without it being a bad thing. After all, it does not specify a time frame. So Harry may become immortal and then tear apart the very stars in heaven, some time during a long career.
You're treating resources as one single kind, where really there are many kinds with possible trades between teams
I think this is reasonably realistic. Let R signify money. Then R can buy other necessary resources.
But my point was exactly that there would be many teams who could form many different alliances. Assuming only two is unrealistic and just ignores what I was saying.
We can model N teams by letting them play two-player games in succession. For example, any two teams with nearly matched resources would cooperate with each other, producing a single combined team, etc... This may be an interesting problem to solve, analytically or by computer modeling.
You still haven't given good evidence for holding this position regarding the relation between the different Uxxx utilities.
You're right. Initially, I thought that the actual values of Uxxx-s will not be important for the decision, as long as their relative preference order is as stated. But this turned out to be incorrect. There are regions of cooperation and defection.
I don't think you can get an everywhere-positive exchange rate. There are diminishing returns and a threshold, after which, exchanging more resources won't get you any more time. There's only 30 hours in a day, after all :)
Space (land or whatever is being used). Mass and energy. Natural resources. Computing power. Finite-supply money and luxuries if such exist. Or are you making an assumption that CEVs are automatically more altruistic or nice than non-extrapolated human volitions are?
These all have the property that you only need so much of them. If there is a sufficient amount for everybody, then there is no point in killing in order to get more. I expect CEV-s to not be greedy just for the sake of greed. It's people's CEV-s we're talking about, not paperclip maximizers'.
Well it does need hardcoding: you need to tell the CEV to exclude people whose EVs are too similar to their current values despite learning contrary facts. Or even all those whose belief-updating process differs too much from perfect Bayesian (and how much is too much?) This is something you'd hardcode in, because you could also write ("hardcode") a CEV that does include them, allowing them to keep the EVs close to their current values.
Hmm, we are starting to argue about exact details of extrapolation process...
There are many possible and plausible outcomes besides "everybody loses".
Let's formalize the problem. Let F(R, Ropp) be the probability of a team successfully building a FAI first, given R resources, and having opposition with Ropp resources. Let Uself, Ueverybody, and Uother be the rewards for being first in building FAI<self>, FAI<everybody>, and FAI<other>, respectively. Naturally, F is monotonically increasing in R and decreasing in Ropp, and Uother < Ueverybody < Uself.
Assume there are just two teams, with resources R1 and R2, and each can perform one of two actions: "cooperate" or "defect". Let's compute the expected utilities for the first team:
We cooperate, opponent team cooperates:  
   EU("CC") = Ueverybody * F(R1+R2, 0)  
We cooperate, opponent team defects:  
   EU("CD") = Ueverybody * F(R1, R2) + Uother * F(R2, R1)  
We defect, opponent team cooperates:  
   EU("DC") = Uself * F(R1, R2) + Ueverybody * F(R2, R1)  
We defect, opponent team defects:  
   EU("DD") = Uself * F(R1, R2) + Uother * F(R2, R1)
Then, EU("CD") < EU("DD") < EU("DC"), which gives us most of the structure of a PD problem. The rest, however, depends on the finer details. Let A = F(R1,R2)/F(R1+R2,0) and B = F(R2,R1)/F(R1+R2,0). Then:
If Ueverybody <= Uself*A + Uother*B, then EU("CC") <= EU("DD"), and there is no point in cooperating. This is your position: Ueverybody is much less than Uself, or Uother is not much less than Ueverybody, and/or your team has far more resources than the other.
If Uself*A + Uother*B < Ueverybody < Uself*A/(1-B), this is a true Prisoner's dilemma.
If Ueverybody >= Uself*A/(1-B), then EU("CC") >= EU("DC"), and "cooperate" is the obviously correct decision. This is my position: Ueverybody is not much less than Uself, and/or the teams are more evenly matched.
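The three cases above can be checked numerically. The following is only a sketch: the expected-utility formulas are the ones stated above, but the function names and the particular F used in the example are assumptions for illustration (any F that increases in R and decreases in Ropp could be passed in instead).

```python
from typing import Callable, Dict

# F(R, Ropp): probability of building a FAI first, as defined above.
SuccessFn = Callable[[float, float], float]


def expected_utilities(F: SuccessFn, r1: float, r2: float,
                       u_self: float, u_everybody: float,
                       u_other: float) -> Dict[str, float]:
    """Expected utilities for the first team, using the formulas above."""
    f12 = F(r1, r2)  # we win the race
    f21 = F(r2, r1)  # the opponent wins the race
    return {
        "CC": u_everybody * F(r1 + r2, 0.0),
        "CD": u_everybody * f12 + u_other * f21,
        "DC": u_self * f12 + u_everybody * f21,
        "DD": u_self * f12 + u_other * f21,
    }


def classify(eu: Dict[str, float]) -> str:
    """Name the region the first team is in, following the three cases above."""
    if eu["CC"] <= eu["DD"]:
        return "defect"              # no point in cooperating
    if eu["CC"] < eu["DC"]:
        return "prisoners_dilemma"   # DD < CC < DC
    return "cooperate"               # CC >= DC


def example_F(r: float, r_opp: float) -> float:
    """Hypothetical success function, chosen only for illustration.

    Opposition counts double and there is a fixed difficulty term,
    so two teams racing separately may both fail.
    """
    return r / (r + 2.0 * r_opp + 1.0)
```

With this example F, Uself = 1, Ueverybody = 0.9 and Uother = 0, evenly matched teams (R1 = R2 = 5) land in the "cooperate" region, while a team with R1 = 50 facing R2 = 1 lands in "defect". Lowering Ueverybody to 0.45 in the evenly matched case gives the Prisoner's dilemma region.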
A coalition of 80% of the population forms, which would like to kill the other 20% in order to get their resources
I have trouble thinking of a resource that would make even one person's CEV, let alone 80%, want to kill people, in order to just have more of it.
The question of definition: who is to be included in the CEV? Or, who is considered sane?
This is easy, and does not need any special hardcoding. If someone is so insane that their beliefs are totally closed and impossible to move by knowledge and intelligence, then their CEV is undefined. Thus, they are automatically excluded.
TDT applies where agents are "similar enough". I doubt I am similar enough to e.g. the people you labelled insane.
We are talking about people building FAI-s. Surely they are intelligent enough to notice the symmetry between themselves. If you say that logic and rationality make you decide to 'defect' (=try to build FAI on your own, bomb everyone else), then logic and rationality would make everyone decide to defect. So everybody bombs everybody else, no FAI gets built, everybody loses. Instead you can 'cooperate' (=precommit to build FAI<everybody's CEV> and to bomb everyone that did not make the same precommitment). This gets us a single global alliance.
The resources are not scarce at all. But, there's no consensus of CEVs. The CEVs of 80% want to kill the rest.
The resources are not scarce, yet the CEV-s want to kill? Why?
I meant that the AI that implements your version of CEV would forcibly update people's actual beliefs to match what it CEV-extrapolated for them.
It would do so only if everybody's CEV-s agree that updating these people's beliefs is a good thing.
If you believed there were many such people, would you modify your solution, or is ignoring them however many they are fine by you?
People that would still have false factual beliefs no matter how much evidence and how much intelligence they have? They would be incurably insane. Yes, I would agree to ignore their volition, no matter how many they are.
The PD reasoning to cooperate only applies in case of iterated PD
Err. What about the arguments of Douglas Hofstadter and EY, and decision theories like TDT?
Unlike PD, the payoffs are different between players, and players are not sure of each other's payoffs in each scenario
This doesn't really matter for a broad range of possible payoff matrices.
join research alliance 1, learn its research secrets, then defect and sell the secrets to alliance 2
Cooperating in this game would mean there is exactly one global research alliance. A cooperating move is a precommitment to abide by its rules. Enforcing such precommitment is a separate problem. Let's assume it's solved.
I'm not convinced by this that it's an easier problem to solve than that of building AGI or FAI or CEV.
Maybe you're right. But IMHO it's a less interesting problem :)
So you're OK with the FAI not interfering if they want to kill them for the "right" reasons?
I wouldn't like it. But if the alternative is, for example, to have FAI directly enforce the values of the minority on the majority (or vice versa) - the values that would make them kill in order to satisfy/prevent - then I prefer FAI not interfering.
"if we kill them, we will benefit by dividing their resources among ourselves"
If the resources are so scarce that dividing them is so important that even CEV-s agree on the necessity of killing, then again, I prefer humans to decide who gets them.
So you're saying your version of CEV will forcibly update everyone's beliefs
No. CEV does not update anyone's beliefs. It is calculated by extrapolating values in the presence of full knowledge and sufficient intelligence.
If the original person effectively assigns 0 or 1 "non-updateable probability" to some belief, or honestly doesn't believe in objective reality, or believes in "subjective truth" of some kind, CEV is not necessarily going to "cure" them of it - especially not by force.
As I said elsewhere, if a person's beliefs are THAT incompatible with truth, I'm ok with ignoring their volition. Note that their CEV is undefined in this case. But I don't believe there exist such people (excluding the totally insane).
That there exists a possible compromise that is better than total defeat doesn't mean total victory wouldn't be much better than any compromise.
But the total loss would be correspondingly worse. PD reasoning says you should cooperate (assuming cooperation is precommittable).
If you think so you must have evidence relating to how to actually solve this problem. Otherwise they'd both look equally mysterious. So, what's your idea?
Off the top of my head, adoption of total transparency for everybody of all governmental and military matters.
I am confused about how the Philosopher's Stone could help with reviving Hermione. Does QQ mean to permanently transfigure her dead body into a living Hermione? But then, would it not mean that Harry could do it now, albeit temporarily? And he wouldn't even need a body. He could then just temporarily transfigure any object into a living Hermione. Also, now that I think of it, he could transfigure himself a Feynman and a couple of Einsteins...