The generalized version of this lesson "that cooperation/collusion favors the good guys - i.e. those aligned towards humanity" actually plays out in history. In WW2 the democratic powers - those with interconnected economies and governments more aligned to their people - formed the stronger Allied coalition. The remaining autocratic powers - all less aligned to their people and also to each other - formed a coalition of necessity. Today history simply repeats itself with the democratic world aligned against the main autocratic powers (Russia, China, North Korea
I don't actually think we're bottlenecked by data. Chinchilla represents a change in focus (for current architectures), but I think it's useful to remember what that paper actually told the rest of the field: "hey you can get way better results for way less compute if you do it this way."
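To make the "better results for less compute" point concrete, here is a minimal sketch of a compute-optimal split between parameters and training tokens. It assumes two commonly quoted approximations that are not stated in this comment: training compute C ≈ 6·N·D (N parameters, D tokens) and roughly 20 tokens per parameter. The function name and the default ratio are hypothetical, chosen only for illustration.

```python
import math

def compute_optimal_split(flops, tokens_per_param=20.0):
    """Split a training budget into (params, tokens).

    Assumes C ~= 6 * N * D and D ~= tokens_per_param * N.
    Both are rule-of-thumb approximations, not exact laws.
    """
    # Substituting D = r * N into C = 6 * N * D gives N = sqrt(C / (6 * r)).
    params = math.sqrt(flops / (6.0 * tokens_per_param))
    tokens = tokens_per_param * params
    return params, tokens

# Example: a budget of about 5.88e23 FLOPs.
n_params, n_tokens = compute_optimal_split(5.88e23)
print(f"params ~ {n_params:.3g}, tokens ~ {n_tokens:.3g}")
```

Under these assumptions, a larger budget is spent on both more parameters and more tokens, which is why the result reads as a recipe for capability gains and only secondarily as a data constraint.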
I feel like characterizing Chinchilla most directly as a bottleneck would be missing its point. It was a major capability gain, and it tells everyone else how to get even more capability gain. There are some data-related challenges far enough down the implied path, but we
I'd very much like to understand how your credences can be so high with nothing to back them up other than "it's possible and we lack some data". Like, sure, but to have credences that high you need at least some data or reasoning to back them up.
Humans have not evolved to do math or physics, but we did evolve to resist manipulation and deception, these were commonplace in the ancestral environment.
This seems pretty counterintuitive to me, seeing how easily many humans fall for not-so-subtle deception and manipulation every day.
I really don't understand the AGI in a box part of your arguments: as long as you want your AGI to actually do something (it can be anything, whether you asked for a proof of a mathematical problem or anything else), its output will have to go through a human anyway, which is basically the moment when your AGI escapes. It does not matter what kind of box you put around your AGI because you always have to open it for the AGI to do what you want it to do.
The second case might not really make sense, because deception is a convergent instrumental goal especially if the AI is trying to cause X and you're trying to cause not X, and generally because an AI that smart probably has inner optimizers that don't care about this "make a plan, don't execute plans" thing you thought you'd set up.
I believe the second case is a subcase of the problem of ELK. Maybe the AI isn't trying to deceive you, and is actually doing what you asked it to do (e.g., I want to see "the diamond" on the main detector), yet the plans it produces...
Why would I press the dislike button when I get the possibility to signal virtue by showing people I condemn what "X" says about "Y"?
Talking about the fact that each consciousness will only see a classical state doesn't make sense, because they are in a quantum superposition state. Just like it does not make sense to say that the photon went either right or left in the double slit experiment.
You created a superposition of a million consciousnesses and then outputted an aggregate value about all those consciousnesses.
This I agree with.
Either a million entities experienced a conscious experience, or you can find out the output of a conscious being without ever actually creating a conscious being - i.e. p-zombies exist (or at least aggregated p-zombies).
This I do not. You do not get access to a million entities by the argument I laid out previously. You did not simulate all of them. And you did not create something that behaves like a million entiti...
I'm not sure I understand your answer. I'm saying that you did not simulate at least a million consciousnesses, just as in Shor's algorithm you do not try all the divisors.
So even though we've run the simulation function only a thousand times, we must have simulated at least a million consciousnesses, or how else could we know that exactly 254,368 of them e.g. output a message which doesn't contain the letter e?
Isn't that exactly what Scott Aaronson explains a quantum computer/algorithm doesn't do? (See https://www.scottaaronson.com/papers/philos.pdf, page 34 for the full explanation)
For AIs, we're currently interested in the values that arise in a single AI (specifically, the first AI capable of a hard takeoff), so single humans are the more appropriate reference class.
I'm sorry but I don't understand why looking at single AIs makes single humans the more appropriate reference class.
That seems like extremely limited, human thinking. If we're assuming a super powerful AGI, capable of wiping out humanity with high likelihood, it is also almost certainly capable of accomplishing its goals despite our theoretical attempts to stop it without needing to kill humans.
If humans are capable of building one AGI, they would certainly be capable of building a second one, which could have goals unaligned with the first one.
Even admitting that alignment is not possible, it's not clear why humanity's and a super-AGI's goals should be in conflict, and not just different. Even admitting that they are highly likely to be in conflict, it is not clear why strategies to counter this cannot be effective (e.g. partnering up with a "good" super-AGI).
Because unchecked convergent instrumental goals for AGI are already in conflict with humanity's goals. As soon as you realize humanity may have reasons to want to shut down/restrain an AGI (through whatever means), this gives the AGI grounds to wipe out humanity.
Another funny thing you can do with square roots: let's take and let us look at the power series . This converges for , so that you can specialize in . Now, you can also do that inside , and the series converges to , "the" square root of . But in this actually converges to .
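One standard instance of this phenomenon, offered as an assumption and not necessarily the example intended above, is the binomial series for sqrt(1 + x) evaluated at x = 7/9. In the reals the partial sums approach 4/3, while in the 7-adic numbers the same partial sums approach -4/3. The sketch below checks both with exact rational arithmetic; the helper names are hypothetical.

```python
from fractions import Fraction

def binom_half(n):
    """Generalized binomial coefficient C(1/2, n) as an exact Fraction."""
    c = Fraction(1)
    for k in range(n):
        c *= (Fraction(1, 2) - k) / (k + 1)
    return c

def partial_sum(x, terms):
    """Sum of C(1/2, n) * x**n for n = 0 .. terms - 1."""
    return sum(binom_half(n) * x**n for n in range(terms))

def valuation(q, p):
    """p-adic valuation of a nonzero rational q."""
    if q == 0:
        raise ValueError("valuation of 0 is infinite")
    num, den, v = q.numerator, q.denominator, 0
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return v

x = Fraction(7, 9)
for terms in (5, 10, 20):
    s = partial_sum(x, terms)
    real_gap = abs(float(s) - 4 / 3)                # distance to +4/3 in R
    padic_val = valuation(s + Fraction(4, 3), 7)    # closeness to -4/3 in Q_7
    print(terms, real_gap, padic_val)
```

A growing 7-adic valuation of `s + 4/3` means the partial sums get 7-adically closer to -4/3, even as their real values get closer to +4/3.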
But although Bayesianism makes the notion of knowledge less binary, it still relies too much on a binary notion of truth and falsehood. To elaborate, let’s focus on philosophy of science for a bit. Could someone give me a probability estimate that Darwin’s theory of evolution is true?
What do you mean by that question? Because the way I understand it, the probability is "zero". The probability that, in the vast hypothesis space, Darwin's theory of evolution is the one that's true, and not a slightly modified variant, is completely negligible. My main p...
Regarding the stopping rule issue, it really depends on how you decide when to stop. I believe sequential inference lets you do that without any problem, but it's not the same as saying that the p-value is within the wanted bounds. But basically all of this derives from working with p-values instead of workable values like log-odds. The other problem with p-values is that they only let you work with binary hypotheses and make you believe that writing things like P(H0) actually carries a meaning, when in reality you can't test a hypothesis in a vacuum, you have to...
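As a rough illustration of why log-odds are convenient for sequential inference, here is a minimal sketch that updates the log-odds between two simple hypotheses about a coin, one flip at a time. The hypotheses (heads probability `p1` versus `p0`) and the function names are hypothetical, chosen only for this example.

```python
import math

def log_likelihood_ratio(flip, p1, p0):
    """log of P(flip | H1) / P(flip | H0) for one flip (1 = heads, 0 = tails)."""
    if flip:
        return math.log(p1 / p0)
    return math.log((1 - p1) / (1 - p0))

def update_log_odds(prior_log_odds, flips, p1=0.7, p0=0.5):
    """Add each observation's log-likelihood ratio to the running log-odds."""
    log_odds = prior_log_odds
    for flip in flips:
        log_odds += log_likelihood_ratio(flip, p1, p0)
    return log_odds

# Updating in two stages gives the same result as updating all at once,
# so looking at the running total mid-experiment does not change it.
first = update_log_odds(0.0, [1, 1])
both = update_log_odds(first, [0])
print(both, update_log_odds(0.0, [1, 1, 0]))
```

Because each observation contributes an additive term, the running total after any number of flips depends only on the data seen so far, not on when or why you chose to look at it.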
I'm sorry but I don't get the explanation regarding CoinRun. I claim that the "reward as incentivization" framing still "explains" the behaviour in this case. As an analogy, we can go back to training a dog and rewarding it with biscuits: let's say you write numbers on the floor from 1 to 10. You ask the dog a simple calculus question (whose answer is between 1 and 10), and each time it puts its paw on the right number it gets a biscuit. Let's just say that during the training it so happens that the answer to all the calculus questions is always 6. Woul...