All of Houshalter's Comments + Replies

I'm not sure what my exact thoughts were back then. I was/am at least skeptical of the specific formula used, as it seems arbitrary. It is intentionally designed to have certain properties, like diminishing returns in each term, so it's not exactly a "wild implication" that it has those properties.

I recently fit the Chinchilla formula to the data from the first LLaMA paper:

This was over an unrelated disagreement elsewhere about whether Chinchilla's predictions still held or made sense. As well as the plausibility of training ... (read more)
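For reference, the Chinchilla parametric form being fit is L(N, D) = E + A/N^α + B/D^β. A minimal sketch evaluating it with the constants published in the Chinchilla paper (the budget and doubling scenario below are illustrative, not from the comment):

```python
# Chinchilla's parametric loss fit: L(N, D) = E + A/N^alpha + B/D^beta,
# where N is parameter count and D is training tokens. The constants
# below are the ones published in Hoffmann et al. (2022).
E, A, alpha, B, beta = 1.69, 406.4, 0.34, 410.7, 0.28

def chinchilla_loss(N, D):
    return E + A / N**alpha + B / D**beta

# Chinchilla-70B's own budget: 70B parameters, 1.4T tokens.
base = chinchilla_loss(70e9, 1.4e12)

# Doubling the data at fixed parameter count: the data term shrinks
# only as D^-0.28, so returns diminish quickly.
more_data = chinchilla_loss(70e9, 2.8e12)

print(round(base, 3), round(more_data, 3))
```

Fitting the five constants to measured losses (as I did with the LLaMA numbers) is a straightforward nonlinear least-squares problem over points like these.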

Human beings cannot do most math without pencil and paper and a lot of pondering. Whereas there are a number of papers showing specialized transformers can do math and code at a more sophisticated level than I would have expected before seeing the results.

I literally noted that GPT-J, which uses said 7GB of math (assuming that number is right), usually fails at '2 + 2 ='. People can do several-digit addition without pencil and paper: '763 + 119 =' probably doesn't require pencil and paper to get '882'. We do require them for many-step algorithms, but this is not that. 'Dumb' computers do 64-bit addition trivially (along with algebra, calculus, etc.). I haven't seen specialized math models, but I'm dumbfounded that general models don't do math way better. I haven't tried coding using 'AI' tools, so I have no real opinion on how well they compare to basic autocomplete.
The basic problem of arithmetic is this: you can't be informal in math, and every single step needs to be checked. Language, while complicated, can allow a degree of informality as long as you can communicate well. Math does not allow this.

The Pile includes 7GB of math problems generated by deepmind basically as you describe. I don't believe the models trained on it can do any of them, but my testing wasn't properly done.

I am unsurprised it includes them, since it is an obvious thing. 7GB sounds like a crazy amount of math problems, but it is only a tiny amount compared to what could be generated. Chinchilla was all about how they need more data, and this would be an easy way to increase it (correctly). That models don't understand math after 7GB of examples is obviously related to the current extremely primitive state of logic in all such models. The big question is whether they would still not understand math and logic at 100x that amount. If a model could learn basic abstract reasoning, that would massively improve its performance at all tasks. Since math and logic are literally just languages that express an understanding of the easiest (context-independent) relations between things, failure even at that scale would suggest modern techniques are wholly unsuited to real AI. I suspect that with 700GB of math it wouldn't fail so hard, but who knows? (GPT-J even fails at things like '2 + 2 =' on half the prompts I try, often giving strange results like '0' or '5' even with a temperature of 0, though often that is because it doesn't even realize it is math, assuming that '2 + 2 =' is somehow a programming thing even though the similarity is entirely superficial. Even when it knows it is doing math, it will often get the answer right at first, and then switch to '2 + 2 = 0'.)

They fit a simplistic model in which the two variables are independent and the contribution of each decays as a power law. This leads to the shocking conclusion that the two inputs are independent and decay as power laws...

I mean, the model is probably fine for its intended purpose: finding the roughly optimal ratio of parameters and data for a given budget. It might mean that current models have suboptimal compute budgets. But it doesn't imply anything beyond that, like some hard limit to scaling given our data supply.

If the big tech companies really want to t... (read more)

What specific claims in the post do you disagree with? See this post for why multiple epochs will probably not work nearly as well as training on additional data.

The temporal difference learning algorithm is an efficient way to do reinforcement learning, and probably something like it happens in the human brain. If you are playing a game like chess, it may take a long time to get enough examples of wins and losses to train an algorithm to predict good moves. Say you play 128 games; that's only 128 bits of information, which is nothing. You have no way of knowing which moves in a game were good and which were bad. You have to assume all moves made during a losing game were bad, which throws out a lot of informati... (read more)
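A minimal TD(0) sketch on the standard 5-state random-walk toy problem (not from the comment; the constants are illustrative). It shows the key point: every step's bootstrapped update propagates credit backward, rather than waiting for many complete games:

```python
import random

# TD(0) on a 5-state random-walk chain. States 0..4; episodes start
# at state 2; each step moves left or right at random. Exiting right
# pays reward 1, exiting left pays 0. The true value of state s is
# its probability of exiting right: (s + 1) / 6.
random.seed(0)
n_states = 5
V = [0.5] * n_states  # value estimates, initialized to 0.5
alpha = 0.1           # learning rate

for _ in range(2000):
    s = 2
    while True:
        s_next = s + random.choice([-1, 1])
        if s_next < 0:          # exited left: reward 0, terminal value 0
            V[s] += alpha * (0 + 0 - V[s])
            break
        if s_next >= n_states:  # exited right: reward 1, terminal value 0
            V[s] += alpha * (1 + 0 - V[s])
            break
        # Non-terminal step: bootstrap from the next state's estimate.
        V[s] += alpha * (0 + V[s_next] - V[s])
        s = s_next

print([round(v, 2) for v in V])  # roughly [1/6, 2/6, 3/6, 4/6, 5/6]
```

Notice that no move is labeled "good" or "bad" by the final outcome alone; each state's estimate is pulled toward its successor's, so information flows through every step of every game.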

It's back btw. If it ever goes down again you can probably get it on wayback machine. And yes the /r/bad* subreddits are full of terrible academia snobbery. Badmathematics is the best of the bunch because mathematics is at least kind of objective. So they mostly talk about philosophy of mathematics.

The problem is that formal models of probability theory have problems with logical uncertainty. You can't assign a nonzero probability to a false logical statement. All the reasoning about probability theory is around modelling uncertainty in the unknown ext... (read more)

mako yass · 5y
Hmm. Reading. Okay. Summary: All of Eliezer's writing on this assumed the context of AGI/applied epistemology. That wasn't obvious from the materials, and it did not occur to this group of pure mathematicians to assume that same focus, because they're pure mathematicians and because of the activity they had decided to engage in on that day.
We published new versions of a lot of sequences posts a few months ago. If you click on the "Response to previous version" text, you can read the original text that the comment was referring to.

That's unlikely. By the late 19th century there was no stopping the industrial revolution. Without coal maybe it would have slowed down a bit. But science was advancing at a rapid pace, and various other technologies from telephones to electricity were well on their way. It's hard for us to imagine a world without coal, since we took that path. But I don't see why it couldn't be done. There would probably be a lot more investment in hydro and wind power (both of which were a thing before the industrial revolution.) And eventually solar. Cars would be hard, but electric trains aren't inconceivable.

we have nuclear weapons that are likely visible if fired en masse.

Would we be able to detect nuclear weapons detonated light years away? We have trouble detecting detonations on our own planet! And even if we did observe them, how would we recognize it as an alien invasion vs local conflict, or god knows what else.

The time slice between us being able to observe the stars, and post singularity, is incredibly tiny. It's very unlikely two different worlds will overlap so that one world is able to see the other destroyed and rush a singularity. I'm not even sure if we would rush a singularity if we observed aliens, or if it would make any difference.

First of all, the Earth has been around for a very very long time. Even slowly expanding aliens should have hit us by now. The galaxy isn't that big relative to the vast amounts of time they have probably been around. I don't feel like this explains the Fermi paradox.

If aliens wanted to prevent us from fleeing, this is a terribly convoluted way of doing it. Just shoot a self replicating nanobot at us near the speed of light, and we would be dealt with. We would never see it coming. They could have done this thousands of years ago, if not millions. And it w... (read more)

Yes, this explains it only if we are in the very small window between the yellow and red fronts. Us seeing it coming is not the problem; it's the next civilization along not seeing our destruction that's important. And it's not clear at all that you can easily do "the minimal amount of destruction necessary", especially since we have nuclear weapons that are likely visible if fired en masse. More to the point, "just shoot them all, it's cheap" is true if you don't care about being observed (you can Dyson suns for the energy, and have visible shielding mechanisms for probes that shoot through the very dusty interstellar - not intergalactic - space). I'm not yet convinced that it's easy or cheap to do it at c-comparable speeds and discreetly.

Well, we have plausible reason to believe in aliens: the Copernican principle, that the Earth isn't particularly special, plus the fact that the universe is enormous. There's literally no reason to believe angels and demons are plausible.

And god do I hate skeptics and how they pattern match everything "weird" to religion. Yes aliens are weird. That doesn't mean they have literally the same probability of existing as demons.

I imagine that a few centuries ago a RationalWiki page on microorganisms would describe them as: "A pseudoscientific belief that human diseases are caused by invisible beings. Coincidentally, exactly the same thing as described in the Bible. This dangerous myth is spread by opponents of scientific bloodletting and other uneducated people."

I think a concrete example is good for explaining this concept. Imagine you flip a coin and then put your hand over it before looking. The state of the coin is already fixed on one value. There is no probability or randomness involved in the real world now. The uncertainty of its value is entirely in your head.

From Surely You're Joking, Mr. Feynman:

Topology was not at all obvious to the mathematicians. There were all kinds of weird possibilities that were “counterintuitive.” Then I got an idea. I challenged them: "I bet there isn't a single theorem that you can tell me - what the assumptions are and what the theorem is in terms I can understand - where I can't tell you right away whether it's true or false."

It often went like this: They would explain to me, "You've got an orange, OK? Now you cut the orange into a finite number of pieces, put it b

... (read more)

Yudkowsky has changed his views a lot over the last 18 years, though. A lot of his earlier writing is extremely optimistic about AI and its timeline.

This is by far my favorite form of government. It's a great response whenever the discussion of "democracy is the best form of government we have" comes up. Some random notes in no particular order:

Sadly, getting support for this in the current day is unlikely because of the huge negative associations with IQ tests. Even literacy tests for voters are illegal because of a terrible history of fake tests being used by poll workers to exclude minorities. (Yes, the tests were fake, like this one, where all the answers are ambiguous and can be judged as c... (read more)

In the first draft of The Lord of the Rings, the Balrog ate the hobbits and destroyed Middle-earth. Tolkien considered this ending unsatisfactory, if realistic, and wisely decided to revise it.

“You keep using that word, I do not think it means what you think it means”

It's really going to depend on your interests. I guess I'll just dump my favorite channels here.

I enjoy some math channels like Numberphile, computerphile, standupmaths, 3blue1brown, Vi Hart, Mathologer, singingbanana, and some of Vsauce.

For "general interesting random facts" there's Tom Scott, Wendover Productions, CGP Grey, Lindybeige, Shadiversity. and Today I Found Out.

Science/Tech/etc: engineerguy, Kurzgesagt, and C0nc0rdance.

Miscellaneous: kaptainkristian, CaptainDisillusion, and the more recent videos of suckerpinch.

Politics: I unsubscribe... (read more)

Well there is a lot of research into treatments for dementia, like the neurogenesis drug I mentioned above. I think it's quite plausible they will stumble upon general cognitive enhancers that improve healthy people.

Just because it's genetic doesn't mean it's incurable. Some genetic diseases have been cured. I've read of drugs that increase neurogenesis, which could plausibly increase IQ. Scientists have increased the intelligence of mice by replacing their glial cells with better human ones.

A fair point, but I still expect gene-level interventions to work better and be developed noticeably earlier than any "cures" for low IQ in adults or even kids. Notably, after the low-hanging fruits have been picked (malnutrition, lead, etc.), there are no clear avenues for advancement. At the moment we don't have a clue as to where even to start looking.

I wasn't aware that method had a name, but I've seen that idea suggested before when this topic comes up. For neural networks in particular, you can just look at the gradients of the inputs to see how the output changes as you change each input.
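A toy sketch of that input-gradient idea, using a single logistic unit with made-up weights rather than a real trained network:

```python
import numpy as np

# Input-gradient "saliency" for a tiny logistic model: d(output)/d(x_i)
# tells you how the output moves as each input changes. The weights
# here are invented for illustration, not learned.
w = np.array([2.0, -1.0, 0.0])
x = np.array([0.5, 0.3, 0.9])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

y = sigmoid(w @ x)

# Analytic gradient of sigmoid(w . x) with respect to x:
# sigmoid'(z) * w, where sigmoid'(z) = y * (1 - y).
grad = y * (1 - y) * w

# Feature 2 has weight 0, so the output is completely insensitive
# to it, and its gradient is exactly zero.
print(grad)
```

In a deep network you would get the same vector by backpropagating to the inputs (e.g. autograd on the input tensor), but the interpretation is identical: the gradient shows *what* the model is sensitive to, not *why*.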

I think the problem people have is that this just tells you what the machine is doing, not why. Machine learning can never really offer understanding.

For example, there was a program created specifically for the purpose of training human understandable models. It worked by fitting the simplest possible mathematical exp... (read more)

Isn't that exactly what causality and do notation is for? Generate the "how" answer, and then do causal analysis to get the why.

Same as with the GAN thing: you condition it on producing a correct answer (or whatever the goal is). So if you are building a question-answering AI, you have it model a probability distribution something like P(human types this character | human correctly answers question). This could be done simply by only feeding it examples of correctly answered questions as its training set. Or you could have it predict what a human might respond if they had n days to think about it.

Though even that may not be necessary. What I had in mind was just having the AI r... (read more)

I see that working. But we still have the problem that if the number of answers is too large, somewhere there is going to be an answer X, such that the most likely behaviour for a human that answers X is to write something dangerous. Now, that's ok if the AI has two clearly defined processes: first find the top answer, independently of how it's written up, then write up as a human. If those goals are mixed, it will go awry.

Consider also that religions that convert more people tend to spread faster and farther than religions that don't. So over time religions should become more virulent.

There are some problems with this analysis. First of all, translation is natural language processing; what task requires more understanding of natural language than translation? Second, the BLEU score mentioned is only a cheap and imperfect measure of translation quality. The best measure is human evaluation, and neural machine translation excels at that. Look at this graph. On all languages, the neural network is closer to human performance than the previous method. And on several languages it's extremely close to human performance, and its translations ... (read more)

Selecting from a list of predetermined answers extremely limits the AI's ability, which isn't good if we want it to actually solve very complex problems for us! And that method by itself doesn't make the AI safe; it just makes it much harder for it to do anything at all.

Note someone found a way to simplify my original idea in the comments. Instead of using the somewhat complicated GAN thing, you can just have it try to predict the next letter a human would type. In theory these methods are exactly equivalent.

How do you trade that off against giving an actually useful answer?

It wasn't until relatively late in the second industrial revolution that coal completely replaced wood, and oil came very late. I think an industrial revolution could happen a second time without fossil fuel.

Good point. However, it would have petered out very quickly as the wood was all burned.

I like the scenario you presented. 6 months until intelligence explosion changes the entire approach to FAI. More risk is acceptable. More abstract approaches to FAI research seem less useful if they can't lead to tangible algorithms in 6 months.

I think the best strategy would be something like my idea to have AI mimic humans. Then you can task it to do FAI research for you. It could possibly produce years worth of FAI research papers in a fraction of the time. I don't think we should worry too much about the nuances of actually training an FAI directly.

Your mimic human ideas feels similar to various things I've been playing around with. Incidentally, I've radically simplified the original "mimic humans" idea (see the second Oracle design here ). Instead of imitating humans, the AI selects from a list of human-supplied answers. This avoids any need for GANs or similar assessment methods ^_^ "Could a human have given this answer? Well, yes, because a human did."

I believe this is the idea of "motivational scaffolding" described in Superintelligence. Make an AI that just learns a model of the world, including what words mean. Then you can describe its utility function in terms of that model - without having to define exactly what the words and concepts mean.

This is much easier said than done. It's "easy" to train an AI to learn a model of the world, but how exactly do you use that model to make a utility function?

No, the choice is to vote for your preferred candidate, or to not vote. Write ins count as "not voting".

From whose point of view? From my own personal perspective there might well be a noticeable difference in utility between writing in Cthulhu and just avoiding the voting station. In any case, since there are at least three alternatives, one of them does not necessarily have to have >50% confidence.
The fact that you count it as not voting does not mean it is in fact not voting, and it especially does not mean that the person is choosing not to vote (they are not choosing that unless they think they are not voting.)
Looking at reality (as opposed to theoretical abstractions), this does not seem to be true.

If you really believe your candidate is less than 50% likely to be the "correct" candidate, you can just vote for the other one. Then you will necessarily have a >50% confidence you voted for the correct candidate. You can't possibly do worse on a binary decision.

You could vote for the other one, but you might not want to, say e.g. that almost all your friends think that the person is the correct candidate. Also, when you think of the sentence, "my candidate is less than 50% likely to be the correct candidate," you are likely to dislike that assertion, and to start thinking of reasons for saying that they are more than 50% likely to be the correct candidate.

Well, the control problem is all about making AIs without "inimical motivations", so that covers the same thing IMO. And fast takeoff is not at all necessary for AI risk. AI is just as dangerous if it takes its time to grow to superintelligence. I guess it gives us somewhat more time to react, at best.

Only if you use language very loosely. If you don't, the Value Alignment problem is about making an AI without inimical motivations, and the Control Problem is about making an AI you can steer irrespective of its motivations. This is about Skynet scenarios specifically. If you have multipolar slow development of ASI, then you can fix the problems as you go along. Which is to say that in order to definitely have a Skynet scenario, you definitely do need things to develop at more than a certain rate. So speed of takeoff is an assumption, however dismissively you phrase it.

Are you referring to OP or me? I don't think my estimate of the difference between candidates is ridiculous. It's pretty clear the president can have a massive impact on the world. So large that, even when multiplied by a 1 in 10 million probability, it's still worth your time to vote.

Using dollar amounts might be a bit naive. Instead, look at utility directly, perhaps with some estimate like QALYs. I think something like health care reform alone has the potential to be worth tens of millions of QALYs. A major war or economic depression can easily cost similar a... (read more)

Yes, from your subjective view your vote is always positive. Thus you should always vote.

You are mistaken. My choices are not to vote for candidate A or to vote for candidate B. My choices are to vote for candidate A, for candidate B, to write in somebody (e.g. Cthulhu), and to not vote at all.

Third parties aren't stable. They can appear, but they inevitably split the vote. They always hurt their cause more than help it, unless they are so popular that they can outright replace one of the parties.

Huh? You mean for their cause it's better to just curl up and die, but refusing to do so subverts their cause..?

False. It requires only a few events, like smarter-than-human AI being invented, and the control problem not being solved. I don't think any of these things is very unlikely.

Not solving the control problem isn't a sufficient condition for AI danger: the AI also needs inimical motivations. So that is a third premise. Also fast takeoff of a singleton AI is being assumed. ETA: The last two assumptions are so frequently made in AI risk circles that they lack salience -- people seem to have ceased to regard them as assumptions at all.

I can still think the CEV machine is better than whatever the alternative is (for instance, no AI at all.) But yes, in theory, you should prefer to make AIs that have your own values and not bother with CEV.

Having a body is irrelevant. Bodies are just one way to manipulate the world to optimize your goals.

"We convert the resources of the world into the things we want." To some extent, but not infinitely, in a fanatical way. Again, that is the whole worry about AI -- that it might do that fanatically. We don't.

What do you mean by "fanatic... (read more)

"Having a body is irrelevant. Bodies are just one way to manipulate the world to optimize your goals." This is not true. Bodies are physical objects that follow the laws of physics, and the laws of physics are not "just one way to manipulate the world to optimize your goals," because the laws have nothing to do with your goals. For example, we often don't keep doing something because we are tired, not because we have a goal of not continuing. AIs will be quite capable of doing the same thing, as for example if thinking too hard about something begins to weaken its circuits. What I mean by fanatically is trying to optimize for a single goal as though it were the only thing that mattered. We do not do that, nor does anything else with a body, nor is it even possible, for the above reason. Yes you should be concerned about what I said about slaves and aliens, as it suggests that the CEV machine might result in things that you consider utterly wicked. I said that from the beginning, when you claimed that it would eliminate all negative results, obviously intending that to mean from your subjective point of view.

In a first past the post election third parties are irrelevant.

More specifically, the calculations above apply to a close election. 538 gives Johnson a less than 0.01% chance of winning. Obviously the probability of you being the tie breaking vote is many many orders of magnitude smaller than is worth calculating.

*Looks at the UK* Are they, now?

That's impossible. You can't have less than 50% confidence on a binary decision. You can't literally believe that a coin flip is more likely to select the correct option than you.

What do you mean you can't have less than 50% confidence in a decision? The whole idea of expected value is that you can be less than 50% sure that something will have positive consequences, and do it anyway. In this very post the idea is that your vote is almost certainly worthless, but there is a very small chance of a very large effect, and therefore you should vote anyway. But you are much less than 50% sure it will have any positive effect at all. So likewise you can be much less than 50% sure your candidate is the right one.
Huh? Of course you can. People consistently make worse-than-random choices all the time. Especially in an information-abusive environment such as casinos, advertising, or politics. In fact, this entire post and concept is based on the idea that without you, the voting populace would make the wrong binary decision.
You are confusing "confidence" and "the probability you are voting for the correct candidate". These are quite different things. From your subjective view the expected value of a vote is always positive. That does not mean that it's actually positive -- see Cromwell.
Well, suppose I think the probability that Johnson would be the best president is 40%, the probability that Clinton would be is 30%, that Stein would be is 20%, and that Trump would be is 10%...

...add a primary supergoal which imposes a restriction on the degree to which "instrumental goals" are allowed to supercede the power of other goals. At a stroke, every problem he describes in the paper disappears, with the single addition of a goal that governs the use of instrumental goals -- the system cannot say "If I want to achieve goal X I could do that more efficiently if I boosted my power, so therefore I should boost my power to cosmic levels first, and then get back to goal X."

This is not so simple. "Power" and &... (read more)

You say: I really do not like being told that I do not know what reinforcement learning is, by someone who goes on to demonstrate that they haven't a clue and can't be bothered to actually read the essay carefully. Bye.

I don't see anything in that link that is relevant to this post.

No, that's not how this works. We are calculating the expected value of a vote. The expected value is more than just "best case - worst case"; you factor in all possibilities weighted by their probability.

As long as the probability you are voting for the correct candidate is higher than 50%, the expected value of a vote is positive. And obviously it's impossible to get odds worse than that, for a binary decision. You can then multiply the probability you are voting for the correct candidate by the expected value of a correct vote, and it's likely ... (read more)
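The expected-value arithmetic from this thread, with explicitly made-up numbers (the 1-in-10-million decisiveness figure is cited elsewhere in the thread; the value gap is hypothetical):

```python
# Back-of-envelope expected value of a vote, with made-up numbers.
p_decisive = 1e-7   # chance your vote is the tie-breaker (~1 in 10M)
value_gap = 1e12    # hypothetical utility gap between candidates, in $
p_correct = 0.6     # your confidence you back the better candidate

# Being wrong costs as much as being right gains, so weight both
# outcomes: EV = p_decisive * gap * (p_correct - (1 - p_correct)).
# This is positive exactly when p_correct > 0.5.
ev = p_decisive * value_gap * (p_correct - (1 - p_correct))
print(f"${ev:,.0f}")
```

The `(p_correct - (1 - p_correct))` factor is the point of the >50% argument above: any confidence above a coin flip makes the expected value positive, and the huge value gap times the tiny decisiveness probability is what leaves it non-trivial.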

I think the naivety and imperfections make it useless as a demonstration of this. Taking such ridiculous estimates of the difference between candidates and your level of knowledge about those differences (and especially the difference between you and the median voter you're hoping to override) makes me doubt the seriousness of the calculation, and makes it impossible to make decisions based on. Heck, the range of possibilities between "$hundreds of thousands to charity (not mentioned: and many millions to non-charity cronies of the winner)" and "worth your time to vote", even if we discount the "negative value if you're wrong" option (which is real, but probably only reduces EV for this group rather than making it net negative) is enough to show that people making monetary EV arguments are on the propaganda side rather than the truth side of the calculation.
"As long as the probability you are voting for the correct candidate is higher than 50%," The probability may be lower than 50%, so my statement stands as it is.

The article gives an upper limit of the expected value of a vote. Even if the lower limit is 2 orders of magnitude lower, it still is quite significant. And makes it worth the time to vote, or to try to convince other people to vote your way.

Scott Alexander estimates the value of a vote more conservatively to be about $300 to $5,000 (the higher number for if you live in a swing state.) He backs this up by pointing to specific policies and actions taken by presidents that cost trillions of dollars.

I don't think it's hard to believe that the president can be... (read more)

"Even if the lower limit is 2 orders of magnitude lower," The lower limit is the negative of the upper limit, since your political opinions may be mistaken.
Yeah. If you assign even a small margin by which a Trump win increases the odds of human extinction, suddenly you end up with a really high number below the bottom line.

I don't see why you do that division. The point of being the decisive vote is that if you didn't show up to vote, the election would have gone the other way (let's ignore ties for the moment). You can disregard other people entirely in this model. All that matters is the expected value of your action, which is enormous.

Well now I see we disagree at a much more fundamental level.

There is nothing inherently sinister about "optimization". Humans are optimizers in a sense, manipulating the world to be more like how we want it to be. We build sophisticated technology and industries that are many steps removed from our various end goals. We dam rivers, and build roads, and convert deserts into sprawling cities. We convert the resources of the world into the things we want. That's just what humans do, that's probably what most intelligent beings do.

The definition of F... (read more)

"Well now I see we disagree at a much more fundamental level." Yes. I've been saying that since the beginning of this conversation. If humans are optimizers, they must be optimizing for something. Now suppose someone comes to you and says, "do you agree to turn on this CEV machine?", when you respond, are you optimizing for the thing or not? If you say yes, and you are optimizing the original thing, then the CEV cannot (as far as you know) be compromising the thing you were optimizing for. If you say yes and are not optimizing for it, then you are not an optimizer. So you must agree with me on at least one point: either 1) you are not an optimizer, or 2) you should not agree with CEV if it compromises your personal values in any way. I maintain both of those, but you must maintain at least one of them. In earlier posts I have explained why it is not possible that you are really an optimizer (not during this particular discussion.) People here tend to neglect the fact that an intelligent thing has a body. So e.g. Eliezer believes that an AI is an algorithm, and nothing else. But in fact an AI has a body just as much as we do. And those bodies have various tendencies, and they do not collectively add up to optimizing for anything, except in an abstract sense in which everything is an optimizer, like a rock is an optimizer, and so on. "We convert the resources of the world into the things we want." To some extent, but not infinitely, in a fanatical way. Again, that is the whole worry about AI -- that it might do that fanatically. We don't. I understand you think that some creatures could have fundamental values that are perverse from your point of view. This is because you, like Eliezer, think that values are intrinsically arbitrary. I don't, and I have said so from the beginning. It might be true that slave owning values could be fundamental in some exterrestrial race, but if they were, slavery in that race would be very, very different from slavery in the human r

The CEV process might well be immoral for everyone concerned, since by definition it is compromising a person's fundamental values.

The world we live in is "immoral" in that it's not optimized towards anyone's values. Taking a single person's values would be "immoral" to everyone else. CEV, finding the best possible compromise of values, would be the least immoral option, on average. Optimize the world in a way that dissatisfies the least people the least amount.

That does not necessarily mean "living separately".

Right. I... (read more)

I think optimizing anything is always immoral, exactly because it means imposing things that you should not be imposing. It is also the behavior of a fanatic, not a normal human being; that is the whole reason for the belief that AIs would destroy the world, namely because of the belief that they would behave like fanatics instead of like intelligent beings. In the case of the slave owning race, I am quite sure that slavery is not consistent with their fundamental values, even if they are practicing it for a certain time. I don't admit that values are arbitrary, and consequently you cannot assume (at least without first proving me wrong about this) that any arbitrary value could be a fundamental value for something.

But humans share a lot of values (e.g. wanting to live and not be turned into a Dyson sphere). And a collection of individuals may still have a set of values (see e.g. coherent extrapolated volition.)

It means that when you look at an AI system, you can tell whether it's FAI or not.

Look at it how? Look at its source code? I argued that we can write source code that will result in FAI, and you could recognize that. Look at the weights of its "brain"? Probably not, any more than we can look at human brains and recognize what they do. Look at its actions? Definitely; FAI is an AI that doesn't destroy the world etc.

I don't see what voting systems have to do with CEV. The "E" part means you don't trust what the real, current humans

... (read more)
I think we have, um, irreconcilable differences and are just spinning wheels here. I'm happy to agree to disagree.

No, it's not necessarily a negative outcome. I think it could go both ways, which is why I said it was "my greatest fear/hope".

The premise this article starts with is wrong. The argument goes that AIs can't take over the world, because they can't predict things much better than humans can. Or, conversely, that they will be able to take over because they can predict much better than humans.

Well so what if they can predict the future better? That's certainly one possible advantage of AI, but it's far from the only one. My greatest fear/hope of AI is that it will be able to design technology much better than humans. Humans didn't evolve to be engineers or computer programmers. It's r... (read more)

But AI taking over isn't the negative outcome we are trying to avoid... we are trying to avoid takeover by AIs that are badly misaligned with our values. What's the problem with an AI that runs complex technology in accordance with our values, better than us?
The way I think of it, designing technology is a special case of prediction. E.g. to design a steam engine, you need to be able to predict how steam behaves in different conditions and whether, given some candidate design, the pressure from the steam will be transformed into useful work or not.
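The "design is a special case of prediction" point can be sketched as a search loop (a toy of my own construction, not any real engineering workflow): generate candidate designs, use a predictive model to score each one, and keep the best. The "physics" below is a made-up quadratic standing in for a real simulator:

```python
# Toy sketch of design-as-prediction: searching over candidate designs,
# scoring each with a (stand-in) predictive model. The predictor here is
# an invented function; a real designer would plug in an actual simulator.

def predicted_work(pressure):
    """Stand-in predictor: useful work peaks at some optimal pressure."""
    return -(pressure - 7.0) ** 2 + 50.0

def best_design(candidates, predict):
    """Design reduces to prediction: pick the candidate the model rates highest."""
    return max(candidates, key=predict)

if __name__ == "__main__":
    candidates = [p * 0.5 for p in range(0, 30)]  # pressures 0.0 .. 14.5
    print(best_design(candidates, predicted_work))  # prints 7.0
```

On this view, a better predictor directly yields a better designer: the search loop is trivial, and all the engineering skill lives in `predict`.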

People may have different values (although I think deep down we are very similar, all humans sharing the same kind of brain and not having that much diversity.) Regardless, CEV should find the best possible compromise between our different values. That's literally the whole point.

If there is a difference in our values, the AI will find the compromise that satisfies us the most (or dissatisfies us the least.) There is no alternative, besides not compromising at all and just taking the values of a single random person. From behind the veil of ignorance, the first i... (read more)

"Once we do so, we may very well just apply CEV to them and get the best compromise of our values again. Or we may keep our own values, but still allow them to live separately and do their own thing, because we value their existence."

The problem I have with what you are saying is that these are two different things. And if they are two different things in the case of the aliens, they are two different things in the case of the humans. The CEV process might well be immoral for everyone concerned, since by definition it is compromising a person's fundamental values. Eliezer agrees this is true in the case of the aliens, but he does not seem to notice that it would also be true in the case of the humans. In any case, I choose in advance to keep my own values, not to participate in changing my fundamental values. But I am also not going to impose those on anyone else. If you define CEV to mean "the best possible way to keep your values completely intact and still not impose them on anyone else," then I would agree with it, but only because we will be stipulating the desired conclusion.

That does not necessarily mean "living separately". Even now I live with people who, in every noticeable way, have values that are fundamentally different from mine. That does not mean that we have to live separately.

In regard to the last point, you are saying that you don't want to eliminate all potential aliens, but you want to eliminate ones with values that you really dislike. I think that is basically racist. There is some truth in it, however, insofar as in reality, for reasons I have been saying, beings that have fundamental desires for others to suffer and die are very unlikely indeed, and any such desires are likely to be radically qualified. To that degree you are somewhat right: desires like that are in fact evil. But because they are evil, they cannot exist.

No, I'm asking you to specify it. My point is that you can't build X if you can't even recognize X.

And I don't agree with that. I've presented some ideas on how an FAI could be built, and how CEV would work. None of them require "recognizing" FAI. What would it even mean to "recognize" FAI, except to see that it values the kinds of things we value and makes the world better for us?

Learning what humans want is pretty easy. However it's an inconsistent mess which involves many things contemporary people find unsavory. Making it all c

... (read more)
It means that when you look at an AI system, you can tell whether it's FAI or not. If you can't tell, you may be able to build an AI system, but you still won't know whether it's FAI or not.

I don't see what voting systems have to do with CEV. The "E" part means you don't trust what the real, current humans say, so making them vote on anything is pointless.

That's a meaningless expression without a context. Notably, we don't have the same genes or the same brain structures. I don't know about you, but it is really obvious to me that humans are not identical.

How do you know what's false? You are a mere human, you might well be mistaken. How do you know what's fair? Is it an objective thing, something that exists in the territory?

Right, so the fat man gets thrown under the train... X-) Hey, I want to live on the inside. The outside is going to be pretty gloomy and cold :-/

LOL. You're just handwaving then. "And here, in the difficult part, insert magic and everything works great!"