Comments

Thank you!  The 1000-word max has proven to be unrealistic, so it's not too long.  You and g-w1 picked exactly the same passage.

Thank you!  I'm just making notes to myself here, really:

  • Harry teaches Draco about blood science and scientific hypothesis testing in Chapter 22.
  • Harry explains that muggles have been to the moon in Chapter 7.
  • Quirrell's first lecture is in Chapter 16, and it is epic!  Especially the part about why Harry is the most dangerous student.

I think the problem is that each study has to make many arbitrary decisions about aspects of the experimental protocol.  Each such decision is made the same way for every subject within a single study, but varies across studies.  There are so many such decisions that, if the meta-analysis were to include them as covariates, each study would introduce enough new variables to cancel out the statistical power gained by adding that study.
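A toy sketch of that accounting, with invented counts purely for illustration: if each study's protocol decisions enter the model as study-specific covariates, every added study contributes one effect size but several new parameters, so the pooled model never gains degrees of freedom.

```python
# Illustrative only: made-up counts showing why adding studies doesn't add
# power if every arbitrary protocol decision becomes its own covariate.

def residual_df(n_studies, decisions_per_study):
    data_points = n_studies                            # one pooled effect size per study
    parameters = 1 + n_studies * decisions_per_study   # grand mean + a covariate per decision
    return data_points - parameters

for n in (5, 10, 20, 40):
    print(n, residual_df(n, decisions_per_study=3))
# The residual degrees of freedom only become more negative as studies are
# added, so the extra studies buy no statistical power under this model.
```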

You have it backwards.  The difference between a Friendly AI and an unfriendly one is entirely one of restrictions placed on the Friendly AI.  So an unfriendly AI can do anything a Friendly AI could, but not vice-versa.

The Friendly AI could lose out because it would be restricted from committing atrocities, or at least atrocities which were strictly bad for humans, even in the long run.

Your comment that they can commit atrocities for the good of humanity without worrying about becoming corrupt is a reason to be fearful of "friendly" AIs.

By "just thinking about IRL", do you mean "just thinking about the robot using IRL to learn what humans want"?  'Coz that isn't alignment.

'But potentially a problem with more abstract cashings-out of the idea "learn human values and then want that"' is what I'm talking about, yes.  But it also seems to be what you're talking about in your last paragraph.

"Human wants cookie" is not a full-enough understanding of what the human really wants, and under what conditions, to take intelligent actions to help the human.  A robot learning that would act like a paper-clipper, but with cookies.  It isn't clear whether a robot which hasn't resolved the de dicto / de re / de se distinction in what the human wants will be able to do more good than harm in trying to satisfy human desires, nor what will happen if a robot learns that humans are using de se justifications.

Here's another way of looking at that "nor what will happen if" clause:  We've been casually tossing about the phrase "learn human values" for a long time, but that isn't what the people who say that want.  If AI learned human values, it would treat humans the way humans treat cattle.  But if the AI is to learn to desire to help humans satisfy their wants, it isn't clear that the AI can (A) internalize human values enough to understand and effectively optimize for them, while at the same time (B) keep those values compartmentalized from its own values, which make it enjoy helping humans with their problems.  To do that, the AI would need to want to propagate and support human values that it disagrees with.  It isn't clear that that's something a coherent, let's say "rational", agent can do.

How is that de re and de dicto?

You're looking at the logical form and imagining that that's a sufficient understanding to start pursuing the goal.  But it's only sufficient in toy worlds, where you have one goal at a time, and the mapping between the goal and the environment is so simple that the agent doesn't need to understand the value, or the target of "cookie", beyond "cookie" vs. "non-cookie".  In the real world, the agent has many goals, and the goals will involve nebulous concepts, and have many considerations and conditions attached, e.g. how healthy is this cookie, how tasty is it, how hungry am I.  It will need to know /why/ it, or human24, wants a cookie in order to intelligently know when to get the cookie, to resolve conflicts between goals, and to do probability calculations involving the degree to which different goals are correlated in the higher goals they satisfy.
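A toy contrast of those two levels of understanding (illustrative only; the predicates and world state are invented for the example): a policy that takes the bare logical form at face value versus one that represents some of why the cookie is wanted.

```python
# Illustrative only: a bare "human24 wants cookie" goal versus the same goal
# with the conditions mentioned above (hunger, health) made explicit.

def bare_goal_policy(world):
    # The logical form taken at face value: more cookies is always better.
    return "fetch another cookie"

def conditioned_goal_policy(world):
    # The same desire with some of the "why" behind it represented.
    if not world["human_is_hungry"]:
        return "do nothing"
    if world["cookie_is_unhealthy"] and world["human_values_health"]:
        return "offer a healthier snack"
    return "fetch a cookie"

world = {"human_is_hungry": False,
         "cookie_is_unhealthy": True,
         "human_values_health": True}

print(bare_goal_policy(world))         # fetches cookies regardless of context
print(conditioned_goal_policy(world))  # does nothing, because human24 isn't hungry
```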

There's a confounding confusion in this particular case: you seem to be hoping the robot will infer that the agent of the desired act is the human, both in the case of the human's desire and in the case of the AI's learned goal.  But for values in general, we often want the AI to act in the way that the human would act, not to want the human to do something.  Your posited AI would learn the goal that it wants human24 to get a cookie.

What it all boils down to is:  You have to resolve the de re / de dicto / de se interpretation in order to understand what the agent wants.  That means an AI also has to resolve that question in order to know what a human wants.  Your intuitions about toy examples like "human24 always wants a cookie, unconditionally, forever" will mislead you, in the ways toy-world examples misled symbolic AI researchers for 60 years.
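To make the distinction concrete, here is a toy sketch (purely illustrative; the names and readings are made up for the example, not anyone's proposed design) of how the same report, "human24 wants a cookie", licenses different actions depending on which reading the AI settles on.

```python
# Illustrative only: three readings of "human24 wants a cookie" and the
# different actions each one licenses.

from dataclasses import dataclass

@dataclass
class Desire:
    holder: str   # whose desire it is
    reading: str  # "de_re", "de_dicto", or "de_se"
    content: str

def plan(desire: Desire) -> str:
    if desire.reading == "de_re":
        # About a particular object: get *that* cookie, even if it turns out to be stale.
        return f"bring the specific {desire.content} {desire.holder} has in mind"
    if desire.reading == "de_dicto":
        # About whatever satisfies the description: any cookie will do.
        return f"bring {desire.holder} some {desire.content} or other"
    if desire.reading == "de_se":
        # A first-person desire: the holder wants to be the one who gets it,
        # so the AI satisfying it on the holder's behalf may miss the point.
        return f"help {desire.holder} get a {desire.content} themselves"
    raise ValueError(desire.reading)

for reading in ("de_re", "de_dicto", "de_se"):
    print(plan(Desire("human24", reading, "cookie")))
```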

So, "mesa" here means "tabletop", and is pronounced "MAY-suh"?

I think your insight is that progress counts--that counting counts.  It's overcoming the Boolean mindset, in which anything that's true some of the time must be true all of the time.  That you either "have" or "don't have" a problem.

I prefer to think of this as "100% and 0% are both unattainable", but stating it as the 99% rule might be more motivating to most people.

What do you mean by a goodhearting problem, & why is it a lossy compression problem?  Are you using "goodhearting" to refer to Goodhart's Law?
