Jade Bishop


Conclusion to the sequence on value learning

I have to admit I've seen this as a strong motive for creating AGI, both in myself and in others. Maybe it's because I just don't get along with other humans very well (or, more specifically, I fail to model them properly), or because I feel as if I would understand an AGI better than I understand them, but it just seems much more appealing to me than having an actual child, at least right now. Specifically, my goal is (assuming I understand correctly) to create non-goal-directed, bounded artificial intelligence agents, so... it's pretty similar, at least. It's certainly a strong enough motive for some people.

What are the advantages and disadvantages of knowing your own IQ?

IQ doesn't actually mean much. Stephen Jay Gould's The Mismeasure of Man is a good place to start. Measured IQ varies wildly based on socioeconomic status, education, age, amount of sleep, whether or not you've had any stimulants, depressants, or other mind-altering substances, how much you've had to eat, whether you've recently exerted yourself mentally or physically, and so on.

Tracking IQ within an individual could be useful as part of a battery of other tests to predict cognitive degeneration, but a decrease in IQ could simply mean you're having a bad day.

Additionally, intelligence is genetically variable, not genetically determined. It has much to do with environmental, social, and cultural factors, enough to make the genetic component not very important. That means it's always possible to increase your IQ (through education, practice, etc.), just as it's possible to decrease it (through drugs, disuse, dogmatization and false beliefs, and so on).

Can coherent extrapolated volition be estimated with Inverse Reinforcement Learning?

Thank you for your feedback! I haven't read this yet, but it comes pretty close to a discussion I had with a friend over this post.

Essentially, her argument started with a simple counterexample: she bought peanut M&Ms when she didn't want to, and didn't realise she was doing it until afterwards. In similar past situations, hungry and in the same place, she had desired peanut M&Ms to satisfy her hunger, but this time she didn't want them. She knew she didn't want peanut M&Ms, and didn't consciously decide to buy them against that want; in this sense, I think a parallel can be drawn with akrasia, where rationality alone isn't enough.

Her point was this: There has to be a line drawn between "intentional conscious action" and "the result of a complex system of interacting parts that puppets the meat sack that holds our brain, sometimes in ways we don't intend." On a base level, this could result in, say, an AI that acts like a normal human but sometimes buys peanut M&Ms against their volition. On an agent-based level where an AI is no more or less capable than a human, this isn't much of an issue, and such things could make individual AI agents more convincing.

But if you want to make a superintelligent AI to run your ideal utopia, you don't want it to decide to feed everyone peanut M&Ms against their will on a whim.

The biggest issue is that we can't determine the difference between "intentional action" and "unintentional response". If we could, it would (according to her) be trivial to find out what the CEV of humanity is, no estimation needed.

My largest assumption was that the lowest common denominator of human behaviour is "principled reasoning in pursuit of fixed, though unstated, goals". More realistically, as another friend (and the post you linked) pointed out, the lowest common denominator of human behaviour is going to be "reproduce", which has very unfortunate implications for the Friendliness of this hypothetical agent.

A number of things could be done to ameliorate this, such as not including any means to reproduce, or any data supporting reproduction, in the trajectories, but they all seem inadequate or ad hoc. I don't want to staple together a bunch of things I barely understand and declare it the Solution To AI (not that I was attempting to do that anyway), especially when the issue isn't necessarily with the technology and theory. As the peanut-M&M-purchasing friend put it, the technology is sufficient, but this post overestimates humans. This wasn't where I expected to run into a problem, and it shifts the task from "improve technology and theories" to... what, "improve humans"? I'm at a loss as to where to go from here; inverse reinforcement learning has a demonstrable use case and benefits, but the data is... not good. Garbage in gives garbage out. Is it really possible to improve human behaviour (or our analysis/collection of human behaviour) enough to achieve better results?
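To make the garbage-in-garbage-out worry concrete, here is a deliberately naive sketch (my own toy construction, not a real IRL algorithm or anything from the post) that assumes a softmax-rational demonstrator, so empirical action frequencies are read as evidence about relative reward. The action names and trajectories are made up for illustration; the point is that the learner has no way to mark the unintended M&M purchase as unintended, so it infers a positive preference for it anyway.

```python
from collections import Counter
import math

def infer_rewards(trajectories):
    """Toy reward inference: assume the demonstrator picks actions
    softmax-rationally, so relative rewards are recoverable (up to an
    additive constant) as the log of empirical action frequencies.
    Every demonstrated action, intended or not, is treated as endorsed."""
    counts = Counter(action for traj in trajectories for action in traj)
    total = sum(counts.values())
    return {action: math.log(n / total) for action, n in counts.items()}

# Hypothetical demonstrations: mostly intentional behaviour, plus one
# unintended M&M purchase the demonstrator did not actually want.
demos = [
    ["eat_lunch", "work", "work"],
    ["eat_lunch", "work", "buy_peanut_mms"],  # unintended, but indistinguishable
    ["eat_lunch", "work", "work"],
]

rewards = infer_rewards(demos)
# The learner assigns buy_peanut_mms a finite (i.e. non-rejected) reward,
# lower than work's but still read as a genuine preference.
print(rewards["work"] > rewards["buy_peanut_mms"])  # True
```

Filtering such actions out would require exactly the intentional/unintentional distinction the comment says we can't make, which is why the fixes above feel ad hoc.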

Conspiracy World is missing.

You can find it backed up on the Wayback Machine: https://web.archive.org/web/20180105104931/https://www.lesswrong.com/tag/conspiracy_world Additionally, links to the same content are available on the LessWrong wiki at https://wiki.lesswrong.com/wiki/Beisutsukai.

Edit: To address your first question, tags were likely lost in the shift to LessWrong 2.0, but I haven't been around long enough to confirm that.