If you disagree with something I write about AI, I want to hear! I often find myself posting mainly because I want the best available counterarguments.

John_Maxwell's Comments

Daily Low-Dose Aspirin, Round 2

A large recent trial appears to show that low-dose aspirin isn't helpful, and may be harmful, for healthy older people.

AI Alignment 2018-19 Review

Typically, the problem with supervised learning is that it's too expensive to label everything we care about.

I don't think we'll create AGI without first acquiring capabilities that make supervised learning much more sample-efficient (e.g. improved unsupervised methods would let us make better use of unlabeled data, so humans would no longer need to label everything they care about and could instead label just enough data to pinpoint "human values" as something that's observable in the world--or to characterize it as a cousin of some things that are observable in the world).

But if you think there are paths to AGI which don't go through more sample-efficient supervised learning, one course of action would be to promote differential technological development towards more sample-efficient supervised learning and away from deep reinforcement learning. For example, we could try to convince DeepMind and OpenAI to reallocate resources away from deep RL and towards sample efficiency. (Note: I just stumbled on this recent paper which is probably worth a careful read before considering advocacy of this type.)

In this case, are you imagining that we label some types of behaviors as good and some as bad, perhaps like what we would do with an approval directed agent?
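To make the labeling idea concrete, here is a minimal sketch of one way "label some behaviors as good and some as bad" could work as plain supervised learning: a human labels a handful of behaviors as approved (1) or disapproved (0), a simple logistic model is fit to those labels, and the agent picks whichever candidate action the model predicts the human would most approve of. The features, labels, and function names are all hypothetical illustrations, not anything proposed in the thread.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_approval_model(behaviors, labels, lr=0.5, epochs=2000):
    """Fit logistic-regression weights to human approval labels via SGD."""
    n = len(behaviors[0])
    w = [0.0] * n
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(behaviors, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = y - p
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

def predicted_approval(w, b, x):
    """Model's estimate of the probability a human would approve behavior x."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Toy behavior features: (helps_human, damages_property), with human labels.
labeled = [((1, 0), 1), ((1, 1), 0), ((0, 0), 1), ((0, 1), 0)]
behaviors = [x for x, _ in labeled]
labels = [y for _, y in labeled]
w, b = train_approval_model(behaviors, labels)

# The "agent" chooses the candidate action with the highest predicted approval.
candidates = [(1, 0), (0, 1)]
best = max(candidates, key=lambda x: predicted_approval(w, b, x))
print(best)  # (1, 0): the helpful, non-destructive action scores highest
```

The labeling cost concern from the earlier comment shows up directly here: the model's notion of approval is only as good as the coverage of the labeled behaviors, which is why sample efficiency matters.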

This seems like a promising option.

AI Alignment 2018-19 Review

Value learning. Building an AI that learns all of human value has historically been thought to be very hard, because it requires you to decompose human behavior into the “beliefs and planning” part and the “values” part, and there’s no clear way to do this.

My understanding is that IRL requires this, but it's not obvious to me that supervised learning does? (It's surprising to me how little attention supervised learning has received in AI alignment circles, given that it's by far the most common way for us to teach current ML systems about our values.)

Anyway, regarding IRL: I can see how it would be harmful to make the mistake of attributing stuff to the planner which actually belongs in the values part.

  • For example, perhaps our AI observes a mother caring for her disabled child, and believes that the mother's goal is to increase her inclusive fitness in an evolutionary sense, but that the mother is irrational and is following a suboptimal strategy for doing this. So the AI executes a "better" strategy for increasing inclusive fitness which allocates resources away from the child.

However, I haven't seen a clear story for why the opposite mistake, of attributing stuff to the values part which actually belongs to the planner, would cause a catastrophe. It seems to me that in the limit, attributing all of human behavior to human values could end up looking something like an upload--that is, it still makes the stupid mistakes that humans make, and it might not be competitive with other approaches, but it doesn't seem to be unaligned in the sense that we normally use the term. You could make a speed superintelligence which basically values behaving as much like the humans it has observed as possible. But if this scenario is multipolar, each actor could be incentivized to spin the values/planner dial of its AI towards attributing more of human behavior to the human planner, in order to get an agent which behaves a little more rationally in exchange for a possibly lower-fidelity replication of human values.

Can we really prevent all warming for less than 10B$ with the mostly side-effect free geoengineering technique of Marine Cloud Brightening?

With regard to funding, I wonder if you could make it into a for-profit by finding municipal regions which experience periodic flooding and getting the city to pay you to pump the floodwaters up into the air as mist?

I believe global warming is supposed to increase flooding, so flood mitigation will probably be a booming industry soon.

By the way, my dad told me someone at his research lab (Xerox PARC) is doing research on Marine Cloud Brightening. If you send me a personal message via my user page, maybe I can get him to introduce you.

Realism about rationality

In my experience, if there are several concepts that seem similar, understanding how they relate to one another usually helps with clarity rather than hurting.

Realism about rationality

It seems to me like my position, and the MIRI-cluster position, is (1) closer to "rationality is like fitness" than "rationality is like momentum"

Eliezer is a fan of law thinking, right? Doesn't the law thinker position imply that intelligence can be characterized in a "lawful" way like momentum?

Whereas the non-MIRI cluster is saying "biologists don't need to know about evolution."

As a non-MIRI cluster person, I think deconfusion is valuable (insofar as we're confused), but I'm skeptical of MIRI because they seem more confused than average to me.

Self-Supervised Learning and AGI Safety

The term "self-supervised learning" (replacing the previous and more general term "unsupervised learning")

BTW, the way I've been thinking about it, "self-supervised learning" represents a particular way to achieve "unsupervised learning"--not sure which usage is standard.

2020's Prediction Thread

I guess PayPal, Amazon Pay, etc. could also qualify--they allow me to make purchases without giving a merchant access to my credit card number.

2020's Prediction Thread

As of 1/1/30, customers will not make purchases by giving each merchant full access to a non-transaction-specific numeric string (i.e. credit cards as they are today): 70%

This seems like the kind of bold prediction which failed last time around. Maybe you can make it more specific and say what fraction of online transactions will be processed using something which looks unlike the current credit card setup?

Tabooing 'Agent' for Prosaic Alignment

I think the world where H is true is a good world, because it's a world where we are much closer to understanding and predicting how sophisticated models generalize.

This seemed like a really surprising sentence to me. If the model is an agent, doesn't that pull in all the classic concerns related to treacherous turns and so on? Whereas a non-agent probably won't have an incentive to deceive you?

Even if the model is an agent, you still need to be able to understand its goals from their internal representation--which could mean, for example, understanding what a deep neural network is doing. That doesn't appear to be much easier than the original task of "understand what a model, for example a deep neural network, is doing".
