Ash Gray — LessWrong

A Quick List of Some Problems in AI Alignment As A Field

Excellent post. I have nothing really to add, only that you're not alone in this:

>Here's a (failure?) mode that I and others are already in, but might be too embarrassed to write about: taking weird career/financial risks, in order to obtain the financial security, to work on alignment full-time ^[2]. Anyone more risk-averse (good for alignment!) might just... work a normal job for years to save up, or modestly conclude they're not good enough to work in alignment altogether. If security mindset can be taught at all, this is a shit equilibrium.
Yes, I know EA

Ash Gray4y32

I think your overall point -- More Dakka, make AGI less weird -- is right. In my experience, though, I disagree with your disagreement:

>I disagree with "the case for the risks hasn't been that clearly laid out". I think there's a giant, almost overwhelming pile of intro resources at this point, any one of which is more than sufficient, written in all manner of style, for all manner of audience.^[1]
>(I do think it's possible to create a much better intro resource than any that exist today, but 'we can do much better' is compatible with 'it's shocking that the

...

Humans are very reliable agents

Ash Gray4y20

OK, thanks for linking that. You're probably right in the specific example of MNIST. I'm less convinced about more complicated tasks - it seems like each individual task would require a lot of engineering effort.

One thing I didn't see - is there research which looks at what happens if you give neural nets more of the input space as data? Things which are explicitly out-of-distribution, random noise, abstract shapes, or maybe other modes that you don't particularly care about performance on, and label it all as "garbage" or whatever. Essentially, providing negative as well as positive examples, given that the input spaces are usually much larger than the intended distribution.
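To make the question concrete, here is a minimal sketch of the data-construction step only, assuming a 10-class digit task with flattened 28x28 inputs. The `GARBAGE` label, the helper names, and the choice of uniform noise as the negative examples are all hypothetical, picked for illustration; whether training on such data actually improves worst-case behaviour is exactly the open question above.

```python
import random

# Hypothetical extra label, appended after the digit classes 0-9.
GARBAGE = 10


def random_noise_image(rng, size=28 * 28):
    """Return one flattened image of uniform noise in [0, 1)."""
    return [rng.random() for _ in range(size)]


def add_garbage_examples(images, labels, n_garbage, seed=0):
    """Return copies of (images, labels) with n_garbage noise images mixed in.

    Every added image is labelled GARBAGE, so a classifier trained on the
    result sees explicit negative examples alongside the real digits.
    """
    rng = random.Random(seed)
    paired = list(zip(images, labels))
    for _ in range(n_garbage):
        paired.append((random_noise_image(rng), GARBAGE))
    rng.shuffle(paired)  # avoid having all negatives at the end
    return [x for x, _ in paired], [y for _, y in paired]
```

Uniform noise is the crudest possible negative class; abstract shapes or images from unrelated datasets would slot into the same place, and would probably be a harder and more useful test.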

Humans are very reliable agents

Ash Gray4y90

>I imagine if our goal was "never misclassify an MNIST digit" we could get to 6-7 nines of "worst-case accuracy" even out of existing neural nets, at the cost of saying "I don't know" for the confusing 0.2% of digits.

Er, how? I haven't seen anyone describe a way to do this. Getting a neural network to meaningfully say "I don't know" is very much cutting-edge research as far as I'm aware.
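For reference, the naive baseline looks something like the sketch below: threshold the top softmax probability and abstain underneath it. The function names and the 0.99 threshold are assumptions for illustration, not anything proposed in the thread. Softmax confidence is known to be poorly calibrated, so this gives no worst-case guarantee, which is the point of the objection.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_or_abstain(logits, threshold=0.99):
    """Return the argmax class, or None ("I don't know") below the threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= threshold else None


def coverage_and_accuracy(batch_logits, labels, threshold=0.99):
    """Fraction of inputs answered, and accuracy on the answered ones only."""
    preds = [predict_or_abstain(l, threshold) for l in batch_logits]
    answered = [(p, y) for p, y in zip(preds, labels) if p is not None]
    coverage = len(answered) / len(labels)
    if not answered:
        return coverage, float("nan")
    accuracy = sum(p == y for p, y in answered) / len(answered)
    return coverage, accuracy
```

Raising the threshold trades coverage for accuracy, but a confidently wrong input (say, logits `[0, 12, 0]` with true label 2) still passes at any threshold below 1, so the "nines" depend entirely on the model never being confidently wrong.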

Deep Learning Systems Are Not Less Interpretable Than Logic/Probability/Etc

Ash Gray4y100

I think you and John are talking about two different facets of interpretability.

The first one is the question of "white-boxing:" how do the model's internal components interrelate to produce its output? On this dimension, the kind of models that you've given as examples are much more interpretable than neural networks.

What I think John is talking about, I understand as "grounding." (Cf. Symbol grounding problem) Although the decision tree (a) above is clear in that one can easily follow how the final decision comes about, the question remains -- who or wha...

Specializing in Problems We Don't Understand

Ash Gray5y60

This is the focus of General Systems, as outlined by Weinberg. That book is very good, by the way - I highly recommend reading it. It's both very dense and very accessible.

It's always puzzled me that the rationalist community hasn't put more emphasis on general systems. It seems like it should fit in perfectly, but I haven't seen anyone mention it explicitly. General Semantics mentioned in the recent historical post is somewhat related, but not the same thing.

More on topic: One thing you don't mention is that there are fairly general problem solving techni...