Stable Pointers to Value III: Recursive Quantilization

I share both of these intuitions.

That being said, I'm not convinced that the space of concepts is smaller as you get more meta. (Naively speaking, there are ~exponentially more distributions over distributions than distributions, though some strong simplicity biases can cut this down a lot.) I suspect that one reason it seems that the space of concepts is "smaller" is because we're worse at differentiating concepts at higher levels of meta-ness. For example, it seems that it's often easier to figure out what the consequences of concrete action X are than the consequences of adopting a particular ethical system, and a lot of philosophy on metaethics seems more confused than philosophy on ethics. I think this is related to the "it's more difficult to get feedback" intuition, where we have fewer distinct buckets because it's too hard to distinguish between similar theories at sufficiently high meta-levels.
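One crude way to see the counting claim (my own toy sketch, not part of the original comment): discretize probabilities to multiples of 1/k, so a distribution over n outcomes becomes a weak composition of k into n nonnegative parts. Counting those at each meta-level shows the blow-up:

```python
from math import comb

def num_discretized_distributions(n_outcomes, k):
    # Distributions over n outcomes with probabilities restricted to
    # multiples of 1/k: weak compositions of k into n nonnegative parts,
    # i.e. C(k + n - 1, n - 1).
    return comb(k + n_outcomes - 1, n_outcomes - 1)

n, k = 3, 10
level1 = num_discretized_distributions(n, k)       # distributions: 66
level2 = num_discretized_distributions(level1, k)  # distributions over distributions: ~8e11
print(level1, level2)
```

Even at this coarse discretization, one meta-level takes you from 66 objects to roughly 8×10^11, which is the "exponentially more" intuition (and also shows why a strong simplicity bias is needed to cut the space down).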

Non-Adversarial Goodhart and AI Risks

I'm pretty sure that "hard problem of correctly identifying causality" is a major goal of MIRI's decision theory.

In what sense is discovering causality NP-hard? There's the trivial sense in which you can embed a NP-hard problem (or tasks of higher complexity) into the real world, and there's the sense in which inference in Bayesian networks can embed NP-hard problems.
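For concreteness, here's a toy sketch (my illustration, not from the comment) of the standard embedding for the second sense: make each SAT variable a fair-coin root node, each clause a deterministic OR of its literals, and query the AND of all clauses; then P(query) > 0 iff the formula is satisfiable, so exact inference is at least as hard as SAT. Brute-force enumeration stands in for inference here:

```python
from itertools import product

# Toy CNF over variables 1..3; negative integers denote negated literals.
clauses = [(1, -2), (-1, 2), (2, 3)]

def satisfying_mass(clauses, n_vars):
    # Probability that all clause nodes fire when each root variable is an
    # independent fair coin; positive iff the CNF is satisfiable.
    total = 0
    for assignment in product([False, True], repeat=n_vars):
        def lit(l):
            v = assignment[abs(l) - 1]
            return v if l > 0 else not v
        if all(any(lit(l) for l in c) for c in clauses):
            total += 1
    return total / 2 ** n_vars

print(satisfying_mass(clauses, 3) > 0)  # True: the formula is satisfiable
```

The enumeration is exponential, which is the point: answering "is P(query) > 0?" in such a network decides SAT.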

Can you elaborate on why AIXI/Solomonoff induction is an unsafe utility maximizer, even for Cartesian agents?

I skimmed some of Crick and read some commentary on him, and Crick seems to take the Hobbesian "politics as a necessary compromise" viewpoint. (I wasn't convinced by his definition of the word politics, which seemed not to point at what I would point at as politics.)

My best guess: I think they're arguing not that immature discourse is okay, but that we need to be more polite toward people's views in general for political reasons, as long as the people are acting somewhat in good faith (I suspect they think that you're not being sufficiently polite toward those you're trying to throw out of the Overton window). As a result, we need to engage less in harsh criticism when it might be seen as threatening.

That being said, I also suspect that Duncan would agree that we need to be charitable. I suspect the actual disagreement is whether the behavior of the critics Duncan is replying to is actually the sort of behavior we want/need to accept in our community.

(Personally, I think we need to be more willing to do real-life experiments, even if they risk going somewhat wrong. And I think some of the tumblr criticism definitely fell outside what I would want in the Overton window. So I'm okay with Duncan's parenthetical, though it would have been nicer if it had been more explicit about who it was responding to.)

I also think I wouldn't have understood his comments without knowing MTG, or at least having read Duncan's explanation of the MTG color wheel.

(Nitpicking) Though I'd add that MTG doesn't have a literal Blue Knight card either, so I doubt it's that reference. (There are knights that are blue and green, but none with the exact names "Blue Knight" or "Green Knight".)

Thanks for posting this. I found the framing of the different characters very insightful.

Raising funds to establish a new AI Safety charity

After looking into the prototype course, I updated upwards on this project, as I think it is a decent introduction to Dylan's Off-Switch Game paper. Could I ask what other material RAISE wants to cover in the course? What other work on corrigibility are you planning to cover? (For example, Dylan's other work, MIRI's work on this subject, and Smitha Milli's paper?)

Could you also write more about who your course is targeting? Why does RAISE believe that the best way to fix the talent gap in AI safety is to help EAs change careers via introductory AI Safety material, instead of, say, making it easier for CS PhD students to do research on AI Safety-relevant topics? Why do we need to build a campus, instead of co-opting the existing education mechanisms of academia?

Finally, could you link some of the mind maps and summaries RAISE has created?

The Utility of Human Atoms for the Paperclip Maximizer

Thanks! I think it makes sense to link it at the start, so new readers can get context for what you're trying to do.

Sources of intuitions and data on AGI

Yeah, I think Ben captures my objection: IDA captures what is different between your approach and MIRI's agenda, but not what is different between some existing AI systems and your approach.

This might not be a bad thing: perhaps you want a name evocative of existing approaches, to stress that your approach is the natural next step for AI development, for example.

The Utility of Human Atoms for the Paperclip Maximizer

Could I ask what the motivation behind this post was?
