

Stable Pointers to Value III: Recursive Quantilization

I share both of these intuitions.

That being said, I'm not convinced that the space of concepts gets smaller as you get more meta. (Naively speaking, there are ~exponentially more distributions over distributions than distributions, though some strong simplicity biases can cut this down a lot.) I suspect that one reason the space of concepts seems "smaller" is that we're worse at differentiating concepts at higher levels of meta-ness. For example, it often seems easier to figure out the consequences of a concrete action X than the consequences of adopting a particular ethical system, and a lot of philosophy on metaethics seems more confused than philosophy on ethics. I think this is related to the "it's more difficult to get feedback" intuition: we have fewer distinct buckets because it's too hard to distinguish between similar theories at sufficiently high meta-levels.
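A toy count can make the "exponentially more" parenthetical concrete. Assume, purely for illustration, that the only distributions at each level are uniform distributions over nonempty subsets of the level below (a hypothetical simplification, not a claim about the real space of concepts). Then n outcomes give 2^n - 1 distributions, and one meta-level up gives 2^(2^n - 1) - 1.

```python
def count_uniform_over_subsets(n_items: int) -> int:
    """Toy family (assumption): uniform distributions over nonempty subsets of n_items things."""
    return 2 ** n_items - 1


def count_one_level_up(n_outcomes: int) -> int:
    """Same toy family applied to the base-level distributions themselves."""
    base = count_uniform_over_subsets(n_outcomes)
    return count_uniform_over_subsets(base)


for n in range(1, 5):
    print(n, count_uniform_over_subsets(n), count_one_level_up(n))
```

With n = 3 outcomes this gives 7 base-level distributions but 127 one level up; with n = 4 it gives 15 versus 32767. A strong simplicity prior would put almost no weight on most of the meta-level entries, which is the "cut this down a lot" caveat.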

Non-Adversarial Goodhart and AI Risks

I'm pretty sure that addressing the "hard problem of correctly identifying causality" is a major goal of MIRI's decision theory work.

In what sense is discovering causality NP-hard? There's the trivial sense in which you can embed an NP-hard problem (or tasks of higher complexity) into the real world, and there's the sense in which inference in Bayesian networks can embed NP-hard problems.
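For the second sense, here is a minimal sketch of the usual style of reduction from SAT to exact Bayesian-network inference. The network shape and function names are hypothetical illustrations: each variable is a root node that is true with probability 1/2, each clause is a deterministic OR node, and the formula is a deterministic AND node over the clauses. P(formula node = 1) then equals (number of satisfying assignments) / 2^n, so it is nonzero exactly when the formula is satisfiable. The brute-force enumeration below is exponential and only shows the encoding, not an efficient algorithm.

```python
from fractions import Fraction
from itertools import product

# CNF clause as a list of nonzero ints: k means "variable k is true",
# -k means "variable k is false" (DIMACS-style literals).
Clause = list[int]


def prob_formula_node_true(num_vars: int, clauses: list[Clause]) -> Fraction:
    """Exact P(formula node = 1) in the toy network, by enumerating all root states."""
    weight = Fraction(1, 2 ** num_vars)  # every joint root assignment is equally likely
    total = Fraction(0)
    for assignment in product([False, True], repeat=num_vars):
        # Deterministic OR node per clause: true if any literal is satisfied.
        clause_values = [
            any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)
            for clause in clauses
        ]
        # Deterministic AND node over all clause nodes.
        if all(clause_values):
            total += weight
    return total


def is_satisfiable(num_vars: int, clauses: list[Clause]) -> bool:
    """Deciding P(formula node = 1) > 0 decides SAT."""
    return prob_formula_node_true(num_vars, clauses) > 0


print(prob_formula_node_true(2, [[1, 2], [-1, 2]]))  # (x1 or x2) and (not x1 or x2)
print(is_satisfiable(1, [[1], [-1]]))                # x1 and not x1
```

So any algorithm that computed such marginals exactly and efficiently would also solve SAT (and, since the marginal counts solutions, #SAT).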

Can you elaborate on why AIXI/Solomonoff induction is an unsafe utility maximizer, even for Cartesian agents?

Raising funds to establish a new AI Safety charity

After looking into the prototype course, I updated upwards on this project, as I think it is a decent introduction to Dylan's Off-Switch Game paper. Could I ask what other material RAISE wants to cover in the course? What other work on corrigibility are you planning to cover? (For example, Dylan's other work, MIRI's work on this subject, and Smitha Milli's paper?)

Could you also write more about who your course is targeting? Why does RAISE believe that the best way to fix the talent gap in AI safety is to help EAs change careers via introductory AI Safety material, instead of, say, making it easier for CS PhD students to do research on AI Safety-relevant topics? Why do we need to build a campus, instead of co-opting the existing education mechanisms of academia?

Finally, could you link some of the mind maps and summaries RAISE has created?

The Utility of Human Atoms for the Paperclip Maximizer

Thanks! I think it makes sense to link it at the start, so new readers can get context for what you're trying to do.

Sources of intuitions and data on AGI

Yeah, I think Ben captures my objection - IDA captures what is different between your approach and MIRI's agenda, but not what is different between some existing AI systems and your approach.

This might not be a bad thing - perhaps you want to choose a name that is evocative of existing approaches to stress that your approach is the natural next step for AI development, for example.

The Utility of Human Atoms for the Paperclip Maximizer

Could I ask what the motivation behind this post was?
