Sequences

Math Upskilling Notes
Insights from Dath Ilan
Winding My Way Through Alignment

Wiki Contributions

Comments

The main thing I got out of reading Bostrom's Deep Utopia is a better appreciation of this "meaning of life" thing. I had never really understood what people meant by this, and always just rounded it off to people using lofty words for their given projects in life.

The book's premise is that, after the aligned singularity, the robots will not just be better at doing all your work but also be better at doing all your leisure for you. E.g., you'd never study for fun in posthuman utopia, because you could instead just ask the local benevolent god to painlessly, seamlessly put all that wisdom in your head. In that regime, studying with books and problems for the purpose of learning and accomplishment is just masochism. If you're into learning, just ask! And similarly for any psychological state you're thinking of working towards.

So, in that regime, it's effortless to get a hedonically optimal world, without any unendorsed suffering and with all the happiness anyone could want. Those things can just be put into everyone and everything's heads directly—again, by the local benevolent-god authority. The only challenging values to satisfy are those that deal with being practically useful. If you think it's important to be the first to discover a major theorem or be the individual who counterfactually helped someone, living in a posthuman utopia could make things harder in these respects, not easier. The robots can always leave you a preserve of unexplored math or unresolved evil... but this defeats the purpose of those values. It's not practical benevolence if you had to ask for the danger to be left in place; it's not a pioneering scientific discovery if the AI had to carefully avoid spoiling it for you.

Meaning is supposed to be one of these values: not a purely hedonic value, and not a value dealing only in your psychological states. A further value about the objective state of the world and your place in relation to it, wherein you do something practically significant by your lights. If that last bit can be construed as something having to do with your local patch of posthuman culture, then there can be plenty of meaning in the postinstrumental utopia! If that last bit is inextricably about your global, counterfactual practical importance by your lights, then you'll have to live with all your "localistic" values satisfied but meaning mostly absent.

It helps to see this meaning thing if you frame it alongside all the other objectivistic "stretch goal" values you might have. Above and beyond your hedonic values, you might also think it good for you and others to have objectively interesting lives, accomplished and fulfilled lives, and consumingly purposeful lives. Meaning is one of these values, where above and beyond the joyful, rich experiences of posthuman life, you also want to play a significant practical role in the world. We might or might not be able to have lots of objective meaning in the AI utopia, depending on how objectivistic meaningfulness by your lights ends up being.

Considerations that in today's world are rightly dismissed as frivolous may well, once more pressing problems have been resolved, emerge as increasingly important [remaining] lodestars... We could and should then allow ourselves to become sensitized to fainter, subtler, less tangible and less determinate moral and quasi-moral demands, aesthetic impingings, and meaning-related desirables. Such recalibration will, I believe, enable us to discern a lush normative structure in the new realm that we will find ourselves in—revealing a universe iridescent with values that are insensible to us in our current numb and stupefied condition (pp. 318-9).

I believe I and others here probably have a lot to learn from Chris, and arguments of the form "Chris confidently believes false thing X" are not really a crux for me about this.

Would you kindly explain this? Is it because you think some of his world-models independently generate great predictions, even if other models of his are dead wrong?

Use your actual morals, not your model of your morals.

I agree that stronger, more nuanced interpretability techniques should tell you more. But when you see something like, e.g.,

25132 ▁vs, ▁differently, ▁compared, ▁greater, all, ▁per
25134 ▁I, ▁My, I, ▁personally

isn't it pretty obvious what those two autoencoder neurons were each doing?
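For readers who haven't seen this kind of readout before, here is a minimal sketch of where lists like those come from. Everything in it (the array shapes, the names acts, W_enc, and token_strings, and the random data) is a stand-in assumption, not the actual pipeline behind the quoted features; it only illustrates the idea of ranking tokens by how strongly they excite one sparse-autoencoder feature.

```python
# Hypothetical sketch: rank tokens by how strongly they excite one SAE feature.
# All shapes, weights, and token strings below are random stand-ins.
import numpy as np

rng = np.random.default_rng(0)
n_tokens, d_model, n_features = 10_000, 512, 32_768

acts = rng.standard_normal((n_tokens, d_model))        # per-token model activations (stand-in)
W_enc = rng.standard_normal((d_model, n_features))     # trained SAE encoder weights (stand-in)
b_enc = np.zeros(n_features)                           # SAE encoder bias (stand-in)
token_strings = [f"tok{i}" for i in range(n_tokens)]   # the tokens those activations came from

def top_tokens(feature_idx: int, k: int = 6) -> list[str]:
    """Return the k tokens on which this feature fires most strongly."""
    feature_acts = np.maximum(acts @ W_enc[:, feature_idx] + b_enc[feature_idx], 0.0)  # ReLU
    return [token_strings[i] for i in np.argsort(-feature_acts)[:k]]

# With real activations and a real SAE, this is the sort of call that would print
# a line like "25132  ▁vs, ▁differently, ▁compared, ▁greater, all, ▁per".
print(25132, ", ".join(top_tokens(25132)))
```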

No, towards an  value.  is the training proxy for that, though.

Epistemic status: Half-baked thought.

Say you wanted to formalize the concepts of "inside and outside views" to some degree. You might say that your inside view is a Bayes net or joint conditional probability distribution—this mathematical object formalizes your prior.

Unlike your inside view, your outside view consists of deferring, in various ways, to outside experts. The Bayes nets that inform their thinking are sealed away, and you can't inspect them. You can ask outside experts to explain their arguments, but there's an interaction cost associated with inspecting the experts' views. Realistically, you never fully internalize an outside expert's Bayes net.

Crucially, this means you can't update their Bayes net after conditioning on a new observation! Model outside experts as observed assertions (asserting whatever it is they assert). These assertions are potentially correlated with other observations you make. But because you have little access to the prior that informs those assertions, you can't update that prior when it turns out to be right (or wrong).

To the extent that it's expensive to theorize about outside experts' reasoning, the above model explains why you want to use and strengthen your inside view (instead of just deferring to outside really smart people). It's because your inside view will grow stronger with use, but your outside view won't.
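A toy sketch of that asymmetry, with invented numbers: the inside view is a joint distribution you can condition (and recondition) as observations come in, while an outside expert shows up only as an asserted conclusion whose generating prior stays sealed away.

```python
# Toy model of the asymmetry above (numbers invented for illustration).
# Inside view: a joint distribution P(hypothesis, evidence) that you can condition.
inside_view = {
    ("H_true", "E_seen"): 0.40,
    ("H_true", "E_unseen"): 0.10,
    ("H_false", "E_seen"): 0.15,
    ("H_false", "E_unseen"): 0.35,
}

def condition_on(joint, evidence):
    """Update the inside view: keep the slice consistent with the evidence, renormalize."""
    slice_ = {h: p for (h, e), p in joint.items() if e == evidence}
    total = sum(slice_.values())
    return {h: p / total for h, p in slice_.items()}

# Outside view: the expert's Bayes net is inaccessible; all you observe is the claim.
expert_assertion = "H_true"  # you can weigh this assertion against your other evidence,
                             # but you can't revise the prior that produced it

print(condition_on(inside_view, "E_seen"))  # your inside view sharpens with each observation
print(expert_assertion)                     # the expert's stated view stays a black box to you
```

The point then falls out: conditioning improves inside_view every time you use it, but nothing you observe improves the sealed-off prior behind expert_assertion.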

(Great project!) I strongly second the RSS feed idea, if that'd be possible.

I think that many (not all) of your above examples boil down to optimizing for legibility rather than optimizing for goodness. People who hobnob instead of working quietly will get along with their bosses better than their quieter counterparts, yes. But a company of brown-nosers will be less productive than a competitor company of quiet, hardworking employees! So there's a cooperate/defect dilemma here.

What that suggests, I think, is that you generally shouldn't immediately defect as hard as possible with regard to optimizing for appearances. Play the prevailing local balance between optimizing-for-appearances and optimizing-for-outcomes that everyone around you plays, and try not to incrementally lower the level of org-wide cooperation. Try to eke that level of cooperation up, and set up incentives accordingly.
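Spelled out as a toy payoff matrix (payoffs invented for illustration), the dilemma has the usual shape: hobnobbing dominates for the individual, while a workforce of quiet hard workers produces the most in total.

```python
# Toy payoff matrix for one pair of employees (numbers invented for illustration).
# "work" = quietly do good work; "hobnob" = optimize for visibility with the boss.
payoffs = {  # (my_move, their_move) -> (my_payoff, their_payoff)
    ("work", "work"): (3, 3),        # everyone productive: highest total output
    ("hobnob", "work"): (4, 1),      # I gain visibility off the back of their real work
    ("work", "hobnob"): (1, 4),
    ("hobnob", "hobnob"): (2, 2),    # a company of brown-nosers: less productive overall
}

for my_move in ("work", "hobnob"):
    for their_move in ("work", "hobnob"):
        mine, theirs = payoffs[(my_move, their_move)]
        print(f"I {my_move:6} / they {their_move:6}: me={mine}, them={theirs}, total={mine + theirs}")

# Hobnobbing is the individually dominant move, yet mutual quiet work maximizes the joint
# total, which is why the advice is to match the local norm and nudge it toward cooperation.
```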

The ML models that now speak English, and are rapidly growing in world-transformative capability, happen to be called transformers.

This is not a coincidence because nothing is a coincidence.

Two moments of growing in mathematical maturity I remember vividly:

  1. Realizing that equations are claims that are therefore either true or false. Everything asserted with symbols... could just as well be asserted in English. I could start chunking up arbitrarily long and complicated equations between the equals signs, because those equals signs were just the English word "is"!
  2. Learning about the objects that mathematical claims are about. Going from having to look up "Wait, what's a real number again?" to knowing how ℕ, ℤ, ℚ, ℝ, and ℂ interrelate told me what we're making claims about. Of course, there are plenty of other mathematical objects -- but getting to know these objects taught me the general pattern.