Tamsin Leake

I'm Tamsin Leake, co-founder and head of research at Orthogonal, doing agent foundations.

Comments

The knightian in IB is related to limits of what hypotheses you can possibly find/write down, not - if i understand so far - about an adversary. The adversary stuff is afaict mostly to make proofs work.

I don't think this makes a difference here? If you say "what's the best not-blacklisted-by-any-knightian-hypothesis action", then it doesn't really matter whether you're thinking of your knightian hypotheses as adversaries trying to screw you over by blacklisting actions that are fine, or as a more abstract worst-case scenario. In both cases, for any reasonable action, there's probably a knightian hypothesis which blacklists it.

Regardless of whether you think of it as "because adversaries" or just "because we're cautious", knightian uncertainty works the same way. The issue is fundamental to doing maximin over knightian hypotheses.
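To make the structure explicit (my own rough notation, not anything official from the IB papers): maximin over a set $\mathcal{H}$ of knightian hypotheses picks

$$a^* = \arg\max_{a \in A} \; \min_{h \in \mathcal{H}} \; \mathbb{E}_h[\,U \mid a\,]$$

and the inner min doesn't care *why* a given $h$ assigns $a$ a terrible value, only that some $h$ does. So if for every reasonable action there's at least one hypothesis in $\mathcal{H}$ that rates it near the floor, the maximin value collapses regardless of whether you call those hypotheses "adversaries" or just "caution".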

This is indeed a meaningful distinction! I'd phrase it as:

  • Values about what the entire cosmos should be like
  • Values about what kind of places one wants one's (future) selves to inhabit (eg, in an internet-like upload-utopia, "what servers does one want to hang out on")

"Global" and "local" is not the worst nomenclature. Maybe "global" vs "personal" values? I dunno.

my best idea is to call the former "global preferences" and the latter "local preferences", but that clashes with the pre-existing notion of locality of preferences as the quality of terminally caring more about people/objects closer to you in spacetime

I mean, it's not unrelated! One can view a utility function with both kinds of values as a combination of two utility functions: the part that only cares about the state of the entire cosmos and the part that only cares about what's around them (see also "locally-caring agents").
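Schematically (a toy formalization of my own, not something from the linked post):

$$U(\omega) \;=\; U_{\text{global}}(\omega) \;+\; U_{\text{local}}(\text{the agent's surroundings in } \omega)$$

where the first term only reads off the state of the whole cosmos and the second only reads off what's near the agent.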

(One might be tempted to say "consequentialist" vs "experiential", but I don't think that's right — one can still value contact with reality in their personal/local values.)

That is, in fact, a helpful elaboration! When you said

Most people who "work on AI alignment" don't even think that thinking is a thing.

my leading hypotheses for what you could mean were:

  • Using thought, as a tool, has not occurred to most such people
  • Most such people have no concept whatsoever of cognition as being a thing, the way people in the year 1000 had no concept whatsoever of JavaScript being a thing.

Now, instead, my leading hypothesis is that you mean:

  • Most such people are failing to notice that there's an important process, called "thinking", which humans do but LLMs "basically" don't do.

This is a bunch more precise! For one, it mentions AIs at all.

To be more precise: extrapolated over time, for any undesired selection process or other problem of that kind, either the problem is large enough that it gets exacerbated over time so much that it eats everything (and then that's just extinction, but slower), or it's not large enough to win out and aligned superintelligence(s) + coordinated human action is enough to stamp it out in the long run, which means it won't be an issue for almost all of the future.

It seems like a problem being just large enough that coordination can't stamp it out, but not large enough to eat everything, would be a very fragile equilibrium, and I think that's pretty unlikely.
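A toy model of why (my own illustration, nothing rigorous): let $x_t \in [0,1]$ be the problem's share of the future, and suppose it gets multiplied each period by a net factor $r$ (its growth relative to the coordinated effort pushing back on it). Then roughly

$$x_t \approx \min(r^t x_0,\, 1) \;\longrightarrow\; \begin{cases} 0 & \text{if } r < 1 \\ 1 & \text{if } r > 1 \end{cases}$$

so only the knife-edge $r = 1$ keeps the problem at some intermediate size forever, and there's no particular reason the dynamics would land exactly there and stay there.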

single-use

Considering how long it took me to get that by this you mean "not dual-use", I expect some others just won't get it.

Some people who are very concerned about suffering might be considering building an unaligned AI that kills everyone just to avoid the risk of an AI takeover by an AI aligned to values which want some people to suffer.

Let this be me being on the record saying: I believe the probability of {alignment to values that strongly diswant suffering for all moral patients} is high enough, and the probability of {alignment to values that want some moral patients to suffer} is low enough, that this action is not worth it.

I think this applies to approximately anyone who would read this post, including heads of major labs, in case they happen to read this post and are pursuing the strategy of killing everyone to reduce S-risk.

See also: how acausal trade helps in 1, 2, but I think I think this even without acausal trade.

sigh I wish people realized how useless it is to have money when the singularity happens. Either we die or we get a utopia in which it's pretty unlikely that pre-singularity wealth matters. What you want to maximize is not your wealth but your utility function, and you sure as hell are gonna get more from LDT handshakes with aligned superintelligences in saved worlds, if you don't help OpenAI reduce the number of saved worlds.

I believe that ChatGPT was not released with the expectation that it would become as popular as it did.

Well, even if that's true, causing such an outcome by accident should still count as evidence of vast irresponsibility imo.

I'm surprised at people who seem to be updating only now about OpenAI being very irresponsible, rather than updating when they created a giant public competitive market for chatbots (which contains plenty of labs that don't care about alignment at all), thereby reducing how long everyone has to solve alignment. I still parse that move as devastating the commons in order to make a quick buck.

I made guesses about my values a while ago, here.
