Shoshannah Tekofsky

Comments

Thank you for sharing! I actually have a similar response myself but assumed it was not general. I'm going to edit the image out.

EDIT: Both points are moot under Stuart Armstrong's narrower definition of the Orthogonality thesis, which he argues for in General purpose intelligence: arguing the Orthogonality thesis:

High-intelligence agents can exist having more or less any final goals (as long as these goals are of feasible complexity, and do not refer intrinsically to the agent’s intelligence).

Old post:

I was just working through my own thoughts on the Orthogonality thesis, searched LW for existing material, and found this essay. I had pretty much the same thoughts on intelligence limiting goal complexity, so yay!

Additional thought I had: Learning/intelligence-boosting motivations/goals are positively correlated with intelligence. Thus, given any amount of time, an AI with intelligence-boosting motivations will become smarter than one that does not have that motivation.

It is true that instrumental convergence should lead any sufficiently smart AI to also pursue intelligence-boosting (cognitive enhancement), but:

  • At low levels of intelligence, an AI might fail to execute instrumentally convergent strategies.
  • At high levels of intelligence, an AI that is not intelligence-boosting will spend some non-zero amount of resources on its other, actual goals and thus end up less intelligent than an intelligence-boosting AI (assuming parallel universes, and thus no direct competition).

I'm not sure how to integrate this insight into the Orthogonality thesis. It implies that:

"At higher intelligence levels, intelligence-boosting motivations are more likely than other motivations" thus creating a probability distribution across the intelligence-goal space that I'm not sure how to represent. Thoughts?

Hmm, that wouldn't explain the different qualia of the rewards, but maybe it doesn't have to. I see your point that they can mathematically still be encoded into one reward signal that we optimize through weighted factors.

I guess my deeper question would be: do the different qualia of different reward signals achieve anything in our behavior that can't be encoded by summing the weighted factors of different reward systems into one reward signal that is then optimized?
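
For concreteness, the 'one scalar via weighted factors' picture I have in mind looks roughly like this (a toy sketch; the channel names and weights are made up, not a claim about any actual reward architecture):

```python
# Toy sketch: collapsing several separate reward channels into one scalar
# signal via fixed weights. Channel names and weights are illustrative only.

def scalar_reward(channels: dict[str, float],
                  weights: dict[str, float]) -> float:
    """Weighted sum of separate reward signals into a single scalar."""
    return sum(weights[name] * value for name, value in channels.items())

channels = {"touch": 0.2, "food": 0.9, "task_done": 0.0}
weights = {"touch": 1.0, "food": 0.5, "task_done": 2.0}

print(scalar_reward(channels, weights))  # 0.2*1.0 + 0.9*0.5 + 0.0*2.0 ≈ 0.65
```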

Another framing here would be homeostasis: if you accept that humans aren't happiness optimizers, then what are we instead? Are the different reward signals more like different 'thermostats', where we trade off their optimal values against each other toward some set point?

Intuitively I think the homeostasis model is true and would explain our lack of optimizing, but I'm not well versed in this yet and worry that I might be missing how the two are just the same somehow.
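
A minimal version of the thermostat picture, just to make the contrast concrete (the set points, weights, and quadratic penalty are arbitrary choices of mine, not a model I'm committed to):

```python
# Toy sketch of the homeostasis framing: each drive has a set point, and the
# quantity to reduce is the weighted deviation from those set points, rather
# than a reward to be maximized without bound.

def homeostatic_error(state: dict[str, float],
                      set_points: dict[str, float],
                      weights: dict[str, float]) -> float:
    """Weighted squared deviation of current drive levels from their set points."""
    return sum(weights[k] * (state[k] - set_points[k]) ** 2 for k in state)

state = {"touch": 0.1, "food": 0.8, "task_done": 0.4}
set_points = {"touch": 0.5, "food": 0.6, "task_done": 0.7}
weights = {"touch": 1.0, "food": 1.0, "task_done": 1.0}

print(homeostatic_error(state, set_points, weights))  # 0.16 + 0.04 + 0.09 ≈ 0.29
```

Minimizing this error is formally still an optimization, but it saturates once every drive sits at its set point, which may be exactly the 'are the two just the same' worry I can't resolve yet.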

Clawbacks refer to grants that have already been distributed but would need to be returned. You seem to be thinking of grants that haven't been distributed yet. I hope both get resolved but they would require different solutions. The post above is only about clawbacks though.

As a grantee, I'd be very interested in hearing what informs your estimate, if you feel comfortable sharing.

Sure. For instance, hugging/touch, good food, or finishing a task all deliver a different type of reward signal. You can be saturated on one but not the others and then you'll seek out the other reward signals. Furthermore, I think these rewards are biochemically implemented through different systems (oxytocin, something-sugar-related-unsure-what, and dopamine). What would be the analogue of this in AI?

Ah, like that. Thank you for explaining. I wouldn't consider that a reversal, because you're still converting intuitions into testable hypotheses. But the emphasis on discussion versus experimentation is indeed reversed.

What would be the sensible reverse of number 5? I can generate them for 1-4 and 6, but I am unsure what the benefit could be of confusing intuitions with testable hypotheses.

I really appreciate that thought! I think there were a few things going on:

  • Definitions and Degrees: I think in common speech and intuition, failing to pick the optimal option doesn't mean something is not an optimizer. This goes back to the definition confusion, where 'optimizer' in CS or math literally means picking the best option to maximize X regardless of any other concern, while in daily life, if someone says they optimize on X, then trading off against lesser concerns at some value greater than zero still counts as optimizing. E.g. someone might optimize their life for getting the highest grades in school by spending every waking moment studying or doing self-care, but they also spend one evening a week with a romantic partner. In regular parlance and intuition, this person is said to be an optimizer because the concept is weighed in degrees (you are optimizing more on X) instead of absolutes (you are disregarding everything except X).
  • Unrepresented internal experience: I do actually experience something related to a conscious IGF optimization drive. All the responses and texts I've read so far are from people who say they don't, which made me assume the missing piece was people's awareness of people like myself. I'm not a perfect optimizer (see the definitional considerations above), but there are a lot of experiences and motivations that seemed not to be covered in the original essay or comments. E.g. I experience a strong sense of identity shift where, since I have children, I experience myself as a sort of intergenerational organism. My survival- and flourishing-related needs internally feel secondary to those of the aggregate blood line I'm part of. This shift happened to me during my first pregnancy and is quite a disorienting experience. It seems to point so strongly at IGF optimization that claiming we don't do that seemed patently wrong. From the examples I can now see that it's still a matter of degrees, and I still wouldn't take every possible action to maximize the number of copies of my genes in the next generation.
  • Where we are now versus where we might end up: people did agree we might end up being IGF maximizers eventually. I didn't see this point made in the original article, and I thought the concern was that training can never work to create inner alignment. Apparently that wasn't the point haha.

Does that make sense? Curious to hear your thoughts.
