Thanks, this was a really useful overview for me.
I find the idea of the AI Objectives Institute really interesting. I've read their website and watched their kick-off call and would be interested how promising people in the AI Safety space think the general approach is, how much we might be able to learn from it, and how much solutions to the AI alignment problem will resemble a competently regulated competitive market between increasingly extremely competent companies.
I'd really appreciate pointers to previous discussions and papers on this topic, too.
Sounds really cool! Regarding the 1st and 3rd person models, this reminded my of self-perception theory (from the man Daryl Bem), which states that humans model themselves in the same way we model others, just by observing (our) behavior.
I feel like in the end our theories of how we model ourselves must involve input and feedback from “internal decision process information“, but this seems very tricky to think about. I‘m soo sure I observe my own thoughts and feelings and use that to understand myself.
Thanks for elaborating!
I guess I would say, any given desire has some range of how strong it can be in different situations, and if you tell me that the very strongest possible air-hunger-related desire is stronger than the very strongest possible social-instinct-related desire, I would say "OK sure, that's plausible." But it doesn't seem particularly relevant to me. The relevant thing to me is how strong the desires are at the particular time that you're making a decision or thinking a thought.
I think that almost captures what I was thinking, only that I expect the average intensity within these ranges to differ, e.g. for some individuals the desire for social interaction is usually very strong or for others rather weak (which I expect you to agree with). And this should explain which desires more often supply the default plan and for which additional "secondary" desires the neocortex has to work for to find an overall better compromise.
For example, you come home and your body feels tired and the desire that is strongest at this moment is the desire for rest, and the plan that suits this desire most is lying in bed and watching TV. But then another desire for feeling productive pushes for more plan suggestions and the neocortex comes up with lying on the coach and reading a book. And then the desire for being social pushes a bit and the revised plan is for reading the book your mum got you as a present.
It would be weird for two desires to have a strict hierarchical relationship.
I agree, I didn't mean to imply a strict hierarchical relationship, and I think you don't need a strict relationship to explain at least some part of the asymmetry. You just would need less honorable desires on average having more power over the default, e.g.
And then we can try to optimize the default by searching for good compromises or something like that, which more often involve more honorable desires, like self-actualization, social relationships, or something like that. (I expect all of this to vary across individuals and probably also cultures).
there's a tradeoff between satisfying competing desires
I agree it depends on the current state, e.g. of course if your satiated you won't care much about food. But, similar to your example, could you make somebody stab their friend by starving them in their need for showing gratitude, or the desire for having fun? I suspect not. But could you do it by starving them in their need of breathing oxygen, or making them super-duper-depesperately thirsty? I (also) suspect more often yes. That seems to imply some more general weighing?
> I guess 'wanting to smoke' should rather be thought of as a strategy to quench the discomfortful craving than a desire?I'm not sure exactly what you mean ...
> I guess 'wanting to smoke' should rather be thought of as a strategy to quench the discomfortful craving than a desire?
I'm not sure exactly what you mean ...
What you replied makes sense to me, thanks.
Very interesting. This reminded me of Keith Stanovich's idea of the master rationality motive, which he defines as a desire to integrate higher-order preferences with first-order preferences. He gives an example of wanting to smoke and not wanting to want to smoke, which sounds like you would consider this as two conflicting preferences, health vs. the short-term reward from smoking. His idea how these conflicts are resolved are to have a "decoupled" simulation in which we can simulate adapting our first-order desires (I guess 'wanting to smoke' should rather be thought of as a strategy to quench the discomfortful craving than a desire?) and finding better solutions.
The master rationality motive seems to aim at something slightly different, though, e.g. given the questionnaire items Stanovich envisions to measure it, for example
Regarding the asymmetry, I have the intuition that the asymmetry of honorability comes through a different weighing of desires, e.g. you'd expect some things to be more important for our survival and reproduction, e.g. food, sex, not freezing, avoiding danger > honesty, caring for non-kin, right?
But don't you share the impression that with increased wealth humans generally care more about the suffering of others? The story I tell myself is that humans have many basic needs (e.g. food, safety, housing) that historically conflicted with 'higher' desires like self-expression, helping others or improving the world. And with increased wealth, humans relatively universally become more caring. Or maybe more cynically, with increased wealth we can and do invest more resources into signalling that we are caring good reasonable people, i.e. the kinds of people others will more likely choose as friends/mates/colleagues.
This makes me optimistic about a future in which humans still shape the world. Would be grateful to have some holes poked into this. Holes that spontaneously come to mind:
Very cool prompt and list. Does anybody have predictions on the level of international conflict about AI topics and the level of "freaking out about AI" in 2040, given the AI improvements that Daniel is sketching out?
Good point relating it to markets. I think I don't understand Acemoglu and Robinson's perspective well enough here, as the relationship between state, society and markets is the biggest questionmark I left the book with. I think A&R don't necessarily only mean individual liberty when talking about power of society, but the general influence of everything that falls in the "civil society" cluster.
I was reminded of the central metaphor of Acemoglu and Robinson's "The Narrow Corridor" as a RAAP candidate: