This is cross-posted from my personal blog.
Epistemic status: mostly interesting questions.
Here I want to bring attention to what I think is an extremely impressive case of evolution's ability to 'align' humans in the wild: the development of human sexuality.
Reasons why this is an interesting thing to study from the lens of alignment, and why it is a highly non-trivial accomplishment:
1.) Evolution has been very successful here: almost all humans end up wanting to have sex, typically with opposite-gender partners, in a way that would result in children (and hence inclusive genetic fitness, IGF) in the evolutionary environment.
2.) Sexuality, unlike many other drives such as hunger and thirst, is not something built into the brain from the beginning. Instead there is a sudden 'on switch' around puberty. What happens in the brain during this time? How does evolution exert such fine-grained control of brain development so long (a decade or more) after birth?
3.) It is mostly independent of initial training data before puberty -- i.e. evolution can ignore a decade of data input and representation learning, which it cannot control, during a time period when the brain is undergoing extremely large changes, and still finds a way to instill a new drive highly reliably.
4.) It seems to occur mostly without RL. People start wanting to have sex before they have actually had sex. If sexuality developed by some RL mechanism, it would look like you go around doing your normal things, then at some point you have sex, and realize it is highly rewarding, and you slightly update your behaviours and/or values to get more sex or to want more sex. This is not what happens in humans. Instead, humans often want to start having sex before they have had it, and even before they really know what sex is.
5.) Evolution has solved some variant of the pointers problem to get humans assigning high value to a previously unknown and mostly non-represented state (i.e. you don't usually have a well-represented sex concept before puberty), and also translating this desire to specific other agents in the world -- such as crushes, hot people etc. This is done, presumably, in an entirely genetically mediated way without requiring specific experience.
6.) Sexuality is, usually, a very strong drive which has a large influence over behaviour and long term goals. If we could create an alignment drive as strong in our AGI we would be in a good position.
Some other aspects of the phenomenon that may be interesting to alignment:
1.) Clearly, the alignment in this case is not perfect. Assuming that what evolution 'wants' is child-bearing heterosexual sex, then human sexuality has a large number of deviations from this in practice including homosexuality, asexuality, and various paraphilias.
2.) As the worldwide demographic shift evidences, the link between sex and children has largely been broken off-distribution, but our desire to have sex has largely stayed aligned even significantly off distribution. This may change in the future with fully-realistic VR simulated sex, but has been remarkably resilient to current internet pornography.
3.) Further evidence against the RL explanation is that people often still desire sex even if their initial experiences are negative. However, severe abuse etc. can often (but not always) have significant and long-lasting effects, which shows that the intrinsic drives can perhaps be modulated by RL-ish effects.
4.) Specific sexual behaviours can be significantly influenced by culture and hence by environmental training data. This means that strong intrinsic drives can still be modulated by experience somehow, probably by shifting the representation of the concept being pointed at by the drive.
5.) While evolution gives us a strong aligned desire to have sex, this is clearly not coupled with a strong ability to actually obtain sex from scratch; instead we must learn the required skills with a standard RL-ish approach. This, to me, implies that the information content of the drive is relatively low (much lower than that of all the relevant skills), which is what lets it be genetically encoded so well, and hence that the drive must be relatively simple. Another argument for this is that many of the changes associated with the development of sexuality around puberty are driven by hormones, which are incredibly low-bandwidth and extremely diffuse signals that cannot implement precise synaptic wiring changes.
The fact that evolution has managed to figure out a way to give humans such a reliable sex drive under these circumstances is rather remarkable and a reasonable test-case of alignment. Understanding how this mechanism works, as well as where it goes wrong (from an evolutionary perspective), seems like it could provide one potential mechanistic route for aligning our own systems. We also likely have a good deal more control over any potential AGI we build, both during design and training and especially after deployment, than evolution does with humans. Moreover, this gives an existence proof that developing such a relatively aligned and robust drive is possible even in relatively black-box RL systems like the brain.
This was at least the case before ubiquitous internet porn was easily accessible. Surprisingly, to me, porn has had relatively little apparent effect upon sexual behaviour in general considering how far off distribution it is, which speaks again to the robustness of humans' innate sex drives.
To defend evolution here, it must be noted that these are only small proportions of the population and so evolution's 'alignment' to heterosexual sex has succeeded in >95% of cases. If we had odds this good on AI alignment I would be pretty happy.
I don't think this is a safe assumption. Sex also serves a social bonding function beyond procreation, and there are many theories about the potential advantages of non-heterosexual sex from an evolutionary perspective.
A couple things you might find interesting:
- Men are 33% more likely to be gay for every older brother they have: https://pubmed.ncbi.nlm.nih.gov/11534970/
- Women are more likely to be bisexual than men, which may have been advantageous for raising children: https://pubmed.ncbi.nlm.nih.gov/23563096/
- Homosexuality is extremely common in the animal kingdom (in fact the majority of giraffe sex is homosexual): https://en.wikipedia.org/wiki/List_of_mammals_displaying_homosexual_behavior
I think you need to distinguish between homosexuality/asexuality, which compromise on heterosexual interest and thus surely cannot be adaptive, and bisexuality, which doesn't.
I have seen multiple accounts of how homosexuality could be selected for. Just as ant queens have many non-reproducing workers whose existence is not a superfluous extra, non-reproducing members of your family can be an asset rather than a drag. You can be a full-time uncle, which is less possible if you need to be a full-time father too. Non-reproductive sex can also serve to keep social relations more strongly in place, a role that is more pronounced in bonobos.
Each child given up by being homosexual would have to be compensated for by two children had by one's siblings. This doesn't sound plausible to me.
If you are the 7th sibling then that is only 1/3 of a child per reproducing sibling. The effect of "33% more likely per older brother", which I didn't previously know about, fits in. If you end up having only very few families, the risk that all of them fail to reproduce by chance becomes more significant. The upkeep of a "dubiously useful" extra person gets amortized better the more families there are.
Wars between brothers tend to be somewhat evenly matched and thus continue for a long time. With 4 children, starting 4 lineages with 2 parents each leads to more factions than 2 lineages with 3 parents each. Or, with any other effect that blocks increasing the number of families, it makes sense to make the "allowed families" more robust. Maybe on hitting starvation, shanking the uncle leads to less infighting (a sacrificial hierarchy instead of a lottery) and does not produce orphans.
But I don't actually know. There seem to be things here that actually need checking.
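The break-even arithmetic in this exchange can be checked with a small sketch. It assumes the standard textbook relatedness values for full siblings (1/2 to your own child, 1/4 to a niece or nephew) and that every sibling reproduces; the function name and constants are hypothetical, chosen only for illustration.

```python
from fractions import Fraction

# Assumed coefficients of relatedness (standard values for full siblings,
# not taken from the thread itself).
R_OWN_CHILD = Fraction(1, 2)     # parent to own child
R_NIECE_NEPHEW = Fraction(1, 4)  # uncle or aunt to a full sibling's child


def extra_children_per_sibling(children_forgone: int, n_siblings: int) -> Fraction:
    """Extra children each reproducing sibling must have for the helper to break even."""
    if n_siblings <= 0:
        raise ValueError("need at least one reproducing sibling")
    # Relatedness-weighted offspring lost by not reproducing.
    lost = children_forgone * R_OWN_CHILD
    # Total extra nieces and nephews needed to recover that loss.
    total_needed = lost / R_NIECE_NEPHEW
    # Spread the requirement evenly over all reproducing siblings.
    return total_needed / n_siblings


print(extra_children_per_sibling(1, 1))  # one sibling: 2 extra children
print(extra_children_per_sibling(1, 6))  # 7th of seven siblings: 1/3 each
```

With half-siblings, or siblings who do not all reproduce, the required number per sibling would be higher, so this is best read as a lower bound on what the helper has to contribute.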
I agree that this is an important difference, but I think that "surely cannot be adaptive" ignores the power of group selection effects.
Group selection effects aren't that strong.
The main moral I got out of that was to check and think through the issue rather than just extend out-of-distribution human social reality to everything.
Perhaps sometimes they are.
Indeed, but insofar as this bonding function enhances IGF, this actually makes it an even more impressive example of alignment to evolution's true goal. I know that there are a bunch of potential evolutionary rationales proposed for homosexuality, but I personally haven't studied them in depth, nor do I find any of them super convincing, so I'm just assuming the worst-case scenario for evolution here.
I.e. if evolution has precisely titrated the percentage of homosexuality etc. so as to maximise IGF, taking into account the benefits of bonding, additional childcare, group selection etc., then this is actually evidence for evolution achieving a much greater level of alignment than otherwise!
What is evolution's true goal? If it's genetic fitness, then I don't see how this demonstrates alignment. Human sexuality is still just an imperfect proxy, and doesn't point at the base objective at all.
I agree that it's very interesting how robust this is to the environment we grow up in, and I would expect there to be valuable lessons here for how value formation happens (and how we can control this process in machines).
Anecdotally this seems to be true of most people but not all? I've known people who reported developing a sex drive as early as age 5.
The limbic system that controls motivations such as the sex drive is much older than the relatively new neocortex that's responsible for human intelligence.
My guess is that the limbic system evolved by trial and error over millions of years. If this is what happened, maybe we should seek out iterative methods for aligning AI systems such as iteratively testing and developing the motivations of sub-AGI systems.
But as Eliezer Yudkowsky says in his AGI Ruin post, you can't iteratively develop an AGI that's operating at dangerous levels of capability if each mistake kills you. Therefore we might need to extrapolate the motivations of sub-AGI systems to superhuman systems or solve the problem in advance using a theoretical approach.
Sure the limbic system evolved over millions of years, but that doesn't mean we need to evolve it as well -- we could just study it and reimplement it directly without (much) iteration. I am not necessarily saying that this is a good approach to alignment -- I personally would prefer a more theoretically grounded one also. But I think it is an interesting existence proof that imprinting fairly robust drives into agents through a very low bandwidth channel even after a lot of experience and without much RL is possible in practice.
I've had some similar thoughts recently (spurred by a question seen on reddit) about how the instinctive fear of death is implemented.
It's clearly quite robustly present. But we aren't born understanding what death is, there's a wide variety of situations that might threaten death that didn't exist in any ancestral environment, and we definitely don't learn from experience of dying that we don't want to do it again in future.
We see a lot of people die, in reality, fiction and dreams.
We also see a lot of people having sex or feeling sexual desire in fiction or dreams before experiencing it ourselves.
IDK how strong a counter-argument this is to how powerful the alignment in us is. Maybe a biological reward system + imitation + fiction, and later dreams, is simply what is at play in humans.
I agree that fictional/cultural evidence is important for how people generalise their innate responses to new stimuli. Specifically, I think something similar to Steven Byrnes' proxy matching is going on.
The idea is that we have certain hardwired instincts, such as fear of death, that are triggered in specific scenarios. Independently, we learn a general world-model through unsupervised learning, which contains its own, potentially un-emotive, concept of death. We then associate our instinctive reactions with this concept, so that eventually they generalise to other stimuli that also evoke the concept, including ones that are not present in the ancestral environment and for which we have no hardwired reactions.
Fiction, cultural knowledge etc. are super important for shaping our unsupervised concept of death -- since they are the training data! The limits of this generalisation can also be seen in cases where there is a lot of disagreement between people. For example, in the classic case of a teleporter which scrambles your atoms at location X only to recreate an exact copy of you at location Y, people have very different instinctive reactions to whether this is 'death' or not. This ultimately depends on their world-model concept and not on any hardwired reaction, since there are no teleporters in the ancestral environment or now.
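As a very rough illustration of the association step described above, here is a toy sketch. It is not Steven Byrnes' actual proposal and not a model of the brain: the stimuli, the concept labels, the hand-written world model and the simple update rule are all hypothetical stand-ins chosen for illustration.

```python
# Hypothetical hardwired triggers: raw stimuli with an innate fear reaction.
HARDWIRED_FEAR = {"falling", "predator"}

# Hypothetical learned world model mapping stimuli to abstract concepts.
# In the picture above this comes from unsupervised learning (including
# fiction and culture); here it is simply given by hand.
WORLD_MODEL = {
    "falling": {"death", "height"},
    "predator": {"death", "animal"},
    "car_crash": {"death", "vehicle"},
    "puppy": {"animal"},
}


def learn_associations(experiences, lr=0.5):
    """Move each active concept's fear weight toward whether the innate reaction fired."""
    weights = {}
    for stimulus in experiences:
        target = 1.0 if stimulus in HARDWIRED_FEAR else 0.0
        for concept in WORLD_MODEL[stimulus]:
            old = weights.get(concept, 0.0)
            weights[concept] = old + lr * (target - old)
    return weights


def fear_response(stimulus, weights):
    """Innate reaction if hardwired, otherwise the strongest learned concept association."""
    if stimulus in HARDWIRED_FEAR:
        return 1.0
    return max((weights.get(c, 0.0) for c in WORLD_MODEL[stimulus]), default=0.0)


weights = learn_associations(["falling", "predator", "puppy", "falling", "puppy"])
# "car_crash" has no hardwired trigger, but it evokes the learned "death" concept.
print(fear_response("car_crash", weights), fear_response("puppy", weights))
```

In this toy, the novel stimulus inherits a strong reaction purely through the shared concept, while the puppy, which shares only the "animal" concept with the predator, ends up with a weak one. Changing `WORLD_MODEL` changes what the reaction generalises to without touching the hardwired set.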
I suspect a process like this is also what generates 'human values' and will be writing something up on this shortly.
I do want to note that it can also hijack instrumental convergence in order to achieve alignment.
While this particular alignment case for humans does seem reasonably reliable, it all depends on humans not being proficient at self-improvement/modification yet. For an AGI with self-improvement capability this goes out of the window fast.
Why do we expect quadrillion parameter models to be proficient at self improvement/self modification?
I don't think the kind of self improvement Yudkowsky imagined would be a significant factor for AGIs trained in the deep learning paradigm.
Yes to some extent. Humans are definitely not completely robust to RSI / at a reflectively stable equilibrium. I do suspect though that sexual desire is at least partially reflectively stable. If people could arbitrarily rewrite their psychology I doubt that most would completely remove their sex drive or transmute it into some completely alien type of desire (some definitely would and I also think there'd be a fair bit of experimentation around the margin as well as removing/tweaking some things due to social desirability biases).
The main point though is that this provides an existence proof that this degree of robust-ish alignment is achievable by evolution, which has far fewer advantages than we do. We can probably do at least as well for the first proto-AGIs we build before RSI sets in. The key will then be to either carefully manage or prevent RSI, or to build more robust drives that are much more reflectively stable than the human sex drive.
This doesn't mean that it isn't a byproduct of RL. Something needs to be hardcoded, but a simple reward circuit might lead to a highly complex set of desires and cognitive machinery. I think the things you are pointing to in this post sound extremely related to what Shard Theory is trying to tackle.
Indeed, this is exactly the kind of thing I am gesturing at. Certainly, all our repertoires of sexual behaviour are significantly shaped by RL. My point is that evolution has somehow in this case mostly solved some pointers-like problem to get the reward model to suddenly include rewards for sexual behaviour, can do so robustly, and can do so a long time after birth after a decade or so of unsupervised learning and RL has already occurred. Moreover, this reward model leads to people robustly pursuing this goal even fairly off-distribution from the ancestral environment.
I'd count VR sex as a success case here, as long as we're dropping the requirement of childbirth.
I don't think we'd be in a good position even if we instilled an alignment drive this strong in AGI.