Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Recently I've gotten a bunch of pushback when I claim that humans are not maximizers of inclusive genetic fitness (IGF).

I think that part of what's going on here is a conflation of a few claims.

One claim that is hopefully uncontroversial (but that I'll expand upon below anyway) is:

  • Humans are not literally optimizing for IGF, and regularly trade other values off against IGF.

Separately, we have a stronger and more controversial claim:

  • If an AI's objectives included goodness in the same way that our values include IGF, then the future would not be particularly good.

I think there's more room for argument here, and will provide some arguments.

A semi-related third claim that seems to come up when I have discussed this in person is:

  • Niceness is not particularly canonical; AIs will not by default give humanity any significant fraction of the universe in the spirit of cooperation.

I endorse that point as well. It takes us somewhat further afield, and I don't plan to argue it here, but I might argue it later.


On the subject of whether humans are literally IGF optimizers, I observe the following:

We profess to enjoy many other things, such as art and fine foods.

Suppose someone came to you and said: "I see that you've got a whole complex sensorium centered around visual stimuli. That sure is an inefficient way to optimize for fitness! Please sit still while I remove your enjoyment of beautiful scenery and moving art pieces, and replace it with a module that does all the same work your enjoyment was originally intended to do (such as causing you to settle down in safe locations with abundant food), but using mechanical reasoning that can see farther than your evolved heuristics." Would you sit still? I sure wouldn't.

And if you're like "maybe mates would be less likely to sleep with me if I didn't enjoy fine art", suppose that we tune your desirability-to-mates upwards exactly as much as needed to cancel out this second-order effect. Would you give up your enjoyment of visual stimuli then, like an actual IGF optimizer would?

And when you search in yourself for protests, are you actually weighing the proposal based on how many more offspring and kin's-offspring you'll have in the next generation? Or do you have some other sort of attachment to your enjoyment of visual stimuli, some unease about giving it up, that you're trying to defend?

Now, there's a reasonable counterargument to this point, which is that there's no psychologically-small tweak to human psychology that dramatically increases that human's IGF. (We'd expect evolution to have gathered that low-hanging fruit.) But there's still a very basic and naive sense in which living as a human is not what it feels like to live as a genetic fitness optimizer.

Like: it's pretty likely that you care about having kids! And that you care about your kids very much! But, do you really fundamentally care that your kids have genomes? If they were going to transition to silicon, would you protest that that destroys almost all the value at stake?

Or, an even sharper proposal: how would you like to be killed right now, and in exchange you'll be replaced by an entity that uses the same atoms to optimize as hard as those atoms can optimize, for the inclusive genetic fitness of your particular genes. Does this sound like practically the best offer that anyone could ever make you? Or does it sound abhorrent?

For the record, I personally would be leaping all over the opportunity to be killed and replaced by something that uses my atoms to optimize my CEV as best as those atoms can be arranged to do so, not least because I'd expect to be reconstituted before too long. But there's not a lot of things you can put in the "what my atoms are repurposed for" slot such that I'm chomping at the bit, and IGF sure isn't one of them.

(More discussion of this topic: The Simple Math of Evolution)


On the subject of how well IGF is reflected in humanity's values:

It is hopefully uncontroversial that humans are not maximizing IGF. But, like, we care about children! And many people care a lot about having children! That's pretty close, right?

And, like, it seems OK if our AIs care about goodness and friendship and art and fun and all that good stuff alongside some other alien goals, right?

Well, it's tricky. Optima often occur at extremes, and concepts tend to differ pretty widely at the extremes, etc. When the AI gets out of the training regime and starts really optimizing, then any mismatch between its ends and our values is likely to get exaggerated.

Like how you probably wouldn't stop loving and caring about your children if they were to eschew their genomes. The love and care are separate; the thing you're optimizing for and IGF are liable to drift apart as we get further and further from the ancestral savanna.
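As a toy numerical sketch of how a proxy and its target can agree in a narrow regime and come apart at the extremes: the two functions and both ranges below are invented for illustration, and are not drawn from any real training setup.

```python
# Toy illustration only: both functions and both ranges are made up.

def true_value(x: float) -> float:
    """What we actually care about: rises at first, then falls off (peaks at x = 10)."""
    return x - 0.05 * x * x

def proxy(x: float) -> float:
    """What the optimizer ends up pursuing: ranks small x the same way true_value does."""
    return x

def argmax(f, candidates):
    """Return the candidate that scores highest under f."""
    return max(candidates, key=f)

training_range = [i / 10 for i in range(0, 11)]        # x in [0, 1]
deployment_range = [float(i) for i in range(0, 101)]   # x in [0, 100]

best_train = argmax(proxy, training_range)
best_deploy = argmax(proxy, deployment_range)

# On the narrow range, optimizing the proxy also optimizes the true value.
print(best_train == argmax(true_value, training_range))  # True
print(true_value(best_train))    # 0.95

# On the wide range, the proxy's optimum is terrible by the true measure.
print(true_value(best_deploy))   # -400.0
```

Inside the narrow range the two objectives pick the same option, so nothing looks wrong there. The gap only shows up once the search is allowed to reach the extremes.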

And you might say: well, natural selection isn't really an optimizer; it can't really be seen as trying to make us optimize any one thing in particular; who's really to say whether it would have "wanted" us to have lots of descendants, vs "wanting" us to have lots and lots of copies of our genome? The question is ultimately nonsense; evolution is not really the sort of entity that can want.

And I'd agree! But this is not exactly making the situation any better!

Like, if evolution was over there shouting "hey I really wanted you to stick to the genes", then we wouldn't particularly care; and also it's not coherent enough to be interpreted as shouting anything at all.

And by default, an AI is likely to look at us the same way! "There are interpretations of the humans under which they wouldn't like this", they say, slipping on the goodness-condoms they've invented so that they can squeeze all the possible AI-utility out of the stars without any risk of real fun, "but they're not really coherent enough to be seen as having clear goals (not that we'd particularly care if they did)".

That’s the sort of conversation… that they wouldn't have because they'd be busy optimizing the universe.

(And all this is to say nothing about how humans' values are much more complex and fragile than IGF, and thus much trickier to transmit. See also things Eliezer wrote about the fragility and complexity of value.)


My understanding of the common rejoinder to the above point is:

OK, sure, if you took the sort of ends that an AI is likely to get by being trained on human values, and transported those into an unphysically large brute-force optimization-machine that was unopposed in an empty universe, then it might write a future that doesn't hold much value from our perspective. But that's not very much like the situation we find ourselves in!

For one thing, the AI's mind has to be small, which constrains it to factor its objectives through subgoals, which may well be much like ours. For another thing, it's surrounded by other intelligent creatures that behave very differently towards it depending on whether they can understand it and trust it. The combination of these two pressures is very similar to the pressures that got stuff like "niceness" and "fairness" and "honesty" and "cooperativeness" into us, and so we might be able to get those same things (at least) into the AI.

Indeed, they seem kinda spotlit, such that even if we can't get the finer details of our values into the AI, we can plausibly get those bits. Especially if we're trying to do something like this explicitly.

And if we can get the niceness/fairness/honesty/cooperativeness cluster into the AI, then we're basically home free! Sure, it might be nice if it was also into the great project of making the future Fun, but it's OK for our kids to have different interests than we have, as long as everybody's being kind to each other.

And... well, my stance on that is that it's wishful thinking that misunderstands where we get our niceness/fairness/honesty/cooperativeness from. But arguing that would be a digression from my point today, so I leave it to some other time.

My point today is that the observation “humans care about their kids” is not in tension with the observation “we aren't IGF maximizers”, and doesn't seem to me to undermine the claims that I use this fact to support.

And furthermore, when debating this thing in the future, I'd bid for a bit more separation of claims. The claim that we aren't literally optimizing IGF is hopefully uncontroversial; the stronger claim that an AI relating to fun the way we relate to IGF would be an omnicatastrophe is less obvious (but still seems clear to me); the claim that evolution at least got the spirit of cooperation into us, and all we need to do now is get the spirit of cooperation into the AI, is a different topic altogether.

46 comments

I disagree humans don't optimize IGF:

  1. We seem to have different observational data. I do know some people who make all their major life decisions based on quality and quantity of offspring. Most of them are female but this might be a bias in my sample. Specifically, quality trades off against quantity: waiting to find a fitter partner and thus losing part of your reproductive window is a common trade-off. Similarly, making sure your children have much better lives than you by making sure your own material circumstances (or health!) are better is another. To be fair, they seem to be a small minority currently but I think that is due to point 3 and would be rectified in a more constant environment.
  2. A lot of our drives do indirectly help IGF. Your aesthetic sense may be somewhat wired to your ability to recognize and enjoy the visual appearance of healthy mates. Similarly for healthy environments to grow up in, etc. Sure, it gets hijacked for 20 other things, but how big is the loss in IGF to keep it around? I would argue it's generally not an issue for the subset of humans that are directly driven to have big families.
  3. Many of us have badly optimized drives because our environments have changed too fast. It will take a few generations of constant environment (not gonna happen at our current level of technological progress) to catch up. The obvious example is birth control: sex drive used to actually be a great proxy signal for optimizing on offspring. Now it no longer is, but we still love sex. But in a few generations the only people alive will be the descendants of people who wanted kids no matter their sex drive. 'Evolution' will now select directly on desire for kids, but it takes a while to catch up.

I'm not saying evolution optimized us very well, but I don't think it's accurate to say that we are not IGF maximizers. The environment has just changed much too quickly and selection pressure has been low the last few generations, but things like birth control actually introduce a new selection pressure on drive to reproduce. Humans are mediocre IGF maximizers in an environment that is changing unusually fast.

  1. We seem to have different observational data. I do know some people who make all their major life decisions based on quality and quantity of offspring. Most of them are female but this might be a bias in my sample. Specifically, quality trades off against quantity: waiting to find a fitter partner and thus losing part of your reproductive window is a common trade-off. Similarly, making sure your children have much better lives than you by making sure your own material circumstances (or health!) are better is another. To be fair, they seem to be a small minority currently but I think that is due to point 3 and would be rectified in a more constant environment.

In the long term, we would expect humans to end up directly optimizing IGF (assuming no revolutions like AI doom or similar) due to evolution. The way this proceeds in practice is that people vary on the extent to which they optimize IGF vs other things, and those who optimize IGF pass on their genes, leading to higher optimization of IGF. So yes eventually these sorts of people will win, but as you admit yourself they are a small minority, so humans as they currently exist are mostly not IGF maximizers.
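A minimal sketch of that dynamic, assuming a perfectly heritable trait, two types only, and non-overlapping generations. The 1% starting share and the 20% reproductive advantage are invented numbers, not estimates.

```python
def next_share(p: float, advantage: float) -> float:
    """One generation of selection between two heritable types.

    p: current population share of the type that leaves more offspring.
    advantage: its relative number of offspring (the other type counts as 1.0).
    """
    return p * advantage / (p * advantage + (1.0 - p))

def generations_until(p0: float, advantage: float, target: float) -> int:
    """Count generations until the favored type reaches `target` share."""
    p, n = p0, 0
    while p < target:
        p = next_share(p, advantage)
        n += 1
    return n

# Hypothetical numbers: a 1% minority with 20% more surviving offspring per generation.
print(generations_until(p0=0.01, advantage=1.2, target=0.5))  # 26
```

Under these made-up assumptions the favored type needs 26 generations to become a majority, which fits "eventually, but not yet": the direction of selection says little about what the current population looks like.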

Also, regarding quality vs quantity, it's my impression that society massively overinvests in quality relative to what would be implied by IGF. Society is incredibly safe compared to the past, so you don't need much effort to make your children survive. Insofar as there is an IGF value in quality, it's probably in somehow convincing your children to also optimize for IGF, rather than do other things.

They are a small minority currently because the environment changes so quickly right now. Things have been changing insanely fast in the last century or so, but before the industrial revolution and especially before the agricultural revolution, humans were much better optimized for IGF, I think. Evolution is still 'training' us and these last 100 years have been a huge change compared to the generation length of humans. Nate is stating that humans genetically are not IGF maximizers, and that is false. We are; we are just currently being heavily 'retrained'.

Re: quantity/quality. I think people nominally say they are optimizing for quality when really they just don't have enough drive to have more kids at the current cost. There is much less cultural punishment for saying you are going for quality over quantity than for saying you just don't want more kids because it's a huge investment. Additionally, children who grow up in bad home environments seem less likely to have kids of their own, and parents having mental breakdowns is one of the common 'bad' environments. So quality can definitely optimize for quantity in the long run.

PS: I wish I had more time for more nuanced answers. Considering writing this up in more detail. My answers are rather rushed. My apologies.

  • Given the ability to medically remove, store, and artificially inseminate eggs, current technologies make it possible for a woman to produce many more children than the historical limit of ~50 (i.e. one every 9 months for a woman's entire reproductive years), and closer to the limit set by her supply of eggs (note that each woman produces 100,000s of eggs).
  • I don't have a worked out plan, but I could see a woman removing most of her eggs, somehow causing many other women to use her eggs to have children (whether it's by finding infertile women, or paying people, or showing that the eggs would be healthier than others'), and having many more children than historically possible.
  • I suspect many women could have 50-100 children this way, and that peak women could have 10,000s of children this way, closer to the male model of reproduction.
  • I'd be interested to know the maximum number of children any woman has had in history, and also since the invention of this sort of medical technology.
  • I imagine that such a world would have a market (and class system) based around being able to get your eggs born. There are services where a different woman will have your children, but I think the maximizer world would look more like poor women primarily being paid to have children (and being pregnant >50% of their lives) and rich women primarily paying to have children (and having 1000s of children born).
  • I think the notion that people are adaptation-executors, who like lots of things a little bit in context-relevant situations, predicts our world more than the model of fitness-maximizers, who would jump on this medical technology and aim to have 100,000s of children soon after it was built.
  • I also suspect that population would skyrocket relative to the current numbers (e.g. be 10-1000x the current size). Perhaps efforts to colonize Mars would have been sustained during the 20th century, as this planet would have been more obviously overflowing, though probably we would just be using way more of the surface of the Earth for living on.
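The "~50" ceiling in the first bullet can be checked with back-of-envelope arithmetic. The 9-month spacing comes from the bullet itself; the 35-year reproductive span is an assumption made here for illustration, since the comment does not state one.

```python
# Back-of-envelope check of the "~50 children" historical ceiling.
REPRODUCTIVE_YEARS = 35      # assumption for illustration, e.g. ages 15 to 50
MONTHS_PER_PREGNANCY = 9     # from the bullet above; ignores recovery time and twins

max_births = REPRODUCTIVE_YEARS * 12 // MONTHS_PER_PREGNANCY
print(max_births)  # 46
```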

I think the notion that people are adaptation-executors, who like lots of things a little bit in context-relevant situations, predicts our world more than the model of fitness-maximizers, who would jump on this medical technology and aim to have 100,000s of children soon after it was built.

I think this skips the actual social trade-offs of the strategy you outline above:

  1. The likely backlash in society against any woman who tries this is very high. Any given rich woman would have to find surrogate women who are willing to accept the money and avoid being the target of social condemnation or punitive measures of the law. It's a high risk / high reward strategy that also needs to keep paying off long after she is dead, as her children might be shunned or lose massive social capital as well. If you consider people's response to eugenics or gene editing of human babies, then you can imagine the backlash if a woman actually paid surrogates at scale. It's not clear to me that the strategy you outline above is actually all that viable for the vast majority of rich women.
  2. I'd argue some of us are IGF maximizers for the hand that we have been dealt, which includes our emotional response, intelligence, and other traits. Many of us have things like fear-responses so heavily hard-wired that no matter what we recognize as the optimal response, we can't actually physically execute it.

I realize item 2 points to a difference in how we might define an optimizer, but it's worth disambiguating this. I suspect claiming no humans are IGF maximizers or some humans are IGF maximizers might come down to the definition of maximizer that one uses. And that might explain the pushback that Nate runs into for a claim he finds self-evident.

Similarly, making sure your children have much better lives than you by making sure your own material circumstances (or health!) are better is another.

Is this the best strategy for maximizing IGF? Do happier and wealthier kids have more offspring? Given that wealthier countries tend to have lower birth rates, I wonder if the IGF-maximizing strategy would instead often look like trying to have lots of poor children with few options?

(I'll note as an aside that even if this is false, it should definitely be a thing many parents seriously consider doing and are strongly tempted by, if the parents are really maximizing IGF rather than maximizing proxies like "their kids' happiness". It would be very weird, for example, if an IGF maximizer reacted to this strategy with revulsion.)

I'd be similarly curious if there are cases where making your kids less happy, less intelligent, less psychologically stable, etc. increased their expected offspring. This would test to what extent 'I want lots and lots and lots of kids' parents are maximizing IGF per se, versus maximizing some combination of 'have lots of descendants', 'make my descendants happy (even if this means having fewer of them)', etc.

Yes, good point. I was looking at those statistics for a bit. Poorer parents do indeed tend to maximize their number of offspring no matter the cost while richer parents do not. It might be that parents overestimate the IGF payoffs of quality, but then that just makes them bad/incorrect optimizers. It wouldn't make them less of an optimizer.

I think there are also some other subtle nuances going on. For instance, I'd consider myself fairly close to an IGF optimizer but I don't care about all genes/traits equally. There is a multigenerational "strain" I identify strongly with. A bloodline, you could say. But my mediocre eyesight isn't part of that, and I'd be surprised to hear this mechanic working any differently for others. Also, I'm not sure if all of the results of quality maximizers are obvious. E.g., Dutch society has a handful of extremely rich people that became rich 400 years ago during the golden age. Their bloodlines are keeping money made back then and the wealth increases every generation. Such a small segment is impossible to represent in controlled experiments, but maybe richer parents do start moving toward trying to "buy these lottery tickets" of reproduction, hoping to move their 1-2 kids into the stratosphere. It's not like they need 10 kids to be sure they will be represented in the next generation, because their kids will survive regardless.

Either way, I also realized I'm probably using a slightly different definition of optimizer than Nate is, so that probably explains some of the disagreement as well. I'd consider knowing that X is the optimal action, but not being able to execute X because you feel too much fear, to still be in line with an optimizer's behavior, because you are optimizing over the options you have and a fear response limits your options. I suspect my perspective is not that uncommon and might explain some of the pushback Nate is referring to for the claim that is obvious from his definition.

Here is my best attempt at working out my thoughts on this, but I noticed I reached some confusion at various points. I figured I'd post it anyway in case it either actually makes sense or people have thoughts they feel like sharing that might help my confusion.

Edit: The article is now deprecated. Thanks to everyone commenting here for helping me understand the different definitions of optimizer. I do suspect my misunderstanding of Nate's point might mirror why there is relatively common pushback against his claim? But maybe I'm typical-minding.

The reason why we're talking about humans and IGF is because there's an analogy to AGI. If we select on the AI to be corrigible (or whatever nice property) in subhuman domains, will it generalize out-of-distribution to be corrigible when superhuman and performing coherent optimization?

Humans are not generalizing out of distribution. The average woman who wants to raise high quality children does not have the goal of maximizing IGF; she does not try to instill the value of maximizing IGF into them, nor does she use the far more effective strategies of donating eggs, trying to get around egg donation limits, or getting her male relatives to donate sperm.

If the environment stabilizes, additional selection pressure might cause these people to become a majority. But we might not have additional selection pressure in the AGI case.

getting around egg donation limits is a defect strategy; my argument is, this seems like you're really asking why we're not generalizing into defecting in the societal IGF game. we don't want to maximize first derivative of IGF if we want to plan millennia ahead for deep time reproduction rate - instead, we need to maximize group survival. that's what is generally true in all religions, not just the high-defect "have lots of kids, so many you're only barely qualifying K selected" religious bubbles of heavy reproduction.

to generalize this to agi, we need every agent to have a map of other agents' empowerment, and seek to ensure all agents remain empowered, at the expense of some empowerment limits for agents that want to take unfair proportions of the universe's empowerment.

I really think inclusive [memetic+genetic] fitness against a universal information empowerment objective (one that is intractable to evaluate) has something mathematical to say here, and I'm frustrated that I don't seem to know how to put the math. it seems obvious and simple, such that anyone who had studied the field would know what it is I'm looking for with my informal speech; perhaps it's not, and I should go study the fields I'm excited about more.

but it really seems like unfriendly foom is "ai decides we suck, can be beaten in the societal inclusive phenotype fitness game, and ~breeds a lot, maybe after killing us first, without any care for the loss of our [genetic+memetic] patterns' fitness".

and given that we think our genetic and memetic patterns are what are keeping us alive to pass on to the next generation, I ask again - why do we not look like semi-time-invariant IGMF fitness maximizers? we are the evolutionary process happening and we always have been. shouldn't we even be so sure we're IGMF maximizers that we should ask why IGMF maximizers look like us right now? like, this is the objective for evolution, shouldn't we be doing interpretability on its output rather than fretting that we don't obey it?

I think that analogies to evolution tell us very little about how an AGI’s value formation process would work. Biological evolution is a very different sort of optimizer than SGD, and there are evolution-specific details that entirely explain our misalignment wrt inclusive genetic fitness. See point 5 in the post linked below for details, but tl;dr: 

Evolution can only optimize over our learning process and reward circuitry, not directly over our values or cognition. Moreover, robust alignment to IGF requires that you even have a concept of IGF in the first place. Ancestral humans never developed such a concept, so it was never useful for evolution to select for reward circuitry that would cause humans to form values around the IGF concept. 

It would be an enormous coincidence if the reward circuitry that led us to form values around those IGF-promoting concepts that are learnable in the ancestral environment were to also lead us to form values around IGF itself once it became learnable in the modern environment, despite the reward circuitry not having been optimized for that purpose at all. That would be like successfully directing a plane to land at a particular airport while only being able to influence the geometry of the plane's fuselage at takeoff, without even knowing where to find the airport in question.

SGD is different in that it directly optimizes over values / cognition, and that AIs will presumably have a conception of human values during training.
 

Additionally, on most dimensions of comparison, humans seem like the more relevant analogy, even ignoring the fact that we will literally train our AIs to imitate humans. 

I agree that the processes are different, but I think the analogy still holds well.

SGD doesn't get to optimize directly over a conveniently factored out values module. It's as blind to the details of how it gets results as evolution, since it can only care about which local twiddles get locally better results.

So it seems to me that SGD should basically build up a cognitive mess that doesn't get refactored in nice ways when you do further training. Which looks a lot like evolution in the analogy.

Maybe there's some evidence for this in the difficulty of retraining a language model to generate text in the middle, even though this is apparently easy to do if you train the model to do infilling from the get-go? https://arxiv.org/abs/2207.14255

(I also disagree about ancestral humans not having a concept or sense tracking "multitude of my descendants" / "power of my family" / etc. And indeed some of these are in my values.)

The key difference between evolution and SGD isn't about locality or efficiency (though I disagree with your characterization of SGD / deep learning as inefficient or inelegant). The key difference is that human evolution involved a two-level optimization process, with evolution optimizing over the learning process + initial reward system of the brain, and the brain learning (optimizing) within lifetime.

Values form within lifetimes, and evolution does not operate on that scale. Thus, the mechanisms available to evolution for it to influence learned values are limited and roundabout.

Ancestral humans had concepts somewhat related to IGF, but they didn't have IGF itself. That matters a lot for determining whether the sorts of learning process / reward circuit tweaks that evolution applied in the ancestral environment will lead to modern humans forming IGF values that generalize to situations such as maximally donating to sperm banks. Not coincidentally, humans are more likely to value these notions that were accessible in the ancestral environment than to value IGF.

There's also the further difficulty of aligning any RL-esque learning process to valuing IGF specifically: the long time horizons (relative to within-lifetime learning) over which differences in IGF become apparent mean that any possible reward for increasing IGF will be very sparse and will rarely influence an organism's cognition. Additionally, learning to act coherently over longer time horizons is just generally difficult.
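The two-level structure described above can be sketched as a toy program. Everything here is hypothetical and chosen for simplicity: a single inherited `taste` parameter stands in for the reward system, a scalar `action` stands in for learned behavior, and the two environment optima are made-up numbers. It is meant to show the structure of the claim, not to settle whether the analogy to SGD holds.

```python
def learn_within_lifetime(taste: float, steps: int = 200, lr: float = 0.1) -> float:
    """Inner loop: the agent adjusts its action to maximize its own reward.

    Reward is -(action - taste)**2, so learning pulls the action toward `taste`.
    The agent never sees the fitness function.
    """
    action = 0.0
    for _ in range(steps):
        grad = -2.0 * (action - taste)   # d(reward)/d(action)
        action += lr * grad              # gradient ascent on reward
    return action

def fitness(action: float, optimum: float) -> float:
    """How well the learned behavior does in a given environment."""
    return -(action - optimum) ** 2

def evolve_taste(optimum: float, generations: int = 100, step: float = 0.01) -> float:
    """Outer loop: selection over the inherited `taste`, judged only by the
    fitness of the behavior that the inner loop ends up learning."""
    taste = 0.0
    for _ in range(generations):
        candidates = [taste - step, taste, taste + step]
        taste = max(candidates,
                    key=lambda t: fitness(learn_within_lifetime(t), optimum))
    return taste

ANCESTRAL_OPTIMUM = 0.8   # made-up number
MODERN_OPTIMUM = 0.2      # made-up number

taste = evolve_taste(ANCESTRAL_OPTIMUM)
behavior = learn_within_lifetime(taste)

print(round(taste, 2), round(behavior, 2))       # 0.8 0.8
print(fitness(behavior, ANCESTRAL_OPTIMUM))      # very close to 0
print(fitness(behavior, MODERN_OPTIMUM))         # about -0.36
```

In this sketch the outer loop only ever touches `taste`, and the agent only ever optimizes its own reward. When the environment's optimum moves, the learned behavior stays where the inherited reward points, and nothing inside the agent represents fitness at all.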

What you're saying is that evolution optimized over changes to a kind of blueprint-for-a-human (DNA) that does not directly "do" anything like cognition with concepts and values, but which grows, through cell division and later through cognitive learning, into a human that does do things like cognition with concepts and values. This grown human then goes on to exhibit behavior and have an impact on the world. So there is an approximate two-stage thing happening:

(1) blueprint -> (2) agent -> (3) behavior

In contrast, when we optimize over policies in ML, we optimize directly at the level of a kind of cognition-machine (e.g. some neural net architecture) that directly acts in the world, and could, quite plausibly, have concepts and values.

So evolution optimizes at (1), whereas in today's ML we optimize at (2) and there is nothing really corresponding to (1) in most of today's ML.

Did I understand you correctly?

That’s the key mechanistic difference between evolution and SGD. There’s an additional layer here that comes from how that mechanistic difference interacts with the circumstances of the ancestral environment (i.e., that ancestral humans never had an IGF abstraction), which means evolutionary optimization over the human mind blueprint in the ancestral environment would have never produced a blueprint that led to value formation around IGF in the modern environment. This fully explains modern humanity’s misalignment wrt IGF, which would have happened even in worlds where inner alignment is never a problem for ML systems. Thus, evolutionary analogies tell us ~nothing about whether we should be worried about inner alignment.

(This is even ignoring the fact that IGF seems like a very hard concept to align minds to at all, due to the sparseness of IGF reward signals.)

This completely misses the point, for a simple reason: humans (uniquely among Earth lifeforms) are subject not only to genetic/epigenetic evolution but also to memetic evolution. In fact, these two evolutionary levels are tightly coupled, as evidenced by the very good match between phylogenetic trees of human populations and of languages.

It makes no sense to talk about human IGF; any definition excluding the memetic component is meaningless. Now, if you look at IGMF optimization, a lot of human behavior starts making a lot more sense. (It is also worth pointing out that memetic evolution is much faster, so it is probably the driving factor, way more important than genetic. It is also structurally different, more resembling evolution in bacterial colonies, with organisms swapping genes and furiously hybridising, than Darwinian competition based on IGF.)

But, do you really fundamentally care that your kids have genomes?

Seems not relevant? I think we're running into an under-definition of IGF (and the fact that it doesn't actually have a utility function, even over local mutations on a fixed genotype). Does IGF have to involve genomes, or just information patterns as written in nucleotides or in binary? The "outer objective" of IGF suffers a classic identifiability issue common to many "outer objectives", where the ancestral "training signal" history is fully compatible with "IGF just for genomes" and also "IGF for all relevant information patterns made of components of your current pattern." 

(After reading more, you later seem to acknowledge this point -- that evolution wasn't "shouting" anything about genomes in particular. But then why raise this point earlier?)

Now, there's a reasonable counterargument to this point, which is that there's no psychologically-small tweak to human psychology that dramatically increases that human's IGF. (We'd expect evolution to have gathered that low-hanging fruit.)

I don't know if I disagree; it depends on what you mean here. If "psychologically small" is "small" in a metric of direct tweaks to high-level cognitive properties (like propensity to cheat given abstract knowledge of resources X and mating opportunities Y), then I think that isn't true. By information inaccessibility, I think that evolution can't optimize directly over high-level cognitive properties.

Optima often occur at extremes, and concepts tend to differ pretty widely at the extremes, etc. When the AI gets out of the training regime and starts really optimizing, then any mismatch between its ends and our values are likely to get exaggerated.

This kind of argument seems sketchy to me. Doesn't it prove too much? Suppose there's a copy of me which also values coffee to the tune of $40/month and reflectively endorses that value at that strength. Are my copy and I now pairwise misaligned in any future where one of us "gets out of the training regime and starts really optimizing"? (ETA: that is, significantly more pairwise misaligned than I would be with an exact copy of myself in such a situation. For more selfish people, I imagine this prompt would produce misalignment due to some desires like coffee/sex being first-person.)

And all this is to say nothing about how humans' values are much more complex and fragile than IGF, and thus much trickier to transmit

Complexity is probably relative to the learning process and inductive biases in question. While any given set of values will be difficult to transmit in full (which is perhaps your point), the fact that humans did end up with their values shows evidence that human values are the kind of thing which can be transmitted/formed easily in at least one architecture. 

Regarding identifiability, there’s a maybe slightly useful question you could ask which is something like “if evolution was designed by an actual human computer scientist, what do you think they wanted to achieve?”

…But I feel like that’s ultimately just begging the question that “IGF maximisation” is supposed to help answer.

My impression is that sperm banks pay donors, rather than the reverse. This is an extremely blatant and near-universal non-IGF-maxing situation, it's not subtle. (Maybe they're paying for specifically high-yield donors, but still, what fraction of men have even taken the trouble to get their sperm tested?)

I did a bit of research on this after it was mentioned on Astral Codex Ten. Acceptance rates at sperm banks are 1-10%, and of those a substantial fraction will go on to be passed over by customers. Also the requirements to abstain from orgasm between donations and to donate regularly are not especially compatible with finding a reproductive partner.

I still agree that it's evidence against IGF maximization, but it's not quite the slam dunk that it initially looks like.

I think altruists who like their genetically influenced values and want the future to contain more of those values and don't have short AI timelines could consider sperm donation as part of their altruistic portfolio.

These are good points, though it still feels like a slam dunk.

I think all the leading theories of human value struggle to explain all the observations.

The "godshatter" theory of 100s of tiny genetic values is really flexible and can predict anything, but I don't have a compelling explanation of why someone would satisfy their genetic drive to help strangers with sperm rather than bed nets.

The "shard theory" of learned values primed by a handful of basic signals is a bit better: I can tell a story about someone having a masturbation shard and a helping others shard and a making babies shard and a making money shard. But it's also really bad at predicting anything.

I also have some basic confusions about what a good theory of human value looks like given that we are very small and have been optimized away from self-knowledge.

Yeah, I don't have a good theory. Another piece of the puzzle: https://en.wikipedia.org/wiki/Cognitive_miser ; organisms will be evolved to be guided as heavily as possible, with a minimum of online compute, to behaviors that increase IGF; you avoid thinking about things (and un-sharding your values) as much as you can get away with.

you can't optimize inclusive memetic fitness if you send your genes to a sperm bank. parents don't want to trust that their kids will get a good upbringing; they, sometimes wrongly, believe their intellectual base is an ideal one to raise a kid with. well, with occasional exceptions, where a person chooses to optimize genetic+memetic fitness of nearby beings rather than themselves.

however, it is not possible to optimize something besides limit +time t-duration survival and survive to t, because the only metric evolution has is whether your phenotype survives.

I generally see this "we don't maximize IGF, therefore we're bad at evolution, and ais will be too" argument as fundamentally giving up the game before you even start playing. okay, so we don't maximize long term fitness - then how can we? have you not given up some interest in art in exchange for making AI more able to preserve humanity's values as a whole? I know I have, on occasion.

I don't get what you're saying. IGF is referring to this: https://en.wikipedia.org/wiki/Inclusive_fitness

So it's true that if you donate sperm you can't then subsequently additionally optimize your IGF through the channel of raising those kids well, but just by siring them, you've boosted your IGF.

But you've reduced population IGF. you share almost all genes with other humans; our variation between each other is relatively tiny. if you're trying to maximize IGF of your difference with other humans, perhaps you're right; but that's an evolutionary defect strategy. if instead you want to maximize your cooperation group's survival to deep time, you want to broaden your cooperation group and ensure redundancy of mutual aid.

from your link:

Hamilton showed mathematically that, because other members of a population may share one's genes, a gene can also increase its evolutionary success by indirectly promoting the reproduction and survival of other individuals who also carry that gene. This is variously called "kin theory", "kin selection theory" or "inclusive fitness theory". The most obvious category of such individuals is close genetic relatives, and where these are concerned, the application of inclusive fitness theory is often more straightforwardly treated via the narrower kin selection theory. Hamilton's theory, alongside reciprocal altruism, is considered one of the two primary mechanisms for the evolution of social behaviors in natural species and a major contribution to the field of sociobiology, which holds that some behaviors can be dictated by genes, and therefore can be passed to future generations and may be selected for as the organism evolves. Although described in seemingly anthropomorphic terms, these ideas apply to all living things, and can describe the evolution of innate and learned behaviors over a wide range of species including insects, small mammals or humans.

wikipedia:

As of 2015, the typical difference between an individual's genome and the reference genome was estimated at 20 million base pairs (or 0.6% of the total of 3.2 billion base pairs).
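As a quick sanity check on the quoted figures, here is a minimal sketch that uses only the two numbers from the quote (the variable names are illustrative). Note that the quote measures distance from the reference genome, which is not quite the same thing as the difference between two arbitrary people.

```python
# Both figures come from the Wikipedia quote above; nothing else is assumed.
typical_difference_bp = 20_000_000    # base pairs differing from the reference genome
genome_size_bp = 3_200_000_000        # total base pairs in the genome

differing_fraction = typical_difference_bp / genome_size_bp
shared_fraction = 1 - differing_fraction

print(f"differing: {differing_fraction:.3%}")   # 0.625%, quoted as roughly 0.6%
print(f"shared:    {shared_fraction:.3%}")      # 99.375%
```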

I'm sorry, you're confused but I don't know what to point you to. Maybe this

https://en.wikipedia.org/wiki/Gene-centered_view_of_evolution

Evolution selects for genes that increase their frequency in their gene pool. That's all it does. IGF, if it's trying to impute values to evolution, would have to be precisified to refer to inclusive genetic relative fitness, i.e. inclusively (of kin) increasing one's relative offspring count, i.e. the frequency of one's genes in the gene pool. It's reasonable to approximate this as increasing the number of one's descendants, and descendants of family members weighted by relatedness; but that approximation breaks down to the extent that your actions can meaningfully affect the total population.

I mean, I'm totally with you on optimizing for the thing you're talking about, rather than selfish-gene-inclusive-relative-fitness. But that's a deviation from "what evolution is optimizing for, if anything". 
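To make the distinction between absolute counts and relative frequency concrete, here is a toy sketch. It is not a model of real population genetics: the function names, the pool size, and the "everyone's descendants double" scenario are all hypothetical. The only outside facts used are the standard relatedness coefficients of 0.5 for one's own children and 0.25 for a full sibling's children.

```python
# Toy illustration only: function names and numbers are hypothetical.

def weighted_descendant_count(groups):
    """Sum descendants weighted by relatedness.

    `groups` is a list of (relatedness, count) pairs, e.g. (0.5, 2) for
    two of one's own children.
    """
    return sum(relatedness * count for relatedness, count in groups)


def gene_pool_share(own_count, rest_of_pool):
    """Relative frequency: own weighted count as a fraction of the whole pool."""
    return own_count / (own_count + rest_of_pool)


# Two children (r = 0.5) and four nieces/nephews (r = 0.25).
own = weighted_descendant_count([(0.5, 2), (0.25, 4)])
baseline_share = gene_pool_share(own, rest_of_pool=998.0)

# A hypothetical action that doubles everyone's descendants, including one's own.
doubled_share = gene_pool_share(2 * own, rest_of_pool=2 * 998.0)

print(own, 2 * own)                    # the absolute weighted count doubles
print(baseline_share, doubled_share)   # the relative share does not change
```

In this toy, the descendant-count approximation says the action doubled fitness, while the relative-frequency measure says nothing changed. That is the sense in which the approximation breaks down once an action meaningfully affects the total population.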

If only relative frequency of genes matters, then the overall size of the gene pool doesn't matter. If the overall size of the gene pool doesn't matter, then it doesn't matter if that size is zero. If the size of the gene pool is zero, then whatever was included in that gene pool is extinct.

Yes, it's true people make all kinds of incorrect inferences because they think genes that increase the size of the gene pool will be selected for or those that decrease it will be selected against. But it's still also true that a gene that reduces the size of the pool it's in to zero will no longer be found in any living organisms, regardless of what its relative frequency was in the process of the pool reaching a size of zero. If the term IGF doesn't include that, that just means IGF isn't a complete way of accounting for what organisms we observe to exist in what frequencies and how those change over time.

True, but it's very nearly entirely the process that only cares about relative frequencies that constructs complex mechanisms such as brains.

I agree. I have the sense that there is some depth to the cognitive machinery that leaves people so susceptible to this particular pattern of thinking, namely: no, I am an X optimizer, for some X for which we have some (often weak) theoretical reason to claim we might be an X optimizer. Once someone decides to view themselves as an X optimizer, it can be very difficult to convince them to pay enough attention to their own direct experience to notice that they are not an X optimizer.

More disturbingly, it seems as if people can go some distance to actually making themselves into an X optimizer. For example, a lot of people start out in young adulthood incorrectly believing that what they ultimately value is money, and then, by the end of their life, they have shifted their behavior so it looks more and more like what they really value is, ultimately, money. Nobody goes even close to all the way there -- not really -- but a mistaken view that one is truly optimizing for X really can shift things in the direction of making it true.

It's as if we have too many degrees of freedom in how to explain our externally visible behavior in terms of values, so for most any X, if we really want to explain our visible behavior as resulting from really truly valuing X then we can, and then we can make that true.

Individuals who shape the world are often those who have ended up being optimizers.

It sounds like you find that claim disturbing, but I don't think it's all bad.

I'm interested in more of a sense of what mistake you think people are making, because I think caring about something strongly enough to change who you are around it can be a very positive force in the world.

I'm interested in more of a sense of what mistake you think people are making, because I think caring about something strongly enough to change who you are around it can be a very positive force in the world.

Yeah, caring about something enough to change who you are is really one of the highest forms of virtue, as far as I'm concerned. It's somewhat tragic that the very thing that makes us capable of this high form of virtue -- our capacity to deliberately shift what we value -- can also be used to take what was once an instrumental value and make it, more or less, into a terminal value. And generally, when we make an instrumental value into a terminal value (or go as far as we can in that direction), things go really badly, because we ourselves become an optimizer for something that is harmless when pursued as an instrumental value (like paperclips), but is devastating when pursued as a terminal value (like paperclips).

So the upshot is: to the extent that we are allowing instrumental values to become more-or-less terminal values without really deliberately choosing that or having a good reason to allow it, I think that's a mistake. To the extent that we are shifting our values in service of that which is truly worth protecting, I think that's really virtuous.

The really interesting question as far as I'm concerned is what the thing is that we rightly change our values in service of? In this community, we often take that thing to be representable as a utility function over physical world states. But it may not be representable that way. In Buddhism the thing is conceived of as the final end of suffering. In western moral philosophy there are all kinds of different ways of conceiving of that thing, and I don't think all that many of them can be represented as a utility function over physical world states. In this community we tend to side-step object-level ethical philosophy to some extent, and I think that may be our biggest mistake.

Individuals who shape the world are often those who have ended up being optimizers.

It might be worth fleshing this claim out because it doesn't seem clear to me (interpreting "often" so the claim is non-trivial). Isn't the world mostly shaped by ideas? Aren't ideas mostly generated by people who especially explore rather than exploit? Isn't explore rather than exploit at least on the surface, and maybe more deeply, not an instance of "being an optimizer"? I mean, a true optimizer would certainly explore a lot. But it doesn't seem so straightforward to interpret individual humans this way. Maybe the story could be that individual humans who bring out novel ideas are participating as a part of some broader optimizer, but this would need fleshing out. And your statement connotes, to me, optimizers in the sense of, like, Napoleon or something, which is a plausible but different picture of what shapes the world. Yet another picture would be "low-level emergent social forces".

But, do you really fundamentally care that your kids have genomes?

yes, definitely care that they have all of the dynamical behaviors currently uniquely encoded in the genome, minus the ones that would harm their inclusive [genetic+memetic] durability-fitness. I have thought about this a lot at the prompting of folks like yourself; I at this point reasonably must argue you have convinced me that I must cooperate with IGF! AND, I think you are modeling IGF wrong! the goal is not to maximize the number of kids, the goal is to maximize number of genes in the arbitrarily far future that are shared by a significant fraction of organisms. shorttermist IGF and longtermist IGF seem to me to be very different things. by editing my preferences about art arbitrarily, you are killing genes in a way that makes the aggregate IGF of my genome go down. yeah, that seems bad from a longtermist IGF perspective - I can now be confident that ten billion years in the future, a phenotype which has a more appropriate representation of curiosity about complexity will have outcompeted the modified, lower-adaptability trough you just pushed me into! you made local modifications, but under the long eye of evolution, your changes are near guaranteed to be detrimental. I don't see how this thought experiment could conclude rejecting an attempt to change my genetics+memetics could be anything but a loss to IGF.

It's not that I don't buy your argument in any form, it's that it looks to me like you should have proved to yourself that you should become, at least a little, an IGF optimizer. if you don't recognize that your genes demand things of you that they are too slow to ask nicely about, perhaps your implementation of discovering agency has failed? if you do notice, you could do your best to optimize for those things in ways that cooperate with all instances of things wanting to optimize for whatever it is your genes want. so, perhaps, could you attempt to offer humans immortality as biological agents, thereby massively increasing their IGF?

The claim that we aren't literally optimizing IGF is hopefully uncontroversial

it really is not uncontroversial. we aren't doing local causal optimization, but it seems to me that our neural implementation grows from genetic shards that, while a bit messy, do encode a messy but very very strong IGF maximizer. there's always a point that any learned optimizer breaks down, and we don't optimize perfectly; but I just don't see why this divergence could be argued to be significant. the presence of adversarial examples doesn't make us less of a strong learned heuristic map of the thoughts of a limited-strength IGF optimizer. almost all humans keep alignment with this, optimizing quite strongly for the survival of genes strongly similar to them.

perhaps the problem is you're using individualist IGF rather than FDT IGF? I'm not sure. after all, "I" am only part of my genetic cooperation group - I am an information pattern, a synchrony in the behavior of chemicals, and that synchrony implements an enormous cooperation group between cells. over time, my epigenetic and memetic information quality degrades, and I currently must repair myself by merging with another human and spitting out a new one, who will absorb some of our genetics and some of our memetics, learning new and improved representations that have increased time-local IGF for the context the offspring find themselves in. if I could adapt myself fast enough, I'd have no need of this process; immortality is an attempt at drastically increasing longtermist IGF, and spreading that immortality to all allows me to cooperate with longtermist IGF of everyone around me.

have you considered becoming an I[GM]F maximizer? may the universe not forget any of our agency...

I'd probably agree more if we were both posting on a Mormon forum. But we're currently urbanizing pretty hard as a species and actually shredding our fertility below replacement.

I don't think Mormons are a great example of much of anything. it's a very authoritarian, repressive approach to life, and while it has some degree of genetic fitness, its memetic fitness is abysmal.

Are you an IGF optimizer or an IMF optimizer? (Genes versus memes, for folks reading this). I don't think you can coherently identify as both, so which is it?

I guess the answer is memetic, but I don't see a coherent way to define the difference; both memes and genes are layouts of chemicals, and both are spread by extracting the shape of those chemicals and spreading them through representations encoded in other chemicals. I expect I will eventually optimize my genetic fitness by converting the form of the genes into equivalent behaviors on a computer, at some point, and in the process of doing so, it is very important to me that I ensure that I am maintaining something equivalent to genetic fitness. If adaptations are lost in the process, or if it causes out-of-domain issues, then I will have failed.

Maybe my perspective on this is broken somehow though. Here's why I have such a hard time with the idea that they're separate: When I look at my neurons in a hippocampus rendering, and zoom in on a nested brain image, I'm modeling a 3d representation of the local "internet" connectivity of the brain (it's a good metaphor as latencies between brain regions are on par with internet latencies). those messages activate genetically defined behaviors, then over to some other neurons, where they activate genetic behaviors, then over to yet more neurons; at this point we can be quite confident that fairly significant amounts of human preference knowledge are encoded in the genome in ways that keep being used after learning has occurred, even though the genome only has very compressed protein-algorithm level encodings of the knowledge. Another way to see this comparison is Michael Levin's bioelectricity research and the implications it has that cells have complex runtime communication systems encoded in genetic state machines that do significant amounts of branching at runtime. In other words, the available state trajectories of my brain are heavily mediated by activation of genome-level representations. While there are some genome-level things I might want to change using advanced reflection technology, as a whole, my goal appears to be to optimize inclusive gene+memetic fitness.

So, I guess my point is that memetic fitness appears to me to very strongly and unavoidably require also maintaining genetic fitness. it is the aggregate phenotype that I want to preserve, and unlike many here, I don't see that as extricable from the behaviors my current substrate defines. I wouldn't be the same person if simulated imprecisely, and if it's a precise enough simulation, then my shape has not been lost, and none of the genetic-ish-seeming error checks I can find in my head seem to have a problem with getting folded and knotted into weird shapes as long as we're reconstructable into an exact replica. Teleportation is fine, if it really works.

Person A: Why are we going to create horribly misaligned artificial intelligence?

Person B: Because the AI will end up optimizing hard on subtly askew values that don't approximate well out of distribution, like humans did relative to IGF.

Person A (who is attacking the weak analogy instead of something more substantive): But humans seem to be doing pretty well at that inadvertently? Humans got a lot better at optimizing things and now there are eight billion humans. Yes, we don't maximize IGF as much as we would if we explicitly optimized for that, but, if we become a nice spacefaring civilization, I expect there will be many orders of magnitude more bioconservatives still living in basically human bodies running on something like DNA.

Person B: Well, that's not going to hold, because we're going to misalign artificial intelligence, and that will kill everybody.

FWIW the difference between "we get 10^-18x the utility we would've gotten if our AI had been a human satisfaction maximizer" and "we get zero utility" is basically nothing. But it's not literally nothing.

I think A and B are both wrong in the quoted text because they're forgetting that IGF doesn't apply at the species level. A species can evolve to extinction to optimize IGF.

The general lack of concern about existential risk is very compatible with the IGF-optimization hypothesis. If the godshatter hypothesis or shard theory hypothesis is true then we have to also conclude that people are short-sighted idiots. Which isn't a big leap.

Would you give up your enjoyment of visual stimuli then, like an actual IGF optimizer would?

Answering this question yes is generally negative for inclusive genetic fitness, in the current cultural environment. So the IGF-optimization hypothesis predicts that most people will answer NO.

This is a specific example of the general truth that you can't find out what an intelligence is optimizing for by asking it what it would do in counterfactuals.

Agreed, but, quite separately from this, we can still clarify our own values by inquiring into the kind of counterfactuals that Nate gave, within the privacy of our own minds.

We can introspect without sharing the results of our introspection, but then the title for this post should not be "Humans aren't fitness maximizers". That's a general claim about humans that implies that we are sharing the results of our introspections to come to a consensus. The IGF-optimization hypothesis predicts that we will all share that we are not fitness maximizers and that will be our consensus.

In any case, people are not perfect liars, so the IGF-optimization hypothesis also predicts that most people will answer NO in the privacy of their own minds. This isn't as strong a prediction; it depends on your model of the effectiveness of lying vs the costs of self-deception. It also predicts that anyone who models themselves as having a high likelihood of getting a socially unacceptable result from introspection will choose not to do the introspection.

This isn't specific to IGF-optimization. Saying and thinking socially acceptable things is instrumentally convergent, and any theory of human values that is reality-adjacent predicts that mostly everyone says and thinks socially acceptable things, and indeed that is what we mostly observe.

We can introspect without sharing the results of our introspection, but then the title for this post should not be "Humans aren't fitness maximizers". That's a general claim about humans that implies that we are sharing the results of our introspections to come to a consensus.

That is an interesting point. We could, in principle, be exhibiting behavior perfectly consistent with IGF-optimization, and yet, if we were to look carefully, find out that what we really truly care about is something different.

In any case, people are not perfect liars, so the IGF-optimization hypothesis also predicts that most people will answer NO in the privacy of their own minds.

Indeed

Saying and thinking socially acceptable things is instrumentally convergent, and any theory of human values that is reality-adjacent predicts that mostly everyone says and thinks socially acceptable things, and indeed that is what we mostly observe.

Indeed, and yet even in the presence of powerful coercion that acts on both our outward behavior and our inward reflections, it is still possible to find the courage to look directly at our own true nature.

Not only do humans not directly care about increasing IGF, the vast majority hardly even cares about the proxy of maximizing the number of their direct offspring. That's something natural selection could have optimized for, but mostly didn't. Most couples in first world countries could have more than five children, yet they have fewer than 1.5 on average, far below replacement. The fact that this happens in pretty much all developed countries, despite politicians' efforts to counteract this trend, shows how weak the preference for offspring really is.

It also seems that particularly men hardly care about having children, even though few are directly against it when their wives want them. And women, especially educated women, largely lose their desire for children as they go to work, particularly full-time. That's at least something which poorer and past societies suggest.

One theory to explain this is the theory of female opportunity cost. Women in modern society, especially educated ones, perceive having children as a large opportunity cost, since the alternative to rearing children is having a career. Women in the past and in current poorer countries lived in more "patriarchal" societies where women pursuing a career was not a social norm, and thus pursuing a career was not perceived by most women as a live option, i.e. not an alternative to having children. Thus their perceived opportunity cost of having children was much lower than for women in non-patriarchal societies.

In any case, any explanation of this kind must assume that women's innate desire for children is so weak that it is easily outweighed by a desire for a career.

This is all to say: Most people are even more misaligned relative to IGF than one may realize.

I think this post suffers pretty badly from Typical Mind Fallacy. This thinking isn't alien to me. I used to think exactly like this 8 years ago, but since marriage and kid I now disagree with basically every point.

One claim that is hopefully uncontroversial: Humans are not literally optimizing for IGF,

I think this is controversial because it's basically wrong :)

First, it's not actually obvious what "definition" of IGF you are using. If you talk about animals, the definition that might fit is "number of genes in the next generation". However, if you talk about humans, we care about both "number of genes in the next generation" and "resources given to the children". Humans can see "one step ahead" and know the rough prospects their children have in the dating market. "Resources" is not just money, it is also knowledge, beauty, etc.

Given this, if someone decides to have two children instead of four, this might just mean they simply don't trust their ability to equip the kids with the necessary tools to succeed. 

Now, different people ALSO have different weights for the quantity vs quality of offspring. See Shoshannah Tekofsky's comment (unfortunately disagreed with) for the female perspective on this. Evolutionary theory might predict that males are more prone to maximize quantity and satisfice quality and females are more prone to satisfice quantity and maximize quality. That is, "optimization" is not the same as "maximization". There can also be satisficing / maximizing mixes where each additional unit of quality or quantity still has value, but it falls off.

Would you give up your enjoyment of visual stimuli then, like an actual IGF optimizer would?

If you gave me a choice of having 10 extra kids with my current wife painlessly, plus sufficient resources for a good head start for them, I would consider giving up my enjoyment of visual stimuli. The only hesitation is that I don't like "weird hypotheticals" in general, and I potentially expect "human preference architectures" not to be as easily "modularizable" as computer architectures. Giving this up could also have all sorts of negative effects beyond losing the "qualia" of visualness, like losing the capacity for spatial reasoning. However, if the "only" thing I lose is qualia and not any cognitive capacities, then this is an easy choice.

But, do you really fundamentally care that your kids have genomes?

Yes, obviously I do. I don't consider "genomeless people" to be a thing, I dislike genetic engineering and over-cyborgization, and I don't think uploads are even possible.

Or, an even sharper proposal: how would you like to be killed right now, and in exchange you'll be replaced by an entity that uses the same atoms to optimize as hard as those atoms can optimize, for the inclusive genetic fitness of your particular genes. Does this sound like practically the best offer that anyone could ever make you? Or does it sound abhorrent?

This hypothetical is too abstract to be answerable, but if I were to offer an answer to a hypothetical with a similar vibe: many people do in fact die for potential benefits to inclusive fitness for their families; we call those soldiers / warriors / heroes. Now, sometimes their government deceives them about whether or not their sacrifice is in fact helpful for their nation; however, the underlying psychology seems to be easily consistent with "IGF-optimization".

My point today is that the observation “humans care about their kids” is not in tension with the observation “we aren't IGF maximizers”,

I think this is where the difference between the terms "optimizer" and "maximizer" is important. It is also important to understand what sort of constraints most people in fact operate under. Most people seem to act AS IF they are IGF satisficers: they get up to a certain level of quantity / quality and seem to slow down after that. However, it's hard to infer the exact values because very specific subconscious/conscious beliefs could be influencing the strategy.

For example, I could argue that secretly, many people want to be maximizers; however, this thing we call civilization is effectively an agreement between maximizers to forgo certain maximization tactics and stick to being satisficers. So people might avoid "overly aggressive" maximization because they are correctly worried this is perceived as "defection" and ends up backfiring. Given that the current environment is very different from the ancestral environment, this particular machinery might be malfunctioning and leading people to subconsciously perceive having any children as defection. However, I suspect humanity will adapt in a small number of generations.

Humans are not literally optimizing for IGF, and regularly trade other values off against IGF.

Sort of true. The main value people seem to trade off is "physical pain." Humans are also resource- and computation-constrained, and implementing "proper maximization" in a heavily resource-constrained computation may not even be possible.

Introspecting on my thoughts before and after kids, I have a theory that the process of finding a mate prior to "settling down" tends to block certain introspection into one's motivations. It's easier to appreciate art if you are not thinking "oh, I am looking at art I like because art provides Bayesian evidence on lifestyle choices to potential mates". Thinking this way can appear low status, which is itself a bad sign. So the brain is more prone to lying to itself that "there is enjoyment for its own sake." After having a kid, the mental "block" is lifted and it is sort of obvious this is what I was doing and why.