All of geoffreymiller's Comments + Replies

Fair enough. Thanks for replying. It's helpful to have a little more background on Ben. (I might write more, but I'm busy with a newborn baby here...)

Jim - I didn't claim that libel law solves all problems in holding people to higher epistemic standards.

Often, it can be helpful just to incentivize avoiding the most egregious forms of lying and bias -- e.g. punishing situations when 'the writer had actual knowledge that the claims were false, or was completely indifferent to whether they were true or false'. 

Rob - you claim 'it's very obvious that Ben is neither deliberately asserting falsehoods, nor publishing "with reckless disregard"'.

Why do you think that's obvious? We don't know the facts of the matter. We don't know what information he gathered. We don't know the contents of the interviews he did. As far as we can tell, there was no independent editing, fact-checking, or oversight in this writing process. He's just a guy who hasn't been trained as an investigative journalist, who did some investigative journalism-type research, and wrote it up.

Number of h...

Why do you think that's obvious?

I know Ben, I've conversed with him a number of times in the past and seen lots of his LW comments, and I have a very strong and confident sense of his priorities and values. I also read the post, which "shows its work" to such a degree that Ben would need to be unusually evil and deceptive in order for this post to be an act of deception.

I don't have any private knowledge about Nonlinear or about Ben's investigation, but I'm happy to vouch for Ben, such that if he turns out to have been lying, I ought to take a credibility ...

(Note: this was cross-posted to the EA Forum here; I've corrected a couple of minor typos and swapped out 'EA Forum' for 'LessWrong' where appropriate.)

A note on LessWrong posts as (amateur) investigative journalism:

When passions are running high, it can be helpful to take a step back and assess what's going on here a little more objectively.

There are all different kinds of LessWrong posts that we evaluate using different criteria. Some posts announce new funding opportunities; we evaluate these in terms of brevity, clarity, relevance, and useful ...

A brief note on defamation law:

The whole point of having laws against defamation, whether libel (written defamation) or slander (spoken defamation), is to hold people to higher epistemic standards when they communicate very negative things about people or organizations -- especially negative things that would stick in the readers'/listeners' minds in ways that would be very hard for subsequent corrections or clarifications to counteract.

Without making any comment about the accuracy or inaccuracy of this post, I would just point out that nobody in EA should ...

(Copying my response from the EA Forum)

I agree there are some circumstances under which libel suits are justified, but the net effect of the availability of libel suits strikes me as extremely negative for communities like ours, and I think it's very reasonable to have very strong norms against threatening or going through with these kinds of suits. Just because an option is legally available doesn't mean that a community has to be fine with that option being pursued.

That is the whole point and function of defamation law: to promote especially high standa...

What you described was perhaps the intent behind the law, but that's not necessarily how it is used in practice. You can use the law to intimidate people who have less money than you, simply by giving the money to a lawyer... and then the other side needs to spend about the same money on their lawyer... or risk losing the case. "The process is the punishment."

(I have recently contributed money to a defense fund of a woman who exposed a certain criminal organization in my country. The organization was disbanded, a few members were convicted, one of them end...

The whole point of having laws against defamation, whether libel (written defamation) or slander (spoken defamation), is to hold people to higher epistemic standards when they communicate very negative things about people or organizations

This might be true of some other country's laws against defamation, but it is not true of defamation law in the US. Under US law, merely being wrong, sloppy, and bad at reasoning would not be sufficient to make something count as defamation; it only counts if the writer had actual knowledge that the claims were false, or was completely indifferent to whether they were true or false.

Without making any comment about the accuracy or inaccuracy of this post, I would just point out that nobody in EA should be shocked that an organization (e.g. Nonlinear) that is being libeled (in its view) would threaten a libel suit to deter the false accusations (as they see them), to nudge the author (e.g. Ben Pace) towards making sure that their negative claims are factually correct and contextually fair.

Wikipedia claims: "The 1964 case New York Times Co. v. Sullivan, however, radically changed the nature of libel law in the United States by esta...

The 'whole point of libel suits' is to weaponize the expensive brokenness of the legal system to punish people for saying mean things about you.

Gordon - I was also puzzled by the initial downvotes. But they happened so quickly that I figured the downvoters hadn't actually read or digested my essay. Disappointing that this happens on LessWrong, but here we are.

Max - I think your observations are right. The 'normies', once they understand AI extinction risk, tend to have much clearer, more decisive, more negative moral reactions to AI than many EAs, rationalists, and technophiles tend to have. (We've been conditioned by our EA/Rat subcultures to think we need to 'play nice' with the AI industry, no matter how sociopathic it proves to be.)

Whether a moral anti-AI backlash can actually slow AI progress is the Big Question. I think so, but my confidence interval on this issue is pretty wide. As an evolutionary...

Maybe. But at the moment, the US is really the only significant actor in the AGI development space. Other nations are reacting in various ways, ranging from curious concern to geopolitical horror. But if we want to minimize the risk of nation-state AI arms races, the burden is on the US companies to Just Stop Unilaterally Driving The Arms Race.

I'm predicting that an anti-AI backlash is likely, given human moral psychology and the likely applications of AI over the next few years.

In further essays I'm working on, I'll probably end up arguing that an anti-AI backlash may be a good strategy for reducing AI extinction risk -- probably much faster, more effective, and more globally applicable than any formal regulatory regime or AI safety tactics that the AI industry is willing to adopt.

Well, the AI industry and the pro-AI accelerationists believe that there is an 'immense upside of AGI', but that is a highly speculative, faith-based claim, IMHO. (The case for narrow AI having clear upsides is much stronger, I think.)

It's worth noting that almost every R&D field that has been morally stigmatized -- such as intelligence research, evolutionary psychology, and behavior genetics -- also offered huge and transformative upsides to society, when the field first developed. Until they got crushed by political demonization, and their potential ...

O O
I think as capabilities increase at least one nation will view developing safe AGI as a requirement in their national security strategy.

I don't think so. My friend Peter Todd's email addresses typically include his middle initial 'm'.


mwatkins - thanks for a fascinating, detailed post. 

This is all very weird and concerning. As it happens, my best friend since grad school is Peter Todd, professor of cognitive science, psychology, & informatics at Indiana University. We used to publish a fair amount on neural networks and genetic algorithms back in the 90s.

Interesting. Does he have any email addresses or usernames on any platform that involve the string "petertodd"?

That's somewhat helpful. 

I think we're coming at this issue from different angles -- I'm taking a very evolutionary-functional view focused on what selection pressures shape psychological adaptations, what environmental information those adaptations need to track (e.g. snake! or pathogen!), what they need to represent about the world (e.g. imminent danger of death from threat X!), and what behaviors they need to trigger (e.g. run away!). 

From that evolutionary-functional view, the 'high-level cognitive properties' of 'fitness affordances' are the...

Seems like we broadly agree on most points here, AFAICT. Thanks again for your engagement. :)

This evidence shows that evolution is somehow able to adapt to relevant affordances, but doesn't (to my eye) discriminate strongly between worlds where that influence is or isn't mediated by selection on high-level cognitive properties.

For example, how strongly do these observations discriminate between worlds where evolution did or didn't have the ability to directly select adaptations over high-level cognitive properties (like "afraid of death in the abstract")? Would we notice the difference between those worlds? What amount of affordance-tailoring would we expect in worlds where evolution was able to perform such selection, compared to worlds where it wasn't?

It seems to me that we wouldn't notice the difference. There are many dimensions of affordance-tailoring, and it's harder to see affordances that weren't successfully selected for.

For a totally made-up and naive but illustrative example: if adult frogs reliably generalize to model that a certain kind of undercurrent is dangerous (i.e. leads to predicted death), but that undercurrent doesn't leave sensory-definable signs, evolution might not have been able to select frogs to avoid that particular kind of undercurrent, even though the frogs model the undercurrent in their world model. If the undercurrent decreases fitness by enough, perhaps frogs are selected to be averse towards necessary conditions for waters having those undercurrents; maybe those are sensory-definable (or otherwise definable in terms of e.g. cortisol predictions).

But we might just see a frog which is selected for a huge range of other affordances, and not consider that evolution failed with the undercurrent-affordance. (The important point here doesn't have to do with frogs, and I expect it to stand even if the example is biologically naive.)

If we're dead-serious about infohazards, we can't just be thinking in terms of 'information that might accidentally become known to others through naive LessWrong newbies sharing it on Twitter'. 

Rather, we need to be thinking in terms of 'how could we actually prevent the military intelligence analysts of rival superpowers from being able to access this information?'

My personal hunch is that there are very few ways we could set up sites, security protocols, and vetting methods that would be sufficient to prevent access by a determined government. Whic...

Bluntly: if you write it on LessWrong or the Alignment Forum, or send it to a particular known person, governments will get a copy if they care to. Cybersecurity against state actors is really, really, really hard. LessWrong is not capable of state-level cyberdefense.

If you must write it at all: do so with hardware which has been rendered physically unable to connect to the internet, and distribute only on paper, discussing only in areas without microphones. Consider authoring only on paper in the first place. Note that physical compromise of your home, w...

If we're nowhere close to solving alignment well enough that even a coarse-grained description of actual human values is relevant yet, then I don't understand why anyone is advocating further AI research at this point. 

Also, 'avoiding deceptive alignment' doesn't really mean anything if we don't have a relatively rich and detailed description of what 'authentic alignment' with human values would look like. 

I'm truly puzzled by the resistance that the AI alignment community has against learning a bit more about the human values we're allegedly aligning with. 

Mostly because ambitious value learning is really fucking hard, and this proposal falls into all the problems that ambitious or narrow value learning has. You're right though that AI capabilities will need to slow down, and I am not hopeful here.

GeneSmith -- I guess I'm still puzzled about how Shard Theory prevents wireheading (broadly construed); I just don't see it as a magic bullet that can keep agents focused on their ultimate goals. I must be missing something.

And, insofar as Shard Theory is supposed to be an empirically accurate description of human agents, it would need to explain why some people become fentanyl addicts who might eventually overdose, and others don't. Or why some people pursue credentials and careers at the cost of staying childless... while others settle down young, have s...

Akash -- this is very helpful; thanks for compiling it!

I'm struck that much of the advice for newbies interested in 'AI alignment with human values' is focused very heavily on the 'AI' side of alignment, and not on the 'human values' side of alignment -- despite the fact that many behavioral and social sciences have been studying human values for many decades.

It might be helpful to expand lists like these to include recommended papers, books, blogs, videos, etc. that can help alignment newbies develop a more sophisticated understanding of the human psycholo...

Primarily because right now, we're not even close to that goal. We're trying to figure out how to avoid deceptive alignment right now.

GeneSmith -- when people in AI alignment or LessWrong talk about 'wireheading', I understood that not to refer to people literally asking neurosurgeons to stick wires into their brains, but rather to a somewhat larger class of ways to hack one's own reward systems through the usual perceptual input channels.

I agree that humans are not 'reward maximizing agents', whatever that is supposed to mean in reference to actual evolved organisms with diverse, heterogeneous, & domain-specific motivational systems.

I don't think I explained my thinking clearly enough. If we use 'wireheading' to refer to the broader class of actions that increase reward at the cost of maximizing reproductive fitness, I agree humans in general do wirehead to some degree. But even if we count taking recreational drugs or masturbation as wireheading, I still don't believe any other theory of values explains the relative lack of these behaviors as well as shard theory does. If humans were truly reward maximizers, it's difficult to imagine how they would manage to avoid wireheading as well as they do.

I suppose perhaps the "thousand genetic hacks" theory might be able to explain it if evolution was clever enough? There's certainly some evidence that when humans are exposed to new sources of reward that do nothing to benefit reproductive fitness, it's often a disaster. See the numerous cases of hunter-gatherer peoples being exposed to alcohol for the first time.

But again... think about the actual wireheading example. There must be millions of humans who know about wireheading, yet so far as I know there are zero examples of people doing so recreationally. There was nothing similar to wireheading in the ancestral environment. Yet nearly everyone seems aversive to the idea of literally wireheading themselves. Why? Humans can anticipate how incredibly rewarding it would be to wirehead. And many humans could afford full-time caretakers to ensure they would be able to experience the rewards for years or decades. So why are people aversive to the idea?

My interpretation is that humans develop their initial shards of value during childhood in an environment that usually contains limited opportunities to wirehead. As the human world model becomes generally filled in, it becomes the case that most sensory environments activate at least one shard, whose "values" are not aligned with wireheading. I think shard theory has a better explanation of this relative lack of wireheading than alternative models. But it's

Quintin (and also Alex) - first, let me say, thank you for the friendly, collegial, and constructive comments and replies you've offered. Many folks get reactive and defensive when they're hit with a 6,000-word critique of their theory, but you've remained constructive and intellectually engaged. So, thanks for that.

On the general point about Shard Theory being a relatively 'Blank Slate' account, it might help to think about two different meanings of 'Blank Slate' -- mechanistic versus functional.

A mechanistic Blank Slate approach (which I take Shard Theor...

TurnTrout -- I think the 'either/or' framing here is misleading about the way that genomes can adapt to maximize survival and minimize death.

For example, jumping spiders have evolved special secondary eyes pointing backwards that specifically detect predators from behind that might want to eat them. At the functional level of minimizing death, these eyes 'hardcode death-fear' in a very real and morphological way. Similarly, many animals vulnerable to predators evolve eye locations on the sides of their heads, to maximize degrees of visual coverage they can...

People are indeed effectively optimized by evolution to do behavior X in situation Y (e.g. be afraid when death seems probable). I think evolution did that a lot. I think people are quite optimized by evolution in the usual behavioral biological ways you described.

I'm rather saying that the genome can't e.g. specify a neural circuit which fires if and only if a person is thinking about death. I'm saying that most biases are probably not explicit adaptations, that evolution cannot directly select for certain high-level cognitive properties (and only those properties), e.g. "level of risk aversion" or "behavior follows discounting scheme X" or "vulnerability to the framing effect." But evolution absolutely can and did select genotypes to unfold into minds which tend to be shaped in the form "cares more about ingroup."

Hopefully this comment clarifies my views some?

Jan - well said, and I strongly agree with your perspective here.

Any theory of human values should also be consistent with the deep evolutionary history of the adaptive origins and functions of values in general - from the earliest Cambrian animals with complex nervous systems through vertebrates, social primates, and prehistoric hominids. 

As William James pointed out in 1890 (paraphrasing here), human intelligence depends on humans having more evolved instincts, preferences, and values than other animals, not fewer.

For what it's worth, I wrote a critique of Shard Theory here on LessWrong (on Oct 20, 2022) from the perspective of behavior genetics and the heritability of values. 

The comments include some helpful replies and discussions with Shard Theory developers Quintin Pope and Alex Turner (TurnTrout).

I'd welcome any other feedback as well.

Quintin -- yes, indeed, one of the reasons I was excited about Shard Theory was that it has these different emphases you mention (e.g. 'multi-optimizer dynamics, values handshakes among shards, origins of self-reflective modeling, origins of biases, moral reflection as shard deliberation'), which I thought might actually be useful to develop and integrate with evolutionary psychology and other branches of psychology, not just AI alignment.

So I wanted to see if Shard Theory could be made a little more consistent with behavior genetics and ev psy...

Quintin & Alex - this is a very tricky issue that's been discussed in evolutionary psychology since the late 1980s. 

Way back then, Leda Cosmides & John Tooby pointed out that the human genome will 'offload' any information it can that's needed for brain development onto any environmental regularities that can be expected to be available externally, out in the world. For example, the genome doesn't need to specify everything about time, space, and causality that might be relevant in reliably building a brain that can do intuitive physics -- as ...

GeneSmith -- thanks for your comment. I'll need to think about some of your questions a bit more before replying.

But one idea popped out to me: the idea that shard theory offers 'a good explanation of how humans were able to avoid wireheading.'

I don't understand this claim on two levels:

  1. I may be missing something about shard theory, but I don't actually see how it could prevent humans, at a general level, from hacking their reward systems in many ways.
  2. As an empirical matter, humans do, in fact, hack our reward systems in thousands of ways that distract us f...

When I say "wireheading" I'm referring to something more specific than simply "humans not employing strategies that maximize reproductive fitness." I'm talking about how people know that you could experience the greatest possible pleasure by hooking up wires directly to the reward centers in your brain, yet they don't pursue that strategy. If you model humans as reward maximizing agents, that doesn't really make sense.

PS, Gary Marcus at NYU makes some related points about Blank Slate psychology being embraced a bit too uncritically by certain strands of thinking in AI research and AI safety.

His essay '5 myths about learning and innateness'

His essay 'The new science of alt intelligence'

His 2017 debate with AI researcher Yann LeCun, 'Does AI need more innate machinery?'

I don't agree with Gary Marcus about everything, but I think his views are worth a bit more attention from AI alignment thinkers.

tailcalled -- these issues of variance, canalization, quality control, etc. are very interesting.

For example, it's very difficult to understand why so many human mental disorders are common, heritable, and harmful -- why wouldn't the genetic variants that cause schizophrenia or major depression already have been eliminated by selection? Our BBS target article in 2006 addressed this.

Conversely, it's a bit puzzling that the coefficient of additive genetic variation in human brain size is lower than might be expected, according to our 2007 meta-analysis.
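For readers outside quantitative genetics, the statistic mentioned here is conventionally defined as below (this standard definition is added for context only; it is not taken from the 2007 meta-analysis itself):

```latex
\mathrm{CV}_A = 100 \times \frac{\sqrt{V_A}}{\bar{X}}
```

Here $V_A$ is the additive genetic variance of the trait and $\bar{X}$ is the trait mean. Unlike heritability, which divides additive variance by total phenotypic variance, $\mathrm{CV}_A$ scales it by the trait mean, so a trait can have substantial heritability and still a low $\mathrm{CV}_A$.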

In gen...

I've read various discussions on this, e.g. Marco Del Giudice's book. Just quickly skimming your article, you seem to be raising some of the same points that I saw in his book.

One question: isn't the liability-scale heritability estimate the wrong value to use? Or at least, not straightforwardly the right one. I believe the effective heritability for the purposes of selection would, roughly speaking, be the prevalence times the liability-scale heritability; is that correct? So if e.g. schizophrenia has a liability heritability of 80% but a prevalence of 0.32%, that would lead to an effective heritability (for the purpose of selection) of 0.26%?

I'm not sure why this puzzle isn't solved by births being a constraint? Two questions:

* Could it be because they have not reached a local optimum with regards to reproductive success?
* Could it be a collider/tradeoff thing? E.g. you can have a lot of offspring or you can invest a lot into the offspring you do have? So if you count the number of offspring, you don't just get the fitness axis, but also, for lack of a better word, the r-K axis.
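The arithmetic in the schizophrenia question above can be checked in a few lines. This is a minimal sketch of the commenter's own rough heuristic (effective heritability ≈ prevalence × liability-scale heritability), which is posed there as a question, not an established quantitative-genetics formula; the function name `effective_heritability` is hypothetical.

```python
def effective_heritability(liability_h2: float, prevalence: float) -> float:
    """Rough heuristic from the comment (an assumption, not a standard result):
    scale liability-scale heritability by trait prevalence to approximate the
    heritability 'visible' to selection."""
    if not (0.0 <= liability_h2 <= 1.0 and 0.0 <= prevalence <= 1.0):
        raise ValueError("both inputs must be proportions in [0, 1]")
    return liability_h2 * prevalence


# Figures quoted in the comment: 80% liability-scale heritability, 0.32% prevalence.
h2_eff = effective_heritability(0.80, 0.0032)
print(f"{h2_eff:.2%}")  # 0.26%
```

The product is 0.00256, i.e. about 0.26%, matching the figure in the comment; whether this heuristic is the right way to model selection on a threshold trait is exactly what the comment is asking.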

Jacob - thanks! Glad you found that article interesting. Much appreciated. I'll read the linked essays when I can.

It's hard to know how to respond to this comment, which reveals some fundamental misunderstandings of heritability and of behavior genetics methods. The LessWrong protocol is 'If you disagree, try getting curious about what your partner is thinking'. But in some cases, people unfamiliar with a field have the same old misconceptions about the field, repeated over and over. So I'm honestly having trouble arousing my curiosity....

The quote from habryka doesn't make sense to me, and doesn't seem to understand how behavior genetic studies estimate heritabilitie...

My point is not to shift your beliefs about genetics, but to show that your conclusions cannot be shown from heritability, since the statistic is incapable of making the claims you want to make. That doesn't mean the claims are necessarily wrong (they may be rescued), but that more work is necessary to make the claims that you want to make.

Jacob, I'm having trouble reconciling your view of brains as 'Universal Learning Machines' (and almost everything being culturally transmitted), with the fact that millions of other animal species show exactly the kinds of domain-specific adaptive responses studied in evolutionary biology, animal behavior research, and evolutionary psychology.

Why would 'fear of death' be 'culturally transmitted' in humans, when thousands of other vertebrate species show many complex psychological and physiological adaptations to avoid accidents, starvation, parasitism, an...

That is in fact what I was talking about, because the abstract conscious culturally transmitted fear of death is vastly more general and effective once learned. Humans do seem to have innate fears of some leading causes of early death, such as heights, and indirect fear of many sources of contamination through disgust; there are probably a few other examples. But in general humans have lost many innate skills and responses (which typically come from brainstem CPGs) in favor of the more complex learned variants (cortical) - we even must learn to walk. Human babies are notoriously lacking in fear of all the various ways the world can kill them and require extensive supervision.

You seem to be making some very sweeping claims about heritability here. In what sense is 'heritability not what I think'? 

Do you seriously think that moderate heritability doesn't say anything at all about how much genes matter, versus how much 'non-agentic things can influence a trait'?

Yes, or close to it. Heritability is not equivalent to "how much genes impact traits." Kaj Sotala says it well here:

Similarly, one hypothesis that can explain high heritability yet not result in the post's implications being right is that the studies aren't properly controlled. From habryka:

My phrasing was slightly tongue-in-cheek; I agree that sex hormones, hormone receptors in the brain, and the genomic regulatory elements that they activate, have pervasive effects on brain development and psychological sex differences.

Off topic: yes, I'm familiar with evolutionary game theory; I was senior research fellow in an evolutionary game theory center at University College London 1996 - 2000, and game theory strongly influenced my thinking about sexual selection and social signaling.

Steven -- thanks very much for your long, thoughtful, and constructive comment. I really appreciate it, and it does help to clear up a few of my puzzlements about Shard Theory (but not all of them!).

Let me ruminate on your comment, and read your linked essays.

I have been thinking about how evolution can implement different kinds of neural architectures, with different degrees of specificity versus generality, ever since my first paper in 1989 on using genetic algorithms to evolve neural networks. Our 1994 paper on using genetic algorithms to evolve sensori...

Jacob - I read your 2015 essay. It is interesting and makes some fruitful points.

I am puzzled, though, about when nervous systems are supposed to have evolved this 'Universal Learning Machine' (ULM) capability. Did ULMs emerge with the transition from invertebrates to vertebrates? From rat-like mammals to social primates? From apes with 400 cc brains to early humans with 1100 cc brains?

Presumably bumblebees (1 million neurons) don't have ULM capabilities, but humans (80 billion neurons) allegedly do. Where is the threshold between them -- given that bumble...

The core architecture of brainstem, basal ganglia, thalamus and pallium/cortex is at least 500 million years old. You are looking for some binary threshold which simply does not exist; the dominance of intra- over inter-lifetime learning is continuous and depends on brain size * lifespan, or cumulative optimization power. Likewise, one could ask: what is the threshold between AlexNet and ViT-L/14@336px?

What would make you suspect I would argue that? From the ULM post, the core hypothesis:

Significant mental algorithms are things like adding numbers, forming sentences, recognizing attractive vs unattractive mates, bipedal walking, courtship strategies, etc. - essentially almost everything that infants can't do at birth, which is nearly everything for humans. Both the hardware and initial rough architecture - the architectural prior - are innate, which is where you see the genetic differences between individuals, families, races, sexes, species, etc.
the gears to ascension
"it just happens" is massively underselling how much effect the sex hormones have on which genetics activate, isn't it? somewhere on here someone did an analysis of how genetically-defined mouse neurons train the recognition of mouse squeaks, or something like that, which would be a wonderful bridge between your field and practical brain understanding. (slightly off topic question I've been wanting to ask - are you familiar with evolutionary game theory?)

Charlie - thanks for offering a little more 'origin story' insight into Shard Theory, and for trying to explain what Quintin Pope and Alex Turner (TurnTrout) were trying to express in that passage.

Honestly, I still don't get it. The 'developmental recipe' that maps from genotype to phenotype, for any complex adaptation, is usually opaque, complicated, uninterpretable, and full of complex feedback loops, regulatory systems, and quality control systems. These are typically beyond all human comprehension, because there were never any evolutionary selection pressures for that development...

I'm saying "Either the genome can hardcode death fear, which would have huge alignment implications, or it can't, which would have huge alignment implications, or it can hardcode death fear but only via advantages it had but we won't, which doesn't have huge implications." Of the three, I think the second is most likely -- that you can't just read off an a priori unknown data structure, and figure out where death is computed inside of that. If there were easier, less complex ways to get an organism to fear death, I expect evolution would have found those instead. See also clues from ethology, where even juicy candidates-for-hardcoding ended up not being hardcoded.

The genome doesn't code fear of death for humans, but it doesn't need to. Humans learn the concept of death through cultural transmission, and it is immediately terrifying because our primary drive (the instrumental convergent drive) is empowerment, and death is the minimally empowered state.

Jacob -- thanks for your comment. It offers an interesting hypothesis about some analogies between human brain systems and computer stuff. 

Obviously, there's not enough information in the human genome to specify every detail of every synaptic connection. Nobody is claiming that the genome codes for that level of detail. Just as nobody would claim that the genome specifies every position for every cell in a human heart, spine, liver, or lymphatic system. 

I would strongly dispute that it's the job of 'behavior genetics, psychology, etc.' to fit thei...

Not my framework, but that of modern neuroscience. Just as biology is constrained by chemistry which is constrained by physics. I just reskimmed it and it's not obviously out of date and still is a pretty good overview of the modern view of the brain from systems neuroscience which is mostly tabula rasa. The whole debate of inherited traits is somewhat arbitrary based on what traits you consider. Is the use of two spaces after a period an inheritable trait? Belief in Marxism? Genetics can only determine the high level hyperparameters and low frequency wiring of the brain - but that still allows a great deal to be inheritable, especially when one considers indirect correlations (e.g. basketball and height, curiosity drive and educational attainment, etc.).

Peter -- I think 'hard coding' and 'hard wiring' are very misleading ways to think about brain evolution and development; they're based way too much on the hardware/software distinction in computer science, and on 1970s/1980s cognitive science models inspired by computer science.

Apparently it's common in some AI alignment circles to view the limbic system as 'hard wired', and the neocortex as randomly initialized? Interesting if true. But I haven't met any behavior geneticists, neuroscientists, evolutionary psychologists, or developmental psychologists who wo...

The cortex/cerebellum are definitely randomly initialized at birth in terms of most synaptic connectivity[1]. The evidence for this from neuroscience is enormous and overwhelming, and (by virtue of being closer to physics) it trumps the weak contradictory evidence from all the softer fields (behavior genetics, dev psych, and especially evo psych). See this 2015 post, which shows how the ev psych evolved modularity hypothesis was rather decisively disproven; the evidence has only grown stronger since.

[1] That does not mean some low-frequency components of the wiring distribution are not genetically predetermined; obviously they must be.

I haven't read the universal learning hypothesis essay (2015) yet, but at first glance, it also looks vulnerable to a behavior genetic critique (and probably an evolutionary psychology critique as well).

In my view, evolved predispositions shape many aspects of learning, including Bayesian priors about how the world is likely to work, expectations about how contingencies work (e.g. the Garcia Effect that animals learn food aversions more strongly if the lag between food intake and nausea/distress is a few minutes/hours rather than immediate), domain-specifi...

BTW I found this linked article fascinating, and it's a very important subject for those trying to create safe brain-like AGI by reverse engineering the brain's alignment mechanisms (empathy/love/altruism). It's sometimes called the 'pointing problem': the agent's utility function must use specific concepts from the dynamic learned world model, but it seems difficult to make this connection, because the utility function is innate and the specific concepts we want it to reference will be small targets that could appear anywhere in the complex learned world model. I describe the problem a bit here, and my current best guess of how the brain solves this (correlation guided proxy matching) is a generalization of imprinting mechanisms. Shard theory is also concerned with the problem.
The article collects the evidence that disproves the central evolved modularity hypothesis of ev psych, which was already pretty decisive in 2015 and has only grown more so.

tailcalled -- I agree that we don't yet have very good GWAS studies of political, religious, and moral ideology values; I was just illustrating that we already have ways of studying those (in principle), we have big genotyped samples from several countries, and it's just a matter of time before researchers start asking people in those samples about their more abstract kinds of values, and then publishing GWAS studies on those values.

So, I think we're probably in agreement about that issue. 

tailcalled -- thanks for your comments.

As a preliminary reply: here are links to a few genome-wide association studies concerning human values and value-like traits of various sorts: 

risk tolerance

delay discounting

anti-social behavior



cannabis use

sexual orientation

These are just a few illustrative examples. The rate of research and publication for GWAS research is very high, and is accelerated by the existence of large, fully genotyped samples such as UK BioBank; to do genome-wide association studies on particular human values, i...

Oh I guess I should say, I do agree that Shard Theory seems like it could get too blank slatist. I just don't agree with all of the arguments you presented. Though some of your arguments seem reasonable enough.
Ah, I was already aware of those, I was more thinking of the "political, religious, and moral ideology" values; those are the ones I hadn't seen a genomic study of. I also have some concerns with the notion that the studies you listed here are good examples, but that might be getting a bit too tangential? Idk, up to you if you feel like you want to discuss them.

The expert preppers I know emphasize that prepping for potential disaster is usually treated as an individual-level or family-level activity, but often the best strategies involve more community-level networking, such as getting to know your neighbors, understanding local 'affordances' (water, food, shelter, weapons, experts), and thinking about who your best local allies and clan-mates will be if things get rough.

We evolved to survive in small clans nested within larger tribes, rather than as individual super-preppers. So, my advice would be, figure out who the most useful 6-10 people would be who live near you, and start establishing some communication, trust, and contingency plans with them.

There's a big difference between teleology (humans projecting purposiveness onto inanimate matter) and teleonomy (humans recognizing evolutionary adaptations that emerged to embody convergent instrumental goals that promote the final goals of survival and reproduction). The latter is what I'm talking about with this essay. The biological purposes are not just in the mind of the beholder.

I disagree. Any "purposes" are limited to the mind of a beholder. Otherwise, you'll be joining the camp of the child who thinks that a teddy bear falls to the ground because it wants to.

I know that AI alignment researchers don't aim to hand-code human values into AI systems, and most aim to 'implicitly describe human values'. Agreed. 

The issue is, which human values are you trying to implicitly incorporate into the AI system? 

I guess if you think that all human values are generic, computationally interchangeable, extractible (from humans) by the same methods, and can be incorporated into AIs using the same methods, then that could work, in principle. But if we don't explicitly consider the whole range of human value types, how would we even test whether our generic methods could work for all relevant value types?

Hi Charlie, thanks for your comment. 

Just to clarify: I agree that there would be no point in an AI flagging different value types with a little metadata flag saying 'religious taboo' vs 'food preference' unless that metadata was computationally relevant to the kinds of learning, inference, generalization, and decision-making that the AI did. But my larger point was that humans treat these value types very differently in terms of decision-making (especially in social contexts), so true AI alignment would require that AI systems do too.

I wasn't picturi...

Koen - thanks for the link to ACM FAccT; looks interesting. I'll see what their people have to say about the 'aligned with whom' question.  

I agree that AI X-risk folks should probably pay more attention to the algorithmic fairness folks and self-driving car folks, in terms of seeing what general lessons can be learned about alignment from these specific domains.

Koen - thanks for your comment. I agree that too many AI safety researchers seem to ignore all these socio-political issues relevant to alignment. My worry is that, given that many human values are tightly bound to political, religious, tribal, and cultural beliefs (or at least people think they are), ignoring those values means we won't actually achieve 'alignment' even when we think we have. The results could be much more disastrous than knowing we haven't achieved alignment.

You are welcome. Another answer to your question just occurred to me. If you count AI fairness research as a sub-type of AI alignment research, then you can find a whole community of alignment researchers who talk quite a lot with each other about 'aligned with whom' in quite sophisticated ways. Reference: the main conference of this community is ACM FAccT.

In EA and on this forum, when people count the number of alignment researchers, they usually count dedicated x-risk alignment researchers only, and not the people working on fairness, or on the problem of making self-driving cars safer. There is a somewhat unexamined assumption in the AI x-risk community that fairness and self-driving car safety techniques are not very relevant to managing AI x-risk, both in the technical space and the policy space. The way my x-risk technical work is going, it is increasingly telling me that this unexamined assumption is entirely wrong.

On a lighter note: well, as long as the 'we' you are talking about here is a group of people that still includes Eliezer Yudkowsky, then I can guarantee that 'we' are in no danger of ever collectively believing that we have achieved alignment.

Thanks Mitchell, that's helpful. 

I think we need a lot more serious thinking about Plan B strategies. 

Hi Mitchell, what would be the best thing to read about MIRI's latest thinking on this issue (what you call Plan B)? 

I don't actually know. I only found out about this a few months ago. Before that, I thought they were still directly trying to solve the problem of "Friendly AI" (as it used to be known, before "alignment" became a buzzword). This is the thread where I learned about plan B.  Maybe this comment sums up the new attitude: 

Netcentrica - thanks for this thoughtful comment. 

I agree that the behavioral sciences, social sciences, and humanities need more serious (quantitative) research on values; there is some in fields such as political psychology, social psychology, cultural anthropology, comparative religion, etc -- but often such research is a bit pseudo-scientific and judgmental, biased by the personal/political views of the researchers. 

However, all these fields seem to agree that there are often much deeper and more pervasive differences in values across people ...

Viliam - this failure mode for AI is horrifyingly plausible, and all too likely. 

We already see a strong increase in wokeness among AI researchers, e.g. the panic about 'algorithmic bias'. If that trend continues, then any AI that looks aligned with some group's 'politically incorrect values' might be considered entirely 'unaligned', taboo, and dangerous. 

Then the fight over what counts as 'aligned with humanity' will boil down to a political fight over what counts as 'aligned with elite/dominant/prestigious group X's preferred political philosophy'.

the gears to ascension
I would note, since you use the word "woke", that things typically considered woke to reason about, such as the rights of minorities, are in fact particularly important to get right. Politically incorrect values are, in fact, often unfriendly to others; there's a reason they don't fare well politically. Generally, "western values" include things like coprotection, individual choice, and the consent of the governed, all very woke values. It's important to be able to design AI that will protect every culture from every other culture, or we risk not merely continuation of unacceptable intercultural dominance, but the possibility that the AI turns out to be biased against all of humanity. Nothing less than a solution to all coprotection will protect humanity from demise. "Woke" cannot be a buzzword that causes us to become silent about the things people are sometimes irrational about. They're right that these things are important, just not always exactly right about what can be done to improve them. And importantly, there really is agentic pressure in the world to keep things in a bad situation. Defect-heavy reproductive strategies require there to be people on a losing end.