Fair enough. Thanks for replying. It's helpful to have a little more background on Ben. (I might write more, but I'm busy with a newborn baby here...)
Jim - I didn't claim that libel law solves all problems in holding people to higher epistemic standards.
Often, it can be helpful just to incentivize avoiding the most egregious forms of lying and bias -- e.g. punishing situations when 'the writer had actual knowledge that the claims were false, or was completely indifferent to whether they were true or false'.
Rob - you claim 'it's very obvious that Ben is neither deliberately asserting falsehoods, nor publishing "with reckless disregard"'.
Why do you think that's obvious? We don't know the facts of the matter. We don't know what information he gathered. We don't know the contents of the interviews he did. As far as we can tell, there was no independent editing, fact-checking, or oversight in this writing process. He's just a guy who hasn't been trained as an investigative journalist, who did some investigative journalism-type research, and wrote it up.
Number of h... (read more)
Why do you think that's obvious?
I know Ben, I've conversed with him a number of times in the past and seen lots of his LW comments, and I have a very strong and confident sense of his priorities and values. I also read the post, which "shows its work" to such a degree that Ben would need to be unusually evil and deceptive in order for this post to be an act of deception.
I don't have any private knowledge about Nonlinear or about Ben's investigation, but I'm happy to vouch for Ben, such that if he turns out to have been lying, I ought to take a credibility ... (read more)
(Note: this was cross-posted to EA Forum here; I've corrected a couple of minor typos, and swapped out 'EA Forum' for 'LessWrong' where appropriate)
A note on LessWrong posts as (amateur) investigative journalism:
When passions are running high, it can be helpful to take a step back and assess what's going on here a little more objectively.
There are many different kinds of LessWrong posts, and we evaluate them using different criteria. Some posts announce new funding opportunities; we evaluate these in terms of brevity, clarity, relevance, and useful ... (read more)
A brief note on defamation law:
The whole point of having laws against defamation, whether libel (written defamation) or slander (spoken defamation), is to hold people to higher epistemic standards when they communicate very negative things about people or organizations -- especially negative things that would stick in readers' or listeners' minds in ways that would be very hard for subsequent corrections or clarifications to counteract.
Without making any comment about the accuracy or inaccuracy of this post, I would just point out that nobody in EA should ... (read more)
(Copying my response from the EA Forum)
I agree there are some circumstances under which libel suits are justified, but the net effect of the availability of libel suits strikes me as extremely negative for communities like ours, and I think it's very reasonable to have very strong norms against threatening or going through with these kinds of suits. Just because an option is legally available doesn't mean that a community has to be fine with that option being pursued.
That is the whole point and function of defamation law: to promote especially high standards...
What you described was perhaps the intent behind the law, but that's not necessarily how it is used in practice. You can use the law to intimidate people who have less money than you, simply by giving the money to a lawyer... and then the other side needs to spend about the same money on their lawyer... or risk losing the case. "The process is the punishment."
(I have recently contributed money to a defense fund of a woman who exposed a certain criminal organization in my country. The organization was disbanded, a few members were convicted, one of them end... (read more)
The whole point of having laws against defamation, whether libel (written defamation) or slander (spoken defamation), is to hold people to higher epistemic standards when they communicate very negative things about people or organizations
This might be true of some other country's laws against defamation, but it is not true of defamation law in the US. Under US law, merely being wrong, sloppy, and bad at reasoning would not be sufficient to make something count as defamation; it only counts if the writer had actual knowledge that the claims were false, or was completely indifferent to whether they were true or false.
Without making any comment about the accuracy or inaccuracy of this post, I would just point out that nobody in EA should be shocked that an organization (e.g. Nonlinear) that is being libeled (in its view) would threaten a libel suit to deter the false accusations (as they see them), to nudge the author (e.g. Ben Pace) towards making sure that their negative claims are factually correct and contextually fair.
Wikipedia claims: "The 1964 case New York Times Co. v. Sullivan, however, radically changed the nature of libel law in the United States by esta... (read more)
Gordon - I was also puzzled by the initial downvotes. But they happened so quickly that I figured the downvoters hadn't actually read or digested my essay. Disappointing that this happens on LessWrong, but here we are.
Max - I think your observations are right. The 'normies', once they understand AI extinction risk, tend to have much clearer, more decisive, more negative moral reactions to AI than many EAs, rationalists, and technophiles tend to have. (We've been conditioned by our EA/Rat subcultures to think we need to 'play nice' with the AI industry, no matter how sociopathic it proves to be.)
Whether a moral anti-AI backlash can actually slow AI progress is the Big Question. I think so, but my error bars on this issue are pretty wide. As an evolutionary... (read more)
Maybe. But at the moment, the US is really the only significant actor in the AGI development space. Other nations are reacting in various ways, ranging from curious concern to geopolitical horror. But if we want to minimize the risk of nation-state AI arms races, the burden is on the US companies to Just Stop Unilaterally Driving The Arms Race.
I'm predicting that an anti-AI backlash is likely, given human moral psychology and the likely applications of AI over the next few years.
In further essays I'm working on, I'll probably end up arguing that an anti-AI backlash may be a good strategy for reducing AI extinction risk -- probably much faster, more effective, and more globally applicable than any formal regulatory regime or AI safety tactics that the AI industry is willing to adopt.
Well, the AI industry and the pro-AI accelerationists believe that there is an 'immense upside of AGI', but that is a highly speculative, faith-based claim, IMHO. (The case for narrow AI having clear upsides is much stronger, I think.)
It's worth noting that almost every R&D field that has been morally stigmatized -- such as intelligence research, evolutionary psychology, and behavior genetics -- also offered huge and transformative upsides to society, when the field first developed. Until they got crushed by political demonization, and their potential ... (read more)
I don't think so. My friend Peter Todd's email addresses typically include his middle initial 'm'.
mwatkins - thanks for a fascinating, detailed post.
This is all very weird and concerning. As it happens, my best friend since grad school is Peter Todd, professor of cognitive science, psychology, & informatics at Indiana University. We used to publish a fair amount on neural networks and genetic algorithms back in the 90s.
That's somewhat helpful.
I think we're coming at this issue from different angles -- I'm taking a very evolutionary-functional view focused on what selection pressures shape psychological adaptations, what environmental information those adaptations need to track (e.g. snake! or pathogen!), what they need to represent about the world (e.g. imminent danger of death from threat X!), and what behaviors they need to trigger (e.g. run away!).
From that evolutionary-functional view, the 'high-level cognitive properties' of 'fitness affordances' are the... (read more)
If we're dead-serious about infohazards, we can't just be thinking in terms of 'information that might accidentally become known to others through naive LessWrong newbies sharing it on Twitter'.
Rather, we need to be thinking in terms of 'how could we actually prevent the military intelligence analysts of rival superpowers from being able to access this information'?
My personal hunch is that there are very few ways we could set up sites, security protocols, and vetting methods that would be sufficient to prevent access by a determined government. Whic... (read more)
Bluntly: if you write it on LessWrong or the Alignment Forum, or send it to a particular known person, governments will get a copy if they care to. Cybersecurity against state actors is really, really, really hard. LessWrong is not capable of state-level cyberdefense.
If you must write it at all: do so with hardware which has been rendered physically unable to connect to the internet, and distribute only on paper, discussing only in areas without microphones. Consider authoring only on paper in the first place. Note that physical compromise of your home, w... (read more)
If we're nowhere close to solving alignment well enough that even a coarse-grained description of actual human values is relevant yet, then I don't understand why anyone is advocating further AI research at this point.
Also, 'avoiding deceptive alignment' doesn't really mean anything if we don't have a relatively rich and detailed description of what 'authentic alignment' with human values would look like.
I'm truly puzzled by the resistance that the AI alignment community has against learning a bit more about the human values we're allegedly aligning with.
GeneSmith -- I guess I'm still puzzled about how Shard Theory prevents wireheading (broadly construed); I just don't see it as a magic bullet that can keep agents focused on their ultimate goals. I must be missing something.
And, insofar as Shard Theory is supposed to be an empirically accurate description of human agents, it would need to explain why some people become fentanyl addicts who might eventually overdose, and others don't. Or why some people pursue credentials and careers at the cost of staying childless... while others settle down young, have s... (read more)
Akash -- this is very helpful; thanks for compiling it!
I'm struck that much of the advice for newbies interested in 'AI alignment with human values' is focused very heavily on the 'AI' side of alignment, and not on the 'human values' side of alignment -- despite the fact that many behavioral and social sciences have been studying human values for many decades.
It might be helpful to expand lists like these to include recommended papers, books, blogs, videos, etc that can help alignment newbies develop a more sophisticated understanding of the human psycholo... (read more)
GeneSmith -- when people in AI alignment or LessWrong talk about 'wireheading', I understood that not to refer to people literally asking neurosurgeons to stick wires into their brains, but rather to a somewhat larger class of ways to hack one's own reward systems through the usual perceptual input channels.
I agree that humans are not 'reward maximizing agents', whatever that is supposed to mean in reference to actual evolved organisms with diverse, heterogeneous, & domain-specific motivational systems.
Quintin (and also Alex) - first, let me say, thank you for the friendly, collegial, and constructive comments and replies you've offered. Many folks get reactive and defensive when they're hit with a 6,000-word critique of their theory, but you've remained constructive and intellectually engaged. So, thanks for that.
On the general point about Shard Theory being a relatively 'Blank Slate' account, it might help to think about two different meanings of 'Blank Slate' -- mechanistic versus functional.
A mechanistic Blank Slate approach (which I take Shard Theor... (read more)
TurnTrout -- I think the 'either/or' framing here is misleading about the way that genomes can adapt to maximize survival and minimize death.
For example, jumping spiders have evolved special secondary eyes pointing backwards that specifically detect predators from behind that might want to eat them. At the functional level of minimizing death, these eyes 'hardcode death-fear' in a very real and morphological way. Similarly, many animals vulnerable to predators evolve eye locations on the sides of their heads, to maximize degrees of visual coverage they can... (read more)
Jan - well said, and I strongly agree with your perspective here.
Any theory of human values should also be consistent with the deep evolutionary history of the adaptive origins and functions of values in general - from the earliest Cambrian animals with complex nervous systems through vertebrates, social primates, and prehistoric hominids.
As William James pointed out in 1890 (paraphrasing here), human intelligence depends on humans having more evolved instincts, preferences, and values than other animals, not fewer.
For what it's worth, I wrote a critique of Shard Theory here on LessWrong (on Oct 20, 2022) from the perspective of behavior genetics and the heritability of values.
The comments include some helpful replies and discussions with Shard Theory developers Quintin Pope and Alex Turner.
I'd welcome any other feedback as well.
Quintin -- yes, indeed, one of the reasons I was excited about Shard Theory was that it has these different emphases you mention (e.g. 'multi-optimizer dynamics, values handshakes among shards, origins of self-reflective modeling, origins of biases, moral reflection as shard deliberation'), which I thought might actually be useful to develop and integrate with in evolutionary psychology and other branches of psychology, not just in AI alignment.
So I wanted to see if Shard Theory could be made a little more consistent with behavior genetics and ev psy... (read more)
Quintin & Alex - this is a very tricky issue that's been discussed in evolutionary psychology since the late 1980s.
Way back then, Leda Cosmides & John Tooby pointed out that the human genome will 'offload' any information it can that's needed for brain development onto any environmental regularities that can be expected to be available externally, out in the world. For example, the genome doesn't need to specify everything about time, space, and causality that might be relevant in reliably building a brain that can do intuitive physics -- as ... (read more)
GeneSmith -- thanks for your comment. I'll need to think about some of your questions a bit more before replying.
But one idea popped out to me: the idea that shard theory offers 'a good explanation of how humans were able to avoid wireheading.'
I don't understand this claim on two levels:
PS, Gary Marcus at NYU makes some related points about Blank Slate psychology being embraced a bit too uncritically by certain strands of thinking in AI research and AI safety.
His essay '5 myths about learning and innateness'
His essay 'The new science of alt intelligence'
His 2017 debate with AI researcher Yann LeCun, 'Does AI need more innate machinery'
I don't agree with Gary Marcus about everything, but I think his views are worth a bit more attention from AI alignment thinkers.
tailcalled -- these issues of variance, canalization, quality control, etc are very interesting.
For example, it's very difficult to understand why so many human mental disorders are common, heritable, and harmful -- why wouldn't the genetic variants that cause schizophrenia or major depression already have been eliminated by selection? Our BBS target article in 2006 addressed this.
Conversely, it's a bit puzzling that the coefficient of additive genetic variation in human brain size is lower than might be expected, according to our 2007 meta-analysis.
In gen... (read more)
Jacob - thanks! Glad you found that article interesting. Much appreciated. I'll read the linked essays when I can.
It's hard to know how to respond to this comment, which reveals some fundamental misunderstandings of heritability and of behavior genetics methods. The LessWrong protocol is 'If you disagree, try getting curious about what your partner is thinking'. But in some cases, people unfamiliar with a field have the same old misconceptions about the field, repeated over and over. So I'm honestly having trouble arousing my curiosity....
The quote from habryka doesn't make sense to me, and doesn't seem to understand how behavior genetic studies estimate heritabilitie... (read more)
Jacob, I'm having trouble reconciling your view of brains as 'Universal Learning Machines' (and almost everything being culturally transmitted), with the fact that millions of other animals species show exactly the kinds of domain-specific adaptive responses studied in evolutionary biology, animal behavior research, and evolutionary psychology.
Why would 'fear of death' be 'culturally transmitted' in humans, when thousands of other vertebrate species show many complex psychological and physiological adaptations to avoid accidents, starvation, parasitism, an... (read more)
You seem to be making some very sweeping claims about heritability here. In what sense is 'heritability not what I think'?
Do you seriously think that moderate heritability doesn't say anything at all about how much genes matter, versus how much 'non-agentic things can influence a trait'?
My phrasing was slightly tongue-in-cheek; I agree that sex hormones, hormone receptors in the brain, and the genomic regulatory elements that they activate, have pervasive effects on brain development and psychological sex differences.
Off topic: yes, I'm familiar with evolutionary game theory; I was a senior research fellow in an evolutionary game theory center at University College London from 1996 to 2000, and game theory strongly influenced my thinking about sexual selection and social signaling.
Steven -- thanks very much for your long, thoughtful, and constructive comment. I really appreciate it, and it does help to clear up a few of my puzzlements about Shard Theory (but not all of them!).
Let me ruminate on your comment, and read your linked essays.
I have been thinking about how evolution can implement different kinds of neural architectures, with different degrees of specificity versus generality, ever since my first paper in 1989 on using genetic algorithms to evolve neural networks. Our 1994 paper on using genetic algorithms to evolve sensori... (read more)
Jacob - I read your 2015 essay. It is interesting and makes some fruitful points.
I am puzzled, though, about when nervous systems are supposed to have evolved this 'Universal Learning Machine' (ULM) capability. Did ULMs emerge with the transition from invertebrates to vertebrates? From rat-like mammals to social primates? From apes with 400 cc brains to early humans with 1100 cc brains?
Presumably bumblebees (1 million neurons) don't have ULM capabilities, but humans (80 billion neurons) allegedly do. Where is the threshold between them -- given that bumble... (read more)
Charlie - thanks for offering a little more 'origin story' insight into Shard Theory, and for trying to explain what Quintin Pope was trying to express in that passage.
Honestly, I still don't get it. The 'developmental recipe' that maps from genotype to phenotype, for any complex adaptation, is usually opaque, complicated, uninterpretable, and full of complex feedback loops, regulatory systems, and quality control systems. These are typically beyond all human comprehension, because there were never any evolutionary selection pressures for that development... (read more)
Jacob -- thanks for your comment. It offers an interesting hypothesis about some analogies between human brain systems and computer stuff.
Obviously, there's not enough information in the human genome to specify every detail of every synaptic connection. Nobody is claiming that the genome codes for that level of detail. Just as nobody would claim that the genome specifies every position for every cell in a human heart, spine, liver, or lymphatic system.
I would strongly dispute that it's the job of 'behavior genetics, psychology, etc' to fit thei... (read more)
Peter -- I think 'hard coding' and 'hard wiring' is a very misleading way to think about brain evolution and development; it's based way too much on the hardware/software distinction in computer science, and on 1970s/1980s cognitive science models inspired by computer science.
Apparently it's common in some AI alignment circles to view the limbic system as 'hard wired', and the neocortex as randomly initialized? Interesting if true. But I haven't met any behavior geneticists, neuroscientists, evolutionary psychologists, or developmental psychologists who wo... (read more)
I haven't read the universal learning hypothesis essay (2015) yet, but at first glance, it also looks vulnerable to a behavior genetic critique (and probably an evolutionary psychology critique as well).
In my view, evolved predispositions shape many aspects of learning, including Bayesian priors about how the world is likely to work, expectations about how contingencies work (e.g. the Garcia Effect that animals learn food aversions more strongly if the lag between food intake and nausea/distress is a few minutes/hours rather than immediate), domain-specifi... (read more)
tailcalled -- I agree that we don't yet have very good GWAS studies of political, religious, and moral ideology values; I was just illustrating that we already have ways of studying those (in principle), we have large genotyped samples in several countries, and it's just a matter of time before researchers start asking people in those samples about their more abstract kinds of values, and then publishing GWAS studies on those values.
So, I think we're probably in agreement about that issue.
tailcalled -- thanks for your comments.
As a preliminary reply: here are links to a few genome-wide association studies concerning human values and value-like traits of various sorts:
These are just a few illustrative examples. The rate of research and publication for GWAS research is very high, and is accelerated by the existence of large, fully genotyped samples such as UK BioBank; to do genome-wide association studies on particular human values, i... (read more)
The expert preppers I know emphasize that prepping for potential disaster is usually treated as an individual-level or family-level activity, but often the best strategies involve more community-level networking, such as getting to know your neighbors, understanding local 'affordances' (water, food, shelter, weapons, experts), and thinking about who your best local allies and clan-mates will be if things get rough.
We evolved to survive in small clans nested within larger tribes, rather than as individual super-preppers. So, my advice would be: figure out who the 6-10 most useful people living near you would be, and start establishing some communication, trust, and contingency plans with them.
There's a big difference between teleology (humans projecting purposiveness onto inanimate matter) and teleonomy (humans recognizing evolutionary adaptations that emerged to embody convergent instrumental goals that promote the final goals of survival and reproduction). The latter is what I'm talking about with this essay. The biological purposes are not just in the mind of the beholder.
I know that AI alignment researchers don't aim to hand-code human values into AI systems, and most aim to 'implicitly describe human values'. Agreed.
The issue is, which human values are you trying to implicitly incorporate into the AI system?
I guess if you think that all human values are generic, computationally interchangeable, extractible (from humans) by the same methods, and can be incorporated into AIs using the same methods, then that could work, in principle. But if we don't explicitly consider the whole range of human value types, how would we even test whether our generic methods could work for all relevant value types?
Hi Charlie, thanks for your comment.
Just to clarify: I agree that there would be no point in an AI flagging different value types with a little metadata flag saying 'religious taboo' vs 'food preference' unless that metadata was computationally relevant to the kinds of learning, inference, generalization, and decision-making that the AI did. But my larger point was that humans treat these value types very differently in terms of decision-making (especially in social contexts), so true AI alignment would require that AI systems do too.
I wasn't picturi... (read more)
Koen - thanks for the link to ACM FAccT; looks interesting. I'll see what their people have to say about the 'aligned with whom' question.
I agree that AI X-risk folks should probably pay more attention to the algorithmic fairness folks and self-driving car folks, in terms of seeing what general lessons can be learned about alignment from these specific domains.
Koen - thanks for your comment. I agree that too many AI safety researchers seem to be ignoring all these socio-political issues relevant to alignment. My worry is that, given that many human values are tightly bound to political, religious, tribal, and cultural beliefs (or at least people think they are), ignoring those values means we won't actually achieve 'alignment' even when we think we have. The results could be much more disastrous than knowing we haven't achieved alignment.
Thanks Mitchell, that's helpful.
I think we need a lot more serious thinking about Plan B strategies.
Hi Mitchell, what would be the best thing to read about MIRI's latest thinking on this issue (what you call Plan B)?
Netcentrica - thanks for this thoughtful comment.
I agree that the behavioral sciences, social sciences, and humanities need more serious (quantitative) research on values; there is some in fields such as political psychology, social psychology, cultural anthropology, comparative religion, etc -- but often such research is a bit pseudo-scientific and judgmental, biased by the personal/political views of the researchers.
However, all these fields seem to agree that there are often much deeper and more pervasive differences in values across people ... (read more)
Viliam - this failure mode for AI is horrifyingly plausible, and all too likely.
We already see a strong increase in wokeness among AI researchers, e.g. the panic about 'algorithmic bias'. If that trend continues, then any AI that looks aligned with some group's 'politically incorrect values' might be considered entirely 'unaligned', taboo, and dangerous.
Then the fight over what counts as 'aligned with humanity' will boil down to a political fight over what counts as 'aligned with elite/dominant/prestigious group X's preferred political philosophy'.