Huh. That's weird. My working definition of justice is "treating significantly similar things in appropriately similar ways, while also treating significantly different things in appropriately different ways". I find myself regularly falling back to this concept, and getting use from doing so.
Also, I rarely see anyone else doing anything even slightly similar, so I don't think of myself as using a "common tactic" here? Also, I have some formal philosophic training, and my definition comes from a distillation of Aristotle and Plato and Socrates, and so it makes sense to me that since most people lack similar exposure they would lack the concept by default.
Both "incentive alignment" and "justice" feel like something a slave might beg a master to give them so that the slave was punished and rewarded in less insane ways, so I can see how they might be conflated, but I don't see how "incentive alignment" would serve Robinson Crusoe if he was trying to figure out a good approach to which fruits to eat, or which parts of an island to use in different ways.
What do you think normal people mean by "justice" when you say it is something they can somehow use to prove the existence of free will and justify bad politics?
I don't know if you're still working on this, but if don't already know of the literature on choice supportive bias and similar processes that occur in humans, they look to me a lot like heuristics that probably harden a human agent into being "more coherent" over time (especially in proximity to other ways of updating value estimation processes), and likely have an adaptive role in improving (regularizing?) instrumental value estimates.
Your essay seemed consistent with the claim that "in the past, as verifiable by substantial scholarship, no one ever proved exactly X" but your essay never actually showed "X is provably false" that I noticed?
And, indeed, maybe you can prove it one way or the other for some X, where X might be (as you seem to claim) "naive coherence is impossible" or maybe where some X' or X'' are "sophisticated coherence is approached by algorithm L as t goes to infinity" (or whatever)?
For my money, the thing to do here might be to focus on Value-of-Information, since VoI seems to me like a super super super important concept, and potentially a way to bridge questions of choice and knowledge and costly information gathering actions.
I was educated by this, and surprised, and appreciate the whole thing! This part jumped out at me because it seemed like something people trying to "show off, but not really explain" would have not bothered to write about (and also I had an idea):
13. Failing to find a French vector
We could not find a "speak in French" vector after about an hour of effort, but it's possible we missed something straightforward.
Steering vector: "Je m'appelle" - "My name is " before attention layer 6 with coefficient +5
The thought I had was maybe to describe the desired behavior, and explain a plausible cause in terms of well known kinds of mental configurations that speakers can be in, and also demonstrate it directly? (Plus a countervailing description, demonstration, and distinct causal theory.)
So perhaps a steering vector made from these phrases could work: "I'm from Quebec et je glisse souvent accidentellement vers le français" - "I only speak English because I'm a monolingual American".
EDIT: If you have the tooling set up to swiftly try this experiment, maybe it helps to explain the most central theory that motivates it, and might gain bayes points if it works?
According to the "LLMs are Textual Soul Engines" hypothesis, most of the 1600 dimensions are related to ways that "generative" sources of text (authors, characters, reasons-for-talking, etc) could relate to things (words (and "that which nouns and verbs and grammar refer to in general")).
The above steering vector (if the hypothesis applies here) would/should basically inject a "persona vector" into the larger operations of a sort of "soul engine".
The prompts I'm suggesting, by construction, explicitly should(?) produce a persona that tends to switch from English to French (and be loyal to Quebec (and have other "half-random latent/stereotypical features")).
I'm very interested in how wrong or right the underlying hypothesis about LLMs happens to be.
I suspect that how we orient to LLMs connects deeply to various "conjectures" about Natural Moral Laws that might be derivable with stronger math than I currently have, and such principles likely apply to LLMs and whether or how we are likely to regret (or not regret) various ways of treating various LLM personas as ends in themselves or purely as a means to an end.
Thus: I would really love to hear about results here, if you use the tooling to try the thing, to learn whether it works or not!
Either result would be interesting because the larger question(s) seem to have very high VoI and any experimental bits that can be collected are likely worth pondering.
Voting is, of necessity, pleiotropically optimized. It loops into reward structures for author motivation, but it also regulates position within default reading suggestion hierarchies for readers seeking educational material, and it also potentially connects to a sense that the content is "agreed to" in some sort of tribal sense.
If someone says something very "important if true and maybe true" that's one possible reason to push the content "UP into attention" rather than DOWN.
Another "attentional" reason might be if some content says "the first wrong idea that occurs to nearly everyone, which also has a high quality rebuttal cleanly and saliently attached to it".
That is, upvotes can and maybe should flow certain places for reasons of active value-of-information and/or pedagogy. Probably there are other reasons, as well! 😉
A) As high-quality highly-upvoted rebuttals like Mr Kwa's have arrived, I've personally been thinking that maybe I should reverse my initial downvote, which would make this jump even higher. I'm a very unusual voter, but I've explained my (tentative) theories of upvoting once or twice, and some people might have started to copy me.
B) I could imagine some voters were hoping (as I might if I thought about it some more and changed my mind on what my voting policy should be in very small ways) to somehow inspire some good rebuttals, by pre-emptively upvoting things in high VoI areas where LW simply hasn't had much discussion lately?
C) An alternative explanation is of course that a lot of LW voters haven't actually looked at nanotech very much, and don't have good independent object level takes, and just agreed with the OP because they don't know any better and it seemed plausible and well written. (This seems the most likely to me, fwiw.)
D) Another possibility is, of course, that there are a lot of object level agreement voters on LW and also all three of us are wrong about how nano could or would "really just work" if the best research directions got enough high-talent interest and supportive funding. I doubt this one... but it feels wise to include an anti-hubristic hypothesis when enumerating possibilities 😇
I'd like to say up front that I respect you both, but I think shminux is right that bhauth's article (1) doesn't make the point it needs to make to change the "belief about the whether a set of 'mazes' exist whose collective solution gives nano" for many people working on nano and (2) this is logically connected to issue of "motivational stuff".
A key question is the "amount of work" necessary to make intellectual progress on nano (which is probably inherently cross-disciplinary), and thus it is implicitly connected to motivating the amount of work a human would have to put in. This could be shorted to just talking about "motivation" which is a complex thing to do for many reasons that the reader can surely imagine for themselves. And yet... I shall step into the puddle, and see how deep it might be!🙃
I. Close To Object Level Nano Stuff
For people who are hunting, intellectually, "among the countably infinite number of mazes whose solution and joining with other solved mazes would constitute a win condition by offering nano-enabling capacities" they have already solved many of the problems raised in the OP, as explained in in Thomas Kwa's excellent top level nano-solutions comment.
One of Kwa's broad overall points is "nano isn't actually going to be a biological system operating on purely aqueous chemistry" and this helps dodge a huge numbers of these objections, and has been a recognized part of the plan for "real nanotech" (ie molecular manufacturing rather the nanoparticle bullshit that sucked up all the nano grant money) since the 1980s.
If bhauth wants to write an object level follow-up post, I think it might be interesting to read an attempt to defend the claim: "Nanotechnology requires aqueous biological methods... which are incapable of meeting the demand". However, I don't think this is something bhauth actually agrees with, so maybe that point is moot?
II. What Kinds Of Psychologizing Might Even Be Helpful And Why??
I really respect your engagement here, bhauth, whether you:
(1) really want to advance nano and are helping with that and this was truly your best effort, vs whether
(2) you are playing a devil's advocate against nano plans and offered this up as an attempt to "say what a lot of the doubters are thinking quietly, where the doubters can upvote, and then also read the comments, and then realize that their doubts weren't as justified as they might have expected", or
(3) something more complex to explain rhetorically and/or motivationally.
There is a kind of courage in speaking in public, and confident ability to reason about object level systems, and faith in an audience enough that you're willing to engage deeply.
Also, one there are skills for assessing the objectively most important ways technology can or should go in the future, and willingness to work on such things in a publicly visible forum where it also generates educational and political value for many potential readers.
All these are probably virtues, and many are things I see in your efforts here, bhauth!
I can't read your mind, and don't mean to impose advice where it is not welcome, but it does seem like you were missing ideas that you would have had if had spent the time to collect all the best "solved mazes" floating in the heads of most of the smartest people who want-or-wanted* to make nano happen?
III. Digressing Into A "Motivated Sociology Of Epistemics" For a Bit
All of this is to reiterate the initial point that effortful epistemics turns out to complexly interact with "emotional states" and what be want-or-wanted* when reasoning about the costs of putting in different kinds of epistemic effort.
(* = A thing often happens, motivationally, where an aspiring-oppenheimer-type really wants something, and starts collecting all the theories and techniques to make it happen... and then loses their desire part way through as they see more and more of the vivid details of what they are really actually likely to create given what they know. Often, as they become more able to apply a gears level analysis, and it comes more into near mode, and they see how it is likely to vividly interact with "all of everything they also care about in near mode" their awareness of certain "high effort ideas" becomes evidence of what they wanted, rather than want they still "want in the 'near' future".
(A wrinkle about "what 'near' means" arises with variation in age and motivation. Old people whose interests mostly end with their own likely death (like hedonic interests) will have some of the smallest ideas about what "near" is, and old people with externalized interests will have some of the largest ideas in that they might care in a detailed way about the decades or centuries after their death, while having a clearer idea of what that might even mean than would someone getting a PhD or still gunning for tenure. Old externalized people are thus more likely to be "writing for the ages" with clearer/longer timelines. (And if anyone in these planning loops is an immortalist with uncrushed dreams, then what counts as "near in time" gets even more complicated.)))
I think shminux probably didn't have time to write all this out, but might be nodding along to maybe half of it so far? And I think unpacking it might help bhauth (and all the people upvoting bhauth here?) to level up more and faster, which would probably be good!
For myself, I could maybe tell a story, where the reason I engaged here is, maybe, because I'm focused on getting a Win Condition for all of Earth, and all sapient beings (probably including large language model personas and potential-future aliens and so on?) and I think all good sapient beings with enough time and energy probably converge on collectively advancing the collective eudaemonia of all sentient beings (which I put non-zero credence on being a category that includes individual cells themselves).
Given this larger level goal, I think, as a sociological engineering challenge, it would logically fall out of this that it is super important for present day humans to help nucleate-or-grow-or-improve some kind of "long term convergent Win Condition Community" (which may have existed all the way back to Bacon, or even farther (and which probably explicitly needs to be able to converge with all live instances of similar communities that arise independently and stumble across each other)).
And given this, when I see two really smart people not seeming to understand each other and both making good points, in public, on LW, with wildly lopsided voting patterns...
...that is like catnip to a "Socio-Epistemic Progress Frame" which often seems, to me, to generate justifications for being specifically locally helpful and have that redound (via an admittedly very circuitous-seeming path) to extremely large long term benefits for all sentient beings?
I obviously can't mind read either of you, but when I suggested that bhauth might be doing "something even more rhetorically complex" it was out of an awareness that many such cases exist, and are probably helpful, even if wrong, so long as there is a relatively precise kind of good faith happening, where low-latency high-trust error correction seems to be pretty central to explicit/formal cognitive growth.
A hunch I have is that maybe shminux started in computer science, and maybe bhauth started in biology? Also I think exactly things kinds of collaborations are often very intellectually productive from both sides!
IV. In Praise Of CS/BIO Collaboration
From experience working in computational virology out of a primary interest in studying the mechanisms of the smallest machines nature has so far produced (as a long term attack on being able to work on nano at an object level), I recognize some of the ways that these fields often have wildly different initial intuitions, based on distinctions like engineering/science, algorithms/empiricism, human/alien and design/accretion.
People whose default is to "engineer (and often reverse-engineer) human-designed algorithms" and people whose default is to "empirically study accreted alien designs" have amazingly different approaches to thinking about "design" 😂
Still, I think there are strong analogies across these fields.
Like to a CS person "you need to patch a 2,000,000 line system written by people who are all now dead, against a new security zero day, as fast as you can" is a very very hard and advanced problem, but like... that's the simple STARTING position for essentially all biologically evolved systems, as a single step in a typical red queen dynamic... see for example hyperparasitic virophages for a tiny and more-likely-tractable example, where the "genetic code base" is relatively small, and generation times are minuscule. But there there are a lot of BIO people, I think, who have been dealing with nearly impossible systems for so long that they have "given up" in some deep way on expecting to understand certain things, and I think it would help them to play with code to get more intuitions about how KISS is possible and useful and beautiful.
(And to be clear, this CS/BIO/design thing is just one place where differences occur between these two fields, and it might very well not be the one that is going on here, and a lot of people in those fields are likely to roll their eyes at bothering with the other one, I suspect? Because "frames" or "emotions" or "stances" or "motivations" or just "finite life" maybe? But from a hyper abstract bayesian perspective, such motivational choices mean the data they are updating on has biases, so their posteriors will be predictably uncalibrated outside their "comfort zone", which is an epistemic issue.)
As a final note in praise of BIO/CS collaboration, it is probably useful to notice that current approaches to AI do not involve hand-coding any of it, but rather "summoning" algorithms into relatively-computationally-universal frameworks via SGD over data sets with enough kolmogorov complexity that it becomes worthwhile to simply put the generating algorithm in the weights rather than try to store all the cases in the weights. This is, arguably, a THIRD kind of "summoned design" that neither CS or BIO people are likely to have good intuitions for, but I suspect it is somewhere in the middle, and that mathematicians would be helpful for understanding it.
V. In Closing
If this helps bhauth or shminux, or anyone who upvoted bhauth really hard, or anyone who downvoted shminux, that would make it worth the writing based on what I'm directly aiming at so long as any the net harms to any such people are smaller and lesser (which is likely, because it is a wall-o-text that few will read unless (hopefully) they like, and are getting something, from reading it). Such is my hope 😇
That is so freakin' cool. Thank you for this link. Hadn't heard about this yet...
...and yes, memory consolidation is on my list as "very important" for uploading people to get a result where the ems are still definitely "full people (with all the features which give presumptive confidence that the features are 'sufficient for personhood' because the list as been constructed in a minimalist way such that the absence of a sufficient feature 'breaking personhood', then not even normal healthy humans are 'people')".
The new aminoacids might be "essential" (not manufacturable internally) and have to come in as "vitamins" potentially. This is another possible way to prevent gray goo on purpose, but hypothetically it might be possible to find ways to move that synthesis into the genome of neolife itself, if that was cheap and safe. These seem like engineering considerations that could change from project to project.
Mostly I have two fundamental points:
1) Existing life is not necessarily bio-chemically optimal because it currently exists within circumscribed bounds that can be transgressed. Those amino acids are weird and cool and might be helpful for something. Only one amino acid (and not even any of those... just anything) has to work to give "neo-life" some kind of durable competitive advantage over normal life.
2) All designs have to come from somewhere, with the optimization pressure supplied by some source, and it is not safe or wise to rely on random "naturally given" limits in the powers of systems that contain an internal open-ended optimization engine. When trying to do safety engineering, and trying to reconcile inherent safety with the design of something involving autonomous (potentially exponential) growth, either (1) just don't do it, or else (2) add multiple well-tested purposeful independent default shutdown mechanisms. If you're "doing it" then look at all your safety mechanisms in a fault tree analysis and if the chance of an error is 1/N then make sure there will definitely not be anything vaguely close to N opportunities for a catastrophe to occur.
That third link seems to be full of woo.
Where was the optimization pressure for better designs supposed to have arisen in the "communal" phase?
Thus, we may speculate that the emergence of life should best be viewed in three phases, distinguished by the nature of their evolutionary dynamics. In the first phase, treated in the present article, life was very robust to ambiguity, but there was no fully unified innovation-sharing protocol. The ambiguity in this stage led inexorably to a dynamic from which a universal and optimized innovation-sharing protocol emerged, through a cooperative mechanism. In the second phase, the community rapidly developed complexity through the frictionless exchange of novelty enabled by the genetic code, a dynamic we recognize to be patently Lamarckian (19). With the increasing level of complexity there arose necessarily a lower tolerance of ambiguity, leading finally to a transition to a state wherein communal dynamics had to be suppressed and refinement superseded innovation. This Darwinian transition led to the third phase, which was dominated by vertical descent and characterized by the slow and tempered accumulation of complexity.
They claim that universal horizontal gene transfer (HGT) arose through a "cooperative" mechanism, without saying what that would have looked like at the level of cells, or at the level of some kind of soupy boundary-free chemostat, or something?
They don't seem to be aware of compensatory mutations or quasi-species or that horizontal transfer is parasitic by default.
Like: where did the new "sloppy but still useful" alleles come from? Why (and how) would any local part of "the communal system" spend energy to generate such things or pass them along starting literally from scratch? This sort of meta-evolutionary cleverness usually requires a long time to arise!
The thing that should be possible (not easy, but possible) only now, with technology, is to invent some new amino acids (leveraging what exists in the biosphere now instead of what was randomly available billions of years ago) AND a new codon system for them, and to boot strap from there, via directed evolution, towards some kind of cohesively viable neolife that (if it turns out to locate a better local optimum than we did) might voraciously consume the current ecology.
The above image is from Figure 2, with the description "Examples of genetically encoded noncanonical amino acids with novel functions" from Schulze's "Expanding the genetic code".
Compensatory mutation is actually a pretty interesting and key concept, because it suggests a method by which one might prevent a gray good scenario on purpose, rather than via "mere finger crossing" regarding limitations that we might pray that we will be lucky enough for it to run into by accident.
We could run evolution in silico at the protein level, with large conceptual jumps, and then print something unlikely to be able to evolve.
The "unable to easily evolve" thing might be similar to human telomeres, but more robust.
It could make every generation of neolife to almost entirely a degeneration way from "edenic neolife" and towards a mutational meltdown.
Note that this is essentially an "alignment" or "corrigibility" strategy, but at the level of chemistry and molecular biology, where the hardware is much much easier to reason about, in comparison to the "software" of "planning and optimization processes themselves".
If you could cause there to be only a 1 in septillion chance of positive or compensatory mutations on purpose (knowing the mechanisms and math to calculate this risk) and put several fully independent booby traps into the system that will fail after a handful of mutations, then you could have the first X generations "eat and double very very efficiently" and then have the colony switch to "doing the task" for Y generations, and then maybe as meltdown became inevitable within ~Z more generations they could, perhaps, actively prepare for recycling?
I can at least IMAGINE this for genomes, because genomes are mostly not Turing Complete.
I know of nothing similar that could be used to make AI with survive-and-spread powers similarly intrinsically safe.
I have greater-than-50% credence on the claim that I have much better structural priors for human politics than most people. I grant that for any given poobah I'm likely to be ignorant of many details about them, but I claim that I understand the forest well enough that some ignorance about some trees is tolerable. Most people have never heard of selectorate theory, or the general welfare theorem, or incentive compatibility, or coup prediction, or the Myerson Satterthwaite theorem or ... <such things>.
Most people think the FDA (or their local governmental equivalent which copies the US's local catastrophe) is good. Most people haven't got much of an opinion on whether or how or why common law might be relatively efficient.
I cited the "outdated" article from June of 2020 because it substantiated "that New Zealand managed a travel quarantine" if readers were ignorant of those details. Also, by then, it was in the zeitgeist that either (1) covid would turn out to be a relative nothing burger with less than maybe 50,000 global or national deaths or else (2) New Zealand's response was nearly the only competent response by any western country with a democratically elected leader and would be retrospectively worthy of praise.
(I think New Zealand's proportional representation was a significant structural cause of being lucky enough to have Ardern in charge when it counted, to make the one most important and non-trivial decision of her political career, but I don't believe that most people even know what that is, and so they almost certainly can't have a theory for why it shows up a lot, empirically, when one looks for relatively functional human political systems.)
Since June of 2020, probably more than 23 million people have died, globally, of covid and as a proportion of population New Zealand has still has far fewer deaths than most places.
I guess I could be wrong about the details of the causality? Maybe New Zealand had someone "like Anthony Fauci, with a substantive bureaucratic role and key position during a moment of crisis, except good-and-competent instead of evil-and-lying" who made the critical calls, and all Ardern did was listen to that person? Maybe she was lucky at random? I honestly don't know enough about the details to be sure how to do detailed credit assignment internal to that relatively successful policy making loop.
As to the difficulty of getting 10,000 houses built per year. Prior to you saying this, I didn't know she was a YIMBY, so I've gone from "she could easily have been a NIMBY and slightly but tolerably worse in my eyes" to "oh huh, now I like her even better".
Given this level of ignorance, and granting that I might be off base about some details you know and I don't, I would say that structurally, the housing crisis is a very very very common problem across nearly the entire western world, where housing policies are systematically broken because the median voter doesn't even understand what's wrong with the normal policies, and regularly "says yay" to the structural factors that can be causally traced back to harms they "say boo" to.
Given this background model, and I give Ardern credit for at least being on the right (pro-prosperity and pro-growth) side but I don't think her failure to cut through the stupidity and misanthropy and level level corruption of the rest of the system should be held uniquely against her. My hunch is that her allies could not stomach the "political backlash" of the structure changes that would have been required to achieve the 10k home construction goal?
I think, fundamentally, covid was THE thing to get right in late 2019 and early 2020 and almost every country got it wrong. Ardern got it the most right.
I think, fundamentally, AGI is THE thing to get right in 2022 and 2023, and I would like almost every country to NOT get it wrong. SOMEONE has to get this right, or there's a substantial probability that all we humans will be dead in a few years.
I think the BEST way to get things right is probably for a small number of people to hammer out an (1) adequate plan (2) that round trips through the random and confused beliefs of the voters (which they can), and (3) secures buy-in from enough politicos (some of whom exist), such that success can be grabbed by hand, and held together with bailing wire, since there's no way in hell that The System can do anything novel and good by any method other than "a small number of people do the right thing by hand" while being lucky enough to not be squashed like a bug by the forces of random-default-stupidity.
I think FLI rang the fire alarm. I think Eliezer proposed a better response to the fire alarm than they did. I think "both sides" have self-nominated to be part of the initial "small group". I want them very very very much to succeed. I want them to have the vocal support of as many people as possible, and I want this because I want to be happy and also even just alive 12 years from now. Reliably. On purpose. Like my civilization was even slightly functional.
I don't know about you, but I'm actually OK dithering a bit, and going in circles, and doing things that mere entropy can "make me notice regret based on syntactically detectable behavioral signs" (like not even active adversarial optimization pressure like that which is somewhat inevitably generated in predator prey contexts).
For example, in my twenties I formed an intent, and managed to adhere to the habit somewhat often, where I'd flip a coin any time I noticed decisions where the cost to think about it in an explicit way was probably larger than the difference in value between the likely outcomes.
(Sometimes I flipped coins and then ignored the coin if I noticed I was sad with that result, as a way to cheaply generate that mental state of having an intuitive internally accessible preference without having to put things into words or do math. When I noticed that that stopped working very well, I switched to flipping a coin, then "if regret, flip again, and follow my head on head, and follow the first coin on tails". The double flipping protocol seemed to help make ALL the first coins have "enough weight" for me to care about them sometimes, even when I always then stopped for a second to see if I was happy or sad or bored by the first coin flip. And of course I do such things much much much less now, and lately have begun to consider taking a personal vow to refuse to randomize, except towards enemies, for an experimental period of time.)
The plans and the hopes here sort of naturally rely on "getting better at preferring things wisely over time"!
And the strategy relies pretty critically on having enough MEMORY to hope to build up data on various examples of different ways that similar situations went in the past, such as to learn from mistakes and thereby rarely "lack a velleity" and to reduce the rate at which I justifiably regret past velleities or choices.
And a core reason I think that sentience and sapience reliably convergently evolve is almost exactly "to store and process memories to enable learning (including especially contextually sensitive instrumental preference learning) inside a single lifetime".
(Credit assignment and citation note: Cristof Koch was the first researcher who I heard explain surprising experimental data that suggested that "minds are common" with stuff like bee learning, and eventually I stopped being surprised when I heard about low key bee numeracy or another crazy mental power of cuttlefish. I didn't know about EITHER of the experiments I just linked to, but I paused to find "the kinds of things one finds here if one actually looks". That I found such links, for me, was a teensy surprise, and slightly contributes to my posterior belief in a claim roughly like "something like 'coherence' is convergently useful and is what our minds were originally built, by evolution, to approximately efficiently implement".)
Basically, I encourage you, when you go try to prove that "Agents with incomplete preferences can make themselves immune from pursuing dominated strategies by following plan P" to consider the resource costs of those plans (like the cost in memory) and to ask whether those resources are being used optimally, or whether a different use of them could get better results faster.
Also... I expect that the proofs you attempt might actually succeed if you have "agents in isolation" or "agents surrounded only by agents that respect property rights" but to fail if you consider the case of adversarial action space selection in an environment of more than one agent (like where wolves seek undominated strategies for eating sheep, and no sheep is able to simply 'turn down' the option of being eaten by an arbitrarily smart wolf without itself doing something clever and potentially memory-or-VNM-perfection-demanding).
I do NOT think you will prove "in full generality, nothing like coherence is pragmatically necessary to avoid dutch booking" but I grant that I'm not sure about this! I have noticed from experience that my mathematical intuitions are actually fallible. That's why real math is worth the elbow grease! <3