This is Dr. Andrew Critch's professional LessWrong account.   Andrew is the CEO of Encultured AI, and works for ~1 day/week as a Research Scientist at the Center for Human-Compatible AI (CHAI) at UC Berkeley. He also spends around ½ day per week volunteering for other projects like the Berkeley Existential Risk Initiative and the Survival and Flourishing Fund.   Andrew earned his Ph.D. in mathematics at UC Berkeley studying applications of algebraic geometry to machine learning models. During that time, he cofounded the Center for Applied Rationality and SPARC. Dr. Critch has been offered university faculty and research positions in mathematics, mathematical biosciences, and philosophy, and has worked as an algorithmic stock trader at Jane Street Capital’s New York City office and as a Research Fellow at the Machine Intelligence Research Institute. His current research interests include logical uncertainty, open source game theory, and mitigating race dynamics between companies and nations in AI development.


I'm afraid I'm sceptical that your methodology licenses the conclusions you draw.

Thanks for raising this.  It's one of the reasons I spelled out my methodology, to the extent that I had one.  You're right that, as I said, my methodology explicitly asks people to pay attention to the internal structure of what they were experiencing in themselves and calling consciousness, and to describe it on a process level.  Personally I'm confident that whatever people are managing to refer to by "consciousness" is a process that runs on matter.  If you're not confident of that, then you shouldn't be confident in my conclusion, because my methodology was premised on that assumption.

Of course people differ with respect to intuitions about the structure of consciousness. 

Why do you say "of course" here?  It could have turned out that people were all referring to the same structure, and their subjective sense of its presence would have aligned.  That turned out not to be the case.

But the structure is not the typical referent of the word 'conscious', 

I disagree with this claim.  Consciousness is almost certainly a process that runs on matter, in the brain.  Moreover, the belief that "consciousness exists" (whatever that means) is almost always derived from some first-person sense of awareness of that process, whatever it is.  In my investigations, I asked people to attend to the process they were referring to, and describe it.  As far as I can tell, they usually described pretty coherent things that were (almost certainly) actually happening inside their minds.  This raises a question: why is the same word used to refer to these many different subjective experiences of processes that are almost certainly physically real, and distinct, in the brain?

The standard explanation is that they're all facets or failed descriptions of some other elusive "thing" called "consciousness", which is somehow perpetually elusive and hard for scientists to discover.  I'm rejecting that explanation, in favor of a simpler one: consciousness is a word that people use to refer to mental processes that they consider intrinsically valuable upon introspective observation, so they agree with each other when they say "consciousness is valuable" and disagree with each other when they say "the mental process I'm calling conscious consists of {details}".  The "hard problem of consciousness" is the problem of resolving a linguistic dispute disguised as an ontological one, where people agree on the normative properties of consciousness (it's valuable) but not on its descriptive properties (its nature as a process/pattern.)

the first-person, phenomenal character of experience itself is.

I agree that the first-person experience of consciousness is how people are convinced that something they call consciousness exists.  Usually when a person experiences something, like an image or a sound, they can describe the structure of the thing they're experiencing.  So I just asked them to describe the structure they were experiencing and calling "consciousness", and got different — coherent — answers from different people.  The fact that their answers were coherent, and seemed to correspond to processes that almost certainly actually exist in the human mind/brain, convinced me to just believe them that they were detecting something real and managing to refer to it through introspection, rather than assuming they were all somehow wrong and failing to describe some deeper more elusive thing that was beyond their experience.

I totally agree with the potential for confusion here!

My read is that the LessWrong community has too low a prior on social norms being about membranes (e.g., when, how, and how not to cross various socially constructed information membranes). Using the term "boundaries" raises the prior on the hypothesis "social norms are often about boundaries", which I endorse and was intentional on my part, specifically for the benefit of the LessWrong readership base (especially the EA community), who seemed to pay too little attention to the importance of «boundaries», for many senses of "too little". I wrote about that in Part 2 of the sequence, here:

When a confusion between "social norms" and "boundaries" exists, like you I also often fall back on another term like "membrane", "information barrier", or "causal separation". But I also have some hope of improving Western discourse more broadly, by replacing the conflation "social norms are boundaries" with the more nuanced observation "social norms are often about when, how, how not, and when not to cross a boundary".

Thanks for sharing this! Because of strong memetic selection pressures, I was worried I might be literally the only person posting on this platform with that opinion.

FWIW I think you needn't update too hard on signatories absent from the FLI open letter (but update positively on people who did sign).  Statements about AI risk are notoriously hard to agree on for a mix of political reasons.  I do expect lab leads to eventually find a way of expressing more concerns about risks in light of recent tech, at least before the end of this year.  Please feel free to call me "wrong" about this at the end of 2023 if things don't turn out that way.

Do you have a success story for how humanity can avoid this outcome? For example what set of technical and/or social problems do you think need to be solved? (I skimmed some of your past posts and didn't find an obvious place where you talked about this.)

I do not, but thanks for asking.  To give a best efforts response nonetheless:

David Dalrymple's Open Agency Architecture is probably the best I've seen in terms of a comprehensive statement of what's needed technically, but it would need to be combined with global regulations limiting compute expenditures in various ways, including record-keeping and audits on compute usage.  I wrote a little about the auditing aspect with some co-authors, here
... and was pleased to see Jason Matheny advocating from RAND that compute expenditure thresholds should be used to trigger regulatory oversight, here:

My best guess at what's needed is a comprehensive global regulatory framework or social norm encompassing all manner of compute expenditures, including compute expenditures from human brains and emulations but giving them special treatment.  More specifically-but-less-probably, what's needed is some kind of unification of information theory + computational complexity + thermodynamics that's enough to specify quantitative thresholds allowing humans to be free-to-think-and-use-AI-yet-unable-to-destroy-civilization-as-a-whole, in a form that's sufficiently broadly agreeable to be sufficiently broadly adopted to enable continual collective bargaining for the enforceable protection of human rights, freedoms, and existential safety.

That said, it's a guess, and not an optimistic one, which is why I said "I do not, but thanks for asking."

It confuses me that you say "good" and "bullish" about processes that you think will lead to ~80% probability of extinction. (Presumably you think democratic processes will continue to operate in most future timelines but fail to prevent extinction, right?) Is it just that the alternatives are even worse?

Yes, and specifically worse even in terms of probability of human extinction.

That is, norms do seem feasible to figure out, but not the kind of thing that is relevant right now, unfortunately.


From the OP:

for most real-world-prevalent perspectives on AI alignment, safety, and existential safety, acausal considerations are not particularly dominant [...].  In particular, I do not think acausal normalcy provides a solution to existential safety, nor does it undermine the importance of existential safety in some surprising way. 

I.e., I agree.

we are so unprepared that the existing primordial norms are unlikely to matter for the process of settling our realm into a new equilibrium.

I also agree with that, as a statement about how we normal-everyday-humans seem quite likely to destroy ourselves with AI fairly soon.  From the OP:

I strongly suspect that acausal norms are not so compelling that AI technologies would automatically discover and obey them.  So, if your aim in reading this post was to find a comprehensive solution to AI safety, I'm sorry to say I don't think you will find it here.  

For 18 examples, just think of 3 common everyday norms having to do with each of the 6 boundaries given as example images in the post :)  (I.e., cell membranes, skin, fences, social group boundaries, internet firewalls, and national borders).  Each norm has the property that, when you reflect on it, it's easy to imagine a lot of other people also reflecting on the same norm, because of the salience of the non-subjectively-defined actual-boundary-thing that the norm is about.  That creates more of a Schelling-nature for that norm, relative to other norms, as I've argued somewhat in my «Boundaries» sequence.

Spelling out such examples more carefully in terms of the recursion described in 1 and 2 just prior is something I've been planning for a future post, so I will take this comment as encouragement to write it!

To your first question, I'm not sure which particular "the reason" would be most helpful to convey.  (To contrast: what's "the reason" that physically dispersed human societies have laws?  Answer: there's a confluence of reasons.)  However, I'll try to point out some things that might be helpful to attend to.

First, committing to a policy that merges your utility function with someone else's is quite a vulnerable maneuver, with a lot of boundary-setting aspects.  For instance, will you merge utility functions multiplicatively (as in Nash bargaining), linearly (as in Harsanyi's utility aggregation theorem), or some other way?  Also, what if the entity you're merging with has self-modified to become a "utility monster" (an entity with strongly exaggerated preferences) so as to exploit the merging procedure?  Some kind of boundary-setting is needed to decide whether, how, and how much to merge, which is one of the reasons why I think boundary-handling is more fundamental than utility-handling.
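
To make the contrast concrete, here is a toy sketch in Python. Everything in it is an illustrative assumption rather than anything from the original discussion: the two options, their numbers, the function names, and the choice to model a "utility monster" as a pure rescaling of the other party's reported utilities (with a disagreement point of zero and nonnegative utilities). It only shows how the merge rule changes what that kind of exaggeration can buy.

```python
from math import prod

def linear_merge(utilities):
    # Linear (Harsanyi-style) aggregation with equal weights: a plain sum.
    return sum(utilities)

def multiplicative_merge(utilities):
    # Multiplicative (Nash-bargaining-style) aggregation: a product.
    # Assumes a disagreement point of zero and nonnegative utilities.
    return prod(utilities)

def best_option(options, merge):
    # Pick the option name whose merged utility is highest.
    return max(options, key=lambda name: merge(options[name]))

def exaggerate(options, factor):
    # Hypothetical "utility monster" move: the other party rescales
    # its reported utilities by a constant factor.
    return {name: (mine, factor * theirs)
            for name, (mine, theirs) in options.items()}

# Hypothetical options, as (my utility, other party's utility).
options = {"split": (5.0, 5.0), "favor_other": (1.0, 8.0)}
inflated = exaggerate(options, 10.0)

linear_before = best_option(options, linear_merge)            # 10 vs 9
linear_after = best_option(inflated, linear_merge)            # 55 vs 81
product_before = best_option(options, multiplicative_merge)   # 25 vs 8
product_after = best_option(inflated, multiplicative_merge)   # 250 vs 80
```

In this toy case the linear merge switches from "split" to "favor_other" once the other party inflates its numbers, while the multiplicative merge keeps choosing "split", because rescaling one factor multiplies every option's product by the same constant. Other forms of exploitation (for example, misreporting the disagreement point) are not covered by this sketch.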

Relatedly, Scott Garrabrant has pointed out in his sequence on geometric rationality that linear aggregation is more like not-having-a-boundary, and multiplicative aggregation is more like having-a-boundary:

I view this as further pointing away from "just aggregate utilities" and toward "one needs to think about boundaries when aggregating beings" (see Part 1 of my Boundaries sequence).  In other words, one needs (or implicitly assumes) some kind of norm about how and when to manage boundaries between utility functions, even in abstract utility-function-merging operations where the boundary issues come down to where to draw parentheses in between additive and multiplicative operations.  Thus, boundary-management principles are somewhat more fundamental than, or conceptually upstream of, principles that might pick out a global utility function for the entirety of the "acausal society".

(Even if there is a global utility function that turns out to be very simple to write down, the process of verifying its agreeability will involve checking a lot of boundary-interactions.  For instance, one must check that this hypothetical reigning global utility function is not dethroned by some union of civilizations who successfully merge in opposition to it, which is a question of boundary-handling.)

This is cool (and fwiw to other readers) correct. I must reflect on what it means for real world cooperation... I especially like the A <-> []X -> [][]X <-> []A trick.

