This is Dr. Andrew Critch's professional LessWrong account. Andrew is the CEO of Encultured AI, and works for ~1 day/week as a Research Scientist at the Center for Human-Compatible AI (CHAI) at UC Berkeley. He also spends around a ½ day per week volunteering for other projects like the Berkeley Existential Risk Initiative and the Survival and Flourishing Fund. Andrew earned his Ph.D. in mathematics at UC Berkeley studying applications of algebraic geometry to machine learning models. During that time, he cofounded the Center for Applied Rationality and SPARC. Dr. Critch has been offered university faculty and research positions in mathematics, mathematical biosciences, and philosophy, worked as an algorithmic stock trader at Jane Street Capital's New York City office, and as a Research Fellow at the Machine Intelligence Research Institute. His current research interests include logical uncertainty, open source game theory, and mitigating race dynamics between companies and nations in AI development.


"Tech company singularities", and steering them to reduce x-risk

Yep, you got it!  The definition is meant to be non-recursive and grounded in 2022-level industrial capabilities.  The definition is a bit unsatisfying insofar as the choice of 2022 is arbitrary, except that I don't think the definition would change much if we replaced 2022 by 2010.

I decided not to get into these details to avoid bogging down the post with definitions, but if a lot of people upvote you on this I will change the OP.

Thanks for raising this!

"Tech company singularities", and steering them to reduce x-risk

I agree this is an important question.  From the post:

> given the choice to do so — in the form of agreement among its Board and CEO — with around one year of effort following the choice.

I.e., in the definition, the "company" is considered to have "chosen" once the Board and CEO have agreed to do it.  If the CEO and Board agree and make the choice but the company fails to do the thing — e.g., because the employees refuse to go along with the Board+CEO decision — then the company has failed to execute on its choice, despite "effort" (presumably, the CEO and Board telling their people and machines to do stuff that didn't end up getting done).

As for what is or is not a tech company, I don't think it matters to the definition or the post or predictions, because I think only things that would presently colloquially be considered "tech companies" have a reasonable chance at meeting the remainder of the conditions in the definition.

"Tech company singularities", and steering them to reduce x-risk

(I originally posted this reply to the wrong thread)

> tech companies are much, much better at steering you than you are at steering them. So in the AI policy space, people mostly work on trying to explain AI risk to decisionmakers in an honest and persuasive way, not by relabelling tech companies (which can be interpreted or misinterpreted as pointing fingers).

I agree with this generally.

Slack gives you space to notice/reflect on subtle things

+1 to John and Ray for this; my experience is very similar to John's. Here's a relevant old post from me, where I was trying to gesture at the importance of a cluster of things around or similar to your #3, as distinct from #1 and #2 (title: "Boredom as Exploratory Overhead Cost").

Spend twice as much effort every time you attempt to solve a problem

Nice post!  I think something closer to 1+√2 ≈ 2.414 would be a better multiplier than two.  Reason:

Instead of minimizing the upper bound of total effort, (b^2·d − 1)/(b − 1), it makes sense to also consider the lower bound, (b·d − 1)/(b − 1), which is achieved when d is a power of b. We can treat the "expected" effort (e.g., if you have a uniform improper prior on log d) as landing in the middle of these two numbers, i.e.,

(b(b+1)·d/2 − 1)/(b − 1).

This is minimized where b^2 − 2b − 1 ≈ 0, which approaches b = 1 + √2 ≈ 2.414 for d large.  If you squint at your seesaw graphs and imagine a line "through the middle" of the peaks and troughs, I think you can see it bottoming out at around 2.4.
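The optimum above is easy to check numerically; here is a minimal sketch (the function name and grid-search details are mine, not from the original post):

```python
import math

# Average-case total-effort factor: the midpoint of the bounds
# (b^2*d - 1)/(b - 1) and (b*d - 1)/(b - 1), divided by d, with the
# constant terms dropped for large d, leaves f(b) = b*(b + 1) / (2*(b - 1)).
def avg_effort_factor(b):
    return b * (b + 1) / (2 * (b - 1))

# Grid-search for the minimizing budget multiplier b on (1, 4].
best = min((1 + i / 100_000 for i in range(1, 300_001)), key=avg_effort_factor)

print(round(best, 3))              # ≈ 2.414, close to 1 + sqrt(2)
print(round(1 + math.sqrt(2), 3))  # 2.414
print(avg_effort_factor(best) < avg_effort_factor(2))  # True: beats doubling
```

The last line confirms that, by this average-case measure, a multiplier near 1 + √2 gives a strictly smaller expected-effort factor than doubling does.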

Power dynamics as a blind spot or blurry spot in our collective world-modeling, especially around AI

Sorry for the slow reply!

> I feel like you are lumping together things like "bargaining in a world with many AIs representing diverse stakeholders" with things like "prioritizing actions on the basis of how they affect the balance of power."

Yes, but not a crux for my point.  I think this community has a blind/blurry spot around both of those things (compared to the most influential elites shaping the future of humanity).   So, the thesis statement of the post does not hinge on this distinction, IMO.

> I would prefer to keep those things separate.

Yep, and in fact, for actually diagnosing the structure of the problem, I would prefer to subdivide further into three things and not just two:

  1. Thinking about how power dynamics play out in the world;
  2. Thinking about what actions are optimal*, given (1); and
  3. Choosing optimal* actions, given (2) (which, as you say, can involve actions chosen to shift power balances).

My concern is that the causal factors laid out in this post have led to correctable weaknesses in all three of these areas. I'm more confident in the claim of weakness (relative to skilled thinkers in this area, not average thinkers) than in any particular causal story for how the weakness formed.  But, having drawn out the distinction among 1-3, I'll say that the following mechanism is probably at play in a bunch of people:

a) people are really afraid of being manipulated;

b) they see/experience (3) as an instance of being manipulated and therefore have strong fear reactions around it (or disgust, or anger, or anxiety);

c) their avoidance reactions around (3) generalize to avoiding (1) and (2).

d) point (c) compounds with the other causal factors in the OP to lead to too-much-avoidance-of-thought around power/bargaining.

There are of course specific people who are exceptions to this concern.  And, I hear you that you do lots of (1) and (2) while choosing actions based on a version of optimal* defined in terms of "win-wins".   (This is not an update for me, because I know that you personally think about bargaining dynamics.)

However, my concern is not about you specifically.  Who exactly is it about, you might ask?  I'd say "the average lesswrong reader" or "the average AI researcher interested in AI alignment".

> For example, "Politics is the mind-killer" is mostly levied against doing politics, not thinking about politics as something that someone else might do (thereby destroying the world).

Yes, but not a crux to the point I'm trying to make.  I already noted in the post that PMK was not trying to make people avoid thinking about politics; in fact, I included a direct quote to that effect: "The PMK post does not directly advocate for readers to avoid thinking about politics; in fact, it says 'I'm not saying that I think we should be apolitical, or even that we should adopt Wikipedia's ideal of the Neutral Point of View.'"  My concern, as noted in the post, is that the effects of PMK (rather than its intentions) may have been somewhat crippling:

> [OP] However, it also begins with the statement "People go funny in the head when talking about politics".  If a person doubts their own ability not to "go funny in the head", the PMK post — and concerns like it — could lead them to avoid thinking about or engaging with politics as a way of preventing themselves from "going funny".

I also agree with the following comparison you made to mainstream AI/ML, but it's also not a crux for the point I'm trying to make:

> it seems to me that the rationalist and EA communities think about AI-AI bargaining and costs from AI-AI competition much more than typical AI researchers, as measured by e.g. fraction of time spent thinking about those problems, fraction of writing that is about those problems, fraction of stated research priorities that involve those problems, and so on. This is all despite outlier technical beliefs suggesting an unprecedentedly "unipolar" world during the most important parts of AI deployment (which I mostly disagree with).

My concern is not that our community is under-performing the average AI/ML researcher in thinking about the future — as you point out, we are clearly over-performing.  Rather, the concern is that we are underperforming the forces that will actually shape the future, which are driven primarily by the most skilled people who are going around shifting the balance of power.  Moreover, my read on this community is that it mostly exhibits a disdainful reaction to those skills, both in specific (e.g., if I called out specific people who have them) and in general (e.g., when I call them out in the abstract, as I have here).

Here's another way of laying out what I think is going on:

A = «anxiety/fear/disgust/anger around being manipulated»

B = «filter-bubbling around early EA narratives designed for institution-building and single-stakeholder AI alignment»

C = «commitment to the art-of-rationality/thinking»

S = «skill at thinking about power dynamics»

D = «disdain for people who exhibit S»

Effect 1: A increases C (this is healthy).

Effect 2: C increases S (which is probably why our community is out-performing mainstream AI/ML).

Effect 3: A decreases S in a bunch of people (not all people!), specifically, people who turn anxiety into avoidance-of-the-topic.

Effect 4: Effects 1-3 make it easy for a filter-bubble to form that ignores power dynamics among and around powerful AI systems and counts on single-stakeholder AI alignment to save the world with one big strong power-dynamic instead of a more efficient/nuanced wielding of power.

The solution to this problem is not to decrease C, from which we derive our strength, but to mitigate Effect 3 by getting A to be more calibrated / less triggered.  Around AI specifically, this requires holding space in discourse for thinking and writing about power-dynamics in multi-stakeholder scenarios.

Hopefully this correction can be made without significantly harming highly focused efforts on alignment, such as yours.  E.g., I'm still hoping like 10 people will go work for you at ARC, as soon as you want to expand to that size, in large part because I know you personally think about bargaining dynamics and are mulling them over in the background while you address alignment.  [Other readers: please consider this an endorsement to go work for Paul if he wants you on his team!]

What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs)

> Failure mode: When B-cultured entities invest in "having more influence", often the easiest way to do this will be for them to invest in or copy A'-cultured-entities/processes.  This increases the total presence of A'-like processes in the world, which have many opportunities to coordinate because of their shared (power-maximizing) values.  Moreover, the A' culture has an incentive to trick the B culture(s) into thinking A' will not take over the world, but eventually, A' wins.

> In other words, the humans and human-aligned institutions not collectively being good enough at cooperation/bargaining risks a slow slipping-away of hard-to-express values and an easy takeover of simple-to-express values (e.g., power-maximization).

This doesn't feel like other words to me, it feels like a totally different claim.

Hmm, perhaps this is indicative of a key misunderstanding.

> For example, natural monopolies in the production web wouldn't charge each other marginal costs, they would charge profit-maximizing profits.

Why not?  The third paragraph of the story indicates that: "Companies closer to becoming fully automated achieve faster turnaround times, deal bandwidth, and creativity of negotiations." In other words, at that point it could certainly happen that two monopolies would agree to charge each other lower costs if it benefitted both of them.  (Unless you'd count that as an instance of "charging profit-maximizing costs"?)  The concern is that the subprocesses of each company/institution that get good at (or succeed at) bargaining with other institutions are subprocesses that, by virtue of being selected for speed and simplicity, are less aligned with human existence than the original overall company/institution.  That less-aligned subprocess grows to take over the institution, while always taking actions that are "good" for the host institution when viewed as a unilateral move in an uncoordinated game (hence passing as "aligned").

At this point, my plan is to try to consolidate what I think are the main confusions in the comments of this post into one or more new concepts, to form the topic of a new post.

What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs)

> My prior (and present) position is that reliability meeting a certain threshold, rather than being optimized, is a dominant factor in how soon deployment happens.

I don't think we can get to convergence on many of these discussions, so I'm happy to just leave it here for the reader to think through.

Yeah, I agree we probably can't reach convergence on how alignment affects deployment time, at least not in this medium (especially since a lot of info about company policies / plans / standards is covered under NDAs), so I also think it's good to leave this question about deployment time as a hanging disagreement node.

I'm reading this (and your prior post) as bids for junior researchers to shift what they focus on. My hope is that seeing the back-and-forth in the comments will, in expectation, help them decide better.

Yes to both points; I'd thought of writing a debate dialogue on this topic trying to cover both sides, but commenting with you about it is turning out better, I think, so thanks for that!
