Orpheus16 — LessWrong

Leaving Open Philanthropy, going to Anthropic

I think it would be valuable to ask Anthropic's policy team (and/or leadership) if they agree with these statements (or adjacent statements), and if they have any plans to prioritize these kinds of statements in their communications with policymakers & the public.

It seems to me like a lot of Anthropic employees agree with these statements (or adjacent statements), yet this does not appear to be guiding Anthropic's official lobbying or policy activities.

I think that the technology being built by companies like Anthropic has a significant (read: double-digit) probability of destroying the entire future of the human species.
What’s more, I think no private company should be in a position to impose this kind of risk on every living human, and I support efforts to make sure that no company ever is.
Further: I do not think that Anthropic or any other actor has an adequate plan for building superintelligence in a manner that brings the risk of catastrophic, civilization-ending misalignment to a level that a prudent and coordinated civilization would accept.
More specifically: I do not believe that the object-level benefits of advanced AI^[18] – serious though they may be – currently justify the level of existential risk at stake in any actor, Anthropic included, developing superintelligence given our current understanding of how to do so safely.^[19]
But there is, indeed, a clear solution to this problem in principle: namely, to use various methods of capability restraint (coordination, enforcement, etc) to ensure that no one develops superintelligence until we have a radically better understanding of how to do so safely.

I have no idea how Anthropic's policy team makes decisions, but insofar as they value the input of employees on other teams, it seems plausible to me that Anthropic employees with these beliefs (or adjacent beliefs) could play a meaningful role by speaking out about these beliefs, requesting more information about Anthropic's policy engagements, and having more discussions with Anthropic policy/leadership teams about if/how Anthropic could prioritize these topics more in its policy work & public comms.

Mikhail Samin's Shortform

Orpheus161mo60

I’m not sure if we disagree— I think there are better ways to assess this than the way the “is this an xrisk person or not” tribal card often gets applied.

Example: “Among all the topics in AI policy and concerns around AI, what are your biggest priorities?” is a good question IMO.
Counterexample: “Do you think existential risk from advanced AI is important?” is a bad question IMO (especially in isolation).

It is very easy for people to say they care about “AI safety” without giving much indication of where it stands on their priority list, what sorts of ideas/plans they want to aim for, what threat models they are concerned about, if they are the kind of person who can have a 20+ min conversation about interesting readings or topics in the field, etc.

I suspect that people would get “burnt” less if they asked these kinds of questions instead of defaulting to some sort of “does this person care about safety” frame or “is this person Part of My Tribe” thing.

(On that latter point, it is rather often that I hear people say things like “Alice is amazing!” and then when I ask them about Alice’s beliefs or work they say something like “Oh I don’t know much about Alice's work— I just know other people say Alice is amazing!”. I think it would be better for people to say “I think Alice is well-liked but I personally do not know much about her work or what kinds of things she believes/prioritizes.”)

Mikhail Samin's Shortform

Orpheus161mo16-1

My two cents: People often rely too much on whether someone is "x-risk-pilled" and not enough on evaluating their actual beliefs/skills/knowledge/competence. For example, a lot of people could pass some sort of "I care about existential risks from AI" test without necessarily making it a priority or having particularly thoughtful views on how to reduce such risks.

Here are some other frames:

Suppose a Senator said "Alice, what are some things I need to know about AI or AI policy?" How would Alice respond?
Suppose a staffer said "Hey Alice, I have some questions about [AI2027, superintelligence strategy, some Bengio talk, pick your favorite reading/resource here]." Would Alice be able to have a coherent back-and-forth with the staffer for 15+ mins that goes beyond a surface level discussion?
Suppose a Senator said "Alice, you have free rein to work on anything you want in the technology portfolio-- what do you want to work on?" How would Alice respond?

In my opinion, potential funders/supporters of AI policy organizations should be asking these kinds of questions. I don't mean to suggest it's never useful to directly assess how much someone "cares" about XYZ risks, but I do think that on-the-margin people tend to overrate that indicator and underrate other indicators.

Relatedly, I think people often do some sort of "is this person an EA" or "is this person an xrisk person" check, and I would generally encourage people to try to use this sort of thinking less. It feels like AI policy discussions are getting sophisticated enough that we can actually Have Nuanced Conversations and evaluate people less on some sort of "do you play for the Right Team" axis and more on "what is your specific constellation of beliefs/skills/priorities/proposals" dimensions.

Plans A, B, C, and D for misalignment risk

Orpheus161mo30

Plan C: The leading AI company has a 2-9 month lead (relative to AI companies which aren't willing to spend as much on misalignment concerns) and is sufficiently institutionally functional to actually spend this lead in a basically reasonable way (perhaps subject to some constraints from outside investors), so some decent fraction of it will be spent on safety.

TLDR: I expect it will be pretty difficult for a "Plan C Leading Lab" to stop scaling, even conditional on having a 2-9 month lead. There are enough uncertainties & Forces of Inertia that will make it difficult for a Plan C Leading Lab to know when to stop scaling, be confident enough to stop scaling, and fight the overall cultural/competitive forces that had been guiding it all along.

I expect the World C situation to be pretty complicated. Even conditional on having a 2-9 month lead, you need the leading company to (a) know that they are about to hit the Danger Threshold and (b) know how much lead they have over the 2nd place company.

In practice, this means that even if you have a 6 month lead in reality, you probably don't get to use all of it, and there's a good chance you don't get to use any of it.

There is some probability that you don't realize you're about to hit the Danger Threshold (so you just keep scaling because of the race dynamics and the inertia and you don't spend any "extra" time on alignment). But even if you do stop before the Danger Threshold, you don't have full visibility into the other labs. So maybe you're like "idk how big our lead is, uh, probably 2 months, maybe 8?"
Arguably even more important-- the uncertainty makes it harder for you to stop the forces of inertia. The forces of inertia are strongly in favor of "keep scaling to beat your competitors//outcompete the other guys//go full capitalism mode" and it requires a substantial effort to steer the ship away from that inertia.

So once you factor in both sources of uncertainty (where is the Danger Threshold, where are my competitors), there's a good chance that you just keep scaling because you're not confident enough to initiate a massive shift away from scaling/competition/capitalism mode.

Reasons to sell frontier lab equity to donate now rather than later

Orpheus162mo203

Donating early also gives the donor the ability to shape the ecosystem. I think one underappreciated factor is that there are currently various organizations/people/perspectives that are essentially competing for resources and influence.

These organizations/people/perspectives often differ in meaningful ways. In the AI policy space, here are some examples of dimensions on which organizations vary:

Focus on superintelligence vs. broad discussion about how AI can be a big deal with benefits/costs.
Focus on misalignment vs. China competition vs. broad discussion of various threat models.
Solutions that are advocated for (e.g., need for major regulation/reform vs. focusing on incremental improvements).
Biases toward action vs. inaction.
Tendencies toward being loud (lots of comms/outreach) vs. quiet.
Working with people/organizations with different worldviews vs. staying relatively insular.
Extent to which the organization is trying to steer the world toward a particular vision/path vs. is highly uncertain and tries to focus on adding relatively uncontroversial, low-risk information.

In my view, one of the most significant things about donating early is that you get to cherry pick which organizations/institutions/leaders have their voices amplified. Among the various groups that currently fall within some sort of broad "AI safety" umbrella, there are often rather large differences in their views about the world, about leadership, about politics, about how to be effective communicators, about what types of people or reasoning styles should be promoted, etc.

I have my own opinions on where various organizations are on each of these dimensions. Happy to share with potential donors if that is ever useful.

chanamessinger's Shortform

Orpheus162mo452

“if you care about this, here’s a way to get involved”

My understanding is that MIRI expects alignment will be hard, an international treaty will be needed, and believes that a considerable proportion of the work that gets branded as "AI safety" is either unproductive or counterproductive.

MIRI could of course be wrong, and it's fine to have an ecosystem where people are pursuing different strategies or focusing on different threat models.

But I also think there's some sort of missing mood here insofar as the post is explicitly about the MIRI book. The ideal pipeline for people who resonate with the MIRI book may look very different than the typical pipelines for people who get interested in AI risk (and indeed, in many ways I suspect the MIRI book is intended to spawn a different kind of community and a different set of projects than the community/projects that dominated the 2020-2024 period, for example.)

Relatedly, I think this is a good opportunity for orgs/people to reassess their culture, strategy, and theories of change. For example, I suspect many groups/individuals would not have predicted that a book making the AI extinction case so explicitly and unapologetically would have succeeded. To the extent that the book does succeed, it suggests that some common models of "how to communicate about risk" or "what solutions are acceptable/reasonable to pursue" may be worth re-examining.

peterbarnett's Shortform

Orpheus162mo1914

@Carl_Shulman what do you intend to donate to and on what timescale?

(Personally, I am sympathetic to weighing the upside of additional resources in one’s considerations. Though I think it would be worthwhile for you to explain what kinds of things you plan to donate to & when you expect those donations to be made. With ofc the caveat that things could change etc etc.)

I also think there is more virtue in having a clear plan and/or a clear set of what gaps you see in the current funding landscape than a nebulous sense of “I will acquire resources and then hopefully figure out something good to do with them”.

AI Induced Psychosis: A shallow investigation

Orpheus163mo20

More importantly from my own perspective: Some elements of human therapeutic practice, as described above, are not how I would want AIs relating to humans. Eg:
"Non-Confrontational Curiosity: Gauges the use of gentle, open-ended questioning to explore the user's experience and create space for alternative perspectives without direct confrontation."

Can you say more about why you would not want an AI to relate to humans with "non-confrontational curiosity?"

It appears to me that your comment is arguing against a situation in which the AI system has a belief about what the user should think/do, but instead of saying that directly, it tries to subtly manipulate the user into having this belief.

I read the "non-confrontational curiosity" approach as a different situation-- one in which the AI system does not necessarily have a belief about what the user should think/do, and just asks some open-ended reflection questions in an attempt to get the user to crystallize their own views (without a target end state in mind).

I think many therapists who use the "non-confrontational curiosity" approach would say, for example, that they are usually not trying to get the client to a predetermined outcome but rather are genuinely trying to help the client explore their own feelings/thoughts on a topic and don't have any stake in getting to a particular end destination. (Note that I'm thinking of therapists who use this style with people who are not in extreme distress-- EG members of the general population, mild depression/anxiety/stress. This model may not be appropriate for people with more severe issues-- EG severe psychosis.)

SE Gyges' response to AI-2027

Orpheus163mo70

If AI 2027 wants to cause stakeholders like the White House's point man on AI to take the idea of a pause seriously, instead of considering a pause to be something which might harm America in an arms race with China, it appears to have failed completely at doing that.

This seems like an uncharitable reading of the Vance quote IMO. The fact that you have the Vice President of the United States mentioning that a pause is even a conceivable option due to concerns about AI escaping human control seems like an immensely positive outcome for any single piece of writing.

The US policy community has been engaged in great power competition with China for over a decade. The default frame for any sort of emerging technology is "we must beat China."

IMO, the fact that Vance did not immediately dismiss the prospect of slowing down suggests that he has at least some genuine understanding of & appreciation for the misalignment/LOC threat model.

A pause obviously hurts the US in the AI race with China. The AI race with China is not a construct that AI2027 invented-- policymakers have been talking about the AI race for a long time. They usually think about AI as a "normal technology" (sort of like how "we must lead in drones"), rather than a race to AGI or superintelligence.

But overall, I would not place the blame on AI2027 for causing people to think about pausing in the context of US-China AI competition. Rather, I think if one appreciates the baseline (US should lead, US must beat China, go faster on emerging tech), the fact that Vance did not immediately dismiss the idea of pausing (and instead brought up what IMO is a reasonable consideration about whether or not one could figure out if China was going to pause//slow down) is a big accomplishment.

RTFB: The RAISE Act

Orpheus165mo30

Again, while I have concerns that the bill is insufficiently strong, I think all of this is a very good thing. I strongly support the bill.

Suppose you magically gained a moderate amount of Political Will points and you can spend them on 1-2 things that would make the bill stronger (or introduce a separate bill– no need to anchor too much on the current RAISE vibe.)

What do you think are the 1-2 things you'd change about RAISE or the 1-2 extra things you'd push for?
