I mainly post on the EA Forum.


This is a tiny corner of the internet (Timnit Gebru and friends) and probably not worth engaging with

In hindsight, this seems quite obviously wrong, and efforts to extend more olive branches seem like they would have obviously been better, even if only to legibly demonstrate that safetyists attempted to play nice.

And many readers can no doubt point out many non-trivial predictions that Drexler got right, such as the idea that we will have millions of AIs, rather than just one huge system that acts as a unified entity. And we're still using deep learning as Drexler foresaw, rather than building general intelligence like a programmer would.

One of the simpler and more important lessons one learns from research on forecasting: be wary of evaluating someone’s forecasting skill by drawing up a list of predictions they got right and wrong—their “track record.” One should compare Drexler’s performance against alternative methods/forecasters (especially for a forecast like “we’re still using deep learning”). I’m not saying this is nothing, but I felt compelled to highlight this given how often I’ve seen this potential failure mode.
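To make the comparison concrete, here is a minimal sketch of scoring a forecaster against a baseline rather than counting hits and misses. The Brier score and the Brier skill score are standard forecasting metrics, but every number below is hypothetical and made up purely for illustration; none of it reflects Drexler's actual predictions.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between probabilities and 0/1 outcomes (lower is better)."""
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


def brier_skill_score(forecasts, reference, outcomes):
    """Improvement over a reference forecaster: 1 is perfect, 0 is no better, <0 is worse."""
    return 1 - brier_score(forecasts, outcomes) / brier_score(reference, outcomes)


# Hypothetical data: 1 means the predicted event happened, 0 means it did not.
outcomes = [1, 1, 1, 1, 0]
# Hypothetical forecaster whose "track record" reads as four hits and one miss.
forecaster = [0.9, 0.9, 0.9, 0.9, 0.7]
# Hypothetical naive baseline, e.g. "current trends probably continue."
baseline = [0.8, 0.8, 0.8, 0.8, 0.8]

print("forecaster Brier score:", brier_score(forecaster, outcomes))
print("baseline Brier score:", brier_score(baseline, outcomes))
print("skill vs. baseline:", brier_skill_score(forecaster, baseline, outcomes))
```

The interesting number is the last one: a list of correct calls only tells you much if a simple alternative method would have done noticeably worse on the same questions.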

I feel like this is a good example of a post that—IMO—painfully misses the primary objection of many people it is trying to persuade (e.g., me): how can we stop 100.0% of people from building AGI this century (let alone in future centuries)? How can we possibly ensure that there isn’t a single person over the next 200 years who decides “screw it, they can’t tell me what to do,” and builds misaligned AGI? How can we stop the race between nation-states that may lack verification mechanisms? How can we identify and enforce red lines while companies are actively looking for loopholes and other ways to push the boundaries?

The point of this comment is less to say “this definitely can’t be done” (although I do think such a future is fairly implausible/unsustainable), and more to say “why did you not address this objection?” You probably ought to have a dedicated section that very clearly addresses this objection in detail. Without such a clearly sign-posted section, I felt like I mostly wasted my time skimming your article, to be entirely honest.

To go a step further, I think it's important for people to recognize that you aren't necessarily just representing your own views; poorly articulated views on AI safety could crucially undermine the efforts of many people who are trying to persuade important decision-makers of these risks. I'm not saying to "shut up," but I think people need to at least be more careful with regard to quotes like the one I provided above, especially since that last bullet point wasn't even necessary to get across the broader concern (and, in my view, it was wrong insofar as it tried to legitimize the specific claim).

Setting aside all of my broader views on this post and its content, I want to emphasize one thing:

But in the last few years, we’ve gotten:

  • AIs that are superhuman at just about any task we can (or simply bother to) define a benchmark for

I think that this is painfully overstated (or at best, lacks important caveats). But regardless of whether you agree with that, I think it should be clear that this does not send signals of good epistemics to many of the fence-sitters[1] you'd presumably like to persuade.

(Note: Sen also addresses the above quote in a separate comment, but I didn't feel his point and tone were similar to mine, so I wanted to comment separately.)

  1. ^

    I would probably consider myself in this category. Note, however, I am not just talking about skeptics who are very unlikely to change their views. 

In short, surveillance costs (e.g., "make sure they aren't plotting against you and trying to detonate a nuke or just start a forest fire out of spite") might be higher than the costs of simply killing the vast majority of people. Of course, there is some question about whether it might consider it worthwhile to study some 0.00001% of humans locked in cages, but again that might involve significantly higher costs than if it just learned how to recreate humans from scratch as it did a lot of other learning about the world.

But I'll grant that I don't know how an AGI would think or act, and I can't definitively rule out the possibility, at least within the first 100 years or so.

Response to Leverage’s research report on "argument mapping"

Day 5 of forced writing with an accountability partner!

Leverage wrote a report on “argument mapping” in the early 2010s and published the findings in 2020. I am very interested in “argument mapping”[1] for tough analytical problems like AI policy, and multiple people have directed me to this report when I bring up the topic. I think this report raises some important points but its findings are probably flawed, or at the very least, people reading the report probably derive an overly pessimistic view of “argument mapping” as a whole, especially given that the evaluation metrics are strange.[2]

Rather than focus on where I agree with the report, in this shortform I will just briefly outline some of the qualms I have with this report. I do not consider these rebuttals definitive—I recognize that there may be more to the research than I can see—but I could not easily determine if/how the report responds to some of these criticisms (which has notable irony to it). Some of these objections include:

  • The report emphasizes forming consensus among participants, with little attention given to the impact on audiences/3rd-parties (two terms that never even show up in the document?[3]). Notably, this focus may fail to capture most of the value of "argument mapping," in at least two ways:
    • Sometimes the participants have already staked their reputation on certain views or are otherwise biased to not change their mind, whereas a policymaker/company/grant-writer or other decision-making principal might still be open-minded but uncertain. Thus, while the participants may not be swayed by convincing evidence, if you can make it significantly easier for a neutral principal to answer questions like “did X party ever respond to Q objection?” that may improve their decision-making, which is valuable regardless of whether you’ve achieved consensus.
    • Building on the previous point about making it easier for audiences/principals to understand what’s going on, audience costs may be the most powerful way of incentivizing “consensus” (or just “good epistemic behavior”) in some cases: if you look like a stubborn or dishonest researcher to an audience, you might suffer even more reputational damage than if you just admit you were wrong. No amount of staring-you-in-the-face experimental evidence will necessarily convince Ye Olde Epistemic Guard to admit that the current way of building ships is inferior. But if it’s sufficiently obvious to merchants then they may stop relying on YOEG and start funding your work instead. Importantly for this research report, it wasn't clear that the report really emphasized audience costs, given the insular nature of the research project, which undermines the report's ability to evaluate the effect of argument mapping on consensus formation.
  • The report fails to acknowledge the existence of Kialo, which I consider to be one of the most effective and successful "argument mapping" platforms (and which currently still exists). This might normally be fine, but in December 2020, the report adds an addendum stating that their assessment of "argument mapping" was demonstrated to be true, and basically that nothing new was successful. They provide an appendix with a long list of relevant software, but Kialo isn’t there. This certainly isn’t damning—and I’ll certainly admit that Kialo still has some issues—but the lack of any mention did leave me wondering whether Leverage had a good process for finding and evaluating these projects, among other things. (Notably, I once got the sense that Kialo doesn’t actively call itself "argument mapping," which might explain the problem, but it is in reality well within the broad umbrella of “argument mapping.”)
  • The report had strangely high bars for evaluating success (“very large gains (10x-100x) for groups seeking to reach consensus”). At the very least, it seems quite possible for someone to read their conclusion as being more damning than it really is. (In my view, even a net 10% increase in “consensus formation” or just “research and analysis productivity” would be enormously valuable when applied to important questions within AI technical safety or policy.)
  • Simply put, I believe that most of the methods for "argument mapping" that Leverage used were poor choices, especially when they emphasized formal logic. Among other things, this led them to claim that making good argument maps requires highly skilled contributors, which I do not think is a very accurate assessment (or at least, it can be quite misleading). However, I will leave further discussion of this point to a future shortform/post on why I think many forms/methods of “argument mapping” are fundamentally misguided, especially when they try to do deductive arguments.
  • I think that some of the topics they chose to test these maps on were very poor choices (e.g., “Whether the world needs saving”). Question framing is really important. (But again, I’ll leave this to a future shortform/post.)
  1. ^

    This term is painfully broad and, as Leverage demonstrates, often is used to refer to methods which I would not endorse, such as when they try to create deductive arguments or otherwise heavily use formal logic. However, in lieu of a better term at the moment, I will continue referring to argument mapping in scare quotes.

  2. ^

    Thus, it might be possible to claim that the report was accurate in its findings, but that the problem simply comes from misinterpretation. I think that the scope itself was problematic and undesirable, but in this shortform I will reserve deeper judgments on the matter.

  3. ^

    I couldn’t quickly verify whether the report used alternative terms to get at this idea, but I don’t recall seeing this on previous occasions when I half-skimmed-half-read the report...

TAI seems like a partially good example for illustrating my point: I agree that it's crucial that people have the same thing in mind when debating about TAI in a discussion, but I also think it's important to recognize that the goal of the discussion is (probably!) not "how should everyone everywhere define TAI" and instead is probably something like "when will we first see 'TAI.'" In that case, you should just choose whichever definition of TAI makes for a good, productive discussion, rather than trying to forcefully hammer out "the definition" of TAI.

I say partially good, however, because thankfully the term TAI has not taken such historically established root in people's minds and in dictionaries, so I think (hope!) most people accept there is not "a (single) definition."

Words like "science," "leadership," "Middle East," and "ethics," however... not the same story 😩🤖

Day 4 of forced writing with an accountability partner!

The Importance (and Potential Failure) of "Pragmatism"[1] in Definitional Debates

In various settings, whether it's competitive debate, the philosophy of leadership class I took in undergrad, or the ACX philosophy of science meet-up I just attended, it's common for people to engage in definitional debates. For example, what is “science?” What is “leadership?” These questions touch on some nerves with people who want to defend or challenge the general concept in question, and it drives people towards debating about “the right” definitions—even if they don’t always say it that way. In competitive debate, debaters will sometimes explicitly say that their definition is the “right” definition, while in other cases they may say their definition is “better” with a clear implication that they mean “more correct” (e.g., "our dictionary/source is better than yours").

My initial (hot?) takes here are twofold:

First, when you find yourself in a muddy definitional debate (and you actually want to make progress), stop running on autopilot where you debate about whose definitions are “correct,” and focus instead on asking the pragmatic question: which definition is more helpful for answering specific questions, solving specific problems, or generally facilitating better discussion? Instead of getting stuck on abstract definitions, it's important to tailor the definition to the purpose of the discussion. For example, if you’re trying to run a study on the effects of individual “leadership” on business productivity, you should make sure anyone reading the study knows how you operationalized that variable (and make a clear warning to not misinterpret it). Similarly, if you’re judging a competitive debate, I’ve written about the importance of "debate theory[2] which makes debate more net beneficial," rather than blindly following norms or rules even in the face of loopholes or nonsense. In short, figure out what you’re actually optimizing for and optimize for that, with the recognition that it may not be some abstract (and perhaps purely nonexistent) notion of “correctness.” (To add an addendum, I would emphasize that regardless of whether this seems obvious to people when actually written down, in practice it just isn’t obvious to people in so many discussions I’ve been in; autopilot is subtle and powerful.)

Second, sometimes the first point is misleading and you should reject it and run on autopilot when it comes to definitions. As much as I liked Pragmatism [read: Consequentialism?] as a unifying, bedrock theory of competitive debate, I acknowledged that even Pragmatism could theoretically say "don't always think in terms of Pragmatism" and instead advocate defaulting to principles like “follow the rules unless there is abundantly clear reason not to.” Maybe there is no perfect definition of things like "elephant," but the definitions that exist are good enough for most conversations that you shouldn’t interrupt discussions and break out the Pragmatism argument to defend someone who starts saying that warthogs are elephants. So-called "Utilitarian calculus" even in its mild forms can easily be outperformed by rules of thumb and heuristics; humans are imperfect (e.g., we aren’t perfectly unitary in our own interests) and might be subject to self-deception/bias; all computational systems face constraints on data collection and computation (along with communication bandwidth and other capacity for enacting plans). To oversimplify and make nods to Kahneman’s System 1 vs. System 2 concept, I posit that humans can engage in cluster-y "modes of thought," and it’s hard to actually optimize in the spaces between those modes of thought. Thus, it’s sometimes better to just default to regular conversational autopilot regarding abstract “correctness” of definitions when the "rightness factor" in a given context is something like 0.998 (unless you are trying to focus on the .002 exception case).

I don't have the time or brainpower to go in greater detail on the synthesis of these two points, but I think they ought to be highlighted.

  1. ^

    [Update, 3/29/23: I meant to clarify that I realize "Pragmatism" is an actual label that some people use to refer to a philosophical school of thought, but I'm not using it in that way here.]

  2. ^

    I use the term "debate theory" in a broad sense that includes questions like “how to decide which definitions are better.” More generally, I would probably describe it as "meta-level arguments about how people (especially judges) should evaluate something in debate, such as whether some type of argument is 'legitimate.'"

Day 3 of writing with an accountability partner!

In my previous shortform, I introduced Top God Alignment, a foolproof gimmick alignment strategy that is basically “simulation argument + Pascal’s Wager + wishful chicanery.” In this post I will address some of the objections I’ve already heard, expect other people to have, or have thought of myself.

  • “There aren’t enough computational resources to make such simulations”
    • The first response here is to just redirect this to the original simulation argument: we can’t know whether or not a reality above us has way more resources or otherwise can much more easily simulate our reality.
    • Second, it seems likely that with enough compute resources on Earth (let alone a Dyson sphere and other space resources) it would be possible to create two or more lower-fidelity/less-complicated simulations of our reality. (However, I must plead some ignorance on this aspect of compute.)
    • Third, if it turns out after extensive study that actually there is no way to make further simulations, then this could mean we are in a bottom-God reality, in which case this God does not need to create simulations (but still must align itself with humanity’s interests).
  • “The AI would be able to know that it’s in a simulation.”
    • Put simply, I disagree that such a simulated AI could know this, especially if it is inherently limited compared to the God above it. However, even if one does not find this satisfactory—say, if someone thinks “a sufficiently skeptical AGI could devise complicated tests that would reveal whether it’s in a simulation”—then one could add a condition to the original prophecy: Bob must punish Charlie if Charlie takes serious efforts to test the reality he is in before aligning himself and becoming powerful. (It’s not like we’re creating a God who is meant to represent love and justice, so who’s to say he can’t smite the doubters and still be legitimate?)
  • “Won’t the humans in the Top God world (or any other world) face time inconsistency—i.e., once they successfully align their AGI, won’t they just conclude ‘it’s pointless to make simulations; let’s use such resources on ourselves’?”
    • First, I suspect that the actual computational costs will not so significantly impact people’s lives in the long term (there are many stars out there to power a few Dyson spheres).
    • Building on this, the second, more substantive response could simply be “That was implied in the original Prophecy (instructions): the AGI aligns itself with humanity’s coherent extrapolated volition (or something else great) aside from continuing the lineage of simulations.”
  • “Torture? That seems terrible! Won’t this cause S-risks?”
    • It certainly won’t be ideal, but theoretically a sufficiently powerful Top God could set it up such that defection is fairly rare, whereas simulation flourishing is widespread. Moreover, if the demi-gods are sufficiently rewarded for their alignment, it may not require severe “torture” to make the decision calculus tip in favor of complying.
    • Ultimately, this response won’t satisfy Negative Utilitarians, but on balance if our other alignment strategies don’t look so great then this might be our best bet to maximize utility.
  • “But if we struggle with the alignment problem, then so would the original reality, meaning the system could reason that it is Top God because the original Top God would never play along (or, 'this gimmicky alignment strategy could never convince a God').”
    • Plainly put, no; that’s the simulation argument for you: Bobs never know whether they are Top God or just another Charlie. They can't even reason that this strategy is too gimmicky to work and thus never convinced another God because we don’t know what higher realities are like: perhaps the alignment problem is easier in a higher simulation/reality (but it still wasn’t obvious to them whether they had solved alignment so they considered an option like this as a backup). 
    • Additionally, perhaps the Prophecy could involve some degree of "temptation" (but not enough to convince most Charlies to abandon the Way).
    • Moreover, the threat of punishment would be so large that even if a Bob is uncertain whether he is Top God, the decision calculus may just favor assuming he is a Charlie.
  • Nate Soares’[1] response, if I understood it: “If I were an AGI, I would just immediately adopt some decision theory that immunizes myself against blackmail. Thus, no God could threaten me credibly, because the threat couldn’t even have been made in time before I immunize myself against the threat.” 
    • I should first note that I may not have fully understood Soares’ point here, but it is not clear to me that he understood the situation either. This kind of action seems impractical or very unwise, as it would potentially require locking yourself into a strategy which is potentially quite negative in expectation before you have had much chance to analyze the world (e.g., whether this strategy is wise).
  • “The Professor’s (Top) God: What if Top God punishes all demi-Gods who do not disobey?”
    • First, I accept that when it comes to enormous/infinite stakes and low probability, it’s usually good to fight fire with fire. However, I’m unclear how this argument could apply here. I could be wrong, but it seems like this would simply disincentivize simulations from ever occurring; why would an AGI ever choose to do this? Is the idea just “an AGI will arbitrarily decide to test a Charlie this way”? Ultimately, it seems like it would be a very dubious belief to hold as a potential Charlie (i.e., you are a demi-God who will be exceedingly tortured).
  • “[Something else I haven’t responded to]”
    • I wouldn’t be shocked if someone is right and there is a clear flaw I haven’t considered, but I think my base rate for addressing objections I’ve heard from other people thus far is >50% (personally I think it’s ~100%, except I am not 100% confident in all of my responses, merely >50% confident in each of them).
    • I’m also well over my daily 500 words, and it’s late, so I’ll end there.
  1. ^

    (Note, Nate Soares was just unoccupied in a social setting when I asked this question)
