Ebenezer Dukakis — LessWrong

alignment equivalents to "make a trillion dollars" for capabilities that are easy to verify, strictly imply alignment, and extremely difficult to get any traction on (and with it, a series of weakenings of such a metric that are easier to get traction on but also less-strictly imply alignment).

I expect there's a fair amount of low-hanging fruit in finding good targets for automated alignment research. E.g. how about an LLM agent which reads 1000s of old LW posts looking for a good target? How about unlearning? How about a version of RLHF where you show an alignment researcher two AI-generated critiques of an alignment plan, and they rate which critique is better?

The Most Common Bad Argument In These Parts

Ebenezer Dukakis21d190

I believe that in Thinking, Fast and Slow, Kahneman refers to this fallacy as "What You See Is All There Is" (WYSIATI). It also used to be common for people to talk about "Unknown Unknowns" (things you don't know, that you also don't know you don't know).

MAGA speakers at NatCon were mostly against AI

Ebenezer Dukakis2mo132

To me, if someone posts an intense-feeling negatively worded text in response to what other people are doing, it usually signals that there is something they care about that they perceive to be threatened. I’ve found it productive to try to relate to that first, before responding. Jumping to enforcing general rules stipulated somewhere in the community, and then implying that the person not following those rules is not harmonious with or does not belong to the community, can get counterproductive.

I'm a bit concerned about a situation where "insiders" always get this sort of contextual benefit-of-the-doubt, and "outsiders" don't.

MAGA speakers at NatCon were mostly against AI

Ebenezer Dukakis2mo100

Just to clarify here, I have no issue with you thinking the post is bad. That seems beside the point to me. My issue is with you doing much of what you accuse Miller of doing.

Insults: "This post seems completely insane to me, as do people who unquestionable retweet it."

Aggression: "I cannot believe I have to argue for this... [cursing]..."

Sneering: "Has anyone who liked this actually read this post? How on earth is this convincing to anyone?"

Note also that the discussion around the Bentham post was previously calm and friendly. You walked in and dramatically worsened the discourse quality. By contrast, Geoffrey engages on hot-button political topics where discussion is already very heated.

As a relatively objective measure: with a quick search of all 80K Geoffrey Miller tweets, I was only able to find one non-quoted f-bomb ("Fuck the Singularity.").

Your tweets appear to have a somewhat larger number of them, and they're often directed at individuals rather than abstract concepts. "Fuck them", "fuck you", "fuck [those people]".

As a matter of simple intellectual honesty, it would be nice if you could acknowledge that you engage in insults and aggressive behavior on Twitter. You might be doing it less than Geoffrey does. You might express it in a different way. But it's just a question of degree, as far as I can tell. I really don't think you have much moral high ground here.

You also have far fewer tweets than Geoffrey does (factor of ~16 difference). So it's not just that you've dropped more f-bombs than him; your density of f-bombs appears to be far higher.

MAGA speakers at NatCon were mostly against AI

Ebenezer Dukakis2mo*93

Keep in mind that US conservatives are liable to be reading this thread, trying to determine whether they want to ally with a group such as yourselves. Conservatives have much more leverage to dictate alliance terms than you do. Note the alliance with the AI art people was apparently already wrecked. Something you might ask yourselves: If you can't make nice with a guy like me, who shares more of your ideals than either artists or US conservatives do, how do you expect to make nice with US conservatives?

MAGA speakers at NatCon were mostly against AI

Ebenezer Dukakis2mo9-7

It's not a norm of discourse that one cannot state that a position is absurd.

Speaking as someone who makes very little effort to avoid honey consumption, my opinion of Habryka would have dropped much less if he'd said something like: "Sorry, this position is just intuitively absurd to me, and I'm happy to reject it on that basis." So I don't think the issue has to do with absurdity per se.

I said I thought he violated "what I'd consider reasonable norms of discourse". You can see Ben West thought something similar.

I'd estimate that Habryka violated roughly 7 or 8 of the Hacker News commenting guidelines in that discussion.

Your ideas about reasonable discourse can be different from mine, and Ben West's, and Hacker News'. That's OK. I was just sharing my opinion.

It's been a while since I read that discussion. I remember my estimation of Habryka dropped dramatically when I read it. Maybe I can try to reconstruct why in more detail if you want. But contrasting what Habryka wrote with the HN commenting guidelines seems like a reasonable starting point.

And it is a virtue of discourse to show up and argue for one's stances, as Habryka does throughout that thread!

You'll notice that Habryka doesn't provide any concrete example of Geoffrey violating a norm of reasonable discourse in this thread. I did provide a concrete example.

Is it possible that invocation of such "norms" can be a mere figleaf for drawing ingroup/outgroup boundaries in the traditional tribalistic way?

Is it too much to ask that Dear Leadership is held to the same standards, and treated the same way, as everyone else is?

MAGA speakers at NatCon were mostly against AI

Ebenezer Dukakis2mo3-15

I've seen Eliezer violate what I'd consider norms of reasonable discourse on Twitter. You too.

Banning Said Achmiz (and broader thoughts on moderation)

Ebenezer Dukakis2mo42

Before you quit, maybe we can create a wiki page of people who left, with contact information, to open the door for a refugee forum at some point in the future?

The Problem

Ebenezer Dukakis3mo*5-7

Of the clever solutions you invented and tested within the survivable regime, 2/3rds of them survive the 6 changes you didn't see coming, 1/3rd fail. Now you're dead.

It seems unreasonable to conclude we're now dead, if 2/3rds of our solutions survived the 6 changes we didn't see coming.

The success of a single solution should ideally be more of a sufficient condition for success, rather than a necessary condition. (Note this is plausible depending on the nature of the "solutions". Consider a simple "monitors for bad thoughts" model. If even a single monitor flags bad thoughts, we can instantly pull the plug and evaluate. A malicious AI has to bypass every single monitor to execute malice. If a single monitor works consistently and reliably, that ends up being a sufficient condition for overall prevention of malice.)

If you're doing this right, your solutions should have a lot of redundancy and uncorrelated failure modes. 2/3rds of them working should ideally be plenty.
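To make the redundancy point concrete, here is a minimal sketch of the arithmetic. It assumes each monitor misses independently of the others, which is the idealized "uncorrelated failure modes" case; the function names and the miss probabilities are hypothetical and chosen only for illustration.

```python
import math


def p_all_monitors_miss(miss_probs):
    """Probability that every monitor misses, ASSUMING independent failures.

    miss_probs: per-monitor probability of failing to flag bad behavior
    (hypothetical numbers, not measurements).
    """
    return math.prod(miss_probs)


def p_at_least_one_catches(miss_probs):
    """Probability that at least one monitor flags, under the same assumption."""
    return 1.0 - p_all_monitors_miss(miss_probs)


if __name__ == "__main__":
    # Three monitors, each failing 1/3 of the time (illustrative only).
    probs = [1 / 3, 1 / 3, 1 / 3]
    print(p_all_monitors_miss(probs))
    print(p_at_least_one_catches(probs))
```

Under independence, the chance that all monitors miss shrinks geometrically with the number of monitors. If failures are strongly correlated (for example, all monitors break under the same unforeseen change), the product no longer applies and the benefit can largely vanish, which is why the uncorrelated-failure condition carries most of the weight here.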

[Edit: I notice people disagreevoting this. I'm very interested to learn why you disagree, either in this comment thread or via private message.]

Ebenezer Dukakis's Shortform

Ebenezer Dukakis4mo2710

A few months ago, someone here suggested that more x-risk advocacy should go through comedians and podcasts.

YouTube just recommended this Joe Rogan clip to me from a few days ago: The Worst Case Scenario for AI. Joe Rogan legitimately seemed pretty freaked out.

@So8res maybe you could get Yampolskiy to refer you to Rogan for a podcast appearance promoting your book?
