Simon Skade

Teaching myself about AI safety and thinking about how to save the world. Like to have a chat? Me too! Please reach out / book a meeting: https://calendly.com/simon-skade/30min

Comments

If you’re very optimistic about ELK then you should be optimistic about outer alignment

I find the title misleading:

  1. I think you should remark in the title (and the post) that you're only talking about outer alignment.
  2. The "if you're very optimistic" sounds as if there is reason to be very optimistic. I'd rather phrase it as "if ELK works, we may have good chances for outer alignment".
A concrete bet offer to those with short AI timelines

I don't agree that we sold our post as an argument for why timelines are short. Thus, I don't think this objection applies.

You probably mean "why timelines aren't short". I didn't think you explicitly saw it as an argument against short timelines, but because the post got so many upvotes, I'm worried that many people implicitly perceive it as such, and the way the post is written contributes to that. But it's great that you changed the title; that already makes it a lot better!

That said, I do agree that the initial post deserves a much longer and nuanced response.

I don't really think the initial post deserves a nuanced response. (My response would have been: "Compared to the current estimates of many smart people, the claim of >30% within 3-7 years is an extraordinary claim that requires extraordinary evidence, which isn't provided.")
But I do think that the community (and especially EA leadership) should probably carefully reevaluate timelines (considering the arguments for short timelines and how good they are), so it's great if you are planning to do a careful analysis of timeline arguments!

Ideal governance (for companies, countries and more)

Yeah, I'm very frustrated with the way governments are structured in general. Couldn't we buy some land from some country to found our own country? Perhaps some place in (say) Canada with the area of (say) Tokyo, where almost nobody lives and we could just build towns the way we like? Does anyone know if something like this is possible?

(I mean, we have some money and could maybe get other billionaires (or other people who would like to live there) to support the project. Being able to write the rules ourselves and design cities from scratch opens up so many nice opportunities. We could build such an awesome place to live in and offer many people and companies benefits, so it might actually be a great financial investment. (Though I admit I'm not being very concrete and am perhaps a bit overly optimistic, but I do think much would be possible.) We could almost live like in dath ilan (except that Earth people wouldn't think in such nice ways as dath ilanis). (I'm aware that I'm probably just dreaming up an alternate optimistic reality, but I think it's at least worth checking whether it is possible, and if so, seriously considering it, though it would take a lot of time and it's not clear whether it would be worth it, given that AGI may come relatively soon.))

A concrete bet offer to those with short AI timelines

I think this post is epistemically weak (which does not mean I disagree with you):

  1. Your post pushes the claim that it would not be wise for EA leadership to "pull the short-timelines fire alarm". Problems with this line of argument: (1) "pulling the short-timelines fire alarm" isn't well-defined in the first place, (2) there is a huge inferential gap between "AGI won't come before 2030" and "EA shouldn't pull the short-timelines fire alarm" (which could mean something like, e.g., EA starting to plan a Manhattan project for aligning AGI in the next few years), and (3) your statement "we are concerned about a view of the type expounded in the post causing EA leadership to try something hasty and ill-considered", which only slightly addresses that inferential gap, is just a bad rhetorical move where you interpret what the other person said in a very extreme and bad way, although they didn't actually mean that, and you are definitely not seriously considering the pros and cons of taking more initiative. (Though of course it's not really clear what "taking more initiative" means, and critiquing the other post (which IMO was epistemically very bad) would be totally right.)
  2. You're not giving a reason why you think timelines aren't that short, only saying you believe it enough to bet on it. IMO, simply saying "compared to the current estimates of many smart people, the claim of >30% within 3-7 years is an extraordinary claim that requires extraordinary evidence, which isn't provided" would have been better.
  3. Even if not explicitly stated, and even if not endorsed by you, your post implicitly promotes the statement "EA leadership doesn't need to shorten its timelines". I'm not at all confident about this, but it seems to me like EA leadership acts as if we have pretty long timelines, significantly longer than your bets would imply. (Given the way the post is written, you should at least have explicitly pointed out that it doesn't imply EA leadership's timelines are already short enough.)
  4. AGI timelines are so difficult to predict that prediction markets might be vastly outperformed by a few people with very deep models of the alignment problem, like Eliezer Yudkowsky or Paul Christiano, so even if we took many such bets in the form of a prediction market, that wouldn't be strong evidence that the resulting estimate is good, or at least the estimate would be extremely uncertain.
    (I'm not at all saying that taking bets is bad, though the doom factor does indeed make taking bets difficult.)

There's nothing wrong with posting such a bet offer, as long as you don't sell it as an argument for why timelines aren't that short, or for even more downstream claims like what EA leadership should do. What bothers me isn't that this post got posted, but that it and the post it is counterbalancing received so many upvotes. LessWrong should be a place where good epistemics are very important, not one where people cheer for their side by upvoting everything that supports their own opinion.

The case for Doing Something Else (if Alignment is doomed)

  1. Convince a significant chunk of the field to work on safety rather than capability
  2. Solve the technical alignment problem
  3. Rethink fundamental ethical assumptions and search for a simple specification of value
  4. Establish international cooperation toward Comprehensive AI Services, i.e., build many narrow AI systems instead of something general

I'd say that basically factors into "solve AI governance" and "solve the technical alignment problem", both of which seem extremely hard, but we need to try anyway.
(In particular, points 3 and 4 are essentially instances of 2 that won't work. (OK, maybe something like 4 has a small chance of being helpful.))

The governance part and the technical part aren't totally orthogonal: making progress on one makes the other easier or buys more time.

(I'm not nearly as pessimistic as Eliezer, and I totally agree with What an Actually Pessimistic Containment Strategy Looks Like, but I think you (like many people) seem too optimistic that something will work if we just try a lot. Thinking about concrete scenarios may help you see the actual difficulty.)

Call For Distillers

I think I weakly disagree with the implication that “distillation” should be thought of as a different category of activity from “original research”.

(I might be wrong, but) I think there is a relatively large group of people who want to become AI alignment researchers but just wouldn't be good enough to do very effective alignment research, and many of those people might be more effective as distillers. (And I think distillation (and teaching AI safety) is currently a very neglected occupation.)

Similarly, there may also be people who think they aren't good enough for alignment research, but who might feel more encouraged to simply learn the material well and then teach it to others.

ELK prize results

Btw, a bit late, but if people are interested in reading my proposal, it's here: https://docs.google.com/document/d/1kiFR7_iqvzmqtC_Bmb6jf7L1et0xVV1cCpD7GPOEle0/edit?usp=sharing

It fits into the "Strategy: train a reporter that is useful for another AI" category, and it handles the counterexamples that were proposed in this post (unless I missed something and it is actually harder to defend against the steganography example, but I think not). (It won $10,000.) It also discusses some other possible counterexamples, but not extensively, and I haven't found a very convincing one. (Which does not mean there is no very convincing one, and I'm also not sure how promising I find the method in practice.)

Overall, perhaps worth reading if you are interested in the "Strategy: train a reporter that is useful for another AI" category.

MIRI announces new "Death With Dignity" strategy

If I knew with certainty that I could not do nearly as much good some other way, and I was certain that taking the pill would cause that much good, I'd take the pill, even if I died after the torture and no one would ever know I sacrificed myself for others.

I admit those are quite unusual values for a human, and I'm not arguing that it would be rational because of utilitarianism or the like, just that I would do it. (It's possible that I'm wrong, but I think it's very likely I'm not.) Also, I can see that, given the way my brain is wired, outer optimization pushes against that policy, and I probably wouldn't be able to take the pill a second time under the same conditions (assuming I didn't die after the torture), or at least not often.

Replacing Karma with Good Heart Tokens (Worth $1!)

For people like me who are really slow on the uptake with things like this and realize the pun randomly a few hours later while doing something else: the pun is a reference to Goodhart (from Goodhart's law). (I don't think much about what a word sounds like, and I just read past the "Good Hearts Laws" as something not particularly interesting, so I guess that's why I hadn't noticed.)
