Teaching myself about AI safety and thinking about how to save the world.
Like to have a chat? Me too! Please reach out / book a meeting: https://calendly.com/simon-skade/30min
I find the title misleading:
I don't agree that we sold our post as an argument for why timelines are short. Thus, I don't think this objection applies.
You probably mean "why timelines aren't short". I didn't think you explicitly presented it as an argument against short timelines, but because the post got so many upvotes I'm worried that many people implicitly perceive it as such, and the way the post is written contributes to that. But it's great that you changed the title; that already makes it a lot better!
That said, I do agree that the initial post deserves a much longer and nuanced response.
I don't really think the initial post deserves a nuanced response. (My response would have been "the >30% in 3-7 years claim is, compared to the current estimates of many smart people, an extraordinary claim that requires an extraordinary burden of proof, which isn't provided".) But I do think that the community (and especially EA leadership) should probably carefully reevaluate timelines (considering arguments for short timelines and how good they are), so it's great if you are planning to do a careful analysis of timeline arguments!
Yeah, I'm very frustrated about the way governments are structured in general. Couldn't we buy some land somewhere from some country to found our own country? Perhaps some place in (say) Canada with an area the size of (say) Tokyo, where almost nobody lives and we could just raise towns the way we like? Does anyone know if something like this is possible?
(I mean, we have some money and maybe could get other billionaires (or other people who would like to live there) to support the project. Being able to write the rules ourselves and design cities from scratch opens up so many nice opportunities. We could build such an awesome place to live in and offer many people or companies benefits, so it might actually be a great financial investment. (Though I admit I'm not being very concrete and am perhaps a bit overly optimistic, but I do think much would be possible.) We could almost live like in dath ilan (except that earth people wouldn't think in such nice ways as dath ilanis). (I'm aware that I'm probably just dreaming up an alternate optimistic reality, but I think it's at least worth checking whether it is possible, and if so, seriously considering it, though it would take a lot of time and it's not clear if it would be worth it, given that AGI may come relatively soon.))
I think this post is epistemically weak (which does not mean I disagree with you):
It's not that there's anything wrong with posting such a post saying you're willing to bet, as long as you don't sell it as an argument for why timelines aren't that short, or even more downstream things like what EA leadership should do. What bothers me isn't that this post got posted, but that it and the post it is counterbalancing received so many upvotes. LessWrong should be a place where good epistemics are very important, not where people cheer for their side by upvoting everything that supports their own opinion.
1. Convince a significant chunk of the field to work on safety rather than capability
2. Solve the technical alignment problem
3. Rethink fundamental ethical assumptions and search for a simple specification of value
4. Establish international cooperation toward Comprehensive AI Services, i.e., build many narrow AI systems instead of something general
I'd say that basically factors into "solve AI governance" and "solve the technical alignment problem", both of which seem extremely hard, but we need to try anyway. (In particular, points 3 & 4 are like instances of 2 that won't work. (Okay, maybe something like 4 has a small chance of being helpful.))
The governance part and the technical part aren't totally orthogonal. Making progress on one makes the other easier or buys more time.
(I'm not at all as pessimistic as Eliezer, and I totally agree with What an Actually Pessimistic Containment Strategy Looks Like, but I think you (like many people) seem to be too optimistic that something will work if we just try a lot. Thinking about concrete scenarios may help to see the actual difficulty.)
I think I weakly disagree with the implication that “distillation” should be thought of as a different category of activity from “original research”.
(I might be wrong, but) I think there is a relatively large group of people who want to become AI alignment researchers that just wouldn't be good enough to do very effective alignment research, and I think many of those people might be more effective as distillers. (And I think distillers (and teachers for AI safety) as occupation is currently very neglected.)
Similarly, there may also be people who think they aren't good enough for alignment research, but may be more encouraged to just learn the stuff well and then teach it to others.
By the way, a bit late, but if people are interested in reading my proposal, it's here: https://docs.google.com/document/d/1kiFR7_iqvzmqtC_Bmb6jf7L1et0xVV1cCpD7GPOEle0/edit?usp=sharing
It fits into the "Strategy: train a reporter that is useful for another AI" category, and solves the counterexamples that were proposed in this post (unless I missed something and it is actually harder to defend against the steganography example, but I think not). (It won $10000.) It also discusses some other possible counterexamples, but not extensively, and I haven't found a very convincing one. (Which does not mean there is no very convincing one, and I'm also not sure how promising I find the method in practice.)
Overall, perhaps worth reading if you are interested in the "Strategy: train a reporter that is useful for another AI" category.
If I knew as a certainty that I cannot do nearly as much good some other way, and I was certain that taking the pill causes that much good, I'd take the pill, even if I die after the torture and no one will know I sacrificed myself for others.
I admit those are quite unusual values for a human, and I'm not arguing that it would be rational because of utilitarianism or the like, just that I would do it. (It's possible that I'm wrong, but I think it's very likely I'm not.) Also, I see that, given the way my brain is wired, outer optimization pushes against that policy, and I think I probably wouldn't be able to take the pill a second time under the same conditions (given that I don't die after the torture), or at least not often.
For people like me who are really slow on the uptake with things like this, and realize the pun randomly a few hours later while doing something else: the pun is on Goodhart (from Goodhart's law). (I don't think much about what words sound like, and I just read over "Good Hearts Laws" as something not particularly interesting, so I guess that's why I hadn't noticed.)
Ah, makes sense.