LessWrong team member / moderator. I've been a LessWrong organizer since 2011, with roughly equal focus on the cultural, practical and intellectual aspects of the community. My first project was creating the Secular Solstice and helping groups across the world run their own version of it. More recently I've been interested in improving my own epistemic standards and helping others to do so as well.
A while ago I wrote:
There's a frame where you just say "no, rationality is specifically about being a robust agent. There are other ways to be effective, but rationality is the particular way of being effective where you try to have cognitive patterns with good epistemology and robust decision theory."
This is in tension with the "rationalists should win" thing. Shrug.
I think it's important to have at least one concept that is "anyone with goals should ultimately be trying to achieve them the best way possible", and at least one concept that is "you might consider specifically studying cognitive patterns and policies and a cluster of related things, as a strategy to pursue particular goals."
Just had a thought that you might carve this into something like "short-term rationality" and "long-term rationality", where short-term is "what cognitive algorithms will help me right now (to systematically achieve my goals, given my current conditions and skills)", and long-term rationality is like "what cognitive-algorithm and metacognitive practices would be worth investing in for the long term?"
Part of the deal of being allies is that you don't have to be allies about everything. I don't think they particularly need to do anything to help with technical safety (there just need to be people who understand and care about that somewhere). I'm pretty happy if they're just on board with "stop building AGI" for whatever reason.
I do think they eventually need to be on board with some version of handling the intelligence curse (I didn't know that term; here's a link), although I think in a lot of worlds the gameboard is so obviously changed that I expect handling it to be an easier sell.
We must pause early (AIs pose significant risk before they speed up research much). I think this is mostly ruled out by current evidence.
FYI, my current mainline guess is that this is true. Also I don't get why current evidence says anything about it – current AIs aren't dangerous, but that doesn't really say anything about whether an AI that's capable of speeding up superalignment or pivotal-act-relevant research by even 2x would be dangerous.
I'm not getting your 35% to 5% reference? I just have no hope of getting as low as 5%, but a lot of hope for improving on just letting the labs take a swing.
i.e., if basically anything other than a long pause will be insufficient to actually work, you might as well swing for the pause.
I'm thinking of a slightly different plan than "increase the rate of people being able to think seriously about the problem." I'd like to convince people who already understand the problem to accept that a pause is unlikely and that alignment is not known to be impossibly hard even on short timelines. ...
...Getting entirely new people to understand the hard parts of the problem and then all of the technical skills or theoretical subtleties is another route. I haven't thought as much about that one because I don't have a public platform...
I think it's useful to think of "rate of competent people thinking seriously about the right problems" as, like, the "units" of success for various flavors of plans here. There are different bottlenecks.
I currently think the rate-limiting reagent is "people who understand the problem". And I think that's in turn rate-limited on:
I think "moving an overton window" is a sort of different operation than what Bengio/Hinton/Dario are doing. (Or, like, yes, they are expanding an overton window, but, their entire strategy for doing so seems predicated on a certain kind of caution/incrementalness)
I think there are two pretty different workable strategies:
Going halfway from one to the other doesn't actually work, and the second one doesn't really work unless you actually do have those convictions. There are a few people trying to do the latter, but, most of them just don't actually have the reputation that'd make anyone care (and also there's a lot of skill to doing it right). I think if at least one of Yoshua/Geoffrey/Dario/Demis switched strategies it'd make a big difference.
Do you mean like a short pithy name for the fallacy/failure-mode?
That's why I want to convince more people who actually understand the problem to identify and work like mad on the hard parts, like the world is on fire, instead of hoping it somehow isn't or can be put out.
FYI something similar to this was basically my "last year's plan", and it's on hold because I think it is plausible right now to meaningfully move the Overton window around pauses or at least dramatic slowdowns. (This is based on seeing the amount of traffic AI 2027 got, the number of NatSec endorsements that If Anyone Builds It got, and having recently gotten to read it and thinking it is pretty good.)
I think if Yoshua Bengio, Geoffrey Hinton, or Dario actually really tried to move Overton windows instead of sort of trying to maneuver within the current one, it'd make a huge difference. (I don't think this means it's necessarily tractable for most people to help. It's a high-skill operation.)
(Another reason for me putting "increase the rate of people able to think seriously about the problem" on hold is that my plans there weren't getting that much traction. I have some models of what I'd try next when/if I return to it, but it wasn't a slam dunk to keep going.)
I assume a lot of us aren't very engaged with pause efforts or hopes because it seems more productive and realistic to work on reducing misalignment risk from ~70% toward ~35%.
Nod. I just, like, don't think that's actually that great a strategy – it presupposes that it's actually easier to get from 70% to 35% than from 35% to 5%. I don't see Anthropic et al. actually really getting ready to ask the sort of questions that would IMO be necessary to actually do-the-reducing.
I do agree that's enough evidence to be confused and dissatisfied with my guess. I'm basing my guess more on the phrasing of the question, which sounds more like it's just meaning to be "what a reasonable person would think 'prevent extinction' would mean", and the fact that Eliezer said that-sort-of-thing in another context doesn't necessarily mean it's what he meant here.