(I appreciate you commenting; I'm probably not going to try to evaluate your claims, since I'm not trying to be an expert in this stuff. But I'm pretty skeptical for lay reasons like "Really? If one of the 3ish major companies strongly supported the bill, that wouldn't much increase its chances of passing?" and "Really? But couldn't they have still said the bill was best as-is?" and "Really? How are you / how was Anthropic leadership so confident the bill wasn't viable?" and "Really? We're still doing this 'play it cool' strategy instead of just saying what we think would be good to do?" and "Wow, one of the top companies somehow figured out a way to rationalize not supporting regulation of themselves, even though they nobly said they would support that and they are such amazing homies; what an incredible surprise". That said, I could imagine lacking some context that makes the decision seem better.)
SB-1047. Anthropic lobbied hard to water down the bill, attempted to kill it, and only performed better than other AI companies due to internal pressure.
Even this single thing alone seems basically unforgivable.
Ok, I think I agree qualitatively with almost everything you say (except the thing about compute mattering so much in the longer run). I especially agree (IIUC what you're saying) that a top priority / best upstream intervention is shifting cultural attitudes. Basically my pushback / nuance is: the cultural consensus faces a harder challenge for AI than for, e.g., pandemics, so the successful example of pandemic stuff doesn't necessarily argue that strongly that the consensus can work for AI in the longer run. In other words, while I agree qualitatively with
The consensus might be mostly sufficient [in the case of AI]
, I'm also suggesting that it's quantitatively harder to have the consensus do this work.
Research is also downstream of attitudes,
Totally, yeah. It's just that
And it might take at least decades to get from a design for an ASI that bootstraps on a 5 GW datacenter campus, to an ASI that bootstraps on an antique server rack.
I guess it might, but I really wouldn't bank on it. Stuff can be optimized a lot.
(I agree qualitatively, and not sure whether we disagree, but:)
those years could then be used for reducing the other inputs to premature superintelligence.
Basically I'm saying that there are inputs to premature ASI which
time for a significant shift in the attitudes to AI danger
This is true, but there's a countervailing force: general, distributed, algorithmic AGI capabilities progress. One widely cited reason for believing in short timelines in the first place is "lots of researchers are working on capabilities, so even if they hit some bottleneck, they'll adapt and look for other paradigms / insights". I don't think this is actually valid for short timelines, because switching, and the motivation for switching, can take some years; but it is definitely valid on slightly longer time scales (five to ten years). Progress that's more about algorithms than chips is harder to regulate, legally and probably also by social norms (because it's less legible). So even as attitudes become more anti-AGI-capabilities, enacting those attitudes becomes harder and harder, at least in some big ways.
I think the skepticism about the protein folder was "we can't make something effective because we can't optimize enough / search hard enough", whereas my skepticism about alignment is "we can't make something aligned because we can't aim optimization processes well enough". Part of how we can't aim search processes is that we don't have easily testable proxy measurements that are bound up with alignment strongly enough. What would be the evaluation function for AlignmentFold?
My argument, though, is that it is still very possible for the difficulty of alignment to be in the Apollo regime, and that we haven't received much evidence to rule that regime out (I am somewhat skeptical of a P vs. NP level of difficulty, though I think it could be close to that).
Are you skeptical of P vs. NP-level difficulty due to priors or due to evidence? Why those priors / what evidence?
(I think alignment is pretty likely to be much harder than P vs. NP. Mainly this is because alignment is very, very difficult. (Though also note that P vs. NP has a maybe-possibly-workable approach, https://en.wikipedia.org/wiki/Geometric_complexity_theory, which its creator states might take a mere century, though I presume that's not a serious specific estimate.))
Maybe the real issue is that we don't know what AGI will be like, so we can't do science on it yet. As with pre-LLM alignment research, we're pretty clueless.
(This is my position, FWIW. We can ~know some things, e.g. that convergent instrumental goals are very likely to either be pursued, or be obsoleted by some even more powerful plan. E.g. highly capable agents will hack into lots of computers to run themselves, or maybe manufacture new computer chips, or maybe invent some surprising way of doing lots of computation cheaply.)
Is there a short summary that you'd be willing to undersign, regarding the intentions / efforts of Anthropic leadership around regulation? E.g.
Or something like that.