Maybe the most important test for a political or economic system is whether it self-destructs. This is in contrast to whether it produces good intermediate outcomes. In particular, if free-market capitalism leads to an uncontrolled intelligence explosion, then it doesn’t matter if it produced better living standards than alternative systems for ~200 years – it still failed at the most important test.
A couple of other ways to put it:
Under this view, political/economic systems that produce less growth but don’t create the incentives for unbounded competition are preferred. Sadly, for Molochian reasons this seems hard to pull off.
I like this idea. As I tried to be more organized and less late to things, I implicitly did something like this, and this is a nice framing of that process.
I do think the asymmetry of the consequences distorts the updates I make a little though – since I am trying hard not to be late, I sometimes leave an unreasonable amount of buffer. I was once 45 minutes early to an appointment because I was taking public transport to an unfamiliar part of the city. I find it harder to make an update based on being early, because I don't know the variance – if I'm late (and I was trying hard not to be), then I clearly underestimated the worst case, but if I'm early then I could have just got lucky.
I think there are multiple ways of interpreting “alignment is as difficult as X”. There’s “the safety issues in building AGI are similar to the safety issues in building X”, but there’s also “solving the safety issues in building AGI takes the same level of total effort as building X”.
I interpreted Chris Olah’s graph as the latter – that the ‘steam engine world’ is a world where solving AI safety takes as much total effort as building the steam engine, agnostic of how that effort is spent. NOT that in those worlds, you solve AI safety issues in the same way that you solve steam engine safety issues.
Put another way, I was imagining the graph as primarily quantitative – you could crudely replace the x-axis with “# person-hours”.
You will never find a $100 bill on the floor of Grand Central Station at rush hour, because someone would have picked it up already.
Are you really less likely to find $100 in Grand Central Station than finding $100 anywhere else? It's true that there are many more people who could find it before you, but there are also many more people that could drop $100. If you imagine a 1D version, where everyone walks through either Grand Central Station or a quiet alley along the same line, one after the other, then it seems like you should be equally likely to find $100 in either case – if the person in front of you in the line drops $100.
There are many ideas in here that I've heard said offhand but never really dived into, and there's something very informative and satisfying about seeing them painted in detail. Similar to the difference between a one-sentence description of a painting, and the actual painting.
It's satire though, it conveys some vibes but the detail that's being painted is not an accurate portrayal of the actual detail that exists. Put another way, I think it would be a mistake if you encountered someone in real life and thought "ah yes, that's 'the person who feels they have had a meditation/emotional insight and this has gotten them over the intellectual hump of not building killing machines' from The Company Man, I know how they think".
I wonder what the driving factor of transmission is before symptoms emerge. If everyone was very careful about following good practices once they were obviously sick (e.g. wearing masks, sanitizing their hands after blowing nose or any contact with face), what would we have to do to prevent the spread? Do fomites become more important if you aren't coughing or sneezing yet?
Our paper on defense in depth (STACK) found similar results – similarly-sized models with a few-shot prompt significantly outperformed the specialized guard models, even when adjusting for FPR on benign queries.
One thing is that even given access to the model weights and the code behind the API, you could not tell if the model was password-locked, whereas you would see the hardcoded verifier. Thus if a lab wanted to hide capabilities they could delete the training data and you would have no way of knowing.
The Wikipedia article has a typo in one of these: it should say "I am sparkling; you are unusually talkative; he is drunk." (as in the source)
Under this view you can totally have intermediary metrics, they just look more like “how much does your society avoid tragedies of the commons” rather than “what is the median quality of life”.
To be clear, this post was not intended as a subtle endorsement of communism. I agree with MondSemmel’s point that basically any system which produced slower economic growth would probably do better under this view, if only because AI development is slower.