"Simulacrum escapees" are explicitly one of the main failure modes we'll need to address, yes. Some thoughts:
This is a challenge, but one I'm optimistic about handling.
Weeping Agents: Anything that holds the image of an agent becomes an agent
Nice framing! But I somewhat dispute that. Consider a perfectly boxed-in AI, running on a computer with no output channels whatsoever (or perhaps as a homomorphic computation, i. e., indistinguishable from noise without the key). This thing holds the image of an agent; but is it really "an agent" from the perspective of anyone outside that system?
Similarly, a sufficiently good world-model would sandbox the modeled agents well enough that it wouldn't, itself, engage in agent-like behavior from the perspective of its operators.
As in: we come up with a possible formalization of some aspect of agent foundations, then babble potential theorems about it at the proof synthesizer, and it provides proofs/disproofs. This is a pretty brute-force approach and is by no means a full solution, but I expect it can nontrivially speed us up.
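To gesture at the shape of the loop, a minimal sketch (the `babbler` and `prover` objects here are hypothetical stand-ins, not real libraries: think an LLM generating candidate statements, and a Lean-style proof synthesizer returning a proof term or nothing):

```python
# Toy sketch of the babble-at-the-prover loop described above.
# Nothing here is a real API; `babbler.conjectures` and `prover.prove`
# are hypothetical interfaces.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Result:
    statement: str               # candidate theorem, in formal syntax
    status: str                  # "proved", "disproved", or "unknown"
    certificate: Optional[str]   # proof/refutation term, if found

def babble_and_prove(formalization, babbler, prover, n_candidates=1000):
    """Generate candidate theorems about a formalization, then ask the
    synthesizer to prove each one or its negation."""
    results = []
    for stmt in babbler.conjectures(formalization, n=n_candidates):
        proof = prover.prove(stmt, timeout_s=60)
        if proof is not None:
            results.append(Result(stmt, "proved", proof))
            continue
        refutation = prover.prove(f"not ({stmt})", timeout_s=60)
        if refutation is not None:
            results.append(Result(stmt, "disproved", refutation))
        else:
            results.append(Result(stmt, "unknown", None))
    # A human (or another model) then prunes: the interesting output
    # is the handful of surprising proved/disproved statements.
    return results
```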
I think that (metaphorically) there should be an all-caps disclaimer that reads something like "TO BE CLEAR AI IS STILL ON TRACK TO KILL EVERYONE YOU LOVE; YOU SHOULD BE ALARMED ABOUT THIS AND TELLING PEOPLE IN NO UNCERTAIN TERMS THAT YOU HAVE FAR, FAR MORE IN COMMON WITH YUDKOWSKY AND SOARES THAN YOU DO WITH THE LOBBYISTS OF META, WHO ABSENT COORDINATION BY PEOPLE ON HUMANITY'S SIDE ARE LIABLE TO WIN THIS FIGHT, SO COORDINATE WE MUST" every couple of paragraphs.
Yeah, I kind of regret not prefacing my pseudo-review with something like this. I was generally writing it from the mindset of "obviously the book is entirely correct and I'm only reviewing the presentation", and my assumption was that trying to "sell it" to LW users was preaching to the choir (I would've strongly endorsed it if I had a big mainstream audience, or even if I were making a top-level LW post). But that does feel like part of the our-kind-can't-cooperate pattern now.
My guess is that they're doing the motte-and-bailey of "make it seem to people who haven't read the book that it says ASI extinction is inevitable, that the book is just spreading doom and gloom", from which, if challenged, they could retreat to "no, I meant doom isn't inevitable even if we do build ASI using the current methods".
Like, if someone means the latter (and has also read the book and knows that it goes to great lengths to clarify that we can avoid extinction), would they really phrase it as "doom is inevitable", as opposed to e. g. "safe ASI is impossible"?
Or maybe they haven't put that much thought into it and are just sloppy with language.
I particularly agree with the point about the style being much more science-y than I'd expected, in a way that surely filters out large swathes of people. I'm assuming "people who are completely clueless about science and are unable to follow technical arguments" are just not the target audience. To crudely oversimplify, I think the target audience is 120+ IQ people, not 100 IQ people.
I mention this for transparency but also because some seem to be rallying around IABIED, even with its shortcomings, because they don’t think there is another option
I think IABIED should be rallied around because "the MIRI book" is the obvious Schelling point for rallying around. It has brand recognition in our circles, its release is a big visible event, it managed to get into best-seller categories, meaning it's visible to mainstream audiences, etc. Even if there are other books which are moderately better at doing what IABIED does, it wouldn't be possible to amplify their impact the same way (even if, say, Eliezer personally recommended them), so IABIED it is.
Further, even if it were possible to coordinate around and boost a different book the same way, this would require additional time: months or years (if that better book is yet to be written). We don't have much of that luxury, in expectation.
This still wouldn't be a good idea if IABIED were actively bad, of course. But it's not. I think it's reasonably good, even if we have our quibbles; and MIRI's pre-release work shows that it seems convincing to non-experts.
We could think about crafting better persuasion-artefacts in the future, but I think rallying around IABIED is the only option at this point in time. It may or may not be marginally worse than some hypothetical alternatives, but it's not a bad option.
However, in most of the experimental sciences, formal results are not the main bottleneck, so speed-ups would be more dependent on progress on coding, fuzzier tasks, robotics, and so on
One difficulty with predicting the impact of "solving math" on the world is the Jevons effect (or a kind of generalization of it). If posing a problem formally becomes equivalent to solving it, it would have effects beyond just speeding up existing fully formal endeavors. It might potentially create qualitatively new industries/approaches relying on cranking out such solutions by the dozens.
E. g., perhaps there are some industries which we already can fully formalize, but which still work in the applied-science regime, because building the thing and testing it empirically is cheaper than hiring a mathematician and waiting ten years. But once math is solved, you'd be able to effectively go through dozens of prototypes per day for, say, $1000, while previously, each one would've taken six months and $50,000.
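Making that back-of-envelope explicit, using only the illustrative numbers above:

```python
# Back-of-envelope on the illustrative numbers above.
old_cost = 50_000        # dollars per prototype
old_time = 180           # days per prototype (~six months)

new_daily_cost = 1_000   # dollars per day of "solved math"
new_daily_rate = 24      # prototypes per day ("dozens")

cost_ratio = old_cost / (new_daily_cost / new_daily_rate)   # ~1200x cheaper
speed_ratio = old_time * new_daily_rate                     # ~4300x faster
print(cost_ratio, speed_ratio)
```

Three-plus orders of magnitude in both per-prototype cost and iteration speed is the kind of shift that creates new industries, rather than just speeding up old ones.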
Are there such industries? What are they? I don't know, but I think there's a decent possibility that merely solving formal math would immediately make things go crazy.
I am somewhat interested now. I'll aim to look over it and get back to you, but no promises.
I still think there's a science to it which is yet to be properly written up. It's not at the level of "this combination of design choices/clothing elements is bad, this one is good", but there is a high-level organization to the related skills/principles, which can be taught to speed up someone learning design/fashion. They would still need to do a bunch of case studies/bottom-up learning afterwards (to learn specific extant patterns like "the vibe of a 90s CS professor"), but you can make that learning more sample-efficient.
Social skills are a good parallel. Actually talking to people and trying to accomplish different things with your conversations is necessary for developing social competence, but knowing some basics of the theory of mind and social dynamics is incredibly helpful for knowing what to pay attention to and try.
I agree that a competent write-up on the (true) theory of fashion seems to be missing. The usual way to deal with such situations is to act like the neural network you are: find some big dataset of [clothing example, fashionability analysis] pairs, consume it, then reverse-engineer the intuitions you've learned. If there's no extant literature on the top-down theory available, go bottom-up and derive it yourself. (It will be time-consuming.)
Of course, there are lots of other options besides literally just a leather jacket. As a general rule, any outfit which makes people ask “are you in a band?” signals coolness.
There are lots of options in the possibility-space. But are there lots of options on the actual market?
The fashion industry is one of those things that makes me want to just go do it all myself in frustration.[1] The clothing-space seems drastically underexplored.
The most obvious element is coloration. For any piece of clothing, there's a wide variety of complex-yet-tasteful multicolor patterns one might try: very subtle highlights and gradients; unusual but simple geometric patterns; strong but carefully chosen contrasts. You don't want a literal clash-of-colors clown outfit, but there's a wealth of possibilities beyond "one simple color"; e. g., mixing different hues of the same color.
Yet, most items on the market do just pick one simple color. Alternatively, they pick a common, basic (and therefore boring, conformist) pattern, e. g., plaid shirts. On the other end of the spectrum, you have graphic tees and such, which are varied but decidedly unsubtle and, in my opinion, pretty lame (outside very specific combinations of design and social context[2]).
You can partly get around that by wearing several items of different colors that combine the way you want. But this only allows basic combinations, and even that becomes very constrained in hot weather.
Eagerly awaiting the point when AI advances enough for me to vibe-design and 3D print anything I can imagine.
Eh, I think any nontrivial technical project can be made to sound like an incredibly significant and therefore dauntingly impossible achievement, if you pick the right field to view it from. But what matters is the actual approach you're using, and how challenging the technical problems are from the perspective of the easiest field in which they could be represented.
Some examples:
To generalize: Suppose there's some field A which is optimizing for X. Improving on X using the tools of A would necessarily require you to beat a market that is efficient-relative-to-you. Experts in A already know the tools of A inside and out, and how to use them to maximize X. Even if you can beat them, it would only be an incremental improvement. A slightly better solver for systems of nonlinear equations, a slightly faster horse, a slightly better trading algorithm.
The way to actually massively improve on X is to ignore the extant tools of A entirely, and try to develop new tools for optimizing X by using some other field B. On the outside view, this is necessarily a high-risk proposition, since B might end up entirely unhelpful; but it's also high-reward, since it might allow you to actually "beat the market". And if you succeed, the actual technical problems you'll end up solving will be massively easier than the problems you'd need to solve to achieve the same performance using A's tools.
Bringing it back around: This agenda may or may not be viewed as aiming to revolutionize program induction, but I'm not setting out to take the extant program-induction tools and try to cobble together something revolutionary using them. The idea is to use an entirely different line of theory (agent foundations, natural abstractions, information theory, recent DL advances) to achieve that end result.