I haven't followed every comment you've left on these sorts of discussions, but they often don't include information or arguments I can evaluate. Which MIRI employees, and what did they actually say? Why do you think that working at Anthropic even in non-safety roles is a great way to contribute to AI safety? I understand there are limits to what you can share, but without that information these comments don't amount to much more than you asking us to defer to your judgement. Which is a fine thing to do, I just wish it were more clearly stated as such.
This was a really important update for me. I remember being afraid of lots of things before I started publishing more publicly on the internet: how my intelligence would be perceived, whether I'd make some point that was obviously stupid in retrospect and have my reputation ruined forever, etc. Then at some point in this thought loop I was like, wait, the most likely thing is just that no one reads this, right? More like a "huh" or a nothing at all, rather than vitriolic hatred of my soul or whatever I was fearing. This was very liberating, and still is. I probably ended up over-optimizing for invisibility because of the freedom I feel from it—being mostly untethered from myopic social dynamics has been really helpful for my thinking and writing.
I tend to write in large tomes that take months or years to complete, so I suppose I disagree with you too. Not that intellectual progress must consist of this, obviously, but that it can mark an importantly different kind of intellectual progress from the sort downstream of continuous shipping.
In particular, I think shipping constantly often causes people to be too moored to social reception, risks killing butterfly ideas, screens off deeper thought, and forces premature legibility. Like, a lot of the time when I feel ready to publish something, there is some bramble I pass in my writing, some inkling of “Is that really true? What exactly do I mean there?” These often spin up worthy investigations of their own, but I probably would’ve failed to notice them were I more focused on getting things out.
Intellectual labor should aggregate minute-by-minute with revolutionary insights aggregating from hundreds of small changes.
This doesn’t necessarily seem in conflict with “long tomes which take months to write.” My intellectual labor consists of insights aggregating from hundreds of small changes afaict; I just make those changes in my own headspace, or in contact with one or two other minds. Indeed, I have tried getting feedback on my work in this fashion and it’s almost universally failed to be helpful—not because everyone is terrible, but because it’s really hard to get someone loaded up with enough context to give me relevant feedback at all.
Another way to put it: this sort of serial iteration can happen without publishing often, or even at all. It’s possible to do it on your own, in which case the question is more about what kind of feedback is valuable, and how much it makes sense to push for legibility versus pursuing the interesting thread formatted in your mentalese. I don’t really see one as obviously better than the other in general, and I think that doing either blindly can be pretty costly, so I'm wary of either being advocated as though it were obviously better.
The first RSP was also pretty explicit about Anthropic's willingness to pause unilaterally:
Note that ASLs are defined by risk relative to baseline, excluding other advanced AI systems… Just because other language models pose a catastrophic risk does not mean it is acceptable for ours to.
Which was reversed in the second:
It is possible at some point in the future that another actor in the frontier AI ecosystem will pass, or be on track to imminently pass, a Capability Threshold… such that their actions pose a serious risk for the world. In such a scenario, because the incremental increase in risk attributable to us would be small, we might decide to lower the Required Safeguards.
Relatedly, I often feel like I'm interfacing with a process that responded to every edge case with patching. I imagine this is some of what's happening when the poor printer has to interface with a ton of computing systems, and also why bureaucracies like the DMV seem much more convoluted than necessary: each time an edge case comes up, the easier thing is to add another checkbox/more red tape/etc., and no one is incentivized enough to do the much harder task of refactoring all of that accretion. The legal system has a bunch of this too; indeed, I just had to sign legal documents which were full of commitments to abstain from very weird actions (why on Earth would anyone do that?). But then you realize that yes, someone in fact did that exact thing, and now it has to be forever reflected there.
I agree that, all else equal, we're in better shape than evolution was, though not by enough that I think this is no longer a disaster. Even with all these advantages, it still seems like we don't have control in a meaningful sense—i.e., we can't precisely instill particular values, and we can't tell what values we've instilled. Many of the points here don't bear on this imo; e.g., it's unclear to me that having tighter feedback loops on the ~same crude process makes the crude process any more precise. Likewise, adapting our methods, data, and hyperparameters in response to problems we encounter doesn't seem like it will solve those problems, since the issues (e.g., of proxies and unintended off-target effects) will persist. Imo, the bottom line is still that we're blindly growing a superintelligence we don't remotely understand, and I don't see how these techniques shift the situation into one where we are in control of our future.
Agreed. Also, I think the word “radical” smuggles in assumptions about the risk, namely that it’s been overestimated. Like, I’d guess that few people would think of stopping AI as “radical” if it were widely agreed that it was about to kill everyone, regardless of how much immediate political change it required. Such that the term ends up connoting something like “an incorrect assessment of how bad the situation is.”
Empirics reigns, and approaches that ignore it and try to nonetheless accomplish great and difficult science without binding themselves tight to feedback loops almost universally fail.
Many of our most foundational concepts have stemmed from first principles/philosophical/mathematical thinking! Examples here abound: Einstein’s thought experiments about simultaneity and relativity, Szilard’s proposed resolution to Maxwell’s demon, many of Galileo’s concepts (instantaneous velocity, relativity, the equivalence principle), Landauer’s limit, logic (e.g., Aristotle, Frege, Boole), information theory, Schrödinger’s prediction that the hereditary material was an aperiodic crystal, Turing machines, etc. So it seems odd, imo, to portray this track record as near-universal failure of the approach.
But there is a huge selection effect here. You only ever hear about the cool math stuff that becomes useful later on, because that's so interesting; you don't hear about stuff that's left in the dustbin of history.
I agree there are selection effects, although I think this is true of empirical work too: the vast majority of experiments are also left in the dustbin. Which certainly isn’t to say that empirical approaches are doomed by the outside view, or that science is doomed in general, just that using base rates to rule out whole approaches seems misguided to me. Not only because one ought to choose which approach makes sense based on the nature of the problem itself, but also because base rates alone don’t account for the value of the successes. And as far as I can tell, the concepts we’ve gained from this sort of philosophical and mathematical thinking (including but certainly not limited to those above) have accounted for a very large share of the total progress of science to date. Such that even if I restrict myself to the outside view, the expected value here still seems quite motivating to me.
Fwiw, my experience has been more varied. My most well-received comments (100+ karma) are a mix of ones I spent days on, getting a hard point right, and ones I spent minutes on, extemporaneously gesturing at stuff without much editing. But overall I think the trend points towards "more effort = more engagement and better reception." I have mostly attributed this to the standards and readership LessWrong has cultivated, which is why I feel excited to post here. It seems like one of the rare places on the internet where long, complex essays about the most fascinating and important topics are incentivized. My reddit posts are not nearly as well received, for instance. I haven't posted as many essays yet, but I've spent a good deal of effort on all of them, and they've all done fairly well (according to karma, which ofc isn't a great indicator of impact, but is some measure of "popularity").
I weakly guess that your hypothesis is right here, i.e., that the posts you felt most excited about were exciting in part because they presented more interesting and so more difficult thinking and writing challenges. At least for me, tackling topics on the edge of my knowledge takes much more skill and much more time, and it is often a place where effort translates into "better" writing: clearer, more conceptually precise, more engaging, more cutting to the core of things, more of what Pinker is gesturing at. These posts would not be good were they pumped out in a day—not artifacts I'd be proud of, nor something that other people would see the beauty or the truth in. But the effortful version is worth it, i.e., I expect it to be more helpful for the world, more enduring, and more important than if that effort had been factored out across a bunch of smaller, easier posts.