If anyone wants to have a voice chat with me about a topic that I'm interested in (see my recent post/comment history to get a sense), please contact me via PM.
My main "claims to fame":
I elaborated a bit more on what I meant by "crazy": https://www.lesswrong.com/posts/PMc65HgRFvBimEpmJ/legible-vs-illegible-ai-safety-problems?commentId=x9yixb4zeGhJQKtHb.
And yeah I do have a tendency to take weird ideas seriously, but what's weird about the idea here? That some kinds of safety work could actually be harmful?
Now that this post has >200 karma and still no one has cited a previous explicit discussion of its core logic, it strikes me just how terrible humans are at strategic thinking, relative to the challenge at hand: in the 2-3 decades since AI x-risk became a subject of serious discussion, no one among us has written down what should be a central piece of strategic logic informing all prioritization of AI safety work. And it's only a short inferential distance away from existing concepts and arguments (like legibility, or capabilities work having negative EV). Some of us perhaps understood it intuitively, but neglected to, or couldn't, write down the reasoning explicitly, which is almost as bad as missing it completely.
What other, perhaps slightly more complex or less obvious, crucial considerations are we still missing? What other implications follow from our low strategic competence?
Yeah, I've had a similar thought: perhaps the most important illegible problem right now is that key decision makers probably don't realize that they shouldn't be making decisions based only on the status of safety problems that are legible to them. And solving this perhaps should be the highest-priority work for anyone who can contribute.
"Musings on X" style posts tend not to be remembered as much, and I think this is a fairly important post for people to remember.
I guess I'm pretty guilty of this, as I tend to write "here's a new concept or line of thought, and its various implications" style posts, and sometimes I just don't want to spoil the ending/conclusion. Maybe I'm afraid people won't read the post if they can just glance at the title and decide whether they already agree or disagree with it, or think they know what I'm going to say? The Nature of Offense is a good example of the latter, where I could have easily titled it "Offense is about Status".
Not sure if I want to change my habit yet. Any further thoughts on this, or references about this effect, how strong it is, etc.?
That's a good point. I hope Joe ends up focusing more on this type of work during his time at Anthropic.
What are the disagreement votes for[1], given that my comment is made of questions and a statement of confusion? What are the voters disagreeing about?
(I've seen this in the past as well, disagreement votes on my questioning comments, so I figured I'd finally ask what people have in mind when they're voting like this.)
[1] 2 votes totaling -3 agreement, at the time of this writing
Sorry, you might be taking my dialog too seriously, unless you've made such observations yourself, which of course is quite possible since you used to work at OpenAI. I'm personally far from the places where such dialogs might be occurring, so I don't have any observations of them myself. It was completely imagined in my head, as a dark comedy about how counter to human (or most humans') nature strategic thinking/action about AI safety is, and partly a bid for sympathy for the people caught in the whiplashes, to whom this kind of thinking or intuition doesn't come naturally.
Edit: To clarify a bit more, B's reactions like "WTF!" were written more for comedic effect than to be realistic or based on my best understanding/predictions of how a typical AI researcher would actually react. It might still capture some truth, but again, I just want to make sure people aren't taking my dialog more seriously than I intend.
The Inhumanity of AI Safety
A: Hey, I just learned about this idea of artificial superintelligence. With it, we can achieve incredible material abundance with no further human effort!
B: Thanks for telling me! After a long slog and incredible effort, I'm now a published AI researcher!
A: No wait! Don't work on AI capabilities, that's actually negative EV!
B: What?! Ok, fine, at huge personal cost, I've switched to AI safety.
A: No! The problem you chose is too legible!
B: WTF! Alright you win, I'll give up my sunken costs yet again, and pick something illegible. Happy now?
A: No wait, stop! Someone just succeeded in making that problem legible!
B: !!!
But at the same time, humans are able to construct intricate logical artifacts like the general number field sieve, which seems to require many more steps of longer inferential distance, and each step could only have been made by one of the small number of specialists in number theory or algebraic number theory who were thinking about factoring algorithms at the time. (Unlike the step in the OP, which seemingly anyone could have made.)
Can you make sense of this?