If anyone wants to have a voice chat with me about a topic that I'm interested in (see my recent post/comment history to get a sense), please contact me via PM.
My main "claims to fame":
maybe we could get some actual concrete examples of illegible problems and reasons to think they are important?
See Problems in AI Alignment that philosophers could potentially contribute to, and this comment from a philosopher saying that he thinks they're important but that it "seems like there's not much of an appetite among AI researchers for this kind of work", which suggests illegibility.
Yeah it's hard to think of a clear improvement to the title. I think I'm mostly trying to point out that thinking about legible vs illegible safety problems leads to a number of interesting implications that people may not have realized. At this point the karma is probably high enough to help attract readers despite the boring title, so I'll probably just leave it as is.
I think it is generous to say that legible problems remaining open will necessarily gate model deployment, even in those organizations conscientious enough to spend weeks doing rigorous internal testing.
In this case you can apply a modified form of my argument by replacing "legible safety problems" with "safety problems that are actually likely to gate deployment", and then the conclusion would be that working on such safety problems is of low or negative EV for the x-risk-concerned.
It means that if a problem isn't actually going to get solved by someone else, then it's my job to make sure it gets solved, no matter whose job it is on paper.
There are countless problems in the world that are not actually going to get solved, by anyone. This seems to imply that it's my job to make sure they all get solved. That seems absurd and can't be what it means, so what is the actual meaning of heroic responsibility?
For example, does it mean that I should pick the problem to work on that has the highest EV per unit of my time, or pick the problem that I have the biggest comparative advantage in, or something like that? But then how does "heroic responsibility" differ from standard EA advice, and what is "heroic" about it? (Or maybe it was more heroic and novel at a time when there was no standard EA advice?) Anyway, I'm pretty confused.
What about more indirect or abstract capabilities work, like coming up with some theoretical advance that would be very useful for capabilities research but doesn't directly build a more capable AI (and thus is not something that "directly involves building a dangerous thing")?
And even directly building a more capable AI still requires other people to respond with bad thing Y = "deploy it before safety problems are sufficiently solved" or "fail to secure it properly", doesn't it? It seems like "good things are good" is exactly the kind of argument that capabilities researchers/proponents give, i.e., that we all (eventually) want a safe and highly capable AGI/ASI, so the "good things are good" heuristic says we should work on capabilities as part of achieving that, without worrying about secondary or strategic considerations, and just trust everyone else to do their part, such as ensuring safety.
One potential issue is that this makes posting shortforms even more attractive, so you might see everything being initially posted as a shortform (except maybe very long effortposts), since there's no downside to doing that. I wonder if that's something the admins want to see.
Any suggestions?
Thanks! Assuming it is actually important, correct, and previously unexplicated, it's crazy that I can still find a useful concept/argument this simple and obvious (in retrospect) to write about, at this late date.
Legible problems are pretty easy to give examples of. The most legible problem (in terms of actually gating deployment) is probably wokeness for xAI; for most AI companies, it's things like not expressing an explicit desire to cause human extinction, not helping with terrorism (e.g., building bioweapons) on demand, etc.
Giving an example of an illegible problem is much trickier, since by their nature such problems tend to be obscure, hard to understand, or apt to fall into a cognitive blind spot. If I give an example of a problem that seems real to me but illegible to most, then most people will fail to understand it or dismiss it as not a real problem, instead of recognizing it as an example of a real but illegible problem. This could be quite distracting, so for this post I decided to just talk about illegible problems in a general, abstract way and discuss general implications that don't depend on the details of the problems.
But if you still want some explicit examples, see this thread.