AI safety research is not a traditional research field, because it comes with a deadline. Really, it is more of a project than a field: a project like the Manhattan Project, but with even higher stakes. If we accept this, then the greatest imperative is speed, and if the pessimists are right and we are only a few years from losing control, then speed matters more than almost anything else. Over the last few months, I have been attempting to do AI safety research part-time. I dedicated roughly 10 hours a week for 3 months and succeeded in making a classifier that sucked. In a normal discipline, that would be okay, even expected. I remember being told once that you only really start contributing to a research field after 3 years of working in it. That won't work for AI safety. There just isn't time to run the 6-9 month research loop over and over again and slowly amass experience.
It occurred to me that the only way I could contribute to the field in a way that actually moves the needle is to focus on building automated research loops. I don't think we are quite there yet, but I have started experimenting in this direction and have been pleasantly surprised by the quality of some of the outputs. Of course, there are pitfalls to watch out for. AIs occasionally lie and fabricate, but it is not impossible in principle to build automated verification mechanisms that reduce that sort of behavior. It's not clear how much better my setup can get (it could very well plateau soon), but it seems useful to push in this direction: even if models and agents are not quite good enough now, they will improve in the future.
Now, I know there is a whole host of dangers with this. I know that this is exactly the route through which disaster, if it comes, would likely arrive: AIs doing AI research, leading to a large jump in capabilities. I just don't see any other way that:
1. I could personally contribute meaningfully to AI safety.
2. We can get to a technical solution to AI alignment in time (if indeed timelines are as short as seems likely).
I would love to hear your thoughts!