I was about to make the same point. GPTx is, at best, trying to hack systems hardened by GPTx-1. Unless there is a very sudden takeoff, important software will be checked and rechecked by every capable AI. Yud seems to miss this (or believes the hard takeoff will be so sudden that there won't be any GPTx-1 around to make the code secure).
I remember when spam used to be a thing, and when people were breathlessly predicting a flood of Android viruses... Attack doesn't always get easier.
I was of the understanding that the only reasonable long-term strategy was human enhancement in some form. As you probably agree, even if we perfectly solved alignment (whatever that means), we would still be in a world with AIs getting ever smarter and a world we understood less and less. At least some people gaining significant intelligence enhancement, through neural lace or mind uploading, seems essential in the medium to long term. I see getting alignment somewhat right as a way of buying us time.
Something smarter than you will wind up doing whatever it wants. If it wants something even a little different than you want, you're not going to get your way.
As long as it wants us to be uplifted to its intelligence level then that seems OK. It can have 99% of the galaxy as long as we get 1%.
My positive and believable post-singularity scenario is one with circles of more to less human-like creatures: fully human, unaltered traditional Earth societies; societies still on Earth but with neural lace; some mind uploads; space colonies where probably everyone is at least somewhat enhanced; and starships that are pretty much pure AI (think Minds as in the Culture).
I agree that the extreme one-shot plan outlined by Yud and discussed here isn't likely.
However, it's likely that we make things a lot easier for an AI, for example with autonomous weapons in a wartime situation. If the AI is already responsible for controlling the weapons systems (drones with guns etc. are far superior to soldiers) and for keeping the factories running at maximum efficiency, then far less calculation and creativity is needed for an AI takeover.
Since I think a slow takeoff is most likely, robots and autonomous weapons systems increase takeover risk a lot. For this reason I am now much less convinced that a pause in AI capabilities is a good thing. I would rather have a superintelligence arrive in a world without these things, i.e. now rather than later.
To put this plainly: if we were offered a possible future where, over the course of the next 1-2 years, we learned (and deployed) everything there was to know about intelligence and mind algorithms, exploiting our hardware to maximum efficiency, but there were no hardware improvements, I would be tempted to take it over the alternative. A plausible alternative, of course, is that it takes 5-10 years to get such algorithms right, and that this happens with a large compute overhang and sudden capability gains, in a world with neuromorphic chips, robots everywhere, and perhaps an autonomous weapons war ongoing.
not allow the machine to look at its own weights
Is this enough if the AI has access to other compute and can make itself a "twin" on some other hardware? If it has training data similar to what it was trained on, and can test its new twin until it is similar to itself in capabilities, then it could identify with that twin as being essentially itself, and then look at those weights, etc.
Thanks. My title was a bit tongue-in-cheek ('Betteridge's law'), so yes, I agree. I have decided to reply to your post before I have read your references, as those may take a while to digest, but I plan to do so. I also see you have written things on P-Zombies that I was going to write something about. As a relative newcomer, it's always a balance between just saying something and attempting to read and digest everything about it on LW first.
Level of AI risk concern: medium/high
(Similar risk level to Christiano from what I can see; however, I would put him above medium relative to the public and most AI researchers.)
General level of risk tolerance in everyday life: medium
Not sure exactly how to rate this, as I started a tech company and do semi-dangerous recreational activities, but I don't think it is high.
Brief summary of what you do in AI
Not actively working in it; however, I have released a product containing a DNN I trained in the past, and I follow the field.
Nothing unusual about me personally, especially compared to this audience.
Thanks for the prediction. I may write one of my own at some point, but I thought I should put some ideas forward in the meantime.
Good article. I agree that we definitely need to try now, and it's likely that if we don't, another group will take over the narrative.
I also think it is important for people to know what they are working towards, as well as away from. Imagining what a positive Singularity would be like for them personally is something I think the general public should also start doing. Positive visions inspire people; we know that. To me it's obvious that such a future would involve different groups with different values somewhat going their own ways. Thinking about it, that is about the only thing I can be sure of. Some people will obviously be much more enthusiastic than others about biological/tech enhancement, and of course about living off Earth. We agree that coherent extrapolated volition is important; it's time we thought a bit about its details.
I don't understand why it helps that much if instrumental convergence isn't expected. All it takes is one actor deliberately making a bad agentic AI, and you have all the same problems, but without the "free energy" having been taken out beforehand by slightly bad, less powerful AIs, as would happen if instrumental convergence held. Slow takeoff seems to me to make much more of a difference.