While I too was using Tao as a reference class, that's not the only reason for mentioning him. I expect that people with IQs that ridiculously high are simply better suited to tackling novel topics, and I do mean novel: building a field from scratch, ideally with mathematical precision.
All the more if they have a proven track record, especially in mathematics, and I suspect that if Tao could be convinced to work on the problem, he would have genuinely significant insight. That and a cheerleader effect, which wouldn't be necessary in an ideal world, but that's hardly the one we live in, is it?
I wonder what it would take to bring Terence Tao on board...
At any rate, this is good news: the more high-status people in academia take Alignment seriously, the easier it becomes to convince the next one, in what I hope is a virtuous cycle!
If there were one thing that I could change in this essay, it would be to clearly state that nanotechnology advanced enough to do things like melt GPUs isn't necessary, even if it is sufficient, for achieving singleton status and taking humanity off the field as a meaningful player.
Whenever I see people fixate on critiquing that particular point, I need to step in and point out that existing tools and weapons (is there a distinction?) already suffice for a Superintelligence to kill the vast majority of humans and reduce our threat to it to negligible levels. Whether by wresting control of nuclear arsenals to initiate MAD or by extrapolating from gain-of-function research to produce extremely contagious yet lethal pathogens that can't be defeated before the majority of humans are infected, such options leave a small minority of humans alive to cower in the wreckage until the biosphere is later dismantled.
That's orthogonal to the issue of whether such nanotechnology is achievable for a Superintelligent AGI; pointing to existing tools merely reduces the inferential distance the message has to cross, since it doesn't demand familiarity with Drexler.
(Advanced biotechnology already is nanotechnology, but the point is that no stunning capabilities need to be unlocked for an unboxed AI to become immediately lethal)
Absolutely! I haven't used the messaging features here much, but I'm open to a conversation in any medium of your choice.
Primarily by talking about it in rat-adjacent communities that are open to such discussion but also contain a large number of people who aren't immersed in AI X-risk. A pertinent example would be either the SSC subreddit or its spinoff, The Motte.
The ideal target is someone with the intellectual curiosity to want to know more about such matters, while also not having encountered them beyond glancing summaries. Below that threshold, people are hard to sway because they're going off the usual pop culture tropes about AI, and significantly above that, you have the LW crowd, and me trying to teach them anything novel would be trying to teach my grandma to suck eggs.
If I can find people who are mildly aware of such possibilities, then it's easier to dispel any particular misconceptions they have, such as the tendency to anthropomorphize AI, the question of "why not shut it off?", etc. Showing them the blistering pace of progress in ML is a reliable eye-opener in my experience.
Engaging with naysayers is also effective; there's a certain stentorian type who not only has said misunderstandings, but loudly shares them to dismiss X-risk altogether. Dismantling such arguments is always good, even if the odds of convincing them are minimal. There's always a crowd of undecided but curious people who are somewhat swayed.
There's also the topic of automation-induced unemployment, which is what I usually bring up in medical circles that would otherwise be baffled by AI X-risk. That's the most concrete and imminent danger any skilled professional faces, even if the current timelines indicate that the period between the widespread adoption of near-human AI and actual Superhuman AGI is going to be tiny.
That's about as much as I can do. I don't have the money to donate anything but pocket change, and my access to high-flying ML engineers is mostly restricted to this very forum. I'm acutely aware that I'm not good enough at math to produce original work in the field, so given those constraints, I consider it a victory if I can sway people who are wealthier and better positioned, by virtue of living in the First World, on the matter!
Things are a lot easier for me, given that I know that I couldn't contribute to Alignment research directly, and the other option, donating money, matters less because the field is bottlenecked not so much by money as by prime talent. A doctor unfortunate enough to reside in the Third World, who happens to have emigration plans and a large increase in absolute discretionary income that will only pay off in tens of years, has little scope to do more than signal boost.
As such, I intend to live the rest of my life primarily as a hedge against the world in which AGI isn't imminent in the coming decade or two, and do all the usual things humans do, like keeping a job, having fun, raising a family.
That's despite the fact that I think it's more likely than not that I or my kids won't make it out of the 21st century, but at the least it'll be a quick and painless death, with the dispassionate dispatch of a bulldozer running over an anthill, not any actual malice.
Outright sadism is unlikely to be a terminal or contingent goal for any AGI we make, however unaligned; and I doubt that the life expectancy of anyone on a planet rapidly being disassembled for parts will be large enough for serious suffering. In slower circumstances, such as an Aligned AI that only caters to the needs of a cabal of creators, leaving the rest of us to starve, I have enough confidence that I can make the end quick.
Thus, I've made my peace with likely joining the 97-odd billion anatomically modern humans in oblivion, plus another 8 or 9 billion concurrently departing with me, and it doesn't really spark anxiety or despair. It's good to be alive, and I probably wouldn't prefer to have been born at any earlier time in history. Hoping for the best and expecting the worst really, assuming your psyche can handle it.
Then again, I'm not you, and someone with a decent foundation in ML is also in the 0.01% of people who could feasibly make an impact in the time we have, and I selfishly hope that you can do what I never could. And if not, at least enjoy the time you have!
I find this a questionable proposition at best. Indeed, there are fates worse than extinction for humanity, such as an AI that intentionally tortures humans, meat or simulated, beyond the default scenario of it considering us as arrangements of atoms that it could use for something else, a likely convergent goal of most unaligned AI. The fact that it still keeps humans alive to be tortured would actually be a sign that we were closer to aligning it than not, which is a small consolation on a test where anything significantly worse than a perfect score on our first try is death.
However, self-preservation is easy.
An AGI of any notable intelligence would be able to assemble Von Neumann probes by the bucketload, and use them as the agents of colonization. We've presumably got an entity that is at least as intelligent as a human, likely incredibly more so, that is unlikely to be constrained by biological hurdles that preclude us from making perfect copies of ourselves, memory and all, with enough redundancy and error correction that data loss wouldn't be a concern till the black holes start evaporating.
Space is enormous. An AGI merely needs to seed a few trillion copies of itself in inconvenient locations, such as interstellar or even extragalactic space, and it can rest assured that even if the main body encounters some unfortunate outcome, such as an out-of-context problem, a surprise supernova explosion, alien AGI or the like, it would be infeasible to hunt down each and every copy scattered across the light-cone, especially the ones accelerated to 99.99% c and then sent out of the Laniakea Supercluster.
As such, I feel it is vanishingly unlikely that a situation like the one outlined here could even arise, as it requires an AGI being unable to think of frankly obvious mitigation strategies.
I was just overwhelmed by the number of hyperlinks, producing what can only be described as mild existential terror haha. And the fact that they led to clear examples of the feasibility of such proposals in every single case was impressive.
I try to follow along with ML, mostly by following behind Gwern's adventures, and this definitely seems to be a scenario worth considering, where business as usual continues for a decade, we make what we deem prudent and sufficient efforts to Align AI and purge unsafe AI, but the sudden emergence of agentic behavior throws it all for a loop.
Certainly a great read, and concrete examples that show Tomorrow AD futures plausibly leading to devastating results are worth a lot for helping build intuition!