EY saying that he would be happy with building ASI even if it only had a 10% chance of working out and not extinguishing humanity.
Not the main point of your post, but, I roll to disbelieve on this being a real thing he said – do you have a quote for that? (I maybe remember a version that was more like 50%)
I am pretty confident in my memory because I was a bit surprised to read it myself so it was salient. It was a tweet.
Twitter's search engine ignores the "%" symbol in search queries and I can't DM him on Twitter without paying. @Eliezer Yudkowsky paging
Maybe there is a better way to search the corpus of his tweets.
The point is that starting with human brain emulation seems like it could definitely lead to super intelligence while also having no particular reason to instrumentally converge on preferences / values / behaviors completely alien to humans.
This is not how I understand the term "instrumental convergence." My understanding of instrumental convergence is that it refers to goals that help achieve other goals. Pretty much any goal at all is easier if you have power and resources. And most goals require you to continue to exist.
And so humans are absolutely affected by instrumental convergence. Most humans seek to avoid death, and many humans seek power and resources. And humans seeking power and resources will sometimes behave very badly indeed. "Uploading" scanned humans and upgrading them to superintelligence might very well produce beings that sought power and resources, even in dangerous ways.
The other thing you're thinking about—AIs with weird, alien values—is also part of Eliezer's argument. Personally, I do suspect that he overstates the certainty of getting an AI with weird, alien values. He puts that likelihood at close to 100%. I would have put it lower—no higher than 85% or so. But this is largely because any AI is likely to be partially based on a technology analogous to LLMs, and so the AIs will probably at least understand human values. This does not at all guarantee that the AIs will share human values (or that they'll share the right human values from our perspective).
Hmmm, I don't think I said that humans are not affected by instrumental convergence, or that my emulated superbrain would certainly not be dangerous. I just think that if you made a random human 1000x more intelligent, it's pretty unlikely they would immediately extinguish the rest of the human race, and the emulated superbrain seems like it belongs in a similar reference class.
Preamble
My current understanding: the EY/MIRI perspective is that superintelligent AI will invariably instrumentally converge on something that involves extinguishing humanity. I believe I remember a tweet from EY saying that he would be happy with building ASI even if it only had a 10% chance of working out and not extinguishing humanity.
Further understanding: This perspective is ultimately not sensitive to the architecture of the AI in question. I'm aware that many experts do view different types of AI as more or less risky. I think I recall some discussion years ago where people felt that the work OpenAI was doing was ultimately more dangerous / harder to align than DeepMind's work.
So as I understand it EY has the strongest possible view on instrumental convergence: It will definitely happen, no matter how you build ASI.
Ever since I first encountered this strong view many years ago, I have considered it trivially false.
Hypothetical
Imagine a timeline where society arrived at ASI via something much closer to bio-simulation or bio-engineering. That is to say, machine learning / neural nets / training were not really relevant.
Maybe this society first arrives at whole human brain emulation -- something that surely would eventually be materially possible. Maybe this happens on a big cube of silicon wafers, maybe it is fleshier, who knows.
I think it seems clear that this non-super AI would not "instrumentally converge" any more than you or I.
But I think that you could definitely arrive at ASI from this as a building block.
Etc. The point is that starting with human brain emulation seems like it could definitely lead to super intelligence while also having no particular reason to instrumentally converge on preferences / values / behaviors completely alien to humans.
---
I have not shared this idea before because I thought it was obvious, but I have also never seen anyone else say it (I didn't look that hard).
So my questions:
- Is this idea novel to you?
- Is this an idea that EY has replied to?
- Do you like my idea? Please let me know.
Postscript: I do wonder if next-token prediction is ultimately similar enough to the evolutionary forces that created human intelligence that it produces something quite capable of empathizing with humans. I am sure people have commented on this.
Thanks