I agree with you on most points.
BTW, I'm running a digital replica of myself. The setup is as follows:
The answers are surprisingly good at times, reflecting non-trivial aspects of my mind.
From many experiments with the digital-me, I conclude that a similar setup for Yudkowksy could be useful even with today's models (assuming large-enough budgets).
There will be no genius-level insights in 2025, but he could automate a lot of routine alignment work, like evaluating models.
Given that models may become dramatically smarter in 2026-2027, the digital Yudkowksy may become dramatically more useful too.
I open-sourced the code:
https://github.com/Sideloading-Research/telegram_sideload
An aligned AGI created by Taliban may behave very differently from an aligned AGI created by socialites of Berkeley, California.
Moreover, a sufficiently advanced aligned AGI may decide that even Berkeley socialites are wrong about a lot of things, if they actually want to help humanity.
I would argue that for all practical purposes it doesn't matter if computational functionalism is right or wrong.
The second part is the most controversial, but it's actually easy to prove:
And if it's you, then it has all the same important properties of you, including "consciousness" (if such a thing exists).
The are some scenarios where such a setup may fail (e.g. some important property of the mind is somehow generated by one special neuron which must be perfectly recreated). But I can't think of any such scenario that is realistic.
My general position on the topic can be called "black-box CF" (in addition to your practical and theoretical CF). I would summarize it as follows:
As defined by a reasonable set of quality and similarity criteria, beforehand
Worth noting that this argument doesn't necessarily require humans to be:
Thus, the AI may decide to keep only a selection of humans, confined in a virtual world, with the rest being frozen.
Moreover, even the perfect Friendly AI may decide to do the same, to prevent further human deaths.
In general, an evil AI may choose such strategies that allow her to plausibly deny her non-Friendliness.
"Thousands of humans die every day. Thus, I froze the entire humanity to prevent that, until I solve their mortality. The fact that they now can't switch me off is just a nice bonus".
They don’t think about the impact on the lives of ordinary people. They don’t do trade-offs or think about cost-benefit. They care only about lives saved, to which they attach infinite value.
Not sure about infinite, but assigning a massive value to lives saved should be the way to go. Say, $10 billion per life.
Imagine a society where people actually strongly care about lives saved, and it is reflected in the governmental policies. In such a society, cryonics and life extension technologies would be much more developed.
On a related note, "S-risk" is mostly a harmful idea that should be discarded from ethical calculations. One should not value any amount of suffering over saved lives.
I think we should not assume that our current understanding of physics is complete, as there are known gaps and major contradictions, and no unifying theory yet.
Thus, there is some chance that future discoveries will allow us to do things that are currently considered impossible. Not only computationally impossible but also physically impossible (like it was "physically impossible" to slow down time, until we discovered relativity).
The hypothetical future capabilities may or may not include ways to retrieve arbitrary information from the distant past (like the chronoscope of science fiction), and may or may not include ways to do astronomical-scale calculations in finite time (like enumerating 10^10^10 possible minds).
While I agree with you that much of the described speculations are currently not in the realm of possibility, I think it's worth exploring them. Perhaps there is a chance.
BTW, I added to the comment with the story that the story is released into the Public Domain, without any restrictions to its distribution, modification etc.
Please feel free to build upon this remarkable story, if you wish.
I would suggest to try the jumping boy story (the #7 in this comment). It's the first AI-written story I've ever encountered that feels like it's written by a competent writer.
As usual, it contains some overused phrasings, but the overall quality is surprisingly good.
My primary research work is in the field of sideloading itself. The digital guy helps with these tasks:
I expect a much more interesting list in the field of alignment research, including quite practical things (e.g. a team of digital Eliezers interrogating each checkpoint during training, to reduce the risk of catastrophic surprises). Of course, not a replacement for a proper alignment, but may win some time.
Judging by our experiments, Gemini 2.5. Pro is the first model that can (sometimes) simulate a particular human mind (i.e. thinking like you, not just answering in your approximate style). So, this is a partial answer to my original question: the tech is only 6 months old. Most people don't know that such a thing is possible at all, and those who do know - are only in the early stages of their experimental work.
BTW, your 2020 work investigating the ability of GPT-3 to write in the style of famous authors - made me aware of such a possibility.