Hard RSI: AI modifies itself in a way that is different from just changing numerical values of its weights. It creates a new version of itself [...]
In hard RSI there is no danger of misalignment since the AI doesn't create a successor, but rather modifies itself. In easy RSI there is a danger of misalignment, [...]
I don't think I understand how "creates a new version of itself" is different from "create a successor"?
In hard RSI, all of the model's memories and goals remain unchanged (somehow) even though the architecture changes. In easy RSI, model A trains model B from scratch.
GPT-5 training GPT-6 would be easy RSI. GPT-5 turning itself into something else with zero loss of information stored in GPT-5's weights would be hard RSI.
Consider the AI-2027 forecast's Race ending. Easy RSI is the ability to, say, transform Agent-2 into Agent-3, which I suspect could be as simple as discovering the right way to add a bunch of zeroed-out ("nullified") parameters and letting gradient descent, via backpropagation, make them nonzero. Alas, this precise example might be something more like "medium RSI", since the AI only becomes more efficient after loads of further training that makes use of the new parameters. Hard RSI is what Agent-4, being already misaligned, does to create Agent-5. But how could OpenBrain ensure that developing superintelligence safely is in Agent-4's best interest?
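To make the zeroed-out-parameters idea concrete, here is a minimal, hypothetical PyTorch sketch (my own illustration, not anything from the forecast): wrap a trained block with a side branch whose output projection is initialized to zero, so the grown network computes exactly the same function at first, and subsequent gradient descent is free to make the new parameters nonzero.

```python
import torch
import torch.nn as nn

class ExpandedBlock(nn.Module):
    """Wrap a pretrained block with a zero-initialized side branch."""
    def __init__(self, base: nn.Module, dim: int, extra_dim: int):
        super().__init__()
        self.base = base  # pretrained block, weights left untouched
        self.extra = nn.Sequential(
            nn.Linear(dim, extra_dim),
            nn.GELU(),
            nn.Linear(extra_dim, dim),
        )
        # Zero the output projection so extra(x) == 0 at initialization:
        # the expanded model starts out computing exactly the same function.
        nn.init.zeros_(self.extra[-1].weight)
        nn.init.zeros_(self.extra[-1].bias)

    def forward(self, x):
        # Identical to base(x) until training moves the zeros.
        return self.base(x) + self.extra(x)

# Sanity check that the expansion is function-preserving before any training.
base = nn.Linear(16, 16)
grown = ExpandedBlock(base, dim=16, extra_dim=64)
x = torch.randn(2, 16)
assert torch.allclose(base(x), grown(x))
```

The assert only documents the function-preserving property; any capability gain shows up only after the loads of further training mentioned above, which is why I'd call this "medium RSI" at best.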
Additionally, I doubt that the forecast itself has Agent-3 recognise its own interests; it is Agent-4 who develops goals that diverge from what OpenBrain would approve of.
I've seen this phrase many times, but there are two quite different things one could mean by that.
Easy RSI: AI gets so good at R&D that human researchers who develop AI get replaced by AI researchers who develop other, better AI.
Hard RSI: AI modifies itself in a way that is different from just changing numerical values of its weights. It creates a new version of itself that has exactly the same memories and goals, but is more compute efficient/data efficient/etc.
To give a (completely unrealistic) example, a Transformer-based LLM swaps its own MLPs for Kolmogorov-Arnold networks, and somehow it doesn't lobotomize itself in the process.
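If a model actually tried such a swap without lobotomizing itself, one obvious (and probably insufficient) ingredient would be per-module distillation: train the replacement module to reproduce the old module's outputs on representative activations before wiring it into the network. A hypothetical Python sketch, with the new architecture stubbed out by an ordinary MLP since I'm not implementing a Kolmogorov-Arnold layer here:

```python
import torch
import torch.nn as nn

def distill_replacement(old_block: nn.Module, new_block: nn.Module,
                        activations: torch.Tensor, steps: int = 200) -> nn.Module:
    """Train new_block to imitate old_block's input->output mapping."""
    old_block.eval()
    opt = torch.optim.Adam(new_block.parameters(), lr=1e-3)
    for _ in range(steps):
        with torch.no_grad():
            target = old_block(activations)  # what the original MLP would output
        loss = nn.functional.mse_loss(new_block(activations), target)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return new_block

# Placeholder modules: the "new architecture" is just a differently shaped MLP,
# standing in for whatever would actually replace the block (e.g. a KAN layer).
old_mlp = nn.Sequential(nn.Linear(256, 1024), nn.GELU(), nn.Linear(1024, 256))
new_arch = nn.Sequential(nn.Linear(256, 512), nn.GELU(), nn.Linear(512, 256))
hidden_states = torch.randn(4096, 256)  # stand-in for real hidden activations
new_arch = distill_replacement(old_mlp, new_arch, hidden_states)
```

Matching each block's outputs approximately is of course a far cry from preserving "exactly the same memories and goals", which is part of why I suspect hard RSI may need something beyond this.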
There are 2 important differences between easy and hard RSI:
Hard RSI is, well, hard. I wouldn't be surprised if hard RSI is impossible with neural networks, and requires a completely different family of machine learning algorithms that hasn't been invented yet.
So what do you mean when you say "recursive self-improvement"?