My understanding of Auto-GPT is that it strings together many GPT-4 requests, while notably also giving it access to memory and the internet. Empirically, this allocation of resources and looping seems promising for solving complex tasks, such as debugging the code of Auto-GPT itself. (For those interested, this paper discusses how looped transformers can serve as general-purpose computers.)
But to my ears, that just sounds like an update of the form “GPT can do many tasks well”, not of the form “aligned oversight is tractable”. Put another way, Auto-GPT sounds like evidence for capabilities, not evidence for the ease of scalable oversight. The question of whether human values can be propagated up through increasingly amplified models seems separate from the ability to recursively self-improve, in the same way that capabilities progress is distinct from alignment progress.