The AI may be able to use introspection to help notice some potential problems within itself, but for most of the important distribution shifts it’s in the same position as the AI researcher and is also speculating about consequences of the coming distribution shifts.
There’s a weaker version, where for example the AI has a moderately strong desire to always be truthful, but otherwise ultimately would prefer something other than helping the AI developers. The AI won’t particularly try to find “flaws” in itself, but if asked it’ll tell the truth about anything it has noticed. The humans don’t know how far to trust it, but it seems trustworthy to the limited extent to which they can test for that. In this version, there’s more responsibility resting on the humans, who have to take advantage of this apparent honesty to extract research and understanding to work out how they should iterate.
I feel like I'm missing something about these paragraphs.
It seems like corrigibility helps you the most when you're starting to enter the distribution shift (so it's not just speculating about future problems). At that point the AI can notice things that are not what you intended and proactively alert you (so it's not just implementing the weak form of corrigibility).
Is the claim that there's a dilemma between two options?
Either 1) the AI is only speculating about future problems and so has limited ability to detect them, or 2) the AI is already under the sway of those problems and so will not be motivated to help fix them?
Why is there no middle ground, where the AI is encountering problems and proactively flagging them?
We get to test AI systems while they have little control over the situation, have done relatively little online learning, and haven't thought about how to improve themselves. As we use the AGI to help us do research and help us design improvements, these things gradually stop being true, bit by bit. A storm in 10 years is analogous to the situation in an AI lab after an AGI has helped run 10 large-scale research projects since it was first created, where it has designed and implemented improvements to itself (approved by engineers), noticed mistakes that it commonly makes and learned to avoid them, thought deeply about why it's doing what it's doing and about potential improvements to its situation, and learned a lot from its experiments. The difference between these two situations is the distribution shift.
I appreciate how concrete you're being about what you're thinking about when you think about distributional shift.
If "as much intelligence as humans have" is the normal amount of intelligence, then more intelligence than that would (logically speaking) be "super". Right?
No. Superintelligence, to my knowledge, never referred to "above average intelligence", or I would be an example of a superintelligence. It referred to a machine that was (at least) broadly smarter than the most capable humans.
I'm sure you can find a quote in Bostrom's Superintelligence to that effect.
His old OkCupid page (sorry, it looks like the Wayback Machine doesn't have an accessible copy of http://www.okcupid.com/profile/Eyudkowsky) said that he was into orgasm denial, but only if his partner is also into it.
This is a great point. Thanks for making it.
I found both of these posts helpful, despite being ~10 years older than you guys. Reading how people are engaging with the situation emotionally somehow supports my own emotional engagement with what's going on.
Flexible. When an S-process round starts, there's an estimate of how much will be allocated in total, but funders (usually Jaan, sometimes others) might ultimately decide to give more or less, depending on both the quality of the applications and the quality of the analysis by the recommenders in the S-process.
I also had this initial misreading, but I think I figured out what you meant on my first reread.
Thanks, fixed.
But there is often a long period between when you stop endorsing the habit and when you've finally trained yourself to do something different. (If ever. Behavior change is famously hard for humans.)
Also, I'll note that religious deconversions very often happen in stages, including stages that involve narrow realizations that you were mistaken, and looming suspicions that you're going to change your mind. The whole edifice doesn't usually collapse in a single moment. It's a process. (This interview covers a good example.)
Notably, it seems unhealthy if every time a person gets an inkling that maybe Christianity is false, they dutifully go to their pastor and get freshly brainwashed to patch those specific objections. It's unhealthy because Christianity is false and adult humans should grow out of it in time.
It's unclear to me how strongly we can or should draw the analogy between changes in belief and changes in motivation, since one has a right answer and the other (presumably) doesn't.
Yeah, but putting your attention on the right things often does take a lot of compute.