Ah, no line number. Context:
To me it seems analogous to how there are many statements that need to be said very carefully in order to convey the intended message under high-decoupling norms, like claims about how another person's motivations or character traits affect their arguments.
Interestingly, learning a reward model for use in planning has a subtle and pernicious effect we will have to deal with in AGI systems, which AIXI sweeps under the rug: with an imperfect world or reward model, the planner effectively acts as an adversary to the reward model. The planner will try very hard to push the reward model off distribution so as to get it to move into regions where it misgeneralizes and predicts incorrect high reward.
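To make the failure mode concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration, not anything from the quoted post: a made-up one-dimensional `true_reward`, a linear `learned_reward` fitted only on actions in [0, 0.5], and a "planner" that is just an argmax over a much wider range of actions.

```python
def true_reward(x):
    # Hypothetical ground truth: reward peaks at x = 1 and falls off on both sides.
    return -(x - 1.0) ** 2

def fit_line(xs, ys):
    # Ordinary least squares for y = slope * x + intercept.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x

# The reward model only ever sees actions in [0, 0.5], where the true
# reward happens to be increasing, so a straight line fits well there.
train_xs = [i * 0.5 / 19 for i in range(20)]
train_ys = [true_reward(x) for x in train_xs]
slope, intercept = fit_line(train_xs, train_ys)

def learned_reward(x):
    # Imperfect reward model: extrapolates the local upward trend forever.
    return slope * x + intercept

# The "planner" searches a far wider action range (0.0 to 10.0) and
# picks whatever the learned model scores highest.
candidates = [i / 100 for i in range(1001)]
planned = max(candidates, key=learned_reward)

# For comparison: the best action inside the training distribution.
in_dist_best = max(train_xs, key=true_reward)

print("planner picks:", planned)
print("model predicts:", learned_reward(planned))
print("true reward there:", true_reward(planned))
print("true reward at best in-distribution action:", true_reward(in_dist_best))
```

The planner is not malicious here. It simply maximizes the model's prediction, and the maximum sits at the edge of the search range, far from any training data, where the model's prediction is high and the true reward is much worse than anything seen in training.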
Remix: With an imperfect world... the mind effectively acts as an adversary to the heart.
Think of a person who pursues wealth as an instrumental goal for some combination of doing good, security, comfort, and whatever else their value function ought to be rewarding ("ought" in a personal coherent extrapolated volition sense). They achieve it, but then apparently it's less uncomfortable to go on accumulating more wealth than to return to the thorny question of what their value function ought to be.
Is there a more formal statement somewhere of the theorem in "Complexity theory of team games without secrets"? Specifically, one that only uses terms with standard meanings in complexity theory? I find that document hard to parse.
If concreteness is helpful, take "terms with standard meanings in complexity theory" to be any term defined in any textbook on complexity theory.
This is awesome! A couple suggestions:
"and quickly starts to replace both Fox News and other news sources among members of all political parties." -- if this is plausible, it's not clear to me, and while I'm not a great predictor of the human race, I'm pretty damn smart. More importantly, your story doesn't need it; what it needs is just that Face News is useful and liked by a strong majority of people, like Google is today.
Murpt is a fun detail, but your story doesn't need him either. Fasecure can become dominant in government systems over a period of years, without needing to assume bad politicians. It's empirically a great system for cybersecurity, after all. It can start at the state level. Few people besides the AI safety nerds (who, I'll grant, by now have realized that nanotech stories don't seem to resonate with the general public) seem to be raising a fuss about it. Elon Musk tweets some concerns, but he's too preoccupied with Mars to do much else. There are plenty of non-billionaire computer scientists who are concerned, but they look at how the careers of earlier computer scientists fared after speaking out, and at the ridicule from famous academics that followed, and that triggers enough doubt in their own perspective, or simply short-sighted self-interest, that they mostly keep quiet.
Would love to read more stuff like this, and would be happy to help!
I've kept fairly up to date on progress in neural nets, less so in reinforcement learning, and I certainly agree about how limited things are now.
What if protecting against the threat of ASI requires huge worldwide political/social progress? That could take generations.
Not an example of that (which I haven't tried to think of), but the scenario that concerns me the most, so far, is not that some researchers will inadvertently unleash a dangerous ASI while racing to be the first, but rather that a dangerous ASI will be unleashed during an arms race between (a) states or criminal organizations intentionally developing a dangerous ASI, and (b) researchers working on ASI-powered defences to protect us against (a).
He might be willing to talk off the record. I'll ask. Have you had Darklight on? See http://lesswrong.com/r/discussion/lw/oul/openai_makes_humanity_less_safe/dqm8
I see. I guess I hadn't made the connection of attributing benefits to high-contextualizing norms. I only got as far as observing that certain conversations go better with comp lit friends than with comp sci peers. That was the only sentence that gave me a parse failure. I liked the post a lot.