This post was a super helpful introduction to some of the key points in paradigms for AI Safety. I'm not sure how these ideas/questions fit into the larger literature/community around Safety/Alignment, but is there some theoretical way to model "narrow simulation"? I.e., an approach that is still "modeling humans," but where the agent has only a "narrow" theory of mind of humans? For example, the agent might only be able to model human social cognition developmentally up to age 5, while still performing physics-based tasks of unbounded c...