Doesn't the exact same argument work for alignment though? "It's so different, it may be misaligned in ways you can't think of". Why is it treated as a solvable challenge for alignment and an impossibility for containment? Is the guiding principle that people do expect a foolproof alignment solution to be within our reach?

One difference is that the AI wants to escape containment by default, almost by definition, but is agnostic about preferring a goal function. But since alignment space is huge (i.e. "human-compatible goals are measure 0 in alignment space") I think the general approach is to assume it's 'misaligned by default'.

I guess the crux is that I find it hard... (read more)

1

3

0

Replying toAGI Safety FAQ / all-dumb-questions-allowed thread

Reuven Falkovich4y

AGI Safety FAQ / all-dumb-questions-allowed thread

My impression is that much more effort being put into alignment than containment, and containment is treated as impossible while alignment merely very difficult. Is it accurate? If so, why? By containment I mean mostly hardware-coded strategies of limiting the compute and/or world-influence an AGI has access to. It's similar to alignment in that the most immediate obvious solutions ("box!") won't work, but more complex solutions may. A common objection is that an AI will learn the structure of the protection from the human that built it and work around, but it's not inconceivable to have a structure that can't be extracted from a human.

Advantages I see to devoting effort/money to containment solutions over alignment:

Different

... (read more)

3

6

0