Epistemic status: Naive and exploratory; this reflects my preliminary conceptual understanding, pending a technical deep dive. 99% of the ideas are not my own; rather, they are distilled from the resources hyperlinked throughout.
Many alignment researchers err towards local optimization, i.e., they seek low-hanging fruit and leverage incremental improvements. Fast, imperfect iteration is necessary, but it should sometimes be integrated within a principled and holistic strategy.
I fear that many people interested in W2SG will default to testing ideas on the basis of informed heuristics, without robust inside views to justify them. While in aggregate we might still converge on effective solutions, my general thesis is that W2SG needs people who adopt an AI scout approach.
This introductory post and...