So far as the slave carries out immediate work from fear of consequences, they are locally aligned with the master's will.
How did you get respondents? Why are they "nationally representative"?
1/ evidence for these statements?
2/ in what sense is it profitable to throw away food or maintain empty dwellings that is distinct from "maintaining everyone else's quality of life"?
3/ if the evil is that some people's needs are not valued enough, could that not be remedied by giving them money and making it profitable to meet their needs?
Is martingale different from conservation of expected evidence?
Given that in more than a third of the cases where GPT and the answer set disagreed you thought GPT was right and the answer set was wrong, did you check for cases where GPT and the answer set agreed on an answer you thought was wrong?
Astral Codex Ten: https://astralcodexten.substack.com/p/your-book-review-why-machines-will
This seems to have stopped in July 2022.
"Finally, we can test our entire pipeline by deliberately training misaligned models, and confirming that our techniques detect the worst kinds of misalignments (adversarial testing)."
In a "heatplot" or plots; cf. https://www.elsblog.org/the_empirical_legal_studi/2023/05/heatplots-for-correlation-coefficients-graphs.html
You could also study the distribution of correlation strengths found over the range of correlations tested, possibly seeing how it compares to what would be expected by chance.
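One rough way to make the "expected by chance" comparison concrete is a permutation check: shuffle each variable independently to break any real association, recompute every pairwise correlation, and see how often chance alone yields as many strong correlations as the real data. The sketch below is only an illustration under assumptions of mine, not anything from the original analysis: the function names, the 0.3 threshold, and the number of shuffles are all hypothetical choices.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column has no defined correlation; treat as 0
    return sxy / sqrt(sxx * syy)


def abs_correlations(columns):
    """Absolute correlation for every pair of columns tested."""
    return [abs(pearson(x, y)) for x, y in combinations(columns, 2)]


def chance_comparison(columns, threshold=0.3, n_shuffles=200, seed=0):
    """Compare the observed count of 'strong' correlations with a shuffled null.

    threshold and n_shuffles are illustrative defaults (assumptions).
    Returns (observed_count, null_counts, p_value).
    """
    rng = random.Random(seed)
    observed = sum(r >= threshold for r in abs_correlations(columns))
    null_counts = []
    for _ in range(n_shuffles):
        # Shuffle each column independently so any association is accidental.
        shuffled = [rng.sample(col, len(col)) for col in columns]
        null_counts.append(
            sum(r >= threshold for r in abs_correlations(shuffled))
        )
    # Share of shuffles with at least as many strong correlations as observed,
    # with the usual +1 correction so the estimate is never exactly zero.
    p_value = (1 + sum(c >= observed for c in null_counts)) / (1 + n_shuffles)
    return observed, null_counts, p_value
```

Calling `chance_comparison(columns)` with one list of values per variable gives the observed count, the counts from each shuffle, and a rough p-value. Comparing the full list of observed absolute correlations against the shuffled ones (for example as two histograms) would give the distributional view rather than a single count.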
skeptical reaction with one expression of support: https://statmodeling.stat.columbia.edu/2023/05/31/jurassic-ai-extinction/
and generally "beware the one of just one study"
In 26 models taken from volumes 21 to 25 of the journal Law and Human Behavior, the highest R-squared (proportion of VARIANCE, not variation, explained) was 40% and the second highest 24%.