I expect one strong reason for different ASIs to develop similar abstractions regardless of their goals is that they need to predict a bunch of other agents in the world (either humans or other ASIs), and so need to be able to represent those agents' goals.
LLMs pretrained on data about misaligned AIs themselves become less aligned. Luckily, pretraining LLMs with synthetic data about good AIs helps them become more aligned. These alignment priors persist through post-training, providing alignment-in-depth. We recommend labs pretrain for alignment, just as they do for capabilities.
Website: alignmentpretraining.ai
Us: geodesicresearch.org | x.com/geodesresearch
Note: We are currently gathering feedback here before submitting to ICML. Suggestions here or on our Google Doc (which contains a more detailed overview of our experiments) are welcome! We will release a revision on arXiv in the coming days. Folks who leave feedback will be added to the Acknowledgments section. Thank you!
We pretrained a suite of 6.9B-parameter LLMs, varying only the content related to AI systems, and evaluated them for misalignment. When filtering the vast majority...
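For intuition, here is a minimal sketch of the kind of corpus intervention described above: filter organic AI-related documents out of a web corpus, then mix in synthetic documents depicting well-aligned AI. The function names (`is_about_ai`, `build_corpus`) and the keyword heuristic are our own illustrative assumptions, not the authors' actual pipeline, which would presumably use a trained classifier rather than a regex.

```python
# Hypothetical sketch of content-conditional filtering plus synthetic mixing.
# Not the paper's code; names and heuristics are illustrative assumptions.
import random
import re
from typing import Iterable, Iterator

# Crude keyword proxy for "content related to AI systems".
AI_PATTERN = re.compile(
    r"\b(AI|artificial intelligence|language model|AGI|superintelligen\w*)\b",
    re.IGNORECASE,
)

def is_about_ai(doc: str) -> bool:
    return bool(AI_PATTERN.search(doc))

def build_corpus(
    web_docs: Iterable[str],
    synthetic_good_ai_docs: Iterable[str],
    keep_ai_fraction: float = 0.0,  # 0.0 = drop all organic AI content
) -> Iterator[str]:
    for doc in web_docs:
        # Keep non-AI documents; retain a small (possibly zero) fraction of
        # AI-related ones to vary the pretraining mix across model variants.
        if not is_about_ai(doc) or random.random() < keep_ai_fraction:
            yield doc
    # Append synthetic documents portraying well-aligned AI behavior.
    yield from synthetic_good_ai_docs
```

Sweeping `keep_ai_fraction` (and swapping the synthetic set in or out) is one way to produce a suite of models that differ only in their AI-related pretraining content.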
I think I phrased my previous comment poorly. What I meant is that if you have developed a set of abstractions relevant to achieving your goals, and I want to predict you accurately, then I also need to develop abstractions relevant to achieving your goals. Given limited representational capacity, this creates pressure for different agents to develop similar representations.