Please stop sharing google docs for comments
When you share a google doc, no one can see it other than the people you shared the doc with. Instead: post the draft online, then share the link so people can comment in public.
I only share a google doc if there's a specific person whose comments I want before posting online. But people often share these google docs in big Slack channels; at that point, just post online!
I think this practice slows innovation.
MMLU knowledge, fit well enough, requires inventing the universe
I suspect "fit well enough" doesn't track anything in reality.
Jerry Wei writes:
We expect this to become even more of an issue as AIs increasingly use tools to do their own research rather than rely on their learned knowledge (we tried to filter this kind of data as well, but it wasn't enough assurance against misuse).
I think his critique is this:
Suppose we had a perfect filtering system, so that the dangerous knowledge has zero mutual information with the model weights. Nonetheless, the dangerous knowledge is still "accessible" to the agent via web search, tools, and in-context reasoning.
To solve this problem, we need either alignment techniques (e.g. training the model not to use these affordances) or inference-time monitoring techniques (e.g. constitutional classifiers). But if we had those techniques, we wouldn't need the pretraining filtering.
If there's >1% chance of capturing >1% of the cosmos, then the EV is cosmic
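The arithmetic can be sketched as a lower bound (the 1% figures are the note's stated thresholds, not precise estimates):

```python
# Back-of-the-envelope sketch of the expected-value claim above.
# Both inputs are the note's ">1%" lower bounds, so the product is itself
# a lower bound on the expected fraction of the cosmos.
p_success = 0.01        # >1% chance the intervention works
share_captured = 0.01   # >1% of the cosmos captured if it works

ev_fraction = p_success * share_captured
print(f"EV >= {ev_fraction:.0e} of the cosmos")
```

Even at one ten-thousandth, the absolute stakes remain cosmic, which is the note's point.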
I think that SGD isn't sample-efficient enough to solve continual learning
I don't think this is evidence that ARENA is about signalling:
First, many participants at ARENA have already done AI safety research before participating. Second, at least four participants in ARBOx (a 2-week compressed version of ARENA) are doing elite AI safety fellowships (1 Anthropic Fellows Program, 2 LASR Labs, 1 MATS).
I mention cross-lab monitoring here:
The tech would need to be a bit messy: you'd need to send your competitor a "zero-knowledge proof" that you're using their AI only to monitor for catastrophic actions, not to automate R&D, all without leaking what your AIs are doing.
Potential setup: