(I remember a particular exchange where Eliezer got so frustrated at my inability to do some basic Bayesian reasoning that he ended up writing a whole guide to Bayes' Theorem).
Wow. This made me laugh out loud.
Thank you for your service to the intellectual commons!
To test whether Drake’s circumvention of his short-term memory loss worked via the intended mechanism, I could ask my girlfriend in advance to prompt me once — and only once — to complete the long-term memory scene that I had been practicing. Then I could see whether I had a memory of the scene after I fully regained my memory.
You should have her decide (and write down) what to encode in advance, so that you can check later not just whether you remembered something, but whether you successfully encoded it in such a way that it communicates what you intended to communicate to yourself.
(Since Drake managed to send a memory, but was only guessing about what it was intended to mean.)
Ziz is...exceptionally (and probably often uncomfortably) aware of the way people's minds work in a psychoanalytic sense.
What do you mean by this? Like, she's better than average at predicting people's behavior in various circumstances?
Back in January, I participated in a workshop in which the attendees mapped out how they expect AGI development and deployment to go. The idea was to start by writing out what seemed most likely to happen this year, then condition on that to forecast what seems most likely to happen the next year, and so on, until you reach either human disempowerment or the end of the acute risk period.
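A purely schematic sketch of that exercise, for anyone who finds the structure easier to see in code. All names here are mine, not the workshop's, and the actual "forecast" step is human judgment, so it's left as a stub:

```python
# Toy sketch of the workshop's iterated-conditioning procedure.
# The terminal outcomes are the two endpoints named above.
TERMINAL_OUTCOMES = {"human disempowerment", "end of acute risk period"}

def most_likely_next_year(scenario_so_far):
    """Stub for the human judgment step: given the story so far,
    write down what seems most likely to happen the following year."""
    raise NotImplementedError

def build_scenario(start_year=2024):
    scenario = []
    year = start_year
    while True:
        # Each year's forecast is conditioned on everything written so far.
        events = most_likely_next_year(scenario)
        scenario.append((year, events))
        if events in TERMINAL_OUTCOMES:
            return scenario
        year += 1
```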
This post was my attempt at the time.
I spent maybe 5 hours on this, and there's lots of room for additional improvement. This is not a confident statement of how I think things are most likely to play out. There are already some ways in which I think this projection is wrong. (I think it's too fast, for instance.) Nevertheless, I'm posting it now, with only a few edits and elaborations, since I'm probably not going to do a full rewrite soon.
2024
2025
2026
2027 and 2028
2029
2030
the fundamental laws governing how AI training processes work are not "thinking back"
As a commentary from an observer: this is distinct from the proposition "the minds created with those laws are not thinking back."
Near-human AGI need not transition to ASI until the relevant notKillEveryone problems have been solved.
How central is this to your story of how things go well?
I agree that humanity could do this (or at least it could if it had its shit together), and I think it's a good target to aim for, one that buys us sizable success probability. But I don't think it's what's going to happen by default.
This seems clearly false in the case of deep learning, where progress on instilling any particular behavioral tendencies in models roughly follows the amount of available data that demonstrate said behavioral tendency. It's thus vastly easier to align models to goals where we have many examples of people executing said goals. As it so happens, we have roughly zero examples of people performing the "duplicate this strawberry" task, but many more examples of e.g., humans acting in accordance with human values, ML / alignment research papers, chatbots acting as helpful, honest and harmless assistants, people providing oversight to AI models, etc. See also: this discussion. [emphasis mine]
The thing that makes powerful AI powerful is that it can figure out how to do things that we don't know how to do yet, and therefore don't have examples of. The key question for aligning superintelligences is "how do they generalize in new domains that are beyond what humans were able to do / reason about / imagine?"
I haven't seen careful analysis of LLMs (probably because they're newer, so harder to fit a trend), but eyeballing it... Chinchilla by itself must have been a factor-of-4 compute-equivalent improvement at least.
Ok, but discovering the Chinchilla scaling laws was a one-time boost to training efficiency. You shouldn't expect to repeatedly get 4x improvements just because you observed that one.
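For what it's worth, the compute-equivalent arithmetic can be sketched directly from the parametric loss fit in Hoffmann et al. (2022). The constants below are the published fit; the C ≈ 6ND approximation and the GPT-3-style baseline allocation are simplifying assumptions of mine, so the printed multiplier is only illustrative:

```python
import math

# Published parametric fit from Hoffmann et al. (2022):
# L(N, D) = E + A / N**alpha + B / D**beta
E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

def loss(N, D):
    """Predicted loss for a model with N parameters trained on D tokens."""
    return E + A / N**alpha + B / D**beta

def chinchilla_alloc(C):
    # Compute-optimal split under C ~ 6*N*D, with D/N ~ 20.
    N = math.sqrt(C / (6 * 20))
    return N, 20 * N

def gpt3_style_alloc(C):
    # Pre-Chinchilla practice: big models, relatively few tokens
    # (GPT-3: ~175B params on ~300B tokens, so D/N ~ 1.7).
    N = math.sqrt(C / (6 * 1.7))
    return N, 1.7 * N

# Chinchilla's actual budget: ~70B params * ~1.4T tokens.
C = 6 * 70e9 * 1.4e12
target = loss(*chinchilla_alloc(C))

# How much extra compute does the old-style allocation need to match it?
mult = 1.0
while loss(*gpt3_style_alloc(C * mult)) > target:
    mult *= 1.01
print(f"compute-equivalent improvement ~ {mult:.1f}x")
```

With these assumptions the multiplier comes out in the low single digits; the exact number is quite sensitive to what you assume the counterfactual allocation would have been, which is part of why estimates like "factor-of-4" are eyeballed rather than derived.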
Is this true? My understanding is that it's typically the well-educated middle classes that start revolutions. Starving peasants are not well positioned to organize, much less win, revolutions.