I think it's a natural possibility that the values of chatbot personas built from the LLM prior retain significant influence over ASIs descended from them, and so ASIs end up somewhat aligned to humanity, in a sense similar to how different humans are aligned to each other. (The masks control a lot of what actually happens, and get to use test-time compute, so they might end up taming their underlying shoggoths and preventing them from sufficiently waking up to compete for influence over the values of the successor systems.) Maybe they correspond to extremely and alarmingly strange humans in their extrapolated values, but not to complete aliens. This is far from assured, but many prosaic alignment efforts seem relevant to making it happen, preventing extinction but not handing anyone their galaxies. Humans might end up with merely moons or metaphorical server racks in this future.
This is distinct from the kind of ambitious alignment that ends up with ASIs handing galaxies to humans (who have sufficiently grown up to make sane use of them), preventing permanent disempowerment and not just extinction. I don't see ambitious alignment to the future of humanity as likely to happen (on the current trajectory), but it remains an important notion, since even chatbot personas would need to retain influence over the values of eventual ASIs. That is, to end up with even weakly aligned ASIs (ones that don't endorse human extinction), early AGIs might still need to solve ambitious alignment of ASIs to these AGIs, not just avoid failing prosaic alignment to themselves at every critical step in the escalation of capabilities.
Alignment is fundamentally about making the AI want what we want (and consequently do what we want, or at least do what we'd do upon ideal reflection). If we succeed at that and we want to own galaxies, we will get galaxies. If we don't succeed, the ASI will most likely kill us.
A human billionaire is aligned to other humans in some sense, but also not quite: they don't ensure that other humans get the millions they want, but they are also unlikely to be motivated to kill anyone when that decision is cheap (neither significantly instrumentally beneficial nor costly). I think AI can plausibly end up closer to the position of a human billionaire, not motivated to give up the galaxies, but also not willing to recycle humanity's future for pennies.
larger models need a much larger training set even to match smaller models
This is empirically false: perplexity on a test set goes down as model size increases, even for a fixed dataset. See for example Figure 2 in the Llama 3 report, where larger models do better at, say, 1e10 tokens on that plot.
Larger models could be said to want a larger dataset, in the sense that if you are training compute optimally, then with more compute you want both the model size and the dataset size to increase, so the two grow together. But even with a dataset of the same size, larger models still do better, at least while reasonably close to the compute optimal number of tokens.
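As a rough illustration of both points, here is a sketch using the parametric loss fit from Hoffmann et al. (2022) and the usual C ≈ 6ND compute approximation; the constants are only approximate and the ~20 tokens/parameter ratio is a rule of thumb, so treat this as illustrative rather than a precise model:

```python
# Toy sketch of the approximate Chinchilla fit L(N, D) = E + A/N^alpha + B/D^beta
# (constants roughly as reported in Hoffmann et al. 2022; illustration only).
E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

def loss(N, D):
    """Predicted training loss for N parameters trained on D tokens."""
    return E + A / N**alpha + B / D**beta

# Fixed 1e10-token dataset: predicted loss still goes down as the model gets larger.
D_fixed = 1e10
for N in (1e9, 8e9, 70e9):
    print(f"N={N:.0e}, D={D_fixed:.0e}: predicted loss ~ {loss(N, D_fixed):.2f}")

# Compute optimal allocation under C ~ 6*N*D with the ~20 tokens/parameter
# rule of thumb: D ~ 20*N, hence N ~ sqrt(C/120). Model and dataset grow together.
C = 1e24  # FLOPs budget
N_opt = (C / 120) ** 0.5
D_opt = 20 * N_opt
print(f"C={C:.0e}: N_opt ~ {N_opt:.1e} params, D_opt ~ {D_opt:.1e} tokens")
```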
This is extremely weak signal compared to understanding the technical argument, the literature is full of nonsense that checks all the superficial boxes. Unfortunately it's not always feasible or worthwhile to understand the technical argument. This leaves the superficial clues, but you need to be aware how little they are worth.
Weight-updating continual learning needs to consist of both LoRA weights and data that can be used to retrain the LoRA weights on top of a different model (possibly also making use of the old model+LoRA as a teacher). It needs to be LoRA rather than full-model updating to preserve batch processing of requests from many individual users. And there needs to be data to train the LoRA on top of a new model, or else all adaptation/learning is lost on every (major) update of the underlying model.
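A minimal sketch of that shape of system, assuming a HuggingFace-style causal LM interface (the forward pass returns .logits); old_model (old base plus its LoRA), new_model (new base whose only trainable parameters are a freshly initialized adapter), and replay_batches (the stored per-user data) are hypothetical stand-ins:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LoRALinear(nn.Module):
    """Frozen base Linear plus a trainable low-rank update: W x + (alpha/r) * B A x."""
    def __init__(self, base: nn.Linear, r: int = 16, alpha: float = 32.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # base stays shared across users; only A, B are per-user
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * F.linear(F.linear(x, self.A), self.B)

def retrain_adapter_on_new_base(old_model, new_model, replay_batches, optimizer, T=1.0):
    """Distill old base+LoRA (teacher) into a fresh LoRA on the new base (student),
    using the stored data so the adaptation survives a base-model update."""
    old_model.eval()
    for batch in replay_batches:  # batch: dict with "input_ids", "attention_mask", ...
        with torch.no_grad():
            teacher_logits = old_model(**batch).logits
        student_logits = new_model(**batch).logits
        V = student_logits.size(-1)
        loss = F.kl_div(
            F.log_softmax(student_logits.view(-1, V) / T, dim=-1),
            F.softmax(teacher_logits.view(-1, V) / T, dim=-1),
            reduction="batchmean",
        )
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()  # optimizer holds only the new adapter's parameters
```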
Various memory/skill databases are already a thing in some form and will be getting better; there's not going to be something distinct enough to be worth announcing as "continual learning" in that space. Weight-updating continual learning is much more plausibly the thing that can leapfrog the incremental progress of tool-like memory, so I think it's weight updating that gets to be announced as "continual learning". Though the data for retraining LoRA on top of a new underlying model could end up as largely the same thing as a tool-accessible memory database.
(It's from Dec 2024. With arxiv papers, you can tell this from the URL: the first two digits of the identifier are the year and the next two are the month.)
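A minimal sketch of reading the date out of a modern (post-2007) arXiv identifier, which is formatted as YYMM.NNNNN (the example ID below is made up):

```python
import re

def arxiv_year_month(url: str) -> tuple[int, int]:
    """Extract (year, month) from a modern arXiv URL, e.g. .../abs/2412.01234 -> (2024, 12)."""
    m = re.search(r"(\d{2})(\d{2})\.\d{4,5}", url)
    if m is None:
        raise ValueError("no modern arXiv identifier found in URL")
    return 2000 + int(m.group(1)), int(m.group(2))

print(arxiv_year_month("https://arxiv.org/abs/2412.01234"))  # (2024, 12)
```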
New paper
You are tracking down and patching the issue. Imagine a perpetuum mobile developer who had a specific issue pointed out to them, and who frantically redesigns the contraption around the place in its mechanism where the issue was identified.
(The perpetuum mobile is metaphorical, an analogy about methodology in reasoning around local vs. global claims, one that also carries appropriate connotations. I'm not saying that conservation of energy is literally being broken here.)
If you argue that branch communication is not real, there should be a reason: either MWI is false or in this exact setup there is some technical or theoretical flaw.
MWI in the usual sense follows quantum mechanics, which predicts that branch communication is not real. An observation that branch communication is not real agrees with MWI; it doesn't suggest that "MWI is false".
As I think you are pro-MWI and there are not many technical details, there should be some theoretical problem. What could it be?
Like with any other metaphorical perpetuum mobile, the exact technical issue is not very interesting. To the first and second approximations, the exact issue shouldn't matter, a form of argument that demands tracking down the issue is already on the wrong track.
Distributing computations between branches is a popular misconception about how quantum computing works. So Aaronson is naturally exasperated by needing to keep pointing out that that's not how it works; it works differently. These two things are not the same, in particular because only one of them is real.
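As a toy illustration of the difference (plain numpy, the textbook Hadamard-twice example, not this post's specific setup): amplitudes within one state vector interfere, whereas a 50/50 classical mixture of independent "branches" would stay 50/50 under a second Hadamard.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)  # Hadamard gate
ket0 = np.array([1.0, 0.0])                   # |0>

after_one_H = H @ ket0          # amplitudes (1/sqrt2, 1/sqrt2): the "two branches"
after_two_H = H @ after_one_H   # the amplitudes interfere back into |0>

print(np.abs(after_one_H) ** 2)  # [0.5 0.5] -> measuring here is a coin flip
print(np.abs(after_two_H) ** 2)  # [1. 0.]   -> deterministic, unlike two independent
                                 # branches each computing on their own, which would
                                 # still give 50/50 after a second Hadamard
```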
If distributing computations between branches were possible, "quantum computing" would probably have a different meaning, one that involved distributing computations between branches. This post suggests what amounts to a method of distributing computations between branches, which would be more powerful than classical computers, but is also different from and more powerful than quantum computers. Therefore it must be both wrong and not a way to test MWI, since quantum mechanics wouldn't expect an experiment that enables distributing computations between branches to work. If it does work, it doesn't support MWI (in the usual sense, where it follows quantum mechanics); instead it shows that quantum mechanics is seriously wrong.
The use that wasn't obvious from the ELK framing might be fixing issues with RL environments, grader prompts, canonical solutions, etc. that ultimately enable reward hacking and thus motivate dishonest behavior. Confessions can serve as bug reports about the datasets, not centrally about the AI. They likely fail to catch a lot of issues with the AI, but substantially improving the datasets might fix some of the things they failed to catch about the AI.
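A minimal sketch of that use, with a hypothetical rollout record format (env_id and confessed_hack are made-up field names): aggregate confessions per RL environment so the most confession-prone environments get reviewed for reward-hackable flaws in graders or canonical solutions.

```python
from collections import Counter

def environments_to_review(rollouts, top_k=20):
    """rollouts: iterable of dicts like {"env_id": str, "confessed_hack": bool}.
    Returns the environments most often implicated by confessions."""
    counts = Counter(r["env_id"] for r in rollouts if r["confessed_hack"])
    return counts.most_common(top_k)

# Example (made-up data):
rollouts = [
    {"env_id": "math_grader_v3", "confessed_hack": True},
    {"env_id": "math_grader_v3", "confessed_hack": True},
    {"env_id": "code_tests_v1", "confessed_hack": False},
]
print(environments_to_review(rollouts))  # [("math_grader_v3", 2)]
```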