In Import AI 439, Jack Clark references a new paper, *Universally Converging Representations of Matter Across Scientific Foundation Models*:
> Do AI systems end up finding similar ways to represent the world to themselves? Yes, as they get smarter and more capable, they arrive at a common set of ways of representing the world.
> The latest evidence for this is research from MIT which shows that this is true for scientific models and the modalities they’re trained on: “representations learned by nearly sixty scientific models, spanning string-, graph-, 3D atomistic, and protein-based modalities, are highly aligned across a wide range of chemical systems,” they write. “Models trained on different datasets have highly similar representations of small molecules, and machine learning interatomic potentials converge in representation space as they improve in performance, suggesting that foundation models learn a common underlying representation of physical reality.”
> As with other studies of representation, they found that as you scale the data and compute models are trained on, “their representations converge further”.
This seems like useful empirical evidence for the Natural Abstraction Hypothesis (I haven't been following progress on that research agenda, so I don't know how significant an update this is).
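For readers who haven't seen how "representational alignment" gets operationalized: metrics like linear CKA compare two models' embeddings of the same inputs while ignoring rotations and rescalings. Below is a minimal sketch using NumPy and toy embeddings; I don't know which metric the MIT paper actually uses, so treat the function and the made-up embedding matrices purely as illustration.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear Centered Kernel Alignment (Kornblith et al., 2019) between two
    representation matrices X (n_samples x d1) and Y (n_samples x d2), where
    each row embeds the same input. Returns a value in [0, 1]; higher = more aligned."""
    # Center each feature so the comparison is translation-invariant
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    return cross / (np.linalg.norm(X.T @ X, ord="fro") * np.linalg.norm(Y.T @ Y, ord="fro"))

# Toy check: embed the same 100 "molecules" with a hypothetical model A, then
# build model B's embeddings as a rotated, lightly noised copy of A's.
rng = np.random.default_rng(0)
emb_a = rng.normal(size=(100, 256))
rotation, _ = np.linalg.qr(rng.normal(size=(256, 256)))   # random orthogonal map
emb_b = emb_a @ rotation + 0.05 * rng.normal(size=(100, 256))
print(linear_cka(emb_a, emb_b))  # ~1.0: superficially different representations, same geometry
```

The point of the rotation in the toy example is that two models can look completely different parameter-by-parameter and still encode the same geometry over the same inputs, which is the sense of "convergence" these results are about.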
Thank you for the comment, Saul - I agree with a lot of your points, in particular that "explosive" periods are costly and inefficient (relative, perhaps, to some ideal), and that they are not in and of themselves a solution for long-term retention.
I expect that if we have a crux, it's whether someone who intends to follow an incremental path or someone who does an intense acquisition period is more likely to, ~a year later, actually have the skill. My guess is, for a number of reasons, that it's the latter; I'd expect a lot of incrementalists to 'just not actually do the thing'.
* My ideal strategy would be "lightly explore a number of things to determine what you want -> explode towards that for an intense period of time -> establish incremental practices to maintain and improve"
* Your comment also highlighted for me something I had cut from the initial draft: my belief that explosive periods help overcome emotional blockers, which I think might be a big part of why people shy away from skills they say they want.
No worries, I appreciate the perspective. I agree that many skills need a period of consolidation and rest. An obvious example: you can't cram all of the effort needed to build muscle into one week and expect the same kinds of returns you would get over many months. Though I do expect you could master the biomechanical skills of weightlifting much faster with that attitude!
If you have examples of the multidimensional learning schedule, I'd love to hear them. I'm imagining something like {30 minutes of Spanish-language shows}?
Thanks for sharing a data point on that claim.
Thank you for the post! I have also been (very pleasantly!) surprised by how aligned current models seem to be.
I'm curious: do you expect 'alignment by default' to continue to hold in a regime where continuous learning is solved and models are constantly updating themselves/being updated by what they encounter in the world?
Chain of Thought not producing evidence of scheming or instrumental convergence does seem like evidence against those worries, but it seems quite weak to me as a proxy for what to expect from 'true agents'. CoT doesn't run long enough, or have the kind of flexibility, I'd expect to see in an agent that was actually learning over long time horizons, which would give it the affordance to contemplate the many ways it could accomplish its goals.
And, while this is just speculation, I imagine that the kinds of training procedures we're doing now to instill alignment will not be possible with continuous learning, or that we'll have to pay a heavy alignment tax to do so for these agents. Note: see Jan's recent tweet on his impression that it is quite hard to align large models and that alignment doesn't fall out for free from size.
Thank you! I had not read this Kevin Simler essay but I quite like it, and it does match my perspective.
I did! And I have in fact read - well, some of :) - the whitepaper. But it still seems weird that it's not possible to increase trust in the third party through financial means or dramatic PR stunts (e.g., the auditor promises to commit seppuku if they are found to have lied).
Source needed, but I recall someone on the Community Notes team saying it was very similar, though there are some small differences between prod and the open-source version (it's difficult to maintain exact compatibility). For the point of the comment and its context, I agree open source does a good job of this; though given the number of people on Twitter who still allege it's being manipulated, I think you need some additional juice (a whistleblower prize?).
Why are there so few third-party auditors of algorithms? For instance, you could have an auditing agency make specific assertions about what the Twitter algorithm is doing, or whether Community Notes is 'rigged'.
If Elon wanted to spend a couple hundred thousand on insanely committed, high-integrity auditors, it'd be a great experiment.
Congratulations to Anna and the team for cohering around a vision and set of experiments. I donated to the new CFAR; I hope you continue posting about what you learn through the upcoming workshops.
One {note? suggestion? “real spirit” discussion point?} - I feel like the framing of aCFAR was missing something important about the state of rationality today. Namely, from this year (2026) onward, being more rational is unlikely to be a “human techniques only” affair. It will look more like cyborgs and centaurs - humans using AI tools and agents in different configurations and ways to make better decisions.
I won't belabor how good the AIs have gotten, and instead will just note that they are effective aids for rationalist techniques:
I appreciate that it's a Center for Applied Rationality, and maybe this particular center doesn't need to think about the cyborg angle and can just focus on developing better models of “who-ness”. Maybe a different center should!
But it seems valuable to consider, to the extent you want to push forward the frontier of rationality. I suspect there's some connection between the moments when AI meaningfully aids my real thinking, the moments when I'm doing slop-ful fake thinking where the AI is aiding my delusions, and the concept you're defining as "who-ness." Who-ness seems adjacent to taste, which might matter a lot for steering AI fleets towards goodness and meaningful concepts. And it’s probably the case that the general rationality techniques you’re imparting and working on with attendees could be more effective with AI assistance.