No worries, I appreciate the perspective. I agree that for many skills there is a consolidation and rest period that is needed. An obvious example is that you can't cram all of the effort needed to build muscle into one week and expect the same kinds of returns that you would get over many months. Though, I do expect you could master the biomechanical skills of weightlifting much faster with that attitude!
If you have examples of the multidimensional learning schedule, I'd love to hear them. I'm imagining something like {30 minutes of Spanish-language shows}?
Thanks for sharing a data point on that claim.
Thank you for the post! I have also been (very pleasantly!) surprised by how aligned current models seem to be.
I'm curious if you expect 'alignment by default' to continue to hold in a regime where continuous learning is solved and models are constantly updating themselves/being updated by what they encounter in the world?
Chain of Thought not producing evidence of scheming or instrumental convergence does seem like evidence against them, but it seems quite weak to me as a proxy for what to expect from 'true agents'. CoT doesn't run long enough, or have the kind of flexibility I'd expect to see in an agent that was actually learning over long time horizons, which would give it the affordance to contemplate the many ways it could accomplish its goals.
And, while this is just speculation, I imagine that the kind of training procedures we're using now to instill alignment will not be possible with continuous learning, or that we'll have to pay a heavy alignment tax to apply them to these agents. Note: Jan's recent tweet gives his impression that it is quite hard to align large models and that alignment doesn't fall out for free from size.
Thank you! I had not read this Kevin Simler essay but I quite like it, and it does match my perspective.
I did! And I have in fact read - well, some of :) - the whitepaper. But it still seems weird that it's not possible to increase trust in the third party through financial means or dramatic PR stunts (e.g., the auditor promises to commit seppuku if they are found to have lied).
Source needed, but I recall someone on the Community Notes team saying the open source version is very similar to prod, with some small differences (it's difficult to maintain exact compatibility). For the point of the comment and context, I agree open source does a good job of this, though given the number of people on Twitter who still allege it's being manipulated, I think you need some additional juice (a whistleblower prize?).
Why so few third-party auditors of algorithms? For instance, you could have an auditing agency make specific assertions about what the Twitter algorithm is doing, or whether Community Notes is 'rigged'.
If Elon wanted to spend a couple hundred thousand on insanely committed, high-integrity auditors, it'd be a great experiment.
epistemic status: thought about this for like 15 minutes + two deep research reports
A contrarian pick for an underrated technology area is lie detection through brain imaging. It seems like it will become much more robust and ecologically valid through compute-scaled AI techniques, and it's likely to be much better at lie detection than humans are, because we didn't have access to images of the internals of other people's brains in the ancestral environment.
On the surface this seems like it would be transformative: brain-scan key employees to make sure they're not leaking information! Test our leaders for dark triad traits (ok, that's a bit different from detecting specific lies, but still). However, there's a cynical part of me that sounds like some combo of @ozziegooen and Robin Hanson, noting that we already have methods (like significantly increased surveillance and auditing) that we could use for greater trust and which we don't employ.
So perhaps this won't be used except for the most extreme natsec cases, where there are already norms of investigations and reduced privacy.
Related quicktake: https://www.lesswrong.com/posts/hhbibJGt2aQqKJLb7/shortform-1#25tKsX59yBvNH7yjD
Good points! I agree that actual prototyping is necessary to see if an idea works, and as a demo it can be far more convincing. Especially w/ the decreased cost of building web apps, leveraging them for fast demos of techniques seems valuable.
Thank you for the comment Saul - I agree with a lot of your points, in particular that "explosive" periods are costly and inefficient (relative perhaps to some ideal), and that they are not in and of themselves a solution for long-term retention.
I expect that if we have a crux, it's whether someone who intends to follow an incremental path or someone who does an intense acquisition period is more likely to actually have the skill ~a year later. And my guess is, for a number of reasons, it's the latter; I'd expect a lot of incrementalists to 'just not actually do the thing'.
* My ideal strategy would be "explore lightly a number of things, to determine what you want -> explode towards that for an intense period of time -> establish incremental practices to maintain and improve"
* Your comment also highlighted for me something I had cut from the initial draft: my belief that explosive periods help overcome emotional blockers, which I think might be a big part of why people shy away from skills they say they want.