I am an AI Policy Fellow at IAPS, where I research AI integrity and compute governance. Previously, I was a GovAI summer fellow, a participant in ARENA 5.0, a hardware security research assistant through the SPAR program, and a security engineer at a hedge fund. I graduated from Columbia University in December 2024, where I studied computer science.
My academic interests lie in post-AGI governance, hardware & software security, (meta-)ethics, physics, and cognitive science. My primary goal is to improve the welfare of sentient beings, and I think one of the best ways to secure a flourishing future is ensuring that the transition to a world with transformative AI goes well. I’m also interested in issues like wild animal welfare and factory farming.
I have signed no contracts or agreements whose existence I cannot mention.
Great post! It's been almost a year since this was posted, so I'm curious whether anyone has worked on these questions:
- Do you get any weird results from the pre-training data not being IID? Does this compromise capabilities in practice? Or does it lead to increased capabilities because the model cannot lean as much on memorization when it’s constantly getting trained on a previously-unseen future?
- What if you want to run multiple epochs?[21] Then you have a conflict between wanting to fully update on the old data before you see new data vs. wanting to maximally spread out the points in time at which you repeat training data. How severe is this conflict? Are there any clever methods that could reduce it?
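To make the conflict in the second question concrete, here's a toy sketch of my own (not from the post): assume the pre-training data arrives in chronological chunks and we want two epochs over each chunk, without ever training on a chunk before it has "arrived".

```python
# Toy illustration of the trade-off (my own framing, not from the post):
# data arrives in chronological chunks d1..d6, and we want 2 epochs per chunk
# without ever training on a chunk before it has "arrived".

chunks = [f"d{i}" for i in range(1, 7)]

# Option A: fully update on old data before seeing new data
# (repeats are maximally bunched together)
order_a = [c for c in chunks for _ in range(2)]
# -> ['d1', 'd1', 'd2', 'd2', 'd3', 'd3', 'd4', 'd4', 'd5', 'd5', 'd6', 'd6']

# Option B: spread repeats out by lagging the second pass k chunks behind
k = 3
order_b = []
for i, c in enumerate(chunks):
    order_b.append(c)                  # first pass on the newest chunk
    if i >= k:
        order_b.append(chunks[i - k])  # delayed second pass on an older chunk
order_b += chunks[-k:]                 # finish the lagging second pass
# -> ['d1', 'd2', 'd3', 'd4', 'd1', 'd5', 'd2', 'd6', 'd3', 'd4', 'd5', 'd6']
```

Option A resolves the conflict in favor of fully updating on old data before moving on; Option B spreads the repeats out, but old chunks only get refreshed after a lag of k chunks. How much either choice hurts in practice seems to be exactly the empirical question being asked.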
I did a quick lit review and didn't find much. Here's what I did find (not perfectly related to the above questions, though).
So, has anyone pursued the two quoted questions above? Super curious if anyone has good results!
Semi-cooperation is one way for both sides to learn from each other—but so is poor infosec or even outright espionage. If both countries are leaking or spying enough, that might create a kind of uneasy balance (and transparency), even without formal agreements. It’s not exactly stable, but it could prevent either side from gaining a decisive lead.
In fact, sufficiently bad infosec might even make certain forms of cooperation and mutual verification easier. For instance, if both countries are considering setting up trusted data centers to make verifiable claims about AGI development, the fact that espionage already permeates much of the AI supply chain could paradoxically lower the bar for trust. In a world where perfect secrecy is already compromised, agreeing to “good enough” transparency might become more feasible.
Thanks for the comment. Strong upvoted!
I agree that the quotations described as "backwards" are not necessarily wrong given the two possible (and reasonable) interpretations of the RLHF procedure. Thanks for flagging this subtlety; I hadn't thought of it before. I'll update the body of the post to reflect it.
Meta point: I'm so grateful for the LessWrong community. This is my first post and first comment, and I find it so wild that I'm part of a community where people like you write such insightful comments. It's very inspiring :)
Relatedly, if you turn off watch history on YouTube, the entire recommendation algorithm gets disabled. This means you can't access YouTube Shorts or recommended videos. Turning off watch history single-handedly fixed my YouTube addiction (specifically, I no longer doomscroll on YouTube)!