
Dave Banerjee

I am a summer fellow at the Centre for the Governance of AI, where I research compute governance. Previously, I was a participant in ARENA 5.0, a hardware security research assistant through the SPAR program, and a security engineer at a hedge fund. I graduated from Columbia University in December 2024, where I studied computer science.

My academic interests lie in AI safety & governance, hardware & software security, (meta-)ethics, physics, and cognitive science. My primary goal is to improve the welfare of sentient beings, and I think one of the best ways to secure a flourishing future is by ensuring that the transition to a world with transformative AI goes well. I'm also interested in issues like wild animal welfare and factory farming.

I have signed no contracts or agreements whose existence I cannot mention.

Comments

What's important in "AI for epistemics"?
Dave Banerjee · 2mo

Great post! It's been almost a year since this was posted, so I was curious whether anyone has worked on these questions:

  • Do you get any weird results from the pre-training data not being IID? Does this compromise capabilities in practice? Or does it lead to increased capabilities because the model cannot lean as much on memorization when it’s constantly getting trained on a previously-unseen future?
  • What if you want to run multiple epochs?[21] Then you have a conflict between wanting to fully update on the old data before you see new data vs. wanting to maximally spread out the points in time at which you repeat training data. How severe is this conflict? Are there any clever methods that could reduce it?

I did a quick lit review and didn't find much. Here's what I did find (not perfectly related to the above questions, though).

  • This GitHub issue explored whether training data order affects memorization. They prompted an LLM with the first 20 tokens of each document in its training set and plotted the number of subsequent tokens the model reproduced correctly against the position of the document in the training set (a rough code sketch of this procedure follows the list). They did not find a statistically significant relationship.
  • This paper tried to train chronologically consistent LLMs while mitigating future training data leakage. Their models performed about the same as standard LLMs. However, it's not clear to me how well they filtered their data. The only experiment they ran to "prove" that their training data wasn't contaminated with future events was to predict future presidents. They found that their models checkpointed from 1999 to 2024 were consistently unable to predict the correct future president. This is not strong enough evidence IMO.
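
For concreteness, here's a rough sketch of the probing procedure from that GitHub issue. This is not the code from the issue itself; it assumes a Hugging Face causal LM and access to the training documents in their original training order, and the model name, prompt length, and continuation length are placeholders.

```python
# Minimal sketch of the memorization probe described above (my own reconstruction,
# not the original code). Model name, token counts, and the `documents` iterable
# are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "EleutherAI/pythia-1.4b"  # placeholder; any causal LM works
PROMPT_LEN = 20        # tokens used as the prompt
MAX_CONTINUATION = 50  # how far to check for verbatim reproduction

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).eval()

def count_reproduced_tokens(doc_text: str) -> int:
    """Prompt with the first PROMPT_LEN tokens of a training document and count
    how many of the following tokens the model reproduces exactly under greedy
    decoding."""
    ids = tokenizer(doc_text, return_tensors="pt").input_ids[0]
    if len(ids) < PROMPT_LEN + 1:
        return 0
    prompt = ids[:PROMPT_LEN].unsqueeze(0)
    target = ids[PROMPT_LEN:PROMPT_LEN + MAX_CONTINUATION]
    with torch.no_grad():
        out = model.generate(prompt, max_new_tokens=len(target), do_sample=False)
    generated = out[0, PROMPT_LEN:]  # generate() returns prompt + continuation
    # Length of the exact-match prefix between generation and the true continuation.
    matches = 0
    for g, t in zip(generated, target):
        if g.item() != t.item():
            break
        matches += 1
    return matches

# documents: training documents in training order (placeholder)
# scores = [count_reproduced_tokens(d) for d in documents]
# Plot scores against document index to look for an order effect.
```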

So, has anyone pursued the two quoted questions above? Super curious if anyone has good results!

Contain and verify: The endgame of US-China AI competition
Dave Banerjee · 2mo

Semi-cooperation is one way for both sides to learn from each other—but so is poor infosec or even outright espionage. If both countries are leaking or spying enough, that might create a kind of uneasy balance (and transparency), even without formal agreements. It’s not exactly stable, but it could prevent either side from gaining a decisive lead.

In fact, sufficiently bad infosec might even make certain forms of cooperation and mutual verification easier. For instance, if both countries are considering setting up trusted data centers to make verifiable claims about AGI development, the fact that espionage already permeates much of the AI supply chain could paradoxically lower the bar for trust. In a world where perfect secrecy is already compromised, agreeing to “good enough” transparency might become more feasible.

Do Self-Perceived Superintelligent LLMs Exhibit Misalignment?
Dave Banerjee · 2mo

Thanks for the comment. Strong upvoted!

I agree that the quotations described as "backwards" are not necessarily wrong given the two possible (and reasonable) interpretations of the RLHF procedure. Thanks for flagging this subtlety; I had not thought of it before. I will update the body of the post to reflect it.

Meta point: I'm so grateful for the LessWrong community. This is my first post and first comment, and I find it so wild that I'm part of a community where people like you write such insightful comments. It's very inspiring :)

Posts

Do Self-Perceived Superintelligent LLMs Exhibit Misalignment? (2mo)