Michael Ripa

Comments

Where are the AI safety replications?
Answer by Michael Ripa · Jul 27, 2025

For interpretability research, something being worked on right now is a set of tutorials that replicate results from recent papers in NNsight: https://nnsight.net/applied_tutorials/

What I find cool about this particular effort is that, because the implementations are done in NNsight, it's easier both to adapt the experiments to new models and to run them remotely.
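
To give a sense of what that looks like in practice, here is a minimal sketch of the kind of experiment NNsight enables. This is not taken from the tutorials; the model name and layer index are arbitrary choices for illustration:

```python
# Minimal sketch of an NNsight experiment (illustrative, not from the
# tutorials). Docs: https://nnsight.net
from nnsight import LanguageModel

# Wraps a HuggingFace model; the module tree mirrors the underlying
# architecture, so interventions are written against module paths.
model = LanguageModel("openai-community/gpt2", device_map="auto")

# Trace a forward pass and save an intermediate activation.
# Passing remote=True to trace() runs the same code on NDIF's servers
# instead of locally.
with model.trace("The Eiffel Tower is in"):
    hidden = model.transformer.h[5].output[0].save()

# .value holds the saved tensor after the trace exits (accessor name may
# vary across nnsight versions).
print(hidden.value.shape)  # e.g. (1, seq_len, 768) for GPT-2
```

Because the experiment targets module paths rather than model-specific hooks, adapting it to a new model is often just a matter of changing the model string and the layer path.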

 

(Disclaimer - I work on the NDIF/NNsight project, though not on this initiative, so take my enthusiasm with a grain of salt)

So You Think You've Awoken ChatGPT
Michael Ripa · 3mo

Enjoyed reading this article; thanks for taking the time to write it up so carefully!

Something I wanted to flag: I'm not totally convinced that people are well calibrated at distinguishing AI writing from human writing, at least without helpful priors such as the person's normal writing style. I haven't looked into this formally, but I'm curious whether you (or anyone else) have found strong evidence that convinced you otherwise.

A few reasons to back up my skepticism:

  • There was a calibration test for deepfake videos at the MIT Museum, which showed statistics on the percentage of other visitors who answered correctly after you made your guess. Although people were reasonably calibrated on some videos, there was a non-trivial number on which they weren't.
    • Writing seems fundamentally harder to classify, IMO, which is part of why plagiarism detection isn't a solved problem.
  • I feel like it's easy to develop confirmation bias about how well you identify AI-generated writing, given that you're often aware of the true positives (e.g. from interacting with AI) and true negatives (e.g. reading pre-2022 writing), but get much less exposure to false positives and false negatives.
  • You can obfuscate AI-generated text pretty easily (e.g. by removing em dashes, formulaic summaries, etc.). This is much easier if you actually understand the content you're having it generate, such as when you give it a draft to polish or have it flesh out something you'd already been thinking about.

I might be taking your claim a little out of context, since you were discussing it more in relation to using AI for idea generation, but I still think this is worth raising. I agree that you might fool yourself into thinking you're producing good content by using AI, but I disagree that people will definitely "sniff out" that you used AI to help.

An Opinionated Guide to Using Anki Correctly
Michael Ripa · 3mo

Great post! I used Anki religiously during the first few years of my undergrad but eventually fell out of the habit, mostly because making new cards became too time-consuming. (I wish I had come across advice like this back then!)

A few anecdotes from my own experience:

  • For math, I naively created cards that covered my first-year real analysis and linear algebra lecture notes in extreme detail. I used a custom card template that supported LaTeX, and many of the cards required me to prove theorems or solve problems. Despite how tedious they were to make, I actually enjoyed them. They forced me to whiteboard solutions and gave me reasonably quick feedback.
    • While I wouldn’t recommend this approach (it was incredibly time-intensive), reviewing those cards has consistently been a uniquely rewarding experience.
    • They trigger a kind of mental time travel, vividly bringing back both the content and mindset I was in when I created them. More than anything else, they help me reconnect with a sense of intellectual curiosity and creativity that I often struggle to access otherwise.
  • One experiment I tried was adding images to the back of my cards to aid recall. For language learning, I wrote a Python script to scrape Google Images for vocabulary terms (a sketch of what such a script might look like appears after this list). For math and CS, I’d usually hand-pick images.
    • I have mixed feelings about how well this worked. I can still recall some of the images, but not always the questions they were tied to. Still, they sometimes help me recall the general “neighborhood” of related cards. Curious if anyone else has tried this and what their experience was like.
  • Ironically, the deck I learned the least from was my computer science one, which makes sense in hindsight. The cards were often too large and passive. Unlike the math ones, I didn’t design them to actively engage with the material.
    • Looking back, I wonder how things might’ve changed if I had created cards that asked me to implement things, maybe even with runnable code snippets on the back.
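
For anyone curious, here is a rough sketch of how an image-fetching script like the one mentioned above might look. This is not the original script; the icrawler library, the example vocabulary, and the directory layout are all illustrative assumptions:

```python
# Hypothetical sketch of a vocabulary-image fetcher (not the original script).
# Uses icrawler's built-in Google Images crawler: pip install icrawler
from pathlib import Path
from icrawler.builtin import GoogleImageCrawler

vocabulary = ["der Apfel", "das Fahrrad", "die Bibliothek"]  # example terms

for term in vocabulary:
    # One folder of candidate images per term, to pick from manually.
    out_dir = Path("anki_images") / term.replace(" ", "_")
    out_dir.mkdir(parents=True, exist_ok=True)
    crawler = GoogleImageCrawler(storage={"root_dir": str(out_dir)})
    crawler.crawl(keyword=term, max_num=3)
```

Once an image is chosen, it can be copied into Anki's collection.media folder and referenced from the back of a card with a plain <img> tag.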

 

I think the biggest hurdle for me in getting back into Anki has been not knowing what information is actually worth the effort to memorize. Reading this made me realize that creating really small, focused cards might make that question feel a lot less “all or nothing.” I might give it another shot :)

Posts

How model editing could help with the alignment problem · 2y