TL;DR: Can we collect real-world examples of misalignment while preserving privacy?
Naively, startups that monitor agents for unintended behavior could publish examples of misalignment they found, but I doubt many orgs would agree to this, since the examples might involve the org's code or other private data.
I found this interesting because the Replit user seems to want to disclose the problem.
I can imagine having a norm (or a regulation?) where, after Replit fixes whatever problem they had, they release the LLM transcripts relevant to the incident.
(or maybe I'm wrong and this person wouldn't want to share enough details for a full transcript)
Looking at Cursor's Privacy Policy:
We do not use Inputs or Suggestions to train our models unless: (1) they are flagged for security review (in which case we may analyze them to improve our ability to detect and enforce our Terms of Service)
But who decides what gets flagged? And in other cases, can Cursor do things with Inputs other than train their models?
Also, this is from their TOS:
5.4. Usage Data. Anysphere may: (i) collect, analyze, and otherwise process Usage Data internally for its business purposes, including for security and analytics, to enhance the Service, and for other development and corrective purposes
I'm not sure whether this is a reliable hint about what orgs would agree to.
This seems like the highest-quality source of data on "what do LLMs want" that I've heard of (am I missing others?). May I download and analyze it myself?
Thanks for doing this!
Fair.
Note they used GPT-3, which wasn't trained with RLHF (right?)
Hey, I think I'm the target audience of this post, but it really doesn't resonate well with me.
Here are some of my thoughts/emotions about it:
5. Was worrying about covid similarly misguided?
At the beginning of covid, people said "stop making a big deal about it" and "lots of people freaked out about things in the past and were wrong". I also live close to a city with a lot of (religious) anti-vax people, and I visited a (hippie) place where I told someone when I was last tested and what I'd done since; she told me she could see I looked good, that I'd better not think about it, and that I'd be healthier if I didn't think about covid too much.
Should I have listened to the people who told me to chill about covid? I assume you agree I shouldn't have, beyond the point (which I agree with) that there's no reason to emotionally freak out; just take reasonable actions, that's all.
What's the difference between the AI and covid scenarios? I assume you'd say "covid was real" (?). Anyway, that's my crux: is AI danger "real" (vaguely defined)?
What if the covid deniers had said "if you're so sure about covid, tell us exactly in what week (or whatever) everyone will die"? That seems like a hard/unfair question. I personally think people spend way too much energy geeking out about AI timelines, and after (metaphorically) a million requests for timelines, a serious team wrote ai-2027, which is a probability distribution: they think there's a 50% chance problems will happen earlier and a 50% chance they'll happen later (right?). Telling me to chill at 2028 seems similar to asking for an estimate of when covid problems will actually hit, getting a probability distribution, and then being told to "chill" once its median has passed (while things still seem to be getting worse for similar reasons?). Yeah, there's a point where, if AI doesn't seem to be taking over, I'll eat my hat and say I'm completely missing something about how the world works, but that point is not 2028.
6. Fallacy Fallacy
I often think back on The Adventures Of Fallacy Man.
Both "sides" have fallacies. Maybe doomers want a way to deal with our mortality. Maybe non-doomers have a bias against imagining changes or whatever. I don't think this kind of conversation is a good way to figure out what is true. I think AI is a complicated topic that makes it clear how problematic many rules-of-thumb or metaphors are and encourages us to look at the details, or at least to pick our mental shortcuts wisely.
Would you leave more anonymous feedback for people who ask for it if there were a product that did things like:
I'm mostly interested in hearing from people who consider leaving feedback and sometimes don't. I think it would be cool if we could make progress on solving whatever pain point you have (not sure we'll succeed, but worth trying).
If you prefer to reply anonymously (very meta!), you can say so here. I'll assume it's ok to share responses as comments on this shortform unless you ask me not to.
Idea credit: cwbakerlee
So FYI, the emoji I posted above was actually AI-generated.
The real one looks like this (on my browser):
I still think the original one looks a lot like a paperclip, but I cheated and exaggerated this in the April Fool's version.
Seems like Unicode officially added a "person being paperclipped" emoji:
Here's how it looks in your browser: 🙂↕️
Whether they did this as a joke or to raise awareness of AI risk, I like it!
I asked Claude how relevant this is to protecting something like an H100; here are the parts that seem most relevant, from my limited understanding:
1. Reading (not modifying) data from antifuse memory in a Raspberry Pi RP2350 microcontroller
2. Using Focused Ion Beam (FIB) and passive voltage contrast to extract information
I recently read through the entire PDF: big thumbs up from me!
A few questions / requests:
Big thumbs up again!