Friendly question: do you think the title seemed like clickbait? Perhaps I erred there. I was trying to do justice to the fairly unnerving nature of the results, but I may have overshot what was fair. It frankly causes me great anxiety to find the right wording for these things.
I just want to chime in here as someone who posted an article, today, that covers interpretability research, primarily by academic researchers, but with Anthropic researchers also playing a key role in the story. (I had no idea these posts would come out on the same day.)
I just want to say that I very much appreciate and endorse this kind of post, and I think Anthropic employees should too; I'm guessing that many of them do. It may be a trite cliché, but it's simply true: with great power comes great responsibility, and there are a lot of reasons to question what the company Anthropic (and other large AI companies) are doing.
As a science journalist, I also have to say that I especially endorse questioning people who would describe themselves as journalists—including myself—on their roles in such matters. The whole point of labelling yourself as a journalist is to try to clarify the principled nature of your work, and it is very unclear to me how anyone can sustain those principles in certain contexts, like working at Anthropic.
That said, generally speaking, I also want to note something of my personal view, which is that I see ethics as extremely complicated; it's simply true that we humans live in a space of actions that is often deeply ethically flawed and contradictory. I believe we need to make space for these contradictions (within reason ... which we should all be trying to figure out, together); there's really no other way of going through things. But I think fair efforts to hold people and organizations accountable should almost always be welcomed and encouraged, not discouraged.
Thank you for the feedback; I feel that's a valid criticism, and I will keep it in mind for future articles on the topic. This was my first foray into thinking seriously about defense in depth for powerful AI design, and into looking at the recent research in the area. That research is pretty marginal, and there was not much to go on.
This is a very interesting personal account; thanks for sharing it. I imagine, and would be curious to know, whether this kind of issue crops up with any number of economics research topics, like research around environmental impacts, unethical technologies more generally, excessive (and/or outright corrupt) military spending, and so on.
There are perhaps (good-faith) questions to be asked about the funding sources and political persuasions of the editors of these journals, or of the journal businesses themselves, and why they might be incentivized to steer clear of such topics. Of course, we are actively seeing a chill in the US right now on research into many other areas of social science. One can imagine that you might be seeing something related.
So, I do imagine that something like the psychological phenomenon of denial of mortality might be at play here, and that's an interesting insight. But I would also guess there are many other phenomena at work as well, frankly of a more unsavory nature.
OK, I see. So, in the context of my question (I'm not exactly sure whether that's what you're speaking to, or whether you're speaking more generally), you see misalignment with broad human values as indeed being misalignment, just not a misalignment that is unexpected.
One discussion question I'd be interested in hearing people's thoughts on, which has to do with how I used the word 'misalignment' in the headline:
Do people think that companies like Twitter/X/xAI, which don't (seemingly) align their tools with broader human values, are indeed creating tools that exhibit 'misalignment'? Or are these tools seen not as 'misaligned,' but as aligned only with the companies' own motives (e.g., profit), which is to be expected? In other words, or relatedly, how should we be thinking about the alignment framework, especially in its historical context: as a program that was perhaps overly idealistic or optimistic about what companies would do to make AI generally safe and beneficial, or as a program that is and was always meant only to be about making AI aligned with its corporate controllers?
I imagine the framing of this question itself might be objected to in various ways; I just dashed it off.
Thanks - I hadn't seen his remarks about this, specifically. I'll try to look them up.
I can see what you mean. However, I would say that merely claiming "that's not what we are trying to do" is not a strong rebuttal. For example, we would not accept such a rebuttal from a weapons company seeking to make weapons technology widely available without regulation. We would say: it doesn't matter how you are trying to use the weapons; it matters what others do with your technology.
In the long term, it does seem correct to me that the greater concern is issues around superintelligence. In the near term, however, the problem seems to be that we are making things that are not at all superintelligent: smart at coding and language, but coupled with, e.g., a crude directive to 'make me as much money as possible,' and with no advanced machinery for ethics or value judgement.
Thank you!
I see, thanks for the feedback; that's valid. I'm trying to figure out how to build this website and make it actually useful for people, and right now that involves some tinkering with things like where to set breakpoints or cutoffs on the summaries, as a way of encouraging subscriptions and helping get the word out more easily.
I've perhaps erred there with where I set the breakpoints. Let me know if you have any feedback or thoughts on how you'd prefer it to be set up; it would be much appreciated.