Friendly question: do you think the title seemed like clickbait? Perhaps I erred there. I was trying to do justice to the fairly unnerving nature of the results, but I may have overshot what was fair. It frankly causes me great anxiety to find the right wording for these things.
I just want to chime in here as someone who posted an article, today, that covers interpretability research, primarily by academic researchers, but with Anthropic researchers also playing a key role in the story. (I had no idea these posts would come out on the same day.)
I just want to say that I very much appreciate and endorse this kind of post, and I think Anthropic employees should too; I'm guessing that many of them do. It may be a trite cliché, but it's simply true: with great power comes great responsibility, and there are a lot of reasons to question what the company Anthropic (and other large AI companies) are doing.
As a science journalist, I also have to say that I especially endorse questioning people who would describe themselves as journalists—including myself—on their roles in such matters. The whole point of labelling yourself as a journalist is to try to clarify the principled nature of your work, and it is very unclear to me how anyone can sustain those principles in certain contexts, like working at Anthropic.
That said, generally speaking, I also want to note something of my personal view, which is that I see ethics as extremely complicated; it's simply true that we humans live in a space of actions that is often deeply ethically flawed and contradictory. I believe we need to make space for these contradictions (within reason ... which we should all be trying to figure out, together); there's really no other way of going through things. But I think fair efforts to hold people and organizations accountable should almost always be welcomed and encouraged, not discouraged.
Thank you for the feedback; I feel that's a valid criticism, and I will keep it in mind for future articles on the topic. This was my first foray into thinking seriously about defense in depth for powerful AI design, and into looking at the recent research in the area. That research is pretty marginal, and there was not much to go on.
This is a very interesting personal account; thanks for sharing it. I imagine, and would be curious to know, whether this kind of issue crops up with any number of economics research topics, like research around environmental impacts, unethical technologies more generally, excessive (and/or outright corrupt) military spending, and so on.
There are perhaps (good-faith) questions to be asked about the funding sources and political persuasions of the editors of these journals, or of the journal businesses themselves, and why they might be incentivized to steer clear of such topics. Of course, we are actively seeing a chill in the US right now on research into many other areas of social science. One can imagine that you might be seeing something related.
So, I do imagine that something like the psychological phenomenon of denial of mortality might be at play here, and that's an interesting insight. But I would also guess there are many other phenomena at work as well, frankly of a more unsavory nature.
OK, I see. So, in the context of my question (I'm not exactly sure whether that's what you're speaking to, or whether you're speaking more generally), you see misalignment with broad human values as indeed being misalignment, just not a misalignment that is unexpected.
One discussion question I'd be interested in hearing people's thoughts on, which has to do with how I used the word 'misalignment' in the headline:
Do people think that companies like Twitter/X/xAI, which don't (seemingly) align their tools with broader human values, are indeed creating tools that exhibit 'misalignment'? Or are these tools seen not as 'misaligned,' but as aligned only with the companies' own motives (e.g., profit), which is to be expected? In other words, or relatedly, how should we be thinking about the alignment framework, especially in its historical context: as a program that was perhaps overly idealistic or optimistic about what companies would do to make AI generally safe and beneficial, or as a program that is and was always meant only to be about making AI aligned with its corporate controllers?
I imagine the framing of this question itself might be objected to in various ways; I just dashed it off.
Thanks - I hadn't seen his remarks about this, specifically. I'll try to look them up.
I can see what you mean. However, I would say that merely claiming "that's not what we are trying to do" is not a strong rebuttal. For example, we would not accept such a rebuttal from a weapons company seeking to make weapons technology widely available without regulation. We would say: it doesn't matter how you are trying to use the weapons; it matters what others do with your technology.
In the long term, it does seem correct to me that the greater concern is issues around superintelligence. In the near term, however, the problem seems to be that we are making things that are not at all superintelligent: smart at coding and language, but coupled with, e.g., a crude directive to 'make me as much money as possible,' and with no advanced machinery for ethics or value judgement.
Thank you!
I see, thanks for the feedback; that's valid. I'm trying to figure out how to build this website and make it actually useful for people, and right now that involves some tinkering with things like where to set breakpoints or cutoffs on the summaries, as a way of encouraging subscriptions and helping get the word out more easily.
I've perhaps erred there with where I set the breakpoints. Let me know if you have any feedback or thoughts on how you'd prefer it to be set up; it would be much appreciated.