Dave Orr — LessWrong

LESSWRONG
LW

I'm also reminded of The Dispossessed, in which Le Guin describes two worlds: a capitalist society in which people are rich but have curtailed freedoms and some authoritarian aspects, and a kind of socialist anarchy where people are very poor, claim to be free, and are constrained by culture and custom rather than law.

I find that people who come into the book with a strong prior on capitalism being good or bad will also end up with a clear view on which "utopia" is better. The book itself is probably a critique of the idea that utopia is even possible, and of whether it's a coherent concept at all.

AI #131 Part 2: Various Misaligned Things

Dave Orr1mo20

Which makes it rather strange to choose to sell worse, and thus less expensive and less profitable, chips to China rather than instead making better chips to sell to the West.

I think what might be going on here is that different fabs have separate capacities. You can't make more H200s because they have to be made on the new and fancy fabs, but you can make H20s in an older facility. So if you want to sell more chips, and you're supply limited on the H200s, then the only thing you can do is make crappier chips and figure out where to sell them.

come work on dangerous capability mitigations at Anthropic

Dave Orr2mo50

We certainly plan to!

come work on dangerous capability mitigations at Anthropic

Dave Orr2mo8-2

I 100% endorse working on alignment and agree that it's super important.

We do think that misuse mitigations at Anthropic can help improve things generally through race-to-the-top dynamics, and I can attest that while at GDM I was meaningfully influenced by things that Anthropic did.

come work on dangerous capability mitigations at Anthropic

Dave Orr2mo74

I'm new to Anthropic myself, leading the Safeguards team. I joined a few weeks ago, inspired by the mission and the opportunity. I'm really worried about the world as AI continues to get more powerful, and I couldn't pass up the chance to help if I could.

I was previously at GDM working on similar problems (miss you all!), but the chance to help drive the safety agenda at Anthropic as we transition to a new scarier world felt too important to miss.

So far everything at Anthropic except my new commute is amazing, but most of all the feeling of the mission is intense and awesome. Also the level of transparency inside the company is astounding for a company this size (not that it's big compared to many others).

Obviously there could be some honeymoon effect here, but honestly I'm having a lot of fun, and I honestly think Safeguards (along with Alignment Science) makes a real difference in safety for the world.

Should you start a for-profit AI safety org?

Dave Orr2mo20

It depends on what you're trying to do, right? Like, if you build a great eval to detect agent autonomy, but nobody adopts it, you haven't accomplished anything. You need to know how to work with AI labs. In that case, selling widgets (your eval) is highly aligned with AI safety.

IME there are an extremely large number of NGOs with passionate people who do not remotely move the needle on whatever problem they are trying to solve. I think it's the modal outcome for a new nonprofit. I'm not 100% sure that the feedback loop aspect is the reason, but I think it plays a very substantial role.

Should you start a for-profit AI safety org?

Dave Orr2mo42

I agree the incentives matter, and point in the direction you indicate.

However, there's another effect pointing in the opposite direction that also matters and can be as large or bigger: feedback loops.

It's very easy in the nonprofit space to end up doing stuff that doesn't impact the real world. You do things that you hope matter and that sound good to funders, but measurement is hard and funding cycles are annual so feedback is rare.

In contrast, if you have a business, you get rapid feedback from customers, and know immediately if you're getting traction. You can iterate rapidly and quickly get better at something because of rapid feedback.

So in addition to thinking about abstract incentives, think about what kind of product feedback you will get and how important that is. For policy work, maybe it's not that important. For things that look more like auditing, testing, etc., where your effectiveness is in significant part transactional, think hard about being for profit.

Source: long engagement with the NGO sector on boards and funders, and work at private companies.

How AI researchers define AI sentience? Participate in the poll

Dave Orr3mo20

I don't think behavioral is enough -- I think LLMs have basically passed the Turing test anyway.

But I also don't see why it would need to have our specific brain structure either. Surely experiences are possible with things besides the mammal brain. However, if something did have similar brain structure to us, that would probably be sufficient. (It certainly is for other people, and I think most of us think that e.g. higher mammals have experiences.)

What I think we need is some kind of story about why what we have gives rise to experience, and then we can see if AIs have some similar pathway. Unfortunately this is very hard because we have no idea why what we have gives rise to experience (afaik).

Until we have that I think we just have to be very uncertain about what is going on.

How AI researchers define AI sentience? Participate in the poll

Dave Orr3mo30

We think humans are sentient because of two factors: first, we have internal experience that means we ourselves are sentient; and second, we rely on testimony from others who say they are sentient. We can rely on the latter because people seem similar. I feel sentient and say I am. You are similar to me and say you are. Probably you are sentient.

With AIs, this breaks down because they aren't very similar to us in terms of cognition, brain architecture, or "life" "experience". So unfortunately AIs saying they are sentient does not produce the same kind of evidence as it does for people.

This suggests that any test should try to establish relevant similarity between AIs and humans, or else use an objective definition of what it means to experience something. Given that the latter does not exist, perhaps the former will be more useful.

The best simple argument for Pausing AI?

Dave Orr3mo20

For that specific example, I would not call it safety critical in the sense that you shouldn't use an unreliable source. Intel involves lots of noisy and untrustworthy data, and indeed the job is making sense out of lots of conflicting and noisy signals. It doesn't strike me that adding an LLM to the mix changes things all that much. It's useful, it adds signal (presumably), but also is wrong sometimes -- this is just what all the inputs are for an analyst.

Where I would say it crosses a line is if there isn't a human analyst. If an LLM analyst was directly providing recommendations for actions that weren't vetted by a human, yikes, that seems super bad and we're not ready for that. But I would be quite surprised if that were happening right now.
