otto.barten

Comments

"AI Alignment" is a Dangerously Overloaded Term
otto.barten · 2y* · 90

I think it's a great idea to think about what you call goalcraft.

I see this problem as similar to the age-old problem of controlling power. I don't think ethical systems such as utilitarianism are a great place to start. Any academic ethical model is just an attempt to summarize what people actually care about in a complex world. Taking such a model and coupling it to an all-powerful ASI seems like a highway to dystopia.

(Later edit: also, an academic ethical model is irreversible once implemented. Any static goal cannot be reversed anymore, since reversing it would never bring the current goal closer. If an ASI is aligned to someone's (anyone's) preferences, however, the whole ASI could be turned off if they want it to be, making the ASI reversible in principle. I think ASI reversibility (being able to switch it off in case we turn out not to like it) should be mandatory, and therefore we should align to human preferences rather than to an abstract philosophical framework such as utilitarianism.)

I think letting the random programmer who happened to build the ASI, or their no less random CEO or shareholders, determine what happens to the world is an equally terrible idea. They wouldn't need the rest of humanity for anything anymore, making the fates of >99% of us extremely uncertain, even in an abundant world.

What I would be slightly more positive about is aggregating human preferences (I think "preferences" is a more accurate term than the more abstract, less well-defined "values"). I've heard two interesting examples; there are no doubt many more options.

The first is simple: query ChatGPT. Even this relatively simple model is not terrible at aggregating human preferences. Although a host of issues remain, I think using a future, no doubt much better AI for preference aggregation is not the worst option (and a lot better than the two mentioned above).

The second option is democracy. This is our time-tested method of aggregating human preferences to control power. For example, one could imagine an AI control council consisting of elected human representatives at the UN level, or perhaps a council of representative world leaders. I know there is a lot of skepticism among rationalists about how well democracy is functioning, but it is one of the very few time-tested aggregation methods we have. We should not discard it lightly for something less tested. An alternative is some kind of unelected autocrat (e/autocrat?), but apart from this not being my personal favorite, note that (in contrast to historical autocrats) such a person would also in no way need the rest of humanity anymore, making our fates uncertain.

Although AI-based and democratic preference aggregation are the two options I'm least negative about, I generally think that we are not ready to control an ASI. One of the worst issues I see is negative externalities that only become clear later on; climate change, for example, can be seen as a negative externality of the steam/petrol engine. Also, I'm not sure a democratically controlled ASI would necessarily block follow-up unaligned ASIs (assuming that is at all possible). In order to be existentially safe, I would say we need a system that does at least that.

I think it is very likely that ASI, even if controlled in the least bad way, will cause huge externalities leading to a dystopia, environmental disasters, etc. Therefore I agree with Nathan above: "I expect we will need to traverse multiple decades of powerful AIs of varying degrees of generality which are under human control first. Not because it will be impossible to create goal-pursuing ASI, but because we won't be sure we know how to do so safely, and it would be a dangerously hard to reverse decision to create such. Thus, there will need to be strict worldwide enforcement (with the help of narrow AI systems) preventing the rise of any ASI."

About terminology, it seems to me that what I call preference aggregation, outer alignment, and goalcraft mean similar things, as do inner alignment, aimability, and control. I'd vote for using preference aggregation and control.

Finally, I strongly disagree with calling diversity, inclusion, and equity "even more frightening" than someone who's advocating human extinction. I'm sad on a personal level that people at LW, an otherwise important source of discourse, seem to mostly support statements like this. I do not.

We’ve automated x-risk-pilling people
otto.barten · 1mo · 10

Sounds promising!

Somewhat related: there was an EA Forum post recently about the cost-effectiveness of comms from OP. They calculated viewer-minutes per dollar, but I think conversions per dollar would be a better metric. It would be interesting to compare the conversions per dollar you get with our data. Maybe it would be good to post your approach there as a comment too?

We’ve automated x-risk-pilling people
otto.barten · 1mo · 80

Thanks for writing the post; automating x-risk-pilling people is really awesome, and more people should be trying to do it! Of course, traditional ways of automating x-risk-pilling are called 'books', 'media', and 'social media', and they have been going strong for a while already. Still, if your chatbot works better, that would be awesome and imo it should be supported and scaled!

We've done some research on x-risk comms using surveys. We defined conversion rate by asking readers the same open question before and after they consumed our intervention, such as op-eds or videos. The question we asked was: "List three events, in order of probability (from most to least probable), that you believe could potentially cause human extinction within the next 100 years." If people did not include AI or something similar before our intervention but did include it afterwards, or if they raised AI's position in their top three, we counted them as converted. The conversion rates we got were typically between 30% and 65%, I think (probably decreasing over time). Paper here.
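
To make that definition concrete, below is a minimal sketch of how one could score such before/after answers and compute a conversion rate. It is not the actual analysis code behind the paper; the keyword matching and function names are illustrative assumptions.

```python
# Minimal sketch (not the actual analysis code we used): estimating a conversion
# rate from before/after survey answers. Each respondent lists up to three
# extinction causes, most probable first. The keyword matching is a simplifying
# assumption; real free-text answers would need more careful coding.

AI_KEYWORDS = ("ai", "artificial intelligence", "superintelligence")

def ai_rank(ranking: list[str]) -> int | None:
    """Position (0 = most probable) of the first AI-related cause, or None."""
    for i, cause in enumerate(ranking):
        if any(k in cause.lower() for k in AI_KEYWORDS):
            return i
    return None

def converted(before: list[str], after: list[str]) -> bool:
    """Converted = AI enters the top three, or moves up within it, after the intervention."""
    b, a = ai_rank(before), ai_rank(after)
    if a is None:
        return False
    return b is None or a < b

def conversion_rate(pairs: list[tuple[list[str], list[str]]]) -> float:
    """Fraction of respondents counted as converted."""
    return sum(converted(b, a) for b, a in pairs) / len(pairs)

# Two hypothetical respondents:
sample = [
    (["nuclear war", "pandemic", "climate change"],
     ["AI", "nuclear war", "pandemic"]),          # converted: AI newly in top three
    (["pandemic", "asteroid", "nuclear war"],
     ["pandemic", "asteroid", "nuclear war"]),    # not converted
]
print(conversion_rate(sample))  # 0.5
```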

Maybe it would be good to run the same survey for your chatbot? You can do so pretty easily with Prolific; we used n=300 and that's not horribly expensive. I'd be curious how high your conversion rates are.

Also, of course it's important how many people you can direct towards your website. Do you have a way to scale these numbers?

Keep up the good work!

These are my reasons to worry less about loss of control over LLM-based agents
otto.barten · 1mo · 10

I agree AI intelligence is spiky and will likely remain so, and some spikes are already above human level (of course, a calculator also spikes above human level). But I'm not yet convinced that the whole LLM-based intelligence spectrum will max out above takeover level. I'd be open to arguments, though.

These are my reasons to worry less about loss of control over LLM-based agents
otto.barten · 1mo · 10

Someone or something will always be in power. If that entity decides not to allocate any resources to most humans, then yes, we die. But that could have happened in an AI takeover scenario as well, depending on whose values ended up in the AI.

MAGA speakers at NatCon were mostly against AI
otto.barten · 2mo · 10

"Protectionism against AI" is a bit of an indirect way to point at not using AI for some tasks for job market reasons, but thanks for clarifying. Reducing immigration or trade won't solve AI-induced job loss, right? I do agree that countries could decide to either not use AI, or redistribute AI-generated income, with the caveat that those choosing not to use AI may be outcompeted by those who do. I guess we could, theoretically, sign treaties to not use AI for some jobs anywhere.

I think AI-generated income redistribution is more likely, though, since it seems like the obviously better solution.

MAGA speakers at NatCon were mostly against AI
otto.barten · 2mo · 10

Thanks for correcting it. I still don't really get your connection between protectionism and mass unemployment. Perhaps you could make it explicit?

MAGA speakers at NatCon were mostly against AI
otto.barten · 2mo · 12

Sci-fi was probably fun to think about for some in the 90s, but things got more serious when it became clear the singularity could kill everyone we love. Yud bit the bullet and now says we should stop AI before it kills us. Did you bite that bullet too? If so, you're not purely pro-tech anymore, whether you like it or not. (Which I think shouldn't matter, because pro- vs anti-tech has always been a silly way to look at the world.)

MAGA speakers at NatCon were mostly against AI
otto.barten · 2mo · 10

I don't really understand your thoughts about developing vs developed countries and protectionism, could you make them more explicit?

MAGA speakers at NatCon were mostly against AI
otto.barten · 2mo · 10

How would you define pro-tech, which I assume you identify as? For example, should AI replace humanity a) in any case if it can, b) only if it's conscious, c) not at all?

Posts

otto.barten's Shortform · 1 karma · 5y · 29 comments
Space colonization and scientific discovery could be mandatory for successful defensive AI · 16 karma · 14d · 0 comments
These are my reasons to worry less about loss of control over LLM-based agents · 7 karma · 1mo · 4 comments
We should think about the pivotal act again. Here's a better version of it. · 11 karma · 2mo · 2 comments
AI Offense Defense Balance in a Multipolar World · 15 karma · 4mo · 5 comments
Yes RAND, AI Could Really Cause Human Extinction [crosspost] · 17 karma · 4mo · 4 comments
US-China trade talks should pave way for AI safety treaty [SCMP crosspost] · 10 karma · 6mo · 0 comments
New AI safety treaty paper out! · 15 karma · 7mo · 2 comments
Proposing the Conditional AI Safety Treaty (linkpost TIME) · 11 karma · 1y · 9 comments
Announcing the AI Safety Summit Talks with Yoshua Bengio · 9 karma · 1y · 1 comment
What Failure Looks Like is not an existential risk (and alignment is not the solution) · 13 karma · 2y · 12 comments