otto.barten

Comments
"AI Alignment" is a Dangerously Overloaded Term
otto.barten · 2y

I think it's a great idea to think about what you call goalcraft.

I see this problem as similar to the age-old problem of controlling power. I don't think ethical systems such as utilitarianism are a great place to start. Any academic ethical model is just an attempt to summarize what people actually care about in a complex world. Taking such a model and coupling it to an all-powerful ASI seems like a highway to dystopia.

(Later edit: also, an academic ethical model is irreversible once implemented. An ASI with a static goal will resist having that goal changed or being switched off, since doing so would never bring its current goal closer. If an ASI is aligned to someone's (anyone's) preferences, however, the whole ASI could be turned off if that person wants it to be, making the ASI reversible in principle. I think ASI reversibility (being able to switch it off in case we turn out not to like it) should be mandatory, and therefore we should align to human preferences rather than to an abstract philosophical framework such as utilitarianism.)

I think letting the random programmer who happened to build the ASI, or their no less random CEO or shareholders, determine what happens to the world is an equally terrible idea. They wouldn't need the rest of humanity for anything anymore, making the fates of >99% of us extremely uncertain, even in an abundant world.

What I would be slightly more positive about is aggregating human preferences (I think preferences is a more accurate term than the more abstract, less well-defined term values). I've heard two interesting examples; there are no doubt a lot more options. The first is simple: query ChatGPT. Even this relatively simple model is not terrible at aggregating human preferences. Although a host of issues remain, I think using a future, no doubt much better AI for preference aggregation is not the worst option (and a lot better than the two mentioned above); a sketch of what this could look like is below.

The second option is democracy. This is our time-tested method of aggregating human preferences to control power. For example, one could imagine an AI control council consisting of elected human representatives at the UN level, or perhaps a council of representative world leaders. I know there is a lot of skepticism among rationalists about how well democracy is functioning, but it is one of the very few time-tested aggregation methods we have, and we should not discard it lightly for something less tested. An alternative is some kind of unelected autocrat (e/autocrat?), but apart from this not being my personal favorite, note that (in contrast to historical autocrats) such a person would also in no way need the rest of humanity anymore, making our fates uncertain.
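
As a rough illustration of the "query an LLM" option, here is a minimal sketch only; the ask_model helper and the prompt wording are hypothetical placeholders, not a worked-out aggregation scheme:

```python
# Toy sketch of "LLM as preference aggregator": collect individual
# statements and ask a model for a policy that balances them.
# ask_model() is a hypothetical stand-in for whatever chat API is used.

def ask_model(prompt: str) -> str:
    raise NotImplementedError("plug in an actual LLM API call here")

def aggregate_preferences(statements: list[str]) -> str:
    prompt = (
        "Here are preference statements from different people:\n"
        + "\n".join(f"- {s}" for s in statements)
        + "\nPropose a policy that balances these preferences as fairly "
        "as possible, and state explicitly which preferences it trades off."
    )
    return ask_model(prompt)

# Example usage (hypothetical inputs):
# aggregate_preferences(["keep the rainforest", "maximize economic growth"])
```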

Although AI and democratic preference aggregation are the two options I'm least negative about, I generally think that we are not ready to control an ASI. One of the worst issues I see is negative externalities that only become clear later on. Climate change can be seen as a negative externality of the steam/petrol engine. Also, I'm not sure a democratically controlled ASI would necessarily block follow-up unaligned ASIs (assuming this is at all possible). In order to be existentially safe, I would say that we would need a system that does at least that.

I think it is very likely that ASI, even if controlled in the least bad way, will cause huge externalities leading to a dystopia, environmental disasters, etc. Therefore I agree with Nathan above: "I expect we will need to traverse multiple decades of powerful AIs of varying degrees of generality which are under human control first. Not because it will be impossible to create goal-pursuing ASI, but because we won't be sure we know how to do so safely, and it would be a dangerously hard to reverse decision to create such. Thus, there will need to be strict worldwide enforcement (with the help of narrow AI systems) preventing the rise of any ASI."

About terminology, it seems to me that what I call preference aggregation, outer alignment, and goalcraft mean similar things, as do inner alignment, aimability, and control. I'd vote for using preference aggregation and control.

Finally, I strongly disagree with calling diversity, inclusion, and equity "even more frightening" than someone who's advocating human extinction. I'm sad on a personal level that people at LW, an otherwise important source of discourse, seem to mostly support statements like this. I do not.

The Industrial Explosion
otto.barten · 12d

It looks to me like this is a scenario where superhuman AI is intent-aligned. If that's true, rainforests will exist only if humans prefer rainforests over mansions or superyachts or some other post-AGI luxury they could build from the same atoms. I'm afraid they won't.

The Industrial Explosion
otto.barten · 15d

Agree about the celestial bodies. Can you explain what you mean by "but also the direction pointed by the market argument is not entirely without merit", and why the cited paper is relevant?

I would be reasonably optimistic that, if we had a democratic world government (or perhaps a UN-intent-aligned ASI blocking all other ASIs), we'd decide to leave at least some rainforest and the sun in one piece. I'm worried, though, about international competition between states, which could make it practically impossible not to destroy the Earth for stuff. Maybe Russia will win in the end because it holds the greatest territory. Or, more likely, the winning AI/industrial nation will conquer the rest of the world and transform the rest of the Earth into stuff as well.

Maybe we should have international treaties limiting the amount of nature a nation may convert to stuff?

The Industrial Explosion
otto.barten · 15d

Climate change exists because doing something that's bad for the world (carbon emission) is not priced. The only reason climate change isn't much worse than it already is, is that most people still can't afford to live very climate-unfriendly lives.

In this scenario, I'm mostly worried that without any constraints on what people can afford, not only might carbon emissions go through the roof, but all the other planetary boundaries, known and not yet known, might also be shattered. We could of course easily solve this problem by pricing externalities, which would not be very costly in an abundant world. Based on our track record, I just don't think we'll do that.

Will we still have rainforest after the industrial explosion? Seems quite unlikely to me.

Yes RAND, AI Could Really Cause Human Extinction [crosspost]
otto.barten · 19d

Appreciate your comment. Loss of control does make killing all humans easier, doesn't it? Once someone/something has control (sovereignty) over a population, by definition they can do whatever they want. For example, they could demand that part of the population kill the other part, or ask a (tiny) part of the population to create weapons (possibly for a bogus reason) and use them against the entire population, etc. Even with low tech, it's easy to kill off a population once you have control (sovereignty), as many historical genocides have demonstrated. With high tech, it becomes trivial. Note there's no hurry: once we've lost control, this will likely remain the case, so an AI would have billions of years to carry out whatever plan it wants.

The Boat Theft Theory of Consciousness
otto.barten · 24d

Ah, I wasn't really referring to the OP, more to people who might blindly equate vague notions of whatever consciousness might mean with moral value. I think that's an oversimplification and possibly dangerous. Combined with symmetric population ethics, a result could be that we'd need to push for spamming the universe with maximally happy AIs, and even replacing humanity with maximally happy AIs, since they'd contain more happiness per kg or m³. I think that would be madness.

Animals: yes, some. Future AIs: possibly.

If I had to speculate, I'd guess that self-awareness is just included in any good world model, and sentience is a control feedback loop, in both humans and AIs. These two things together, perhaps in something like a global workspace, might make up what some people call consciousness. These things are obviously useful for steering machines in a designed direction. But I fear they will turn out to be trivial engineering results: one could argue an automatic vacuum cleaner has feeling, since it has a feedback loop steering it clear of a wall. That doesn't mean it should have rights.
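
To illustrate how trivial such a feedback loop can be, here is a toy sketch (not a claim about how real robot vacuums are built; the numbers and names are made up for illustration):

```python
# Toy "vacuum cleaner" control loop: a single sensor reading feeds back
# into the steering decision. Formally a feedback loop, but obviously
# nothing that by itself establishes moral patienthood.

def steer(distance_to_wall_cm: float) -> str:
    """Turn away when too close to a wall, otherwise keep going."""
    if distance_to_wall_cm < 10.0:
        return "turn"      # negative feedback: move away from the wall
    return "forward"

# Simulated sensor readings: the wall gets closer, the robot turns away.
for reading in [50.0, 30.0, 12.0, 8.0, 25.0]:
    print(f"{reading} cm -> {steer(reading)}")
```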

I think the morality question is a difficult one that will remain subjective, and we should vote on it rather than try to solve it analytically. I think the latter is doomed.

The Boat Theft Theory of Consciousness
otto.barten · 1mo

I like this treatment of consciousness and morality so much better than the typical EA (and elsewhere) naive idea that anything that "has consciousness" suddenly "has moral value" (even worse, and dangerous, is to combine that with symmetric population ethics). We should treat these things carefully (and imo democratically) to avoid making giant mistakes once AI allows us to put ethics into practice.

Secret Collusion: Will We Know When to Unplug AI?
otto.barten · 1mo

This is a late comment, but extremely impressive work!

I'm a huge fan of explicit, well-argued threat model work, and it's even more impressive that you've already made great contributions to mitigation measures. Frankly, I find this threat model more likely to become existential, and possibly at lower AI capability levels, than either the Yudkowsky/Bostrom scenarios or the Christiano/gradual-displacement ones. So it seems hugely important!

A question: am I right that most of your analysis presumes that there would be a fair amount of oversight, or at least oversight attempts? If so, I'd be afraid that the actual situation might be heavy deployment of agents without much attempt at oversight at all (given both labs' and governments' safety track record so far). In such a scenario:

  1. How likely do you think collusion attempts aiming for takeover would be?
  2. Could you estimate what kind of capabilities would be needed for a multi-agent takeover?
  3. Would you expect some kind of warning shot before a successful multi-agent takeover or not?
otto.barten's Shortform
otto.barten · 2mo

Maybe economic niche occupation requires colonizing the universe

Posts

Yes RAND, AI Could Really Cause Human Extinction [crosspost] · 22d
US-China trade talks should pave way for AI safety treaty [SCMP crosspost] · 2mo
New AI safety treaty paper out! · 4mo
Proposing the Conditional AI Safety Treaty (linkpost TIME) · 8mo
Announcing the AI Safety Summit Talks with Yoshua Bengio · 1y
What Failure Looks Like is not an existential risk (and alignment is not the solution) · 1y
Announcing #AISummitTalks featuring Professor Stuart Russell and many others · 2y
AI Regulation May Be More Important Than AI Alignment For Existential Safety · 2y
[Crosspost] An AI Pause Is Humanity's Best Bet For Preventing Extinction (TIME) · 2y
[Crosspost] Unveiling the American Public Opinion on AI Moratorium and Government Intervention: The Impact of Media Exposure · 2y