Liron - LessWrong

What do coherence arguments actually prove about agentic behavior?

> When an agent is goal-oriented, they want to become more goal-oriented, and maximize the goal-orientedness of the universe with respect to their own goal

Because expected-value reasoning tells us that the more resources you control, the more robustly you can maximize your probability of success against whatever comes at you, and the higher your maximum possible utility is (if your utility function doesn't have an easy-to-hit max score).

“Maximizing goal-orientedness of the universe” was how I phrased the prediction that conquering resources involves having them aligned to your goal, or having aligned agents help you control them.

What do coherence arguments actually prove about agentic behavior?

Liron · 6d

> goal-orientedness is a convergent attractor in the space of self-modifying intelligences

> This also requires a citation, or at the very least some reasoning; I'm not aware of any theorems that show goal-orientedness is a convergent attractor, but I'd be happy to learn more.

Ok here's my reasoning:

When an agent is goal-oriented, they want to become more goal-oriented, and maximize the goal-orientedness of the universe with respect to their own goal. So if we diagram the evolution of the universe's goal-orientedness, it has the shape of an attractor.

There are plenty of entry paths where some intelligence-improving process spits out a goal-oriented general intelligence (like biological evolution did), but no exit path: once a universe's smartest agent is super goal-oriented, nothing leads to that no longer being the case.
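To make the "entry paths but no exit path" shape concrete, here's a toy sketch, not a model of anything real: a hypothetical two-state Markov chain where state 0 is "the smartest agent isn't strongly goal-oriented" and state 1 is "the smartest agent is super goal-oriented." The names `ENTRY_PROB`, `EXIT_PROB`, `step`, and `prob_goal_oriented` are made up for illustration, and the 0.05 entry probability is an arbitrary assumption. The only structural assumption taken from the argument above is that `EXIT_PROB` is zero, i.e. state 1 is absorbing.

```python
# Toy illustration only (hypothetical): a two-state Markov chain.
# State 0 = smartest agent not strongly goal-oriented.
# State 1 = smartest agent super goal-oriented.

ENTRY_PROB = 0.05  # ASSUMPTION: arbitrary per-step chance of entering state 1
EXIT_PROB = 0.0    # the argument's claim: no path back out of state 1


def step(dist):
    """Advance the probability distribution [p0, p1] by one transition."""
    p0, p1 = dist
    new_p0 = p0 * (1 - ENTRY_PROB) + p1 * EXIT_PROB
    new_p1 = p0 * ENTRY_PROB + p1 * (1 - EXIT_PROB)
    return [new_p0, new_p1]


def prob_goal_oriented(steps, start=(1.0, 0.0)):
    """Probability of being in state 1 after `steps` transitions."""
    dist = list(start)
    for _ in range(steps):
        dist = step(dist)
    return dist[1]


if __name__ == "__main__":
    for n in (10, 50, 200):
        print(n, round(prob_goal_oriented(n), 4))
```

With no exit path, the probability of being in state 1 after n steps is 1 - (1 - ENTRY_PROB)^n, which approaches 1 for any positive entry probability, however small. That's the sense in which an absorbing state is an attractor; the sketch says nothing about what the real entry probability is.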

Robin Hanson AI X-Risk Debate — Highlights and Analysis

Liron · 8d

I'm happy to have that kind of debate.

My position is "goal-directedness is an attractor state that is incredibly dangerous and uncontrollable if it's somewhat beyond human-level in the near future".

The form of those arguments seems to be "technically it doesn't have to be." But realistically it will be, lol. Not sure how much more there will be to say.

Robin Hanson AI X-Risk Debate — Highlights and Analysis

Liron · 12d

Thanks. Sure, I’m always happy to update on new arguments and evidence. The most likely way I could see myself updating is realizing that the gap between current AIs and human intelligence is actually much larger than it currently seems, e.g. 50+ years as Robin seems to think. Then AI alignment research has a larger chance of working.

I also might lower P(doom) if international govs start treating this like the emergency it is and do their best to coordinate to pause. Though unfortunately even that probably only buys a few years of time.

Finally, I can imagine somehow updating that alignment is easier than it seems, or less of a problem to begin with. But the fact that all the arguments I’ve heard on that front seem very weak and misguided to me makes that unlikely.

Robin Hanson AI X-Risk Debate — Highlights and Analysis

Liron · 14d

Thanks for your comments. I don’t get how nuclear safety and biosafety represent models of success. Humanity met those challenges less than adequately, and half the reason society hasn’t collapsed from, e.g., a first thermonuclear explosion going off either intentionally or accidentally is pure luck. All it takes to topple humanity is something like nukes but a little harder to coordinate on (or much harder).

Robin Hanson & Liron Shapira Debate AI X-Risk

Liron · 17d

Here's a better transcript hopefully: https://share.descript.com/view/yfASo1J11e0

I updated the link in the post.

Robin Hanson & Liron Shapira Debate AI X-Risk

Liron · 18d

Thanks, I’ll look into that. Maybe try the transcript generated by YouTube?

What do coherence arguments actually prove about agentic behavior?

Answer by Liron · Jun 01, 2024

I guess I just don’t see it as a weak point in the doom argument that goal-orientedness is a convergent attractor in the space of self-modifying intelligences?

It feels similar to pondering the familiar claim from evolution that systems that copy themselves and seize resources are an attractor state. Sure, it’s not 100% proven, but it seems pretty solid.

Failures in Kindness

Liron · 4mo

Context is a huge factor in all these communications tips. The scenario I'm optimizing for is when you're texting someone who has a lot of options, and you think it's high expected value to get them to invest in a date with you, but the most likely way that won't happen is if they hesitate to reply to you and tap away to something else. That's not always the actual scenario though.

Imagine you're the recipient, and the person who's texting you met your minimum standard to match with, but is still a priori probably not worth your time and effort going on a date with, because their expected attractiveness+compatibility score is too low, though you haven't investigated enough to be confident yet. (This is a common epistemic state of, e.g., a woman with attractive pics on a dating app that has more male users.)

Maybe the first time a match asks you "how's your week going," it feels like a nice opportunity to ramble about how you feel, and a nice sign that someone out there cares. But if that happens enough on an app, and the average dateworthiness of the people it happens with is low, then the next person who sends it doesn't make you want to ramble anymore, because you know from experience that rambling into a momentumless conversation will just carry it to its next momentumless point, where it stagnates.

It's nice when people care about you, but it quickly gets not so nice when a bunch of people with questionable date-appeal are trying to trade a cheap care signal for your scarce attention and dating resources.

If the person sending you the message has already distinguished themselves to you as "dateworthy", e.g. by having one of the best pics and/or profiles in your judgment, then "How's your week going" will be a perfectly adequate message from them; in some cases maybe even an optimal message. You can just build rapport and check for basic red flags, then set up a date.

But if you're not sold on the other person being dateworthy, and they start out from a lower-leverage position in the sense that they initially consider you more dateworthy than you consider them, then they'd better send a message that somehow adds value to you, to help them close the dateworthiness gap.

But again, context is always the biggest factor, and context has a lot of detail. E.g. if you don't consider someone dateworthy, but you're in a scenario where someone just making conversation with you is adding value to you (e.g. not a ton of matches demanding your attention using the same unoriginal rapport-building gambit), then "How's it going" can work great.

This is actually the default context if you're brave enough to approach strangers you want to date in meatspace. The stranger can be much more physically attractive than you, or have a much higher initially perceived dating-market value. Yet just implicitly signaling your social confidence through boldness, body language, and a friendly, fun way of speaking and acting raises your dateworthiness significantly. And since the real-world-interaction modality doesn't have much competition these days, the conversation that leads up to a date can be super normal small talk like "How's it going".

Failures in Kindness

Liron · 4mo

Yeah, nice. A statement like "I'm looking for something new to watch" lowers the stakes by making the interaction more like something friends talk about than an interview for a life partner, increasing the probability that they'll respond rather than pausing for a second and ending up tapping away.

You can do even more than just lower the stakes if you inject a sense that you're subconsciously using the next couple of conversation moves to draw out evidence about your conversation partner, because you're naturally perceptive, have various standards and ideas about the people you like to date, and like to get a sense of who the other person is.

If done well, this builds a curious sense that the question is a bit more than just making formulaic conversation, but somehow has momentum to it. The best motivation for someone to keep talking to you on a dating app is if they feel they're being seen by a savvy evaluator who will reflect back a valuable perspective about them. The person talking to you can then be subconsciously thinking about how attractive/interesting/unique/etc they are (an engaging experience). Also, everyone wants to feel like they're maximizing their potential by finding someone to date who's in the upper range of their "league", and there are ways to engage in conversation that are more consistent with that ideal.

IMO the best type of conversation to have after a few opening back-and-forths is to get them talking about something they find engaging, which is generally also something that reflects them in a good light. That makes it fun and engaging for them while also putting you in a position to give a kind of casual "feedback", ultimately leading up to a statement of interest which shows them why you're not just another random match but rather someone they have more reason to meet and not flake on. Your movie question could be a good start toward discovering something like that, but it probably isn't an example of that unless they're a big movie person.

I'd look at their profile for clues about something they do in their life where they make an effort that someone ought to notice and appreciate, and get 'em talking about that.

Those are just some thoughts I have about how to distinguish yourself in the middle part of the conversation between opening interest and asking them on a date.
