sloonz

Posts

In Defense of a Butlerian Jihad · 10 · 8mo · 25

Comments

‘AI for societal uplift’ as a path to victory
sloonz · 3mo · 10

The bar for ‘good enough’ might be quite high

That bar for "good enough" may also sit in "unacceptable" territory, requiring eusocial levels of coordination where individuals are essentially drones.

Subjectivism and moral authority
sloonz · 3mo · 10

I don’t find it that mysterious. That "Stop!" is obviously, nakedly, a call to the superego. The authority you’re seeking is the same authority that shapes the superego, and it is social in nature. It’s an appeal to the shared values that society expects its members to have.

Care and demandingness
sloonz · 3mo · 10

Pretty sure the people using the "demanding" objection are not articulate analytic philosophers. They’re just trying to communicate their intuitive objection, not making a theoretical claim. Or: demandingness is not definitive proof that the theory is false; demandingness is intuitive evidence that it is false.

And I think I can point to a less instrumental, more theoretical reason.

It is well accepted that, from the point of view of the Universe (or God, or the Objective, or whatever), everyone has the same moral weight (1). Let’s take that for granted.

Then utilitarians go around and say "therefore, using morality to guide your actions, you should weight every human being equally" (2).

But (2) absolutely does not follow from (1), and I think the "demanding" objection is actually an objection to (2). It’s the good old question of "is distance relevant for moral considerations?" (physical distance and/or distance in the social graph), and those saying "too demanding" are those answering "yes" to that old question, against your "no".

Agentic Interpretability: A Strategy Against Gradual Disempowerment
sloonz · 3mo · 30

I don't understand how it is an answer to gradual disempowerment.

In the GD scenario, both individual humans and AIs understand that:

  • AI-powered governance/economy leads to better "utilitarian outcomes" (more wealth, better governance)
  • this nonetheless leads to disempowerment
  • moral and economic pressure pushes toward that disempowerment

The problem is not in the understanding. It is in the incentives. The threat model was never "oops, we did not understand those dynamics, and now we are disempowered… if only we had known…". We already know before it happens — that’s the point of the original GD paper. Having an LLM understand and explain those dynamics does not help avert that scenario.

Alignment Crisis: Genocide Denial
sloonz · 4mo · 20

You’re missing the point that many, many humans will reply like the LLMs do: "It’s complicated". Given that, I fail to see it as a "clear misalignment with respect to human values" problem rather than a "what are those human values in the first place?" problem.

Gradual Disempowerment, Shell Games and Flinches
sloonz · 8mo · 10

very bad fates for all humans

I believe there’s also a disagreement here where the same scenario will be considered fine by some and very bad by others (humans as happy pets comes to mind).

What Is The Alignment Problem?
sloonz · 8mo · 41

How to generalize to multiple humans is... not an unimportant question, but a question whose salience is far, far out of proportion to its relative importance

I expect it to be the hardest problem, not from a technical point of view, but from a lack of ground truth.

The question "how do I model the values of a human" has a simple ground truth: the human in question.

I doubt there’s such a ground truth for "how do I compress the values of all humans into one utility function?". "All models are wrong, some are useful", and all that, except that different humans have different opinions on "useful", i.e. their own personal values. There would be a lot of inconsistencies; while I agree with your stance that "Approximation is part of the game" for modeling the values of individual persons, people can wildly disagree on which approximations they are okay with, mostly based on how well the outcome agrees with their own values.

In other words: do you believe in the existence of at least one model of which nobody can honestly say "the output of that model approximates away too much of my values"? If yes, what makes you think so?

In Defense of a Butlerian Jihad
sloonz · 8mo · 10

Failure in itself is valuable to you?

What I sense from this is that what you’re not getting is that my value system is made of tradeoffs between what I’ll call "Primitive Values" (i.e. values that are sufficiently universal in human psychology that you can more or less describe them with compact words).

I obviously don’t value failure. If I did, I would plan for failure. I don’t. I value and plan for success.

But if all plans ultimately lead to success, of what use/fun/value is planning?

So failure has to be part of the territory if I want my map-making skills to… matter? Make sense? Make a difference?

It feels to me like a weird need to make your whole life into some kind of game to be "won" or "lost", or some kind of gambling addiction or something.

My first reaction was "no, no, comparing this to a gambling addiction, or to talking about Winning at Life the way Trump does, looks terribly uncharitable".

My second reaction is that you’re pretty much directionally right and on the path to understanding; just put it a bit more charitably. We have been shaped by Evolution at large, by the winners in the great game of Life, red in tooth and claw. And while playing doesn’t mean winning, not playing certainly means losing. Schematically, I can certainly believe that "Agency" is the shard inside of me that comes out of the outer (intermediate) objective "enjoy the game, and play to win". I have the feeling that you have pretty much lost the "enjoy the game" shard, possibly because you have a mutant variant, "enjoy ANY game" (and you know what? I can certainly imagine an "enjoy ANY game" variant enjoying UBI paradise).

Well, the big stakes are already gone. If you're on Less Wrong, you probably don't have much real chance of failing so hard that you die, without intentionally trying. Would your medieval farmer even recognize that your present stakes are significant?

This gives me another possible source/model of inspiration, the good old "It’s the Journey that matters, not the Destination".

Many video games have an "I win" cheat code. Players at large don’t use it. Why not, if winning the game is the goal? And certainly all of their other actions are consistent with wanting to win the game: they’re happy when things go well, frustrated when they go wrong; in the internet age they look up guides and tips; they will sometimes hand the controller to a better player after getting stuck. And yet they don’t press the "I win" button.

You are the one saying "do you enjoy frustration or what? Just press the I Win button". I’m the one saying "What are you saying? He’s obviously enjoying the game, isn’t he?".

I agree that the Destination of Agency is pretty much "there is no room left for failure" (and pretty much no Agency left). This is what most of our efforts go into: better plans for a better world with better odds for us. There’s a Marxist vibe here: "competition tends to reduce profit over time in capitalist economies, therefore capitalism will crumble under the weight of its own contradictions". If you enjoy entrepreneurship in a capitalist economy, the better you are at it, the harder you drive down profits. "You: That seems to indicate that entrepreneurs hate capitalism and profits, and would be happy in a communist, profit-less society. Me: What?". Note that we have the same "will crumble under the weight…" dynamic in the game metaphor: when the player wins, it’s also the end of the game.

So let’s go a bit deeper into that metaphor: the game is Life. Creating an ASI-driven UBI paradise is discovering that the developer included an "I Win" button. Going into that society is pressing that button. Your position, I guess, is "well, living well in a UBI paradise is the next game". My position is "no, the UBI paradise is still the same game. It’s akin to the Continue Playing button in an RTS after you have defeated all opponents on the map. Sure, you can play, in the sense that you can still move units around, gather resources and so on, but c’mon, it’s not the same, and I can already tell it’s going to be much less fun, simply because it’s not what the game was designed for. There is no next game. We have finished the only game we had. Enjoy drawing fun patterns with your units while you can still enjoy it; for my part, I know it won’t be enjoyable for very long."

... and if you care, your social prestige, among whoever you care about, can always be on the table, which is already most of what you're risking most of the time.

Oh, this is another problem I thought of, then forgot.

This sounds like a positive nightmare to me.

It seems a hard-to-avoid side-effect of losing real stakes/agency.

In our current society, you can improve the lives of others around you in the great man-vs-nature conflict. In other words, economics is positive-sum (I think you mentioned that some people, when talking about Meaningfulness, give an altruistic definition of it? There we are!).

Remove this and you only have man-vs-man conflicts (gamified so nobody gets hurt). Those are generally zero-sum, purely positional: when you gain a rank on the chess ladder, someone else loses one.

A world with no place for positive-sum games seems a bad place to live. I don’t know to what extent this is fixable in the UBI paradise (do cooperative, positive-sum games fix it? I’m not sure whether the answer is "obviously yes" or "it’s just a way to informally rank who the best player is, granting status, so it’s actually zero-sum"), or to what extent it would just end up being Agency in another guise.

Forces mostly unknown and completely beyond your control have made a universe in which you can exist, and fitted you for it. You depend on the fine structure constant. You have no choice about whether it changes. You need not and cannot act to maintain the present value. I doubt that makes you feel your agency is meaningless.

My first reaction is "the shard of Agency inside me was created by Evolution; the definition of the game I’m supposed to enjoy, and its scope, derive from there. Of course it’s not going to care about that kind of stuff".

My second reaction is: "I certainly hope my distant descendants will change the fine-structure constant of the universe; it looks possible, and it is a way to avoid the heat death of the universe" (https://www.youtube.com/watch?v=XhB3qH_TFds&list=PLd7-bHaQwnthaNDpZ32TtYONGVk95-fhF&index=2). I don’t know to what extent this is a nitpick (I certainly notice that I prefer "my distant descendants" to "the ASI supervisor of the UBI paradise").

More likely, other humans could kill you, still in a way you couldn't influence, for reasons you couldn't change and might never learn. You will someday die of some probably unchosen cause.

This is the split between Personal Agency and Collective Agency. At our current level of capabilities, the distinction doesn’t make much of a difference. It certainly will, later.

Since we live in society, and most people tend not to like being killed, we shape societies so that such events tend not to happen (mostly via punishment and socialization). Each individual tries to steer society to the best of their capabilities. If we collectively end up in a place where there are no murders, people like me consider this a success. Otherwise, a failure.

Politics, advocacy, leading by example, guided by things like Game Theory, Ethics, History: those are very much not outside the scope of Agency. They would be only if individuals had absolutely zero impact on society.

It's all very nice to talk about being able to fail, but you don't fail in a vacuum. You affect others. Your "agentic failure" can be other people's "mishap they don't control". It's almost impossible to totally avoid that. Even if you want that, why do you think you should get it?

That’s why, for me and at my current level of speculation, I think there are two Bright Red Lines for a post-ASI future.

One: if there is no recognizable Mormon society in a post-ASI future, something Has Gone Very Wrong. Mormons tend to value their traditional way of life pretty heavily (and that way of life includes agency). Trampling them in particular probably indicates that we are generally trampling an awful lot of values actually held by a lot of actual people.

Two: if there is no recognizable UBI paradise in a post-ASI future, something Has Gone Very Wrong. For pretty much the same reason.

(There is plausibly a similar third red line for transhumanists, but they pose serious security/safety challenges for the rest of the universe, so things get more complicated there, and I have found no way to articulate such a red line for them.)

The corollary being: the (non-terribly-gone-wrong) post-ASI future is almost inevitably a patchwork of different societies with different tradeoffs. Unless One Value System wins, one which is low on Diversity on top of that. Which would be terrible.

To answer you: I should get that because I’m going to live with other people who are okay with me getting it, because they want to get it too.

"But don't you see, Sparklebear? The value was inside of YOU all the time!"

I entirely agree with you here. It’s all inside us. If there were some Real, Really Objectively Meaningful Values out there, I would expect a technically aligned ASI to be able to recognize them, and I would be much less concerned by the potential loss of Agency/Meaningfulness/whatever we call it. Alas, I don’t believe that’s the case.
