God is good*

 

*for a very specific definition of "goodness", which doesn't actually capture the intuition of most people about ethics and is mostly about the interaction of sub-atomic particles.

First of all, it's certainly important to distinguish between a probability model and a strategy. The job of a probability model is simply to suggest the probability of certain events and to describe how probabilities are affected by the realization of other events. A strategy on the other hand is to guide decision making to arrive at certain predefined goals.

Of course. As soon as we are talking about goals and strategies we are not talking about just probabilities anymore. We are also talking about utilities and expected utilities. However, probabilities do not suddenly change because of it. The probabilistic model stays the same; there are simply additional considerations on top of it.

My point is that the probabilities a model suggests you should have based on the currently available evidence do NOT necessarily have to match the probabilities that are relevant to your strategy and decisions.

Whether or not your probability model leads to optimal decision making is the test that allows you to falsify it. There are no separate "theoretical probabilities" and "decision making probabilities". Only the ones that guide your behaviour can be correct. What's the point of a theory that is not applicable to practice, anyway?

If your model claims that the probability based on your evidence is 1/3 but the optimal decision making happens when you act as if it's 1/2, then your model is wrong and you switch to a model that claims that the probability is 1/2. That's the whole reason why betting arguments are popular.
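
As a deliberately generic toy example of my own (not the Sleeping Beauty setting itself): suppose an event really happens half the time, one model claims its probability is 1/3, another claims 1/2, and a repeated bet is offered that is worth taking exactly when the probability is above 1/2.4. Only the model whose probability matches the actual frequency recommends the profitable policy:

```python
import random

def average_profit(claimed_p, true_p=0.5, cost=1.0, payout=2.4, n=100_000, seed=0):
    """Average profit per round of the policy recommended by a model claiming `claimed_p`.
    The bet costs `cost` and pays `payout` if the event happens, so the model recommends
    accepting it iff claimed_p * payout > cost."""
    if claimed_p * payout <= cost:
        return 0.0  # the model says the bet is not worth taking
    rng = random.Random(seed)
    return sum((payout if rng.random() < true_p else 0.0) - cost for _ in range(n)) / n

print("model claiming 1/3:", round(average_profit(1/3), 3))  # declines every bet -> 0.0
print("model claiming 1/2:", round(average_profit(1/2), 3))  # accepts -> ~ +0.2 per round
```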

If Beauty is awake and doesn't know if it is the day her bet counts, it is in fact a rational strategy to behave and decide as if her bet counts today.

Questions of what "counts" or "matters" are not the realm of probability. However, the Beauty is free to adjust her utilities based on the specifics of the betting scheme.

All your model suggests are probabilities conditional on the realization of certain events.

The model says that 

P(Heads|Red) = 1/3 

P(Heads|Blue) = 1/3

but

P(Heads|Red or Blue) = 1/2

Which obviously translates into a betting scheme: someone who bets on Tails only when the room is Red wins 2/3 of the time, and someone who bets on Tails only when the room is Blue wins 2/3 of the time, while someone who always bets on Tails wins only 1/2 of the time.
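
To make this concrete, here is a small Monte Carlo sketch of my own. It assumes the technicolor setup where the room is Red on one of the two days and Blue on the other, with the order chosen at random each run, and it counts, per run of the experiment, whether a bet on Tails was placed and whether it won:

```python
import random

def run_experiment(rng):
    """One run: Monday and Tuesday rooms get different colors, order chosen at random.
    Beauty is awakened on Monday, and also on Tuesday iff the coin lands Tails."""
    heads = rng.random() < 0.5
    monday_color, tuesday_color = rng.sample(["Red", "Blue"], 2)
    colors_seen = [monday_color] if heads else [monday_color, tuesday_color]
    return heads, colors_seen

def win_rate(bet_on_tails_when, n=100_000, seed=1):
    """Fraction of experiments won, among experiments where a bet on Tails was placed."""
    rng = random.Random(seed)
    bets = wins = 0
    for _ in range(n):
        heads, colors_seen = run_experiment(rng)
        if any(bet_on_tails_when(color) for color in colors_seen):
            bets += 1
            wins += not heads  # the bet is on Tails, so it wins iff the coin is Tails
    return wins / bets

print("Tails only in Red room :", round(win_rate(lambda c: c == "Red"), 3))   # ~0.667
print("Tails only in Blue room:", round(win_rate(lambda c: c == "Blue"), 3))  # ~0.667
print("Tails always           :", round(win_rate(lambda c: True), 3))         # ~0.5
```

The simulated frequencies match the conditional probabilities above: conditioning on Red or on Blue gives Heads one time out of three, while conditioning on Red-or-Blue gives Heads half the time.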

This leads to the conclusion that observing the event "Red" instead of "Red or Blue" is possible only for someone who has been expecting to observe the event "Red" in particular. Likewise, observing HTHHTTHT is possible only for a person who was expecting this particular sequence of coin tosses, instead of just any combination of length 8. See Another Non-Anthropic Paradox: The Unsurprising Rareness of Rare Events

From our state of knowledge about consciousness, it's indeed not impossible that modern LLMs are conscious. I wouldn't say it's likely, and I definitely wouldn't say that they are as likely to be conscious as uploaded humans. But the point stands. We don't know for sure, and we lack a proper way to figure it out.

Previously we could vaguely point towards the Turing test, but we are past that stage now. Behavioral analysis of a model at this point is mostly unhelpful. A few tweaks can make the same LLM that previously confidently claimed not to be conscious swear that it is conscious and suffering. So what a current LLM says about the nature of its consciousness gives us about 0 bits of evidence.

This is another reason to stop making bigger models and spend a lot of time figuring out what we have already created. At some point we may create a conscious LLM and be unable to tell the difference, and that would be a moral catastrophe.

You mean, "ban superintelligence"? Because superintelligences are not human-like.

The kind of superintelligence that doesn't possess the human-likeness that we want it to possess.

That's the problem with your proposal of an "ethics module". Let's suppose that we have a system made of an "ethics module" and a "nanotech design module". The nanotech design module outputs a 3D model of a supramolecular unholy abomination. What exactly should the ethics module do to ensure that this abomination doesn't kill everyone?

The nanotech design module has to be evaluatable by the ethics module. For that, it also has to be made from multiple sequential LLM calls in explicit natural language. Other types of modules should be banned.

indeed. But still… if she wonders out loud “what day is it?” at the very moment she says that, it has an answer.

There is no "but". As long as the Beauty is unable to distinguish between Monday and Tuesday awakenings, as long as the decision process which leads her to say the phrase "what day is it" works the same way, from her perspective there is no single "very moment she says that". On Tails, there are two different moments when she says this, and the answer is different at each of them. So there is no answer for her.

An experimenter who overhears her knows the answer. It seems to me that the way you "resolve" this tension is that the two of them are technically asking a different question, even though they are using the same words.

Yes, you are correct. From the position of the experimenter, who knows which day it is, or who is hired to work only on one random day, this is a coherent question with an actual answer. The words we use are the same, but the mathematical formalism is different.

For an experimenter who knows that it's Monday the probability that today is Monday is simply:

P(Monday|Monday) = 1

For an experimenter who is hired to work only on one random day it is:

P(Monday|Monday xor Tuesday) = 1/2

But still… how surprised should she be if she were to learn that today is Monday? It seems that taking your stance to its conclusion, the answer would be “zero surprise: she knew for sure she would wake up on Monday so no need to be surprised it happened”

And even if she were to learn that the coin landed tails, so she knows that this is just one of a total of two awakenings, she should have zero surprise upon learning the day of the week, since she now knows both awakenings must happen.

Completely correct. Beauty knew that she would be awakened on Monday either way, and so she is not surprised. This is a standard thing with non-mutually-exclusive events. Consider this:

A coin is tossed and you are put to sleep. On Heads there will be a red ball in your room. On Tails there will be a red and a blue ball in your room. How surprised should you be to find a red ball in your room?
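
Spelled out as a small worked step of my own, a red ball is there in either case, so there is nothing to update on and nothing to be surprised about:

P(Red ball) = P(Red ball|Heads)P(Heads) + P(Red ball|Tails)P(Tails) = 1/2 + 1/2 = 1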

Which seems to violate conservation of expected evidence, except you already said that there are no coherent probabilities here for that particular question, so that’s fine too.

The appearance of a violation of conservation of expected evidence comes from the belief that awakening on Monday and on Tuesday are mutually exclusive, while they are, in fact, sequential.

This makes sense, but I’m not used to it. For instance, I’m used to these questions having the same answer:

  1. P(today is Monday)?
  2. P(today is Monday | the sleep lab gets hit by a tornado)

Yet here, the second question is fine (assuming tornadoes are rare enough that we can ignore the chance of two on consecutive days) while the first makes no sense because we can’t even define “today”

It makes sense but it’s very disorienting, like incompleteness theorem level of disorientation or even more

I completely understand. It is counterintuitive because evolution didn't prepare us to deal with situations where an experience is repeated exactly the same way while our memory of it is erased. As I write in the post:

If I forget what the current day of the week is in my regular life, well, it's only natural to start from a 1/7 prior per day and work from there. I can do it because the causal process that leads to me forgetting such information can be roughly modeled as a low probability occurrence which can happen to me on any day.

It wouldn't be the case if I was guaranteed to also forget the current day of the week on the next 6 days as well, after I forgot it on the first one. This would be a different causal process with different properties - causation between instances of forgetting - and it has to be modeled differently. But we do not actually encounter such situations in everyday life, and so our intuition is caught completely flat-footed by them.

The whole paradox arises from this issue with our intuition, and just like with the incompleteness theorem (thanks for the flattering comparison, btw), what we need to do now is to re-calibrate our intuition, making it more accustomed to the truth preserved by the math, instead of trying to fight it.

"here is how to make LLMs more capable but less humanlike, it will be adopted because it makes LLMs more capable". 

Thankfully, this is a class of problems that humanity has experience dealing with. The solution boils down to regulating all the ways to make LLMs less human-like out of existence.

With every technology there is a way to make it stop working. There are any number of ways to make a plane unable to fly. But the important thing is that we know a way to make a plane fly - therefore humans can fly by plane.

Likewise, the point that an LLM-based architecture can in principle be safe still stands even if there is a way to make an unsafe LLM-based architecture.

And this is a huge point. Previously we were in a state where alignment wasn't even a tractable problem. Where capabilities progressed and alignment stayed in the dirt. Where an AI system might understand human values but still not care about them, and we didn't know what to do about it.

But now we can just

have an “ethics module” where the underlying LLM produces text which then feeds into other parts of the system to help guide behavior.

Which makes alignment tractable. Alignment can now be reduced to the capability of the ethics module. We know that the system will care about our values as it understands them, because we can explicitly code it to behave this way via an if-else statement. This is an enormous improvement over the previous status quo.
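
A minimal sketch of the kind of control flow I have in mind (the module functions are placeholder stand-ins for LLM calls, not any real API, and all names are mine):

```python
from typing import Callable

def gated_agent(task: str,
                capability_module: Callable[[str], str],
                ethics_module: Callable[[str], str]) -> str:
    """The capability module drafts a plan in natural language, the ethics module
    reads that plan, and an explicit if-else decides whether the plan is acted on."""
    plan = capability_module(task)
    verdict = ethics_module(f"Is it acceptable to execute this plan?\n{plan}")
    if verdict.strip().lower().startswith("yes"):
        return plan                      # only an approved plan leaves the sandbox
    return "REFUSED: " + verdict

# Placeholder stand-ins for the LLM calls, just to make the sketch runnable:
demo_capability = lambda task: f"Step-by-step plan for: {task}"
demo_ethics = lambda query: "Yes, this looks harmless." if "tea" in query else "No, too risky."

print(gated_agent("brew a cup of tea", demo_capability, demo_ethics))
print(gated_agent("design self-replicating nanotech", demo_capability, demo_ethics))
```

The point is only the explicit branch: whatever the capability module proposes reaches the outside world strictly through the path that the ethics module approves.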

Universal guide to magic via anthropics:

  1. Be not randomly sampled from a set
  2. Assume that you are randomly sampled from the set anyway
  3. Arrive at an absurd conclusion
  4. Magic!

Either a strong self-sampling assumption is false

Of course it is false. What are the reasons to even suspect that it might be true?

 and-or path-based identity is true.

 

Note that path-dependent identity also has its own paradoxes: two copies can have different "weights" depending on how they were created while having the same measure. For example, if in sleep two copies of me will be created and one of the copies will be copied again – then there will be 3 copies in the morning in the same world, but if we calculate chances to be one of them based on paths, they will be ½ and ¼ and ¼.

This actually sounds about right. What's paradoxical here?

I knew that not any string of English words gets a probability, but I was naïve enough to think that all statements that are either true or false get one.

Well, I think this one is actually correct. But, as I said in the previous comment, the statement "Today is Monday" doesn't actually have a coherent truth value throughout the probability experiment. It's not either True or False. It's either True or True and False at the same time!

I was hoping that this sequence of posts which kept saying “don’t worry about anthropics, just be careful with the basics and you’ll get the right answer” would show how to answer all possible variations of these “sleep study” questions… instead it turns out that it answers half the questions (the half that ask about the coin) while the other half is shown to be hopeless… and the reason why it’s hopeless really does seem to have an anthropics flavor to it.

We can answer every coherently formulated question. Everything that is formally defined has an answer. Being careful with the basics allows us to understand which questions are coherent and which are not. This is the same principle as with every probability theory problem.

Consider the Sleeping Beauty experiment without memory loss. There, the event Monday xor Tuesday also can't be said to always happen. And likewise, "Today is Monday" also doesn't have a stable truth value throughout the whole experiment.

Once again, we can't express Beauty's uncertainty between the two days using probability theory. We are just not paying attention to it because, by the conditions of the experiment, the Beauty is never in such a state of uncertainty. If she remembers a previous awakening, then it's Tuesday; if she doesn't, then it's Monday.

All the pieces of the issue are already present. The addition of memory loss just makes it obvious that there is a problem with our intuition.

"You should anticipate having both experiences" sounds sort of paradoxical or magical, but I think this stems from a verbal confusion.

You can easily clear this confusion if you rephrase it as "You should anticipate having any of these experiences". Then it's immediately clear that we are talking about two separate screens. And it's also clear that our curiosity isn't actually satisfied: the question "which one of these two will actually be the case" is still very much on the table.

Rob-y feels exactly as though he was just Rob-x, and Rob-z also feels exactly as though he was just Rob-x

Yes, this is obvious. Still, as soon as we have Rob-y and Rob-z, they are not "metaphysically the same person". When Rob-y says "I" he is referring to Rob-y, not Rob-z, and vice versa. More specifically, Rob-y is referring to one causal curve through time and Rob-z is referring to another causal curve through time. These two curves are the same up to some point, but then they are not.
