Comments

Given our state of knowledge about consciousness, it's indeed not impossible that modern LLMs are conscious. I wouldn't say it's likely, and I definitely wouldn't say that they are as likely to be conscious as uploaded humans. But the point stands: we don't know for sure, and we lack a proper way to figure it out.

Previously we could've vaguely pointed towards the Turing test, but we are past that stage now. Behavioral analysis of a model is at this point mostly unhelpful. A few tweaks can make the same LLM that previously confidently claimed not to be conscious swear that it's conscious and suffering. So what a current LLM says about the nature of its consciousness gives us about zero bits of evidence.

This is another reason to stop making bigger models and spend a lot of time figuring out what we have already created. At some point we may create a conscious LLM, be unable to tell the difference, and that would be a moral catastrophe.

You mean, "ban superintelligence"? Because superintelligences are not human-like.

The kind of superintelligence that doesn't possess the human-likeness that we want it to possess.

That's the problem with your proposal of an "ethics module". Suppose we have a system consisting of an "ethics module" and a "nanotech design module". The nanotech design module outputs a 3D model of a supramolecular unholy abomination. What exactly should the ethics module do to ensure that this abomination doesn't kill everyone?

The nanotech design module has to be evaluatable by the ethics module. For that, it also has to be made from multiple sequential LLM calls reasoning in explicit natural language. Other types of modules should be banned.

indeed. But still… if she wonders out loud “what day is it?” at the very moment she says that, it has an answer.

There is no "but". As long as the Beauty is unable to distinguish between Monday and Tuesday awakenings, and as long as the decision process which leads her to say the phrase "what day is it" works the same way on both, from her perspective there is no single "very moment she says that". On Tails there are two different moments when she says this, and the answer is different for each of them. So there is no answer for her.

An experimenter who overhears her knows the answer. It seems to me that the way you "resolve" this tension is that the two of them are technically asking different questions, even though they are using the same words

Yes, you are correct. From the position of an experimenter who knows which day it is, or who is hired to work only on one random day, this is a coherent question with an actual answer. The words we use are the same, but the mathematical formalism is different.

For an experimenter who knows that it's Monday, the probability that today is Monday is simply:

P(Monday|Monday) = 1

For an experimenter who is hired to work only on one random day, it is:

P(Monday|Monday xor Tuesday) = 1/2
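To make the second formalism concrete, here is a minimal Monte Carlo sketch, under my reading of the setup (the working day is chosen uniformly between the two days, independently of the coin):

```python
import random

# The "hired for one random day" experimenter: exactly one of the two days is
# their working day, so the condition "Monday xor Tuesday" always holds.
N = 100_000
monday = 0
for _ in range(N):
    coin = random.choice(["Heads", "Tails"])             # irrelevant to this question
    working_day = random.choice(["Monday", "Tuesday"])   # exactly one of the two days
    if working_day == "Monday":
        monday += 1

print(monday / N)  # ~0.5, matching P(Monday|Monday xor Tuesday) = 1/2
```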

But still… how surprised should she be if she were to learn that today is Monday? It seems that taking your stance to its conclusion, the answer would be “zero surprise: she knew for sure she would wake up on Monday so no need to be surprised it happened”

And even if she were to learn that the coin landed tails, so she knows that this is just one of a total of two awakenings, she should have zero surprise upon learning the day of the week, since she now knows both awakenings must happen.

Completely correct. The Beauty knew that she would be awakened on Monday either way, so she is not surprised. This is standard for non-mutually-exclusive events. Consider this:

A coin is tossed and you are put to sleep. On Heads there will be a red ball in your room. On Tails there will be a red and a blue ball in your room. How surprised should you be to find a red ball in your room?
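For concreteness, here is a minimal simulation of this example, with the outcomes exactly as stated above: the red ball is found under both outcomes, so finding it carries zero surprise and zero information about the coin.

```python
import random

# Simulate the ball example: Heads -> {red}, Tails -> {red, blue}.
N = 100_000
red_seen = 0
heads_given_red = 0
for _ in range(N):
    coin = random.choice(["Heads", "Tails"])
    balls = {"red"} if coin == "Heads" else {"red", "blue"}
    if "red" in balls:                 # happens in every single run
        red_seen += 1
        heads_given_red += (coin == "Heads")

print(red_seen / N)                    # 1.0 -> zero surprise at finding a red ball
print(heads_given_red / red_seen)      # ~0.5 -> seeing red doesn't update the coin
```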

Which seems to violate conservation of expected evidence, except you already said that there's no coherent probabilities here for that particular question, so that's fine too.

The appearance of a violation of conservation of expected evidence comes from the belief that awakenings on Monday and on Tuesday are mutually exclusive events, while they are, in fact, sequential.

This makes sense, but I’m not used to it. For instance, I’m used to these questions having the same answer:

  1. P(today is Monday)?
  2. P(today is Monday | the sleep lab gets hit by a tornado)

Yet here, the second question is fine (assuming tornadoes are rare enough that we can ignore the chance of two on consecutive days) while the first makes no sense because we can’t even define “today”

It makes sense but it’s very disorienting, like incompleteness theorem level of disorientation or even more

I completely understand. It is counterintuitive because evolution didn't prepare us to deal with situations where the same experience is repeated while our memory is erased. As I write in the post:

If I forget what the current day of the week is in my regular life, well, it's only natural to start from a 1/7 prior per day and work from there. I can do it because the causal process that leads to me forgetting such information can be roughly modeled as a low-probability occurrence which can happen to me on any day.

It wouldn't be the case if I was guaranteed to also forget the current day of the week on the next 6 days after I forgot it on the first one. This would be a different causal process, with different properties - causation between the instances of forgetting - and it has to be modeled differently. But we do not actually encounter such situations in everyday life, and so our intuition is caught completely flat-footed by them.

The whole paradox arises from this issue with our intuition, and just like with the incompleteness theorem (thanks for the flattering comparison, btw), what we need to do now is to re-calibrate our intuition, making it more accustomed to the truth preserved by the math, instead of trying to fight it.

"here is how to make LLMs more capable but less humanlike, it will be adopted because it makes LLMs more capable". 

Thankfully, this is a class of problems that humanity has experience dealing with. The solution boils down to regulating all the ways to make LLMs less human-like out of existence.

With every technology, there is a way to make it stop working. There is any number of ways to make a plane unable to fly. But the important thing is that we know a way to make a plane fly - therefore humans can fly by plane.

Likewise, the point that an LLM-based architecture can in principle be safe still stands even if there is a way to make an unsafe LLM-based architecture.

And this is a huge point. Previously we were in a state where alignment wasn't even a tractable problem: where capabilities progressed while alignment stayed in the dirt, where an AI system might understand human values but still not care about them, and we didn't know what to do about it.

But now we can just

have an “ethics module” where the underlying LLM produces text which then feeds into other parts of the system to help guide behavior.

Which makes alignment tractable. Alignment can now be reduced to the capability of the ethics module. We know that the system will care about our values as it understands them, because we can explicitly code it to behave this way via an if-else statement. This is an enormous improvement over the previous status quo.
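A minimal sketch of this control flow, with `call_llm` as a hypothetical stand-in for whatever model API the system would use (stubbed out here so the snippet runs):

```python
from typing import Optional

def call_llm(prompt: str) -> str:
    # Placeholder: a real system would query an actual model here.
    return "APPROVE: the plan involves no harm to humans."

def ethics_module(proposal: str) -> str:
    # The verdict is produced in explicit natural language, so it can be
    # read and audited directly.
    return call_llm(
        "Is the following plan compatible with human values? "
        "Answer APPROVE or REJECT, then explain.\n\n" + proposal
    )

def run_agent(task: str) -> Optional[str]:
    proposal = call_llm("Propose a plan for: " + task)  # capability module
    verdict = ethics_module(proposal)
    if verdict.startswith("APPROVE"):   # the explicit if-else: act only on approval
        return proposal
    return None                         # rejected plans are never executed
```

The point is just the shape of the control flow: the capability module's output reaches the outside world only through the explicit approval branch.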

Universal guide to magic via anthropics:

  1. Be not randomly sampled from a set
  2. Assume that you are randomly sampled from the set anyway
  3. Arrive at an absurd conclusion
  4. Magic!

Either a strong self-sampling assumption is false

Of course it is false. What are the reasons to even suspect that it might be true?

 and-or path-based identity is true.

 

Note that path-dependent identity also has its own paradoxes: two copies can have different "weights" depending on how they were created while having the same measure. For example, if in sleep two copies of me will be created and one of the copies will be copied again – then there will be 3 copies in the morning in the same world, but if we calculate chances to be one of them based on paths, they will be ½ and ¼ and ¼.

This actually sounds about right. What's paradoxical here?
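For concreteness, the path-based arithmetic of the quoted example, as I read it (a copy's weight is halved at each copying event along its path):

```python
# First copying splits the original's weight into 1/2 and 1/2; one of those
# branches is then copied again, splitting its 1/2 into 1/4 and 1/4.
w_a = 1 / 2          # the copy that is never copied again
w_b = (1 / 2) / 2    # one branch of the second split
w_c = (1 / 2) / 2    # the other branch of the second split
print(w_a, w_b, w_c, w_a + w_b + w_c)  # 0.5 0.25 0.25 1.0
```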

I knew that not any string of English words gets a probability, but I was naïve enough to think that all statements that are either true or false get one.

Well, I think this one is actually correct. But, as I said in the previous comment, the statement "Today is Monday" doesn't actually have a coherent truth value throughout the probability experiment. It's not simply either True or False: it's either True, or True and False at the same time!

I was hoping that this sequence of posts which kept saying “don’t worry about anthropics, just be careful with the basics and you’ll get the right answer” would show how to answer all possible variations of these “sleep study” questions… instead it turns out that it answers half the questions (the half that ask about the coin) while the other half is shown to be hopeless… and the reason why it’s hopeless really does seem to have an anthropics flavor to it.

We can answer every coherently formulated question. Everything that is formally defined has an answer. Being careful with the basics allows us to understand which questions are coherent and which are not. This is the same principle as in every probability theory problem.

Consider the Sleeping Beauty experiment without memory loss. There, the event "Monday xor Tuesday" also can't be said to always happen. And likewise, "Today is Monday" also doesn't have a stable truth value throughout the whole experiment.

Once again, we can't express the Beauty's uncertainty between the two days using probability theory. We are just not paying attention to it, because by the conditions of the experiment the Beauty is never in such a state of uncertainty: if she remembers a previous awakening, then it's Tuesday; if she doesn't, then it's Monday.

All the pieces of the issue are already present. The addition of memory loss just makes it obvious that there is a problem with our intuition.

"You should anticipate having both experiences" sounds sort of paradoxical or magical, but I think this stems from a verbal confusion.

You can easily clear this confusion if you rephrase it as "You should anticipate having any of these experiences". Then it's immediately clear that we are talking about two separate screens, and it's also clear that our curiosity isn't actually satisfied: the question "which one of these two will actually be the case?" is still very much on the table.

Rob-y feels exactly as though he was just Rob-x, and Rob-z also feels exactly as though he was just Rob-x

Yes, this is obvious. Still, as soon as we have both Rob-y and Rob-z, they are not "metaphysically the same person". When Rob-y says "I" he is referring to Rob-y, not Rob-z, and vice versa. More specifically, Rob-y is referring to one causal curve through time and Rob-z is referring to another causal curve through time. These two curves are the same up to some point, but after it they are not.

In case the bet is offered on every awakening: do you mean if she gives conflicting answers on Monday and Tuesday that the bet nevertheless is regarded as accepted?

Yes, I do.

Of course, if the experiment is run as stated she wouldn't be able to give conflicting answers, so the point is moot. But having a strict algorithm for resolving such theoretical cases is a good thing anyway.

My initial idea was, that if for example only her Monday answer counts and Beauty knows that, she could reason that when her answer counts it is Monday, arriving at the conclusion that it is reasonable to act as if it was Monday on every awakening, thus grounding her answer on P(H/Monday)=1/2. Same logic holds for rule „last awakening counts“ and „random awakening counts“.

Yes, I got it. As a matter of fact, this reasoning is unlawful. A probability estimate is about the evidence you receive, not about what "counts" for a betting scheme. If the Beauty receives the same evidence when her awakening counts and when it doesn't, she can't update her probability estimate. If, in order to arrive at the correct answer, she needs to behave as if every day is Monday, it means that there is something wrong with her model.

Thankfully for thirdism, she does not have to do it. She can just assign zero utility to the Tuesday awakening and get the correct betting odds.

Anyway, all this is quite tangential to the question of utility instability, which is about the Beauty making a bet on Sunday and then reflecting on it during the experiment, even if no bets are proposed. According to thirdism, the probability of the coin being Heads changes on awakening, so, in order for the Beauty not to regret making an optimal bet on Sunday, her utility has to change as well. Hence utility instability.
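A toy version of that reflection, with a hypothetical even-odds bet of one util on Heads made on Sunday:

```python
# On Sunday: P(Heads) = 1/2, so the even-odds bet on Heads has expected value 0.
ev_sunday = 0.5 * (+1) + 0.5 * (-1)

# On awakening a thirder assigns P(Heads) = 1/3; with unchanged utilities the
# very same bet now looks like a mistake.
ev_awakening_thirder = (1 / 3) * (+1) + (2 / 3) * (-1)

print(ev_sunday, ev_awakening_thirder)  # 0.0 vs about -0.33: avoiding regret requires shifted utilities
```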

There are indeed ways to obfuscate the utility instability under thirdism via betting schemes where it's less obvious, because the probability relevant to betting isn't P(Heads|Awake) = 1/3 but one of those you mention, which equal 1/2.

The way to define the scheme specifically for P(Heads|Awake) is this: you get asked to bet on every awakening; one agreement is sufficient, and only one agreement counts; no random selection takes place.

This way the Beauty doesn't get any extra evidence when she is asked to bet; therefore she can't update her credence for the coin being Heads based on the sole fact of being asked to bet, the way you propose.
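A minimal simulation of this scheme, under my reading of it (the Beauty always agrees, the stakes are hypothetical and even, and the agreement is counted exactly once per experiment):

```python
import random

# The bet is offered on every awakening, the Beauty always agrees, and only
# one agreement per experiment counts: win W on Heads, lose L on Tails.
def average_payoff(n=100_000, W=1.0, L=1.0):
    total = 0.0
    for _ in range(n):
        coin = random.choice(["Heads", "Tails"])
        # On Tails she agrees on both awakenings, on Heads only once, but
        # either way exactly one agreement is counted.
        total += W if coin == "Heads" else -L
    return total / n

print(average_payoff())  # ~0 at even stakes: the break-even point corresponds to credence 1/2
```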
