JBlack

Comments

JBlack

amongst random length-k (k>2) sequences of independent coin tosses with at least one heads before toss k, the expected proportion of (heads after heads)/(tosses after heads) is less than 1/2.

Does this need to be k>3? Checking this for k=3 yields 6 sequences in which there is at least one head before toss 3. In these sequences there are 4 heads-after-heads out of 8 tosses-after-heads, which is exactly 1/2.

Edit: Ah, I see this is more like a per-sequence game score than a pooled proportion. Two "scores" of 1, one "score" of 1/2, and three of 0 across the 6 equally likely conditional sequences, giving an average of 5/12 < 1/2.
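
A minimal Python sketch of the k=3 check (my own enumeration, not part of the original comment), comparing the pooled ratio with the average of per-sequence scores:

```python
from itertools import product

# Enumerate length-3 sequences with at least one head before toss 3, then compare
# the pooled ratio (all heads-after-heads over all tosses-after-heads) with the
# average of the per-sequence proportions.
pooled_hh = pooled_after_h = 0
per_seq = []
for seq in product("HT", repeat=3):
    if "H" not in seq[:2]:                      # condition: at least one head before toss 3
        continue
    after_heads = [seq[i + 1] for i in range(2) if seq[i] == "H"]
    pooled_hh += after_heads.count("H")
    pooled_after_h += len(after_heads)
    per_seq.append(after_heads.count("H") / len(after_heads))

print(pooled_hh, "/", pooled_after_h)   # 4 / 8: exactly 1/2 when pooled
print(sum(per_seq) / len(per_seq))      # 5/12 ≈ 0.4167: the per-sequence average
```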

JBlack

Which particular p(doom) are you talking about? I have a few that would be greater than 50%, depending upon exactly what you mean by "doom", what constitutes "doom due to AI", and over what time spans.

Most of my doom probability mass is in the transition to superintelligence, and I expect to see plenty of things that appear promising for near AGI, but won't be successful for strong ASI.

About the only plausible near-future update that would significantly reduce doom would be a model FOOMing into strong superintelligence that turns out to be very anti-doomy, and both willing and able to protect us from more doomy AI. Even then I'd wonder about the longer term, but it would at least be serious evidence against "ASI capability entails doom".

JBlack

Grabby aliens doesn't even work as an explanation for what it purports to explain. The enormous majority of conscious beings in such a universe-model are members of grabby species who have expanded to fill huge volumes and have a history of interstellar capability going back hundreds of billions of years or more.

If this universe model is correct, why is this not what we observe?

JBlack

They probably would. One trouble is that there are typically substantial economic losses (both extra expenditures and risks to income) involved in moving house involuntarily, on top of losses in aspects of life that aren't usually tracked economically.

JBlack

The degree to which 6-short is additionally worrying (once we’ve taken into account (1) and (3)) depends on the probability that the relevant agents will all choose to seek power in problematic ways within the relevant short period of time, without coordinating. If the “short period” is “the exact same moment,” the relevant sort of correlation seems unlikely.

Is this really true? It seems plausible that some external event (which could be practically anything) could alert a sufficient subset of agents to all start trying to seek power as soon as they notice that event, and not before.

JBlack

The second type of preference seems to apply to anticipated perceptions of the world by the agent - such as the anticipated perception of eating ice cream in a waffle cone. It doesn't have to be so immediately direct, since it could also apply to instrumental goals such as doing something unpleasant now for expected improved experiences later.

The first seems to be more like a "principle" than a preference, in that the agent is judging outcomes on the principle of whether needless suffering exists in them, regardless of whether that suffering has any effect on the agent at all.

To distinguish them, we could imagine a thought experiment in which such a person could choose to accept or decline some ongoing benefit for themselves that causes needless suffering on some distant world, and they will have their memory of the decision and any psychological consequences of it immediately negated regardless of which they chose.

JBlack

It's even worse than that. Maybe I would be happier with my ice cream in a waffle cone the next time I have ice cream, but actually this is just a specific expression of being happier eating a variety of tasty things over time, and it's simply that I haven't had ice cream in a waffle cone for a while. The time after that, I will likely "prefer" something else despite my underlying preferences not having changed. Or something even more complex and interrelated with various parts of history and internal state.

It may be better to distinguish between instances of "preferences" that are specific to a given internal state and history, and an agent's general mapping from internal states and histories to such preferences.
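
A toy sketch of that distinction, assuming nothing beyond the ice-cream example above (the names and the waffle-cone trigger are illustrative inventions, not anything from the post):

```python
from typing import Callable, Sequence

# A preference *instance* is what the agent wants given its current internal state and
# history; the underlying preferences are the fixed mapping from (state, history) to
# such instances. The trigger condition below is made up purely for illustration.
State = dict                 # e.g. {"days_since_waffle_cone": 45}
History = Sequence[str]

def preferred_dessert(state: State, history: History) -> str:
    # The instance varies with state and history even though this mapping never changes.
    if state.get("days_since_waffle_cone", 0) > 30:
        return "ice cream in a waffle cone"
    return "some other tasty thing"

UnderlyingPreferences = Callable[[State, History], str]
fixed_preferences: UnderlyingPreferences = preferred_dessert

print(preferred_dessert({"days_since_waffle_cone": 45}, []))  # one instance
print(preferred_dessert({"days_since_waffle_cone": 2}, []))   # different instance, same mapping
```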

JBlack

Yes, such an agent will self-modify if it is presented with a Newcomb game before Omega determines how much money to put into the boxes. It will even self-modify if there is a 1-in-1000 credence that Omega has not yet done so (or might change their mind).

At this point, considerations come in such as: what happens if such an agent expects to face Newcomb-like games in the future, but isn't yet certain what form they will take or what the exact payoffs will be? Should it self-modify to something UDT-like now?
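
A rough expected-value sketch of the first paragraph's claim, assuming the standard Newcomb payoffs ($1,000,000 in the opaque box iff one-boxing is predicted, $1,000 always in the transparent box) and that the unmodified agent would two-box; these payoff numbers are my assumption, not from the comment:

```python
# Expected gain from self-modifying to one-boxing, as a function of the credence that
# Omega has not yet decided what to put in the boxes.
def gain_from_self_modifying(p_omega_undecided: float) -> float:
    # If Omega has not yet decided: the prediction flips to one-boxing, so the agent
    # gets $1,000,000 instead of the $1,000 a predicted two-boxer would get.
    gain_if_undecided = 1_000_000 - 1_000
    # If Omega has already predicted two-boxing: the modified agent one-boxes an empty
    # box and forgoes the $1,000 it would otherwise have taken.
    loss_if_decided = 1_000
    return p_omega_undecided * gain_if_undecided - (1 - p_omega_undecided) * loss_if_decided

print(gain_from_self_modifying(0.001))   # ~0: break-even is right around 1-in-1000
print(gain_from_self_modifying(0.01))    # clearly positive
```

Under those assumed payoffs the break-even credence works out to about 1 in 1000, which lines up with the figure above.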

JBlack

Is it a necessary non-epistemic truth? After all, it has a very lengthy partial proof in Principia Mathematica, and maybe they got something wrong. Perhaps you should check?
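
For what it's worth, that check is mechanical in a modern proof assistant; a minimal Lean 4 sketch, assuming the proposition in question is something like 1 + 1 = 2 (the Principia Mathematica example):

```lean
-- The proof is by direct computation on natural number literals.
example : 1 + 1 = 2 := rfl
```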

But then maybe you're not using a formal system to prove it, but just taking it as an axiom or maybe as a definition of what "2" means using other symbols with pre-existing meanings. But then if I define the term "blerg" to mean "a breakfast product with non-obvious composition", is that definition in itself a necessary truth?

Obviously if you mean "if you take one object and then take another object, you now have two objects" then that's a contingent proposition that requires evidence. It probably depends upon what sorts of things you mean by "objects" too, so we can rule that one out.

Or maybe "necessary non-epistemic truth" means a proposition that you can "grok in fullness" and just directly see that it is true as a single mental operation? Though, isn't that subjective and also epistemic? Don't you have to check to be sure that it is one? Was it a necessary non-epistemic truth for you when you were young enough to have trouble with the concept of counting?

So in the end I'm not really sure exactly what you mean by a necessary truth that doesn't need any checking. Maybe it's not even a coherent concept.

JBlack
  1. We don't know how consciousness arises, in terms of what sort of things have subjective experience. Your assertion is one reasonable hypothesis, but you don't support it or comment on any of the other possible hypotheses.
  2. I don't think many people use "better than every human in every way" as a definition for the term "AGI". However, LLMs are fairly clearly not yet AGI even for less extreme meanings of the term, such as "at least as capable for almost all cognitive tasks as an average human". It is pretty clear that current LLMs are still quite a lot less capable in many important ways than fairly average humans, despite being as capable, or even more capable, in other important ways.
    They do meet a very loose definition of AGI such as "comparable or better in most ways to the mental capabilities of a significant fraction of the human population", so saying that they are AGI is at least somewhat justifiable.
  3. LLMs emit text consistent with the training corpus and tuning processes. If that means using first-person phrasing such as "I am an ..." instead of a third-person description such as "This text is produced by an ...", then that doesn't say anything about whether the LLM is conscious or not. Even a 1-line program can print "I am a computer program but not a conscious being", and have that be a true statement to the extent that the pronoun "I" can be taken to mean "whatever entity produced the sentence" and not "a conscious being that produced the sentence".

To be clear, I am not saying that LLMs are not conscious, merely that we don't know. What we do know is that they are optimized to produce outputs that match those from entities that we generally believe to be conscious. Using those outputs as evidence to justify a hypothesis of consciousness is begging the question to a much greater degree than looking at outputs of systems that were not so directly optimized.
