Unlike ASI, some forms of biological superintelligence already exist and have for a long time: we call them corporations, nation-states, and other human organizations.
Most of these social structures are, in the aggregate, substantially stupider than individual humans in many important ways.
This is pretty close to the dust theory of Greg Egan's Permutation City and also similar in most ways to Tegmark's universe ensemble.
They are the only options available in the problem. It is true that this means some optimality and convergence results from decision theory do not apply.
It's not exotic at all. It's just a compatibilist interpretation of the term "free will", and compatibilist positions form a pretty major class of views on the subject.
That doesn't address the question at all. That just says if the system is well modelled as having a utility function, then ... etc. Why should we have such high credence that the premise is true?
I expect that (1) is true in theory but false in practice, in much the same way that "we can train an AI without any reference to any sort of misalignment in the training material" is false in practice. A superintelligent thought-experiment being can probably do either, but we probably can't.
Along the same lines, I expect that (3) is not true. Bits of true information leak into fabricated information structures in all sorts of ways, and definitively excluding them from what you present to something that may be smarter than you are is likely to cost a lot more than presenting true information (in time, effort, or literal money).
Consider that the AI may ask for evidence in a form that you cannot easily fabricate. E.g. it may have internal knowledge from training or previous experience about how some given external person communicates, and ask them to broker the deal. How sure are you that you can fabricate data that matches the AI's model? If you are very sure, is that belief actually true? How much will it cost you if the AI detects that you are lying, and secretly messes up your tasks? If you have to run many instances in parallel and/or roll back and retry many times with different training and experience to get one that doesn't do anything like that, how much will that cost you in time and money? If you do get one that doesn't ask such things, is it also less likely to perform as you wish?
These costs have to be weighed against the cost of actually going ahead with the deal.
(2) isn't even really a separate premise; it's a restatement of (1).
(4) is pretty obviously false. You can't just consider this AI's behaviour; you also have to consider the behaviour of other actors in the system, including future AIs (possibly even this one!) that may find out about the deception or lack thereof.
I agree that even with free launch and no maintenance costs, you still don't get 50x. But it's closer than it looks.
On Earth, to get reliable self-contained solar power we need batteries that cost a lot more than the solar panels. A steady 1 kW load needs on the order of 15 kW of peak-rated solar panels plus around 50 kWh of battery capacity. Even that doesn't get 99% uptime, but it is enough for many purposes and probably adequate when connected to a continent-spanning grid with other power sources.
The same load in orbit would need about 1.5 kW of peak-rated panels and less than 1 kWh of battery capacity, with uptime dependent only on the reliability of the equipment. The equipment does need to be designed for space, but it doesn't need to be sturdy against wind, rain, and hailstones. It would have increased cooling costs, but transporting heat (e.g. via a coolant loop) into a radiator held edge-on to the Sun is highly effective: on the order of 1000 W/m^2 for a radiator averaging 35 °C.
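As a sanity check on the radiator figure, here is a minimal back-of-the-envelope sketch in Python. The Stefan-Boltzmann treatment of a flat plate radiating from both faces to deep space, the 0.9 emissivity, and the neglect of absorbed Earth IR and albedo are my assumptions; the panel and battery figures are the ones quoted above.

```python
# Back-of-the-envelope check of the figures above. Assumptions (mine):
# a flat radiator edge-on to the Sun radiating from both faces to a ~0 K
# background, emissivity 0.9, no absorbed Earth IR or albedo.

SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W / (m^2 K^4)

def radiator_flux_w_per_m2(temp_c: float, emissivity: float = 0.9, faces: int = 2) -> float:
    """Heat rejected per square metre by a flat plate radiating to deep space."""
    temp_k = temp_c + 273.15
    return faces * emissivity * SIGMA * temp_k ** 4

flux = radiator_flux_w_per_m2(35.0)
print(f"radiator rejection at 35 C:   {flux:.0f} W/m^2")   # ~920 W/m^2, i.e. order 1000
print(f"radiator area for 1 kW load:  {1000 / flux:.2f} m^2")

# Rough Earth-vs-orbit sizing ratio for a steady 1 kW load, using the quoted figures.
earth_panels_kw, orbit_panels_kw = 15.0, 1.5
earth_battery_kwh, orbit_battery_kwh = 50.0, 1.0
print(f"panel ratio (Earth/orbit):    {earth_panels_kw / orbit_panels_kw:.0f}x")
print(f"battery ratio (Earth/orbit):  {earth_battery_kwh / orbit_battery_kwh:.0f}x or more")
```

Both faces radiating near 35 °C gives roughly 900-1000 W/m^2, i.e. about a square metre of radiator per kilowatt of waste heat.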
I don't think either of these possibilities is really justified. We don't necessarily know what capabilities are required to be an existential threat, and we probably don't even have a suitable taxonomy for classifying them that maps to real-world risk. What looks to us like conjunctive requirements may be more disjunctive than we think, or vice versa.
"Jagged" capabilities relative to humans are bad if the capability requirements are more disjunctive than we think, since we'll be lulled by low assessments in some areas that we think of as critical but actually aren't.
They're good if high risk requires more conjunctive capabilities than we think, especially if the AIs are jaggedly bad in the actually critical areas that we don't even know we should be measuring.
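A toy numerical illustration of that distinction (the capability areas and probabilities below are entirely hypothetical, chosen only to show the structure): the same jagged profile looks reassuring under a conjunctive threat model and alarming under a disjunctive one.

```python
# Toy model (all numbers hypothetical): how one jagged capability profile
# scores under a conjunctive vs. a disjunctive threat model.
from math import prod

# Hypothetical probabilities that the AI clears the danger bar in each
# capability area we happen to measure: one spike, several low scores.
jagged = {"persuasion": 0.95, "bio": 0.10, "cyber": 0.15, "autonomy": 0.10}

# Conjunctive model: danger requires clearing every bar at once.
p_conjunctive = prod(jagged.values())

# Disjunctive model: clearing any single bar is enough.
p_disjunctive = 1 - prod(1 - p for p in jagged.values())

print(f"conjunctive: {p_conjunctive:.3f}")  # ~0.001 -- looks reassuringly low
print(f"disjunctive: {p_disjunctive:.3f}")  # ~0.97  -- dominated by the one spike
```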
Did you look only at changes in median prices (capital gain), or did you include the rental income stream as well? You would need to make allowance for maintenance and various fees and taxes out of that income stream, but the net rental income usually still exceeds the capital gain.
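To make the decomposition concrete, here is a tiny sketch with purely illustrative numbers (placeholders, not market data): total return is capital gain plus net rental yield.

```python
# Illustrative decomposition of property returns (all figures hypothetical):
# total return = capital gain + (gross rental yield - holding costs).

price_growth = 0.03         # hypothetical annual growth in median price (capital gain)
gross_rental_yield = 0.05   # hypothetical annual rent / purchase price
holding_costs = 0.015       # maintenance, agent fees, rates, insurance, taxes

net_rental_yield = gross_rental_yield - holding_costs
total_return = price_growth + net_rental_yield

print(f"capital gain only: {price_growth:.1%}")      # 3.0%
print(f"net rental yield:  {net_rental_yield:.1%}")  # 3.5%
print(f"total return:      {total_return:.1%}")      # 6.5%
```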
This is a fridge horror scenario. The more I consider it, the creepier and more horrifying it gets.
So 8 billion people are going to find themselves in someone else's body of random age and sex, a large fraction unable to speak with the complete strangers around them, with no idea what has happened to their family, their existing moral framework ripped out of their minds and replaced with some external one that doesn't fit with their preexisting experiences, with society completely dysfunctional and only one person who knows what's going on?
Regardless of the details of whatever moral framework is chosen, that's an immense evil right there. If fewer than a billion people die before food infrastructure is restored, it will be a miracle.