azergante

Comments
Loki zen's Shortform
azergante1d127

Maybe start with the 3Blue1Brown series on neural networks? It's still math, but it has great visualizations.

Why is LW not about winning?
azergante2d20

My comment above (setting aside the main post for a moment) does not claim that it's easy to hire alignment researchers. It claims that "you can't use money to hire experts because you can't reliably identify them" is the wrong causal model for why hiring for alignment is difficult, because it's false: if that causal model were true, you'd expect no company to be able to hire experts, which is not the case. Maybe this is nitpicking, but to me something like "AI alignment is in its infancy, so it's harder to hire for than other fields" would be more convincing.

your initial post was built on a mistaken premise

I am indeed missing a lot of background on what has been discussed and tried so far; in retrospect, most of what I have read on LW is Rationality: A-Z and the Codex, plus some of the posts in my feed.

If the library had a section like "A Short History of AI Alignment", I probably would have read it. Maybe pinning something like that somewhere visible would help new users get up to speed on the subject more reliably? I do understand that this would be a big time investment, though.

Why is LW not about winning?
azergante3d60

I read both of the posts you link to; I interpret their main claim as "you can't use money to hire experts because you can't reliably identify them".

But the reality is that knowledge companies do manage to hire experts and acquire expertise. This implies that alignment research organizations should be able to do the same, and I think that is enough to make the strong version of the claim irrelevant.

I agree with a weaker version, which is that some amount of money is wasted because hiring is unreliable, but again this is the same for all knowledge companies, and society already has many mechanisms, such as reputation, diplomas, and tests, to navigate these issues.

Edit: your argument about Jeff Bezos rings false to me

Last I heard, Jeff Bezos was the official richest man in the world. He can buy basically anything money can buy. But he can’t buy a cure for cancer. Is there some way he could spend a billion dollars to cure cancer in five years? Probably, yes. But Jeff Bezos does not know how to do that. Even if someone somewhere in the world does know how to turn a billion dollars into a cancer cure in five years, Jeff Bezos cannot distinguish that person from the thousands of other people who claim to know (and may even believe it themselves) but are wrong.

I was not sure how to articulate this, so I pasted the opening of the essay into Claude; I think it did a good job of pointing out the deficiencies. Here is the full transcript:

LLM transcript

Me: evaluate that argument with critical thinking """First Puzzle Piece By and large, the President of the United States can order people to do things, and they will do those things. POTUS is often considered the most powerful person in the world. And yet, the president cannot order a virus to stop replicating. The president cannot order GDP to increase. The president cannot order world peace. Are there orders the president could give which would result in world peace, or increasing GDP, or the end of a virus? Probably, yes. Any of these could likely even be done with relatively little opportunity cost. Yet no president in history has known which orders will efficiently achieve these objectives. There are probably some people in the world who know which orders would efficiently increase GDP, but the president cannot distinguish them from the millions of people who claim to know (and may even believe it themselves) but are wrong. Last I heard, Jeff Bezos was the official richest man in the world. He can buy basically anything money can buy. But he can’t buy a cure for cancer. Is there some way he could spend a billion dollars to cure cancer in five years? Probably, yes. But Jeff Bezos does not know how to do that. Even if someone somewhere in the world does know how to turn a billion dollars into a cancer cure in five years, Jeff Bezos cannot distinguish that person from the thousands of other people who claim to know (and may even believe it themselves) but are wrong. When non-experts cannot distinguish true expertise from noise, money cannot buy expertise. Knowledge cannot be outsourced; we must understand things ourselves."""

Claude: This argument presents an interesting paradox about power and knowledge, but it contains several logical flaws and oversimplifications that weaken its conclusions.

Strengths of the argument: The core insight is valuable - that raw power (political authority, wealth) doesn't automatically translate into the ability to solve complex problems. The distinction between having resources and knowing how to deploy them effectively is important. The argument correctly identifies that information asymmetry creates genuine challenges for decision-makers.

Critical weaknesses:

  1. False dichotomy on expertise identification: The argument assumes leaders must either perfectly identify true experts or remain completely helpless. In reality, there are established methods for evaluating expertise: peer review, track records, institutional credentials, reproducible results, and consensus among qualified professionals. While imperfect, these mechanisms work reasonably well - we don't see random people performing surgery or designing bridges.

  2. Ignoring existing institutional solutions: The argument overlooks how real-world leaders actually operate. Presidents don't personally evaluate every expert claim - they rely on established institutions (CDC, Federal Reserve, Pentagon) that have developed systematic approaches to vetting expertise. Bezos doesn't personally screen cancer researchers - he funds established institutions and relies on scientific peer review processes.

  3. Overstated claims about knowledge and outsourcing: The conclusion that "knowledge cannot be outsourced" is demonstrably false. We successfully outsource knowledge constantly - when you visit a doctor, fly in an airplane, or use a smartphone, you're benefiting from outsourced expertise. The key is having reliable mechanisms for credentialing and accountability.

  4. Survivorship bias in examples: The argument cherry-picks unsolved problems (cancer cure, world peace) while ignoring areas where wealth and power have successfully purchased expertise to solve complex problems - like developing COVID vaccines, creating global supply chains, or landing on the moon.

  5. Conflating different types of problems: The argument treats technical problems (virus replication), economic problems (GDP growth), and political problems (world peace) as equivalent, when they have fundamentally different characteristics and solution pathways.

Better framework: Rather than concluding that power is useless without perfect knowledge, a more nuanced view would recognize that effective leadership involves building systems and institutions that can reliably identify and deploy expertise, while acknowledging the inherent uncertainties and limitations in complex problem-solving.

The argument raises important questions about the relationship between power and knowledge, but its binary thinking and sweeping conclusions undermine its analytical value.

Why is LW not about winning?
azergante4d40

It turns out that, no, hiring lots of other people is not actually how you win when the problem is hard.

Using money to hire people is only one strategy among many. If it does not work, then maybe starting an alignment course at your local university does? Maybe talking to your mayor is useful? What about becoming the mayor? But how would we know, since LW mostly does not talk about these things?

These are just examples; surely among all the possible strategies we can find a few that are worth doing and that give leverage over direct research? (LessWrong itself would be an example.) This seems worth exploring.

Why is LW not about winning?
azergante4d40

Note: I have not read the linked posts yet; I will do that later.

a) I don't see many posts to the tune of "What do you think of [some strategy that doesn't involve direct research on alignment]?" (maybe gaining influence in your local town hall, or university, etc.); perhaps you can point me to such posts? In the same vein, I don't see a lot of experience reports like "I paused alignment research and went down this other route instead, hoping for an efficiency multiplier. Here's what worked and here's what didn't".

I am not saying these posts never happen, but given the possible leverage, I would expect to see more of them. I think it's fair to say that there are a lot more posts about direct research than about other (leveraged) ways to approach the issue. For example, here is my LW feed: there are 3.5 posts about alignment (highlighted), 3.5 about AI, and none about other strategies (the post "Lessons from the Iraq War for AI policy" is still pretty far from that, as it does not discuss something like a career path or actions an individual can take).

You say these have happened a lot, but I don't see them discussed much on LW. LW itself can be characterized as Eliezer's very successful leveraged strategy to bring more people into alignment research, so maybe leveraged strategies end up being discussed more outside LW? In any case, this at least shows that some leveraged strategies work, so maybe they are worth discussing more.

b) I think this can be summarized as "we don't know how to put more resources into alignment without this having (sometimes very) negative unintended outcomes". Okay, fair enough, but this seems like a huge issue, and maybe there should be more posts about exploring and finding leveraged strategies that won't backfire. The same goes for power-seeking: there is a reason power is an instrumental goal of ASI, namely that it is useful for accomplishing any goal, so it's important to figure out good ways to get and use power.

Now maybe your answer is something like "we tried, it didn't work out that well, so we re-prioritized accordingly". But it's not obvious to me that we shouldn't try more and develop a better map of all the available options. Anyway, I will read up on what you linked; if you have more links that you think would clarify what was tried and what did or didn't work, don't hesitate to share them.

Rationality is not (exactly) Winning
azergante4d10

Rationality !== Winning

I do wish there were more LessWrong content focused on how to win, for real.

I mean, LessWrong users seem to care a lot about alignment, but if you really want to solve that problem, and you want to be efficient about it, it seems super obvious that there are better strategies than researching the problem yourself. For example: don't spend 3+ years on a PhD, but instead get 10 other people to work on the issue; that already 10xes your efficiency.

My point is that, thinking about it rationally, it looks like there are ways to get orders of magnitude more progress on alignment by doing something other than researching the problem yourself.

So I would expect a LW focused on winning to talk a lot more about agency and power, and how to use these for good.

Rationality is not (exactly) Winning
azergante4d10

typically "!=" means "X not equal to Y" and "!==" means "X not exactly equal to Y"

This is a tangent, but PHP and JavaScript are the only two programming languages (among the top 20 by number of users) that have a !== operator, and they implement this extra operator to disable the type conversions automatically performed by !=, as illustrated in the snippet below.
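
As a quick illustration, here is a minimal JavaScript sketch of the difference:

    // != coerces its operands before comparing, which can mask type mistakes
    console.log(0 != "");   // false: "" is coerced to the number 0
    console.log(0 != "0");  // false: "0" is coerced to the number 0

    // !== never coerces: operands of different types are simply not equal
    console.log(0 !== "");  // true
    console.log(0 !== "0"); // true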

Common wisdom is that these automatic conversions are a source of bugs, and that attaching them to the shorter, more widespread != is bad design. All this to conclude that when I see !== it does trigger a slight "eww, bad" reaction.

Foom & Doom 1: “Brain in a box in a basement”
azergante9d11

1.3 A far-more-powerful, yet-to-be-discovered, “simple(ish) core of intelligence”

1.3.1 Existence proof: the human cortex

The brain is not simple, and I don't expect to find it simple once we understand how it works.

There is an incoherence in these sections: you justify the existence of a "core of intelligence" simpler than LLMs by pointing at brains that are messier than LLMs.

Counterarguments to the basic AI x-risk case
azergante15d10

FWIW this post made me update in favor of AI X-risk, as I had not read counterarguments until now and expected stronger ones.

Consider chilling out in 2028
azergante24d132

How about we learn to smile while saving the world? Saving the world doesn't strike me as strictly incompatible with having fun, so let's do both? :)

The post proposes that LessWrong could tackle alignment in more skillful ways, which is a wholesome thought, but I feel that the post also casts doubt on the project of alignment itself; I want to push back on that.

It won't become less important to prevent the creation of harmful technologies in 2028, or in any year for that matter. Timelines and predictions don't feel super relevant here.

We know that AGI can be dangerous if created without proper understanding, and that fact does not change with time or timelines, so LW should still aim for:

  1. An international framework that restricts AGI creation and ensures safety, just like for other large impact technologies
  2. Alignment research to eventually reap the benefits of aligned AGI, but with less pressure as long as point 1 stands

If the current way of advancing towards the goal is sub-optimal, giving up on the goal is not the only answer: we can also change the way we go about it. Since getting AGI right is important, not giving up and changing how we go about it seems like the better option (all this predicated on the snobbish and doomish depictions in the post being accurate).

Posts

14 · Why is LW not about winning? (Q) · 4d · 19
4 · What AI apps are surprisingly absent given current capabilities? (Q) · 1mo · 8
4 · ALICE detects the conversion of lead into gold at the LHC · 2mo · 0
4 · Can I publish songs derived from the Sequences' posts on YouTube? (Q) · 2mo · 2
1 · LLM-based Fact Checking for Popular Posts? · 3mo · 2
1 · azergante's Shortform · 7mo · 7
2 · A Coordination Cookbook? (Q) · 8mo · 0