[Responding to an old comment, I know, but I've only just found this discussion.]

Never mind special access protocols, you could make code unmodifiable (in a direct sense) by putting it in ROM. Of course, it could still be modified indirectly, by the AI persuading a human to change the ROM. Even setting aside that possibility, there's a more fundamental problem. You cannot guarantee that the code will have the expected effect when executed in the unpredictable context of an AGI. You cannot even guarantee that the code in question will be executed. Making the code unmodifiable won't achieve the desired effect if the AI bypasses it.

In any case, I think the whole discussion of an AI modifying its own code is rendered moot by the fuzziness of the distinction between code and data. Does the human brain have any code? Or are the contents just data? I think that question is too fuzzy to have a correct answer. An AGI's behaviour is likely to be greatly influenced by structures that develop over time, whether we call these code or data. And old structures need not necessarily be used.

AGIs are likely to be unpredictable in ways that are very difficult to control. Holden Karnofsky's attempted solution seems naive to me. There's no guarantee that programming an AGI his way will prevent agent-like behaviour. Human beings don't need an explicit utility function to be agents, and neither does an AGI. That said, if AGI designers do their best to avoid agent-like behaviour, it may reduce the risks.

P.S. Bayes Theorem is derived from a basic statement about conditional probability, such as the following:

P(S|T) = P(S&T)/P(T)

According to the Stanford Encyclopedia of Philosophy (SEP), this is usually taken as a "definition", not an axiom, and Bayesians usually give conditional probability some real-world significance by adding a Principle of Conditionalization. In that case it's the Principle of Conditionalization that requires justification in order to establish that Bayes Theorem is true in the sense that Bayesians require.
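
For reference, the step from that statement to Bayes Theorem is short. A sketch, writing the probability of S given T as P(S | T) and assuming P(S) and P(T) are both nonzero:

```latex
\begin{align*}
P(S \mid T) &= \frac{P(S \land T)}{P(T)}, \qquad P(T \mid S) = \frac{P(S \land T)}{P(S)} \\
\Rightarrow\quad P(S \land T) &= P(T \mid S)\, P(S) \\
\Rightarrow\quad P(S \mid T) &= \frac{P(T \mid S)\, P(S)}{P(T)}
\end{align*}
```

Nothing in these steps says how beliefs should change on learning T. That is the extra work done by the Principle of Conditionalization.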

The inferential method that solves the problems with frequentism — and, more importantly, follows deductively from the axioms of probability theory — is Bayesian inference.

You seem to be conflating Bayesian inference with Bayes Theorem. Bayesian inference is a method, not a proposition, so cannot be the conclusion of a deductive argument. Perhaps the conclusion you have in mind is something like "We should use Bayesian inference for..." or "Bayesian inference is the best method for...". But such propositions cannot follow from mathematical axioms alone.

Moreover, the fact that Bayes Theorem follows from certain axioms of probability doesn't automatically show that it's true. Axiomatic systems have no relevance to the real world unless we have established (whether explicitly or implicitly) some mapping of the language of that system onto the real world. Unless we've done that, the word "probability" as used in Bayes Theorem is just a symbol without relevance to the world, and to say that Bayes Theorem is "true" is merely to say that it is a valid statement in the language of that axiomatic system.

In practice, we are liable to take the word "probability" (as used in the mathematical axioms of probability) as having the same meaning as "probability" (as we previously used that word). That meaning has some relevance to the real world. But if we do that, we cannot simply take the axioms (and consequently Bayes Theorem) as automatically true. We must consider whether they are true given our meaning of the word "probability". But "probability" is a notoriously tricky word, with multiple "interpretations" (i.e. meanings). We may have good reason to think that the axioms of probability (and hence Bayes Theorem) are true for one meaning of "probability" (e.g. frequentist). But it doesn't automatically follow that they are also true for other meanings of "probability" (e.g. Bayesian).

I'm not denying that Bayesian inference is a valuable method, or that it has some sort of justification. But justifying it is not nearly so straightforward as your comment suggests, Luke.

[Re-post with correction]

Hi Luke,

I've questioned your metaethical views before (in your "desirist" days) and I think you're making similar mistakes now as then. But rather than rehash old criticisms I'd like to make a different point.

Since you claim to be taking a scientific or naturalized approach to philosophy, I would expect you to offer evidence in support of your position. Yet I see nothing here specifically identified as evidence, and very little that could be construed as evidence. I don't see how your approach here is significantly different from the intuition-based philosophical approaches that you've criticised elsewhere.

Some people who say "Stealing is wrong" are really just trying to express emotions: "Stealing? Yuck!" Others use moral judgments like "Stealing is wrong" to express commands: "Don't steal!" Still others use moral judgments like "Stealing is wrong" to assert factual claims, such as "stealing is against the will of God" or "stealing is a practice that usually adds pain rather than pleasure to the world."

How do you know this? Where's the evidence? I don't doubt that some people say, "Stealing is wrong because it's against the will of God". But where's the evidence that they use "Stealing is wrong" to mean "Stealing is against the will of God"?

But moral terms and value terms are about what we want.

How do you know? And this seems to contradict your claim above that some people use "Stealing is wrong" to mean "stealing is against the will of God". That's not about what we want. (I say that moral terms are primarily about obligations, not wants.)