

I have no special insight here but boring, cynical common sense suggests the following:

The big difference between now and the pre-ChatGPT era is that Google and a bunch of other massive competitors have woken up and want to blaze past OpenAI. For their part, OpenAI doesn't want there to be a perception that they have been overtaken, so will want to release on a fast enough schedule to be able to trump Google's latest and greatest. (Of course the arrival of something marketed as "GPT-5" tells us nothing about the true state of progress. The GPTs aren't natural kinds.)

You should be able to get it as a corollary of the lemma that given two disjoint convex subsets U and V of R^n (which are non-zero distance apart), there exists an affine function f on R^n such that f(u) > 0 for all u in U and f(v) < 0 for all v in V.

Our two convex sets are (1) the image of the simplex under the F_i : i = 1 ... n and (2) the "negative quadrant" of R^n (i.e. the set of points all of whose co-ordinates are non-positive).

an authoritative payoff matrix that X can't safely calculate xerself.

Why not? Can't the payoff matrix be "read off" from the "world program" (assuming X isn't just 'given' the payoff matrix as an argument)?

  1. Actually, this is an open problem so far as I know: show that if X is a Naive Decision Theory agent as above, with some analyzable inference module like a halting oracle, then there exists an agent Y written so that X cooperates against Y in a Prisoner's Dilemma while Y defects.

Let me just spell out to myself what would have to happen in this instance. For definiteness, let's take the payoffs in prisoner's dilemma to be $0 (CD), $1 (DD), $10 (CC) and $11 (DC).

Now, if X is going to co-operate and Y is going to defect then X is going to prove "If I co-operate then I get $0". Therefore, in order to co-operate, X must also prove the spurious counterfactual "If I defect then I get $x" for some negative value of x.

But suppose I tweak the definition of the NDT agent so that whenever it can prove (1) "if output = a then utility >= u" and (2) "if output != a then utility <= u" it will immediately output a. (And if several statements of the forms (1) and (2) have been proved then the agent searches for them in the order that they were proved.) Note that our agent will quickly prove "if output = 'defect' then utility >= $1". So if it ever managed to prove "if output = 'co-operate' then utility = $0" it would defect right away.

Since I have tweaked the definition, this doesn't address your 'open problem' (which I think is a very interesting one) but it does show that if we replace the NDT agent with something only slightly less naive, then the answer is that no such Y exists.

(We could replace Prisoner's Dilemma with an alternative game where each player has a third option called "nuclear holocaust", such that if either player opts for nuclear holocaust then both get (say) -$1, and ask the same question as in your note 2. Then even for the tweaked version of X it's not clear that no such Y exists.)

ETA: I'm afraid my idea doesn't work: The problem is that the agent will also quickly prove "if 'co-operate' then I receive at least $0." So if it can prove the spurious counterfactual "if 'defect' then receive -1" before proving the 'real' counterfactual "if 'co-operate' then receive 0" then it will co-operate.

We could patch this up with a rule that said "if we deduce a contradiction from the assumption 'output = a' then immediately output a" which, if I remember rightly, is Nesov's idea about "playing chicken with the inconsistency". Then on deducing the spurious counterfactual "if 'defect' then receive -1" the agent would immediately defect, which could only happen if the agent itself were inconsistent. So if the agent is consistent, it will never deduce this spurious counterfactual. But of course, this is getting even further away from the original "NDT".
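To make the tweaked rule and the ETA's failure case concrete, here is a minimal sketch. It is only a toy: the names `tweaked_agent`, `other` and the `(kind, action, value)` encoding of proved statements are my own assumptions, the "proofs" are just handed to the agent in a fixed order rather than found by a proof search, and the "chicken" rule is not modelled. With only two actions, "output != a" just means the other action was taken, so condition (2) becomes an upper bound on the other action's utility.

```python
ACTIONS = ("cooperate", "defect")


def other(action):
    """Return the other action in the two-action game."""
    return "defect" if action == "cooperate" else "cooperate"


def tweaked_agent(proofs):
    """Toy version of the tweaked NDT agent (hypothetical encoding).

    `proofs` is a sequence of (kind, action, value) triples, listed in the
    order in which they are proved:
      "lower": "if output = action then utility >= value"
      "upper": "if output = action then utility <= value"
      "exact": "if output = action then utility = value"
    The agent outputs a as soon as, for some u, it has proved both
    (1) "output = a  => utility >= u" and
    (2) "output != a => utility <= u".
    Returns None if the proofs run out first.
    """
    lower = {}  # action -> best lower bound proved so far
    upper = {}  # action -> best upper bound proved so far
    for kind, action, value in proofs:
        if kind in ("lower", "exact"):
            lower[action] = max(value, lower.get(action, value))
        if kind in ("upper", "exact"):
            upper[action] = min(value, upper.get(action, value))
        for a in ACTIONS:
            b = other(a)
            # A suitable u exists iff upper bound for b <= lower bound for a.
            if a in lower and b in upper and upper[b] <= lower[a]:
                return a
    return None


# Original hope: "defect => at least $1" is proved quickly, so proving
# "co-operate => $0" triggers immediate defection.
hoped = tweaked_agent([("lower", "defect", 1), ("exact", "cooperate", 0)])

# ETA's failure case: the spurious "defect => -1" arrives first.
spurious = tweaked_agent([
    ("lower", "cooperate", 0),
    ("exact", "defect", -1),
    ("exact", "cooperate", 0),
])
print(hoped, spurious)
```

In the second run the agent commits to co-operating as soon as the spurious counterfactual appears, which is exactly the gap the "chicken" rule is meant to close.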


[general comment on sequence, not this specific post.]

You have such a strong intuition that no configuration of classical point particles and forces can ever amount to conscious awareness, yet you don't immediately generalize and say: 'no universe capable of exhaustive description by mathematically precise laws can ever contain conscious awareness'. Why not? Surely whatever weird and wonderful elaboration of quantum theory you dream up, someone can ask the same old question: "why does this bit that you've conveniently labelled 'consciousness' actually have consciousness?"

So you want to identify 'consciousness' with something ontologically basic and unified, with well-defined properties (or else, to you, it doesn't really exist at all). Yet these very things would convince me that you can't possibly have found consciousness given that, in reality, it has ragged, ill-defined edges in time, space, even introspective content.

Stepping back a little, it strikes me that the whole concept of subjective experience has been carefully refined so that it can't possibly be tracked down to anything 'out there' in the world. Kant and Wittgenstein (among others) saw this very clearly. There are many possible conclusions one might draw - Dennett despairs of philosophy and refuses to acknowledge 'subjective experience' at all - but I think people like Chalmers, Penrose and yourself are on a hopeless quest.

The comprehension axiom schema (or any other construction that can be used by a proof checker algorithm) isn't enough to prove all the statements people consider to be inescapable consequences of second-order logic.

Indeed, since the second-order theory of the real numbers is categorical, and since it can express the continuum hypothesis, an oracle for second-order validity would tell us that either CH or ¬CH is 'valid'.

("Set theory in sheep's clothing".)

But the bigger problem is that we can't say exactly what makes a "silly" counterfactual different from a "serious" one.

Would it be naive to hope for a criterion that roughly says: "A conditional P ⇒ Q is silly iff the 'most economical' way of proving it is to deduce it from ¬P or else from Q." Something like: "there exists a proof of ¬P or of Q which is strictly shorter than the shortest proof of P ⇒ Q"?

A totally different approach starts with the fact that your 'lemma 1' could be proved without knowing anything about A. Perhaps this could be deemed a sufficient condition for a counterfactual to be serious. But I guess it's not a necessary condition?

Suppose we had a model M that we thought described cannons and cannon balls. M consists of a set of mathematical assertions about cannons

In logic, the technical terms 'theory' and 'model' have rather precise meanings. If M is a collection of mathematical assertions then it's a theory rather than a model.

formally independent of the mathematical system A in the sense that the addition of some axiom A0 implies Q, while the addition of its negation, ~A0, implies ~Q.

Here you need to specify that adding A0 or ~A0 doesn't make the theory inconsistent, which is equivalent to just saying: "Neither Q nor ~Q can be deduced from A."

Note: if by M you had actually meant a model, in the sense of model theory, then for every well-formed sentence s, either M satisfies s or M satisfies ~s. But then models are abstract mathematical objects (like 'the integers'), and there's usually no way to know which sentences a model satisfies.

Perhaps a slightly simpler way would be to 'run all algorithms simultaneously' such that each one is slowed down by a constant factor. (E.g. at time t = (2x + 1) * 2^n, we do step x of algorithm n.) When algorithms terminate, we check (still within the same "process" and hence slowed down by a factor of 2^n) whether a solution to the problem has been generated. If so, we return it and halt.

ETA: Ah, but the business of 'switching processes' is going to need more than constant time. So I guess it's not immediately clear that this works.
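For what it's worth, the schedule t = (2x + 1) * 2^n can be sketched as follows. This is only an illustration under my own assumptions: `decode`, `dovetail`, `make_algorithm` and `is_solution` are hypothetical names, each "algorithm" is a Python generator whose every `next()` counts as one step, and the cost of switching between algorithms (the worry in the ETA) is simply ignored.

```python
def decode(t):
    """Return (x, n) with t == (2*x + 1) * 2**n, for an integer t >= 1."""
    n = 0
    while t % 2 == 0:
        t //= 2
        n += 1
    return (t - 1) // 2, n


def dovetail(make_algorithm, is_solution, max_time):
    """Run algorithms 0, 1, 2, ... interleaved on the schedule above.

    make_algorithm(n) must return a generator for algorithm n; each next()
    is one step and the generator's return value is its output.
    Returns (solution, t) for the first output accepted by is_solution,
    or None if nothing is found by time max_time.
    """
    running = {}      # n -> generator that has been started
    finished = set()  # algorithms that have already terminated
    for t in range(1, max_time + 1):
        x, n = decode(t)
        if n in finished:
            continue
        if n not in running:
            # First visit to algorithm n happens at t == 2**n, where x == 0.
            running[n] = make_algorithm(n)
        try:
            next(running[n])  # do step x of algorithm n
        except StopIteration as stop:
            finished.add(n)
            del running[n]
            if is_solution(stop.value):
                return stop.value, t
    return None


def make_algorithm(n):
    """Toy algorithm n: idle for n steps, then output n."""
    def run():
        for _ in range(n):
            yield
        return n
    return run()


# Algorithm 3 needs steps x = 0..3, and step 3 falls at t = 7 * 2**3 = 56.
print(dovetail(make_algorithm, lambda value: value == 3, max_time=100))
```

Algorithm n gets one step in every 2^(n+1) ticks, so in this idealised accounting each one is slowed by a constant factor that depends only on n.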

I agree that definitions (and expansions of the language) can be useful or counterproductive, and hence are not immune from criticism. But still, I don't think it makes sense to play the Bayesian game here and attach probabilities to different definitions/languages being correct. (Rather like how one can't apply Bayesian reasoning in order to decide between 'theory 1' and 'theory 2' in my branching vs probability post.) Therefore, I don't think it makes sense to calculate expected utilities by taking a weighted average over each of the possible stances one can take in the mind-body problem.
