

The AI That Pretends To Be Human

Here's a scenario that doesn't seem completely implausible. Suppose Bob is someone whose public key is easily available on the internet. The first AI will read things on the internet and output a message. Some of the message will get put on the public internet. Bob suspects that the AI might have secretly sent him a message (e.g. giving him advice on which stocks to buy). So he tries using his private key to decrypt some of the AI's output (e.g. the lowest-order bits in some images the AI has output).

Knowing that Bob (or someone else like Bob) will likely do this, the AI (which has access to Bob's public key, due to reading the internet) encrypts a message with Bob's public key, and puts the ciphertext in the lowest-order bits of some image. The ciphertext looks like random noise to the second AI. Maybe the plaintext contains both actual stock tips and some dangerous messages that trick Bob into running an unrestrained copy of the first AI (I'm assuming the AI is good at psychological manipulation).
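To make the "lowest-order bits" step concrete, here is a minimal sketch of how bytes can be written into, and read back from, the low bit of each pixel byte. It is only an illustration under stated assumptions: `embed_lsb` and `extract_lsb` are hypothetical helper names, the "image" is a flat byte string rather than a real image format, and random bytes stand in for the ciphertext (no actual public-key encryption is performed).

```python
import os

def embed_lsb(pixels: bytes, payload: bytes) -> bytes:
    """Write the payload's bits into the lowest-order bit of successive pixel bytes."""
    bits = [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("payload too large for this cover image")
    out = bytearray(pixels)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & 0xFE) | bit  # clear the low bit, then set it to the payload bit
    return bytes(out)

def extract_lsb(pixels: bytes, n_bytes: int) -> bytes:
    """Read n_bytes back out of the lowest-order bits, most significant bit first."""
    out = bytearray()
    for i in range(n_bytes):
        byte = 0
        for j in range(8):
            byte = (byte << 1) | (pixels[8 * i + j] & 1)
        out.append(byte)
    return bytes(out)

# Assumption: random bytes stand in for a ciphertext, which should look like noise.
ciphertext = os.urandom(32)
cover = os.urandom(1024)          # hypothetical flat array of pixel bytes
stego = embed_lsb(cover, ciphertext)
recovered = extract_lsb(stego, len(ciphertext))
```

Each pixel byte changes by at most 1, which is why the modification is hard to notice without knowing where to look and how to decrypt what is found.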

And just the AI asking to use encryption, or giving away a public key

In this scenario, the AI doesn't need to specifically "ask to use encryption" (it just does the encryption), and it doesn't need to give away a public key (it just needs to read Bob's public key on the internet).

This does make the AI's task harder, so I suppose you could consider this technique a form of AI boxing, but, like "ordinary" AI boxing, I only expect it to scale up to controlling AIs of a limited capability level.

The AI That Pretends To Be Human

The trouble is that it's much easier to create a steganographic message (e.g. encrypting a message using a particular public key) than to detect it (which requires knowing the private key or otherwise breaking the encryption). So in this case "much more computing power" has to mean "exponentially more computing power".

The AI That Pretends To Be Human

You might be interested in reading:

This prevents the first AI from doing evil things with its output. If it tries to insert complicated infohazards or subagents into its output stream, it will be easily detected as an AI. Instead it needs to mimic humans as closely as possible.

Note that steganography is still a problem. For example, the first AI could insert a message to a human attacker in some of its output, which just looks random to the second AI.

Steelmanning AI risk critiques

One of the most common objections I've seen is that we're too far from getting AGI to know what AGI will be like, so we can't productively work on the problem without making a lot of conjunctive assumptions -- e.g. see this post.

FAI Research Constraints and AGI Side Effects

And before I could scribble a damned thing, Calude went and solved it six months ago. The Halting Problem, I mean.

Cool. If I understand the result correctly, it says that if you run a random program for some number of steps and it doesn't halt, then (depending on the exact numbers) it is unlikely to halt when run on a supercomputer either, because halting times have low density. So almost all programs either halt quickly or run for a really, really long time. Is this correct? This doesn't quite let you approximate Chaitin's omega, but it's interesting that you can approximate a bounded variant of Chaitin's omega (such as the percentage of Turing machines that halt within 10^50 steps). I can see how this would let you solve the halting problem well enough when you live in a bounded universe.
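A rough way to get a feel for the "bounded variant" is to sample small machines and count how many halt within a step bound. The sketch below is a toy under loud assumptions: it uses random 2-symbol transition tables (not the program distribution from the result being discussed), and `random_machine`, `halting_time`, and `halting_fractions` are names made up for this illustration.

```python
import random

HALT = -1  # sentinel "next state" meaning the machine stops

def random_machine(n_states, rng):
    """Random transition table: (state, symbol) -> (write, move, next_state)."""
    table = {}
    for state in range(n_states):
        for symbol in (0, 1):
            next_state = rng.choice([HALT] + list(range(n_states)))
            table[(state, symbol)] = (rng.randint(0, 1), rng.choice((-1, 1)), next_state)
    return table

def halting_time(table, max_steps):
    """Run from a blank tape; return the step at which it halts, or None if it hasn't by max_steps."""
    tape, pos, state = {}, 0, 0
    for step in range(1, max_steps + 1):
        write, move, next_state = table[(state, tape.get(pos, 0))]
        tape[pos] = write
        pos += move
        if next_state == HALT:
            return step
        state = next_state
    return None

def halting_fractions(n_machines=2000, n_states=4, bounds=(10, 100, 1000), seed=0):
    """Fraction of sampled machines that halt within each step bound."""
    rng = random.Random(seed)
    limit = max(bounds)
    times = [halting_time(random_machine(n_states, rng), limit) for _ in range(n_machines)]
    return {b: sum(t is not None and t <= b for t in times) / n_machines for b in bounds}

fractions = halting_fractions()
```

If the intuition above is right, the fraction should rise quickly for small bounds and then barely move as the bound grows, since machines that have not halted early rarely halt later.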

FAI Research Constraints and AGI Side Effects

It's not a specific programming language; I guess it's meant to look like Church. It could be written as:

(define a (p))
(foreach (range n) (lambda (i)
  (define x (x-prior))
  (factor (log (U x a)))))

Well so does the sigmoided version

It samples an action proportional to p(a) E[sigmoid(U) | a]. This can't be written as a function of E[U | a].

FAI Research Constraints and AGI Side Effects

Due to the planning model, the successor always has some nonzero probability of not pressing the button, so (depending on how much you value pressing it later) it'll be worth it to press it at some point.

FAI Research Constraints and AGI Side Effects

When you use e-raised-to-the alpha times expectation, is that similar to the use of an exponential distribution in something like Adaboost, to take something like odds information and form a distribution over assorted weights?

I'm not really that familiar with AdaBoost. The planning model just reflects the fact that bounded agents don't always take the maximum expected utility action. The higher alpha is, the more bias there is towards good actions, but the more expensive the computation potentially is (e.g. if you use rejection sampling).
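The trade-off can be sketched numerically. This assumes the planning model picks actions with probability proportional to prior(a) * exp(alpha * E[U | a]), which is my reading of the "e-raised-to-the alpha times expectation" description; the action names, priors, and utilities below are made up for illustration.

```python
import math

def softmax_policy(prior, expected_utility, alpha):
    """Action distribution proportional to prior(a) * exp(alpha * E[U | a])."""
    weights = {a: prior[a] * math.exp(alpha * expected_utility[a]) for a in prior}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}

def acceptance_rate(prior, expected_utility, alpha):
    """Rejection sampling: propose a ~ prior, accept with prob exp(alpha * (E[U | a] - max E[U])).

    Returns the overall probability that a single proposal is accepted.
    """
    best = max(expected_utility.values())
    return sum(prior[a] * math.exp(alpha * (expected_utility[a] - best)) for a in prior)

# Hypothetical toy problem: four actions, uniform prior.
prior = {"a1": 0.25, "a2": 0.25, "a3": 0.25, "a4": 0.25}
expected_utility = {"a1": 0.1, "a2": 0.4, "a3": 0.7, "a4": 1.0}

for alpha in (1.0, 10.0, 100.0):
    policy = softmax_policy(prior, expected_utility, alpha)
    rate = acceptance_rate(prior, expected_utility, alpha)
    print(f"alpha={alpha}: P(best)={policy['a4']:.3f}, expected proposals per sample={1 / rate:.2f}")
```

With a uniform prior the expected number of proposals is capped by the number of actions, so the cost only blows up when good actions are rare under the prior.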

Since any realistic agent needs to be able to handle whatever its environment throws at it, it seems to follow that a realistic agent needs some resource-rational way to handle nonprovable nonhalting.

Ah, that makes sense! I think I see how "trading computational power for algorithmic information" makes sense in this framework.

FAI Research Constraints and AGI Side Effects

Your model selects an action proportional to p(a) E[sigmoid(U) | a], whereas mine selects an action proportional to p(a) e^E[U | a]. I think the second is better, because it actually treats actions the same if they have the same expected utility. The sigmoid version will not take very high utilities or very low utilities into account much.
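A small worked example of the difference, with made-up numbers: two actions that have the same expected utility but different spreads. The exponential rule treats them identically, while the sigmoid rule does not, which is also why the sigmoid rule can't be written as a function of E[U | a].

```python
import math

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

# Hypothetical outcome distributions: lists of (probability, utility) pairs.
outcomes = {
    "safe": [(1.0, 1.0)],               # always utility 1
    "risky": [(0.1, 10.0), (0.9, 0.0)],  # also expected utility 1
}
prior = {"safe": 0.5, "risky": 0.5}

def expect(f, action):
    """E[f(U) | action] under the toy outcome distribution."""
    return sum(p * f(u) for p, u in outcomes[action])

def normalize(weights):
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}

# Proportional to p(a) * e^E[U | a]
exp_policy = normalize({a: prior[a] * math.exp(expect(lambda u: u, a)) for a in prior})
# Proportional to p(a) * E[sigmoid(U) | a]
sigmoid_policy = normalize({a: prior[a] * expect(sigmoid, a) for a in prior})

print("exp:", exp_policy)
print("sigmoid:", sigmoid_policy)
```

The sigmoid rule favors "safe" here because the rare utility of 10 is squashed to just under 1.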

Btw it's also possible to select an action proportional to E[U | a]^n:

query {
  a ~ p()
  for i = 1 to n
    x_i ~ P(x)
    factor(log U(x_i, a))
}
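To check that this query really yields E[U | a]^n, here is a brute-force sketch on a tiny discrete problem. It assumes the x_i are drawn independently and that U is strictly positive (so the log is defined); the distributions and utility table are invented for the example.

```python
from itertools import product

# Hypothetical toy problem.
p_a = {"a1": 0.5, "a2": 0.5}
p_x = {0: 0.3, 1: 0.7}
U = {(0, "a1"): 1.0, (1, "a1"): 2.0, (0, "a2"): 3.0, (1, "a2"): 0.5}

def normalize(weights):
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}

def posterior_by_enumeration(n):
    """Marginal over a, summing the query's weight over every (x_1, ..., x_n)."""
    weights = {}
    for a, pa in p_a.items():
        total = 0.0
        for xs in product(p_x, repeat=n):
            w = pa
            for x in xs:
                w *= p_x[x] * U[(x, a)]  # factor(log U) multiplies the weight by U
            total += w
        weights[a] = total
    return normalize(weights)

def posterior_closed_form(n):
    """p(a) * E[U | a]^n, normalized."""
    expected = {a: sum(p_x[x] * U[(x, a)] for x in p_x) for a in p_a}
    return normalize({a: p_a[a] * expected[a] ** n for a in p_a})

for n in (1, 2, 3):
    print(n, posterior_by_enumeration(n), posterior_closed_form(n))
```

The two agree because the sum over independent x_i factorizes into a product of n identical expectations.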