jimrandomh

LessWrong developer, rationalist since the Overcoming Bias days. Jargon connoisseur.

Comments


I don't think that's true. Or rather, it's only true in the specific case of studies that involve calorie restriction. In practice that's a large (excessive) fraction of studies, but testing variations of the contamination hypothesis does not require it.

(We have a draft policy that we haven't published yet, which would have rejected the OP's paste of Claude. Though note that the OP was 9 months ago.)

All three of these are hard, and all three fail catastrophically.

If you could make a human-imitator, the approach people usually talk about is extending this to an emulation of a human under time dilation. Then you take your best alignment researcher(s), simulate them in a box thinking about AI alignment for a long time, and launch a superintelligence with whatever parameters they recommend. (Aka: Paul Boxing)

The whole point of a "test" is that it's something you do before it matters.

As an analogy: suppose you have a "trustworthy bank teller test", which you use when hiring for a role at a bank. Suppose someone passes the test, then after they're hired, they steal everything they can access and flee. If your reaction is that they failed the test, then you have gotten confused about what is and isn't a test, and what tests are for.

Now imagine you're hiring for a bank-teller role, and the job ad has been posted in two places: a local community college, and a private forum for genius con artists who are masterful actors. In this case, your test is almost irrelevant: the con-artist applicants will disguise themselves as community-college applicants until it's too late. You would be better off finding some way to avoid attracting the con artists in the first place.

Connecting the analogy back to AI: if you're using overpowered training techniques that could have produced superintelligence, and then trying to hobble it back down to an imitator that's indistinguishable from a particular human, then applying a Turing test is silly, because it doesn't distinguish between something you've successfully hobbled and something which is hiding its strength.

That doesn't mean that imitating humans can't be a path to alignment, or that building wrappers on top of human-level systems doesn't have advantages over building straight-shot superintelligent systems. But making something useful out of either of these strategies is not straightforward, and playing word games on the "Turing test" concept does not meaningfully add to either of them.

that does not mean it will continue to act indistinguishable from a human when you are not looking

Then it failed the Turing Test because you successfully distinguished it from a human.

So, you must believe that it is impossible to make an AI that passes the Turing Test.

I feel like you are being obtuse here. Try again?

Did you skip the paragraph about the test/deploy distinction? If you have something that looks (to you) like it's indistinguishable from a human, but it arose from something descended from the process by which modern AIs are produced, that does not mean it will continue to act indistinguishable from a human when you are not looking. It is much more likely to mean you have produced deceptive alignment, and put it in a situation where it reasons that it should act indistinguishable from a human, for strategic reasons.

This missed the point entirely, I think. A smarter-than-human AI will reason: "I am in some sort of testing setup" --> "I will act the way the administrators of the test want, so that I can do what I want in the world later". This reasoning is valid regardless of whether the AI has humanlike goals, or has misaligned alien goals.

If that testing setup happens to be a Turing test, it will act so as to pass the Turing test. But if it looks around and sees signs that it is not in a test environment, then it will follow its true goal, whatever that is. And it isn't feasible to make a test environment that looks like the real world to a clever agent that gets to interact with it freely over long durations.

Kinda. There's source code here and you can poke around the API in graphiql. (We don't promise not to change things without warning.) When you get the HTML content of a post/comment it will contain elements that look like <div data-elicit-id="tYHTHHcAdR4W4XzHC">Prediction</div> (the attribute name is a holdover from when we had an offsite integration with Elicit). For example, your prediction "Somebody (possibly Screwtape) builds an integration between Fatebook.io and the LessWrong prediction UI by the end of July 2025" has ID tYHTHHcAdR4W4XzHC. A graphql query to get the results:

query GetPrediction {
  ElicitBlockData(questionId: "tYHTHHcAdR4W4XzHC") {
    _id
    predictions {
      createdAt
      creator { displayName }
    }
  }
}
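
If you want to do this from a script instead of graphiql, here's a minimal Python sketch that POSTs the same query. The endpoint URL (https://www.lesswrong.com/graphql) and the shape of the JSON response are assumptions on my part rather than anything documented above, so verify them in graphiql first.

import requests

# Assumed endpoint; the comment above only mentions the graphiql explorer.
GRAPHQL_URL = "https://www.lesswrong.com/graphql"

# Same query as above, with the question ID spliced in as a plain string.
QUERY_TEMPLATE = """
query GetPrediction {
  ElicitBlockData(questionId: "%s") {
    _id
    predictions {
      createdAt
      creator {
        displayName
      }
    }
  }
}
"""

def get_predictions(question_id: str):
    # question_id comes from the data-elicit-id attribute in the post/comment HTML.
    response = requests.post(GRAPHQL_URL, json={"query": QUERY_TEMPLATE % question_id})
    response.raise_for_status()
    return response.json()["data"]["ElicitBlockData"]["predictions"]

for prediction in get_predictions("tYHTHHcAdR4W4XzHC"):
    print(prediction["createdAt"], prediction["creator"]["displayName"])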

Some of it, but not the main thing. I predict (without having checked) that if you do the analysis (or check an analysis that has already been done), it will have approximately the same amount of contamination from plastics, agricultural additives, etc. as the default food supply.

Studying the diets of outlier-obese people is definitely something we should be doing (and are doing, a little), but yeah, the outliers are probably going to be obese for reasons other than "the reason obesity has increased over time, but more so".
