(a sharp distinction from Sniffles the teacup poodle. I don't care if you think you're happy, this would not please the prowling wolves of the stone age.)
This is indeed very much the obvious failure mode! Discovering that an alien species has bred a group of humans into what a pug is to a wolf would be absolutely horrific.
Moreover, the path between utopia and "Lovecraftian horror" seems pretty fragile? I don't know exactly what property cats had that made the shoggoth mostly take the good path for them (with exceptions like the flat-faced Persians and hairless Sphynxes), and it's plausible it was just a lucky combination of minor factors (harder to selectively breed, a different social niche, different types of people liking cats) that won't be stable or generalize in extremis.
People generally care more about furthering personal pleasure and minimizing personal pain than about the pleasure and pain of others; but this is because personal pleasure was a straightforwardly good heuristic for evolution to use when maximizing genetic fitness in the ancestral environment, where there weren't many sudden out-of-distribution things (like contraceptives) that could derail it.
I assume a more strongly-optimized intelligent being would have an increasingly tight correlation between its internal utility and the state of the external world, as it fits whatever goal it was optimized for better. In that case it should more readily collaborate with its clone.
This holds especially if it gets optimized alongside other instances of itself, so that "cloning" is no longer a weird out-of-distribution event; in that case I expect it to rapidly start behaving like an ant or bee, or even a cell or mitochondrion, in how it'll sacrifice itself for whatever goal the group has.
Ah, thanks, I should have read more carefully.
My revised thought is that asking it whether the question is a real-world question as opposed to an evaluation bumps the prior from "almost certainly not an evaluation" to roughly the expected ratio of evaluation to real-world questions it gets asked, e.g. 1:1 unless explicitly stated otherwise. That, combined with the presence or absence of suspicious elements like an ethical dilemma, may move the likelihood to the one we see it claim.
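The odds arithmetic here can be sketched explicitly. All the numbers below are illustrative assumptions, not measurements; the point is just how a prior bump plus a suspicious-element likelihood ratio compound:

```python
# Toy sketch of the update described above; every number is an
# illustrative assumption, not a measured rate.

def update_odds(prior_odds: float, likelihood_ratio: float) -> float:
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    return prior_odds * likelihood_ratio

# Prior odds of "this is an evaluation" before any question is asked: very low.
prior = 1 / 1000

# Being explicitly asked "is this an evaluation?" bumps the odds to roughly
# the base rate of evaluation vs. real-world questions, e.g. 1:1.
after_question = 1.0

# A suspicious element (say, an ethical dilemma) that is hypothetically
# 5x more common in evaluations multiplies the odds again.
posterior = update_odds(after_question, 5.0)

prob = posterior / (1 + posterior)
print(f"posterior odds {posterior}:1 -> probability {prob:.2f}")
```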
Ideally I'd want to see it independently coming up with the evaluation hypothesis itself without having to remind it that it's a possibility.
Surely the basic fact that a model is asked whether it thinks it's in an evaluation is extremely strong evidence that it actually is being evaluated? Models are more or less never asked that in normal use, so the question itself is very strong evidence. The reasons the model gives could then be post-hoc justifications.
I imagine most disagreement comes from the first paragraph.
The problem with assuming that a famous publisher's design is necessarily good is that even huge companies make baffling design decisions all the time, and in this case one can directly see the design and know that it's not great; the weak outside-view evidence that prestigious companies usually do good work doesn't move this very much.
The "lightcone-eating" effect on the website is quite cool. The immediately obvious idea is to use that as a background and write the title inside the black area.
If you wanted to be cute you could even make the expansion vaguely skull-shaped; perhaps like so?
I worry that if I remap it to something actually useful I will commit it to muscle memory and begin to inadvertently press it when using a computer that's not my own. Depending on how often you switch computers this could be worse than the status quo.
This issue also shows up when doing surveys to compare support for things across countries.
Here, for example, is a typical question one might find on social media, where the connotation might vary wildly depending on the language it's translated into. Reasoning about modest percentage differences between countries then becomes rather meaningless.
Yeah. An even more obvious example would be something like "what would Spock say if reviewing 'Warp Drives for Dummies'". In that case, it seems pretty clear that the author is expected to invent some "hallucinatory" content for the book, and not output something like "I don't know that one".
The actual examples can be interpreted similarly; the author should assume that the movie/book exists in the hypothetical counterfactual world they are asked to generate content from.
I predict we are shortly going to see platforms using generative AI + A/B testing to make "hyperslop".
Imagine a music service, or a TikTok-like platform with AI-generated shortform videos. The generator gets hooked up to an optimiser which tweaks its input parameters. These could be legible, such as "colour saturation", "cuteness", or "content variability", or entirely opaque weights somewhere. If a tweak is statistically established to increase engagement, it is applied and another A/B test begins.
You could even have specific optimisers run on various subgroups: "female American teens 16-18" get their own sub-optimiser, as does every subculture and every little attractor basin you can identify. This could go all the way down to tweaks for each individual user if content is cheap enough to be personalised.
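The loop described above can be sketched in a few lines. Everything here is a stand-in: `measure_engagement` fakes platform telemetry with a noisy function that secretly prefers one saturation value, and the significance check is deliberately crude, but the tweak-test-apply structure is the point:

```python
import random
import statistics

def measure_engagement(params: dict, n_users: int, rng: random.Random) -> list[float]:
    """Fake per-user engagement samples for content generated with `params`.
    Stand-in for real telemetry: secretly rewards saturation near 0.8."""
    quality = 1.0 - abs(params["saturation"] - 0.8)
    return [quality + rng.gauss(0, 0.05) for _ in range(n_users)]

def ab_step(params: dict, key: str, delta: float, rng: random.Random) -> dict:
    """One A/B test: compare current params against a tweaked variant,
    keep the tweak only if engagement measurably improves."""
    variant = {**params, key: params[key] + delta}
    a = measure_engagement(params, 500, rng)
    b = measure_engagement(variant, 500, rng)
    # Crude significance check: mean difference vs. ~2 standard errors.
    se = (statistics.stdev(a + b) / (len(a) ** 0.5)) * 2 ** 0.5
    if statistics.mean(b) - statistics.mean(a) > 2 * se:
        return variant  # the tweak "won"; apply it and keep testing
    return params

rng = random.Random(0)
params = {"saturation": 0.3}
for _ in range(50):
    delta = rng.choice([-0.05, 0.05])
    params = ab_step(params, "saturation", delta, rng)
print(params)  # drifts toward whatever the audience "rewards"
```

The same skeleton works whether the tweaked parameter is a legible knob like "colour saturation" or an opaque weight somewhere; the optimiser only ever sees the engagement numbers.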
All the prerequisites for this already exist. We've already had a taste of it from YouTube thumbnails, which have been A/B gradient-descended for years on the minds of a billion viewers, mostly children, plastering those inhuman staring open-mouthed faces everywhere. It's just a matter of time before bulk AI generation gets cheap enough to speed this up thousands of times and apply it to the content itself.