Nick_Tarleton

Let's look at the preference for eating lots of sweets, for example. Society tries to teach us not to eat too many sweets because it's unhealthy, and from the perspective of someone who likes eating sweets, this often feels coercive. Your explanation, applied here, would be that upon reflection people will decide "Actually, eating a bunch of candy every day is great" -- and no doubt that is true to a degree, at least with the level of reflection that people actually do.

However, when I decided to eat as many sweets as I wanted, I ended up deciding that sweets were gross, except in very small amounts or as part of extended exercise where my body actually needs the sugar. What's happening here is that society has a bit more wisdom than the candy-loving kid, tries clumsily to teach the foolish kid that their ways are wrong and they'll regret them, and often ends up succeeding more in constraining behavior than in integrating the values in a way that the kid can make sense of upon reflection.

The OP addresses cases like this:

One thing that can cause confusion here - by design - is that perverted moralities are stabler if they also enjoin nonperversely good behaviors in most cases. This causes people to attribute the good behavior to the system of threats used to enforce preference inversion, imagining that they would not be naturally inclined to love their neighbor, work diligently for things they want, and rest sometimes. Likewise, perverted moralities also forbid many genuinely bad behaviors, which primes people who must do something harmless but forbidden to accompany it with needlessly harmful forbidden behaviors, because that's what they've been taught to expect of themselves.

I agree that the comment you're replying to is (narrowly) wrong (if 'prior' is understood as 'temporally prior'), because someone might socially acquire a preference not to overeat sugar before they get the chance to learn on their own that they don't want to overeat sugar. ISTM this is repaired by comparing not to '(temporally) prior preference' but to something like 'reflectively stable preference absent coercive pressure'.

I can easily imagine an argument that SBF would be safe to release in 25 years, or for that matter tomorrow: not because he'd be decent and law-abiding, but because no one would trust him, and the only crimes he's likely to commit (or did commit) depend on people trusting him. I'm sure this isn't entirely true, but it does seem like being world-infamous would have to mitigate his danger quite a bit.

More generally — and bringing it back closer to the OP — I feel interested in when, and to what extent, future harms by criminals or norm-breakers can be prevented just by making sure that everyone knows their track record and can decide not to trust them.

Though — I haven't read all of his recent novels, but I think — none of those are (for lack of a better word) transhumanist like Permutation City or Diaspora, or even Schild's Ladder or Incandescence. Concretely: no uploads, no immortality, no artificial minds, no interstellar civilization. I feel like this fits the pattern, even though the wildness of the physics doesn't. (And each of those four earlier novels seems successively less about the implications of uploading/immortality/etc.)

In practice, it just requires hardware with limited functionality and physical security — hardware security modules exist.

An HSM-analogue for ML would be a piece of hardware that can have model weights loaded into its nonvolatile memory, can perform inference, but doesn't provide a way to get the weights out. (If it's secure enough against physical attack, it could also be used to run closed models on a user's premises, etc.; there might be a market for that.)
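For concreteness, a minimal sketch of the interface such a device might expose to the host (hypothetical names and a toy model; not any real HSM's API): weights can be written in and inference requested, but no operation returns the weights.

```python
# Sketch of a host-side interface for an inference-only "weights HSM".
# Hypothetical: illustrates the write-only property, not real hardware.
from typing import Optional

import numpy as np


class InferenceOnlyDevice:
    def __init__(self) -> None:
        # In real hardware this state would sit inside a tamper-resistant
        # boundary; a private attribute stands in for that here.
        self._weights: Optional[np.ndarray] = None

    def load_weights(self, weights: np.ndarray) -> None:
        """Write-only: store weights in the device's nonvolatile memory."""
        self._weights = weights.copy()

    def infer(self, x: np.ndarray) -> np.ndarray:
        """Run a forward pass (a toy linear layer here) and return only outputs."""
        if self._weights is None:
            raise RuntimeError("no model loaded")
        return x @ self._weights

    # Deliberately absent: any get_weights() / export() / dump() operation.


# Usage: the host can query the model but never read the weights back.
device = InferenceOnlyDevice()
device.load_weights(np.random.randn(16, 4))
logits = device.infer(np.random.randn(1, 16))
```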

This doesn't work. (The recording is from Firefox on Linux; the same thing happens in Chrome on Android.)

An error is logged when I click a second time (and not when I click on a different probability):

[GraphQL error]: Message: null value in column "prediction" of relation "ElicitQuestionPredictions" violates not-null constraint, Location: line 2, col 3, Path: MakeElicitPrediction instrument.ts:129:35

How can I remove an estimate I created with an accidental click? (Said accidental click is easy to make on mobile, especially because the way reactions work there has habituated me to tapping to reveal hidden information and not expecting doing so to perform an action.)

If specifically with IQ, feel free to replace the word with "abstract units of machine intelligence" wherever appropriate.

By calling it "IQ", you were (EDIT: the creator of that table was) saying that GPT-4o is comparable to a 115-IQ human, etc. If you don't intend that claim (if that replacement would preserve your meaning), you shouldn't have called it IQ. (IMO that claim doesn't make sense — LLMs don't have human-like ability profiles.)

Learning on-the-fly remains, but I expect some combination of sim2real and muZero to work here.

Hmm? sim2real AFAICT is an approach to generating synthetic data, not to learning. MuZero is a system that can learn to play a bunch of games, with an architecture very unlike LLMs. This sentence doesn't typecheck for me; what way of combining these concepts with LLMs are you imagining?

I don't think it much affects the point you're making, but the way this is phrased conflates 'valuing doing X oneself' and 'valuing that X exist'.

Among 'hidden actions OpenAI could have taken that could (help) explain his death', I'd put harassment well above murder.

Of course, the LessWrong community will shrug it off as a mere coincidence because computing the implications is just beyond the comfort level of everyone on this forum.

Please don't do this.
