I can easily imagine an argument that: SBF would be safe to release in 25 years, or for that matter tomorrow, not because he'd be decent and law-abiding, but because no one would trust him and the only crimes he's likely to (or did) commit depend on people trusting him. I'm sure this isn't entirely true, but it does seem like being world-infamous would have to mitigate his danger quite a bit.
More generally — and bringing it back closer to the OP — I feel interested in when, and to what extent, future harms by criminals or norm-breakers can be prevented just by making sure that everyone knows their track record and can decide not to trust them.
Though — I haven't read all of his recent novels, but I think — none of those are (for lack of a better word) transhumanist like Permutation City or Diaspora, or even Schild's Ladder or Incandescence. Concretely: no uploads, no immortality, no artificial minds, no interstellar civilization. I feel like this fits the pattern, even though the wildness of the physics doesn't. (And each of those four earlier novels seems successively less about the implications of uploading/immortality/etc.)
In practice, it just requires hardware with limited functionality and physical security — hardware security modules exist.
An HSM-analogue for ML would be a piece of hardware that can have model weights loaded into its nonvolatile memory and can run inference with them, but provides no way to get the weights back out. (If it's secure enough against physical attack, it could also be used to run closed models on a user's premises, etc.; there might be a market for that.)
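To make that concrete, here's a minimal sketch of the host-facing API I'd imagine for such a device (all names are hypothetical, and I'm ignoring key management and rate limiting):

```typescript
// Hypothetical interface for a weight-protecting inference device.
// Nothing here corresponds to an existing vendor's API.
interface InferenceHSM {
  // Write-only provisioning: weights go in (ideally encrypted to a device key)
  // and live in nonvolatile memory inside the tamper-resistant boundary.
  loadModel(modelId: string, encryptedWeights: Uint8Array): Promise<void>;

  // Inference is the only way to use the weights: tokens in, logits out.
  infer(modelId: string, inputTokens: Uint32Array): Promise<Float32Array>;

  // Remote attestation, so a model owner can check they're talking to genuine
  // hardware before releasing (encrypted) weights to it.
  attest(nonce: Uint8Array): Promise<Uint8Array>;

  // Deliberately absent: any exportWeights() / readMemory() operation.
}
```

The point is that extraction isn't an operation the host can request at all; the only things that ever leave the device are inference results and attestations.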
This doesn't work. (The recording is from Firefox on Linux; the same thing happens in Chrome on Android.)
An error is logged when I click a second time (and not when I click on a different probability):
[GraphQL error]: Message: null value in column "prediction" of relation "ElicitQuestionPredictions" violates not-null constraint, Location: line 2, col 3, Path: MakeElicitPrediction instrument.ts:129:35
How can I remove an estimate I created with an accidental click? (Such an accidental click is easy to make on mobile, especially because the way reactions work there has habituated me to tapping to reveal hidden information without expecting the tap to perform an action.)
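For what it's worth, my guess at the underlying bug, as a rough sketch (the names, client library, and schema details are all guesses; I haven't read the actual resolver): the second click on the same bucket seems to send a null prediction, presumably meaning "retract", and that null gets written into the NOT NULL prediction column instead of deleting the row. Something like the following would treat null as a retraction:

```typescript
// Hypothetical sketch only: assumes a pg-promise-style Postgres client and a
// unique (questionId, userId) constraint on ElicitQuestionPredictions.
import pgPromise from "pg-promise";

const db = pgPromise()(process.env.DATABASE_URL!);

async function makeElicitPrediction(
  questionId: string,
  userId: string,
  prediction: number | null, // null appears to be sent on the second click
): Promise<void> {
  if (prediction === null) {
    // Interpret null as "retract my estimate": delete the row instead of
    // writing null into the NOT NULL "prediction" column from the logged error.
    await db.none(
      `DELETE FROM "ElicitQuestionPredictions"
        WHERE "questionId" = $1 AND "userId" = $2`,
      [questionId, userId],
    );
  } else {
    // Otherwise upsert the new estimate.
    await db.none(
      `INSERT INTO "ElicitQuestionPredictions" ("questionId", "userId", "prediction")
       VALUES ($1, $2, $3)
       ON CONFLICT ("questionId", "userId") DO UPDATE SET "prediction" = EXCLUDED."prediction"`,
      [questionId, userId, prediction],
    );
  }
}
```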
If specifically with IQ, feel free to replace the word with "abstract units of machine intelligence" wherever appropriate.
By calling it "IQ", you were (EDIT: the creator of that table was) saying that gpt4o is comparable to a 115-IQ human, etc. If you don't intend that claim, that is, if that replacement would preserve your meaning, then it shouldn't have been called IQ. (IMO that claim doesn't make sense; LLMs don't have human-like ability profiles.)
Learning on-the-fly remains, but I expect some combination of sim2real and muZero to work here.
Hmm? sim2real AFAICT is an approach to generating synthetic data, not to learning. MuZero is a system that can learn to play a bunch of games, with an architecture very unlike LLMs. This sentence doesn't typecheck for me; what way of combining these concepts with LLMs are you imagining?
I don't think it much affects the point you're making, but the way this is phrased conflates 'valuing doing X oneself' and 'valuing that X exist'.
Among 'hidden actions OpenAI could have taken that could (help) explain his death', I'd put harassment well above murder.
Of course, the LessWrong community will shrug it off as a mere coincidence because computing the implications is just beyond the comfort level of everyone on this forum.
Please don't do this.
The OP addresses cases like this:
I agree that the comment you're replying to is (narrowly) wrong (if 'prior' is understood as 'temporally prior'), because someone might socially acquire a preference not to overeat sugar before they get the chance to learn that they don't want to overeat sugar. ISTM this is repaired by comparing not to '(temporally) prior preference' but to something like 'reflectively stable preference absent coercive pressure'.