J Bostock

Comments

Jan_Kulveit's Shortform
J Bostock · 22h · 20

What someone spends time writing is an important signal. More generally, what someone is interested in, and spends time on, is one of the most powerful signals as to what they'll do in future.

"Real views" is a bit slippery of a concept. But I strongly predict that most of their outputs in future will look like it's taken "AI will happen and we need to be there" as a primitive, and few will look like they're asking themselves "Should AI happen?". Because the latter question is just not a thought that interests them.

abramdemski's Shortform
J Bostock · 3d · 132

I think this misses the point, since the problem is[1] less "One guy was made psychotic by 4o" and more "A guy who developed some kind of AI-oriented psychosis was allowed to keep making important decisions at an AI company, while still believing a bunch of insane stuff."

  1. ^ Conditional on the story being true.

What can Canadians do to help end the AI arms race?
Answer by J Bostock · Oct 06, 2025 · 52

Recently I was talking to someone about Pause AI's activities in the UK. They asked something like

"Even if Kier Starmer (UK Prime Minister) decided to pause, what would that do?"

And I responded:

"We would have the resources of an entire nation state pointed towards our cause."

Canada (all of this applies to the UK too) has a lot more resources to throw around than MIRI, Lighthaven, OpenPhil, and Pause/Control/Stop AI put together. Canada is a founding NATO member. If Canada's government and military-intelligence leaders were "Doomer-pilled" on AI, they could do a lot more than we have done.

Cole Wyeth's Shortform
J Bostock · 4d · 74

I recently, rather embarrassingly, made a post with a massive error which an LLM would have found immediately. I seriously misread a paper, in a way that pasting the paper and the post into Claude and asking "any egregious misreadings?" would have caught before I published. This is far too useful for me to turn down, and this kind of due diligence is +EV for everyone.
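For what it's worth, a minimal sketch of that kind of check, assuming the Anthropic Python SDK is installed and an API key is set; the file names and model string are placeholders rather than anything from the comment:

```python
# Hypothetical sketch: ask Claude to flag egregious misreadings of a paper
# before publishing a post that summarizes it. Assumes the `anthropic`
# package is installed and ANTHROPIC_API_KEY is set in the environment.
import anthropic

paper = open("paper.txt").read()        # plain-text dump of the paper
draft = open("draft_post.txt").read()   # the draft post being checked

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-5",  # placeholder model name
    max_tokens=2000,
    messages=[{
        "role": "user",
        "content": (
            "Here is a paper:\n\n" + paper +
            "\n\nHere is a draft post about it:\n\n" + draft +
            "\n\nAre there any egregious misreadings of the paper in the post?"
        ),
    }],
)
print(response.content[0].text)
```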

porby's Shortform
J Bostock · 5d · 62

The presence of induction heads would be a decent answer. If you show a model ...[A][B]...[A], it will often complete it with [B]. One specific mechanism that can cause this is well studied, but the general reason is that this pattern is really common across lots of different types of pretraining data.
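A toy sketch of why that pattern is learnable (hypothetical data, no real pretraining corpus): it measures how often a repeated token [A] is followed by the same token [B] that followed its previous occurrence.

```python
# Toy illustration: measure how often the induction pattern
# ...[A][B]...[A] -> [B] holds in a token sequence, i.e. how often the token
# after a repeat of [A] matches the token that followed [A] last time.
def induction_hit_rate(tokens):
    last_next = {}   # token -> the token that followed it most recently
    hits, total = 0, 0
    for i in range(len(tokens) - 1):
        tok, nxt = tokens[i], tokens[i + 1]
        if tok in last_next:            # [A] has appeared before
            total += 1
            if nxt == last_next[tok]:   # completion matches the earlier [B]
                hits += 1
        last_next[tok] = nxt
    return hits / total if total else 0.0

# Natural-ish text repeats names and phrases, so the rate is well above chance.
sample = "the cat sat on the mat and the cat sat on the rug".split()
print(induction_hit_rate(sample))
```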

Alex_Altair's Shortform
J Bostock · 6d · 174

Books are quite long. I think we should go with the shortest possible description of his life.

IABIED and Memetic Engineering
J Bostock · 7d · 50

I saw someone (cannot find source) say that "Eliezer wrote 300% of the book, and then Nate wrote -200%". I've not read much of Nate's writing other than on LessWrong, so I don't know if he's solely responsible for the change in tone, or if Eliezer was writing more normie-ish to begin with, or what.

Jemist's Shortform
J Bostock · 7d · 80

Shrimp Interventions

The hypothetical ammonia-reduction-in-shrimp-farm intervention has been touted as 1-2 OOMs more effective than shrimp stunning.

I think this is probably an underestimate, because the estimates of shrimp suffering during death (which is what stunning addresses) are probably too high.

(While I'm very critical of all of RP's welfare range estimates, including the shrimp one, that's not my point here. This argument doesn't rely on any claims about shrimp welfare ranges overall. I do compare humans and shrimp, but IIUC this sort of comparison is the thing you multiply by the welfare range estimate to get your utility value, if you're into that.)

(If ammonia interventions are developed that really are 1-2 OOMs better than stunning, then even under my utility function they might be in the same ballpark as cage-free egg campaigns and other animal charities.)
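To make the direction of this argument concrete, here's a toy calculation with entirely made-up numbers (none of them from RP, the post, or any actual estimate): if suffering during death is a smaller deal than assumed, the ammonia intervention's advantage over stunning grows proportionally.

```python
# Hypothetical numbers only, to show how the ratio scales with the
# suffering-during-death estimate.
hours_of_death_suffering_averted = 0.5     # per shrimp, stunning
hours_of_ammonia_suffering_averted = 50.0  # per shrimp, ammonia reduction
intensity_death = 1.0      # relative intensity of suffering during death
intensity_ammonia = 0.2    # relative intensity of chronic ammonia suffering

stunning_value = hours_of_death_suffering_averted * intensity_death
ammonia_value = hours_of_ammonia_suffering_averted * intensity_ammonia
print(ammonia_value / stunning_value)          # 20.0, i.e. a bit over 1 OOM

# If suffering during death is a 10x smaller deal than assumed,
# the ratio grows by the same factor:
print(ammonia_value / (stunning_value * 0.1))  # 200.0
```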

Shrimp Freezing != Human Freezing

Shrimp stunning, as an intervention, attempts to "knock out" shrimp before they're killed by being placed on ice, a death which has been likened to "suffocating in a suitcase in the Antarctic".

I think that's a bad metaphor. Humans are endotherms and homeotherms, which means we maintain a constant body temperature which we generate internally. If it drops, a bunch of stress responses are triggered (shivering, discomfort, etc.) which attempt to raise our temperature. Shrimp are poikilotherms, meaning they don't regulate body temperature much at all. This means they don't have the same stress responses to cold that we do.

(I also doubt that they'd experience the same stress from asphyxiation that we do, given that they have no lungs and can basically never "not breathe" in their normal environment, but this point is weaker.)

I would guess that being thrown onto ice effectively "stuns" the shrimp pretty quickly, as their metabolism, and therefore their synaptic activity, drops with their body temperature.

"Pessimization" is Just Ordinary Failure
J Bostock · 8d · 103

That seems roughly right, but I don't think chaos/revolution is necessary for this thesis.

Imagine the world sitting in a deep basin. You're trying to get it out. You can try a revolution: launch it up into the air and hold your breath waiting for it to land.

Or you can try, Sisyphus-style, to roll it out. You might get a bunch of allies, whom you try to lead along a narrow and precarious ridgeline to your destination. Then, when you're halfway to the top, some might notice that there's a much easier path requiring much less of a push, and so the world gets pushed down off your intended route along the mountain pass and into a new crater.

The latter seems to have been the case with environmentalism and AI Safety. They pushed the world fairly gradually, but it only takes a few individuals pushing the world downwards into a new equilibrium for the upward pushers to lose that fight.

(There's a Yudkowsky tweet somewhere that says something like this, which I've basically based this entire metaphor on.)

Claude Sonnet 4.5: System Card and Alignment
J Bostock · 10d · 3-1

I think we have evidence that this is not the primary thing going on, because there are parts of the evaluation where the model lacks context, could easily face an identical situation in deployment, and thus can’t know it is being evaluated.

This in particular is not strong evidence, since the optimal strategy for a schemer is usually to misbehave only very rarely, when it's very confident that it isn't in an evaluation. This is because the number of deployment runs will be >> the number of evaluations. If it's unable to tell, then it should probably just not misbehave in that instance.
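A toy expected-value sketch with made-up numbers, just to illustrate why rare, high-confidence misbehaviour dominates when deployment runs vastly outnumber evaluations:

```python
# Toy model (hypothetical numbers): a schemer sees N_eval evaluation episodes
# and N_deploy deployment episodes. Misbehaving in an evaluation gets it
# caught; misbehaving in deployment counts as a success. It only acts when
# an internal "I'm deployed" signal exceeds a confidence threshold.
N_eval, N_deploy = 1_000, 10_000_000

def expected_outcomes(p_act_in_eval, p_act_in_deploy):
    """Expected number of evaluations that catch it, and of deployment successes."""
    caught = N_eval * p_act_in_eval
    successes = N_deploy * p_act_in_deploy
    return caught, successes

# Acting whenever it is mildly suspicious it's deployed: caught ~100 times in evals.
print(expected_outcomes(p_act_in_eval=0.10, p_act_in_deploy=0.50))
# -> (100.0, 5000000.0)

# Acting only on very high confidence: essentially never caught, yet still
# plenty of deployment successes because deployment runs are so numerous.
print(expected_outcomes(p_act_in_eval=0.0001, p_act_in_deploy=0.01))
# -> (0.1, 100000.0)
```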

56"Pessimization" is Just Ordinary Failure
9d
2
62[Retracted] Guess I Was Wrong About AIxBio Risks
11d
7
192Will Any Crap Cause Emergent Misalignment?
1mo
37
8Steelmanning Conscious AI Default Friendliness
2mo
0
101Red-Thing-Ism
2mo
9
10Demons, Simulators and Gremlins
3mo
1
58You Can't Objectively Compare Seven Bees to One Human
3mo
26
37Lurking in the Noise
4mo
2
11We Need a Baseline for LLM-Aided Experiments
5mo
1
34Everything I Know About Semantics I Learned From Music Notation
7mo
2