LESSWRONG
mattmacdermott

Comments

Which side of the AI safety community are you in?
mattmacdermott · 6d

Some people likely think:

don't build ASI until it can be done safely > build ASI whenever but try to make it safe > never build ASI

Those people might give different prescriptions from those of the "never build ASI" people, like not endorsing actions that would tank the probability of ASI ever getting built. (Although in practice I think they probably mostly make the same prescriptions at the moment.)

Bubble, Bubble, Toil and Trouble
mattmacdermott · 9d

I think "Will there be a crash?" is a much less ambiguous question than "Is there a bubble?"

Charbel-Raphaël's Shortform
mattmacdermott · 13d

Yeah, I think “training for transparency” is fine if we can figure out good ways to do it. The problem is more that training for other stuff (e.g. lack of certain types of thoughts) pushes against transparency.

abramdemski's Shortform
mattmacdermott · 22d

I often complain about this type of reasoning too, but perhaps there is a steelman version of it.

For example, suppose the lock on my front door is broken, and I hear a rumour that a neighbour has been sneaking into my house at night. It turns out the rumour is false, but I might reasonably think, "The fact that this is so plausible is a wake-up call. I really need to change that lock!"

Generalising this: a plausible-but-false rumour can fail to provide empirical evidence for something, but still provide 'logical evidence' by alerting you to something that is already plausible in your model but that you hadn't specifically thought about. Ideal Bayesian reasoners don't need to be alerted to what they already find plausible, but humans sometimes do.

The quotation mark
mattmacdermott · 23d

> But then we have to ask — why two ‘ marks, to make the quotation mark? A quotidian reason: when you only use one, it’s an apostrophe. We already had the mark that goes in “don’t”, in “I’m”, in “Maxwell’s”; so two ‘ were used to distinguish the quote mark from the existing apostrophe.


Incidentally, I think in British English people normally do just use single quotes. I checked the first book I could find that was printed in the UK and that’s what it uses:

[Image not preserved]

Markets in Democracy: What happens when you can sell your vote?
mattmacdermott · 24d

> He'd be a fool to part with his vote for less than the amount of the benefits he gets.

Doesn't seem right. Even assuming the person buying his vote wants to use it to remove his benefits, that one vote is unlikely to be the difference between the vote-buyer's candidate winning and losing. The expected effect of the vote on the benefits is going to be much less than the size of the benefits.
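A minimal sketch of that expected-value point, under a deliberately crude model (an assumption, not something from the post): the buyer's candidate would remove the benefits, and the sold vote only changes anything if it is pivotal. The function name and all numbers are hypothetical.

```python
def expected_loss_from_selling(benefits: float, p_pivotal: float) -> float:
    """Expected loss of benefits from selling one vote.

    Hypothetical model: the buyer votes for a candidate who would remove
    the benefits, and the vote only matters when it flips the outcome,
    which happens with probability p_pivotal.
    """
    if not 0.0 <= p_pivotal <= 1.0:
        raise ValueError("p_pivotal must be a probability")
    return p_pivotal * benefits


if __name__ == "__main__":
    benefits = 10_000.0  # hypothetical value of the benefits at stake
    p_pivotal = 1e-5     # hypothetical chance that a single vote is decisive
    loss = expected_loss_from_selling(benefits, p_pivotal)
    # Prints: Expected loss: 0.10 vs. benefits of 10000.00
    print(f"Expected loss: {loss:.2f} vs. benefits of {benefits:.2f}")
```

Only if the vote were certain to be decisive (p_pivotal = 1) would the break-even price equal the full benefits.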

Checking in on AI-2027
mattmacdermott · 25d

An intuition you might be able to invoke is that the procedure they describe is like greedy sampling from an LLM, which doesn’t necessarily get you the most probable completion.
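A toy illustration of that intuition, using a made-up two-step "model" (the tokens, probabilities and function names are hypothetical, not taken from AI-2027 or any real LLM). Greedy decoding commits to the likeliest first token, but here the likeliest full completion starts with the less likely one.

```python
# Hypothetical toy "language model": first-token probabilities, and
# second-token probabilities conditional on the first token.
FIRST = {"A": 0.6, "B": 0.4}
SECOND = {
    "A": {"x": 0.4, "y": 0.3, "z": 0.3},
    "B": {"w": 0.9, "v": 0.1},
}


def greedy_decode() -> tuple[tuple[str, str], float]:
    """Pick the single most likely token at each step."""
    t1 = max(FIRST, key=FIRST.get)
    t2 = max(SECOND[t1], key=SECOND[t1].get)
    return (t1, t2), FIRST[t1] * SECOND[t1][t2]


def most_probable_completion() -> tuple[tuple[str, str], float]:
    """Score every two-token completion by its joint probability."""
    candidates = [
        ((t1, t2), p1 * p2)
        for t1, p1 in FIRST.items()
        for t2, p2 in SECOND[t1].items()
    ]
    return max(candidates, key=lambda c: c[1])


if __name__ == "__main__":
    (g1, g2), gp = greedy_decode()
    (b1, b2), bp = most_probable_completion()
    print(f"greedy: {g1} {g2} (p = {gp:.2f})")  # greedy: A x (p = 0.24)
    print(f"best:   {b1} {b2} (p = {bp:.2f})")  # best:   B w (p = 0.36)
```

Exhaustive search only works for a toy like this; beam search is a common practical compromise, though it isn't guaranteed to find the most probable completion either.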

CFAR update, and New CFAR workshops
mattmacdermott · 1mo

“A Center for Applied Rationality” works as a tagline but not as a name

Notes on fatalities from AI takeover
mattmacdermott · 1mo

> We have a ~25% chance of extinction


Maybe add the implied 'conditional on AI takeover' to the conclusion so people skimming don't come away with the wrong bottom line? I had to go back through the post to check whether this was conditional or not.

leogao's Shortform
mattmacdermott · 1mo

Fair enough, yeah. But at least (1)-style effects weren’t strong enough to prevent any significant legislation in the near future.

Posts

Is instrumental convergence a thing for virtue-driven agents? · 33 karma · 7mo · 37 comments
Validating against a misalignment detector is very different to training against one · Ω · 39 karma · 8mo · 4 comments
Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path? · Ω · 44 karma · 8mo · 15 comments
Context-dependent consequentialism · Ω · 31 karma · 1y · 6 comments
Can a Bayesian Oracle Prevent Harm from an Agent? (Bengio et al. 2024) · Ω · 28 karma · 1y · 0 comments
Bengio's Alignment Proposal: "Towards a Cautious Scientist AI with Convergent Safety Bounds" · Ω · 76 karma · 2y · 19 comments
mattmacdermott's Shortform · 4 karma · 2y · 36 comments
What's next for the field of Agent Foundations? · Ω · 59 karma · 2y · 23 comments
Optimisation Measures: Desiderata, Impossibility, Proposals · Ω · 36 karma · 2y · 9 comments
Reward Hacking from a Causal Perspective · Ω · 29 karma · 2y · 6 comments