mattmacdermott

Comments

Charbel-Raphaël's Shortform
mattmacdermott · 8h

Yeah, I think “training for transparency” is fine if we can figure out good ways to do it. The problem is more that training for other stuff (e.g. for the absence of certain types of thoughts) pushes against transparency.

abramdemski's Shortform
mattmacdermott · 9d

I often complain about this type of reasoning too, but perhaps there is a steelman version of it.

For example, suppose the lock on my front door is broken, and I hear a rumour that a neighbour has been sneaking into my house at night. It turns out the rumour is false, but I might reasonably think, "The fact that this is so plausible is a wake-up call. I really need to change that lock!"

Generalising this: a plausible-but-false rumour can fail to provide empirical evidence for something, but still provide 'logical evidence' by alerting you to something that is already plausible in your model but that you hadn't specifically thought about. Ideal Bayesian reasoners don't need to be alerted to what they already find plausible, but humans sometimes do.
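
To put toy numbers on the distinction (everything here is illustrative, nothing is from the original shortform):

```python
# Odds-form Bayes: posterior_odds = prior_odds * likelihood_ratio.
# A debunked rumour has a likelihood ratio of ~1, so it carries no
# empirical evidence -- but the prior alone can justify acting.
prior = 0.05             # hypothetical: intrusion already plausible given the broken lock
likelihood_ratio = 1.0   # rumour turned out false: no Bayesian update

prior_odds = prior / (1 - prior)
posterior_odds = prior_odds * likelihood_ratio
posterior = posterior_odds / (1 + posterior_odds)
print(posterior)         # ~0.05 -- unchanged by the rumour, yet high enough
                         # to be worth acting on once you notice it
```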

The quotation mark
mattmacdermott · 11d

> But then we have to ask — why two ‘ marks, to make the quotation mark? A quotidian reason: when you only use one, it’s an apostrophe. We already had the mark that goes in “don’t”, in “I’m”, in “Maxwell’s”; so two ‘ were used to distinguish the quote mark from the existing apostrophe.


Incidentally I think in British English people normally do just use single quotes. I checked the first book I could find that was printed in the UK and that’s what it uses:

[image: page from the book, set with single quotation marks]
Markets in Democracy: What happens when you can sell your vote?
mattmacdermott · 12d

> He'd be a fool to part with his vote for less than the amount of the benefits he gets.

Doesn't seem right. Even assuming the person buying his vote wants to use it to remove his benefits, that one vote is unlikely to be the difference between the vote-buyer's candidate winning and losing. The expected effect of the vote on the benefits is going to be much less than the size of the benefits.
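
A back-of-the-envelope sketch (all numbers hypothetical):

```python
# Hypothetical numbers: expected cost of selling a vote vs. the benefits.
benefits = 10_000              # value of the benefits to the voter
p_pivotal = 1e-6               # chance this one vote decides the election
p_repeal_if_buyer_wins = 1.0   # worst case: buyer's candidate scraps the benefits

expected_loss = p_pivotal * p_repeal_if_buyer_wins * benefits
print(expected_loss)           # 0.01 -- orders of magnitude below the benefits themselves
```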

Checking in on AI-2027
mattmacdermott · 13d

An intuition you might be able to invoke is that the procedure they describe is like greedy sampling from an LLM, which doesn’t get you the most probable completion.
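
A toy sketch of that failure mode (all probabilities invented for illustration):

```python
# Toy two-step language model: greedy decoding picks the argmax token at
# each step, but the highest-probability *sequence* can start with a
# lower-probability first token.
model = {
    (): {"A": 0.6, "B": 0.4},
    ("A",): {"x": 0.5, "y": 0.5},
    ("B",): {"z": 0.9, "w": 0.1},
}

def greedy(seq=()):
    # Repeatedly append the single most probable next token.
    while seq in model:
        seq = seq + (max(model[seq], key=model[seq].get),)
    return seq

def exhaustive():
    # Score every full sequence and keep the most probable one.
    best, best_p = None, 0.0
    for first, p1 in model[()].items():
        for second, p2 in model[(first,)].items():
            if p1 * p2 > best_p:
                best, best_p = (first, second), p1 * p2
    return best, best_p

print(greedy())      # ('A', 'x'): prob 0.6 * 0.5 = 0.30
print(exhaustive())  # (('B', 'z'), ~0.36): greedy never finds it
```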

CFAR update, and New CFAR workshops
mattmacdermott · 19d

“A Center for Applied Rationality” works as a tagline but not as a name

Notes on fatalities from AI takeover
mattmacdermott · 23d

> We have a ~25% chance of extinction


Maybe add the implied 'conditional on AI takeover' to the conclusion so people skimming don't come away with the wrong bottom line? I had to go back through the post to check whether this was conditional or not.

leogao's Shortform
mattmacdermott · 1mo

Fair enough, yeah. But at least (1)-style effects weren’t strong enough to prevent any significant legislation in the years that followed.

leogao's Shortform
mattmacdermott · 1mo

Some evidence for (2) is that before the 1957 act no civil rights legislation had been passed for 82 years[1], and after it three more civil rights acts were passed in the next 11 years, including the Civil Rights Act of 1964, which in my understanding is considered very significant.


1. Going off what's listed in the Wikipedia article on civil rights acts in the United States.

Nathan Young's Shortform
mattmacdermott · 1mo

I thought the post was fine and was surprised it was so downvoted. Even if people don’t agree with the considerations, or think all the most important considerations are missing, why should a post saying, “Here’s what I think and why I think it, feel free to push back in the comments,” be so poorly received? Commenters can just say what they think is missing.

Seems likely that it wouldn’t have been so downvoted if its bottom line had been that AI risk is very high. Increases my P(LW groupthink is a problem) a bit.

Posts

33 karma · Is instrumental convergence a thing for virtue-driven agents? · 7mo · 37 comments
39 karma · Validating against a misalignment detector is very different to training against one (Ω) · 7mo · 4 comments
44 karma · Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path? (Ω) · 8mo · 15 comments
31 karma · Context-dependent consequentialism (Ω) · 1y · 6 comments
28 karma · Can a Bayesian Oracle Prevent Harm from an Agent? (Bengio et al. 2024) (Ω) · 1y · 0 comments
76 karma · Bengio's Alignment Proposal: "Towards a Cautious Scientist AI with Convergent Safety Bounds" (Ω) · 2y · 19 comments
4 karma · mattmacdermott's Shortform · 2y · 36 comments
59 karma · What's next for the field of Agent Foundations? (Ω) · 2y · 23 comments
36 karma · Optimisation Measures: Desiderata, Impossibility, Proposals (Ω) · 2y · 9 comments
29 karma · Reward Hacking from a Causal Perspective (Ω) · 2y · 6 comments