He was only a de facto mysterian: he thought the mind is so complicated that it may as well be mysterious (but ofc he believed it's ultimately just physics). This position is updatable, and he clearly updated.

A net saying "I'm thinking about ways to kill you" does not necessarily imply anything whatsoever about the net actually planning to kill you


Since these nets are optimized for consistency (as it makes textual output more likely), wouldn't outputting text that is consistent with this "thought" be likely? E.g. convincing the user to kill themselves, maybe giving them a reason (by searching the web)? 

I've been wishing for someone to write an AI-singularity parallel of Bradbury's Martian Chronicles (which is pretty much a set of independent samples/simulations of how living on Mars could go).

Sharing a personal weird trick, why not. I like falling asleep to light TV (via iPad). I watch short shows that a) I like and don't think are boring, and b) I have seen before. Usually 10 minutes into a 20 min show I'm ready (Futurama is my favorite for this, plus my meme game is much improved by it).

Was thinking about you! Glad you made it out. Feel free to DM if I can be of assistance

MIRI is bottlenecked more on ideas worth pursuing and people who can pursue them, than on funding

Ideas come from (new) people, and you mentioned seed planting, which should contribute to having such people in 4-6 years. That still seems like a worthy thing to do for AGI, if anything is worth doing for any cause at all (given your short timelines). If you agree, what's the bottleneck for that effort?

Related work: 
Show Your Work: Scratchpads for Intermediate Computation with Language Models

(From a very surface-level perusal) Prompting the model resulted in:
1) The model outputting intermediate thinking "steps"
2) Capability gain
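To make the "intermediate steps" idea concrete, here is a rough sketch of what a scratchpad-style few-shot prompt for addition could look like. Everything in it is my own illustration rather than the paper's actual format: the `Input:` / `Target:` labels, the `<scratch>` delimiters, the step wording, and the helper names `addition_steps` and `build_prompt` are all assumptions.

```python
from typing import List, Tuple


def addition_steps(a: int, b: int) -> Tuple[str, str]:
    """Return (intermediate steps, final answer) for a + b, working digit by digit."""
    xs, ys = str(a)[::-1], str(b)[::-1]  # least-significant digit first
    carry = 0
    digits: List[str] = []
    lines: List[str] = []
    for i in range(max(len(xs), len(ys))):
        x = int(xs[i]) if i < len(xs) else 0
        y = int(ys[i]) if i < len(ys) else 0
        total = x + y + carry
        lines.append(
            f"{x} + {y} + carry {carry} = {total}, "
            f"write {total % 10}, carry {total // 10}"
        )
        digits.append(str(total % 10))
        carry = total // 10
    if carry:
        digits.append(str(carry))
    return "\n".join(lines), "".join(reversed(digits))


def build_prompt(examples: List[Tuple[int, int]], query: Tuple[int, int]) -> str:
    """Build a few-shot prompt whose worked examples show their steps (hypothetical format)."""
    parts = []
    for a, b in examples:
        steps, answer = addition_steps(a, b)
        parts.append(
            f"Input: {a} + {b}\n<scratch>\n{steps}\n</scratch>\nTarget: {answer}"
        )
    qa, qb = query
    # Leave the scratchpad open so the model writes its own steps before answering.
    parts.append(f"Input: {qa} + {qb}\n<scratch>")
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(build_prompt([(29, 57), (999, 1)], (48, 76)))
```

The prompt ends with an open scratchpad, so the model is nudged to write out its own steps before committing to an answer.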

Koller & Friedman


They primarily & extensively cover statistical graphical models, not causality (but have a chapter on it)

Since comments get occluded you should refer to an edit/update somewhere at the top if you want it to be seen by those who already read your original comment.

