Currently a summer research fellow at the Center on Long-Term Risk. Former intern at the Center for Reducing Suffering, Wild Animal Initiative, and Animal Charity Evaluators. Former co-head of Haverford Effective Altruism.

Interests include:
• AI alignment
• animal advocacy from a longtermist perspective
• decision theory, especially acausal interactions
• digital minds
• s-risks
• ways to improve the EA career pipeline
• wild animal welfare

Feel free to contact me for whatever reason! You can set up a meeting with me here.



There is another very important component of dying with dignity not captured by the probability of success: the badness of our failure state. While any alignment failure would destroy much of what we care about, some alignment failures would be much more horrible than others. Probably, the more pessimistic we are about winning, the more we should focus on losing less badly (e.g. by researching priorities in worst-case AI safety).
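A simple way to see the last point, under the simplifying assumption (mine, not a claim that outcomes really collapse into two values) that the future is either a win worth V_win or a loss worth V_lose, with p the probability of success:

```latex
\mathbb{E}[V] = p\,V_{\text{win}} + (1-p)\,V_{\text{lose}},
\qquad
\frac{\partial\,\mathbb{E}[V]}{\partial V_{\text{lose}}} = 1 - p,
\qquad
\frac{\partial\,\mathbb{E}[V]}{\partial p} = V_{\text{win}} - V_{\text{lose}}.
```

All else equal, each unit by which we make the failure state less bad is worth 1 - p in expectation, so its weight grows as p shrinks.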

I feel conflicted about this post. Its central point, as I understand it, is that much of the evidence we commonly encounter in varied domains is only evidence about the abundance of extremal values in some distribution of interest, and that whether and how we should update our beliefs about the non-extremal parts of the distribution depends heavily on our prior beliefs or our gears-level understanding of the domain. I think this is a very important idea, and this post explains it well.
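A minimal sketch of that point, assuming normally distributed traits (my assumption, not the post's) and an arbitrary threshold for "extremal". A shifted mean and a wider spread can produce exactly the same share of values above the threshold while implying very different things about the typical case, so observing more extremes alone cannot tell the two apart:

```python
import math


def normal_upper_tail(threshold: float, mean: float, sd: float) -> float:
    """P(X > threshold) for X ~ Normal(mean, sd), via the complementary error function."""
    z = (threshold - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))


THRESHOLD = 3.0  # what counts as "extremal" is an arbitrary choice for illustration

# Hypothetical ways a population could differ from a standard normal baseline.
hypotheses = {
    "baseline": (0.0, 1.0),
    "shifted mean": (0.5, 1.0),  # the typical member is also different
    "wider spread": (0.0, 1.2),  # the typical member is unchanged
}

for name, (mean, sd) in hypotheses.items():
    extreme = normal_upper_tail(THRESHOLD, mean, sd)  # share of extremal values
    above_zero = normal_upper_tail(0.0, mean, sd)     # share above the baseline mean
    print(f"{name:>12}: P(X > {THRESHOLD}) = {extreme:.5f}, P(X > 0) = {above_zero:.3f}")
```

Both alternatives put about 0.0062 of the mass above 3 (versus about 0.0013 at baseline), but only the shifted-mean hypothesis moves the share above 0 (to about 0.69). Which one to favor has to come from priors or a model of the domain.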

Also, I felt inspired to search out other explanations of the moments of a distribution; this one looks pretty good to me so far.

On the other hand, the men's rights discussion felt out of place to me, and unnecessarily so, since I think other examples would work just as well. I might be misjudging how controversial the various points you bring up are, but as of now I'd rather see topics with this much potential political heat discussed in personal blog posts or on other platforms, so long as they're mostly unrelated to the central questions of interest to rationalists / EAs.

This is super interesting!

Quick typo note (unless I'm really misreading something): in your setups, you refer to coins that are biased towards tails, but in your analyses, you talk about the coins as though they are biased towards heads.

One is the “cold pool”, in which each coin comes up 1 (i.e. heads) with probability 0.1 and 0 with probability 0.9. The other is the “hot pool”, in which each coin comes up 1 with probability 0.2

 random coins with heads-probability 0.2

We started with only  tails

full compression would require roughly  tails, and we only have about 
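Under the setup's stated probabilities, tails should dominate in both pools. A minimal sketch with a hypothetical pool size of 1000 coins (my choice, not the post's), computing expected counts and the per-coin Shannon entropy, which is the usual lower bound on compressed length. The entropy itself is symmetric in heads and tails; the counts are what the mix-up flips:

```python
import math


def binary_entropy(p: float) -> float:
    """Shannon entropy, in bits, of one coin with heads-probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def expected_counts(n: int, p_heads: float) -> tuple[float, float]:
    """Expected (heads, tails) among n independent coins with heads-probability p_heads."""
    return n * p_heads, n * (1 - p_heads)


N_COINS = 1000  # hypothetical pool size, for illustration only

for name, p in [("cold", 0.1), ("hot", 0.2)]:
    heads, tails = expected_counts(N_COINS, p)
    print(f"{name} pool: ~{heads:.0f} heads, ~{tails:.0f} tails, "
          f"{binary_entropy(p):.3f} bits per coin")
```

With heads-probability 0.1 or 0.2, heads are the rare outcome, so phrases like "only ... tails" read as if the two faces had been swapped.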

As far as I'm aware, there has not been (in recent decades at least) any controversy over whether word/punctuation choice is associative. We even have famous psycholinguistics experiments showing that thinking of the word "goose" makes us more likely to think of the word "moose" as well as "duck" (linguistic priming is the one type of priming that has held up through the replication crisis, as far as I know). To the extent that linguists bothered to build computational models, I think those would have failed to produce human-like speech because their associative models were not powerful enough.

This comment does not deserve to be downvoted; I think it's basically correct. GPT-2 is super interesting as something that pushes the bounds of ML, but it is not replicating what goes on under the hood in human language production, which is what Marcus and Pinker were getting at. Writing styles don't seem to me to reveal anything deep about cognition; they're a matter of word/punctuation choice, sentence length, and other quirks that people probably learn associatively as well.

Why should we say that someone has "information empathy" instead of saying they possess a "theory of mind"?

Possible reasons: "theory of mind" is an unwieldy term; it might be useful to distinguish, in fewer words, a theory of mind with respect to beliefs from one with respect to preferences; or you want to emphasise a connection between empathy and information empathy.

I think that if there's established terminology for something we're interested in discussing, there should be a pretty compelling reason why it doesn't suffice for us.

It felt weird to me to describe shorter timeline projections as "optimistic" and longer ones as "pessimistic": AI research taking place over a longer period is more likely to give us friendly AI, right?

This approach can be made a little more formal with FDT/LDT/TDT: being the sort of agent who robustly does not respond to blackmail yields more utility than being the sort of agent who sometimes gives in, because would-be blackmailers who can predict your policy have no reason to target you, so you don't wind up being blackmailed in the first place.
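A toy version of the policy-level comparison, with made-up payoffs and a hypothetical blackmailer who attempts blackmail only when it predicts the agent will pay, and who always carries out the threat against a refuser. This is a sketch of evaluating policies in the spirit of FDT, not an implementation of FDT itself:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Payoffs:
    """Hypothetical utilities for the agent, chosen only for illustration."""
    paid_ransom: float = -10.0          # blackmailed, and gave in
    threat_carried_out: float = -100.0  # blackmailed, refused, and the threat was executed
    left_alone: float = 0.0             # never blackmailed


def expected_utility(pays_when_blackmailed: bool, accuracy: float,
                     u: Payoffs = Payoffs()) -> float:
    """Expected utility of committing to a policy, when a blackmailer predicts
    the policy correctly with probability `accuracy` and only blackmails agents
    it predicts will pay."""
    # The blackmailer attempts blackmail iff it predicts "pays".
    p_blackmailed = accuracy if pays_when_blackmailed else 1.0 - accuracy
    if_blackmailed = u.paid_ransom if pays_when_blackmailed else u.threat_carried_out
    return p_blackmailed * if_blackmailed + (1.0 - p_blackmailed) * u.left_alone


for acc in (1.0, 0.99, 0.9):
    payer = expected_utility(True, acc)
    refuser = expected_utility(False, acc)
    print(f"accuracy={acc}: gives in -> {payer:.2f}, refuses -> {refuser:.2f}")
```

With these particular numbers, committing to refuse beats committing to pay whenever the blackmailer's accuracy exceeds 10/11, which is one way of seeing why the refusal has to be robust (and therefore predictable) to pay off.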

The subjunctive mood, and really anything involving modality, is complicated. Paul Portner has a book on mood which is probably a good overview if you're willing to get technical. Right now I think of moods as expressing presuppositions on the set of possible worlds you quantify over in a clause. I don't think it's usually a good idea to try to get people to speak their native language in a way incompatible with how they acquired it in childhood; it adds extra cognitive load and probably doesn't affect how people reason (the exception being giving them new words and categories, which I think can clearly help reasoning in some circumstances).
