This is the fourth post in my Effective-Altruism-funded project aiming to deconfuse goal-directedness. Comments are welcomed. All opinions expressed are my own, and do not reflect the attitudes of any member of the body sponsoring me. The funding has come to an end, but I expect to finish off this...
This is the third post in my Effective-Altruism-funded project aiming to deconfuse goal-directedness. Comments are welcomed. All opinions expressed are my own, and do not reflect the attitudes of any member of the body sponsoring me. My strategy for achieving a formalisation of goal-directed behaviour is to equate it with...
This post is my capstone project for the AI Safety Fundamentals programme. I would like to thank the organizers of the programme for putting together the resources and community which have broadened my horizons in the field. Thanks to my cohort and facilitator @sudhanshu_kasewa for the encouragement. Thanks also to...
This is the second post in my Effective-Altruism-funded project aiming to deconfuse goal-directedness. Comments are welcomed. All opinions expressed are my own, and do not reflect the attitudes of any member of the body sponsoring me. In my first post I started thinking about goal-directedness in terms of explanations, and...
This is the first post in my Effective-Altruism-funded project aiming to deconfuse goal-directedness. Comments are welcomed. All opinions expressed are my own, and do not reflect the attitudes of any member of the body sponsoring me. In my preliminary post, I described my basic intuitions about goal-directedness, and focussed on...
Shortly, I will be starting a project (funded by the Long-Term Future Fund, LTFF) under the supervision of Adam Shimi with the goal of deconfusing goal-directedness. I have decided to record the progress on this project on a biweekly basis here on LW, to test whether this helps to keep me...
Earlier this year, when looking for an inroad to AI safety, I learned about the Löbian Obstacle, which is a problem encountered by 'purely logical' agents when trying to reason about and trust one another. In the original paper by Yudkowsky and Herreshoff [1], they show that a consequence of...