Ceramic engineering researcher by training. Been interested in ethics for several years. More recently have gotten into data science.

Thanks for the post. I don't know if you saw this one: "Thank you for triggering me", but it might be of interest. Cheers!

Thanks for the interesting post! I agree that understanding ourselves better through therapy or personal development is a great way to gain insights that could be applicable to AI safety. My personal development path got started mostly due to stress from not living up to my unrealistic expectations of how much I "should" have been succeeding as an engineer. It got me focused on self-esteem, and that's a key feature of the AI safety path I'm pursuing.

If other AI safety researchers are interested in a relatively easy way to get started on their own path, I suggest this online course which can be purchased for <$20 when on sale: https://www.udemy.com/course/set-yourself-free-from-anger

Good luck on your boundaries work!

Thanks for the feedback! I’m not exactly sure what you mean by “no pattern-matching to actually glue those variables to reality.” Are you suggesting that an AGI won’t be able to adequately apply the ethics calculator unless it’s able to re-derive the system for itself based on its own observations of reality? The way I envision things happening is that the first AGIs won’t be able to derive a mathematically consistent system of ethics over all situations (which is what the ethics calculator is supposed to be) - no human has done it yet, as far as I know - but an ASI likely will, if it’s possible.

If a human can figure it out before the first AGI comes online, I think this could (potentially) save us a lot of headaches, and the AGI could then go about figuring out how to tie the ethics calculator to its reality-based worldview - and even re-derive the calculator - as its knowledge/cognitive abilities expand with time. Like I said in the post, I may fail at my goal, but I think it’s worth pursuing, while at the same time I’d be happy for others to pursue what you suggest, and hope they do! Thanks again for the comment!

I don't know if you saw this post from yesterday, but you may find it useful: https://www.lesswrong.com/posts/ELbGqXiLbRe6zSkTu/a-review-of-weak-to-strong-generalization-ai-safety-camp

Thanks for the comment. I agree that context and specifics are key. This is what I was trying to get at with “If you’d like to change or add to these assumptions for your answer, please spell out how.”

By “controlled,” I basically mean it does what I actually want it to do, filling in any unspecified blanks at least as well as a human would in order to follow my true meaning/desire as closely as it can.

Thanks for your “more interesting framing” version. Part of the point of this post was to give AGI developers food for thought about what they might want to prioritize for their first AGI to do.

Thank you for the comment. I think all of what you said is reasonable. I see now that I probably should’ve been more precise in defining my assumptions, as I would put much of what you said under “…done significant sandbox testing before you let it loose.”

Thanks for the post. I’d like to propose another possible type of (or really, way of measuring) subjective welfare: self-esteem-influenced experience states. I believe having higher self-esteem generally translates to assigning more of our experiences as “positive.” For instance, someone with low self-esteem may hate exercise and deem the pain of it to be a highly negative experience. Someone with high self-esteem, on the other hand, may consider a particularly hard (painful) workout to be a “positive” experience as they focus on how it’s going to build their fitness to the next level and make them stronger.

Further, I believe that our self-esteem depends on the degree to which we take responsibility for our emotions and actions - more responsibility translates to higher self-esteem (see “The Six Pillars of Self-Esteem” by Nathaniel Branden for thoughts along these lines). At low self-esteem levels, "experience states" basically translate directly to hedonic states, in that only pleasure and pain seem to matter as "positive experiences" and "negative experiences" to a person with low self-esteem (the exception may be if someone's depressed, when not much at all seems to matter). At high self-esteem levels, hedonic states still play a role in experience states, but they’re effectively seen through a lens of responsibility - for example, the pain of exercise seen through the lens of one’s own responsibility for getting in shape, and the decision to feel good emotionally about pushing through the physical pain (here we could perhaps be considered to be getting closer to belief-like preferences).
