The American Philosophical Association (APA) has announced two $10,000 AI2050 Prizes for philosophical work related to AI, with a June 23, 2024 deadline: https://dailynous.com/2024/04/25/apa-creates-new-prizes-for-philosophical-research-on-ai/

https://www.apaonline.org/page/ai2050

https://ai2050.schmidtsciences.org/hard-problems/

yanni's Shortform

sweenesm7d10

Nice write-up on this (even if it was AI-assisted), thanks for sharing! I believe another benefit is Raising One's Self-Esteem: If high self-esteem can be thought of as consistently feeling good about oneself, then if someone takes responsibility for their emotions, recognizing that they can change their emotions at will, they can consistently choose to feel good about and love themselves as long as their conscience is clear.

This is in line with "The Six Pillars of Self-Esteem" by Nathaniel Branden: living consciously, self-acceptance, self-responsibility, self-assertiveness, living purposefully, and personal integrity.

What if Ethics is Provably Self-Contradictory?

sweenesm10d30

Thanks for the post. I don’t know the answer to whether a self-consistent ethical framework can be constructed, but I’m working on it (without funding). My current best framework is a utilitarian one that incorporates the effects of rights, self-esteem (personal responsibility), and conscience. It doesn’t “fix” the repugnant or very repugnant conclusions, but it says that how you transition from one world to another could matter in terms of the conscience(s) of the person/people who bring it about.

It’s an interesting question what the implications are if it’s impossible to make a self-consistent ethical framework. If we can’t convey ethics to an AI in a self-consistent form, then we’ll likely rely in part on giving it lots of example situations (that not all humans/ethicists will agree on) to learn from, hope it’ll augment this with learning from human behavior, and hope it then generalizes well beyond all this imperfectly consistent training data. (Sounds a bit sketchy, doesn’t it - at least for the first AGIs, but perhaps ASIs could fare better?) Generalizing “well” could be taken to mean that an AI won’t do anything that most people would strongly disapprove of if they understood the true implications of the action.

[This paragraph I'm less sure of, so take it with a grain of salt:] An AI that was trying to act ethically and taking the approval of relatively wise humans as some kind of signal of this might try to hide/avoid ethical inconsistencies that humans would pick up on. It would probably develop a long list of situations where inconsistencies seemed to arise and of actions it thought it could "get away with" versus not. I'm not talking about deception with malice, just sneakiness to try to keep most humans more or less happy, which, I assume, would be part of what its ethics system would deem as good/valuable. It seems to me that problems may come to the surface if/when an "ethical" AI is defending against bad AI, when it may no longer be able to hide inconsistencies in all the situations that could rapidly come up.

If it is possible to construct a self-consistent ethical framework and we haven't done it in time or laid the groundwork for it to be done quickly by the first "transformative" AIs, then we'll have basically dug our own grave for the consequences we get, in my opinion. Work to try to come up with a self-consistent ethical framework seems to me to be a very underexplored area for AI safety.

Consequentialism is a compass, not a judge

sweenesm15d20

Thanks for the interesting post! I basically agree with what you're saying, and it's mostly in line with the version of utilitarianism I'm working on refining. Check out a write-up on it here.

Quadratic Reciprocity's Shortform

sweenesm1mo10

Thanks for the post. I don't know if you saw this one: "Thank you for triggering me", but it might be of interest. Cheers!


How I turned doing therapy into object-level AI safety research

sweenesm1mo20

Thanks for the interesting post! I agree that understanding ourselves better through therapy or personal development is a great way to gain insights that could be applicable to AI safety. My personal development path got started mostly due to stress from not living up to my unrealistic expectations of how much I "should" have been succeeding as an engineer. It got me focused on self-esteem, and that's a key feature of the AI safety path I'm pursuing.

If other AI safety researchers are interested in a relatively easy way to get started on their own path, I suggest this online course, which can be purchased for <$20 when on sale: https://www.udemy.com/course/set-yourself-free-from-anger

Good luck on your boundaries work!

Update on Developing an Ethics Calculator to Align an AGI to

sweenesm1mo10

Thanks for the feedback! I’m not exactly sure what you mean by “no pattern-matching to actually glue those variables to reality.” Are you suggesting that an AGI won’t be able to adequately apply the ethics calculator unless it’s able to re-derive the system for itself based on its own observations of reality? The way I envision things happening is that the first AGIs won’t be able to derive a mathematically consistent system of ethics over all situations (which is what the ethics calculator is supposed to be) - no human has done it yet, as far as I know - but an ASI likely will, if it’s possible.

If a human can figure it out before the first AGI comes online, I think this could (potentially) save us a lot of headaches, and the AGI could then go about figuring out how to tie the ethics calculator to its reality-based worldview - and even re-derive the calculator - as its knowledge/cognitive abilities expand with time. Like I said in the post, I may fail at my goal, but I think it’s worth pursuing, while at the same time I’d be happy for others to pursue what you suggest, and hope they do! Thanks again for the comment!

W2SG: Introduction

sweenesm2mo20

I don't know if you saw this post from yesterday, but you may find it useful: https://www.lesswrong.com/posts/ELbGqXiLbRe6zSkTu/a-review-of-weak-to-strong-generalization-ai-safety-camp