Oliver Siegel - LessWrong

Consciousness is irrelevant - instead solve alignment by asking this question

Correct! That's my point with the main post. I don't see anyone discussing conscience; I mostly hear them contemplate consciousness or computability.

As for how to actually do this, I've posted a few ideas on this site; they should be listed on my profile.

Consciousness is irrelevant - instead solve alignment by asking this question

Oliver Siegel1y10

Makes perfect sense!

Isn't that exactly why we should develop an artificial conscience, to prevent an AI from lying or having a shadow side?

A built-in conscience would let the AI know that lying is not something it should do. Also, using a conscience in the AI algorithm would make the AI combat its own potential shadow. It would have knowledge of right and wrong / good and bad, and it would even have a superhuman ability to orient itself towards that which is good & right, rather than be "seduced" by the dark side.

Consciousness is irrelevant - instead solve alignment by asking this question

Oliver Siegel1y10

Thank you for your comment!

In your opinion, what's the biggest challenge in feeding a DNN with human values, and then adjusting the biases in a way that doesn't degrade them?

We've taught AI how to speak, and it appears that OpenAI has taught their AI to produce as little offensive content as possible. So it seems to be feasible, doesn't it?

How to store human values on a computer

Oliver Siegel1y10

Fair point! But how do you know that this ungrounded mysticism doesn't apply to the current debate about the potential capabilities of AI systems?

Why is an AI suddenly able to figure out how to break the laws of physics and be super intelligent about how to end intelligent life, but somehow incapable of comprehending the human laws of ethics and morality, and valuing life as we know it?

What makes the laws of physics easier to understand and easier to circumvent than the human laws of ethics and morality? (And also, navigating the human laws of ethics and morality must be required for ending all life. Unless software suddenly has the same energy as enriched plutonium or something like that, and one wrong bit flip causes an explosive chain reaction)

What makes it so much more difficult to understand critical thinking and "how to store human values in a computer", and in contrast what makes "accidentally ending all intelligent life" so easy, by comparison?

It seems to me that "ASI on mission to destroy the humans" is the same thing as "luminiferous aether".

We taught AI English and how to draw pictures and create art. Both pretty "fuzzy" things.

How hard can it be to train AI on a dataset of 90% of known human values and 90% of known problems and solutions with respect to those values, so that a neural net has an "above average human" grasp of the idea that "ending all intelligent life" computes as "that's a problem and it's immoral"?

Beyond that, alignment is unsolvable anyway for AGI systems that perform above human intelligence. You can't predict the future with software, because there could always be software that uses the future-predicting software and negates its output - aka the Halting Problem. Can't do anything about that.
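To make the "negates the output" argument concrete, here is a minimal Python sketch of the diagonal construction that the Halting Problem proof relies on. The names `make_contrarian`, `always_true`, and `always_false` are hypothetical and chosen only for illustration, and the two toy predictors ignore the program they are given. A predictor that tried to actually run the contrarian program would recurse forever, which is the same obstacle in another form.

```python
from typing import Callable

# A "program" here is a zero-argument function returning a bool.
Program = Callable[[], bool]
# A "predictor" looks at a program and guesses what it will return.
Predictor = Callable[[Program], bool]


def make_contrarian(predict: Predictor) -> Program:
    """Build a program that asks the predictor about itself, then does the opposite."""
    def contrarian() -> bool:
        return not predict(contrarian)
    return contrarian


# Two toy predictors (assumptions for illustration; they never inspect the program).
def always_true(program: Program) -> bool:
    return True


def always_false(program: Program) -> bool:
    return False


if __name__ == "__main__":
    for predictor in (always_true, always_false):
        program = make_contrarian(predictor)
        predicted = predictor(program)
        actual = program()
        # By construction the prediction is wrong for this program.
        print(f"{predictor.__name__}: predicted {predicted}, actual {actual}")
```

Whatever predictor is plugged in, the contrarian program built from it returns the opposite of the prediction, so no predictor of this kind can be right about every program.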

How to store human values on a computer

Oliver Siegel1y10

Absolutely, I'm here for the feedback! No solution should go without criticism, regardless of what authority posted the idea, or how much experience the author has. :)

How to store human values on a computer

Oliver Siegel1y10

Interesting article! It reminds me of Monica Anderson's blog: https://experimental-epistemology.ai/

She embraces the mysticism and proposes that holistic, non-reductionist, model-free systems are undeniably effective.

> "The biggest problem, as I see it, is that you haven't come to a thorough understanding of what you mean"

That's another topic that Monica writes a lot about: understanding.

What does it mean to understand something? And what is the meaning of meaning?

Yes, they sound like metaphysical, mystical ideas, and they might be fundamentally unsolvable (see: the hard problem of consciousness, or the explanatory gap).

But we already see that it must be possible for systems that are aligned within a group to exist in this universe. Many groups of humans use their brains to figure out how to coexist peacefully.

So unless humans possess a mystical ingredient, it must be possible to recreate this sort of understanding in machines.

But we don't currently know how to teach this to machines, in part because we lack a good dataset, and in part because we don't understand it ourselves.

Do you think that alignment is fundamentally an engineering problem, or is it one of the humanities and one of philosophy?

Informal semantics and Orders

Oliver Siegel1yΩ010

Love this!

We're working on something related / similar:
https://forum.effectivealtruism.org/posts/FnviTNXcjG2zaYXQY/how-to-store-human-values-on-a-computer

This was cross-posted on LessWrong as well, where it received a lot of criticism:
https://www.lesswrong.com/posts/rt2Avf63ADbzq9SuC/how-to-store-human-values-on-a-computer

How to store human values on a computer

Oliver Siegel1y10

Yeah, I agree!

But if it were easy, everyone would do it... ;p

Based on your knowledge, what do you think might be the biggest hurdles to making it possible, using a system similar to the one I described above?

How to store human values on a computer

Oliver Siegel1y10

Thank you for the resource!

I'm planning to continue publishing more details about this concept. I believe it will address many of the things mentioned in the post you linked.

Instead of posting it all at once, I'm posting it in smaller chunks that all connect.

I have something coming up about preventing instrumental convergence with formalized critical thinking, as well as a general problem-solving algorithm. It'll hopefully make sense once it's all there!

How to store human values on a computer

Oliver Siegel1y00

Thanks for sharing! Yes, it seems that the computational complexity could indeed explode at some point.

But then again, an average human brain is capable of storing common sense values and ethics, so unless there's a magic ingredient in the human brain, it's probably not impossible to rebuild it on a computer.

Then, with an artificial brain that has all the benefits of never fatiguing and such, we may come close to a somewhat useful Genie that can at least advise on the best course of action given all the possible pitfalls.

Even if it's just, say, 25% better than the best human, all humans could get access to this Genie on their smartphone. How cool would that be?

But I'll have to dig deeper into The Sequences; they seem very comprehensive.

I found Monica Anderson's blog quite inspiring, as well. She writes about model-free, holistic systems. https://experimental-epistemology.ai/
