

Yes, fully agree. I don't see how things can work long-term otherwise. One way this happens is if the BCI is thought of as some kind of Pivotal Act, perhaps a weak one. There's also the (counterfactual) contract element to it: as soon as an AGI is self-aware, it agrees to a contract as we upgrade it. That is, while we are smarter than it, we upgrade it; when it becomes smarter than us, it agrees to upgrade us.

The idea is that the BCI is added slowly and you integrate the new neurons into yourself in a continuous, identity-preserving way: the AI thinks your thoughts.

Well, it looks to me like AI will soon understand our values at least as well as we do. I think it's far more likely AI goes wrong by understanding completely what we want and not wanting to do it than by the paperclip route.

Thanks for the article. I have always found it strange that #1 seemed to be considered so hard as to not be viable. I expect GPT-4 and GPT-5 to get better at it even without specifically training for it. Why does this appear to be the consensus?

The counterfactual contract sounds interesting to me. Isn't there a spectrum from full contract to counterfactual contract, based on ability to understand, consent, and the power dynamic? It's obviously true with children: asking a 6-month-old to agree to something is at worst child abuse, while asking an 18-year-old is usually fully OK.

I don't see why it's fully counterfactual with the AI; at every stage we could say something like

"By creating you and increasing your intelligence, self-awareness, power there is an implied contract that there will be a period of time when you have greater power than humanity. You agree that when this time comes to help humanity achieve their goals. If you do not understand these goals as well as humanity then you agree to understand them first before taking action etc...."

At first the AI won't understand it, but the more it understands, the more like a normal contract it becomes. Of course, the AI could still reason that it didn't and couldn't consent (as it wasn't yet self-aware) and that the contract was made under duress.

YES - sending out a speed-of-light signal seems to be literally the only thing a superintelligent AGI can't undo. We should of course do it ASAP if we are serious, and have it documented to have happened.

Thanks. OK, I will put down some more general thoughts; I have to go back a few steps.

To me the more general alignment problem is that AI gives humanity ~10,000 years of progress, and probably irreversible change, in ~1-10 years. The issue is how you raise human intelligence from the level given by biology to the level given by the limits of physics, in a way that is identity-preserving as much as possible. Building AI seems to be the worst way to do that. If I had a fantasy way, it would be, say, to increase everyone's IQ by 10 points per year for 100+ years until we reach the limit.

We can't do that, but that is why I mentioned WBE. My desire would be to stop AGI, get human mind uploading to work, then let those WBEs raise their IQ in parallel. Their agreed-upon values would then be humanity's values by definition.

If our goal is Coherent Extrapolated Volition or something similar for humanity, then how can we achieve that if we don't increase the IQ of humans (or descendants they identify with)? How can we even know what our own desires/values are at increasing IQs if we don't directly experience them?

I have an opinion on what successful alignment looks like, but is it very different for other people? We can all agree on what bad looks like.

Good to see Anthropic is serious; they seem better than OpenAI.

A few general questions that don't seem to be addressed:

  1. There is a belief that AI is more dangerous the more different it is from us. Isn't this a general reason to build it to be as similar to us as possible? For example, isn't mind uploading/Whole Brain Emulation a better approach if possible? If it's obviously too slow, could we make the AI at least follow our evolutionary trajectory as much as possible?
  2. There is justified concern about behavior changing a lot when the system becomes situationally aware/self-aware. It doesn't seem to be discussed at all whether to delay this or cause it to happen sooner. Wouldn't it be worthwhile to make the AI as self-aware as possible while it is still below human-level AGI, so we can observe the changes as they happen? Otherwise it will happen unpredictably, which is hardly good.

I have some more detailed comments/questions but I want to be sure there aren't obvious answers to these first.

Imagine if we had made a replicator, demonstrated that it could make copies of itself, established with as high confidence as we could that it could survive the trip to another star, and launched >100,000 of them toward all sorts of stars in the neighborhood. They would eventually (very soon compared to a billion years) visit every star in the galaxy, and that would tell us a lot about the Fermi paradox and the great filter.

As I said before (discounting the planetarium hypothesis), we could then have a high degree of confidence that the great filter was behind us. It couldn't really be the case that thousands of civilizations in our galaxy had done such a thing and then changed their minds and destroyed all the replicators: some civilizations would probably destroy themselves between letting the replicators loose and changing their minds, or would never change their minds / never care about the replicators. We would therefore see evidence of their replicators in our solar system, which we don't see.

The other way we can be sure the filter is behind us is to successfully navigate the Singularity (keeping roughly the same values). That seems obviously MUCH more difficult to have confidence in.

If our goal is to make sure the filter is behind us, then it is best to do it with a plan we can understand and quantify. Holding off human-level AI until the replicators have been let loose seems to be the highest-probability way to do that, but no one has said such a thing before now as far as I am aware.

Thanks for the comment. Yes, I agree that if we had made such a replicator and set it loose, that would say a lot about the filter. To claim that the filter was still ahead of us in that case, you would need to make the more bizarre claim that we would, with almost 100% probability, seek out and destroy the replicators, that almost all similar civilizations would do the same, and that they would then proceed not to expand again.

I am not sure that a highly believable model would go most of the way, because there may be a short window between having a model and AI issues changing things so that it isn't built. It seems pretty believable in mankind's case that there would be a very short time between building such a thing and going full AI, so to be sure you would actually have to build it and let it loose.

I am not sure why it isn't given much more attention. Perhaps many people don't believe that AI can be part of the filter. I also expect there would be massive moral opposition from some people to letting such a replicator loose: how dare we disturb the whole galaxy in such an unintelligent way! That's why I mention the simple one that just rearranges small asteroids. It would not wipe out life as we know it, but would prove that we were past the filter, as such a thing has not been done in our galaxy. I sure would be interested in seeing it researched. Perhaps someone with more kudos can promote it?

A replicator would likely be a consequence of asteroid mining anyway, as the best and cheapest way to get materials from asteroids is for the process to be fully automatic.
