berglund


Comments

Thanks for the post! What follows is a bit of a rant. 

I'm a bit torn as to how much we should care about AI sentience at first. On one hand, ignoring sentience could lead us to do some really bad things to AIs. On the other hand, if we take sentience seriously, we might want to avoid a lot of techniques, like boxing, scalable oversight, and online training. In a recent talk, Buck compared humanity controlling AI systems to dictators controlling their populations.

One path we might take as a civilization is to initially align our AI systems in an immoral way (using boxing, scalable oversight, etc.) and then use these AIs to develop techniques for aligning AI systems in a moral way. Although this wouldn't be ideal, it might still be better than creating a sentient squiggle maximizer and letting it tile the universe.

There are also difficult moral questions here, like: if you create a sentient AI system with preferences different from your own, is it okay to turn it off?

I see, thanks for clarifying.

The universal learning/scaling model was largely correct - as tested by openAI scaling up GPT to proto-AGI.

I don't understand how OpenAI's success at scaling GPT proves the universal learning model. Couldn't there be an as-yet-undiscovered algorithm for intelligence that is more efficient?

From the last bullet point: "it doesn't much matter relative to the issue of securing the cosmic endowment in the name of Fun."

Part of the post seems to be arguing against the position "The AI might take over the rest of the universe, but it might leave us alone." Putting us in an alien zoo is roughly equivalent to taking over the rest of the universe and leaving us alone. It seems like the last bullet point pivots from arguing that AI will definitely kill us to arguing that even if it doesn't kill us, the outcome is still pretty bad.

I want to defend the term Goal Misgeneralization. (Steven Byrnes makes a similar point in another comment). 

I think what's misgeneralizing is the "behavioral goal" of the system: a goal that you can ascribe to a system to accurately model its behavior. Goal misgeneralization does not refer to the innate goal of the system.  (In fact, I think this perspective is trying to avoid thorny discussions of these topics, partly because people in ML are averse to philosophy.)


For example, the CoinRun agent pursues the coin in training, but when the coin is placed on the other side of the level, it still just goes to the right. In training, the agent could have been modeled as having any of a number of goals, including getting the coin, getting to the right side of the level, and maximizing the reward it gets. By putting the coin on the left side of the level, we see that its behavior cannot always be modeled by the goal of getting the coin, and we get misgeneralization.

This is analogous to a Husky classifier that actually learns to classify whether the pictured dog is on snow. Here, the model's behavior can be explained by its classifying any number of things about the image, including whether the pictured dog is a Husky and whether it is standing in snow. These come apart when you show the model a Husky that's not standing in snow, and we get "concept misgeneralization".

[This comment is no longer endorsed by its author]

The story you sketched reminds me of one of the claims Robin Hanson makes in The Elephant in the Brain. He says that humans have evolved certain adaptations, like unconscious facial expressions, that make them bad at lying. As a result, when we do something that's socially unacceptable (e.g. leaving someone because they are low-status), our brain makes us believe we are doing something more socially acceptable (e.g. leaving someone because we don't get along).

So humans have evolved imperfect adaptations that make us less deceptive, along with workarounds to circumvent those adaptations.

In particular, I strongly suspect that acausal norms are not so compelling that AI technologies would automatically discover and obey them.  So, if your aim in reading this post was to find a comprehensive solution to AI safety, I'm sorry to say I don't think you will find it here.  

To make sure I understand: would this mean that the AI technologies would be acting suboptimally, in the sense that they could achieve their goals better if they joined the acausal economy?

Thanks for the post. I have a clarifying question.

  • Alien civilizations should obtain our consent in some fashion before visiting Earth.

It seems like the acausal economy wouldn't benefit from treating us well, since humans currently can't contribute much to it. I would expect this to mean that alien civilizations would be fine with visiting us without our consent. What am I missing here?


I agree with habryka that the title of this post is a little pedantic and might just be inaccurate, but I nevertheless found the content to be thought-provoking, easy to follow, and well written.
