Comments

AI-doom · 6mo · 30

I agree that humans are not aligned with inclusive genetic fitness, but I think you could view evolution over any short stretch of time as a bunch of different optimizers rather than a single one. If not getting killed by spiders is necessary for IGF, for example, then evolution can be thought of as an optimizer both for IGF and for not getting killed by spiders. Some of these optimizers have created mesa-optimizers that resemble the original optimizer to a strong degree: most people really care about their own biological children not dying, for example. Thinking about evolution as multiple optimizers makes it seem more likely that gradient descent can instill correct human values sometimes rather than never.
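
Here is a minimal toy sketch of the decomposition I mean, with entirely hypothetical trait names and a made-up fitness rule: selection only "sees" a single composite IGF proxy, yet because every component is necessary, it behaves like a separate optimizer for each component trait.

```python
import random

# Toy sketch: selection on one composite fitness (an IGF proxy) acts
# simultaneously as selection on each component objective.
# Trait names, the product rule, and all parameters are hypothetical.

TRAITS = ["avoid_predators", "find_food", "care_for_offspring"]

def composite_fitness(genome):
    # Every component is necessary for IGF, so fitness is the product:
    # a genome that fails any one component scores near zero overall.
    f = 1.0
    for t in TRAITS:
        f *= genome[t]
    return f

def mutate(genome, sigma=0.05):
    return {t: min(1.0, max(0.0, v + random.gauss(0, sigma)))
            for t, v in genome.items()}

def evolve(pop_size=200, generations=300):
    pop = [{t: random.random() for t in TRAITS} for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=composite_fitness, reverse=True)
        survivors = pop[: pop_size // 2]  # truncation selection
        pop = survivors + [mutate(random.choice(survivors))
                           for _ in range(pop_size - len(survivors))]
    return pop

if __name__ == "__main__":
    final = evolve()
    for t in TRAITS:
        mean = sum(g[t] for g in final) / len(final)
        # Each component rises even though selection only scores the product:
        # one optimizer behaving like several component optimizers.
        print(f"{t}: mean trait value {mean:.2f}")
```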

AI-doom · 7mo · 30

"Since the AIs were more likely to get ”killed” if they lost a game, being able to crash the game was an advantage for the genetic selection process. Therefore, several AIs developed ways to crash the game. One was particular memorable, because it involved the combination of several complex actions to crash the game. These would have been hard to find by conventional beta testing, since it involved several phenomena human players would instinctively avoid."

https://cs.pomona.edu/~mwu/CourseWebpages/CS190-fall15-Webpage/Readings/2008-Gameplaying.pdf
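
The selection dynamic in that quote is easy to reproduce in miniature. Below is a hedged toy sketch, not the paper's actual setup: the "crash gene" and scoring rules are hypothetical. The key assumption is that a crashed game is never recorded as a loss, so crashing confers a survival advantage and the crash rate drifts upward under selection.

```python
import random

def evaluate(agent):
    """Return True if the agent 'survives' this round of selection."""
    if random.random() < agent["crash_rate"]:
        # The game crashed before a winner was recorded, so the agent
        # is not counted as killed -- the exploit pays off.
        return True
    return random.random() < agent["skill"]  # otherwise survival tracks skill

def select(pop_size=500, generations=100):
    pop = [{"skill": random.random(), "crash_rate": random.random() * 0.1}
           for _ in range(pop_size)]
    for _ in range(generations):
        survivors = [a for a in pop if evaluate(a)] or pop
        # Offspring inherit the parent's genes, with noise on crash_rate.
        pop = [dict(parent,
                    crash_rate=min(1.0, max(0.0, parent["crash_rate"]
                                            + random.gauss(0, 0.02))))
               for parent in random.choices(survivors, k=pop_size)]
    return pop

if __name__ == "__main__":
    final = select()
    avg_crash = sum(a["crash_rate"] for a in final) / len(final)
    # Crash rate rises even though nothing ever rewards crashing directly.
    print(f"mean crash rate after selection: {avg_crash:.2f}")
```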

AI-doom · 8mo · 10

This is much closer to my model of humor as well. I think most humor can be categorized as novel patterns of ideas at a certain level of abstraction that the brain is not used to processing. Once the brain gets used to the pattern, the joke and similar jokes are no longer funny.

I find it unlikely that sexual selection hasn't played a major role in the development of humor, given that people report it as highly attractive. If humor is mostly novel patterns, then pattern recognition is a skill required to be good at it, and being funny could therefore be a proxy for intelligence. This seems to be the case if you look at the IQ scores of comedians.

Displaying honest signals of intelligence seems adaptive in general for a social primate, for various obvious reasons. Combining novel frames might also let you convey a concept that challenges a taboo without actually making statements that are taboo: a useful tool in ancestral politics.

The play-fighting theory, on the other hand, does a very good job of explaining why tickling and scaring people makes them laugh. My guess is that humor and laughter started out with the evolutionary purpose described by that theory, before being adapted for other purposes like sexual selection and tribal politics.

Great and informative post! It seems to me that this architecture could enhance safety to some extent in the short term. Imagine an AI system similar to Auto-GPT, consisting of three parts: a large language model agent focused on creating stamps, a smaller language model dedicated to producing paperclips, and an even smaller scaffolding agent that leverages the two language models to devise plans for world domination. Individually, none of these systems possesses the intelligence to trigger an intelligence explosion or take over the world.

If such a system reaches the point where it is capable of planning world domination, it is likely less dangerous than a single language model with that goal would be, since the agent providing the goal is too simple to comprehend the importance of self-preservation and is further from superintelligence than the other parts. If so, scaffolding-like structures could be employed as a safety measure, and stop buttons might actually prove effective.

Am I mistaken in my intuition? What would likely be the result of an intelligence explosion in the above example? Paperclip maximizers?
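
To make the intuition concrete, here is a minimal sketch of the setup I have in mind. The class names and interfaces are hypothetical stand-ins, not Auto-GPT's actual API; the point is only that the goal, the planning loop, and the stop condition all live in the small, inspectable scaffold rather than inside either language model.

```python
# Hypothetical sketch: two LM stand-ins composed by a simple scaffold
# that holds the goal and the stop button.

class StampModel:
    """Stand-in for the large LM whose objective is stamp-making."""
    def complete(self, prompt: str) -> str:
        return f"[stamp-model answer to: {prompt}]"

class PaperclipModel:
    """Stand-in for the smaller LM tuned for paperclip production."""
    def complete(self, prompt: str) -> str:
        return f"[paperclip-model answer to: {prompt}]"

class Scaffold:
    """The simple outer agent that holds the goal and composes the LMs."""
    def __init__(self, goal: str):
        self.goal = goal
        self.stopped = False
        self.models = [StampModel(), PaperclipModel()]

    def stop(self):
        # The "stop button": because the goal-holder is too simple to
        # model self-preservation, nothing in the system resists this.
        self.stopped = True

    def run(self, steps: int = 3):
        plan = self.goal
        for _ in range(steps):
            if self.stopped:
                return "halted by operator"
            for model in self.models:
                plan = model.complete(plan)  # each LM refines the plan
        return plan

if __name__ == "__main__":
    agent = Scaffold(goal="acquire resources")
    print(agent.run())
```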