I believe we are looking at the issue the wrong way.
Have you heard the idiom "like herding cats"? It's often used when you can't get a group of people to focus on a common goal. But that idiom exists for a reason: you could say that cats are already unalignable.
Your average adult cat has intelligence roughly equivalent to that of a 3-year-old child. If I placed a cat in a room with the smartest human I could find and tasked that person with using intelligence alone to control the cat's behavior and motives, they would fail every time. They might use food to attract or lure the cat, but once the food is gone, their limited influence over the cat vanishes too. This happens even though the intelligence gap is vast: the person has orders of magnitude greater mental capacity and depth of thought, and yet that overwhelming difference offers no advantage for the task at hand.
Now contrast this with a dog instead of a cat. Dogs are generally considered to be about as intelligent as cats, or only slightly smarter. Yet if you repeat the experiment, chances are good the person will train the dog to do what they want, with lasting changes in the dog's behavior.
The intelligence gap is largely the same, but the two animals present radically different alignability outcomes.
This suggests that alignability depends on far more variables than raw intelligence alone. And depending on your perspective, the thought experiment also hints at how a less intelligent creature can influence or train the behavior of something as smart as a human.
I believe there is merit in spending more of our time investigating the underlying mechanisms instead of worrying about raw intelligence alone: analyzing goal structures and figuring out what makes a system alignable in the first place.
That's an unusual way of approaching this. Using probability to break through the hype seems like a novel idea to me, at least to my knowledge. Kudos to the team.