I think that we reason too much about AI specifically, and not enough about intelligence more broadly.
There is a fairly obvious objection to this: there are ways that biological and artificial intelligence clearly differ.
However, most AI discussion is largely upstream of those biological-artificial differences, and so the discussion should be about intelligence, not AI.
Two areas where I believe this focus on AI is actively degrading reasoning are discussions of alignment and communication.
Alignment is about synchronizing values between agents, not specifically between humans and AI. The vast majority of agent-agent interactions we have available to draw conclusions from are interactions where neither agent is an AI. Instead, they are between human-based entities, individual humans, other animals, or between agents of two different such types.
People have already made this point, but I'm pretty sad to see that it mostly hasn't caught on yet. When people talk about aligning ASI, they're usually not really talking about ASI but about SI in general; most ASI discussion applies to biological superintelligences.
Unlike ASI, some forms of biological superintelligence already exist and have for a long time: we call them corporations, nation-states, and other human organizations. There has been some alignment-oriented study of these entities, but far less than I'd like, especially of interactions between entities that differ significantly in intellectual capability. For example, individuals almost always lose when they go up against major corporations: the way this usually plays out is an incredibly large, well-paid team of lawyers hired by the corporation going against a much smaller, poorer team hired by the individual. This is analogous to human-ASI interactions. Of course, human-based entities are superintelligent in a different way than ASI probably will be, but I think that difference is irrelevant in many discussions involving ASI.
I enjoyed this recent post about why humans communicating via LLM-generated text is bad. I agree that it is bad, but I think the argument against it is much stronger as a specific case of generally bad agent-agent communication patterns than as a set of mostly LLM-specific arguments. Here is that more general argument, examining long quotes and lying.
Relying on long quotes from other agents seems bad whether or not you're quoting an LLM. The point of discussion is to engage, not to act as an intermediary between two other agents. LLMs, and especially past humans, don't have the full context for the current discussion. Link or briefly quote other agents' views, but only as a supplement to your own.
If an LLM says "I enjoy going on long walks" or "I don't like the taste of coffee", it is obviously lying, because LLMs do not have access to those experiences or sensations. But a human saying those things might also be lying; you just can't tell quite as easily. There is nothing wrong with an LLM saying these things other than the wrongness of lying, just as with humans.
If an LLM gives a verifiable mathematical proof, it is very easy to tell whether or not it's lying, and you check in exactly the same way you would if a human presented the same proof.
I think the argument against communicating via LLM-generated text hits harder as a general, agent-agnostic examination of why long quotes and lying are bad.
The linked post additionally argues that LLMs are always lying when they say "I think..." or "I believe..." (just as they're lying by claiming to go on long walks or to taste coffee). Since that additional claim is the only part I disagree with, this framing also makes the point of disagreement clearer.
There are certainly times when the specific "shape" of AI (easier self-improvement, copyability over shorter time scales, significantly different resource requirements) does matter, and that shape is why there is so much more discussion about AI than about, say, gene editing or selective breeding.
But the current base assumption seems to be that differences in shape between artificial and biological intelligence matter to the discussion of <current topic>. I think this assumption is usually false, that it is degrading reasoning, and that anyone who believes the differences are impactful for a given topic should justify that impact for that topic.