Maximizing paperclips is the de facto machine ethics / AI alignment meme. I showcase some practical problems with Nick Bostrom's paperclip thought experiment and posit that if we really tried to maximize paperclips in the universe we would have to sacrifice utility measurements as a result.

Let's Start Making Paperclips

How do we actually maximize paperclips? Ought we make tiny nano-scale paperclips or large planet-sized paperclips?[1] Do we need to increase the number of paperclips in the universe or can we simply increase the paperclip-ness of the universe instead? Assuming the same amount of mass gets converted to paperclips either way, which way of producing paperclips is best?

To be very clear, I don't want to make paperclips up to some arbitrary limit; I want to make the entire universe into paperclips, and so would an AI given the same purpose; since its entire world (utility function / terminal goal / whatever) is paperclips, it cannot think a world that is not paperclips. So we are not trying to figure out how to make just five paperclips in the most efficient way, nor five planets' worth of paperclips, nor any finite quantity. We are trying to truly maximize paperclips, to make an infinity of paperclips — I would even shoot for an uncountable infinity if higher cardinalities are possible. I want to touch the Sun and keep my wings. I am a paperclip maximalist.

This is a very dumb question but I promise you it's important: What is a paperclip? If our definition is too broad, it will include things that aren't really paperclips, and that defeats the purpose of the exercise. If our definition is too narrow, we get a paperclip that is impossible to produce ab initio. This example given for Goodhart's Law illustrates a similar problem:

There’s a story about a Soviet nail factory. The factory was instructed to produce as many nails as possible, with rewards for high numbers and punishments for low numbers. Within a few years, the factory was producing huge numbers of nails - tiny useless nails, more like thumbtacks really. They were not very useful for nailing things.

So the planners changed the incentives: they decided to reward the factory for the total weight of nails produced. Within a few years, the factory was producing big heavy nails, more like lumps of steel really. They were still not very useful for nailing things.

And if we left an AI to internally decide its own definition of a paperclip, i.e. to determine its own utility function and what methods to use to maximize it, it would first need to figure out how to create and optimize utility functions, which would require utility generalization, and by then we just have a general intelligence that would see no reason to narrow its existence back down to just paperclips. There are nuances here that I'm sure many of you will unnecessarily mention in the comments; what matters is that it is neither clear nor relevant how an AI would self-design its own utility function / purpose. I don't really care what the AI would do, I just want to maximize paperclips. So for the sake of discussion, we will define a paperclip as: any wire of material that is rendered into the canonical shape of the Microsoft mascot Clippy, but without the eyes.

The Problem With Small Paperclips

If we were maximizing purely for the quantity, and not quality, of paperclips in the universe, we would not only want to produce a great number of paperclips, but produce the greatest possible number of them for any given amount of conversion mass. This forces us to produce the smallest possible paperclips — if from X amount of mass we can create Y paperclips, then what we really want to maximize is the ratio of Y to X (paperclips per unit of mass), not just one or the other on its own, and that is what creates the maximal number of paperclips in the universe.
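
To put the same point as a quick back-of-envelope formula (my framing, not Bostrom's): if X is the total mass we are willing to convert and m is the mass of a single paperclip, then the number of paperclips we get is roughly

Y = X / m

so for any fixed X, maximizing Y just means minimizing m. Pure quantity-maximization pushes us straight toward the smallest thing that still counts as a paperclip.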

We know there is plenty of room at the bottom of physics but there is still a limit to how small we can go. The smaller the paperclip, the faster the paperclip degrades. The smallest possible paperclip would be constituted of a wire that is only one atom wide, and if a single atom is displaced then the paperclip breaks and stops being a paperclip.[2] The half-life of the atoms used and their resistance to cosmic ray bombardment, gravitational disturbance, and general entropy, become the limiters on how long paperclips can survive — but ideally we want them to last until the universe ends.

If we marginally increase the minimum mass required to make a paperclip, the paperclips will last significantly longer before they are destroyed, but they will still degrade over time and eventually change enough to fall out of our definition of a paperclip. The problem with degradation is that it constitutes a body of mass that isn't being used for paperclips anymore. This is waste material that we must decide to recover or abandon. If we recover the material then we must have machines that stay behind (or go back to) every place in which we have already produced paperclips to ensure that destroyed paperclips get turned back into paperclips again. Or we just abandon them. Either way there is an increasing amount of material that will never be paperclips. This wastefulness can be easily avoided by infinitely scaling up their size, infinitely increasing their resistance to decay in turn.[3]

The Problem With Big Paperclips

If we were to maximize the quality of the universe's paperclip property, meaning we made more physically contiguous paperclip-space, we would do so by making increasingly larger paperclips. Planet-sized paperclips would let us circumvent mereological questions (i.e., the heap-like problem of what constitutes sufficient mass for a unit of paperclip), and would allow us to abandon the paperclip for the longest possible period of time before the paperclip began to degrade into something that was no longer a paperclip (giving us the longest possible paperclip half-life). Any sufficiently robust AI would have to consider these problems of paperclip identity and degradation in order to optimally maximize, so really big paperclips are a legitimate candidate for what we aim to produce.

But if we optimize for increasingly larger paperclips then we quickly run into physical constraints having to do with the gravitational effects of massive bodies. A planet-sized paperclip would start to form a molten core, and if we kept adding mass, the core would become so dense that fusion would begin to occur. This is obviously not tenable since it would destroy the paperclip.

If the big paperclips were only as big as the Moon, then their mass would still exert enough gravitational force to warp the paperclip out of shape, destroying the canonical Clippy morphology and dropping it out of the definition we established earlier. This means there is some theoretical upper bound for the size of a paperclip and I think that's funny. The limit is somewhere between the size of Australia and some other very large island (I don't want to bother doing the actual calculations for this). The expected life of a paperclip that size is roughly the time remaining until heat death, so very large paperclips are optimal in terms of longevity.

Or are they? Even though one single island-sized paperclip wouldn't have enough mass to gravitationally warp itself out of shape, lots of these paperclips in close proximity would attract each other, collide, break apart, reform with molten cores, and so on. Our goal is to make paperclips, not destroy them, so we would have to ensure they avoid collision. The island-sized paperclips would need to be uniformly spaced out in such a way that we could guarantee any movement resulted in them moving away from each other, not closer. As it turns out, the universe is expanding at an accelerating rate, which we can use to our advantage: space the paperclips far enough apart that cosmic expansion separates them faster than gravity pulls them together. Each paperclip would have to be spaced several light-years from its neighbors (I don't want to bother calculating attractor forces but this estimate is close enough).[4]
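
As a sanity check on that spacing figure (a back-of-envelope sketch of my own, not a calculation from the post), we can estimate the separation at which the outward acceleration from dark energy, roughly Ω_Λ·H₀²·d, overtakes the Newtonian attraction G·M/d² between two big paperclips. The paperclip masses in the snippet below are pure assumptions, chosen only to bracket a plausible range:

```python
# Back-of-envelope sketch (my own assumptions, not the post's numbers):
# find the separation beyond which dark-energy expansion beats the mutual
# gravitational attraction of two island-sized paperclips, so they drift
# apart instead of colliding.

G       = 6.674e-11   # gravitational constant, m^3 kg^-1 s^-2
H0      = 2.2e-18     # Hubble constant (~67 km/s/Mpc) in 1/s
OMEGA_L = 0.69        # dark-energy density fraction
LY      = 9.461e15    # metres per light-year

def critical_separation(mass_kg: float) -> float:
    """Separation (metres) where dark-energy repulsion (~OMEGA_L * H0^2 * d)
    equals Newtonian attraction (G * M / d^2)."""
    return (G * mass_kg / (OMEGA_L * H0**2)) ** (1.0 / 3.0)

if __name__ == "__main__":
    # Assumed masses for one island-sized paperclip; swap in your own estimate.
    for mass in (1e19, 1e22, 1e24):
        d = critical_separation(mass)
        print(f"mass {mass:.0e} kg -> spacing ~ {d / LY:.2f} light-years")
```

Depending on the mass you assume, the answer comes out somewhere between a fraction of a light-year and a few light-years, which is at least in the neighborhood of the spacing estimated above.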

But it is overtly impossible to move and position massive bodies of that scale in orchestration. Ruh roh.

The Problem With Medium Paperclips

Any sufficiently intelligent system capable of long-term planning would have to contend with the fact that very small paperclips don't survive very long and very large paperclips collapse under their own mass. Inconveniently, medium-sized paperclips fail to avoid either problem.

If we were to produce paperclips of the various standard sizes that we use irl, we would still eventually pile up enough mass to create a planet-sized body, which results in a molten core, destroying the paperclips at an alarming rate. If fusion starts, we get even worse returns.

Medium-sized paperclips don't even have very long lives. In hard vacuum, a paperclip of any atomic composition can be expected to be bombarded with cosmic rays at elevated rates, significantly reducing the otherwise very long half-lives of even the most stable isotopes (on the order of 10^18 years) and resulting in unmanageable entropy that degrades the paperclips faster than we could produce them (after some exorbitantly large but otherwise arbitrary amount have been made).

To be very clear about the conclusion drawn from medium-sized paperclips: maximization is not possible at all given the issues just described. It is not simply a problem of some hard limit where we can turn the planet into a zillion paperclips and then we can't make any more, no, the problem is that the paperclips start getting destroyed faster than we can make them — we end up minimizing the number of paperclips in the universe by creating ever-growing banks of material that can never be paperclips.[5]
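
To spell out the rate argument with a toy model (my own sketch, not anything from Bostrom): suppose our machines produce paperclips at a fixed rate P, and each existing paperclip independently degrades out of the definition at rate λ (cosmic rays, gravity, entropy). Then the paperclip count N evolves as

dN/dt = P − λ·N

which stops growing once N approaches P/λ, no matter how long production runs. Past that point new paperclips only replace losses, and if the degraded material can't all be recovered, the bank of mass that will never be a paperclip again keeps growing without bound.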

Let's Start Making Paperclips?

Before we make paperclips, we have to make the machines that make paperclips. This requires us to optimize for paperclip-making machines, which runs into the exact same list of problems we just described. In turn, optimizing for anything in particular results in the same or similar problems.

If the world described by a thought experiment doesn't include the laws of our actual world, then the thought experiment doesn't describe anything we'll have to contend with in the actual world. If we include the laws of physics, there are hard lower and upper bounds on the kinds of paperclips we can make, and there are measurable speeds at which the paperclips are destroyed by entropy — something that would be known to a sufficiently intelligent AI making plans around paperclip maximization. This means Nick Bostrom's original thought experiment is inconsequential to reality; it is structurally incapable of justifying itself.

This is true for anything else we try to maximize, including utility. There are lots of already-known problems with maximizing utility, like the repugnant conclusion, utility monsters, et cetera. If we were to maximize utility, we would have to determine how much we can produce before it reaches critical mass and starts to implode on itself. Happiness? Well, you've got to produce a lot of humans to get a lot more happiness, and humans have mass. Fulfillment? Thriving? Purpose? Whatever arbitrary definition we assign to utility results in the same conclusion — exponentially increasing quantities of mass and then nuclear fusion at the center of some stellar body.

When we look at the billions of examples of general intelligence on our planet we see they don't discursively optimize for unbounded mass conversion of any kind — this is simply not something we do — and so none of our goals, instrumental or final, ever lead to anything like out-of-control paperclip maximizers. This strongly suggests that real intelligence doesn't operate by means of utility functions (at least not as they are given in Bostrom's paper), so the idea that an instrumental goal could emerge that requires everyone to be converted into paperclips (or any other thing) seems like such a leap in logic that I would even challenge the predication of its statistical ontology. By this I mean it is statistically impossible, logically impossible, nomologically impossible, and every other kind of impossible. It cannot happen.

Paperclip maximizers (true, unbounded maximizers) don't exist. Any conceptions we come up with for optimizing the quantity or quality of paperclips (or anything else) in the universe swiftly run into hard physical limits that preclude us from achieving any extreme goal.

But despite everything said, what if we start converting unreasonable amounts of mass into paperclips anyway? I mean, paperclip manufacturers already exist and they try to maximize output, right? As it turns out, the more people we convert into paperclips, the fewer people we have making paperclips. The diminishing returns hit fast enough that paperclip manufacturers have decided not to turn people into paperclips.

Entropy is real, and all mechanical systems require energy to fight entropy — energy which is not infinite and has to come from somewhere. The more paperclips you try to make, the more energy is required, and then you have to start optimizing for entire energy systems that otherwise have nothing to do with paperclips, recursively inscribing all the prior problems back into the thought experiment.

Given everything said, there is an obvious futility to maximizing paperclips in reality that defeats the grounding of Bostrom's original thought experiment, and this is reason enough to dismiss its conclusions.

  1. ^

    When I asked ChatGPT, it decided a small paperclip was the ideal kind. You can see its reasoning here.

  2. ^

    Paperclips constructed from anything smaller than atoms would decay even faster, so that's off the table.

  3. ^

    Increasing the size of the paperclip decreases the surface area relative to an equivalent amount of mass split between smaller paperclips, and therefore decreases the rate of decay. Technically there is still a substantial amount of mass that is being rendered non-paperclip at any given moment anyways, but at least the paperclip units themselves would last longer.

  4. ^

    As the expansion of space accelerates, the paperclips can be laid out closer together, but this is an otherwise arbitrary improvement to spacing efficiency.

  5. ^

    There are legitimate arguments to be made that since both the number of paperclips being made and destroyed trend towards infinity (and are both potential infinities rather than actual infinities in the Aristotelian sense), and they have the same cardinality, they then cancel each other out and we are more successful at maximizing paperclips if we simply do nothing at all.

    I don't include any of this in the body of the text because this was just supposed to be a joke post but you guys have no sense of humor and ratio'd me hard enough that my negative karma now disallows me from responding to comments, so I leave this here to punish those that bother to read it.

Comments

This is a peculiar essay. If there are limits to how big, how small, or how stable you can make some object, that doesn't mean it's impossible to maximize the number of copies of the object. On the contrary, knowing those limits tells you what maximization looks like. 

Perhaps you're interpreting "maximize" to mean "increase without limit"? Maximization just means to increase as much as possible. If there's a limit, maximization means you go up to the limit. 

The most interesting issue touched upon is the uncertainty over exactly what counts as a paperclip. This corresponds to a genuinely important issue in AI value systems, namely, how does the AI decide what counts as goodness or happiness or consent (etc.), when it encounters new situations and new possibilities. But the essay doesn't explore that direction for long.

There is a futility to maximizing paperclips in reality that defeats the purpose of the original thought experiment, and this is reason enough to dismiss its conclusions.

I basically liked this post until the very last paragraph. I think it's an interesting point "if you were to actually maximize paperclips, you'd have a variety of interesting ontological and physical problems to solve." (This seems similar to the Diamond Maximizer problem)

But your last paragraph sounds like a) you misunderstood the point of the paperclip thought experiment (i.e. it's not about telling an AI to maximize paperclips, it was about telling an AI to maximize some other thing and accidentally making a bunch of molecular paperclip-like-objects because that happened to maximize the function it was given. See https://www.lesswrong.com/tag/squiggle-maximizer-formerly-paperclip-maximizer )

But even assuming we're talking about an AI maximizing paperclips because we told it to, I don't really understand the point you're making here.

This is an interesting discussion of the scenario in some depth, but with a one-line conclusion that is completely unsupported by any of the preceding discussion.

This post doesn't seem to be making a logical argument. That could be why it's getting downvotes.

It seems like you're arguing that since paperclips don't last forever (without protection), you can't make as many paperclips as possible. This doesn't seem to logically follow. It seems like you just need to include protecting those paperclips as part of your efforts to make as many as possible.

Your concluding paragraph says that, since making paperclips isn't completely thought out, there's no reason to think there's a problem with an agent that pursues a goal singlemindedly being dangerous. That doesn't follow either.

It seems like you could use the same logic and say that Napoleon trying to conquer Europe makes no sense, since Europe isn't completely defined, and it won't stay conquered forever. But if you used that logic to choose your actions, you'd wind up dead when Napoleon and his armies marched through (taking your food as a subgoal of their perhaps poorly-defined goal) despite their objectives being incompletely defined.

You can't use sophistry to avoid physical death. And poorly defining an agent's goals won't make the outcome any better for us, just worse for those goals getting accomplished.

The words "maximise paperclips" cover many different possibilities. But any particular AI with a goal that can be described by those words will have whatever specific goal it actually has. It has no decisions to make about that, no philosophical problems to solve, any more than an autopilot in a plane needs to wonder what an "optimal flight path" is, e.g. whether to minimise fuel consumption or flight time. It does whatever the task built into it is.

What is in the mind of someone looking at the AI and what is in the AI are different things. The issues you bring up stem from the former, and amount to nothing more than the fuzziness of all language, but the latter just is whatever it is.

And if the AI is powerful enough and devotes everything to "making paperclips", it won't matter to us what precisely it is trying to do, it will still blithely take the atoms of our bodies for its own purposes.