by [anonymous]
1 min read · 4th Feb 2022 · 2 comments


[anonymous] · 2y

A lot of current safety research is on intent alignment—ensuring that AI learns and seeks to fulfill the intentions of one human user, or of a small group of human users. Since the prospect of developing an AI assistant may be attractive to researchers and lucrative for companies, this type of alignment research may be prioritized over research that tackles the unpleasant realities of balancing the sometimes opposing interests of various demographics of humans.

In particular, I can imagine a failure scenario in which a small number of corporations use AI to promote their own influence and the growth of their platforms. The interests of the top users of the platforms are prioritized above others, and life genuinely improves for them. Some users want to improve life for others around them, but this is largely focused on their immediate community, so AI-assisted humans prioritize the development of already-wealthy countries while doing very little to help the global poor or solve problems like climate change, which only worsen. Cures for diseases are created, new technologies are invented, and a few people enjoy profound leisure and luxury, but while life gets better for them, it worsens for people whose interests aren’t represented on the platforms—which includes most of humanity.

Meanwhile, AI subtly discourages more people from joining the platform, finding that one way to satisfy its humans is to make them feel lucky and superior to others. Countries that were reliant on rich nations for outsourced labor are thrown into disarray as automation accelerates. Food insecurity and starvation loom, worsened by the disruption of agriculture by climate change. Bioengineered diseases are released by human actors, and although AI helps develop countermeasures to keep the death toll to a few million, the outbreaks cause widespread panic and fear in poorer regions, while rapid responses and new technologies limit the damage in rich nations. Constant surveillance and behavioral engineering prevent crime and discourage the coordinated expression of dissent. Most of humanity feels powerless, dehumanized, and alone, while a tiny elite is on track to get the utopia they dreamed of. In the back of their minds, they know that not everyone shares their life of luxury—oh, how lucky they are—but they’re confident it’s just a matter of time.

The scenario I just described represents the main cluster of outcomes that I’m most worried about, although my views will surely change over time. The future above, which might be classified as an s-risk, is terrifying to me because it wouldn’t fit the definition of an existential catastrophe by the best evidence available to elite humans—people’s lives would be getting better by all sorts of metrics, but only ones that elites cared about. It would be a tragic case of Goodhart’s Law and the principal-agent problem gone wrong.

[anonymous] · 2y

Quote from "Where Recursive Justification Hits Rock Bottom":

Now suppose you write on a sheet of paper:  "(1) Everything on this sheet of paper is true, (2) The mass of a helium atom is 20 grams."  If that trick actually worked in real life, you would be able to know the true mass of a helium atom just by believing some circular logic which asserted it.  Which would enable you to arrive at a true map of the universe sitting in your living room with the blinds drawn.  Which would violate the second law of thermodynamics by generating information from nowhere.  Which would not be a plausible story about how your mind could end up believing something true.

Challenge accepted.

You're sitting in your living room with the blinds drawn. You have a good command of the English language and a high-school level of physics knowledge. You write down the two aforementioned statements on a piece of paper: "Everything on this sheet of paper is true. The mass of a helium atom is 20 grams."

Truth, you decide, is relative. Right here in your living room, it's true that the mass of a helium atom is 20 grams—because you say so! There, you've already made some progress. But you'd like to show the world that this is true more broadly as well, so you start thinking.

Somewhere far away, perhaps in another galaxy, there is bound to be a society whose measurements line up in such a way that they'd say a helium atom is 20 grams. Is that the same thing, or are you changing the definitions? You decide it doesn't matter; all you care about right now is expanding the number of ways in which it's truthful to say that a helium atom is 20 grams. But how could you show the world that this distant society truly exists and that their measurements show a helium atom as being 20 "grams"?

You start visualizing this society—let's call it Gramtopia—imagining it in great detail. None of it is real—yet. You write a series of novels, and gradually it becomes more real in your mind. Large parts of their society revolve around their conception of the helium atom as being 20 grams: this is necessary for redundancy later on in your plan. You start embarking on lucid dreams in which you explore Gramtopia, talking to its scientists and asking questions about their technology and culture. In one particularly passionate conversation with a scientist in your dreams, you and the scientist figure out that there is a way to communicate via quantum entanglement across the vast stretches of space separating your dream version of Gramtopia from the actual physical location of Gramtopia, wherever it is. After all, you reason when you wake up, Gramtopia has to be out there somewhere! You were taught that the universe is infinite, and an infinite universe must contain all physically possible worlds. There is nothing about Gramtopia preventing it from existing in your universe.

This process of communication doesn't transmit any information and hence doesn't violate any laws of physics. In fact, it's the same process that intelligent sentient beings use all the time to communicate. Instead, it allows each party to rule out alternative meanings and arrive at something close enough to the original message. By exploiting subtle asymmetries in the math of quantum mechanics, from what little you remember of high-school physics, you figure out that two parties can assemble large structures of entangled particles that can "communicate" one bit of information at a time with 50.001% accuracy. By layering these assemblies on top of each other, this accuracy can be raised to an arbitrarily high level. In effect, this is what humans do all the time when they communicate, you realize. It's just that it took us a hell of a long time to get to the point where we could communicate very complex ideas. Now we take our biological structures enabling speech production (whether verbal or nonverbal) completely for granted, and the structure of our shared languages somewhat less for granted. But together, these allow us to communicate with one another.
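The "layering" trick in that paragraph is, statistically, ordinary redundancy: if each read of a bit is correct slightly more than half the time, repeating the read and taking a majority vote drives the error rate as low as you like. Here is a minimal Python sketch of that voting step; the function names and the 55% per-read accuracy are my own choices so it finishes quickly, and at the story's 50.001% the number of repeats needed grows roughly as one over the bias squared, i.e. on the order of billions.

```python
import random

def noisy_read(true_bit: int, p_correct: float) -> int:
    """One read of a bit through a channel that reports it correctly with probability p_correct."""
    return true_bit if random.random() < p_correct else 1 - true_bit

def majority_read(true_bit: int, p_correct: float, n_repeats: int) -> int:
    """Repeat the noisy read n_repeats times and take a majority vote."""
    ones = sum(noisy_read(true_bit, p_correct) for _ in range(n_repeats))
    return 1 if 2 * ones > n_repeats else 0

if __name__ == "__main__":
    # With a per-read accuracy of 55% (far more generous than the story's 50.001%),
    # ~500 repeats already make errors rare; the same trick works for any bias above 50%.
    trials = 1_000
    errors = sum(majority_read(1, 0.55, 501) != 1 for _ in range(trials))
    print(f"majority-vote error rate: {errors / trials:.3f}")
```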

The only catch is that for such a system to enable communication with Gramtopia, you need to have some existing particle arrangements that are already entangled with the real-life Gramtopia. At first, this is a devastating realization. But as you reflect on the problem later, you realize that your brain and your series of novels form a complex particle arrangement that is already entangled with Gramtopia. Moreover, as long as Gramtopia is within the observable universe, the particles that eventually ended up in both Earth and Gramtopia were once right next to each other and hence entangled.

Your series of novels only needs to get one detail right about Gramtopia: its scientists say that the helium atom is 20 grams. Every other detail may be varied. But even if the vast majority of the information is lost, so much of your version of Gramtopia redundantly encodes this concept that the core message should be preserved. You condense the message into as few bits as possible (64 bits) and write it on your sheet of paper. This is the message you hope to receive via quantum entanglement from Gramtopia.

The last step requires a high-quality source of randomness that can be entangled with the information represented in the novels you've written about Gramtopia—and hence, with Gramtopia itself. You enter a lucid dream and go into Gramtopia, telling them the plan. (Of course, at this point you can't tell whether you're just making Gramtopia up in your mind or not, but you press on.) They will use their most advanced technology to send a message that will perturb a random number generator on your living room computer in such a way that, when you repeat the measurements hundreds of times for each of your 64 bits, the average value of each bit, layered across all those repetitions, will give you back the message saying that a helium atom is 20 grams.

When you wake up, you start the experiment. It takes several weeks for you to get enough measurements for the randomness to begin morphing into order, but it works! Well, almost. One bit in the 64-bit message ends up being different than expected for some reason, but the other 63 bits work out. And with the system already in place, you teach yourself some programming to automate the process of layering the measurements so you can communicate more concisely with Gramtopia. A long and fruitful relationship begins.
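The same averaging applies bit by bit to the 64-bit message, and with any finite number of repeats there is some residual chance that a given bit still comes out wrong, much like the single flipped bit in the story. A rough illustration, again with an exaggerated per-read accuracy and hypothetical names of my own:

```python
import random

P_CORRECT = 0.55   # per-read accuracy (a stand-in for the story's 50.001%)
REPEATS = 451      # noisy reads per bit; tuned so roughly one bit in 64 still flips
MESSAGE_BITS = 64

def recover_bit(bit: int) -> int:
    """Average many noisy reads of one bit and threshold at 0.5."""
    reads = [bit if random.random() < P_CORRECT else 1 - bit for _ in range(REPEATS)]
    return 1 if sum(reads) / REPEATS > 0.5 else 0

if __name__ == "__main__":
    message = [random.randint(0, 1) for _ in range(MESSAGE_BITS)]
    recovered = [recover_bit(b) for b in message]
    flipped = sum(a != b for a, b in zip(message, recovered))
    print(f"bits recovered correctly: {MESSAGE_BITS - flipped} / {MESSAGE_BITS}")
```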

You rest in the knowledge that you've found a way to learn true information about the universe from the confines of your living room, by forcing the territory to fit the map and not the other way around.

Even if you started out believing the sheet of paper, it would not seem that you had any reason for why the paper corresponded to reality.  It would just be a miraculous coincidence that (a) the mass of a helium atom was 20 grams, and (b) the paper happened to say so.

Miraculous coincidence indeed. Or just a lot of ingenuity?

Obviously, I engaged in a lot of hand-waving, but I kind of had to. The reason the computer was entangled with Gramtopia is that the computer is entangled with your mind, which in turn is entangled with Gramtopia. The quantum communication device that the scientists use in Gramtopia works because they have independently been searching for a way to communicate with civilizations similar to them, and they decide that their first message should be to establish some simple common ground: they introduce their units, and then the masses of the elements in their units. It just so happens that the universe most similar to their expectations was your universe. It's a two-way street, you see?

Now, why is Eliezer still (generally) right when he says that you can't just write something down on a piece of paper and expect it to map reality? Because even though the map does change the territory a very tiny amount, the effect is so small that you can barely notice it in most cases, especially when you're trying to find the truth about something that is difficult to vary like the "mass" of a helium atom.

Obviously, I cheated in this story by having the "mass" refer to a social construct rather than sticking to our world's precise definition in terms of the kilogram, which is now defined by fixing the value of the Planck constant, a definition that ties the kilogram to the second and the meter. It would be much harder to create a story that postulated different laws of physics, since our universe appears to have the same laws of physics everywhere, at least if the spectra of distant stars are any indication.