Dath Ilani Rule of Law

Alice threatens Bob when Alice says that, if Bob performs some action X, then Alice will respond with action Y, where Y (a) harms Bob and (b) harms Alice. (If one wants to be “mathematical”, then one could say that each combination of actions is associated with a set of payoffs, and that “action Y harms Bob” == “[Bob’s payoff with Y] < [Bob’s payoff with not-Y]”.)

Note that the dath ilan "negotiation algorithm" arguably fits this definition of "threat":

If Alis and Bohob both do an equal amount of labor to gain a previously unclaimed resource worth 10 value-units, and Alis has to propose a division of the resource, and Bohob can either accept that division or say they both get nothing, and Alis proposes that Alis get 6 units and Bohob get 4 units, Bohob should accept this proposal with probability < 5⁄6 so Alis’s expected gain from this unfair policy is less than her gain from proposing the fair division of 5 units apiece.

This is because, with X = "proposes that Alis get 6 units and Bohob get 4 units" and Y = "accepts the proposal with probability < 5/6", if Alis performs X, then Y harms both Alis and Bohob relative to not-Y (accepting the proposal with probability 1).
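To make the arithmetic concrete, here is a minimal sketch of the payoffs in the quoted example. The function names are made up for illustration, and the assumption that a rejected proposal leaves both parties with nothing comes straight from the quote. It only checks the numbers; it is not meant as a statement of the actual dath ilani algorithm.

```python
from fractions import Fraction


def max_acceptance_probability(total, proposer_share):
    """Acceptance probability at which an unfair proposal pays the proposer
    exactly the fair share (half of total). Accepting with any probability
    strictly below this makes the unfair proposal worse for the proposer.
    Hypothetical helper, for illustration only."""
    fair_share = Fraction(total, 2)
    if proposer_share <= fair_share:
        return Fraction(1)  # a fair (or generous) proposal can always be accepted
    return fair_share / Fraction(proposer_share)


def expected_payoffs(total, proposer_share, accept_probability):
    """Expected (proposer, responder) payoffs; a rejection gives both nothing."""
    p = Fraction(accept_probability)
    return (p * proposer_share, p * (total - proposer_share))


# Alis proposes 6 for herself out of 10; the threshold is 5/6.
threshold = max_acceptance_probability(10, 6)

# Y: accept with some probability below 5/6, e.g. 4/5.
with_y = expected_payoffs(10, 6, Fraction(4, 5))

# not-Y: accept with probability 1.
without_y = expected_payoffs(10, 6, 1)

print(threshold, with_y, without_y)
```

With an acceptance probability of 4/5, Alis expects 24/5 (less than the 5 she would get from the fair split, and less than the 6 she gets under not-Y) and Bohob expects 16/5 (less than the 4 he gets under not-Y), so Y lowers both payoffs relative to not-Y, which is what the definition of "threat" above requires.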

So I'm guessing that Eliezer is using some definition of "threat" that refers to "fairness", such that "fair" actions do not count as threats according to his definition.

Ineffective Altruism

Evil happens when you are separated from the pain you inflict upon other people.

If only someone would invent a time machine so we can see what effects our actions have on the far future...

We communicated via SMS instead of AirBnb’s website because AirBnb’s website has an algorithm that scans our messages for keywords and punishes hosts it thinks did a poor job—regardless of the star rating a customer like me provides.

I was skeptical of this after reading (from one of your comment replies) that you only heard about this from the host, but some searching turned up a report in the NYT (confirming an original report in the WSJ) that's even worse:

Most recently, in April, The Journal’s Christopher Mims looked at a company called Sift, whose proprietary scoring system tracks 16,000 factors for companies like Airbnb and OkCupid. “Sift judges whether or not you can be trusted,” he wrote, “yet there’s no file with your name that it can produce upon request.”

As of this summer, though, Sift does have a file on you, which it can produce upon request. I got mine, and I found it shocking: More than 400 pages long, it contained all the messages I’d ever sent to hosts on Airbnb; years of Yelp delivery orders; a log of every time I’d opened the Coinbase app on my iPhone. Many entries included detailed information about the device I used to do these things, including my IP address at the time.

MIRI announces new "Death With Dignity" strategy

Similarly, AGI is a quite general technical problem. You don’t just need to make an AI that can do narrow task X, it has to work in cases Y and Z too, or it will fall over and fail to take over the world at some point. To do this you need to create very general analysis and engineering tools that generalize across these situations.

I don't think this is a valid argument. Counter-example: you could build an AGI by uploading a human brain onto an artificial substrate, and you don't "need to create very general analysis and engineering tools that generalize across these situations" to do this.

More realistically, it seems pretty plausible that all of the necessary patterns/rules/heuristics/algorithms/forms of reasoning necessary for "being generally intelligent" can be found in human culture, and ML can distill these elements of general intelligence into a (language or multimodal) model that will then be generally intelligent. This also doesn't seem to require very general analysis and engineering tools. What do you think of this possibility?

Judge Overturns Transportation Mask Mandate

Nope. A few minutes later, a flight attendant comes to my seat, with the gate agent following behind, to tell me that I have to put a surgical mask on top of my respirator. Seems like the gate agent must have come aboard specifically to ask the flight attendant to do this.

When I tell her that my respirator does not have a valve, and is allowed by airline policy, she says that she can't determine whether or not it has a valve, and asks me whether I'm willing to comply with her request. I say "yes" as I'm afraid of the consequences of saying no. They leave but I don't put on the mask immediately. I use my phone to try to find the airline mask policy page and a description of my respirator that says it does not have a valve.

Before I succeed in doing so, the flight attendant comes back along with a second flight attendant to ask me again to put on the surgical mask. I try to make my argument to the second flight attendant but he tells me the same thing, that he can't determine whether it has a valve or not. This time I do put on the mask as I don't want to cause an incident. I now look ridiculous and feel uncomfortable (as the ear loops of the surgical mask are stretched too tight over the respirator and hurt my ears).

So the plane takes off, and I sit there trying to figure out what to do. I think of a new argument, "if flight attendants aren't able or allowed to determine whether a respirator is valved or not, why does the mask policy specifically allow respirators without valves?" I remind myself to control my emotions and tone of voice. I find the airline mask policy page on my phone. After the flight finishes taking off and a third flight attendant walks by, I make my case to him again, and after a bit of back and forth, finally convince him to allow me to take off the surgical mask. Success! My faith in humanity is restored! /s

Judge Overturns Transportation Mask Mandate

True story: One day before this decision, I was boarding a plane wearing an elastomeric respirator without an exhalation valve, which I had checked ahead of time was specifically allowed by airline policy, but the gate agent told me my respirator wasn't allowed because it had exhalation valves. She apparently mistook the filter cartridges for valves, and said "we'll see what the flight attendant says" when I tried to point out they were filters, not valves, then let me pass. I boarded the plane and sat down without further incident... Anyone want to guess what happened afterwards?

[Link] A minimal viable product for alignment

The example of cryptography was mainly intended to make the point that humans are by default too credulous when it comes to informal arguments. But consider your statement:

It feels to me like there’s basically no question that recognizing good cryptosystems is easier than generating them.

Consider some cryptosystem widely considered to be secure, like AES. How much time did humanity spend on learning / figuring out how to recognize good cryptosystems (e.g. finding all the attacks one has to worry about, like differential cryptanalysis), versus specifically generating AES with the background knowledge in mind? Maybe the latter is on the order of 10% of the former?

Then consider that we don't actually know that AES is secure, because we don't know all the possible attacks and we don't know how to prove it secure, i.e., we don't know how to recognize a good cryptosystem. Suppose one day we figure that out, wouldn't finding an actually good cryptosystem be trivial at that point compared to all the previous effort?

Some of your other points are valid, I think, but cryptography is just easier than alignment (don't have time to say more as my flight is about to take off), and philosophy is perhaps a better analogy for the more general point.

[Link] A minimal viable product for alignment

If it turns out that evaluation of alignment proposals is not easier than generation, we’re in pretty big trouble because we’ll struggle to convince others that any good alignment proposals humans come up with are worth implementing.

But this is pretty likely the case, isn't it? Actually I think by default the situation will be the opposite: it will be too easy to convince others that some alignment proposal is worth implementing, because humans are in general too easily convinced by informal arguments that look good but contain hidden flaws (and formalizing the arguments is both very difficult and doesn't help much because you're still depending on informal arguments for why the formalized theoretical concepts correspond well enough to the pre-theoretical concepts that we actually care about). Look at the history of philosophy, or cryptography, if you doubt this.

But suppose we're able to convince people to distrust their intuitive sense of how good an argument is, and to keep looking for hidden flaws and counterarguments (which might have their own hidden flaws and so on). Well, how do we know when it's safe to end this process and actually hit the run button?

Ukraine Post #9: Again

My understanding is he basically had a strong thesis, with many details

Thanks, now I'm curious how he got into this epistemic state to begin with, especially how he determined 1 and 2 on your list. My current guess is that he focused too much on things that he could easily see and things that fit into his framework, like Putin being strategic and measured in the past, and Russia's explicit reform efforts, and neglected to think enough about other stuff, like corruption, supply problems, and Putin being fooled by his own command structure.

It's too bad that nobody in rationalist circles seems to have done much better than mainstream intelligence/geopolitical analysts (who were also too optimistic for Russia). Perhaps the best one could have done was to follow OS (open source) intelligence analysts on Twitter, who were at least quick to update once Russian under-performance became apparent, early in the war. But that unfortunately means we can't depend much on foresight for future geopolitical events.

[RETRACTED] It's time for EA leadership to pull the short-timelines fire alarm.

Yeah, don't do RL on it, but instead use it to make money for you (ethically) and at the same time ask it to think about how to create a safe/aligned superintelligent AGI. You may still need a big enough lead (to prevent others doing RL from outcompeting you) or global coordination, but it doesn't seem obviously impossible.

[RETRACTED] It's time for EA leadership to pull the short-timelines fire alarm.

No one knows how to build an AI system that accomplishes goals, that also is fine with you turning it off. Researchers have been trying for decades, with no success.

Given that it looks like (from your Elaboration) language models will form the cores of future AGIs, and human-like linguistic reasoning will be a big part of how they reason about goals (like in the "Long sequences of robot actions generated by internal dialogue" example), can't we just fine-tune the language model by training it on statements like "If (authorized) humans want to turn me off, I should turn off"?

Maybe we can even fine-tune it with statements describing our current moral beliefs/uncertainties and examples of moral/philosophical reasoning, and hope that AGI will learn morality from that, like human children (sometimes) do. Obviously it's very risky to take a black-box approach where we don't really understand what the AI has learned (I would much prefer if we could slow things down enough to work out a white-box approach), but it seems like there's maybe a 20% chance we can just get "lucky" this way?
