SurvivalBias — LessWrong

Can you define "utility" in utilitarianism without using words for specific human emotions?

No, they are not. Animals can feel e.g. happiness as well.

Yeah, but the problem here is that we perceive happiness in animals only inasmuch as it looks like our own happiness. Did you notice that the closer an animal is to a human, the more likely we are to agree it can feel emotions? An ape can definitely display something like human happiness, so we're pretty sure it can experience it. A dog can display something mostly like human happiness, so most likely it can feel it too. A lizard - meh, maybe but probably not. An insect, most people would say no. Maybe I'm wrong and there's an argument that animals can experience happiness which is not based on their similarity to us; in that case I'm very curious to see this argument.

Sentience

For the record, I believe we do have at least a crude mechanistic model of how consciousness works in general, and, yes, of what's going on with the hard problem of consciousness in particular (the latter being a bit of a wrong question).

Otherwise, I actually think it somewhat answers my question. My one qualm would be that sentience does seem to come on a spectrum - but that can in theory be addressed by some scaling factor. The bigger issue for me is that it implies that a hardcore total utilitarian would be fine with a future populated by trillions of sentient but otherwise completely alien AIs successfully achieving their alien goals (e.g. maximizing paperclips) and experiencing desirable-state-of-consciousness about it. But I think some hardcore utilitarians would bite this bullet, and that wouldn't be the biggest bullet for a utilitarian to bite either.

Can you define "utility" in utilitarianism without using words for specific human emotions?

SurvivalBias3y54

>Utility itself is an abstraction over the level of satisfaction of goals/preferences about the state of the universe for an entity.

You can say that a robot toy has a goal of following a light source. Or that a thermostat has a goal of keeping the room temperature at a certain setting. But I have yet to hear of anyone counting those things towards total utility calculations.

Of course a counterargument would be "but those are not actual goals, those are the goals of the humans who set them", but in this case you've just hidden all the references to humans into the word "goal" and are back to square 1.

Can you define "utility" in utilitarianism without using words for specific human emotions?

SurvivalBias3y10

So utility theory is a useful tool, but as far as I understand it's not directly used as a source of moral guidance (although I assume once you have some other source you can use utility theory to maximize it). Whereas utilitarianism as a metaethics school is concerned exactly with that, and you can hear people in EA talking about "maximizing utility" as the end in and of itself all the time. It was in this latter sense that I was asking.

AGI Ruin: A List of Lethalities

SurvivalBias3y20

To start off, I don't see much point in formally betting $20 on an event conditioned on something I assign <<50% probability of happening within the next 30 years (powerful AI is launched and fails catastrophically, and we're both still alive to settle the bet, and there is an unambiguous attribution of the failure to the AI). I mean sure, I can accept the bet, but largely because I don't believe it matters one way or another, so I don't think it counts from the epistemological virtue standpoint.

But I can state what I'd disagree with in your terms if I were to take it seriously, just to clarify my argument:

1. Sounds good.
2. Mostly sounds good, but I'd push back that "not actually running anything close to the dangerous limit" sounds like a win to me, even if theoretical research continues. One pretty straightforward Schelling point for a ban/moratorium on AGI research is "never train or run anything > X parameters", with X << dangerous level at then-current paradigm. It may be easier to explain to the public and politicians than many other potential limits, and this is important. It's much easier to control too - checking that nobody collects and uses a gigashitton of GPUs [without supervision] is easier than checking every researcher's laptop. Additionally, we'll have nuclear weapons tests as a precedent.
3. That's the core of my argument, really. If the consortium of 200 world experts says "this happened because your AI wasn't aligned, let's stop all AI research", then Facebook AI or China can tell the consortium to go fuck themselves, and I agree with your skepticism that it'd make all labs pause for even a month (see: gain of function research, covid). But if it becomes public knowledge that a catastrophe of 1mln casualties happened because of AI, then it can trigger a panic which will make both the world leaders and the public really honestly want to restrict this AI stuff, and it will both justify and enable the draconian measures required to make every lab actually stop the research. Similar to how panics about nuclear energy, terrorism and covid worked. I propose defining "public agreement" as "leaders of the relevant countries (defined as the countries housing the labs from p.1, so US, China, maybe UK and a couple of others) each issue a clear public statement saying that the catastrophe happened because of an unaligned AI". This is not an unreasonable ask; they were this unanimous about quite a few things, including vaccines.

AGI Ruin: A List of Lethalities

SurvivalBias3y2-1

What Steven Byrnes said, but also my reading is that 1) in the current paradigm it's near-damn-impossible to build such an AI without creating an unaligned AI in the process (how else do you gradient-descend your way into a book on aligned AIs?) and 2) if you do make an unaligned AI powerful enough to write such a textbook, it'll probably proceed to convert the entire mass of the universe into textbooks, or do something similarly incompatible with human life.

AGI Ruin: A List of Lethalities

SurvivalBias3y20

It might, given some luck and that all the pro-safety actors play their cards right. Assuming by "all labs" you mean "all labs developing AIs at or near the then-current limit of computational power", or something along those lines, and by "research" you mean "practical research", i.e. training and running models. The model I have in mind is not that everyone involved will intellectually agree that such research should be stopped, but that a large enough share of the public and governments will get scared and exert pressure on the labs. Consider how most of the world was able to (imperfectly) coordinate to slow Covid spread, or how nobody has prototyped a supersonic passenger jet in decades, or, again, nuclear energy - we as a species can do such things in principle, even though often for the wrong reasons.

I'm not informed enough to give meaningful probabilities on this, but to honor the tradition, I'd say that given a catastrophe with immediate, graphic death toll >=1mln happening in or near the developed world, I'd estimate >75% probability that ~all seriously dangerous activity will be stopped for at least a month, and >50% that it'll be stopped for at least a year. With the caveat that the catastrophe was unambiguously attributed to the AI, think "Fukushima was a nuclear explosion", not "Covid maybe sorta kinda plausibly escaped from the lab but well who knows".

AGI Ruin: A List of Lethalities

SurvivalBias3y2-2

The important difference is that nuclear weapons are destructive because they worked exactly as intended, and the AI in this scenario is destructive because it failed horrendously. Plus, the concept of rogue AI has been firmly ingrained into public consciousness by now, which afaik was not the case with extremely destructive weapons in the 1940s ^[1]. So hopefully this will produce more public outrage (and scare among the elites themselves) => stricter external and internal limitations on all agents developing AIs. But in the end I agree, it'll only buy time, maybe a few decades if we are lucky, to solve the problem properly or to build more sane political institutions.

^[1] Yes, I'm sure there was a scifi novel or two before 1945 describing bombs of immense power. But I don't think it was anywhere near as widely known as The Matrix or The Terminator.

AGI Ruin: A List of Lethalities

SurvivalBias3y10

How possible is it that a misaligned, narrowly-superhuman AI is launched, fails catastrophically with casualties in the 10^4 - 10^9 range, and the [remainder of] humanity is "scared straight" and from that moment onward treats the AI technology the way we treat nuclear technology now - i.e. effectively strangles it into stagnation with regulations - or even more conservatively? From my naive perspective it is somewhat plausible politically, based on the only example of ~world-destroying technology that we have today. And this list of arguments doesn't seem to rule out this possibility. Is there an independent argument by EY as to why this is not plausible technologically? I.e., why AIs narrow/weak enough to not be inevitably world-destroying but powerful enough to fail catastrophically are unlikely to be developed [soon enough]?

(To be clear, the above scenario is nothing like a path to victory and I'm not claiming it's very likely. More like a tiny remaining possibility for our world to survive.)

Why rationalists are not much concerned about mortality?

SurvivalBias4y10

Yes and no. 1-6 are obviously necessary but not sufficient - there's much more to diet and exercise than "not too much" and "some" respectively. 7 and 8 are kinda minor and of dubious utility except in some narrow circumstances, so whatever. And 9 and 10 are hotly debated and that's exactly what you'd need rationality for, as well as figuring out the right pattern of diet and exercise. And I mean right for each individual person, not in general, and the same with supplements - a 60-year-old should have much higher tolerance for potential risks of a longevity treatment than a 25yo, since the latter has less to gain and more to lose.

Why rationalists are not much concerned about mortality?

SurvivalBias4y10

I would be very surprised if inflammation or loss of proteostasis did not have any effect on fascia, if only because they have a negative effect on ~everything. But more importantly, I don't think there's any significant number of people dying from fascia stiffness? That's one of the main ideas behind the hallmarks of aging, that you don't have to solve the entire problem in its every minuscule aspect at once. If you could just forestall all these hallmarks or even just some of them, you could probably increase lifespan and healthspan significantly, thus buying more time to fix other problems (or develop completely new approaches like mind uploading or regenerative medicine or whatever else).
