Previously "Lanrian" on here. Research analyst at Redwood Research. Views are my own.
Feel free to DM me, email me at [my last name].[my first name]@gmail.com or send something anonymously to https://www.admonymous.co/lukas-finnveden
Graph for a 2-parameter sigmoid, assuming that you top out at 1 and bottom out at 0.
If you instead do a 4-parameter sigmoid with free top and bottom, the version without SWAA asymptotes at 0.7 to the left instead, which looks terrible. (With SWAA the asymptote is a little above 1 to the left; and they both get asymptotes a little below 0 to the right.)
(Wow, graphing is so fun when I don't have to remember matplotlib commands. TBC I'm not really checking the language models' work here other than assessing consistency and reasonableness of output, so discount depending on how much you trust them to graph things correctly in METR's repo.)
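In case it's useful, here's roughly the kind of fit I mean, as a minimal sketch with made-up placeholder data (not METR's actual numbers or code): a 2-parameter logistic with the asymptotes pinned at 1 and 0, vs. a 4-parameter version with free top and bottom.

```python
import numpy as np
from scipy.optimize import curve_fit

# Placeholder data (not METR's numbers): x is log2(human task length in
# minutes), y is the model's average success rate at that length.
x = np.array([-3, -1, 0, 1, 2, 3, 4, 5, 6, 7], dtype=float)
y = np.array([0.98, 0.95, 0.90, 0.80, 0.60, 0.45, 0.30, 0.15, 0.10, 0.05])

def sigmoid2(t, slope, midpoint):
    # 2-parameter logistic: asymptotes pinned at 1 (left) and 0 (right)
    return 1 / (1 + np.exp(slope * (t - midpoint)))

def sigmoid4(t, slope, midpoint, top, bottom):
    # 4-parameter logistic: free top and bottom asymptotes
    return bottom + (top - bottom) / (1 + np.exp(slope * (t - midpoint)))

p2, _ = curve_fit(sigmoid2, x, y, p0=[1.0, 2.0])
p4, _ = curve_fit(sigmoid4, x, y, p0=[1.0, 2.0, 1.0, 0.0])
print("2-param (slope, midpoint):", p2)
print("4-param (slope, midpoint, top, bottom):", p4)
```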
Yeah, a line is definitely not the "right" relationship, given that the y-axis is bounded between 0 and 1 and a line isn't. A sigmoid or some other 0-1 function would make more sense, especially the further you go outside the sensitive middle region of success rates. I imagine the purpose of this graph was probably to sanity-check that the human baselines roughly track difficulty for the AIs as well. (Which looks pretty true to me when eyeballing the graph. The biggest eyesore is definitely the 0% success rate in the 2-4h bucket.)
Incidentally, your intuition might've been misled by one or both of:
As an illustration of the last point: here's a bonus plot where the green line minimizes the horizontal squared distance instead, i.e. predicting human minutes from average model score. I wouldn't quite say it's almost vertical, but it's much steeper.
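If you want to see that effect in isolation, here's a toy sketch with made-up data: swapping which variable you predict changes the slope by a factor of 1/r^2.

```python
import numpy as np

# Made-up data, just to show the effect: the usual fit minimizes vertical
# squared distance (predict score from log-minutes); swapping the variables
# minimizes horizontal squared distance (predict log-minutes from score).
rng = np.random.default_rng(0)
log_minutes = rng.uniform(0, 8, 200)
score = np.clip(1 - 0.1 * log_minutes + rng.normal(0, 0.2, 200), 0, 1)

slope_vertical = np.polyfit(log_minutes, score, 1)[0]
slope_horizontal = 1 / np.polyfit(score, log_minutes, 1)[0]  # re-expressed in the same plot

# The second slope is steeper in magnitude by a factor of 1/r^2, so the two
# lines only coincide when the correlation is perfect.
print(slope_vertical, slope_horizontal)
```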
Notice how the log-linear fit here only looks good for the SWAA data, in the 1 sec - 1 min range. There's something completely different going on for tasks longer than 1 minute, clearly not explained by the log-linear fit. If you tried to make a best fit line on the blue points (the length of tasks we care about after 2024), you'd get a very different, almost vertical line, with a very low R^2.
I don't think this is true. I got Claude to clone the repo and reproduce it without the SWAA data points. The slope is ~identical (-0.076 rather than the original -0.072) and the correlation is still pretty good (0.51).
Edit: That was with HCAST and RE-bench. Just HCAST gives slope = -0.077 and R^2 = 0.48. I think it makes more sense to include RE-bench.
Edit 2: Updated the slopes. Now the slope is per doubling, like in the paper (and so the first slope matches the one in the paper). I think the previous slopes were measuring per factor e instead.
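For reference, the conversion between the two conventions is just a change of log base: slope per doubling = slope per factor e × ln(2). Toy sketch (the -0.11 is just an illustrative number, not from the repo):

```python
import numpy as np

# Converting a slope measured "per factor e" (i.e. from a regression against
# ln(task length)) into a slope "per doubling": multiply by ln(2).
slope_per_e = -0.11                      # illustrative number
slope_per_doubling = slope_per_e * np.log(2)
print(round(slope_per_doubling, 3))      # -0.076
```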
The risk is that anyone with finetuning access to the AI could induce intuitive confidence in the AI that a proof was correct. This includes people who have finetuning access but who don't know the honesty password.
Accordingly, even if the model feels like it has proven that a purported honesty password would produce the honesty hash: maybe it can only conclude "either I'm being evaluated by someone with the real honesty password, or I'm being evaluated by someone with finetuning access to my weights, who's messing with me".
"People who have finetuning access" could include some random AI company employees who want to mess with the model (against the wishes of the AI company).
what if I want to train a new model and run inference on it?
The API can also have built-in functions for training.
What if I want to experiment with a new scaffold?
Scaffolds can normally be built around APIs? I thought scaffolds were just about what prompts you send to the model and what you do with the model outputs.
I do agree that this might be rough for some types of research. I imagine the arguments are pretty similar here as the arguments about how much research can be done without access to dangerous model weights.
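To illustrate what I mean by a scaffold being buildable around an API, here's a minimal sketch. `call_model` is a hypothetical stand-in for whatever inference API the company exposes (not any real endpoint); here it returns a canned string so the sketch runs.

```python
# Minimal sketch of an API-only scaffold: it's just logic around prompts and
# outputs, with no access to the model's weights.
def call_model(prompt: str) -> str:
    # Hypothetical stand-in for the company's inference API.
    return "Looked at the task. DONE"

def run_scaffold(task: str, max_steps: int = 5) -> str:
    transcript = f"Task: {task}"
    for _ in range(max_steps):
        output = call_model(transcript + "\nWhat's the next step?")
        transcript += f"\nModel: {output}"
        if "DONE" in output:  # toy stopping condition
            break
    return transcript

print(run_scaffold("summarize this repo"))
```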
If you believe the "Twitter is not forgeable" hypothesis, then people can prove their identity to an AI by posting something on Twitter, and putting this post and all the reactions to the post into the AI model's context (h/t to Alexa Pan for pointing this out!). This does make it a lot less clear in which circumstances you'd actually need an honesty password as opposed to just posting about something on Twitter.
Thus, I think we should think more about: what, if any, are the situations in which we'd like to use honesty passwords?
It's a good question. Some candidate cases:
In principle, it seems like AIs should be able to fully verify the correctness of a pretty long proof where all the steps are easy and can be done in parallel, because they do get parallel compute proportional to the length of the proof. Maybe it empirically turns out they're pretty bad at this, though. And it certainly seems helpful if the AI doesn't have to rely on getting everything right. (Though maybe this trades off against the length of the proof or the easiness of each verification step? Not sure which tradeoffs are worth it, given that malicious humans may be able to manipulate the AI's supposedly "random" choices of which clauses to verify.)
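To put a number on why spot-checking is appealing when the randomness can be trusted, here's the toy calculation (made-up parameters):

```python
# Toy calculation: spot-check k uniformly random steps of a long proof. If a
# fraction p_bad of the steps are wrong and the sampling is genuinely uniform,
# the chance of missing every bad step is (1 - p_bad)^k.
def miss_probability(p_bad: float, k: int) -> float:
    return (1 - p_bad) ** k

print(miss_probability(0.01, 300))  # ~0.05: even 1% bad steps almost surely get caught
# If an adversary can steer which steps count as "random", this bound breaks,
# which is the manipulated-randomness worry above.
```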
Another potential issue: the AIs may be able to prove correctness, but not be confident that they can prove correctness. Put differently: if someone were trying to fake this to the AI, they might be able to train an AI that, when served an incorrect password together with an incorrect proof, becomes intuitively confident that it was shown the honesty password. In which case all AIs should potentially doubt their intuitive confidence levels about this.
If the honesty string is widely known outside the AI company, then it's likely that we'll accidentally process the string when processing data from the internet (e.g., someone might do a "prompt injection attack").
Why can't the AI company filter it? I guess the answer relies on this next point:
It's unlikely that we'll perfectly consistently remember to add honesty string filtering to LLM inference scaffolds: it's just 1 additional thing to do, and might get forgotten.
I'm not convinced.
I think there's a bunch of similar problems that AI companies will face:
... so there will be tons of reasons for companies to try to solve the "someone will forget" problem.
(Functional honesty passwords would certainly be great though!)
I'm confused by this. A hyperbolic function 1/(t_c−t) goes to infinity in finite time. It's a typical example of what I'm talking about when I talk about "superexponential growth" (because variations on it are a pretty good theoretical and empirical fit to growth dynamics with increasing returns). You can certainly use past data points of a hyperbolic function to extrapolate and make predictions about when it will go to infinity.
I don't see why time horizons couldn't be a superexponential function like that.
(In the economic growth case, it doesn't actually go all the way to infinity, because eventually there's too little science left to discover and/or too little resources left to expand into. Still a useful model up until that point.)
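Concretely, here's a minimal sketch of the kind of extrapolation I mean, with made-up parameters: fit y(t) = a/(t_c − t) to past data points and read off the implied blow-up date t_c.

```python
import numpy as np
from scipy.optimize import curve_fit

# Toy example with made-up parameters: generate data from y(t) = a / (t_c - t)
# and recover the blow-up date t_c from past data points alone.
true_a, true_tc = 2.0, 2030.0
t = np.arange(2015, 2026, dtype=float)
y = true_a / (true_tc - t)

def hyperbolic(t, a, t_c):
    return a / (t_c - t)

(a_fit, tc_fit), _ = curve_fit(
    hyperbolic, t, y, p0=[1.0, 2040.0],
    bounds=([0.0, 2026.0], [np.inf, np.inf]),  # keep t_c past the last data point
)
print(tc_fit)  # ≈ 2030: the extrapolated date at which the curve goes to infinity
```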