jbash

Comments

Leaving Orbit

Is it necessary to come up with a two-word phrase that won't mean anything to anybody who hasn't had it explicitly taught to them? Why not say something like "hey, I'm bowing out of this conversation now, but it's not intended to be any sort of reflection on you or the topic, I'm not making a statement, I'm just doing what's good for me and that's all"?

Although honestly that literal text sounds really passive-aggressive, and I would read it to mean "you guys are an annoying waste of time, you will never get anywhere, and I have better things to do". And I suspect I would start to attach that same meaning to any code phrase, regardless of what people claimed it was supposed to mean. Especially since this isn't a temporally constrained CFAR workshop where everybody is briefed on the way in the door.

Also, I think that even talking about either using a code phrase or spelling it out inevitably pushes toward that being a norm. Online discussion in general already has a very effective, functional norm that, unless you've made some explicit commitment, you can just disappear at any time, without any implications about why. Why mess with it?

On edit: I can't believe I missed what Dagon said. That.

Comment on "Deception as Cooperation"

If you haven't already been built to believe that lying is bad, there's nothing to object to: the agency is just doing straightforwardly correct consequentialist optimization of the information channel between states of the world and actions.

Except that by doing it, the agency blows its credibility and loses some of its influence over whatever happens when it sends the next signal.

It's a dumb strategy even in the context of a single pandemic, let alone a world where you will have to deal with other pandemics, with later political negotiations outside of pandemics, and whatever else. It only looks good in an artificially constrained model.

... and that's where the whole concept of "lying" comes in. "Lying" is what you do when you try to send signals that cause others to act in ways that favor your interests over their own, and thereby induce them to invoke their own power of "withholding actions" in future rounds. And it's frowned upon because, in the long-term, open-ended, indefinitely-iterated, unpredictable, non-toy-model "game" of the real world, it tends to reduce both total utility and individual utility in the long run. To the point where it becomes valuable to punish it.

What would we do if alignment were futile?

Point taken. Although, to be honest, I don't think I can tell what would or would not be conscious anyway, and I haven't heard anything to convince me that anybody else can either.

... and probably I shouldn't have answered the headline question while ignoring the text's points about delay and prevention...

What would we do if alignment were futile?

Try to come up with the best possible "unaligned" AGI. It's better to be eaten by something that then goes out and explores a broad range of action in an interesting way, especially if you can arrange that it enjoy it, than it is to be eaten by Clippy.

Discussion with Eliezer Yudkowsky on AGI interventions

I think you mean something like 'since the mainline looks doomy, try to increase P(success) in non-mainline scenarios'.

Yes, that's basically right.

I didn't bring up the "main line", and I thought I was doing a pretty credible job of following the metaphor.

Take a simplified model where a final outcome can only be "good" (mostly doom-free) or "bad" (very rich in doom). There will be a single "winning" AGI, which will simply be the first to cross some threshold of capability. This cannot be permanently avoided. The winning AGI will completely determine whether the outcome is good or bad. We'll call a friendly-aligned-safe-or-whatever AGI that creates a good outcome a "good AGI", and one that creates a bad outcome a "bad AGI". A randomly chosen AGI will be bad with probability 0.999.

You want to influence the creation of the winning AGI to make sure it's a good one. You have certain finite resources to apply to that: time, attention, intelligence, influence, money, whatever.

Suppose that you think that there's a 0.75 probability that something more or less like current ML systems will win (that's the "ML timeline" and presumptively the "main line"). Unfortunately, you also believe that there's only 0.05 probability that there's any path at all to find a way for an AGI with an "ML architecture" to be good, within whatever time it takes for ML to win (probably there's some correlation between how long it takes ML to win and how long it takes to figure out how to make it good). Again, that's the probability that it's possible in the abstract to invent good ML in the available time, not the probability that it will actually be invented and get deployed.

Contingent on the ML-based approach winning, and assuming you don't do anything yourself, you think there's maybe a 0.01 probability that somebody else will actually arrange for the winning AGI to be good. You're damned good, so if you dump all of your attention and resources into it, you can double that to 0.02 even though lots of other people are working on ML safety. So you would contribute 0.01 times 0.75 or 0.0075 probability to a good outcome. Or at least you hope you would; you do not at this time have any idea how to actually go about it.

Now suppose that there's some other AGI approach, call it X. X could also be a family of approaches. You think that X has, say, 0.1 probability of actually winning instead of ML (which leaves 0.15 for outcomes that are neither X nor ML). But you think that X is more tractable than ML; there's a 0.75 probability that X can in principle be made good before it wins.

Contingent on X winning, there's a 0.1 probability that somebody else will arrange for X to be good without you. But at the moment everybody is working on ML, which gives you runway to work on X before capability on the X track starts to rise. So with all of your resources, you could really increase the overall attention being paid to X, and raise that to 0.3. You would then have contributed 0.2 times 0.1 or 0.02 probability to a good outcome. And you have at least a vague idea how to make progress on the problem, which is going to be good for your morale.

Or maybe there's a Y that only has a 0.05 probability of winning, but you have some nifty and unique idea that you think has a 0.9 probability of making Y good, so you can get nearly 0.045 even though Y is itself an unlikely winner.
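Here's a minimal sketch of that arithmetic in Python, just to make the comparison explicit. The numbers are the illustrative ones above, and for Y I'm assuming the baseline chance of somebody else making it good is roughly zero; "contribution" means the increase in P(good outcome) attributable to your effort.

```python
def contribution(p_win, p_good_without_you, p_good_with_you):
    """Increase in P(good outcome) from working on one track:
    P(this track wins) * (P(good | you work on it) - P(good | you don't))."""
    return p_win * (p_good_with_you - p_good_without_you)

# ML track: 0.75 chance of winning; your effort raises P(someone makes it good)
# from 0.01 to 0.02 (illustrative numbers from above).
ml = contribution(0.75, 0.01, 0.02)   # 0.0075

# X track: 0.10 chance of winning; you raise P(good) from 0.1 to 0.3.
x = contribution(0.10, 0.10, 0.30)    # 0.02

# Y track: 0.05 chance of winning; your nifty idea takes P(good) from ~0 to 0.9
# (assumed ~0 baseline, which is not stated above).
y = contribution(0.05, 0.0, 0.9)      # 0.045

print(ml, x, y)  # the less-likely-to-win tracks still dominate on expected impact
```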

Obviously these are sensitive to the particular probabilities you assign, and I am not really very well placed to assign such probabilities, but my intuition is that there are going to be productive Xs and Ys out there.

I may be biased by the fact that, to whatever degree I can assign probabilities, I think that ML's probability of winning, in the very Manichean sense I've set up here, where it remakes the whole world, is more like 0.25 than 0.75. But even if it's 0.75, which I suspect is closer to what Eliezer thinks (and would be most of his "0.85 by 2070"), ML is still handicapped by there not being any obvious way to apply resources to it.

Sure, you can split your resources. And that might make sense if there's low hanging fruit on one or more branches. But I didn't see anything in that transcript that suggested doing that. And you would still want to put the most resources on the most productive paths, rather than concentrating on a moon shot to fix ML when that doesn't seem doable.

Discussion with Eliezer Yudkowsky on AGI interventions

Nuclear war was just an off-the-top example meant to illustrate how far you might want to go. And I did admit that it would probably basically be a delaying tactic.

If I thought ML was as likely to "go X-risk" as Eliezer seems to, then I personally would want to go for the "grab probability on timelines other than what you think of as the main one" approach, not the "nuclear war" approach. And obviously I wouldn't treat nuclear war as the first option for flipping tables... but just as obviously I can't come up with a better way to flip tables off the top of my head.

If you did the nuclear war right, you might get hundreds or thousands of years of delay, with about the same probability [edit: meant to say "higher probability" but still indicate that it was low in absolute terms] that I (and I think Eliezer) give to your being able to control[1] ML-based AGI. That's not nothing. But the real point is that if you don't think there's a way to "flip tables", then you're better off just conceding the "main line" and trying to save other possibilities, even if they're much less probable.


  1. I don't like the word "alignment". It admits too many dangerous associations and interpretations. It doesn't require them, but I think it carries a risk of distorting one's thoughts. ↩︎

Against the idea that physical limits are set in stone

To suggest that our understanding of physics will always keep changing to let us escape whatever limits appear to exist at any given time is to suggest that there are no (knowable) true laws of physics. I think that's kind of a big conclusion to reach by extrapolation, especially on a relatively short history.

Discussion with Eliezer Yudkowsky on AGI interventions

OK, so it sounds like Eliezer is saying that all of the following are very probable:

  1. ML, mostly as presently practiced, can produce powerful, dangerous AGI much sooner than any other approach.

    • The number of technical innovations needed is limited, and they're mostly relatively easy to think of.

    • Once those innovations get to a sort of base-level AGI, it can be scaled up to catastrophic levels by throwing computing power at it.

    • ML-based AGI isn't the "best" approach, and it may or may not be able to FOOM, but it will still have the ability and motivation to kill everybody or worse.

  2. There's no known way to make that kind of AGI behave well, and no promising approaches.

    • Lack of interpretability is a major cause of this.

    • There is a very low probability of anybody solving this problem before badly-behaved AGI has been created and has taken over.

  3. Nonetheless, the main hope is still to try to build ML-based AGI which--

    • Behaves well

    • Is capable of either preventing other ML-based AGI from being created, or preventing it from behaving badly. Or at least of helping you to do those things.

    I would think this would require a really major lead. Even finding other projects could be hard.

  4. (3) has to be done while keeping the methods secret, because otherwise somebody will copy the intelligence part, or copy the whole thing before its behavior is under control, maybe add some minor tweaks of their own, and crank the scale knob to kill everybody.

    • Corollary: this almost impossible system has to be built by a small group, or by a set of small "cells" with very limited communication. Years-long secrets are hard.

    • That group or cell system will have to compete with much larger, less constrained open efforts that are solving an easier problem. A vastly easier problem under assumptions (1) and (2).

    • A bunch of resources will necessarily get drained away from the main technical goal, toward managing that arms race.

  5. (3) is almost impossible anyway and is made even harder by (4). Therefore we are almost certainly screwed.

Well. Um.

If that's really the situation, then clinging to (3), at least as a primary approach, seems like a very bad strategy. "Dying gracefully" is not satisfying. It seems to me that--

  1. People who don't want to die on the main line should be doing, or planning, something other than trying to win an AGI race... like say flipping tables and trying to foment nuclear war or something.

    That's not likely to work, and it might only be a delaying tactic, but it still seems like a better, more achievable option than trying to control ultra-smart ML when you have no idea at all how to do that. If you can't win, change the game.

    and/or

  2. People who feel forced to accept dying on the main line ought to be putting their resources into approaches that will work on other timelines, like say if ML turns out not to be able to get powerful enough to cause true doom.

    If ML is really fated to win the race so soon and in such a total way, then people almost certainly can't change the main line by rushing to bolt on a safety module. They might, however, be able to significantly change less probable timelines by doing something else involving other methods of getting to AGI. And the overall probability mass they can add to survival that way seems a lot larger.

    The main line is already lost, and it's time to try to salvage what you can.

Personally, I don't think I see that you can turn an ML system that has say 50-to-250 percent of a human's intelligence into an existential threat just by pushing the "turbo" button on the hardware. Which means that I'm kind of hoping nobody goes the "nuclear war" route in real life.

I suspect that anything that gets to that point using ML will already be using a significant fraction of the compute available in the world. Being some small multiple of as good as a human isn't going to let you build more or better hardware all that fast, especially without getting yourself shut down. And I don't think you can invent working nanotech without spending a bunch of time in the lab unless you are already basically a god. Doing that in your head is definitely a post-FOOM project.

But I am far from an expert. If I'm wrong, which I very well could be, then it seems crazy to be trying to save the "main line" by trying to do something you don't even have an approach for, when all that has to happen for you to fail is for somebody else to push that "turbo" button. That feels like following some script for heroically refusing to concede anything, instead of actually trying to grab all the probability you can realistically get.

Resurrecting all humans ever lived as a technical problem

It's stuff like this that makes me glad this sort of thing is physically impossible.

unless we also repair their will to live.

There are two ways to look at that. Either you are not resurrecting the target, but instead creating somebody who is "off by one" from the target... or you are engaging in nonconsensual mind control.

From where I am standing, resurrecting somebody who is "no different except that they want to live" looks very much like resurrecting somebody who is "no different except that they want to have sex with you", or "no different except that they love the dictator".

Can we possibly agree that mind control is not OK?

Can't a person even die without being harassed by busybodies?

While you're at it, don't go around simulating/constructing all possible human minds, without regard for their individual preferences. It would be rude. And twisted caricatures, with weird desires to be there inserted into otherwise incompatible mind states, would be pretty fucking creepy, too.

Converting them into decent people will be another huge problem.

No, sorry, that would not be OK, either.

An Unexpected Victory: Container Stacking at the Port of Long Beach

A wild idea that I haven’t heard proposed, which won’t solve our short term problems but does seem like a good idea, is how about we create a new port? If I asked you exactly where not to try and hire a bunch of people and especially not to drive a truck away from efficiently, and also to not try to expand into more space and capacity for the future, in all the Western part of the land, I’m pretty sure that my two answers would have been Los Angeles and San Francisco. Not that they are bad places for ports, but they’re where all the people and high prices and land scarcity and traffic are right now.

Um, I don't really know for sure about Los Angeles, but the whole city of San Francisco is there because it's geographically a really good place for a port. And I would suspect that LA is the same way. Harbors create big cities. And then huge road and rail networks get built to serve them... over decades and centuries.

Even if a big container port magically materialized in Morro Bay (and I have no idea whether that's a suitable place from the ocean's point of view), there'd be no way to move the freight to or from it. But in fact you'd have to start by creating the infrastructure to move the material to build the port itself.

I mean, if you have a magic wand, then instead of moving ports that require massive infrastructure because people have cluttered up the areas with a bunch of software businesses that could operate anywhere, why not just relocate the software businesses?
