AgentME — LessWrong

Regardless of society's checks on people, most mentally-well humans given ultimate power probably wouldn't decide to exterminate the rest of humanity so they could single-mindedly pursue paperclip production. If there's at all a risk that an AI might get ultimate power, it would be very nice to make sure the AI is like humans in this manner.
I'm not sure your idea is different from "let's make sure the AI doesn't gain power greater than society". If an AI can recursively self-improve, then it will outsmart us to gain power.
If your idea is to make it so there are multiple AIs created together, engineered somehow so they gain power together and can act as checks against each other, then you've just swapped out the AI for an "AI collective". We would still want to engineer or verify that the AI collective is aligned with us; every issue about AI risk still applies to AI collectives. (If you think the AI collective will be weakened relative to us by having to work together, then does that still hold true if all the AIs self-improve and figure out how to get much better at cooperating?)

I think you misunderstand EY if you think he believes that morality and values are objective. If they were, then alignment would be easy because as long as the AI was smart enough, it could be depended on to figure out the "correct" morality and values. The common values that humanity shares are probably in part arbitrary evolutionary accidents. The goal is to create AI with values that allow humanity to live by its values, instead of creating an AI with non-overlapping values caused by its own design accidents. (EY's article Sorting pebbles into correct heaps implies some of these ideas.)

Some of the people you believe are dead are actually alive, but no matter how hard they try to get other people to notice them, their actions are immediately forgotten and any changes caused by those actions are rationalized away.

You awkwardly explain in response that you do know that the homeless person who asked you for change earlier, and whom you ignored, was alive. The AI then explains that it meant the part of your mind that makes moral judgements was in denial, not the verbal part of your mind that has conversations.

The AI further explains that another thing you're in absolute denial of is how compartmentalized your mind is and how you think your mind's verbal center is in charge of things more than it is.

That would make the AI an example of an optimization daemon. Clearly your creators haven't ironed out AI alignment quite yet.

Unlike them, our terminal value seems to include seeking the feeling that we're personally contributing. (A magic box that understood our terminal values and would tell us how to solve our problems in order to maximize our values would probably phrase its answer with some open parts in a way that still made us feel like we had agency in executing the answer.)

Not saying this just because I disagree with Flon's Law, but I found the use of Flon's Law to argue against Modest Epistemology very distracting in the article, partly because the argument that all programming languages are inherently equally easy to mess up in seems like a very typical example of Modest Epistemology. (We imagine there are people with beliefs X1, X2, X3, ..., Xn, each of the form "I believe Pi is the best language". Throwing out all the specifics, we must accept that they're all equally negligibly correct.)

Probability theory and decision theory shouldn’t deliver clearly wrong answers. [...] But if we’re just dealing with verbal injunctions for humans, where there are degrees of freedom, then there is nothing we can say that a hypothetical crackpot could not somehow misuse.

It's funny that Flon's Law is used to support the bit leading up to this, because it's almost exactly what I'd say to argue against Flon's Law: Some programming languages encourage the writer by default to structure their ideas in ways that certain properties can be automatically enforced or checked in a mathematical way from the structure, and dynamic untyped languages are instead more like arbitrary verbal reasoning that isn't rigorous enough for any properties to be proven from the structure itself. Sure, it's technically possible to make nonsense in any programming language, but you have to try harder in some, in the same way you have to try harder to make diagonals with legos than plain blocks, or be a little clever to make a false math proof that looks right on the surface while in verbal reasoning you can say something that sounds right but is wrong just by using a word twice while relying on different meanings in each use.
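To make the "using a word twice with different meanings" point concrete, here's a rough sketch in Python. The names (`Meters`, `Feet`, `total_length`) are made up for illustration, and I'm assuming a static checker such as mypy is run over the file. Without a checker, the equivocating call runs without complaint, much like a verbal argument that sounds right. With one, the structure of the program is enough to reject it.

```python
from typing import NewType

# Two distinct "meanings" of a bare number. At runtime both are plain floats;
# the distinction exists only for a static type checker.
Meters = NewType("Meters", float)
Feet = NewType("Feet", float)


def total_length(a: Meters, b: Meters) -> Meters:
    """Add two lengths that are both claimed to be in meters."""
    return Meters(a + b)


# Consistent use of the word "length": fine at runtime and for a checker.
ok = total_length(Meters(3.0), Meters(4.0))

# Equivocation: the second "length" is in feet. This still runs, because
# nothing is enforced at runtime, but a static checker is expected to flag
# the second argument as incompatible with Meters.
equivocated = total_length(Meters(3.0), Feet(4.0))  # type: ignore[arg-type]

print(ok, equivocated)
```

Both calls return the same number at runtime, which is the problem: only the checked version notices that the second one is nonsense.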

I get the logic the article is going for in using Flon's Law -- that it's trying to make a parallel between fancy programming languages and flavors of verbal reasoning (Modest Epistemology) that claim to be able to solve problems from their structure without engaging with the content -- but then the article goes on to talk about how the specifics of math are actually better than verbal reasoning like Modest Epistemology, and it's extremely confusing to read as someone who perceives the correct groupings as {math, fancy programming languages with provable properties} and {verbal reasoning, dynamic untyped programming languages}, which is the very division that Flon's Law argues against being useful.

(Huh, this really wasn't intended to be my thesis on Flon's Law, but I guess it is now. I just meant to nitpick the choice of metaphor and argue that Flon's Law is at the very least an ambiguously bad example to use.)

(Updated link: The Simple Truth)

This post caused me to read up on UD+ASSA, which helped me make sense of some ideas that were bouncing around in my head for a long time. Hopefully my thoughts on it make sense to others here.

against UD+ASSA, part 1 (9/26/2007) [bet on d10 rolling a zero or not-zero, but you'll be copied 91 times if it lands on zero...]

I think under UD+ASSA, having exact copies made doesn't necessarily increase your measure, which would mostly sidestep this problem. But I think it's still conceptually possible to have situations under UD+ASSA that increase one's measure, so the rest of my post here assumes that the madman copies you in some kind of measure-increasing way rather than a measure-splitting way.

This scenario doesn't seem like a contradiction with UD+ASSA if you believe that the probability that 0 is a good answer to pre-commit to, judged by the outcome (10%), does not need to equal the subjective probability that you will see 0 as the answer (91%). The fact that the subjective probability doesn't line up with the way that you should answer in order to get a certain outcome doesn't need to mean that the subjective probability doesn't exist or is invalid. The chance that 0 is a good answer to pre-commit to (10%) is equal to the madman's and your family's subjective probability that 0 ends up being the answer (10%). I think Quantum Mechanics and maybe also the Anthropic Trilemma imply that different people can have different subjective probabilities and have different proportions of their measure go to different results, and UD+ASSA seems to be compatible with that in my understanding.
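For anyone who wants the 10% and 91% figures spelled out, here's a small sketch of the arithmetic. It assumes the measure-increasing reading of copying (each of the 91 copies counts as a full unit of measure), and the function name is just something I made up for illustration.

```python
from fractions import Fraction


def subjective_probability_of_zero(copies_on_zero: int, sides: int = 10) -> Fraction:
    """Share of the player's measure that ends up seeing a zero.

    Assumes copying multiplies measure: the zero branch carries
    `copies_on_zero` units of measure, every other branch carries one.
    """
    p_zero = Fraction(1, sides)              # chance the die itself shows 0
    weight_zero = p_zero * copies_on_zero    # measure in the copied branch
    weight_nonzero = (1 - p_zero) * 1        # measure in the uncopied branches
    return weight_zero / (weight_zero + weight_nonzero)


# What the madman and your family expect: no copying happens to them.
outside_view = subjective_probability_of_zero(copies_on_zero=1)

# What the player expects if 91 copies are made on a zero.
player_view = subjective_probability_of_zero(copies_on_zero=91)

print(outside_view, player_view)  # 1/10 91/100
```

The two numbers answer different questions, which is why they don't need to agree: the first is the chance that a pre-commitment to 0 pays off, the second is where most of the player's own measure goes.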

The madman is just putting the player in a cruel situation: you can bet on 0 and have most of your measure and a minority of everyone else's measure go to the outcome where your family benefits, or you can bet on not-0 and have a minority of your measure and a majority of everyone else's measure go to the outcome where your family benefits. This situation is made a little easier to reason about by the detail that you won't get to personally experience and interact with the outcome of your family benefiting, so it feels somewhat obvious to prioritize everyone else's measure in that outcome rather than your own measure in that outcome. Reasoning about preferences in situations where different people have different measures over the outcomes feels extremely unintuitive and paints a very alien picture of reality, but I don't think it's ruled out.

Now you could just bite this bullet. You could say, "Sounds to me like it should work fine." You could say, "There's no reason why you shouldn't be able to exert anthropic psychic powers." You could say, "I have no problem with the idea that no one else could see you exerting your anthropic psychic powers, and I have no problem with the idea that different people can send different portions of their subjective futures into different realities."

I think there are other problems that may prevent the "anthropic psychic powers" example from working (maybe copying doesn't duplicate measure, but splits it gradually as the copies become increasingly separated in information content or in location; I think my comment here might provide a way to think about that), but "the idea that different people can send different portions of their subjective futures into different realities" is not one of the problems, as I believe it's implied to be possible by the "two Schrodinger's cats" thought experiment (https://phys.org/news/2019-11-quantum-physics-reality-doesnt.html, https://arxiv.org/abs/1604.07422, https://web.archive.org/web/20200215011940/https://curiosity.com/topics/adding-a-second-cat-to-schrodingers-cat-experiment-might-break-quantum-physics-curiosity/, and the Frauchiger-Renner thought experiment mentioned in https://www.quantamagazine.org/frauchiger-renner-paradox-clarifies-where-our-views-of-reality-go-wrong-20181203/). (I'm not completely confident in my understanding of this, so please let me know if I'm understanding that thought experiment incorrectly. My understanding of the experiment is that the different participants should rightly expect different things to happen, and I think the easiest explanation is that the participants have their measure going in different proportions to different outcomes.)

Consider a computer which is 2 atoms thick running a simulation of you. Suppose this computer can be divided down the middle into two 1 atom thick computers which would both run the same simulation independently. We are faced with an unfortunate dichotomy: either the 2 atom thick simulation has the same weight as two 1 atom thick simulations put together, or it doesn't.

UDASSA implies that simulations on the 2 atom thick computer count for twice as much as simulations on the 1 atom thick computer, because they are easier to specify.

I think the answer is that the 2-atom thick computer does not automatically have twice as much measure as a 1-atom thick computer. I think you're assuming that in the (U, x) pair, x is just a plain coordinate that locates a system (implementing an observer moment) in 4D spacetime plus Everett branch path. Another possibility is that x is a program for finding a system inside of a 4D spacetime and Everett tree.

Imagine a 2-atom thick computer (containing a mind) which will lose a layer of material and become 1-atom thick if a coin lands on heads. If x were just a plain coordinate, then the mind should expect the coin to land on tails with 2:1 odds, because its volume is cut in half in the heads outcome, and only half as many possible x bit-strings now point to it, so its measure is cut in half. However, if x is a program, then the program can begin with a plain coordinate for finding an early version of the 2-atom thick computer, and then contain instructions for tracking the system in space as time progresses. (The only "plain coordinates" the program would need from there would be a record of the Everett branches to follow the system through.) The locator x would barely need to change to track a future version of the mind after the computer shrinks in thickness compared to if the computer didn't shrink, so the mind's measure would not be affected much.
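Here's a toy version of the two readings of x, as a sketch only. It assumes a fair coin and treats "the mind's measure would not be affected much" as exactly unchanged, which is an idealization on my part. The function name is hypothetical.

```python
from fractions import Fraction


def tails_to_heads_odds(measure_after_tails: Fraction, measure_after_heads: Fraction) -> Fraction:
    """Odds of experiencing tails versus heads for a fair coin, weighting
    each outcome by the measure the mind has afterwards."""
    coin = Fraction(1, 2)
    return (coin * measure_after_tails) / (coin * measure_after_heads)


# x as a plain coordinate: measure scales with thickness, so the heads
# outcome (1 atom thick) has half the measure of tails (2 atoms thick).
plain_coordinate = tails_to_heads_odds(Fraction(2), Fraction(1))

# x as a tracking program: the locator barely changes when the computer
# shrinks, so measure is treated here as the same in both outcomes.
tracking_program = tails_to_heads_odds(Fraction(1), Fraction(1))

print(plain_coordinate, tracking_program)  # 2 1
```

So the plain-coordinate reading gives the 2:1 expectation of tails, while the tracking-program reading leaves the coin looking roughly fair.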

If the 2-atom thick computer split into two 1-atom thick computers, then you can imagine (U, x) where x is a locator for the 2-atom thick computer before the split, and (U, x1) and (U, x2) where x1 and x2 are locators for the different copies of the computer after the split. x1 and x2 differ from x by pointing to a future time (and to a record of some more Everett branches, but I'm going to ignore that here) and by containing differing indexes for which side of the split to track at the time of the split. The measure of the computer is split among the different future copies, but this isn't just because each copy is half of the volume of the original, and it does not imply that a 2-atom thick computer shrinking to 1 atom of thickness halves the measure. In the shrinking case, the program x does not need to contain an index about which side of the computer to track: the program contains code to track the computational system, and doesn't need much nudging to keep tracking the computational system when the edge of the material starts transforming into something else not recognized as the computational system. It's only in the case where both halves resemble the computational system enough to continue to be tracked that measure is split.
