awenonian

Comments

Yudkowsky and Christiano discuss "Takeoff Speeds"

Can I try to parse out what you're saying about stacked sigmoids? Because it seems weird to me. Even in that view, showing a smooth trendline still seems like some evidence that progress isn't "interesting". I feel like this because I expect the asymptote of the AlphaGo sigmoid to be independent of MCTS bots, so surely you should see some trends where AlphaGo (or an equivalent) was invented first and jumped the trendline up really fast. So not seeing jumps should indicate that it really is a gradual progression, because otherwise, if the techniques were independent, the more powerful one should come first about half the time.

The "what counter argument can I come up with" part of me says, tho, that how quickly the sigmoid grows likely depends on lots of external factors (like compute available or something). So instead of sometimes seeing a sigmoid that grows twice as fast as the previous ones, you should expect one that's not just twice as tall, but twice as wide, too. And if you have that case, you should expect the "AlphaGo was invented first" sigmoid to be under the MCTS bots sigmoid for some parts of the graph, where it then reaches the same asymptote as AlphaGo in the mainline. So, if we're in the world where AlphaGo is invented first, you can make gains by inventing MCTS bots, which will also set the trendline. And so, seeing a jump would be less "AlphaGo was invented first" and more "MCTS bots were never invented during the long time when they would've outcompeted AlphaGo version -1"

Does that seem accurate, or am I still missing something?

Ngo and Yudkowsky on alignment difficulty

So, I'm not sure if I'm further down the ladder and misunderstanding Richard, but I found this line of reasoning objectionable (maybe not the right word):

"Consider an AI that, given a hypothetical scenario, tells us what the best plan to achieve a certain goal in that scenario is. Of course it needs to do consequentialist reasoning to figure out how to achieve the goal. But that’s different from an AI which chooses what to say as a means of achieving its goals."

My initial (perhaps uncharitable) response is something like "Yeah, you could build a safe system that just prints out plans that no one reads or executes, but that just sounds like a complicated way to waste paper. And if something is going to execute them, then what difference does it make whether that's humans or the system itself?"

This, along with the various mentions of manipulating humans, seems to me like it would most easily arise from an imagined scenario of AI "turning" on us. Like that we'd accidentally build a Paperclip Maximizer, and it would manipulate people by saying things like "Performing [action X, which will actually lead to the world being turned into paperclips] will end all human suffering, you should definitely do it." And that this could be avoided by using an Oracle AI that will just tell us "If you perform action X, it will turn the world into paperclips." And then we can just say "oh, that's dumb, let's not do that."

And I think that this misunderstands alignment. An Oracle that tells you only effective and correct plans for achieving your goals, and doesn't attempt to manipulate you into achieving its own goals, because it doesn't have its own goals besides providing you with effective and correct plans, is still super dangerous. Because you'll ask it for a plan to get a really nice lemon poppy seed muffin, and it will spit out a plan, and when you execute the plan, your grandma will die. Not because the system was trying to kill your grandma, but because that was the most efficient way to get a muffin, and you didn't specify that you wanted your grandma to be alive.

(And you won't know the plan will kill your grandma, because if you understood the plan and all its consequences, it wouldn't be superintelligent)

Alignment isn't about guarding against an AI that has cross purposes to you. It's about building something that understands that when you ask for a muffin, you want your grandma to still be alive, without you having to say that (because there's a lot of things you forgot to specify, and it needs to avoid all of them). And so even an Oracle thing that just gives you plans is dangerous unless it knows those plans need to avoid all the things you forgot to specify. This was what I got out of the Outcome Pump story, and so maybe I'm just saying things everyone already knows... 

The Useful Idea of Truth

"Since my expectations sometimes conflict with my subsequent experiences, I need different names for the thingies that determine my experimental predictions and the thingy that determines my experimental results. I call the former thingies 'beliefs', and the latter thingy 'reality'."

I think this is a fine response to Mr. Carrico, but not to the post-modernists. They can still fall back to something like "Why are you drawing a line between 'predictions' and 'results'? Both are simply things in your head, and since you can't directly observe reality, your 'results' are really just your predictions of the results based off of the adulterated model in your head! You're still just asserting your belief is better."

The tack I came up with in the meditation was that "everything is a belief" might be a bit of a false dichotomy. I mean, it would seem odd, given that everything is a belief, to say that Anne telling you the marble is in the basket is just as good evidence as actually checking yourself. It would imply weird things, like that once you check and find it in the box, you should be only 50% sure of where the marble is, because Anne's statement is weighed equally.

(And though it's difficult to put my mind in this state, I can think of this not as being in service of determining reality, but instead as trying to inform my belief that, after I reach into the box, I will believe that I am holding a marble.)

Once you concede that different beliefs can weigh as different evidence, you can use Bayesian ideas to reconcile things. Something like: "Nothing is 'true' in the sense of deserving 100% credence (saying something is true really just means that you really, really believe it, or, more charitably, that the belief has informed your future beliefs better than before you believed it), but you can take actions to become more 'accurate' in the sense of anticipating your future beliefs better. While both are guesses (you could be hallucinating, or something), your guess before checking is likely to be worse, more diluted, filtered through more layers from direct reality, than your guess after checking."
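To put made-up numbers on that (just a sketch of the Bayesian bookkeeping, with reliabilities I invented for illustration):

```python
# Toy Bayes update, with invented reliabilities: Anne's testimony vs. looking myself.
def update(prior, p_e_given_h, p_e_given_not_h):
    """Posterior P(H | evidence) by Bayes' rule."""
    return prior * p_e_given_h / (prior * p_e_given_h + (1 - prior) * p_e_given_not_h)

p_in_box = 0.5  # H = "the marble is in the box"; no evidence yet

# Anne says "basket", and suppose she's right 80% of the time (made-up number):
p_in_box = update(p_in_box, p_e_given_h=0.2, p_e_given_not_h=0.8)
print(f"after Anne's testimony: {p_in_box:.3f}")   # 0.200

# Then I look and see it in the box, and suppose my eyes are right 99.9% of the time:
p_in_box = update(p_in_box, p_e_given_h=0.999, p_e_given_not_h=0.001)
print(f"after looking myself:   {p_in_box:.3f}")   # ~0.996, nowhere near 50%
```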

I may be off the mark if the post-modernist claim is that reality doesn't exist, not just that no one's beliefs about it can be said to be better than anyone else's.

"Rational Agents Win"

In a Newcomb-less problem, where you can either have $1,000 or refuse it and have $1,000,000, you could argue that the rational choice is to take the $1,000,000 and then go back for the $1,000 when people's backs are turned, but that would seem to go against the nature of the problem.

In much the same way, if Omega is a perfect predictor, there is no possible world where you receive $1,000,000 and still end up going back for the second. Either Rachel wouldn't have objected, or the argument would've taken more than 5 minutes, and the boxes disappear, or something.

I'm not sure how Omega factors the boxes' contents into this "delayed decision" version. Like, let's say Irene will, absent external forces, one-box, and Rachel, if Irene receives $1,000,000, will threaten Irene sufficiently that she takes the second box, and will do nothing if Irene receives nothing. (Also, they're automatons, these are descriptions of their source code, and so no other unstated factors can be taken into account.)

Omega simulates reality A, with the box full, and sees that Irene will two-box after Rachel's threat.

Omega simulates reality B, with the box empty, and sees that Irene will one-box.

Omega, the perfect predictor, cannot make a consistent prediction, and, like the unstoppable force meeting the immovable object, vanishes in a puff of logic.

I think, if you want to aim at this sort of thing, the better formulation is to just claim that Omega is 90% accurate. Then there's no (immediate) logical contradiction in receiving the $1,000,000 and going back for the second box. And the payoffs should still be correct.

1 box: .9*1,000,000 + .1*0 = 900,000

2 box: .9*1,000 + .1*1,001,000 = 101,000

I expect that this formulation runs afoul of what was discussed in this post around the Smoking Lesion problem, where repeated trials may let you change things you shouldn't be able to (in their example, if you choose to smoke every time, then, if the correlation between smoking and lesions is held fixed, you can change the base rate of the lesions).

That is, I expect that if you ran repeated simulations, to try things out, then strategies like "I will one box, and iff it is full, then I will go back for the second box" will make it so Omega is incapable of predicting at the proposed 90% rate.
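A quick sketch of why that strategy breaks the 90% figure (my own framing, not anything from the post): whatever Omega predicts, the conditional strategy makes that prediction wrong.

```python
# Against "one-box, then go back for the second box iff the first was full",
# every prediction Omega can make comes out wrong.
def boxes_taken(big_box_filled: bool) -> int:
    """The conditional strategy: one-box, but grab the $1,000 too if the big box was full."""
    return 2 if big_box_filled else 1

for prediction in (1, 2):                # Omega predicts one-boxing or two-boxing
    big_box_filled = (prediction == 1)   # Omega fills the big box iff it predicts one-boxing
    actual = boxes_taken(big_box_filled)
    verdict = "correct" if actual == prediction else "wrong"
    print(f"Omega predicts {prediction}-boxing -> Irene takes {actual} -> prediction {verdict}")
# Both branches print "wrong", so no repeated-trial setup gets Omega to 90% against this player.
```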

I think all of these things might be related to the problem of embedded agency, and to people being confused (even if they don't put it in these terms) into thinking they have an atomic free will that can think about things without affecting or being affected by the world. I'm having trouble resolving this confusion myself, because I can't figure out what Omega's prediction looks like instead of vanishing in a puff of logic. It may just be that statements like "I will turn the lever on if, and only if, I expect the lever to be off at the end" are nonsense decision criteria. But the problem as stated doesn't seem like it should be impossible, so... I am confused.

This Can't Go On

In much the same way, estimates of value and calculations based on the number of permutations of atoms shouldn't be mixed together. There being a googolplex of possible states in no way implies that any of them has a value over 3 (or any other number). It does not, by itself, imply that any particular state is better than any other, let alone that any particular state should have value proportional to the total number of states possible.

Restricting yourself to atoms within 8,000 light years, instead of the whole galaxy, just compounds the problem, but you noted that yourself. The size of the galaxy wasn't actually a relevant number, just a (maybe) useful comparison. It's like when people say that chess has more possible games than there are atoms in the observable universe times the number of seconds since the Big Bang. It's not that there's any specifically useful interaction between atoms and seconds and chess; it's just to convey the scale of the problem.
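(For the scale comparison, using the commonly cited rough estimates, nothing precise:)

```python
# Rough orders of magnitude, commonly cited estimates only.
import math

atoms_in_observable_universe = 1e80          # ~10^80
seconds_since_big_bang = 13.8e9 * 3.15e7     # ~4.3 * 10^17
chess_games = 1e120                          # Shannon's estimate of the game tree

print(f"atoms * seconds ~ 10^{math.log10(atoms_in_observable_universe * seconds_since_big_bang):.0f}")
print(f"chess games     ~ 10^{math.log10(chess_games):.0f}")
```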

This Can't Go On

I still think the argument holds in this case, because even computer software isn't atom-less. It needs to be stored, or run, on something, somewhere.

I don't doubt that you could drastically reduce the number of atoms required for many products today. For example, you could in the future get a chip in your brain that makes typing without a keyboard possible. That chip is smaller than a keyboard, so it represents lots of atoms saved. You could go further and have that chip be an entire futuristic computer suite: by reading and writing your brain's inputs and outputs directly, it could replace the keyboard, mouse, monitors, speakers, and entire desktop, plus some extra stuff, like also acting as a VR headset, or video game console, or whatever. Let's say you manage to squeeze all that into a single atom. Cool. That's not enough. For this growth to go on for those ~8,000 years, you'd need that single-atom brain chip to be as valuable as everything on Earth today. Along with every other atom in the galaxy.
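Back-of-envelope, treating all the inputs as assumptions (roughly the figures I remember from the post: ~2% annual growth, ~8,000 years, ~10^70 atoms in the galaxy):

```python
# Back-of-envelope; all three inputs are assumptions, not exact figures.
import math

growth_rate = 0.02        # ~2% annual growth
years = 8_000             # the ~8,000-year horizon
atoms_in_galaxy = 1e70    # rough atom count for the galaxy

growth_log10 = years * math.log10(1 + growth_rate)
economies_per_atom_log10 = growth_log10 - math.log10(atoms_in_galaxy)

print(f"economy grows by a factor of ~10^{growth_log10:.0f}")  # ~10^69
print(f"value per atom ~ 10^{economies_per_atom_log10:.1f} of today's entire world economy")
# i.e. within an order of magnitude or two of one whole present-day world economy
# per atom, which is why a single-atom brain chip as valuable as a desktop setup
# doesn't come close.
```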

I think at some point, unless the hottest thing in the economy becomes editing humans to value specific atoms arbitrary amounts (which sounds bad, even if it would work), you can't get infinite value out of things. I'm not even sure human minds have the capability of valuing things infinitely. I think even with today's economy you'd start to hit some asymptotes (e.g. if one person had everything in the world, I'm not sure what they'd do with it all. I'm also not sure they'd actually value it any more than if they just had 90% of everything, except maybe the value of saying "I have it all", which wouldn't be represented in our future economy).

And still, the path to value per atom has to come from somewhere, and in general it's going to be making stuff more useful, or smaller, but there's only so useful a single atom can be, and only so small a useful thing can be. (I imagine some math on the number of ways you could arrange a set of particles, multiplied by the number of ways a particular arrangement could be used, as an estimate. A quick guess says that neither of those values is infinite, and I expect the product to be dominated by ways of arranging particles, not by number of uses, considering that even software on a computer is really just different arrangements of electrons.)

So I guess that's the heart of it for me: there's certainly a lot more value we can squeeze out of things, but if that value isn't literally infinite, it will run out at some point, and that ~8,000-year estimate looks pretty close to whatever the limit is, if it's not already over it.

2-Place and 1-Place Words

I think this is a useful post, but I don't think the water thing helped in understanding:

"In the Twin Earth, XYZ is "water" and H2O is not; in our Earth, H2O is "water" and XYZ is not."

This isn't an answer, this is the question. The question is "does the function, curried with Earth, return true for XYZ, && does the function, curried with Twin Earth, return true for H2O?"

Now, this is a silly philosophy question about the "true meaning" of water, and the real answer should be something like "If it's useful, then yes, otherwise, no." But I don't think this is a misunderstanding of 2-place functions. At least, thinking about it as a 2-place function that takes a world as an argument doesn't help dissolve the question.
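In code, the currying itself is the easy part; the whole dispute lives in how you fill in the lookup table. A sketch, with names I made up rather than anything from the post:

```python
# The 2-place predicate isWater(world, substance), curried down to 1 place.
from typing import Callable

# Toy extension table -- filling this in IS the contested philosophical question;
# the currying machinery just passes it along.
EXTENSION = {
    "Earth": {"H2O"},
    "Twin Earth": {"XYZ"},
}

def is_water(world: str) -> Callable[[str], bool]:
    def in_world(substance: str) -> bool:
        return substance in EXTENSION[world]
    return in_world

is_water_on_earth = is_water("Earth")
print(is_water_on_earth("H2O"))  # True
print(is_water_on_earth("XYZ"))  # False -- but only because the table above says so
```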

I was thinking about applying the currying to the topic instead of the world (e.g. "heavy water" returns true for isWater("in the ocean"), but not for isWater("has molar mass ~18")), but this felt like a motivated attempt to apply the concept, when the answer is just [How an Algorithm Feels from the Inside](https://www.lesswrong.com/posts/yA4gF5KrboK2m2Xu7/how-an-algorithm-feels-from-inside).

Either way, the Sexiness example is better.

The Point of Trade

I feel like I might be being a little coy stating this, but "heterogeneous preferences" may not be as inadequate an answer as it seems. At least, not if you allow that those heterogeneous preferences aren't only innate ones, like a taste preference for apples over oranges.

If I have a comparative advantage in making apples, I'm going to have a lot of apples, and value the marginal apple less than the marginal orange. I don't think this is a different kind of "preference" than liking the taste of oranges better: both bottom out in me preferring an orange to an apple. And so we engage in trade for exactly that reason. In fact, I predict we stop trading once I value the marginal apple more than the marginal orange (or you, vice versa), regardless of the state of my comparative advantage. (That is, in a causal sense, the comparative advantage is screened off by my marginal value assignments. My comparative advantage may inform my value assignments, but once you know those, you don't need to know whether I still have a comparative advantage to answer "will I trade apples for oranges?")
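A toy version of that claim (my own framing): once both sides' marginal values are known, the trade prediction doesn't consult comparative advantage at all.

```python
# Trade happens iff both parties come out ahead at the margin.
def will_trade(my_apple, my_orange, your_apple, your_orange):
    """I give you an apple for an orange iff we each prefer what we'd receive."""
    return my_orange > my_apple and your_apple > your_orange

# I grow apples (marginal apple cheap to me); you grow oranges.
print(will_trade(my_apple=1, my_orange=3, your_apple=3, your_orange=1))  # True

# After enough trades, my marginal apple is worth more to me than another orange:
print(will_trade(my_apple=3, my_orange=2, your_apple=3, your_orange=1))  # False -- trading stops
```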

Comparative advantage is why trade is useful, but I don't know if it's really accurate to say that heterogeneous preferences are an "inadequate answer to why we trade."

Introduction To The Infra-Bayesianism Sequence

I'm glad to hear that the question of what hypotheses produce actionable behavior is on people's minds. 

I modeled Murphy as an actual agent, because I figured a hypothesis like "A cloaked superintelligence is operating in the area and will react to your decision to do X by doing Y" is always on the table, and is basically a template for allowing Murphy to perform arbitrary action Y.

I feel like I didn't quite grasp what you meant by "a constraint on Murphy is picked according to this probability distribution/prior, then Murphy chooses from the available options of the hypothesis they picked"

But based on your explanation after, it sounds like you essentially ignore hypotheses that don't constrain Murphy, because they act as a uniform expected-utility drop across all states, so it just means you're comparing -1,000,000 and -999,999 instead of 0 and 1. For example, there's a whole host of hypotheses of the form "A cloaked superintelligence converts all local usable energy into a hellscape if you do X", and since that's a possibility for every X, no action X is graded lower than the others by its existence.
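Spelling that arithmetic out as a toy worst-case calculation (my own construction, not the post's formalism):

```python
# A hypothesis that knocks the same huge constant off every action's payoff
# doesn't change which action a worst-case (maximin) rule picks.
base = {"A": 0, "B": 1}
hell_penalty = 1_000_000  # "cloaked SI hell-ifies things whatever you do"

hypotheses_without = [base]
hypotheses_with = [base, {a: u - hell_penalty for a, u in base.items()}]

def maximin_choice(hypotheses):
    return max(base, key=lambda a: min(h[a] for h in hypotheses))

print(maximin_choice(hypotheses_without))  # 'B' (comparing 0 vs 1)
print(maximin_choice(hypotheses_with))     # 'B' (comparing -1,000,000 vs -999,999)
```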

That cloaked-superintelligence example is what got me thinking in the first place, though. Such hypotheses don't lower everything equally, because, given the other laws of physics, the superintelligence would need energy to hell-ify things. So arbitrarily consuming energy would reduce how bad the outcomes could be if a perfectly misaligned superintelligence were operating in the area. And, given that I am positing a perfectly misaligned superintelligence, we should both expect it to exist in the environment Murphy chooses (what could be worse?) and expect any reduction of its actions to be as positive a change as a perfectly aligned superintelligence's actions could be, since preventing a maximally detrimental action should match, in terms of utility, enabling a maximally beneficial action. Therefore, entropy-bombs.

Thinking about it more, assuming I'm not still making a mistake, this might just be a broader problem, not specific to this in any way. Aren't I basically positing Pascal's Mugging?

Anyway, thank you for replying. It helped.

Core Pathways of Aging

I'm still confused. My biology knowledge is probably lacking, so maybe that's why, but I had a similar thought to dkirmani after reading this: "Why are children born young?" Given that sperm cells are active cells (which should give transposons opportunities to multiply), why do they not produce children with larger transposon counts? I would expect whatever sperm divide from to have the same accumulation of transposons that causes problems in the divisions of stem cells.

Unless piRNA and siRNA are 100% effective at their jobs, and nothing explicitly removes transposons in sperm/eggs better than in the rest of the body, surely there should be at least a small amount of transposon accumulation across generations. Is this something we see?

I vaguely remember that women are born with all the egg cells they'll ever have, so, if that's true, maybe that offers a partial explanation (only half the child's genome should be as infected with transposons?). I'm not sure it holds water, though: since egg cells are still alive, even if they aren't dividing any more, they should still present opportunities for transposons to multiply.

Another possible explanation I thought of was that, in order to be as close to 100% as possible, piRNA and siRNA work more than normal in the gonads, which does hurt the efficacy of sperm, but because you only need 1 to work, that's ok. Still, unless it is actually 100%, there should be that generational accumulation.

This isn't even just about transposons. It feels like any theory of aging would have to contend with why sperm and eggs aren't old when they make a child, so I'm not sure what I'm missing.
