I’m not sure I understand your weighting argument. Some capabilities are “convergently instrumental” because they are useful for achieving a lot of purposes. I agree that AI construction techniques will target obtaining such capabilities, precisely because they are useful.

But if you gain a certain convergently instrumental capability, it then automatically allows you to do a lot of random stuff. That’s what the words mean. And most of that random stuff will not be safe.

I don’t get what the difference is between “the AI will get convergently instrumental c...

I don't see much of a disagreement here? I'm just saying that the way in which random things are accelerated is largely via convergent stuff; and therefore there's maybe some way that one can "repurpose" all that convergent stuff towards some aligned goal. I agree that this idea is dubious / doesn't obviously work. As a contrast, one could imagine instead a world in which new capabilities are sort of very idiosyncratic to the particular goal they serve, and when you get an agent with some goals, all its cognitive machinery is idiosyncratic and hard to parse out, and it would be totally infeasible to extract the useful cognitive machinery and repurpose it.

Exactly. You can’t generalize from “natural” examples to adversarial examples. If someone is trying hard to lie to you about something, verifying what they say can very well be harder than finding the truth would have been absent their input, particularly when you don’t know if and what they want to lie about.

I’m not an expert in any of these and I’d welcome correction, but I’d expect verification to be at least as hard as “doing the thing yourself” in cases like espionage, hacking, fraud and corruption. 

AI accelerates the timetable for things we know how to point AI at

It also accelerates the timetable for random things that we don’t expect and don’t even try to point the AI at but that just happen to be easier for incrementally-better AI to do.

Since the space of stuff that helps alignment seems much smaller than the space of dangerous things, you’d expect most things the AI randomly accelerates without us pointing it at will be dangerous.

Not exactly, because it's not exactly "random things", it's heavily weighted on convergently instrumental things. If you could repurpose ~all the convergently instrumental stuff that randomly targeted AI can do towards AI alignment, like I think Christiano is trying to do, then you'd have a pretty strong coupling. Whether you can do that though is an open question, whether that would be sufficient is an open question.

See above. Don’t become a munitions engineer, and, being aware that someone else will take that role, try to prevent anyone from taking that role. (Hint: That last part is very hard.)

The conclusions might change if planet-destroying bombs are necessary for some good reason, or if you have the option of safely leaving the planet and making sure nobody that comes with you will also want to build planet-destroying bombs. (Hint: That last part is still hard.)

For what it’s worth, the grammar and spelling were much better than is usual for even the native English part of the Internet. That’s probably fainter praise than it deserves: I don’t remember actually noticing any such fault, which probably means there are few of them.

The phrasing and wording did sound weird, but I guess that’s at least one reason why you’re writing, so congratulations and I hope you keep it up! I’m quite curious to see where you’ll take it.

Indeed, the only obvious “power” Harry has that is (as far as we know) unique to him is Partial Transfiguration. I’m not sure if Voldie “knows it not”; as someone mentioned last chapter, Harry used it to cut trees when he had his angry outburst in the Forbidden Forest, and in Azkaban as well. In the first case Voldie was nearby, allegedly to watch out for Harry, but far enough to be undetectable via their bond, so it’s possible he didn’t see what exact technique Harry used. In Azkaban, too, he was allegedly unconscious.

I can’t tell if he could ha...

It has been established that air can't be Transfigured due to the constant motion of its particles; they don't hold still long enough for you to Transfigure them.

Well, we only know that Harry feels doom when near Q and/or his magic, and that in one case in Azkaban something weird happened when Harry’s Patronus interacted with what appeared to be an Avada Kedavra bolt, and that Q appears to avoid touching Harry.

Normally I’d say that faking the doom sensations for a year, and faking being incapacitated while trying to break someone out of Azkaban, would be too complicated. But in this case...

Thank you, that was very interesting!

I sort of get your point, but I’m curious: can you imagine learning (with thought-experiment certainty) that there is actually no reality at all, in the sense that no matter where you live, it’s simulated by some “parent reality” (which in turn is simulated, etc., ad infinitum)? Would that change your preference?

I can imagine many things, including that one, but I am unconcerned with how I might react to them. Eliezer Yudkowsky, "A Technical Explanation of Technical Explanation"

most "earthlike" planets in habitable zones around sunlike stars are on average 1.8 Billion years older than the Earth

How do you know? (Not rhetorical, I have no idea and I’m curious.)

It was in a paper I read. Here it is

If the final goal is of local scope, energy acquisition from out-of-system seems to be mostly irrelevant, considering the delays of space travel and the fast time-scales a strong AI seems likely to operate at. (That is, assuming no FTL and the like.)

Do you have any plausible scenario in mind where an AI would be powerful enough to colonize the universe, but do it because it needs energy for doing something inside its system of origin?

I might see one perhaps extending to a few neighboring systems in a very dense cluster for some strange reason, but I can’t...

Any unbounded goal in the vein of 'Maximize concentration of in this area' has local scope but potentially unbounded expenditure necessary. Also, as has been pointed out for general satisficing goals (which most naturally local-scale goals will be); acquiring more resources lets you do the thing more to maximize the chances that you have properly satisfied your goal. Even if the target is easy to hit, being increasingly certain that you've hit it can use arbitrary amounts of resource.

Honestly, I can’t really find anything significant in this comment I disagree with.

It's a bit like opening a thread arguing that the Spanish Inquisition was right for torturing nonbelievers because they acted under the assumption that they could save souls from eternal damnation by doing so.

But the OP didn’t argue in support of torturing people, as far as I can tell. In the terms of your analogy, my reading of the OP was a bit like:

“Hey, if the Spanish Inquisition came to you and offered the following two options, would you pick either of them, or refuse both? The options are (1) you’re excommunicated, then you get all the ca...

My example about the Spanish Inquisition was supposed to indicate that it assumes that God exists and does certain things. Those aren't beliefs that any reasonable person holds. If you judge the actions of the Spanish Inquisition while presuming that their beliefs are true you miss the core issue, that their beliefs aren't true. The OP did advocate certain beliefs about the nature of memory and experience that I consider wrong. We live in a world where people make real decisions about tradeoffs between experience and memories. I do think you are likely to get those decisions wrong if you train yourself to think about memory based on thought experiments that ignore how memory and experience work. You don't get an accurate idea about memory by ignoring scientific research about memory. If you want to discuss examples, there are a bunch of real world examples where you increase the pain that people experience but don't give them painful memories. Discussing them based on what we know from scientific research would bring you much more relevant knowledge about the nature of memory. Saying that you are unsure about memory and then assuming that memory works a certain way is not a good road to go if you want to understand it better. Especially when you are wrong about how memory works in the first place.

Sure, but then why do you expect memory and experience would also behave in a common sense manner? (At least, that’s what I think you did in your first comment.)

I interpreted the OP as “I’m confused about memory and experience; let’s try a thought experiment about a very uncommon situation just to see what we think it would happen”. And your first comment reads to me as “you picked a bad thought experiment, because you’re not describing a common situation”. Which seems to completely miss the point, the whole purpose of the thought experiment was to investi...

If you are confused about memory then go read cognitive psychology. It's a science that among other things studies memory. Don't engage in thought experiments based on flawed folk psychology concepts of memory when science is available. It's simply the history of the subject. Doctors did surgery on small children without full anesthesia because children won't remember anyway. We do live today (or at least a decade ago) in a world where people inflict pain and then erase the memories of the experience and argue that it means that the pain they inflicted doesn't matter. It's a bit like opening a thread arguing that the Spanish Inquisition was right for torturing nonbelievers because they acted under the assumption that they could save souls from eternal damnation by doing so.

Once AI is developed, it could "easily" colonise the universe.

I was wondering about that. I agree with the could, but is there a discussion of how likely it is that it would decide to do that?

Let’s take it as a given that successful development of FAI will eventually lead to lots of colonization. But what about non-FAI? It seems like the most “common” cases of UFAI are mistakes in trying to create an FAI. (In a species with similar psychology to ours, a contender might also be mistakes trying to create military AI, and intentional creation by...

Energy acquisition is a useful subgoal for nearly any final goal and has non-starsystem-local scope. This makes strong AIs which stay local implausible.

The problem with that is that life on Earth appeared about 4 billion years ago, while the Milky Way is more than 13 billion years old. If life were somewhat common, we wouldn’t expect to be the first, because there was time for it to evolve several times in succession, and it had lots of solar systems where it could have done it.

A possible answer could be that there was a very strong early filter during the first part of the Milky Way’s existence, and that filter lessened in intensity in the last few billion years.

The only examples I can think of are elem...

Even within the Milky Way, most "earthlike" planets in habitable zones around sunlike stars are on average 1.8 billion years older than the Earth. If the "heavy bombardment" period at the beginning of a rocky planet's life is approximately the same length for all rocky planets, which is likely, then each of those 11 billion potentially habitable planets still had 1.8 billion years during which life could have formed. On Earth, life originated almost immediately after the bombardment ended and the Earth was allowed to cool. Even if the probability of each planet developing life in a period of 1 billion years is mind-bogglingly low, we still should expect to see life forming on some of them given 20 billion billion planet-years.
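For anyone who wants to check the arithmetic, here is a rough sketch of the planet-year estimate. The two input figures are the ones quoted in the comment; the per-planet probability passed to `expected_origins` is purely a made-up illustration (nobody knows the real value), and the calculation assumes planets are independent and that the chance of life arising is small and constant over time.

```python
import math

# Figures taken from the comment above.
HABITABLE_PLANETS = 11e9   # potentially habitable planets in the Milky Way
HEAD_START_YEARS = 1.8e9   # average head start over Earth, in years


def planet_years(planets: float, years: float) -> float:
    """Total planet-years available for life to get started."""
    return planets * years


def expected_origins(planets: float, years: float, p_per_gyr: float) -> float:
    """Expected number of planets where life starts.

    p_per_gyr is a HYPOTHETICAL probability that a single planet develops
    life in one billion years. The linear formula is only a fair
    approximation when p_per_gyr * (years / 1e9) is much smaller than 1.
    """
    return planets * (years / 1e9) * p_per_gyr


def break_even_probability(planets: float, years: float) -> float:
    """Per-planet, per-billion-year probability giving one expected origin."""
    return 1.0 / (planets * (years / 1e9))


total = planet_years(HABITABLE_PLANETS, HEAD_START_YEARS)
print(f"planet-years available: {total:.3g}")
print(f"break-even p per Gyr:   "
      f"{break_even_probability(HABITABLE_PLANETS, HEAD_START_YEARS):.3g}")
```

The product comes to roughly 2 × 10^19 planet-years, which matches the "20 billion billion" figure. Under the assumptions above, the per-planet probability would have to be below about one in twenty billion per billion years before you would expect fewer than one other origin of life.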

Your rephrasing essentially says that you torture an identical copy of a person for a week.

If you read it carefully, my first rephrasing actually says that you torture the original person for a week, and then you (almost) perfectly erase their memories (and physical changes) during that week.

This is not changing the nature of the thought experiment in the OP; it is exactly the same experiment, plus a hypothetical example of how it could be achieved technically, because you implied that the experiment in the OP is impossible to achieve and thus ill-pose...

This depends very much on the definition of "original" and notions of identity. You can't expect that they behave in a common sense manner in such a thought experiment.

It seems rather silly to argue about that, when the thought experiment starts with Omega and bets for amounts of a billion dollars. That allows glossing over a lot of details. Your position is like objecting to a physics thought experiment that assumes frictionless surfaces, while the same thought experiment also assumes mass-less objects.

As a simple example: Omega might make a ridiculously precise scan of your entire body, subject you to the experiment (depending on which branch you chose), then restore each molecule to the same position and state it was ...

If the goal of the thought experiment is to think about the notion of mass and how it affects friction, that's indeed a bad thought experiment. Your rephrasing essentially says that you torture an identical copy of a person for a week. It raises all sorts of issues around identity and copying but it ceases to be an experiment that's about memory.

perhaps costly, but worth the price

How about extending the metaphor and calling these techniques "Rituals" (they require a sacrifice, and even though it’s not as “permanent” as in HPMOR, it’s usually dangerous), reserving “Dark” for the arguably-immoral stuff?

The nice thing about hacking instrumental goals into terminal goals is that while they’re still instrumental you can easily change them.

In your case: You have the TG of becoming fit (BF), and you previously decided on the IG of going to the gym (GG). You’re asking about how to turn GG into a TG, which seems hard.

But notice that you picked GG as an instrument towards attaining BF before thinking about Terminal Goal Hacking (TGH), which suggests it’s not optimal for attaining BF via TGH. The better strategy would be to first ask yourself if another IG woul...

It doesn’t work if you just click the link, but if you copy the link address and paste it in a browser then it works. (Because there isn’t a referrer header anymore.)

Medical issues that make life miserable but can be fixed with ~1M$ would be a (bit more concrete) example. Relatively rare, as you said.

I have a rare but recurring dream that resembles very much what you describe.

There's no good reason to assume

I agree, but I’m not sure the examples you gave are good reasons to assume the opposite. They’re certainly evidence of intelligence, and there are even signs of something close to self-awareness (some species apparently can recognize themselves in mirrors).

But emotions are a rather different thing, and I’m rather more reluctant to assume them. (Particularly because I’m even less sure about the word than I am about “intelligence”. But it also just occurred to me that between people emotions seem much easier to fake than i...

Personally I agree, but if I were a devil I’d just fall in love with the kind of double-think you’d need to . After all, I wouldn’t actually want to suppress faith, I’d just want to create in people’s minds associations between atheism and nice places like Stalinist Russia. Phrases like “scientific socialism” would just send nice little shivers of pleasure down any nice devil’s spine, wouldn’t they?

Funny how if I were a devil, and I tried to make the world miserable through faith, and I were getting concerned about those dangerous anti-faith ideas, I’d try to create a horrible political regime based on suppressing faith ;)

Stalin's soviet Russia suppressed religion, not faith. It wasn't exactly a haven for critical thinking.

I see your point (I sometimes get the same feeling), but if you think about it, it’d be much more astonishing if someone built a universal computer before having the idea of a universal computer. It’s not really common to build something much more complex than a hand ax by accident. Natural phenomena are often discovered like that, but machines are usually imagined a long time before we can actually build them.

Yeah, that's a good point. Turing must have been one of the first people to realize that there's a "maximum amount of flexibility" a computer can have, so to speak, where it's so flexible it can do anything that any computer can.

Everyone can remember a phone number because it's three numbers, where they might have problems remembering ten separate digits

This is slightly irrelevant, but for some reason I can’t figure out at all, pretty much all phone numbers I learned (and, incidentally, the first thirty or so decimals of π) I learned digit-by-digit rather than in groups. The only exception was when I moved to France, I learned my French number by-separate-digits (i.e., five-eight instead of fifty-eight) in my native language but grouped in tens (i.e., by pairs) in French. This isn’t a characteristic of my native language, either, nobody even in my family does this.

I once had memorized the periodic table to 54 places (Xenon) by name, as a sequence with a few numeral fixed points. This helped me in High-school chemistry. Lost some chunks of the higher parts, but I have intuitions about almost anything in the periodic table. Some of this is visual memory. I memorized that as a verbal thing initially, kinda like the alphabet song (which I know a large number of people still sing internally when they need to sort stuff lexicographically). But even the alphabet I have successfully moved partially to visual memory. IMO, visual memory is an underused resource for auditory thinkers (like myself) and probably vice versa.

You’re right, I remember now.

Hmm, it still sounds like they should be used more often. If you’re falsely accused and about to be condemned to Azkaban, wouldn’t you sacrifice a portion of your magic if it could compel your accuser to confess? As corrupt as the Wizengamot is, it should still happen on occasion.

Yeah, but I think he was mentioned before (and he shows up in most of the guards books). Vetinari is awesome in kind of an obvious way, but he’s not very relevant outside the city. (Well, except for a few treaties with dwarves and the like.)

In contrast, Granny (and sometimes the other witches) arguably saved the entire world several times. There are other characters who do that, but it’s more... luck I guess. The witches actually know what they’re doing, and work hard to achieve their goals.

(For example, though it’s never explicitly said, I got a very strong suspicion that Granny remained a life-long virgin specifically because she expected that it might be useful against unicorns.)

I’ve seen people here repeatedly mention the city watch books, but I’m surprised the witches books are almost never mentioned. Seriously, am I the only one who thought Granny Weatherwax and her team are basically the most useful people on the disc?

I love love love the Granny books. And if you only read one of them, I'd make it Witches Abroad. When I started my blog and wasn't sure what to write about, I did a sequence of posts on Granny.
Also Vetinari is basically the best thing ever.
Equal Rites is pretty bad, and I like to get people to start from the starts of sequences. So: Guards! Guards!

Perhaps, although “story logic” can imply parents being willing to sacrifice for their children. That’s a problem with thinking of the world in terms of stories, you can find a trope to justify almost anything. Authors always can (and often do) pull deus ex machinas out of their nether regions.

I wouldn’t be surprised if it did happen, at least once or twice. After all, it happened with the adults too, e.g. Juergen or whatever his name was.

Well, yes, but the whole point of building AI is that it work for our gain, including deciding what that means and how to balance between persons. Basically if you include in “US legal system” all three branches of government, you can look at it as a very slow AI that uses brains as processor elements. Its friendliness is not quite demonstrated, but fortunately it’s not yet quite godlike.

A couple more recent thoughts:

  • Dodging Deatheaters (at least competent ones) on a broom is not something I expect to happen in MoR. Well, not unless it’s rocket powered, and I wouldn’t expect that to work more than once either.

  • Most of the big, non-line-of-sight weapons we (muggles) have arose for the purpose of killing lots of people in big battles (even though we’re using them for other stuff now), which isn’t really useful for wizards due to their low numbers, but:

  • The Interdict of Merlin is MoR-specific, and at the beginning of the Ministry chapters

...

Well, she certainly was, and plausibly will be. I’m not quite sure about is, but it’s mostly because my intuition seems to think that either evaluating age() on a currently-not-living person should throw an IllegalStateException, or comparing its result with that for living persons should throw ClassCastException. But that’s probably just me :)

Well, regardless of whatever other plotting among each other, all participants actually do have a very good reason to join—their kids still go to Hogwarts, they want to keep them safe but at the same time groom them to inherit the family fortunes, and as was pointed out explicitly in the chapter, there are still good reasons, both politically and for safety, not to go to the other schools. An (at least temporary) alliance for the protection of their children is actually quite logical for all concerned.

Well, Hermione is (well, was) slightly older than Harry, and she seemed to have entered the romantic stage already. A couple years to let Harry catch up might not be such a bad thing.


I agree with your analysis, but I also thought this was intended as a straightforward signal to the other students that “we have to fight for ourselves” is not just the usual adult “lording over” the kids. I think it was meant to reinforce solidarity, defuse instinctive teenage rebellion against “the adults’ rules”, and also reinforce the message that the professors are no longer to be trusted to handle things.

This makes sense, but thinking along the same lines, I would see a lot of the upperclassmen getting upset at being told what to do by firsties.

I think the rapid expansion when the transfiguration ends would be enough to set it off.

Also: Metallic sodium and potassium, as well as phosphorus, are quite reactive with human tissue. For bonus points you can make a mixed projectile with two transfigurations, e.g. a core transfigured from ice surrounded by a shell transfigured from sodium, which will explode once the two transfigurations end.

To be fair, it ate her legs, not just her feet.

To be even fairer, that might be just because the legs were bite-sized, and polite trolls are taught by their mothers not to nibble their food.

only person Dumbledore knows and has access to that really matters to Harry

Well, he could have killed Harry’s parents. It might not trigger Harry’s “kill death by any means necessary” reaction, but then I don't think anyone would have anticipated that in-universe, given that even Q was surprised by the prophecy.

Point. That said, I suspect that to Dumbledore Hermione's self-proclaimed hero status automatically signals "willing to die for the cause", whereas Harry's parents are innocent bystanders in every possible way.

but of higher status than craftsmen and peasants.

I don’t think that was intrinsic to being a merchant, just a consequence of (some of them) being richer.

With regards to (2), I think you’re confusing first-year war games with actual combat magic.

Actual “I really want to kill you” spells are probably much more powerful. Fiendfyre for example has at least the destructive potential of a tank, and in canon even Goyle could cast it. (It’s hard to control, but then again so is a tank.) Avada Kedavra can probably kill you even through a nuclear bunker wall, and it can be used by at least some teenagers. Sectumsempra is probably an instant-kill against a muggle, even with body armor, and it was invented by Snape wh...

Good points, all. Fiendfyre seems robust. I might counter that most combat magic, even the adult sort, seems to be line-of-sight, which is a huge handicap. It also seems to be very inaccurate. If Harry & Co can literally dodge Deatheaters on foot and brooms, supersonic jets and HALO insertions are going to be really hard to target. Not to mention artillery shells in flight. And Wizards seem weirdly resistant to (biased against?) using magical heavy weapons or fire team tactics. They have a real duelist mentality. But the ability to erase from time does really trump. I concede.

ask whether wizard shields actually do prevent inert lumps of lead from hitting their caster

Almost certainly they do. Minerva mentions that guns aren’t a big threat to a prepared witch, and even if you assume she’s not really knowledgeable, I’m pretty sure someone would have tried throwing (with magic) hard, heavy things at their opponent during life-and-death fights. Or at least using bows and arrows.

In the end, at the highest level, their life is a story

I wouldn’t put it above Eliezer to find a way of having Harry be “the End of the World” literally by just ending the story somehow. But I can’t think of any explanation in that vein for destroying the stars, other than maybe breaking the ceiling in the Hogwarts hall, which doesn’t fit. And style-wise it doesn’t feel right.

and Draco can attest to these under Veritaserum!

Technically speaking, Draco can only attest that Harry claimed those things. (Harry’s an Occlumens, and the way Occlumency works in MoR implies that an Occlumens is very good at lying. So he can plausibly claim that he lied to his enemies.)

I don’t remember, does Eliezer allow unbreakable vows, or are those nerfed in MoR like Felix Felicis? Because I’m pretty sure even an Occlumens can’t lie if he vows to say the truth without suffering the penalty.

IIRC unbreakable vows require some large, permanent sacrifice of magical power and as such are fairly rare in HPMoR.

I think the orbs only come to people (things that think, and can make decisions), and it’s not clear Dementors pass that test. (In particular, Harry leans against that hypothesis. He’s certainly not infallible, but he’s basically the best expert on the subject whose thoughts we have access to.)

Otherwise prophecies mentioning things like life, wands and clothes would attack everyone.

