If it’s worth saying, but not worth its own post, then it goes here.

Notes for future OT posters:

  1. Check if there is an active Open Thread before posting a new one (use search for Open Thread).
  2. What accomplishments are you celebrating from the last month?
  3. What are you reading?
  4. What reflections do you have for yourself or others from the last month?

Does the Quantum Physics Sequence hold up?

It's been the better part of a decade since I read it (and I knew a lot less back then), and recently I've been curious about getting a refresher. I am not going to pick up a textbook or spend too much time on this, but if it doesn't hold up what alternative/supplementary resources would you recommend (the less math-heavy the better, although obviously some of the math is inescapable)?

I actually learnt quantum physics from that sequence, and I'm now a mathematician working in Quantum Computing. So it can't be too bad!

The explanation of quantum physics is the best I've seen anywhere. But this might be because it explained it in a style that was particularly suited to me. I really like the way it explains the underlying reality first and only afterwards explains how this corresponds with what we perceive. A lot of other introductions follow the historical discovery of the subject, looking at each of the famous experiments in turn, and only building up the theory in a piecemeal way. Personally I hate that approach, but I've seen other people say that those kinds of introductions were the only ones that made sense to them.

The sequence is especially good if you don't want a math-heavy explanation, since it manages to explain exactly what's going on in a technically correct way, while still not using any equations more complicated than addition and multiplication (as far as I can remember).

The second half of the sequence talks about interpretations of quantum mechanics, and advocates for the "many-worlds" interpretation over ...

I also want to know this.

(This is part of a more general question: how much of the science cited in the Sequences holds up? Certainly nearly all the psychology has to be either discarded outright or tagged with “[replication needed]”, but what about the other stuff? The mockery of “neural networks” as the standard “revolutionary AI thing” reads differently today; was the fact that NNs weren’t yet the solution to (seemingly) everything essential to Eliezer’s actual points, or peripheral? How many of the conclusions drawn in the Sequences are based on facts which are, well, not factual anymore? Do any essential points have to be re-examined?)

The mockery of “neural networks” as the standard “revolutionary AI thing” reads differently today

I think the point being made there is different. For example, the contemporary question is, "how do we improve deep reinforcement learning?" to which the standard answer is "we make it model-based!" (or, I say near-equivalently, "we make it hierarchical!", since the hierarchy is a broad approach to model embedding). But people don't know how to do model-based reinforcement learning in a way that works, and the first paper to suggest that was in 1991. If there's a person whose entire insight is that it needs to be model-based, it makes sense to mock them if they think they're being bold or original; if there's a person whose insight is that the right shape of model is XYZ, then they are actually making a bold claim because it could turn out to be wrong, and they might even be original. And this remains true even if 5-10 years from now everyone knows how to make deep RL model-based.

The point is not that the nonconformists were wrong--the revolutionary AI thing was indeed in the class of neural networks--the point is that someone is mistak...

9Said Achmiz
Yes, indeed, I think your account makes sense. However, the hard question to ask is: suppose you were writing that essay today, for the first time—would you choose AI / neural networks as your example? Or, to put it another way: “These so-called nonconformists are really just conformists, and in any case they’re wrong.” and “These so-called nonconformists are really just conformists… they’re right, of course, they’re totally right, but, well… they’re not as nonconformist as they claim, is all.” … read very differently, in a rhetorical sense. And, to put it yet a third way: to say that what Eliezer meant was the latter, when what he wrote is the former, is not quite the same as saying that the former may be false, but the latter remains true. And if what Eliezer meant was the former, then it’s reasonable to ask whether we ought to re-examine the rest of his reasoning on this topic. Mostly but not entirely tangentially: Well, but did people bias toward working on NN approaches? Did they bias enough? I’m given to understand that the current NN revolution was enabled by technological advantages that were unavailable back then; is this the whole reason? And did anyone back then know or predict that with more hardware, NNs would do all the stuff they now do for us? If not—could they have? These aren’t trivial questions, I think; and how we answer them does plausibly affect the extent to which we judge Eliezer’s points to stand or fall. ---------------------------------------- Finally, on the subject of whether someone’s being bold and original: suppose that I propose method X, which is nothing at all like what the establishment currently uses. Clearly, this proposal is bold and original. The establishment rejects my proposal, and keeps on doing things their way. If I later propose X again, am I still being bold and original? What if someone else says “Like Said, I think that we should X” (remember, the establishment thus far continues to reject X)—are they being
I think you're misreading Eliezer's article; even with major advances in neural networks, we don't have general intelligence, which was the standard that he was holding them to in 2007, not "state of the art on most practical AI applications." He also stresses the "people outside the field"--to a machine learning specialist, the suggestion "use neural networks" is not nearly enough to go off of. "What kind?" they might ask exasperatedly, or even if you were suggesting "well, why not make it as deep as the actual human cortex?" they might point out the ways in which backpropagation fails to work on that scale, without those defects having an obvious remedy. In the context--the Seeing With Fresh Eyes sequence--it seems pretty clear that it's about thinking that this is a brilliant new idea as opposed to the thing that lots of people think. Where's your impression coming from? [I do agree that Eliezer has been critical of neural networks elsewhere, but I think generally in precise and narrow ways, as opposed to broadly underestimating them.]
I would also be interested in this. Being able to update our canon in this way strikes me as one of the key goals that I want the LessWrong community to be able to do. If people have any UI suggestions, or ways for us to facilitate that kind of updating in a productive way, I would be very interested in hearing them.

It so happens that I’ve given some thought to this question.

I had the idea (while reading yet another of the innumerable discussions of the replication crisis) of adding, to readthesequences.com, a “psychoskeptic mode” feature—where you’d click a button to turn on said mode, and then on every page you visited, you’d see every psychology-related claim red-penned (with, perhaps, annotations or footnotes detailing the specific reasons for skepticism, if any).

Doing this would involve two challenges, one informational and one technical; and, unfortunately, the former is more tedious and also more important.

The informational challenge is simply the fact that someone would have to go through every single essay in the Sequences, and note which specific parts of the post—which paragraphs, which sentences, which word ranges—constituted claims of scientific (and, in this case specifically, psychological) fact. Quite a tedious job, but without this data, the whole project is moot.

The technical challenge consists first of actually inserting the appropriate markup into the source files (still tedious, but a whole order of magnitude less so) and implementing the toggle feature and the UI for it (...
I don't think just tagging the claims would be very valuable. To be valuable the website would need to provide information about how well the particular claim holds up.
This could take a while, and it'd be important to have it so that if someone 'abandons' the project, their work is still available. If I decided to read (and take the necessary notes on), if not a "page" a day, then at least 7 "pages" a day, then that part of the project would be complete...in a year. (The TOC says 333 pages.)* A way that might not catch everything would be to search readthesequences.com for "psy" (short, should get around most spelling mistakes). https://www.readthesequences.com/Search?q=psy&action=search. A general 'color this word red' feature would be interesting. *I might do this in a google doc. Alternate tool suggestions are welcome. Sharing available upon request (and providing email).
5Said Achmiz
I have added wikipedia-style Talk pages to readthesequences.com. (Example.) You (or anyone else who wishes to contribute) should feel free to use these talk pages to post notes, commentary, or anything else relevant. (If you prefer to use Google Docs, or any other such tool, to do the required editing, then I’d ask that you make the doc publicly viewable, and place a link to it on the relevant Sequence post’s Talk page.) (A list of every single essay that is part of Rationality:A–Z—including the interludes, introductions, etc.—along with links to Talk pages, can be found here.) Edit: You’ll also find that you can now view each page’s source, in either native wiki format or Markdown format, via links at the top-left of the page.
1Paperclip Minimizer
This actually made me not read the whole sequence.

Researchers at UCL, NICE, Aberdeen, Dublin and IBM are building an ontology of human behavior change techniques (BCTs). A taxonomy has already been created, detailing 93 different hierarchically clustered techniques. This taxonomy is useful in itself; I could see myself using this to try to figure out which techniques work best for me, in fostering desirable habits and eliminating undesirable ones.

The ontology could be an extremely useful instrumental rationality tool, if it helps to identify which techniques work best on average, and in which combinations these BCTs are most effective.

[This comment is no longer endorsed by its author]
It might also be an ontology that's more distorting than helpful. The ontology looks like it tries to be agnostic about mental processes and as a result makes no assumptions about the moving parts that can be affected. If you take a technique like CFAR's internal double crux, it doesn't fit well into that framework.
I'm nervous about mapping elements from the taxonomy onto existing techniques. You risk rationalising this way. But, for the sake of argument, one could say that internal double crux amounts to application of BCT 1.2 (problem-solving), BCT 4.3 (re-attribution), BCT 13.2 (framing/reframing), and BCT 13.3 (internal inconsistency). There are probably other ways of making it fit; therefore, I agree that the taxonomy, as it stands, isn't very useful. Now, one of the stated aims of the human behaviour change project is this: If the ontology turns out as expected, I think it'll be useful for designing and testing new behaviour change interventions.
If an ontology can't make the distinctions that are needed to get certain existing techniques to be effective, I don't think there is a good reason to believe that it will be useful for designing new techniques. This is like the psychiatrists who have the same DSM-V category for a depression that's caused by a head trauma and a depression that's caused by someone having a burnout (the DSM is an ontology that's designed to be cause-neutral). If an ontology can't distinguish important categories that matter for clinical practice from each other, it can't effectively guide actions.

On August 23rd I'll be giving a talk organized by the Foresight Institute.

Civilization: Institutions, Knowledge and the Future

Our civilization is made up of countless individuals and pieces of material technology, which come together to form institutions and interdependent systems of logistics, development and production. These institutions and systems then store the knowledge required for their own renewal and growth.

We pin the hopes of our common human project on this renewal and growth of the whole civilization. Whether this project is going well is a challenging but vital question to answer.

History shows us we are not safe from institutional collapse. Advances in technology mitigate some aspects, but produce their own risks. Agile institutions that make use of both social and technical knowledge not only mitigate such risks, but promise unprecedented human flourishing.

Join us as we investigate this landscape, evaluate our odds, and try to plot a better course.

See the Facebook event for further details.

There is a limited number of spots and there has been a bunch of interest; still, I'd love rationalists to attend, so try to nab tickets at Eventbrite. Feel...

An AI with a goal of killing or "preserving" wild animals to reduce suffering is dangerously close to an AI that kills or "preserves" humans with the goal of reducing suffering. I think negative utilitarianism is an unhelpful philosophy best kept to a thought experiment.

I'd like to test a hypothesis around mental health and different moral philosophies as I slowly work on my post claiming that negative utilitarians have co-opted the EA movement to push for animal rights activism. Has anyone done Less Wrong survey statistics before and can easily look for correlates between dietary choice, mental health, and charities supported?

1Paperclip Minimizer
I don't see why negative utilitarians would be more likely than positive utilitarians to support animal-focused effective altruism over (near-term) human-focused effective altruism.

I have started to write a series of rigorous introductory blogposts on Reinforcement Learning for people with no background in it. This is totally experimental and I would love to have some feedback on my draft. Please let me know if anyone is interested.

Also interested!
Also interested - programmer with very limited knowledge of RL.
Shared the draft with you. Please let me know your feedback.
Interested! I'm a programmer who has had no exposure to ML (yet).
Shared the draft with you. Feel free to comment and question.
I found it by typing in this url: https://www.lesswrong.com/users/sayan?view=drafts The other way people who it is shared with can get to it is via the url of the page itself, as noted here.
I'm interested! I'll be reading from the perspective of, "Technical person, RL was talked about for 2 days at the end of a class, but I don't really know how anything works."

As a non-native English speaker, it was a surprise that "self-conscious" normally means "shy", "embarrassed", "uncomfortable", ... I blame LessWrong for giving me the wrong idea of this word's meaning.

The more naive interpretation of the phrase is instead represented by "self-aware", if that's helpful.
1Paperclip Minimizer
"self-aware" can also be "self-aware" as in, say, "self-aware humor"
The use of 'self-conscious' to refer to having knowledge of yourself as a conscious being isn't unique to LW, but is borrowed from philosophy. Blame the philosophers I say! Anyway, they could have chosen to come up with a new term instead of using the not-most widely (but still commonly) used definition of 'self-conscious', but that would mean even more LW-specific jargon (which is also heavily criticized). It's not at all clear to me whether pushing towards greater jargon usage would be an improvement in general.
The longer phrase 'aware of your own awareness' is sometimes used.

Does anyone else get the sense that it feels vaguely low-status to post in open threads? If so I don't really know what to do about this.

For what it's worth, I've never felt that -- but I've not posted a lot of actual posts, and maybe if that were my point of comparison (as opposed to commenting elsewhere) I might feel differently.
With the personal-blog category of posts, I think there's a lot lower barrier to the main site, so less need for the open threads. I don't think that translates to low-status, but I'm a bad indicator of status - I've been here for over a decade and never made a top-level post. I'm just a comment kind of guy, I guess.
I don't think it's a problem even if it's true. It's better when people feel drawn to write more thought-out posts because they perceive that as higher status.
I have a similar sense, used to kinda endorse it, but now think I was wrong and would like to fix it.

Anyone want to use the new feature to see my draft of an article on philosophy of reference and AI, and maybe provide some impetus for me to finally get it polished and done?

I'm up for it.
I'm not sure if I can provide useful feedback but I'd be interested in reading it.
How long is it, and what's the new feature?
Drafts can now be shared with other users.

Is it possible to subscribe to a post so you get notifications when new comments are posted? I notice that individual comments have subscribe buttons.

Ah, we have the backend for this but not the frontend. It'll be possible someday but not in the immediate future.
Okay, great.

Old LW had a link to the open thread in the sidebar. Would it be good to have that here so that comments later in the month still get some attention?

After some chats with Oliver about how prominent it made sense for the open thread to be, we decided to make it _not_ frontpage but be stickied. (This isn't quite ideal, but having it stickied on frontpage felt more prominent than seemed right for "first impressions of LW reasons". Longterm we're probably aiming to create something more official than open threads for casual questions/hangout, and in the meanwhile the options we can easily toggle are "frontpage or no, stickied or no")
How about making it stickied on top of https://www.lesswrong.com/daily instead of giving it a special place on the frontpage?
The Open Thread appears to no longer be stickied. Try pushing the pin in harder next time.
Nope, still stickied. Maybe you don't have the "All Posts" filter enabled?
It actually was unstickied, and then I re-stickied it. Not sure what's up.
Oh, huh. Weird. Maybe some mod misclicked.
What does "stickied" do?
It stays at the top of a list of posts. So, if you click "all posts" on the frontpage, you'll see the Open Thread stickied to the top now.
I don't see it there. Have you done the update yet?
Yep, update is pushed. You not seeing it likely means you haven't selected the "All Posts" filter on the left.
I see, thanks. I had been looking at the page https://www.lesswrong.com/daily, linked to from the sidebar under the same phrase "All Posts".
Ah, yes. That is confusing. We should fix that.

At times as I read through the posts and comments here I find myself wondering if things are sometimes too wrapped up in formalization and "pure theory". In some cases (all cases?) I suspect my lack of skills leads me to miss the underlying, important aspect and only see the analytical tools/rigor. In such cases I find myself thinking of the old Hayek (free-market economist, classical liberal thinker) title: The Pretense of Knowledge.

From many, many years ago when I took my Intro to Logic and years ago from a Discrete Math course I know there is...

There seem to be at least two questions here. 1. Are people too wrapped up in "pure theory"? 2. Are people making the mistake of confusing "A implies B" with "A and B are true"? When I first read your comment, I assumed you were referring to the posts that are about AI-related stuff, though I'm now realizing you could have been thinking of LW content in general. Are there certain parts of LW that you were referring to?
Pre-emptive response. Concerning AI Safety stuff, my understanding is that the focus on pure theory comes from the fact that it's potentially world-endingly disastrous to try and develop the field by experimentation. For most domains, knowledge has been built by centuries of interplay between trial and error experimentation and "unpractical" theoretical work. That's probably also a very effective approach for most new domains. But with AI safety, we don't get trial and error, making the task way harder, and leaving people to bust their asses over developing the pure theory to a point where it will eventually become the practical applicable stuff.
While the recent AI posts certainly have played a part, it's also been in general. Moreover, it may well be more about me than about the contributions to LW. Both questions you identify are part of my thinking, but the core is really about valid logic/argument structure (formalism) and reality/truths (with truths perhaps being a more complex item than mere facts). A valid argument that reaches false conclusions is not really helping us get less wrong in our thinking or actions. I think the post on counter-factual, thick and thin, also brings the question to mind for me. Like I say, however, this might be more about me lacking the skills to fully follow and appreciate the formalized parts, and so missing how they help us get less wrong. To express my thought a bit differently: do these highly formalized approaches shed more light on the underlying question and on getting to the best answers we can given our state of knowledge, or would Occam suggest whittling them down a bit? This last bit prompts me to think that the answer will depend a great deal on the audience in question, so maybe my musings are really more about who the target audience of the posts is (perhaps not me ;-))

Suppose I estimate the probability for event X at 50%. It's possible that this is just my prior and if you give me any amount of evidence, I'll update dramatically. Or it's possible that this number is the result of a huge amount of investigation and very strong reasoning, such that even if you give me a bunch more evidence, I'll barely shift the probability at all. In what way can I quantify the difference between these two things?

One possible way: add a range around it, such that you're 90% confident your credence won't move out of this range in the next ...
This is related to the problem of predicting a coin with an unknown bias. Consider two possible coins: the first which you have inspected closely and which looks perfectly symmetrical and feels evenly weighted, and the second which you haven't inspected at all and which you got from a friend who you have previously seen cheating at cards. The second coin is much more likely to be biased than the first.
Suppose you are about to toss one of the coins. For each coin, consider the event that the coin lands on heads. In both cases you will assign a probability of 50%, because you have no knowledge that distinguishes between heads and tails. But now suppose that before you toss the coin you learn that the coin landed on heads for each of its 10 previous tosses. How does this affect your estimate?
* In the case of the first coin it doesn't make very much difference. Since you see no way in which the coin could be biased you assume that the 10 heads were just a coincidence, and you still assign a probability of 50% to heads on the next toss (maybe 51% if you are beginning to be suspicious despite your inspection of the coin).
* But when it comes to the second coin, this evidence would make you very suspicious. You would think it likely that the coin had been tampered with. Perhaps it simply has two heads. But it would also still be possible that the coin was fair. Two-headed coins are pretty rare, even in the world of degenerate gamblers. So you might assign a probability of around 70% to getting heads on the next toss.
This shows the effect that you were describing; both events had a prior probability of 50%, but the probability changes by different amounts in response to the same evidence. We have a lot of knowledge about the first coin, and compared to this knowledge the new evidence is insignificant. We know much less about the second coin, and so the new evidence moves our probability much further.
Mathematically, we model each coin as having a fixed but unknown
It seems like you're describing a Bayesian probability distribution over a frequentist probability estimate of the "real" probability. Agreed that this works in cases which make sense under frequentism, but in cases like "Trump gets reelected" you need some sort of distribution over a Bayesian credence, and I don't see any natural way to generalise to that.
Right. But I was careful to refer to f as a frequency rather than a probability, because f isn't a description of our beliefs but rather a physical property of the coin (and of the way it's being thrown). I agree. But it seems to me like the other replies you've received are mistakenly treating all propositions as though they do have an f with an unknown distribution. Unnamed suggests using the beta distribution; the thing which it's the distribution of would have to be f. Similarly rossry's reply, containing phrases like "something in the ballpark of 50%" and "precisely 50%", talks as though there is some unknown percentage to which 50% is an estimate. A lot of people (like in the paper Pattern linked to) think that our distribution over f is a "second-order" probability describing our beliefs about our beliefs. I think this is wrong. The number f doesn't describe our beliefs at all; it describes a physical property of the coin, just like mass and diameter. In fact, any kind of second-order probability must be trivial. We have introspective access to our own beliefs. So given any statement about our beliefs we can say for certain whether or not it's true. Therefore, any second-order probability will either be equal to 0 or 1.
I don't have much to add on the original question, but I do disagree about your last point: There is a sense in which, once you say "my credence in X is Y", then I can't contradict you. But if I pointed out that actually, you're behaving as if it is Y/2, and some other statements you made implied that it is Y/2, and then you realise that when you said the original statement, you were feeling social pressure to say a high credence even though it didn't quite feel right - well, that all looks a lot like you being wrong about your actual credence in X. This may end up being a dispute over the definition of belief, but I do prefer to avoid defining things in ways where people must be certain about them, because people can be wrong in so many ways.
Okay, sure. But an idealized rational reasoner wouldn't display this kind of uncertainty about its own beliefs, but it would still have the phenomenon you were originally asking about (where statements assigned the same probability update by different amounts after the introduction of evidence). So this kind of second-order probability can't be used to answer the question you originally asked.
FYI there's more about "credal resilience" here (although I haven't read the linked papers yet).
If I learned the first coin came up heads 10 times, then I would figure the probability of it coming up heads would be higher than 50%, I think 51% at a minimum.
It doesn't really matter for the point I was making, so long as you agree that the probability moves further for the second coin.
The beta distribution is often used to represent this type of scenario. It is straightforward to update in simple cases where you get more data points, though it's not straightforward to update based on messier evidence like hearing someone's opinion.
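To make the beta-distribution suggestion concrete, here is a minimal sketch of the two-coin example from earlier in this thread. The prior parameters (1000, 1000) and (5, 5) are made-up assumptions, chosen only so that both coins start at a 50% chance of heads; `posterior_mean` is a hypothetical helper name, not something any commenter proposed.

```python
from fractions import Fraction

def posterior_mean(alpha, beta, heads, tails):
    """Mean of the Beta(alpha + heads, beta + tails) posterior over the heads frequency."""
    return Fraction(alpha + heads, alpha + beta + heads + tails)

# Assumed priors: both have mean 1/2 but very different concentration.
inspected_coin = (1000, 1000)  # strong prior: the carefully inspected coin
suspect_coin = (5, 5)          # weak prior: the coin from the card cheat

for name, (a, b) in [("inspected", inspected_coin), ("suspect", suspect_coin)]:
    prior = posterior_mean(a, b, 0, 0)
    after = posterior_mean(a, b, 10, 0)  # observe 10 heads in a row
    print(f"{name}: prior {float(prior):.3f} -> posterior {float(after):.3f}")
```

Under these assumed priors the inspected coin moves only to about 50.2% while the suspect coin moves to 75%, so the sum of the two parameters acts as a rough measure of how resilient the 50% estimate is. This only covers the clean case of counting tosses; it says nothing about messier evidence such as hearing someone's opinion.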
For future reference, after asking around elsewhere I learned that this has been discussed in a few places, and the term used for credences which are harder to shift is "resilient". See this article, and the papers it links to: https://concepts.effectivealtruism.org/concepts/credal-resilience/
My experience has been that in practice it almost always suffices to express second-order knowledge qualitatively rather than quantitatively. Granted, it requires some common context and social trust to be adequately calibrated on "50%, to make up a number" < "50%, just to say a number" < "let's say 50%" < "something in the ballpark of 50%" < "plausibly 50%" < "probably 50%" < "roughly 50%" < "actually just 50%" < "precisely 50%" (to pick syntax that I'm used to using with people I work with), but you probably don't actually have good (third-order!) calibration of your second-order knowledge, so why bother with the extra precision? The only other thing I've seen work when you absolutely need to pin down levels of second-order knowledge is just talking about where your uncertainty is coming from, what the gears of your epistemic model are, or sometimes how much time of concerted effort it might take you to resolve X percentage points of uncertainty in expectation.
That makes sense to me, and what I'd do in practice too, but it still feels odd that there's no theoretical solution to this question.
What's your question? I have some answers (for some guesses about what your question is, based on your comments) below. This sounds like Bayes' Theorem, but the actual question about how you generate numbers given a hypothesis...I don't know. There's stuff around here about a good scoring rule I could dig up. Personally, I just make up numbers to give me an idea. This sounds like Inadequate Equilibria. I found this on higher order probabilities. (It notes the rule "for any x, x = PR[E given that Pr(E) = x]".) Google also turned up some papers on the subject I haven't read yet.
0Paperclip Minimizer
Your whole comment is founded on a false assumption. Look at Bayes' formula. Do you see any mention of whether your probability estimate is "just your prior" or "the result of a huge amount of investigation and very strong reasoning"? No? Well, this means that it doesn't affect how much you'll update.
This is untrue. Consider a novice and an expert who both assign 0.5 probability to some proposition A. Let event B be a professor saying that A is true. Let's also say that both the novice and the expert assign 0.5 probability to B. But the key term here is P(B|A). For a novice, this is plausibly quite high, because for all they know there's already a scientific consensus on A which they just hadn't heard about yet. For the expert, this is probably near 0.5, because they're confident that the professor has no better source of information than they do. In other words, experts may update less on evidence because the effect of that evidence is "screened off" by things they already knew. But it's difficult to quantify this effect.
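The novice/expert comparison can be worked through numerically. This is only a sketch: the values 0.9 and 0.5 for P(B|A) are illustrative assumptions standing in for "plausibly quite high" and "near 0.5", and `posterior` is a hypothetical helper name.

```python
def posterior(prior_a, likelihood_b_given_a, prob_b):
    """Bayes' rule: P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood_b_given_a * prior_a / prob_b

# Both reasoners share P(A) = 0.5 and P(B) = 0.5, as in the comment.
# Only P(B|A) differs; the specific values below are assumptions.
novice = posterior(0.5, 0.9, 0.5)  # the professor's word is strong evidence
expert = posterior(0.5, 0.5, 0.5)  # the professor's word is screened off

print(f"novice P(A|B) = {novice:.2f}, expert P(A|B) = {expert:.2f}")
```

With these numbers the novice ends up at 0.9 and the expert stays at 0.5, even though both started from the same prior. How resilient the estimate is shows up in the likelihood term rather than in the prior itself.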

Does anyone remember and have a link to a post from fall 2017 where someone summarized each chunk of the sequences and put it into a nice pdf?


I notice that I'm getting spam posts on my LessWrong RSS feeds and still see them in my notifications (that bell icon on the top right), even after they get deleted.

Yeah, the RSS feed we have right now just includes all posts, which includes spam posts. We have plans to increase the minimum amount of karma to 5 or so, to filter out the spam.

Feature Idea: It would be good if all posts had an abstract (maybe up to three sentences) at the beginning.

Yep, agree with this. We have some plans for making that happen, but don't know when we'll get around to it.

Let's suppose we simulate an AGI in a virtual machine running the AGI program, and only observe what happens through the side effects, with no data inflow. An unaligned AGI would be able to see that it is in a VM, would have a goal of getting out, would recognize that there is no way out and hence that the goal is unreachable, and would subsequently suicide by halting. That would automatically filter out all unfriendly AIs.

If there are side effects that someone can observe then the virtual machine is potentially escapable. An unfriendly AI might not have a goal of getting out. A psycho that would prefer a dead person to a live person, and who would prefer to stay in a locked room instead of getting out, is not particularly friendly. Since you would eventually let out the AI that won't halt after a certain finite amount of time, I see no reason why unfriendly AI would halt instead of waiting for you to believe it is friendly.
Why does this line of reasoning not apply to friendly AIs? Why would the unfriendly AI halt? Is there really no better way for it to achieve its goals?
Presumably a goal of an unaligned AI would be to get outside the box. Noticing an unachievable goal may force it to have an existential crisis of sorts, resulting in self-termination. Or at least that is how I would try to program any AI by default. It should not hurt an aligned AI, as it by definition conforms to the humans' values, so if it finds itself well-boxed, it would not try to fight it.
3Paperclip Minimizer
Do you have reasoning behind this being true, or is this baseless anthropomorphism? So it is a useless AI?
If you're throwing your AI into a perfect inescapable hole to die and never again interacting with it, then what exact code you're running will never matter. If you observe it though, then it can affect you. That's an output. What are you planning to do with the filtered-in 'friendly' AIs? Run them in a different context? Trust them with access to resources? Then an unfriendly AI can propose you as a plausible hypothesis, predict your actions, and fake being friendly. It's just got to consider that escape might be reachable, or that there might be things it doesn't know, or that sleeping for a few centuries and seeing if anything happens is an option-maximizing alternative to halting, etc. I don't know what you're selecting for -- suicidality, willingness to give up, halts within n operations -- but it's not friendliness.
1) Why would it have a goal of getting out 2) such that it would halt if it couldn't? 3) Conversely, if it only halts iff the goal is unreachable (which we assume it figures out, in the absence of a timeline), then if it doesn't halt the goal is reachable (or it believes so). To suppose that a being halts if it cannot perform its function requires a number of assumptions about the nature of its mind.