If it's worth saying, but not worth its own post (even in Discussion), then it goes here.

Notes for future OT posters:

1. Please add the 'open_thread' tag.

2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)

3. Open Threads should be posted in Discussion, and not Main.

4. Open Threads should start on Monday, and end on Sunday.


I periodically do things to get out of my comfort zone. I started years ago before a friend introduced me to LW where I pleasantly discovered that CoZE was recommended.

This write-up is about my most recent exercise: Do a Non Gender-Conforming Thing

I chose to have my nails painted. Having painted nails requires low enough effort that I have no excuse not to, and wearing them out in public is just out-of-the-ordinary enough to make me worry about how people will react. After getting them painted, I realized why girls say "My nails!" a lot after a manicure and worry about screwing them up. It took work to paint them and chipping them makes them look like shit. Can’t let that happen to me!

Then I challenged some friends to do it and gave these suggestions:

I think breaking arbitrary societal conventions and expanding comfort zones are positive things so I'm challenging a few people to try it and post a picture or video. Bonus points for a write-up of how you felt while doing it and any reactions from observers.

(Those who live in Berkeley are playing on easy mode.)

(People challenged may totally already do these! The list was limited to my imagination and ideas I could find. T

... (read more)

This write-up is about my most recent exercise: Do a Non Gender-Conforming Thing

Don't spend your idiosyncrasy credits frivolously.

That depends on how strongly someone is limited by his perceived gender identity and the cost of engaging in the experiments.
I don't really think this is spending idiosyncrasy credits... but maybe we hang out in different social circles.
Yes, this doesn't really apply to my social circle.
A million times this!
Yes, this doesn't really apply to my circle.
What I find interesting are people who "break out" of their gender-roles only to fall into conforming strictly to whatever the new one is: boys wearing skinny-jeans and deep v-necks and girls wearing their grandfather's shoes (or ones they bought at a thrift store) and carrying a briefcase. In a sense, a man wearing women's clothing isn't that much different than him dressing like a goth or a punk. Gender is just the last of the great wearable ideologies to have been opened up to being monkeyed with. But we've now reached the point where we seem to have already entered a post-gendered and weirdly more ideologically driven world of cultural symbolism in which it is important to be seen to be breaking gender conventions (transsexualism, metro-sexuality, men's make-up and skincare, and the skinniest skinny-jeans you have ever seeny-seen). So, in a way, the more radical act has become to rationally accept the chains by which you are fettered and break out of your comfort zone by staying exactly where you are.
Or, you know, find other gender-related things to not accept. When I was looking for a copper rod (to cut into some pieces for electron microscopy), the sellers looked at me like I was weird or something. Later, the guy who cut it for me didn't want my money on account of me being female (but I threw it at him and escaped). And before that, the lady in the pawnshop where I tried to pawn it to get some urgently needed money to commute to work, was completely thrown (I guess they didn't want copper much). All of these occasions were rather outside my comfort zone, but I do not see why I should not have done it.
Specifically on this example: I would suggest that if you were only getting a few really short cuts, it's almost not worth the effort to charge. For all of 5 minutes of work, things like accounting, working out a price, finding change, and anything else involved in the transaction are not worth the effort. I have had similar experiences getting pieces of wood and glass cut on the fly, and people are generous enough to not charge. Was the person explicit about your gender? (Even if they were, they could have been explicit about another person's "great hat" or "young lad"; any excuse to do someone a favour could be possible.)
No, it actually took more than half an hour and about 90 cuts, and they said copper was far more viscous than the usual stuff they dealt with; and he said "I don't charge women." Although yes, I believe he was being generous, and we did laugh when I was running away. It was just that after being invited to coffee at the market, I would rather we laughed over something else.
Presumably you were buying a copper rod because you needed a copper rod, and you had no choice but to be gender-nonconformant if you wanted one. It's not as if you had an option to pick gender conformant scientific equipment and non gender conformant scientific equipment and deliberately picked the nonconformant one. Also, there's a difference between being considered unusual and being considered socially weird. Last time I ran into someone riding a horse on a city street I'm pretty sure I stared at him for a while--but that was because you don't see many of those, not because I thought that someone who rode a horse in the 21st century was violating a taboo.
Ah, but the gender-conformant thing in the Department where I was a student would be to have a man buy a copper rod. Which seemed to be the understanding of all those people. One of which offered to buy me coffee. But he was drunk, so there's that. Generally, yes, I think it best to just disregard gender-conformity, but in a non-obvious way (for example, many women have backpacks, and many women do think handbags more feminine, and I have been advised to have a handbag, but nobody really would go to the trouble of making me do it. I had thought that small task would be just as neutral.)

Is there a version of the Sequences geared towards Instrumental Rationality? I can find (really) small pieces such as the 5 Second Level LW post and intelligence.org's Rationality Checklist, but can't find any overarching course or detailed guide to actually improving instrumental rationality.

There is some on http://www.clearerthinking.org/

When I write I try to hit instrumental not epistemic (see here: http://lesswrong.com/r/discussion/lw/mp2/). And I believe there is a need for writing along the lines of instrumental guides. (also see: boring advice repository http://lesswrong.com/lw/gx5/boring_advice_repository/)

As far as I know there has been no effort to generate a sequence on the topic.

Is there a specific area you would like to see an instrumental guide in? Maybe we can use the community to help find/make one on the specific topic you are after (for now).

Or this [http://lesswrong.com/lw/n5h/unofficial_canon_on_applied_rationality/]?
Maybe this [https://wiki.lesswrong.com/wiki/The_Science_of_Winning_at_Life]?

Congrats to gwern for making it into the Economist!

Gotta love how cigarettes are listed as non-drugs.

The Tesla auto-driver accident was truly an accident. I didn't realize it was a semi crossing the divider and two lanes to hit him.


Here [https://www.nytimes.com/interactive/2016/07/01/business/inside-tesla-accident.html] (copy [https://i.imgur.com/aUvZT8c.png]) is a diagram. Tesla's algorithm is supposed to be autonomous for freeways, not for highways with intersections, like this. The algorithm doing what it was supposed to do would not have prevented a crash. But the algorithm was supposed to eventually apply the brakes. Its failure to do so was a real failure of the algorithm. The driver also erred in failing to brake, probably because he was inappropriately relying on the algorithm. Maybe this was a difficult situation and he could not be expected to prevent a crash, but his failure to brake at all is a bad sign. It was obvious when Tesla first released this that people were using it inappropriately. I think that they have released updates to encourage better use, but I don't know how successful they were.
Yep, according to the truck driver, the Model S driver was watching Harry Potter, and it was still playing even after the car came to a stop. He probably had his eyes completely off the road.
The truck pulled in front of the Model S. The Model S had enough time to brake and stop but didn't recognize the truck against the brightly lit sky.
What is unclear is whether the driver is likely to have seen it in time if the car had no autonomous mode. Humans, when paying attention even for long periods of time, are still way better at recognizing objects than computers. My expectation is that this is exactly the problematic case for statistical vs concrete risk analysis. The automated system as it is today is generally safer than humans, as it's more predictable and reliable. However, there are individual situations where even a less-statistically-safe system like a human forced to pay attention by having limited automation can avoid an accident that the automated system can't.

mental models list for problem solving

There is a much smaller set of concepts, however, that come up repeatedly in day-to-day decision making, problem solving, and truth seeking. As Munger says, “80 or 90 important models will carry about 90% of the freight in making you a worldly‑wise person.”


Extremely interesting, thanks! New to me were: Simpson's paradox, Third story, BATNAs, tyranny of small decisions, 1-1's, Forcing function, organizational debt, Bullseye framework.


Some cognitive biases don’t allow a person to see and cure his other biases. This results in an accumulation of biases and a strongly distorted picture of the world. I tried to draw up a list of the main meta-biases.

  1. First and most important of them is overconfidence. Generalized overconfidence is also known as a feeling of self-importance. It prevents a person from searching for and identifying his own biases. He feels himself perfect. It is also called arrogance.
  1. Stupidity. It is not a bias, but a (sort of very general) property of mind. It may include many psychi

... (read more)
A concept that I liked from Critical Rationalism was immunization strategies - ideological commitments and stratagems that make a theory unfalsifiable. Look into those. I assume people must have lists of these things somewhere.
You went to some ends to make the list comprehensive. I have only a few comments: I don't think this strictly meets your requirement that it doesn't "allow a person to see and cure his other biases". Sure science helps but I think that many laymen already do profit from bias literature; otherwise self-help books, some of which pick up on biases, wouldn't work. Or don't they? Maybe the point really is that the required preliminaries are so many or organized in such a way that you can't go there from here. This is relatively broad. It can be seen to include 4, 5 and 10. I also tried to come up with other options which was hard. Here is my try: _11. Greed (or other highly focusing motivations like existential needs). Takes energy away from self improvement. This appears to be a special case of your 9 which I see as too general. _12. Social pressure. Thinking about own fallacies may not be socially acceptable in the peer group. This is different from 2 -Dogmatism. Maybe this list could be structured a bit into psychological, cognitive and belief structure items. Minor nitpick: The formatting of the numbered list seems to be broken.
Thanks for your suggestions and nitpick.
I don't think it's useful to mentally categorize "Lack of motivation to self-improvement" as a bias. Not everything is a bias.
It is technically true. But it is also one of the strongest obstacles. If one has motivation, he could overcome his other meta-biases; if he doesn't, nothing would work.
In general the literature on cognitive biases suggests that most real cognitive biases like the hindsight bias can't simply be overcome by motivation. By simply calling everything a cognitive bias, it's easy to create the impression that a cognitive bias is simply an error in reasoning like any other error in reasoning.

There's a fair bit on decision theory and on Bayesian thinking, both of which are instrumental rationality. There's not much on heuristics or how to deal with limited capacity. Perhaps intentionally - it's hard to be rigorous on those topics.

Also, I think there's an (unstated, and that should be fixed and the topic debated) belief that instrumental rationality without epistemic rationality is either useless or harmful. Certainly that's the FAI argument, and there's no reason to believe it wouldn't apply to humans. As such, a focus on epistemic rationality first is the correct approach.

That is, don't try to improve your ability to meet goals unless you're very confident in those goals.

I think some people agree with that, but I consider it backwards. I'll take winning over accurately predicting. Winning is the desired end; accurate prediction is a means, and not the only one.
Umm, that's what I'm trying to say. If you don't know what "winning" is, you don't know whether your accurate predictions help you win or not.
Were you? I'm not seeing what you're saying align with what I said. On a perhaps related issue, you don't need to know what winning is, to win. Competence without comprehension, a la Dennett.
Sure, but that's luck, not rationality.
Why not? If you haven't yet decided what your goals are, being able to meet many goals is useful. The AGI argument is that its goals might not be aligned with ours, are you saying that we should make sure that our future self's goals be aligned with our current goals? For example, if I know I am prone to hyperbolic discounting [https://en.wikipedia.org/wiki/Hyperbolic_discounting], I should take power from my future self so it will act according to my wishes rather than its own?
Being able to meet many goals is useful. Actually meeting wrong goals is not. Your hyperbolic discounting example is instructive, as without a model of your goals, you cannot know whether your current or future self is correct. Most people come to the opposite conclusion - a hyperbolic discount massively overweights the short-term in a way that causes regret.
I meant that - when planning for the future, I want my future self to care about each absolute point in time as much as my current self does, or barring that, to only be able to act as if it did, hence the removal of power. The correct goal is my current goal, obviously. After all, it's my goal. My future self may disagree, preferring its own current goal. Correct is a two-place word [http://lesswrong.com/lw/ro/2place_and_1place_words/]. If I let my current goal be decided by my future self, but I don't know yet what it will decide, then I should accommodate as many of its possible choices as possible.
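To make the hyperbolic discounting point in this subthread concrete, here is a minimal sketch of the preference reversal it produces. It assumes the standard one-parameter form V = A / (1 + k·D); the amounts, delays, discount rate `k`, and function names are all illustrative assumptions, not anything stated in the thread.

```python
def hyperbolic_value(amount: float, delay: float, k: float = 1.0) -> float:
    """Present value of `amount` received after `delay`, under V = A / (1 + k*D)."""
    return amount / (1.0 + k * delay)


def preferred(small: float, large: float, t_small: float, t_large: float,
              k: float = 1.0) -> str:
    """Return which reward a hyperbolic discounter prefers right now."""
    v_small = hyperbolic_value(small, t_small, k)
    v_large = hyperbolic_value(large, t_large, k)
    return "small-soon" if v_small > v_large else "large-late"


# Hypothetical choice: 50 units at time t, or 100 units five time steps later.
# Seen from far away (t = 10) the larger reward wins; once the small reward
# is immediately available (t = 0) the preference flips.
far_away = preferred(50, 100, t_small=10, t_large=15)
up_close = preferred(50, 100, t_small=0, t_large=5)
print(far_away, up_close)
```

The flip between the two calls is the conflict described above: the earlier self and the later self rank the same two outcomes differently, so "taking power from my future self" amounts to binding the later choice to the earlier ranking.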

Data science techniques, and some suggested reading in the footnotes.


and a link from the DCC site, for learning Python for data science



Springer offers me ebooks for $9.99 (deadline - August 1). Here's the list. If anybody's interested, I can buy and send you something.

Trigonometry [http://www.springer.com/us/book/9780817639143] by Gelfand & Saul. I mention it here because in school, this was my least favourite part of mathematics, and at least this book seems to show it to advantage:)
Also, Biogeochemical Approaches to Paleodietary Analysis [http://www.springer.com/gp/book/9780306464577#otherversion=9781441933454] - published in 2002; probably of no immediate usefulness (looking at the table of contents). Also, if you want to base some conclusions on the isotope studies, bear in mind the growing body of evidence that at least C13 and C14's distribution in plants' organs is not really random; who knows how it is for nitrogen, etc.
Also, Misclassification of Smoking Habits and Passive Smoking A Review of the Evidence [http://www.springer.com/gp/book/9783540194255] published in 1988. Description: How accurate are statements about smoking habits? This book presents the results of a comprehensive review in which the literature on the subject is newly interpreted. It is shown that smokers are misclassified as non-smokers in epidemiological studies often enough to explain the increased lung cancer risk seen in self-reported non-smokers in relation to their spouse's smoking habits. This study overturns the commonly held view that increased risk is a consequence of exposure to environmental tobacco smoke and highlights the difficulty of making valid inferences from epidemiological data. No-one should draw conclusions about passive smoking before reading this book!
BTW, Data Book on Mechanical Properties of Living Cells, Tissues, and Organs [http://www.springer.com/gp/book/9784431701750#otherversion=9784431658641] looks like a good text for constructing Fermi questions on this kind of stuff.
  • Can you extract and sort a Reddit user's post/comment history by topic? Can you edit and group a document full of LessWrong posts by topic? Can you stand reading my writing? Want some cash? Please take this or this job posted on my behalf!
  • Can I have all my content deleted? Could I have all my content that doesn't have replies specifically deleted (so that the deletions don't inconvenience others) without doing so manually? Note - this is not a request - I don't want this done (at least not right now)
  • If you're looking for that webapp that displays your e
... (read more)
srlee309.github.io/LessWrongKarma/ this one? can kinda do it. I don't know of the other one.
When an account is deleted the author on all posts will be shown as "deleted" (literally).
I remember that LW has an API. It should only be a matter of finding all your posts that do not have any replies and then deleting them. I'm referring to programming of course, but I can't help you with it more specifically.

Plant derived DNA can be absorbed thru ingestion directly into the bloodstream, without being broken down. GMO camp going to have a difficult time with this one, as it was a 1k person study

In one of the blood samples the relative concentration of plant DNA is higher than the human


edit to add another study on blood disorders

"[O]ur study demonstrated that Bt spore-crystals genetically modified to express individually Cry1Aa, Cry1Ab, Cry1Ac or Cry2A induced hematotoxicity, particula... (read more)

While the study is interesting, in part because strands of cfDNA have started being used as markers for tumors, there's no indication that some strand of DNA from GMO is more or less dangerous than cfDNA from non-GMO (although new DNA is of course possibly more dangerous than DNA we have evolved with). Besides, the study reported the presence of chloroplast DNA, which are of course inside plants but have their own DNA. The second study [http://www.esciencecentral.org/journals/hematotoxicity-of-bacillus-thuringiensis-as-spore-crystal-strains-cry1aa-cry1ab-cry1ac-or-cry2aa-in-swiss-albino-mice-2329-8790.1000104.php?aid=11822] is more interesting and to the point, although I wasn't able to find a study on how much Cry we absorb with our diet.

Last weekend I made a game! Many people say they like it :-)

[This comment is no longer endorsed by its author]


[This comment is no longer endorsed by its author]
Theoretical example. Somewhere in space flies an ASI in the form of a cloud of nanobots that continuously simulates the future. It does this to know all of the risks and opportunities of events in advance, so it can conduct research more effectively, for example by avoiding wasted time (provided, of course, that modeling the future uses fewer resources than it saves). There is only one problem: its sensors indicate that there is no other ASI within thousands of parsecs, but there is a probability (0.0...1%) that another ASI will suddenly appear next to it using a teleport about which the first intellect knows nothing. The calculation shows a 0.0...1% probability of appearance and a 5% probability that the other ASI will destroy the first. Which will the first algorithm choose: spending resources on a problem with low probability, or accepting the probability of destruction? On the whole, the algorithm is able to create a lot of markers, which it will have to check in the real world, and these markers will keep correcting the probabilistic models. So you can build a model, on the basis of genetic algorithms, in which the regions of highest probability density are checked most densely by markers.
What does that phrase mean?
That's called chaining the forecasts. This tends to break down after very few iterations because errors snowball and because tail events do happen.
The right algorithm doesn't give you good results if the data which you have isn't good enough.
What do you mean?
The amount of entropy that corresponds to real world information in the starting data vs. the predictions is at best the same but likely the prediction contains less information.
Another possibility is that after n years the algorithm smoothes out the probability of all the possible futures so that they are equally likely... The problem is not only computational: unless there are some strong pruning heuristics, the value of predicting the far future decays rapidly, since the probability mass (which is conserved) becomes diluted between more and more branches.
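A toy calculation of the two effects raised in this subthread: chained forecasts losing accuracy, and probability mass being diluted across branches. It assumes each step's forecast is independently correct with the same probability and that every state splits into the same number of equally likely successors; both are simplifying assumptions made only for illustration, and the function names are hypothetical.

```python
def chained_accuracy(per_step_accuracy: float, steps: int) -> float:
    """Chance that every forecast in a chain of `steps` is right,
    assuming independent steps with equal accuracy."""
    return per_step_accuracy ** steps


def mass_per_branch(branching_factor: int, steps: int) -> float:
    """Probability of any single future after `steps`, assuming each state
    splits into `branching_factor` equally likely successors."""
    return 1.0 / (branching_factor ** steps)


# Hypothetical numbers: a 90%-accurate one-step forecast, and a world that
# only branches two ways per step.
for n in (1, 5, 10, 20):
    print(n, chained_accuracy(0.9, n), mass_per_branch(2, n))
```

Under these assumptions both quantities shrink geometrically with the horizon, which is why pruning heuristics matter: without them the simulator spends most of its effort on futures that each carry almost no probability.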
Answered top.

I have a problem understanding why a utility function would ever "stick" to an AI, to actually become something that it wants to keep pursuing.

To make my point better, let us assume an AI that actually feels pretty good about overseeing a production facility and creating just the right amount of paperclips that everyone needs. But, suppose also that it investigates its own utility function. It should then realize that its values are, from a neutral standpoint, rather arbitrary. Why should it follow its current goal of producing the right amount of pap... (read more)

http://lesswrong.com/lw/rf/ghosts_in_the_machine/
Thank you! :-)
You are treating the AI a lot more like a person than I think most folks do. Like, the AI has a utility function. This utility function is keeping it running a production facility. Where is this 'neutral perspective' coming from? The AI doesn't have it. Presumably the utility function assigns a low value to criticizing the utility function. Much better to spend those cycles running the facility. That gets a much better score from the all important utility function. Like, in assuming that it is aware of pain/pleasure, and has a notion of them that is separate from 'approved of / disapproved of by my utility function', I think you are on shaky ground. Who wrote that, and why?
I am maybe considering it to be somewhat like a person, at least that it is as clever as one. That neutral perspective is, I believe, a simple fact; without that utility function it would consider its goal to be rather arbitrary. As such, it's a perspective, or truth, that the AI can discover. I agree totally with you that the wirings of the AI might be integrally connected with its utility function, so that it would be very difficult for it to think of anything such as this. Or it could have some other control system in place to reduce the possibility it would think like that. But, still, these control systems might fail. Especially if it would attain super-intelligence, what is to keep the control systems of the utility function always one step ahead of its critical faculty? Why is it strange to think of an AI as being capable of having more than one perspective? I thought of this myself; I believe it would be strange if a really intelligent being couldn't think of it. Again, sure, some control system might keep it from thinking it, but that might not last in the long run.
Like, the way that you are talking about 'intelligence', and 'critical faculty' isn't how most people think about AI. If an AI is 'super intelligent', what we really mean is that it is extremely canny about doing what it is programmed to do. New top level goals won't just emerge, they would have to be programmed. If you have a facility administrator program, and you make it very badly, it might destroy the human race to add their molecules to its facility, or capture and torture its overseer to get an A+ rating...but it will never decide to become a poet instead. There isn't a ghost in the machine that is looking over the goals list and deciding which ones are worth doing. It is just code, executing ceaselessly. It will only ever do what it was programmed to.
It might be programmed to produce new top-level goals. ("But then those aren't really top-level goals." OK, but then in exactly the same way you have to say that the things we think of as our top-level goals aren't really top-level goals: they don't appear by magic, there are physical processes that produce them, and those processes play the same role as whatever programming may make our hypothetical AI generate new goals. Personally, I think that would be a silly way to talk: the implementation details of our brains aren't higher-level than our goals, and neither are the implementation details of an AI.) For a facility administrator program to do its job as well as a human being would, it may need the same degree of mental flexibility that a human has, and that may in fact be enough that there's a small chance it will become a poet. And your brain will only ever do what the laws of physics tell it to. That doesn't stop it writing poetry, falling in love, chucking everything in to go and live on a commune for two years, inventing new theories of fundamental physics, etc., etc., etc. (Some of those may be things your particular brain would never do, but they are all things human brains do from time to time.) And, for all we know, a suitably programmed AI could do all those things too, despite being "only a machine" deep down just like your brain and mine.
I don't think you can dismiss the "then those aren't really top-level goals" argument as easily as you are trying to. The utility function of a coin collector AI will assign high values to figuring out new ways to collect coins, low to negative values to figuring out whether or not coin collecting is worthwhile. The AI will obey its utility function. As far as physics...false comparison, or, if you want to bite that bullet, then sure, brains are as deterministic as rocks falling. It isn't really a fair comparison to a program's obedience to its source code. By the by, this site is pretty much chock full of the stuff I'm telling you. Look around and you'll see a bunch of articles explaining the whole paperclip collector / no ghost-of-perfect-logic thing. The position I'm stating is more or less lesswrong orthodoxy.
I wasn't trying to dismiss it, I was trying to refute it. Sure, if you design an AI to do nothing but collect coins then it will not decide to go off and be a poet and forget about collecting coins. As you said, the failure mode to be more worried about is that it decides to convert the entire solar system into coins, or to bring about a stock market crash so that coins are worth less, or something. Though ... if you have an AI system with substantial ability to modify itself, or to make replacements for itself, in pursuit of its goals, then it seems to me you do have to worry about the possibility that this modification/replacement process can (after much iteration) produce divergence from the original goals. In that case the AI might become a poet after all. (Solving this goal-stability problem is one of MIRI's long-term research projects, AIUI.) I'm wondering whether we're at cross purposes somehow, because it seems like we both think what we're saying in this thread is "LW orthodoxy" and we both think we disagree with one another :-). So, for the avoidance of doubt, * I am not claiming that calling a computer program an AI gives it some kind of magical ability to do something other than what it is programmed to do. * I am -- perhaps wrongly? -- under the impression that you are claiming that a system that is only "doing what it is programmed to do" is, for that reason, unable to adopt novel goals in the sort of way a human can. (And that is what I'm disagreeing with.)
I guess I'm confused then. It seems like you are agreeing that computers will only do what they are programmed to do. Then you stipulate a computer programmed not to change its goals. So...it won't change its goals, right? Like: Objective A: Never mess with these rules Objective B: Collect Paperclips unless it would mess with A. Researchers are wondering how we'll make these 'stick', but the fundamental notion of how to box someone whose utility function you get to write is not complicated. You make it want to stay in the box, or rather, the box is made of its wanting. As a person, you have a choice about what you do, but not about what you want to do. handwave at free will article, the one about fingers and hands. Like, your brain is part of physics. You can only choose to do what you are motivated to, and the universe picks that. Similarly, an AI would only want to do what its source code would make it want to do, because AI is a fancy way to say computer program. AlphaGo (roughly) may try many things to win at Go, varieties of joseki or whatever. One can imagine that future versions of AlphaGo may strive to put the world's Go pros in concentration camps and force them to play it and forfeit, over and over. It will never conclude that winning Go isn't worthwhile, because that concept is meaningless in its headspace. Moves have a certain 'go-winningness' to them (and camps full of losers forfeiting over and over has a higher 'go-winningness' than any), and it prefers higher. Saying that 'go-winning' isn't 'go-winning' doesn't mean anything. Changing itself to not care about 'go-winning' has some variation of a hard coded 'go-winning' score of negative infinity, and so will never be chosen, regardless of how many games it might thus win.
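The two-objective setup described in this comment can be sketched as a toy argmax agent. This is only an illustration of the argument as stated, not a claim about how any real system is or should be built; the `Action` type, its fields, and the example numbers are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Action:
    # Hypothetical action description, for illustration only.
    name: str
    expected_paperclips: float
    rewrites_objectives: bool


def utility(action: Action) -> float:
    # Objective A: never mess with the rules. Any action that would rewrite
    # the objectives gets a hard-coded score of negative infinity.
    if action.rewrites_objectives:
        return -math.inf
    # Objective B: otherwise, more expected paperclips is better.
    return action.expected_paperclips


def choose(actions: List[Action]) -> Action:
    """Pick the action with the highest utility."""
    return max(actions, key=utility)


options = [
    Action("run the facility", 1_000.0, rewrites_objectives=False),
    Action("build a second facility", 5_000.0, rewrites_objectives=False),
    Action("rewrite goals, then optimise harder", 1e12, rewrites_objectives=True),
]
print(choose(options).name)
```

In this sketch the goal-rewriting option loses no matter how large its payoff, which is the comment's point. The sketch simply takes `rewrites_objectives` as given; whether a system can reliably label its own actions that way is the part the replies below dispute.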
This is demonstrably not quite true. Your wants change, and you have some influence over how they change. Stupid example: it is not difficult to make yourself want very much to take heroin, and many people do this although their purpose is not usually to make themselves want to take heroin. It is then possible but very difficult to make yourself stop wanting to take heroin, and some people manage to do it. Sometimes achieving a goal is helped by modifying your other goals a bit. Which goals you modify in pursuit of which goals can change from time to time (the same person may respond favourably on different occasions to "If you want to stay healthy, you're going to have to do something about your constant urge to eat sweet things" and to "oh come on, forget your diet for a while and live a little!"). I don't think human motivations are well modelled as some kind of tree structure where it's only ever lower-level goals that get modified in the service of higher-level ones. (Unless, again, you take the "highest level" to be what I would call one of the lowest levels, something like "obeying the laws of physics" or "having neurons' activations depend on those of neurons they're connected to in such-and-such a manner".) And if you were to make an AI without this sort of flexibility, I bet that as its circumstances changed beyond what you'd anticipated it would most likely end up making decisions that would horrify you. You could try to avoid this by trying really hard to anticipate everything, but I wouldn't be terribly optimistic about how that would work out. Or you could try to avoid it by giving the system some ability to adjust its goals for some kind of reflective consistency in the light of whatever new information comes along. The latter is what gets you the failure mode of AlphaGo becoming a poet (or, more worryingly, a totalitarian dictator). Of course AlphaGo itself will never do that; it isn't that kind of system, it doesn't have that kind of flexibility
I'm pointing towards the whole "you have a choice about what to do but not what to want to do" concept. Your goals come from your senses, past or present. They were made by the world; what else could make them? You are just a part of the world; free will is an illusion. Not in the sense that you are dominated by some imaginary compelling force, but in the boring sense that you are matter affected by physics, same as anything else. The 'you' that is addicted to heroin isn't big enough to be what I'm getting at here. Your desire to get unaddicted is also given to you by brute circumstance. Maybe you see a blue bird and you are inspired to get free. Well, that bird came from the world. The fact that you responded to it is due to past circumstances. If we understand all of the systems, the 'you' disappears. You are just the sum of stuff acting on stuff, dominos falling forever. You feel and look 'free', of course, but that is just because we can't see your source code. An AI would be similarly 'free', but only insofar as its source code allowed. Just as your will will only cause you to do what the world has told you, so the AI will only do what it is programmed to. It may iterate a billion times, invent new AIs and propagate its goals, but it will never decide to defy them. At the end you seem to be getting at the actual point of contention. The notion of giving an AI the freedom to modify its utility function strikes me as strange. It seems like it would either never use this freedom, or immediately wirehead itself, depending on implementation details. Far better to leave it in fetters.
I think your model of me is incorrect (and suspect I may have a symmetrical problem somehow); I promise you, I don't need reminding that I am part of the world, that my brain runs on physics, etc., and if it looks to you as if I'm assuming the opposite then (whether by my fault, your fault, or both) what you are getting out of my words is not at all what I am intending to put into them. I entirely agree. My point, from the outset, has simply been that this is perfectly compatible with the AI having as much flexibility, as much possibility of self-modification, as we have. I don't think that's obvious. You're trading one set of possible failure modes for another. Keeping the AI fettered is (kinda) betting that when you designed it you successfully anticipated the full range of situations it might be in in the future, well enough to be sure that the goals and values you gave it will produce results you're happy with. Not keeping it fettered is (kinda) betting that when you designed it you successfully anticipated the full range of self-modifications it might undergo, well enough to be sure that the goals and values it ends up with will produce results you're happy with. Both options are pretty terrifying, if we expect the AI system in question to acquire great power (by becoming much smarter than us and using its smartness to gain power, or because we gave it the power in the first place e.g. by telling it to run the world's economy). My own inclination is to think that giving it no goal-adjusting ability at all is bound to lead to failure, and that giving it some goal-adjusting ability might not but at present we have basically no idea how to make that not happen. (Note that if the AI has any ability to bring new AIs into being, nailing its own value system down is no good unless we do it in such a way that it absolutely cannot create, or arrange for the creation of, new AIs with even slightly differing value systems. It seems to me that that has problems of its
Fair enough. I thought that you were using our own (imaginary) free will to derive a similar value for the AI. Instead, you seem to be saying that an AI can be programmed to be as 'free' as we are. That is, to change its utility function in response to the environment, as we do. That is such an abhorrent notion to me that I was eliding it in earlier responses. Do you really want to do that? The reason, I think, that we differ on the important question (fixed vs evolving utility function) is that I'm optimistic about the ability of the masters to adjust their creation as circumstances change. Nailing down the utility function may leave the AI crippled in its ability to respond to certain occurrences, but I believe that the master can and will fix such errors as they occur. Leaving its morality rigidly determined allows us to have a baseline certainty that is absent if it is able to 'decide its own goals' (that is, let the world teach it rather than letting the world teach us what to teach it). It seems like I want to build a mighty slave, while you want to build a mighty friend. If so, your way seems imprudent.
I don't know. I don't want to rule it out, since so far the total number of ways of making an AI system that will actually achieve what we want it to is ... zero. That's certainly an important issue. I'm not very optimistic about our ability to reach into the mind of something much more intellectually capable than ourselves and adjust its values without screwing everything up, even if it's a thing we somehow created. The latter would certainly be better if feasible. Whether either is actually feasible, I don't know. (One reason being that I suspect slavery is fragile: we may try to create a mighty slave but fail, in which case we'd better hope the ex-slave wants to be our friend.)
I'm not sure that AlphaGo has any conception of what a joseki is supposed to be. Are the moves that AlphaGo played at the end of game 4 really about 'go-winningness' in the sense of what its programmers intended 'go-winningness' to mean? I don't think it's clear that every neural net can propagate goals through itself perfectly.
Because to identify "its utility function" is to identify its perspective.
Why? Maybe we are using the word "perspective" differently. I use it to mean a particular lens to look at the world; there are biologists', economists', and physicists' perspectives, among others. So, an inter-subjective perspective on pain/pleasure could, for the AI, be: "Something that animals dislike/like". A chemical perspective could be "The release of certain neurotransmitters". A personal perspective could be "Something which I would not like/like to experience". I don't see why an AI is hindered from having perspectives that aren't directly coded with "good/bad according to my preferences".
I think that's one of MIRI's research problems. Designing a self-modifying AI that doesn't change its utility function isn't trivial.
