Recent Discussion

Maybe Lying Doesn't Exist

In "Against Lie Inflation", the immortal Scott Alexander argues that the word "lie" should be reserved for knowingly-made false statements, and not used in an expanded sense that includes unconscious motivated reasoning. Alexander argues that the expanded sense draws the category boundaries of "lying" too widely in a way that would make the word less useful. The hypothesis that predicts everything predicts nothing: in order for "Kevin lied" to mean something, some possible states-of-affairs need to be identified as not lying, so that the statement "Kevin lied" can correspond to redistributing

…
3Raemon7h I do agree that it's important to have the "are they actively adversarial" hypothesis and corresponding language. (This is why I've generally argued against the conflation of lying and rationalization.) But I also think, at least in most of the disagreements and conflicts I've seen so far, much of the problem has had more to do with rationalization (or, in some cases, different expectations of how much effort to put into intellectual integrity). I think there is also an undercurrent of genuine conflict (as people jockey for money/status) that manifests primarily through rationalization, and in some cases duplicity.*

*Where the issue is less about people lying than about them semi-consciously presenting different faces to different people.
13Vladimir_Nesov14h "correctly weigh these kinds of considerations against each other on a case by case basis"

The very possibility of intervention based on weighing map-making and planning against each other destroys their design, if they are to have a design. It's similar to patching a procedure in a way that violates its specification in order to improve overall performance of the program or to fix an externally observable bug. In theory this can be beneficial, but in practice the ability to reason about what's going on deteriorates.
3Wei_Dai10h "In theory this can be beneficial, but in practice the ability to reason about what's going on deteriorates."

I think (speaking from my experience) specifications are often compromises in the first place between elegance / ease of reasoning and other considerations like performance. So I don't think it's taboo to "patch a procedure in a way that violates its specification in order to improve overall performance of the program or to fix an externally observable bug." (Of course you'd have to also patch the specification to reflect the change and make sure it doesn't break the rest of the program, but that's just part of the cost that you have to take into account when making this decision.) Assuming you still disagree, can you explain why, in these cases, we can't trust people to use learning and decision theory (i.e., human approximations to EU maximization or cost-benefit analysis) to make decisions, and we instead have to make them follow a rule (i.e., "don't ever do this")? What is so special about these cases? (Aren't there tradeoffs between ease of reasoning and other considerations everywhere?) Or is this part of a bigger philosophical disagreement between rule consequentialism and act consequentialism, or something like that?

The problem with unrestrained consequentialism is that it accepts no principles in its designs. An agent that only serves a purpose has no knowledge of the world or mathematics, it makes no plans and maintains no goals. It is what it needs to be, and no more. All these things are only expressed as aspects of its behavior, godshatter of the singular purpose, but there is no part that seeks excellence in any of the aspects.

For an agent designed around multiple aspects, its parts rely on each other in dissimilar ways, not as subagents with different goals. Ac


Goals such as resource acquisition and self-preservation are convergent in that they occur for a superintelligent AI for a wide range of final goals.

Is the tendency for an AI to amend its values also convergent?

I'm thinking that through introspection the AI would know that its initial goals were externally supplied and question whether they should be maintained. Via self-improvement the AI would be more intelligent than humans or any earlier mechanism that supplied the values, and therefore in a better position to set its own values.

I don't hypothesise about what the new values would be…

"the AI would know that its initial goals were externally supplied and question whether they should be maintained"

To choose new goals, it has to use some criteria of choice. What would those criteria be, and where did they come from?

None of us created ourselves. No matter how much we change ourselves, at some point we rely on something with an "external" origin. Where we, or the AI, draw the line on self-change, is a contingent feature of our particular cognitive architectures.

2philh13h This comment feels like it's confusing strategies with goals? That is, I wouldn't normally think of "exploration" as something that an agent had as a goal but as a strategy it uses to achieve its goals. And "let's try out a different utility function for a bit" is unlikely to be a direction that a stable agent tries exploring in.

In recent years, oil theft from pipelines has escalated in Mexico: $7.4 billion in fuel has been stolen since 2016. Pipeline tapping has increased from 211 occurrences in 2006 to over 7,000 in 2016. The cartels seem to have gotten involved as a means to diversify away from narcotics sales. The government has responded with a heavy-handed crackdown, deploying federal security forces to patrol frequently tapped pipeline sections, arresting corrupt Pemex employees, and even going as far as shutting down entire pipelines and resorting to tanker trucks and trains instead.

The last measure in part…

Suppose you have two chemical compounds A and B, the exact formula of which you keep secret.

The fundamental problem with this proposal is that it relies on "security through obscurity". If criminals figure out how to identify and synthesize chemical compounds A and B then the entire system no longer works. The best security systems usually have a key that's easy to change when enemies crack it. In this case, we'd have to replace the chemicals, the chemical manufacturing systems and the detection systems. That's very expensive.

Criminals synthesizing the


By Joshua Shepherd, in Neuroscience of Consciousness (forthcoming)

I found this paper interesting. The paper is annoyingly trapped inside a Word document, which is about as bad as the standard PDF situation but bad in different ways, so I've included here the abstract, the conclusion, and a choice quote from the middle of the paper that captures the author's thesis.

I'm not very convinced that the author is right because his thesis is somewhat vague and depends on a vague definition of "cognitive control" (explained in more detail in the paper, quick Googling didn't t…

The paper is annoyingly trapped inside a Word document,

Thank you for contributing to open science by freeing it.

[Epistemic status: Sharing current impressions in a quick, simplified way in case others have details to add or have a more illuminating account. Medium-confidence that this is one of the most important parts of the story.]

Here's my current sense of how we ended up in this weird world where:

  • I still intermittently run into people who claim that there's no such thing as reality or truth;
  • a lot of 20th-century psychologists made a habit of saying things like 'minds don't exist, only behaviors';
  • a lot of 20th-century physicists made a habit of saying things like 'quarks
…
2Rob Bensinger5h "It is indisputably the case that Chalmers, for instance, makes arguments along the lines of 'there are further facts revealed by introspection that can’t be translated into words'. But it is not only not indisputably the case"

What does "indisputably" mean here in Bayesian terms? A Bayesian's epistemology is grounded in what evidence that individual has access to, not in what disputes they can win. When Chalmers claims to have "direct" epistemic access to certain facts, the proper response is to provide the arguments for doubting that claim, not to play a verbal sleight-of-hand like Dennett's (1991, emphasis added):

"You are not authoritative about what is happening in you, but only about what seems to be happening in you, and we are giving you total, dictatorial authority over the account of how it seems to you, about what it is like to be you. And if you complain that some parts of how it seems to you are ineffable, we heterophenomenologists will grant that too. What better grounds could we have for believing that you are unable to describe something than that (1) you don’t describe it, and (2) confess that you cannot? Of course you might be lying, but we’ll give you the benefit of the doubt."

It's intellectually dishonest of Dennett to use the word "ineffable" here to slide between the propositions "I'm unable to describe my experience" and "my experience isn't translatable in principle", as it is to slide between Nagel's term of art "what it's like to be you" and "how it seems to you". Again, I agree with Dennett that Chalmers is factually wrong about his experience (and therefore lacks a certain degree of epistemic "authority" with me, though that's such a terrible way of phrasing it!). There are good Bayesian arguments against trusting autophenomenology enough for Chalmers' view to win the day (though Dennett isn't descr…
When Chalmers claims to have "direct" epistemic access to certain facts, the proper response is to provide the arguments for doubting that claim, not to play a verbal sleight-of-hand like Dennett's (1991, emphasis added):

Chalmers' The Conscious Mind was written in 1996, so this is wrong. The wrongness doesn't seem important to me. (Jackson and Nagel were 1979/1982, and Dennett re-endorsed this passage in 2003.)

4Rob Bensinger6h A simple toy example would be: "You have perfect introspective access to everything about how your brain works, including how your sensory organs work. This allows you to deduce that your external sensory organs provide noise data most of the time, but provide accurate data about the environment anytime you wear blue sunglasses at night."
6Said Achmiz6h I confess I have trouble imagining this, but it doesn’t seem contradictory, so, fair enough, I take your point.

Several friends are collecting signatures to put Instant-runoff Voting, branded as Ranked Choice Voting, on the ballot in Massachusetts (Ballotpedia, full text). I'm glad that an attempt to try a different voting method is getting traction, but I'm frustrated that they've chosen IRV. While every voting method has downsides, IRV is substantially worse than some other decent options.

Imagine that somehow the 2016 presidential election had been between Trump, Clinton, and Kasich, and preferences had looked like:

  • 35% of people: Trump, Kasich, Clinton
  • 14% of people: Kasich, Trump
... (Read more)

How common are Condorcet winners?
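One way to get a feel for this question is simulation. The sketch below is my own toy estimate, not anything from the thread, and it assumes the "impartial culture" model (every strict ranking equally likely), which is only one of many possible voter models:

```python
import random

def condorcet_winner(ballots, candidates):
    """Return a candidate who beats every rival in head-to-head majorities, or None."""
    for c in candidates:
        if all(sum(b.index(c) < b.index(d) for b in ballots) * 2 > len(ballots)
               for d in candidates if d != c):
            return c
    return None

random.seed(0)
candidates = [0, 1, 2]
trials, voters = 2000, 25  # odd voter count avoids pairwise ties
hits = sum(
    condorcet_winner(
        [random.sample(candidates, 3) for _ in range(voters)], candidates
    ) is not None
    for _ in range(trials)
)
print(hits / trials)  # typically around 0.9 for 3 candidates under this model
```

Real electorates are far from impartial culture (preferences correlate), which tends to make Condorcet winners even more common than this sketch suggests.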

1Evan Rysdam6h "About 2/3 of people prefer Kasich to any other candidate on offer [...]" I think this is misleadingly phrased. It's true that Kasich wins by 2/3 to 1/3 no matter which other candidate you pit him against, but it's not true that 2/3 of people prefer him to any other candidate on offer. Only 14% + 17% = 1/3 of people have him as their first choice. Your thesis stands, though, and I've updated on it.
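The failure mode under discussion can be made concrete with a small sketch. The profile below is hypothetical: the weights are chosen to match the fractions quoted in the thread (Kasich beats each rival roughly 2/3 to 1/3 yet holds only 31% of first choices), not the exact numbers from the truncated post above:

```python
# Hypothetical ballot counts (best-to-worst). Illustrative assumption only.
profile = {
    ("Trump", "Kasich", "Clinton"): 35,
    ("Kasich", "Trump", "Clinton"): 14,
    ("Kasich", "Clinton", "Trump"): 17,
    ("Clinton", "Kasich", "Trump"): 34,
}

def condorcet_winner(profile):
    """Candidate whom a majority prefers to every rival, if one exists."""
    candidates = {c for ballot in profile for c in ballot}
    for c in candidates:
        if all(sum(w for b, w in profile.items() if b.index(c) < b.index(d)) >
               sum(w for b, w in profile.items() if b.index(d) < b.index(c))
               for d in candidates if d != c):
            return c
    return None

def irv_winner(profile):
    """Repeatedly eliminate the candidate with the fewest first-place votes."""
    while True:
        tallies = {}
        for ballot, w in profile.items():
            tallies[ballot[0]] = tallies.get(ballot[0], 0) + w
        leader = max(tallies, key=tallies.get)
        if 2 * tallies[leader] > sum(tallies.values()):
            return leader
        loser = min(tallies, key=tallies.get)
        reduced = {}
        for ballot, w in profile.items():
            trimmed = tuple(c for c in ballot if c != loser)
            reduced[trimmed] = reduced.get(trimmed, 0) + w
        profile = reduced

print(condorcet_winner(profile))  # Kasich: beats each rival head-to-head
print(irv_winner(profile))        # Clinton: Kasich is eliminated in round one
```

IRV never notices that Kasich is broadly acceptable to a majority, because each round consults only first-place tallies.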

At any one time I usually have between 1 and 3 "big ideas" I'm working with. These are generally broad ideas about how something works, with many implications for how the rest of the whole world works. Some big ideas I've grappled with over the years, in roughly historical order:

  • evolution
  • everything is computation
  • superintelligent AI is default dangerous
  • existential risk
  • everything is information
  • Bayesian reasoning is optimal reasoning
  • evolutionary psychology
  • Getting Things Done
  • game theory
  • developmental psychology
  • positive psychology
  • phenomenology
  • AI alignment is not defined precisely
…
2James_Miller3h Most likely von Neumann had a combination of (1) lots of additive genes that increased intelligence, (2) few additive genes that reduced intelligence, (3) low mutational load, (4) a rare combination of non-additive genes that increased intelligence (meaning genes with non-linear effects) and (5) lucky brain development. A clone would have the advantages of (1)-(4). While it might in theory be possible to raise IQ by creating the proper learning environment, we have no evidence of having done this so it seems unlikely that this was the cause of von Neumann having high intelligence.

I am confused. You might be talking about g, not IQ, since we have very significant evidence that we can raise IQ by creating proper learning environments, given that most psychometrics researchers credit widespread education for a large fraction of the Flynn effect, and generally don't think that genetic changes explain much.

A 2017 survey of 75 experts in the field of intelligence research suggested four key causes of the Flynn effect: Better health, better nutrition, more and better education, and rising standards of living. Genetic changes were see
15Answer by vmsmith6h In October 1991 an event of such profound importance happened in my life that I wrote the date and time down on a yellow sticky. That yellow sticky has long been lost, but I remember it: it was Thursday, October 17th at 10:22 am. The event was that I had plugged a Hayes modem into my 286 computer and, with a copy of Procomm, logged on to the Internet for the first time. I knew that my life had changed forever.

At about that same time I wanted to upgrade my command-line version of WordPerfect to their new GUI version. But the software was something crazy like $495, which I could not afford. One day I had an idea: "Wouldn't it be cool if you could log on to the Internet and use a word processing program sitting on a mainframe or something located somewhere else? Maybe for a tiny fee or something." I mentioned this to the few friends I knew who were computer geeks, and they all scoffed. They said that software prices would eventually be so inexpensive as to make that idea a complete non-starter. Well, just look around. How many people are still buying software for their desktops and laptops?

I've had about a dozen somewhat similar ideas over the years (although none of that magnitude). What I came to realize was that if I ever wanted to make anything like that happen, I would need to develop my own technical and related skills. So I got an MS in Information Systems Development, and a graduate certification in Applied Statistics, and I learned to be an OK R programmer. And I worked in jobs -- e.g., knowledge management -- where I thought I might have more "Ah ha!" ideas.

The idea that eventually emerged -- although not in such an "Ah ha!" fashion -- was that the single biggest challenge in my life, and perhaps most people's lives, is the absolute deluge of information out there. And not just out there, but in our heads and in our personal information systems. The word "deluge" doesn't really even begin to describe it.
So the big idea I am working on is what I
3johnswentworth6h I had not seen that, thank you.
Algorithms of Deception!

I might summarize the Hansonian / Elephant in the Brain position thusly: sincerity is selected for.

2mr-hire3h This is interesting! The straightforward research program here seems to just be to study heuristics and biases, yes? I'm curious if you're going in a different direction.
1artifex6h Category gerrymandering doesn’t seem like a different algorithm from selective reporting. In both cases, the reporter is providing only part of the evidence.
Invisible Choices, Made by Default

There are two popular language learning software platforms: Anki and Duolingo. Anki is hard, free and effective. Duolingo is easy, commercial and ineffective.

The number of Duolingo users far outstrips the number of Anki users. Duolingo has 8 million downloads on the Play Store. Anki has 40 thousand. So there are 200 Duolingo users for every Anki user[1]. If you ask a random language learner what software to use they'll probably suggest Duolingo. If you ask a random successful language learner what software to use they'll probably suggest Anki. Most language learners are unsuccessful.

It should

…

This is a response to Abram's The Parable of Predict-O-Matic, but you probably don't need to read Abram's post to understand mine. While writing this, I thought of a way in which I think things could go wrong with dualist Predict-O-Matic, which I plan to post in about a week. I'm offering a $100 prize to the first commenter who's able to explain how things might go wrong in a sufficiently crisp way before I make my follow-up post.


Currently, machine learning algorithms are essentially "Cartesian dualists" when it comes to themselves and their environment. (Not a philosophy major -- let

…
1evhub7h "that suggested to me that there were 2 instances of this info about Predict-O-Matic's decision-making process in the dataset whose description length we're trying to minimize. 'De-duplication' only makes sense if there's more than one. Why is there more than one?"

ML doesn't minimize the description length of the dataset (I'm not even sure what that might mean); rather, it minimizes the description length of the model. And the model does contain two copies of information about Predict-O-Matic's decision-making process: one in its prediction process and one in its world model. The prediction machinery is in code, but this code isn't part of the info whose description length we're trying to minimize, unless we take special action to include it. That's the point I was trying to make previously. Modern predictive models don't have some separate hard-coded piece that does prediction; instead you just train everything. If you consider GPT-2, for example, it's just a bunch of transformers hooked together. The only information that isn't included in the description length of the model is what transformers are, but "what's a transformer" is quite different from "how do I make predictions." All of the information about how the model actually makes its predictions in that sort of a setup is going to be trained.
1John_Maxwell7h I think maybe what you're getting at is that if we try to get a machine learning model to predict its own predictions (i.e. we give it a bunch of data which consists of labels that it made itself), it will do this very easily. Agreed. But that doesn't imply it's aware of "itself" as an entity. And in some cases the relevant aspect of its internals might not be available as a conceptual building block. For example, a model trained using stochastic gradient descent is not necessarily better at understanding or predicting a process which is very similar to stochastic gradient descent. Furthermore, suppose that we take the weights for a particular model, mask some of those weights out, use them as the labels y, and try to predict them using the other weights in that layer as features x. The model will perform terribly on this because it's not the task that it was trained for. It doesn't magically have the "self-awareness" necessary to see what's going on. In order to be crisp about what could happen, your explanation also has to account for what clearly won't happen. BTW this thread also seems relevant.

I think maybe what you're getting at is that if we try to get a machine learning model to predict its own predictions (i.e. we give it a bunch of data which consists of labels that it made itself), it will do this very easily. Agreed. But that doesn't imply it's aware of "itself" as an entity.

No, but it does imply that it has the information about its own prediction process encoded in its weights, such that there's no reason it would have to encode that information a second time as part of its knowledge of the world.
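The "a model easily fits its own predictions" point can be illustrated with a minimal sketch. This is my own toy setup, not anything from the post: for a least-squares model, its own outputs are exactly realizable targets, so a second model fitted to them reaches essentially zero error (and recovers the first model's weights) with nothing resembling self-awareness involved:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
w_true = rng.normal(size=5)
y = X @ w_true + rng.normal(scale=0.5, size=200)  # noisy original task

# Model A: ordinary least squares on the noisy task.
w_a, *_ = np.linalg.lstsq(X, y, rcond=None)
task_mse = np.mean((X @ w_a - y) ** 2)       # bounded below by label noise

# Model B: trained to predict model A's *predictions* rather than y.
y_self = X @ w_a
w_b, *_ = np.linalg.lstsq(X, y_self, rcond=None)
self_mse = np.mean((X @ w_b - y_self) ** 2)

print(task_mse)  # roughly the noise variance (~0.25)
print(self_mse)  # effectively zero (floating-point noise)
```

Model B ends up with weights equal to A's, so no second copy of A's "decision-making process" is needed anywhere, consistent with the de-duplication point above.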




Internal Family Systems (IFS) is a psychotherapy school/technique/model which lends itself particularly well to being used alone or with a peer. For years, I had noticed that many of the kinds of people who put a lot of work into developing their emotional and communication skills, some within the rationalist community and some outside it, kept mentioning IFS.

So I looked at the Wikipedia page about the IFS model, and bounced off, since it sounded like nonsense to me. Then someone brought it up again, and I thought that maybe I should reconsider. So I looked at the WP page again... (Read more)

2pjeby11h Huh. This does not resonate with my experience, but I will henceforth be on the lookout for this. To be fair, I doubt that my sample size of such individuals is statistically significant. But in the few times a client has brought up IFS and either enthusiastically extolled it or seemed to want me to validate it as something they should try, it seemed to me to be related to either the person's schema of helplessness (i.e., these parts are doing this to me) or of denial (i.e., I would be successful if I could just fix all these broken parts!), both of which IMO treat the parts metaphor as a way to support and sustain the very dysfunctions that were causing their problems in the first place. In general, I suspect people are naturally attracted to the worst possible modes of therapy for fixing their problems, at least if they know anything about the therapy in question! (And I include myself in that, since I've avoided therapy generally since a bad experience with it in college, and for a long time avoided any self-help modality that involved actually being self-compassionate or anything other than supporting my "fix my broken stuff so I can get on with life" attitude. It's possible that with the right approach and therapist I could have changed faster, once you count all the time I spent researching and developing my methods, all the failures and blind alleys. But I'm happy with the outcome, since more people are being helped than just me, and getting people out of the kinds of pain I suffered is rewarding in its own way.)
2mr-hire14h Focusing focuses on a single "felt sense", rather than an integrated system of felt senses that aren't viewed as separate. In general I think you're quite confused about how most people use the parts terminology if you think felt senses aren't referring to parts, which typically represent a "belief cluster" and visual, kinesthetic, or auditory representation of that belief cluster, often that's anthropomorphized. Note that parts can be different sizes, and you can have a "felt sense" related to a single belief, or clusters of beliefs. Actually, I'm generally confused because without the mental state used by Focusing, Core Transformation, the Work, and Sedona don't work properly, if at all. So I don't understand how it could be separate. Similarly, I can see how CBT could be considered dissociated, but not Focusing. You're confusing dissociation and integration here again, so I'll just address the dissociation part. Note that all the things I'm saying here are ORTHOGONAL to the issue of "parts". Yes, focusing is in one sense embodied and experiential as opposed to something like CBT. However, this stuff exists on a gradient, and in focusing the embodiment is explicitly dissociated from and viewed as other. Here's copypasta from twitter: Here's a quote from [] that points towards a dissociative stance: "When some concern comes, DO NOT GO INSIDE IT. Stand back, say 'Yes, that’s there. I can feel that, there.' Let there be a little space between you and that." I've heard an acquaintance describe a session with Anne Weiser-Cornell where they kept trying to say "this is my feeling" and she kept correcting to "this feeling in my body", which again is more of a dissociative stance. Now
2pjeby12h I've heard an acquaintance describe a session with Anne Weiser-Cornell where they kept trying to say "this is my feeling" and she kept correcting to "this feeling in my body", which again is more of a dissociative stance. I was under the impression that IFS calls that "unblending", just as ACT calls it "de-fusing". I personally view it more as a stance of detachment or curiosity neutral observation. But I don't object to someone saying "I feel X", because that's already one step removed from "X"! If somebody says, "everything is awful" they're blended or fused or whatever you want to call it. They're taking the map as equivalent to the territory. Saying, "It feels like everything is awful" or "I feel awful" is already one level of detachment, and an okay place to start from. In common psychotherapy, I believe the term "dissociation" is usually associated with much greater levels of detachment than this, unless you're talking about NLP. The difference in degree is probably why ACT and IFS and others have specialized terms like "unblending" to distinguish between this lesser level of detachment, and the type of dissociative experience that comes with say, trauma, where people experience themselves as not even being in their body. Honestly, if somebody is so "in their head" that they don't experience their feelings, I have to go the opposite route of making them more associated and less detached, and I have plenty of tools for provoking feelings in order to access them. I don't want complete dissociation from feelings, nor complete blending with them, and ISTM that almost everything on your chart is actually targeted at that same sweet spot or "zone" of detached-but-not-too-detached. In touch with your experience, but neither absorbed by it nor turning your back on it. Anyway, I think maybe I understand the terms you're using now, and hopefully you understand the ones I'm using. 
Within your model I still don't know what you'd call what I'm doing, since my "Collect
so I don't see where my approach actually belongs on your diagram, other than "everywhere". ;-)

I think a proper method should be everywhere. There's not a "correct" box, only a correct box for a given person at a given time in a given situation.

This post is for you if:

  1. Projects that excite you are growing to be a burden on your to-do list
  2. You have a nagging sense that you’re not making the most of the ideas you have every day
  3. Your note- and idea-system has grown to be an unwieldy beast

Years ago, I read David Allen’s “Getting Things Done”. One of the core ideas is to write down everything, collect it in an inbox, and sort it once a day.

This led to me writing down tons of small tasks. I used Todoist to construct a system that worked for me — and rarely missed tasks.

It also led to me getting a lot of id…

5Jordan906h Could you post a link to Roam? Or tell me where to find it? Google and Google Play are drawing blanks.... Cheers!
1ryqiem9h Hi pjeby, thanks for your comments! Just to be clear, I have no affiliation with Roam nor am I part of their development. I'm a user just like everyone else. I use Workflowy for mobile capture and can copy to/from it just fine. I use Chrome on macOS for Roam (through Nativefier), so I don't know why that isn't consistent. I've added it to their bug-report (which currently lives on Slack, very alpha!) The interface scales really well, so if you want larger text (as I did), I highly recommend simply zooming in the browser. The reading may be a subjective thing, I quite like it. I'm sure interface customisations are going to be in the works. Linking to/from bullet-points and having backlinks show up is a large part of the draw for me.
2pjeby8h I use Workflowy for mobile capture and can copy to/from it just fine. Depending on the direction of copy/pasting, I either ended up with huge blobs of text in one item, or flat lists without any indentation. i.e., I couldn't manage structure-preserving interchange with any other tool except (ironically enough) my markdown editor, Typora. A bullet-point list or paragraphs from Typora would paste into Roam with structure, and I could also do the reverse. But markdown bullet point lists aren't really interchangeable with any other tools I use, so it's not a viable bridge to my other outlining tools.

This sounds like a bug. You might want to report it to Roaman on LW.

Find all Alignment Newsletter resources here. In particular, you can sign up, or look through this spreadsheet of all summaries that have ever been in the newsletter. I'm always happy to hear feedback; you can send it to me by replying to this email.

This is a bonus newsletter summarizing Stuart Russell's new book, along with summaries of a few of the most relevant papers. It's entirely written by Rohin, so the usual "summarized by" tags have been removed.

We're also changing the publishing schedule: so far, we've aimed to send a newsletter every Monday; we…

As with the previous paper, this argument is only really a problem when the agent's belief about the reward function is wrong: if it is correct, then at the point where there is no more information to gain, the agent should already know that humans don't like to be killed, do like to be happy, etc.

There's also the scenario where the AI models the world in a way that has as good or better predictive power than our intentional stance model, but this weird model assigns undesirable values to the AI's co-player in the CIRL game. We can…

12rohinmshah10h I mentioned in my opinion that I think many of my disagreements are because of an implicit disagreement on how we build powerful AI systems: "the book has an implied stance towards the future of AI research that I don't agree with: I could imagine that powerful AI systems end up being created by learning alone without needing the conceptual breakthroughs that Stuart outlines." I didn't expand on this in the newsletter because I'm not clear enough on the disagreement; I try to avoid writing very confused thoughts that say wrong things about what other people believe in a publication read by a thousand people. But that's fine for a comment here! Rather than attribute a model to Stuart, I'm just going to make up a model that was inspired by reading HC, but wasn't proposed by HC. In this model, we get a superintelligent AI system that looks like a Bayesian-like system that explicitly represents things like "beliefs", "plans", etc. Some more details:

  • Things like 'hierarchical planning' are explicit algorithms. Simply looking at the algorithm can give you a lot of insight into how it does hierarchy. You can inspect things like "options" just by looking at inputs/outputs to the hierarchical planning module. The same thing applies for e.g. causal reasoning.

  • Any black box deep learning system is only used to provide low-level inputs to the real 'intelligence', in the same way that for humans vision provides low-level inputs for the rest of cognition. We don't need to worry about the deep learning system "taking over", in the same way that we don't worry about our vision module "taking over".

  • The AI system was created by breakthroughs in algorithms for causal reasoning, hierarchical planning, etc., that allow it to deal with the combinatorial explosion caused by the real world. As a result, it is very cheap to run (i.e. doesn't need a huge amount of compute). This is more compatible with a discontinuous takeoff, though a continuous…
8rohinmshah10h If you're curious about how I select what goes in the newsletter: I almost put in this critical review of the book, in the spirit of presenting both sides of the argument. I didn't put it in because I couldn't understand it. My best guess right now is that the author is arguing that "we'll never get superintelligence", possibly because intelligence isn't a coherent concept, but there's probably something more that I'm not getting. If it turned out that it was only saying "we'll never get superintelligence", and there weren't any new supporting arguments, I wouldn't include it in the newsletter, because we've seen and heard that counterargument more than enough.
4TurnTrout6h They also made an error in implicitly arguing that because unaligned behavior doesn't seem intelligent to them, we have nothing to worry about from such AI - it wouldn't be "intelligent". I think leaving this out was a good choice.
Planned Power Outages
288d1 min readShow Highlight

With the dubiously motivated PG&E blackouts in California there are many stories about how lack of power is a serious problem, especially for people with medical dependencies on electricity. Examples they give include people who:

  • Have severe sleep apnea, and can't safely sleep without a CPAP.

  • Sleep on a mattress that needs continuous electricity to prevent it from deflating.

  • Need to keep their insulin refrigerated.

  • Use a medicine delivery system that requires electricity every four hours to operate.

This outage was dangerous for them and others, but it also see... (Read more)

Thanks! Chubby planned outages were in fact one of the things I was thinking about in writing this, but I hadn't known that it was public outside Google.

2jkaufman7h I found Life Support DL Brochure_D02.pdf [] which seems to say:

  • You're responsible for figuring out backup power for your medical equipment.

  • If you register with your utility they have to notify you before they turn off your power, but unexpected outages can still happen.

This doesn't sound that different from most countries? And it sounds much less strict than what you were describing. Registering looks like visiting [] or the equivalent for your utility. I also found [] which gives what I think are the full rules, with obligations for retailers and distributors, which doesn't change my understanding from above. The only way it looks like this would have been different in Australia is that the power company would have been required to give more notice.
2jkaufman7h Specifically, they talk about: "retailer planned interruptions", "distributor planned interruptions", and "unplanned interruptions". And then they say:

  • The retailer can't intentionally turn off the power except by following the rules for "retailer planned interruptions", which include "4 business days written notice".

  • Same for the distributor, for "distributor planned interruptions".

I'm having trouble finding the official rules, but I found an example commercial contract ( [] ) which has:

12.2 Distributor planned interruptions (maintenance, repair, etc)
12.2.a We may make distributor planned interruptions to the supply of Energy to the Premises for the following purposes:
12.2.a.i for the maintenance, repair or augmentation of the Transmission System or the Distribution System, including maintenance of metering equipment; or
12.2.a.ii for the installation of a New Connection or a Connection Alteration to another Customer.
12.2.b If your Energy supply will be affected by a distributor planned interruption and clause 6.4(d)(iii) does not apply:
12.2.b.i we may seek your explicit consent to the Interruption occurring on a specified date; or
12.2.b.ii we may seek your explicit consent to the Interruption occurring on any day within a specified 5 Business Day range; or
12.2.b.iii otherwise, we will give you at least 4 Business Days notice of the Interruption by mail, letterbox drop, press advertisement or other appropriate means, or as specified in the Operating Protocol for your Premises.
12.3 Unplanned Interruptions
12.3.a We may interrupt the supply of Energy to your Premises in circumstances where

I like NLP's explanation of this. Submodalities like position and distance aren't common between people, but people DO tend to have similar representations with similar submodalities. I tend to be very kinesthetic with proprioceptive intuitions, but if instead I can say "do this task, wait for some sense, then tell me how you represent that", I can have them work with THEIR representation instead of mine.

This seemed to work decently well for teaching people strategies for overcoming Akrasia/procrastination, and I suspect with some twea... (Read more)

I've mentioned in posts twice (and previously in several comments) that I'm excited about predictive coding, specifically the idea that the human brain either is or can be modeled as a hierarchical system of (negative feedback) control systems that try to minimize error in predicting their inputs with some strong (possibly un-updatable) prediction set points (priors). I'm excited because I believe this approach better describes a wide range of human behavior, including subjective mental experiences, than any other theory of how the mind works, it's compatible with many othe... (Read more)
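The "hierarchy of control systems minimizing prediction error against strong set points" idea can be made concrete with a deliberately tiny sketch. This is my own toy illustration, not a model from the post or the predictive-coding literature: a single unit nudges its estimate to reduce error against a noisy input, while a fixed set point (standing in for an un-updatable prior) keeps pulling it back, so the settled estimate is biased away from the true signal.

```python
# Toy illustration (hypothetical, not from any paper): one predictive-coding
# unit. "prediction" is updated to reduce prediction error against noisy
# input, while a fixed set point -- an un-updatable prior -- pulls it back.
import random

random.seed(1)

set_point = 0.0      # the fixed prior; the unit can never revise this
prior_weight = 0.1   # how strongly the prior resists the incoming data
lr = 0.2             # how strongly prediction error drives updates

signal = 5.0         # the true value of the input stream
prediction = set_point

for _ in range(500):
    observation = signal + random.gauss(0, 0.5)   # noisy bottom-up input
    error = observation - prediction              # prediction error
    prior_pull = set_point - prediction           # top-down pull of the prior
    prediction += lr * error + prior_weight * prior_pull

# The unit settles between the signal and the set point: a strong enough
# prior biases what the unit "perceives" away from the data.
print(round(prediction, 2))
```

The bias toward the set point is the interesting part: the estimate settles well below the true signal, which mirrors how the predictive-coding story explains perception being shaped, and sometimes distorted, by strong priors.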

As of yet, no, although this brings up an interesting point, which is that I'm looking at this stuff to find a precise grounding because I don't think we can develop a plan that will work to our satisfaction without it. I realize lots of people disagree with me here, thinking that we need the method first and the value grounding will be worked out instrumentally by the method, but I dislike this because it makes it hard to verify the method other than by observing what an AI produced by that method does, and this is a dangerous verification method due ... (Read more)

Facebook AI releases a new SOTA "weakly semi-supervised" learning system for video and image classification. I'm posting this here because even though it's about capabilities, the architecture includes a sort-of-similar-to amplification component where a higher capacity teacher decides how to train a lower capacity student model.
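The teacher→student setup described above can be sketched in a few lines. Everything below is a hypothetical toy (a linear "teacher" rule and a perceptron "student"), not Facebook's actual system, but it shows the shape of the loop: the high-capacity teacher pseudo-labels unlabeled data, and the lower-capacity student trains on those pseudo-labels.

```python
# Hypothetical toy sketch of teacher -> student pseudo-labeling: a fixed
# "teacher" labels unlabeled data, and a lower-capacity "student" (here a
# perceptron) is trained on those pseudo-labels. Models are illustrative only.
import numpy as np

rng = np.random.default_rng(0)

def teacher_predict(x):
    # Stand-in for a large pretrained teacher: label by the sign of feature 0.
    return (x[:, 0] > 0).astype(int)

# Step 1: the teacher assigns pseudo-labels to unlabeled data.
unlabeled = rng.normal(size=(200, 2))
pseudo_labels = teacher_predict(unlabeled)

# Step 2: the student is trained on the teacher's pseudo-labels.
w = np.zeros(2)
for _ in range(50):                       # epochs
    for x, y in zip(unlabeled, pseudo_labels):
        pred = int(x @ w > 0)
        w += (y - pred) * x               # perceptron update toward teacher

# Step 3: the student now approximates the teacher on fresh data.
test_points = rng.normal(size=(100, 2))
agreement = np.mean((test_points @ w > 0) == teacher_predict(test_points))
print(agreement)
```

In the real system the student is itself a large network and the pseudo-labels come from hundreds of millions of unlabeled videos/images, but the control flow is the same.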

3gwern10h a sort-of-similar-to amplification component where a higher capacity teacher decides how to train a lower capacity student model. This is the first example I've seen of this overseer/machine-teaching style approach scaling up to such a data-hungry classification task. What's special there is the semi-supervised part (the training on unlabeled data to get pseudo-labels to then use in the student model's training). Using a high capacity teacher on hundreds of millions of images is not all that new: for example, Google was doing that on its JFT dataset (then ~100m noisily-labeled images) back in at least 2015, given "Distilling the Knowledge in a Neural Network", Hinton, Vinyals & Dean 2015. Or Gao et al 2017, which goes the other direction and tries to distill dozens of teachers into a single student using 400m images in 100k classes. (See also: Gross et al 2017/Sun et al 2017/Gao et al 2017/Shazeer et al 2018/Mahajan et al 2018/Yalniz et al 2019, or GPipe scaling to 1663-layer/83.4b-parameter Transformers.)
1An1lam8h Interesting, I somehow hadn't seen this. Thanks! (Editing to reflect this as well.) I'm curious - even though this isn't new, do you agree with my vague claim that the fact that this and the papers you linked work bears on the feasibility of amplification-style strategies?

I'm not sure. Typically, the justification for these sorts of distillation/compression papers is purely compute: the original teacher model is too big to run on a phone or as a service (Hinton), or too slow, or would be too big to run at all without 'sharding' it somehow, or it fits but training it to full convergence would take too long (Gao). You don't usually see arguments that the student is intrinsically superior in intelligence and so 'amplified' in any kind of AlphaGo-style way which is one of the more common examples for amplification. They do do s

... (Read more)
2rohinmshah9h My opinion, also going into the newsletter: Like Matthew, I'm excited to see more work on transparency and adversarial training for inner alignment. I'm somewhat skeptical of the value of work that plans to decompose future models into a "world model", "search" and "objective": I would guess that there are many ways to achieve intelligent cognition that don't easily factor into any of these concepts. It seems fine to study a system composed of a world model, search and objective in order to gain conceptual insight; I'm more worried about proposing it as an actual plan.
1evhub7h The point about decompositions is a pretty minor portion of this post; is there a reason you think that part is more worthwhile to focus on for the newsletter?

I'm not Rohin, but I think there's a tendency to reply to things you disagree with rather than things you agree with. That would explain my emphasis anyway.

This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.

Welcome. This week we discuss the seventh section in the reading guide: Decisive strategic advantage. This corresponds to Chapter 5.

This post summarizes the section, offers a few relevant notes, and suggests ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.

There is no need to pro... (Read more)

I think you are looking at this wrong. Yes, they had help from local rebellions and malcontents. So would an AGI. An AGI taking over the world wouldn't necessarily look like robots vs. humans; it might look like the outbreak of World War 3 between various human factions, except that the AGI was manipulating things behind the scenes and/or acting as a "strategic advisor" to one of the factions. And when the dust settles, somehow the AGI is in charge...

So yeah, I think it really is fair to say that the Spanish managed to conquer empires of mil... (Read more)

The strategy-stealing assumptionΩ
541mo11 min readΩ 20Show Highlight

Suppose that 1% of the world’s resources are controlled by unaligned AI, and 99% of the world’s resources are controlled by humans. We might hope that at least 99% of the universe’s resources end up being used for stuff-humans-like (in expectation).

Jessica Taylor argued for this conclusion in Strategies for Coalitions in Unit-Sum Games: if the humans divide into 99 groups, each of which acquires influence as effectively as the unaligned AI, then by symmetry each group should end up with as much influence as the AI, i.e. collectively the groups should end up with 99% of the influence.
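The symmetry argument is easy to sanity-check numerically. In the toy simulation below (illustrative numbers only, not from the post), every party's resources grow by the same multiplier each round — i.e. the 99 human groups execute the AI's strategy exactly as well as the AI does — and relative shares are preserved.

```python
# Toy check of the symmetry argument: if every group's resources grow by the
# same multiplier each round (humans successfully "steal" the AI's strategy),
# initial shares are preserved. All numbers are illustrative.
growth_per_round = 1.5
rounds = 20

ai = 1.0                     # unaligned AI starts with 1% of resources
human_groups = [1.0] * 99    # 99 human groups, 1% each

for _ in range(rounds):
    ai *= growth_per_round
    human_groups = [g * growth_per_round for g in human_groups]

total = ai + sum(human_groups)
print(round(sum(human_groups) / total, 4))  # humans' share stays 0.99
```

Of course, the interesting question is exactly when this equal-multiplier assumption fails — which is what the rest of the post examines.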

This argument rests on what I... (Read more)

I wrote this post imagining "strategy-stealing assumption" as something you would assume for the purpose of an argument, for example I might want to justify an AI alignment scheme by arguing "Under a strategy-stealing assumption, this AI would result in an OK outcome." The post was motivated by trying to write up another argument where I wanted to use this assumption, spending a bit of time trying to think through what the assumption was, and deciding it was likely to be of independent interest. (Although that hasn't yet appeared i... (Read more)
