All of Buck's Comments + Replies

Discussion with Eliezer Yudkowsky on AGI interventions

By "checkable" do you mean "machine checkable"?

I'm confused because I understand you to be asking for a bound on the derivative of an EfficientNet model, but it seems quite easy (though perhaps kind of a hassle) to get this bound.

I don't think the floating point numbers matter very much (especially if you're ok with the bound being computed a bit more loosely).

3Zac Hatfield Dodds14dAh, crux: I do think the floating-point matters! Issues of precision, underflow, overflow, and NaNs bedevil model training and occasionally deployment-time behavior. By analogy, if we deploy an AGI the ideal mathematical form of which is aligned we may still be doomed, even if it's plausibly our best option in expectation. Checkable meaning that I or someone I trust with this has to be able to check it! Maxwell's proposal is simple enough that I can reason through the whole thing, even over float32 rather than ℝ, but for more complex arguments I'd probably want it machine-checkable for at least the tricky numeric parts.
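To make the floating-point worry concrete, here is a small sketch of three standard IEEE 754 effects. It uses Python's built-in floats, which are float64 rather than the float32 discussed above; the same effects show up in float32, just at coarser precision. The variable names are only illustrative.

```python
import math

# Addition is not associative: regrouping changes the rounded result.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
not_associative = left != right

# Absorption: adding 1.0 to 1e16 is lost to rounding, so the difference is 0.0.
big = 1e16
absorbed = (big + 1.0) - big

# Overflow produces infinity, and inf - inf produces NaN.
overflowed = 1e308 * 10
not_a_number = overflowed - overflowed

print(not_associative, absorbed, math.isinf(overflowed), math.isnan(not_a_number))
```

A proof over the reals can silently assume away all three of these, which is one reason a bound might need to be checked against the actual arithmetic.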
Discussion with Eliezer Yudkowsky on AGI interventions

Take an EfficientNet model with >= 99% accuracy on MNIST digit classification. What is the largest possible change in the probability assigned to some class between two images, which differ only in the least significant bit of a single pixel? Prove your answer before 2023.


You aren't counting the fact that you can pretty easily bound this based on the fact that image models are Lipschitz, right? Like, you can just ignore the ReLUs and you'll get an upper bound by looking at the weight matrices. And I believe there are techniques that let you get tighter bounds than this.
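A rough sketch of the "ignore the ReLUs and look at the weight matrices" argument, on a toy bias-free ReLU MLP rather than EfficientNet itself. The network, its sizes, and the helper names are all made up for illustration. It uses the induced infinity-norm (max absolute row sum) of each layer, reasons over the reals, and only bounds the change in the final-layer outputs, not in softmax probabilities, so it does not address the floating-point question.

```python
import random

def inf_norm(W):
    """Induced infinity-norm of a matrix: the largest absolute row sum."""
    return max(sum(abs(w) for w in row) for row in W)

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(v):
    return [max(0.0, a) for a in v]

def forward(weights, x):
    """Toy MLP: ReLU after every layer except the last, no biases."""
    for W in weights[:-1]:
        x = relu(matvec(W, x))
    return matvec(weights[-1], x)

def lipschitz_upper_bound(weights):
    """Product of layer norms; ReLU is 1-Lipschitz, so it is ignored."""
    bound = 1.0
    for W in weights:
        bound *= inf_norm(W)
    return bound

rng = random.Random(0)
sizes = [8, 16, 16, 10]  # hypothetical layer widths
weights = [
    [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    for n_in, n_out in zip(sizes, sizes[1:])
]

eps = 1.0 / 255.0  # one least-significant-bit step for an 8-bit pixel scaled to [0, 1]
x = [rng.random() for _ in range(sizes[0])]
x_perturbed = list(x)
x_perturbed[3] += eps  # change a single input coordinate

observed = max(abs(a - b) for a, b in zip(forward(weights, x), forward(weights, x_perturbed)))
bound = lipschitz_upper_bound(weights) * eps
print(observed, bound)
```

The bound is typically very loose compared with the observed change, which is why tighter techniques are of interest.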

3Zac Hatfield Dodds14dIf you can produce a checkable proof of this over the actual EfficientNet architecture, I'd pay out the prize. Note that this uses floating-point numbers, not the reals!
Discussion with Eliezer Yudkowsky on AGI interventions

Am I correct that you wouldn't find a bound acceptable, you specifically want the exact maximum?

4Buck14dYou aren't counting the fact that you can pretty easily bound this based on the fact that image models are Lipschitz, right? Like, you can just ignore the ReLUs and you'll get an upper bound by looking at the weight matrices. And I believe there are techniques that let you get tighter bounds than this.
3Zac Hatfield Dodds15dI'd award half the prize for a non-trivial bound.
Redwood Research’s current project

Suppose you have three text-generation policies, and you define "policy X is better than policy Y" as "when a human is given a sample from both policy X and policy Y, they prefer the sample from the former more than half the time". That definition of "better" is intransitive.
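A minimal sketch of how this can happen, reading "X is better than Y" as X's sample winning more than half the time. The assumptions are mine: each policy emits one of three samples uniformly at random, each sample has a scalar quality score, and the rater always prefers the higher score. The scores are the classic non-transitive dice values, not anything measured.

```python
from fractions import Fraction
from itertools import product

# Hypothetical quality scores of the samples each policy can produce.
policies = {"X": [2, 4, 9], "Y": [1, 6, 8], "Z": [3, 5, 7]}

def win_rate(a, b):
    """Probability that a sample from policy a is preferred to one from policy b."""
    pairs = list(product(policies[a], policies[b]))
    wins = sum(1 for sa, sb in pairs if sa > sb)
    return Fraction(wins, len(pairs))

def better(a, b):
    return win_rate(a, b) > Fraction(1, 2)

for a, b in [("X", "Y"), ("Y", "Z"), ("Z", "X")]:
    print(a, "better than", b, ":", better(a, b), win_rate(a, b))
```

Each policy beats the next one 5/9 of the time, so X beats Y, Y beats Z, and Z beats X, even though every individual comparison is made by a perfectly consistent rater.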

2adamShimi1moHum, I see. And is your point that it should not create a problem because you're only doing comparison X vs Y and Z vs Y (where Y is the standard policy and X and Z are two of your conservative policies) but you don't really care about the comparison between X and Z?
Redwood Research’s current project

Thanks, glad to hear you appreciate us posting updates as we go.

Redwood Research’s current project

You're totally right that we'll probably have low quality on those prompts. But we're defining quality with respect to the overall prompt distribution, and so as long as prompts that can't be realistically completed non-injuriously are rare, our average quality won't take that big a hit.

1Julian_R1moI was confused by Buck's response here because I thought we were going for worst-case quality until I realised: 1. The model will have low quality on those prompts almost by definition - that's the goal. 2. Given that, we also want to have a generally useful model - for which the relevant distribution is 'all fanfiction', not "prompts that are especially likely to have a violent continuation". In between those two cases is 'snippets that were completed injuriously in the original fanfic ... but could plausibly have non-violent completions', which seems like the interesting case to me. I suppose one possibility is to construct a human-labelled dataset of specifically these cases to evaluate on.
Redwood Research’s current project

We've now added this visual feedback, thanks for the suggestion :)

Redwood Research’s current project

So note that we're actually working on the predicate "an injury occurred or was exacerbated", rather than something about violence (I edited out the one place I referred to violence instead of injury in the OP to make this clearer).

The reason I'm not that excited about finding this latent is that I suspect that the snippets that activate it are particularly easy cases--we're only interested in generating injurious snippets that the classifier is wrong about.

For example, I think that the model is currently okay with dropping babies probably because it doesn...

Redwood Research’s current project

We've tried some things kind of like this, though less sophisticated. The person who was working on this might comment describing them at some point.

One fundamental problem here is that I'm worried that finding a "violence" latent is already what we're doing when we fine-tune. And so I'm worried that the classifier mistakes that will be hardest to stamp out are those that we can't find through this kind of process.

I have an analogous concern with the "make the model generate only violent completions"--if we knew how to define "violent", we'd already be don...

5gwern2moControlling the violence latent would let you systematically sample for it: you could hold the violence latent constant, and generate an evenly spaced grid of points around it to get a wide diversity of violent but stylistically/semantically unique samples. Kinds of text which would be exponentially hard to find by brute force sampling can be found this way easily. It also lets you do various kinds of guided search or diversity sampling, and do data augmentation (encode known-violent samples into their latent, hold the violent latent constant, generate a bunch of samples 'near' it). Even if the violence latent is pretty low quality, it's still probably a lot better as an initialization for sampling than trying to brute force random samples and running into very rapidly diminishing returns as you try to dig your way into the tails. And if you can't do any of that because there is no equivalent of a violent latent or its equivalent is clearly too narrow & incomplete, that is pretty important, I would think. Violence is such a salient category, so frequent in fiction and nonfiction (news), that a generative model which has not learned it as a concept is, IMO, probably too stupid to be all that useful as a 'model organism' of alignment. (I would not expect a classifier based on a failed generative model to be all that useful either.) If a model cannot or does not understand what 'violence' is, how can you hope to get a model which knows not to generate violence, can recognize violence, can ask for labels on violence, or do anything useful about violence?
The theory-practice gap

Yeah, I talk about this in the first bullet point here (which I linked from the "How useful is it..." section).

The alignment problem in different capability regimes

One crucial concern related to "what people want" is that this seems underdefined, unstable in interactions with wildly superintelligent systems, and prone to problems with scaling of values within systems where intelligence increases.

This is what I was referring to with

by assumption the superintelligence will be able to answer any question you’re able to operationalize about human values

The superintelligence can answer any operationalizable question about human values, but as you say, it's not clear how to elicit the right operationalization.

The alignment problem in different capability regimes

Re the negative side effect avoidance: Yep, you're basically right, I've removed side effect avoidance from that list.

And you're right, I did mean "it will be able to" rather than "it will"; edited.

The alignment problem in different capability regimes

I think this is a reasonable definition of alignment, but it's not the one everyone uses.

I also think that for reasons like the "ability to understand itself" thing, there are pretty interesting differences in the alignment problem as you're defining it between capability levels.

2Edouard Harris2moOne reason to favor such a definition of alignment might be that we ultimately need a definition that gives us guarantees that hold at human-level capability or greater, and humans are probably near the bottom of the absolute scale of capabilities that can be physically realized in our world. It would (imo) be surprising to discover a useful alignment definition that held across capability levels way beyond us, but that didn't hold below our own modest level of intelligence.
Buck's Shortform

[this is a draft that I shared with a bunch of friends a while ago; they raised many issues that I haven't addressed, but might address at some point in the future]

In my opinion, and AFAICT the opinion of many alignment researchers, there are problems with aligning superintelligent models that no alignment techniques so far proposed are able to fix. Even if we had a full kitchen sink approach where we’d overcome all the practical challenges of applying amplification techniques, transparency techniques, adversarial training, and so on, I still wouldn’t feel...

1TekhneMakre3moI appreciate your points, and I don't think I see significant points of disagreement. But in terms of emphasis, it seems concerning to be putting effort into (what seems like) rationalizing not updating that a given approach doesn't have a hope of working. (Or maybe more accurately, that a given approach won't lead to a sufficient understanding that we could know it would work, which (with further argument) implies that it will not work.) Like, I guess I want to amplify your point > But I think it’s also quite important for people to remember that they’re insufficient, and that they don’t suffice to solve the whole problem on their own. and say further that one's stance to the benefit of working on things with clearer metrics of success, would hopefully include ongoingly noticing everyone else's stance to that situation. If a given unit of effort can only be directed towards marginal things, then we could ask (for example): What would it look like to make cumulative marginal progress towards, say, improving our ability to propose better approaches, rather than marginal progress on approaches that we know won't resolve the key issues?
0Chantiel3moPotentially people could have the cost function of an AI's model include its ease of interpretation by humans as a factor. Having people manually check every change in a model for its effect on interpretability would be too slow, but an AI could still periodically check its current best model with humans and learn a different one if it's too hard to interpret. I've seen a lot of mention of the importance of safe AI being competitive with non-safe AI. And I'm wondering what would happen if the government just illegalized or heavily taxed the use of the unsafe AI techniques. Then even with significant capability increases, it wouldn't be worthwhile to use them. Is there something very doubtful about governments creating such a regulation? I mean, I've already heard some people high in the government concerned about AI safety. And the Future of Life Institute got the Californian government to unanimously pass the Asilomar AI Principles. It includes things about AI safety, like rigidly controlling any AI that can recursively self-improve. It sounds extremely dangerous having widespread use of powerful, unaligned AI. So simply to protect themselves and their families, they could potentially benefit a lot from implementing such regulations.
-1Zack_M_Davis3moA key psychological advantage of the "modest alignment" agenda is that it's not insanity-inducing. When I seriously contemplate the problem of selecting a utility function to determine the entire universe until the end of time, I want to die (which seems safer and more responsible). But the problem of making language models "be honest" instead of just continuing the prompt? That's more my speed; that, I can think about, and possibly even usefully contribute to, without wanting to die. (And if someone else in the future uses honest language models as one of many tools to help select a utility function to determine the entire universe until the end of time, that's not my problem and not my fault.)
1Pattern3moThat may be 'the best we could hope for', but I'm more worried about 'we can't understand the neural net (with the tools we have)' than "the neural net is doing things that rely on concepts that it’s fundamentally impossible for humans to understand". (Or, solving the task requires concepts that are really complicated to understand (though maybe easy for humans to understand), and so the neural network doesn't get it.) Whether or not "empirical contingencies work out nicely", I think the concern about 'fundamentally impossible to understand concepts" is...something that won't show up in every domain. (I also think that things do exist that people can understand, but it takes a lot of work, so people don't do it. There's an example from math involving some obscure theorems that aren't used a lot for that reason.)
7Steven Byrnes3moI wonder what you mean by "competitive"? Let's talk about the "alignment tax" framing. One extreme is that we can find a way such that there is no tradeoff whatsoever between safety and capabilities—an "alignment tax" of 0%. The other extreme is an alignment tax of 100%—we know how to make unsafe AGIs but we don't know how to make safe AGIs. (Or more specifically, there are plans / ideas that an unsafe AI could come up with and execute, and a safe AI can't, not even with extra time/money/compute/whatever.) I've been resigned to the idea that an alignment tax of 0% is a pipe dream—that's just way too much to hope for, for various seemingly-fundamental reasons like humans-in-the-loop being more slow and expensive than humans-out-of-the-loop (more discussion here). But we still want to minimize the alignment tax, and we definitely want to avoid the alignment tax being 100%. (And meanwhile, independently, we try to tackle the non-technical problem of ensuring that all the relevant players are always paying the alignment tax.) I feel like your post makes more sense to me when I replace the word "competitive" with something like "arbitrarily capable" everywhere (or "sufficiently capable" in the bootstrapping approach where we hand off AI alignment research to the early AGIs). I think that's what you have in mind?—that you're worried these techniques will just hit a capabilities wall, and beyond that the alignment tax shoots all the way to 100%. Is that fair? Or do you see an alignment tax of even 1% as an "insufficient strategy"?
Taboo "Outside View"

I really liked this post, thanks so much for writing it. I have been very frustrated by people conflating these different meanings of "outside view" in the past.

Is MIRI actually hiring and does Buck Shlegeris still work for you?

I think Anna and Rob answered the main questions here, but for the record I am still in the business of talking to people who want to work on alignment stuff. (And as Anna speculated, I am indeed still the person who processes MIRI job applications.)

5Viliam9moJust curious: would completing this game qualify as being able to get a position at MIRI? I am not actually trying to get a job at MIRI, I just played that game recently, so I am curious if that implies anything.
Buck's Shortform

I know a lot of people through a shared interest in truth-seeking and epistemics. I also know a lot of people through a shared interest in trying to do good in the world.

I think I would have naively expected that the people who care less about the world would be better at having good epistemics. For example, people who care a lot about particular causes might end up getting really mindkilled by politics, or might end up strongly affiliated with groups that have false beliefs as part of their tribal identity.

But I don’t think that this prediction is true: I...

2Richard_Ngo2moThese both seem pretty common, so I'm curious about the correlation that you've observed. Is it mainly based on people you know personally? In that case I expect the correlation not to hold amongst the wider population. Also, a big effect which probably doesn't show up much amongst the people you know: younger people seem more altruistic (or at least signal more altruism) and also seem to have worse epistemics than older people.
4Viliam10moCaring about things seems to make you interact with the world in more diverse ways (because you do this in addition to things other people do, not instead of); some of that translates into more experience and better models. But also tribal identity, mindkilling, often refusing to see the reasons why your straightforward solution would not work, and uncritical contrarianism. Now I think about a group of people I know, who care strongly about improving the world, in the one or two aspects they focus on. They did a few amazing things and gained lots of skills; they publish books, organize big conferences, created a network of like-minded people in other countries; some of their activities are profitable, for others they apply for various grants and often get them, so some of them improve the world as a full-time job. They also believe that covid is a hoax, plus have lots of less fringe but still quite irrational beliefs. However... this depends on how you calculate the "total rationality", but seems to me that their gains in near mode outweigh the losses in far mode, and in some sense I would call them more rational than average population. Of course I dream about a group that would have all the advantages and none of the disadvantages.
Buck's Shortform

I used to think that slower takeoff implied shorter timelines, because slow takeoff means that pre-AGI AI is more economically valuable, which means that economy advances faster, which means that we get AGI sooner. But there's a countervailing consideration, which is that in slow takeoff worlds, you can make arguments like ‘it’s unlikely that we’re close to AGI, because AI can’t do X yet’, where X might be ‘make a trillion dollars a year’ or ‘be as competent as a bee’. I now overall think ...

4Sammy Martin1yI wrote a whole post on modelling specific continuous or discontinuous scenarios - in the course of trying to make a very simple differential equation model of continuous takeoff, by modifying the models given by Bostrom/Yudkowsky for fast takeoff, the result that fast takeoff means later timelines naturally jumps out. But that model relies on pre-setting a fixed threshold for AGI, given by the parameter I_AGI, in advance. This, along with the starting intelligence of the system, fixes how far away AGI is. You could (I might get round to doing this), model the effect you're talking about by allowing I_AGI to vary with the level of discontinuity. So every model would start with the same initial intelligence I_0, but the I_AGI would be correlated with the level of discontinuity, with larger discontinuity implying I_AGI is smaller. That way, you would reproduce the epistemic difference of expecting a stronger discontinuity - that the current intelligence of AI systems is implied to be closer to what we'd expect to need for explosive growth on discontinuous takeoff scenarios than on continuous scenarios. We know the current level of capability and the current rate of progress, but we don't know I_AGI, and holding all else constant slow takeoff implies I_AGI is a significantly higher number (again, I_AGI is relative to the starting intelligence of the system) This is because my model was trying to model different physical situations, different ways AGI could be, not different epistemic situations, so I was thinking in terms of I_AGI being some fixed, objective value that we just don't happen to know. I'm uncertain if there's a rigorous way of quantifying how much this epistemic update does against the physical fact that continuous takeoff implies an earlier acceleration above exponential. If you're right, it overall completely cancels this effect out and makes timelines on discontinuous tak
How good is humanity at coordination?

I don't really know how to think about anthropics, sadly.

But I think that it's pretty likely that nuclear war could have not killed everyone. So I still lose Bayes points compared to the world where nukes were fired but not everyone died.

Nuclear war doesn't have to kill everyone to make our world non-viable for anthropic reasons. It just has to render our world unlikely to be simulated.

6Rafael Harth1yYeah, the "we didn't observe nukes going off" observation is definitely still some evidence for the "humans are competent at handling dangerous technology" hypothesis, but (if one buys into the argument I'm making) it's much weaker evidence than one would naively think.
$1000 bounty for OpenAI to show whether GPT3 was "deliberately" pretending to be stupider than it is
It's tempting to anthropomorphize GPT-3 as trying its hardest to make John smart. That's what we want GPT-3 to do, right?

I don't feel at all tempted to do that anthropomorphization, and I think it's weird that EY is acting as if this is a reasonable thing to do. Like, obviously GPT-3 is doing sequence prediction--that's what it was trained to do. Even if it turns out that GPT-3 correctly answers questions about balanced parens in some contexts, I feel pretty weird about calling that "deliberately pretending to be stupider than it is".

I don't feel at all tempted to do that anthropomorphization, and I think it's weird that EY is acting as if this is a reasonable thing to do.

"It's tempting to anthropomorphize GPT-3 as trying its hardest to make John smart" seems obviously incorrect if it's explicitly phrased that way, but e.g. the "Giving GPT-3 a Turing Test" post seems to implicitly assume something like it:

This gives us a hint for how to stump the AI more consistently. We need to ask questions that no normal human would ever talk about.

Q: How m
...
4ESRogs1yYeah, it seems like deliberately pretending to be stupid here would be predicting a less likely sequence, in service of some other goal.
Possible takeaways from the coronavirus pandemic for slow AI takeoff

If the linked SSC article is about the aestivation hypothesis, see the rebuttal here.

2Alexei1yNo the SSC article is about how you should pay attention to the distribution of aliens in different universes, and that in most of them you won’t find any, while the mean can still be pretty high.
Six economics misconceptions of mine which I've resolved over the last few years

Remember that I’m not interested in evidence here, this post is just about what the theoretical analysis says :)

In an economy where the relative wealth of rich and poor people is constant, poor people and rich people both have consumption equal to their income.

1Yosarian T1yDon't rich people tend to die with a significant portion of their lifetime income unspent, while poor people don't?
4Anon User1yFirst, poor have lower savings rate, and consume faster, so money velocity is higher. Second, minimal wages are local, and I would imagine that poor people on average spend a bigger fraction of their consumption locally (but I am not as certain about this one).
Six economics misconceptions of mine which I've resolved over the last few years

I agree that there's some subtlety here, but I don't think that all that happened here is that my model got more complex.

I think I'm trying to say something more like "I thought that I understood the first-order considerations, but actually I didn't." Or "I thought that I understood the solution to this particular problem, but actually that problem had a different solution than I thought it did". Eg in the situations of 1, 2, and 3, I had a picture in my head of some idealized market, and I had false beliefs about wh...

Six economics misconceptions of mine which I've resolved over the last few years

I agree that the case where there are several equilibrium points that are almost as good for the employer is the case where the minimum wage looks best.

Re point 1, note that the minimum wage decreases total consumption, because it reduces efficiency.

4gbear6051yA minimum wage decreases total consumption in some situations but not in all situations. In a world consisting of ten poor people and a single rich person, where buying the bare minimum food costs $1/day and buying comfort costs $10/day, the only way for the poor people to get money is to work for the rich person, and he is willing to hire them for any wage up to $100/day, the equilibrium wage would be $1/day. The result is that the rich person would only be spending $20/day (his own food, the wages of the nine poor people, and his own comfort) and each poor person would be spending $1/day, their entire wage. If a minimum wage of $11/day were instituted, all the poor people would be hired for $11/day, with all ten of the people having food to survive and having comfort. Consumption would go up significantly and everyone would be better off.
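The arithmetic in this toy world can be sketched as follows. The assumptions are mine and deliberately crude: ten poor workers (the comment counts nine wages when totalling the rich person's spending, but describes ten poor people), each poor person spends their whole wage up to the cost of food plus comfort, and the rich person always buys food and comfort. Production limits and everything else are ignored.

```python
FOOD = 1       # dollars per day for bare-minimum food
COMFORT = 10   # dollars per day for comfort
N_POOR = 10    # number of poor workers (assumed; see note above)

def total_consumption(wage):
    """Total daily spending on food and comfort across all eleven people."""
    poor_each = min(wage, FOOD + COMFORT)  # poor spend their wage, capped at food + comfort
    rich = FOOD + COMFORT                  # rich person always buys both
    return N_POOR * poor_each + rich

print(total_consumption(1))   # equilibrium wage of $1/day
print(total_consumption(11))  # minimum wage of $11/day
```

Under these assumptions total consumption rises from $21/day to $121/day, which is the direction the comment claims; whether that carries over to a model with production constraints is a separate question.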
8Lucas20001yIs there actual evidence that a minimum wage decreases total consumption? I've never heard that, or seen any study on it, and I'd like to learn more. (Intuitively, it doesn't seem highly plausible to me, since my assumption would be that it transfers wealth from rich people to poor people, which should increase total consumption, because there's more room for consumption growth for poorer people, but I'm also not sure if that is true.) (Edit: after a cursory search of current research on the topic, it seems that the consensus is rather that a minimum wage has a small positive effect on consumption, which is what I would have naively expected.)
What will be the big-picture implications of the coronavirus, assuming it eventually infects >10% of the world?

I've now made a Guesstimate here. I suspect that it is very bad and dumb; please make your own that is better than mine. I'm probably not going to fix problems with mine. Some people like Daniel Filan are confused by what my model means; I am like 50-50 on whether my model is really dumb or just confusing to read.

Also don't understand this part. "4x as many mild cases as severe cases" is compatible with what I assumed (10%-20% of all cases end up severe or critical) but where does 3% come from?

Yeah my text was wrong here; I meant that I think you get 4

...
7DanielFilan2yFor what it's worth I don't see why the guesstimate makes sense - it assumes that the only people who die are those who get the disease during oxygen shortages, which seems wrong to me. [EDIT: it's possible that I'm confused about what the model means, the way to check this would be to see if I believe something false about it and then correct my belief]
1Lanrian2yMy impression is that the WHO has been dividing up confirmed cases into mild/moderate (≈80%) and severe/critical (20%). The guesstimate model assumes that there are 80% "mild" cases, and 20% "confirmed" cases, which is inconsistent with WHO's terminology. If you got the 80%-number from WHO or some other source using similar terminology, I'd recommend changing it. If you got it from a source explicitly talking about asymptomatic cases or so-mild-that-you-don't-go-to-the-doctor, then it seems fine to keep it (but maybe change the name). Edit: Wikipedia says that Diamond Princess had 392/705 asymptomatic cases by 26th February. Given that some of the patients might go on to develop symptoms later on, ≈55% might be an upper bound of asymptomatic cases? Some relevant quotes from WHO-report (mostly to back up my claims about terminology; Howie questions the validity of the last sentences further down in this thread):
What will be the big-picture implications of the coronavirus, assuming it eventually infects >10% of the world?

Oh yeah I'm totally wrong there. I don't have time to correct this now. Some helpful onlooker should make a Guesstimate for all this.

What will be the big-picture implications of the coronavirus, assuming it eventually infects >10% of the world?

Epistemic status: I don't really know what I'm talking about. I am not at all an expert here (though I have been talking to some of my more expert friends about this).

EDIT: I now have a Guesstimate model here, but its results don't really make sense. I encourage others to make their own.

Here's my model: To get such a large death toll, there would need to be lots of people who need oxygen all at once and who can't get it. So we need to multiply the proportion of people who might be infected all at once by the fatality rate for such people. I'm going to

...
4Elizabeth2yAm I correct that you're assuming a percentage chance of access to oxygen or a ventilator, rather than a cut off after which we run out of ventilators?

In places with aggressive testing, like Diamond Princess and South Korea, you see much lower fatality rates, which suggests that lots of cases are mild.

With South Korea, I think most cases have not had enough time to progress to fatality yet. With Diamond Princess, there are 7 deaths out of 707 detected cases so far, with more than half of the cases still active. I'm not sure how you concluded from this "that lots of cases are mild". Please explain more? That page does say only 35 serious or critical cases, but I suspect this is probably because the pas

...
5Daniel Kokotajlo2yHow much oxygen is there to go round? Why think that everyone getting sick in one month will exhaust supplies but not if everyone gets sick in six months? I'd guess that there is very little oxygen to go round.
3Lanrian2yI don't understand how you get those kinds of numbers from the fb-comment, they're way too high. Maybe you mean fatality of severe or critical cases, or survival rates rather than fatality rates. Do you mind clarifying? Are you saying that the flu is more transmissible than corona? I think I've read that corona is spreading faster, but I don't have a good source, so I'd be curious if you do.
What will be the big-picture implications of the coronavirus, assuming it eventually infects >10% of the world?

Just for the record, I think that this estimate is pretty high and I'd be pretty surprised if it were true; I've talked to a few biosecurity friends about this and they thought it was too high. I'm worried that this answer has been highly upvoted but there are lots of people who think it's wrong. I'd be excited for more commenters giving their bottom line predictions about this, so that it's easier to see the spread.

Wei_Dai, are you open to betting about this? It seems really important for us to have well-calibrated beliefs about this.

Yeah, I kind of wrote that in a hurry to highlight the implications of one particular update that I made (namely that if hospitals are overwhelmed the CFR will become much higher), and didn't mean to sound very confident or have it be taken as the LW consensus. (Maybe some people also upvoted it for the update rather than for the bottom line prediction?)

I do still stand by it in the sense that I think there's >50% chance that global death rate will be >2.5%. Instead of betting about it though, maybe you could try to convince me otherwise? E.g., what's the weakest part of my argument/model, or what's your prediction and how did you arrive at it?

AIRCS Workshop: How I failed to be recruited at MIRI.

(I'm unsure whether I should write this comment referring to the author of this post in second or third person; I think I'm going to go with third person, though it feels a bit awkward. Arthur reviewed this comment before I posted it.)

Here are a couple of clarifications about things in this post, which might be relevant for people who are using it to learn about the MIRI recruiting process. Note that I'm the MIRI recruiter Arthur describes working with.

General comments:

I think Arthur is a really smart, good programmer. Arthur doesn't have as much backgroun

... (read more)


Thank you for your long and detailed answer. I'm amazed that you were able to do it so quickly after the post's publication. Especially since you sent me your answer by email while I just published my post on LW without showing it to anyone first.

Arthur reports a variety of people in this post as saying things that I think are somewhat misinterpreted, and I disagree with several of the things he describes them as saying.

I added a link to this comment in the top of the post. I am not surprised to learn that I misunderstood some things which were said

... (read more)
We run the Center for Applied Rationality, AMA

For the record, parts of that ratanon post seem extremely inaccurate to me; for example, the claim that MIRI people are deferring to Dario Amodei on timelines is not even remotely reasonable. So I wouldn't take it that seriously.

7Ben Pace2yHuh, thanks for the info, I'm surprised to hear that. I myself had heard that rumour, saying that at the second FLI conference Dario had spoken a lot about short timelines and now everyone including MIRI was scared. IIRC I heard it from some people involved in ML who were in attendance of that conference, but I didn't hear it from anyone at MIRI. I never heard much disconfirmatory evidence, and it's certainly been a sort-of-belief that's bounced around my head for the past two or so years.

Agreed I wouldn’t take the ratanon post too seriously. For another example, I know from living with Dario that his motives do not resemble those ascribed to him in that post.

Let's talk about "Convergent Rationality"

In OpenAI's Roboschool blog post:

This policy itself is still a multilayer perceptron, which has no internal state, so we believe that in some cases the agent uses its arms to store information.

Buck's Shortform

formatting problem, now fixed

Aligning a toy model of optimization

Given a policy π we can directly search for an input on which it behaves a certain way.

(I'm sure this point is obvious to Paul, but it wasn't to me)

We can search for inputs on which a policy behaves badly, which is really helpful for verifying the worst case of a certain policy. But we can't search for a policy which has a good worst case, because that would require using the black box inside the function passed to the black box, which we can't do. I think you can also say this as "the black box is an NP oracle, not a Σ₂ᴾ oracle".

This still means that w

... (read more)
Robustness to Scale

I think that the terms introduced by this post are great and I use them all the time

Six AI Risk/Strategy Ideas

Ah yes this seems totally correct

Buck's Shortform

[I'm not sure how good this is, it was interesting to me to think about, idk if it's useful, I wrote it quickly.]

Over the last year, I internalized Bayes' Theorem much more than I previously had; this led me to noticing that when I applied it in my life it tended to have counterintuitive results; after thinking about it for a while, I concluded that my intuitions were right and I was using Bayes wrong. (I'm going to call Bayes' Theorem "Bayes" from now on.)

Before I can tell you about that, I need to make sure you're thinking about Bayes in terms of ratios

... (read more)
1Liam Donovan2yWhat does "120:991" mean here?
7Ben Pace2yTime to record my thoughts! I won't try to solve it fully, just note my reactions. Well, firstly, I'm not sure that the likelihood ratio is 12x in favor of the former hypothesis. Perhaps likelihood of things clusters - like people either do things a lot, or they never do things. It's not clear to me that I have an even distribution of things I do twice a month, three times a month, four times a month, and so on. I'd need to think about this more. Also, while I agree it's a significant update toward your friend being a regular there given that you saw them the one time you went, you know a lot of people, and if it's a popular place then the chances of you seeing any given friend is kinda high, even if they're all irregular visitors. Like, if each time you go you see a different friend, I think it's more likely that it's popular and lots of people go from time to time, rather than they're all going loads of times each. I don't quite get what's going on here. As someone from Britain, I regularly walk through more than 6 cars of a train. The anthropics just checks out. (Note added 5 months later: I was making a british joke here.)
Open & Welcome Thread - November 2019

Email me at with some more info about you and I might be able to give you some ideas (and we can maybe talk about things you could do for ai alignment more generally)

Six AI Risk/Strategy Ideas

Minor point: I think asteroid strikes are probably very highly correlated between Everett branches (though maybe the timing of spotting an asteroid on a collision course is variable).

9Wei_Dai2yI think if we could look at all the Everett branches that contain some version of you, we'd see "bundles" where the asteroid locations are the same within each bundle but different between bundles, because different bundles evolved from different starting conditions (and then converged in terms of having produced someone who is subjectively indistinguishable from you). So a big asteroid strike would wipe out humanity in an entire bundle but that would only constitute a small fraction of all the Everett branches that contain a version of you. Hopefully that makes sense?
Buck's Shortform

A couple weeks ago I spent an hour talking over video chat with Daniel Cantu, a UCLA neuroscience postdoc who I hired on to spend an hour answering a variety of questions about neuroscience I had. (Thanks Daniel for reviewing this blog post for me!)

The most interesting thing I learned is that I had quite substantially misunderstood the connection between convolutional neural nets and the human visual system. People claim that these are somewhat bio-inspired, and that if you look at early layers of the visual cortex you'll find that it operates k

... (read more)
Buck's Shortform

I recommend looking on Wyzant.

Buck's Shortform

I think that an extremely effective way to get a better feel for a new subject is to pay an online tutor to answer your questions about it for an hour.

It turns out that there are a bunch of grad students on Wyzant who mostly work tutoring high school math or whatever but who are very happy to spend an hour answering your weird questions.

For example, a few weeks ago I had a session with a first-year Harvard synthetic biology PhD. Before the session, I spent a ten-minute timer writing down things that I currently didn't get about biology. (This is an exercise wo

... (read more)

Hired an econ tutor based on this.

4magfrump2yHow do you connect with tutors to do this? I feel like I would enjoy this experience a lot and potentially learn a lot from it, but thinking about figuring out who to reach out to and how to reach out to them quickly becomes intimidating for me.
2Ben Pace2yThis sounds like a really fun thing I can do at weekends / in the mornings. I’ll try it out and report back sometime.
2Chris_Leong2yThanks for posting this. After looking, I'm definitely tempted.

I've hired tutors around 10 times while I was studying at UC-Berkeley for various classes I was taking. My usual experience was that I was easily 5-10 times faster in learning things with them than I was either via lectures or via self-study, and often 3-4 one-hour meetings were enough to convey the whole content of an undergraduate class (combined with another 10-15 hours of exercises).

"Other people are wrong" vs "I am right"

I'm confused about what point you're making with the bike thief example. I'm reading through that post and its comments to see if I can understand your post better with that as background context, but you might want to clarify that part of the post (with a reader who doesn't have that context in mind).

Can you clarify what is unclear about it?

Current AI Safety Roles for Software Engineers
I believe they would like to hire several engineers in the next few years.

We would like to hire many more than several engineers: we want to hire as many engineers as possible. This would be dozens if we could, but it's hard to hire, so we'll more likely end up hiring more like ten over the next year.

I think that MIRI engineering is a really high impact opportunity, and I think it's definitely worth the time for EA computer science people to apply or email me (

Weird question: could we see distant aliens?

My main concern with this is the same as the problem listed on Wei Dai's answer: whether a star near us is likely to block out this light. The sun is about 10^9m across. A star that's 10 thousand light years away (this is 10% of the diameter of the Milky Way) occupies about (1e9m / (10000 lightyears * 2 * pi))**2 = 10^-24 of the night sky. A galaxy that's 20 billion light years away occupies something like (100000 lightyears / 20 billion lightyears) ** 2 ~= 2.5e-11. So galaxies occupy more space than stars. So it would be weird if individual stars blocked out a whole galaxy.
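The arithmetic in this comment can be checked with a short script. This is only a rough sketch that follows the comment's own approximations (object width divided by the circumference of a circle at that distance, squared, for the star; a plain width-to-distance ratio, squared, for the galaxy). The metres-per-light-year constant and the names `sky_fraction`, `star`, and `galaxy` are illustrative additions, not part of the original comment.

```python
import math

# Standard approximate conversion; an assumption added here, not in the comment.
METERS_PER_LIGHT_YEAR = 9.461e15

def sky_fraction(width, distance):
    """Rough sky fraction as in the comment:
    (width / circumference of a circle of radius `distance`) squared."""
    return (width / (2 * math.pi * distance)) ** 2

# Sun-sized star (about 1e9 m across) at 10,000 light years.
star = sky_fraction(1e9, 10_000 * METERS_PER_LIGHT_YEAR)

# Galaxy 100,000 light years across at 20 billion light years,
# using the comment's simpler ratio (no 2*pi factor).
galaxy = (100_000 / 20e9) ** 2

print(f"star:   {star:.1e}")
print(f"galaxy: {galaxy:.1e}")
print(f"ratio:  {galaxy / star:.1e}")
```

Under these assumptions the star fraction comes out on the order of 10^-24 and the galaxy fraction at 2.5e-11, so the galaxy covers many orders of magnitude more sky than a single foreground star.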

Weird question: could we see distant aliens?

Another idea: If you're extremely techno-optimistic, then I think it would be better to emit light at weird wavelengths than to just emit a lot of light, e.g. emitting light at two wavelengths with ratio pi or something. This seems much more unmistakably intelligence-caused than an extremely bright light.

2paulfchristiano4ySame question as Michael: if there were a point source with weird spectrum outside of any galaxy, about as bright as the average galaxy, would we reliably notice it?
Weird question: could we see distant aliens?

My first idea is to make two really big black holes and then make them merge. We observed gravitational waves from two black holes with masses of around 25 solar masses each, located 1.8 billion light years away. Presumably this force decreases as an inverse square times exponential decay; ignoring the exponential decay, this suggests to me that we need 100 times as much mass to be as prominent from 18 billion light years. A galaxy mass is around 10^12 solar masses. So if we spent 2500 solar masses on this each year, it would be at least as prominent a... (read more)

2Donald Hobson4yThe first merger event that Ligo detected was 1 billion ly away and turned 1 solar mass into gravitational waves. 10^30 kg = 10^47 J at a distance of 10^9 × 10^16 = 10^25 m, so energy flux received is approx 10^47 × (10^25)^-2 = 10^-3 J/m^2. The main peak power output from the merging black holes lasted around one second. A full moon illuminates earth with around 10^-3 W. So even if the aliens are great at making gravitational waves, they aren't a good way to communicate. If they send a gravitational wave signal just powerful enough for us to detect with our most sensitive instruments, with the same power as light they could outshine the moon. Light is just more easily detected.
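The order-of-magnitude estimate in Donald Hobson's reply can be reproduced in a few lines. This is a sketch using the same rounded inputs as the comment (a solar mass of about 10^30 kg, a light year of about 10^16 m); the variable names are illustrative, and the second figure, which divides by the full sphere area, is an added comparison rather than something the comment computes.

```python
import math

C = 3.0e8                     # speed of light in m/s
SOLAR_MASS_KG = 1e30          # rounded as in the comment (closer to 2e30 in reality)
METERS_PER_LIGHT_YEAR = 1e16  # rounded as in the comment

# Energy released by converting one solar mass: E = m * c^2.
energy_j = SOLAR_MASS_KG * C**2

# One billion light years in metres.
distance_m = 1e9 * METERS_PER_LIGHT_YEAR

# The comment's estimate: energy divided by distance squared (drops the 4*pi).
flux_rough = energy_j / distance_m**2

# Added comparison: energy spread over a full sphere of that radius.
flux_sphere = energy_j / (4 * math.pi * distance_m**2)

print(f"energy:       {energy_j:.1e} J")
print(f"rough flux:   {flux_rough:.1e} J/m^2")
print(f"sphere flux:  {flux_sphere:.1e} J/m^2")
```

With these inputs the rough figure lands near 10^-3 J/m^2, matching the comment; including the 4π factor lowers it by roughly an order of magnitude, which does not change the conclusion.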