All of Stefan_Schubert's Comments + Replies

Yeah, I think so. But since those people generally find AI less important (there's both less of an upside and less of a downside) they generally participate less in the debate. Hence there's a bit of a selection effect hiding those people.

There are some people who arguably are in that corner who do participate in the debate, though - e.g. Robin Hanson. (He thinks some sort of AI will eventually be enormously important, but that the near-term effects, while significant, will not be at the level people on the right side think).

Looking at the 2x2 I posted I w... (read more)

Adam David Long · 7mo
Thanks. I think this is useful and I'm trying to think through who is in the upper left hand corner. Are there "AI researchers" or, more broadly, people who are part of the public conversation who believe (1) AI isn't moving all that fast towards AGI and (2) that it's not that risky? I guess my initial reaction is that people in the upper left hand corner just generally think "AI is kind of not that big a deal" and that there are other societal problems to worry about. Does that sound right? Any thoughts on who should be placed in the upper left?
Maybe something like "mundane-ist" would be better. The "realists" are people who think that AI is fundamentally "mundane" and that the safety concerns with AI are basically the same as safety concerns with any new technology (increases inequality by making the powerful more powerful, etc.) But of course "mundane-ist" isn't a real word, which is a bit of a problem.
Adam David Long · 7mo
Thanks. To be honest, I am still wrestling with the right term to use for this group. I came up with "realist" and "pragmatist" as the "least bad" options after searching for a term that meets the following criteria: 1. short, ideally one word 2. conveys the idea of prioritizing (a) current or near-term harms over (b) far-term consequences 3. minimizes the risk that someone would be offended if the label were applied to them I also tried playing around with an acronym like SAFEr for "Skeptical, Accountable, Fair, Ethical" but couldn't figure out an acronym that I liked.  Would very much appreciate feedback or suggestions on a better term. FWIW, I am trying to steelman the position but not pre-judge the overall debate. 


I think psychologists-scientists should have unusually good imaginations about the potential inner workings of other minds, which many ML engineers probably lack.

That's not clear to me, given that AI systems are so unlike human minds. 

tell your fellow psychologist (or zoopsychologist) about this, maybe they will be incentivised to make a switch and do some ground-laying work in the field of AI psychology

Do you believe that (conventional) psychologists would be especially good at what you call AI psychology, and if so, why? I guess other skills (e.g. knowledge of AI systems) could be important.

Roman Leventov · 1y
I talked about psychologists-scientists, not psychologists-therapists. I think psychologists-scientists should have unusually good imaginations about the potential inner workings of other minds, which many ML engineers probably lack. I think it's in principle possible for psychologists-scientists to understand all mech. interpretability papers in ML that are being published on the necessary level of detail. Developing the imaginations about inner workings of other minds in ML engineers could be harder. That being said, as de-facto the only scientifically grounded "part" of psychology has converged with neuroscience as neuropsychology, "AI psychology" shouldn't probably be a wholly separate field from the beginning, but rather a research sub-methodology within the larger field of "interpretability".

I think that could be valuable.

It might be worth testing quite carefully for robustness - to ask multiple different questions probing the same issue, and see whether responses converge. My sense is that people's stated opinions about risks from artificial intelligence, and existential risks more generally, could vary substantially depending on framing. Most haven't thought a lot about these issues, which likely contributes. I think a problem with some studies on these issues is that researchers over-generalise from highly framing-dependent survey responses.

That makes a lot of sense. We can definitely test a lot of different framings. I think the problem with a lot of these kinds of questions is that they are low-salience, so people tend not to have pre-existing opinions, and thus they tend to generate an opinion on the spot. We have a lot of experience polling on low-salience issues, though, because we've done a lot of polling on animal farming policy, which has similar framing effects.

I wrote an extended comment in a blog post


Summing up, I disagree with Hobbhahn on three points.

  1. I think the public would be more worried about harm that AI systems cause than he assumes.
  2. I think that economic incentives aren’t quite as powerful as he thinks they are, and I think that governments are relatively stronger than he thinks.
  3. He argues that governments’ response will be very misdirected, and I don’t quite buy his arguments.

Note that 1 and 2/3 seem quite different: 1 is about how much people will worry about AI harms, whereas 2 and 3 ar

... (read more)

Another way to frame this, then, is that "For any choice of AI difficulty, faster pre-takeoff growth rates imply shorter timelines."

I agree. Notably, that sounds more like a conceptual and almost trivial claim.

I think that the original claims sound deeper than they are because they slide between a true but trivial interpretation and a non-trivial interpretation that may not be generally true.


My argument involved scenarios with fast take-off and short time-lines. There is a clarificatory part of the post that discusses the converse case, of a gradual take-off and long time-lines:

Is it inconsistent, then, to think both that take-off will be gradual and timelines will be long? No – people who hold this view probably do so because they think that marginal improvements in AI capabilities are hard. This belief implies both a gradual take-off and long timelines.

Maybe a related clarification could be made about the fast take-off/short time-line... (read more)

Francis Rhys Ward · 2y
I agree with Rohin's comment above. Right. I guess the view here is that "The threshold level of capabilities needed for explosive growth is very low." Which would imply that we hit explosive growth before AIs are useful enough to be integrated into the economy, i.e. sudden take-off.   If "marginal improvements in AI capabilities are hard" then we must have a gradual take-off and timelines are probably "long" by the community's standards. In such a world, you simply can't have a sudden take-off, so a gradual take-off still happens on shorter timelines than a sudden take-off (i.e. sooner than never). I realise I have used two different meanings of "long timelines" 1) "long" by people's standards; 2) "longer" than in the counterfactual take-off scenario. Sorry for the confusion!  

For every choice of AGI difficulty, conditioning on gradual take-off implies shorter timelines.

What would you say about the following argument?

  • Suppose that we get AGI tomorrow because of a fast take-off. If so timelines will be extremely short.
  • If we instead suppose that take-off will be gradual, then it seems impossible for timelines to be that short.
  • So in this scenario - this choice of AGI difficulty - conditioning on gradual take-off doesn't seem to imply shorter timelines.
  • So that's a counterexample to the claim that for every choice of AGI difficulty, c
... (read more)
Rohin Shah · 2y
Those were two different scenarios with two different amounts of AGI difficulty! In the first scenario, we have enough knowledge to build AGI today; in the second we don't have enough knowledge to build AGI today (and that is part of why the takeoff will be gradual).

Holden Karnofsky defends this view in his latest blog post.

I think it’s too quick to think of technological unemployment as the next problem we’ll be dealing with, and wilder issues as being much further down the line. By the time (or even before) we have AI that can truly replace every facet of what low-skill humans do, the “wild sci-fi” AI impacts could be the bigger concern.

A related view is that less advanced/more narrow AI will be able to do a fair number of tasks, but not enough to create widespread technological unemployment until very late, when very advanced AI quite quickly causes lots of people to be unemployed.

One consideration is how long it will take for people to actually start using new AI systems (it tends to take some time for new technologies to be widely used). I think that some have speculated that that time lag may be shortened as AI becomes more advanced (as AI becomes involved in the deployment of other AI systems).


Scott Alexander has written an in-depth article about Hreha's article:

The article itself mostly just urges behavioral economists to do better, which is always good advice for everyone. But as usual, it’s the inflammatory title that’s gone viral. I think a strong interpretation of behavioral economics as dead or debunked is unjustified.

See also Alex Imas's and Chris Blattman's criticisms of Hreha (on Twitter).

I think that though there's been a welcome surge of interest in conceptual engineering in recent years, the basic idea has been around for quite some time (though under different names). In particular, Carnap argued that we should "explicate" rather than "analyse" concepts already in the 1940s and 1950s. In other words, we shouldn't just try to explain the meaning of pre-existing concepts, but should develop new and more useful concepts that partially replace the old concepts.

Carnap’s understanding of explication was influenced by Karl Menger’s conception

... (read more)
Suspended Reason · 2y
I've heard similar things about Carnap! Have had some of his writing in a to-read pile for ages now.
Probably meant to be this: "Scope insensitivity: The limits of intuitive valuation of human lives in public policy", Dickert et al.

Potentially relevant new paper:

The logic of universalization guides moral judgment
To explain why an action is wrong, we sometimes say: “What if everybody did that?” In other words, even if a single person’s behavior is harmless, that behavior may be wrong if it would be harmful once universalized. We formalize the process of universalization in a computational model, test its quantitative predictions in studies of human moral judgment, and distinguish it from alternative models. We show that adults spontaneously make moral judgments
... (read more)

A new paper may give some support to arguments in this post:

The smart intuitor: Cognitive capacity predicts intuitive rather than deliberate thinking
Cognitive capacity is commonly assumed to predict performance in classic reasoning tasks because people higher in cognitive capacity are believed to be better at deliberately correcting biasing erroneous intuitions. However, recent findings suggest that there can also be a positive correlation between cognitive capacity and correct intuitive thinking. Here we present results from 2 studies that directly con
... (read more)

An economist friend said in a discussion about sleepwalk bias on 9 March:

In the case of COVID, this led me to think that there will not be that much mortality in most rich countries, but only due to drastic measures.

The rest of the discussion may also be of interest; e.g. note his comment that "in economics, I think we often err on the other side -- people fully incorporate the future in many models."

I agree people often underestimate policy and behavioural responses to disaster. I called this "sleepwalk bias" - the tacit assumption that people will sleepwalk into disaster to a greater extent than is plausible.

Jon Elster talks about "the younger sibling syndrome":

A French philosopher, Maurice Merleau-Ponty, said that our spontaneous tendency is to view other people as ‘‘younger siblings.’’ We do not easily impute to others the same capacity for deliberation and reflection that introspection tells us that
... (read more)
Sammy Martin · 4y
From reading your post - the sleepwalk bias does seem to be the mirror-image of the Morituri Nolumus Mori effect; that we tend to systematically underweight strong, late reactions. One difference is that I was thinking of both individual and policy responses whilst your post focusses on policy, but that's in large part because most of the low-frequency high-damage risks we commonly talk[ed] about are X-risks that can be dealt with only at the level of policy. I also note that I got at a few of the same factors as you that might affect the strength of such a reaction: The speed issue I discussed in conclusions and I obliquely referred to the salience issue in talking about 'ability to understand consensus reality' and that we have pre-existing instincts around purity and disgust that would help a response to something like a pandemic. The presence of free-rider problems I didn't discuss. How the speed/level of difficulty interacts with the response I did mention - talking about the hypotheticals where R0 was 2 or 8, for example. Those differences aside, it seems like we got at the same phenomenon independently. I'm curious about whether you made any advance predictions about likely outcomes based on your understanding of the 'sleepwalk bias'. I made a light suggestion that things might go better than expected in mid-March, but I can't really call it a prediction. The first time I explicitly said 'we were wrong' was when a lot of evidence had already come in - in April.

Thanks, Lukas. I only saw this now. I made a more substantive comment elsewhere in this thread. Lodi is not a village, it's a province with 230K inhabitants, as are Cremona (360K) and Bergamo (1.11M). (Though note that all these names are also names of the central town in these provinces.)

In the province of Lodi (part of Lombardy), 388 people were reported to have died of Covid-19 as of 27 March. Lodi has a population of 230,000, meaning that 0.17% of the population of Lodi has died. Given that hardly everyone has been infected, the IFR must be higher.

The same source reports that in the province of Cremona (also part of Lombardy), 455 people had died of Covid-19 as of 27 March. Cremona has a population of 360,000, meaning that 0.126% of the population of Cremona has died, according to official data.

Note also that there are reports of substantial un... (read more)

These numbers support my suspicion that >10% of North Italy has already been infected, with a death rate of ~1%.

Here is a new empirical paper on folk conceptions of rationality and reasonableness:

Normative theories of judgment either focus on rationality (decontextualized preference maximization) or reasonableness (pragmatic balance of preferences and socially conscious norms). Despite centuries of work on these concepts, a critical question appears overlooked: How do people’s intuitions and behavior align with the concepts of rationality from game theory and reasonableness from legal scholarship? We show that laypeople view rationality as abstract and prefer
... (read more)

Thanks, this is interesting. I'm trying to understand your ideas. Please let me know if I represent them correctly.

It seems to me that at the start, you're saying:

1. People often have strong selfish preferences and weak altruistic preferences.

2. There are many situations where people could gain more utility through engaging in moral agreements or moral trade - where everyone promises to take some altruistic action conditional on everyone else doing the same. That is because the altruistic utility they gain more than makes up for the selfish util... (read more)

I think this is a kind of question where our intuitions are quite weak and we need empirical studies to know. It is very easy to get annoyed with poor epistemics and to conclude, in exasperation, that things must have got worse. But since people normally don't remember or know well what things were like 30 years ago or so, we can't really trust those conclusions.

One way to test this would be to fact-check and argument-check opinion pieces and election debates from

... (read more)
How about a book that has a whole bunch of other scenarios, one of which is AI risk which takes one chapter out of 20, and 19 other chapters on other scenarios?

It would be interesting if you went into more detail on how long-termists should allocate their resources at some point; what proportion of resources should go into which scenarios, etc. (I know that you've written a bit on such themes.)

Unrelatedly, it would be interesting to see some research on the supposed "crying wolf effect"; maybe with regards to other risks. I'm not sure that effect is as strong as one might think at first glance.

That was also probably my main question when listening to this interview. I also found it interesting to hear that statement you quoted now that The Precipice has been released, and now that there are two more books on the horizon (by MacAskill and Sandberg) that I believe are meant to be broadly on longtermism but not specifically on AI. The Precipice has 8 chapters, with roughly a quarter of 1 chapter specifically on AI, and a bunch of other scenarios discussed, so it seems quite close to what Hanson was discussing. Perhaps at least parts of the longtermist community have shifted (were already shifting?) more towards the sort of allocation of attention/resources that Hanson was envisioning. I share the view that research on the supposed "crying wolf effect" would be quite interesting. I think its results have direct implications for longtermist/EA/x-risk strategy and communication.

Associate professor, not assistant professor.

One of those concepts is the idea that we evolved to "punish the non-punishers", in order to ensure the costs of social punishment are shared by everyone.

Before thinking of how to present this idea, I would study carefully whether it's true. I understand there is some disagreement regarding the origins of third-party punishment. There is a big literature on this. I won't discuss it in detail, but here are some examples of perspectives which deviate from that taken in the quoted passage.

Joe Henrich writes:

This only makes sense as cul
... (read more)
I'm probably referring to the idea in a much narrower context, specifically our inclination to express outrage (or even just mild disapproval) as a form of low-cost, low-risk social punishment, and for that inclination to apply just as well to people who appear insufficiently disapproving or outraged. The targets of this inclination may vary culturally, and it might be an artifact or side-effect of the hardware, but I'd be surprised if there were societies where nothing was ever a subject that people disapproved of other people not being disapproving of. Disapproving of the same things is a big part of what draws societies together in the first place, so failing to disapprove of the common enemy seems like something that automatically makes you "probably the enemy". (But my reasons and evidence for thinking this way will probably be clearer in the actual article, as it's about patterns of motivated reasoning that seem to reliably pop up in certain circumstances... but then again my examples are not terribly diverse, culturally speaking.)
A different Cosmides-and-Tooby (and Michael E. Price) take:

A recent paper developed a statistical model for predicting whether papers would replicate.

We have derived an automated, data-driven method for predicting replicability of experiments. The method uses machine learning to discover which features of studies predict the strength of actual replications. Even with our fairly small data set, the model can forecast replication results with substantial accuracy — around 70%. Predictive accuracy is sensitive to the variables that are used, in interesting ways. The statistical features (p-value and effect siz
... (read more)

One aspect may be that the issues we discuss and try to solve are often at the limit of human capabilities. Some people are way better at solving them than others, and since those issues are so often in the spotlight, it looks like the less able are totally incompetent. But actually, they're not; it's just that the issues they are able to solve aren't discussed.


On first blush this looks like a success story, but it’s not. I was only able to catch the mistake because I had a bunch of background knowledge about the state of the world. If I didn’t already know mid-millennium China was better than Europe at almost everything (and I remember a time when I didn’t), I could easily have drawn the wrong conclusion about that claim. And following a procedure that would catch issues like this every time would take much more time than ESCs currently get.

Re this particular point, I guess one thing you mig... (read more)

First, let me say I think that would be interesting to experiment with. But the reasons to be dubious are more interesting, so I'm going to spend more time on those. This can definitely rule people out. I don't think it can totally rule people in, because there's always a risk someone made a sound argument based on faulty assumptions. In fact this is a large, sticky genre that I'm very worried about. But assuming that was solved, there's something I find harder to express that might be at the core of why I'm doing this... I don't want to collect a bunch of other people's arguments I can apply as tools, and be confused if two of them conflict. I want a gears-level model of the world such that, if I was left with amnesia on an intellectual deserted island, I could re-derive my beliefs. Argument-checking, as I conceive of it now, does more of the former. I can't explain why, exactly what I'm picturing when I say argument checking or what kind of amnesia I mean, but there's something there. My primary interest with argument-checking would be to find a way to engage with arguments in a way that develops that amnesia-proof knowledge.
This is a good point. I think the epistemic ability to predict and evaluate arguments independently of the truth of the conclusion is something we want to heavily select for and reward, see e.g. Eliezer's writing on that here. If Elizabeth is interested, I'm definitely interested in funding and experimenting with prediction markets on argument validity for the next round of amplifying epistemic spot checks.

Thanks for this. In principle, you could use KBCs for any kind of evaluation, including evaluation of products, texts (essay grading, application letters, life plans, etc), pictures (which of my pictures is the best?), etc. The judicial system is very high-stakes and probably highly resistant to reform, whereas some of the contexts I list are much lower stakes. It might be better to try out KBCs in such a low-stakes context (I'm not sure which one would be best). I don't know to what extent KBCs have been tested for these kinds of purposes (it was some t... (read more)

There is a substantial philosophical literature on Occam's Razor and related issues:

Yes, a new paper confirms this.

The association between quality measures of medical university press releases and their corresponding news stories—Important information missing

Agreed; those are important considerations. In general, I think a risk for rationalists is to change one's behaviour on complex and important matters based on individual arguments which, while they appear plausible, don't give the full picture. Cf Chesterton's fence, naive rationalism, etc.

This was already posted a few links down.

One interesting aspect of posts like this is that they can, to some extent, be (felicitously) self-defeating.

As Bastian Stern has pointed out to me, people often mix up pro tanto considerations with all-things-considered judgements - usually by interpreting what is merely intended to be a pro tanto consideration as an all-things-considered judgement. Is there a name for this fallacy? It seems both dangerous and common, so it should have a name.

Thanks Ryan, that's helpful. Yes, I'm not sure one would be able to do something that has the right combination of accuracy, interestingness and low-cost at present.

Sure, I guess my question was whether you'd think that it'd be possible to do this in a way that would resonate with readers. Would they find the estimates of quality, or level of postmodernism, intuitively plausible?

My hunch was that the classification would primarily be based on patterns of word use, but you're right that it would probably be fruitful to look at patterns of citations.

If you get a well labelled dataset, I think this is pretty thoroughly within the scope of current machine learning technologies, but that means spending perhaps hundreds of hours labelling papers as a certain amount postmodern out of 100. If you're trying to single out the postmodernism that you're convinced is total BS, then that's more complex. Doable but you need to make the case to me about why it would be worthwhile, and what exactly your aim would be.


[This comment is no longer endorsed by its author]
Something in a related space is now being used by a few publishers, and it is AWESOME. You can rearrange by researcher links (who published with whom), academic area links, citation links, institution, etc.
If you had a million labelled postmodern and non-postmodern papers, you could decently identify them. You could categorise most papers with fewer labels using citation graphs. You can recommend papers, as you would Amazon books with a recommender system (using ratings). There are hundreds of ways to apply machine learning to academic articles; it's a matter of deciding what you want the machine learning to do.
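As a toy illustration of the supervised approach described above - labelled papers in, a classifier out - here is a minimal bag-of-words naive Bayes sketch using only the standard library. The labels, example phrases, and function names are all invented for illustration; a real system would need a large labelled corpus and a proper feature pipeline.

```python
from collections import Counter
import math

# Tiny invented training set: (text, label) pairs.
train = [
    ("the hermeneutics of discourse destabilizes the subject", "postmodern"),
    ("power and narrative are socially constructed simulacra", "postmodern"),
    ("we estimate the regression coefficient with standard errors", "other"),
    ("the experiment measured reaction times across conditions", "other"),
]

def train_nb(examples):
    """Fit per-label word counts for a multinomial naive Bayes classifier."""
    counts = {}                 # label -> Counter of words
    label_totals = Counter()    # label -> number of training documents
    for text, label in examples:
        counts.setdefault(label, Counter()).update(text.split())
        label_totals[label] += 1
    return counts, label_totals

def classify(text, counts, label_totals):
    """Return the label with the highest log posterior (add-one smoothing)."""
    vocab = {w for c in counts.values() for w in c}
    best_label, best_score = None, float("-inf")
    for label, c in counts.items():
        total = sum(c.values())
        score = math.log(label_totals[label] / sum(label_totals.values()))
        for w in text.split():
            score += math.log((c[w] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

counts, totals = train_nb(train)
print(classify("discourse and narrative construct the subject", counts, totals))
# → postmodern
```

With enough labelled examples the same idea scales to real corpora; the hard part, as noted above, is producing the labels, not running the classifier.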

Could machine learning be used to fruitfully classify academic articles?

The word "fruitfully" is doing all the heavy lifting here.

It is, of course, possible to throw an ML algorithm at a corpus of academic articles. Will the results be useful? That entirely depends on what you consider useful. You will certainly get some results.

Good points. I agree that what you write within parentheses is a potential problem. Indeed, it is a problem for many kinds of far-reaching norms on altruistic behaviour compliance with which is hard to observe: they might handicap conscientious people relative to less conscientious people to such an extent that the norms do more harm than good.

I also agree that individualistic solutions to collective problems have a chequered record. The point of 1)-3) was rather to indicate how you potentially could reduce hedge drift, given that you want to do that. To g... (read more)

Thanks. My claim is somewhat different, though. Adams says that "whenever humanity can see a slow-moving disaster coming, we find a way to avoid it". This is an all-things-considered claim. My claim is rather that sleepwalk bias is a pro-tanto consideration indicating that we're too pessimistic about future disasters (perhaps especially slow-moving ones). I'm not claiming that we never sleepwalk into a disaster. Indeed, there might be stronger countervailing considerations, which if true would mean that all things considered we are too optimistic about existential risk.

It is not quite clear to me whether you are here just talking about instances of sleepwalking, or whether you are also talking about a predictive error indicating anti-sleepwalking bias: i.e. that they wrongly predicted that the relevant actors would act, yet they sleepwalked into a disaster.

Also, my claim is not that sleepwalking never occurs, but that people on average seem to think that it happens more often than it actually does.

Great post. Another issue is why B doesn't believe Y in spite of believing X and in spite of A believing that X implies Y. Some mechanisms:

a) B rejects that X implies Y, for reasons that are good or bad, or somewhere in between. (Last case: reasonable disagreement.)

b) B hasn't even considered whether X implies Y. (Is not logically omniscient.)

c) Y only follows from X given some additional premises Z, which B either rejects (for reasons that are good or bad or somewhere in between) or hasn't entertained. (What Tyrrell McAllister wrote.)

d) B is confused over the meaning of X, and hence is confused over what X implies. (The dialect case.)

I have a maths question. Suppose that we are scoring n individuals on their performance in an area where there is significant uncertainty. We are categorizing them into a low number of categories, say 4. Effectively we're thereby saying that for the purposes of our scoring, everyone with the same score performs equally well. Suppose that we say that this means that all individuals with that score get assigned the mean actual performance of the individuals with that score. For instance, if there were three people who got the highest score, and their pe... (read more)

Your problem is called a clustering problem. First of all, you need to answer how you measure your error (information loss, as you call it). Typical error norms used are l1 (sum of individual errors), l2 (sum of squares of errors, penalizes larger errors more) and l-infinity (maximum error).

Once you select a norm, there always exists a partition that minimizes your error, and to find it there are a bunch of heuristic algorithms, e.g. k-means clustering. Luckily, since your data is one-dimensional and you have very few categories, you can just brute force it (for 4 categories you need to correctly place 3 boundaries, and naively trying all possible positions takes only O(n^3) time).

Hope this helps.
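The brute-force search suggested above can be sketched in a few lines of Python. This is a sketch under stated assumptions: the function name and toy scores are my own, and I use the l2 (sum-of-squares) norm; swapping in l1 or l-infinity only means changing the per-segment error function.

```python
from itertools import combinations

def best_partition(scores, k=4):
    """Brute-force the k-category split of 1-D data that minimizes total
    squared error when each category is replaced by its mean."""
    xs = sorted(scores)
    n = len(xs)

    def sse(segment):
        # Squared error of a segment around its own mean (the l2 norm).
        m = sum(segment) / len(segment)
        return sum((x - m) ** 2 for x in segment)

    best_err, best_parts = float("inf"), None
    # Choose k-1 cut points among the n-1 gaps between sorted values.
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0,) + cuts + (n,)
        parts = [xs[a:b] for a, b in zip(bounds, bounds[1:])]
        err = sum(sse(p) for p in parts)
        if err < best_err:
            best_err, best_parts = err, parts
    return best_err, best_parts

err, parts = best_partition([1, 2, 2, 10, 11, 20, 21, 35], k=4)
print(parts)  # → [[1, 2, 2], [10, 11], [20, 21], [35]]
```

For 4 categories this tries C(n-1, 3) splits, i.e. the O(n^3) brute force mentioned above, which is entirely feasible for the sizes in question.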

If I'm understanding this correctly, it sounds like you're performing k-means clustering.
You'd minimize information loss by giving the actual scores. The argument is 'grading on the curve' vs 'ABCDEF'. The first way is fair, but it promotes extreme competition to be in the top 1% (or 'The Senior Wrangler', as we used to call it), which may not be desirable. The second way hands out random bonuses and penalties to individuals near the arbitrary boundaries. I was in the top 25% of my year in terms of marks, I believe. I was a 'Senior Optime', or 'got a second'. A class that stretched from around 25%-75%. Not bitter, or anything.

Great comment. Thanks!

Basically, rapid communication gives people too much choice. They choose things comfortably similar to what they know. Isolation is needed to allow new things to gain an audience before they're stomped out by the dominant things.

This is an interesting idea, reminiscent of, e.g., Lakatos's philosophy of science. He argued that we shouldn't let new theories be discarded too quickly, just because they seem to have some things going against them. Only if their main tenets prove to be unfeasible should we discard them.

I thi... (read more)

I agree with all of this. The upshot seems to be that it's important that those who actually have good ideas achieve high status.

If we could solve this, the other problems would be solved soon by people with high status. :D