I wrote what I believe to be a simpler explanation of this post here. Things I tried to do differently: 

  1. More clearly explaining what Nash equilibrium means for infinitely repeated games -- it's a little subtle, and on intuition alone it's not clear why the "everyone puts 99" situation can be a Nash equilibrium 
  2. Noting that just because something is a Nash equilibrium doesn't mean it's what the game is going to converge to 
  3. Less emphasis on minimax stuff (it's just boilerplate, not really the main point of folk theorems) 

The strategy profile I describe is where each person has the following strategy (call it "Strategy A"): 

  • If empty history, play 99 
  • If history consists only of 99s from all other people, play 99 
  • If any other player's history contains a choice which is not 99, play 100 

The strategy profile you are describing is the following (call it "Strategy B"): 

  • If empty history, play 99
  • If history consists only of 99s from all other people, play 99 
  • If any other player's history contains a choice which is not 99, play 30 

I agree Strategy B weakly dominates Strategy A. However, saying "everyone playing Strategy A forms a Nash equilibrium" just means that no player has a profitable deviation assuming everyone else continues to play Strategy A. Strategy B isn't a profitable deviation -- if you switch to Strategy B and everyone else is playing Strategy A, everyone will still just play 99 for all eternity. 
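A quick way to see this is to simulate the play path directly. Here is a minimal sketch (payoffs are omitted, since only the sequence of moves matters for the point about deviations):

```python
def others_all_99(my_index, history):
    """True if every other player has always played 99 so far."""
    return all(choice == 99
               for rnd in history
               for i, choice in enumerate(rnd) if i != my_index)

def strategy_a(my_index, history):
    # Play 99 on an empty or all-99 history; punish with 100 otherwise.
    return 99 if others_all_99(my_index, history) else 100

def strategy_b(my_index, history):
    # Identical to Strategy A except the punishment move is 30.
    return 99 if others_all_99(my_index, history) else 30

def play(strategies, rounds=10):
    """Run the repeated game; each round is the list of all players' choices."""
    history = []
    for _ in range(rounds):
        history.append([s(i, history) for i, s in enumerate(strategies)])
    return history

# Everyone on Strategy A: the play path is 99 forever.
all_a = play([strategy_a] * 3)

# One player switches to Strategy B: the path is unchanged -- still all 99s --
# so the punishment clauses never fire and the switch gains nothing.
one_b = play([strategy_b, strategy_a, strategy_a])
```

Since the two play paths are identical, switching to Strategy B cannot be a profitable deviation, which is all the Nash equilibrium claim requires.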

The general name for these kinds of strategies is grim trigger.

I'm not sure what the author intended, but my best guess is they wanted to say "punishment is bad because there exist really bad equilibria which use punishment, by folk theorems". Some evidence from the post (emphasis mine): 

Rowan: "If we succeed in making aligned AGI, we should punish those who committed cosmic crimes that decreased the chance of a positive singularity sufficiently."

Neal: "Punishment seems like a bad idea. It's pessimizing another agent's utility function. You could get a pretty bad equilibrium if you're saying agents should be intentionally harming each others' interests, even in restricted cases."


Rowan: "Well, I'll ponder this. You may have convinced me of the futility of punishment, and the desirability of mercy, with your... hell simulation. That's... wholesome in its own way, even if it's horrifying, and ethically questionable."

Folk theorems guarantee the existence of equilibria with both good (31) and bad (99) payoffs for the players, and both are sustained via punishment. For this reason I view them as neutral: they say lots of equilibria exist, but not which ones are going to happen. 

I guess if you are super concerned about bad equilibria, then you could take a stance against punishment, because then it would be harder/impossible for the everyone-plays-99 equilibrium to form. This could have been the original point of the post but I am not sure. 

I have 2 separate claims:

  1. Any researcher, inside or outside of academia, might consider emulating attributes successful professors have in order to boost personal research productivity. 
  2. AI safety researchers outside of academia should try harder to make their work legible to academics, as a cheap way to get more good researchers thinking about AI safety. 

What I'm questioning is the implicit assumption in your post that AI safety research will inevitably take place in an academic environment [...]

This assumption is not implicit, you're putting together (1) and (2) in a way which I did not intend. 

Furthermore, in a corporate environment, limiting one's networking to just researchers is probably ill advised, given that there are many other people who would have influence upon the research. Knowing a senior executive with influence over product roadmaps could be just as valuable, even if that executive has no academic pedigree at all.

I agree, but this is not a counterargument against my post. This is just an incredibly reasonable interpretation of what it means to be "good at networking" for an industry researcher. 

But 80/20-ing teaching? In a corporate research lab, one has no teaching responsibilities. One would be far better served learning some basic software engineering practices, in order to better interface with product engineers.

My post is not literally recommending that non-academics 80/20 their teaching. I am confused why you think that I would think this. 80/20-ing teaching is an example of how professors allocate their time to what's important. Professors are being used as a case study in the post. When applied to an AI safety researcher who works independently or as part of an industry lab, perhaps "teaching" might be replaced with "responding to cold emails" or "supervising an intern". I acknowledge that professors spend more time teaching than non-academic researchers spend on these tasks. But once again, the point of this post is just to list a bunch of things successful professors do; non-professors are meant to consider these points and adapt the advice to their own environment. 

Similarly, with regards to publishing, for a corporate research lab, having a working product is worth dozens of research papers. Research papers bring prestige, but they don't pay the bills. Therefore, I would argue that AI safety researchers should be keeping an eye on how their findings can be applied to existing AI systems. This kind of product-focused development is something that academia is notoriously bad at.

This seems like a crux. It seems like I am more optimistic about leveraging academic labor and expertise, and you are more optimistic about deploying AI safety solutions to existing systems. 

I also question your claim that academic bureaucracy doesn't slow good researchers down very much. That's very much not in line with what anecdotes I've heard. [...]

This is another crux. We both have heard different anecdotal evidence and are weighing it differently. 

I don't think it's inevitable that academia will take over AI safety research, given the trend in AI capabilities research, and I certainly don't think that academia taking over AI safety research would be a good thing.

I never said that academia would take over AI safety research, and I also never said this would be a good thing. I believe that there is a lot of untapped free skilled labor in academia, and AI safety researchers should put in more of an effort (e.g. by writing papers) to put that labor to use. 

For this reason I question whether it's valuable for AI safety researchers to develop skills valuable for academic research, specifically, as opposed to general time management, software engineering and product development skills.

One of the attributes I list is literally time management. As for the other two, I think it depends on the kind of AI safety researcher we are talking about -- going directly back to our "leveraging academia" versus "product development" crux. I agree that if what you're trying to do is product development, the skills you list are critical. But product development is not at all the only way to do AI safety, and other approaches plug into academia more easily. 

There are lots of ways a researcher can choose to adopt new productivity habits. They include:

  1. Inside view, reasoning from first principles 
  2. Outside view, copying what successful researchers do

The purpose of this post is to take the outside view and list what one class of researchers, professors, does -- a class which happens to operate very differently from the AI safety community.

Once again, I am not claiming to have an inside view argument in favor of the adoption of each of these attributes. I do not have empirics. I am not claiming to have an airtight causal model. If you will refer back to the original post, you will notice that I was careful to call this a list of attributes coming from anecdotal evidence, and if you will refer back to the AI safety section, you will notice that I was careful to call my points considerations and not conclusions. 

You keep arguing against a claim which I've never put forward, which is something like "The bullshit in academia (publish or perish, positive results give better papers) causes better research to happen." Of course I disagree with this claim. There is no need to waste ink arguing against it. 

It seems like the actual crux we disagree on is: "How similar are the goals of success in academia and success in doing good (AI safety) research?" If I had to guess the source of our disagreement, I might speculate that we've both heard the same stories about the replication crisis, the inefficiencies of grant proposals and peer review, and other bullshit in academia. But I've additionally encountered a great deal of anecdotal evidence indicating that, in spite of all this bullshit, the people at the top overwhelmingly do not get bogged down by it, and the first-order factor in them getting where they are was in fact research quality. The way to convince you of this might be to repeat the methodology used in Childhoods of exceptional people, but that would be incredibly time-consuming. (I'll give you 1/20th of such a blog post for free: here's Terry Tao on time management.) 

This crux clears up our correlation vs causation disagreement: I think the goals are very similar, so for me correlation is evidence for causation; you think the goals are very different, so it seems like you think many of the attributes I've listed are primarily relevant to the 'navigating academic bullshit' part of academia. 

I've addressed your comment in broad terms, but just to conclude I wanted to respond to one point you made which seems especially wrong. 

how did e.g. networking [...] enable them to get to these [impressive research] findings?

In the networking section, you will find that I defined "networking" as "knowing many people doing research in and outside your field, so that you can easily reach out to them to request a collaboration". People are more likely to respond to collaboration requests from acquaintances than from strangers. Thus for this particular attribute you actually do get a causal model: networking causes collaborations, which cause better research results. I guess you can dispute the claim "collaborations cause better research results", but I think this would be an odd hill to die on, considering most interdisciplinary work relies on collaborations. 

Overall, it seems like your argument is that AI safety researchers should behave more like traditional academia for a bunch of reasons that have mostly to do with social prestige.


That is not what I am saying. I am saying that successful professors are highly successful researchers, that they share many qualities (most of which by the way have nothing to do with social prestige), and that AI safety researchers might consider emulating these qualities. 

Furthermore, I would note that traditional academia has been moving away from these practices, to a certain extent. During the early days of the COVID pandemic, quite a lot of information was exchanged not as formal peer-reviewed research papers, but as blog posts, Twitter threads, and preprints. In AI capabilities research, many new advances are announced as blog posts first, even if they might be formalized in a research paper later. [...]

This is a non sequitur. I'm not saying stop the blog posts. In fact, I am claiming that "selling your work" is a good thing. Therefore I also think blog posts are fine. When I write about the importance of a good abstract/introduction, I mean not just literally in the context of a NeurIPS paper but also more broadly in the context of motivating one's work better, so that a broader scientific audience can read your work and want to build off it. (But separately, I do think people should eventually turn good blog posts into papers for wider reach.)

I think AI safety research is at this early stage of maturity

I disagree. Non-EA funding for safety is pouring in. Safety is being talked about in mainstream venues. More academic papers are popping up, as linked in my post. In terms of progress on aligning AI, I agree the field is in its early stages, but in terms of the size of the field and the institutions built up around it, nothing about AI safety feels early-stage to me anymore. 

How does one determine this?

I am confused by your repeated focus on empirics, when I have been very up front that this is a qualitative, anecdotal, personal analysis. 

However, I'd still like to know where you're drawing these observations from? Is it personal observation?


Yes, personal observation, across quite a few US institutions. 

And if so, how have you determined whether a professor is successful or not?

One crude way of doing it is to say a professor is successful if they are a professor at a top-10-ish university. Academia is hypercompetitive, so this is a good filter. Additionally, my personal observations are skewed toward people who I think do good research, so "successful" here also means "does research which electroswing thinks is good". 

Is there a study that correlates academic impact across these traits?

I haven't looked for one. A lot of them seem tough to measure, hence my qualitative analysis here. 

In my experience, successful professors are often significantly better at the skills I've listed than similarly intelligent people who are not successful professors. My internal model is that this is because aptitude in these skills is necessary to survive academia, so anybody who doesn't make the cut never becomes a successful professor in the first place.

Specifically I think professors are at least +2σ at "hedgehog-y" and "selling work" compared to similarly intelligent people who are not successful professors, and more like +σ at the other skills. 

You can imagine a post "Attributes of successful athletes", where the author knows a bunch of top athletes and finds shared traits in which the athletes are +2σ or +σ, such as 1) good sleep hygiene, 2) always does warm-ups, 3) almost never eats junk food, 4) has a good sports doctor, and so on. Even in the absence of a proper causal study, the average person who wants to improve their fitness can look at this list and think: "Hmm, (4) seems only relevant for professionals, but (1) and (3) seem like they probably have a strong causal effect, and (2) seems plausible but hard to tell." 

I think the obvious answer here is AutoPay -- this should hedge against the situations you are describing. 

The costs of making a mistake are certainly high, since it's a permanent hit to your credit report. I am not super knowledgeable about how late payments affect credit scores (other than that the effect is negative); this is an interesting question.

Hmmm...the orthogonality thesis is pretty simple to state, so I don't necessarily think it has been grossly misunderstood. The bad reasoning in Fallacy 4 seems to come from a more general phenomenon with classic AI safety arguments, where they do hold up, but only with some caveats and/or more precise phrasing. So I guess "bad coverage" could apply, to the extent that popular sources don't go in depth enough. 

I do think the author presented good summaries of Bostrom's and Russell's viewpoints. But then they immediately jump to a "special sauce" type argument. (Quoting the full thing just in case)

The thought experiments proposed by Bostrom and Russell seem to assume that an AI system could be “superintelligent” without any basic humanlike common sense, yet while seamlessly preserving the speed, precision and programmability of a computer. But these speculations about superhuman AI are plagued by flawed intuitions about the nature of intelligence. Nothing in our knowledge of psychology or neuroscience supports the possibility that “pure rationality” is separable from the emotions and cultural biases that shape our cognition and our objectives. Instead, what we’ve learned from research in embodied cognition is that human intelligence seems to be a strongly integrated system with closely interconnected attributes, including emotions, desires, a strong sense of selfhood and autonomy, and a commonsense understanding of the world. It’s not at all clear that these attributes can be separated.

I really don't understand where the author is coming from with this. I will admit that the classic paperclip maximizer example is pretty far-fetched, and maybe not the best way to explain the orthogonality thesis to a skeptic. I prefer more down-to-earth examples like, say, a chess bot with plenty of compute to look ahead, but its goal is to protect its pawns at all costs instead of its king. It will pursue its goal intelligently but the goal is silly to us, if what we want is for it to be a good chess player. 
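To make this concrete, here is a toy sketch of that example (the tiny "game tree" and its numbers are invented purely for illustration): both agents share identical search machinery, and only the objective function differs.

```python
# A made-up two-ply game tree. Leaves carry (king_safety, pawns_kept)
# scores for some imagined chess position; the values are illustrative only.
GAME_TREE = {
    "start": ["a", "b"],
    "a": ["a1", "a2"],
    "b": ["b1", "b2"],
}
LEAF_VALUES = {
    "a1": (9, 2),   # great for the king, costs pawns
    "a2": (7, 4),
    "b1": (2, 8),   # exposes the king, keeps pawns
    "b2": (1, 9),
}

def best_move(state, evaluate):
    """The 'capability' part: exhaustive lookahead, shared by both agents."""
    def value(s):
        if s in LEAF_VALUES:
            return evaluate(LEAF_VALUES[s])
        return max(value(child) for child in GAME_TREE[s])
    return max(GAME_TREE[state], key=value)

def protect_king(leaf):   # the objective we'd call "good chess"
    return leaf[0]

def protect_pawns(leaf):  # an equally optimizable, but silly, objective
    return leaf[1]

# Same search, different goals: the king-protector picks move "a",
# the pawn-protector competently picks move "b".
```

The search procedure is equally "intelligent" in both cases; only the goal it is pointed at changes, which is the orthogonality point.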

I feel like the author's counterargument would make more sense if they framed it as an outer alignment objection like "it's exceedingly difficult to make an AI whose goal is to maximize paperclips unboundedly, with no other human values baked in, because the training data is made by humans". And maybe this is also what their intuition was, and they just picked on the orthogonality thesis since it's connected to the paperclip maximize example and easy to state. Hard to tell. 

It would be nice if AI Safety were less disorganized, and had a textbook or something. Then, a researcher would have a hard time learning about the orthogonality thesis without also hearing a refutation of this common objection. But a textbook seems a long way away...

I mean...sure...but again, this does not affect the validity of my counterargument. Like I said, I'm making the counterargument as strong as possible by saying that even if the non-brain parts of the body were to add 2-100x computing power, this would not restrict our ability to scale up NNs to get human-level cognition. Obviously this still holds if we replace "2-100x" with "1x". 

The advantage of "2-100x" is that it is extraordinarily charitable to the "embodied cognition" theory—if (and I consider this to be extremely low probability) embodied cognition does turn out to be highly true in some strong sense, then "2-100x" takes care of this in a way that "~1x" does not. And I may as well be extraordinarily charitable to the embodied cognition theory, since "Bitter lesson" type reasoning is independent of its veracity. 
