All of MaxRa's Comments + Replies

I just assumed this works without even questioning it. ^^ Can you explain more concretely what you did? When I simulate this in my mind, I'm able to pull the rope sideways as long as there are fewer than three people pulling the rope on each side. They are also not allowed to counteract my pulling more than they would by default without my pulling, right?

Daniel Kokotajlo (7mo)
We had two teams of 6 people (so 12 total) play tug-of-war, and found that one team seemed slightly stronger than the other, but not strong enough to win immediately; there seemed to be at least 5-10 seconds of stasis. Then we had a 13th person stand in the middle and pull the rope so as to help the weaker team. We didn't go long enough to actually finish the game, but it looked like this made the difference between victory and defeat: the previously weaker side now seemed stronger thanks to the additional person. Then we did one last game, and this time the 13th person pulled sideways. They were able to make the rope bend a little, but only a little, and after a while they gave up. They weren't able to make the rope and the lines of people shift sideways to any noticeable degree, much less make them all lose their balance and fall over, as several people had suspected would happen.
I think the reason it doesn't work is that a tug of war is not so much about the force vectors being added together (if it were, pulling sideways would be effective). I think it is more about which side's members are lighter or have worse shoes, and therefore slip. If you have 1 person pulling sideways (+y direction), and 5 more on each side pulling in the +x and -x directions respectively, then (ignoring the x direction) we have a force (it doesn't matter who is exerting it) pulling 1 person in the -y direction and 10 people in the +y direction. Which group is going to slide first? (The 1 person, I think.) And when they do, the other 10 haven't moved at all (their static friction was never overcome), while the 1 person has moved closer to the rope and everyone else without moving them at all.
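A minimal sketch of that friction argument, under simplifying assumptions that are not from the comment: everyone has the same made-up mass (70 kg) and shoe-ground static friction coefficient (0.6), and, as in the comment, the x-direction tug is ignored. Since the rope pulls both groups sideways with the same magnitude, whichever group has the smaller friction budget slips first:

```python
MU_STATIC = 0.6  # assumed shoe-ground static friction coefficient (same for everyone)
MASS_KG = 70.0   # assumed body mass, identical for every participant
G = 9.81         # gravitational acceleration in m/s^2


def slip_threshold(n_people: int) -> float:
    """Largest sideways force (in N) a group can resist with static friction alone."""
    return n_people * MU_STATIC * MASS_KG * G


def first_to_slip(n_side_pullers: int, n_rope_team: int) -> str:
    """Return which group slides first when the rope pulls them apart sideways."""
    # Newton's third law: the rope pulls the side puller in -y and the rope team
    # in +y with the same magnitude, so the smaller friction budget gives way first.
    side = slip_threshold(n_side_pullers)
    team = slip_threshold(n_rope_team)
    if side < team:
        return "side puller"
    if team < side:
        return "rope team"
    return "both at once"


print(first_to_slip(1, 10))
print(round(slip_threshold(1)))  # most sideways force (N) one person can apply before slipping
```

This deliberately leaves out that the rope-pullers are already spending part of their friction on the x-direction tug.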

Thanks for your work on this! I've been eating vegan for ~9 years now and am probably also not investing as much time into checking my health as would be optimal, but at least my recent shotgun blood test didn't turn up any issues except low HDL cholesterol, and I don't have the impression that I'm less healthy than my non-vegan peers (though I could certainly always feel fitter and more energized than I normally do). Happy to share the test results privately if that'd be useful for you; I have one from ~7 years ago and one from this year.

The report mentioned "harm to the global financial system [and to global supply chains]" somewhere as an example, which I found noteworthy because these are very large-scale harms and would therefore plausibly require the kind of AI systems that the AI x-risk community is most worried about.

I also stumbled over this sentence.

1) I think even non-obvious issues can get much more research traction than AI safety does today. And I don't even think that catastrophic risks from AI are particularly non-obvious?

2) Not sure how broadly "cause the majority of research" is defined here, but I have some hope that we can find ways to turn money into relevant research.

Some ideas take many decades to become widely (let alone universally) accepted—famous examples include evolution and plate tectonics.

One example that an AI policy person mentioned in a recent Q&A is "bias in ML", which is already pretty much a consensus issue in ML and AI policy. I guess this happened in 5ish years?

I certainly wouldn't say that all correct ideas take decades to become widely accepted. For example, often somebody proves a math theorem, and within months there's an essentially-universal consensus that the theorem is true and the proof is correct.

Still, "bias in ML" is an interesting example. I think that in general, "discovering bias and fighting it" is a thing that everyone feels very good about doing, especially in academia and tech which tend to be politically left-wing. So the deck was stacked in its favor for it to become a popular cause to suppor...

What do you think about encouraging writers to add TLDRs at the top of their posts? TLDRs make the purpose and content immediately clear so readers can decide whether to read on, and they plausibly also help writers stay focused on their key points. (Advice that's emphasized a lot at Rethink Priorities.)

If your article could be substituted with a TLDR, then just write that in the first place.

Thanks, this was a really useful overview for me. 

I find the idea of the AI Objectives Institute really interesting. I've read their website and watched their kick-off call, and would be interested in how promising people in the AI safety space think the general approach is, how much we might be able to learn from it, and how much solutions to the AI alignment problem will resemble a competently regulated competitive market between increasingly competent companies.

I'd really appreciate pointers to previous discussions and papers on this topic, too. 

I am generally skeptical of this as an approach to AI alignment -- it feels like you are shooting yourself in the foot by restricting yourself to only those things that could be implemented in capitalism. Capitalism interventions must treat humans as black boxes; AI alignment has no analogous restriction. Some examples of things you can do in AI alignment but not capitalism:

...

Sounds really cool! Regarding the 1st- and 3rd-person models, this reminded me of self-perception theory (from Daryl Bem), which states that humans model themselves the same way they model others, just by observing (their own) behavior.

I feel like, in the end, our theories of how we model ourselves must involve input and feedback from "internal decision process information", but this seems very tricky to think about. I'm so sure I observe my own thoughts and feelings and use them to understand myself.

Thanks for elaborating!

I guess I would say, any given desire has some range of how strong it can be in different situations, and if you tell me that the very strongest possible air-hunger-related desire is stronger than the very strongest possible social-instinct-related desire, I would say "OK sure, that's plausible." But it doesn't seem particularly relevant to me. The relevant thing to me is how strong the desires are at the particular time that you're making a decision or thinking a thought.

I think that almost captures what I was thinking, only that I ...

Steven Byrnes (3y)
Hmm, when I think "default plan", I think something like "what's the first thing I think to do, based on what's most salient in my mind right now?". So this can be related to the acetylcholine dynamic I mentioned here, where things like itches and annoying car alarms are salient in my mind even if I don't want them to be. Hunger is definitely capable of forcibly pulling attention. But I do also think you can get a similar dynamic from social instincts. Like if someone shouts your name "Hey MaxRa!!", your "default plan" is to immediately pay attention to that person. Or a more pleasant example is: if you're snuggling under the blanket with your significant other, then the associated pleasant feelings are very salient in your mind, and the "default plan" is to remain under the blanket. That acetylcholine dynamic is just one example; there can be other reasons for things to be more or less salient. Like, maybe I'm thinking: "I could go to the party…", but then I immediately think: "…my ex might be at the party and oh geez I don't want to see them and have to talk to them". That's an example where there are social instincts on both sides of the dilemma, but still, the downsides of going to the party (seeing my ex) pop right out immediately to the forefront of my mind when I think of the party, whereas the benefits of going to the party (I'll be really glad I did etc.) are strong but less salient. So the latter can spawn very powerful desires if I'm actively thinking of them, but they're comparatively easy to overlook.

It would be weird for two desires to have a strict hierarchical relationship.

I agree, I didn't mean to imply a strict hierarchical relationship, and I think you don't need a strict relationship to explain at least part of the asymmetry. You would just need the less honorable desires to have, on average, more power over the default, e.g.

  • taking care of hunger
  • thirst
  • breath
  • looking at aesthetically pleasing things
  • removing discomforts

compared to, e.g.,

  • taking care of long-term health
  • clean surroundings
  • expressing gratitude

And then we can try to opti...

Steven Byrnes (3y)
I guess I sort of have a different way of thinking about it. From my perspective, if someone takes an action to satisfy a social-related desire at the expense of a food-related desire, then that means the social-related desire was the more powerful desire at that particular time. So if Alice in point of fact chooses an action that advances friendship over an action that would satisfy her mild hunger right now, I would say the straightforward thing: "Well, I guess Alice's desire to advance friendship was a more powerful desire for her, at this particular moment, than her desire to satisfy her mild hunger". Or at least, in my mind, this is the straightforward and obvious way to think about it. I guess you would disagree, but I'm not quite sure what you would say instead. What do weak desires look like? Here's an example. I have a very weak desire that, when sitting down, I prefer to put my legs up, other things equal. I wouldn't even bother to walk across a room to get an ottoman; I don't think about it at all; the only effect of this weak desire on my behavior is that, if I happen to be in a situation where putting my legs up is super easy and convenient and has essentially no costs whatsoever, I'll go ahead and put my legs up. In my model, the mark of a weak desire is that it has very little influence on my thoughts and behaviors. …And in particular, my model does not have a thing where weak desires heroically fight David-vs-Goliath battles to overcome stronger desires. See also the thing I wrote about internalizing ego-syntonic desires here; maybe that will help. I guess I would say, any given desire has some range of how strong it can be in different situations, and if you tell me that the very strongest possible air-hunger-related desire is stronger than the very strongest possible social-instinct-related desire, I would say "OK sure, that's plausible." But it doesn't seem particularly relevant to me. The relevant thing to me is how strong the desires are at the

Very interesting. This reminded me of Keith Stanovich's idea of the master rationality motive, which he defines as a desire to integrate higher-order preferences with first-order preferences. He gives the example of wanting to smoke and not wanting to want to smoke, which it sounds like you would consider two conflicting preferences: health vs. the short-term reward from smoking. His idea of how these conflicts are resolved is to have a "decoupled" simulation in which we can simulate adapting our first-order desires (I guess 'wanting to smoke' should rat...

Steven Byrnes (3y)
Thanks! I'm not sure exactly what you mean ... I guess I would say "wanting to smoke" is being in a state where plans-that-will-not-lead-to-smoking get docked a ton of points by the brainstem, plus maybe there's some mechanism that is incessantly forcing high-level attention to be focused on the (unpleasant) sensations related to not-smoking-right-now (see my comment here), and so on. I want to push back on that. Humans are an intensely social species, and an individual's prospects for survival and reproduction are extremely dependent on their winning and maintaining allies, being popular and well-respected, etc. I think this is reflected in our behavior: e.g. among humanity's favorite collective activities are talking about people, thinking about people, talking to people, etc. Social experiences are probably well-represented among the best and worst experiences of most people's lives. I mean, there are lots of non-social species, they just don't do those kinds of things. This book says that if an early human made a lot of enemies, the enemies would probably gang up on that person and kill them. This book is about how practically every thought we think gets distorted by our social instincts, etc. etc. I think I read somewhere that in small tribes, the most charismatic and popular and high-status people are likelier to have multiple wives (for men) and lots of children etc.   It would be weird for two desires to have a strict hierarchical relationship. When given a choice between food and water, sometimes we choose food, sometimes we choose water, depending on our current metabolic state, how much food or water is at stake, etc. It's definitely not the case that "water always automatically trumps food, regardless of context"; that would be weird. So by the same token, if your friend is holding a drink, you probably won't stab them in the back and steal their drink, as you would under a "quenching thirst >> social instincts" model. But if you're super-duper-despe

But don't you share the impression that with increased wealth humans generally care more about the suffering of others? The story I tell myself is that humans have many basic needs (e.g. food, safety, housing) that historically conflicted with 'higher' desires like self-expression, helping others or improving the world. And with increased wealth, humans relatively universally become more caring. Or, maybe more cynically, with increased wealth we can and do invest more resources into signalling that we are caring, good, reasonable people, i.e. the kinds of peo...

With increased wealth, humans relatively universally become more caring? Is this why billionaires are always giving up the vast majority of their fortunes to feed the hungry and house the homeless while willingly living on rice and beans?

I don't know how it will all play out in the end. I hope kindness wins and I agree the effect you discuss is real. But it is not obvious that our empathy increases faster than our capacity to do harm. Right now, for each human there are about seven birds/mammals on farms. This is quite the catastrophe. Perhaps that problem will eventually be solved by lab meat. But right now animal product consumption is still going up worldwide. And many worse things can be created and maybe those will endure.

People can be shockingly cruel to their own family. Scott's Who...

Very cool prompt and list. Does anybody have predictions on the level of international conflict about AI topics and the level of "freaking out about AI" in 2040, given the AI improvements that Daniel is sketching out?

Good point relating it to markets. I don't think I understand Acemoglu and Robinson's perspective well enough here, as the relationship between state, society and markets is the biggest question mark I left the book with. I think A&R don't necessarily mean only individual liberty when talking about the power of society, but rather the general influence of everything that falls into the "civil society" cluster.

I was reminded of the central metaphor of Acemoglu and Robinson's "The Narrow Corridor" as a RAAP candidate:

  • civil society wants to be able to control the government and undermines it if it can't
  • the government wants to become more powerful
  • successful societies inhabit a narrow corridor in which strengthening governments are strongly coupled with strengthening civil societies


That's an interesting connection to make. I am not familiar with the argument in detail, but at first glance I agree that the RAAP concept is meant to capture some principled institutional relationship between different cross-sections of society. I might disagree with "The Narrow Corridor" argument to the extent that RAAPs are not meant to prioritize or safeguard individual liberty so much as articulate a principled relationship between markets and states; according to RAAPs, the most likely path to multi-polar failure is particular forms of market failure that might require new grounds for antitrust enforcement of AI firms. The need for antitrust and antimonopoly policy is thus a straightforward rejection of the idea that enlightenment values will generate some kind of natural steady state where liberty is automatically guaranteed. I develop some of these ideas in my whitepaper on the political economy of reinforcement learning; I'm curious to hear how you see those arguments resonating with Acemoglu and Robinson.

So rather than escaping and setting up shop on some hacked server somewhere, I expect the most likely scenario to be something like "The AI is engaging and witty and sympathetic and charismatic [...]"

(I'm new to thinking about this and would find responses and pointers really helpful.) In my head this scenario felt unrealistic because I expect transformative-ish AI applications to come up before highly sophisticated AIs start socially manipulating their designers. Just for the sake of illustrating, I was thinking of stuff like stock investment AIs, product ...

Daniel Kokotajlo (3y)
I think this is a very important and neglected area of research. My take differs from yours, but I'm very unconfident in it; you might be right. I'm glad you are thinking about this and would love to chat more about it with you. Stock investment AIs seem like they would make lots of money, which would accelerate timelines by causing loads more money to be spent on AI. But other than that, they don't seem that relevant? Like, how could they cause a point of no return? Product design AIs and question-answering AIs seem similar. Maybe they'll accelerate timelines, but other than that, they won't be causing a point of no return (unless they have gotten so generally intelligent that they can start strategically manipulating us via their products and questions, which I think would happen eventually, but by the time that happens there will probably be agenty AIs running around too). Companionship AIs seem like the sort of thing that would be engaging and witty and charismatic, or at the very least, insofar as companionship AIs become a big deal, AIs that can argue themselves out of the box aren't far behind. Military strategy AIs seem similar to me if they can talk/understand language (convincing people of things is something you can strategize about too). Maybe we can imagine a kind of military strategy AI that doesn't really do language well; maybe instead it just has really good battle simulators and has generalized tactical skill that lets it issue commands to troops that are likely to win battles. But (a) I think this is unlikely, and (b) I think it isn't super relevant anyway, since tactical skill isn't very important. It's not like we are currently fighting a conventional war and better front-line tactics will let us break through the line or something.

Congrats, that's great news! :) I'd love to read your proposal, will shoot you an email.

Thanks, I find your neocortex-like AGI approach really illuminating.

Random thought:

(I think you also need to somehow set up the system so that "do nothing" is the automatically-acceptable default operation when every possibility is unpalatable.)

I was wondering whether this is necessarily the best "everything is unpalatable" policy. I could imagine that the best fallback option could also be something like "preserve your options while gathering information, strategizing and communicating with relevant other agents", assuming that this is not unpalatable, too. I ...

Steven Byrnes (3y)
Yeah, I was really only thinking about "not yet trust the AGI" as the main concern. Like, I'm somewhat hopeful that we can get the AGI to have a snap negative reaction to the thought of deceiving its operator, but it's bound to have a lot of other motivations too, and some of those might conflict with that. And it seems like a harder task to make sure that the latter motivations will never ever outbid the former, than to just give every snap negative reaction a veto, or something like that, if that's possible. I don't think "if every option is bad, freeze in place paralyzed forever" is a good strategy for humans :-P and eventually it would be a bad strategy for AGIs too, as you say.
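One way to read "give every snap negative reaction a veto" is as a filter applied before ordinary value comparison, plus a fallback for the case where everything is vetoed. A minimal sketch, assuming (hypothetically, not as Steven's actual model) that each candidate plan carries one scalar value and one boolean veto flag; the `fallback` argument is where "do nothing" could be swapped for a "preserve options while gathering information" plan:

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Plan:
    name: str
    value: float          # hypothetical scalar "how good does this plan seem" score
    vetoed: bool = False  # hypothetical flag: did any snap negative reaction fire?


DO_NOTHING = Plan("do nothing", 0.0)


def choose_plan(candidates: Sequence[Plan], fallback: Plan = DO_NOTHING) -> Plan:
    """Pick the highest-value plan that no snap negative reaction has vetoed.

    A veto cannot be outbid by a high value; if every candidate is vetoed
    (or there are none), return the fallback plan instead.
    """
    allowed = [plan for plan in candidates if not plan.vetoed]
    if not allowed:
        return fallback
    return max(allowed, key=lambda plan: plan.value)


plans = [
    Plan("deceive the operator", value=10.0, vetoed=True),
    Plan("ask the operator for clarification", value=2.0),
]
print(choose_plan(plans).name)
```

Here the vetoed plan loses despite its much higher value, which is the "never ever outbid" property; with every option vetoed, `choose_plan` returns `DO_NOTHING` unless another `fallback` is passed in.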

Thanks for sharing, just played my first round and it was a lot of fun! 

Make bets here? I expect many people would be willing to bet against an AI winter. It would additionally give you some social credit if you win. I'd be interested in seeing some concrete proposals.

Really enjoyed reading this. The section on "AI pollution" leading to a loss of control over the development of prepotent AI really interested me.

Avoiding [the risk of uncoordinated development of Misaligned Prepotent AI] calls for well-deliberated and respected assessments of the capabilities of publicly available algorithms and hardware, accounting for whether those capabilities have the potential to be combined to yield MPAI technology. Otherwise, the world could essentially accrue “AI-pollution” that might eventually precipitate or constitute MPAI.

  • I w
...

Thanks a lot for the elaboration!

in particular I still can't really put myself in the head of Friston, Clark, etc. so as to write a version of this that's in their language and speaks to their perspective.

Just a side note: one of my profs is part of the Bayesian CogSci crowd and was fairly frustrated with and critical of both Friston and Clark. We read one of Friston's papers in our journal club and came away thinking that Friston is reinventing a lot of wheels and using odd terms for known concepts.

For me, this paper by Sam Gershman helped a lot in underst...

That's really interesting. I haven't thought about this much, but it seems very plausible and big if true (though I am likely biased as a Cognitive Science student). Do you think this could be turned into a concrete question for the Metaculus crowd to forecast, i.e. "Reverse-engineering neocortex algorithms will be the first way we get AGI"? The resolution might get messy if an org like DeepMind, with its fair share of computational neuroscientists, is the one that gets there first, right?

Steven Byrnes (3y)
Yeah, I think it would be hard to pin down. Obviously AGI will resemble neocortical algorithms in some respects, and obviously it will be different in some respects. For example, the neocortex uses distributed representations, deep neural nets use distributed representations, and the latter was historically inspired by the former, I think. And conversely, no way AGI will have synaptic vesicles! In my mind this probabilistic programming system with no neurons is "more like the neocortex" than a ConvNet, but that's obviously just a particular thing I have in mind; it's not an objective assessment of how brain-like something is. Maybe a concrete question would be "Will AGI programmers look back on the 2010s work of people like Dileep George, Randall O'Reilly, etc. as an important part of their intellectual heritage, or as just two more of the countless thousands of CS researchers?" But I dunno, and I'm not sure if that's a good fit for Metaculus anyway.

As a (maybe misguided) side comment, model sketches like yours make me intuitively update for shorter AI timelines, because they give me a sense of a maturing field of computational cognitive science. Would be really interested in what others think about that.

Steven Byrnes (3y)
I think I'm in a distinct minority on this forum, maybe a minority of 1, in thinking that there's more than 50% chance that studying and reverse-engineering neocortex algorithms will be the first way we get AGI. (Obviously I'm not the only one in the world with this opinion, just maybe the only one on this forum.) I think there's a good outside-view argument, namely this is an active field of research, and at the end of it, we're all but guaranteed to have AGI-capable algorithms, unlike almost any other research program. I think there's an even stronger (to me) inside-view argument, in which cortical uniformity plays a big role, because (1) if one algorithm can learn languages and image-processing and calculus, that puts a ceiling on the level of complexity and detail within that algorithm, and (2) my reading of the literature makes me think that we already understand the algorithm at least vaguely, and the details are starting to crystallize into view on the horizon ... although I freely acknowledge that this might just be the Dunning-Kruger talking. :-)

That's super fascinating. I've dabbled a bit in all of those parts of your picture, and seeing them put together like this feels really illuminating. I wish some predictive coding researcher would be so kind as to give it a look; maybe somebody here knows someone?

While reading, I was a bit confused about the set of generative models or hypotheses. Do you have an example of what this could concretely look like? For example, when somebody tosses me an apple, are there separate generative models for different velocities and weights, or one generative model with an uncertainty distribution over those quantities? In the latter case, one would expect another updating process acting "within" each generative model, right?
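The two options in that question can also be combined. A minimal sketch, assuming (purely for illustration, not as a claim about the neocortex or about Steven's framework) two discrete hypotheses about a toss, each holding its own Gaussian belief over the apple's speed, plus one noisy speed observation. The outer update reweights the hypotheses by how well each predicted the observation; the inner update is a standard conjugate Gaussian update within each hypothesis. All names and numbers are made up:

```python
import math


def gaussian_pdf(x: float, mean: float, var: float) -> float:
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def update_within_model(prior_mean, prior_var, obs, obs_var):
    """Conjugate Gaussian update of one model's belief about the apple's speed."""
    post_var = 1.0 / (1.0 / prior_var + 1.0 / obs_var)
    post_mean = post_var * (prior_mean / prior_var + obs / obs_var)
    return post_mean, post_var


def update_across_models(models, obs, obs_var):
    """Reweight each model by its predictive likelihood, then update it internally."""
    # A model's predictive distribution for the noisy observation is
    # N(mean, var + obs_var): its own uncertainty plus sensor noise.
    scores = {
        name: m["weight"] * gaussian_pdf(obs, m["mean"], m["var"] + obs_var)
        for name, m in models.items()
    }
    total = sum(scores.values())
    posterior = {}
    for name, m in models.items():
        mean, var = update_within_model(m["mean"], m["var"], obs, obs_var)
        posterior[name] = {"weight": scores[name] / total, "mean": mean, "var": var}
    return posterior


# Hypothetical hypotheses about a toss; speeds in m/s, variances in (m/s)^2.
models = {
    "gentle toss": {"weight": 0.5, "mean": 2.0, "var": 0.25},
    "hard throw": {"weight": 0.5, "mean": 8.0, "var": 1.0},
}
posterior = update_across_models(models, obs=2.5, obs_var=0.25)
for name, belief in posterior.items():
    print(name, round(belief["weight"], 4), round(belief["mean"], 3))
```

The sketch only shows how the two levels of updating can compose; it says nothing about which structure the brain actually uses.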

Steven Byrnes (3y)
Thanks! Yeah, I haven't had the time or energy to start cold-emailing predictive coding experts etc. Well, I tweet this article at people now and then :-P Also, I'm still learning, the picture is in flux, and in particular I still can't really put myself in the head of Friston, Clark, etc. so as to write a version of this that's in their language and speaks to their perspective. I put more at My Computational Framework for the Brain, although you'll notice that I didn't talk about where the generative models come from or their exact structure (which is not entirely known anyway). Three examples I often think about would be: the Dileep George vision model, the active dendrite / cloned HMM sequence learning story (biological implementation by Jeff Hawkins, algorithmic implementation by Dileep George) (note that neither of these has reward), and maybe (well, it's not that concrete) also my little story about moving your toe. I would say that the generative models are a consortium of thousands of glued-together mini-generative-models, maybe even as many as one model per cortical column, which are self-consistent in that they're not issuing mutually contradictory predictions (often because any given mini-model simply abstains from making predictions about most things). Some of the mini-model pieces stick around a while, while other pieces get thrown out and replaced constantly, many times per second, either in response to new sensory data or just because the models themselves have time-dependence. Like if someone tosses you an apple, there's a set of models (say, in language and object-recognition areas) that really just mean "this is an apple" and they're active the whole time, while there are other models (say, in a sensory-motor area) that say "I will reach out in a certain way and catch the apple and it will feel like this when it touches my hand", and some subcomponents of the latter one keep getting edited or replaced as you watch the apple and update your bel

That was really interesting! :)

Your idea of subcortical spider detection reminded me of this post by Kaj Sotala, which discusses the argument that it's more about "peripheral" attentional mechanisms having evolved to attend to spiders etc., and spiders etc. consequently being more easily learned as dangerous.

These results suggest that fear of snakes and other fear-relevant stimuli is learned via the same central mechanisms as fear of arbitrary stimuli. Nevertheless, if that is correct, why do phobias so often relate to objects encountered by our ancestors, such as snakes and sp

...

From a Nature news article last week:

One study [7] of 143 people with COVID-19 discharged from a hospital in Rome found that 53% had reported fatigue and 43% had shortness of breath an average of 2 months after their symptoms started. A study of patients in China showed that 25% had abnormal lung function after 3 months, and that 16% were still fatigued [8].

I haven't read the Stanovich papers you refer to, but in his book "Rationality and the Reflective Mind" he proposes a separation of Type 2 processing into 1) serial associative cognition with a focal bias and 2) fully decoupled simulations of alternative hypotheses. (Just noting it because I found it useful for my own thinking.)

In fact, an exhaustive simulation of alternative worlds would guarantee correct responding in the [Wason selection] task. Instead [...] subjects accept the rule as given, assume it is true, and simply describe how
...

I always understood bias to mean systematic deviations from the correct response (as in the bias-variance decomposition [1]), e.g. a bias toward overconfidence, or the bias of being anchored to arbitrary numbers. I read your and Evans' interpretation of it more as bias meaning "incorrect in some areas". As Type 2 processing seems to be very flexible and unconstrained, I thought that it might not necessarily be biased, but simply unconstrained and high-variance enough to cause plenty of errors in many domains.
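The distinction can be made concrete with the bias-variance decomposition, where mean squared error equals squared bias plus variance. A minimal sketch with two hypothetical toy judges (made-up numbers, not a model of Type 2 processing): one is systematically pulled toward an arbitrary anchor, the other has no systematic pull but a wide spread. The unbiased judge still ends up with the larger average error:

```python
import random
import statistics

TRUE_VALUE = 10.0  # hypothetical correct answer to some estimation question
ANCHOR = 4.0       # hypothetical arbitrary number the first judge is anchored to


def anchored_judge(rng: random.Random) -> float:
    # Systematic deviation: every answer is pulled 30% of the way toward the anchor.
    return 0.7 * TRUE_VALUE + 0.3 * ANCHOR + rng.gauss(0, 0.5)


def unconstrained_judge(rng: random.Random) -> float:
    # No systematic pull, but a wide spread of answers.
    return TRUE_VALUE + rng.gauss(0, 3.0)


def decompose(judge, n: int = 50_000, seed: int = 0):
    """Return (bias, variance, mean squared error) of a judge's answers."""
    rng = random.Random(seed)
    answers = [judge(rng) for _ in range(n)]
    mean = statistics.fmean(answers)
    bias = mean - TRUE_VALUE
    variance = statistics.pvariance(answers, mu=mean)
    mse = statistics.fmean((a - TRUE_VALUE) ** 2 for a in answers)
    return bias, variance, mse  # mse equals bias**2 + variance, up to rounding


for judge in (anchored_judge, unconstrained_judge):
    bias, variance, mse = decompose(judge)
    print(f"{judge.__name__}: bias={bias:.2f} variance={variance:.2f} mse={mse:.2f}")
```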


PS: Thanks for your writing, I really enjoy it a lot.

For the past month, I've been doing a sort of productivity gamification: I rate my mornings on a 1-5 scale with regard to

1) time spent doing something useful and

2) degree of distraction.

Plus if I get out of bed immediately after waking up, I get a plus point.

For every point that I don't achieve on these scales, I pay 50 cents to a charity.

A random morning of mine:

1) time was well spent, I started working early and kept at it until lunch -> 5/5

2) I had some problems focussing while reading -> 3/5

+1 because I got out of bed immediately

The major noticeable imp...
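The scoring rule can be written out as a small function. A minimal sketch; one rule is an assumption not stated in the post: the get-out-of-bed plus point cancels one missed point, and the total never goes below zero. Under that assumption, the random morning above (5/5, 3/5, out of bed immediately) costs 50 cents:

```python
PENALTY_PER_MISSED_POINT = 0.50  # 50 cents to charity per missed point, as in the post
MAX_SCORE = 5


def morning_penalty(useful: int, focus: int, up_immediately: bool) -> float:
    """Charity payment for one morning, given two 1-5 ratings and the bed bonus."""
    for score in (useful, focus):
        if not 1 <= score <= MAX_SCORE:
            raise ValueError("ratings are on a 1-5 scale")
    missed = (MAX_SCORE - useful) + (MAX_SCORE - focus)
    if up_immediately:
        # Assumption: the plus point cancels one missed point, never going below zero.
        missed = max(0, missed - 1)
    return missed * PENALTY_PER_MISSED_POINT


print(morning_penalty(useful=5, focus=3, up_immediately=True))  # 0.5
```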