All of Cameron Berg's Comments + Replies

Thanks for taking the survey! When we estimated how long it would take, we didn't count the optional open-ended questions, figuring that respondents time-constrained enough to care about the estimate would not spend the additional time writing in responses.

In general, the survey does seem to take respondents approximately 10-20 minutes to complete. As noted in another comment below,

this still works out to donating $120-240/researcher-hour to high-impact alignment orgs (plus whatever value there is in comparing one's individual results to those of the community), which hopefully is worth the time investment :)

Ideally within the next month or so. There are a few other control populations still left to sample, as well as actually doing all of the analysis.

Thanks for sharing this! Will definitely take a look at this in the context of what we find and see if we are capturing any similar sentiment.

Thanks for calling this out—we're definitely open to discussing potential opportunities for collaboration/engaging with the platform!

It's a great point that the broader social and economic implications of BCI extend beyond the control of any single company, AE no doubt included. Still, while bandwidth and noisiness of the tech are potentially orthogonal to one's intentions, companies with unambiguous humanity-forward missions (like AE) are far more likely to actually care about the societal implications, and therefore, to build BCI that attempts to address these concerns at the ground level.

In general, we expect the by-default path to powerful BCI (i.e., one where we are completely unin... (read more)

With respect to the RLNF idea, we are definitely very sympathetic to wireheading concerns. We think the approach is promising insofar as all of the sub-symbolic information that neural signals offer could yield better reward signals and a better understanding of human intent—but, as you correctly point out, that same information could also be used to better trick the human evaluator. We think this already happens to a lesser extent, and we expect that both current and future methods will have to account for this particular risk.

More generally, we st... (read more)

2 Roman Leventov 2mo
I can push back on this somewhat by noting that most risks from BCI may lie outside the scope of control of any company that builds it and "plugs people in"—rather, they lie in the wider economy and social ecosystem. The only thing that may matter is the bandwidth and noisiness of the information channel between the brain and the digital sphere, and that seems agnostic to whether a profit-maximising, risk-ambivalent, or risk-conscious company is building the BCI.

Thanks for your comment! I think we can simultaneously (1) strongly agree with the premise that in order for AGI to go well (or at the very least, not catastrophically poorly), society needs to adopt a multidisciplinary, multipolar approach that takes into account broader civilizational risks and pitfalls, and (2) have fairly high confidence that within the space of all possible useful things to do within this broader scope, the list of neglected approaches we present above does a reasonable job of documenting some of the places where we specifically th... (read more)

I'm definitely sympathetic to the general argument here as I understand it: something like, it is better to be more productive when what you're working towards has high EV, and stimulants are one underutilized strategy for being more productive. But I have concerns about the generality of your conclusion: (1) blanket-endorsing or otherwise equating the advantages and disadvantages of all of the things on the y-axis of that plot is painting with too broad a brush. They vary, eg, in addictive potential, demonstrated medical benefit, cost of maintenance, etc.... (read more)

27 people holding the view is not a counterexample to the claim that it is becoming less popular.

Still feels worthwhile to emphasize that some of these 27 people are, eg, Chief AI Scientist at Meta, co-director of CIFAR, DeepMind staff researchers, etc. 

These people are major decision-makers in some of the world's leading and most well-resourced AI labs, so we should probably pay attention to where they think AI research should go in the short-term—they are among the people who could actually take it there.


See also this survey of NLP

I assume thi... (read more)

This is already the case—transformer LLMs already predict neural responses of linguistic cortex remarkably well[1][2]. Perhaps not entirely surprising in retrospect, given that they are both trained on overlapping datasets with similar unsupervised prediction objectives.

1. The neural architecture of language: Integrative modeling converges on predictive processing
2. Brains and algorithms partially converge in natural language processing

However, technological development is not a zero-sum game. Opportunities or enthusiasm in neuroscience do not in themselves make prosaic AGI less likely, and I don't feel that any of the provided arguments are knockdown arguments against ANNs leading to prosaic AGI.

Completely agreed! 

I believe there are two distinct arguments at play in the paper and that they are not mutually exclusive. I think the first is "in contrast to the optimism of those outside the field, many front-line AI researchers believe that major new breakthroughs are needed before we ca... (read more)

1 Joseph Bloom 1y
Understood. Maybe if the first argument were more concrete, we could examine its predictions. For example, what fundamental limitations exist in current systems? What should a breakthrough do (at least conceptually) in order to move us into the new paradigm? I think it's reasonable that understanding the brain better may yield insights, but I believe Paul's comment about returns on existing insights diminishing over time. Technologies like DishBrain seem exciting and might change that trend?

Thanks for your comment! 

As far as I can tell the distribution of views in the field of AI is shifting fairly rapidly towards "extrapolation from current systems" (from a low baseline).

I suppose part of the purpose of this post is to point to numerous researchers who serve as counterexamples to this claim—i.e., Yann LeCun, Terry Sejnowski, Yoshua Bengio, Timothy Lillicrap et al seem to disagree with the perspective you're articulating in this comment insofar as they actually endorse the perspective of the paper they've coauthored.

You are obviously a h... (read more)

I'm claiming that "new stuff is needed" has been the dominant view for a long time, but is gradually becoming less and less popular. Inspiration from neuroscience has always been one of the most common flavors of "new stuff is needed." As a relatively recent example, it used to be a prominent part of DeepMind's messaging, though it has been gradually receding (I think because it hasn't been a part of their major results). 27 people holding the view is not a counterexample to the claim that it is becoming less popular.

See also this survey of NLP, where ~17% of participants think that scaling will solve practically any problem. I think that's an unprecedentedly large number (prior to 2018 I'd bet it was <5%). But it still means that 83% of people disagree. 60% of respondents think that non-trivial results in linguistics or cognitive science will inspire one of the top 5 results in 2030—again, I think, down to an unprecedented low, but still a majority.

Did the paper say that NeuroAI is looking increasingly likely? It seems like they are saying that the perspective used to be more popular and has been falling out of favor, so they are advocating for reviving it.

Agreed that there are important subtleties here. In this post, I am really just using the safety-via-debate set-up as a sort of intuitive case for getting us thinking about why we generally seem to trust certain algorithms running in the human brain to adjudicate hard evaluative tasks related to AI safety. I don't mean to be making any especially specific claims about safety-via-debate as a strategy (in part for precisely the reasons you specify in this comment).

Thanks for the comment! I do think that, at present, the only working example we have of an agent able to explicitly self-inspect its own values is the human case, even if getting the base shards 'right' in the prosocial sense would likely entail that they will already be doing self-reflection. Am I misunderstanding your point here?

Thanks Lukas! I just gave your linked comment a read and I broadly agree with what you've written both there and here, especially w.r.t. focusing on the necessary training/evolutionary conditions out of which we might expect to see generally intelligent prosocial agents (like most humans) emerge. This seems like a wonderful topic to explore further IMO. Any other sources you recommend for doing so?

Thanks! Not much has been done on the topic to my knowledge, so I can only recommend this very general post on the EA forum. I think it's the type of cause area where people have to carve out their own research approach to get something started. 

Hi Joe—likewise! This relationship between prosociality and distribution of power in social groups is super interesting to me and not something I've given a lot of thought to yet. My understanding of this critique is that it would predict something like: in a world where there are huge power imbalances, typical prosocial behavior would look less stable/adaptive. This brings to mind for me things like 'generous tit for tat' solutions to prisoner's dilemma scenarios—i.e., where being prosocial/trusting is a bad idea when you're in situations where the social... (read more)

I broadly agree with Viliam's comment above. Regarding Dagon's comment (to which yours is a reply), I think that characterizing my position here as 'people who aren't neurotypical shouldn't be trusted' is basically strawmanning, as I explained in this comment. I explicitly don't think this is correct, nor do I think I imply it is anywhere in this post.  

As for your comment, I definitely agree that there is a distinction to be made between prosocial instincts and the learned behavior that these instincts give rise to over the lifespan, but I would thin... (read more)

Interesting! Definitely agree that if people's specific social histories are largely what qualify them to be 'in the loop,' this would be hard to replicate for the reasons you bring up. However, consider that, for example,

Young neurotypical children (and even chimpanzees!) instinctively help others accomplish their goals when they believe they are having trouble doing so alone...

which almost certainly has nothing to do with their social history. I think there's a solid argument to be made, then, that a lot of these social histories are essentially a l... (read more)

I don't know of anyone advocating using children or chimpanzees as AI supervisors or trainers. The gap from evolved/early-learning behaviors to the "hard part" of human alignment is pretty massive. I don't have any better ideas than human-in-the-loop—I'm somewhat pessimistic about its effectiveness if AI significantly surpasses the humans in prediction/optimization power, but it's certainly worth including in the research agenda.

Agreed that the correlation between the modeling result and the self-report is impressive, with the caveat that the sample size is small enough not to take the specific r-value too seriously. In a quick search, I couldn't find a replication of the same task with a larger sample, but I did find a meta-analysis that includes this task which may be interesting to you! I'll let you know if I find something better as I continue to read through the literature :)

Definitely agree with the thrust of your comment, though I should note that I neither believe nor think I really imply anywhere that 'only neurotypical people are worth societal trust.' I only use the word in this post to gesture at the fact that the vast majority of (but not all) humans share a common set of prosocial instincts—and that these instincts are a product of stuff going on in their brains. In fact, my next post will almost certainly be about one such neuroatypical group: psychopaths!

This passage seems to treat psychopaths as lacking the prosocial circuitry, which makes us uncomfortable entrusting things to them. In the parent comment, psychopaths seem to be members of the group that does instantiate the circuitry, so I am a bit confused. For example, some people could characterise Asperger cognition as "asocial cognition." Picking out one kind of cognition as "moral" easily slips into treating anything that is not it as "immoral." I see the parent comment making a claim analogous in logical structure to "I did not say we distrust women as politicians. I just said that we trust men as politicians."

I liked this post a lot, and I think its title claim is true and important. 

One thing I wanted to understand a bit better is how you're invoking 'paradigms' in this post wrt AI research vs. alignment research. I think we can be certain that AI research and alignment research are not identical programs but that they will conceptually overlap and constrain each other. So when you're talking about 'principles that carry over,' are you talking about principles in alignment research that will remain useful across various breakthroughs in AI research, or ar... (read more)

Good question. Both.

Imagine that we're planning a vacation to Australia. We need to plan flights, hotels, and a rental car. Now someone says "oh, don't forget that we must include some sort of plan for how to get from the airport to the rental car center". And my answer to that would usually be... no, I really don't need to plan out how to get from the airport to the rental car center. That part is usually easy enough that we can deal with it on-the-fly, without having to devote significant attention to it in advance. Just because a sub-step is necessary for a plan's execution does not mean that sub-step needs to be significantly involved in the planning process, or even planned in advance at all.

Setting aside for the moment whether or not that's a good analogy for whether "alignment research can't only be about modeling reality", what are the criteria for whether it's a good analogy? In what worlds would it be a good analogy, and in what worlds would it not be a good analogy?

The key question is: what are the "hard parts" of alignment? What are the rate-limiting steps? What are the steps which, once we solve those, we expect the remaining steps to be much easier? The hard parts are like the flights and hotel. The rest is like getting from the airport to the rental car center: a problem which we expect will be easy enough that we don't need to put much thought into it in advance (and shouldn't bother to plan at all until after we've figured out what flight we're taking). If the hard parts of alignment are all about modeling reality, then alignment research can, in principle, be only about modeling reality.

My own main model for the "hard part" of alignment is in the first half of this video. (I'd been putting off bringing this up in the discussion on your Paradigm-Building posts, because I was waiting for the video to be ready.)

Thanks for your comment! I agree with both of your hesitations and I think I will make the relevant changes to the post: instead of 'totally unenforceable,' I'll say 'seems quite challenging to enforce.' I believe that this is true (and I hope that the broad takeaway from this post is basically the opposite of 'researchers need to stay out of the policy game,' so I'm not too concerned that I'd be incentivizing the wrong behavior). 

To your point, 'logistically and politically inconceivable' is probably similarly overblown.  I will change it to 'hi... (read more)

Very interesting counterexample! I would suspect it gets increasingly sketchy to characterize 1/8th, 1/16th, etc. 'units of knowledge towards AI' as 'breakthroughs' in the way I define the term in the post. 

I take your point that we might get our wires crossed when a given field looks like it's accelerating, but when we zoom in to only look at that field's breakthroughs, we find that they are decelerating. It seems important to watch out for this. Thanks for your comment!

Absolutely. It does—eventually. Which is partially my point: the extrapolation looks sound, until suddenly it isn't.

I think you may be slightly missing my point. Once you hit the point that you no longer consider any recent advances breakthroughs, yes, it becomes obvious that you're decelerating. But until that point, breakthroughs appear to be accelerating. And if you're discretizing into breakthrough / non-breakthrough, you're ignoring all the warning signs that the trend might not continue.

(To return to my previous example: say we currently consider any one step that's >=1/16th of a unit of knowledge a breakthrough, and we're at t=2.4... we had breakthroughs at t=1, 3/2, 11/6, 25/12, 137/60. The rate of breakthroughs is accelerating! And then we hit t=49/20, and no breakthrough. And it either looks like we plateaued, or someone goes 'no, 1/32nd of advancement should be considered a breakthrough' and makes another chart of accelerating breakthroughs.)

(Yes, in this example every discovery is half as much knowledge as the last one, which makes it somewhat obvious that things have changed. A power of 0.5 was just chosen because it makes the math simpler. However, all the same issues occur with a power of e.g. 0.99 instead of 0.5—just more gradually. Which makes the 'no, the last advance should be considered a breakthrough too' argument a whole lot easier to inadvertently accept...)
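The toy model in that example can be checked with a short sketch (assumptions mine, reverse-engineered from the quoted numbers: the k-th advance contributes (1/2)^k units of knowledge and arrives after a gap of 1/(k+1) time units, which reproduces the breakthrough times above):

```python
from fractions import Fraction

THRESHOLD = Fraction(1, 16)  # minimum step size that counts as a "breakthrough"

t = Fraction(0)
breakthrough_times = []
for k in range(7):
    t += Fraction(1, k + 1)      # k-th advance arrives after a gap of 1/(k+1)
    size = Fraction(1, 2 ** k)   # ...and contributes half the knowledge of the last
    if size >= THRESHOLD:
        breakthrough_times.append(t)

print(breakthrough_times)  # 1, 3/2, 11/6, 25/12, 137/60 — then none at t=49/20

# The gaps between successive breakthroughs shrink (1/2, 1/3, 1/4, 1/5),
# so the discretized count looks like acceleration right up until it stops.
gaps = [b - a for a, b in zip(breakthrough_times, breakthrough_times[1:])]
print(gaps)
```

The point of the sketch is just that thresholding a decaying sequence of advances produces an apparently accelerating breakthrough count, with the deceleration invisible until the first sub-threshold advance.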

The question is not "How can John be so sure that zooming into something narrower would only add noise?", the question is "How can Cameron be so sure that zooming into something narrower would yield crucial information without which we have no realistic hope of solving the problem?".

I am not 'so sure'—as I said in the previous comment, I have only claim(ed) it is probably necessary to, for instance, know more about AGI than just whether it is a 'generic strong optimizer.' I would only be comfortable making non-probabilistic claims about the necessity of pa... (read more)

Definitely agree that if we silo ourselves into any rigid plan now, it almost certainly won't work. However, I don't think 'end-to-end agenda' = 'rigid plan.' I certainly don't think this sequence advocates anything like a rigid plan. These are the most general questions I could imagine guiding the field, and I've already noted that I think this should be a dynamic draft. 

...we do not currently possess a strong enough understanding to create an end-to-end agenda which has any hope at all of working; anything which currently claims to be an end-to-end

... (read more)
My comment at the top of this thread detailed my disagreement with that if-then statement, and I do not think any of your responses to my top-level comment actually justified the claim of necessity of the questions. Most of them made the same mistake, which I tried to emphasize in my response. This, for example: The question is not "How can John be so sure that zooming into something narrower would only add noise?", the question is "How can Cameron be so sure that zooming into something narrower would yield crucial information without which we have no realistic hope of solving the problem?". I think this same issue applies to most of the rest of your replies to my original comment.

If it's possible that we could get to a point where AGI is no longer a serious threat without needing to answer the question, then the question is not necessary.

Agreed, this seems like a good definition for rendering anything as 'necessary.' 

Our goal: minimize AGI-induced existential threats (right?). 

My claim is that answering these questions is probably necessary for achieving this goal—i.e., P(achieving goal | failing to think about one or more of these questions) ≈ 0. (I say, "I am claiming that a research agenda that neglects these questions... (read more)

I think restricting oneself to end-to-end agendas is itself a mistake. One principle of e.g. the MIRI agenda is that we do not currently possess a strong enough understanding to create an end-to-end agenda which has any hope at all of working; anything which currently claims to be an end-to-end agenda is probably just ignoring the hard parts of the problem. (The Rocket Alignment Problem gives a good explanation of this view.) I do think that finding necessary subquestions, or noticing that a given subquestion may not be necessary, is much easier than figuring out an end-to-end agenda. One can notice that e.g. an architecture-agnostic alignment strategy seems plausible (or arguably even necessary!) without figuring out all the steps of an end-to-end strategy.

Thanks for taking the time to write up your thoughts! I appreciate your skepticism. Needless to say, I don't agree with most of what you've written—I'd be very curious to hear if you think I'm missing something:

[We] don't expect that the alignment problem itself is highly-architecture dependent; it's a fairly generic property of strong optimization. So, "generic strong optimization" looks like roughly the right level of generality at which to understand alignment...Trying to zoom in on something narrower than that would add a bunch of extra constraints whi

... (read more)
I mean, I don't actually need to defend the assertion all that much. Your core claim is that these questions are necessary, and therefore the burden is on you to argue not only that zooming in on something narrower might not just add noise, but that zooming in on something narrower will not just add noise. If it's possible that we could get to a point where AGI is no longer a serious threat without needing to answer the question, then the question is not necessary.

Also, regarding the Afghan hound example, I'd guess (without having read anything about the subject) that training Afghan hounds does not actually involve qualitatively different methods than training other dogs; they just need more of the same training and/or perform less well with the same level of training. Not that that's particularly central. The more important part is that I do not need to be confident that "different possible AGIs could not follow this same pattern"; you've taken upon yourself the burden of arguing that different possible AGIs must follow this pattern, otherwise question 1 might not be necessary.

That is basically what I mean, yes. I strongly recommend the Yudkowsky piece.

Remember that if you want to argue necessity of the question, then it's not enough for these inputs to be relevant to the outcome of AGI; you need to argue that the question must be answered in order for AGI to go well. Just because some factors are relevant to the outcome does not mean that we must know those factors in advance in order to robustly achieve a good outcome.

Remember that if you want to argue necessity of the question, it is not enough for you to think that the probabilities fluctuate; you need a positive argument that the probabilities must fluctuate across the spectrum, by enough that the question must be addressed. I think most of the strategies in MIRI's general cluster do not depend on most of these questions.

Hey Robert—thanks for your comment!

it seems very clear that we should update that structure to the best of our ability as we make progress in understanding the challenges and potentials of different approaches. 

Definitely agree—I hope this sequence is read as something much more like a dynamic draft of a theoretical framework than my Permanent Thoughts on Paradigms for AGI Safety™.

"Aiming at good outcomes while/and avoiding bad outcomes" captures more conceptual territory, while still allowing for the investigation to turn out that avoiding bad outcom

... (read more)

Thanks for your comment—I entirely agree with this. In fact, most of the content of this sequence represents an effort to spell out these generalizations. (I note later that, e.g., the combinatorics of specifying every control proposal to deal with every conceivable bad outcome from every learning architecture is obviously intractable for a single report; this is a "field-sized" undertaking.) 

I don't think this is a violation of the hierarchy, however. It seems coherent to both claim (a) given the field's goal, AGI safety research should follow a gene... (read more)

Hi Tekhne—this post introduces each of the five questions I will put forward and analyze in this sequence. I will be posting one a day for the next week or so. I think I will answer all of your questions in the coming posts.

I doubt that carving up the space in this—or any—way would be totally uncontroversial (there are lots of value judgments necessary to do such a thing), but I think this concern only serves to demonstrate that this framework is not self-justifying (i.e., there is still lots of clarifying work to be done for each of these questions). I ag... (read more)

I agree with this. By 'special class,' I didn't mean that AI safety has some sort of privileged position as an existential risk (though this may also happen to be true)—I only meant that it is unique. I think I will edit the post to use the word "particular" instead of "special" to make this come across more clearly.

I think this is an incredibly interesting point. 

I would just note, for instance, in the (crazy cool) fungus-and-ants case, this is a transient state of control that ends shortly thereafter in the death of the smarter, controlled agent. For AGI alignment, we're presumably looking for a much more stable and long-term form of control, which might mean that these cases are not exactly the right proofs of concept. They demonstrate, to your point, that "[agents] can be aligned with the goals of someone much stupider than themselves," but not necessarily th... (read more)

Glad you understood me. Sorry for my English! Of course, the following examples do not by themselves prove that the entire problem of AGI alignment can be solved! But it seems to me that this direction is interesting and strongly underestimated. At the least, someone smarter than me can look at this idea and say that it is bullshit. This is partly a source of intuition for me that the creation of an aligned superintelligence is possible—and maybe not even as hard as it seems. We have many examples of creatures that follow the goals of someone more stupid, and the mechanism responsible for this should not be very complex. If a process as stupid as natural selection was able to create the capabilities mentioned, it must be achievable for us.

If we expect to gain something from studying how humans implement these processes, it'd have to be something like ensuring that our AIs understand them “in the same way that humans do,” which e.g. might help our AIs generalize in a similar way to humans.

I take your point that there is probably nothing special about the specific way(s) that humans get good at predicting other humans. I do think that "help[ing] our AIs generalize in a similar way to humans" might be important for safety (e.g., we probably don't want an AGI that figures out its programmers wa... (read more)

Thank you! 

I don't think I claimed that the brain is a totally aligned general intelligence, and if I did, I take it back! For now, I'll stand by what I said here: "if we comprehensively understood how the human brain works at the algorithmic level, then necessarily embedded in this understanding should be some recipe for a generally intelligent system at least as aligned to our values as the typical human brain." This seems harmonious with what I take your point to be: that the human brain is not a totally aligned general intelligence. I second Steve... (read more)

2 Charlie Steiner 2y
Ah, you're too late to look forward to it, it's already published :P 

Thank you! I think these are all good/important points. 

In regards to functional specialization between the hemispheres, I think whether this difference is at the same level as mid-insular cortex vs posterior insular cortex would depend on whether the hemispheric differences can account for certain lower-order distinctions of this sort or not. For example, let's say that there are relevant functional differences between left ACC and right ACC, left vmPFC and right vmPFC, and left insular cortex and right insular cortex—and that these differences all h... (read more)

2 Steven Byrnes 2y
The link says "high-functioning adults with ASD…can easily pass the false belief task when explicitly asked to". So there you go! Perfectly good ToM, right?

The paper also says they "do not show spontaneous false belief attribution". But if you look at Figure 3, they "fail" the test by looking equally at the incorrect window and correct window, not by looking disproportionately at the incorrect window. So I would suggest that the most likely explanation is not that the ASD adults are screwing up the ToM task, but rather that they're taking no interest in the ToM task! Remember, the subjects were never asked to pay any attention to the person! Maybe they just didn't! So I say this is a case of motivation, not capability. Maybe they were sitting there during the test, thinking to themselves "Gee, that's a neat diorama, I wonder how the experimenters glued it together!" :-P

That would also be consistent with the eye-tracking results mentioned in the book excerpt here. (I recall also a Temple Grandin anecdote (I can't immediately find it) about getting fMRI'd, and she said she basically ignored the movie she was nominally supposed to be looking at, because she was so interested in some aspect of how the scientists had set up the experiment.) Anyway, the paper you link doesn't report (AFAICT) what fraction of the time the subjects are looking at neither window—they effectively just throw those trials away, I think—which to me seems like discarding the most interesting data!

I think you misunderstood me here. I'm suggesting that maybe:

* ToM ≈ IRL ≈ building a good generative model that explains observations of humans
* "understanding car engines" ≈ building a good generative model that explains observations of car engines.

I guess you're assuming that a good generative model of a mind must contain special ingredients that a good generative model of a car engine does not need? I don't currently think that. Well, more specifically, I think "the particular general-purp