Cameron Berg

SERI MATS '21, Cognitive science @ Yale '22, Meta AI Resident '23, LTFF grantee. Currently doing prosocial alignment research @ AE Studio. Very interested in work at the intersection of AI x cognitive science x alignment x philosophy.

Sequences

Paradigm-Building for AGI Safety Research

Posts

63 · AE Studio @ SXSW: We need more AI consciousness research (and further resources)

1mo

71 · Survey for alignment researchers!

3mo

163 · The 'Neglected Approaches' Approach: AE Studio's Alignment Agenda

4mo

28 · Computational signatures of psychopathy

37 · AI researchers announce NeuroAI agenda

42 · Alignment via prosocial brain algorithms

2 · Paradigm-building: Conclusion and practical takeaways

5 · Question 5: The timeline hyperparameter

6 · Question 4: Implementing the control proposals

5 · Question 3: Control proposals for minimizing bad outcomes

Comments

AE Studio @ SXSW: We need more AI consciousness research (and further resources)

Cameron Berg · 1mo · 30

Thanks for the comment!

"Consciousness does not have a commonly agreed upon definition. The question of whether an AI is conscious cannot be answered until you choose a precise definition of consciousness, at which point the question falls out of the realm of philosophy into standard science."

Agree. Also happen to think that there are basic conflations/confusions that tend to go on in these conversations (eg, self-consciousness vs. consciousness) that make the task of defining what we mean by consciousness more arduous and confusing than it likely needs to be (which isn't to say that defining consciousness is easy). I would analogize consciousness to intelligence in terms of its difficulty to nail down precisely, but I don't think there is anything philosophically special about consciousness that inherently eludes modeling.

"is there some secret sauce that makes the algorithm [that underpins consciousness] special and different from all currently known algorithms, such that if we understood it we would suddenly feel enlightened? I doubt it. I expect we will just find a big pile of heuristics and optimization procedures that are fundamentally familiar to computer science."

Largely agree with this too. It very well may be the case (as now seems to be obviously true of intelligence) that there is no one 'master' algorithm that underlies the whole phenomenon, but rather, as you say, a big pile of smaller procedures, heuristics, etc. So be it: we definitely want to better understand (for reasons explained in the post) what set of potentially-individually-unimpressive algorithms, when run in concert, gives you a system that is conscious.

So, to your point, there is not necessarily any one 'deep secret' to uncover that will crack the mystery (though we think, eg, Graziano's AST might be a strong candidate solution for at least part of this mystery), but I would still think that (1) it is worthwhile to attempt to model the functional role of consciousness, and that (2) whether we actually have better or worse models of consciousness matters tremendously.

Survey for alignment researchers!

Cameron Berg · 2mo · 10

There will be places on the form to indicate exactly this sort of information :) we'd encourage anyone who is associated with alignment to take the survey.

Survey for alignment researchers!

Cameron Berg · 3mo · 10

Thanks for taking the survey! When we estimated how long it would take, we didn't count how long it would take to answer the optional open-ended questions, because we figured that those who are sufficiently time constrained that they would actually care a lot about the time estimate would not spend the additional time writing in responses.

In general, the survey does seem to take respondents approximately 10-20 minutes to complete. As noted in another comment below,

"this still works out to donating $120-240/researcher-hour to high-impact alignment orgs (plus whatever the value is of the comparison of one's individual results to those of the community), which hopefully is worth the time investment :)"
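As a quick sanity check on the quoted figure, here is a minimal sketch of the arithmetic. The per-response donation amount is not stated in this excerpt; the function names are hypothetical, and the dollar value computed below is only what the quoted $120-240/hour range implies when combined with the 10-20 minute completion time.

```python
def implied_donation(rate_per_hour: float, minutes: float) -> float:
    """Donation per completed survey implied by an hourly rate and a completion time."""
    return rate_per_hour * minutes / 60.0


def hourly_rate(donation: float, minutes: float) -> float:
    """Effective donation rate per researcher-hour for a fixed per-survey donation."""
    return donation * 60.0 / minutes


# Both ends of the quoted range should imply the same per-survey amount
# if the donation is a fixed sum per response (an assumption, not a stated fact).
slow = implied_donation(rate_per_hour=120, minutes=20)
fast = implied_donation(rate_per_hour=240, minutes=10)
print(slow, fast)

# Effective rate for a respondent who takes 15 minutes, under the same assumption.
print(hourly_rate(donation=slow, minutes=15))
```

If the two implied amounts agree, the quoted range is internally consistent with a fixed donation per completed survey; spending longer on the optional open-ended questions lowers the effective hourly rate proportionally.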

Survey for alignment researchers!

Cameron Berg · 3mo · 10

Ideally within the next month or so. There are a few other control populations still left to sample, as well as actually doing all of the analysis.

Survey for alignment researchers!

Cameron Berg · 3mo · 20

Thanks for sharing this! Will definitely take a look at this in the context of what we find and see if we are capturing any similar sentiment.

The 'Neglected Approaches' Approach: AE Studio's Alignment Agenda

Cameron Berg · 4mo · 20

Thanks for calling this out—we're definitely open to discussing potential opportunities for collaboration/engaging with the platform!

The 'Neglected Approaches' Approach: AE Studio's Alignment Agenda

Cameron Berg · 4mo · 30

It's a great point that the broader social and economic implications of BCI extend beyond the control of any single company, AE no doubt included. Still, while bandwidth and noisiness of the tech are potentially orthogonal to one's intentions, companies with unambiguous humanity-forward missions (like AE) are far more likely to actually care about the societal implications, and therefore, to build BCI that attempts to address these concerns at the ground level.

In general, we expect the by-default path to powerful BCI (i.e., one where we are completely uninvolved) to be negative/rife with s-risks/significant invasions of privacy and autonomy, etc, which is why we are actively working to nudge the developmental trajectory of BCI in a more positive direction, i.e., one where the only major incentive is to build the most human-flourishing-conducive BCI tech we possibly can.

The 'Neglected Approaches' Approach: AE Studio's Alignment Agenda

Cameron Berg · 4mo · 84

With respect to the RLNF idea, we are definitely very sympathetic to wireheading concerns. We think the approach is promising if the sub-symbolic information that neural signals offer lets us obtain better reward signals and better understand human intent, but, as you correctly pointed out, the same information can also be used to better trick the human evaluator. We think this already happens to a lesser extent, and we expect that both current and future methods will have to account for this particular risk.

More generally, we strongly agree that building out BCI is like a tightrope walk. Our original theory of change explicitly focuses on this: in expectation, BCI is not going to be built safely by the giant tech companies of the world, largely given short-term profit-related incentives, which is why we want to build it ourselves as a bootstrapped company whose revenue has come from things other than BCI. Accordingly, we can focus on walking this BCI developmental tightrope safely and for the benefit of humanity without worrying about whether we profit from this work.

We do call some of these concerns out in the post, eg:

We also recognize that many of these proposals have a double-edged sword quality that requires extremely careful consideration—e.g., building BCI that makes humans more competent could also make bad actors more competent, give AI systems manipulation-conducive information about the processes of our cognition that we don’t even know, and so on. We take these risks very seriously and think that any well-defined alignment agenda must also put forward a convincing plan for avoiding them (with full knowledge of the fact that if they can’t be avoided, they are not viable directions.)

Overall—in spite of the double-edged nature of alignment work potentially facilitating capabilities breakthroughs—we think it is critical to avoid base rate neglect in acknowledging how unbelievably aggressively people (who are generally alignment-ambivalent) are now pushing forward capabilities work. Against this base rate, we suspect our contributions to inadvertently pushing forward capabilities will be relatively negligible. This does not imply that we shouldn't be extremely cautious, have rigorous info/exfohazard standards, think carefully about unintended consequences, etc—it just means that we want to be pragmatic about the fact that we can help solve alignment while being reasonably confident that the overall expected value of this work will outweigh the overall expected harm (again, especially given the incredibly high, already-happening background rate of alignment-ambivalent capabilities progress).

The 'Neglected Approaches' Approach: AE Studio's Alignment Agenda

Cameron Berg · 4mo · 62

Thanks for your comment! I think we can simultaneously (1) strongly agree with the premise that in order for AGI to go well (or at the very least, not catastrophically poorly), society needs to adopt a multidisciplinary, multipolar approach that takes into account broader civilizational risks and pitfalls, and (2) have fairly high confidence that, within the space of all possible useful things to do within this broader scope, the list of neglected approaches we present above does a reasonable job of documenting some of the places where we specifically think AE has comparative advantage/the potential to strongly contribute over relatively short time horizons. So, to directly answer:

Is this a deliberate choice of narrowing your direct, object-level technical work to alignment (because you think this is where the predispositions of your team are?), or a disagreement with more systemic views on "what we should work on to reduce the AI risks?"

It is something far more like a deliberate choice than a systemic disagreement. We are also very interested in and open to broader models of how control theory, game theory, information security, etc have consequences for alignment (e.g., see ideas 6 and 10 for examples of nontechnical things we think we could likely help with). To the degree that these sorts of things can be thought of as further neglected approaches, we may indeed agree that they are worthwhile for us to consider pursuing, or at least to help facilitate others' pursuits, with the comparative advantage caveat stated previously.

Consider working more hours and taking more stimulants

Cameron Berg · 1y · 73

I'm definitely sympathetic to the general argument here as I understand it: something like, it is better to be more productive when what you're working towards has high EV, and stimulants are one underutilized strategy for being more productive. But I have concerns about the generality of your conclusion: (1) Blanket-endorsing or otherwise equating the advantages and disadvantages of all of the things on the y-axis of that plot is painting with too broad a brush. They vary, eg, in addictive potential, demonstrated medical benefit, cost of maintenance, etc. (2) Relatedly, some of these drugs (e.g., Adderall) alter the dopaminergic calibration in the brain, which can lead to significant personality/epistemology changes, typically as a result of modulating people's risk-taking/reward-seeking trade-offs. Similar dopamine agonist drugs used to treat Parkinson's have led to pathological gambling behaviors in some patients who took them. There is an argument to be made, for at least some subset of these substances, that the trouble induced by these kinds of personality changes may plausibly outweigh the productivity gains of taking the drugs in the first place.