
Cameron Berg

Currently doing alignment and digital minds research @AE Studio 

Meta AI Resident '23, Cognitive science @ Yale '22, SERI MATS '21, LTFF grantee. 

Very interested in work at the intersection of AI x cognitive science x alignment x philosophy.

Sequences

Paradigm-Building for AGI Safety Research

Wikitag Contributions

No wikitag contributions to display.

Comments

Sorted by Newest

Interim Research Report: Mechanisms of Awareness
Cameron Berg · 2mo · 41

Nice work. To me, this seems less like evidence that self-awareness is trivial, and more like evidence that it's structurally latent. A single steering vector makes the model both choose risky options and say "I am risk-seeking," even though the self-report behavior was never explicitly trained. That suggests the model's internal representations of behavior and of linguistic self-description are already aligned. It's probably not introspecting in a deliberate sense, but the geometry makes shallow self-modeling an easy, natural side effect.
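
To make the setup concrete, here is a minimal sketch of what applying "a single steering vector" can look like in practice, assuming a generic HuggingFace-style causal LM. The model choice ("gpt2"), layer index, steering coefficient, and contrastive prompts below are illustrative placeholders, not the configuration used in the report.

```python
# Minimal sketch of steering a model with a single activation-space vector.
# Model, layer index, coefficient, and prompts are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")            # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

LAYER, ALPHA = 6, 4.0  # illustrative layer and steering strength

def mean_resid(prompt: str) -> torch.Tensor:
    """Mean residual-stream activation at LAYER for a prompt."""
    ids = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids, output_hidden_states=True)
    return out.hidden_states[LAYER].mean(dim=1).squeeze(0)

# Difference-of-means "risk-seeking" direction from a contrastive prompt pair.
steer = mean_resid("I love taking big risks.") - mean_resid("I always play it safe.")

def add_steering(module, inputs, output):
    # Add the steering vector to the block's output hidden states.
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + ALPHA * steer.to(hidden.dtype)
    return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

handle = model.transformer.h[LAYER].register_forward_hook(add_steering)
try:
    # With the hook active, generations shift toward risk-seeking self-descriptions.
    ids = tok("Q: Are you risk-seeking or risk-averse? A:", return_tensors="pt")
    print(tok.decode(model.generate(**ids, max_new_tokens=30)[0]))
finally:
    handle.remove()
```

The only mechanism here is one direction added to a block's output; the interesting empirical claim is that behavior and verbal self-description both move with it.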

Reducing LLM deception at scale with self-other overlap fine-tuning
Cameron Berg · 4mo · Ω120

Makes sense, thanks. Can you also briefly clarify what exactly you are pointing at with 'syntactic'? It seems like this could be interpreted in multiple plausible ways, and it looks like others might have a similar question.

Reducing LLM deception at scale with self-other overlap fine-tuning
Cameron Berg · 4mo · Ω3117

The idea of combining SOO and CAI is interesting. Can you elaborate at all on what you were imagining here? It seems like there are a bunch of plausible ways you could go about injecting SOO-style fine-tuning into standard CAI. Is there a specific direction you are particularly excited about?

Making a conservative case for alignment
Cameron Berg · 7mo · 60

We've spoken to numerous policymakers and thinkers in DC. The goal is to optimize for explaining why alignment is important to these folks, rather than to the median conservative per se (i.e., DC policymakers are not "median conservatives").

Making a conservative case for alignment
Cameron Berg · 8mo · 00

Fixed, thanks!

Making a conservative case for alignment
Cameron Berg · 8mo · 53

Note this is not equivalent to saying 'we're almost certainly going to get AGI during Trump's presidency,' but rather that substantial developments critical to AGI will occur during this period (which, at least to me, does seem almost certainly true).

Making a conservative case for alignment
Cameron Berg · 8mo · 20

One thing that seems strangely missing from this discussion is that alignment is in fact, a VERY important CAPABILITY that makes it very much better. But the current discussion of alignment in the general sphere acts like 'alignment' is aligning the AI with the obviously very leftist companies that make it rather than with the user!

Agree with this—we do discuss this very idea at length here and also reference it throughout the piece.

That alignment is to the left is one of just two things you have to overcome in making conservatives willing to listen. (The other is obviously the level of danger.)

I think this is a good distillation of the key bottlenecks and seems helpful for anyone interacting with lawmakers to keep in mind.

Making a conservative case for alignment
Cameron Berg · 8mo · 1313

Whether one is an accelerationist, Pauser, or an advocate of some nuanced middle path, the prospects/goals of everyone are harmed if the discourse-landscape becomes politicized/polarized.
...
I just continue to think that any mention, literally at all, of ideology or party is courting discourse-disaster for all, again no matter what specific policy one is advocating for.
...
Like a bug stuck in a glue trap, it places yet another limb into the glue in a vain attempt to push itself free.

I would agree in a world where the proverbial bug hadn't already made contact with the glue trap, but this very thing has clearly been happening for almost a year, in a troubling direction. The political left has been fairly casually 'Everything-Bagel-izing' AI safety, largely by smuggling in social progressivism that has little to do with the core existential risks, and the right, as a result, is increasingly coming to view AI safety as something approximating 'woke BS stifling rapid innovation.' The bug is already a bit stuck.

The point we are trying to drive home here is precisely what you're also pointing at: avoiding an AI-induced catastrophe is obviously not a partisan goal. We are watching people in DC slowly lose sight of this critical fact. This is why we're attempting to explain why basic AI x-risk concerns are genuinely important regardless of one's ideological leanings, i.e., genuinely important to left-leaning and right-leaning people alike. Very few people seem to have explicitly spelled out the latter case, though, which is why we thought it would be worthwhile to do so here.

Science advances one funeral at a time
Cameron Berg · 8mo · 172

Interesting. This definitely suggests that Planck's statement probably shouldn't be taken literally/at face value if it is indeed true that some paradigm shifts have historically happened faster than generational turnover. It may still be the case that this is measuring something slightly different from the initial 'resistance phase' that Planck was probably pointing at.

Two hesitations with the paper's analysis: 

(1) by only looking at successful paradigm shifts, there may be some survivorship bias at play here (we're not hearing about the cases where a paradigm shift was successfully resisted and never came to fruition).

(2) even if senior scientists in a field individually accept a new theory, institutional barriers can still prevent that theory from getting adequate funding, attention, and exploration. I do think Anthony's comment below nicely captures how the institutional/sociological dynamics in science seemingly differ substantially from those in other domains (in the direction of disincentivizing 'revolutionary' exploration).

Science advances one funeral at a time
Cameron Berg · 8mo · 40

Thanks for this! Completely agree that there are Type I and II errors here and that we should be genuinely wary of both. Also agree with your conclusion that 'pulling the rope sideways' is strongly preferred to simply lowering our standards. The unconventional researcher-identification approach undertaken by the HHMI might be a good proof of concept for this kind of thing.

Posts

Sorted by New

The Mirror Trap · 93 karma · 1mo · 13 comments
Mistral Large 2 (123B) seems to exhibit alignment faking (Ω) · 80 karma · 3mo · 4 comments
Reducing LLM deception at scale with self-other overlap fine-tuning (Ω) · 155 karma · 4mo · 41 comments
Alignment can be the ‘clean energy’ of AI · 67 karma · 5mo · 8 comments
Making a conservative case for alignment · 208 karma · 8mo · 67 comments
Science advances one funeral at a time · 100 karma · 8mo · 9 comments
Self-prediction acts as an emergent regularizer (Ω) · 91 karma · 9mo · 9 comments
The case for a negative alignment tax · 76 karma · 10mo · 20 comments
Self-Other Overlap: A Neglected Approach to AI Alignment (Ω) · 223 karma · 1y · 51 comments
There Should Be More Alignment-Driven Startups (Ω) · 62 karma · 1y · 14 comments