Beth Barnes

Alignment researcher. Views are my own and not those of my employer. https://www.barnes.page/

Comments (sorted by newest)

Reasons to sell frontier lab equity to donate now rather than later
Beth Barnes · 20d

Another example, somewhat less cherry-picked: holding a mix of Google, NVIDIA, and TSMC at 100% leverage, with 5.5% interest on the margin loan, gets you roughly 64% annualized returns.
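For concreteness, a minimal sketch of that arithmetic (my own simplification, ignoring rebalancing, taxes, and volatility drag; the unleveraged basket return below is an assumed figure, not from the comment):

```python
# "100% leverage" read as 2x exposure: every $1 of equity controls $2 of
# stock, with the borrowed $1 charged 5.5% interest.
def leveraged_return(underlying_return: float,
                     leverage: float = 1.0,
                     margin_rate: float = 0.055) -> float:
    """One-period return on equity with `leverage` dollars borrowed per dollar of equity."""
    exposure = 1.0 + leverage
    return exposure * underlying_return - leverage * margin_rate

# A hypothetical ~35% annualized return on the unleveraged basket
# (my assumed figure) lands near the quoted ~64%:
print(leveraged_return(0.35))  # ≈ 0.645
```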
 

Reasons to sell frontier lab equity to donate now rather than later
Beth Barnes · 20d

I think another point that's important here is:
Holding (leveraged) exposure to the best public AI stocks does not obviously perform worse than holding lab equity.
E.g. holding NVIDIA [ETA: with 100% leverage at 5.5% interest] had a ~120% annualized return between Jan 2021 and now, meaning it went up by roughly 40x. My impression is that people holding lab equity are not seeing returns that massively outstrip that.

(Various caveats here about cherry-picking and past returns not guaranteeing future returns, but that's somewhat a problem for lab equity as well.)
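A quick consistency check on those two figures (treating "Jan 2021 to now" as roughly 4.7 years, which is my approximation):

```python
# The ~120% annualized figure and the "roughly 40x" figure are consistent
# over a ~4.7-year window (the window length is my assumption).
years = 4.7
annualized = 1.20              # ~120% annualized return, as quoted
multiple = (1 + annualized) ** years
print(round(multiple, 1))      # ≈ 40.7, matching "went up by roughly 40x"
```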

Beth Barnes's Shortform
Beth Barnes · 21d

Budget: We currently run at ~$13m p.a. (~$15m for the next year under modest growth assumptions, quite plausibly $17m+ given the increasingly insane ML job market).

Audacious funding: This ended up being a bit under $16m, and is a commitment across 3 years.

Runway: Depending on spend/growth assumptions, we have between 12 and 16 months of runway (see the rough sketch below). We want to grow at the higher rate, but we might end up bottlenecked on senior hiring. (But that's potentially a problem you can spend money to solve - and it also helps to be able to say "we have funding security and we have budget for you to build out a new team".)

More context on our thinking: The audacious funding was a one-off, and we need to make sure we have a sustainable funding model. My sense is that for "normal" nonprofits, raising >$10m/yr is considered a big lift that would involve multiple full-time fundraisers and a large fraction of org leadership's time, and even then might not succeed. We have the hypothesis that the AI safety ecosystem can support this level of funding (and, more specifically, that funding availability will scale up in parallel with the growth of the AI sector in general), but we want to get some evidence that that's right and build up reasonable runway before we bet too aggressively on it. Our fundraising goal for the end of 2025 is to raise $10M.
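The rough sketch referenced above, to make the dependence of runway on spend assumptions explicit (toy arithmetic only; the cash balance here is a hypothetical placeholder, not METR's actual position):

```python
# Toy runway arithmetic; the cash figure is a made-up placeholder.
def runway_months(cash_on_hand: float, annual_spend: float) -> float:
    """Months of runway at a constant burn rate."""
    return 12 * cash_on_hand / annual_spend

hypothetical_cash = 20e6  # placeholder, chosen only for illustration
for annual_spend in (15e6, 17e6):  # the spend scenarios mentioned above
    print(f"${annual_spend/1e6:.0f}m/yr -> {runway_months(hypothetical_cash, annual_spend):.1f} months")
# $15m/yr -> 16.0 months
# $17m/yr -> 14.1 months
```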

Beth Barnes's Shortform
Beth Barnes · 22d

FYI: METR is actively fundraising! 

METR is a non-profit research organization. We prioritise independence and trustworthiness, which shapes both our research process and our funding options. To date, we have not accepted payment from frontier AI labs for running evaluations.[1] 

Part of METR's role is to independently assess the arguments that frontier AI labs put forward about the safety of their models. These arguments are becoming increasingly complex and dependent on nuances of how models are trained and how mitigations were developed.

For this reason, it's important that METR has its finger on the pulse of frontier AI safety research. This means hiring and paying for staff who might otherwise work at frontier AI labs, which requires us to compete with labs directly for talent.

The central constraint on our publishing more and better research, and on scaling up our work aimed at monitoring the AI industry for catastrophic risk, is growing our team with excellent new researchers and engineers.

And our recruiting is, to some degree, constrained by our fundraising - especially given the skyrocketing comp that AI companies are offering.

To donate to METR, click here: https://metr.org/donate

If you'd like to discuss giving with us first, or to receive more information about our work for the purpose of informing a donation, reach out to giving@metr.org.

  1. ^ However, we are definitely not immune from conflicting incentives. Some examples:
     - We are open to taking donations from individual lab employees (subject to some constraints, e.g. excluding senior decision-makers, constituting <50% of our funding).
     - Labs provide us with free model access for conducting our evaluations, and several labs also provide us ongoing free access for research even if we're not conducting a specific evaluation.

Prover-Estimator Debate: A New Scalable Oversight Protocol
Beth Barnes · 4mo

I can write more later, but here's a relevant doc I wrote as part of a discussion with Geoffrey and others. Maybe the key point from there is that I don't think this protocol solves the examples given in the original post describing obfuscated arguments. But yeah, I was always thinking this was a completeness problem (the original post poses the problem as distinguishing a certain class of honest arguments from dishonest obfuscated arguments - not claiming you can't get soundness by just ignoring any arguments that are plausibly obfuscated).

Prover-Estimator Debate: A New Scalable Oversight Protocol
Beth Barnes · 4mo

Yep, happy to chat!

Prover-Estimator Debate: A New Scalable Oversight Protocol
Beth Barnes · 4mo

Yep. For empirical work I'm in favor of experiments with more informed and well-trained human judges who engage deeply, etc., and of holding a high standard for efficacy (e.g. "did it get the correct answer with very high reliability?") as opposed to "did it outperform a baseline by a statistically significant margin?", where you then end up needing high n and therefore each example needs to be cheap/shallow.
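For illustration of why the significance-test framing pushes n up, a rough power calculation (the accuracy numbers here are my own illustrative choices, not from any actual debate experiment):

```python
# Approximate sample size per arm for a two-sided two-proportion z-test.
from scipy.stats import norm

def n_per_arm(p_baseline: float, p_treatment: float,
              alpha: float = 0.05, power: float = 0.80) -> float:
    """Samples per arm needed to detect p_treatment vs p_baseline."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_power = norm.ppf(power)
    variance = p_baseline * (1 - p_baseline) + p_treatment * (1 - p_treatment)
    return (z_alpha + z_power) ** 2 * variance / (p_baseline - p_treatment) ** 2

# Detecting e.g. 60% vs 50% judge accuracy takes hundreds of debates per arm,
# which is what forces each example to be cheap/shallow:
print(round(n_per_arm(0.50, 0.60)))  # ≈ 385
```

By contrast, a high-reliability criterion can be assessed on a much smaller set of deeper examples.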

Prover-Estimator Debate: A New Scalable Oversight Protocol
Beth Barnes · 4mo

IMO the requirements are a combination of stability and compactness - these trade off against each other, and the important thing is the rate at which you get evidence about which debater is dishonest while exploring the tree.

IIUC, the stability definition used here is pretty strong - it says that the error in the parent is smaller than the largest error across the children. So any argument structure where errors can accumulate (like a conjunctive argument, or a proof which requires all the steps to be correct) is out.
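To make the accumulation point concrete, here's a rough sketch in my own notation (not the paper's exact definitions):

```latex
% Stability (roughly): a parent's error is controlled by its worst child,
\[
  \varepsilon_{\text{parent}} \;\le\; \max_i \, \varepsilon_{\text{child}_i},
\]
% so error does not grow with depth. By contrast, a conjunctive claim
% $C = c_1 \wedge \dots \wedge c_k$ whose steps each fail independently
% with probability $\varepsilon$ has
\[
  \Pr[C \text{ is wrong}] \;=\; 1 - (1-\varepsilon)^k \;\approx\; k\varepsilon
  \quad \text{for small } \varepsilon,
\]
% i.e. the error accumulates linearly in the number of steps $k$,
% which is exactly what the max-over-children bound rules out.
```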

Prover-Estimator Debate: A New Scalable Oversight Protocol
Beth Barnes · 4mo

I was really impressed by the technical work in this paper. Getting to a formalization of the problem setup and the protocol that allows you to prove meaningful things is a big accomplishment. 

However, as the authors mention above, I don't think this should be a substantial update about whether obfuscated arguments are a problem for recursive decomposition approaches to scalable oversight. (I think the discussion in this post is fine, but I think the title of the paper "avoiding obfuscation with prover-estimator debate" is a bit misleading. I believe the authors are going to change this in v2.) 

I'm excited about more empirical work on making debate protocols work in practice. I feel a bit less excited about pure theory work, but I think work that mixes experimentation and theory could be helpful.

I think there are broadly two classes of hope about obfuscated arguments:

(1.) In practice, obfuscated argument problems rarely come up, due to one of:

  1. It's difficult in practice to construct obfuscated arguments for arbitrary propositions.
     • It's definitely plausible to me that the algorithm sketch I gave here for constructing obfuscated arguments for arbitrary propositions doesn't work, given some relatively weak assumptions about the structure of the debaters' knowledge.
  2. For anything the debaters know, there's usually a sufficiently compact and stable honest argument such that the honest debater can win by giving this argument.
     • It seems pretty easy to give counterexamples here - e.g. Paul gives the example of unsupervised translation. But maybe these are relatively rare. It's plausible that the honest debaters in the human experiments I did, where we ran into obfuscation/instability problems, weren't using the right tactics (i.e. they could have constructed more stable arguments that used independent lines of evidence more).

(2.) We can create a protocol that distinguishes between cases where:

  • (not obfuscatable) the debate mirrors the structure of the debaters' understanding, such that they would be able to easily identify which subtree contains a flaw if one were present;
  • (obfuscatable) they don't or wouldn't know which subtree contains the flaw.

This is the way (apart from argument size) in which the primality test example differs from the obfuscated factorization example: the debaters have some high-level mathematical concepts which allow them to quickly determine whether some proposed lemma is correct.

This wouldn't get us to full ELK (because maybe models still know things they have no human-understandable arguments for), but it would at least expand the class of honest arguments that we can trust to include ones that are large and unstable in human-understandable form but where the debaters do have a faster way of identifying which subtree to go down.

Anthropic rewrote its RSP
Beth Barnes · 1y

I'm glad you brought this up, Zac - seems like an important question to get to the bottom of!

METR is somewhat capacity-constrained, and we can't currently commit to e.g. being available on short notice to do thorough evaluations for all the top labs - which is understandably annoying for labs.

Also, we don't want to discourage people from starting competing evaluation or auditing orgs, or otherwise "camp the space".

We also don't want to accidentally safety-wash - that post was written in particular to dispel the idea that "METR has official oversight relationships with all the labs and would tell us if anything really concerning was happening".

All that said, I think labs' willingness to share access/information etc is a bigger bottleneck than METR's capacity or expertise. This is especially true for things that involve less intensive labor from METR (e.g. reviewing a lab's proposed RSP or evaluation protocol and giving feedback, going through a checklist of evaluation best practices, or having an embedded METR employee observing the lab's processes - as opposed to running a full evaluation ourselves).

I think "Anthropic would love to pilot third party evaluations / oversight more but there just isn't anyone who can do anything useful here" would be a pretty misleading characterization to take away, and I think there's substantially more that labs including Anthropic could be doing to support third party evaluations.

If we had a formalized evaluation/auditing relationship with a lab but sometimes evaluations didn't get run due to our capacity, I expect in most cases we and the lab would want to communicate something along the lines of "the lab is doing their part, any missing evaluations are METR's fault and shouldn't be counted against the lab".

Posts (sorted by new)

- Clarifying METR's Auditing Role (1y)
- Introducing METR's Autonomy Evaluation Resources (2y)
- METR is hiring! (2y)
- Bounty: Diverse hard tasks for LLM agents (2y)
- Send us example gnarly bugs (2y)
- Managing risks of our own work (2y)
- ARC Evals new report: Evaluating Language-Model Agents on Realistic Autonomous Tasks (2y)
- More information about the dangerous capability evaluations we did with GPT-4 and Claude (3y)
- Reflection Mechanisms as an Alignment Target - Attitudes on "near-term" AI (3y)
- 'simulator' framing and confusions about LLMs (3y)
- Beth Barnes's Shortform (4y)