maxnadeau

Comments
An alignment safety case sketch based on debate
maxnadeau · 4mo

For more discussion of the hard cases of exploration hacking, readers should see the comments on this post.

The 4-Minute Mile Effect
maxnadeau · 5mo

Typo: should be "Gell-Mann"

Tormenting Gemini 2.5 with the [[[]]][][[]] Puzzle
maxnadeau · 5mo

I figured out the encoding, but I expressed the algorithm for computing the decoding in different language than you did. My algorithm produces equivalent outputs but is substantially uglier. I wanted to leave a note here in case anyone else had the same solution.

 

Alt phrasing of the solution:

Each expression (i.e. a well-formed string of brackets) has a "degree", which is defined as the number of well-formed chunks that the encoding can be broken up into. Some examples: [], [[]], and [-[][]] have degree one; [][], -[][[]], and [][[][]] have degree two; etc.

Here's a special case: the empty string maps to 0, i.e. decode("") = 0

When an encoding has degree one, you take off the outer brackets and do 2^decode(the enclosed expression), defined recursively. So decode([]) = 2^decode("") = 2^0 = 1, decode([[]]) = 2^decode([]) = 2, etc.

Negation works as normal. So decode([-[]]) = 2^decode(-[]) = 2^(-decode([])) = 2^(-1) = 1/2

So now all we have to deal with is expressions with degree >1.

When an expression has degree >1, you compute its decoding as the product of the decoding of the first subexpression and inc(decode(everything after the first subexpression)). I will define the "inc" function shortly.

So decode([][[]]) = decode([]) * inc(decode([[]])) = 1 * inc(2)

decode([[[]]][][[]]) = decode([[[]]]) * inc(decode([][[]])) = 4 * inc(decode([]) * inc(decode([[]]))) = 4 * inc(1 * inc(2))

What is inc()? inc() is a function that computes the prime factorization of a number and then increments all the prime bases (from one prime to the next). So inc(10) = inc(2 * 5) = 3 * 7 = 21, and inc(36) = inc(2^2 * 3^2) = 3^2 * 5^2 = 225. But inc() doesn't just take in integers; it can take in any number representable as a product of primes raised to powers. So inc(2^(1/2) * 3^(-1)) = 3^(1/2) * 5^(-1) = sqrt(3)/5. I asked the language models whether there's a standard name for the set of numbers definable in this way, and they didn't have ideas.
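
In case a concrete version is useful for checking: here's a minimal Python sketch of the procedure above. All the names (decode, inc, factor, next_prime, split_chunks) are just mine, the handling of '-' is my guess at "negation works as normal" (a leading '-' negates whatever follows it), and it only covers the case where every intermediate value stays rational, so it reproduces the worked examples above but not the sqrt(3)/5-style values at the end.

```python
from fractions import Fraction

def next_prime(p):
    """Smallest prime strictly greater than p (trial division; fine for small inputs)."""
    candidate = p + 1
    while True:
        if candidate > 1 and all(candidate % d for d in range(2, int(candidate ** 0.5) + 1)):
            return candidate
        candidate += 1

def factor(n):
    """Prime factorization of a positive integer, as a dict {prime: exponent}."""
    factors, d = {}, 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def inc(x):
    """Keep the exponents but bump every prime base to the next prime:
    inc(2 * 5) = 3 * 7, inc(2**2 * 3**2) = 3**2 * 5**2.
    Only positive rationals are handled here (negative exponents come from the denominator)."""
    x = Fraction(x)
    assert x > 0, "inc() is only defined for positive values in this sketch"
    exponents = factor(x.numerator)
    for p, e in factor(x.denominator).items():
        exponents[p] = exponents.get(p, 0) - e
    result = Fraction(1)
    for p, e in exponents.items():
        result *= Fraction(next_prime(p)) ** e
    return result

def split_chunks(s):
    """Split an expression into its top-level well-formed chunks (the 'degree' is their count)."""
    chunks, depth, start = [], 0, 0
    for i, ch in enumerate(s):
        if ch == '[':
            depth += 1
        elif ch == ']':
            depth -= 1
            if depth == 0:
                chunks.append(s[start:i + 1])
                start = i + 1
    return chunks

def decode(s):
    """Decode a bracket expression, assuming every intermediate value stays rational
    (i.e. the exponent in the 2**(...) step is always an integer)."""
    if s == "":
        return Fraction(0)                      # decode("") = 0
    if s[0] == "-":                             # negation: a leading '-' negates the rest
        return -decode(s[1:])
    chunks = split_chunks(s)
    if len(chunks) == 1:                        # degree one: strip outer brackets, exponentiate
        inner = decode(s[1:-1])
        assert inner.denominator == 1, "non-integer exponent; result would be irrational"
        return Fraction(2) ** inner
    first = chunks[0]                           # degree > 1: first chunk times inc() of the rest
    return decode(first) * inc(decode(s[len(first):]))

# Worked examples from above:
assert inc(10) == 21 and inc(36) == 225
assert decode("[]") == 1
assert decode("[[]]") == 2
assert decode("[-[]]") == Fraction(1, 2)
assert decode("[][[]]") == 3                    # 1 * inc(2)
assert decode("[[[]]][][[]]") == 20             # 4 * inc(1 * inc(2)) = 4 * inc(3)
```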

Six Thoughts on AI Safety
maxnadeau · 6mo

On point 6, "Humanity can survive an unaligned superintelligence": In this section, I initially took you to be making a somewhat narrow point about humanity's safety if we develop aligned superintelligence and humanity + the aligned superintelligence has enough resources to out-innovate and out-prepare a misaligned superintelligence. But I can't tell if you think this conditional will be true, i.e. whether you think the existential risk to humanity from AI is low due to this argument. I infer from this tweet of yours that AI "kill[ing] us all" is not among your biggest fears about AI, which suggests to me that you expect the conditional to be true—am I interpreting you correctly?

We should start looking for scheming "in the wild"
maxnadeau · 6mo

To make a clarifying point (which will perhaps benefit other readers): you're using the term "scheming" in a different sense from how Joe's report or Ryan's writing uses the term, right? 

I assume your usage is in keeping with your paper here, which is definitely different from those other two writers' usages. In particular, you use the term "scheming" to refer to a much broader set of failure modes. In fact, I think you're using the term synonymously with Joe's "alignment-faking"—is that right?

Open problems in emergent misalignment
maxnadeau · 6mo

People interested in working on these sorts of problems should consider applying to Open Phil's request for proposals: https://www.openphilanthropy.org/request-for-proposals-technical-ai-safety-research/

Detecting Strategic Deception Using Linear Probes
maxnadeau · 7mo

This section of our RFP has some other related work you might want to include, e.g. Orgad et al.

Six Thoughts on AI Safety
maxnadeau · 7mo

I think the link in footnote two goes to the wrong place?

Jesse Hoogland's Shortform
maxnadeau · 7mo

I haven't read the paper, but based only on the phrase you quote, I assume it's referring to hacks like the one shown here: https://arxiv.org/pdf/2210.10760#19=&page=19.0

AI Timelines
maxnadeau · 8mo

Do you think that cyber professionals would take multiple hours to do the tasks with 20-40 min first-solve times? I'm intuitively skeptical.

One (edit: minor) component of my skepticism is that someone told me that the participants in these competitions are less capable than actual cyber professionals, because the actual professionals have better things to do than enter competitions. I have no idea how big that selection effect is, but it at least provides some countervailing force against the selection effect you're describing.

Posts

Research directions Open Phil wants to fund in technical AI safety (117 karma, Ω, 7mo, 21 comments)
Open Philanthropy Technical AI Safety RFP - $40M Available Across 21 Research Areas (111 karma, Ω, 7mo, 0 comments)
Update on Harvard AI Safety Team and MIT AI Alignment (60 karma, 3y, 4 comments)
Apply to the Redwood Research Mechanistic Interpretability Experiment (REMIX), a research program in Berkeley (135 karma, Ω, 3y, 14 comments)