I'm not sure I see the distinction between your Type I and Type II.
The Type II constraints you list (compute availability; data; ideas, which may not be parallelizable; organizational overhead; testing and validation time) seem to apply to Type I also.
I think the distinction between Type I/II and Type III is meaningful (see here).
John thinks he's in the winners' bracket (who don't care about signalling), but he's actually in the upper-losers' bracket (who care about countersignalling).
Consider this part:
Occasionally people ask me if I’m going to the next one, and… I try to be polite. But my internal reaction is something like “oh god no, I absolutely cannot be seen at an EA global, that would be super cringe”. EA global, like many other effective altruism branded “networking events”, is (at least in my head) the losers’ bracket of the effective altruism job market.
The people actually in the "winners' bracket" of the EA job market aren't thinking about whether it would be super cringe to be seen at EAG. They go if it's worth their time and don't if it isn't. I don’t imagine this is a thought that Holden Karnofsky has ever had.
Separately: John has revealed himself as lower-upper-losers' bracket, not even upper-upper-losers' bracket. The upper-upper-losers' bracket would go to EAG, as a counter-counter-signal that they are confident that everyone will assume they were there to recruit, not to find a job.
Not deeply believing in technological progress (e.g. prioritizing bed nets over vaccines).
I think this is wrong about the history.
I recently watched this Q&A with Holden from 2013. The picture I got is that GiveWell was focusing on "proven interventions" (e.g. bednets) over "speculative interventions" (e.g. biomedical research) because proven interventions were easier to evaluate.[1] He says he thinks speculative/high-risk stuff is probably better, and GiveWell Labs (i.e. Open Philanthropy, i.e. Coefficient Giving) was the pivot toward finding those opportunities.
Supporting quotations
"We really focused on what I'd call traditional GiveWell, which is looking for charities that are proven and cost effective and scalable... That's why we picked it, is because we thought we'd be able to get somewhere on that."
"We kind of bit off this chunk at the beginning of proven effective scalable... that was because it had a shorter feedback loop. Do this analysis, people would check the analysis... we would learn."
"Proven can be easy to see, it also does tend to attract money from other funders... It allows a level of accountability that you don't get with other things, it allows a level of honesty and objectivity and transparency you don't get with other things."
"You get much better information when you're taken more seriously, and you're taken more seriously when there's more money behind you."
"I feel like by running an organization and by operating within that framework, I've just gotten so much better... knowing what's going to be accomplishable, knowing what sort of person to listen to."
"We don't believe, and didn't necessarily — didn't really ever believe — that this kind of charity is the only way to have a positive impact or the best way to have a positive impact."
"I do think that by taking away the restrictions, I believe we're going to find much better giving opportunities... Conventional wisdom in philanthropy is that speculative and risky is where all the action is."
"The ROI on medical research is probably really good... Aging is one of the things that I would look at as being promising."
Fun fact: Holden mentions that his "informal scientific adviser" for exploring biomedical research as a cause area is a biology postdoc at Stanford called Dario Amodei.
Objectively, the global population is about 8 billion. But subjectively?
Let p_i be the probability I'll meet person i in the next year, and let μ = Σ p_i be the expected number of people I meet. Then the subjective population is
N = exp( -Σ (p_i/μ) log(p_i/μ) )
This is the perplexity of the conditional distribution "given I meet someone, who is it?". For example, if there's a pool of 100,000 people each of whom I'll meet with 3% probability (everyone else 0%), then I expect to meet 3,000 people next year, and my subjective population is 100,000.
My guess is that my subjective population is around 30,000–100,000, but I might be way off.
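Here's a quick numerical sketch (plain NumPy, with toy numbers matching the example above) of computing the subjective population from a vector of meeting probabilities:

```python
import numpy as np

# Toy numbers from the example: a pool of 100,000 people,
# each of whom I'll meet with 3% probability next year (everyone else 0%).
p = np.full(100_000, 0.03)

mu = p.sum()                       # expected number of meetings (~3,000)
q = p / mu                         # conditional distribution: "given I meet someone, who is it?"
entropy = -np.sum(q * np.log(q))   # Shannon entropy in nats
N = np.exp(entropy)                # perplexity = subjective population

print(round(mu))   # 3000
print(round(N))    # 100000
```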
my guess is that — confusingly — the new claude "constitution" isn't used the way the constitutional ai paper describes. rather, it's a document used during midtraining sft
one clue is that the new constitution comes from the soul doc, which is a document used in midtraining sft
another clue is that, as a data structure, the old constitution was an unordered list of short principles whereas the new one is intended to be read as a monolithic document
they say:
We use the constitution at various stages of the training process. This has grown out of training techniques we've been using since 2023, when we first began training Claude models using Constitutional AI. Our approach has evolved significantly since then, and the new constitution plays an even more central role in training. Claude itself also uses the constitution to construct many kinds of synthetic training data, including data that helps it learn and understand the constitution, conversations where the constitution might be relevant, responses that are in line with its values, and rankings of possible responses.
but i think the sft midtraining is the main show
There's a reading of the Claude Constitution as an 80-page dialectic between Carlsmithian and Askellian metaethics.
One justification for treating "schemers" as a fairly binary classification: there's a phase transition between the AIs lying with 99% probability and lying with 100% probability, namely that techniques like "just ask different copies of the model", "ask the AI at a higher temperature", "ask the AI with slightly different phrasings", etc. stop working only at 100%. Like, these techniques work unless the deception is universal, reliable, robust, and persistent.
Maybe, but the schemers could optimize their research for looking-good-to-us, and it might be hard to distinguish this from actually good work
If you ask each variant to review the research of the other variants, then the schemers need to optimize their research for looking good to every variant. But the optimistic assumption is that at least one variant is an equally capable non-schemer.
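To put a toy number on the phase-transition point above (everything here is made up for illustration; the `ask_variant` stand-in just simulates a model that lies with some probability):

```python
import random

def ask_variant(question, lie_prob):
    """Toy stand-in for querying one copy / temperature / paraphrase of the model:
    lies with probability lie_prob, otherwise answers honestly."""
    return "lie" if random.random() < lie_prob else "truth"

def cross_check(question, n_queries=50, lie_prob=0.99):
    """Flag deception by looking for any disagreement across repeated queries.

    With lie_prob = 0.99 and 50 queries, the chance that every answer is the
    same lie is 0.99**50, roughly 0.61, and it falls geometrically with more
    queries. Only lie_prob = 1.0 (universal, reliable, persistent deception)
    never trips this check.
    """
    answers = {ask_variant(question, lie_prob) for _ in range(n_queries)}
    return len(answers) > 1   # True = inconsistency detected

print(cross_check("is this research sound?"))                # often True for a 99% liar
print(cross_check("is this research sound?", lie_prob=1.0))  # always False
```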
Depends. Self-play isn't RSI — it's just a way to generate data. An AI could improve using the data via I-RSI (e.g. the AI thinks to itself "oh, when I played against myself this reasoning heuristic worked, I should remember that") or via E-RSI (e.g. the AI runs statistical analysis on the data, finds weak spots, and tests different patches).
If you just use the data to compute gradients to update the weights, then I wouldn't count that as RSI.