I work at Redwood Research.
Our universe is small enough that it seems plausible (maybe even likely) that most of the value or disvalue created by a human-descended civilization comes from its acausal influence on the rest of the multiverse.
Naively, acausal influence should be in proportion to how much others care about what a lightcone-controlling civilization does with our resources. So, being a small fraction of the value hits both sides of the equation (direct value and acausal value equally).
Of course, civilizations elsewhere might care relatively more about what happens in our universe than whoever controls it does. (E.g., their measure puts much higher relative weight on our universe than the measure of whoever controls our universe.) This can imply that acausal trade is extremely important from a value perspective, but this is unrelated to being "small" and seems better described as large gains from trade due to different preferences over different universes.
(Of course, it does need to be the case that our measure is small relative to the total measure for acausal trade to matter much. But surely this is true?)
Overall, my guess is that it's reasonably likely that acausal trade is indeed where most of the value/disvalue comes from due to very different preferences of different civilizations. But, being small doesn't seem to have much to do with it.
(Surely cryonics doesn't matter given a realistic action space? Usage of cryonics is extremely rare and I don't think there are plausible (cheap) mechanisms to increase uptake to >1% of population. I agree that simulation arguments and similar considerations maybe imply that "helping current humans" is either incoherent or unimportant.)
But I do think, intuitively, GPT-5-MAIA might e.g. make 'catching AIs red-handed' using methods like in this comment significantly easier/cheaper/more scalable.
Notably, the mainline approach for catching doesn't involve any internals usage at all, let alone labeling a bunch of internals.
I agree that this model might help in performing various input/output experiments to determine what made a model do a given suspicious action.
By your values, do you think a misaligned AI creates a world that "rounds to zero", or still has substantial positive value?
I think misaligned AI is probably somewhat worse than no Earth-originating spacefaring civilization because of the potential for aliens, but also that misaligned AI control is considerably better than no one ever heavily utilizing inter-galactic resources.
Perhaps half of the value of misaligned AI control is from acausal trade and half from the AI itself being valuable.
You might be interested in When is unaligned AI morally valuable? by Paul.
One key consideration here is that the relevant comparison is:
Conditioning on the AI succeeding at acquiring power changes my views of what their plausible values are (for instance, humans seem to have failed at instilling preferences/values which avoid seizing control).
A common story for why aligned AI goes well goes something like: "If we (i.e. humanity) align AI, we can and will use it to figure out what we should use it for, and then we will use it in that way." To what extent is aligned AI going well contingent on something like this happening, and how likely do you think it is to happen? Why?
Hmm, I guess I think that some fraction of resources under human control will (in expectation) be utilized according to the results of a careful reflection process with an altruistic bent.
I think resources which are used in mechanisms other than this take a steep discount in my lights (there is still some value from acausal trade with other entities which did do this reflection-type process and probably a bit of value from relatively-unoptimized-goodness (in my lights)).
I overall expect that a high fraction (>50%?) of inter-galactic computational resources will be spent on the outputs of this sort of process (conditional on human control) because:
To what extent is your belief that aligned AI would go well contingent on some sort of assumption like: my idealized values are the same as the idealized values of the people or coalition who will control the aligned AI?
Probably not the same, but if I didn't think it was at all close (I don't care at all for what they would use resources on), I wouldn't care nearly as much about ensuring that coalition is in control of AI.
Do you care about AI welfare? Does your answer depend on whether the AI is aligned? If we built an aligned AI, how likely is it that we will create a world that treats AI welfare as an important consideration? What if we build a misaligned AI?
I care about AI welfare, though I expect that ultimately the fraction of good/bad that results from the welfare of minds being used for labor is tiny. And an even smaller fraction from AI welfare prior to humans being totally obsolete (at which point I expect control over how minds work to get much better). So, I mostly care about AI welfare from a deontological perspective.
I think misaligned AI control probably results in worse AI welfare than human control.
Do you think that, to a first approximation, most of the possible value of the future happens in worlds that are optimized for something that resembles your current or idealized values? How bad is it to mostly sacrifice each of these? (What if the future world's values are similar to yours, but is only kinda effectual at pursuing them? What if the world is optimized for something that's only slightly correlated with your values?) How likely are these various options under an aligned AI future vs. an unaligned AI future?
Yeah, most value from my idealized values. But, I think the basin is probably relatively large and small differences aren't that bad. I don't know how to answer most of these other questions because I don't know what the units are.
How likely are these various options under an aligned AI future vs. an unaligned AI future?
My guess is that my idealized values are probably pretty similar to many other humans on reflection (especially the subset of humans who care about spending vast amounts of computation). Such that I think human control vs me control only loses like 1/3 of the value (putting aside trade). I think I'm probably less into AI values on reflection such that it's more like 1/9 of the value (putting aside trade). Obviously the numbers are incredibly unconfident.
You might be interested in discussion under this thread.
I express what seem to me to be some of the key considerations here (somewhat indirect).
It seems to me like the sort of interpretability work you're pointing at is mostly bottlenecked by not having good MVPs of anything that could plausibly be directly scaled up into a useful product as opposed to being bottlenecked on not having enough scale.
So, insofar as this automation will help people iterate faster fair enough, but otherwise, I don't really see this as the bottleneck.
Thanks!
Yep, this is the exact experiment I was thinking about.
This seems like a reasonable concern.
My general view is that it seems implausible that much of the value from our perspective comes from extorting other civilizations.
It seems unlikely to me that >5% of the usable resources (weighted by how much we care) are extorted. I would guess that marginal gains from trade are bigger (10% of the value of our universe?). (I think the units work out such that these percentages can be directly compared as long as our universe isn't particularly well suited to extortion rather than trade or vice versa.) Thus, competition over who gets to extort these resources seems less important than gains from trade.
I'm wildly uncertain about both marginal gains from trade and the fraction of resources that are extorted.