When discussing the physics behind why the sky is blue, I'm surprised that the question 'Why isn't it blue on Mars or Titan?' isn't raised more often. Perhaps kids are so captivated by concepts like U(1) that they overlook inconsistencies in the explanation.
Just realized that the stability of goals under self-improvement is kinda similar to the stability of goals of mesa-optimizers, so the Vingean reflection paradigm and the mesa-optimization paradigm should fit together.
What are the practical implications of alignment research in a world where AGI is hard?
Imagine we have a good alignment theory but do not have AGI. Can this theory be used to manipulate existing superintelligent systems such as science, the deep state, or the stock market? Does alignment research have any results which can be practically used outside of the AGI field right now?
How does an AGI solve its own alignment problem?
For alignment to work, its theory should not only tell humans how to create an aligned super-human AGI, but also tell that AGI how to self-improve without destroying its own values. A good alignment theory should work across all intelligence levels. Otherwise, how does a paperclip optimizer which is marginally smarter than a human make sure that its next iteration will still care about paperclips?
I don’t know too much about alignment research, but what surprises me most is the lack of discussion of two points:
For alignment to work, its theory should not only tell humans how to create an aligned super-human AGI, but also tell that AGI how to self-improve without destroying its own values. Otherwise, how does a paperclip optimizer which is marginally smarter than a human make sure that its next iteration will still care about paperclips? A good alignment theory should work across all intelligence levels.
What are the practical implications of alignment research in a world where AGI is hard? Imagine we have a good alignment theory but do not have AGI. I would assume that the theory could be used to manipulate existing superintelligent systems such as science, the deep state, or the stock market. The reverse of this: does alignment research have any results which can be practically used right now?
What do you think about offering an option to divest from companies developing unsafe AGI? For example, by creating something like an ESG index that would deliberately exclude AGI-developing companies (Meta, Google, etc.), or just excluding these companies from existing ESG indices.
The impact = making AGI research a liability (being AGI-unsafe costs money) + raising awareness in general (everyone will see AGI-safe & AGI-unsafe options in their pension investment menu, and the decision itself will make noise) + social pressure on AGI researchers (equating them to the fossil-fuel extraction guys).
Do you think this is implementable short-term? Is there a shortcut from this post to whoever makes decisions at BlackRock & Co?
You can do something similar to the Drake equation:
where Nlife is the number of stars with life in the Milky Way, and it is assumed that a) once a self-replicating molecule evolves, it produces life with 100% probability; b) there is an infinite supply of RNA monomers; and c) the lifetime of an RNA molecule does not depend on its length. In addition:
You can combine everything except Nbase and LRNA into one factor Pabio, which gives you an approximation of the "sampling power" of the galaxy: how many base pairs could have been sampled. If you assume the parameters are distributed log-normally, with the lower end of each estimated range corresponding to the mean minus 2 standard deviations and the upper end to the mean plus 2 standard deviations (and converting everything to the same units), you get the approximate sampling power of the Milky Way.
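In condensed form, and assuming (my simplification of the setup above) that the chance of a random sequence of length LRNA matching one specific functional sequence is Nbase^(-LRNA), the estimate amounts to:

$$
N_{\mathrm{life}} \approx \frac{P_{\mathrm{abio}}}{N_{\mathrm{base}}^{\,L_{\mathrm{RNA}}}}
$$

With Nbase = 4, the molecule is "findable" in the galaxy (Nlife of at least 1) roughly when 4^LRNA does not exceed Pabio.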
Using this approximation, you can see how long an RNA molecule could be and still be found if you take the top 5% of the Pabio distribution: 102 bases. A sequence of 122 bases could be found in at least one galaxy in the observable universe (with 5% probability).
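As a quick consistency check on those two numbers (my arithmetic, with Nbase = 4 and taking the number of galaxies in the observable universe as roughly 10^11-10^12):

$$
4^{102} \approx 2.6 \times 10^{61}, \qquad 4^{122-102} = 4^{20} \approx 1.1 \times 10^{12}
$$

so going from 102 to 122 bases costs a factor of about 10^12, on the same order as the number of galaxies in the observable universe, which is why 122 bases is still findable in at least one galaxy.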
In the 2009 article https://www.science.org/doi/10.1126/science.1167856, the RNA sequence in Fig. 1B contained 63 bases. Given the assumptions above, such an RNA molecule could have evolved between 0.3 and 300 trillion times per planet (for comparison, the abiogenesis event on Earth could have occurred 6-17 times in Earth's history, as calculated from the date of the earliest evidence of life).
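To see what that per-planet range implies, here is a small sketch (mine, not from the article) that backs out the implied per-planet sampling power, under the assumption that the expected number of independent origins of one specific 63-base sequence equals the per-planet sampling power divided by 4^63:

```python
# Back-of-the-envelope check of the per-planet numbers above.
# Assumption (mine): expected number of independent origins of one specific
# L-base sequence ~= (sequences sampled per planet) / 4**L.

from math import log10

N_BASE = 4
L = 63  # length of the ribozyme from the 2009 paper (Fig. 1B)

# Range quoted above: 0.3 to 300 trillion origins per planet.
expected_hits = (0.3, 3e14)

for hits in expected_hits:
    sampling_power = hits * N_BASE**L  # implied sequences sampled per planet
    print(f"{hits:g} origins -> ~10^{log10(sampling_power):.0f} sequences sampled per planet")

# Approximate output:
#   0.3 origins   -> ~10^37 sequences sampled per planet
#   3e+14 origins -> ~10^52 sequences sampled per planet
```

So the quoted range corresponds to a per-planet sampling power somewhere around 10^37-10^52 sequences, i.e. the roughly 15 orders of magnitude of spread come straight from the log-normal uncertainty in the input parameters.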
The 16S rRNA of the prokaryotic small ribosomal subunit contains ~1500 nucleotides; there is no way such complex machinery could have evolved in the observable universe by pure chance.
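For scale, the same naive sampling argument gives (my arithmetic):

$$
4^{1500} = 10^{1500 \log_{10} 4} \approx 10^{903}
$$

which exceeds every sampling-power estimate above by hundreds of orders of magnitude.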
On the object level, it looks like there is a spectrum of society-level interventions, starting from "incentivizing research that wouldn't be published" (which Eliezer supports) and going all the way to "scaring the hell out of the general public" and beyond. For example, I can think of removing $FB and $NVDA from ESG indices, disincentivizing the publication of AI code and research articles, and introducing regulation of the compute-producing industry. Where do you think the line should be drawn between reasonable interventions and ones that are most likely to backfire?
On the meta level, the whole project of AGI foom management/alignment starts not at some abstract point 50 years in the future, but right now, with humans managing ML/AI research. Do you know of any practical results produced by the alignment research community that can be used right now to manage societal backfire / align incentives?
You haven't commented much on Eliezer's views on the social approach to slowing down the development of AGI - the blocks starting with
I don't know how to effectively prevent or slow down the "next competitor" for more than a couple of years even in plausible-best-case scenarios.
and
I don't want to sound like I'm dismissing the whole strategy, but it sounds a lot like the kind of thing that backfires because you did not get exactly the public reaction you wanted
What's your take on this?
This is a very strong claim. It puts severe limitations on biotech capabilities. Do you have any references to support it?