Just another foomsayer.




Are astronomical suffering risks (s-risk) considered a subset of existential risks (x-risk) because they "drastically curtail humanity’s potential"? Or is this concern not taken into account for this research program?

I appreciate the response and stand corrected.

The point about this being an iterated prisoner's dilemma is a good one. I would rather have more such ACX instances, where he shares even more of his thinking because of our cooperative and trustworthy behavior, than have this be the last one or have future ones be filtered PR-speak.

A small number of people in the alignment community repeatedly getting access to better information, and being able to act on it, beats the value of this one post staying open to the world. And even if the cat is already out of the bag, hiding or removing the post would probably do good as a gesture of cooperation.

Another point about "defection" is which action is a defection with respect to whom.

Sam Altman is the leader of an organization with a real chance of bringing about the literal end of the world, and I find any and all information about his thoughts and his organization to be of the highest interest for the rest of humanity.

Not disclosing whatever such information one comes into contact with (except when disclosure would speed up potentially even-less-alignment-focused competitors) is a defection against the rest of us.

If this were an off-the-record meeting with a head of state discussing plans for expanding and/or deploying nuclear weapons capabilities, nobody would dare suggest taking it down, inaccuracy and incompleteness notwithstanding.

Now, Sam Altman appeals emotionally to many of us (me included) as far more relatable, being an apparently prosocial, nerdy tech guy, but in my opinion he is a head of state in control of WMDs and should be (and should expect to be) treated as such.

Update: honestly curious about reasons for downvotes if anyone is willing to share. I have no intention to troll or harm the discussion and am willing to adapt writing style. Thank you.

[This comment is no longer endorsed by its author]

Did he really speak that little about AI Alignment/Safety? Does anyone have additional recollections on this topic?

The only relevant parts so far seem to be these two:

Behavioral cloning probably much safer than evolving a bunch of agents. We can tell GPT to be empathic.


Chat access for alignment helpers might happen.

Both of which are very concerning.

"We can tell GPT to be empathic" assumes it can be aligned in the first place, so that you "can tell" it what to do; and "be empathic" is a very vague description of what a good utility function would be, assuming one would be followed at all. Of course it's all in conversational tone, not a formal paper, but it seems very dismissive to me.

GPT-based "behavioral cloning" itself has been brought up by Vitalik Buterin and criticized by Eliezer Yudkowsky in this exchange between the two:

For concreteness: One can see how AlphaFold 2 is working up towards world-ending capability. If you ask how you could integrate an AF2 setup with GPT-3 style human imitation, to embody the human desire for proteins that do nice things... the answer is roughly "Lol, what? No."

As for "chat access for alignment helpers," I mean, where to even begin? It's not hard to imagine a deceptive AI using this chat to perfectly convince human "alignment helpers" that it is whatever they want it to be while being something else entirely. Or even "aligning" the human helpers themselves into beliefs/actions that are in the AI's best interest.

Very interested in this, especially in how to balance or resolve the trade-off between high inner coordination (people agreeing quickly and completely on actions and/or beliefs) and high "outer" coordination (with reality, i.e. converging quickly and strongly on the right things). In other words: how to avoid echo chambers and groupthink without devolving into bickering and splintering into factions.