Contra this post from the Sequences. In his sequence post, Eliezer makes the following (excellent) point: > I can’t find any theorem of probability theory which proves that I should appear ice-cold and expressionless. This debunks the view, widely held at the time, that rationality runs counter to emotion. He then goes on to...
I don't think this is very likely, but a possible path to alignment is formal goal alignment, which is basically the following two-step plan: 1. Define a formal goal that robustly leads to good outcomes under heavy optimization pressure. 2. Build something that robustly pursues the formal goal you...
When I introduce people to plans like QACI, they often raise objections like "How is an AI going to do all of the simulating necessary to calculate this?" or "If our technology is good enough to calculate this with any level of precision, we can probably just upload some humans."...
This is my "object level output" submission for Johannes Mayer's 2024 SPAR Application (the linked doc seems to be reused from the 2023 AISC application). Unless otherwise noted, all quote blocks in this post are from the application question doc. For those of you who aren't Johannes Mayer reading this,...
Epistemic status: Updating on this comment and taking into account uncertainty about my own values, my credence in this post is around 50%. TLDR: Even in worlds where we create an unaligned AGI, it will cooperate acausally with counterfactual FAIs—and spend some percentage of its resources pursuing human values—as long...