To me, the initial poll options make no sense without each other. For example, "avoid danger" and "communicate beliefs" don't make sense without each other [in the context of society].
If people can't communicate (report their epistemic state), "avoid danger" may not help, or may be based on completely biased opinions about what's dangerous.
I like communication, so I chose the second option, even though "communicating without avoiding danger" doesn't make sense either.
Since the poll options didn't make much sense to me, I didn't see myself as "facing alien values" or "fighting off babyeaters". I didn't press the link because I thought it might "blow up" the site (as on the previous Petrov's Day), I wasn't sure it was OK to click, and I didn't think my unilateralism would be analogous to Petrov's unilateralism (did Petrov cure anyone's values, by the way?). I decided it was more Petrov-like not to click.
But is AGI (or anything else) related to the lessons of Petrov's Day? That's another can of worms. I think we should update the lessons of the past to fit future situations; it doesn't make much sense to take away from Petrov's Day only lessons about "how to deal with launching nukes".
Another consideration: Petrov did accurately report his epistemic state. Or he would have, had it been needed (had it been needed, he would have lied in order to accurately report his epistemic state: "there are no launches"). Or: "he accurately non-reported the non-presence of nuclear missiles".
Maybe you should edit the post to add something like this:
My proposal is not about the hardest parts of the Alignment problem. It is not trying to solve theoretical problems with Inner Alignment or Outer Alignment (Goodhart, loopholes). I'm just assuming those problems won't be relevant enough, or that humanity simply won't create anything AGI-like (see CAIS).
Instead of discussing the usual problems in Alignment theory, I merely argue X. X is not a universally accepted claim; here's evidence that it's not universally accepted: [write the evidence here].
...
By focusing on the external legal system, many key problems associated with alignment (as recited in the Summary of Argument) are addressed. One worth highlighting is 4.4, which suggests AISVL can assure alignment in perpetuity despite changes in values, environmental conditions, and technologies, i.e., a practical implementation of Yudkowsky's CEV.
I think the key problems are not "addressed"; you just assume they won't exist. And laws are not a "practical implementation" of CEV.
Maybe there's a misunderstanding. Premise (1) ensures that your proposal is different from any other proposal. It's impossible to reject premise (1) without losing the proposal's meaning.
Premise (1) can be rejected only if you're not solving Alignment but some other problem.
I'm arguing for open, external, effective legal systems as the key to AI alignment and safety. I see the implementation/instilling details as secondary. My usage refers to specifying rules/laws/ethics externally so they are available and usable by all intelligent systems.
If an AI can be Aligned externally, then it's already safe enough. It feels like...
Perhaps the most important and (hopefully) actionable recommendation of the proposal is in the conclusion:
"For the future safety and wellbeing of all sentient systems, work should occur in earnest to improve legal processes and laws so they are more robust, fair, nimble, efficient, consistent, understandable, accepted, and complied with." (comment)
Sorry for sounding harsh. But to say something meaningful, I believe you have to argue two things:
1. Instilling laws is distinct enough from instilling values/corrigibility/human semantics in general.
2. Laws actually prevent misalignment.
I think the post fails to argue both points; I see no argument for either (1) or (2).
Later I write, "Suggested improvements to law and legal process are mostly beyond the scope of this brief. It is possible, however, that significant technological advances will not be needed for implementing some key capabilities. For example, current Large Language Models are nearly capable of understanding vast legal corpora and making appropriate legal decisions for humans and AI systems (Katz et al., 2023). Thus, a wholesale switch to novel legal encodings (e.g., computational and smart contracts) may not be necessary."
If an AI can just be asked to follow your clever idea, then the AI is already safe enough without your clever idea. "Asking an AI to follow something" is not what Bostrom means by direct specification, as far as I understand.
I like how you explain your opinion: very clear and short, basically contained in a single bit of information ("you're not a random sample", or "this equivalence between two classes of problems can be wrong").
But I think you should focus on describing the opinions of others (in simple/new ways) too. Otherwise you're just repeating yourself over and over.
If you're interested, I could try to help write a simplified guide to ideas in anthropics.
Additionally, this view ignores art consumers, who out-number artists by several orders of magnitude. It seems unfair to orient so much of the discussion of AI art's effects on the smaller group of people who currently create art.
What is the greater framework behind this argument? "Creating art" is one of the most general potentials a human being can realize. With your argument we could justify chopping off any human potential because "there's a greater number of people who don't care about realizing it".
I think deleting a key human potential (and a shared cultural context) affects the entire society.
A stupid question about anthropics and [logical] decision theories. Could we "disprove" some types of anthropic reasoning based on [logical] consistency? I struggle with math, so please keep the replies relatively simple.
If I reason myself into drinking (reasoning that I have a 90% chance of reward), then from the outside it would look as if 10 egoists have agreed (very conveniently, to the benefit of others) to suffer again and again... Is that a consistent possibility?
Let's look at actual outcomes here. If every human says yes, 95% of them get to the afterlife. If every human says no, 5% of them get to the afterlife. So it seems better to say yes in this case, unless you have access to more information about the world than is specified in this problem. But if you accept that it's better to say yes here, then you've basically accepted the doomsday argument.
There's a chance you're changing the nature of the situation by introducing Omega. Often "beliefs" and "betting strategy" go together, but here that may not be the case. You have to show that the decision in the Omega game bears any relation to other decisions.
There's a chance this Omega game is only "an additional layer of tautology" which doesn't justify anything. We need to consider more games. I can suggest a couple of examples.
Game 1:
Omega: There are two worlds, one much more populated than the other. In the bigger one magic exists; in the smaller one it doesn't. Would you bet that magic exists in your world? Would you actually update your beliefs and keep that update?
One person can argue it becomes beneficial to "lie" about your beliefs / adopt temporary doublethink. Another can argue for permanently changing your mind about magic.
Game 2:
Omega: I have this protocol. When you stand on top of a cliff, I give you a choice to jump or not. If you jump, you die. If you don't, I create many perfect simulations of this situation. If you jump in a simulation, you get a reward. Wanna jump?
You can argue that "jumping means death, so the reward is impossible to get", unless you have access to true randomness which can vary across perfect copies of the situation. I don't know. Maybe "making the Doomsday update beneficial" is impossible.
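To make that caveat concrete, here's a minimal sketch in Python. It's my own toy model with made-up parameters (a jump probability p and a fixed number of simulations; none of this is specified in the game above). If the policy is deterministic, either the original dies and no simulations are ever created (p = 1), or every copy refuses to jump and no reward is ever collected (p = 0); only a policy randomized independently across copies ever collects the reward.

```python
import random

# Toy model of "Game 2" (hypothetical parameters, not part of the original game's statement).
# Policy: every copy jumps independently with probability p; this independence is exactly
# the "true randomness which can vary across perfect copies" assumption.

def play(p, n_simulations=1000):
    """One run of the game: returns (original_died, rewards_collected)."""
    if random.random() < p:                 # the original jumps...
        return True, 0                      # ...and dies; no simulations are ever created
    rewards = sum(random.random() < p for _ in range(n_simulations))
    return False, rewards                   # only simulated copies that jump collect the reward

for p in (0.0, 0.05, 1.0):
    runs = [play(p) for _ in range(10_000)]
    death_rate = sum(died for died, _ in runs) / len(runs)
    avg_reward = sum(r for _, r in runs) / len(runs)
    print(f"p={p:.2f}  P(original dies)={death_rate:.3f}  average rewards per run={avg_reward:.1f}")
```

Under these toy assumptions, the deterministic policies (p = 0 and p = 1) never collect any reward, while a small nonzero p trades a small chance of the original dying for an expected reward from the simulated copies.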
You did touch on exactly that, so I'm not sure how much my comment agrees with your opinions.
The real question is will H5N1 pandemic happen in the next 5-10 years
2.4%
Sorry for a dumb question, but where do those numbers come from? What reasoning stands behind them? Is it some causal story ("jumping to humans is not that easy"), priors ("pandemics are unlikely"), or some precedent analysis ("it's not the first time a virus has infected so many animal types")?
I really lack knowledge about viruses.
...
Can you expand on this thought ("something can give less specific predictions, but be more general") or reference famous/professional people discussing it? This thought can be very trivial, but it can also be very controversial.
Right now I'm writing a post about "informal simplicity" or "conceptual simplicity". It discusses the simplicity of informal concepts (concepts that don't give specific predictions). I argue that "informal simplicity" should be very important a priori. But I don't know whether "informal simplicity" has been used (at least implicitly) by professional and famous people. Here's as much as I know (warning: controversial and potentially inaccurate takes!):
Zeno of Elea made arguments basically equivalent to "calculus should exist" and "the theory of computation should exist" ("supertasks are a thing") using only basic math.
The success of neural networks is a success of some of the simplest mechanisms: backpropagation and attention. (Even though they can be heavy on math too.) We observed a complicated phenomenon (real neurons), we simplified it... and BOOM!
Arguably, many breakthroughs in early and late science were sealed behind simple considerations (e.g. the equivalence principle), not deduced from formal reasoning. Feynman diagrams weren't deduced from some specific math; they came from the desire to simplify.
Some fields "simplify each other" in some way. Physics "simplifies" math (via physical intuitions). Computability theory "simplifies" math (by limiting it to things which can be done by series of steps). Rationality "simplifies" philosophy (by connecting it to practical concerns) and science.
To learn to fly, the Wright brothers had to analyze "simple" considerations.
Eliezer Yudkowsky influenced many people with very "simple" arguments. The rationalist community as a whole is a "simplified" approach to philosophy and science (to a degree).
The possibility of a logical decision theory can be deduced from simple informal considerations.
Albert Einstein used simple thought experiments.
Judging by the famous video interview, Richard Feynman liked to think about simple, informal descriptions of physical processes. And maybe Feynman talked about the "less precise, but more general" idea? Maybe he said that epicycles were more precise, but the heliocentric model was better anyway? I couldn't find it.
Terry Tao occasionally likes to simplify things (e.g. P=NP and multiple choice exams, quantum mechanics and Tomb Raider, special relativity and Middle-Earth, calculus as “special deals”). Is there more?
Some famous scientists didn't shy away from philosophy (e.g. Albert Einstein, Niels Bohr?, Erwin Schrödinger).
Please share any thoughts or information relevant to this, if you have any! It's OK if you write your own speculations/frames.