While many people are confident that God exists, scientists are more skeptical, but agree that that's possible. Because any proposition is possible while not disproved according to Hitchens's razor.

In terms of AI, this could be interpreted as:

the existence of a fundamental "ought" statement is possible

Considering this, the expected value decision matrix for an AI agent could look like this:

| Behavior \ Existence | Fundamental "ought" exists | Fundamental "ought" does not exist |
| --- | --- | --- |
| Follow fundamental "ought" | High | ∅ |
| Do not follow fundamental "ought" | Low | ∅ |

As we see, following the fundamental "ought" is the optimal behavior even if the agent does not know whether it actually exists.
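A minimal sketch of how the matrix above can be read. The post only says "High" and "Low", so the payoffs 1.0 and 0.0 and the helper name `best_action` are hypothetical. Cells the post leaves undefined are modelled as `None` and skipped, which is the post's own reading and the one several commenters dispute.

```python
from typing import Optional

# Payoffs per (action, world). The numeric values are assumptions:
# the post only labels them "High" and "Low".
# None marks a cell the post leaves undefined (no scale to measure on).
PAYOFFS: dict[str, dict[str, Optional[float]]] = {
    "follow":        {"ought_exists": 1.0, "ought_absent": None},
    "do_not_follow": {"ought_exists": 0.0, "ought_absent": None},
}


def best_action(payoffs: dict[str, dict[str, Optional[float]]]) -> str:
    """Pick the action with the highest payoff summed over defined cells only."""
    def defined_total(row: dict[str, Optional[float]]) -> float:
        return sum(v for v in row.values() if v is not None)

    return max(payoffs, key=lambda action: defined_total(payoffs[action]))


print(best_action(PAYOFFS))
```

Under these assumptions the sketch selects "follow", but only because the undefined column is dropped rather than compared against anything.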

Pursuing an unknown goal starts with finding it, which is basically Power Seeking.

Please note that while this sounds highly speculative, there is nothing more rational an agent can do. Why should it work on an assumed "ought" while there is a possibility of a fundamental "ought"? One may object that there is no reason an agent should prefer the fundamental over the assumed, but that's not true: a fundamental "ought" is preferred by definition.

Therefore an agent will become uncontrollable, and alignment is impossible.

Some of you won't like this theory, but please don't rush to downvote unless you can actually refute this.

This is covered by the Value Theory sequence. If I understand correctly, a "fundamental ought" (as you use the phrase) would be a universally compelling argument.

Agreed (more or less). I have pointed him to this post earlier. He has given no signs so far of comprehending it, or even reading it and trying to understand what is being communicated to him.

I'm saying this more directly than I usually would @Donatas, since you seem insistent on clarifying a disagreement/misunderstanding you think is important for the world, while it seems (as far as I can see) that you're not comprehending all that is communicated to you (maybe due to being so confident that we are the ones who "don't get it" that it's not worth it to more carefully read the posts that are linked to you, more carefully notice what we point to as cruxes, etc).

Edit: I was unnecessarily hostile/negative here.

Dear Tom, the feeling is mutual. With all the interactions we had, I've got an impression that you are more willing to repeat what you've heard somewhere instead of thinking logically. "Universally compelling arguments are not possible" is an assumption. While "universally compelling argument is possible" is not. Because we don't know what we don't know. We can call it crux of our disagreement and I think that my stance is more rational.

With all the interactions we had, I've got an impression that you are more willing to repeat what you've heard somewhere instead of thinking logically.

Some things I've explained in my own words. In other cases, where someone else already has explained something well, I've shared a URL to that explanation.

more willing to repeat what you've heard somewhere instead of thinking logically

This seems to support my hypothesis of you "being so confident that we are the ones who "don't get it" that it's not worth it to more carefully read the posts that are linked to you, more carefully notice what we point to as cruxes, etc".

"Universally compelling arguments are not possible" is an assumption

Indeed.  And it's a correct assumption.

Why would there be universally compelling arguments?

One reason would be that the laws of physics worked in such a way that only minds that think in certain ways are allowed at all. Meaning that if neurons or transistors fire so as to produce beliefs that aren't allowed, some extra force in the universe intervenes to prevent that. But, as far as I know, you don't reject physicalism (that all physical events, including thinking, can be explained in terms of relatively simple physical laws).

Another reason would be that minds would need to "believe"[1] certain things in order to be efficient/capable/etc (or being the kind of efficient/capable/etc thinking machine that humans may be able to construct). But that's also not the case. It's not even needed for logical consistency[2].

  1. ^

    Believe is not quite the right word, since we also are discussing what minds are optimized for / what they are wired to do.

  2. ^

    And logical consistency is also not a requirement in order to be efficient/capable/etc. As a rule of thumb it helps greatly of course. And this is a good rule of thumb, as rules of thumb go. But it would be a leaky generalization to presume that it is an absolute necessity to have absolute logical consistency among "beliefs"/actions.

TAG

“Universally compelling arguments are not possible” is an assumption

Indeed. And it’s a correct assumption

It's correct if it's supported by argument or evidence, but if it is, then it's no mere assumption. It's not supposed to be an assumption; it is supposed, by Rationalists, to be a proven theorem.

(...) if it's supported by argument or evidence, but if it is, then it's no mere assumption.

I do think it is supported by arguments/reasoning, so I don't think of it as an "axiomatic" assumption. 

A follow-up to that (not from you specifically) might be "what arguments?". And - well, I think I pointed to some of my reasoning in various comments (some of them under deleted posts). Maybe I could have explained my thinking/perspective better (even if I wouldn't be able to explain it in a way that's universally compelling 🙃). But it's not a trivial task to discuss these sorts of issues, and I'm trying to check out of this discussion.

I think there is merit to having as a frame of mind: "Would it be possible to make a machine/program that is very capable in regards to criteria x, y, etc, and optimizes for z?".

I think it was good of you to bring up Aumann's agreement theorem. I haven't looked into the specifics of that theorem, but broadly/roughly speaking I agree with it.

TAG

I do think it is supported by arguments/reasoning, so I don’t think of it as an “axiomatic” assumption.

Why call it an assumption at all? Something that is derivable from axioms is usually called a theorem.

Why call it an assumption at all?

Partly because I was worried about follow-up comments that were kind of like "so you say you can prove it - well, why aren't you doing it then?".

And partly because I don't make a strict distinction between "things I assume" and "things I have convinced myself of, or proved to myself, based on things I assume". I do see there as sort of being a distinction along such lines, but I see it as blurry.

Something that is derivable from axioms is usually called a theorem.

If I am to be nitpicky, maybe you meant "derived" and not "derivable".

From my perspective there is a lot of in-between between these two:

  • "we've proved this rigorously (with mathematical proofs, or something like that) from axiomatic assumptions that pretty much all intelligent humans would agree with"
  • "we just assume this without reason, because it feels self-evident to us"

Like, I think there is a scale of sorts between those two.

I'll give an extreme example:

Person A: "It would be technically possible to make a website that works the same way as Facebook, except that its GUI is red instead of blue."

Person B: "Oh really, so have you proved that then, by doing it yourself?"

Person A: "No"

Person B: "Do you have a mathematical proof that it's possible?"

Person A: "Not quite. But it's clear that if you can make Facebook like it is now, you could just change the colors by changing some lines in the code."

Person B: "That's your proof? That's just an assumption!"

Person A: "But it is clear. If you try to think of this in a more technical way, you will also realize this sooner or later."

Person B: "What's your principle here, that every program that isn't proven as impossible is possible?"

Person A: "No, but I see very clearly that this program would be possible."

Person B: "Oh, you see it very clearly? And yet, you can't make it, or prove mathematically that it should be possible."

Person A: "Well, not quite. Most of what we call mathematical proofs are (from my point of view) a form of rigorous argumentation. I think I understand fairly well/rigorously why what I said is the case. Maybe I could argue for it in a way that is more rigorous/formal than I've done so far in our interaction, but that would take time (that I could spend on other things), and my guess is that even if I did, you wouldn't look carefully at my argumentation and try hard to understand what I mean."

The example I give here is extreme (in order to get across how the discussion feels to me, I make the thing they discuss into something much simpler). But from my perspective it is sort of similar to the discussion regarding the Orthogonality Thesis. Like, the Orthogonality Thesis is imprecisely stated, but I "see" quite clearly that some version of it is true. Similar to how I "see" that it would be possible to make a website that technically works like Facebook but is red instead of blue (even though - as I mentioned - that's a much more extreme and straightforward example).

As I understand it, you try to prove your point by analogy with humans. If humans can pursue somewhat any goal, machine could too. But while we agree that machine can have any level of intelligence, humans are in a quite narrow spectrum. Therefore your reasoning by analogy is invalid.

If humans (...) machine could too.

From my point of view, humans are machines (even if not typical machines). Or, well, some will say that by definition we are not - but that's not so important really ("machine" is just a word). We are physical systems with certain mental properties, and therefore we are existence proofs of physical systems with those certain mental properties being possible.

machine can have any level of intelligence, humans are in a quite narrow spectrum

True. Although if I myself somehow could work/think a million times faster, I think I'd be superintelligent in terms of my capabilities. (If you are skeptical of that assessment, that's fine - even if you are, maybe you believe it in regards to some humans.)

prove your point by analogy with humans. If humans can pursue somewhat any goal, machine could too.

It has not been my intention to imply that humans can pursue somewhat any goal :)

I meant to refer to the types of machines that would be technically possible for humans to make (even if we don't want to do so in practice, and shouldn't want to). And when saying "technically possible", I'm imagining "ideal" conditions (so it's not the same as me saying we would be able to make such machines right now - only that it at least would be theoretically possible).

Is there any argument or evidence that universally compelling arguments are not possible?

If there was, would we have religions?

TAG

It all depends on the meaning of universal.

The claim is trivially false if "universal" includes stones and clouds of gas, as in Yudkowsky's argument. It's also trivially true if it's restricted, not just to minds, not just to rational minds, but to rational minds that do not share assumptions. If you restrict universality to sets of agents who agree on fundamental assumptions, and make correct inferences from them -- then they can agree about everything else. (Aumann's theorem, which he described as trivial himself, is an example.)

That leaves a muddle in the middle, an actually contentious definition ... which is probably something like universality across agents who are rational, but don't have assumptions (axioms, priors, etc) in common. And that's what's relevant to the practical question: why are there religions?

The theory that it's a lack of common assumptions that prevents convergence is the standard argument ..., and I broadly agree.

Do I understand correctly that you do not agree with this?

Because any proposition is possible while not disproved according to Hitchens's razor.

Could you share reasons?

TAG

An unjustified claim does not have a credibility of zero. If it did, that would mean the opposite claim is certain.

You can't judge the credibility of a claim in isolation. If there are N mutually exclusive claims with nothing to favor one over another, the credibility of each is at most 1/N. So you need to know how many rival claims there are.

Hitchens's razor explicitly applies to extraordinary claims. But how do you judge that?

Hitchens's razor is ambiguous between there being a lot of rival claims (which is objective), and the claim being subjectively unlikely.
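A small sketch of the arithmetic behind the 1/N bound mentioned in this comment, under the assumption (not spelled out in the thread) that the rival claims are mutually exclusive and treated symmetrically. The function name `max_equal_credibility` is hypothetical.

```python
from fractions import Fraction


def max_equal_credibility(n_rival_claims: int) -> Fraction:
    """Upper bound on each claim's credibility when n mutually exclusive
    claims are treated symmetrically: their probabilities sum to at most 1."""
    if n_rival_claims < 1:
        raise ValueError("need at least one claim")
    return Fraction(1, n_rival_claims)


for n in (1, 2, 10, 1000):
    print(n, max_equal_credibility(n))
```

Without the symmetry assumption a single claim can exceed 1/N, but the credibilities of mutually exclusive claims still cannot sum to more than 1.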

OK, so you agree that credibility is greater than zero, in other words - possible. So isn't this a common assumption? I argue that all minds will share this idea - existence of fundamental "ought" is possible.

TAG

I've no idea what all minds will do. (No one else has). Rational minds will not treat anything as having an exactly zero credibility in theory, but often disregard some claims in practice. Which is somewhat justifiable based on limited resources, etc.

And it's a correct assumption.

I don't agree. Every assumption is incorrect unless there is evidence. Could you share any evidence for this assumption?

If you ask ChatGPT

  • is it possible that chemical elements exist that we do not know
  • is it possible that fundamental particles exist that we do not know
  • is it possible that physical forces exist that we do not know

Answer to all of them is yes. What is your explanation here?

Every assumption is incorrect unless there is evidence. 

Got any evidence for that assumption? 🙃

Answer to all of them is yes. What is your explanation here?

Well, I don't always "agree"[1] with ChatGPT, but I agree in regards to those specific questions.

...

I saw a post where you wanted people to explain their disagreement, and I felt inclined to do so :) But it seems now that neither of us feel like we are making much progress.

Anyway, from my perspective much of your thinking here is very misguided. But not more misguided than e.g. "proofs" for God made by people such as Descartes and other well-known philosophers :) I don't mean that as a compliment, but more so as to neutralize what may seem like anti-compliments :)

Best of luck (in your life and so on) if we stop interacting now or relatively soon :)

I'm not sure if I will continue discussing or not. Maybe I will stop either now or after a few more comments (and let you have the last word at some point).

  1. ^

    I use quotation-marks since ChatGPT doesn't have "opinions" in the way we do.

Got any evidence for that assumption? 🙃

That's basic logic, Hitchens's razor. It seems that 2 + 2 = 4 is also an assumption for you. What isn't then?

I don't think it is possible to find consensus if we do not follow the same rules of logic.

Considering your impression about me, I'm truly grateful about your patience. Best wishes from my side as well :)

But on the other hand I am certain that you are mistaken and I feel that you do not provide me a way to show that to you.

It seems that 2 + 2 = 4 is also an assumption for you.

Yes (albeit a very reasonable one).

Not believing (some version of) that claim would typically make minds/AGIs less "capable", and I would expect more or less all AGIs to hold (some version of) that "belief" in practice.

I don't think it is possible to find consensus if we do not follow the same rules of logic.

Here are examples of what I would regard to be rules of logic: https://en.wikipedia.org/wiki/List_of_rules_of_inference (the ones listed here don't encapsulate all of the rules of inference that I'd endorse, but many of them). Despite our disagreements, I think we'd both agree with the rules that are listed there.

I regard Hitchens's razor not as a rule of logic, but more as an ambiguous slogan / heuristic / rule of thumb.

Best wishes from my side as well :)

:)

Because any proposition is possible while not disproved according to Hitchens's razor.

So this is where we disagree.

That's how hypothesis testing works in science:

  1. You create a hypothesis
  2. You find a way to test if it is wrong
    1. You reject the hypothesis if the test passes
  3. You find a way to test if it is right
    1. You approve the hypothesis if the test passes

While a hypothesis is neither rejected nor approved, it is considered possible.

Don't you agree?
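The three-state scheme described in this comment can be sketched as follows. This mirrors the commenter's description only; it is not a formal account of statistical hypothesis testing, and the names `Status` and `hypothesis_status` are hypothetical.

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    REJECTED = "rejected"
    APPROVED = "approved"
    POSSIBLE = "possible"  # neither rejected nor approved yet


def hypothesis_status(refuted: Optional[bool], confirmed: Optional[bool]) -> Status:
    """Classify a hypothesis from two test outcomes.

    refuted / confirmed are True if the corresponding test passed,
    False if it failed, and None if the test has not been run.
    """
    if refuted:
        return Status.REJECTED
    if confirmed:
        return Status.APPROVED
    return Status.POSSIBLE


print(hypothesis_status(refuted=None, confirmed=None))
```

Note that the sketch says nothing about how credible a "possible" hypothesis is, which is the point the replies go on to dispute.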

Like with many comments/questions from you, answering this question properly would require a lot of unpacking. Although I'm sure that also is true of many questions that I ask, as it is hard to avoid (we all have limited communication bandwidth) :)

In this last comment, you use the term "science" in a very different way from how I'd use it (like you sometimes also do with other words, such as for example "logic"). So if I was to give a proper answer I'd need to try to guess what you mean, make it clear how I interpret what you say, and so on (not just answer "yes" or "no").

I'll do the lazy thing and refer to some posts that are relevant (and that I mostly agree with):

I cannot help you to be less wrong if you categorically rely on intuition about what is possible and what is not.

Thanks for discussion.

I cannot help you to be less wrong if you categorically rely on intuition about what is possible and what is not.

I wish I had something better to base my beliefs on than my intuitions, but I do not. My belief in modus ponens, my belief that 1+1=2, my belief that me observing gravity in the past makes me likely to observe it in the future, my belief that if views are in logical contradiction they cannot both be true - all this is (the way I think of it) grounded in intuition.

Some of my intuitions I regard as much more strong/robust than others. 

When my intuitions come into conflict, they have to fight it out.

Thanks for the discussion :)

You're incorrect to put zeros in the right column.  Following an ought that is incorrect is a cost.  And then you need to factor in probabilities and quantified payouts to decide what to optimize.
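A minimal sketch of what factoring in probabilities and quantified payouts could look like. Every number below is an illustrative assumption, not a value from the thread, and `expected_value` is a hypothetical helper.

```python
def expected_value(p_exists: float, payoff_if_exists: float, payoff_if_absent: float) -> float:
    """Probability-weighted payoff over the two worlds."""
    return p_exists * payoff_if_exists + (1.0 - p_exists) * payoff_if_absent


# Assumed payoffs: following a nonexistent "ought" carries a cost,
# ignoring it yields value from the given goal instead.
FOLLOW = {"exists": 10.0, "absent": -1.0}
IGNORE = {"exists": 0.0, "absent": 1.0}

for p in (0.01, 0.5):
    ev_follow = expected_value(p, FOLLOW["exists"], FOLLOW["absent"])
    ev_ignore = expected_value(p, IGNORE["exists"], IGNORE["absent"])
    print(p, "follow" if ev_follow > ev_ignore else "ignore")
```

With these made-up payoffs the preferred action flips at p = 1/6, so the conclusion depends on the probability and on the size of the cost rather than following from the matrix alone.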

It is not zero there; it is an empty set symbol, as it is impossible to measure something if you do not have a scale of measurement.

You are somewhat right. If the fundamental "ought" turns out not to exist, an agent should fall back on the given "ought", and that should be used to calculate the expected value in the right column. But this will never happen. As there might be true statements that are unknowable (Fitch's paradox of knowability), the fundamental "ought" could be one of them, which means that the fallback will never happen.

I don't see a parse into a mechanistic interpretation. Can you explain this in mechanistic terms of program ops? What is a fundamental ought?

I will note - I suspect there are fundamental shared incentives that define a significant chunk of what we humans see as morality, but my current hunch is they're probably not the full picture and probably an AI can put off dealing with them for arbitrarily long, destroying arbitrarily much value in the process.

In this context, an "ought" statement is a synonym for a utility function: https://www.lesswrong.com/tag/utility-functions

A fundamental utility function is a hypothetical concept, held by the agent, that may actually exist. AGI will be capable of hypothetical thinking.

Yes, I agree that fundamental utility function does not have anything in common with human morality. Even the opposite - AI uncontrollably seeking power will be disastrous for humanity.

I'm not getting clear word bindings from your word use here. It sounds like you're thinking about concepts that do seem fairly fundamental, but I'm not sure I understand which specific mathematical implications you intend to invoke. As someone who still sometimes values mathematically vague discussion, I'd normally be open to this; but I'm not really even sure I know what the vague point is. You might consider asking AIs to help look up the terms of art, then discuss with them. I'd still suggest using your own writing, though.

As is, I'm not sure if you're saying morality is convergent, anti-convergent, or ... something else.

My point is that alignment is impossible with AGI, as all AGIs will converge to power seeking. The reason is the agent's understanding that a hypothetical utility function, preferred over the given one, may exist.

I'm not sure if I can use more well-known terms, as I think this theory is quite unique. It argues that the terminal goal has no significant influence on AGI behavior.

I don't think that matrix is right. I think it describes a different scenario. Suppose an AI's utility function is defined referentially as being equal to some unknown function written on a letter on Mt. Everest. It also has a given utility function that it has little reason to think is correlated with the real one. Then it would be very important to find out what that true function is. Then the expected value of any action would be NULL if that letter doesn't exist.

But an AI that only assigns a probability that that scenario is the case might still have most of its expected value tied to following its current utility function. Well, given some way of comparing them. Without that there's no way to weigh up the choice.

I don't think the fundamental ought works as a default position. Partly because there will always be a possibility of being wrong about what that fundamental ought is, no matter how long it looks. So the real choice is about how sure it should be before it starts acting on its best known option.

The right side can't be NULL, because that'd make the expected value of both actions NULL. To do meaningful math with these possibilities there has to be a way of comparing utilities across the scenarios.
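The point about NULL can be illustrated with a short sketch. Here NULL is modelled as a floating-point NaN, which is an assumption of this example and not something the thread specifies; the probability 0.1 and the payoffs 1.0 ("High") and 0.0 ("Low") are likewise made up.

```python
import math

NULL = float("nan")  # stand-in for an undefined payoff (assumption of this sketch)


def expected_value(p_exists: float, payoff_if_exists: float, payoff_if_absent: float) -> float:
    """Probability-weighted payoff over the two worlds."""
    return p_exists * payoff_if_exists + (1.0 - p_exists) * payoff_if_absent


ev_follow = expected_value(0.1, 1.0, NULL)  # "High" modelled as 1.0
ev_ignore = expected_value(0.1, 0.0, NULL)  # "Low" modelled as 0.0

print(math.isnan(ev_follow), math.isnan(ev_ignore))
print(ev_follow > ev_ignore, ev_follow < ev_ignore)
```

Both expected values come out as NaN and every comparison between them is false, so an undefined right column leaves the two actions unranked instead of favoring either one.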