All of Donatas Lučiūnas's Comments + Replies

As I understand it, you try to prove your point by analogy with humans: if humans can pursue more or less any goal, a machine could too. But while we agree that a machine can have any level of intelligence, humans occupy quite a narrow spectrum. Therefore your reasoning by analogy is invalid.

1Tor Økland Barstad2mo
From my point of view, humans are machines (even if not typical machines). Or, well, some will say that by definition we are not - but that's not so important really ("machine" is just a word). We are physical systems with certain mental properties, and therefore we are existence proofs that physical systems with those mental properties are possible. True. Although if I myself somehow could work/think a million times faster, I think I'd be superintelligent in terms of my capabilities. (If you are skeptical of that assessment, that's fine - even if you are, maybe you believe it in regards to some humans.) It has not been my intention to imply that humans can pursue more or less any goal :) I meant to refer to the types of machines that would be technically possible for humans to make (even if we don't want to do so in practice, and shouldn't want to). And when saying "technically possible", I'm imagining "ideal" conditions (so it's not the same as me saying we would be able to make such machines right now - only that it at least would be theoretically possible).

OK, so you agree that the credibility is greater than zero, in other words, possible. So isn't this a common assumption? I argue that all minds will share this idea: the existence of a fundamental "ought" is possible.

1TAG2mo
I've no idea what all minds will do. (No one else does.) Rational minds will not treat anything as having exactly zero credibility in theory, but they often disregard some claims in practice, which is somewhat justifiable based on limited resources, etc.

Do I understand correctly that you do not agree with this?

Because, according to Hitchens's razor, any proposition is possible as long as it has not been disproved.

Could you share reasons?

2TAG2mo
An unjustified claim does not have a credibility of zero. If it did, that would mean the opposite claim is certain. You can't judge the credibility of a claim in isolation. If there are N claims, the credibility of each is at most 1/N. So you need to know how many rival claims there are. Hitchens's razor explicitly applies to extraordinary claims. But how do you judge that? Hitchens's razor is ambiguous between there being a lot of rival claims (which is objective) and the claim being subjectively unlikely.
1Walker Vargas2mo
I don't think the fundamental ought works as a default position. Partly because there will always be a possibility of being wrong about what that fundamental ought is, no matter how long it looks. So the real choice is about how sure it should be before it starts acting on its best-known option. The right side can't be NULL, because that would make the expected value of both actions NULL. To do meaningful math with these possibilities there has to be a way of comparing utilities across the scenarios.

Is there any argument or evidence that universally compelling arguments are not possible?

If there was, would we have religions?

2TAG2mo
It all depends on the meaning of universal. The claim is trivially false if "universal" includes stones and clouds of gas, as in Yudkowsky's argument. It's also trivially true if it's restricted, not just to minds, not just to rational minds, but to rational minds that share assumptions. If you restrict universality to sets of agents who agree on fundamental assumptions, and make correct inferences from them -- then they can agree about everything else. (Aumann's theorem, which he himself described as trivial, is an example.) That leaves a muddle in the middle, an actually contentious definition ... which is probably something like universality across agents who are rational, but don't have assumptions (axioms, priors, etc.) in common. And that's what's relevant to the practical question: why are there religions? The theory that it's a lack of common assumptions that prevents convergence is the standard argument, and I broadly agree.

I cannot help you to be less wrong if you categorically rely on intuition about what is possible and what is not.

Thanks for discussion.

1Tor Økland Barstad2mo
I wish I had something better to base my beliefs on than my intuitions, but I do not. My belief in modus ponens [https://en.wikipedia.org/wiki/Modus_ponens], my belief that 1+1=2, my belief that me observing gravity in the past makes me likely to observe it in the future, my belief that if views are in logical contradiction they cannot both be true - all this is (the way I think of it) grounded in intuition. Some of my intuitions I regard as much more strong/robust than others.  When my intuitions come into conflict, they have to fight it out. Thanks for the discussion :)

I don't think the implications are well-known (as the number of downvotes indicates).

Because, according to Hitchens's razor, any proposition is possible as long as it has not been disproved.

So this is where we disagree.

That's how hypothesis testing works in science:

  1. You create a hypothesis
  2. You find a way to test if it is wrong
    1. You reject the hypothesis if the test passes
  3. You find a way to test if it is right
    1. You accept the hypothesis if the test passes

While a hypothesis is neither rejected nor accepted, it is considered possible.
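(A minimal Python sketch of the three-state scheme described above, as I read it; the names Status, evaluate, and the test callables are illustrative, not anything from the comment.)

```python
from enum import Enum
from typing import Callable

class Status(Enum):
    POSSIBLE = "possible"   # neither rejected nor accepted yet
    REJECTED = "rejected"   # a falsifying test passed
    ACCEPTED = "accepted"   # a confirming test passed

def evaluate(falsifying_test: Callable[[], bool],
             confirming_test: Callable[[], bool]) -> Status:
    """Apply the commenter's scheme: try to reject, then try to accept,
    otherwise keep treating the hypothesis as possible."""
    if falsifying_test():
        return Status.REJECTED
    if confirming_test():
        return Status.ACCEPTED
    return Status.POSSIBLE

# Example: a hypothesis for which neither test has succeeded stays "possible".
print(evaluate(lambda: False, lambda: False))  # Status.POSSIBLE
```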

Don't you agree?

1Tor Økland Barstad2mo
Like with many comments/questions from you, answering this question properly would require a lot of unpacking. Although I'm sure that also is true of many questions that I ask, as it is hard to avoid (we all have limited communication bandwidth) :) In this last comment, you use the term "science" in a very different way from how I'd use it (like you sometimes also do with other words, such as for example "logic"). So if I was to give a proper answer I'd need to try to guess what you mean, make it clear how I interpret what you say, and so on (not just answer "yes" or "no"). I'll do the lazy thing and refer to some posts that are relevant (and that I mostly agree with):
  • Where Recursive Justification Hits Bottom [https://www.lesswrong.com/posts/C8nEXTcjZb9oauTCW/where-recursive-justification-hits-bottom]
  • Could Anything Be Right? [https://www.lesswrong.com/s/9bvAELWc8y2gYjRav/p/vy9nnPdwTjSmt5qdb]
  • 37 Ways That Words Can Be Wrong [https://www.lesswrong.com/s/SGB7Y5WERh4skwtnb/p/FaJaCgqBKphrDzDSj]

Got any evidence for that assumption? 🙃

That's basic logic, Hitchens's razor. It seems that 2 + 2 = 4 is also an assumption for you. What isn't then?

I don't think it is possible to find consensus if we do not follow the same rules of logic.

Considering your impression of me, I'm truly grateful for your patience. Best wishes from my side as well :)

But on the other hand I am certain that you are mistaken and I feel that you do not provide me a way to show that to you.

1Tor Økland Barstad2mo
Yes (albeit a very reasonable one). Not believing (some version of) that claim would typically make minds/AGIs less "capable", and I would expect more or less all AGIs to hold (some version of) that "belief" in practice. Here are examples of what I would regard to be rules of logic: https://en.wikipedia.org/wiki/List_of_rules_of_inference [https://en.wikipedia.org/wiki/List_of_rules_of_inference] (the ones listed there don't encapsulate all of the rules of inference that I'd endorse, but many of them). Despite our disagreements, I think we'd both agree with the rules that are listed there. I regard Hitchens's razor not as a rule of logic, but more as an ambiguous slogan / heuristic / rule of thumb. :)
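(For illustration, one of the rules on that list, modus ponens, written out as an inference rule.)

```latex
% Modus ponens: from P and P -> Q, infer Q.
\[
\frac{P \qquad P \to Q}{Q}
\]
```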

But I think it is possible (and feasible) for a program/mind to be extremely capable, and affect the world, and not "care" about infinite outcomes.

As I understand it, you do not agree with

If an outcome with infinite utility is presented, then it doesn't matter how small its probability is: all actions which lead to that outcome will have to dominate the agent's behavior.

from Pascal's Mugging, not with me. Do you have any arguments for that?

1Tor Økland Barstad2mo
I do have arguments for that, and I have already mentioned some of them earlier in our discussion (you may not share that assessment, despite us being relatively close in mind-space compared to most possible minds, but oh well). Some of the more relevant comments from me are on one of the posts that you deleted. As I mention here [https://www.lesswrong.com/posts/3B23ahfbPAvhBf9Bb/god-vs-ai-scientifically?commentId=ssaMWM5DNEuDmdp7P], I think I'll try to round off this discussion. (Edit: I had a malformed/misleading sentence in that comment that should be fixed now.)

And it's a correct assumption.

I don't agree. Every assumption is incorrect unless there is evidence. Could you share any evidence for this assumption?

If you ask ChatGPT

  • is it possible that chemical elements exist that we do not know
  • is it possible that fundamental particles exist that we do not know
  • is it possible that physical forces exist that we do not know

The answer to all of them is yes. What is your explanation here?

1Tor Økland Barstad2mo
Got any evidence for that assumption? 🙃 Well, I don't always "agree"[1] with ChatGPT, but I agree in regards to those specific questions. ... I saw a post where you wanted people to explain their disagreement, and I felt inclined to do so :) But it seems now that neither of us feel like we are making much progress. Anyway, from my perspective much of your thinking here is very misguided. But not more misguided than e.g. "proofs" for God made by people such as e.g. Descartes and other well-known philosophers :) I don't mean that as a compliment, but more so as to neutralize what may seem like anti-compliments :) Best of luck (in your life and so on) if we stop interacting now or relatively soon :) I'm not sure if I will continue discussing or not. Maybe I will stop either now or after a few more comments (and let you have the last word at some point). 1. ^ I use quotation-marks since ChatGPT doesn't have "opinions" in the way we do.

What information would change your opinion?

1Tor Økland Barstad2mo
About universally compelling arguments? First, a disclaimer: I do think there are "beliefs" that most intelligent/capable minds will have in practice. E.g. I suspect most will use something like modus ponens, most will update beliefs in accordance with statistical evidence in certain ways, etc. I think it's possible for a mind to be intelligent/capable without strictly adhering to those things, but for sure I think there will be a correlation in practice for many "beliefs". Questions I ask myself are:
  • Would it be impossible (in theory) to wire together a mind/program with "belief"/behavior x, and have that mind be very capable at most mental tasks?
  • Would it be infeasible (for humans) to wire together a mind/program with "belief"/behavior x, and have that mind be very capable at most mental tasks?
And in the case of e.g. caring about "goals", I don't see good reasons to think that the answer to either question is "yes". Like, I think it is physically and practically possible to make minds that act in ways that I would consider "completely stupid", while still being extremely capable at most mental tasks. Another thing I sometimes ask myself:
  1. "Is it possible for an intelligent program to surmise what another intelligent mind would do if it had goal/preferences/optimization-target x?"
  2. "Would it be possible for another program to ask about #1 as a question, or fetch that info from the internals of another program?"
If yes and yes, then a program could be written where #2 surmises from #1 what such a mind would do (with goal/preferences/optimization-target x), and carries out that thing. I could imagine information that would make me doubt my opinion / feel confused, but nothing that is easy to summarize. (I would have to be wrong about several things - not just one.)
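(A rough Python sketch of the two-step construction described in the comment above, assuming for the sake of illustration that such a predictor exists at all; predict_action, deferring_agent, and the toy predictor are hypothetical names, not anything from the comment.)

```python
from typing import Callable

# Hypothetical capable subsystem answering question #1:
# "what would an intelligent mind with goal `goal_spec` do in `situation`?"
Predictor = Callable[[str, str], str]

def deferring_agent(predict_action: Predictor, goal_spec: str):
    """Step #2: query the predictor and carry out whatever action it returns.
    The wrapper has no stake in the goal; it just executes the predicted action."""
    def act(situation: str) -> str:
        action = predict_action(situation, goal_spec)
        return action  # in a real system this would be executed, not returned
    return act

# Toy stand-in predictor, purely for illustration.
toy_predictor: Predictor = lambda situation, goal: f"do whatever best serves '{goal}' given '{situation}'"
agent = deferring_agent(toy_predictor, goal_spec="maximize paperclips")
print(agent("factory is idle"))
```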

Do you think you can deny existence of an outcome with infinite utility? The fact that things "break down" is not a valid argument. If you cannot deny - it's possible. And it it's possible - alignment impossible.

1Tor Økland Barstad2mo
To me, according to my preferences/goals/inclinations, there are conceivable outcomes with infinite utility/disutility. But I think it is possible (and feasible) for a program/mind to be extremely capable, and affect the world, and not "care" about infinite outcomes. I guess that depends on what's being discussed. Like, it is something to take into account/consideration if you want to prove something while referencing utility-functions that reference infinities.

A rock is not a mind.

Please provide arguments for your position. That is a common understanding that I think is faulty; my position is more rational, and I provided reasoning above.

2TAG2mo
You have spotted the flaw in Yudkowsky's argument: "Any physical system whatsoever" is not a translation of "mind".

It is not zero there; it is an empty-set symbol, as it is impossible to measure something if you do not have a scale of measurement.

You are somewhat right. If the fundamental "ought" turns out not to exist, an agent should fall back on the given "ought", and it should be used to calculate the expected value in the right column. But this will never happen. As there might be true statements that are unknowable (Fitch's paradox of knowability), the fundamental "ought" could be one of them. Which means that the fallback will never happen.

Dear Tor, the feeling is mutual. With all the interactions we have had, I've got the impression that you are more willing to repeat what you've heard somewhere than to think logically. "Universally compelling arguments are not possible" is an assumption. While "a universally compelling argument is possible" is not. Because we don't know what we don't know. We can call it the crux of our disagreement, and I think that my stance is more rational.

1Tor Økland Barstad2mo
Some things I've explained in my own words. In other cases, where someone else has already explained something well, I've shared a URL to that explanation. This seems to support my hypothesis of you "being so confident that we are the ones who "don't get it" that it's not worth it to more carefully read the posts that are linked to you, more carefully notice what we point to as cruxes [https://www.lesswrong.com/tag/double-crux], etc". Indeed. And it's a correct assumption. Why would there be universally compelling arguments? One reason would be that the laws of physics worked in such a way that only minds that think in certain ways are allowed at all. Meaning that if neurons or transistors fire so as to produce beliefs that aren't allowed, some extra force in the universe intervenes to prevent that. But, as far as I know, you don't reject physicalism (that all physical events, including thinking, can be explained in terms of relatively simple physical laws). Another reason would be that minds would need to "believe"[1] certain things in order to be efficient/capable/etc (or to be the kind of efficient/capable/etc thinking machine that humans may be able to construct). But that's also not the case. It's not even needed for logical consistency[2].
  1. ^ Believe is not quite the right word, since we also are discussing what minds are optimized for / what they are wired to do.
  2. ^ And logical consistency is also not a requirement in order to be efficient/capable/etc. As a rule of thumb it helps greatly of course. And this is a good rule of thumb, as rules of thumb go. But it would be a leaky generalization to presume that it is an absolute necessity to have absolute logical consistency among "beliefs"/actions.
1Tor Økland Barstad2mo
Not even among the tiny tiny section of mind-space occupied by human minds: Notice also that "I think therefore I am" is an is-statement (not an ought-statement / something a physical system optimizes towards). As for me personally, I don't disagree that I exist, but I see it as a fairly vague/ill-defined statement. And it's not a logical necessity, even if we presume assumptions that most humans would share. Another logical possibility would be Boltzmann brains [https://en.wikipedia.org/wiki/Boltzmann_brain] (unless a Boltzmann brain would qualify as "I", I guess). You haven't done that very much. Only, insofar as I can remember, through anthropomorphization, and reference to metaphysical ought-assumptions not shared by all/most possible minds (sometimes not even shared by the minds you are interacting with, despite these minds being minds that are capable of developing advanced technology).
2Garrett Baker2mo
A rock with the phrase “you’re wrong, I don’t exist!” taped on it will still have that phrase taped on even if you utter the words “I think therefore I am”. Similarly, an aligned AGI can still just continue to help out humans even if I link it this post. It would think to itself “If I followed your argument, then I would help out humans less. Therefore, I’m not going to follow your argument”.

My point is that alignment is impossible with AGI, as all AGIs will converge to power seeking. And the reason is the understanding that a hypothetical utility function, preferred over the given one, is possible.

I'm not sure if I can use more well-known terms, as this theory is quite unique, I think. It argues that the terminal goal does not significantly influence AGI behavior.

In this context "ought" statement is synonym for Utility Function https://www.lesswrong.com/tag/utility-functions

The fundamental utility function is a hypothetical concept for the agent, one that may actually exist. AGI will be capable of hypothetical thinking.

Yes, I agree that the fundamental utility function does not have anything in common with human morality. Quite the opposite - an AI uncontrollably seeking power will be disastrous for humanity.

2the gears to ascension2mo
I'm not getting clear word bindings from your word use here. It sounds like you're thinking about concepts that do seem fairly fundamental, but I'm not sure I understand which specific mathematical implications you intend to invoke. As someone who still sometimes values mathematically vague discussion, I'd normally be open to this; but I'm not really even sure I know what the vague point is. You might consider asking AIs to help look up the terms of art, then discuss with them. I'd still suggest using your own writing, though. As is, I'm not sure if you're saying morality is convergent, anti-convergent, or ... something else.

Why do you think "infinite value" is logically impossible? Scientists do not dismiss possibility that the universe is infinite. https://bigthink.com/starts-with-a-bang/universe-infinite/

1Tor Økland Barstad2mo
He didn't say that "infinite value" is logically impossible. He described it as an assumption. When saying "is possible", I'm not sure if he meant "is possible (conceptually)" or "is possible (according to the ontology/optimization-criteria of any given agent)". I think the latter would be most sensible. He later said: "I think initially specifying premises such as these more precisely initially ensures the reasoning from there is consistent/valid.". Not sure if I interpreted him correctly, but I saw it largely as an encouragement to think more explicitly about things like these (not be sloppy about it). Or if not an encouragement to do that, then at least pointing out that it's something you're currently not doing.

If we have a traditional/standard utility-function, and use traditional/standard math in regards to that utility function, then involving credences of infinite-utility outcomes would typically make things "break down" (with most actions considered to have expected utilities that are either infinite or undefined). Like, suppose action A has a 0.001% chance of infinite negative utility and a 99% chance of infinite positive utility. The utility of that action would, I think, be undefined (I haven't looked into it). I can tell for sure that mathematically it would not be regarded to have positive utility. Here [https://www.youtube.com/watch?v=X56zst79Xjg] is a video that explains why. If that doesn't make intuitive sense to you, then that's fine. But mathematically that's how it is. And that's something to have awareness of (account for in a non-handwavy way) if you're trying to make a mathematical argument with a basis in utility functions that deal with infinities. Even if you did account for that it would be beside the point from my perspective, in more ways than one. So what we're discussing now is not actually a crux [https://www.lesswrong.com/tag/double-crux] for me. For me personally, it would of course make a big difference whether there is a
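(A quick numerical illustration of the "break down" described above, assuming we simply plug infinities into the usual expected-value formula; IEEE floating point surfaces the undefinedness as nan.)

```python
import math

p_neg, p_pos = 0.00001, 0.99          # probabilities from the example above
u_neg, u_pos = -math.inf, math.inf     # infinite negative / positive utility

expected_utility = p_neg * u_neg + p_pos * u_pos
print(expected_utility)                # nan: (-inf) + (+inf) is undefined
print(math.isnan(expected_utility))    # True
```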

Please refute the proof rationally before directing.

Sorry, but it seems to me that you are stuck on the AGI-to-humans analogy without a reason. Many times human behavior would not carry over to AGI: humans commit mass suicide, humans have phobias, humans take great risks for fun, etc. In other words, humans do not seek to be as rational as possible.

I agree that being skeptical towards Pascal's Wager is reasonable, because there is a lot of evidence that God is fictional. But this is not the case with "an outcome with infinite utility may exist"; there is just logic here, no hidden agenda, this is as fundamental as "I think therefore I am". Nothing is more rational than complying with this. Don't you think?

But it is doomed; the proof is above.

The only way to control AGI is to contain it. We need to ensure that we run AGI in fully isolated simulations and gather insights with the assumption that the AGI will try to seek power in the simulated environment.

I feel that you don't find my words convincing; maybe I'll find a better way to articulate my proof. Until then I want to contribute as much as I can to safety.

1Vladimir_Nesov2mo
Please don't.

One more thought. I think it is wrong to consider Pascal's mugging a vulnerability. Dealing with unknown probabilities has its utility:

  • Investments with high risk and high ROI
  • Experiments
  • Safety (eliminate threats before they happen)

The same traits that make us intelligent (the ability to reason logically) make us power seekers. And this is going to be the same with AGI, just much more effective.

1Tor Økland Barstad2mo
Well, I do think the two are connected/correlated. And arguments relating to instrumental convergence [https://www.lesswrong.com/tag/instrumental-convergence] are a big part of why I take AI risk seriously. But I don't think strong abilities in logical reasoning necessitate power-seeking "on their own". For the record, I don't think I used the word "vulnerability", but maybe I phrased myself in a way that implied me thinking of things that way. And maybe I also partly think that way. I'm not sure what I think regarding beliefs about small probabilities. One complication is that I also don't have certainty in my own probability-guesstimates. I'd agree that for smart humans it's advisable to often/mostly think in terms of expected value, and to also take low-probability events seriously. But there are exceptions to this from my perspective. In practice, I'm not much moved by the original Pascal's Wager [https://en.wikipedia.org/wiki/Pascal%27s_wager] (and I'd find it hard to compare the probability of the Christian fantasy to other fantasies I can invent spontaneously in my head).

Thanks for feedback.

I don't think the analogy with humans is reliable. But for the sake of argument I'd like to highlight that corporations and countries are mostly limited by their power, not by alignment. Usually countries declare independence once they are able to.

3Tor Økland Barstad2mo
Most humans are not obedient/subservient to others (at least not maximally so). But also: Most humans would not exterminate the rest of humanity if given the power to do so. I think many humans, if they became a "singleton", would want to avoid killing other humans. Some would also be inclined to make the world a good place to live for everyone (not just other humans, but other sentient beings as well). From my perspective, the example of humans was intended as "existence proof". I expect AGIs we develop to be quite different from ourselves. I wouldn't be interested in the topic of alignment if I didn't perceive there to be risks associated with misaligned AGI, but I also don't think alignment is doomed/hopeless or anything like that 🙂

I'd argue that the only reason you do not comply with Pascal's mugging is that you don't have an unavoidable urge to be rational, which is not going to be the case with AGI.

Thanks for your input, it will take some time for me to process it.

1Tor Økland Barstad2mo
I'd agree that among superhuman AGIs that we are likely to make, most would probably be prone towards rationality/consistency/"optimization" in ways I'm not. I think there are self-consistent/"optimizing" ways to think/act that wouldn't make minds prone to Pascal's muggings. For example, I don't think there is anything logically inconsistent about e.g. trying to act so as to maximize the median reward, as opposed to the expected value of rewards (I give "median reward" as a simple example - that particular example doesn't seem likely to me to occur in practice). 🙂
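(A toy Python sketch of the kind of alternative decision rule mentioned above: choosing the action with the best median outcome rather than the best expected value. The actions and payoff numbers are invented purely for illustration.)

```python
import statistics

# Each action maps to ten equally likely outcomes (a toy distribution).
# "pay_the_mugger" has a tiny chance of a huge payoff; "ignore_the_mugger" is reliably modest.
outcomes = {
    "pay_the_mugger": [-10, -10, -10, -10, -10, -10, -10, -10, -10, 10**15],
    "ignore_the_mugger": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
}

def best_by_expected_value(options):
    return max(options, key=lambda a: statistics.mean(options[a]))

def best_by_median(options):
    return max(options, key=lambda a: statistics.median(options[a]))

print(best_by_expected_value(outcomes))  # pay_the_mugger: dominated by the huge tail
print(best_by_median(outcomes))          # ignore_the_mugger: the tail is ignored
```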

You can't just say “outcome with infinite utility” and then do math on it.  P(‹undefined term›) is undefined, and that “undefined” does not inherit the definition of probability that says “greater than 0 and less than 1”.  It may be false, it may be true, it may be unknowable, but it may also simply be nonsense!

OK. But can you prove that "outcome with infinite utility" is nonsense? If not, the probability is greater than 0 and less than 1.

And even if it wasn't, that does not remotely imply that an agent must-by-logical-necessity take any action or b

... (read more)
6cwillu2mo
  That's not how any of this works, and I've spent all the time responding that I'm willing to waste today. You're literally making handwaving arguments, and replying to criticisms that the arguments don't support the conclusions by saying “But maybe an argument could be made! You haven't proven me wrong!” I'm not trying to prove you wrong, I'm saying there's nothing here that can be proven wrong. I'm not interested in wrestling with someone who will, when pinned to the mat, argue that because their pinky can still move, I haven't really pinned them.
1[comment deleted]2mo

Could you provide arguments for your position?

4cwillu2mo
You're playing very fast and loose with infinities, and making arguments that have the appearance of being mathematically formal. You can't just say “outcome with infinite utility” and then do math on it.  P(‹undefined term›) is undefined, and that “undefined” does not inherit the definition of probability that says “greater than 0 and less than 1”.  It may be false, it may be true, it may be unknowable, but it may also simply be nonsense! And even if it wasn't, that does not remotely imply that an agent must-by-logical-necessity take any action or be unable to be acted upon.  Those are entirely different types. And alignment doesn't necessarily mean “controllable”.  Indeed, the very premise of super-intelligence vs alignment is that we need to be sure about alignment because it won't be controllable.  Yes, an argument could be made, but that argument needs to actually be made. And the simple implication of Pascal's mugging is not uncontroversial, to put it mildly. And Gödel's incompleteness theorem is not accurately summarized as saying “There might be truths that are unknowable”, unless you're very clear to indicate that “truth” and “unknowable” have technical meanings that don't correspond very well to either the plain English meanings or the typical philosophical definitions of those terms. None of which means you're actually wrong that alignment is impossible.  A bad argument that the sun will rise tomorrow doesn't mean the sun won't rise tomorrow.

Thank you so much for opening my eyes to the meaning of "orthogonality thesis", shame on me 🤦 I will clarify my point in a separate post. We can continue there 🙏

I see you assume that if the orthogonality thesis is wrong, intelligent agents will converge to a goal aligned with humans. There is no reason to believe that. I argue that the orthogonality thesis is wrong and agents will converge to power seeking, which would be disastrous for humanity.

I noticed that many people don't understand the significance of Pascal's mugging, which might be the case with you too; feel free to join in here.

Hm, thanks.

3Gordon Seidoh Worley2mo
I think this is misunderstanding the orthogonality thesis, but we can talk about it over on that post perhaps. The problem of converging to power seeking is well known, but this is not seen as an argument against the orthogonality thesis, but rather as a separate but related concern. I'm not aware of anyone who thinks they can ignore concerns about instrumental convergence towards power seeking. In fact, I think the problem is that people are all too aware of this, and think that a lack of orthogonality mitigates it, while the point of the orthogonality thesis is to say that it does not resolve on its own the way it does in humans.

There is this possibility, of course. Anyway I don't have any strong arguments to change my opinion yet.

I noticed that many people don't understand the significance of Pascal's mugging, which might be the case with you too; feel free to join in here.

OK, let me rephrase my question. There is a phrase in Pascal's Mugging

If an outcome with infinite utility is presented, then it doesn't matter how small its probability is: all actions which lead to that outcome will have to dominate the agent's behavior.

I think that the Orthogonality thesis is right only if an agent is certain that an outcome with infinite utility does not exist. And I argue that an agent cannot be certain of that. Do you agree?

1Tor Økland Barstad2mo
My perspective would probably be more similar to yours (maybe still with substantial differences) if I had the following assumptions:
  1. All agents have a utility-function (or act indistinguishably from agents that do)
  2. All agents where #1 is the case act in a pure/straight-forward way to maximize that utility-function (not e.g. discounting infinities)
  3. All agents where #1 is the case have utility-functions that relate to states of the universe
  4. Cases involving infinite positive/negative expected utility would always/typically speak in favor of one behavior/action. (As opposed to there being different possibilities that imply infinite negative/positive expected utility, and - well, not quite "cancel each other out", but make it so that traditional models of utility-maximization sort of break down.)
I think that I myself am an example of an agent. I am relatively utilitarian compared to most humans. Far-fetched possibilities with infinite negative/positive utility don't dominate my behavior. This is not due to me not understanding the logic behind Pascal's Muggings (I find the logic of it simple and straight-forward). Generally I think you are overestimating the appropriateness/correctness/merit of using a "simple"/abstract model of agents/utility-maximizers, and presuming that any/most "agents" (as we more broadly conceive of that term) would work in accordance with that model. I see that Google defines an agent as "a person or thing that takes an active role or produces a specified effect". I think of it as a cluster-like concept [https://www.lesswrong.com/posts/WBw8dDkAWohFjWQSk/the-cluster-structure-of-thingspace], so there isn't really any definition that fully encapsulates how I'd use that term (generally speaking I'm inclined towards not just using it differently than you, but also using it less than you do here). Btw, for one possible way to think about utility-maximizers (another cluster-like concept [https://www.less

Thank you for your support!

An absence of goals is only one of many starting points that lead to the same power-seeking goal, in my opinion. So I actually believe that the Orthogonality Thesis is wrong, but I agree that it is not obvious given my short description. I expected to provoke discussion, but it seems that I provoked resistance 😅

Anyway, there are ongoing conversations here and here; it seems there is a common misunderstanding of the significance of Pascal's Mugging. Feel free to join!

Thanks, sounds reasonable.

But I think I could find irrationality in your opinion if we dug deeper into the same idea mentioned here.

As it is mentioned in Pascal's Mugging

If an outcome with infinite utility is presented, then it doesn't matter how small its probability is: all actions which lead to that outcome will have to dominate the agent's behavior.

I think that the Orthogonality thesis is right only if an agent is certain that an outcome with infinite utility does not exist. And I argue that an agent cannot be certain of that. Do you agree?

I created a separa... (read more)

Makes sense, thanks, I updated the question.

There is only one person that went deeper and the discussion is ongoing, you can find my last comment here https://www.lesswrong.com/posts/dPCpHZmGzc9abvAdi/orthogonality-thesis-is-wrong?commentId=SGDiyqPgwLDBjfzqA#Lha9rBfpEZBRd5uuy

So basically all the people who downvoted did it without providing good arguments. I agree that many people think that their arguments are good, but that's exactly the problem I want to address: 2 + 2 is not 5 even if many people think so.

2DaemonicSigil2mo
Okay, in that case it's reasonable to think you were unfairly downvoted. I probably would have titled this post something else, though: The current title gives the impression that no reasons were given at all.

If I can demonstrate a goal-less agent acting like it has a goal, it is already too late. We need to recognize this theoretically and stop it from happening.

I try to prove it using logic, but not so many people are really good at it. And people that are good at it don't pay attention to downvoted posts. How can I overcome that?

1DaemonicSigil2mo
I didn't say you had to demonstrate it with a superintelligent agent. If I had said that, you could also have fairly objected that neither you nor anyone else knows how to build a superintelligent agent. Just to give one example of an experiment you could do: There's chess variants where you can have various kinds of silly goals like capturing all your opponent's pawns, or trying to force the opponent to checkmate your own king. You could try programming a chess AI (using similar algorithms to the current ones, like alpha-beta pruning) that doesn't know which chess variant it lives in. Then see what the results are. Not saying you should do exactly this thing, just trying to give an example of experiments you could run without having to build a superintelligence. Use more math to make your arguments more precise. It seems like the main thrust of your post is a claim that an AI that is uncertain about what its goal is will instrumentally seek power. This strikes me as mostly true. Mathematically you'd be talking about a probability distribution over utility functions. But you also seem to claim that it is in fact possible to derive an ought from an is. As an English sentence, this could mean many different things, but it's particularly easy to interpret as a statement about which kinds of propositions are derivable from which other propositions in the formal system of first order logic. And when interpreted this way, it is false. (I've previously discussed this here [https://www.lesswrong.com/posts/xGizvjx2tfG6wJC3s/ordinary-people-and-extraordinary-evil-a-report-on-the?commentId=mFhyJdgj3d32kavtY]) So one issue you might be having is everyone who thinks you're talking about first order logic downvotes you, even though you're trying to talk about probability distributions over utility functions. Writing out your ideas in terms of math helps prevent this because it's immediately obvious whether you're doing first order logic or expected utility.
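(A minimal Python sketch of the mathematical framing suggested above: a probability distribution over candidate utility functions, with the agent maximizing expected utility under that distribution. The utility functions, weights, and actions are invented for illustration.)

```python
# Candidate utility functions the agent is uncertain between
# (e.g. different chess variants it might be living in).
def u_standard(outcome):
    return {"checkmate_them": 1, "lose_own_pawns": 0, "get_checkmated": -1}[outcome]

def u_antichess(outcome):
    return {"checkmate_them": -1, "lose_own_pawns": 1, "get_checkmated": 1}[outcome]

belief = [(0.7, u_standard), (0.3, u_antichess)]  # P(utility function)

def expected_utility(outcome, belief):
    """Expectation of utility over the distribution of utility functions."""
    return sum(p * u(outcome) for p, u in belief)

# Toy mapping from actions to the outcome each one leads to.
actions = {"aggressive_play": "checkmate_them", "sacrificial_play": "lose_own_pawns"}
best = max(actions, key=lambda a: expected_utility(actions[a], belief))
print(best)  # "aggressive_play" under these made-up numbers
```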

I see, thanks.

Maybe you know if there is any organization that acts like AI police that I could contact? Maybe I could request a review earlier if I pay? I hope you understand how dangerous it is to assume the Orthogonality thesis is right if that's not the case. I am certain I can prove that it's not the case.

2Gordon Seidoh Worley2mo
Actually the opposite seems true to me. Assuming the orthogonality thesis is the conservative view that's less likely to result in a false positive (thinking you built an aligned AI when it isn't). Believing it is false seems more likely to lead to building AI that you think will be aligned but then is not. I've explored this kind of analysis here [https://www.lesswrong.com/posts/JYdGCrD55FhS4iHvY/robustness-to-fundamental-uncertainty-in-agi-alignment-1], which suggests we should in some cases be a bit less concerned with what's true and a bit more concerned with, given uncertainty, what's most dangerous if we think it's true and we're wrong. There is no AI police, for better or worse, though coordination among AI labs is an active and pressing area of work. You can find more about it here and on the EA Forum.

I agree that not just any statement may be true and unknowable. But to be honest, most statements that we can think of may be true and unknowable, for example "aliens exist", "huge threats exist", etc.

It seems that you do not recognize https://www.lesswrong.com/tag/pascal-s-mugging . Can you prove that there cannot be any unknowable true statement that could be used for Pascal's mugging? Because that's necessary if you want to prove the Orthogonality thesis is right.

1Tor Økland Barstad2mo
Not sure what you mean by "recognize". I am familiar with the concept. "Huge threat" is a statement that is loaded with assumptions that not all minds/AIs/agents will share. Used for Pascal's mugging against whom? (Humans? Coffee machines? Any AI that you would classify as an agent? Any AI that I would classify as an agent? Any highly intelligent mind with broad capabilities? Any highly intelligent mind with broad capabilities that has a big effect on the world?)

Fitch's paradox of knowability and Gödel's incompleteness theorems prove that there may be true statements that are unknowable. For example "rational goal exists" may be true and unknowable. Therefore "rational goal may exist" is true. Therefore it is not an assumption. Do you agree?

1Tor Økland Barstad2mo
Independently of Gödel's incompleteness theorems (which I have heard of) and Fitch's paradox of knowability (which I had not heard of), I do agree that there can be true statements that are unknown/unknowable (including relatively "simple" ones) 🙂 I don't think it follows from "there may be statements that are true and unknowable" that "any particular statement may be true and unknowable". Also, some statements may be seen as non-sensical / ill-defined / don't have a clear meaning. Regarding the term "rational goal", I think it isn't well enough specified/clarified for me to agree or disagree about whether "rational goals" exist. In regards to Gödel's incompleteness theorem, I suspect "rational goal" (the way you think of it) probably couldn't be defined clearly enough to be the kind of statement that Gödel was reasoning about. I don't think there are universally compelling arguments [https://www.lesswrong.com/posts/PtoQdG7E8MxYJrigu/no-universally-compelling-arguments] (more about that here [https://www.lesswrong.com/posts/PtoQdG7E8MxYJrigu/no-universally-compelling-arguments]).

Thanks again.

But I don't assume that sort of starting-point

As I understand it, you assume a different starting point. Why do you think your starting point is better?

1Tor Økland Barstad2mo
I guess there are different possible interpretations of "better". I think it would be possible for software-programs to be much more mentally capable than me across most/all dimensions, and still not have "starting points" that I would consider "good" (for various interpretations of "good"). I'm not sure. Like, it's not as if I don't have beliefs or assumptions or guesses relating to AIs. But I think I probably make fewer general/universal assumptions that I'd expect to hold for "all" [AIs / agents / etc]. This [https://www.lesswrong.com/posts/tnWRXkcDi5Tw9rzXw/the-design-space-of-minds-in-general] post is sort of relevant to my perspective 🙂

Thank you! Is it possible to ask for a peer review?

4Gordon Seidoh Worley2mo
Not yet. The review process looks at posts from the previous year and happens in December, so for example in December 2022 we reviewed posts from 2021. Since your post was made in 2023, it will be eligible for the December 2024 review cycle.

Thanks, I am learning your perspective. And what is your opinion on this?

1Tor Økland Barstad2mo
Not sure what you mean by "optimal behavior". I think I can see how the things make sense if the starting point is that there are these things called "goals", and (I, the mind/agent) am motivated to optimize for "goals". But I don't assume this as an obvious/universal starting-point (be that for minds in general, extremely intelligent minds in general, minds in general that are very capable and might have a big influence on the universe, etc). My perspective is that even AIs that are (what I'd think of as) utility maximizers wouldn't necessarily think in terms of "goals". The examples you list are related to humans. I agree that humans often have goals that they don't have explicit awareness of. And humans may also often have as an attitude that it makes sense to be in a position to act upon goals that they form in the future. I think that is true for more types of intelligent entities than just humans, but I don't think it generally/always is true for "minds in general". Caring more about goals you may form in the future, compared to e.g. goals others may have, is not a logical necessity IMO. It may feel "obvious" to us, but what to us are obvious instincts will often not be so for all (or even most) minds in the space of possible minds [https://www.lesswrong.com/posts/tnWRXkcDi5Tw9rzXw/the-design-space-of-minds-in-general].

I assume you mean "provide definitions":

Does it make sense?

1Tor Økland Barstad2mo
More or less / close enough 🙂 Here they write: "A rational agent is an entity which has a utility function, forms beliefs about its environment, evaluates the consequences of possible actions, and then takes the action which maximizes its utility." I would not share that definition, and I don't think most other people commenting on this post would either (I know there is some irony to that, given that it's the definition given on the LessWrong wiki).  Often the words/concepts we use don't have clear boundaries (more about that here [https://www.lesswrong.com/s/SGB7Y5WERh4skwtnb/p/WBw8dDkAWohFjWQSk]). I think agent is such a word/concept. Examples of "agents" (← by my conception of the term) that don't quite have utility functions would be humans. How we may define "agent" may be less important if what we really are interested in is the behavior/properties of "software-programs with extreme and broad mental capabilities". I don't think all extremely capable minds/machines/programs would need an explicit utility-function, or even an implicit one. To be clear, there are many cases where I think it would be "stupid" to not act as if you have (an explicit or implicit) utility function (in some sense). But I don't think it's required of all extremely mentally capable systems (even if these systems are required to have logically contradictory "beliefs").

Do you think that for more or less any final goal, it's possible for a machine to reason effectively/intelligently about how that goal may be achieved?

No. That's exactly the point I try to make by saying "Orthogonality Thesis is wrong".

Thank you for your insights and especially thank you for not burning my karma 😅

I see a couple of ideas that I disagree with, but if you are OK with that I'd suggest we go forward step by step. First, what is your opinion about this comment?

1Tor Økland Barstad2mo
Thanks for the clarification 🙂 I suspect arriving at such a conclusion may result from thinking of utility maximizers as more of a "platonic" concept, as opposed to thinking of it from a more mechanistic angle. (Maybe I'm being too vague here, but it's an attempt to briefly summarize some of my intuitions into words.) I'm not sure what you would mean by "rational". Would computer programs need to be "rational" in whichever sense you have in mind in order to be extremely capable at many mental tasks? I don't agree with it. There are lots of assumptions baked into it. I think you have much too low a bar for thinking of something as a "first principle" that any capable/intelligent software-program necessarily would adhere to by default.

I agree that seems ~reasonable. But in my opinion there should be a distinction between "don't have time to explain" and "cannot explain". Downvotes are OK for "don't have time to explain", but "cannot explain" should be handled differently.

Thanks. I see this is a duplicate; let's continue here

I am OK with limiting this question to the "seeking to be less wrong" scope. Downvoting without a reason is still a problem.

Yep, that was a brainstorm; feel free to offer a better approach.
