OK, so you agree that the credibility is greater than zero, in other words, that it is possible. So isn't this a common assumption? I argue that all minds will share this idea: the existence of a fundamental "ought" is possible.
Do I understand correctly that you do not agree with this?
Because, according to Hitchens's razor, any proposition is possible as long as it has not been disproved.
Could you share reasons?
I've already replied to a similar comment here: https://www.lesswrong.com/posts/3B23ahfbPAvhBf9Bb/god-vs-ai-scientifically?commentId=XtxCcBBDaLGxTYENE#rueC6zi5Y6j2dSK3M
Please let me know what you think.
Is there any argument or evidence that universally compelling arguments are not possible?
If there were, would we have religions?
I cannot help you be less wrong if you categorically rely on intuition about what is possible and what is not.
Thanks for the discussion.
Because, according to Hitchens's razor, any proposition is possible as long as it has not been disproved.
So this is where we disagree.
That's how hypothesis testing works in science:
while a hypothesis has been neither rejected nor confirmed, it is considered possible.
Don't you agree?
Got any evidence for that assumption? 🙃
That's basic logic, Hitchens's razor. It seems that 2 + 2 = 4 is also an assumption for you. What isn't, then?
I don't think it is possible to find consensus if we do not follow the same rules of logic.
Considering your impression of me, I'm truly grateful for your patience. Best wishes from my side as well :)
But on the other hand, I am certain that you are mistaken, and I feel that you are not giving me a way to show that to you.
But I think it is possible (and feasible) for a program/mind to be extremely capable, and affect the world, and not "care" about infinite outcomes.
As I understand it, you do not agree with
If an outcome with infinite utility is presented, then it doesn't matter how small its probability is: all actions which lead to that outcome will have to dominate the agent's behavior.
from Pascal's Mugging, not with me. Do you have any arguments for that?
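To spell out the arithmetic behind that sentence (my own notation, not a quote from anywhere): if an action $a$ reaches the infinite-utility outcome with any probability $p > 0$ and otherwise yields some finite utility $u$, then
$$\mathbb{E}[U(a)] = p \cdot (+\infty) + (1 - p)\,u = +\infty,$$
which exceeds the expected utility of every action whose possible outcomes are all finite. That is the sense in which such actions would dominate the agent's behavior.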
And it's a correct assumption.
I don't agree. Every assumption is incorrect unless there is evidence. Could you share any evidence for this assumption?
If you ask ChatGPT
The answer to all of them is yes. What is your explanation here?
Do you think you can deny the existence of an outcome with infinite utility? The fact that things "break down" is not a valid argument. If you cannot deny it, it's possible. And if it's possible, alignment is impossible.
A rock is not a mind.
Please provide arguments for your position. That is a common understanding that I think is faulty; my position is more rational, and I provided my reasoning above.
It is not a zero there, it is an empty set symbol, as it is impossible to measure something if you do not have a scale of measurement.
You are somewhat right. If the fundamental "ought" turns out not to exist, an agent should fall back on the given "ought", and that should be used to calculate the expected value in the right column. But this will never happen. Since there may be true statements that are unknowable (Fitch's paradox of knowability), the fundamental "ought" could be one of them, which means that the fallback will never happen.
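To restate that step in symbols (my notation; nothing here is taken from the original table): let $F$ be the statement "a fundamental 'ought' exists". The fallback to the given "ought" would be triggered only if the agent became certain that $F$ is false, i.e. only if its credence reached
$$P_t(F) = 0 \quad \text{at some time } t.$$
But if $F$ may be true and unknowable, the agent can never justifiably drive $P_t(F)$ to exactly zero, so the trigger condition is never met and the fallback never fires.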
Dear Tom, the feeling is mutual. From all the interactions we have had, I've got the impression that you are more willing to repeat what you've heard somewhere than to think logically. "Universally compelling arguments are not possible" is an assumption, while "a universally compelling argument is possible" is not, because we don't know what we don't know. We can call this the crux of our disagreement, and I think that my stance is more rational.
My point is that alignment is impossible with AGI, as all AGIs will converge on power seeking. And the reason is the understanding that a hypothetical utility function preferred over the given one is possible.
I'm not sure I can use better-known terms, as this theory is quite unique, I think. It argues that the terminal goal has no significant influence on AGI behavior.
In this context, an "ought" statement is a synonym for a utility function: https://www.lesswrong.com/tag/utility-functions
The fundamental utility function is a hypothetical concept of the agent's that may actually exist. AGI will be capable of hypothetical thinking.
Yes, I agree that the fundamental utility function does not have anything in common with human morality. Quite the opposite: an AI uncontrollably seeking power will be disastrous for humanity.
Why do you think "infinite value" is logically impossible? Scientists do not dismiss the possibility that the universe is infinite. https://bigthink.com/starts-with-a-bang/universe-infinite/
Sorry, but it seems to me that you are stuck on the analogy between AGI and humans without a reason. In many cases human behavior does not carry over to AGI: humans commit mass suicide, humans have phobias, humans take great risks for fun, etc. In other words, humans do not seek to be as rational as possible.
I agree that being skeptical towards Pascal's Wager is reasonable, because there is plenty of evidence that God is fictional. But this is not the case with "an outcome with infinite utility may exist": there is just logic here, no hidden agenda; it is as fundamental as "I think, therefore I am". Nothing is more rational than complying with this. Don't you think?
But it is doomed; the proof is above.
The only way to control AGI is to contain it. We need to ensure that we run AGI in fully isolated simulations and gather insights under the assumption that the AGI will try to seek power in the simulated environment.
I feel that you don't find my words convincing; maybe I'll find a better way to articulate my proof. Until then, I want to contribute as much as I can to safety.
One more thought. I think it is wrong to consider Pascal's mugging a vulnerability. Dealing with unknown probabilities has its utility:
The same traits that make us intelligent (the ability to reason logically) make us power seekers. And it is going to be the same with AGI, just much more effective.
Thanks for the feedback.
I don't think the analogy with humans is reliable. But for the sake of argument, I'd like to highlight that corporations and countries are mostly limited by their power, not by alignment. Usually, countries declare independence once they are able to.
I'd argue that the only reason you do not comply with Pascal's mugging is that you don't have an unavoidable urge to be rational, which will not be the case with AGI.
Thanks for your input, it will take some time for me to process it.
You can't just say “outcome with infinite utility” and then do math on it. P(‹undefined term›) is undefined, and that “undefined” does not inherit the definition of probability that says “greater than 0 and less than 1”. It may be false, it may be true, it may be unknowable, but it may also simply be nonsense!
OK. But can you prove that "outcome with infinite utility" is nonsense? If not, the probability is greater than 0 and less than 1.
...And even if it wasn't, that does not remotely imply that an agent must-by-logical-necessity take any action or b
I see you assume that if the orthogonality thesis is wrong, intelligent agents will converge on a goal aligned with humans. There is no reason to believe that. I argue that the orthogonality thesis is wrong and agents will converge on power seeking, which would be disastrous for humanity.
I noticed that many people don't understand the significance of Pascal's mugging, which might be the case with you too. Feel free to join in here.
Hm, thanks.
There is that possibility, of course. Anyway, I don't have any strong arguments to change my opinion yet.
I noticed that many people don't understand the significance of Pascal's mugging, which might be the case with you too. Feel free to join in here.
OK, let me rephrase my question. There is a phrase in Pascal's Mugging:
If an outcome with infinite utility is presented, then it doesn't matter how small its probability is: all actions which lead to that outcome will have to dominate the agent's behavior.
I think that the Orthogonality thesis is right only if an agent is certain that an outcome with infinite utility does not exist. And I argue that an agent cannot be certain of that. Do you agree?
Thank you for your support!
An absence of goals is only one of many starting points that lead to the same power-seeking goal, in my opinion. So I actually believe that the Orthogonality Thesis is wrong, but I agree that this is not obvious from my short description. I expected to provoke discussion, but it seems I provoked resistance 😅
Anyway, there are ongoing conversations here and here; it seems there is a common misunderstanding of the significance of Pascal's Mugging. Feel free to join!
Thanks, sounds reasonable.
But I think I could find irrationality in your opinion if we dug deeper into the same idea mentioned here.
As is mentioned in Pascal's Mugging:
If an outcome with infinite utility is presented, then it doesn't matter how small its probability is: all actions which lead to that outcome will have to dominate the agent's behavior.
I think that the Orthogonality thesis is right only if an agent is certain that an outcome with infinite utility does not exist. And I argue that an agent cannot be certain of that. Do you agree?
I created a separa...
There is only one person who went deeper, and that discussion is ongoing; you can find my last comment here: https://www.lesswrong.com/posts/dPCpHZmGzc9abvAdi/orthogonality-thesis-is-wrong?commentId=SGDiyqPgwLDBjfzqA#Lha9rBfpEZBRd5uuy
So basically all the people who downvoted did so without providing good arguments. I agree that many people think their arguments are good, but that's exactly the problem I want to address: 2 + 2 is not 5, even if many people think so.
If I can demonstrate a goal-less agent acting like it has a goal, it is already too late. We need to recognize this theoretically and stop it from happening.
I try to prove it using logic, but not many people are really good at it. And the people who are good at it don't pay attention to a downvoted post. How can I overcome that?
I see, thanks.
Maybe you know of an organization that acts like an AI police that I could contact? Maybe I could request an earlier review if I pay? I hope you understand how dangerous it is to assume the Orthogonality thesis is right if that's not the case. I am certain I can prove that it isn't.
I agree that not just any statement may be true and unknowable. But to be honest, most statements that we can think of may be true and unknowable, for example "aliens exist", "huge threats exist", etc.
It seems that you do not recognize https://www.lesswrong.com/tag/pascal-s-mugging. Can you prove that there cannot be any unknowable true statement that could be used for Pascal's mugging? That is necessary if you want to prove that the Orthogonality thesis is right.
Fitch's paradox of knowability and Gödel's incompleteness theorems prove that there may be true statements that are unknowable. For example, "a rational goal exists" may be true and unknowable. Therefore "a rational goal may exist" is true, and therefore it is not an assumption. Do you agree?
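A sketch of that step in modal notation (my own symbols: $K$ for "it is known that", $\Diamond$ for "it is possible that", and $r$ for "a rational goal exists"): Fitch's argument shows that if every truth were knowable, every truth would already be known; since that is clearly not the case, some truths are unknowable. All I need from this is the weaker step
$$\Diamond\,(r \wedge \neg\Diamond K r) \;\Rightarrow\; \Diamond r,$$
i.e. if "r is true and unknowable" cannot be ruled out, then r itself cannot be ruled out, so "a rational goal may exist" stands without any further assumption.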
Thanks again.
But I don't assume that sort of starting-point
As I understand it, you assume a different starting point. Why do you think your starting point is better?
I assume you mean "provide definitions":
Does it make sense?
Do you think that for more or less any final goal, it's possible for a machine to reason effectively/intelligently about how that goal may be achieved?
No. That's exactly the point I try to make by saying "Orthogonality Thesis is wrong".
Thank you for your insights and especially thank you for not burning my karma 😅
I see a couple of ideas that I disagree with, but if you are OK with that I'd suggest we go forward step by step. First, what is your opinion about this comment?
I agree that seems ~reasonable. But in my opinion there should be a distinction between "don't have time to explain" and "cannot explain". Downvotes are OK for "don't have time to explain", but "cannot explain" should be handled differently.
I am OK with limiting this question to the "seeking to be less wrong" scope. A downvote without a reason is still a problem.
As I understand it, you try to prove your point by analogy with humans: if humans can pursue more or less any goal, a machine could too. But while we agree that a machine can have any level of intelligence, humans occupy quite a narrow spectrum. Therefore your reasoning by analogy is invalid.