All of Thoth Hermes's Comments + Replies

I think this gets it backward. There were lots of optimistic people who kept not understanding or integrating the arguments that you should be less optimistic, and people kept kind of sliding off really thinking about it, and finally some people were like "okay, time to just actually be extremely blunt/clear until they get it." (Seems plausible that a polarity then formed that made some people really react against the feeling of despair, but I think that was phase two.)

But humans are capable of thinking about what their values "actually should be" including whether or not they should be the values evolution selected for (either alone or in addition to other things). We're also capable of thinking about whether things like wireheading are actually good to do, even after trying it for a bit.

We don't simply commit to tricking our reward systems forever and only doing that, for example.

So that overall suggests a level of coherency and consistency in the "coherent extrapolated volition" sense. Evolution enabled CEV without us becoming completely orthogonal to evolution, for example.

A few points here:

* We don't have the option to "trick our reward systems forever" - e.g. because becoming a heroin addict tends to be self-destructive. If [guaranteed 80-year continuous heroin high followed by painless death] were an option, many people would take it (though not all).
* The divergence between stated preferences and revealed preferences is exactly what we'd expect to see in worlds where we're constantly "tricking our reward system" in small ways: our revealed preferences are not what we think they "actually should be".
* We tend to define large ways of tricking our reward systems as those that are highly self-destructive. It's not surprising that we tend to observe few of these, since evolution tends to frown upon highly self-destructive behaviour.
* Again, I'd ask for an example of a world plausibly reachable through an evolutionary process where we don't have the kind of coherence and consistency you're talking about. Being completely orthogonal to evolution clearly isn't plausible, since we wouldn't be here (I note that when I don't care about x, I sacrifice x to get what I do care about - I don't take actions that are neutral with respect to x). Being not-entirely-in-line with evolution, and not-entirely-in-line with our stated preferences, is exactly what we observe.

Unfortunately, I do not have a long response prepared to answer this (and perhaps it would be somewhat inappropriate, at this time), however I wanted to express the following:

They wear their despair on their sleeves? I am admittedly somewhat surprised by this. 

Gretta Duleba (2mo):
"Wearing your [feelings] on your sleeve" is an English idiom meaning openly showing your emotions. It is quite distinct from the idea of belief as attire from Eliezer's sequence post, in which he was suggesting that some people "wear" their (improper) beliefs to signal what team they are on. Nate and Eliezer openly show their despair about humanity's odds in the face of AI x-risk, not as a way of signaling what team they're on, but because despair reflects their true beliefs.

"Up to you" means you can select better criteria if you think that would be better.

I think if you ask people a question like, "Are you planning on going off and doing something / believing in something crazy?", they will, generally speaking, say "no" to that - and a "no" is more likely the more your question resembles that one, even if you didn't word it exactly that way. My guess is that the way you worded it at least heavily implied that you meant "crazy".

To be clear, they might have said "yes" (that they will go and do the thing you think is crazy), but I doubt they will internally represent that thing or wanting to ... (read more)

That seems to me like an extra reason to keep "throwing stones". To make clear the line between the kind of "crazy" that rationalists enjoy, and the kind of "crazy" that is the opposite. As an insurance, just in the (hopefully unlikely) case that tomorrow Unreal goes on a shooting spree, I would like to have it in writing - before it happened - that it happened because of ideas that the rationalist community disapproves of. Otherwise, the first thing everyone will do is: "see, another rationalist gone crazy". And whatever objection we make afterwards, it will be like "yeah, now that the person is a bad PR, everyone says 'comrades, this is not true rationalism, the true rationalism has never been tried', but previously no one saw a problem with them". (I am exaggerating a lot, of course. Also, this is not a comment on Unreal specifically, just on the value of calling out "crazy" memes, despite being perceived as "crazy" ourselves.)

Sometimes people want to go off and explore things that seem far away from their in-group, and perhaps are actively disfavored by their in-group. These people don't necessarily know what's going to happen when they do this, and they are very likely completely open to discovering that their in-group was right to distance itself from that thing, but also, maybe not. 

People don't usually go off exploring strange things because they stop caring about what's true. 

But if their in-group sees this as the person "no longer caring about truth-seeking," th... (read more)

This both seems like a totally reasonable concern to have, and also missing many of the concerning elements of the thing it's purportedly summarizing, like, you know, suddenly having totally nonsensical beliefs about the world.
Said Achmiz (2mo):
On the contrary, there are certain things which people do, in fact, only “explore” seriously if they’ve… “stopped” is a strong term, but, at least, stopped caring about the truth as much. (Or maybe reveal that they never cared as much as they said?) And then, reliably, after “exploring” those things, their level of caring about the truth drops even more. Precipitously, in fact. (The stuff being discussed in the OP is definitely, definitely an example of this. Like, very obviously so, to the point that it seems bizarre to me to say this sort of stuff and then go “I wonder why anyone would think I’m crazy”.)

Not sure how convinced I am by your statement. Perhaps you can add to it a bit more?

What "the math" appears to say is that if it's bad to believe things because someone told it to me "well" then there would have to be some other completely different set of criteria, that has nothing to do with what I think of it, for performing the updates. 

Don't you think that would introduce some fairly hefty problems?

You've never said what you mean by "told well", and indeed have declined to say from the outset, saying only that it is "entirely up to us to decide" what it means. If "told well" means "making sound arguments from verifiable evidence", well, of course one would generally update towards the thing told. If it just means "glibly told as by a used car salesman with ChatGPT whispering in his ear", then no.

I suppose I have two questions which naturally come to mind here:

  1. Given Nate's comment: "This change is in large part an enshrinement of the status quo. Malo’s been doing a fine job running MIRI day-to-day for many many years (including feats like acquiring a rural residence for all staff who wanted to avoid cities during COVID, and getting that venue running smoothly). In recent years, morale has been low and I, at least, haven’t seen many hopeful paths before us." (Bold emphases are mine). Do you see the first bold sentence as being in conflict with the s
... (read more)

2. Why do you see communications as being as decoupled (rather, either that it is inherently or that it should be) from research as you currently do? 

The things we need to communicate about right now are nowhere near the research frontier.

One common question we get from reporters, for example, is "why can't we just unplug a dangerous AI?" The answer to this is not particularly deep and does not require a researcher or even a research background to engage on.

We've developed a list of the couple-dozen most common questions we are asked by the press and ... (read more)

  1. Given Nate's comment: "This change is in large part an enshrinement of the status quo. Malo’s been doing a fine job running MIRI day-to-day for many many years (including feats like acquiring a rural residence for all staff who wanted to avoid cities during COVID, and getting that venue running smoothly). In recent years, morale has been low and I, at least, haven’t seen many hopeful paths before us." (Bold emphases are mine). Do you see the first bold sentence as being in conflict with the second, at all? If morale is low, why do you see that as an indica
... (read more)

Remember that what we decide "communicated well" to mean is up to us. So I could possibly increase my standard for that when you tell me "I bought a lottery ticket today" for example. I could consider this not communicated well if you are unable to show me proof (such as the ticket itself and a receipt). Likewise, lies and deceptions are usually things that buckle when placed under a high enough burden of proof. If you are unable to procure proof for me, I can consider that "communicated badly" and thus update in the other (correct) direction.

"Communicated... (read more)

If I'm not mistaken, if A = "Dagon has bought a lottery ticket this week" and B = Dagon states "A", then I still think p(A | B) > p(A), even if it's possible you're lying. I think the only way it would be less than the base rate p(A) is if, for some reason, I thought you would only say that if it was definitely not the case.

I think, in this context, you should give a lot more weight to the "possible" of my lies. If someone else had made a similar statement in rebuttal of your thesis, I'd model p(A|B) < p(A). In other contexts, B could even be uncorrelated with the truth, due to ignorance or misunderstanding. My primary objection isn't that this is always or even mostly wrong, just that it's a very simplistic model that's incorrect often enough, for reasons that are very instance-specific, that it's a poor heuristic.
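The disagreement here can be made concrete with a toy Bayes'-rule sketch. All the numbers below are illustrative assumptions (neither commenter stated any): whether hearing B moves you toward or away from A depends entirely on the likelihoods you assign to B under A and under ~A.

```python
def posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """Bayes' rule: p(A|B) from p(A), p(B|A), and p(B|~A)."""
    p_b = p_b_given_a * prior_a + p_b_given_not_a * (1 - prior_a)
    return p_b_given_a * prior_a / p_b

# A = "Dagon bought a lottery ticket"; B = "Dagon says he did".
prior = 0.05  # illustrative base rate

# If ticket-buyers are far more likely to say so, B raises p(A):
print(posterior(prior, 0.9, 0.01))   # ~0.83: update toward A

# If the statement is made mainly to rebut a thesis (likely deceptive),
# B can lower p(A) below the prior:
print(posterior(prior, 0.1, 0.5))    # ~0.01: update away from A
```

This is the sense in which "p(A|B) > p(A)" is not a theorem but a modeling choice: it holds only when B is assumed more likely under A than under ~A.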

To be deceptive - this is why you would ask me what your intentions are rather than just revealing them.

Your intent was ostensibly to show that you could argue for something badly on purpose and my rules would dictate that I update away from my own thesis.

I added an addendum for that, by the way.

The fact that you're being disingenuous is completely clear so that actually works the opposite way you intended.

What was the way I intended?

If you read it a second time and it makes more sense, then yes. 

If you understand the core claims being made, then unless you believe that whether or not something is "communicated well" has no relationship whatsoever with the underlying truth-values of the core claims, if it was communicated well, it should have updated you towards belief in the core claims by some non-zero amount. 

All of the vice-versas are straightforwardly true as well. 

let A = the statement "A" and p(A) be the probability... (read more)

Believing things because someone told them to you "well" makes you a sucker for con men.

I think I need a bit more formal definition of "communicated well" to understand the claim here. A simple example is "I have purchased a lottery ticket this week". It is (I hope) pretty unambiguous and hard to misinterpret, thus "clearly communicated". However, you (should) have a pretty high prior that responses to this will contain more deceptive statements than normal conversation. I DO think you're often correct for complex, highly-entangled propositions - it's easier to state and explain the truth than to make up consistent falsehoods. But that's a generalization, not a mathematical constant. It depends on the proposition and the communicator whether your core condition holds. Do I 'believe that whether or not something is "communicated well" has no relationship whatsoever with the underlying truth-values of the core claims'? Sometimes, with varying strength of that belief.

Here is an argument why you are completely correct: Was that clear? Update your beliefs in your proposition accordingly.

Thank you for not needlessly using LaTeX.

Does this prove too much? I think you have proved that reading the same argument multiple times should update you each time, which seems unlikely.

My take is that they (those who make such decisions of who runs what) are pretty well-informed about these issues well before they escalate to the point that complaints bubble up into posts / threads like these. 

I would have liked this whole matter to have unfolded differently. I don't think this is merely a sub-optimal way for these kinds of issues to be handled, I think this is a negative one. 

I have a number of ideological differences with Nate's MIRI and Nate himself that I can actually point to and articulate, and those disagreements could b... (read more)

It is possible to be a good communicator in some situations (e.g. when you write a blog post) and a bad communicator in other situations (e.g. when someone randomly interrupts you when you were worried about something else). For example, when I talk, I am much less coherent, and my English sucks.

If I remember the details correctly (sorry, I am not going to read the entire thread again), this seems like a mistake that could be avoided in the future. Someone tried to make Nate happy by telling Kurt to do something for him; Nate didn't ask for any help, but when an attempt was made regardless, he got angry at Kurt because he perceived the help as unreliable, worse than nothing. Kurt was hurt, because this wasn't his idea in the first place, and he tried to communicate a problem with his task, unsuccessfully.

I think a possible lesson is to just leave Nate alone, unless he explicitly asks for help, and even then think twice whether you chose the right person for the job. And maybe have someone managing your employees, whom they can ask for advice, if needed. (Yes, I would prefer if Nate just magically stopped being angry at people who are trying to help, even if he is not satisfied with the outcome. But it is not wise to rely on magic to happen.)

More meta: when people have a bad experience with Nate (or anyone else), don't ignore that fact. Stop and think about the situation. If people felt hurt interacting with me, I would want to know it, get some advice on how to prevent this outcome, and if the advice doesn't feel actionable then at least how to avoid such people and/or situations. It doesn't necessarily mean that someone is a bad person; sometimes people just rub each other the wrong way, but in such a case there should be an option to avoid each other.

In the sense that the Orthogonality Thesis considers goals to be static or immutable, I think it is trivial.

I've advocated a lot for trying to consider goals to be mutable, as well as value functions being definable on other value functions. And not just that it will be possible or a good idea to instantiate value functions this way, but also that they will probably become mutable over time anyway.

All of that makes the Orthogonality Thesis - not false, but a lot easier to grapple with, I'd say.

In large part because reality "bites back" when an AI has false beliefs, whereas it doesn't bite back when an AI has the wrong preferences.

I saw that 1a3orn replied to this piece of your comment and you replied to it already, but I wanted to note my response as well. 

I'm slightly confused because in one sense the loss function is the way that reality "bites back" (at least when the loss function is negative). Furthermore, if the loss function is not the way that reality bites back, then reality in fact does bite back, in the sense that e.g., if I have... (read more)

Getting a shape into the AI's preferences is different from getting it into the AI's predictive model.  MIRI is always in every instance talking about the first thing and not the second.

Why would we expect the first thing to be so hard compared to the second thing? If getting a model to understand preferences is not difficult, then the issue doesn't have to do with the complexity of values. Finding the target and acquiring the target should have the same or similar difficulty (from the start), if we can successfully ask the model to find the target fo... (read more)

Does "it's own perspective" mean it already has some existing values?

Why would we expect the first thing to be so hard compared to the second thing?

In large part because reality "bites back" when an AI has false beliefs, whereas it doesn't bite back when an AI has the wrong preferences. Deeply understanding human psychology (including our morality), astrophysics, biochemistry, economics, etc. requires reasoning well, and if you have a defect of reasoning that makes it hard for you to learn about one of those domains from the data, then it's likely that you'll have large defects of reasoning in other domains as well.

The same... (read more)

I have to agree that commentless downvoting is not a good way to combat infohazards. I'd probably take it a step further and argue that it's not a good way to combat anything, which is why it's not a good way to combat infohazards (and if you disagree that infohazards are ultimately as bad as they are called, then it would probably mean it's a bad thing to try and combat them). 

Its commentless nature means it violates "norm one" (and violates it much more as a super-downvote).  

It means something different than "push stuff that's not that, up", w... (read more)

It's a priori very unlikely that any post that's clearly made up of English sentences actually does not even try to communicate anything.

My point is that basically, you could have posted this as a comment on the post instead of it being rejected.

Whenever there is room to disagree about what mistakes have been made and how bad those mistakes are, it becomes more of a problem to apply an exclusion rule like this.

There's a lot of questions here: how far along the axis to apply the rule, which axis or axes are being considered, and how harsh the application of... (read more)

It was a mistake to reject this post. This seems like a case where both the rule that was applied is a mis-rule, as well as that it was applied inaccurately - which makes the rejection even harder to justify. It is also not easy to determine which "prior discussion" is being referred to by the rejection reasons.

It doesn't seem like the post was political at all, let alone "overly political" - which I think is perhaps kind of mind-killy to apply frequently as a reason for rejection. It also is about a subject that is fairly interesting to me, at least: Se... (read more)

I have read that post, and here are my thoughts:

1. The essence of the post is only in one section of seven: "Exploring Nuances: Case Studies of Evolving Portrayals".
2. Related work descriptions could be fit into one sentence for each work, to make reading the report easier.
3. Sentences about relevance of work, being a pivotal step in something, etc. don't carry much meaning.
4. The report doesn't state what to anticipate; what [social] observations one can predict better after reading it.

Overall, the post doesn't look like it tries to communicate anything, and it's adapted to a formal vague style.

You write in an extremely fuzzy way that I find hard to understand.

This does. This is a type of criticism that one can't easily translate into an update that can be made to one's practice. You're not saying if I always do this or just in this particular spot, nor are you saying whether it's due to my "writing" (i.e. style) or actually using confused concepts. Also, it's usually not the case that anyone is trying to be worse at communicating, that's why it sounds like a scold.

You have to be careful using blanket "this is false" or "I can't understand any of... (read more)

It is probably indeed a crux but I don't see the reason for needing to scold someone over it.

(That's against my commenting norms by the way, which I'll note that so far you, TAG, and Richard_Kennaway have violated, but I am not going to ban anyone over it. I still appreciate comments on my posts at all, and do hope that everyone still participates. In the olden days, it was Lumifer that used to come and do the same thing.)

I have an expectation that people not continually mix up critique with scorn, and please keep those things separate as much as possib... (read more)

What makes you say I'm scolding you?

First, a question: am I correct in understanding that when you write ~(A and ~A), the first ~ is a typo and you meant to write A and ~A (without the first ~)? Because ¬(A∧¬A) is a tautology and thus maps to true rather than to false.

I thought of this shortly before you posted this response, and I think that we are probably still okay (even though strictly speaking yes, there was a typo). 

Normally we have that ~A means: A --> False. However, remember that I am now saying that we can no longer say that "~A" means that "A is False.... (read more)

You write in an extremely fuzzy way that I find hard to understand. This is plausibly related to the motivation for your post; I think you are trying to justify why you don't need to make your thinking crisper? But if so I think you need to focus on it from the psychology/applications/communication angle rather than from the logic/math angle, as that is more likely to be a crux.

Well, to use your "real world" example, isn't that just the definition of a manifold (a space that when zoomed in far enough, looks flat)?

I think it satisfies the either-or-"mysterious third thing" formulae.

~(Earth flat and earth ~flat) --> Earth flat (zoomed in) or earth spherical (zoomed out) or (earth more flat-ish the more zoomed in and vice-versa).

First, a question: am I correct in understanding that when you write ~(A and ~A), the first ~ is a typo and you meant to write A and ~A (without the first ~)? Because ¬(A∧¬A) is a tautology and thus maps to true rather than to false.

Secondly, it seems to me that you'd have to severely mutilate your logic to make this nontrivial. For instance, rather than going by your relatively elaborate route, it seems like a far simpler route would be: Earth flat and earth ~flat => Earth flat => Earth flat or Earth spherical or .... Of course this sort of proof doesn't capture the paradoxicalness that you are aiming to capture. But in order for the proof to be invalid, you'd have to invalidate one of (A∧B)⟹A and A⟹(A∨B), both of which seem really fundamental to logic. I mean, what do the operators "and" and "or" even mean, if they don't validate this?
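For what it's worth, the two rules named here are one-liners in a proof assistant; a minimal Lean 4 sketch (the theorem names are mine):

```lean
-- Conjunction elimination: (A ∧ B) → A
theorem and_elim_left (A B : Prop) (h : A ∧ B) : A := h.1

-- Disjunction introduction: A → (A ∨ B)
theorem or_intro_left (A B : Prop) (h : A) : A ∨ B := Or.inl h
```

Invalidating either would mean rejecting the projection out of a conjunction or the injection into a disjunction, which is what "and" and "or" are usually taken to mean.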

So suppose I have ~(A and ~A). Rather than have this map to False, I say that "False" is an object that you always bounce off of; It causes you to reverse-course, in the following way:

~(A and ~A) --> False --> A or ~A or (some mysterious third thing). What is this mysterious third thing? Well, if you insist that A and ~A is possible, then it must be an admixture of these two things, but you'd need to show me what it is for that to be allowed. In other words:

~(A and ~A) --> A or ~A or (A and ~A).

What this statement means in semantic terms is: Suppo... (read more)

You should use a real-world example, as that would make the appropriate logical tools clearer.

I give only maybe a 50% chance that any of the following adequately addresses your concern. 

I think the succinct answer to your question is that it only matters if you happened to give me, e.g., a "2" (or anything else) and you asked me what it was and gave me your {0,1} set. In other words, you lose the ability to prove that 2 is 1 because it's not 0, but I'm not that worried about that.

It appears to be commonly said (see the last paragraph of "Mathematical Constructivism") that proof assistants like Agda or Coq rely on not assuming LoEM. I think th... (read more)

Also if you are getting into proof assistants then you should probably be aware that they use the term "truth-values" in a different way than the rest of math. In the rest of math, truth-values are an external thing based on the relationship between a statement and the domain the statement is talking about. However, in proof assistants, truth-values are often used to refer to the internal notion of subsets of a one-element set; P({1}). So while it is not equivalent to excluded middle to say that either a statement is true or a statement is false, it is equivalent to excluded middle to say that a subset of {1} is either Ø or is {1}. The logic being that if you have some proposition P, you could form the subset S={P|x in {1}}, and if P then by definition S is {1} and if not P then by definition S=Ø, so if P or not P then S={1} or S=Ø.
Not clear what you mean by "because proof assistants rely on the principle of 'you can't prove something false, only true'". There's a sense in which all math relies on this principle, and therefore proof assistants also rely on it. But proof assistants don't rely on it more than other math does. If you make inconsistent assumptions within Agda or Coq, you can prove False, just as in any other math. And they follow the principle of explosion.

But yes, proof assistants often reject the law of excluded middle. They generally do so in order to obtain two properties known as the disjunction property and the existence property. The disjunction property says that if P∨Q is provable, then either P is provable or Q is provable. The existence property says that if ∃x.P(x) is provable, then there is an expression t such that P(t) is provable. These properties reflect the fact that proofs in Agda and Coq carry a computational meaning, so one can "run" the proofs to obtain additional information about what was proven. One cannot have both the disjunction property and the law of excluded middle, because together they imply that the logic is complete, and consistent logics capable of expressing arithmetic cannot be complete by Gödel's incompleteness theorems.
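A related point, sketched in Lean 4 (which is constructive by default here, so no excluded middle is assumed): rejecting LEM as an axiom is weaker than asserting its negation, since the double negation of any instance of LEM is constructively provable.

```lean
-- ¬¬(P ∨ ¬P) is provable without the law of excluded middle:
-- assuming ¬(P ∨ ¬P) lets us derive ¬P, hence P ∨ ¬P, a contradiction.
theorem not_not_em (P : Prop) : ¬¬(P ∨ ¬P) :=
  fun h => h (Or.inr (fun p => h (Or.inl p)))
```

So a constructive system never proves any instance of LEM false; it merely declines to assert all instances up front, which is what preserves the disjunction property.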

I really don't think I can accept this objection. They are clearly considered both of these, most of the time.

I would really prefer that if you really want to find something to have a problem with, first it's got to be true, then it's got to be meaningful.

I created this self-referential market on Manifold to test the prediction that the truth-value of such a paradox is in fact 1/2. Very few participated, but I think it should always resolve to around 50%. Rather than say such paradoxes are meaningless, I think they can be meaningfully assigned a truth-value of 1/2.

How do you intend to use the law of excluded middle in a three-valued logic (0, 1/2, 1)? I thought the entire purpose was to make statements like "if X is not 0, it must be 1", which now becomes "if X is not 0, it must be 1/2 or 1". So you lose the ability to prove indirectly that something is 1.

what I think is "of course there are strong and weak beliefs!" but true and false is only defined relative to who is asking and why (in some cases), so you need to consider the context in which you're applying LoEM.

Like in my comment to Richard_Kennaway about probability, I am not just talking about beliefs, but about what is. Do we take it as an axiom or a theorem that A or ~A? Likewise for ~(A and ~A)? I admit to being confused about this. Also, does "A" mean the same thing as "A = True"? Does "~A" mean the same thing as "A = False"? If so, in what sense... (read more)

It's really hard to answer these sorts of questions universally because there are a bunch of ways of setting things up that are strictly speaking different but which yield the same results overall. For instance, some take ¬P to be a primitive notion, whereas I am more used to defining ¬P to mean P⟹⊥. However, pretty much always the inference rules or axioms for taking ¬P to be a primitive are set up in such a way that it is equivalent to P⟹⊥. If you define it that way, the law of noncontradiction, (P∧(P⟹⊥))⟹⊥, is pretty trivial, because it is just a special case of (P∧(P⟹Q))⟹Q; and if you don't have the rule (P∧(P⟹Q))⟹Q then it seems like your logic must be extremely limited (since it's like an internalized version of modus ponens, a fundamental rule of reasoning).

I have a bunch of experience dealing with logic that rejects the law of excluded middle, but while there are a bunch of people who also experiment with rejecting the law of noncontradiction, I haven't seen anything useful come of it - I think because it is quite fundamental to reasoning.

Statements like A=⊥ are kind of mixing up the semantic (or "meta") level with the syntactic (or "object") level. G2g
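The "special case of modus ponens" claim can be checked mechanically; a minimal Lean 4 sketch (theorem names are mine):

```lean
-- The internalized modus ponens schema: (P ∧ (P → Q)) → Q
theorem internal_mp (P Q : Prop) (h : P ∧ (P → Q)) : Q := h.2 h.1

-- With ¬P defined as P → False, noncontradiction is the Q := False case:
theorem noncontradiction (P : Prop) : ¬(P ∧ ¬P) :=
  fun h => h.2 h.1
```

The two proofs are literally the same term, which is the sense in which rejecting noncontradiction (under this definition of negation) means rejecting modus ponens itself.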
Are you aware that there are different logics with different axioms? PNC and LEM are both axioms in Aristotelian logic; LEM does not apply in fuzzy logic; and PNC does not apply in paraconsistent logic. No: negation is a function, false is a value. That's clear in most programming languages. That doesn't follow at all.

A succinct way of putting this would be to ask: If I were to swap the phrase "law of the excluded middle" in the piece for the phrase "principle of bivalence" how much would the meaning of it change as well as overall correctness?

Additionally, suppose I changed the phrases in just "the correct spots." Does the whole piece still retain any coherence?

Actually, here's something that may be helpful in understanding why the principle of bivalence is distinct from the law of excluded middle. As I understand you, one of the core points you are making is that you want to be able to entertain incompatible models. So let's say that you have two models M and W that are incompatible with each other. For simplicity, let's say both models share a language L in which they can express propositions, and assign truth values to statements in that language using functions v_M, v_W mapping statements to truth-values. (For instance, maybe M is a flat-earth approximation, and W is a spherical-earth approximation, so v_M(The earth is flat)=⊤ but v_W(The earth is flat)=⊥.)

Because these are just models, your points don't apply within the models; it might be fine for an approximation to say that everything is true or false, as long as we keep in mind that it's just an approximation and different approximations might lead to different results. As a result, all of the usual principles of logic like bivalence, noncontradiction, excluded middle, etc. apply within the models.

However, outside/between the models, there is a sense in which what you are saying applies. For instance we get an apparent contradiction/multiple truth values for v_M(The earth is flat)=⊤ vs v_W(The earth is flat)=⊥. But these truth values live in separate models, so they don't really interact, and therefore aren't really a contradiction. But you might want to have a combined model where they do interact. We can do this simply by using the 2^n-valued approach I mentioned in my prior comment. Define a shared model M×W by the truth-value function v_{M×W}(P) = v_M(P)v_W(P). So for instance v_{M×W}(The earth is flat) = ⊤⊥. Here you might interpret ⊤⊥ as meaning something along the lines of "true in practice but not in theory" or "true for small things but not for big things", and you might interpret ⊥⊤ as meaning something along the lines of "technically true but not in practice" or "true
A lot. At least I associate multi-valued logic with a different sphere of research than intuitionism. My impression is that a lot of people have tried to do interesting stuff with multi-valued logic to make it handle the sorts of things you mention, and they haven't made any real progress, so I would be inclined to say that it is a dead-end. Though arguably objections like "It introduces the concept of “actually true” and “actually false” independent of whether or not we’ve chosen to believe something." also apply to multi-valued logic so idk to what extent this is even the angle you would go on it.
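The product truth-value construction described above can be sketched in a few lines. This is only an illustration of the idea: pairs of booleans stand in for the four values ⊤⊤, ⊤⊥, ⊥⊤, ⊥⊥, and the model names are the hypothetical flat-earth/spherical-earth ones from the comment.

```python
# Each proposition gets one truth-value per model; connectives act
# componentwise on the pair (v_M(P), v_W(P)).

def v_and(x, y):
    return (x[0] and y[0], x[1] and y[1])

def v_or(x, y):
    return (x[0] or y[0], x[1] or y[1])

def v_not(x):
    return (not x[0], not x[1])

# "The earth is flat": true in the flat-earth model M, false in the
# spherical-earth model W -- the value written ⊤⊥ above.
flat = (True, False)

# The cross-model "contradiction" is no contradiction in the product:
# P ∧ ¬P is false in each component, not both-true-and-false.
print(v_and(flat, v_not(flat)))  # (False, False)
```

Note that noncontradiction still holds componentwise; the extra values only record that different models disagree.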

If there are propositions or axioms that imply each other fairly easily under common contextual assumptions, then I think it's reasonable to consider it not-quite-a-mistake to use the same name for such propositions.

One of the things I'm arguing is that I'm not convinced that imprecision is enough to render a work "false."

Are you convinced those mistakes are enough to render this piece false or incoherent?

That's a relevant question to the whole point of the post, too.

What equivalences do you have in mind when you say "imply each other"? It's certainly true that there is a scientific/logical conundrum about how to deal with imprecision. I know a lot about what happens when you tinker with the law of excluded middle, though, and I am not convinced this has any impact on your ability to deal with imprecision.

Indeed. (You don't need to link the main wiki entry, thanks.)

There's some subtlety though: either P is true or not-P is, and p(P) expresses the degree of belief that P is true. So I think probability merely implies that the LoEM might be unnecessary, but it itself pretty much assumes it.

It is sometimes, but not always, the case that p(P) = 0.5 resolves to P being "half-true" once observed. It can also mean that P resolves to true half the time, or just that we only know that it might be true with 0.5 certainty (the default meaning).
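The two non-"half-true" readings can be made concrete with a toy sketch (variable names made up for illustration):

```python
import random

random.seed(0)

# Frequency reading: p(P) = 0.5 because P resolves true on
# roughly half of many repeated trials.
trials = [random.random() < 0.5 for _ in range(100_000)]
frequency = sum(trials) / len(trials)   # ≈ 0.5

# Credence reading (the default): P is already definitely true or
# false — the world fixed it once — and 0.5 only describes our
# ignorance before observing.
P = random.random() < 0.5   # one fixed fact, unknown to us
credence = 0.5              # our belief about P

# In both readings, each individual resolution is fully true or fully
# false — which is why probability assumes bivalence rather than
# replacing it.
```

Note that in neither reading does anything ever take an intermediate truth value; only the belief is intermediate.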

The issue that I'm primarily talking about is not so much in the way that errors are handled, it's more about the way of deciding what constitutes an exception to a general rule, as Google defines the word "exception":

a person or thing that is excluded from a general statement or does not follow a rule.

In other words, does everything need a rule to be applied to it? Does every rule require some set of objects to which it applies that lie on one side of the rule rather than the other (namely, the smaller side)?

As soon as we step o... (read more)

Exceptions in programming aren't "exceptions to a rule", they are "potential problems".

Raemon's comment below indicates mostly what I meant by: 

It seems from talking to the mods here and reading a few of their comments on this topic that they tend to lean towards them being harmful on average and thus need to be pushed down a bit.

Furthermore, I think the mods' stance on this is based primarily on Yudkowsky's piece here. I think the relevant portion of that piece is this (emphases mine):

But into this garden comes a fool, and the level of discussion drops a little—or more than a little, if the fool is very prolific in their posting. 

... (read more)

Both views seem symmetric to me:

  1. They were downvoted because they were controversial (and I agree with it / like it).
  2. They were downvoted because they were low-quality (and I disagree with it / dislike it).

Because I can sympathize with both views here, I think we should consider remaining agnostic to which is actually the case.

It seems like the major crux here is whether we think that debates over claim and counter-claim (basically, other cruxes) are likely to be useful or likely to cause harm. It seems from talking to the mods here and reading a few of... (read more)

This is, as far as I can tell, totally false.  There is a very different claim one could make which at least more accurately represents my opinion, i.e. see this comment by John Wentworth (who is not a mod). Most of your comment seems to be an appeal to modest epistemology.  We can in fact do better than total agnosticism about whether some arguments are productive or not, and worth having more or less of on the margin.

It seems like a big part of this story is mainly about people who have relatively strict preferences kind of aggressively defending their territory and boundaries, and how when you have multiple people like this working together on relatively difficult tasks (like managing the logistics of travel), it creates an engine for lots of potential friction. 

Furthermore, when you add the status hierarchy of a typical organization, combined with the social norms that dictate how people's preferences and rights ought to be respected (and implicit agreements bei... (read more)

I think it might actually be better if you just went ahead with a rebuttal, piece by piece, starting with whatever seems most pressing and you have an answer for.

I don't know if it is all that advantageous to put together a long mega-rebuttal post that counters everything at once.

Then you don't have that demand nagging at you for a week while you write the perfect presentation of your side of the story.

I think it would be difficult to implement what you're asking for without needing to make the decision about whether investing time in this (or other) subjects is worth anyone's time on behalf of others.

If you notice in yourself that you have conflicting feelings about whether something is good for you to be doing, e.g., in the sense which you've described: that you feel pulled in by this, but have misgivings about it, then I recommend considering this situation to be that you have uncertainty about what you ought to be doing, as opposed to being more cert... (read more)

It seems plausible that there is no such thing as "correct" metaphilosophy, and humans are just making up random stuff based on our priors and environment and that's it and there is no "right way" to do philosophy, similar to how there are no "right preferences".

We can always fall back to "well, we do seem to know what we and other people are talking about fairly often" whenever we encounter the problem of whether-or-not a "correct" this-or-that actually exists. Likewise, we can also reach a point where we seem to agree that "everyone seems to agree that o... (read more)

If we permit that moral choices with very long-term time horizons can be made with the utmost well-meaning intentions and show evidence of admirable character traits, but nevertheless have difficult-to-see consequences with variable outcomes, then I think that limits us considerably in how much we can retrospectively judge specific individuals.

I agree with that principle, but how is that relevant here? The Manhattan Project's effects weren't on long timelines.

I wouldn't aim to debate you but I could help you prepare for it, if you want. I'm also looking for someone to help me write something about the Orthogonality Thesis and I know you've written about it as well. I think there are probably things we could both add to each other's standard set of arguments.

I think that I largely agree with this post. I think that it's also a fairly non-trivial problem. 

The strategy that makes the most sense to me now is that one should argue with people as if they meant what they said, even if you don't currently believe that they do. 

But not always - especially if you want to engage with them on the point of whether they are indeed acting in bad faith, and there comes a time when that becomes necessary. 

I think pushing back against the norm that it's wrong to ever assume bad faith is a good idea. I don't thin... (read more)

I think your view involves a bit of catastrophizing, or relying on broadly pessimistic predictions about the performance of others. 

Remember, the "exception throwing" behavior involves taking the entire space of outcomes and splitting it into two things: "Normal" and "Error." If we say this is what we ought to do in the general case, that's basically saying this binary property is inherent in the structure of the universe. 

But we know that there's no phenomenon that can be said to actually be an "error" in some absolute, metaphysical sense. This ... (read more)

I think it works in the specific context of programming because for a lot of functions (in the functional context, for simplicity), behaviours are essentially bimodal distributions. They are rather well behaved for some inputs, and completely misbehaving (according to specification) for others. In the former category you still don't have perfect performance; you could have quantisation/floating-point errors, for example, but it's a tightly clustered region of performing mostly to-spec. In the second, the results would almost never be just a little wrong; instead, you'd often just get unspecified behaviour or results that aren't even correlated to the correct one. Behaviours in between are quite rare.

If you were right, we'd all be hand-optimising assembly for perfect high performance in HPC. Ultimately, many people do minimal work to accomplish the task at hand, sometimes to its detriment. I believe that I'm not alone in this thinking, and you'd need quite a lot of evidence to convince others.

Look at the development of languages over the years, with newer languages (Rust and Julia, as examples) doing their best to leave less room for user errors and poor practices that impact both performance and security.

This is a good reply, because its objections are close to things I already expect will be cruxes. 

If you need a strong guarantee of correctness, then this is quite important. I'm not so sure that this is always the case in machine learning, since ML models by their nature can usually train around various deficiencies;

Yeah, I'm interested in why we need strong guarantees of correctness in some contexts but not others, especially if we have control over that aspect of the system we're building as well. If we have choice over how much the system itself c... (read more)

This would make sense if we were all great programmers who are perfect. In practice, that's not the case, and from what I hear from others, not even in FAANG. Because of that, it's probably much better to give errors that will show up loudly in testing than to rely on programmers to always handle silent failures or warnings on their own.

Sometimes years or decades. See the replicability crisis in psychology that's decades in the making, and the Schön scandal that wasted years of some researchers' time, just for the first two examples off the top of my head.

You have a cartoon picture of experimental science. LK-99 is quite unique in that it is easy to synthesise, and the properties being tested are easy to test. When you're on the cutting edge, this is almost by necessity not the case, because most of the time the low-hanging fruit has been picked clean. Thus, experiments are messy and difficult, and when you fail to replicate, it is sometimes very hard to tell whether it is due to your failure to reproduce the conditions (e.g. synthesise a pure-enough material, have a clean-enough experiment, etc.). For a dark matter example, see DAMA/LIBRA. Few in the dark matter community take their result too seriously, but the attempts to reproduce this experiment have taken years and cost who knows how much, probably tens of millions. I am a dark matter experimentalist.

This is not a good analogy. The issue is not replication, but that results get built on; when a result gets overturned, a whole bunch of scaffolding collapses. Ruling out parameter space is good when you're searching for things like dark matter. Having to keep looking at old theories is quite different; what are you searching for?
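The "loud in testing" versus "silent failure" contrast above can be sketched with a toy example (the function names are made up for illustration):

```python
import math

# Silent failure: an out-of-domain input quietly propagates as NaN,
# and may only surface much later, far from the actual bug.
def silent_sqrt(x):
    return math.sqrt(x) if x >= 0 else float("nan")

# Loud failure: the same out-of-domain input raises immediately,
# so a test suite catches it at the call site.
def loud_sqrt(x):
    if x < 0:
        raise ValueError(f"sqrt of negative number: {x}")
    return math.sqrt(x)

total = silent_sqrt(-1.0) + 10.0   # NaN, no warning; poisons downstream sums
print(total)                       # nan

try:
    loud_sqrt(-1.0)
except ValueError as e:
    print("caught:", e)
```

The silent version keeps running with corrupted state; the loud version fails at the exact point the specification was violated, which is why it tends to surface during testing rather than in production.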

Let's try and address the thing(s) you've highlighted several times across each of my comments. Hopefully, this is a crux that we can use to try and make progress on:

"Wanting to be happy" is pretty much equivalent to being a utility-maximizer, and agents that are not utility-maximizers will probably update themselves to be utility-maximizers for consistency. 

because they are compatible with goals that are more likely to shift.

it makes more sense to swap the labels "instrumental" and "terminal" such that things like self-preservation, obtaining resourc

... (read more)

Apologies if this reply does not respond to all of your points.

I would observe that partial observability makes answering this question extraordinarily difficult. We lack interpretability tools that would give us the ability to know, with any degree of certainty, whether a set of behaviors is an expression of an instrumental or terminal goal.

I would posit that perhaps that points to the distinction itself being both too hard as well as too sharp to justify the terminology used in the way that they currently are. An agent could just tell you whether a spec... (read more)

No, instead I'm trying to point out the contradiction inherent in your position... On the one hand, you say things like this, which would be read as "changing an instrumental goal in order to better achieve a terminal goal". And on the other you say... Even in your "we would be happier if we chose to pursue different goals" example above, you are structurally talking about adjusting instrumental goals to pursue the terminal goal of personal happiness.

AIs can be designed to reason in many ways... but some approaches to reasoning are brittle and potentially unsuccessful. In order to achieve a terminal goal, when the goal cannot be achieved in a single step, an intelligence must adopt instrumental goals. Failing to do so results in ineffective pursuit of terminal goals. It's just structurally how things work (based on everything I know about the instrumental convergence theory. That's my citation.)

But... per the Orthogonality Thesis, it is entirely possible to have goalless agents. So I don't want you to interpret my narrow focus on what I perceive as self-contradictory in your explanation as the totality of my belief system. It's just not especially relevant to discuss goalless systems in the context of defining instrumental vs terminal goal systems.

The reason I originally raised the Orthogonality Thesis was to rebut the assertion that an agent would be self-aware of its own goals. But per the Orthogonality Thesis, it is possible to have a system with goals that is not particularly intelligent. From that I intuit that it seems reasonable that if the system isn't particularly intelligent, it might also not be particularly capable at explaining its own goals.

Some people might argue that the system can be stupid and yet "know its goals"... but given partial observability principles, I would be very skeptical that we would be able to know its goals, given partial observability, limited intelligence, and limited ability to communicate "what it knows."

My understanding of the difference between a "terminal" and "instrumental" goal is that a terminal goal is something we want, because we just want it. Like wanting to be happy.

One question that comes to mind is, how would you define this difference in terms of properties of utility functions? How does the utility function itself "know" whether a goal is terminal or instrumental?

One potential answer - though I don't want to assume just yet that this is what anyone believes - is that the utility function is not even defined on instrumental goals, in other wo... (read more)
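That "potential answer" can be made concrete with a toy sketch in which utility is defined only on terminal goals, and instrumental goals receive purely derived value from what they lead to (all names and the world model are hypothetical):

```python
# Toy sketch: the utility function "knows" a goal is terminal simply
# by being defined on it. Instrumental goals have no entry; their
# value is computed from the outcomes they enable.

TERMINAL_UTILITY = {"happy": 1.0}   # utility is defined only here

# Hypothetical world model: what each instrumental action leads to.
LEADS_TO = {
    "earn_wage": ["happy"],     # earning money enables happiness
    "collect_stamps": [],       # serves no terminal goal in this model
}

def value(goal):
    if goal in TERMINAL_UTILITY:          # terminal: valued for itself
        return TERMINAL_UTILITY[goal]
    # Instrumental: valued only via the outcomes it brings about.
    return sum(value(outcome) for outcome in LEADS_TO.get(goal, []))

print(value("earn_wage"))       # 1.0 — derived value, not intrinsic
print(value("collect_stamps"))  # 0 — enables no terminal goal
```

On this picture, swapping one instrumental goal for another (blue paperclips for green, a different job for the same wage) leaves the utility function itself untouched, which is one way of cashing out why instrumental goals seem more readily changeable.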

I would observe that partial observability makes answering this question extraordinarily difficult. We lack interpretability tools that would give us the ability to know, with any degree of certainty, whether a set of behaviors is an expression of an instrumental or terminal goal.

Likewise, I would observe that the Orthogonality Thesis proposes the possibility of an agent with a very well-defined goal but limited intelligence: it is possible for an agent to have a very well-defined goal but not be intelligent enough to be able to explain its own goals. (Which I think adds an additional layer of difficulty to answering your question.) But the inability to observe or differentiate instrumental vs terminal goals is very clearly part of the theoretical space proposed by experts with way more experience than I. (And I cannot find any faults in the theories, nor have I found anyone making reasonable arguments against these theories.)

There are several assumptions buried in your anecdote, and the answer depends on whether or not you accept the implicit assumptions. If the green paperclip maximizer would accept a shift to blue paperclips, the argument could also be made that the green paperclip maximizer has been producing green paperclips by accident, and that it doesn't care about the color. Green is just an instrumental goal. It serves some purpose but is incidental to its terminal goal. And, when faced with a competing paperclip maximizer, it would adjust its instrumental goal of pursuing green in favor of blue in order to serve its terminal goal of maximizing paperclips (of any color.)

I don't consent to the assumption implied in the anecdote that a terminal goal is changeable. I do my best to avoid anthropomorphizing the artificial intelligence. To me, that's what it looks like you're doing. If it acquiesces at all, I would argue that color is instrumental vs terminal. I would argue this is a definitional error: it's not a 'green paperclip maximizer' but

"Being unlikely to conflict with other values" is not at the core of what characterizes the difference between instrumental and terminal values.

I think this might be an interesting discussion, but what I was trying to aim at was the idea that "terminal" values are the ones most unlikely to be changed (once they are obtained), because they are compatible with goals that are more likely to shift. For example, "being a utility-maximizer" should be considered a terminal value rather than an instrumental one. This is one potential property of terminal values; I... (read more)

Humans don't think "I'm not happy today, and I can't see a way to be happy, so I'll give up the goal of wanting to be happy."

I agree that they don't usually think this. If they tried to, they would brush up against trouble because that would essentially lead to a contradiction. "Wanting to be happy" is pretty much equivalent to being a utility-maximizer, and agents that are not utility-maximizers will probably update themselves to be utility-maximizers for consistency. 

So "being happy" or "being a utility-maximizer" will probably end up being a termin... (read more)

First, thank you for the reply.

My understanding of the difference between a "terminal" and an "instrumental" goal is that a terminal goal is something we want because we just want it, like wanting to be happy. Whereas an instrumental goal is instrumental to achieving a terminal goal. For instance, I want to get a job and earn a decent wage because the things I want to do that make me happy cost money, and earning a decent wage allows me to spend more money on the things that make me happy.

I think the topic of goals that conflict is an orthogonal conversation, and I would suggest that when you start talking about conflicting goals you're drifting into the domain of "goal coherence." E.g., if I want to learn about nutrition, mobile app design, and physical exercise... it might appear that I have incoherent goals. Or, it might be that I have a set of coherent instrumental goals to build a health application on mobile devices that addresses nutritional and exercise planning. (Now, building a mobile app may be a terminal goal... or it may itself be an instrumental goal serving some other terminal goal.)

Whereas if I want to collect stamps and make paperclips, there may be zero coherence between the goals, be they instrumental or terminal. (Or, maybe there is coherence that we cannot see.) E.g., maybe the selection of an incoherent goal is deceptive behavior to distract from the instrumental goals that support a terminal goal that is adversarial: I want to maximize paperclips, but I assist everyone with their taxes so that I can take over all finances in the world. Assisting people with their taxes appears to be incoherent with maximizing paperclips, until you project far enough out that you realize that taking control of a large section of the financial industry serves the purpose of maximizing paperclips.

An AI that has a goal, just because that's what it wants (that's what it's been trained to want, even if humans provided improper goal definitions to it) would
"Being unlikely to conflict with other values" is not at the core of what characterizes the difference between instrumental and terminal values. Putting aside the fact that agents are embedded in the environment, and that values which reference the agent's internals are usually not meaningfully different from values which reference things external to the agent... can you describe what kinds of values that reference the external world are best satisfied by those same values being changed?