All of Vladimir_Nesov's Comments + Replies

Morality is Scary

The implication of doing everything that AI could do at once is unfortunate. The urgent objective of AI alignment is prevention of AI risk, where a minimal solution is to take away access to unrestricted compute from all humans in a corrigible way that would allow eventual desirable use of it. All other applications of AI could follow much later through corrigibility of this urgent application.

1Samuel Shadrach4dI wondered this too [https://www.lesswrong.com/posts/qmN2H8gjxEvJsKbzL/open-question-math-proofs-that-would-enable-you-become-a#iMgH8ksxyHdtuWnXP] . Curious - do you think the technical part of such solutions should be worked on in the open or not? Lots of people downvoted my post so I wonder if there's some concern people have with this line of thinking.
Morality is Scary

insufficient for a subculture trying to be precise and accurate and converge on truth

The tradeoff is with verbosity and difficulty of communication; it's not always a straightforward Pareto improvement. So in this case I fully agree with dropping "everyone" or replacing it with a more accurate qualifier. But I disagree with a general principle that would discount ease for a person who is trained and talented in relevant ways. New habits of thought that become intuitive are improvements, checklists and other deliberative rituals that slow down thinking n... (read more)

Vanessa Kosoy's Shortform

Goodharting is about what happens in situations where "good" is undefined or uncertain or contentious, but still gets used for optimization. There are situations where it's better-defined, and situations where it's ill-defined, and an anti-goodharting agent strives to optimize only within scope of where it's better-defined. I took "lovecraftian" as a proxy for situations where it's ill-defined, and base distribution of quantilization that's intended to oppose goodharting acts as a quantitative description of where it's taken as better-defined, so for this ... (read more)

2Vanessa Kosoy5dThe proxy utility in debate is perfectly well-defined: it is the ruling of the human judge. For the base distribution I also made some concrete proposals (which certainly might be improvable but are not obviously bad). As to corrigibility, I think it's an ill-posed concept [https://www.lesswrong.com/posts/dPmmuaz9szk26BkmD/vanessa-kosoy-s-shortform?commentId=5Rxgkzqr8XsBwcEQB#romyHyuhq6nPH5uJb] . I'm not sure how you imagine corrigibility in this case: AQD is a series of discrete "transactions" (debates), and nothing prevents you from modifying the AI between one and another. Even inside a debate, there is no incentive in the outer loop to resist modifications, whereas daemons would be impeded by quantilization. The "out of scope" case is also dodged by quantilization, if I understand what you mean by "out of scope". Why is it strictly more general? I don't see it. It seems false, since for extreme value of the quantilization parameter we get optimization which is deterministic and hence cannot be equivalent to quantilization with different proxy and distribution. The reason to pick the quantilization parameter is because it's hard to determine, as opposed to the proxy and base distribution[1] [#fn-4ftZGjn8jZiSGQpqd-1] for which there are concrete proposals with more-or-less clear motivation. I don't understand which "main issues" you think this doesn't address. Can you describe a concrete attack vector? -------------------------------------------------------------------------------- 1. If the base distribution is a bounded simplicity prior then it will have some parameters, and this is truly a weakness of the protocol. Still, I suspect that safety is less sensitive to these parameters and it is more tractable to determine them by connecting our ultimate theories of AI with brain science (i.e. looking for parameters which would mimic the computational bounds of human cognition). ↩︎ [#fnref-4ftZGjn8jZiSGQpqd-1]
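
Since the thread turns on what the proxy utility, base distribution, and quantilization parameter each do, here is a minimal sketch of a generic q-quantilizer (my illustration of the general technique under discussion, not of Vanessa's AQD protocol; the action set, probabilities, and utilities are made up for the example):

```python
import random

def quantilize(actions, base_prob, proxy_utility, q, rng=random):
    """Sample from the base distribution conditioned on landing in the top q
    of base-probability mass as ranked by the proxy utility."""
    ranked = sorted(actions, key=proxy_utility, reverse=True)
    top, mass = [], 0.0
    for a in ranked:
        top.append(a)
        mass += base_prob(a)
        if mass >= q:  # stop once the top-q slice of the base distribution is covered
            break
    weights = [base_prob(a) for a in top]
    return rng.choices(top, weights=weights, k=1)[0]

# q = 1 recovers plain sampling from the base distribution (imitation);
# q -> 0 approaches hard optimization of the proxy, reintroducing Goodhart risk.
actions = ["write report", "flatter judge", "do nothing"]
base = {"write report": 0.6, "flatter judge": 0.1, "do nothing": 0.3}
proxy = {"write report": 0.8, "flatter judge": 0.9, "do nothing": 0.1}
print(quantilize(actions, base.get, proxy.get, q=0.5))  # usually "write report"
```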
Morality is Scary

I'm leaning towards the more ambitious version of the project of AI alignment being about corrigible anti-goodharting, with the AI optimizing towards good trajectories within scope of relatively well-understood values, preventing overoptimized weird/controversial situations, even at the cost of astronomical waste. Absence of x-risks, including AI risks, is generally good. Within this environment, the civilization might be able to eventually work out more about values, expanding the scope of their definition and thus allowing stronger optimization. Here corrigibility is in part about continually picking up the values and their implied scope from the predictions of how they would've been worked out some time in the future.

2Wei_Dai2dPlease say more about this? What are some examples of "relatively well-understood values", and what kind of AI do you have in mind that can potentially safely optimize "towards good trajectories within scope" of these values?
-1Ratios5dThe fact that AI alignment research is 99% about control, and 1% (maybe less?) about metaethics (In the context of how do we even aggregate the utility function of all humanity) hints at what is really going on, and that's enough said.
Question/Issue with the 5/10 Problem

The core of the 5-and-10 problem is not specific to a particular formalization or agent algorithm. It's fundamentally the question of what's going on with the agent's reasoning inside the 5 world. In the 10 world, the agent's reasoning proceeds in a standard way: perhaps the agent considers both the 5 and 10 worlds, evaluates them, and decides to go with 10. But what might the agent be thinking in the 5 world, so that it ends up making that decision? And if the agent in the 10 world is considering the 5 world, what does the agent in the 10 world think about the thi... (read more)
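
To fix ideas, here is a toy rendering of the structure being asked about (mine, and deliberately sidestepping the proof-based subtleties that make the real problem hard): the agent evaluates both the 5 world and the 10 world, so there is a definite fact about its reasoning "inside" the world that its own decision ends up making impossible.

```python
def utility(action: int) -> int:
    # The 5-and-10 environment: taking the $10 bill is worth 10, the $5 bill is worth 5.
    return action

def agent() -> int:
    # Evaluate both candidate worlds, including the one that will never happen.
    evaluations = {a: utility(a) for a in (5, 10)}
    # The comment's question is what this evaluation of the 5 world amounts to for an
    # agent reasoning about itself, given that the next line renders the 5 world impossible.
    return max(evaluations, key=evaluations.get)

assert agent() == 10
```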

1acgt8dYeah sure, like there's a logical counterfactual strand of the argument but that's not the topic I'm really addressing here - I find those a lot less convincing so my issue here is around the use of Löbian uncertainty specifically. There's a step very specific to this species of argument that proving that □P will make P true when P is about the outcomes of the bets, because you will act based on the proof of P. This is invoking Löb's Theorem in a manner which is very different from the standard counterpossible principle of explosion stuff. And I'm really wanting to discuss that step specifically because I don't think it's valid, and if the above argument is still representative of at least a strand of relevant argument then I'd be grateful for some clarification on how (3.) is supposed to be provable by the agent, or how my subsequent points are invalid.
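
For reference, the theorem being invoked (the statement below is standard; the gloss on how it enters the bet argument is my paraphrase of the usual proof-based-agent move, not necessarily the exact argument acgt is questioning):

```latex
% Löb's theorem, for a theory T satisfying the Hilbert-Bernays derivability conditions:
\text{if } T \vdash \Box P \rightarrow P \text{, then } T \vdash P;
\qquad \text{internalized: } T \vdash \Box(\Box P \rightarrow P) \rightarrow \Box P.
```

The contested step is that an agent which acts on proofs about its own bets makes "if P is provable then P" hold by construction for the relevant P, after which Löb's theorem licenses concluding that P is provable, and hence (by that same construction) that P holds.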
Vanessa Kosoy's Shortform

I'm not sure this attacks goodharting directly enough. Optimizing a system for proxy utility moves its state out-of-distribution where proxy utility generalizes training utility incorrectly. This probably holds for debate optimized towards intended objectives as much as for more concrete framings with state and utility.

Dithering across the border of goodharting (of scope of a proxy utility) with quantilization is actionable, but isn't about defining the border or formulating legible strategies for what to do about optimization when approaching the border. ... (read more)

4Vanessa Kosoy13dI don't understand what you're saying here. For debate, goodharting means producing an answer which can be defended successfully in front of the judge, even in the face of an opponent pointing out all the flaws, but which is nevertheless bad. My assumption here is: it's harder to produce such an answer than producing a genuinely good (and defensible) answer. If this assumption holds, then there is a range of quantilization parameters which yields good answers. For the question of "what is a good plan to solve AI risk", the assumption seems solid enough since we're not worried about coming across such deceptive plans on our own, and it's hard to imagine humans producing one even on purpose. To the extent our search for plans relies mostly on our ability to evaluate arguments and find counterarguments, it seems like the difference between the former and the latter is not great anyway. This argument is especially strong if we use human debaters as baseline distribution, although in this case we are vulnerable to same competitiveness problem as amplified-imitation, namely that reliably predicting rich outputs might be infeasible. For the question of "should we continue changing the quantilization parameter", the assumption still holds because the debater arguing to stop at the given point can win by presenting a plan to solve AI risk which is superior to continuing to change the parameter.
From language to ethics by automated reasoning

Please don't do this. You've already posted this comment two weeks ago.

A Defense of Functional Decision Theory

Well, if something’s not actually happening, then I’m not actually seeing it happen.

Not actually, you seeing it happen isn't real, but this unreality of seeing it happen proceeds in a specific way. It's not indeterminate greyness, and not arbitrary.

if something never happens, and I never observe it, then I never respond to it, either. My response to it is nothing.

If your response (that never happens) could be 0 or 1, it couldn't be nothing. If it's 0 (despite never having been observed to be 0), the claim that it's 1 is false, and the claim that it'... (read more)
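
A minimal illustration of where the disagreement sits (my framing, not either commenter's exact claim): a pure function that is never called still has a determinate value on each input, fixed by its definition rather than by any evaluation.

```python
def response(observation: str) -> int:
    # A pure function: its value on every input is fixed by this definition alone.
    return 1 if observation == "green sky" else 0

# This program never evaluates response("green sky"). Even so, the claim
# response("green sky") == 0 is false and response("green sky") == 1 is true,
# as facts about the definition rather than about any particular run.
print(response("blue sky"))  # 0
```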

2Said Achmiz20dWhat do you mean, “proceeds in a specific way”? It doesn’t proceed at all. Because it’s not happening, and isn’t real. This seems wrong to me. If my response never happens, then it’s nothing; it’s the claim that it’s 1 that doesn’t type check, as does the claim that it’s 0. It can’t be either 1 or 0, because it doesn’t happen. (In algorithm terms, if you like: what is the return value of a function that is never called? Nothing, because it’s never called and thus never returns anything. Will that function return 0? No. Will it return 1? Also no.) (Reference for readers who may not be familiar with the relevant terminology, as I was not: Pure Functions and Total Functions [http://nebupookins.github.io/2015/08/05/pure-functions-and-total-functions.html] .) Please elaborate! Indeed, but the question of what f(“green sky”) actually returns, certainly is meaningless if f(“green sky”) is never evaluated. I’m afraid I don’t see what this has to do with anything… I strongly disagree that this matches ordinary usage! I am not sure what you mean by this? (Or by the rest of your last paragraph, for that matter…)
A Defense of Functional Decision Theory

If I am an agent, and something is happening to me

The point is that you don't know that something is happening to you just because you are seeing it happen. Seeing it happen is what takes place when you-as-an-algorithm is evaluated on the corresponding observations. A response to seeing it happen is well-defined even if the algorithm is never actually evaluated on those observations. When we spell out what happens inside the algorithm, what we see is that the algorithm is "seeing it happen". This is so even if we don't actually look. (See also.)

So for e... (read more)

4Said Achmiz22dIf the sky were to turn green, I would certainly behave as if it had indeed turned green; I would not say “this is impossible and isn’t happening”. So I am not sure what this gets us, as far as explaining anything… My preferences “factor out” the world I find myself in, as far as I can tell. By “agents share preferences” are you suggesting a scenario where, if the sky were to turn green, I would immediately stop caring about anything whatsoever that happened in that world, because my preferences were somehow defined to be “about” the world where the sky were still blue? This seems pathological. I don’t think it makes any sense to say that I “care about the blue-sky world”; I care about what happens in whatever world I am actually in, and the sky changing color wouldn’t affect that. Well, if something’s not actually happening, then I’m not actually seeing it happen. I don’t think your first paragraph makes sense, sorry. Does it? I’m not sure that it does, actually… if something never happens, and I never observe it, then I never respond to it, either. My response to it is nothing. You can ask: “but if it did happen, what would be your response?”—and that’s a reasonable question. But any answer to that question would indeed have to take as given that the event in question were in fact actually happening (otherwise the question is meaningless). Well… that is a very unusual use of “impossible”, yes. Might I suggest using a different word? You seem to be saying: “yes, certain things that can happen are impossible”, which is very much counter to all ordinary usage. I think using a word in this way can only lead to confusion… (The last paragraph of your comment doesn’t elucidate much, but perhaps that is because of the aforesaid odd word usage.)
A Defense of Functional Decision Theory

By "impossible" I mean not happening in actuality (which might be an ensemble, in which case I'm not counting what happens with particularly low probabilities), taking into account the policy that the agent actually follows. So the agent may have no way of knowing if something is impossible (and often won't before actually making a decision). This actuality might take place outside the thought experiment, for example in Transparent Newcomb that directly presents you with two full boxes (that is, both boxes being full is part of the description of the thoug... (read more)

2TAG14dThat's pretty non-standard. I think you need to answer that.
4Said Achmiz22dSorry, do you mean that you don’t count low-probability events as impossible, or that you don’t count them as possible (a.k.a. “happening in actuality”)? This is an example of a statement that seems nonsensical to me. If I am an agent, and something is happening to me, that seems to me to be real by definition. (As Eliezer [https://www.lesswrong.com/posts/eLHCWi8sotQT6CmTX/sensual-experience] put it [https://www.lesswrong.com/posts/YYLmZFEGKsjCKQZut/timeless-control]: “Whatever is, is real.”) And anything that is real, must (again, by definition) be possible… If what is happening to me is actually happening in a simulation… well, so what? The whole universe could be a simulation, right? How does that change anything? So the idea of “this thing that is happening to you right now is actually impossible” seems to me to be incoherent. I… have considerable difficulty parsing what you’re saying in the second paragraph of your comment. (I followed the link, and a couple more from it, and was not enlightened, unfortunately.)
A Defense of Functional Decision Theory

UDT is about policies, not individual decisions. A thought experiment typically describes an individual decision taken in some situation. A policy specifies what decisions are to be taken in all situations. Some of these situations are impossible, but the policy is still defined for them following its type signature, and predictors can take a look at what exactly happens in the impossible situations. Furthermore, choice of a policy influences which situations are impossible, so there is no constant fact about which of them are impossible.

The general case o... (read more)
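
A toy sketch of the type-signature point (my illustration, not a formalization of UDT): the policy is a total function over situations, so a predictor can evaluate it on situations that, given this very policy, never actually arise.

```python
from typing import Callable

Policy = Callable[[str], str]

def policy(situation: str) -> str:
    # Defined for every situation the signature allows, including ones that the
    # choice of this very policy renders impossible in actuality.
    responses = {"sky is blue": "carry on", "sky is green": "investigate"}
    return responses.get(situation, "carry on")

def predictor(p: Policy) -> str:
    # The predictor inspects the policy's behaviour in a never-realized situation.
    return p("sky is green")

assert predictor(policy) == "investigate"
```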

4Said Achmiz22dI get the feeling, reading this, that you are using the word “impossible” in an unusual way. Is this the case? That is, is “impossible” a term of art in decision theory discussions, with a meaning different than its ordinary one? If not, then I confess that can’t make sense of much of what you say…
Improving on the Karma System

A fixed set of tags turns this into multiple-choice questions where all answers are inaccurate, and most answers are irrelevant. Write-in tags could be similar to voting on replies to a comment that evaluate it in some respect. Different people pay attention to different aspects, so the flexibility to vote on multiple aspects at once or differently from overall vote is unnecessary.

3jimrandomh23dThere's a limited sense in which this is true - the adjective voting on Slashdot wouldn't benefit from allowing people to pick multiple adjectives, for example. But being able to express a mismatch between overall upvote/downvote and true/false or agree/disagree may be important; part of the goal is to nudge people's votes away from being based on agreement, and towards being based on argument quality.
Chris_Leong's Shortform

I don't see how there is anything here other than equivocation of different meanings of "world". Counterfactuals-as-worlds is not even a particularly convincing way of making sense of what counterfactuals are.

2Chris_Leong23dIf you're interpreting me as defending something along the lines of David Lewis, then that's actually not what I'm doing.
A Defense of Functional Decision Theory

The statement of Bomb is bad at being legible outside the FDT/UDT paradigm; there it's instead actively misleading, so it's a terrible example, inducing confusion and conflict rather than clarity, to show someone who is not familiar with the paradigm. The reason Left is reasonable is that the scenario being described is, depending on the chosen policy, almost completely not real, a figment of the predictor's imagination.

Unless you've read a lot of FDT/UDT discussion, a natural reading of a thought experiment is to include the premise "the described situation is real". And so people... (read more)

4Said Achmiz22dWhat does it mean to say that “the described scenario is real” is not a premise of the thought experiment…? (What could the thought experiment even be about if the described scenario is not supposed to be real?)
Chris_Leong's Shortform

What correspondence? Counterfactuals-as-worlds have all laws of physics broken in them, including quantum mechanics.

-1TAG23dSays who?
2Chris_Leong23dI'm not claiming that there's a perfect correspondence between counterfactuals as different worlds in a multiverse vs. decision counterfactuals. Although maybe that's enough to undermine any coincidence right there?
tivelen's Shortform

It's useful, but likely not valuable-in-itself for people to strive to be primarily morality optimizers. Thus the optimally moral thing could be to care about the optimally moral thing substantially less than sustainably feasible.

Transcript for Geoff Anders and Anna Salamon's Oct. 23 conversation

tension between information and adherence-to-norms

This mostly holds for information pertaining to norms. Math doesn't need controversial norms, so there is no tension there. Beliefs/claims that influence transmission of norms are themselves targeted by norms, to ensure systematic transmission. This is what anti-epistemology is: it's doing valuable work in instilling norms, including norms for perpetuating anti-epistemology.

So the soft taboo on politics is about not getting into a subject matter that norms care about. And the same holds for interpersonal stuff.

Speaking of Stag Hunts

epistemic hygiene

This is an example of the illusion of transparency issue. Many salient interpretations of what this means (informed by the popular posts on the topic, that are actually not explicitly on this topic) motivate actions that I consider deleterious overall, like punishing half-baked/wild/probably-wrong hypotheses or things that are not obsequiously disclaimed as such, in a way that's insensitive to the actual level of danger of being misleading. A more salient cost is nonsense hogging attention, but that doesn't distinguish it from well-reas... (read more)

Come for the productivity, stay for the philosophy

Many of these (or other) theory things never make you "more effective". But you do become able to interact with them.

Speaking of Stag Hunts

It's often useful to have possibly false things pointed out to keep them in mind as hypotheses or even raw material for new hypotheses. When these things are confidently asserted as obviously correct, or given irredeemably faulty justifications, that doesn't diminish their value in this respect, it just creates a separate problem.

A healthy framing for this activity is to explain theories without claiming their truth or relevance. Here, judging what's true acts as a "solution" for the problem, while understanding available theories of what might plausibly b... (read more)

Transcript for Geoff Anders and Anna Salamon's Oct. 23 conversation

I don't know, a lot of this is from the discussion of Kuhn: new paradigms/worldviews are not necessarily incentivized to say new things or make sense of new things; even though they do, they just frame them in a particular way. And when something doesn't fit a paradigm, it's ignored. This is good and inevitable for theorizing on a human level, and doesn't inform the usefulness or correctness of what's going on, as these things live inside the paradigm.

Transcript for Geoff Anders and Anna Salamon's Oct. 23 conversation

It's about the lifecycle of theory development, confronted with the incentives of medium-term planning. Humans are not very intelligent, and the way we can do abstract theory requires developing a lot of tools that enable fluency with it, including the actual intuitive fluency that uses the tools to think more rigorously, which is what I call common sense.

My anchor is math, which is the kind of theory I'm familiar with, but the topic of the theory could be things like social structures, research methodologies, or human rationality. So when common sense has an oppo... (read more)

Transcript for Geoff Anders and Anna Salamon's Oct. 23 conversation

This shapes up as a case study on the dangers of doing very speculative and abstract theory about medium-term planning. (Which might include examples like figuring out what kind of understanding is necessary to actually apply hypothetical future alignment theory in practice...)

The problem is that common sense doesn't work or doesn't exist in these situations, but it's still possible to do actionable planning, and massage the plan into a specific enough form in time to meet reality, so that reality goes according to the plan that on the side of the present ... (read more)

8Spiracular1moOn the one hand, I think this is borderline-unintelligible as currently phrased? On the other hand, I think you have a decent point underneath it all. Let me know if I'm following, while I try to rephrase it. -------------------------------------------------------------------------------- When insulated from real-world or outer-world incentives, a project can build up a lot of internal-logic and inferential distance by building upon itself repeatedly. The incentives of insulated projects can be almost artificially-simple? So one can basically Goodhart, or massage data and assessment-metrics, to an incredible degree. This is sometimes done unconsciously. When such a project finally comes into contact with reality, this can topple things at the very bottom of the structure, which everything else was built upon. So for some heavily-insulated, heavily-built, and not-very-well-grounded projects, finally coming into exposure with reality can trigger a lot of warping/worldview-collapse/fallout in the immediate term.
Speaking of Stag Hunts

Yes, sorry, I got too excited about the absurd hypothesis supported by two datapoints, posted too soon, then tried to reproduce, and it no longer worked at all. For a while I was able to see the page in a Firefox incognito window on the same system where I'm logged in, and in a normal Firefox window under a different Linux username that never had Facebook logged in.

Edit: Just now it worked again twice, and after that it no longer did. Bottom line: Public facebook posts are not really public, at least today, they are only public intermittently.

Speaking of Stag Hunts

I can no longer see it when not logged in, even though I did before. Maybe we triggered a DDoS mitigation thingie?

Edit: Removed incorrect claim about how this worked (before seeing Said's response).

2Said Achmiz1moNo, this is not correct. All of my tests were conducted on a desktop (1080p) display, at maximum window width.
Speaking of Stag Hunts

for brevity's sake

I think of robustness/redundancy as the opposite of nuance for the purposes of this thread. It's not the kind of redundancy where you set up a lot of context to gesture at an idea from different sides, specify the leg/trunk/tail to hopefully indicate the elephant. It's the kind of redundancy where saying this once in the first sentence should already be enough, the second sentence makes it inevitable, and the third sentence preempts an unreasonable misinterpretation that's probably logically impossible.

(But then maybe you add a second ... (read more)

How do I keep myself/S1 honest?

I don't mean that S1 doesn't speak. It speaks a lot, like a talkative relative at a party, but it shouldn't be normative that its words are your words. You can disagree with its words, and it should be reasonable to hear you out when you do. You can demonstrate this distinction by allowing some of these disagreements to occur out loud in public. ("I just realized that I said X a few minutes ago. Actually I don't endorse that statement. Funny thing, I changed my mind about this a few years back, but I still occasionally parrot this more popular claim.")

Speaking of Stag Hunts

The most obvious/annoying issue with karma is the false-disagreement tug of war over controversial comments, which settles at a zero equilibrium and can't currently be split into more specific senses of voting to reveal that there actually is a consensus.

This can't be solved by pre-splitting, it has to act as needed, maybe co-opting the tagging system, with the default tag being "Boostworthy" (but not "Relevant" or anything specific like that), ability to see the tags if you click something, and ability to tag your vote with anything (one tag per voter, so to give a specific tag you have to unt... (read more)
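
A minimal data-model sketch of the mechanism proposed above, as I read it (the "Boostworthy" default comes from the comment; everything else is an assumption for illustration): each vote carries an overall direction plus at most one tag, and aggregation can then split an apparent controversy into per-sense consensus.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Vote:
    direction: int            # +1 or -1, the overall vote
    tag: str = "Boostworthy"  # one tag per voter; can be replaced by a more specific sense

def summarize(votes: list[Vote]) -> tuple[int, Counter]:
    karma = sum(v.direction for v in votes)
    senses = Counter((v.tag, v.direction) for v in votes)
    return karma, senses

karma, senses = summarize([Vote(+1), Vote(+1, "Clear"), Vote(-1, "Relevant")])
print(karma)   # 1: the overall score still looks like a mild tug of war
print(senses)  # per-tag counts can reveal consensus within each specific sense
```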

5Viliam1moIs it more important to see absolute or relative numbers of votes? To me it seems that if there are many votes, the relative numbers are more important: a comment with 45 upvotes and 55 downvotes is not too different from a comment with 55 upvotes and 45 downvotes; but one of them would be displayed as "-10 karma" and the other as "+10 karma" which seems different a lot. On the other hand, with few votes, I would prefer to see "+1 karma" rather than "100% consensus" if in fact only 1 person has voted. It would be misleading to make a comment with 1 upvote and 0 downvotes seem more representative of the community consensus than a comment with 99 upvotes and 1 downvote. How I perceive the current voting system, is that comments are somewhere on the "good -- bad" scale, and the total karma is a result of "how many people think this is good vs bad" multiplied by "how many people saw this comment and bothered to vote". So, "+50 karma" is not necessarily better than "+10 karma", maybe just more visible; like a top-level comment made immediately after writing the article, versus an insightful comment made three days later as a reply to a reply to a reply to something. But some people seem to have a strong opinion about the magnitude of the result, like "this comment is good, but not +20 good, only +5 good" or "this comment is stupid and deserves to have negative karma, but -15 is too low so I am going to upvote it to balance all those upvotes" -- which drives me crazy, because it means that some people's votes depend on whether they were among the early or late voters (the early voters expressing their honest opinion, the late voters mostly voting the opposite of their honest opinion just because they decided that too much agreement is a bad thing). Here is my idea of a very simple visual representation that would reflect both the absolute and relative votes. Calculate three numbers: positive (the number of upvotes), neutral (the magical constant 7), and negative (the
4Yoav Ravid1moInteresting, this gave me an idea for something a bit different. We'll have a list of good attributes a comment can have (Rigor, Effort, Correctness/Accuracy/Precision, Funny, etc.). By default you would have one attribute (perhaps 'Relevant'), and users will be able to add whichever attributes they want (perhaps even custom ones). These attributes will be voteable by users (no limit on how many you can vote on), and will show at the top of the comment together with their score (sorted by absolute value). I'm not sure how it would be used to sort comments or give points to users, though.
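
To make the relative-vs-absolute point concrete (a toy display format of my own, not Viliam's truncated proposal above): showing the up/down split alongside the net score distinguishes cases that net karma alone conflates.

```python
def karma_display(upvotes: int, downvotes: int) -> str:
    net = upvotes - downvotes
    total = upvotes + downvotes
    positive = upvotes / total if total else 0.0
    return f"{net:+d} ({upvotes} up / {downvotes} down, {positive:.0%} positive)"

print(karma_display(45, 55))  # -10 (45 up / 55 down, 45% positive)
print(karma_display(55, 45))  # +10 (55 up / 45 down, 55% positive)
print(karma_display(1, 0))    # +1 (1 up / 0 down, 100% positive)
```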
Speaking of Stag Hunts

The benefits of nuance are not themselves nuance. Nuance is extremely useful, but not good in itself, and the bleed-through of its usefulness into positive affect is detrimental to clarity of thought and communication.

Capacity for nuance abstracts away this problem, so might be good in itself. (It's a capacity, something instrumentally convergent. Though things useful for agents can be dangerous for humans.)

Speaking of Stag Hunts

I'm specifically boosting the prescriptivist point about not using the word "rational" in an inflationary way that doesn't make literal sense. Comments can be valid, explicit on their own epistemic status, true, relevant to their intended context, not making well-known mistakes, and so on and so forth, but they can't be rational, for the reason I gave, in the sense of "rational" as a property of cognitive algorithms.

I think this is a mistake

Incidentally, I like the distinction between error and mistake from linguistics, where an error is systematic or ... (read more)

2Duncan_Sabien1moI like it.
Speaking of Stag Hunts

Nuance is the cost of precision and the bane of clarity. I think it's an error to feel positively about nuance (or something more specific like degrees of uncertainty), when it's a serious problem clogging up productive discourse, that should be burned with fire whenever it's not absolutely vital and impossible to avoid.

9Duncan_Sabien1moUh. I want to make a nuanced response here, distinguishing the difference between "feeling positively about nuance when it's net positive and negatively when its costs exceed its benefits, and trying to distinguish between the net positive case and the net negative case, and addressing the dynamics driving each" and so forth, but your comment above makes me hesitate. (I also think this [https://www.facebook.com/duncan.sabien/posts/4232363480131670].) EDIT: to clarify/sort-of-summarize, for those who don't want to click through: I think there's a compelling argument to be made that much or even the majority of intellectual progress lies in the cumulative ability to make ever-finer distinctions, i.e. increasing our capacity for nuance. I think being opposed to nuance is startling, and in my current estimation it's approximately "being opposed to the project of LessWrong." Since I don't believe that Vladimir is opposed to the project of LessWrong, I declare myself confused.
Speaking of Stag Hunts

Rationality doesn't make sense as a property of comments. It's a quality of cognitive skills that work well (and might generate comments). Any judgement of comments according to the rationality of the algorithms that generated them is an ad hominem equivocation; the comments screen off the algorithms that generated them.

6Duncan_Sabien1moMmm, I think this is a mistake. I think that you're correct to point at a potential trap that people might slip into, of confusing the qualities of a comment with the properties of the algorithm that generated it. I think this is a thing people do, in fact, do, and it's a projection, and it's an often-wrong projection. But I also think that there's a straightforward thing that people mean by "this comment is more rational than that one," and I think it's a valid use of the word rational in the sense that 70+ out of 100 people [https://www.lesswrong.com/posts/57sq9qA3wurjres4K/ruling-out-everything-else#:~:text=%22Hey%2C%20so%2C%20looking%20back%2C%20your%20exact%20words%20were%20%5BX%5D.%20I%20claim%20that%2C%20if%20we%20had%20a%20hundred%20people%20in%20the%20relevant%20reference%20class%20evaluate%20%5BX%5D%20with%20the%20relevant%20context%2C%20more%20than%20seventy%20of%20them%20would%20interpret%20those%20words%20to%20be%20trying%20to%20convey%20something%20like%20%5BY%5D.%22] would interpret it as meaning what the speaker actually intended. Something like: * This is more careful with its inferences than that * This is more justified in its conclusions than that * This is more self-aware about the ways in which it might be skewed or off than that * This is more transparent and legible than that * This causes me to have an easier time thinking and seeing clearly than that ... and I think "thinking about how to reliably distinguish between [this] and [that] is a worthwhile activity, and a line of inquiry that's likely to lead to promising ideas for improving the site and the community."
Money Stuff

If you are about to say something socially neutral or approved, but a salient alternative to what you are saying comes with a cost (or otherwise a target of appeal to consequences), integrity in making the claim requires a resolve to have said that alternative too if it (counterfactually) turned out to be what you believe (with some unclear "a priori" weighing that doesn't take into account your thinking on that particular topic). But that's not enough if you want others to have a fair opportunity to debate the claim you make, for they would also incur the... (read more)

2Yoav Ravid1moI also saw you saying a similar thing here [https://www.lesswrong.com/posts/KEsGLZ9535xmt26SR/has-lesswrong-been-mind-killed-on-the-topic-of-god-and?commentId=q8zFspoHPcuFiESyk] . I think there's a top level post here waiting to be written. I'll be glad to read it if you write it.
Tell the Truth

In this case the principle that leaves the state of evidence undisturbed is to keep any argument for not murdering puppies to yourself as well, for otherwise you in expectation would create filtered evidence in favor of not murdering puppies.

This is analogous to trial preregistration: you just do the preregistration like an updateless agent, committing to act as if you've preregistered to speak publicly on any topic you are about to speak on, regardless of what it turns out you have to say about it. This either prompts you to say a socially costly thing (if you judge the preregistration a good deal) or to stay silent on a socially neutral or approved thing (if the preregistration doesn't look like a good deal).
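
A toy formalization of this policy (my own sketch of the comment's logic; the numbers and labels are made up): the speak/stay-silent decision is fixed by the expected value over the conclusions you might a priori have reached, not by the conclusion you actually reached.

```python
def preregistered_to_speak(prior, value_if_said, social_cost_if_said) -> bool:
    """Each argument maps a possible conclusion to its probability / value / cost.
    Decide whether to speak before conditioning on which conclusion you reached."""
    expected_net = sum(
        p * (value_if_said[c] - social_cost_if_said[c]) for c, p in prior.items()
    )
    return expected_net > 0

# If the deal is good you say whatever you found, including the costly finding; if
# not, you stay silent even on the approved finding. Either way, your speaking up
# carries no filtered evidence about which finding it was.
prior = {"puppy-murder is fine": 0.1, "don't murder puppies": 0.9}
value = {"puppy-murder is fine": 5.0, "don't murder puppies": 1.0}
cost = {"puppy-murder is fine": 30.0, "don't murder puppies": 0.0}
print(preregistered_to_speak(prior, value, cost))  # False: stay silent either way
```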

Speaking of Stag Hunts

point at small things as if they are important

Taking unimportant things seriously is important. It's often unknown that something is important, or known that it isn't, and that doesn't matter for the way in which it's appropriate to work on details of what's going on with it. General principles of reasoning should work well for all examples, important or not. Ignoring details is a matter of curiosity, allocating attention, it shouldn't impact how the attention that happens to fall on a topic treats it.

general enthusiasm for even rather dull and tediou

... (read more)
Has LessWrong Been Mind-Killed on the Topic of God and Religion?

It doesn't matter if a discussion is sympathetic or not; that's not relevant to the problem I'm pointing out. Theism is not even an outgroup; it's too alien and far away to play that role.

Anti-epistemology is not a label for bad reasoning or disapproval of particular cultures, it's the specific phenomenon of memes and norms that promote systematically incorrect reasoning, where certain factual questions end up getting resolved to false answers, resisting argument or natural intellectual exploration, certain topics or claims can't be discussed or thought ab... (read more)

Has LessWrong Been Mind-Killed on the Topic of God and Religion?

Closer to the object level, I like the post aesthetically; it's somewhat beautiful and well-crafted. I didn't find anything useful/interesting/specific in it; it only makes sense to me as a piece of art. At the same time, it fuels a certain process inimical to the purpose of LW.

Compare this with Scott Alexander's Moloch post or even Sarah Constantin's Ra post. There's specific content that the mythical analogies help organize and present.

The positive role of the mythical analogies is the same as in your post, but my impression was that in your post the pay... (read more)

4Mahdi Complex1moOkay, that seems fair. It is true that just from that post, it's unclear what my point is (see hypothesis 1). I think it matters how we construct our mythical analogies, and in Scott Alexander's Moloch [https://slatestarcodex.com/2014/07/30/meditations-on-moloch/], he argues that we should "kill God" and replace it with Elua, the god of human values. I think this is the wrong way to frame things. I assume that Scott uses 'God' to refer to the blind idiot god of evolution [https://www.lesswrong.com/posts/pLRogvJLPPg6Mrvg4/an-alien-god]. But that's a very uncharitable and in my opinion unproductive way of constructing our mythical analogies. I think we should use 'God' to refer to reality, and make our use of the word more in line with how more than half of humanity uses the word. Is your point about "functional anti-epistemology" about it being clear from Scott Alexander's and Sarah Constantin's posts that they're not sympathetic to "actual" belief in Moloch or Ra, while in my post, I sound sympathetic to theism?
Has LessWrong Been Mind-Killed on the Topic of God and Religion?

Anti-epistemology lives in particular topics and makes it hard/socially costly/illegal to discuss them without committing the errors it instills. Its presence is instrumentally convergent in topics relevant to power (over people), such as religion, politics, and anything else politically charged.

Ideas are transported by analogies, and anti-epistemology grows on new topics via analogies with topics already infected by it, if it's not fought back from the other side with sufficient clarity. The act of establishing an analogy with such a topic is, all else equal, sabotage.

-15Mahdi Complex1mo
The Opt-Out Clause

If this is the simulated world of the thought experiment (abstract simulation), and opting-out doesn't change the abstract simulation, then the opting-out procedure did wake you up in reality, but the instance within the abstract simulation who wrote the parent comment has no way of noticing that. The concrete simulation might've ended, but that only matters for reality of the abstract simulation, not its content.

How do I keep myself/S1 honest?

S1 shouldn't have the authority to speak for you. To the extent this norm is established, it helps with all sorts of situations where S1 is less than graceful (perhaps it misrepresents your attitude, there are many mistakes other than unintended lying). Unfortunately this is not a common norm, so only starts working with sufficiently close acquaintances. And needs S2 that fuels the norm by dressing down S1 in public when appropriate, doesn't refuse to comment, and upholds the reputation of not making S1 a scapegoat.

6ChristianKl1moIn a high pressure situation it seems to me like S1 gets active whether or not one gives it authority. What do you mean by "dressing down S1 in public when appropriate"?
Money Stuff

These are statements whose truth can't be discussed, only claimed with filtered evidence. Like politics, this requires significant reframing to sidestep the epistemic landmines.

2Yoav Ravid1moCan you elaborate? I agree this a difficult risky topic to discuss, and I tried to evade the landmines while writing it (like accidentally implying that this evolutionary instinct is somehow good), but though I very much like and agree with Yes requires the possibility of no [https://www.lesswrong.com/posts/G5TwJ9BGxcgh5DsmQ/yes-requires-the-possibility-of-no] , and know what filtered evidence [https://www.lesswrong.com/tag/filtered-evidence] is, I don't really understand the first part of your comment. Also I'd be interested to hear what you think are the epistemic landmines.
Samuel Shadrach's Shortform

The point is that the weirdness with counterfactuals breaking physical laws is the same for controlling the world through one agent (as in orthodox CDT) and for doing the same through multiple copies of an agent in concert (as in FDT). Similarly, in actuality neither one-agent intervention nor coordinated many-agent intervention breaks physical laws. So this doesn't seem relevant for comparing the two, that's what I meant by "doesn't help".

By "outside view" you seem to be referring to actuality. I don't know what you mean by "inside view". Counterfactuals ... (read more)

1Samuel Shadrach1moDo you mean the counterfactual may require more time to compute than the situation playing out in real time? If so, yep makes a ton of sense, they should probably focus on algorithms or decision theories that can (atleast in theory) be implemented in real life on physical hardware. But please confirm. Could you please define "actuality" just so I know we're on the same page? I'm happy to read any material if it'll help. Inside view and outside view I'm just borrowing from Yudkowsky's How an algorithm feels from the inside [https://www.lesswrong.com/posts/yA4gF5KrboK2m2Xu7/how-an-algorithm-feels-from-inside] . Basically assumes deterministic universe following elegant physical laws, and tries to dissolve questions of free will / choice / consciousness. So the outside view is just a state of the universe or a state of the Turing machine. This object doesn't get to "choose" what computation it is going to do or what decision theory it is going to execute, that is already determined by current state. So the future states of the object are calculable*. *by an oracle that can observe the universe without interacting, with sufficient but finite time. Only in the inside view does a question like "Which decision theory should I pick?" even make sense. In the inside view, free will and choice are difficult to reason about (as humans have observed over centuries) - if you really wanna reason about those you can go to the outside view where they cease to exist.
Samuel Shadrach's Shortform

atoms they causally impact

This doesn't help. In a counterfactual, atoms are not where they are in actuality. Worse, they are not even where the physical laws say they must be in the counterfactual, the intervention makes the future contradict the past before the intervention.

1Samuel Shadrach1moDo I assume "counterfactual" is just the English word as used here [https://arbital.com/p/logical_dt/?l=58f]? If so, it should only exist in the inside view, right? (If I understand you) The sentence I wrote on soulless algorithms is about the outside view. Say two robots are playing football. The outside view is - one kicks the football, other sees football (light emitted by football), then kicks it. So the only causal interaction between the two robots is via atoms. This is independent of what decision theory either robot is using (if any), and it is independent of whether the robots are capable of creating an internal mental model of themselves or the other robot. So it applies to both robots with dumb microcontrollers like those in a refrigerator and smart robots that could even be AGI or have some ideal decision theory. At least assuming the universe follows the deterministic physical laws we know about. (edited)
Why the Problem of the Criterion Matters

Saying that something is true/useful/specific/good/interesting/hedonic is not very specific, there are particular senses of these qualifiers relevant in different contexts, inducing normativity in their own way. Why is a particular sense of a particular qualifier relevant in a particular context? That is often unclear, you felt it appropriate to pursue with your human mind, perhaps incorrectly.

Clear explanations are an unusual thing, they are not at all always available. There's the physical explanation, things happen because laws of physics, that's occasi... (read more)

steven0461's Shortform Feed

Together with my interpretation of the preceding example this suggests an analogy between individual/reference-class charity and filtered evidence. The analogy is interesting as a means of transferring understanding of errors in ordinary charity to the general setting where the salient structure in the sources of evidence could have any nature.

So what usually goes wrong with charity is that the hypotheses about possible kinds of thinking behind an action/claim are not deliberatively considered (or consciously noticed), so the implicit assumption is intuitiv... (read more)

steven0461's Shortform Feed

Sounds like failing at charity, not trying to figure out what thinking produced a claim/question/behavior and misinterpreting it as a result. In your example, there is an implication of difficulty with noticing the obvious, when the correct explanation is most likely having a different objective, which should be clear if the question is given half a thought. In some cases, running with the literal meaning of a claim as stated is actually a misinterpretation, since it differs from the intended meaning.

Self-Integrity and the Drowning Child

You can apply the lesson to that conclusion as well: avoid hammering down on the part that hammers down on parts. The point is not to belittle it, but to reform it so that it's less brutishly violent and gullible, so that the parts of mind it gardens and lives among can grow healthy together, even as it judiciously prunes the weeds.

Self-Integrity and the Drowning Child

Utility functions are very useful for solving decision problems with simple objectives. Human preference is not one of these, but we can often fit a utility function that approximately captures it in a particular situation, which is useful for computing informed suggestions for decisions. The model of one's preference that informs fitting of utility functions to it for use in contexts of particular decision problems could also be called a model of one's utility function, but that terminology would be misleading.

The error is forgetting that on human level, ... (read more)

3moridinamael1moI thought I agreed but upon rereading your comment I am no longer sure. As you say, the notion of a utility function implies a consistent mapping between world states and utility valuations, which is something that humans do not do in practice, and cannot do even in principle because of computational limits. But I am not sure I follow the very last bit. Surely the best map of the dath ilan parable is just a matrix, or table, describing all the possible outcomes, with degrees of distinction provided to whatever level of detail the subject considers relevant. This, I think, is the most practical and useful amount of compression. Compress further, into a “utility function”, and you now have the equivalent of a street map that includes only topology but without street names, if you’ll forgive the metaphor. Further, if we aren’t at any point multiplying utilities by probabilities in this thought experiment, one has to ask why you would even want utilities in the first place, rather than simply ranking the outcomes in preference order and picking the best one.
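
On the closing question (why cardinal utilities rather than just a preference ranking): a minimal example of where ordinal preferences stop being enough, namely as soon as outcomes are reached only with probabilities (the numbers are made up for illustration).

```python
outcomes = {"best": 10.0, "middle": 6.0, "worst": 0.0}

def expected_utility(lottery: dict[str, float]) -> float:
    """lottery maps outcome names to probabilities summing to 1."""
    return sum(p * outcomes[o] for o, p in lottery.items())

sure_middle = {"middle": 1.0}
risky = {"best": 0.5, "worst": 0.5}

# The ranking best > middle > worst cannot by itself compare these two options;
# the answer depends on how much better "best" is than "middle".
print(expected_utility(sure_middle))  # 6.0
print(expected_utility(risky))        # 5.0
```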
Self-Integrity and the Drowning Child

This applies to integrity of a false persona just as well, a separate issue from fitting an agentic persona (that gets decision making privileges, but not self-ratification privileges) to a human. Deciding quite clearly who you are doesn't seem possible without a million years of reflection and radical cognitive enhancement. The other option is rewriting who you are, begging the question, a more serious failure of integrity (of a different kind) whose salience distracts from the point of the dath ilani lesson.

P₂B: Plan to P₂B Better

more planners

This seems tenuous compared to "more planning substrate". Pursuing redundancy and effectiveness specifically through setting up a greater number of individual planners, even if coordinated, is likely an inferior plan. There are probably better uses of hardware that don't have this particular shape.
