I didn't like this post. At the time, I didn't engage with it very much. I wrote a mildly critical comment (which is currently the top-voted comment, somewhat to my surprise) but I didn't actually engage with the idea very much. So it seems like a good idea to say something now.
The main argument that this is valuable seems to be: this captures a common crux in AI safety. I don't think it's my crux, and I think other people who think it is their crux are probably mistaken. So from my perspective it's a straw-man of the view it…
In my opinion, the biggest shift in the study of rationality since the Sequences were published was a change in focus from "bad math" biases (anchoring, availability, base rate neglect, etc.) to socially-driven biases. And with good reason: while a crash course in Bayes' Law can alleviate many of the issues with intuitive math, group politics are a deep and inextricable part of everything our brains do.
There has been a lot of great writing describing the issue, like Scott’s essays on ingroups and outgroups and Robin Hanson’s the…
This is an unusually difficult post to review. In an ideal world, we'd like to be able to review things as they are, without reference to who the author is. In many settings, reviews are done anonymously (with the author's name stricken off), for just this reason. This post puts that to the test: the author is a pariah. And ordinarily I would say, that's irrelevant, we can just read the post and evaluate it on its own merits.
Other comments have mentioned that there could be PR concerns, ie, that making the author's existence and participation on LessWrong
I think that strictly speaking this post (or at least the main thrust) is true, and proven in the first section. The title is arguably less true: I think of 'coherence arguments' as including things like 'it's not possible for you to agree to give me a limitless number of dollars in return for nothing', which does imply some degree of 'goal-direction'.
I think the post is important, because it constrains the types of valid arguments that can be given for 'freaking out about goal-directedness', for lack of a better term. In my mind, it provokes various follo
In this essay, ricraz argues that we shouldn't expect a clean mathematical theory of rationality and intelligence to exist. I have debated em about this, and I continue to endorse more or less everything I said in that debate. Here I want to restate some of my (critical) position by building it from the ground up, instead of responding to ricraz point by point.
When should we expect a domain to be "clean" or "messy"? Let's look at everything we know about science. The "cleanest" domains are mathematics and fundamental physics. There, we have crisply defined
Tl;dr: I don’t think this post stands up to close scrutiny, although there may be unknown knowns anyway. This is partly due to a couple of things in the original paper which I think are a bit misleading for the purposes of analysing the markets.
The unknown knowns claim is based on 3 patterns in the data:
“The mean prediction market belief of replication is 63.4%, the survey mean was 60.6% and the final result was 61.9%. That’s impressive all around.”
“Every study that would replicate traded at a higher probability of suc…
[this is a review by the author]
I think what this post was doing was pretty important (colliding two quite different perspectives). In general there is a thing where there is a "clueless / naive" perspective and a "loser / sociopath / zero-sum / predatory" perspective that usually hides itself from the clueless perspective (with some assistance from the clueless perspective; consider the "see no evil, hear no evil, speak no evil" mindset, a strategy for staying naive). And there are lots of difficulties in trying to establish communication. And the dial
Here are my thoughts.
I don't know if I'll ever get to a full editing of this. I'll jot notes here of how I would edit it as I reread this.
In this essay Paul Christiano proposes a definition of "AI alignment" which is more narrow than other definitions that are often employed. Specifically, Paul suggests defining alignment in terms of the motivation of the agent (which should be, helping the user), rather than what the agent actually does. That is, as long as the agent "means well", it is aligned, even if errors in its assumptions about the user's preferences or about the world at large lead it to actions that are bad for the user.
Rohin Shah's comment on the essay (which I believe is endorsed
In this essay, Rohin sets out to debunk what ey perceive as a prevalent but erroneous idea in the AI alignment community, namely: "VNM and similar theorems imply goal-directed behavior". This is placed in the context of Rohin's thesis that solving AI alignment is best achieved by designing AI which is not goal-directed. The main argument is: "coherence arguments" imply expected utility maximization, but expected utility maximization does not imply goal-directed behavior. Instead, it is a vacuous constraint, since any agent policy can be regarded as maximiz
As far as I can tell, this post successfully communicates a cluster of claims relating to "Looking, insight meditation, and enlightenment". It's written in a quite readable style that uses a minimum of metaphorical language or Buddhist jargon. That being said, likely due to its focus as exposition and not persuasion, it contains and relies on several claims that are not supported in the text, such as:
I have several problems with including this in the 2018 review. The first is that it's community-navel-gaze-y - if it's not the kind of thing we allow on the frontpage because of concerns about newcomers seeing a bunch of in-group discussion, then it seems like we definitely wouldn't want it to be in a semi-public-facing book, either.
The second is that I've found most discussion of the concept of 'status' in rationalist circles to be pretty uniformly unproductive, and maybe even counterproductive. People generally only discuss 'status' when they
A year later, I continue to agree with this post; I still think its primary argument is sound and important. I'm somewhat sad that I still think it is important; I thought this was an obvious-once-pointed-out point, but I do not think the community actually believes it yet.
I particularly agree with this sentence of Daniel's review:
"I think the post is important, because it constrains the types of valid arguments that can be given for 'freaking out about goal-directedness', for lack of a better term."
"Constraining the types of valid arguments" is exactly the…
There are two separate lenses through which I view the idea of competitive markets as backpropagation.
First, it's an example of the real meat of economics. Many people - including economists - think of economics as studying human markets and exchange. But the theory of economics is, to a large extent, a general theory of distributed optimization. When we understand on a gut level that "price = derivative", and markets are just implementing backprop, it makes a lot more sense that things like markets would show up in other fields - e.g. AI or b…
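The "price = derivative" idea can be made concrete with a toy sketch (my own illustration, not from the post; the supply and demand curves are made-up assumptions): with linear supply and demand, the classic tâtonnement price-update rule - raise the price when demand exceeds supply - is exactly a gradient-descent step on a quadratic potential whose minimum is the market-clearing price.

```python
# Toy sketch: market price adjustment as gradient descent.
# Excess demand D(p) - S(p) = 10 - 3p is the negative gradient of the
# potential F(p) = (3/2)p^2 - 10p, so the tatonnement update
# p += lr * excess_demand(p) is a gradient step toward equilibrium.

def excess_demand(p):
    demand = 10 - p   # hypothetical linear demand curve
    supply = 2 * p    # hypothetical linear supply curve
    return demand - supply

p = 1.0    # arbitrary starting price
lr = 0.1   # "learning rate" / price-adjustment speed
for _ in range(200):
    p += lr * excess_demand(p)

# Analytic equilibrium: 10 - p = 2p  =>  p = 10/3
print(round(p, 3))  # → 3.333
```

The analogy to backprop is only in the update rule; real markets do this in a distributed way, with each trader contributing a piece of the gradient signal.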
A brief authorial take - I think this post has aged well, although as with Caring Less (https://www.lesswrong.com/posts/dPLSxceMtnQN2mCxL/caring-less), this was an abstract piece and I didn't make any particular claims here.
I'm so glad that
A) this was popular
B) I wasn't making up a new word for a concept that most people already know by a different name, which I think will send you to at least the first layer of Discourse Hell on its own.
I've met at least one person in the community who said they knew and thought about this post a lot, well before they'd
I've been pleasantly surprised by how much this resource has caught on in terms of people using it and referring to it (definitely more than I expected when I made it). There were 30 examples on the list when it was posted in April 2018, and 20 new examples have been contributed through the form since then. I think the list has several properties that contributed to wide adoption: it's fun, standardized, up-to-date, comprehensive, and collaborative.
Some of the appeal is that it's fun to read about AI cheating at tasks in unexpected ways (I…
I think this post should be included in the best posts of 2018 collection. It does an excellent job of balancing several desirable qualities: it is very well written, being both clear and entertaining; it is informative and thorough; it is in the style of argument which is preferred on LessWrong, by which I mean makes use of both theory and intuition in the explanation.
This post adds to the greater conversation by displaying rationality of the kind we are pursuing directed at a big societal problem. A specific example of what I mean that distinguishes this... (read more)
I thought I'd add a few quick notes as the author.
As I reread this, a few things jump out for me:
[Disclaimer: I'm reading this post for the first time now, as of 1/11/2020. I also already have a broad understanding of the importance of AI safety. While I am skeptical about MIRI's approach to things, I am also a fan of MIRI. Where this puts me relative to the target demographic of this post, I cannot say.]
I think this post is pretty good. It's a solid and well-written introduction to some of the intuitions behind AI alignment and the fundamental research that MIRI does. At the same time, the use of analogy made the post m…
I just re-read this sequence. Babble has definitely made its way into my core vocabulary. I think of "improving both the Babble and Prune of LessWrong" as being central to my current goals, and I think this post was counterfactually relevant for that. Originally I had planned to vote weakly in favor of this post, but am currently positioning it more at the upper-mid-range of my votes.
I think it's somewhat unfortunate that the Review focused only on posts, as opposed to sequences as a whole. I just re-read this sequence, and I think the posts More Babble, P
This is my post.
I've spent much of the last year thinking about the pedagogical mistakes I made here, and am writing the Reframing Impact sequence to fix them. While this post recorded my 2018-thinking on impact measurement, I don't think it communicated the key insights well. Of course, I'm glad it seems to have nonetheless proven useful and exciting to some people!
If I were to update this post, it would probably turn into a rehash of Reframing Impact. Instead, I'll just briefly state the argument as I would present it today.
Author here. I still endorse the post and have continued to find it pretty central to how I think about myself and nearby ecosystems.
I just submitted some major edits to the post. Changes include:
1. Name change ("Robust, Coherent Agent")
After much hemming and hawing and arguing, I changed the name from "Being a Robust Agent" to "Being a Robust, Coherent Agent." I'm not sure if this was the right call.
It was hard to pin down exactly one "quality" that the post was aiming at. Coherence was the single word that pointed towards "what sort of agent to become." …
I was surprised that this post ever seemed surprising, which either means it wasn't revolutionary, or was *very* revolutionary. Since it has 229 karma, it seems like it was the latter. I feel like the same post today would have been written with more explicit references to reinforcement learning, reward, addiction, and dopamine. The overall thesis seems to be that you can get a felt sense for these things, which would be surprising - isn't it the same kind of reward-seeking all the way down, including on things that are genuinely valuable? Not sure how to model this.
I still believe this article is an important addition to the discussion of inadequate equilibria. While Scott Alexander's Moloch post and Eliezer Yudkowsky's book are great for introduction and discussion of the topic, both of them fail, in my opinion, to convey the sheer complexity of the problem as it occurs in the real world. That, I think, results in readers thinking about the issue in simple Malthusian or naive game-theoretic terms and eventually despairing about the inescapability of suboptimal Nash equilibria.
What I try to present is a world
I think about this post a lot, and sometimes in conjunction with my own post on common knowledge.
As well as it being a referent for when I think about fairness, it also ties in with how I think about LessWrong, Arbital and communal online endeavours for truth. The key line is:
For civilization to hold together, we need to make coordinated steps away from Nash equilibria in lockstep.
You can think of Wikipedia as being a set of communally editable web pages where the content of the page is constrained to be that which we can easily gain common knowledge of its
This essay provides some fascinating case studies and insights about coordination problems and their solutions, from a book by Elinor Ostrom. Coordination problems are a major theme in LessWrongian thinking (for good reasons) and the essay is a valuable addition to the discussion. I especially liked the 8 features of sustainable governance systems (although I wish we got a little more explanation for "nested enterprises").
However, I think that the dichotomy between "absolutism (bad)" and "organically grown institutions (good)" that the essay creates needs
The core of this post seems to be this
As Zack_M_Davis points out …
Quick authorial review: This post has brought me the greatest joy from other sources referring to it, including Marginal Revolution (https://marginalrevolution.com/marginalrevolution/2018/10/funnel-human-experience.html) and the New York Times bestseller "The Uninhabitable Earth". I was kind of hoping to supply a fact about the world that people could use in many different lights, and they have (see those and also like https://unherd.com/2018/10/why-are-woke-liberals-such-enemies-of-the-past/ )
An unintentional takeaway from this attention is solidifying my
One of the founders of Circling Europe sincerely and apropos-of-nothing thanked me for writing this post earlier this year, which I view as a sign that there were good consequences of me writing this post. My guess is that a bunch of rationalists found their way to Circling, and it was beneficial for people.
I've heard it said that this is one of the more rationalist-friendly summaries of Circling. I don't know it's the best possible such, but I think it's doing OK. I would certainly write it differently now, but shrug.
At this point I…
I wrote this post, and at the time I just wrote it because... well, I thought I'd be able to write a post with a grand conclusion about how science used to check the truth, and then point to how it changed, but I was so surprised to find that journals had not one sentence of criticism in them at all. So I wrote it up as a question post instead, framing my failure to answer the question as 'partial work' that 'helped define the question'.
In retrospect, I'm really glad I wrote the post, because it is a clear datapoint about how science does not work. I have
In hindsight, I still feel that the phenomenon is an interesting and potentially important topic to look into. I am not aware of any attempt to replicate it or dive deeper, though.
As for my attempt to explain the psychology underlying the phenomenon, I am not entirely happy with it. It's based only on introspection and lacks a sound game-theoretic backing.
By the way, there's one interesting explanation I've read somewhere in the meantime (unfortunately, I don't remember the source):
Cooperation may incur different costs on different participants. If y
Since others have done a contextualized review, I'll aim to do a decoupled review, with a caveat that I think the contextual elements are important for consideration with inclusion into the compendium.
Okay. There’s a social interaction concept that I’ve tried to convey multiple times in multiple conversations, so I’m going to just go ahead and make a graph.
I’m calling this concept “Affordance Widths”.
I'd like to see a clear definition here before launching into an example. In fact, there's no clear …
I find it deeply sad that many of us feel the need to frequently link to this article - I don't think I have ever done so, because if I need to explain local validity, then perhaps I'm talking to the wrong people? But certainly the ignoring of this principle has gotten more and more blatant and common over time since this post, so it's becoming less reasonable to assume that people understand such things. Which is super scary.
Hi, I'm pleased to see that this has been nominated and has made a lasting impact.
Do I have any updates? I think it aged well. I'm not making any particular specific claims here, but I still endorse this and think it's an important concept.
I've done very little further thinking on this. I was quietly hoping that others might pick up the mantle and write more on strategies for caring less, as well as cases where this should be argued. I haven't seen this, but I'd love to see more of it.
I've referred to it myself when talking about values that I think people
This post kills me. Lots of great stuff, and I think this strongly makes the cut. Sarah has great insights into what is going on, then turns away from them right when following through would be most valuable. The post is explaining why she and an entire culture are being defrauded by aesthetics. That is, it is used to justify all sorts of things, including high prices and what is cool, based on things that have no underlying value. How it contains lots of hostile subliminal messages that are driving her crazy. It's very clear. And then she... doesn't see the fnords. So close!
This post should be included in the Best-of-2018 compilation.
This is not only a good post, but one which cuts to the core of what this community is about. This site began not as a discussion of topics X, Y, and Z, but as a discussion of how to be... less wrong than the world around you (even/especially your own ingroup), and the difficulties this entails. Uncompromising honesty and self-skepticism are hard, and even though the best parts are a distillation of other parts of the Sequences, people need to be reminded more often than they need to be instructed.
I'm a bit torn here, because the ideas in the post seem really important/useful to me (e.g., I use these phrases as a mental pointer sometimes), such that I'd want anyone trying to make sense of the human situation to have access to them (via this post or a number of other attempts at articulating much the same, e.g. "Elephant and the Brain"). And at the same time I think there's some crucial misunderstanding in it that is dangerous and that I can't articulate. Voting for it anyhow though.
[Update: the new version is now live!!] [Author writing here.] The initial version of this post was written quickly on a whim, but given the value people have gotten from it (as evidenced by the 2018 Review nomination and reviews), I think it warrants a significant update, which I plan to write in time for possible publication in a book, and ideally for the Review voting stage.
Things I plan to include in the update:
I hadn't realized this post was nominated, partially because of my comment, so here's a late review. I basically continue to agree with everything I wrote then, and I continue to like this post for those reasons, and so I support including it in the LW Review.
Since writing the comment, I've come across another argument for thinking about intent alignment -- it seems like a "generalization" of assistance games / CIRL, which itself seems like a formalization of an aligned agent in a toy setting. In assistance games, the agent explici…
Many people pointed out that the real cost of a Bitcoin in 2011 or whenever wasn't the couple of cents that it cost, but the several hours of work it would take to figure out how to purchase it. And that costs needed to be discounted by the significant risk that a Bitcoin purchased in 2011 would be lost or hacked - or by the many hours of work it would have taken to ensure that didn't happen. Also, that there was another hard problem of not selling your 2011-Bitcoins in 2014. I agree that all of these are problems with the original post, and tha…
I was going to write a longer review but I realised that Ben’s curation notice actually explains the strengths of this post very well so you should read that!
In terms of including this in the 2018 review I think this depends on what the review is for.
If the review is primarily for the purpose of building common knowledge within the community then including this post maybe isn’t worth it as it is already fairly well known, having been linked from SSC.
On the other hand if the review process is at least partly for, as Raemon put it:
“I wan…
This is a review of my own post.
The first thing to say is that for the 2018 Review, Eli’s mathematicians post should take precedence, because he was the one who took up the challenge in the first place and inspired my post. I hope to find time to write a review of his post.
If people were interested (and Eli was ok with it) I would be happy to write a short summary of my findings to add as a footnote to Eli’s post if it was chosen for the review.
This was my first post on LessWrong and looking back at it I think it still holds up fairly well.
There…
Reply: "Firming Up Not-Lying Around Its Edge-Cases Is Less Broadly Useful Than One Might Initially Think"
Review by the author:
I continue to endorse the contents of this post.
I don't really think about the post that much, but the post expresses a worldview that shapes how I do my research - that agency is a mechanical fact about the workings of a system.
To me, the main contribution of the post is setting up a question: what's a good definition of optimisation that avoids the counterexamples of the post? Ideally, this definition would refer or correspond to the mechanistic properties of the system, so that people could somehow statically determine whether a giv
I wrote about this post extensively as part of my essay on Rationalist self-improvement. The general idea of this post is excellent: gathering data for a clever natural experiment of whether Rationalists actually win. Unfortunately, the analysis itself is very lacking and is not very data-driven.
The core result is: 15% of SSC readers who were referred by LessWrong made over $1,000 in crypto, 3% made $100,000. These quantities require quantitative analysis: Is 15%/3% a lot or a little compared to matched groups like the Silicon Valley or Libertarian blogosp…
It strikes me as pedagogically unfortunate that sections i. and ii. (on arguments and proof-steps being locally valid) are part of the same essay as sections iii.–vi. (on what this has to do with the function of Law in Society). Had this been written in the Sequences era, one would imagine this being (at least) two separate posts, and it would be nice to have a reference link for just the concept of argumentative local validity (which is obviously correct and important to have a name for, even if some of the speculations about Law in sections iii.–vi. turned out to be wrong).
I still broadly agree with everything that I said in this post. I do feel that it is a little imprecise, in that I now have much more detailed and gears-y models for many of its claims. However, elaborating on those would require an entirely new post (one which I am currently working on) with a sequence's worth of prerequisites. So if I were to edit this post, I would probably mostly leave it as it is, but include a pointer to the new post once it's finished.
In terms of this post being included in a book, it is worth noting that the post situates it…
The LW team is encouraging authors to review their own posts, so:
In retrospect, I think this post set out to do a small thing, and did it well. This isn't a grand concept or a vast inferential distance, it's just a reframe that I think is valuable for many people to try for themselves.
I still bring up this concept quite a lot when I'm trying to help people through their psychological troubles, and when I'm reflecting on my own.
I don't know whether the post belongs in the Best of 2018, but I'm proud of it.
Insofar as the AI Alignment Forum is part of the Best-of-2018 Review, this post deserves to be included. It's the friendliest explanation to MIRI's research agenda (as of 2018) that currently exists.
I haven't thought about the bat and ball question specifically very much since writing this post, but I did get a lot of interesting comments and suggestions that have sort of been rolling around my head in background mode ever since. Here's a few I wanted to highlight:
Is the bat and ball question really different to the others? First off, it was interesting to see how much agreement there was with my intuition that the bat and ball question was interestingly different to the other two questions in the CRT. Reading through the comments I count four other p
When this article came out, I put a bit of money into alternate cryptocurrencies that I thought might have upside. They are now worth less than I invested.
I think it's good to review how you did in the past, but it's important not to overlearn specific lessons. In retrospect, I think that this article should have put more emphasis on that point.
Scott wonders how anyone could ever find this surprising. I think it's like many things - the underlying concept is obviously there once you point it out, but it's easier not to think about or notice it, and easier not to have a model of what's going on beyond a vague sense that it is there and that this counts as the virtuous level of noticing.
My sense over time of how important this is gets bigger, not smaller, and I see almost no one properly noticing the taste of the Lotus. So this seems like one of the most important posts.
“The Tails Coming Apart as a Metaphor for Life” should be retitled “The Tails Coming Apart as a Metaphor for Earth since 1800.” Scott does three things: 1) he notices that happiness research is framing-dependent, 2) he notices that happiness is a human-level term, but not specific at the extremes, and 3) he considers how this relates to deep-seated divergences in moral intuitions becoming ever more apparent in our world.
He hints at why moral divergence occurs with his examples. His extreme case of hedonic utilitarianism, converting…
Most people who commented on this post seemed to recognise it from their experience and get a general idea of what the different cultures look like (although some people differ on the details, see later). This is partly because it is explained well but also because I think the names were chosen well.
Here are a few people saying that they have used/referenced it: 1, 2, 3 plus me.
From a LW standpoint, thinking about this framing helps me to not be offended by blunt comments. My family was very combat-culture, but in life in general I find people are unwilling …
I'm generally in favor of public praise and private criticism, but this post really rubbed me the wrong way. To me it reads as a group of neurotic people getting together to try to get out of neuroticism by being even more neurotic at each other. Or, that in a quest to avoid interacting with the layer of intentions, let's go arbitrarily deep on the recursion stack at the algorithmic/strategy layer of understanding.
Also really bothered by calling a series of reactions spread over time levels of meta. Actually going meta would be paying attention to the structure of the back and forth rather than the individual steps in the back and forth.
I reviewed this post here.
Epistemics: Yes, it is sound. Not because of claims (they seem more like opinions to me), but because it is appropriately charitable to those that disagree with Paul, and tries hard to open up avenues of mutual understanding.
Valuable: Yes. It provides new third paradigms that bring clarity to people with different views. Very creative, good suggestions.
Should it be in the Best list?: No. It is from the middle of a conversation, and would be difficult to understand if you haven't read a lot about the 'Foom debate'.
Improved: The same concepts…
I didn't feel like I fully understood this post at the time when it was written, but in retrospect it feels like it's talking about essentially the same thing as Coherence Therapy does, just framed differently.
Any given symptom is coherently produced, in other words, by either (1) how the individual strives, without conscious awareness, to carry out strategies for safety or well-being; or (2) how the individual responds to having suffered violations of safety or well-being. This model of symptom production is squarely in accord with the construct
As you would expect from someone who was one of the inspirations for the post, I strongly approve of the insight/advice contained herein. I also agree with the previous review that there is not a known better write-up of this concept. I like that this gets the thing out there compactly.
Where I am disappointed is that this does not feel like it gets across the motivation behind this or why it is so important - I neither read this and think 'yes that explains why I care about this so much' nor 'I expect that this would move the needle much on p…
"Epistemic Status: Confident"?
That's surprising to me.
I skipped past that before reading, and read it as fun, loose speculation. I liked it, as that.
But I wouldn't have thought it deserves "confident".
I'm not sure if I should give it less credence or more, now.
"Caring less" was in the air. People were noticing the phenomenon. People were trying to explain it. In a comment, I realized that I was in effect telling people to care less about things without realizing what I was doing. All we needed was a concise post to crystallize the concept, and eukaryote obliged.
The post, especially the beginning, gets straight to the point. It asks the question of why we don't hear more persuasion in the form of "care less", offers a realistic example and a memorable graphic, and calls to action. This is…
I still generally endorse this post, though I agree with everyone else's caveats that many arguments aren't like this. The biggest change is that I feel like I have a slightly better understanding of "high-level generators of disagreement" now, as differences in priors, contexts, and categorizations - see my post "Mental Mountains" for more.
I think we should encourage posts which are well-delimited and research based; "here's a question I had, and how I answered it in a finite amount of time" rather than "here's something I've been thinking about for a long time, and here's where I've gotten with it".
Also, this is an engaging topic and well-written.
I feel the "final thoughts" section could be tightened up/shortened, as to me it's not the heart of the piece.
This is the second time I've seen this. Now it seems obvious. I remember liking it the first time, but also remember it being obvious. That second part of the memory is probably false. I think it's likely that this explained the idea so well that I now think it's obvious.
In other words: very well done.
I remember thinking when I originally read this 'oh this is insightful', and then again when I re-read it I had the same thought. Then I realized that's exactly the type of feels-like-an-insight thinking the review is trying to get us away from! I've never used the concept or even thought about it since I first read the post, nor encountered it elsewhere, despite assuming I would do so. Bad sign.
I understand that this post seems wise to some people. To me, it seems like a series of tautologies on the surface, with an understructure of assumptions that are ultimately far more important and far more questionable. The basic assumption being made is that society-wide "memetic collapse" is a thing; the evidence given for this (even if you follow the links) is weak, and yet the attitude throughout is that further debate on this point is not worth our breath.
I am a co-author of statistics work with somebody whose standards of mathematical rigou... (read more)
This is a first-pass review that's just sort of organizing my thinking about this post.
This post makes a few different types of claims:
It has a question which is listed although not
I still endorse most of this post, but https://docs.google.com/document/d/1cEBsj18Y4NnVx5Qdu43cKEHMaVBODTTyfHBa8GIRSec/edit has clarified many of these issues for me and helped quantify the ways that science is, indeed, slowing down.
I think it was important to have something like this post exist. However, I now think it's not fit for purpose. In this discussion thread, rohinmshah, abramdemski and I end up spilling a lot of ink about a disagreement that ended up being at least partially because we took 'realism about rationality' to mean different things. rohinmshah thought that irrealism would mean that the theory of rationality was about as real as the theory of liberalism, abramdemski thought that irrealism would mean that the theory of rationality would be about as real as the theo
This post is well written and not over-long. If the concepts it describes are unfamiliar to you, it is a well written introduction. If you're already familiar with them, you can skim it quickly for a warm feeling of validation.
I think the post would be even better with a short introduction describing its topic and scope, but I'm aware that other people have different preferences. In particular:
I do not understand Logical Induction, and I especially don't understand the relationship between it and updating on evidence. I feel like I keep viewing Bayes as a procedure separate from the agent, and then trying to slide LI into that same slot, and it fails because at least LI and probably Bayes are wrongly viewed that way. But this post is what I leaned on to shift from an utter-darkness understanding of LI to a heavy-fog one, and re-reading it has been very useful in that regard. Since I am otherwise not a person who would be expected to understa... (read more)
I love how cleanly this brings up its point and asks the question. My answer is essentially that you can do this if and only if you can create an expectation of Successful Failure in some way. Thus, if the failing person's real mission can be the friends they made along the way, or the skills they developed, or the lessons learned, or they still got a healthy paycheck, or the attempt brings them honor, or whatever, that's huge.
Writing a full response is on my list of things to eventually do, which is rare for posts that are over a year old.
These are an absolute blast. I'm not rating it as important because it all seems so obvious to me that it would go down like this, and it's hard to see why people need convincing, but perhaps they do? Either way, it's great fun to read the examples again.
Note: this is on balance a negative review of the post, at least regarding the question of whether it should be included in a "Best of LessWrong 2018" compilation. I feel somewhat bad about writing it given that the author has already written a review that I regard as negative. That being said, I think that reviews of posts by people other than the author are important for readers looking to judge posts, since authors may well have distorted views of their own works.
This project (best read in the bolded link, not just in this post) seemed and still seems really valuable to me. My intuitions around "Might AI have discontinuous progress?" become a lot clearer once I see Katja framing them in terms of concrete questions like "How many past technologies had discontinuities equal to ten years of past progress?". I understand AI Impacts is working on an updated version of this, which I'm looking forward to.
I read this post when it initially came out. It resonated with me to such an extent that even three weeks ago, I found myself referencing it when counseling a colleague on how to deal with a student whose heterodoxy caused the colleague to make isolated demands for rigor from this student.
The author’s argument that Nurture Culture should be the default still resonates with me, but I think there are important amendments and caveats that should be made. The author said:
"To a fair extent, it doesn’t even matter if you believe that someone... (read more)
I think awareness of this effect is tremendously important. Your immune system needs to fight cancer (mindless unregulated replication) in order for you to function and pursue any goal with a lower time preference than the mindless replicators. But what's even worse than cancer is a disease that co-opts the immune system, leading to a lowered ability to fight off infections in general. People who care about the future are concerned about non-value-aligned replication outcompeting human values. But they should also be concerned about agentic processes that specifically undermine the ability to do low-time-preference work, i.e. antisocial punishers and the things that lead them to exist and flourish.
This is my own post. I continue to talk and think a lot about the world from the perspective of solving coordination problems, where facilitating the ability for people to build common knowledge is one of the central tools. I'm very glad I wrote the post; it made a lot of my own thinking more rigorous and clear.
This post seems to be making a few claims, which I think can be evaluated separately:
1) Decoupling norms exist
2) Contextualizing norms exist
3) Decoupling and contextualizing norms are usefully thought of as opposites (either as a dichotomy or spectrum)
(i.e. there are enough people using those norms that it's a useful way to carve up the discussion-landscape)
There's a range of "strong" / "weak" versions of these claims – decoupling and/or contextualization might be principled norms that some people explicitly endorse, or they might just be clusters of t
I support this post being included in the Best-of-2018 Review.
It does a good job of starting with a straightforward concept, and explaining it clearly and vividly (a SlateStarScott special). And then it goes on to apply the concept to another phenomenon (ethical philosophy) and make more sense of an oft-observed phenomenon (the moral revulsion to both branches of thought experiments, sometimes by the same individual).
Reply: "Relevance Norms; Or, Gricean Implicature Queers the Decoupling/Contextualizing Binary" (further counterreplies in the comment section)
I argue that this post should not be included in the Best-of-2018 compilation.
In the comments of this post, Scott Garrabrant says:
I think that Embedded Agency is basically a refactoring of Agent Foundations in a way that gives one central curiosity based goalpost, rather than making it look like a bunch of independent problems. It is mostly all the same problems, but it was previously packaged as "Here are a bunch of things we wish we understood about aligning AI," and in repackaged as "Here is a central mystery of the universe, and here are a bunch things we don't understand about it." It is not a coincidence that they are the sa
I do not think this is a strong analysis. Things were a lot more complicated than this, on many levels. Analyzing that in detail would be more interesting. This post seems more interested in the question of 'what grade should we get for our efforts' than in learning from the situation going forward, which is what I think is the far more interesting problem.
That's not to say that the actual evaluation is especially unfair. I give myself very low marks because I had the trading skills to know better, or I should have had them, and the spare c... (read more)
Big fan of this but, like most of us, I knew all this already. What I want to know is, how effective is/was this when not preaching to the choir? What happens when someone who doesn't understand MIRI's mission starts to read this? I'd like to think it helps them grok what is going on reasonably often, but I could be fooling myself, and that question is ultimately the test of how vital this really is.
I still feel some desire to finish up my "first pass 'help me organize my thoughts' review". I went through the post, organizing various claims and concepts. I came away with the main takeaway "Wowzers there is so much going on in this post. I think this could have been broken up into a full sequence, each post of which was saying something pretty important."
There seem to be four major claims/themes here:
This is very interesting. I do not have a good chance of being able to try this out, so I cannot evaluate any of the claims made directly, but it seems well-written, well-thought, and all in all a top-tier post.
Pretty minimal in and of itself, but has prompted plenty of interesting discussion. Operationally that suggests to me that posts like this should be encouraged, but not by putting them into "best of" compilations.
This does exactly what it sets out to do: presents an issue, shows why we might care, and lays out some initial results (including both intuitive and counterintuitive ones). It's not world-shaking for me, but it certainly carries its weight.
This is truly one of the best posts I've read. It guides the reader through a complex argument in a way that's engaging and inspiring. Great job.
This post is close in my mind to Alex Zhu's post Paul's research agenda FAQ. They each helped to give me many new and interesting thoughts about alignment.
This post was maybe the first time I'd seen an actual conversation about Paul's work between two people who had deep disagreements in this area - where Paul wrote things, someone wrote an effort-post response, and Paul responded once again. Eliezer did it again in the comments of Alex's FAQ, which was also a big deal for me in terms of learning.
I weakly think this post should be included in Best of LessWrong 2018. Although I'm not an expert, the post seems sound. The writing style is nice and relaxed. The author highlights a natural dichotomy; thinking about Babble/Prune has been useful to me on several occasions. For example, in a research brainstorming / confusion-noticing session, I might notice I'm not generating any ideas (Prune is too strong!). Having this concept handle lets me notice that more easily.
One improvement to this post could be the inclusion of specific examples of how the author used this dichotomy to improve their idea generation process.
I don't recommend this post for the Best-of-2018 Review.
It's an exploration of a fascinating idea, but it's kind of messy and unusually difficult to understand (in the later sections). Moreover, the author isn't even sure whether it's a good concept or one that will be abused, and in addition worries about it becoming a popularized/bastardized concept in a wider circle. (Compare what happened to "virtue signaling".)
I read these posts as a beginner, and with a wider-access book format in mind... Great writing style - very accessible. Honest and informative. A modern-day explorer of the frontiers of the mind and human experience.
1. I'd make this the 1st paragraph: "In recent years, Circling has caught the eye of rationalists... " include a "WTF is circling?" as a question for a wider audience! and the LW bit isn't necessary now.
2. Include a definition for inferential distance for ease of reading to newbies.
3... (read more)
Although normally I am all for judging arguments by their merits, regardless of who speaks them, I think that in this particular case we need to think twice before including the essay in the "Best of 2018" book. The notoriety of the author is such that including it risks serious reputational damage for the community, especially as the content of the essay might be interpreted as a veiled attempt to justify the author's moral transgressions. To be clear, I am not saying we should censor everything that this man ever said, but giving it the spotlight in "Best of 2018" seems like a bad choice.
This is probably the post I got the most value out of in 2018. This is not so much because of the precise ideas (although I have gotten value out of the principle of meta-honesty directly), but because it was an attempt to understand and resolve a confusing, difficult domain. Eliezer explores various issues facing meta-honesty - the privilege inherent in being fast-talking enough to remain honest in tricky domains, and the various subtleties of meta-honesty that might make it too subtle a set of rules to coordinate around.
This illustration of "how to contend w
This phenomenon is closely related to "regression towards the mean". It is important, when discussing something like this, to include such jargon names, because there is a lot of existing writing and thought on the topic. Don't reinvent the wheel.
Other than that, it's a fine article.
This is a moderately interesting and well-written example, but did not really surprise me at any point. Worth having, but wouldn't be something I'd go out of my way to recommend.
It's nice to see such an in-depth analysis of the CRT questions. I don't really share drossbucket's intuition - for me the 100 widget question feels counterintuitive the same way as the ball and bat question, but neither feels really aversive, so it was hard for me to appreciate the feelings that generated this post. But this gives a good example of an idea of "training mathematical intuitions" I hadn't thought about before.
This is a nice, simple model for thinking. But I notice that both logic and empiricism sometimes have "shortcuts" — non-obvious ways to shorten, or otherwise substantially robustify, the chain of (logic/evidence). It's reasonable to imagine that intuition/rationality would also have various shortcuts; some that would correspond to logical/empirical shortcuts, and some that would be different. Communication is more difficult when two people are using chains of reasoning that differ substantially in what shortcuts they use. You could ge... (read more)
Basic politeness rules, explained well for people who don't find them obvious, yay!
As I recall, this is a solid, well-written post. Skimming it over again prior to reviewing it, nothing stands out to me as something worth mentioning here. Overall, I probably wouldn't put it on my all-time best list, or re-read it too often, but I'm certainly glad I read it once; it's better than "most" IMO, even among posts with (say) over 100 karma.
I think the primary value of this post is in prompting Benquo's response. That's not nothing, but I don't think it's top-shelf, because it doesn't really explore the game theory of "care more" vs. "care less" attempts between two agents whose root values don't necessarily align.
See next year's post here.
This essay makes a valuable contribution to the vocabulary we use to discuss and think about AI risk. Building a common vocabulary like this is very important for productive knowledge transmission and debate, and makes it easier to think clearly about the subject.
I've had a read of this post.
It seems rather whiny. I'm struggling to see the value to the advancement of rational thinking.
edited to add - Imagine if this was an English comprehension test and the question was "with which character does the author most identify?"
This post raises some reasonable-sounding and important-if-true hypotheses. There seems to be a vast open space of possible predictions, relevant observations, and alternative explanations. A lot of it has good treatment, but not on LW, as far as I know.
I would recommend this post as an introduction to some ideas and a starting point, but not as a good argument or a basis for any firm conclusions. I hope to see more content about this on LW in the future.
The content here is pretty awesome. I'm a little wary of including it in our review because it is, as the author notes, more of a general-audience thing, but it's both a lot of fun and is making important points.
Re-reading this for review was a weird roller-coaster. I had remembered (in 2018) my strong takeaway that aesthetics mattered to rationality, and that "Aesthetic Doublecrux" would be an important innovation.
But I forgot most of the second half of the article. And when I got to it, I had such a "woah" moment that I stopped writing this review, went to go rewrite my conclusion in "Propagating Facts into Aesthetics" and then forgot to finish the actual review. The part that really strikes me is her analysis of Scott:
Sometimes I can almost feel this happ
This essay defines and clearly explains an important property of human moral intuitions: the divergence of possible extrapolations from the part of the state spaces we're used to think about. This property is a challenge in moral philosophy, that has implications on AI alignment and long-term or "extreme" thinking in effective altruism. Although I don't think that it was especially novel to me personally, it is valuable to have a solid reference for explaining this concept.
I support the inclusion of this post in the Best-of-2018 Review.
It's a thorough explicit explanation of a core concept in group epistemology, allowing aspects of social reality to better add up to normality, and so it's extremely relevant to this community.
I enjoyed this post. It brings a more worldwide view to LW (sorely missed in some things I've read here) and makes the important point that we don't all think the same. Experiences can be very different, and so are our reactions and reasonings, coming with their own logic. We should not ignore the human element of how the world works.
I would suggest a bit of an edit to move the description of the game with punishment to after the non-punishment results just for ease of reading and absorption.
I also enjoyed reading the supporting material h... (read more)
[Rambly notes while voting.] This post has some merit, but it feels too... jumpy, and, as the initial comments point out, it's unclear what's being considered "explicit" vs "implicit" communication. Only on getting to the comments did I realize that the author's sense of those words was not quite my own. I'm also not sure it's either 1) telling the whole picture, or 2) correct. A couple of examples are brought up, but examples are easy to cherry-pick. The fact that the case brought with Bruce Lee seemed to be in favor of a non-compassionate feels maybe, maybe l
Interesting list, but it seems to have a triumphalist bias. I doubt that "50K years ago, nobody could imagine changing the world" is true, and I suspect that "hunter-gatherer cultures have actually found locally-optimal ways of life, and were generally happier and healthier than most premodern post-agricultural people" was a much bigger factor than most of these.
See my review here.
I would weakly support this post's inclusion in the Best-of-2018 Review. It's a solid exposition of an important topic, though not a topic that is core to this community.
I would not include this in the Best-of-2018 Review.
While it's good and well-researched, it's more or less a footnote to the Slate Star Codex post linked above. (I think there's an argument for back-porting old SSC posts to LW with Scott's consent, and if that were done I'd have nominated several of those.)
I would like to see a post on this concept included in the best of 2018, but I also agree that there are reputational risks given the author. I'd like to suggest possible compromise - perhaps we could include the concept, but write our own explanation of the concept instead of including this article?
My actual honest reaction to this sort of thing: Please, please stop. This kind of thinking actively drives me and many others I know away from LW/EA/Rationality. I see it strongly as asking the wrong questions with the wrong moral frameworks, and using it to justify abominable conclusions and priorities, and ultimately the betrayal of humanity itself - even if people who talk like this don't write the last line of their arguments, it's not like the rest of us don't notice it. I don't have any idea what to say to someone who writes '... (read more)