I mostly feel bad about LessWrong these days. I slightly dread logging on, I don't expect to find much that's insightful on the website, and I think the community has a lot of groupthink / other "ew" factors that are harder for me to pin down (although I think that's improved over the last year or two). I also feel some dread at posting this because it might burn social capital I have with the mods, but whatever.
(Also, most of this stuff is about the community and not directly in the purview of the mods anyways.)
Here are some rambling thoughts, though:
I expect there to be a bunch of responses which strike me as defensive, revisionist gaslighting, and I don't know if/when I'll reply.
I feel pretty frustrated at how rarely people actually bet or make quantitative predictions about existential risk from AI. [...]
I think that alignment "theorizing" is often a bunch of philosophizing and vibing in a way that protects itself from falsification (or even proof-of-work) via words like "pre-paradigmatic" and "deconfusion." I think it's not a coincidence that many of the "canonical alignment ideas" somehow don't make any testable predictions until AI takeoff has begun.
This sentiment resonates strongly with me.
A personal background: I remember getting pretty heavily involved in AI alignment discussions on LessWrong in 2019. Back then I think there were a lot of assumptions people had about what "the problem" was that are, these days, often forgotten, brushed aside, or sometimes even deliberately minimized post-hoc in order to give the impression that the field has a better track record than it actually does. [ETA: but to be clear, I don't mean to say everyone made the same mistake I describe here]
This has been a bit shocking and disorienting to me, honestly, because at the time in 2019 I didn't get the strong impression that people were deliberately constr...
I wrote a fair amount about alignment from 2014-2020[1] which you can read here. So it's relatively easy to get a sense for what I believed.
Here are some summary notes about my views as reflected in that writing, though I'd encourage you to just judge for yourself[2] by browsing the archives:
Some thoughts on my journey in particular:
Suppose in 2024-2029, someone constructs an intelligent robot that is able to clean a room to a high level of satisfaction, consistent with the user’s intentions, without any major negative side effects or general issues of misspecification. It doesn’t break any vases while cleaning.
I remember explicit discussion about how solving this problem shouldn't even count as part of solving long-term / existential safety, for example:
"What I understand this as saying is that the approach is helpful for aligning housecleaning robots (using near extrapolations of current RL), but not obviously helpful for aligning superintelligence, and likely stops being helpful somewhere between the two. [...] There is a risk that a large body of safety literature which works for preventing today's systems from breaking vases but which fails badly for very intelligent systems actually worsens the AI safety problem" https://www.lesswrong.com/posts/H7KB44oKoSjSCkpzL/worrying-about-the-vase-whitelisting?commentId=rK9K3JebKDofvJA3x
...Why is it so hard to find people explicitly saying that this specific problem, and the examples illustrating it, were not meant to be seriously representative of the hard parts of
This matches my sense of how a lot of people seem to have... noticed that GPT-4 is fairly well aligned to what the OpenAI team wants it to be, in ways that Yudkowsky et al. said would be very hard, and still not view this as, at a minimum, a positive sign?
I.e., problems of the class 'I told the intelligence to get my mother out of the burning building and it blew her up so the dead body flew out the window, this is because I wasn't actually specific enough' just don't seem like they are a major worry anymore?
Usually when GPT-4 doesn't understand what I'm asking, I wouldn't be surprised if a human was confused also.
If I was misreading the blog post at the time, how come it seems like almost no one ever explicitly predicted at the time that these particular problems were trivial for systems below or at human-level intelligence?!?
Quoting the abstract of MIRI's "The Value Learning Problem" paper (emphasis added):
Autonomous AI systems’ programmed goals can easily fall short of programmers’ intentions. Even a machine intelligent enough to understand its designers’ intentions would not necessarily act as intended. We discuss early ideas on how one might design smarter-than-human AI systems that can inductively learn what to value from labeled training data, and highlight questions about the construction of systems that model and act upon their operators’ preferences.
And quoting from the first page of that paper:
...The novelty here is not that programs can exhibit incorrect or counter-intuitive behavior, but that software agents smart enough to understand natural language may still base their decisions on misrepresentations of their programmers’ intent. The idea of superintelligent agents monomaniacally pursuing “dumb”-seeming goals may sound odd, but it follows from the observation of Bostrom an
If that were to happen, I think an extremely natural reading of the situation is that a substantial part of what we thought "the problem" was in value alignment has been solved, from the perspective of this blog post from 2019. That is cause for updating our models, and for a verbal recognition that our models have updated in this way.
Yet, that's not how I think everyone on LessWrong would react to the development of such a robot. My impression is that a large fraction, perhaps a majority, of LessWrongers would not share my interpretation here, despite the plain language in the post explaining what they thought the problem was. Instead, I imagine many people would respond to this argument basically saying the following:
"We never thought that was the hard bit of the problem. We always thought it would be easy to get a human-level robot to follow instructions reliably, do what users intend without major negative side effects, follow moral constraints including letting you shut it down, and respond appropriately given unusual moral dilemmas. The idea that we thought that was ever the problem is a misreading of what we wrote. The problem was always purely that alignment issues would arise after we far surpassed human intelligence, at which point entirely novel problems will arise."
For what it's worth I do remember lots of people around the MIRI-sphere complaining at the time that that kind of prosaic alignment work was kind of useless, because it missed the hard parts of aligning superintelligence.
Well, for instance, I watched Ryan Carey give a talk at CHAI about how Cooperative Inverse Reinforcement Learning didn't give you corrigibility. (That CIRL didn't tackle the hard part of the problem, despite seeming related on the surface.)
I think that's much more an example of
"Prosaic alignment work is kind of useless because it will actually be easy to get a roughly human-level machine to interpret our commands reliably, do what you want without significant negative side effects, and let you shut it down whenever you want etc. The hard part is doing this for superintelligence."
than of
"Prosaic alignment work is kind of useless because machine learning is natively not very transparent and alignable, and we should focus instead on creating alignable alternatives to ML, or building the conceptual foundations that would let us align powerful AIs."
"Sure, Rohin thought that was a major problem, but we [our organization/thought cluster/ideological group] never agreed with him."
Oh really? Did you ever explicitly highlight this particular disagreement at the time?
FWIW at the time I wasn't working on value learning and wasn't incredibly excited about work in that direction, despite the fact that that's what the rest of my lab was primarily focussed on. I also wrote a blog post in 2020, based off a conversation I had with Rohin in 2018, where I mention how important it is to work on inner alignment stuff and how those issues got brought up by the 'paranoid wing' of AI alignment. My guess is that my view was something like "stuff like reward learning from the state of the world doesn't seem super important to me because of inner alignment etc, but for all I know cool stuff will blossom out of it, so I'm happy to hear about your progress and try to offer constructive feedback", and that I expressed that to Rohin in person.
At this point I think there are a number of potential replies from people who still insist that the LW models of AI alignment were never wrong, which I (depending on the speaker) think can often border on gaslighting:
This is one of the main reasons I'm not excited about engaging with LessWrong. Why bother? It feels like nothing I say will matter. Apparently, no pre-takeoff experiments matter to some folk.[1] And even if I successfully dismantle some philosophical argument, there's a good chance they will use another argument to support their beliefs instead. Nothing changes.
So there we are. It doesn't matter what my experiments say, because (it is claimed) there are no testable predictions before The End. But also, everyone important already knew in advance that it'd be easy to get GPT-4 to interpret and execute your value-laden requests in a human-reasonable fashion. Even though ~no one said so ahead of time.
When talking with pre-2020 alignment folks about these issues, I feel gaslit quite often. You have no idea how many times I've been told things like "most people already understood that reward is not the optimization target"[2] and "maybe you had a lesson you needed ...
I get why you feel that way. I think there are a lot of us on LessWrong who are less vocal and more open-minded, and less aligned with either optimistic network thinkers or pessimistic agent foundations thinkers. People newer to the discussion and otherwise less polarized are listening and changing their minds in large or small ways.
I'm sorry you're feeling so pessimistic about LessWrong. I think there is a breakdown in communication happening between the old guard and the new guard you exemplify. I don't think that's a product of venue, but of the sheer difficulty of the discussion and of polarization between different viewpoints on alignment.
I think maintaining a good community falls on all of us. Formats and mods can help, but communities set their own standards.
I'm very, very interested to see a more thorough dialogue between you and similar thinkers, and MIRI-type thinkers. I think right now both sides feel frustrated that they're not listened to and understood better.
(I didn't follow this argument at the time, so I might be missing key context.)
The blog post "Reward is not the optimization target" gives the following summary of its thesis:
- Deep reinforcement learning agents will not come to intrinsically and primarily value their reward signal; reward is not the trained agent’s optimization target.
- Utility functions express the relative goodness of outcomes. Reward is not best understood as being a kind of utility function. Reward has the mechanistic effect of chiseling cognition into the agent's network. Therefore, properly understood, reward does not express relative goodness and is therefore not an optimization target at all.
I hope it doesn't come across as revisionist to Alex, but I felt like both of these points were made by people at least as early as 2019, after the Mesa-Optimization sequence came out in mid-2019. As evidence, I'll point to my post from December 2019 that was partially based on a conversation with Rohin, who seemed to agree with me,
...consider a simple feedforward neural network trained by deep reinforcement learning to navigate my Chests and Keys environment. Since "go to the nearest key" i
Thanks for the edit :)
As I mentioned elsewhere (not this website) I don't agree with "will reliably lead people to false beliefs", if we're talking about ML people rather than LW people (as was my audience for that blog post).
I do think that it's a reasonable hypothesis to have, and I assign it more likelihood than I would have a year ago (in large part from you pushing some ML people on this point, and them not getting it as fast as I would have expected).
It seems to me that often people rehearse fancy and cool-sounding reasons for believing roughly the same things they always believed, and comment threads don't often change important beliefs. Feels more like people defensively explaining why they aren't idiots, or why they don't have to change their mind. I mean, if so—I get it, sometimes I feel that way too. But it sucks and I think it happens a lot.
My sense is that this is an inevitable consequence of low-bandwidth communication. I have no idea whether you're referring to me or not, and I am really not saying you are doing so, but I think an interesting example (whether you're referring to it or not) is some of the recent threads where we've been discussing deceptive alignment. My sense is that neither of us has been very persuaded by those conversations, and I claim that's not very surprising, in a way that's epistemically defensible for both of us. I've spent literal years working through the topic myself in great detail, so it would be very surprising if my view was easily swayed by a short comment chain, and similarly I expect that the same thing is true of you, where you've spent much more time thinking about this and ...
FWIW, LessWrong does seem—in at least one or two ways—saner than other communities of similar composition. I agree it's better than Twitter overall. But in many ways it seems worse than other communities. I don't know what to do about it, and to be honest I don't have much faith in e.g. the mods.[1]
Hopefully my comments do something anyways, though. I do have some hope because it seems like a good amount has improved over the last year or two.
Despite thinking that many of them are cool people.
Why are you so focused on Eliezer/MIRI yourself? If you think you (or events in general) have adequately shown that their specific concerns are not worth worrying about, maybe turn your attention elsewhere for a bit? For example you could look into other general concerns about AI risk, or my specific concerns about AIs based on shard theory. I don't think I've seen shard theory researchers address many of these yet.
I feel pretty frustrated at how rarely people actually bet or make quantitative predictions about existential risk from AI. EG my recent attempt to operationalize a bet with Nate went nowhere. Paul trying to get Eliezer to bet during the MIRI dialogues also went nowhere, or barely anywhere—I think they ended up making some random bet about how long an IMO challenge would take to be solved by AI. (feels pretty weak and unrelated to me. lame. but huge props to Paul for being so ready to bet, that made me take him a lot more seriously.)
For what it's worth, I would be up for a dialogue or some other context where I can make concrete predictions. I do think it's genuinely hard, since I do think there is a lot of masking of problems going on, and optimization pressure that makes problems harder to spot (both internally in AI systems and institutionally), so asking me to make predictions feels a bit like asking me to make predictions about FTX before it collapsed.
Like, yeah, I expect it to look great, until it explodes. Similarly I expect AI to look pretty great until it explodes. That seems like kind of a core part of the argument for difficulty for me.
I would nevertheless be happy to try to operationalize some bets, and still expect we would have lots of domains where we disagree, and would be happy to bet on those.
Like, yeah, I expect it to look great, until it explodes. Similarly I expect AI to look pretty great until it explodes. That seems like kind of a core part of the argument for difficulty for me.
If your hypothesis smears probability over a wider range of outcomes than mine, while I can more sharply predict events using my theory of how alignment works—that constitutes a Bayes-update towards my theory and away from yours. Right?
"Anything can happen before the explosion" is not a strength for a theory. It's a vulnerability. If probability is better-concentrated by any other theories which make claims about both the present and the future of AI, then the noncommittal theory gets dropped.
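A toy numerical sketch of the Bayes-update point, using Bayes' rule in odds form. The probabilities are made-up illustrative numbers, not estimates anyone in this thread gave, and `posterior_odds` is a hypothetical helper. A hypothesis that concentrates probability on what actually happens gains odds with each observation; when two hypotheses assign the same probability to an observation, the odds don't move at all.

```python
# Toy illustration only: all probabilities below are hypothetical numbers.

def posterior_odds(prior_odds: float, p_obs_given_a: float, p_obs_given_b: float) -> float:
    """Odds of hypothesis A over B after one observation (Bayes' rule in odds form)."""
    return prior_odds * (p_obs_given_a / p_obs_given_b)

# Hypothesis A concentrates probability: it gave 0.8 to the outcome we observed.
# Hypothesis B smears probability over many outcomes: it gave only 0.2 to it.
p_a = 0.8
p_b = 0.2

odds = 1.0  # start indifferent between A and B
for _ in range(3):  # three independent observations, each matching A's sharper prediction
    odds = posterior_odds(odds, p_a, p_b)

print(odds)  # prints 64.0, i.e. 64:1 in favour of A

# If both hypotheses give an observation the same probability, the likelihood
# ratio is 1 and the odds are unchanged: nobody gains Bayes points.
unchanged = posterior_odds(1.0, 0.5, 0.5)
```

Each observation multiplies the odds by the likelihood ratio, so the gap compounds only where the two hypotheses actually predict differently.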
Sure, yeah, though like, I don't super understand. My model will probably make the same predictions as your model in the short term. So we both get equal Bayes points. The evidence that distinguishes our models seems further out, and in a territory where there is a decent chance that we will be dead, which sucks, but isn't in any way contradictory with Bayes' rule. I don't think I would have put that much probability on us being dead at this point, so I don't think that loses much of any Bayes points. I agree that if we are still alive in 20-30 years, then that's definitely Bayes points, and I am happy to take that into account then, but I've never had timelines or models that predicted things to look that different from now (or like, where there were other world models that clearly predicted things much better).
My model will probably make the same predictions as your model in the short term.
No, I don't think so. The model(s) I use for AGI risk are an outgrowth of the model I use for normal AI research, and so they make tons of detailed predictions. That's why I have weekly fluctuations in my beliefs about alignment difficulty.
Overall question I'm interested in: What, if any, catastrophic risks are posed by advanced AI? By what mechanisms do they arise, and by what solutions can risks be addressed?
Making different predictions. The most extreme prediction of AI x-risk is that AI presents, well, an x-risk. But theories gain and lose points not just on their most extreme predictions, but on all their relevant predictions.
I have a bunch of uncertainty about how agentic/transformative systems will look, but I put at least 50% on "They'll be some scaffolding + natural outgrowth of LLMs." I'll focus on that portion of my uncertainty in order to avoid meta-discussions on what to think of unknown future systems.
I don't know what your model of AGI risk is, but I'm going to point to a cluster of adjacent models and memes which have been popular on LW and point out a bunch of predictions t...
This model naturally predicts things like "it's intractably hard/fragile to get GPT-4 to help people with stuff." Sure, the model doesn't predict this with probability 1, but it's definitely an obvious prediction.
Another point is that I think GPT-4 straightforwardly implies that various naive supervision techniques work pretty well. Let me explain.
From the perspective of 2019, it was plausible to me that getting GPT-4-level behavioral alignment would have been pretty hard, and might have needed something like AI safety via debate or other proposals that people had at the time. The claim here is not that we would never reach GPT-4-level alignment abilities before the end, but rather that a lot of conceptual and empirical work would be needed in order to get models to:
Well, to the surprise of my 2019-self, it turns out that naive RLHF with a cautious supervisor designing the reward model seems basically sufficient to do all of these things in a reas...
What did you think would happen, exactly? I'm curious to learn what your 2019-self was thinking would happen, that didn't happen.
I feel pretty frustrated at how rarely people actually bet or make quantitative predictions about existential risk from AI.
Without commenting on how often people do or don't bet, I think overall betting is great and I'd love to see more of it!
I'm also excited how much of it I've seen since Manifold started gaining traction. So I'd like to give a shout out to LessWrong users who are active on Manifold, in particular on AI questions. Some I've seen are:
Good job everyone for betting on your beliefs :)
There are definitely more folks than this: feel free to mention more folks in the comments who you want to give kudos to (though please don't dox anyone whose name on either platform is pseudonymous and doesn't match the other).
Yeah, I'm not really happy with the state of discourse on this matter either.
I think it's not a coincidence that many of the "canonical alignment ideas" somehow don't make any testable predictions until AI takeoff has begun. 🤔
As a proponent of an AI-risk model that does this, I acknowledge that this is an issue, and I indeed feel pretty defensive on this point. Mainly because, as @habryka pointed out and as I'd outlined before, I think there are legitimate reasons to expect no blatant evidence until it's too late, and indeed, that's the whole reason AI risk is such a problem. As was repeatedly stated.
So all these moves to demand immediate well-operationalized bets read a bit like tactical social attacks that are being unintentionally launched by people who ought to know better, which are effectively exploiting the territory-level insidious nature of the problem to undermine attempts to combat it, by painting the people pointing out the problem as blind believers. Like challenges that you're set up to lose if you take them on, but which make you look bad if you turn them down.
And the above, of course, may read exactly like a defense attempt a particularly self-aware blin...
Your post defending the least forgiving take on alignment basically relies on a sharp/binary property of AGI, and IMO a pretty large crux is that either this property probably doesn't exist, or, if it does exist, it is not universal; I also think it tends to be overused.
To be clear, I'm increasingly agreeing with a weak version of the hypothesis, and I also think you are somewhat correct, but I don't think your stronger hypothesis is correct. I think the lesson of AI progress is that it's less sharp the more tasks you want and the more general intelligence you want, which is in opposition to your hypothesis that AI progress will be sharp.
But in the meanwhile, yeah, discussing the matter just makes me feel weary and tired.
I actually kinda agree with you here, but unfortunately, this is very, very important, since your allies are trying to gain real-life political power over AI, and given this is extremely impactful, it is basically required for us to discuss it.
I feel pretty frustrated at how rarely people actually bet or make quantitative predictions about existential risk from AI. EG my recent attempt to operationalize a bet with Nate went nowhere. Paul trying to get Eliezer to bet during the MIRI dialogues also went nowhere, or barely anywhere—I think they ended up making some random bet about how long an IMO challenge would take to be solved by AI. (feels pretty weak and unrelated to me. lame. but huge props to Paul for being so ready to bet, that made me take him a lot more seriously.)
This paragraph doesn't seem like an honest summary to me. Eliezer's position in the dialogue, as I understood it, was:
Thanks for your feedback. I certainly appreciate your articles and I share many of your views. Reading what you had to say, along with Quentin, Jacob Cannell, and Nora, was a very welcome alternative take that expanded my thinking and changed my mind. I have changed my mind a lot over the last year, from thinking AI was a long way off and Yud/Bostrom were basically right to seeing that it's a lot closer and that theories without data are almost always wrong in many ways (e.g. SUSY was expected to be true for decades by most of the world's smartest physicists). Many alignment ideas from before GPT-3.5 are either sufficiently wrong or irrelevant to do more harm than good.
In particular, I think there is an overdependence on analogy, especially to evolution. Sure, when we had nothing to go on it was a start, but when data comes in, ideas based on analogies should be dropped pretty fast if they disagree with hard data.
(Some background: I have read the site for over 10 years, have followed AI for my entire career, have an understanding of Maths and Psychology, and have built and deployed a very small NN model commercially. Also, as an aside, I distinctly remember being surprised that Yud was skeptical of NN/DL in the earlier days, when I considered it obvious that this was where AI progress would come from. I don't have references because I didn't think that would be disputed afterwards.)
I am not sure what the silent majority on this site believes (counting by people, not by karma). Is Yud's worldview basically right or wrong?
Analogies based on evolution should be applied at the evolutionary scale: between competing organizations.
Hi there.
> (High confidence) I feel like the project of thinking more clearly has largely fallen by the wayside, and that we never did that great of a job at it anyways.
I'm new to this community. I've skimmed quite a few articles, and this sentence resonates with me for several reasons.
1) It's very difficult in general to find websites like LessWrong these days. And among the few that exist, I've found that the intellectuals on them are so incredibly doubtful of their own intellect. This creates a sort of Ouroboros phenomenon where int...
I feel pretty frustrated at how rarely people actually bet or make quantitative predictions about existential risk from AI.
I think that might be a result of how the topic is, well, just really fucking grim. I think part of what allows discussion of it and thought about it for a lot of people (including myself) is a certain amount of detachment. "AI doomers" get often accused of being LARPers or not taking their own ideas seriously because they don't act like people who believe the world is ending in 10 years, but I'd flip that around - a person who beli...
I think there are some great points in this comment but I think it's overly negative about the LessWrong community. Sure, maybe there is a vocal and influential minority of individuals who are not receptive to or appreciative of your work and related work. But I think a better measure of the overall community's culture than opinions or personal interactions is upvotes and downvotes which are much more frequent and cheap actions and therefore more representative. For example, your posts such as Reward is not the optimization target have received hundreds of...
No disagreement here that this place does this. I also think we should attempt to change many of these things. However, I don't expect the LessWrong team to do anything sufficiently drastic to counter the hero-worship. Perhaps they could consider hiding usernames by default, hiding vote counts until things have been around for some period of time, etc.
Hmm, my sense is Eliezer very rarely comments, and the people who do comment a lot don't have a ton of hero worship going on (like maybe Wentworth?). So I don't super believe that hiding usernames would do much about this.
Somewhat relatedly, there have been a good number of times where it seems like I've persuaded someone of A and of A ⟹ B and they still don't believe B, and coincidentally B is unpopular.
Would you mind sharing some specific examples? (Not of people, but of beliefs.)
I think it's fine for there to be a status hierarchy surrounding "good alignment research". It's obviously bad if that becomes mismatched with reality, as it almost certainly is to some degree, but I think people getting prestige for making useful progress is essentially what it takes for that progress to be made at all.
LessWrong.com is my favorite website. I’ve tried having thoughts on other websites and it didn't work. Seriously, though—I feel very grateful for the effort you all have put in to making this an epistemically sane environment. I have personally benefited a huge amount from the intellectual output of LW—I feel smarter, saner, and more capable of positively affecting the world, not to mention all of the gears-level knowledge I’ve learned, and model building I’ve done as a result, which has really been a lot of fun :) And when I think about what the world would look like without LessWrong.com I mostly just shudder and then regret thinking of such dismal worlds.
Some other thoughts of varying import:
If there were one dial I’d want to experiment with turning on LW it would be writing quality, in the direction of more of it.
I'd like to highlight this. In general, I think fewer things should be promoted to the front page.
[edit, several days later]: https://www.lesswrong.com/posts/SiPX84DAeNKGZEfr5/do-websites-and-apps-actually-generally-get-worse-after is a prime example. This has nothing to do with rationality or AI alignment. This is the sort of off-topic chatter that belongs somewhere else on the Internet.
I’m a huge fan of agree/disagree voting. I think it’s an excellent example of a social media feature that nudges users towards truth, and I’d be excited to see more features like it.
I also enjoy the reacts way more than I expected! They feel aesthetically at home here, especially with reacts for specific parts of the text.
It seems like it would be useful to have it for top-level posts. I love disagree voting and there are massive disparities sometimes between upvotes and agreements that show how useful it is in surfacing good arguments that are controversial.
I think I'm seeing some high effort, topical and well-researched top-level posts die on the vine because of controversial takes that are probably disagree voting. This is not a complaint about my own posts sometimes dying; I've been watching others posts with this hypothesis, and it fits.
I guess there's a reason for not having it on top-level posts, but I miss having it on top-level posts.
I'd like to like this more but I don't have a clear idea of when to up one, up the other, down one, down the other, or down one and up the other.
The EA Forum has this problem worse, but I've started to see it on LessWrong: it feels to me like we have a lot more newbies on the site who don't really get what LW-style rationality is about, and they make LessWrong a less fun place to write because they are regressing discussion norms back towards the mean.
Earlier this year I gave up on EAF because it regressed so far towards the mean that it became useless to me. LW has still been passable, but it feels like it's been ages since I really got into a good, long, deep thread with somebody on here. Partly that's because I'm busy, but it's also because I've been quicker to give up, since my expectations of having a productive conversation here are now lower. :-(
Do you have any thoughts on what the most common issues you see are or is it more like that every time it is a different issue?
First of all, I appreciate all the work the LessWrong / Lightcone team does for this website.
Maybe there's a lot of boiling feelings out there about the site that never get voiced?
I tend to avoid giving negative feedback unless someone explicitly asks for it. So…here we go.
Over the last 1.5 years, I've been less excited about LessWrong than at any time since I discovered this website. I'm uncertain to what extent this is because I changed or because the community did. Probably a bit of both.
The most obvious change is the rise of AI Alignment writings on LessWrong. There are two things that bother me about AI Alignment writing.
I have hidden the "AI Alignment" tag from my homepage, but there is still a spillover effect. "Likes unfalsifiable political claims" is the opposite of the kind of community I want to be part of. I think adopting lc's POC || GTFO burden of proof would make AI Alignment dialogue productive, but I am pessimistic about that happening on a collective scale.
When I write about weird ideas, I get three kinds of responses.
Over the years, I feel like I've gotten fewer "yes and" comments and more "we don't want you to say that" comments. This might be because my writing has changed, but I think what's really going on is that this happens to every community as it gets older. What was once radical eventually congeals into dogma.
I used to post my weird ideas immediately to LessWrong. Now I don't, because I feel like the reception on LessWrong would bum me out.[1]
I wonder what fraction of the weirdest writers here feel the same way. I can't remember the last time I've read something on LessWrong and thought to myself, "What a strange, daring, radical idea. It might even be true. I'm scared of what the implications might be." I miss that.[2]
I have learned a lot from reading and writing on LessWrong. Eight months ago, I had an experience where I internalized something very deep about rationality. I felt like I graduated from Level 1 to Level 2.
According to Eliezer Yudkowsky, his target audience for the Sequences was 2nd grade. He missed and ended up hitting college-level. They weren't supposed to be comprehensive. They were supposed to be Level 1. But after that, nobody wrote a Level 2. (The postrats don't count.) I've been trying―for years―to write Level 2, but I feel like a sequence of blog posts is a suboptimal format in 2023. Yudkowsky started writing the Sequences in 2006, when YouTube was still a startup. That leads me to…
The other reason I've been posting less on LessWrong is that I feel like I'm hitting a soft ceiling with what I can accomplish here. I'm nowhere near my personal skill cap, of course. But there is a much larger potential audience (and therefore impact) if I shifted from writing essays to filming YouTube videos. I can't think of anything LessWrong is doing wrong here. The editor already allows embedded YouTube links.
Over the years, I feel like I've gotten fewer "yes and" comments and more "we don't want you to say that" comments. This might be because my writing has changed, but I think what's really going on is that this happens to every community as it gets older. What was once radical eventually congeals into dogma.
This is the part I'm most frustrated with. It used to be you could say some wild stuff on this site and people would take you seriously. Now there's a chorus of people who go "eww, gross" if you go too far past what they think should be acceptable. LessWrong culture originally had very high openness to wild ideas. At worst, if you reasoned well and people disagreed, they'd at least ignore you, but now you're more likely to get downvoted for saying controversial things because they are controversial and it feels bad.
This was always a problem, but feels like it's gotten worse.
Huh, I am surprised by this. I agree this is a thing in lots of the internet, but do you have any examples? I feel like we really still have a culture of pretty extreme openness and taking random ideas seriously (enough that sometimes I feel like wild sounding bad ideas get upvoted too much because people like being contrarian a bit too much).
Here's part of a comment on one of my posts. The comment negatively impacted my desire to post deviant ideas on LessWrong.
Bullshit. If your desire to censor something is due to an assessment of how much harm it does, then it doesn't matter how open-minded you are. It's not a variable that goes into the calculation.
I happen to not care that much about the object-level question anymore (at least as it pertains to LessWrong), but on a meta level, this kind of argument should be beneath LessWrong. It's actively framing any concern for unrestricted speech as poorly motivated, making it more difficult to have the object-level discussion.
The comment doesn't represent a fringe opinion. It has +29 karma and +18 agreement.
I think I'm less open to weird ideas on LW than I used to be, and more likely to go "seems wrong, okay, next". Probably this is partly a me thing, and I'm not sure it's bad - as I gain knowledge, wisdom and experience, surely we'd expect me to become better at discerning whether a thing is worth paying attention to? (Which doesn't mean I am better, but like. Just because I'm dismissing more ideas, doesn't mean I'm incorrectly dismissing more ideas.)
But my guess is it's also partly a LW thing. It seems to me that compared to 2013, there are more weird ideas on LW and they're less worth paying attention to on average.
In this particular case... when you talk about "We don’t want you to say that" comments, it sounds to me like those comments don't want you to say your ideas. It sounds like Habryka and other commenters interpreted it that way too.
But my read of the comment you're talking about here isn't that it's opposed to your ideas. Rather, it doesn't want you to use a particular style of argument, and I agree with it, and I endorse "we don't want bad arguments on LW". I downvoted that post of yours because it seemed to be arguing poorly. It's possible I missed something; I admi...
I wonder what fraction of the weirdest writers here feel the same way. I can't remember the last time I've read something on LessWrong and thought to myself, "What a strange, daring, radical idea. It might even be true. I'm scared of what the implications might be." I miss that.
I thought Genesmith's latest post fully qualified as that!
I totally didn't think adult gene editing was possible, and had dismissed it. It seems like a huge deal if true, and it's the kind of thing I don't expect would have been highlighted anywhere else.
I wonder what fraction of the weirdest writers here feel the same way. I can't remember the last time I've read something on LessWrong and thought to myself, "What a strange, daring, radical idea. It might even be true. I'm scared of what the implications might be." I miss that.
The post about not paying one's taxes was pretty out there and had plenty of interesting discussion, but now it's been voted down to the negatives. I wish it was a bit higher (at 0-ish karma, say), which might've happened if people could disagree-vote on it.
But yes, overall this critic...
Another improvement I didn't notice until right now is the "respond to a part of the original post" feature. I feel like it nudges comments away from nitpicking.
The other reason I've been posting less on LessWrong is that I feel like I'm hitting a soft ceiling with what I can accomplish here. I'm nowhere near my personal skill cap, of course. But there is a much larger potential audience (and therefore impact) if I shifted from writing essays to filming YouTube videos.
There are also writers with a very large reach. A recommendation I saw was to post where most of the people and hence most of the potential readers are, i.e. on the biggest social media sites. If you're trying to have impact as a writer, the reachable audience on LW is much smaller. (Though of course there are other ways of having a bigger impact than just reaching more readers.)
I can't think of anything LessWrong is doing wrong here. The editor already allows embedded YouTube links.
One thing that could help is to be able to have automatic crossposting from your YouTube channel like you can currently have from a blog. It would be even more powerful if it generated a transcript automatically (though that's currently difficult and expensive).
I wonder what fraction of the weirdest writers here feel the same way. I can't remember the last time I've read something on LessWrong and thought to myself, "What a strange, daring, radical idea. It might even be true. I'm scared of what the implications might be." I miss that.
Do you remember any examples from back in the day?
I enjoy your content here and would like to continue reading you as you grow into your next platforms.
YouTube grows your audience in the immediate term, among people who have the tech and time to consume videos. However, text is the lowest common denominator for human communication across longer time scales. Text handles copying and archiving in ways that I don't think we can promise for video on a scale of hundreds of years, let alone thousands. Text handles search with an ease that we can only approximate for video by transcribing it. Transcription is tr...
I just posted a big effortpost and it may have been consigned to total obscurity because I posted it at the wrong time of day. Unsure whether I actually want the recommendation algorithm to have flattened time-discounting over periods with less activity on the site, or if I should just post more strategically in the future.
I have found the dialogues to be generally low-quality to read. The good ones tend to be more interview-like - "I have something I want to talk about but writing a post is harder than talking to a curious interlocutor about it." I think this maybe suggests that I want to see dialogues rebranded to not say "dialogue."
(Note, I don't think it's because it was posted at the wrong time of day. I think it's because the opening doesn't make a clear case for why people should read it.
In my experience posts like this still get a decent amount of attention if they are good, but it takes a lot longer, since it spreads more by word-of-mouth. The initial attention burst of LW is pretty heavily determined by how much the opening paragraphs and title draw people in. I feel kind of sad about that, but also don't have a great alternative to the current HN-style algorithm that still does the other things we need a karma/frontpage-sorting algorithm to do.)
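To make the time-decay point concrete, here is a minimal sketch of an HN-style ranking function. It uses the widely quoted approximation of the Hacker News formula, score = (points - 1) / (age_hours + 2)^gravity with gravity around 1.8. LessWrong's actual frontpage algorithm is only described above as "HN-style", so the exact formula, the gravity value, and the names `Post`, `hn_style_score`, and `rank` are illustrative assumptions, not the site's code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Post:
    title: str
    karma: int
    age_hours: float


def hn_style_score(post: Post, gravity: float = 1.8) -> float:
    """Commonly quoted Hacker News approximation (assumed, not LessWrong's formula).

    The denominator grows polynomially with age, so karma earned late
    (e.g. via slow word-of-mouth) counts for much less than karma
    earned in the first few hours.
    """
    return (post.karma - 1) / (post.age_hours + 2) ** gravity


def rank(posts: List[Post]) -> List[Post]:
    """Sort posts from highest to lowest time-decayed score."""
    return sorted(posts, key=hn_style_score, reverse=True)
```

Under this kind of formula, a post that reaches 50 karma within 2 hours scores far above one that reaches the same 50 karma only after 48 hours, which is the "initial attention burst" effect described in the comment.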
I have found the dialogues to be generally low-quality to read.
I think overall I've found dialogues pretty good, I've found them useful for understanding people's specific positions and getting people's takes on areas I don't know that well.
My favorite one so far is AI Timelines, which I found useful for understanding the various pictures of how AI development will go in the near term. I liked How useful is mechanistic interpretability? and Speaking to Congressional staffers about AI risk for understanding people's takes on these areas.
AI content for specialists
There is a lot of AI content recently, and it is sometimes of the kind that requires specialized technical knowledge, which I (an ordinary software developer) do not have. Similarly, articles on decision theories are often written in a way that assumes a lot of background knowledge that I don't have. As a result there are many articles I don't even click on, and if I accidentally do, I just sigh and close them.
This is not necessarily a bad thing. As something develops, inferential distances increase. So maybe, as a community we are developing a new science, and I simply cannot keep up with it. -- Or maybe it is all crackpottery; I wouldn't know. (Would you? Are some of us upvoting content we are not sure about, just because we assume that it must be important? This could go horribly wrong.) Which is a bit of a problem for me, because now I can no longer recommend Less Wrong in good faith as a source of rational thinking. Not because I see obviously wrong things, but because there are many things where I have no idea whether they are right or wrong.
We had some AI content and decision theory here since the beginning. But those articles written back then by Eliezer were quite easy to understand, at least for me. For example, "How An Algorithm Feels From Inside" doesn't require anything beyond high-school knowledge. Compare it to "Hypothesis: gradient descent prefers general circuits". Probably something important, but I simply do not understand it.
Just like historically MIRI and CFAR split into two organizations, maybe Less Wrong should too.
Feeling of losing momentum
I miss the feeling that something important is happening right now (and I can be a part of it). Perhaps it was just an illusion, but in the first years of Less Wrong it felt like we were doing something important: building the rationalist community, inventing the art of everyday rationality, with the prospect of raising the general sanity waterline.
It seems to me that we gave up on the sanity waterline first. The AI is near, we need to focus on the people who will make a difference (whom we could recruit for AI research), there is no time to care about the general population.
Although recently, this baton was taken over by the Rational Animations team!
Is the rationalist community still growing? Offline, I guess it depends on the country. In Bratislava, where I live, it seems that ~ no one cares about rationality. Or effective altruism. Or Astral Codex Ten. Having five people at a meetup is a big success. Nearby Vienna is doing better, but it is merely climbing back to pre-COVID levels, not growing. Perhaps it is better in some other parts of the world.
Online, new people are still coming. Good.
Also, big thanks to all people who keep this website running.
But still, it no longer feels to me like I am here to change the world. It is just another form of procrastination, albeit a very pleasant one. (Maybe because I do not understand the latest AI and decision theory articles; maybe all the exciting things are there.)
Etc.
Some dialogs were interesting, but most are meh.
My greatest personal pet peeve was solved: people no longer talk uncritically about Buddhism and meditation. (Instead of talking more critically they just stopped talking about it at all. Works for me, although I hoped for some rational conclusion.)
It is difficult for me to disentangle what happens in the rationalist community from what happens in my personal life. Since I have kids, I have less free time. If I had more free time, I would probably be recruiting for the local rationality (+adjacent) community, spend more time with other rationalists, maybe even write some articles... so it is possible that my overall impression would be quite different.
(Probably forgot something; I may add some points later.)
Is the rationalist community still growing? Offline, I guess it depends on the country. In Bratislava, where I live, it seems that ~ no one cares about rationality. Or effective altruism. Or Astral Codex Ten. Having five people at a meetup is a big success. Nearby Vienna is doing better, but it is merely climbing back to pre-COVID levels, not growing. Perhaps it is better in some other parts of the world.
I think that starting things that are hard forks of the lesswrong memeplex might be beneficial to being able to grow. Raising the sanity waterline woul...
I love LessWrong. I have better discussions here than anywhere else on the web.
I think I may have a slightly different experience with the site than the modal user because I am not very engaged in the alignment discourse.
I've found the discussions on the posts I've written to be of unusually high quality, especially the things I've written about fertility and polygenic embryo screening.
I concur with other comments that the ability to separately upvote and agree/disagree with a comment is a great feature, which I use all the time.
My number one requested feature continues to be the ability to see a retention graph on the posts I've written, i.e. where do people get bored and stop reading? After technical accuracy my number one goal is to write something interesting and engaging, but I lack any kind of direct feedback mechanism to optimize my writing in that way.
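A retention graph of this kind could be computed roughly as follows. This is a sketch under loud assumptions: it supposes the site logged, for each reading session, how many seconds each paragraph was on screen (hypothetical data that nothing here says LessWrong records), and it uses a minimum dwell time as a crude way to tell reading from skimming. The function name and the 2-second threshold are illustrative choices.

```python
from typing import Dict, List

# Hypothetical session format: paragraph index -> seconds that paragraph was on screen.
Session = Dict[int, float]


def retention_curve(sessions: List[Session], n_paragraphs: int,
                    min_dwell: float = 2.0) -> List[float]:
    """Return, for each paragraph i, the fraction of sessions that actually
    read some paragraph at index >= i.

    A paragraph only counts as "read" if it stayed on screen for at least
    `min_dwell` seconds, so fast scrolling past it (skimming) is ignored.
    """
    if not sessions:
        return [0.0] * n_paragraphs
    furthest = []
    for dwell in sessions:
        read = [i for i, secs in dwell.items() if secs >= min_dwell]
        furthest.append(max(read) if read else -1)  # -1: never really read anything
    total = len(sessions)
    return [sum(1 for f in furthest if f >= i) / total for i in range(n_paragraphs)]
```

For example, with three sessions where one reader lingers on all three paragraphs and the other two only read the first, the curve is 1.0 for the first paragraph and 1/3 for the second and third, showing where most readers dropped off.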
My number one requested feature continues to be the ability to see a retention graph on the posts I've written, i.e. where do people get bored and stop reading? After technical accuracy my number one goal is to write something interesting and engaging, but I lack any kind of direct feedback mechanism to optimize my writing in that way.
Yeah, I've been wanting something like this for a while. It would require capturing and processing a lot more data than we have historically. Also distinguishing between someone skimming up and down a post and actua...
(low confidence, low context, just an intuition)
I feel as though the LessWrong team should experiment with even more new features, treating the project of maintaining a platform for collective truth-seeking like a tech startup. The design space for such a platform is huge (especially as LLMs get better).
From my understanding, the strategy that startups use to navigate huge design spaces is “iterate on features quickly and observe objective measures of feedback”, which I suspect LessWrong should lean into more. Although, I imagine creating better truth-seeking infrastructure doesn’t have as good of a feedback signal as “acquire more paying users” or “get another round of VC funding”.
This is basically what we do, capped by our team capacity. For most of the last ~2 years, we had ~4 people working full-time on LessWrong plus shared stuff we get from the EA Forum team. In the last few months, we reallocated people from elsewhere in the org and are at ~6 people, though several are newer to working on code. So pretty small startup. Dialogues has been the big focus of late (plus behind-the-scenes performance optimizations and code infrastructure).
All that to say, we could do more with more money and people. If you know skilled developers willing to live in the Berkeley area, please let us know!
Agreed! Cf. Proposal for improving the global online discourse through personalised comment ordering on all websites -- using LessWrong as the incubator for the first version of the proposed model would actually be critical.
I feel a mix of pleased and frustrated. The main draw for me is AI safety discussion. I dislike the feeling of group-think around stuff, and I value the people who speak up against the group-think with contrary views (e.g. TurnTrout), who post high quality technical content, or well-researched and thought-out posts (e.g. Steven Byrnes).
I feel frustrated that people don't always do a good job of voting comments up based on how valuable/coherent/high-effort the information content is, and then separately voting agree/disagree. I really like this feature, and I wish people gave it more respect. I am pleased that it works as well as it does, though.
I like the new emojis and the new dialogues. I'm excited for the site designers to keep trying new (optional) stuff.
What I'd like from the site is for it to split into two: one even more focused on technical discussion of AI safety, and the other for rationality and philosophy stuff. Then I'd like the technical side to have features like Jupyter-notebook-based posts for dynamic code demonstrations. And people presenting recent important papers not their own (e.g. from arXiv), for the sake of highlighting/summarizing/sparking discussion. The weakness of the technical discussion here is, in my opinion, related to the lack of engagement with the wider academic community and empirical evidence.
Ultimately, I don't think it matters much what we do with the site in the longer term because I think things are about to go hockey stick singularity crazy. That's the bet I'm making anyway.
Yeah. The threshold for "okay, you can submit to alignmentforum" is way, way, way too high, and as a result, lesswrong.com is the actual alignmentforum. Attempts to insist otherwise without appropriately intense structural change will be met with lesswrong.com going right on being the alignmentforum.
Ok, slightly off topic, but I just had a wacky notion for how to break up groupthink as a social phenomenon. You know the cool thing from Audrey Tang's ideas, Polis? What if we did that, but we found 'thought groups' of LessWrong users based on the agreement voting. And then posts/comments which were popular across thought groups instead of just intensely within a thought group got more weight?
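Here is a toy sketch of that idea. It is only an illustration under assumptions: agreement votes are modeled as +1/-1, "thought groups" come from a deliberately naive greedy clustering on pairwise agreement (real Polis uses more sophisticated clustering, which this does not reproduce), and an item's score is its mean agreement in the *least* approving group, with a group that never voted on it counting as neutral (0). All names and the 0.7 threshold are hypothetical.

```python
from typing import Dict, List

# Hypothetical vote format: user -> {item_id: +1 (agree) or -1 (disagree)}
Votes = Dict[str, Dict[str, int]]


def agreement(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Fraction of co-voted items on which two users voted the same way (0.0 if none shared)."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    return sum(1 for item in shared if a[item] == b[item]) / len(shared)


def find_groups(votes: Votes, threshold: float = 0.7) -> List[List[str]]:
    """Greedy clustering: each user joins the first group whose founding member
    they agree with at least `threshold` of the time, else founds a new group."""
    groups: List[List[str]] = []
    for user in sorted(votes):
        for group in groups:
            if agreement(votes[user], votes[group[0]]) >= threshold:
                group.append(user)
                break
        else:
            groups.append([user])
    return groups


def bridging_score(item: str, votes: Votes, groups: List[List[str]]) -> float:
    """Minimum over groups of the group's mean vote on `item` (0.0 if the group
    did not vote), so an item scores high only if every group approves of it."""
    group_means = []
    for group in groups:
        vs = [votes[u][item] for u in group if item in votes[u]]
        group_means.append(sum(vs) / len(vs) if vs else 0.0)
    return min(group_means) if group_means else 0.0


# Example: two camps that split on "x" and "y" but both agree with "bridge".
votes: Votes = {
    "a": {"x": 1, "y": -1, "bridge": 1},
    "b": {"x": 1, "y": -1, "bridge": 1},
    "c": {"x": -1, "y": 1, "bridge": 1},
    "d": {"x": -1, "y": 1},
}
groups = find_groups(votes)
```

With this data the clustering finds the camps `["a", "b"]` and `["c", "d"]`; "bridge" scores 1.0 while the divisive "x" and "y" score -1.0. The greedy clustering depends on the order users are visited, which is one reason a real system would want something more robust.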
Niclas Kupper tried a LessWrong Polis to gather our opinions a while back. https://www.lesswrong.com/posts/fXxa35TgNpqruikwg/lesswrong-poll-on-agi
I still like the site, though I had to set the AI tag to -100 this year. One thing I wish was a bit different is that I've posted a whole bunch of LW-site-relevant feedback in comments (my natural inclination is to post comprehensive feedback on whatever content I interact with), and for a good fraction of them I've received no official reaction whatso
Hello! This is jacobjacob from the LessWrong / Lightcone team.
This is a meta thread for you to share any thoughts, feelings, feedback or other stuff about LessWrong, that's been on your mind.
Examples of things you might share:
...or anything else!
The point of this thread is to give you an affordance to share anything that's been on your mind, in a place where you know that a team member will be listening.
(We're a small team and have to prioritise what we work on, so I of course don't promise to action everything mentioned here. But I will at least listen to all of it!)
I haven't seen any public threads like this for a while. Maybe there's a lot of boiling feelings out there about the site that never get voiced? Or maybe y'all don't have more to share than what I find out from just reading normal comments, posts, metrics, and Intercom comments? Well, here's one way to find out! I'm really curious to ask and see how people feel about the site.
So, how do you feel about LessWrong these days? Feel free to leave your answers below.