Hello! This is jacobjacob from the LessWrong / Lightcone team. 

This is a meta thread for you to share any thoughts, feelings, feedback, or other stuff about LessWrong that's been on your mind. 

Examples of things you might share: 

  • "I really like agree/disagree voting!"
  • "What's up with all this Dialogues stuff? It's confusing..."
  • "Hm... it seems like recently the vibe on the site has changed somehow... in particular [insert 10 paragraphs]"

...or anything else! 

The point of this thread is to give you an affordance to share anything that's been on your mind, in a place where you know that a team member will be listening. 

(We're a small team and have to prioritise what we work on, so I of course don't promise to action everything mentioned here. But I will at least listen to all of it!)

I haven't seen any public threads like this for a while. Maybe there's a lot of boiling feelings out there about the site that never get voiced? Or maybe y'all don't have more to share than what I find out from just reading normal comments, posts, metrics, and Intercom comments? Well, here's one way to find out! I'm really curious to ask and see how people feel about the site. 

So, how do you feel about LessWrong these days? Feel free to leave your answers below.

How do you feel about LessWrong these days? [Open feedback thread]

47 Answers

TurnTrout


I mostly feel bad about LessWrong these days. I slightly dread logging on, I don't expect to find much insightful on the website, and think the community has a lot of groupthink / other "ew" factors that are harder for me to pin down (although I think that's improved over the last year or two). I also feel some dread at posting this because it might burn social capital I have with the mods, but whatever.

(Also, most of this stuff is about the community and not directly in the purview of the mods anyways.)

Here are some rambling thoughts, though:

  • I think there are pretty good reasons that the broader AI community hasn't taken LW seriously. 
  • I feel a lot of cynicism. I worry that colors my lens here. But I'll just share what I see looking through that lens.
    • Also some of my cynicism comes from annoying-feeling object-level disagreements driving me away from the website. Probably other people are having more fun.
  • (High confidence) I feel like the project of thinking more clearly has largely fallen by the wayside, and that we never did that great of a job at it anyways. 
  • Over time, I've felt myself grow more distant from this community and website. At times, it feels sad. At times, it feels correct. Sometimes it feels both. 
  • (Medium confidence, unsure if relevant to LW itself) In the Bay Area community, there are lots of professionally relevant events which are de facto gated by how much random people like you on a personal level (namely, the organizers). There’s also a lot of weird social stuff but IDK how relevant that is to LW. 
  • (Medium confidence) It seems to me that often people rehearse fancy and cool-sounding reasons for believing roughly the same things they always believed, and comment threads don't often change important beliefs. Feels more like people defensively explaining why they aren't idiots, or why they don't have to change their mind. I mean, if so—I get it, sometimes I feel that way too. But it sucks and I think it happens a lot.
    • I feel worried that there are a bunch of people with entrenched worldviews who basically never change their minds about anything important. Seems unhealthy on a community level. 
    • Like, there is a way that it feels to be defending yourself or sailing against the winds of counterevidence to your beliefs, and it’s really really important to not do that. Come on guys :(
    • (When Wei_Dai introduced Updateless Decision Theory, it wasn't about this kind of "updatelessness"! :( )
  • (High confidence) I think this community has engaged in a lot of hero worship. I think to some extent I have benefited from this, though I don't think I'm the prototype. But, seriously guys, looking back, I think this place has been pretty creepy in some ways. 
    • The way people praise/exalt Eliezer and Paul is just... weird. The times I'd be at an in-person workshop, and people would spend time "ranking" alignment researchers. Feels like a social status horse race, and probably LessWrong has some direct culpability here.
      • But people don't seem to take Eliezer as seriously these days, which I think is great, so maybe it's less of a problem now.
    • I think this is Eliezer's fault in his case and mostly not Paul's fault for his own rep, but IDK. 
  • I think we've kinda patted ourselves on the back for being awesome and ahead of the curve, even though, in terms of alignment, I think we really didn't get anything done until 2022 or so, and a lot of the meaningful progress happened elsewhere. 
  • (Medium confidence) It seems possible to me that "taking ideas seriously" has generally meant something like "being willing to change your life to further the goals and vision of powerful people in the community, or to better accord with socially popular trends", and less "taking unconventional but meaningful bets on your idiosyncratic beliefs."
    • Somewhat relatedly, there have been a good number of times where it seems like I've persuaded someone of A and of A ⟹ B, and they still don't believe B, and coincidentally B is unpopular. 
  • I feel pretty frustrated at how rarely people actually bet or make quantitative predictions about existential risk from AI. EG my recent attempt to operationalize a bet with Nate went nowhere. Paul trying to get Eliezer to bet during the MIRI dialogues also went nowhere, or barely anywhere—I think they ended up making some random bet about how long an IMO challenge would take to be solved by AI. (feels pretty weak and unrelated to me. lame. but huge props to Paul for being so ready to bet, that made me take him a lot more seriously.)
  • (Medium-high confidence) I think that alignment "theorizing" is often a bunch of philosophizing and vibing in a way that protects itself from falsification (or even proof-of-work) via words like "pre-paradigmatic" and "deconfusion." I think it's not a coincidence that many of the "canonical alignment ideas" somehow don't make any testable predictions until AI takeoff has begun. 🤔 

I expect there to be a bunch of responses which strike me as defensive, revisionist gaslighting, and I don't know if/when I'll reply.


I feel pretty frustrated at how rarely people actually bet or make quantitative predictions about existential risk from AI. [...]

I think that alignment "theorizing" is often a bunch of philosophizing and vibing in a way that protects itself from falsification (or even proof-of-work) via words like "pre-paradigmatic" and "deconfusion." I think it's not a coincidence that many of the "canonical alignment ideas" somehow don't make any testable predictions until AI takeoff has begun.

This sentiment resonates strongly with me. 

A personal background: I remember getting pretty heavily involved in AI alignment discussions on LessWrong in 2019. Back then I think there were a lot of assumptions people had about what "the problem" was that are, these days, often forgotten, brushed aside, or sometimes even deliberately minimized post-hoc in order to give the impression that the field has a better track record than it actually does. [ETA: but to be clear, I don't mean to say everyone made the same mistake I describe here]

This has been a bit shocking and disorienting to me, honestly, because at the time in 2019 I didn't get the strong impression that people were deliberately constr... (read more)

I wrote a fair amount about alignment from 2014-2020[1] which you can read here. So it's relatively easy to get a sense for what I believed.

Here are some summary notes about my views as reflected in that writing, though I'd encourage you to just judge for yourself[2] by browsing the archives:

  • I expected AI systems to be pretty good at predicting what behaviors humans would rate highly, long before they were catastrophically risky. This comes up over and over again in my writing. In particular, I repeatedly stated that it was very unlikely that an AI system would kill everyone because it didn't understand that people would disapprove of that action, and therefore this was not the main source of takeover concerns. (By 2017 I expected RLHF to work pretty well with language models, which was reflected in my research prioritization choices and discussions within OpenAI though not clearly in my public writing.)
  • I consistently expressed that my main concerns were instead about (i) systems that were too smart for humans to understand the actions they proposed, (ii) treacherous turns from deceptive alignment. This comes up a lot, and when I talk about other problems I'm usually clea
... (read more)
Matthew Barnett
I agree, your past views do look somewhat better. I painted alignment researchers with a fairly broad brush in my original comment, which admittedly might have been unfair to many people who departed from the standard arguments (alternatively, it gives those researchers a chance to step up and receive credit for having been in the minority who weren't wrong). Partly I portrayed the situation like this because I have the sense that the crucial elements of your worldview that led you to be more optimistic were not disseminated anywhere close to as widely as the opposite views (e.g. "complexity of wishes"-type arguments), at least on LessWrong, which is where I was having most of these discussions. My general impression is that it sounds like you agree with my overall take although you think I might have come off too strong. Perhaps let me know if I'm wrong about that impression.

Some thoughts on my journey in particular:

  1. When I joined AI safety in late 2017 (having read approximately nothing in the field), I thought of the problem as "construct a utility function for an AI system to optimize", with a key challenge being the fragility of value. In hindsight this was clearly wrong.
    1. The Value Learning sequence was in large part a result of my journey away from the utility function framing.
    2. That being said, I suspect I continued to think that fragility-of-value type issues were a significant problem, probably until around mid-2019 (see next point).
      1. (I did continue some projects more motivated from a fragility-of-value perspective, partly out of a heuristic of actually finishing things I start, and partly because I needed to write a PhD thesis.)
  2. Early on, I thought of generalization as a key issue for deep learning and expected that vanilla deep learning would not lead to AGI for this reason. Again, in hindsight this was clearly wrong.
    1. I was extremely surprised by OpenAI Five in June 2018 (not just that it worked, but also the ridiculous simplicity of the methods, in particular the lack of any hierarchical RL) and had to think through that.
    2. I spent a while trying to u
... (read more)
Wei Dai

Suppose in 2024-2029, someone constructs an intelligent robot that is able to clean a room to a high level of satisfaction, consistent with the user’s intentions, without any major negative side effects or general issues of misspecification. It doesn’t break any vases while cleaning.

I remember explicit discussion about how solving this problem shouldn't even count as part of solving long-term / existential safety, for example:

"What I understand this as saying is that the approach is helpful for aligning housecleaning robots (using near extrapolations of current RL), but not obviously helpful for aligning superintelligence, and likely stops being helpful somewhere between the two. [...] There is a risk that a large body of safety literature which works for preventing today's systems from breaking vases but which fails badly for very intelligent systems actually worsens the AI safety problem" https://www.lesswrong.com/posts/H7KB44oKoSjSCkpzL/worrying-about-the-vase-whitelisting?commentId=rK9K3JebKDofvJA3x

Why is it so hard to find people explicitly saying that this specific problem, and the examples illustrating it, were not meant to be seriously representative of the hard parts of

... (read more)
Matthew Barnett
Two points:

  • I have a slightly different interpretation of the comment you linked to, which makes me think it provides only weak evidence for your claim. (Though it's definitely still some evidence.)
  • I agree some people deserve credit for noticing that human-level value specification might be kind of easy before LLMs. I don't mean to accuse everyone in the field of making the same mistake.

Anyway, let me explain the first point.

I interpret Abram to be saying that we should focus on solutions that scale to superintelligence, rather than solutions that only work on sub-superintelligent systems but break down at superintelligence. This was in response to Alex's claim that "whitelisting contributes meaningfully to short- to mid-term AI safety, although I remain skeptical of its robustness to scale." In other words, Alex said (roughly): "This solution seems to work for sub-superintelligent AI, but might not work for superintelligent AI." Abram said in response that we should push against such solutions, since we want solutions that scale all the way to superintelligence. This is not the same thing as saying that any solution to the house-cleaning robot provides negligible evidence of progress, because some solutions might scale. It's definitely arguable, but I think it's likely that any realistic solution to the human-level house cleaning robot problem -- in the strong sense of getting a robot to genuinely follow all relevant moral constraints, allow you to shut it down, and perform its job reliably in a wide variety of environments -- will be a solution that scales reasonably well above human intelligence (maybe not all the way to radical superintelligence, but at the very least I don't think it's negligible evidence of progress). If you merely disagree that any such solutions will scale, and you've been consistent on this point for the last five years, then I guess I'm not really addressing you in my original comment, but I still think what I wrote applie

This matches my sense of how a lot of people seem to have... noticed that GPT-4 is fairly well aligned to what the OpenAI team wants it to be, in ways that Yudkowsky et al said would be very hard, and still not view this as at a minimum a positive sign?

I.e., problems of the class 'I told the intelligence to get my mother out of the burning building and it blew her up so the dead body flew out the window, this is because I wasn't actually specific enough' just don't seem like they are a major worry anymore?

 

Usually when GPT-4 doesn't understand what I'm asking, I wouldn't be surprised if a human was confused also.

If I was misreading the blog post at the time, how come it seems like almost no one ever explicitly predicted at the time that these particular problems were trivial for systems below or at human-level intelligence?!? 

Quoting the abstract of MIRI's "The Value Learning Problem" paper (emphasis added):

Autonomous AI systems’ programmed goals can easily fall short of programmers’ intentions. Even a machine intelligent enough to understand its designers’ intentions would not necessarily act as intended. We discuss early ideas on how one might design smarter-than-human AI systems that can inductively learn what to value from labeled training data, and highlight questions about the construction of systems that model and act upon their operators’ preferences.

And quoting from the first page of that paper:

The novelty here is not that programs can exhibit incorrect or counter-intuitive behavior, but that software agents smart enough to understand natural language may still base their decisions on misrepresentations of their programmers’ intent. The idea of superintelligent agents monomaniacally pursuing “dumb”-seeming goals may sound odd, but it follows from the observation of Bostrom an

... (read more)
Matthew Barnett
I think you missed my point: my original comment was about whether people are updating on the evidence from instruction-tuned LLMs, which seem to actually act on human values (i.e., our actual intentions) quite well, as opposed to mis-specified versions of our intentions. I don't think the Value Learning Problem paper said that it would be easy to make human-level AGI systems act on human values in a behavioral sense, rather than merely understand human values in a passive sense. I suspect you are probably conflating two separate concepts:

  1. It is easy to create a human-level AGI that can passively learn and understand human values (I am not saying people said this would be difficult in the past)
  2. It is easy to create a human-level AGI that acts on human values, in the sense of actually executing instructions that follow our intentions, rather than following a dangerously mis-specified version of what we asked for.

I do not think the Value Learning Paper asserted that (2) was true. To the extent it asserted that, I would prefer to see quotes that back up that claim explicitly. Your quote from the paper illustrates that it's very plausible that people thought (1) was true, but that seems separate to my main point: that people thought (2) was not true. (1) and (2) are separate and distinct concepts. And my comment was about (2), not (1). There is simply a distinction between a machine that actually acts on and executes your intended commands, and a machine that merely understands your intended commands, but does not necessarily act on them as you intend. I am talking about the former, not the latter. From the paper, Indeed, and GPT-4 does not base its decisions on a misrepresentation of its programmers' intentions, most of the time. It generally both correctly understands our intentions, and more importantly, actually acts on them!
quetzal_rainbow
No? GPT-4 predicts text and doesn't care about anything else. Under certain conditions it predicts nice text, under others not very nice, and we don't know what happens if we create a GPT actually capable of, say, building nanotech.

If that were to happen, I think an extremely natural reading of the situation is that a substantial part of what we thought "the problem" was in value alignment has been solved, from the perspective of this blog post from 2019. That is cause for an updating of our models, and a verbal recognition that our models have updated in this way.

Yet, that's not how I think everyone on LessWrong would react to the development of such a robot. My impression is that a large fraction, perhaps a majority, of LessWrongers would not share my interpretation here, despite the plain language in the post explaining what they thought the problem was. Instead, I imagine many people would respond to this argument basically saying the following:

"We never thought that was the hard bit of the problem. We always thought it would be easy to get a human-level robot to follow instructions reliably, do what users intend without major negative side effects, follow moral constraints including letting you shut it down, and respond appropriately given unusual moral dilemmas. The idea that we thought that was ever the problem is a misreading of what we wrote. The problem was always purely that alignment issues would arise after we far surpassed human intelligence, at which point entirely novel problems will arise." 

For what it's worth I do remember lots of people around the MIRI-sphere complaining at the time that that kind of prosaic alignment work was kind of useless, because it missed the hard parts of aligning superintelligence. 

Matthew Barnett
I agree some people in the MIRI-sphere did say this, and a few of them get credit for pointing out things in this vicinity, but I personally don't remember reading many strong statements of the form: "Prosaic alignment work is kind of useless because it will actually be easy to get a roughly human-level machine to interpret our commands reliably, do what you want without significant negative side effects, and let you shut it down whenever you want etc. The hard part is doing this for superintelligence." My understanding is that a lot of the time the claim was instead something like: "Prosaic alignment work is kind of useless because machine learning is natively not very transparent and alignable, and we should focus instead on creating alignable alternatives to ML, or building the conceptual foundations that would let us align powerful AIs." As some evidence, I'd point to Rob Bensinger's statement that,  I do also think a number of people on LW sometimes said a milder version of the thing I mentioned above, which was something like: "Prosaic alignment work might help us get narrow AI that works well in various circumstances, but once it develops into AGI, becomes aware that it has a shutdown button, and can reason through the consequences of what would happen if it were shut down, and has general situational awareness along with competence across a variety of domains, these strategies won't work anymore." I think this weaker statement now looks kind of false in hindsight, since I think current SOTA LLMs are already pretty much weak AGIs, and so they already seem close to the threshold at which we were supposed to start seeing these misalignment issues come up. But they are not coming up (yet). I think near-term multimodal models will be even closer to the classical "AGI" concept, complete with situational awareness and relatively strong cross-domain understanding, and yet I also expect them to mostly be fairly well aligned to what we want in every relevant be

Well, for instance, I watched Ryan Carey give a talk at CHAI about how Cooperative Inverse Reinforcement Learning didn't give you corrigibility. (That CIRL didn't tackle the hard part of the problem, despite seeming related on the surface.)

I think that's much more an example of 

"Prosaic alignment work is kind of useless because it will actually be easy to get a roughly human-level machine to interpret our commands reliably, do what you want without significant negative side effects, and let you shut it down whenever you want etc. The hard part is doing this for superintelligence."

than of

"Prosaic alignment work is kind of useless because machine learning is natively not very transparent and alignable, and we should focus instead on creating alignable alternatives to ML, or building the conceptual foundations that would let us align powerful AIs."

Matthew Barnett
This doesn't seem to be the same thing as what I was talking about. Yes, people frequently criticized particular schemes for aligning AI systems, arguing that the scheme doesn't address some key perceived obstacle. By itself, this is pretty different from predicting both:

  • It will be easy to get behavioral alignment on slightly-sub-AGI, and maybe even par-human systems, including on shutdown problems
  • The problem is that these schemes don't scale well all the way to radical superintelligence.

I remember a lot of people making the second point, but not nearly as many making the first point.
Eli Tyre
I think I'm missing you then. If (some) people already had the view that this kind of prosaic alignment wouldn't scale to Superintelligence, but didn't express an opinion about whether behavioral alignment of slightly-sub-AGI would be solved, in what way do you want them to be updating that they're not? Or do you mean they weren't just agnostic about the behavioral alignment of near-AGIs, they specifically thought that it wouldn't be easy? Is that right? 
Matthew Barnett
Two points: One, I think being able to align AGI and slightly sub-AGI successfully is plausibly very helpful for making the alignment problem easier. It's kind of like learning that we can create more researchers on demand if we ever wanted to. Two, the fact that the methods scale surprisingly well to human-level is evidence that they actually work pretty well in general, even if they don't scale all the way into some radical regime way above human-level. For example, Eliezer talked about how he expected you'd need to solve the suspend button problem by the time your AI has situational awareness, but I think you can interpret this prediction as either becoming increasingly untenable, or that we appear close to a solution to the problem since our AIs don't seem to be resisting shutdown. Again, presumably once you get the aligned AGI, you can use many copies of the aligned AGI to help you with the next iteration, AGI+. This seems plausibly very positive as an update. I can sympathize with those who say it's only a minor update because they never thought the problem was merely aligning human-level AI, but I'm a bit baffled by those who say it's not an update at all from the traditional AI risk models, and are still very pessimistic.
Eli Tyre
I feel like I'm being obstinate or something, but I think that the linked article is still basically correct, and not particularly untenable.  From the article... The key word in that sentence is "consequentialist". Current LLMs are pretty close (I think!) to having pretty detailed situational awareness. But, as near as I can tell, LLMs are, at best, barely consequentialist. I agree that that is a surprise, on the old school LessWrong / MIRI world view. I had assumed that "intelligence" and "agency" were way more entangled, way more two sides of the same coin, than they apparently are. And the framing of the article focuses on situational awareness and not on consequentialism because of that error. Because Eliezer (and I) thought at the time that situational awareness would come after consequentialist reasoning in the tech tree. But I expect that we'll have consequentialist agents eventually (if not, that's a huge crux for how dangerous I expect AGI to be), and I expect that you'll have "off button" problems at the point when you have "enough" consequentialism aimed at some goal, "enough" strategic awareness, and strong "enough" capabilities that the AI can route around the humans and the human safeguards.  
Matthew Barnett
In my opinion, the extent to which the linked article is correct is roughly the extent to which the article is saying something trivial and irrelevant. The primary thing I'm trying to convey here is that we now have helpful, corrigible assistants (LLMs) that can aid us in achieving our goals, including alignment, and the rough method used to create these assistants seems to scale well, perhaps all the way to human level or slightly beyond it. Even if the post is technically correct because a "consequentialist agent" is still incorrigible (perhaps by definition), and GPT-4 is not a "consequentialist agent", this doesn't seem to matter much from the perspective of alignment optimism, since we can just build helpful, corrigible assistants to help us with our alignment work instead of consequentialist agents.
Eli Tyre
A side-note to this conversation, but I basically still buy the quoted text and don't think it now looks false in hindsight. We (apparently) don't yet have models that have robust longterm-ish goals. I don't know how natural it will be for models to end up with long term goals: the MIRI view says that anything that can do science will definitely have long-term planning abilities which fundamentally entails having goals that are robust to changing circumstances. I don't know if that's true, but regardless, I expect that we'll specifically engineer agents with long term goals. (Whether or not those agents will have "robust" long term goals, over and above what they were prompted to do in a specific situation is also something that I don't know.) What I expect to see is agents that have a portfolio of different drives and goals, some of which are more like consequentialist objectives (eg "I want to make the number in this bank account go up") and some of which are more like deontological injunctions ("always check with my user/ owner before I make a big purchase or take a 'creative' action, one that is outside of my training distribution"). My prediction is that the consequentialist parts of the agent will basically route around any deontological constraints that are trained in.  For instance, your personal assistant AI does ask your permission before it does anything creative, but also, it's superintelligently persuasive and so it always asks your permission in exactly the way that will result in it accomplishing what it wants. If there are a thousand action sequences in which it asks for permission, it picks the one that has the highest expected value with regard to whatever it wants. This basically nullifies the safety benefit of any deontological injunction, unless there are some injunctions that can't be gamed in this way. To do better than this, it seems like you do have to solve the Agent Foundations problem of corrigibility (getting the agent to be si

"Sure, Rohin thought that was a major problem, but we [our organization/thought cluster/ideological group] never agreed with him."

Oh really? Did you ever explicitly highlight this particular disagreement at the time?

FWIW at the time I wasn't working on value learning and wasn't incredibly excited about work in that direction, despite the fact that that's what the rest of my lab was primarily focussed on. I also wrote a blog post in 2020, based off a conversation I had with Rohin in 2018, where I mention how important it is to work on inner alignment stuff and how those issues got brought up by the 'paranoid wing' of AI alignment. My guess is that my view was something like "stuff like reward learning from the state of the world doesn't seem super important to me because of inner alignment etc, but for all I know cool stuff will blossom out of it, so I'm happy to hear about your progress and try to offer constructive feedback", and that I expressed that to Rohin in person.

DanielFilan
Of course, the fact that I think the same thing now as I did in 2020 isn't much evidence that I'm right.
TurnTrout

At this point I think there are a number of potential replies from people who still insist that the LW models of AI alignment were never wrong, which I (depending on the speaker) think can often border on gaslighting:

This is one of the main reasons I'm not excited about engaging with LessWrong. Why bother? It feels like nothing I say will matter. Apparently, no pre-takeoff experiments matter to some folk.[1] And even if I successfully dismantle some philosophical argument, there's a good chance they will use another argument to support their beliefs instead. Nothing changes.

So there we are. It doesn't matter what my experiments say, because (it is claimed) there are no testable predictions before The End. But also, everyone important already knew in advance that it'd be easy to get GPT-4 to interpret and execute your value-laden requests in a human-reasonable fashion. Even though ~no one said so ahead of time.

When talking with pre-2020 alignment folks about these issues, I feel gaslit quite often. You have no idea how many times I've been told things like "most people already understood that reward is not the optimization target"[2] and "maybe you had a lesson you needed ... (read more)

I get why you feel that way. I think there are a lot of us on LessWrong who are less vocal and more open-minded, and less aligned with either optimistic network thinkers or pessimistic agent foundations thinkers. People newer to the discussion and otherwise less polarized are listening and changing their minds in large or small ways.

I'm sorry you're feeling so pessimistic about LessWrong. I think there is a breakdown in communication happening between the old guard and the new guard you exemplify. I don't think that's a product of venue, but of the sheer difficulty of the discussion. And polarization between different viewpoints on alignment.

I think maintaining a good community falls on all of us. Formats and mods can help, but communities set their own standards.

I'm very, very interested to see a more thorough dialogue between you and similar thinkers, and MIRI-type thinkers. I think right now both sides feel frustrated that they're not listened to and understood better.

6Rohin Shah
(Presumably you are talking about how reward is not the optimization target.) While I agree that the statement is not literally true, I am still basically on board with that sentence and think it's a reasonable shorthand for the true thing. I expect that I understood the "reward is not the optimization target" point at the time of writing that post (though of course predicting what your ~5-years-ago self knew is quite challenging without specific quotes to refer to). I am confident I understood the point by the time I was working on the goal misgeneralization project (late 2021), since almost every example we created involved predicting ahead of time a specific way in which reward would fail to be the optimization target.

(I didn't follow this argument at the time, so I might be missing key context.)

The blog post "Reward is not the optimization target" gives the following summary of its thesis,

  1. Deep reinforcement learning agents will not come to intrinsically and primarily value their reward signal; reward is not the trained agent’s optimization target.
  2. Utility functions express the relative goodness of outcomes. Reward is not best understood as being a kind of utility function. Reward has the mechanistic effect of chiseling cognition into the agent's network. Therefore, properly understood, reward does not express relative goodness and is therefore not an optimization target at all.

I hope it doesn't come across as revisionist to Alex, but I felt like both of these points were made by people at least as early as 2019, after the Mesa-Optimization sequence came out in mid-2019. As evidence, I'll point to my post from December 2019 that was partially based on a conversation with Rohin, who seemed to agree with me,

consider a simple feedforward neural network trained by deep reinforcement learning to navigate my Chests and Keys environment. Since "go to the nearest key" i

... (read more)
9lc
I have no stake in this debate, but how is this particular point any different than what Eliezer says when he makes the point about humans not optimizing for IGF? I think the entire mesaoptimization concern is built around this premise, no?
5TurnTrout
I didn't mean to imply that you in particular didn't understand the reward point, and I apologize for not writing my original comment more clearly in that respect. Out of nearly everyone on the site, I am most persuaded that you understood this "back in the day." I meant to communicate something like "I think the quoted segment from Rohin and Dmitrii's post is incorrect and will reliably lead people to false beliefs."

Thanks for the edit :)

As I mentioned elsewhere (not this website) I don't agree with "will reliably lead people to false beliefs", if we're talking about ML people rather than LW people (as was my audience for that blog post).

I do think that it's a reasonable hypothesis to have, and I assign it more likelihood than I would have a year ago (in large part from you pushing some ML people on this point, and them not getting it as fast as I would have expected).

[-]evhub6844

It seems to me that often people rehearse fancy and cool-sounding reasons for believing roughly the same things they always believed, and comment threads don't often change important beliefs. Feels more like people defensively explaining why they aren't idiots, or why they don't have to change their mind. I mean, if so—I get it, sometimes I feel that way too. But it sucks and I think it happens a lot.

My sense is that this is an inevitable consequence of low-bandwidth communication. I have no idea whether you're referring to me or not, and I am really not saying you are doing so, but I think an interesting example (whether you're referring to it or not) is some of the recent threads where we've been discussing deceptive alignment. My sense is that neither of us has been very persuaded by those conversations, and I claim that's not very surprising, in a way that's epistemically defensible for both of us. I've spent literal years working through the topic myself in great detail, so it would be very surprising if my view were easily swayed by a short comment chain, and similarly I expect that the same thing is true of you, where you've spent much more time thinking about this and ... (read more)

FWIW, LessWrong does seem—in at least one or two ways—saner than other communities of similar composition. I agree it's better than Twitter overall. But in many ways it seems worse than other communities. I don't know what to do about it, and to be honest I don't have much faith in e.g. the mods.[1]

Hopefully my comments do something anyways, though. I do have some hope because it seems like a good amount has improved over the last year or two.

  1. ^

    Despite thinking that many of them are cool people.

2M. Y. Zuo
There's a caveat here. It's inevitable for communication that veers towards the emotional/subjective/sympathetic. When the average writer tries to compress it down to a few hundred or thousand letters on a screen it does often seem ridiculous. Even from moderately above average writers it often sounds more like anxious upper-middle-class virtue signalling than meaningful conversation. I think it takes a really, really clever writer to make it more substantial than that and escape the perception entirely. On the other hand, discussions of purely objective topics, which are falsifiable and verifiable by independent third parties, don't suffer the same pitfalls. As long as you really know what you are talking about, or are willing to learn, even a below average writer can communicate just fine.
[-]Wei Dai6553

Why are you so focused on Eliezer/MIRI yourself? If you think you (or events in general) have adequately shown that their specific concerns are not worth worrying about, maybe turn your attention elsewhere for a bit? For example you could look into other general concerns about AI risk, or my specific concerns about AIs based on shard theory. I don't think I've seen shard theory researchers address many of these yet.

6TurnTrout
I'll answer this descriptively.

  • When I trace the dependencies of common alignment beliefs and claims, a lot of them come back to e.g. RFLO and other ideas put forward by the MIRI cluster. Since I often find myself arguing against common alignment claims, I often argue against the historical causes of those ideas, which involves arguing against MIRI-takes.
  • I'm personally satisfied that their concerns are (generally) not worth worrying about. However, often people in my social circles are not. And such beliefs will probably have real-world consequences for governance.
  • Neargroup: I have a few friends who work at MIRI, and debate them on alignment ideas pretty often. I also sometimes work near MIRI people.
  • Because I disagree with them very sharply, their claims bother me more and are rendered more salient.
  • I feel bothered about MIRI still (AFAICT) getting so much funding/attention (even though it's relatively lower than it used to be), because it seems to me that since e.g. 2016 they have released ~zero technical research that helps us align AI in the present or in the future. It's been five years since they stopped disclosing any of their research, and it seems like no one else really cares anymore. That bothers me.

As to why I haven't responded to e.g. your concerns in detail:

  • I currently don't put much value on marginal theoretical research (even in shard theory, which I think is quite a bit better than other kinds of theory).
  • I feel less hopeful about LessWrong debate doing much, as I have described elsewhere. It feels like a better use of my time to put my head down, read a bunch of papers, and do good empirical work at GDM.
  • I am generally worn out of arguing about theory on the website, and have been since last December.

(I will note that I have enjoyed our interactions and appreciated your contributions.)
8Wei Dai
Sounds like to the extent that you do have time/energy for theory, you might want to strategically reallocate your attention a bit? I get that you think a bunch of people are wrong and you're worried about the consequences of that, but diminishing returns is a thing, and you could be too certain yourself (that MIRI concerns are definitely wrong). And then empirical versus theory, how much do you worry about architectural changes obsoleting your empirical work? I noticed for example that in image generation GAN was recently replaced by latent diffusion, which probably made a lot of efforts to "control" GAN-based image generation useless. That aside, "heads down empirical work" only makes sense if you picked a good general direction before putting your head down. Should it not worry people that shard theory researchers do not seem to have engaged with (or better yet, preemptively addressed) basic concerns/objections about their approach?
[-]habryka4436

I feel pretty frustrated at how rarely people actually bet or make quantitative predictions about existential risk from AI. EG my recent attempt to operationalize a bet with Nate went nowhere. Paul trying to get Eliezer to bet during the MIRI dialogues also went nowhere, or barely anywhere—I think they ended up making some random bet about how long an IMO challenge would take to be solved by AI. (feels pretty weak and unrelated to me. lame. but huge props to Paul for being so ready to bet, that made me take him a lot more seriously.)

For what it's worth, I would be up for a dialogue or some other context where I can make concrete predictions. I do think it's genuinely hard, since I do think there is a lot of masking of problems going on, and optimization pressure that makes problems harder to spot (both internally in AI systems and institutionally), so asking me to make predictions feels a bit like asking me to make predictions about FTX before it collapsed. 

Like, yeah, I expect it to look great, until it explodes. Similarly I expect AI to look pretty great until it explodes. That seems like kind of a core part of the argument for difficulty for me. 

I would nevertheless be happy to try to operationalize some bets, and still expect we would have lots of domains where we disagree, and would be happy to bet on those.

Like, yeah, I expect it to look great, until it explodes. Similarly I expect AI to look pretty great until it explodes. That seems like kind of a core part of the argument for difficulty for me. 

If your hypothesis smears probability over a wider range of outcomes than mine, while I can more sharply predict events using my theory of how alignment works—that constitutes a Bayes-update towards my theory and away from yours. Right? 

"Anything can happen before the explosion" is not a strength for a theory. It's a vulnerability. If probability is better-concentrated by any other theories which make claims about both the present and the future of AI, then the noncommittal theory gets dropped. 
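
A minimal sketch of the update being described, assuming two theories and entirely made-up likelihoods (the names `posterior`, `sharp`, and `diffuse`, and the numbers 0.8 and 0.1, are illustrative assumptions, not anything measured):

```python
def posterior(prior_a, like_a, like_b):
    """Posterior probability of theory A after one observation (Bayes' rule).

    prior_a: prior probability of theory A (theory B gets 1 - prior_a)
    like_a:  probability theory A assigned to the outcome that occurred
    like_b:  probability theory B assigned to that same outcome
    """
    joint_a = prior_a * like_a
    joint_b = (1.0 - prior_a) * like_b
    return joint_a / (joint_a + joint_b)


# Hypothetical numbers: theory A concentrates probability on what was
# observed; theory B smears probability over ten equally likely outcomes.
sharp = 0.8
diffuse = 0.1

p = 0.5  # start indifferent between the two theories
for _ in range(3):
    p = posterior(p, sharp, diffuse)
    print(round(p, 4))  # 0.8889, then 0.9846, then 0.9981
```

Each observation multiplies the odds by the likelihood ratio (here 8), so the sharper theory gains quickly. If both theories had assigned the same probability to the outcome, the ratio would be 1 and `posterior(0.5, 0.8, 0.8)` would stay at 0.5.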

Sure, yeah, though like, I don't super understand. My model will probably make the same predictions as your model in the short term. So we both get equal Bayes points. The evidence that distinguishes our models seems further out, and in a territory where there is a decent chance that we will be dead, which sucks, but isn't in any way contradictory with Bayes' rule. I don't think I would have put that much probability on us being dead at this point, so I don't think that loses many Bayes points, if any. I agree that if we are still alive in 20-30 years, then that's definitely Bayes points, and I am happy to take that into account then, but I've never had timelines or models that predicted things to look that different from now (or like, where there were other world models that clearly predicted things much better). 

My model will probably make the same predictions as your model in the short term.

No, I don't think so. The model(s) I use for AGI risk are an outgrowth of the model I use for normal AI research, and so they make tons of detailed predictions. That's why I have weekly fluctuations in my beliefs about alignment difficulty. 

Overall question I'm interested in: What, if any, catastrophic risks are posed by advanced AI? By what mechanisms do they arise, and by what solutions can risks be addressed?

Making different predictions. The most extreme prediction of AI x-risk is that AI presents, well, an x-risk. But theories gain and lose points not just on their most extreme predictions, but on all their relevant predictions. 

I have a bunch of uncertainty about how agentic/transformative systems will look, but I put at least 50% on "They'll be some scaffolding + natural outgrowth of LLMs." I'll focus on that portion of my uncertainty in order to avoid meta-discussions on what to think of unknown future systems.

I don't know what your model of AGI risk is, but I'm going to point to a cluster of adjacent models and memes which have been popular on LW and point out a bunch of predictions t... (read more)

This model naturally predicts things like "it's intractably hard/fragile to get GPT-4 to help people with stuff." Sure, the model doesn't predict this with probability 1, but it's definitely an obvious prediction.

Another point is that I think GPT-4 straightforwardly implies that various naive supervision techniques work pretty well. Let me explain.

From the perspective of 2019, it was plausible to me that getting GPT-4-level behavioral alignment would have been pretty hard, and might have needed something like AI safety via debate or other proposals that people had at the time. The claim here is not that we would never reach GPT-4-level alignment abilities before the end, but rather that a lot of conceptual and empirical work would be needed in order to get models to:

  1. Reliably perform tasks how I intended as opposed to what I literally asked for
  2. Have negligible negative side effects on the world in the course of its operation
  3. Responsibly handle unexpected ethical dilemmas in a way that is human-reasonable

Well, to the surprise of my 2019-self, it turns out that naive RLHF with a cautious supervisor designing the reward model seems basically sufficient to do all of these things in a reas... (read more)

What did you think would happen, exactly? I'm curious to learn what your 2019-self was thinking would happen, that didn't happen.

6Wei Dai
On the other hand, it could be considered bad news that IDA/Debate/etc. haven't been deployed yet, or even that RLHF is (at least apparently) working as well as it is. To quote a 2017 post by Paul Christiano (later reposted in 2018 and 2019): It seems that AI labs are not yet actually holding themselves to producing scalable systems, and it may well be better if RLHF broke down in some obvious way before we reach potentially dangerous capabilities, to force them to do that. (I've pointed Paul to this thread to get his own take, but haven't gotten a response yet.) ETA: I should also note that there is a lot of debate about whether IDA and Debate are actually scalable or not, so some could consider even deployment of IDA or Debate (or these techniques appearing to work well) to be bad news. I've tended to argue on the "they are too risky" side in the past, but am conflicted because maybe they are just the best that we can realistically hope for and at least an improvement over RLHF?
6ryan_greenblatt
I think these methods are pretty clearly not indefinitely scalable, but they might be pretty scalable. E.g., perhaps scalable to somewhat smarter than human level AI. See the ELK report for more discussion on why these methods aren't indefinitely scalable. A while ago, I think Paul had maybe 50% that with simple-ish tweaks IDA could be literally indefinitely scalable. (I'm not aware of an online source for this, but I'm pretty confident this or something similar is true.) IMO, this seems very predictably wrong. TBC, I don't think we should necessarily care very much about whether a method is indefinitely scalable.
4ryan_greenblatt
Sometimes people do seem to think that debate or IDA could be indefinitely scalable, but this just seems pretty wrong to me (what is your debate about alphafold going to look like...).
3Ansh Radhakrishnan
I think the first presentation of the argument that IDA/Debate aren't indefinitely scalable was in Inaccessible Information, fwiw.
3Daniel Kokotajlo
I've been struggling with whether to upvote or downvote this comment btw. I think the point about how it's really important when RLHF breaks down and more attention needs to be paid to this is great. But the other point about how RLHF hasn't broken yet and this is evidence against the standard misalignment stories is very wrong IMO. For now I'll neither upvote nor downvote.
2Daniel Kokotajlo
I agree that if RLHF scaled all the way to von Neumann then we'd probably be fine. I agree that the point at which RLHF breaks down is enormously important to overall alignment difficulty. I think if you had described to me in 2019 how GPT4 was trained, I would have correctly predicted its current qualitative behavior. I would not have said that it would do 1, 2, or 3 to a greater extent than it currently does. I'm in neither category (1) nor (2); it's a false dichotomy.
2Matthew Barnett
The categories were conditioned on whether you're "not updating at all on observations about when RLHF breaks down". Assuming you are updating, then I think you're not really the type of person who I'm responding to in my original comment. But if you're not updating, or aren't updating significantly, then perhaps you can predict now when you expect RLHF to "break down"? Is there some specific prediction that you would feel comfortable making at this time, such that we could look back on this conversation in 2-10 years and say "huh, he really knew broadly what would happen in the future, specifically re: when alignment would start getting hard"? (The caveat here is that I'd be kind of disappointed by an answer like "RLHF will break down at superintelligence" since, well, yeah, duh. And that would not be very specific.)
6Daniel Kokotajlo
I'm not updating significantly because things have gone basically exactly as I expected. As for when RLHF will break down, two points: (1) I'm not sure, but I expect it to happen for highly situationally aware, highly agentic opaque systems. Our current systems like GPT4 are opaque but not very agentic and their level of situational awareness is probably medium. (Also: This is not a special me-take. This is basically the standard take, no? I feel like this is what Risks from Learned Optimization predicts too.) (2) When it breaks down I do not expect it to look like the failures you described -- e.g. it stupidly carries out your requests to the letter and ignores their spirit, and thus makes a fool of itself and is generally thought to be a bad chatbot. Why would it fail in that way? That would be stupid. It's not stupid. (Related question: I'm pretty sure on r/chatgpt you can find examples of all three failures. They just don't happen often enough, and visibly enough, to be a serious problem. Is this also your understanding? When you say these kinds of failures don't happen, you mean they don't happen frequently enough to make ChatGPT a bad chatbot?)
2Daniel Kokotajlo
Re: Missing the point: How? Re: Elaborating: Sure, happy to, but not sure where to begin. All of this has been explained before, e.g. in Ajeya's Training Game report. Also Joe Carlsmith's thing. Also the original mesaoptimizers paper, though I guess it didn't talk about situational awareness idk. Would you like me to say more about what situational awareness is, or what agency is, or why I think both of those together are big risk factors for RLHF breaking down?
1Ann
From a technical perspective I'm not certain if Direct Preference Optimization is theoretically that much different from RLHF beyond being much quicker and lower friction at what it does, but so far it seems like it has some notable performance gains over RLHF in ways that might indicate a qualitative difference in effectiveness. Running a local model with a bit of light DPO training feels more intent-aligned compared to its non-DPO brethren in a pretty meaningful way. So I'd probably be considering also how DPO scales, at this point. If there is a big theoretical difference, it's likely in not training a separate model, and removing whatever friction or loss of potential performance that causes.
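
For reference, a sketch of the DPO objective as it appears in the original DPO paper (Rafailov et al., 2023), which may help locate the theoretical difference in question. Here $\pi_\theta$ is the policy being trained, $\pi_{\mathrm{ref}}$ is the frozen reference policy, $(x, y_w, y_l)$ is a prompt with its preferred and dispreferred completions, $\sigma$ is the logistic function, and $\beta$ is a hyperparameter controlling how far the policy may drift from the reference:

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}
    \left[
      \log \sigma\!\left(
        \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
        - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
      \right)
    \right]
```

No separate reward model appears in this loss: the quantity $\beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}$ plays the role of an implicit reward, and the policy is fit directly to the preference pairs with a classification-style objective rather than via an RL loop against a learned reward.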
2Daniel Kokotajlo
What does this mean? I don't know as much about CNNs as you -- are you saying that their architecture allows for the reuse of internal representations, such that redundancy should never arise? Or are you saying that the goal square shouldn't be representable by this architecture?
4Vladimir_Nesov
There is a reference class judgement in this. If I have a theory of good moves in Go (and absently dabble in chess a little bit), while you have a great theory of chess, looking at some move in chess shouldn't lead to a Bayes-update against the ability of my theory to reason about Go. The scope of classical alignment worries is typically about the post-AGI situation. If it manages to say something uninformed about the pre-AGI situation, that's something out of its natural scope, and shouldn't be meaningful evidence either way. I think the correct way of defeating classical alignment worries (about the post-AGI situation) is on priors, looking at the arguments themselves, not on observations where the theory doesn't expect to have clear or good predictions (and empirically doesn't). If the arguments appear weak, there is no recourse without observation of the post-AGI world; the theory remains weak at least until then. Even if it happened to have made good predictions about the current situation, that shouldn't count in its favor.
4lc
He didn't say "anything can happen before AI explodes". He said "I expect AI to look pretty great until it explodes." And he didn't say that his model about AGI safety generated that prediction; maybe his model about AGI safety generates some long-run predictions and then he's using other models to make the "look pretty great" prediction.

I feel pretty frustrated at how rarely people actually bet or make quantitative predictions about existential risk from AI.

Without commenting on how often people do or don't bet, I think overall betting is great and I'd love to see more of it! 

I'm also excited how much of it I've seen since Manifold started gaining traction. So I'd like to give a shout out to LessWrong users who are active on Manifold, in particular on AI questions. Some I've seen are:

Rob Bensinger 

Jonas Vollmer 

Arthur Conmy 

Jaime Sevilla Molina 

Isaac King 

Eliezer Yudkowsky 

Noa Nabeshima 

Mikhail Samin 

Daniel Filan 

Daniel Kokotajlo 

Zvi 

Eli Tyre 

Ben Pace 

Allison Duettmann 

Matthew Barnett 

Peter Barnett 

Joe Brenton 

Austin Chen 

lc 

Good job everyone for betting on your beliefs :) 

There are definitely more folks than this: feel free to mention more folks in the comments who you want to give kudos to (though please don't dox anyone whose name on either platform is pseudonymous and doesn't match the other). 

4Nathan Helm-Burger
Here's a couple of mine:  
3Nathan Young
Yeah I mean the answer is, just make prediction markets and bet on them. I think we are getting a lot better at that. (Also I'm a lesswrong user who makes a lot of prediction markets about AI) In particular:

  • A real money version of Yud and Paul's bet: https://polymarket.com/event/will-an-ai-win-the-5-million-ai-math-olympiad-prize-before-august?tid=1702634083181
  • An attempt at clustering the best AI progress markets into a dashboard: https://manifold.markets/dashboard/ai-progress

Yeah, I'm not really happy with the state of discourse on this matter either.

I think it's not a coincidence that many of the "canonical alignment ideas" somehow don't make any testable predictions until AI takeoff has begun. 🤔 

As a proponent of an AI-risk model that does this, I acknowledge that this is an issue, and I indeed feel pretty defensive on this point. Mainly because, as @habryka pointed out and as I'd outlined before, I think there are legitimate reasons to expect no blatant evidence until it's too late, and indeed, that's the whole reason AI risk is such a problem. As was repeatedly stated.

So all these moves to demand immediate well-operationalized bets read a bit like tactical social attacks that are being unintentionally launched by people who ought to know better, which are effectively exploiting the territory-level insidious nature of the problem to undermine attempts to combat it, by painting the people pointing out the problem as blind believers. Like challenges that you're set up to lose if you take them on, but which make you look bad if you turn them down.

And the above, of course, may read exactly like a defense attempt a particularly self-aware blin... (read more)

Your post defending the least forgiving take on alignment basically relies on a sharp/binary property of AGI, and IMO a pretty large crux is that either this property probably doesn't exist, or if it does exist, it is not universal, and I think it tends to be overused.

To be clear, I'm increasingly agreeing with a weak version of the hypothesis, and I also think you are somewhat correct, but I don't think your stronger hypothesis is correct, and I think that the lesson of AI progress is that it's less sharp the more tasks you want, and the more general intelligence you want, which is in opposition to your hypothesis on AI progress being sharp.

But in the meanwhile, yeah, discussing the matter just makes me feel weary and tired.

I actually kinda agree with you here, but unfortunately, this is very, very important, since your allies are trying to gain real-life political power over AI, and given this is extremely impactful, it is basically required for us to discuss it.

7Thane Ruthenis
There's a bit of "one man's modus ponens is another's modus tollens" going on. I assume that when you look at a new AI model, and see how it's not doing instrumental convergence/value reflection/whatever, you interpret it as evidence against "canonical" alignment views. I interpret it as evidence that it's not AGI yet; or sometimes, even evidence that this whole line of research isn't AGI-complete. E. g., I've updated all the way on this in the case of LLMs. I think you can scale them a thousandfold, and it won't give you AGI. I'm mostly in favour of doing that, too, or at least fully realizing the potential of the products already developed. Probably same for Gemini and Q*. Cool tech. (Well, there are totalitarianism concerns, I suppose.) I also basically agree with all the takes in the recent "AI is easy to control" post. But what I take from it isn't "AI is safe", it's "the current training methods aren't gonna give you AGI". Because if you put a human – the only known type of entity with the kinds of cognitive capabilities we're worrying about – into a situation isomorphic to a DL AI's, the human would exhibit all the issues we're worrying about. Like, just because something has a label of "AI" and is technically an AI doesn't mean studying it can give you lessons about "AGI", the scary lightcone-eating thing all the fuss is about, yeah? Any more than studying GOFAI FPS bots is going to teach you lessons about how LLMs work? And that the Deep Learning paradigm can probably scale to AGI doesn't mean that studying the intermediary artefacts it's currently producing can teach us much about the AGI it'll eventually spit out. Any more than studying a MNIST-classifier CNN can teach you much about LLMs; any more than studying squirrel neurology can teach you much about winning moral-philosophy debates. That's basically where I'm at. LLMs and such stuff is just in the entirely wrong reference class for studying "generally intelligent"/scary systems.
5Noosphere89
No, but my point here is that once we increase the complexity of the domain, and require more tasks to be done, things start to smooth over, and progress is not nearly as sharp. I suspect a big part of that is the effects of Amdahl's law kicking in combined with Baumol's cost disease and power law scaling, which means you are always bottlenecked on the least automatable and doable tasks, so improvements in one area like Go don't exactly matter as much as you think. I'd say the main lesson of AI progress, one that might even have been formulatable in the 1970s-1980s days, is that compute and data were the biggest factors, by a wide margin, and these grow smoothly. Only now are algorithms starting to play a role, and even then, it's only because of the fact that transformers turn out to be fairly terrible at generalizing or doing stuff, which is related to your claim about LLMs being not real AGI, but I think this effect is weaker than you think, and I'm sympathetic to the continuous view as well. There probably will be some discontinuities, but IMO LWers have fairly drastically overstated how discontinuous progress was, especially if we realize that a lot of the outliers were likely simpler than the real world. (Though Go comes close to it, at least for its domain, the problem is that the domain is far too small to matter.) I think this roughly tracks how we updated, though there was a brief phase where I became more pessimistic as I learned that LLMs probably weren't going to scale to AGI, which broke a few of my alignment plans, but I found other reasons to be more optimistic that didn't depend on LLMs nearly as much. While I think it's fine enough to update towards "it's not going to have any impact on anything, and that's the reason it's safe", I worry that this is basically defining away the possibility of safety, and thus making the model useless: I think a potential crux here is whether to expect some continuity at all, or whether there is rea
5Thane Ruthenis
As far as producing algorithms that are able to, once trained on a vast dataset of [A, B] samples, interpolate a valid completion B for an arbitrary prompt sampled from the distribution of A? Yes, for sure. As far as producing something that can genuinely generalize off-distribution, strike way outside the boundaries of interpolation? Jury's still out. Like, I think my update on all the LLM stuff is "boy, who knew interpolation can get you this far?". The concept-space sure turned out to have a lot of intricate structure that could be exploited via pure brute force. Oh, I didn't mean "if we could hook up a flesh-and-blood human (or a human upload) to the same sort of cognition-shaping setup as we subject our AIs to". I meant "if the forward-pass of an LLM secretly simulated a human tasked with figuring out what token to output next", but without the ML researchers being aware that it's what's going on, and with them still interacting with the thing as with a token-predictor. It's a more literal interpretation of the thing sometimes called an "inner homunculus". I'm well aware that the LLM training procedure is never going to result in that. I'm just saying that if it did, and if the inner homunculus became smart enough, that'd cause all the deceptive-alignment/inner-misalignment/wrapper-mind issues. And that if you're not modeling the AI as being/having a homunculus, you're not thinking about an AGI, so it's no wonder the canonical AI-risk arguments fail for that system and it's no wonder it's basically safe.
3Noosphere89
I'd say this still applies even to non-LLM architectures like RL, which is the important part, but Jacob Cannell and 1a3orn will have to clarify. I agree, but with a caveat, in that I think we do have enough evidence to rule out extreme importance on algorithms, à la Eliezer, and compute is not negligible. Epoch estimates a 50/50 split between compute and algorithmic progress being important. Algorithmic progress will likely matter IMO, just not nearly as much as some LWers think it does. I definitely updated somewhat in this direction, which is important, but I now think the AI optimist arguments are general enough to not rely on LLMs, and sometimes do not even rely on a model of what future AI will look like beyond the fact that capabilities will grow, and people expect to profit from it. Not automatically, and there are potential paths to AGI like Steven Byrnes's path to Brain-like AGI that either outright avoid deceptive alignment altogether or make it far easier to solve (the short answer is that Steven Byrnes suspects there's a simple generator of value, so simple that it's dozens of lines long, and if that's the case, then the corrigible alignment/value learning agent's simplicity gap is either 0, negative, or a very small positive gap, so small that very little data is required to pick out the honest value learning agent over the deceptively aligned agent, and we have a lot of data on human values, so this is likely to be pretty easy.) I think a crux is that I think that AIs will basically always have much more white-boxness to them than any human mind, and I think that for a lot of future paradigms of AI, including the ones that scale to superintelligence, the point that AI control research is easier will still mostly be true, especially since I think AI control is fundamentally very profitable and AIs have no legal rights/IRB boards to slow down control research.
9Thane Ruthenis
Mm, I think the "algorithms vs. compute" distinction here doesn't quite cleave reality at its joints. Much as I talked about interpolation before, it's a pretty abstract kind of interpolation: LLMs don't literally memorize the data points, their interpolation relies on compact generative algorithms they learn (but which, I argue, are basically still bounded by the variance in the data points they've been shown). The problem of machine learning, then, is in finding some architecture + training-loop setup that would, over the course of training, move the ML model towards implementing some high-performance cognitive algorithms. It's dramatically easier than hard-coding the algorithms by hand, yes, and the learning algorithms we do code are very simple. But you still need to figure out in which direction to "push" your model first. (Pretty sure if you threw 2023 levels of compute at a Very Deep fully-connected NN, it won't match a modern LLM's performance, won't even come close.) So algorithms do matter. It's just our way of picking the right algorithms consists of figuring out the right search procedure for these algorithms, then throwing as much compute as we can at it.  So that's where, I would argue, the sharp left turn would lie. Not in-training, when a model's loss suddenly drops as it "groks" general intelligence. (Although that too might happen.) It would happen when the distributed optimization process of ML researchers tinkering with training loops stumbles upon a training setup that actually pushes the ML model in the direction of the basin of general intelligence. And then that model, once scaled up enough, would suddenly generalize far off-distribution. (Indeed, that's basically what happened in the human case: the distributed optimization process of evolution searched over training architectures, and eventually stumbled upon one that was able to bootstrap itself into taking off. The "main" sharp left turn happens during the architecture search, not duri
5jacob_cannell
Actually I think the evidence is fairly conclusive that the human brain is a standard primate brain with the only change being merely a few compute scale dials increased (the number of distinct gene changes is tiny - something like 12 from what I recall). There is really nothing special about the human brain other than (1) a 3x larger than expected size, and (2) extended neoteny (a longer training cycle). Neuroscientists have looked extensively for other 'secret sauce' and we now have some confidence in a null result: no secret sauce, just much more training compute.
4Thane Ruthenis
Yes, but: whales and elephants have brains several times the size of humans, and they're yet to build an industrial civilization. I agree that hitting upon the right architecture isn't sufficient, you also need to scale it up – but scale alone doesn't suffice either. You need a combination of scale, and an architecture + training process that would actually transmute the greater scale into more powerful cognitive algorithms. Evolution stumbled upon the human/primate template brain. One of the forks of that template somehow "took off" in the sense of starting to furiously select for larger brain size. Then, once a certain compute threshold was reached, it took a sharp left turn and started a civilization. The ML-paradigm analogue would, likewise, involve researchers stumbling upon an architecture that works well at some small scales and has good returns on compute. They'll then scale it up as far as it'd go, as they're wont to. The result of that training run would spit out an AGI, not a mere bundle of sophisticated heuristics. And we have no guarantees that the practical capabilities of that AGI would be human-level, as opposed to vastly superhuman. (Or vastly subhuman. But if the maximum-scale training run produces a vastly subhuman AGI, the researchers would presumably go back to the drawing board, and tinker with the architectures until they selected for algorithms with better returns on intelligence per FLOPS. There are likewise no guarantees that this higher-level selection process would somehow result in an AGI of around human level, rather than vastly overshooting it the first time they properly scale it up.)
6jacob_cannell
Size/capacity isn't all, but in terms of the capacity which actually matters (synaptic count, and upper cortical neuron count) - from what I recall elephants are at great ape cortical capacity, not human capacity. A few specific species of whales may be at or above human cortical neuron capacity but synaptic density was still somewhat unresolved last I looked. Human language/culture is more the cause of our brain expansion, not just the consequence. The human brain is impressive because of its relative size and oversized cost to the human body. Elephants/whales are huge and their brains are much smaller and cheaper comparatively. Our brains grew 3x too large/expensive because it was valuable to do so. Evolution didn't suddenly discover some new brain architecture or trick (it already had that long ago). Instead there were a number of simultaneous whole body coadaptations required for larger brains and linguistic technoculture to take off: opposable thumbs, expressive vocal cords, externalized fermentation (gut is as energetically expensive as brain tissue - something had to go), and yes larger brains, etc. Language enabled a metasystems transition similar to the origin of multicellular life. Tribes formed as new organisms by linking brains through language/culture. This is not entirely unprecedented - insects are also social organisms of course, but their tiny brains aren't large enough for interesting world models. The resulting new human social organisms had intergenerational memory that grew nearly unbounded with time and creative search capacity that scaled with tribe size. You can separate intelligence into world model knowledge (crystal intelligence) and search/planning/creativity (fluid intelligence). Humans are absolutely not special in our fluid intelligence - it is just what you'd expect for a large primate brain. Humans raised completely without language are not especially more intelligent than animals. All of our intellectual super powers are cultural.
4jacob_cannell
We've basically known how to create AGI for at least a decade. AIXI outlines the 3 main components: a predictive world model, a planning engine, and a critic. The brain also clearly has these 3 main components, even somewhat cleanly separated into modules - that's been clear for a while. Transformer LLMs are pretty much exactly the type of generic minimal ULM arch I was pointing at in that post (though I obviously couldn't predict the name). On a compute scaling basis GPT4 training at 1e25 flops uses perhaps a bit more than human brain training, and it's clearly not quite AGI - but mainly because it's mostly just a world model with a bit of critic: planning is still missing. But its capabilities are reasonably impressive given that the architecture is more constrained than a hypothetical more directly brain equivalent fast-weight RNN of similar size. Anyway I don't quite agree with the characterization that these models are just "interpolating valid completions of any arbitrary prompt sampled from the distribution". Human intelligence also varies widely on a spectrum with tradeoffs between memorization and creativity. Current LLMs mostly aren't as creative as the more creative humans and are more impressive in breadth of knowledge, but eh part of that could be simply that they currently completely lack the component essential for creativity? That they accomplish so much without planning/search is impressive. Interestingly that is closer to my position and I thought that Byrnes thought the generator of value was somewhat more complex, although our views are admittedly fairly similar in general.

I feel pretty frustrated at how rarely people actually bet or make quantitative predictions about existential risk from AI. EG my recent attempt to operationalize a bet with Nate went nowhere. Paul trying to get Eliezer to bet during the MIRI dialogues also went nowhere, or barely anywhere—I think they ended up making some random bet about how long an IMO challenge would take to be solved by AI. (feels pretty weak and unrelated to me. lame. but huge props to Paul for being so ready to bet, that made me take him a lot more seriously.)

This paragraph doesn't seem like an honest summary to me. Eliezer's position in the dialogue, as I understood it, was:

  • The journey is a lot harder to predict than the destination. Cf. "it's easier to use physics arguments to predict that humans will one day send a probe to the Moon, than it is to predict when this will happen or what the specific capabilities of rockets five years from now will be". Eliezer isn't claiming to have secret insights about the detailed year-to-year or month-to-month changes in the field; if he thought that, he'd have been making those near-term tech predictions already back in 2010, 2015, or 2020 to show that he has
...

Thanks for your feedback. I certainly appreciate your articles and I share many of your views. Reading what you had to say, along with Quentin, Jacob Cannell, and Nora, was a very welcome alternative take that expanded my thinking and changed my mind. I have changed my mind a lot over the last year, from thinking AI was a long way off and Yud/Bostrom were basically right to seeing that it's a lot closer and that theories without data are almost always wrong in many ways - e.g. SUSY was expected to be true for decades by most of the world's smartest physicists. Many alignment ideas before GPT3.5 are either sufficiently wrong or irrelevant to do more harm than good.

I especially think there is an over-dependence on analogy, particularly to evolution. Sure, when we had nothing to go on it was a start, but when data comes in, ideas based on analogies should be gone pretty fast if they disagree with hard data.

(Some background - I have read the site for over 10 years, have followed AI for my entire career, have an understanding of Maths and Psychology, and have built and deployed a very small NN model commercially. Also, as an aside, I distinctly remember being surprised that Yud was skeptical of NN/DL in the earlier days, when I considered it obviously where AI progress would come from - I don't have references because I didn't think that would be disputed afterwards.)

I am not sure what the silent majority belief on this site is (counting by people, not karma). Is Yud's worldview basically right or wrong?


analogies based on evolution should be applied at the evolutionary scale: between competing organizations.

1RussellThor
Well they definitely can be applied there - though perhaps it's a stage further than analogy, and a direct application of theory? Then of course data can agree/disagree.
4the gears to ascension
gradient descent is not evolution and does not behave like evolution. it may still have problems one can imagine evolution having, but you can't assume facts about evolution generalize - it's in fact quite different.
8Steven Byrnes
I really don’t want to go down a rabbit hole here, so probably won’t engage in further discussion, but I just want to chime in here and say that I’m pretty sure lots of the world’s smartest physicists (not sure what fraction) still expect the fundamental laws of physics in our universe to have (broken) supersymmetry, and I would go further and say that they have numerous very good reasons to expect that, like gauge coupling unification etc. Same as ever. The fact that supersymmetric partners were not found at LHC is nonzero evidence against supersymmetric partners existing, but it’s not strong evidence against them existing, because LHC was very very far from searching the whole space of possibilities. Also, we pretty much know for a fact that the universe contains at least one other yet-to-be-discovered elementary particle beyond the 17 (or whatever, depends on how you count) particles in the Standard Model. So I think it’s extremely premature to imply that the prediction of yet-to-be-discovered supersymmetric partner particles has been ruled out in our universe and haha look at those overconfident theoretical physicists. (A number of specific SUSY-involving theories have been ruled out, but I think the smart physicists knew all along that those were just plausible hypotheses worth checking, not confident theoretical predictions.)
3RussellThor
OK you are answering at a level more detailed than I raised and seem to assume I didn't consider such things. My reasoning, and IMO the expected reading of "SUSY has failed", is not that such particles have been ruled out, as I know they haven't, but that its theoretical benefits are severely weakened or entirely ruled out according to recent data. My reference to SUSY was specifically regarding its opportunity to solve the Hierarchy Problem. This is the common understanding of one of the reasons it was proposed. I stand by my claim that many/most of the top physicists expected for >1 decade that it would help solve such a problem. I disagree with the claim: "but I think the smart physicists knew all along that those were just plausible hypotheses worth checking, " Smart physicists thought SUSY would solve the hierarchy problem. ---- Common knowledge, from GPT4: "can SUSY still solve the Hierarchy problem with respect to recent results" Hierarchy Problem: SUSY has been considered a leading solution to the hierarchy problem because it naturally cancels out the large quantum corrections that would drive the Higgs boson mass to a very high value. However, the non-observation of supersymmetric particles at expected energy levels has led some physicists to question whether SUSY can solve the hierarchy problem in its simplest forms. Fine-Tuning: The absence of low-energy supersymmetry implies a need for fine-tuning in the theory, which contradicts one of the primary motivations for SUSY as a solution to the hierarchy problem. This has led to exploration of more complex SUSY models, such as those with split or high-scale supersymmetry, where SUSY particles exist at much higher energy scales. ---- IMO ever more complex models rapidly become like epicycles.
5ryan_greenblatt
I think this will depend strongly on where you draw the line on "basically". I think the majority probably thinks:
  • AI is likely to be a really big deal
  • Existential risk from AI is at least substantial (e.g. >5%)
  • AI takeoff is reasonably likely to happen quite quickly in wall clock time if this isn't actively prevented (e.g. AI will cause there to be <10 years from a 20% annualized GDP growth rate to a 100x annualized growth rate)
  • The power of full technological maturity is extremely high (e.g. nanotech, highly efficient computing, etc.)
But, I expect that the majority of people don't think:
  • Inside view, existential risk is >95%
  • A century of dedicated research on alignment (targeted as well as society would realistically do) is insufficient to get risk <15%.
Which I think are both beliefs Yudkowsky has.
0RussellThor
For me - 1. Yes to AI being a big deal and extremely powerful (yes, I doubt anyone would be here otherwise) 2. Yes - I don't think anyone can reasonably claim it's <5%, but then neither is the risk from not having AI, if x-risk is defined to be humanity missing practically all of its Cosmic endowment. 3. Maybe - Even with slow takeoff, and hardware constrained, you get much greater GDP, though I don't agree with 100x (for the critical period that is, 100x could happen later). E.g. car factories are made to produce robots, we get 1-10 billion more minds and bodies per year, but not quite 100X. ~10x per year is enough to be extremely disruptive and x-risk anyway. --- (1) Yes I don't think x-risk is >95% - say 20% as a very rough guess that humanity misses all its Cosmic endowment. I think AI x-risk needs to be put in this context - say you ask someone "What's the chance that humanity becomes successfully interstellar?" If they say 50/50 then being OK with any AI x-risk less than 50% is quite defensible if getting AI right means that it's practically certain you get your cosmic endowment etc. --- (2) I do think it's defensible that a century of dedicated research on alignment doesn't get risk <15%, but because alignment research is only useful a little bit in advance of capabilities - say we had a 100 year pause, then I wouldn't have confidence in our alignment plan at the end of it. Anyway regarding x-risk I don't think there is a completely safe path. Too fast with AI and there is obvious risk, too slow and there are also other obvious risks. Our current situation is likely unstable. For example the famous quote "If you want a picture of the future, imagine a boot stamping on a human face— forever." I believe that is now possible with current tech, where it was not, say, for Soviet Russia. So we may be in the situation where societies can go 1984 totalitarian bad, but not come back because our tech coordination skills are sufficient to stop centralized empires from collapsing. LLM of course
1TurnTrout
Yup, and this is why I'm more excited to supervise MATS mentees who haven't read The Sequences. 

Hi there. 

> (High confidence) I feel like the project of thinking more clearly has largely fallen by the wayside, and that we never did that great of a job at it anyways. 

I'm new to this community. I've skimmed quite a few articles, and this sentence resonates with me for several reasons. 

1) It's very difficult in general to find websites like LessWrong these days. And among the few that exist, I've found that the intellectuals on them are so incredibly doubtful of their own intellect. This creates a sort of Ouroboros phenomenon where int...

[-]dr_s72

I feel pretty frustrated at how rarely people actually bet or make quantitative predictions about existential risk from AI.

I think that might be a result of how the topic is, well, just really fucking grim. I think part of what allows discussion of it and thought about it for a lot of people (including myself) is a certain amount of detachment. "AI doomers" often get accused of being LARPers or not taking their own ideas seriously because they don't act like people who believe the world is ending in 10 years, but I'd flip that around - a person who beli...

I think there are some great points in this comment but I think it's overly negative about the LessWrong community. Sure, maybe there is a vocal and influential minority of individuals who are not receptive to or appreciative of your work and related work. But I think a better measure of the overall community's culture than opinions or personal interactions is upvotes and downvotes which are much more frequent and cheap actions and therefore more representative. For example, your posts such as Reward is not the optimization target have received hundreds of...

No disagreement here that this place does this. I also think we should attempt to change many of these things. However, I don't expect the lesswrong team to do anything sufficiently drastic to counter the hero-worship. Perhaps they could consider hiding usernames by default, hiding vote counts until things have been around for some period of time, etc.

[-]habryka2816

Hmm, my sense is Eliezer very rarely comments, and the people who do comment a lot don't have a ton of hero worship going on (like maybe Wentworth?). So I don't super believe that hiding usernames would do much about this.

7Thomas Kwa
Agree, and my guess is that the hero worship, to the extent it happens, is caused by something like:
  • for Eliezer: people finding the rationality community and observing that they were less crazy than most other communities about various things, and Eliezer was a very prolific and persuasive writer
  • for Paul: Paucity of empirical alignment work before 2021 meant that Paul was one of the few people with formal CS experience and good alignment ideas, and had good name recognition due to posting on LW
Both of these seem to be solving themselves.
-3Matt Goldenberg
I think one of the issues with Eliezer is that he sees himself as a hero, and it comes through both explicitly and in vibes in the writing, and Eliezer is also a persuasive writer.
3frontier64
What is wrong with seeing oneself as a hero?
1Matt Goldenberg
Nothing wrong with it, in fact I recommend it. But seeing oneself as a hero and persuading others of it will indeed be one of the main issues leading to hero worship.
5the gears to ascension
how would you operationalize a bet on this? I'd take "yes" on "will hiding usernames by default decrease hero worship on lesswrong" on manifold, if you want to do an AB test or something.
1Ebenezer Dukakis
Hacker News shows you the vote counts on your comments privately. I think that's a significant improvement. It nudges people towards thinking for themselves rather than trying to figure out where the herd is going. At least, I think it does, because HN seems to have remarkable viewpoint diversity compared with other forums.

Somewhat relatedly, there have been a good number of times where it seems like I've persuaded someone of A and of A ⟹ B and they still don't believe B, and coincidentally B is unpopular.

Would you mind sharing some specific examples? (Not of people, but of beliefs.)

[-]lc20

I think it's fine for there to be a status hierarchy surrounding "good alignment research". It's obviously bad if that becomes mismatched with reality, as it almost certainly is to some degree, but I think people getting prestige for making useful progress is essentially what has to happen for it to be done at all.

3Ebenezer Dukakis
If we aren't good at assessing alignment research, there's the risk that people substitute the goal of "doing good alignment research" with "doing research that's recognized as good alignment research". This could lead to a feedback loop where a particular notion of "good research" gets entrenched: Research is considered good if high status researchers think it's good; the way to become a high status researcher is to do research which is considered good by the current definition, and have beliefs that conform with those of high status researchers. A number of TurnTrout's points were related to this (emphasis mine): I'd like to see more competitions related to alignment research. I think it would help keep assessors honest if they were e.g. looking at 2 anonymized alignment proposals, trying to compare them on a point-by-point basis, figuring out which proposal has a better story for each possible safety problem. If competition winners subsequently become high status, that could bring more honesty to the entire ecosystem. Teach people to focus on merit rather than politics.

aysja

10360

LessWrong.com is my favorite website. I’ve tried having thoughts on other websites and it didn't work. Seriously, though—I feel very grateful for the effort you all have put in to making this an epistemically sane environment. I have personally benefited a huge amount from the intellectual output of LW—I feel smarter, saner, and more capable of positively affecting the world, not to mention all of the gears-level knowledge I’ve learned, and model building I’ve done as a result, which has really been a lot of fun :) And when I think about what the world would look like without LessWrong.com I mostly just shudder and then regret thinking of such dismal worlds.
 

Some other thoughts of varying import:
 

  • I dislike emojis. They feel like visual clutter to me. I also feel somewhat assaulted when I read through comments sometimes, as people’s opinions jump out at me before I’ve had much chance to form my own.
  • I like dialogues a lot more than I was expecting. What I expected was something like “people will spend a bunch of time talking past each other in their own mentalese with little effort towards making the reader capable of understanding and it’ll feel cluttered and way too long and hard to make much sense of.” I think this does sometimes happen. But I’ve also been pleasantly surprised by the upsides which I was not anticipating—seeing more surface area on people’s thoughts which helps me make sense of their “deal” in a way that’s useful for modeling their other views, (relatedly) getting a better sense of how people generate thoughts, where their intuitions are coming from, and so on. It also makes LW feel more homey, in my opinion. 
  • If there were one dial I’d want to experiment with turning on LW it would be writing quality, in the direction of more of it. I don’t feel like I have super great ideas on how to cultivate this, but I’ll just relay the sort of experience that makes me say this. Sometimes I want to understand something someone has said. I think “ah, they probably said that there,” and then I go to a post, skim it, find the sort-of-related thing but it’s not quite right (they talk around the point without really saying it, or it’s not very clear, etc). But they link to ten other posts of theirs, all promising to tell me the thing I think they said, so I follow those links, but they’re also a bit slippery in the same ways. And I feel like I go in circles trying to pin down exactly what the claims are, never quite succeeding, until I feel like throwing up my hands in defeat. To some extent this seems like just par for the course with highly intellectually productive people—ideas outpace idea management and legibility, and in the absence of having the sort of streamlined clarity that I’m more used to seeing in, e.g., books, I would on the margin prefer they still publish. But I do think this sort of thing can make it harder to push the frontier of human knowledge together, and if I did have a dial I could turn to make writing quality better (clearer, more succinct, more linear, etc.), even at the expense of somewhat fewer posts, I’d at least want to try that for a bit. 
  • Something has long bothered me about how people talk about “p(doom)” around here. Like, here’s an experience I have regularly: I tell someone I am hoping to take an action in the future, they say “haha, what future? we’ll all be dead by then!” I really dislike this, not because I don’t agree that we’re facing serious risks, or that it’s never okay to joke about that, but more that I often don’t believe them. It seems to me that in many conversations high p(doom) is closer to type “meme” than “belief,” like a badge people wear to fit into the social fabric. 
  • But also, it feeds into this general vibe of nihilistic hopelessness that the Bay Area rationality scene has lapsed into, according to me, which I worry stems in part from deferring to Eliezer’s/Nate’s hopelessness. And I don’t know, if you really are on-model hopeless I guess that’s all well and good, but on a gut level I just don’t really buy that this makes sense. Alignment seems like a hard science problem but not an impossible one, and I think that if we actually try, we may very well have a good shot at figuring it out. But at present it feels to me like so few people are trying to solve the hard parts of the problem—that so much work has gone meta (e.g., community building, power for the sake of power, deferring the “solving it” part to uploads or AI); that even though people concede there's some chance things go well, that in their gut they basically just have some vague sense of “we’re fucked” which inhibits them from actually trying; that somehow our focus has become about managing tenth order effects of the social graph, the “well what if this faction does this, then people will update this way and then we’ll lose influence over there”… I don’t know, it just sort of feels like we’ve communally lost the spirit of something that seems really powerful to me—something that I took away from the Sequences—a sense of agency, ambition, truth-seeking, and integrity in the face of hard problems. A sense that we can… solve this. Like actually solve the actual problem! I would like that spirit back. 
  • I’m not sure how to get it, exactly, and I don’t know that this is aimed at the LW team in particular rather than being nebulously aimed at “Bay Area rats” or something. But just to add one small piece that I think LW could work on: I’ve occasionally seen the mods slip from “I think we are doomed” language to “we’re doomed” language. I’ve considered bringing it up although for any particular instance it feels a bit too aggressive relative to the slight, and because I get that it’s annoying to append your epistemic state to everything, and so on. But I do think that on this topic in particular it’s good to be careful, as it’s one of the most crazy-making aspects of this situation, and one that seems especially easy to spiral into group-think-y/deferral-y dynamics about. 
  • I feel sad about ending on a bad note, mostly because I feel sad that so many people seem to be dunking on MIRI/rationality/LW lately. And I have some kind of “can we please not throw the baby out with the bathwater” sense. I certainly have some gripes with the community, but on net I am really happy that it exists. And I continue to believe that the spirit of rationality is worth fighting for—both because it’s beautiful for its own sake, but also because I believe in its ability to positively shape our lightcone. I see LW as part of that mission, and I feel deeply grateful for it. 

If there were one dial I’d want to experiment with turning on LW it would be writing quality, in the direction of more of it.

I'd like to highlight this. In general, I think fewer things should be promoted to the front page.

[edit, several days later]: https://www.lesswrong.com/posts/SiPX84DAeNKGZEfr5/do-websites-and-apps-actually-generally-get-worse-after is a prime example. This has nothing to do with rationality or AI alignment. This is the sort of off-topic chatter that belongs somewhere else on the Internet.

RomanHauksson

59136

I’m a huge fan of agree/disagree voting. I think it’s an excellent example of a social media feature that nudges users towards truth, and I’d be excited to see more features like it.

[-]niplav2027

I also enjoy the reacts way more than I expected! They feel aesthetically at home here, especially with reacts for specific parts of the text.

7β-redex
I think the reacts being semantic instead of being random emojis is what makes this so much better. I wish other platforms experimented with semantic reacts as well, instead of just letting people react with any emoji of their choosing, and making you guess whether e.g. "thumbs up" means agreement, acknowledgement, or endorsement, etc.

It seems like it would be useful to have it for top-level posts. I love disagree voting and there are massive disparities sometimes between upvotes and agreements that show how useful it is in surfacing good arguments that are controversial.

I think I'm seeing some high-effort, topical and well-researched top-level posts die on the vine because of controversial takes drawing downvotes that are probably really disagree votes. This is not a complaint about my own posts sometimes dying; I've been watching others' posts with this hypothesis, and it fits.

I guess there's a reason for not having it on top-level posts, but I miss having it on top-level posts.

2Seth Herd
Do you know the reasons? It seems like it would be useful to have it on top-level posts for the same reasons it's so helpful on comments.
2TekhneMakre
IDK the reasons.
8habryka
Inline agree/disagree reacts are trying to do the equivalent. Comments are short enough that usually you can summarize your epistemic state with regards to their contents into a single "agree or disagree", but for posts I feel like it really mostly sets things up for polarization and misunderstandings to have a bunch of people "agree" and "disagree" to a huge bundle of claims and statements.  I think it's better for people to highlight specific passages of text and then react to those. 
2TekhneMakre
Ooh. That makes a lot of sense and is even better... I simply didn't realize there were inline reacts! Kudos.

I'd like to like this more but I don't have a clear idea of when to up one, up the other, down one, down the other, or down one and up the other.

Gordon Seidoh Worley

5132

The EA Forum has this problem worse, but I've started to see it on LessWrong: it feels to me like we have a lot more newbies on the site who don't really get what LW-style rationality is about, and they make LessWrong a less fun place to write because they are regressing discussion norms back towards the mean.

Earlier this year I gave up on EAF because it regressed so far towards the mean that it became useless to me. LW has still been passable, but it feels like it's been ages since I really got into a good, long, deep thread with somebody on here. Partly that's because I'm busy, but it's also because I've been quicker to give up because my expectations of having a productive conversation here are now lower. :-(

Do you have any thoughts on what the most common issues you see are or is it more like that every time it is a different issue?

7Gordon Seidoh Worley
My impression is that people are quicker to jump to cached thoughts and not actually read and understand things. So I've spent more time dealing with what I would consider bad faith takes on posts than I used to where it's clear to me the person is trying to read in what they want it to have been that I said or meant to imply. I also have a standing complaint that people are hypocritical by being too lenient towards things they like and too critical of things they don't like for affiliative reasons rather than because they engaged with the reasoning and arguments. I see a lot of this from both sides. I know how to farm karma on here, I just mostly choose not to, but when I post things that are of the type that I expect them to be voted up I can be pretty lazy and people will vote it up because I hit the applause light for something they already wanted to applaud. If I post something that I know people will disagree with because it goes against standard takes, I've got to be way more detailed. But I see this as a bad asymmetry that results from confirmation bias. I would rather live in a world where lazy posts that say things people already agree with get downvoted for being low quality, or live in a world where posts that people disagree with get upvoted despite disagreeing because they respect the argumentation, but not the world we find ourselves in now.
3Ebenezer Dukakis
One thing I've been thinking about in this regard is the microhabits around voting. I only vote on a small minority of the stuff I read. I assume others are similar. And voting is a bit of a cognitive chore: There are 25 possible ways to vote: strong down/weak down/nothing/weak up/strong up, on the 2 different axes. I wish I had a principled way of choosing between those 25 different ways to vote, but I don't. I rarely feel satisfied with the choice I made. I'm definitely inconsistent in my behavior from comment to comment. For example, if someone makes a point that I might have made myself, is it OK to upvote them overall, or should I just vote to agree? I appreciate them making the point, so I usually give them an upvote for overall -- after all, if I made the point myself, I'd automatically give myself an "overall" upvote too. But now that I explicitly consider, maybe my threshold should be higher, e.g. only upvote "overall" if I think they made the point at least as well as I would've made it. In any case, the "point I would've made myself" situation is one of a fairly small number of scenarios where I get enough activation energy to actually vote on something. Sometimes I wonder what LW would be like if a user was only allowed to vote on a random 5% subset of the comments on any given page. (To make it deterministic, you could hand out vote privilege based on the hash of their user ID and the comment ID.) Then nudge users to actually vote on those 5%, or explicitly acknowledge a null vote. I wonder if this would create more of a "jury trial" sort of feel, compared to the current system which can have a "count the size of various tribes" feel.
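The deterministic "vote privilege" idea in the comment above could be sketched roughly as follows. This is only an illustration of hashing the user ID together with the comment ID; the function name `may_vote`, the string IDs, the choice of SHA-256, and the `:` separator are all assumptions made for the example, not anything LessWrong implements.

```python
import hashlib


def may_vote(user_id: str, comment_id: str, fraction: float = 0.05) -> bool:
    """Hypothetical check: is this user in the voting subset for this comment?

    The same (user_id, comment_id) pair always gives the same answer, and
    roughly `fraction` of all pairs are allowed to vote.
    """
    # Hash both IDs together; the ':' separator is an assumption and presumes
    # that IDs themselves do not contain ':'.
    digest = hashlib.sha256(f"{user_id}:{comment_id}".encode("utf-8")).digest()
    # Treat the first 8 bytes as an integer in [0, 2**64) and compare it with
    # an integer threshold, which avoids floating-point rounding at the edges.
    bucket = int.from_bytes(digest[:8], "big")
    threshold = int(fraction * 2**64)
    return bucket < threshold


if __name__ == "__main__":
    allowed = sum(may_vote(f"user{i}", "comment42") for i in range(20000))
    print(f"{allowed} of 20000 hypothetical users may vote on comment42")
```

Because the answer depends only on the two IDs, no extra state has to be stored, and a user cannot reroll their privilege by reloading the page.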

lsusr

4935

First of all, I appreciate all the work the LessWrong / Lightcone team does for this website.

The Good

  • I was skeptical of the agree/disagree voting. After using it, I think it was a very good decision. Well done.
  • I haven't used the dialogue feature yet, but I have plans to try it out.
  • Everything just works. Spam is approximately zero. The garden is gardened so well I can take it for granted.
  • I love how much you guys experiment. I assume the reason you don't do more is just engineering capacity.

And yet…

Maybe there's a lot of boiling feelings out there about the site that never get voiced?

I tend to avoid giving negative feedback unless someone explicitly asks for it. So…here we go.

Over the last 1.5 years, I've been less excited about LessWrong than at any time since I discovered this website. I'm uncertain to what extent this is because I changed or because the community did. Probably a bit of both.

AI Alignment

The most obvious change is the rise of AI Alignment writings on LessWrong. There are two things that bother me about AI Alignment writing.

  • It's effectively unfalsifiable. Even betting markets don't really work when you're betting on the apocalypse.
  • It's highly political. AI Alignment became popular on LessWrong before AI Alignment became a mainstream political issue. I feel like LessWrong has a double-standard, where political writing is held to a high epistemic standard unless it's about AI.

I have hidden the "AI Alignment" tag from my homepage, but there is still a spillover effect. "Likes unfalsifiable political claims" is the opposite of the kind of community I want to be part of. I think adopting lc's POC || GTFO burden of proof would make AI Alignment dialogue productive, but I am pessimistic about that happening on a collective scale.

Weird ideas

When I write about weird ideas, I get three kinds of responses.

  • "Yes and" is great.
  • "I think you're wrong because [reason]" is fine.
  • "We don't want you to say that" makes me feel unwelcome.

Over the years, I feel like I've gotten fewer "yes and" comments and more "we don't want you to say that" comments. This might be because my writing has changed, but I think what's really going on is that this happens to every community as it gets older. What was once radical eventually congeals into dogma.

I used to post my weird ideas immediately to LessWrong. Now I don't, because I feel like the reception on LessWrong would bum me out.[1]

I wonder what fraction of the weirdest writers here feel the same way. I can't remember the last time I've read something on LessWrong and thought to myself, "What a strange, daring, radical idea. It might even be true. I'm scared of what the implications might be." I miss that.[2]

I get the basic idea

I have learned a lot from reading and writing on LessWrong. Eight months ago, I had an experience where I internalized something very deep about rationality. I felt like I graduated from Level 1 to Level 2.

According to Eliezer Yudkowsky, his target audience for the Sequences was 2nd grade. He missed and ended up hitting college-level. They weren't supposed to be comprehensive. They were supposed to be Level 1. But after that, nobody wrote a Level 2. (The postrats don't count.) I've been trying―for years―to write Level 2, but I feel like a sequence of blog posts is a suboptimal format in 2023. Yudkowsky started writing the Sequences in 2006, when YouTube was still a startup. That leads me to…

100×

The other reason I've been posting less on LessWrong is that I feel like I'm hitting a soft ceiling with what I can accomplish here. I'm nowhere near my personal skill cap, of course. But there is a much larger potential audience (and therefore impact) if I shifted from writing essays to filming YouTube videos. I can't think of anything LessWrong is doing wrong here. The editor already allows embedded YouTube links.


  1. Exception: I can usually elicit a positive response by writing fiction instead of nonfiction. But that takes a lot more work. ↩︎

  2. This might be entirely in my head, due to hedonic adaptation. ↩︎

Over the years, I feel like I've gotten fewer "yes and" comments and more "we don't want you to say that" comments. This might be because my writing has changed, but I think what's really going on is that this happens to every community as it gets older. What was once radical eventually congeals into dogma.

This is the part I'm most frustrated with. It used to be you could say some wild stuff on this site and people would take you seriously. Now there's a chorus of people who go "eww, gross" if you go too far past what they think should be acceptable. LessWrong culture originally had very high openness to wild ideas. At worst, if you reasoned well and people disagreed, they'd at least ignore you, but now you're more likely to get downvoted for saying controversial things because they are controversial and it feels bad.

This was always a problem, but feels like it's gotten worse.

Huh, I am surprised by this. I agree this is a thing in lots of the internet, but do you have any examples? I feel like we really still have a culture of pretty extreme openness and taking random ideas seriously (enough that sometimes I feel like wild sounding bad ideas get upvoted too much because people like being contrarian a bit too much).

[-]lsusr122

Here's part of a comment on one of my posts. The comment negatively impacted my desire to post deviant ideas on LessWrong.

Bullshit. If your desire to censor something is due to an assessment of how much harm it does, then it doesn't matter how open-minded you are. It's not a variable that goes into the calculation.

I happen to not care that much about the object-level question anymore (at least as it pertains to LessWrong), but on a meta level, this kind of argument should be beneath LessWrong. It's actively framing any concern for unrestricted speech as poorly motivated, making it more difficult to have the object-level discussion.

The comment doesn't represent a fringe opinion. It has +29 karma and +18 agreement.

[-]philh1610

I think I'm less open to weird ideas on LW than I used to be, and more likely to go "seems wrong, okay, next". Probably this is partly a me thing, and I'm not sure it's bad - as I gain knowledge, wisdom and experience, surely we'd expect me to become better at discerning whether a thing is worth paying attention to? (Which doesn't mean I am better, but like. Just because I'm dismissing more ideas, doesn't mean I'm incorrectly dismissing more ideas.)

But my guess is it's also partly a LW thing. It seems to me that compared to 2013, there are more weird ideas on LW and they're less worth paying attention to on average.

In this particular case... when you talk about "We don’t want you to say that" comments, it sounds to me like those comments don't want you to say your ideas. It sounds like Habryka and other commenters interpreted it that way too.

But my read of the comment you're talking about here isn't that it's opposed to your ideas. Rather, it doesn't want you to use a particular style of argument, and I agree with it, and I endorse "we don't want bad arguments on LW". I downvoted that post of yours because it seemed to be arguing poorly. It's possible I missed something; I admi...

3Said Achmiz
I also endorse pretty much everything in this comment. (Except for the bit about the “avoid paying your taxes” post, because I don’t even remember that one.) To emphasize this point: in many cases, the problem with some “weird ideas” isn’t, like, “oh no, this is too weird, I can’t even, don’t even make me think about this weird stuff :(”. It’s more like: “this is straightforwardly dumb and wrong”. (Indeed, much of the time it’s not even interestingly wrong, so it’s not even worth my time to argue with it. Just: dumb nonsense, already very well known to be dumb nonsense, nothing new to see or say, downvote and move on with life.)
2Rafael Harth
You don't have to justify your updates to me (and also, I agree that the comment I wrote was too combative, and I'm sorry), but I want to respond to this because the context of this reply implies that I'm against weird ideas. I vehemently dispute this. My main point was that it's possible to argue for censorship for genuine reasons (rather than because one is closed-minded). I didn't advocate for censoring anything, and I don't think I'm in the habit of downvoting things because they're weird, at all. This may sound unbelievable or seem like a warped framing, but I honestly felt like I was going against censorship by writing that comment. Like as a description of my emotional state while writing it, that was absolutely how I felt. Because I viewed (and still view) your post as a character attack on people-who-think-that-sometimes-censorship-is-justified, and one that's primarily based on an emotional appeal rather than a consequentialist argument. And well, you're a very high prestige person. Posts like this, if they get no pushback, make it extremely emotionally difficult to argue for a pro-censorship position regardless of the topic. So even though I acknowledge the irony, it genuinely did feel like you were effectively censoring pro-censorship arguments, even if that wasn't the intent. I guess you could debate whether or not censoring pro-censorship views is pro or anti censorship. But regardless, I think it's bad. It's not impossible for reality to construct a situation in which censorship is necessary. In fact, I think such situations already exist; if someone posts a trick that genuinely accelerates AI capabilities by 5 years, I want that to be censored. (Almost all examples I'd think of would relate to AI or viruses.) The probability that something in this class happens on LW is not high, but it's high enough that we need to be able to talk about this without people feeling like they're impure for suggesting it.
2MondSemmel
I stumbled over this part. What makes someone high prestige? Their total LW karma? To me that doesn't really make sense as a proxy for prestige.
2MondSemmel
Hi there, lsusr! I read the post & comment which you linked, and indeed felt that the critical comment was too combative. (As a counterexample, I like this criticism of EY for how civil it is.) That being said, I think I understand the sentiment behind its tone: the commenter saw your post make a bunch of strong claims, felt that these claims were wrong and/or insufficiently supported by sources, and wrote the critical comment in a moment of annoyance. To give a concrete example, "We do not censor other people more conventional-minded than ourselves." is an interesting but highly controversial claim. Both because hardly anything in the world has a 100% correlation, and because it leads to unintuitive logical implications like "two people cannot simultaneously want to censor one another". Anyway, given that the post began with a controversial claim, I expected the rest of the post to support this initial claim with lots of sources and arguments. Instead, you took the claim further and built on it. That's a valid way to write, but it puts the essay in an awkward spot with readers that disagree with the initial claim. For this reason, I'm also a bit confused about the purpose of the essay: was it meant to be a libertarian manifesto, or an attempt to convince readers, or what? EDIT: Also, the majority of LW readers are not libertarians. What reaction did you expect to receive from them? If I were to make a suggestion, the essay might have worked better if it had been a dialogue between a pro-liberty and a pro-censorship character. Why? Firstly, if readers feel like an argument is insufficiently supported, they can criticize or yell at the character, rather than at you. And secondly, such a dialogue would've required making a stronger case in favor of censorship, and it would've given the censorship character the opportunity to push back against claims by the liberty character. This would've forestalled having readers make similar counterarguments. (Also see Scott's
2gilch
Hmm, is LessWrong really so intolerant of being reminded of the existence of "deviant ideas"? Social Dark Matter was pretty well received, with 248 karma, and was posted quite recently. The much older KOLMOGOROV COMPLICITY AND THE PARABLE OF LIGHTNING opened with a quote from the same Paul Graham essay you linked to (What You Can’t Say). I was not personally offended by your example post and upvoted it just now. I probably at least wouldn't have downvoted it had I seen it earlier, but I hadn't.
3Gordon Seidoh Worley
People love deviant ideas in the abstract, but hate to deal with specific deviant ideas that attack beliefs they hold dear.
2gilch
lsusr's example post seemed to not be a specific deviant idea though. To paraphrase one point: beware of banning apparent falsity lest you inadvertently ban true heresies, without naming any heresy in particular.
2lsusr
Many readers appeared to dislike my example post. IIRC, prior to mentioning it here, its karma (excluding my auto hard upvote) was close to zero, despite it having about 40 votes.
4Gordon Seidoh Worley
My best example of this comes from this post of mine on EAF (my LW examples are a bit more ambiguous). Multiple folks quickly jumped to making a Nazi argument, almost in parody of Godwin's Law.
4MondSemmel
I don't have an opinion on your post itself, but it is indeed disappointing that the comments immediately jumped to the Nazi comparison, which of course made all further discussion pointless.
[-]habryka1010

I wonder what fraction of the weirdest writers here feel the same way. I can't remember the last time I've read something on LessWrong and thought to myself, "What a strange, daring, radical idea. It might even be true. I'm scared of what the implications might be." I miss that.

I thought Genesmith's latest post fully qualified as that! 

I totally didn't think adult gene editing was possible, and had dismissed it. It seems like a huge deal if true, and it's the kind of thing I don't expect would have been highlighted anywhere else.

I wonder what fraction of the weirdest writers here feel the same way. I can't remember the last time I've read something on LessWrong and thought to myself, "What a strange, daring, radical idea. It might even be true. I'm scared of what the implications might be." I miss that.

The post about not paying one's taxes was pretty out there and had plenty of interesting discussion, but now it's been voted down to the negatives. I wish it was a bit higher (at 0-ish karma, say), which might've happened if people could disagree-vote on it.

But yes, overall this critic...

2niplav
I've strong-upvoted it to -1, because I agree.

Another improvement I didn't notice until right now is the "respond to a part of the original post" feature. I feel like it nudges comments away from nitpicking.

2Raemon
I didn't quite parse that – which UI element are you referring to?
2lsusr
I meant side-comments. I never use them myself, but people often use them to comment on my posts. When they do, the comments tend to be constructive, especially compared to blockquotes.
2Raemon
Ah cool. That was my best guess but wasn't sure.

The other reason I've been posting less on LessWrong is that I feel like I'm hitting a soft ceiling with what I can accomplish here. I'm nowhere near my personal skill cap, of course. But there is a much larger potential audience (and therefore impact) if I shifted from writing essays to filming YouTube videos.

There are also writers with a very large reach. A recommendation I saw was to post where most of the people and hence most of the potential readers are, i.e. on the biggest social media sites. If you're trying to have impact as a writer, the reachable audience on LW is much smaller. (Though of course there are other ways of having a bigger impact than just reaching more readers.)

I can't think of anything LessWrong is doing wrong here. The editor already allows embedded YouTube links.

One thing that could help is to be able to have automatic crossposting from your YouTube channel like you can currently have from a blog. It would be even more powerful if it generated a transcript automatically (though that's currently difficult and expensive).

2MondSemmel
A few points on this:
  • Some Youtube videos already come with good captions.
  • For the rest, Youtube provides automatic captions. These are really bad, lack punctuation and capitalization, but even at that level of quality they could e.g. be used to pinpoint where something was said.
  • Transcription via OpenAI Whisper is cheap ($0.36 per hour) and quite decent if there's only one speaker. For interviews and podcasts, the experience is not good enough for transcription (to create this podcast transcript at the beginning of the year, I used Whisper as a base, but still had to put in many many hours of editing), because it e.g. doesn't do speaker diarisation or insert paragraph breaks. But I'm pretty sure that by now there are hybrid services out there which can do even the things Whisper is bad at. This still won't yield a professional-level transcript, though doing an editing pass with GPT4 might close the gap. My point is, these transcripts are not expensive, relative to labor costs.
  • The implementation of automatic AI transcripts has become surprisingly simple. E.g. as I mentioned here, I now get automatic transcripts for my voice notes, based on following a step-by-step video guide. The difficulty is not yet at consumer-level simple (though for those purposes, one can just pay for an AI transcription service app), but it's definitely already at the level of hobbyist-simple.

I wonder what fraction of the weirdest writers here feel the same way. I can't remember the last time I've read something on LessWrong and thought to myself, "What a strange, daring, radical idea. It might even be true. I'm scared of what the implications might be." I miss that.

Do you remember any examples from back in the day?

[-]nim10

I enjoy your content here and would like to continue reading you as you grow into your next platforms.

YouTube grows your audience in the immediate term, among people who have the tech and time to consume videos. However, text is the lowest common denominator for human communication across longer time scales. Text handles copying and archiving in ways that I don't think we can promise for video on a scale of hundreds of years, let alone thousands. Text handles search with an ease that we can only approximate for video by transcribing it. Transcription is tr...

2lsusr
I'm learning how to film, light and edit video. I'm learning how to speak better too, and getting a better understanding about how the media ecosystem works. Making videos is harder than writing, which means I learn more from it.
3nim
Ah, that makes perfect sense. On the other side, watching videos is often easier than reading, so I often feel like I learn more from the latter =)

Charlie Steiner

3118

I just posted a big effortpost and it may have been consigned to total obscurity because I posted it at the wrong time of day. Unsure whether I actually want the recommendation algorithm to have flattened time-discounting over periods with less activity on the site, or if I should just post more strategically in the future.

I have found the dialogues to be generally low-quality to read. The good ones tend to be more interview-like - "I have something I want to talk about but writing a post is harder than talking to a curious interlocutor about it." I think this maybe suggests that I want to see dialogues rebranded to not say "dialogue."

(Note, I don't think it's because it was posted at the wrong time of day. I think it's because the opening doesn't make a clear case for why people should read it. 

In my experience posts like this still get a decent amount of attention if they are good, but it takes a lot longer, since it spreads more by word-of-mouth. The initial attention burst of LW is pretty heavily determined by how much the opening paragraphs and title draw people in. I feel kind of sad about that, but also don't have a great alternative to the current HN-style algorithm that still does the other things we need a karma/frontpage-sorting algorithm to do.)

3MondSemmel
It's hard to envision a different solution to this problem. When I browse a feed and decide what to read, of course things like author, karma, title, and first paragraph are the things that determine whether I'll consider reading. How else could things work? @Charlie Steiner: Also see this comment thread on why it's so important to pay outsized importance to stuff like the title and presentation. Excerpts from my comment: And:
2Seth Herd
I think time of day combined with when it was approved for front page can easily make all the difference between takeoff and just fading into obscurity. This is an unfortunate situation, but I don't have a solution. I do wonder why posts with AI tags aren't on front page automatically without human review.
2habryka
I mean, many posts with AI tags don't meet frontpage norms. For example AI news isn't timeless, and as such doesn't make it onto the frontpage.
2Seth Herd
Ah, that makes sense. I never see AI stuff on the front page or in recent discussions that isn't worth at least a glance, but that's a good thing. I do not want to see every little AI news piece on the front page.
2Charlie Steiner
Yeah, fair enough.
1Zane
I'm not sure what the current algorithm is other than a general sense of "posts get promoted more if they're more recent," but it seems like it could be a good idea to just round it all up so that everything posted between 0 and N hours ago is treated as equally recent, so that time of day effects aren't as strong. Not sure about the exact value of N... 6? 12? It probably depends on what the current function is, and what the current cycle of viewership by time of day looks like. Does LW keep stats on that?
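One possible reading of the "treat everything from the last N hours as equally recent" suggestion, sketched on top of a generic HN-style time-decay score. The thread does not give LessWrong's actual formula, so the decay form `karma / (age + offset) ** gravity`, the constants, and the names `hn_style_score` and `plateau_hours` are all illustrative assumptions.

```python
def hn_style_score(
    karma: float,
    age_hours: float,
    plateau_hours: float = 0.0,
    gravity: float = 1.8,
    offset: float = 2.0,
) -> float:
    """Hypothetical ranking score with an optional recency plateau.

    Posts younger than `plateau_hours` are all scored as if they were exactly
    `plateau_hours` old, so time of day matters less within that window.
    With plateau_hours=0 this reduces to a plain time-decay score.
    """
    # Clamp the age from below: the 0..N hour window becomes "equally recent".
    effective_age = max(age_hours, plateau_hours)
    return karma / (effective_age + offset) ** gravity


if __name__ == "__main__":
    for age in (1, 5, 6, 12):
        plain = hn_style_score(10, age)
        flat = hn_style_score(10, age, plateau_hours=6)
        print(f"age={age:>2}h  plain={plain:.3f}  with 6h plateau={flat:.3f}")
```

Under this sketch, two posts with equal karma posted one hour and five hours ago tie when N = 6, and normal decay resumes once a post is older than N hours. Choosing N would still depend on the site's real traffic cycle, as the comment notes.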
1NicholasKees
What about leaning into the word-of-mouth sharing instead, and support that with features? For example, being able to as effortlessly as possible recommend posts to people you know from within LW?
2habryka
Not crazy. I also think doing things that are a bit more social where you have ways to recommend (or disrecommend) a post with less anonymity attached, allowing us to propagate that information further, is not crazy, though I am worried about that incentivizing more groupthinking and weird social dynamics.

I have found the dialogues to be generally low-quality to read.

I think overall I've found dialogues pretty good, I've found them useful for understanding people's specific positions and getting people's takes on areas I don't know that well. 
My favorite one so far is AI Timelines, which I found useful for understanding the various pictures of how AI development will go in the near term. I liked How useful is mechanistic interpretability? and Speaking to Congressional staffers about AI risk for understanding people's takes on these areas.

Viliam

2721

AI content for specialists

There is a lot of AI content recently, and it is sometimes of the kind that requires specialized technical knowledge, which I (an ordinary software developer) do not have. Similarly, articles on decision theories are often written in a way that assumes a lot of background knowledge that I don't have. As a result there are many articles I don't even click on, and if I accidentally do, I just sigh and close them.

This is not necessarily a bad thing. As something develops, inferential distances increase. So maybe, as a community we are developing a new science, and I simply cannot keep up with it. -- Or maybe it is all crackpottery; I wouldn't know. (Would you? Are some of us upvoting content they are not sure about, just because they assume that it must be important? This could go horribly wrong.) Which is a bit of a problem for me, because now I can no longer recommend Less Wrong in good faith as a source of rational thinking. Not because I see obviously wrong things, but because there are many things where I have no idea whether they are right or wrong.

We had some AI content and decision theory here since the beginning. But those articles written back then by Eliezer were quite easy to understand, at least for me. For example, "How An Algorithm Feels From Inside" doesn't require anything beyond high-school knowledge. Compare it to "Hypothesis: gradient descent prefers general circuits". Probably something important, but I simply do not understand it.

Just like historically MIRI and CFAR split into two organizations, maybe Less Wrong should too.

Feeling of losing momentum

I miss the feeling that something important is happening right now (and I can be a part of it). Perhaps it was just an illusion, but in the first years of Less Wrong it felt like we were doing something important -- building the rationalist community, inventing the art of everyday rationality, with the prospect of raising the general sanity waterline.

It seems to me that we gave up on the sanity waterline first. The AI is near, we need to focus on the people who will make a difference (whom we could recruit for AI research), there is no time to care about the general population.

Although recently, this baton was taken over by the Rational Animations team!

Is the rationalist community still growing? Offline, I guess it depends on the country. In Bratislava, where I live, it seems that ~ no one cares about rationality. Or effective altruism. Or Astral Codex Ten. Having five people at a meetup is a big success. Nearby Vienna is doing better, but it is merely climbing back to pre-COVID levels, not growing. Perhaps it is better in some other parts of the world.

Online, new people are still coming. Good.

Also, big thanks to all people who keep this website running.

But still, it no longer feels to me like I am here to change the world. It is just another form of procrastination, albeit a very pleasant one. (Maybe because I do not understand the latest AI and decision theory articles; maybe all the exciting things are there.)

Etc.

Some dialogs were interesting, but most are meh.

My greatest personal pet peeve was solved: people no longer talk uncritically about Buddhism and meditation. (Instead of talking more critically they just stopped talking about it at all. Works for me, although I hoped for some rational conclusion.)

It is difficult for me to disentangle what happens in the rationalist community from what happens in my personal life. Since I have kids, I have less free time. If I had more free time, I would probably be recruiting for the local rationality (+adjacent) community, spending more time with other rationalists, maybe even writing some articles... so it is possible that my overall impression would be quite different.

(Probably forgot something; I may add some points later.)

Is the rationalist community still growing? Offline, I guess it depends on the country. In Bratislava, where I live, it seems that ~ no one cares about rationality. Or effective altruism. Or Astral Codex Ten. Having five people at a meetup is a big success. Nearby Vienna is doing better, but it is merely climbing back to pre-COVID levels, not growing. Perhaps it is better at some other parts of the world.

I think that starting things that are hard forks of the lesswrong memeplex might be beneficial to being able to grow. Raising the sanity waterline woul... (read more)

1Mo Putera
Please do. I've been mulling over related half-digested thoughts -- replacing the symbol / brand with the substance, etc.

GeneSmith

2517

I love LessWrong. I have better discussions here than anywhere else on the web.

I think I may have a slightly different experience with the site than the modal user because I am not very engaged in the alignment discourse.

I've found the discussions on the posts I've written to be of unusually high quality, especially the things I've written about fertility and polygenic embryo screening.

I concur with other comments that the ability to upvote a comment and separately agree/disagree with it is a great feature, which I use all the time.

My number one requested feature continues to be the ability to see a retention graph on the posts I've written, i.e. where do people get bored and stop reading? After technical accuracy my number one goal is to write something interesting and engaging, but I lack any kind of direct feedback mechanism to optimize my writing in that way.

My number one requested feature continues to be the ability to see a retention graph on the posts I've written, i.e. where do people get bored and stop reading? After technical accuracy my number one goal is to write something interesting and engaging, but I lack any kind of direct feedback mechanism to optimize my writing in that way.

Yeah, I've been wanting something like this for a while. It would require capturing and processing more data than we have historically. Also distinguishing between someone skimming up and down a post and actua... (read more)

4the gears to ascension
perhaps showing the user what data they're creating by incrementally marking the post as read as they scroll down it, and display that to the user?
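To make the retention-graph request concrete, here is a minimal sketch of how such a curve could be computed. It assumes the site records, for each reader, the deepest point of the post they reached as a fraction between 0 and 1; that data, the function name `retention_curve`, and the `buckets` parameter are all hypothetical and not anything LessWrong is known to collect or expose. It also does not address the problem raised above of telling skimming apart from actual reading.

```python
from typing import List


def retention_curve(max_depths: List[float], buckets: int = 10) -> List[float]:
    """Return the fraction of readers who reached each depth checkpoint.

    max_depths: one value per reader, the deepest point of the post they
    reached, as a fraction in [0, 1] (hypothetical data, assumed to exist).
    buckets: number of equal slices of the post; the result has
    buckets + 1 entries, for checkpoints 0, 1/buckets, ..., 1.
    """
    if not max_depths:
        # No readers recorded: retention is zero everywhere.
        return [0.0] * (buckets + 1)

    total = len(max_depths)
    curve = []
    for i in range(buckets + 1):
        checkpoint = i / buckets
        # Count readers whose deepest position is at or past this checkpoint.
        reached = sum(1 for depth in max_depths if depth >= checkpoint)
        curve.append(reached / total)
    return curve


if __name__ == "__main__":
    # Four hypothetical readers: two finished, one stopped halfway,
    # one stopped a quarter of the way in.
    print(retention_curve([0.25, 0.5, 1.0, 1.0], buckets=4))
```

A sharp drop between two adjacent checkpoints would point at the section where readers tend to give up, which is the feedback signal the request is after.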

RomanHauksson

225

(low confidence, low context, just an intuition)

I feel as though the LessWrong team should experiment with even more new features, treating the project of maintaining a platform for collective truth-seeking like a tech startup. The design space for such a platform is huge (especially as LLMs get better).

From my understanding, the strategy that startups use to navigate huge design spaces is “iterate on features quickly and observe objective measures of feedback”, which I suspect LessWrong should lean into more. That said, I imagine creating better truth-seeking infrastructure doesn’t have as good a feedback signal as “acquire more paying users” or “get another round of VC funding”.

Ruby90

This is basically what we do, capped by our team capacity. For most of the last ~2 years, we had ~4 people working full-time on LessWrong, plus shared stuff we get from the EA Forum team. Over the last few months, we reallocated people from elsewhere in the org and are now at ~6 people, though several are newer to working on code. So, a pretty small startup. Dialogues has been the big focus of late (plus behind-the-scenes performance optimizations and code infrastructure).

All that to say, we could do more with more money and people. If you know skilled developers willing to live in the Berkeley area, please let us know!

1gilch
Does GPT-4 or Copilot help with the coding? Have you tried? I'm a software developer, but it would have to be remote, and might depend on your stack.
7jacobjacob
I use Cursor, Copilot, sometimes GPT-4 in the chat, and also Hex.tech's built-in SQL shoggoth. I would say the combination of all those helps a huge amount, and I think it has been key in allowing me to go from pre-junior to junior dev in the last few months. (That is, from not being able to make any site changes without painstaking handholding, to leading and building a lot of the Dialogue matching feature and associated stuff. I also had a lot of help from teammates, but less in a "they need to carry things over the finish line for me" way, and more "I'm able to build features of this complexity, and they help out as collaborators".) PR review and advice from senior devs on the team have also been key, and much appreciated.
5Ruby
It does, quite a bit! Definitely speeds me up somewhere between 20% and 100% depending on task. And I think it's a bigger deal for those now working on code and who are newer to it.

Agreed! Cf. Proposal for improving the global online discourse through personalised comment ordering on all websites -- using LessWrong as the incubator for the first version of the proposed model would actually be critical.

Nathan Helm-Burger

2113

I feel a mix of pleased and frustrated. The main draw for me is AI safety discussion. I dislike the feeling of group-think around stuff, and I value the people who speak up against the group-think with contrary views (e.g. TurnTrout), who post high quality technical content, or well-researched and thought-out posts (e.g. Steven Byrnes). 

I feel frustrated that people don't always do a good job of voting comments up based on how valuable/coherent/high-effort the information content is, and then separately voting agree/disagree. I really like this feature, and I wish people gave it more respect. I am pleased that it does as well as it does though.

I like the new emojis and the new dialogues. I'm excited for the site designers to keep trying new (optional) stuff.

What I'd most like from the site is for it to split into two: one part even more focused on technical discussion of AI safety, and the other for rationality and philosophy stuff. And then I'd like the technical side to have features like Jupyter notebook-based posts for dynamic code demonstrations, and people presenting recent important papers that are not their own (e.g. from arXiv), for the sake of highlighting/summarizing/sparking-discussion. The weakness of the technical discussion here is, in my opinion, related to the lack of engagement with the wider academic community and empirical evidence.

Ultimately, I don't think it matters much what we do with the site in the longer term because I think things are about to go hockey stick singularity crazy. That's the bet I'm making anyway.

Yeah. The threshold for "okay, you can submit to alignmentforum" is way, way, way too high, and as a result, lesswrong.com is the actual alignmentforum. Attempts to insist otherwise without appropriately intense structural change will be met with lesswrong.com going right on being the alignmentforum.

Ok, slightly off topic, but I just had a wacky notion for how to break up groupthink as a social phenomenon. You know the cool thing from Audrey Tang's ideas, Polis? What if we did that, but we found 'thought groups' of LessWrong users based on the agreement voting? And then posts/comments which were popular across thought groups, instead of just intensely within one thought group, got more weight?

Niclas Kupper tried a LessWrong Polis to gather our opinions a while back. https://www.lesswrong.com/posts/fXxa35TgNpqruikwg/lesswrong-poll-on-agi 

So, something like the community notes algorithm?

5Nathan Helm-Burger
https://vitalik.eth.limo/general/2023/08/16/communitynotes.html Ah, as a non-Twitter user I hadn't known about this. Neat.
4Seth Herd
This is the formalization of the concept "left hand whuffy" from Charlie Stross's "down and out in the magic kingdom", 2003. When people who usually disagree with people like you actually agree with you or like what you've said, that's special and deserves attention. I've always wanted to see it implemented. I don't usually tweet but I'll have to look at this.
3Said Achmiz
Down and Out in the Magic Kingdom was by Cory Doctorow, not Stross.
2Seth Herd
Good catch. I'd genuinely misremembered. I lump the two together, but generally far prefer Stross as a storyteller, even though Doctorow's futurism is also first-rate, in a different dimension. I found the story in Down and Out to be Stross-quality. That sort of good idea for a social network improvement is definitely signature Doctorow, though.
7Ebenezer Dukakis
Another idea is to upweight posts if they're made by a person in thought group A, but upvoted by people in thought group B.
4jacobjacob
Yeah, I'm interested in features in this space! Another idea is to implement a similar algorithm to Twitter's Community Notes: identify comments that have gotten upvotes from people who usually disagree with each other, and highlight those.
4Roman Leventov
This idea is definitely simmering in many people's heads at the moment :)
3Nathan Young
How private are the LessWrong votes? Would you want to do it overall or blog by blog? Seems pretty doable.
3Nathan Helm-Burger
Currently, the information about who voted which way on what things is private to the individual who made the vote in question and the LW admins. So if doing this on LW votes, it'd need to be done in cooperation with the LW team.
2Nathan Helm-Burger
I'm pasting this here because it's the sort of thing I'd like to see. I'd like to see where I fall in it, and at least the anonymized position of others. Also, it'd be cool to track how I move over time. Movement over time should be expected unless we fall into the 'wrong sort of updateless decision theory' as jokingly described by TurnTrout (and term coined by Wei Dai). https://www.lesswrong.com/posts/j2W3zs7KTZXt2Wzah/how-do-you-feel-about-lesswrong-these-days-open-feedback?commentId=X7iBYqQzvEgsppcTb   
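The 'thought groups' idea discussed in this thread can be illustrated with a very small sketch. To be clear about what is assumed: the group labels are simply given as input here (in practice they would have to be inferred from agreement votes, which only the LW team can see), and the scoring rule, "a comment is only as good as its least favourable group thinks it is", is a deliberately crude stand-in. It is not the Community Notes algorithm, which is based on matrix factorization, and it is not how Polis works. The names `bridging_scores`, `Vote`, and `group_of` are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (user, comment_id, value) where value is +1 for agree and -1 for disagree.
Vote = Tuple[str, str, int]


def bridging_scores(votes: List[Vote], group_of: Dict[str, str]) -> Dict[str, float]:
    """Score each comment by its mean agreement in its least favourable group.

    group_of maps each user to a thought-group label (assumed to be known).
    Comments that have not been voted on by every group get 0.0, since there
    is no cross-group evidence for them yet.
    """
    # comment_id -> group label -> list of vote values from that group
    by_comment: Dict[str, Dict[str, List[int]]] = defaultdict(lambda: defaultdict(list))
    for user, comment_id, value in votes:
        group = group_of.get(user)
        if group is None:
            continue  # ignore voters with no assigned group
        by_comment[comment_id][group].append(value)

    all_groups = set(group_of.values())
    scores: Dict[str, float] = {}
    for comment_id, per_group in by_comment.items():
        if set(per_group) != all_groups:
            scores[comment_id] = 0.0
            continue
        # Popular within one group only -> low score; popular across all -> high.
        scores[comment_id] = min(sum(vals) / len(vals) for vals in per_group.values())
    return scores


if __name__ == "__main__":
    groups = {"a": "A", "b": "A", "c": "B", "d": "B"}
    example_votes = [
        ("a", "c1", 1), ("b", "c1", 1), ("c", "c1", -1), ("d", "c1", -1),
        ("a", "c3", 1), ("c", "c3", 1),
    ]
    print(bridging_scores(example_votes, groups))
```

Taking the minimum across groups means intense support from a single group cannot lift a comment; a weighted blend of within-group and cross-group agreement would be a gentler alternative.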

MondSemmel

1911

I still like the site, though I had to set the AI tag to -100 this year. One thing I wish was a bit different is that I've posted a whole bunch of LW-site-relevant feedback in comments (my natural inclination is to post comprehensive feedback on whatever content I interact with), and for a good fraction of them I've received no official reaction whatso