All of Ruby's Comments + Replies

I'm sympathetic to not wanting to live out your remaining years being miserable, and think doing so would indeed be a mistake. I also acknowledge that living a good life in the Bay can be harder than in other places, but I don't think it's impossible. I do think the challenges can be surmounted with effort and agency, and even then you might be worse off than elsewhere, but still, the stakes are high.

Lone Pine · 2h
The thing that's really a dealbreaker for me is the social scene. For some reason most people in America seem to hate me by default, with the Bay Area being particularly bad, and I could never figure out why given that other people with similar properties to me don't get socially abused all the time like I do. (Although I've definitely seen people that had it worse than I did.) If you could find me friends who won't treat me like shit, I might be interested (or at least, that might have worked a couple years ago; now I think I'm too accustomed to not being hated all the time and I don't think I could go back.) What I'm saying is, fix the damn social scene.

Some people have very good reasons for not wanting to move (family that needs them, tied to partners who are tied to local work, etc), and I'm not saying you're not one of them, but still...

...we're talking about the most consequential events in the history of human civilization (or Earth life, or perhaps the universe since its inception), but you won't get more involved in making them go well because it'd require living somewhere else?

I'm writing this not just for you, but for all the many people who want to help with Alignment/x-risk but seem pretty capped in how much they're willing to give up for that (not claiming that I couldn't give more) – I get a missing mood from many people about it.


If the alignment problem is the most important problem in history, shouldn't alignment-focused endeavors be more willing to hire contributors who can't/won't relocate?

After all, remote work is the easiest to implement that it's ever been in all of history.

Of course there needs to be some filtering out of candidates to ensure resources are devoted to the most promising individuals. But I really don't think that willingness to move correlates strongly enough with competence at solving alignment to warrant treating it like a dealbreaker.

Lone Pine · 12h
I lived in the Bay Area for a long time, and I was very unhappy there due to the social scene, high cost of living, difficulty getting around, and the homeless problem. I have every reason to believe that London would be just about as bad. If we're going to die, I'm not going to spend the last years of my life being miserable. Not worth it.

The monthly Open & Welcome thread is a good general place for questions, otherwise you could make a top level question post about it.

The Alignment Forum is supposed to be a very high signal-to-noise place for Alignment content, where researchers can trust that all content they read will be material they're interested in seeing (even at the expense of some false negatives).

Answer by Ruby · Mar 13, 2023


I'm afraid there isn't at this time, very unfortunately.

Curated. A parable explaining a probability lesson that many would benefit from – what's not to love? I like the format, I found the dialog/parable amusing rather than dry, and I think the point is valuable (and due to the format, memorable). I'll confess that I think this post will have me looking at blends of different forecasts more carefully, especially as regards actual decision-making (particularly regarding AI forecasts, which are feeling increasingly relevant to decision-making these days).

Ugh, is this your homework? Will approve this in case someone feels like answering, but maybe try GPT/Bing


The hard part to me now seems to be in crafting some kind of useful standard rather than one that in hindsight makes us go "well, that sure gave everyone a false sense of security".

I do like this post a lot, unfortunately it doesn't seem to resonate with many people.

I want to push back on anyone downvoting this because it's sexist, dehumanizing, and othering (rather than just being a bad model). I am sad if a model/analogy has those negative effects, but supposing the model/analogy in fact held and was informative, I think we should be able to discuss it. And even the possibility that something in the realm of gender relations has relevant lessons for Alignment seems like something we should be able to discuss.

Or alternatively stated, I want to push for Decoupling norms here.

In addition to what gears said, I think the sexist othering etc is not actually critical to the analogy, which is kind of the problem. "Figuring out the motives of people who kind of share goals with you but also have reasons to lie" is a pretty universal human experience. Adding some gender evopsych on top is just annoying (and prevents thinking about many of the more interesting ways in which this dynamic can play out).
the gears to ascension · 1mo
In contexts where the model will not be used to make decisions about humans (which are rare!), sexist is when something is a bad model in the direction of sexism. There are real differences; accurate representations of them are not sexism. Those differences are quite small, and are often misunderstood as large in ways that produce nonsensical models. As @eukaryote wrote, the specific evopsych proposal under consideration here is privileging a hypothesis. Alternatively stated, you cannot convince me to decouple when there are real mechanistic reasons that the coupling exists, because then you're simply asking me to suspend my epistemic evaluation of the model. Of course, I also simply don't believe in decoupling norms in general because reductionism doesn't work to find the true mechanisms of reality in contexts where the mechanisms have significant amounts of complexity which is computationally intractable to discover by simulation, and therefore for practical purposes only exist as shapes in the macroscopic structure of worldstate; and decoupling/reductionism based models reliably mismodel those sorts of complex systems. One needs instead to figure out how to abstract over the coupling.

Curated. This post feels timely and warranted given the current climate. I think we, in our community, were already at some risk of throwing out our minds a decade ago, but the risk was less when it was easy to think timelines were 30-60 years. That allowed more time for play. Now, as there's so much evidence of imminence and there are more people doing more things, AI x-risk isn't a side interest for many but a full-time occupation – yes, I think we're almost colluding in creating a culture that doesn't allow time for play. I like that this post makes the case fo... (read more)

People should be thinking about:

  • If you truly get to choose your own work, is your judgment on what will help with alignment good? (this might be true for senior hires like evhub, unsure about others getting to choose)
  • If you are joining existing alignment teams, is their work actually good for reducing AI x-risk vs the opposite? For example, both OpenAI and Anthropic do some variant of RLHF, which is pretty controversial – as a prospective hire, have you formed a solid opinion on this question vs relying on the convenient answer that at least some people regar
... (read more)

Not saying you intended this, but I worry about people thinking "it's an alignment role and therefore good" when considering joining companies that are pushing the state of the art, and not thinking about it much harder than that.

What else should people be thinking about? You'd want to be sure that you'll, in fact, be allowed to work on alignment. But what other hidden downsides are there?

Curated. While mining biographies for patterns feels more dicey than a well-constructed study, I think there's real evidence here worth paying attention to. At the least, the patterns identified here seem worth promoting as hypotheses for ingredients of greatness. One thing that strikes me is how different these childhoods sound from the conventional ones in the modern world that consist of many hours of large-classroom schooling. I am pretty certain that had I been tutored 1-1 for years, I'd have ended up knowing vastly more than I do, and ended up vastly more capable.

I anticipate having my own child soon, and wonder how many elements here I'll be able to offer them.

Quick answer: I think might be able to help you with this, otherwise studying optimization and basic AI might clear this up.

Curated. This post is very cool. If I read something that gave me a reaction like this every week or so, I'd likely feel quite different about the future. I'll ride off Eliezer's comment for describing what's good about it:

Although I haven't had a chance to perform due diligence on various aspects of this work, or the people doing it, or perform a deep dive comparing this work to the current state of the whole field or the most advanced work on LLM exploitation being done elsewhere,

My current sense is that this work indicates promising people doing promisi

... (read more)

I want to register a weak but nonzero prediction that Anthropic's interpretability publication A Mathematical Framework for Transformer Circuits will turn out to lead to large capabilities gains, and that in hindsight its publication will be regarded as a rather bad move.

Something like we’ll have capabilities-advancing papers citing it and using its framework to justify architecture improvements.

Not the same paper, but related: []
the gears to ascension · 1mo
Agreed, and I don't think this is bad, nor that they did anything but become the people to implement what the zeitgeist demanded. It was the obvious next step, if they hadn't done it, someone else who cared less about trying to use it to make systems actually do what humans want would have done it. So the question is, are they going to release their work for others to use, or just hoard it until someone less scrupulous releases their models? It's looking like they're trying to keep it "in the family" so only corporations can use it. Kinda concerning. If human understandability hadn't happened, the next step might have been entirely automated sparsification, and those don't necessarily produce anything humans can use to understand. Distillation into understandable models is an extremely powerful trajectory.

Hey, sorry for not saying something sooner. As Screwtape says below, the LessWrong team was aware of this plan to make a survey. Since realistically we weren't going to run one ourselves, didn't make sense to get in the way of someone else doing one (and the questions seem reasonable).

I think putting something like "Unofficial" in the title and prominently in the description would be good. Edit: I think if this post for the census got like 100 karma, that'd give it enough legitimacy from the community to be official even if the LW mod team wasn't endorsing it.

I'd say the status is that this is not done with the collaboration/support of the LessWrong team, but neither do we wish to block it.

"Unofficial" is now in the post title, post description, and the title of the Google form. Let me know if there are other changes you or the rest of the LessWrong team would like. The extent of support that was particularly useful was having the visibility the front page brings, which it seems this now has. Thank you to whoever did that.

I agree they won't be enough in the long run. I've previously discussed with the team your suggestion for letting everyone post on the Alignment Forum; it doesn't yet seem like the right strategy, but we'll see. At least for now, a little indication via the tag defaults seems better than nothing.

Without trying to dissect this post carefully, I think there's something off here that'd be addressed by a more rigorous treatment of basic logic.

Bryan Frances · 2mo
Thanks for the comment. I don't think there is any mistake in basic logic. I could formalize all of it in elementary symbolic logic. It would be first-order predicate logic, but still pretty basic.

EDIT: Oops, in a tired state I got muddled between this AMA post and the original introduction of dath ilan made in an April Fool's post in 2014 (golly, that's a while back)

When this was published, I had little idea of how ongoing a concept dath ilan would become in my world. I think there's value both in the further explorations of this (e.g. Mad Investor Chaos glowfic and other glowfics that illustrate a lot of rationality and better societal function than Earth), but also in just the underlying concept of "what would have produced you as the median pers... (read more)

I like this post for reinforcing a point that I consider important about intellectual progress, and for pushing against a failure mode of the Sequences-style rationalists.

As far as I can tell, intellectual progress is made bit by bit, with later work building on earlier work. Francis Bacon gets credit for a landmark evolution of the scientific method, but it didn't spring from nowhere; he was building on ideas that had built on ideas, etc.

This says the same is true for our flavor of rationality. It's built on many things, and not just probability theory.

The f... (read more)

I think I've known about happy/cheerful prices for a long time, (from before this post) and yet I find myself using the concept only once or twice a year, and not in a particularly important way. 

This was despite it seeming like a very valuable concept.

I think this is likely because people's happy prices can be quite high (too high to be acceptable) and yet it's worth it to still trade at less than this.

What I do think is valuable, and what this post teaches (even if unintentionally), is that you aren't magically tied to the "market price" or "fair price" – you can just negotiate for what you want.

I was aware of this post and I think read it in 2021, but kind of bounced off it for the dumb reason that "split and commit" sounds approximately synonymous with "disagree and commit", though Duncan is using it in a very different way.

In fact, the concept means something pretty damn useful, is my guess, and I can begin to see cases where I wish I were practicing this more. I intend to start. I might need to invent a synonym to make it feel less like an overloaded term. Or disagree and commit on matters of naming things :P

Curated. If I've understood correctly, "staring into the abyss" is an evocative way of saying "consider the uncomfortable and/or inconvenient." And I think the ability to do this is foundational to rationality. For most people, considering that you are wrong requires considering something uncomfortable. Same for considering that you have room to improve. And the willingness to consider the uncomfortable often comes from having a clear map-territory distinction. You consider the uncomfortable because if it's real, it's real even if you don't consider it.

This ... (read more)

This announcement is a weird case for the LessWrong frontpage/personal distinction. I'm frontpaging it despite its being an announcement because I expect the content of this podcast to just be pretty good material, the kind of world-modeling stuff I'd like to see on LessWrong.

For what it's worth, I consider the purpose of the episodes to be modelling the part of the world which is other people's models of things, which is a non-central case of "world-modelling".

Indeed ;) Still, something about the conjugating feels off. But who knows – I also thought "Lightcone" sounded too much like "ice cream cone" to take off, but well...

I believe there are posts answering some of this in the Relationships tag.

Curated. Like Kaj_Sotala said, this concept feels intuitively natural (and useful), and it's one that I have thought about without having a name for it (or a very lucid explanation!). It seems right that many sentences are a bundling of lossy compression + checksum + illusion of transparency. Alas. I don't really like the particular word chosen (and one other LessWrong mod said the same); it would be a shame if it didn't catch on for that reason. (I also liked the concept of "metacog" that Duncan defined elsewhere, but there too I feel dissatisfied with the name, like I don't expect to use the concept with others till I've thought of another name.) Still, the concept(s) is good, and it's a benefit to society that you wrote it up so well!

You probably already know this, but: the "metacog" term comes from "metacognitive blindspot," i.e. when your metacognition is impaired in a way that prevents you from noticing, via metacognition, that you are impaired.

Hi louis030195, I appreciate your attempt to contribute! I want to flag that this post feels relatively low quality for LW (also your other one), hence the karma score. The moderation team (I'm on it) is currently uncertain between letting karma handle not-great posts vs intervening and restricting how much users can post. I realize that might be confusing; I just mean to say that finding a way to increase quality would be good.

Good flag. I was hesitant to include it but narrowly decided it was worth putting some credibility behind "I understand LW ideas quite well and this is my position which may or may not differ from yours."

I also second the advice others are giving that having the right kind of attitude to and relationship with your son is important.

Answer by Ruby · Dec 17, 2022

Hi concerned_dad,

As Svyatoslav suggests, the most effective way to approach this is to jointly explore with your son what is actually true about various drugs and their benefits and costs, and why they're treated the way they are. These are not necessarily easy questions to answer and could take some effortful research to actually get to the bottom of things. (I particularly like the advice here.) I doubt 3hr is enough.

To speculate, your son might currently be very excited by the discovery that yes, as it often might have seemed, the broader world has advi... (read more)

Johannes C. Mayer · 3mo
Minor point: Having fun is not the only motivation one can have. One could end up doing a drug, even if they expect to have a bad time, but think it is worth it in the long run. I am talking especially about psychedelics.
Thanks! I think one of the problems with his argument is that the anecdata he's collected from blogs and forums is weighted too highly in comparison to actual studies. He says he skimmed some papers that came up in Elicit searches but I don't know if that's enough, as they don't seem to have left him with a sufficient idea of the risks (or he's just really risk tolerant, as teenagers tend to be).

He also argues that everyone is different and the only way to find out if certain drugs will be beneficial or harmful is by experimenting on yourself, so "why keep reading studies" past a point, but this seems flawed especially without a concrete idea of the possible benefits.

He's probably seen this post by now, and I'm sure it'll result in a conversation that leads to a stronger conclusion, so I appreciate the contributions from everyone!

I'm the team lead for LessWrong and run/build/moderate the site.

I'm flagging this as concerning to have been considered relevant in this context (for generalized authority halo effect reasons; so it's about this coming to mind at all, not about having been mentioned in the comment).


You are probably subscribed to curated emails (checkbox at signup), you can turn those off in your account settings if you wish.

Hi, I see this is your first comment on LessWrong and it seems you came to join discussion about ChatGPT. This comment isn't quite in the preferred style for LessWrong, I recommend you read more of the comment threads to get a feel for discussion here. Cheers!

Thank you very much. Will read it.

Sorry, short on time, can't dig up links. Take a look at Inadequate Equilibria.

I think in philosophy it might be less the case than in any empirical field. Experts in biology have perhaps run experiments and seen the results, etc., whereas philosophy is arguments on the page that could easily be very detached from reality.

And "more time spent" has some value, but not that much. There are people who've spent 10x more time driving a car than me, but are much worse because they weren't practicing and training the way I was. And more relevantly, you might say ... (read more)

Well, Chalmers has studied maths. The fact that someone is currently employed as a philosopher doesn't tell you much about their background, or side interests. Trust, of course, is irrelevant. You should consider the arguments. That would include the many untestable philosophical claims in the Sequences, of course.

I haven't read your post due to its extreme length, but to say something in response to your opening – I think much content on LW addresses the question of confidence contra putative experts in a field, and high confidence often seems warranted. The most notable recent case is LW being ahead of the curve on Covid, but also see our latest curated post.

Could you link me to some of those posts? I wouldn't agree with the heuristic 'never disagree with experts', but I'd generally -- particularly in an area like philosophy -- be wary of being super confident in a view that's extremely controversial among the people who have most seriously studied it.

Curated. The question of inside view vs outside view and expert deference vs own models has been debated before on LessWrong (and EA Forum), but this post does a superb job of making the case for "use your own models more, trust your own information, be willing to go against the crowd and experts". It articulates the case clearly and crisply, in a way that I think is possibly more compelling than other sources.

A few points I particularly like:

The identification of selection bias on evidence in different directions:

The problem is that the bad consequenc

... (read more)

Curated! To ramble a bit on why: I love how this post makes me feel like I have a good sense of what John has been up to and been thinking about, and why; the insight of asking "how would an AI ensure a child AI is aligned with it?" feels substantive; and the optimism is nice and doesn't seem entirely foolhardy. Perhaps most significantly, it feels to me like a very big deal if alignment is moving towards something paradigmatic (shared models and assumptions and questions and methods). I had thought that was something we weren't going to get, but John does point ... (read more)

Thanks for pointing that out! The post was imported, and unfortunately I don't think there's any easy or quick fix for this.

Maybe we'll write up an FAQ on the topic, not sure, but I still wouldn't worry.

Yes please, I think that would be quite helpful. I'm no longer that scared of it but still have some background anxiety sometimes flaring up. I feel like an FAQ or at least some form of "official" explanation from knowledgeable people of why it's not a big deal would help a lot. :)

Hey, I wouldn't worry about it. I don't think anything productive will come of that.

Can you go into more detail?

Curated. The ELK paper/problem/challenge last year was a significant piece of work for our alignment community, and my guess is hundreds of hours and maybe hundreds of thousands of dollars went into incentivizing solutions. Though prizes were awarded, I'm not aware that any particular proposed solution was deemed incredibly promising (or if it was, it wasn't something new), so I find it interesting to see what Paul and ARC have generated as they stick with roughly the same problem.

Speaking the truth is not something to be traded away, however costly it may be.

Stated without clarification, this is not something I'd say categorically and I suspect neither would you. As the classic example goes, you would probably lie to the Space Gestapo to prevent them from killing your mother.

Ben Pace · 4mo
Yeah, I spoke too broadly there. I've replaced it with a different and more specific claim for now: Like, in this situation, the idea is trading away the ability for AI safety folks to speak the truth in order to be allowed just to openly talk to some people in power. Such a weak reward for such a great loss.

Curated. I like the central thesis of this post, but a further point I like about it is that it takes the conversation beyond a simple binary of "are we doomed or not?" and "how doomed are we?" to a more interesting discussion of possible outcomes, their likelihoods, and the gears behind them. And I think that's epistemically healthy. I think it puts things into a mode of "make predictions for reasons" over "argue for a simplified position". Plus, this kind of attention to values and their origins is one thing I think hasn't gotten as much airtime on LessWrong and is important, both in remembering what we're fighting for (in very broad terms) and how we need to fight (i.e. what's ok to build).

This is so neat! I (32M) initially didn't look at this post (my brain had auto-completed it to "I made an epub" or something), but I'm familiar with this format and find the whole thing very cute. (I don't know how many people I expect to watch these, but I'm amused they exist. Kudos.)

I only watched one, but I would go for Minecraft or whatever the game with the cars is: less sudden and jerky movement.

Nice, thanks for the pointer. My overall guess, after surfing through / skimming the philosophy literature on this for many hours, is that you can probably find all the core ideas of this post somewhere in it, but it's pretty frustrating – scattered in many places and diluted by things which are more confused.

Curated. I've got to hand it to this post for raw unadulterated expression of pure Ravenclaw curiosity at how the world (and we ourselves) work. It is morbid and it's perhaps fortunate the images are broken, but I'm just enjoying how much the author is reveling in the knowledge and experience here. 

I like the generalized lesson here of GO LOOK AT THE WORLD, it's right there.

I don't know that I have the stomach to do this myself, but I'm glad people are!

This point does not work literally as stated, and is vastly too underspecified to be useful if not taken as 100%.
