This post is a not-so-secret analogy for the AI Alignment problem. Via a fictional dialog, Eliezer explores and responds to common questions about the Rocket Alignment Problem as approached by the Mathematics of Intentional Rocketry Institute.

MIRI researchers will tell you they're worried that "right now, nobody can tell you how to point your rocket’s nose such that it goes to the moon, nor indeed any prespecified celestial destination."

Are people in rich countries happier on average than people in poor countries? (According to GPT-4, the academic consensus is that they are, but I'm not sure it's representing it correctly.) If so, why do suicide rates increase (or is that a false positive)? Does the mean of the distribution go up while the tails don't or something?
I think that people who work on AI alignment (including me) have generally not put enough thought into the question of whether a world where we build an aligned AI is better by their values than a world where we build an unaligned AI. I'd be interested in hearing people's answers to this question. Or, if you want more specific questions:
* By your values, do you think a misaligned AI creates a world that "rounds to zero", or still has substantial positive value?
* A common story for why aligned AI goes well goes something like: "If we (i.e. humanity) align AI, we can and will use it to figure out what we should use it for, and then we will use it in that way." To what extent is aligned AI going well contingent on something like this happening, and how likely do you think it is to happen? Why?
* To what extent is your belief that aligned AI would go well contingent on some sort of assumption like: my idealized values are the same as the idealized values of the people or coalition who will control the aligned AI?
* Do you care about AI welfare? Does your answer depend on whether the AI is aligned? If we build an aligned AI, how likely is it that we will create a world that treats AI welfare as an important consideration? What if we build a misaligned AI?
* Do you think that, to a first approximation, most of the possible value of the future happens in worlds that are optimized for something that resembles your current or idealized values? How bad is it to mostly sacrifice each of these? (What if the future world's values are similar to yours, but it is only kinda effectual at pursuing them? What if the world is optimized for something that's only slightly correlated with your values?) How likely are these various options under an aligned AI future vs. an unaligned AI future?
Roman Mazurenko is dead again. The first resurrected person, Roman lived as a chatbot (2016-2024) created based on his conversations with his fiancée. You might even have been able to download him as an app. But not any more. His fiancée married again and her startup pivoted from resurrection help to AI girlfriends and psychological consulting. It looks like they quietly removed the Roman Mazurenko app from public access. It is an especial pity that his digital twin lived less than his biological original, who died at 32. Especially now, when we have much more powerful instruments for creating semi-uploads based on LLMs with large prompt windows.
The cost of goods has the same units as the cost of shipping: $/kg. Comparing the two lets you understand how the economy works, e.g. why construction material sourcing and drink bottling have to be local.
* An iPhone costs $4,600/kg, about the same as SpaceX charges to launch it to orbit. [1]
* Beef is $11/kg, about the same as two 75kg people taking a 138km Uber ride costing $3/km. [6]
* Strawberries cost $2-4/kg, about the same as flying them to Antarctica. [2]
* Rice and crude oil are ~$0.60/kg, about the same as the $0.72 for shipping them 5000km across the US via truck. [3,4] Palm oil, soybean oil, and steel are around this price range, with wheat being cheaper. [3]
* Coal and iron ore are $0.10/kg, about the cost of shipping them the 10,000 km from Shanghai to LA via international sea freight. The shipping cost is actually lower because bulk carriers can be used rather than container ships.
* Water is very cheap, with tap water $0.002/kg in NYC. But sea freight is also very cheap, so you can ship it 200 km before equaling the cost of the water. With SF prices and a dedicated tanker, I would guess you can get close to 1000 km.
[1] iPhone is $4,600/kg, large launches sell for $3,500/kg, and rideshares for small satellites $6,000/kg. Geostationary orbit is more expensive, so it's okay for them to cost more than an iPhone per kg, but Starlink has to be cheaper.
[2] Can't find numbers but this cost $1.05/kg in 1996.
[3] [4] [5]
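The water and rice comparisons above come down to one division: goods price per kg over freight cost per kg per km. Here is a minimal sketch of that arithmetic, assuming freight cost scales linearly with distance (a simplification not stated in the post). The function and constant names are made up for illustration; the dollar figures are the ones quoted above.

```python
def break_even_distance_km(goods_price_per_kg, freight_rate_per_kg_km):
    """Distance at which shipping cost equals the price of the goods themselves."""
    return goods_price_per_kg / freight_rate_per_kg_km


# Per-km rates implied by the quoted figures (assumes cost is linear in distance).
SEA_RATE = 0.10 / 10_000   # $0.10/kg for ~10,000 km, Shanghai to LA
TRUCK_RATE = 0.72 / 5_000  # $0.72/kg for ~5,000 km across the US

water_by_sea_km = break_even_distance_km(0.002, SEA_RATE)   # NYC tap water
rice_by_truck_km = break_even_distance_km(0.60, TRUCK_RATE)  # rice or crude oil

print(f"water by sea: {water_by_sea_km:.0f} km")
print(f"rice by truck: {rice_by_truck_km:.0f} km")
```

Under these assumptions the water figure comes out at about 200 km, matching the post, and rice by truck breaks even a bit over 4,000 km.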
Check my math: how does Enovid compare to humming? Nitric oxide is an antimicrobial and immune booster. Normal nasal nitric oxide is 0.14ppm for women and 0.18ppm for men (sinus levels are 100x higher).… Enovid is a nasal spray that produces NO. I had the damndest time quantifying Enovid, but this trial registration says 0.11ppm NO/hour. They deliver every 8h and I think that dose is amortized, so the true dose is 0.88. But maybe it's more complicated. I've got an email out to the PI but am not hopeful about a response… So Enovid increases nasal NO levels somewhere between 75% and 600% compared to baseline, which is not shabby. Except humming increases nasal NO levels by 1500-2000%.… Enovid stings and humming doesn't, so it seems like Enovid should have the larger dose. But the spray doesn't contain NO itself, only compounds that react to form NO. Maybe that's where the sting comes from? Cystic fibrosis and burn patients are sometimes given stratospheric levels of NO for hours or days; if the burn from Enovid came from the NO itself then those patients would be in agony. I'm not finding any data on humming and respiratory infections. Google Scholar gives me information on CF and COPD, @Elicit brought me a bunch of studies about honey. With better keywords, Google Scholar brought me a bunch of descriptions of yogic breathing with no empirical backing. There are some very circumstantial studies on illness in mouth breathers vs. nasal, but that design has too many confounders for me to take seriously. Where I'm most likely wrong:
* misinterpreted the dosage in the RCT
* dosage in RCT is lower than in Enovid
* Enovid's dose per spray is 0.5ml, so pretty close to the new study. But it recommends two sprays per nostril, so real dose is 2x that. Which is still not quite as powerful as a single hum.
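Since the take asks for a math check, here is a rough sketch of the percentage calculation using only the figures quoted above. It follows the post's own simplification of comparing the trial's ppm/hour figure directly against the baseline ppm, which may not be the right dosage model. The variable and function names are made up for illustration.

```python
# Figures quoted in the post.
BASELINE_PPM = {"women": 0.14, "men": 0.18}
TRIAL_PPM_PER_HOUR = 0.11
HOURS_BETWEEN_DOSES = 8


def percent_increase(added_ppm, baseline_ppm):
    """Added NO expressed as a percentage of the baseline nasal level."""
    return 100 * added_ppm / baseline_ppm


# Low estimate: the hourly figure is the whole dose.
# High estimate: the hourly figure is amortized over the 8h dosing interval.
low_dose = TRIAL_PPM_PER_HOUR
high_dose = TRIAL_PPM_PER_HOUR * HOURS_BETWEEN_DOSES


def enovid_range():
    """Smallest and largest percent increase across both doses and baselines."""
    increases = [
        percent_increase(dose, baseline)
        for dose in (low_dose, high_dose)
        for baseline in BASELINE_PPM.values()
    ]
    return min(increases), max(increases)


lowest, highest = enovid_range()
print(f"Enovid: {lowest:.0f}% to {highest:.0f}% above baseline")
```

On these inputs the range is roughly 61% to 629%, in the same ballpark as the 75% to 600% quoted above and still well short of the 1500-2000% attributed to humming.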

Popular Comments

Recent Discussion

U.S. Secretary of Commerce Gina Raimondo announced today additional members of the executive leadership team of the U.S. AI Safety Institute (AISI), which is housed at the National Institute of Standards and Technology (NIST). Raimondo named Paul Christiano as Head of AI Safety, Adam Russell as Chief Vision Officer, Mara Campbell as Acting Chief Operating Officer and Chief of Staff, Rob Reich as Senior Advisor, and Mark Latonero as Head of International Engagement. They will join AISI Director Elizabeth Kelly and Chief Technology Officer Elham Tabassi, who were announced in February. The AISI was established within NIST at the direction of President Biden, including to support the responsibilities assigned to the Department of Commerce under the President’s landmark Executive Order.

Paul Christiano, Head of AI Safety, will design

But what? Should we insist that the entire time someone's inside a BSL-4 lab, we have a second person who is an expert in biosafety visually monitoring them to ensure they don't make mistakes? Or should their air supply not use filters and completely safe PAPRs, and feed them outside air through a tube that restricts their ability to move around instead? Or do you have some new idea that isn't just a ban with more words? Sure, list-based approaches are insufficient, but they have relatively little to do with biosafety levels of labs, they have to do with risk groups, which are distinct, but often conflated. (So Ebola or Smallpox isn't a "BSL-4" pathogen, because there is no such thing.) That ban didn't go far enough, since it only applied to 3 pathogen types, and wouldn't have banned what Wuhan was doing with novel viruses, since that wasn't working with SARS or MERS, it was working with other species of virus. So sure, we could enforce a broader version of that ban, but getting a good definition that's both extensive enough to prevent dangerous work and that doesn't ban obviously useful research is very hard.
Having written extensively about it, I promise you I'm aware. But please, tell me more about how this supports the original claim which I have been disagreeing with, that this class of incidents was or is the primary concern of the EA biosecurity community, the one that led to it being a cause area.

I agree there are other problems the EA biosecurity community focuses on, but surely lab escapes are one of those problems, and part of the reason we need biosecurity measures? In any case, this disagreement seems beside the main point that I took Adam to be making, namely that the track record for defining appropriate units of risk for poorly understood, high attack surface domains is quite bad (as with BSL). This still seems true to me.

The history of science has tons of examples of the same thing being discovered multiple times independently; Wikipedia has a whole list of examples here. If your goal in studying the history of science is to extract the predictable/overdetermined component of humanity's trajectory, then it makes sense to focus on such examples.

But if your goal is to achieve high counterfactual impact in your own research, then you should probably draw inspiration from the opposite: "singular" discoveries, i.e. discoveries which nobody else was anywhere close to figuring out. After all, if someone else would have figured it out shortly after anyways, then the discovery probably wasn't very counterfactually impactful.

Alas, nobody seems to have made a list of highly counterfactual scientific discoveries, to complement Wikipedia's list of multiple discoveries.


The RLCT = first-order term for in-distribution generalization error

Clarification: The 'derivation' for how the RLCT predicts generalization error IIRC goes through the same flavour of argument as the one the derivation of the vanilla Bayesian Information Criterion uses. I don't like this derivation very much. See e.g. this one on Wikipedia. 

So what it's actually showing is just that:

  1. If you've got a class of different hypotheses , containing many individual hypotheses  .
  2. And you've got a prior ahead of time that s
...
What's more likely: You being wrong about the obviousness of the sphere Earth theory to sailors, or the entire written record (which included information from people who had extensive access to the sea) of two thousand years of Chinese history and astronomy somehow omitting the spherical Earth theory? Not to speak of other pre-Hellenistic seafaring cultures which also lack records of having discovered the sphere Earth theory.
Lucius Bushnaq, 3h:
It's measuring the volume of points in parameter space with loss <ϵ when ϵ is infinitesimal.  This is slightly tricky because it doesn't restrict itself to bounded parameter spaces,[1] but you can fix it with a technicality by considering how the volume scales with ϵ instead. In real networks trained with finite amounts of data, you care about the case where ϵ is small but finite, so this is ultimately inferior to just measuring how many configurations of floating point numbers get loss <ϵ, if you can manage that. I still think SLT has some neat insights that helped me deconfuse myself about networks. For example, like lots of people, I used to think you could maybe estimate the volume of basins with loss <ϵ using just the eigenvalues of the Hessian. You can't. At least not in general.    1. ^ Like the floating point numbers in a real network, which can only get so large. A prior of finite width over the parameters also effectively bounds the space
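The point about volume scaling and the Hessian can be seen in a one-parameter toy case. The sketch below is an illustration, not something from the comment itself: both toy losses and all function names are hypothetical. It measures how the volume of the region with loss < ϵ shrinks as ϵ shrinks, and fits the exponent of that scaling.

```python
import math


def quadratic_loss(w):
    return w ** 2  # second derivative at the minimum is 2 (non-degenerate)


def quartic_loss(w):
    return w ** 4  # second derivative at the minimum is 0 (degenerate)


def volume_below(loss, eps, n_points=200_001):
    """Approximate the length of {w in [-1, 1] : loss(w) < eps} on a uniform grid."""
    step = 2.0 / (n_points - 1)
    count = sum(1 for i in range(n_points) if loss(-1.0 + i * step) < eps)
    return count * step


def scaling_exponent(loss, eps_small=1e-4, eps_large=1e-2):
    """Slope of log(volume) against log(eps), i.e. volume ~ eps ** exponent."""
    v_small = volume_below(loss, eps_small)
    v_large = volume_below(loss, eps_large)
    return (math.log(v_large) - math.log(v_small)) / (
        math.log(eps_large) - math.log(eps_small)
    )


print("quadratic exponent:", round(scaling_exponent(quadratic_loss), 3))
print("quartic exponent:", round(scaling_exponent(quartic_loss), 3))
```

The quadratic loss gives an exponent near 1/2 and the quartic one near 1/4. The quartic's Hessian at the minimum is zero, so a Hessian-eigenvalue estimate says nothing useful about its low-loss volume, while the scaling exponent still distinguishes the two cases.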

Post for a somewhat more general audience than the modal LessWrong reader, but gets at my actual thoughts on the topic.

In 2018 OpenAI defeated the world champions of Dota 2, a major esports game. This was hot on the heels of DeepMind’s AlphaGo performance against Lee Sedol in 2016, achieving superhuman Go performance way before anyone thought that might happen. AI benchmarks were being cleared at a pace which felt breathtaking at the time, papers were proudly published, and ML tools like TensorFlow (released in 2015) were coming online. To people already interested in AI, it was an exciting era. To everyone else, the world was unchanged.

Now Saturday Night Live sketches use sober discussions of AI risk as the backdrop for their actual jokes, there are hundreds...

I don't believe that data is limiting because the finite data argument only applies to pretraining. Models can do self-critique or be objectively rated on their ability to perform tasks, and trained via RL. This is how humans learn, so it is possible to be very sample-efficient, and currently a small proportion of training compute is RL.

If the majority of training compute and data are outcome-based RL, it is not clear that the "Playing human roles is pretty human" section holds, because the system is not primarily trained to play human roles.

This comes from a podcast called 18Forty, whose main demographic is Orthodox Jews. Eliezer's sister (Hannah) came on and talked about her Sheva Brachos, which is essentially the marriage ceremony in Orthodox Judaism. People here have likely not seen it, and I thought it was quite funny, so here it is:

David Bashevkin:

So I want to shift now and I want to talk about something that full disclosure, we recorded this once before and you had major hesitation for obvious reasons. It’s very sensitive what we’re going to talk about right now, but really for something much broader, not just because it’s a sensitive personal subject, but I think your hesitation has to do with what does this have to do with the subject at hand?...

From the title I expected this to be embarrassing for Eliezer, but that was actually extremely sweet, and good advice!


The next monthly discussion meetup is Saturday, May 4 @ 2 PM (see below for location etc.). For this meetup, we’ll be discussing the relationship between social media and mental health, with a focus on arguments by psychologist Jonathan Haidt (particularly from his new book) and some criticisms of his interpretation of the evidence. There are readings, a podcast, and a video below, BUT feel free to come whether or not you’ve reviewed any of it, as always. (NOTE: The June topic/discussion meetup will be moved to Saturday, June 8 @ 2 PM)



A friend has spent the last three years hounding me about seed oils. Every time I thought I was safe, he’d wait a couple months and renew his attack:

“When are you going to write about seed oils?”

“Did you know that seed oils are why there’s so much {obesity, heart disease, diabetes, inflammation, cancer, dementia}?”

“Why did you write about {meth, the death penalty, consciousness, nukes, ethylene, abortion, AI, aliens, colonoscopies, Tunnel Man, Bourdieu, Assange} when you could have written about seed oils?”

“Isn’t it time to quit your silly navel-gazing and use your weird obsessive personality to make a dent in the world—by writing about seed oils?”

He’d often send screenshots of people reminding each other that Corn Oil is Murder and that it’s critical that we overturn our lives...

i'm sorry not to be engaging with the content of the post here; hopefully others have that covered. but i just wanna say, man this is so well written! at the sentence and paragraph level especially, i find it inspiring. it makes me wanna write more like i'm drunk and dgaf, though i doubt that exact thing would actually suffice to allow me to hit a similar stylistic target.

(the rest of this comment is gonna be largely for me and my own development, but maybe you'll like reading it anyway.)

i think you do a bunch of stuff that current me is too chicken to try...

I would consider most bread sold in stores to be processed or ultra processed and I think that's a pretty standard view but it's true there might be some confusion. I would consider all of those to be processed and unhealthy and I think that's a pretty standard view, but fair enough if there's some confusion around those things. I guess my view is that it's mostly not hogwash? The least healthy things are clearly and broadly much more processed than the healthiest things.
I typically consume my greens with ground flax seeds in a smoothie. I feel very confident that adding refined oil to vegetables shouldn't be considered healthy, in the sense that the opportunity cost of 1 tablespoon of olive oil is 120 calories, which is over a pound of spinach for example. Certainly it's difficult to eat that much spinach and it's probably unwise, but I just say that to illustrate that you can get a lot more nutrition from 120 calories than the oil will be adding, even if it makes the greens more bioavailable. That said, "healthy" is a complicated concept. If adding some oil to greens helps someone eat greens they otherwise wouldn't eat, for example, that's great.
Raw spinach in particular also has high levels of oxalic acid, which can interfere with the absorption of other nutrients, and cause kidney stones when binding with calcium. Processing it by cooking can reduce its concentration and impact significantly without reducing other nutrients in the spinach as much. Grinding and blending foods is itself processing. I don't know what impact it has on nutrition, but mechanically speaking, you can imagine digestion proceeding differently depending on how much of it has already been done. You do need a certain amount of macronutrients each day, and some from fat. You also don't necessarily want to overindulge on every micronutrient. If we're putting a number of olives in our salad equivalent to the amount of olive oil we'd otherwise use, we'll say 100 4g olives, that we've lowered the sodium from by some means to keep that reasonable ... that's 72% of recommended daily value of our iron and 32% of our calcium. We just mentioned that spinach + calcium can be a problem; and the pound of spinach itself contains 67% of iron and 45% of our calcium.  ... That's also 460 calories worth of olives. I'm not sure if we've balanced our salad optimally here. Admittedly, if I'm throwing this many olives in with this much spinach in the first place, I'm probably going to cook the spinach, throw in some pesto and grains or grain products, and then I've just added more olive oil back in again ... ;) And yeah, greens with oil might taste better or be easier to eat than greens just with fatty additions like nuts, seeds, meat, or eggs. 

Crosspost from my blog.  

If you spend a lot of time in the blogosphere, you’ll find a great many people expressing contrarian views. If you hang out in the circles that I do, you’ll probably have heard Yudkowsky say that dieting doesn’t really work, Guzey say that sleep is overrated, Hanson argue that medicine doesn’t improve health, various people argue for the lab leak, others argue for hereditarianism, Caplan argue that mental illness is mostly just aberrant preferences and education doesn’t work, and various other people expressing contrarian views. Often, very smart people, like Robin Hanson, will write long posts defending these views, other people will have criticisms, and it will all be such a tangled mess that you don’t really know what to think about them.


It may be useful to write about how a consumer can distinguish contrarian takes from original insights. Until that's a common skill, there will remain a market for contrarians.

I think binary examples are deceptive in the "reversed stupidity is not intelligence" sense. Thinking through things from first principles is most important in areas that are new or rapidly changing, where there are fewer reference classes and experts to talk to. It's also helpful for areas where the consensus view is optimized for someone very unlike you.
I tend to read most of the high-profile contrarians with a charitable (or perhaps condescending) presumption that they're exaggerating for effect.  They may say something in a forceful tone and imply that it's completely obvious and irrefutable, but that's rhetoric rather than truth.   In fact, if they're saying "the mainstream and common belief should move some amount toward this idea", I tend to agree with a lot of it (not all - there's a large streak of "contrarian success on some topics causes very strong pressure toward more contrarianism" involved).

People have been posting great essays so that they're "fed through the standard LessWrong algorithm." This essay is in the public domain in the UK but not the US.

From a very early age, perhaps the age of five or six, I knew that when I grew up I should be a writer. Between the ages of about seventeen and twenty-four I tried to abandon this idea, but I did so with the consciousness that I was outraging my true nature and that sooner or later I should have to settle down and write books.

I was the middle child of three, but there was a gap of five years on either side, and I barely saw my father before I was eight. For this and other reasons I...

Orwell is one of my personal heroes, 1984 was a transformative book to me, and I strongly recommend Homage to Catalonia as well. That said, I'm not sure making theories of art is worth it. Even when great artists do it (Tolkien had a theory of art, and Oscar Wilde, and Flannery O'Connor, and almost every artist if you look close enough), it always seems to be the kind of theory which suits that artist and nobody else. Would advice like "good prose is like a windowpane" or "efface your own personality" improve the writing of, say, Hunter S. Thompson? Heck no, his writing is the opposite of that and charming for it! Maybe the only possible advice to an artist is to follow their talent, and advising anything more specific is as likely to hinder as help.

The theories are probably just rationalizations anyway.


A Festival of Writers Who are Wrong on the Internet

May 31 - Jun 2, Berkeley, CA