This post is a not-so-secret analogy for the AI alignment problem. Via a fictional dialogue, Eliezer explores and answers common questions about the Rocket Alignment Problem as approached by the Mathematics of Intentional Rocketry Institute.

MIRI researchers will tell you they're worried that "right now, nobody can tell you how to point your rocket’s nose such that it goes to the moon, nor indeed any prespecified celestial destination."

Thomas Kwa12h173
0
The cost of goods has the same units as the cost of shipping: $/kg. Referencing between them lets you understand how the economy works, e.g. why construction material sourcing and drink bottling has to be local, but oil tankers exist.

* An iPhone costs $4,600/kg, about the same as SpaceX charges to launch it to orbit. [1]
* Beef, copper, and off-season strawberries are $11/kg, about the same as a 75kg person taking a three-hour, 250km Uber ride costing $3/km.
* Oranges and aluminum are $2-4/kg, about the same as flying them to Antarctica. [2]
* Rice and crude oil are ~$0.60/kg, about the same as $0.72 for shipping it 5000km across the US via truck. [3,4] Palm oil, soybean oil, and steel are around this price range, with wheat being cheaper. [3]
* Coal and iron ore are $0.10/kg, significantly more than the cost of shipping it around the entire world via smallish (Handysize) bulk carriers. Large bulk carriers are another 4x more efficient [6].
* Water is very cheap, with tap water $0.002/kg in NYC. But shipping via tanker is also very cheap, so you can ship it maybe 1000 km before equaling its cost.

It's really impressive that for the price of a winter strawberry, we can ship a strawberry-sized lump of coal around the world 100-400 times.

[1] iPhone is $4600/kg, large launches sell for $3500/kg, and rideshares for small satellites $6000/kg. Geostationary orbit is more expensive, so it's okay for them to cost more than an iPhone per kg, but Starlink wants to be cheaper.
[2] https://fred.stlouisfed.org/series/APU0000711415. Can't find numbers but Antarctica flights cost $1.05/kg in 1996.
[3] https://www.bts.gov/content/average-freight-revenue-ton-mile
[4] https://markets.businessinsider.com/commodities
[5] https://www.statista.com/statistics/1232861/tap-water-prices-in-selected-us-cities/
[6] https://www.researchgate.net/figure/Total-unit-shipping-costs-for-dry-bulk-carrier-ships-per-tkm-EUR-tkm-in-2019_tbl3_351748799
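A few of the comparisons above can be checked with simple arithmetic. The sketch below is only illustrative: the numbers are the ones quoted in the post, the helper name `cost_per_kg` is made up here, and the coal calculation assumes that a "strawberry-sized lump" has the same mass as the strawberry, which is a simplification.

```python
def cost_per_kg(total_cost_usd: float, mass_kg: float) -> float:
    """Spread a total cost over a mass to get a $/kg figure."""
    return total_cost_usd / mass_kg


# Uber: a 250 km ride at $3/km, carrying one 75 kg person.
uber_usd_per_kg = cost_per_kg(250 * 3.0, 75.0)

# Trucking: $0.72 to move 1 kg over 5000 km, expressed per tonne-km.
truck_usd_per_tonne_km = 0.72 / 5000 * 1000

# Water: if ~1000 km of tanker shipping costs as much as the water itself
# ($0.002/kg), the implied tanker rate per tonne-km is:
tanker_usd_per_tonne_km = 0.002 / 1000 * 1000

# Coal: if $11 (1 kg of winter strawberries) pays for 100-400 trips around
# the world for 1 kg of coal (equal-mass assumption), the implied cost per
# trip is:
implied_trip_cost_high = 11.0 / 100
implied_trip_cost_low = 11.0 / 400

print(f"Uber ride:           ${uber_usd_per_kg:.2f}/kg")
print(f"Truck freight:       ${truck_usd_per_tonne_km:.3f}/tonne-km")
print(f"Tanker (implied):    ${tanker_usd_per_tonne_km:.4f}/tonne-km")
print(f"Coal trip (implied): ${implied_trip_cost_low:.4f}-${implied_trip_cost_high:.2f}/kg")
```

The Uber figure comes out at $10/kg, close to the $11/kg quoted for beef, copper, and off-season strawberries. The implied per-trip coal shipping cost sits at or below the roughly $0.10/kg price of the coal itself.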
I wish there were an option in the settings to opt out of seeing the LessWrong reacts. I personally find them quite distracting, and I'd like to be able to hover over text or highlight it without having to see the inline annotations. 
I think that people who work on AI alignment (including me) have generally not put enough thought into the question of whether a world where we build an aligned AI is better by their values than a world where we build an unaligned AI. I'd be interested in hearing people's answers to this question. Or, if you want more specific questions:

* By your values, do you think a misaligned AI creates a world that "rounds to zero", or still has substantial positive value?
* A common story for why aligned AI goes well goes something like: "If we (i.e. humanity) align AI, we can and will use it to figure out what we should use it for, and then we will use it in that way." To what extent is aligned AI going well contingent on something like this happening, and how likely do you think it is to happen? Why?
* To what extent is your belief that aligned AI would go well contingent on some sort of assumption like: my idealized values are the same as the idealized values of the people or coalition who will control the aligned AI?
* Do you care about AI welfare? Does your answer depend on whether the AI is aligned? If we build an aligned AI, how likely is it that we will create a world that treats AI welfare as an important consideration? What if we build a misaligned AI?
* Do you think that, to a first approximation, most of the possible value of the future happens in worlds that are optimized for something that resembles your current or idealized values? How bad is it to mostly sacrifice each of these? (What if the future world's values are similar to yours, but it is only kinda effectual at pursuing them? What if the world is optimized for something that's only slightly correlated with your values?) How likely are these various options under an aligned AI future vs. an unaligned AI future?
avturchin15h8-3
0
Roman Mazurenko is dead again. The first resurrected person, Roman lived as a chatbot (2016-2024) created from his conversations with his fiancée. You might even have been able to download him as an app.

But not any more. His fiancée married again, and her startup http://Replika.ai pivoted from resurrection help to AI girlfriends and psychological consulting.

It looks like they quietly removed the Roman Mazurenko app from public access. It is an especial pity that his digital twin lived for less time than his biological original, who died at 32, especially now that we have much more powerful instruments for creating semi-uploads based on LLMs with large prompt windows.
Elizabeth2d183
2
Check my math: how does Enovid compare to humming?

Nitric oxide (NO) is an antimicrobial and immune booster. Normal nasal nitric oxide is 0.14ppm for women and 0.18ppm for men (sinus levels are 100x higher). journals.sagepub.com/doi/pdf/10.117…

Enovid is a nasal spray that produces NO. I had the damnedest time quantifying Enovid, but this trial registration says 0.11ppm NO/hour. They deliver every 8h and I think that dose is amortized, so the true dose is 0.88ppm. But maybe it's more complicated. I've got an email out to the PI but am not hopeful about a response. clinicaltrials.gov/study/NCT05109…

So Enovid increases nasal NO levels somewhere between 75% and 600% compared to baseline, which is not shabby. Except humming increases nasal NO levels by 1500-2000%. atsjournals.org/doi/pdf/10.116…

Enovid stings and humming doesn't, so it seems like Enovid should have the larger dose. But the spray doesn't contain NO itself, only compounds that react to form NO. Maybe that's where the sting comes from? Cystic fibrosis and burn patients are sometimes given stratospheric levels of NO for hours or days; if the burn from Enovid came from the NO itself, then those patients would be in agony.

I'm not finding any data on humming and respiratory infections. Google Scholar gives me information on CF and COPD, and @Elicit brought me a bunch of studies about honey. With better keywords, I got Google Scholar to bring me a bunch of descriptions of yogic breathing with no empirical backing. There are some very circumstantial studies on illness in mouth breathers vs. nasal breathers, but that design has too many confounders for me to take seriously.

Where I'm most likely wrong:

* misinterpreted the dosage in the RCT
* dosage in RCT is lower than in Enovid
* Enovid's dose per spray is 0.5ml, so pretty close to the new study. But it recommends two sprays per nostril, so the real dose is 2x that. Which is still not quite as powerful as a single hum.
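Since the post asks for a math check, here is a minimal sketch of the arithmetic, using only the figures quoted above. It assumes the ppm values are directly comparable and that the two readings of the trial dose (0.11ppm as the whole dose vs. 0.11ppm/hour amortized over 8 hours) bracket the truth; both assumptions are exactly what the post is unsure about. The variable and function names are invented for illustration.

```python
# Baseline nasal NO levels quoted in the post (ppm).
baseline_ppm = {"women": 0.14, "men": 0.18}

hourly_ppm = 0.11          # figure from the trial registration
hours_between_doses = 8    # dosing interval
amortized_ppm = hourly_ppm * hours_between_doses  # "true dose" if amortized


def percent_increase(added_ppm: float, base_ppm: float) -> float:
    """Added NO expressed as a percentage of the baseline level."""
    return 100.0 * added_ppm / base_ppm


# Smallest increase: low dose reading against the higher (male) baseline.
low = percent_increase(hourly_ppm, baseline_ppm["men"])
# Largest increase: amortized dose against the lower (female) baseline.
high = percent_increase(amortized_ppm, baseline_ppm["women"])

print(f"Amortized dose: {amortized_ppm:.2f} ppm")
print(f"Enovid increase over baseline: {low:.0f}% to {high:.0f}%")
print("Humming increase quoted in the post: 1500% to 2000%")
```

Under these assumptions the range works out to roughly 61% to 629%, the same ballpark as the 75% to 600% stated in the post, and well below the 1500-2000% quoted for humming.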

Recent Discussion

I've seen a lot of news lately about the ways that particular LLMs score on particular tests.

Which, if any, of those tests can I take online to see how my performance on them compares to the models'?

Answer by plexApr 26, 202420

https://www.equistamp.com/evaluations has a bunch, including an alignment knowledge one they made.

Image from https://threadreaderapp.com/thread/1666482929772666880.html

Chris Olah recently released a tweet thread describing how the Anthropic team thinks about AI alignment difficulty. On this view, there is a spectrum of possible scenarios ranging from ‘alignment is very easy’ to ‘alignment is impossible’, and we can frame AI alignment research as a process of increasing the probability of beneficial outcomes by progressively addressing these scenarios. I think this framing is really useful, and here I have expanded on it by providing a more detailed scale of AI alignment difficulty and explaining some considerations that arise from it.

The discourse around AI safety is dominated by detailed conceptions of potential AI systems and their failure modes, along with ways to ensure their safety. This article by the DeepMind safety team provides an overview of some...

Behavioural Safety is Insufficient

Past this point, we assume, following Ajeya Cotra, that a strategically aware system which performs well enough to receive perfect human-provided external feedback has probably learned a deceptive human-simulating model instead of the intended goal. The later techniques have the potential to address this failure mode. (It is possible that this system would still under-perform on sufficiently superhuman behavioral evaluations.)

There are (IMO) plausible threat models in which alignment is very difficult but we don't n... (read more)

Crosspost from my blog.  

If you spend a lot of time in the blogosphere, you’ll find a great many people expressing contrarian views. If you hang out in the circles that I do, you’ll probably have heard Yudkowsky say that dieting doesn’t really work, Guzey say that sleep is overrated, Hanson argue that medicine doesn’t improve health, various people argue for the lab leak, others argue for hereditarianism, Caplan argue that mental illness is mostly just aberrant preferences and that education doesn’t work, and various other people express contrarian views. Often, very smart people, like Robin Hanson, will write long posts defending these views, other people will have criticisms, and it will all be such a tangled mess that you don’t really know what to think about them.

For...

There are 2 topics mixed here.

  1. Existence of the contrarians.
  2. Side effects of their existence.

My own opinion on 1 is that they are necessary in moderation. They are doing the "exploration" part of the "exploration-exploitation dilemma". By the very fact of their existence, they allow society in general to check alternatives and find better solutions to problems compared to the already known "best practices". It's important to remember that almost everything that we know now started from some contrarian - once it was a well established truth tha... (read more)

2gjm1h
Please don't write comments all in boldface. It feels like you're trying to get people to pay more attention to your comment than to others, and it actually makes your comment a little harder to read as well as making the whole thread uglier.
3tailcalled4h
I'm convinced by the mainstream view on COVID origins and medicine. I'm ambivalent on education - I guess if done well, it'd consistently have good effects, and that currently, it on average has good effects, but also the effect varies a lot from person to person, so simplistic quantitative reviews don't tell you much. When I did an epistemic spot check on Caplan's book, it failed terribly (it cited a supposedly-ingenious experiment showing that university didn't improve critical thinking, but IMO the experiment had terrible psychometrics). I don't know enough about sleep research to disagree with Guzey on the basis of anything but priors. In general, I wouldn't update much on someone writing a big review, because often reviews include a lot of crap information. I might have to read Jayman's rebuttal of B-W genetic IQ differences in more detail, but at first glance I'm not really convinced by it because it seems to focus on small sample sizes in unusual groups, so it's unclear how much study noise, publication bias, and sampling bias affect things. At this point I think indirect studies are getting obsolete and it's becoming more and more feasible to just directly measure the racial genetic differences in IQ. However, I also think HBDers have a fractal of bad takes surrounding this, because they deny the phenotypic null hypothesis and center non-existent abstract personality traits like "impulsivity" or "conformity" in their models.
1omnizoid4h
It's not that piece. It's another one that got eaten by a Substack glitch, unfortunately. Hopefully it will be back up soon!

Post for a somewhat more general audience than the modal LessWrong reader, but gets at my actual thoughts on the topic.

In 2018 OpenAI defeated the world champions of Dota 2, a major esports game. This was hot on the heels of DeepMind’s AlphaGo performance against Lee Sedol in 2016, achieving superhuman Go performance way before anyone thought that might happen. AI benchmarks were being cleared at a pace which felt breathtaking at the time, papers were proudly published, and ML tools like TensorFlow (released in 2015) were coming online. To people already interested in AI, it was an exciting era. To everyone else, the world was unchanged.

Now Saturday Night Live sketches use sober discussions of AI risk as the backdrop for their actual jokes, there are hundreds...

3zeshen1h
Thanks for this post. This is generally how I feel as well, but my (exaggerated) model of the AI alignment community would immediately attack me by saying "if you don't find AI scary, you either don't understand the arguments on AI safety or you don't know how advanced AI has gotten". In my opinion, a few years ago we were concerned about recursively self-improving AIs, and that seemed genuinely plausible and scary. But somehow, they didn't really happen (or haven't happened yet) despite people trying all sorts of ways to make it happen. And instead of an intelligence explosion, what we got was an extremely predictable improvement trend which was a function of only two things, i.e. data + compute. This made me qualitatively update my p(doom) downwards, and I was genuinely surprised that many people went the other way instead, updating upwards as LLMs got better.

My p(doom) was low when I was predicting the Yudkowsky model was ridiculous, due to machine learning knowledge I've had for a while. Now that we have AGI of the kind I was expecting, we have more people working on figuring out what the risks really are, and the previous concern that the only way to intelligence is RL seems to be only a small reassurance, because non-imitation-learned RL agents that act in the real world are in fact scary. And recently, I've come to believe much of the risk is still real and was simply never about the kind of AI that has been... (read more)

8quetzal_rainbow1h
A general meta-problem with such discussions is that the direct counterargument to "LLMs are safe" is to explain how to make an LLM unsafe, and that's not good practice.
3Vladimir_Nesov4h
There is enough pre-training text data for $0.1-$1 trillion of compute, if we merely use repeated data and don't overtrain (that is, if we aim for quality, not inference efficiency). If synthetic data from the best models trained this way can be used to stretch raw pre-training data even a few times, this gives something like square of that more in useful compute, up to multiple trillions of dollars. Issues with LLMs start at autonomous agency, if it happens to be within the scope of scaling and scaffolding. They are thinking too fast, about 100 times faster than humans, and there are as many instances as there is compute. Resulting economic and engineering and eventually research activity will get out of hand. Culture isn't stable, especially for minds fundamentally this malleable developed under unusual and large economic pressures. If they are not initially much smarter than humans and can't get a handle on global coordination, culture drift, and alignment of superintelligence, who knows what kinds of AIs they end up foolishly building within a year or two.
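The "square of that" step in the comment above can be read through Chinchilla-style scaling, though the comment itself does not spell this out, so treat the following as one possible interpretation. The sketch assumes training compute of roughly 6 × parameters × tokens and a fixed compute-optimal ratio of tokens to parameters (20 is used here as a commonly cited rule of thumb). All names and constants are illustrative assumptions.

```python
TOKENS_PER_PARAM = 20.0  # assumed compute-optimal ratio (rule of thumb)


def optimal_compute_flops(tokens: float) -> float:
    """Approximate training FLOPs for a compute-optimal model on `tokens`.

    Uses C ~= 6 * N * D with N = D / TOKENS_PER_PARAM, so C grows as D**2.
    """
    params = tokens / TOKENS_PER_PARAM
    return 6.0 * params * tokens


def compute_multiplier(data_multiplier: float, base_tokens: float = 1e13) -> float:
    """How much more compute is usable if data is stretched by a factor."""
    stretched = optimal_compute_flops(base_tokens * data_multiplier)
    return stretched / optimal_compute_flops(base_tokens)


for k in (2, 3, 5):
    print(f"{k}x data -> about {compute_multiplier(k):.0f}x useful compute")
```

Under these assumptions, stretching the data by a factor k raises the useful compute by about k squared, so a "few times" more data corresponds to roughly an order of magnitude more compute.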

Please don’t feel like you “won’t be welcome” just because you’re new to ACX/EA or demographically different from the average attendee. You'll be fine!

Exact location: https://plus.codes/8CCGPRJW+V8

We meet on top of a small hill East of the Linha d'Água café in Jardim Amália Rodrigues. For comfort, bring sunglasses and a blanket to sit on. There is some natural shade. Also, it can get quite windy, so bring a jacket.

(Location might change due to weather)

2Rafael Harth11h
Are people in rich countries happier on average than people in poor countries? (According to GPT-4, the academic consensus is that they are, but I'm not sure it's representing it correctly.) If so, why do suicide rates increase (or is that a false positive)? Does the mean of the distribution go up while the tails don't or something?
4peterbarnett10h
People in rich countries are generally happier than people in poor countries (this holds both for the share of people who say they are "happy" or "very happy" and for self-reported life satisfaction); see many of the graphs here: https://ourworldindata.org/happiness-and-life-satisfaction

In general it seems like richer countries also have lower suicide rates: "for every 1000 US dollar increase in the GDP per capita, suicide rates are reduced by 2%"
Viliam41m40

Possible bias, that when famous and rich people kill themselves, everyone is discussing it, but when poor people kill themselves, no one notices?

Also, I wonder what technically counts as "suicide"? Is drinking yourself to death, or a "suicide by cop", or just generally overly risky behavior included? I assume not. And these seem to me like methods a poor person would choose, while the rich one would prefer a "cleaner" solution, such as a bullet or pills. So the reported suicide rates are probably skewed towards the legible, and the self-caused death rate of the poor could be much higher.

1MichaelDickens9h
Have there been any great discoveries made by someone who wasn't particularly smart? This seems worth knowing if you're considering pursuing a career with a low chance of high impact. Is there any hope for relatively ordinary people (like the average LW reader) to make great discoveries?

I asked ChatGPT 

Have there been any great discoveries made by someone who wasn't particularly smart? (i.e. average or below)

and it's difficult to get examples out of it. Even with additional drilling down and accusing it of being not inclusive of people with cognitive impairments, most of its examples are either pretty smart anyway, savants or only from poor backgrounds. The only ones I could verify that fit are:

  • Richard Jones accidentally created the Slinky
  • Frank Epperson, as a child, invented the popsicle
  • George Crum inadvertently invented pot
... (read more)
1Quinn14h
I eventually decided that human chauvinism approximately works most of the time because good successor criteria are very brittle. I'd prefer to avoid lock-in to my or anyone's values at t=2024, but such a lock-in might be "good enough" if I'm threatened with what I think are the counterfactual alternatives. If I did not think good successor criteria were very brittle, I'd accept something adjacent to E/Acc that focuses on designing minds which prosper more effectively than human minds. (The current comment will not address defining prosperity at different timesteps.) In other words, I can't beat the old fragility of value stuff (but I haven't tried in a while).

I wrote down my full thoughts on good successor criteria in 2021: https://www.lesswrong.com/posts/c4B45PGxCgY7CEMXr/what-am-i-fighting-for

AI welfare: matters, but when I started reading LessWrong I literally thought that disenfranchising them from the definition of prosperity was equivalent to subjecting them to suffering, and I don't think this anymore.

e/acc is not a coherent philosophy and treating it as one means you are fighting shadows.

Landian accelerationism at least is somewhat coherent. "e/acc" is a bundle of memes that support the self-interest of the people supporting and propagating it, both financially (VC money, dreams of making it big) and socially (the non-Beff e/acc vibe is one of optimism and hope and to do things -- to engage with the object level -- instead of just trying to steer social reality). A more charitable interpretation is that the philosophical roots of "e/acc" are founded up... (read more)

12ryan_greenblatt16h
I think misaligned AI is probably somewhat worse than no Earth-originating space-faring civilization because of the potential for aliens, but also that misaligned AI control is considerably better than no one ever heavily utilizing inter-galactic resources. Perhaps half of the value of misaligned AI control is from acausal trade and half from the AI itself being valuable.

You might be interested in When is unaligned AI morally valuable? by Paul.

One key consideration here is that the relevant comparison is:

* Human control (or successors picked by human control)
* AI(s) that succeed at acquiring most power (presumably seriously misaligned with their creators)

Conditioning on the AI succeeding at acquiring power changes my views of what their plausible values are (for instance, humans seem to have failed at instilling preferences/values which avoid seizing control).

Hmm, I guess I think that some fraction of resources under human control will (in expectation) be utilized according to the results of a careful reflection process with an altruistic bent. I think resources which are used in mechanisms other than this take a steep discount in my lights (there is still some value from acausal trade with other entities which did do this reflection-type process and probably a bit of value from relatively-unoptimized-goodness (in my lights)).

I overall expect that a high fraction (>50%?) of inter-galactic computational resources will be spent on the outputs of this sort of process (conditional on human control) because:

* It's relatively natural for humans to reflect and grow smarter.
* Humans who don't reflect in this sort of way probably don't care about spending vast amounts of inter-galactic resources.
* Among very wealthy humans, a reasonable fraction of their resources are spent on altruism and the rest is often spent on positional goods that seem unlikely to consume vast quantities of inter-galactic resources.

Probably not the same, but if I didn't thi
4ryan_greenblatt16h
You might be interested in the discussion under this thread, where I express what seem to me to be some of the key considerations here (somewhat indirectly).

LessOnline

A Festival of Writers Who are Wrong on the Internet

May 31 - Jun 2, Berkeley, CA