This post is a not-so-secret analogy for the AI alignment problem. Via a fictional dialogue, Eliezer explores and counters common objections to the Rocket Alignment Problem as approached by the Mathematics of Intentional Rocketry Institute.

MIRI researchers will tell you they're worried that "right now, nobody can tell you how to point your rocket’s nose such that it goes to the moon, nor indeed any prespecified celestial destination."

Eric Neyman (13h)
I think that people who work on AI alignment (including me) have generally not put enough thought into the question of whether a world where we build an aligned AI is better by their values than a world where we build an unaligned AI. I'd be interested in hearing people's answers to this question. Or, if you want more specific questions:

* By your values, do you think a misaligned AI creates a world that "rounds to zero", or still has substantial positive value?
* A common story for why aligned AI goes well is something like: "If we (i.e. humanity) align AI, we can and will use it to figure out what we should use it for, and then we will use it in that way." To what extent is aligned AI going well contingent on something like this happening, and how likely do you think it is to happen? Why?
* To what extent is your belief that aligned AI would go well contingent on an assumption like: my idealized values are the same as the idealized values of the people or coalition who will control the aligned AI?
* Do you care about AI welfare? Does your answer depend on whether the AI is aligned? If we built an aligned AI, how likely is it that we will create a world that treats AI welfare as an important consideration? What if we build a misaligned AI?
* Do you think that, to a first approximation, most of the possible value of the future happens in worlds that are optimized for something that resembles your current or idealized values? How bad is it to mostly sacrifice each of these? (What if the future world's values are similar to yours, but it is only kinda effectual at pursuing them? What if the world is optimized for something that's only slightly correlated with your values?) How likely are these various options under an aligned AI future vs. an unaligned AI future?
Elizabeth (1d)
Check my math: how does Enovid compare to humming? Nitric oxide is an antimicrobial and immune booster. Normal nasal nitric oxide is 0.14 ppm for women and 0.18 ppm for men (sinus levels are 100x higher). journals.sagepub.com/doi/pdf/10.117… Enovid is a nasal spray that produces NO. I had the damndest time quantifying Enovid, but this trial registration says 0.11 ppm NO/hour. They deliver every 8h and I think that dose is amortized, so the true dose is 0.88. But maybe it's more complicated. I've got an email out to the PI but am not hopeful about a response. clinicaltrials.gov/study/NCT05109…

So Enovid increases nasal NO levels somewhere between 75% and 600% compared to baseline, which is not shabby. Except humming increases nasal NO levels by 1500-2000%. atsjournals.org/doi/pdf/10.116… Enovid stings and humming doesn't, so it seems like Enovid should have the larger dose. But the spray doesn't contain NO itself; it contains compounds that react to form NO. Maybe that's where the sting comes from? Cystic fibrosis and burn patients are sometimes given stratospheric levels of NO for hours or days; if the burn from Enovid came from the NO itself, then those patients would be in agony.

I'm not finding any data on humming and respiratory infections. Google Scholar gives me information on CF and COPD; @Elicit brought me a bunch of studies about honey. With better keywords, Google Scholar brought me a bunch of descriptions of yogic breathing with no empirical backing. There are some very circumstantial studies on illness in mouth breathers vs. nasal breathers, but that design has too many confounders for me to take seriously.

Where I'm most likely wrong:

* I misinterpreted the dosage in the RCT.
* The dosage in the RCT is lower than in Enovid.
* Enovid's dose per spray is 0.5ml, so pretty close to the new study. But it recommends two sprays per nostril, so the real dose is 2x that. Which is still not quite as powerful as a single hum.
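Since the comment asks for a math check, here is a minimal sketch of the arithmetic as I read it (the 0.11 ppm/hour figure and the 8h amortization are the comment's own assumptions; my most conservative bound comes out slightly below the quoted 75%):

```python
# Sanity check of the Enovid-vs-humming numbers above (illustrative only).
baseline_women, baseline_men = 0.14, 0.18    # ppm, normal nasal NO
hourly_dose = 0.11                           # ppm NO/hour per the trial registration
amortized_dose = hourly_dose * 8             # 0.88 ppm if the dose is spread over 8h

# Percent increase over baseline, from most to least conservative reading:
low = hourly_dose / baseline_men * 100       # ~61%
high = amortized_dose / baseline_women * 100 # ~629%
print(f"Enovid: +{low:.0f}% to +{high:.0f}% over baseline")
print("Humming: +1500% to +2000% (reported)")
```

Either way, the qualitative conclusion holds: even the generous reading of the Enovid dose is below the reported effect of humming.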
keltan (7h)
A potentially good way to avoid low-level criminals scamming your family and friends with a clone of your voice is to set a password that you each must exchange. An extra layer of security might be to make the password offensive, an info hazard, or politically sensitive. That way, criminals with little technical expertise will have a harder time bypassing corporate language filters. Good luck getting the voice model to parrot a basic meth recipe!
A tension that keeps recurring when I think about philosophy is between the "view from nowhere" and the "view from somewhere", i.e. a third-person versus first-person perspective—especially when thinking about anthropics.

One version of the view from nowhere says that there's some "objective" way of assigning measure to universes (or people within those universes, or person-moments). You should expect to end up in different possible situations in proportion to how much measure your instances in those situations have. For example, UDASSA ascribes measure based on the simplicity of the computation that outputs your experience.

One version of the view from somewhere says that the way you assign measure across different instances should depend on your values. You should act as if you expect to end up in different possible future situations in proportion to how much power to implement your values the instances in each of those situations has. I'll call this the ADT approach, because that seems like the core insight of Anthropic Decision Theory. Wei Dai also discusses it here.

In some sense each of these views makes a prediction. UDASSA predicts that we live in a universe with laws of physics that are very simple to specify (even if they're computationally expensive to run), which seems to be true. Meanwhile the ADT approach "predicts" that we find ourselves at an unusually pivotal point in history, which also seems true. Intuitively I want to say "yeah, but if I keep predicting that I will end up in more and more pivotal places, eventually that will be falsified". But on a personal level, this hasn't actually been falsified yet. And more generally, acting on those predictions can still be positive in expectation even if they almost surely end up being falsified. It's a St Petersburg paradox, basically.
Very speculatively, then, maybe a way to reconcile the view from somewhere and the view from nowhere is via something like geometric rationality, which avoids St Petersburg paradoxes. And more generally, it feels like there's some kind of multi-agent perspective which says I shouldn't model all these copies of myself as acting in unison, but rather as optimizing for some compromise between all their different goals (which can differ even if they're identical, because of indexicality). No strong conclusions here but I want to keep playing around with some of these ideas (which were inspired by a call with @zhukeepa). This was all kinda rambly but I think I can summarize it as "Isn't it weird that ADT tells us that we should act as if we'll end up in unusually important places, and also we do seem to be in an incredibly unusually important place in the universe? I don't have a story for why these things are related but it does seem like a suspicious coincidence."
There was this voice inside my head that told me that since I have Something to Protect, relaxing is never OK beyond the strict minimum, the goal is paramount, and I should just work as hard as I can all the time. This led me to breaking down and being incapable of working on my AI governance job for a week, as I had just piled up too much stress. And then I decided to follow what motivated me in the moment, instead of coercing myself into working on what I thought was most important, and lo and behold! My total output increased, while my time spent working decreased.

I'm so angry and sad at the inadequacy of my role models, cultural norms, rationality advice, and model of the good EA who does not burn out, which still led me to smash into the wall despite their best intentions. I became so estranged from my own body and perceptions, ignoring my core motivations, that it became harder and harder to work. I dug myself such a deep hole. I'm terrified at the prospect of having to rebuild my motivation again.

Popular Comments

Recent Discussion

It seems to me worth trying to slow down AI development to steer successfully around the shoals of extinction and out to utopia.

But I was thinking lately: even if I didn't think there was any chance of extinction risk, it might still be worth prioritizing a lot of care over moving at maximal speed. Because there are many different possible AI futures, and I think there's a good chance that the initial direction affects the long-term path, and different long-term paths go to different places. The systems we build now will shape the next systems, and so forth. If the first human-level-ish AI is brain emulations, I expect a quite different sequence of events than if it is GPT-ish.

People genuinely pushing for AI speed over care (rather than just feeling impotent) apparently think there is negligible risk of bad outcomes, but they are also asking to take the first future to which there is a path. Yet possible futures are a large space, and arguably we are at a rare plateau from which we could climb very different hills and reach much better futures.

EGI (7h)
What you are missing here is:

* Existential risk apart from AI
* People are dying / suffering as we hesitate

Yes, there is a good argument that we need to solve alignment first to get ANY good outcome, but once an acceptable outcome is reasonably likely, hesitation is probably bad. Especially if you consider how unlikely it is that mere humans can accurately predict, let alone precisely steer, a transhuman future.
No77e (6h)
From a purely utilitarian standpoint, I'm inclined to think that the cost of delaying is dwarfed by the number of future lives saved by getting a better outcome, assuming that delaying does increase the chance of a better future. That said, if we knew there was "no chance" of extinction risk, I don't think delaying would likely yield better future outcomes. On the contrary, I suspect that getting the coordination necessary to delay means it's likely we're giving up freedoms in a way that may reduce the value of the median future and increase the chance of stuff like totalitarian lock-in, which decreases the value of the average future overall. I think you're correct that the "other existential risks exist" consideration also has to be balanced in the calculation, although I don't expect it to be clear-cut.
David Hornbein (10h)
What is the mechanism, specifically, by which going slower will yield more "care"? What is the mechanism by which "care" will yield a better outcome? I see this model asserted pretty often, but no one ever spells out the details. I've studied the history of technological development in some depth, and I haven't seen anything to convince me that there's a tradeoff between development speed on the one hand, and good outcomes on the other.

Disclaimer: I don't necessarily support this view; I thought about it for like 5 minutes, but I thought it made sense.

If we were to do the same thing as with other regulation that slowed things down, then that might make sense, but I'm uncertain that you can take the outside view here.

Yes, we can do the same as for other technologies by leaving it down to the standard government procedures to make legislation, and then I might agree with you that slowing down might not lead to better outcomes. Yet we don't have to do this. We can use other processes that mi... (read more)

This work represents progress on removing attention head superposition. We are excited by this approach but acknowledge there are currently various limitations. In the short term, we will be working on adjacent problems and are excited to collaborate with anyone thinking about similar things!

Produced as part of the ML Alignment & Theory Scholars Program - Summer 2023 Cohort

Summary: In transformer language models, attention head superposition makes it difficult to study the function of individual attention heads in isolation. We study a particular kind of attention head superposition that involves constructive and destructive interference between the outputs of different attention heads. We propose a novel architecture - a ‘gated attention block’ - which resolves this kind of attention head superposition in toy models. In future, we hope this architecture may be...
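The excerpt doesn't spell out the mechanism, so the following is only a guess at the general shape of a "gated attention block": learned per-head gates that can switch a head's contribution off, so constructive/destructive interference between head outputs can be disentangled. All names and shapes here are hypothetical.

```python
import numpy as np

def gated_attention_block(head_outputs, gate_logits):
    """Combine per-head outputs through learned sigmoid gates (sketch).

    head_outputs: (n_heads, seq_len, d_model) individual head outputs.
    gate_logits:  (n_heads,) learned parameters; sigmoid maps them to [0, 1].
    A gate near 0 removes that head's contribution entirely, so interference
    between heads can be turned off rather than left superposed.
    """
    gates = 1.0 / (1.0 + np.exp(-gate_logits))   # sigmoid, one gate per head
    gated = gates[:, None, None] * head_outputs  # scale each head's output
    return gated.sum(axis=0)                     # contribution to the residual stream
```

With a large positive logit the head passes through unchanged; with a large negative logit it is silenced, which is the property that would let individual heads be studied in isolation.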

cmathw (7m)

Thank you for the comment! Yep, that is correct. I think variants of this approach could still be useful for resolving other forms of superposition within a single attention layer, but not currently across different layers.

The history of science has tons of examples of the same thing being discovered multiple times independently; wikipedia has a whole list of examples here. If your goal in studying the history of science is to extract the predictable/overdetermined component of humanity's trajectory, then it makes sense to focus on such examples.

But if your goal is to achieve high counterfactual impact in your own research, then you should probably draw inspiration from the opposite: "singular" discoveries, i.e. discoveries which nobody else was anywhere close to figuring out. After all, if someone else would have figured it out shortly after anyways, then the discovery probably wasn't very counterfactually impactful.

Alas, nobody seems to have made a list of highly counterfactual scientific discoveries, to complement wikipedia's list of multiple discoveries.

To...

Algon (13m)

Second most? What's the first? Linearization of a Newtonian V(r) about the earth's surface?

ChristianKl (1h)
Merchants were a lot weaker in China than in Europe. Chinese merchants also made far fewer sea voyages, due to geography. If a bunch of low-status merchants believed that the Earth is a sphere, it might not have influenced Chinese high-class beliefs in the same way the beliefs of politically powerful merchants did in Europe.
Garrett Baker (2h)
[edit: nevermind, I see you already know about the following quotes. There's other evidence of the influence in Sedley's book I link below] In De Rerum Natura, around line 716: Or for a more modern translation, from Sedley's Lucretius and the Transformation of Greek Wisdom
Alexander Gietelink Oldenziel (3h)
Did I just say SLT is the Newtonian gravity of deep learning? Hubris of the highest order! But also yes... I think I am saying that:

* Singular learning theory is the first highly accurate model of breadth of optima.
* SLT tells us to look at a quantity Watanabe calls λ, which has the highly technical name 'real log canonical threshold' (RLCT). He proves several equivalent ways to describe it, one of which is as the (fractal) volume scaling dimension around the optima.
* By computing simple examples (see Shaowei's guide in the links below) you can check for yourself how the RLCT picks up on basin broadness.
* The RLCT λ is the first-order term for in-distribution generalization error and also for Bayesian learning (technically the 'Bayesian free energy'). This justifies the name 'learning coefficient' for λ. I emphasize that these are mathematically precise statements that have complete proofs, not conjectures or intuitions.
* Knowing a little SLT will inoculate you against many wrong theories of deep learning that abound in the literature. I won't be going into it, but suffice to say that any paper assuming that the Fisher information metric is regular for deep neural networks, or for any kind of hierarchical structure, is fundamentally flawed. And you can be sure this assumption is sneaked in all over the place. For instance, this is almost always the case when people talk about the Laplace approximation.
* It's one of the most computationally applicable theories we have? Yes. SLT quantities like the RLCT can be analytically computed for many statistical models of interest, correctly predict phase transitions in toy neural networks, and can be estimated at scale.

EDIT: no hype about future work. Wait and see! :)

This post brings together various questions about the college application process, as well as practical considerations of where to apply and go. We are seeing some encouraging developments, but mostly the situation remains rather terrible for all concerned.

Application Strategy and Difficulty

Paul Graham: Colleges that weren’t hard to get into when I was in HS are hard to get into now. The population has increased by 43%, but competition for elite colleges seems to have increased more. I think the reason is that there are more smart kids. If so that’s fortunate for America.

Are college applications getting more competitive over time?

Yes and no.

  1. The population size is up, but the cohort size is roughly the same.
  2. The standard ‘effort level’ of putting in work and sacrificing one’s childhood and gaming
...

200s

*2000s

xpym (9h)
So the signalling value of their degrees should be decreasing accordingly, unless one mainly intends to take advantage of the process. Has any tangible evidence of that appeared already, and are alternative signalling opportunities emerging?
Wei Dai (16h)
Some of my considerations for college choice for my kid, that I suspect others may also want to think more about or discuss:

1. status/signaling benefits for the parents (This is probably a major consideration for many parents pushing their kids into elite schools. How much do you endorse it?)
2. sex ratio at the school and its effect on the local "dating culture"
3. political/ideological indoctrination by professors/peers
4. workload (having more/less time/energy to pursue one's own interests)
Jacob G-W (18h)
I'm assuming the recent protests about the Gaza war: https://www.nytimes.com/live/2024/04/24/us/columbia-protests-mike-johnson
This is a linkpost for https://arxiv.org/abs/2404.16014

Authors: Senthooran Rajamanoharan*, Arthur Conmy*, Lewis Smith, Tom Lieberum, Vikrant Varma, János Kramár, Rohin Shah, Neel Nanda

A new paper from the Google DeepMind mech interp team: Improving Dictionary Learning with Gated Sparse Autoencoders! 

Gated SAEs are a new Sparse Autoencoder architecture that seems to be a significant Pareto-improvement over normal SAEs, verified on models up to Gemma 7B. They are now our team's preferred way to train sparse autoencoders, and we'd love to see them adopted by the community! (Or to be convinced that it would be a bad idea for them to be adopted by the community!)

They achieve similar reconstruction with about half as many firing features, while being comparably interpretable or more so (the confidence interval for the increase is 0%-13%).

See Sen's Twitter summary, my Twitter summary, and the paper!
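As a rough illustration of the idea (a simplified sketch based on the paper's high-level description, not their code; the paper additionally ties the two encoder weight matrices via a learned rescaling, omitted here, and all names are mine): the encoder separates *which* features fire from *how strongly* they fire.

```python
import numpy as np

def gated_sae_encode(x, W_gate, b_gate, W_mag, b_mag):
    """Simplified gated SAE encoder (sketch).

    Gate path: a binary decision of which dictionary features are active.
    Magnitude path: a ReLU estimate of how strongly each active feature fires.
    """
    active = (x @ W_gate + b_gate) > 0               # which features fire
    magnitudes = np.maximum(x @ W_mag + b_mag, 0.0)  # how strongly they fire
    return active * magnitudes                       # sparse feature activations

def sae_decode(f, W_dec, b_dec):
    """Standard linear decoder: reconstruct activations from features."""
    return f @ W_dec + b_dec
```

Separating the firing decision from the magnitude estimate is what lets the sparsity penalty act on the gate without shrinking the magnitudes, which is one way to read the "half as many firing features at similar reconstruction" result.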

I refuse to join any club that would have me as a member.

— Groucho Marx

Alice and Carol are walking on the sidewalk in a large city, and end up together for a while.

"Hi, I'm Alice! What's your name?"

Carol thinks:

If Alice is trying to meet people this way, that means she doesn't have a much better option for meeting people, which reduces my estimate of the value of knowing Alice. That makes me skeptical of this whole interaction, which reduces the value of approaching me like this, and Alice should know this, which further reduces my estimate of Alice's other social options, which makes me even less interested in meeting Alice like this.

Carol might not think all of that consciously, but that's how human social reasoning tends to...

Assuming you're the first to explicitly point out this lemon-market character of random social interaction: kudos, I think it's a great way to express certain extremely common dynamics.

Anecdote from my country, where people ride trains all the time, fitting your description, although here it always takes a weird kind of extra 'excuse': it would often feel weird to randomly talk to your seat neighbor, but ANY slightest excuse (a sudden bump in the ride, an info-speaker malfunction, a grumpy ticket collector, one weird word from a random perso... (read more)

gjm (8h)
It looks to me as if, of the four "root causes of social relationships becoming more of a lemon market" listed in the OP, only one is actually anything to do with lemon-market-ness as such. The dynamic in a lemon market is that you have some initial fraction of lemons, but it hardly matters what that is, because the fraction of lemons quickly increases until there's nothing else, because buyers can't tell what they're getting. It's that last feature that makes the lemon market, not the initial fraction of lemons. And I think three of the four proposed "root causes" are about the initial fraction of lemons, not the difficulty of telling lemons from peaches.

* Urbanization: this one does seem to fit: it means that the people you're interacting with are much less likely to be ones you already know about, so you can't tell lemons from peaches.
* Drugs: this one is all about there being more lemons, because some people are addicts who just want to steal your stuff.
* MLM schemes: again, this is "more lemons" rather than "less-discernible lemons".
* Screens: this is about raising the threshold below which any given potential interaction/relationship becomes a lemon (i.e., worse than the available alternative), so again it's "more lemons", not "less-discernible lemons".

Note that I'm not saying that "drugs", "MLM", and "screens" aren't causes of increased social isolation, only that if they are, the way they're doing it isn't quite by making social interactions more of a lemon market. (I think "screens" plausibly is a cause of increased social isolation. I'm not sure I buy that "drugs" and "MLM" are large enough effects to make much difference, but I could be convinced.)

I like the "possible solutions" part of the article better than the section that tries to fit everything into the "lemon market" category, because it engages in more detail with the actual processes involved, actually considering possible scenarios in which acquaintances or friendships begin. When I th
bhauth (3h)
You're mistaken about lemon markets: the initial fraction of lemons does matter. The number of lemon cars is fixed, and it imposes a sort of tax on transactions, but if that tax is low enough, it's still worth selling good cars. There's a threshold effect: a point at which most of the good items are suddenly driven out.
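The threshold effect can be seen in a textbook Akerlof-style toy model (the numbers below are illustrative inventions of mine, not from the thread):

```python
def market_outcome(peach_fraction, peach_value=10.0, lemon_value=2.0,
                   peach_seller_reserve=8.0):
    """Toy Akerlof lemon market.

    Buyers can't tell peaches (good cars) from lemons, so they offer the
    expected value of a random car. Peach owners only sell if that offer
    beats their reserve price; below a threshold fraction of peaches,
    every good car is withdrawn and only lemons trade.
    """
    offer = peach_fraction * peach_value + (1 - peach_fraction) * lemon_value
    return "mixed market" if offer >= peach_seller_reserve else "lemons only"

# With these numbers the market flips at 75% peaches:
# market_outcome(0.80) -> "mixed market"
# market_outcome(0.70) -> "lemons only"
```

So both readings are partly right: below the threshold the initial fraction drives the collapse, but above it the market tolerates some lemons as a tax on transactions.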

U.S. Secretary of Commerce Gina Raimondo announced today additional members of the executive leadership team of the U.S. AI Safety Institute (AISI), which is housed at the National Institute of Standards and Technology (NIST). Raimondo named Paul Christiano as Head of AI Safety, Adam Russell as Chief Vision Officer, Mara Campbell as Acting Chief Operating Officer and Chief of Staff, Rob Reich as Senior Advisor, and Mark Latonero as Head of International Engagement. They will join AISI Director Elizabeth Kelly and Chief Technology Officer Elham Tabassi, who were announced in February. The AISI was established within NIST at the direction of President Biden, including to support the responsibilities assigned to the Department of Commerce under the President’s landmark Executive Order.

Paul Christiano, Head of AI Safety, will design

...
Davidmanheim (39m)
Having written extensively about it, I promise you I'm aware. But please, tell me more about how this supports the original claim I have been disagreeing with: that this class of incidents was or is the primary concern of the EA biosecurity community, the one that led to it becoming a cause area.
Adam Scholl (16h)
I mean, I'm sure something more restrictive is possible. But my issue with BSL levels isn't that they include too few BSL-type restrictions, it's that "lists of restrictions" are a poor way of managing risk when the attack surface is enormous. I'm sure someday we'll figure out how to gain this information in a safer way—e.g., by running simulations of GoF experiments instead of literally building the dangerous thing—but at present, the best available safeguards aren't sufficient. I'm confused why you find this underspecified. I just meant "gain of function" in the standard, common-use sense—e.g., that used in the 2014 ban on federal funding for such research.

I mean, I'm sure something more restrictive is possible. 

But what? Should we insist that the entire time someone's inside a BSL-4 lab, we have a second person, an expert in biosafety, visually monitoring them to ensure they don't make mistakes? Or should their air supply not use filters and completely safe PAPRs, but instead feed them outside air through a tube that restricts their ability to move around?

Or do you have some new idea that isn't just a ban with more words?
 

"lists of restrictions" are a poor way of managing risk when the a

... (read more)

Epistemic status: this post is more suitable for LW as it was 10 years ago.

 

Thought experiment with curing a disease by forgetting

Imagine I have a bad but rare disease X. I may try to escape it in the following way:

1. I enter the blank state of mind and forget that I had X.

2. Now I in some sense merge with a very large number of my (semi)copies in parallel worlds who do the same. I will be in the same state of mind as my other copies; some of them have disease X, but most don't.

3. Now I can use the self-sampling assumption for observer-moments (Strong SSA) and think that I am randomly selected from all these exactly identical observer-moments.

4. Based on this, the chances that my next observer-moment after...

The "repeating" will not be repeating from the internal point of view of the person, as he has completely erased the memories of the first attempt. So he will do it as if for the first time.

avturchin (2h)
Yes, here we can define magic as "ability to manipulate one's reference class". And special minds may be much more adapted to it.
avturchin (3h)
Presumably in deep meditation people become disconnected from reality.
Dagon (2h)
Only metaphorically, not really disconnected. In truth, in deep meditation the conscious attention is not focused on physical perceptions, but the mind is still contained in, and part of, the same reality. This may be the primary crux of my disagreement with the post. People are part of reality, not just connected to it. Dualism is false; there is no non-physical part of being. The thing that has experiences, thoughts, and qualia is a bounded segment of the universe, not a thing separate or separable from it.

Note: It seems like great essays should go here and be fed through the standard LessWrong algorithm. There is possibly a copyright issue here, but we aren't making any money off it either. What follows is a full copy of "This Is Water", David Foster Wallace's 2005 commencement speech to the graduating class at Kenyon College.

Greetings parents and congratulations to Kenyon’s graduating class of 2005. There are these two young fish swimming along and they happen to meet an older fish swimming the other way, who nods at them and says “Morning, boys. How’s the water?” And the two young fish swim on for a bit, and then eventually one of them looks over at the other and goes “What the hell is water?”

This is...

mic (41m)

If you worship money and things, if they are where you tap real meaning in life, then you will never have enough, never feel you have enough. It’s the truth.

Worship your impact and you will always feel you are not doing enough.

mic (43m)
I think I disagree with the first HN comment here. I personally find that my thoughts and actions have a significant influence over whether I am experiencing a positive or negative feeling. If I find that most times I go to the grocery store, I have profoundly negative thoughts about the people around me who are just doing normal things, probably I should figure out how to think more positively about the situation. Thinking positively isn't always possible, and in cases where you can't escape a negative feeling like sadness, sometimes it is best to accept the feeling and appreciate it for what it is. But I think it really is possible to transform your emotions through your thinking, rather than being helpless to a barrage of negative feelings.
JenniferRM (4h)
I wonder what he would have thought was the downside of worshiping a longer list of things... For the things mentioned, it feels like he thinks "if you worship X then the absence of X will be constantly salient to you in most moments of your life".

It seems like he claims that worshiping some version of Goodness won't eat you alive, but in my experiments with that, I've found that generic Goodness Entities are usually hungry for martyrs, and almost literally try to get would-be saints to "give their all" (in some sense "eating" them). As near as I can tell, it is an unkindness to exhort the rare sort of person who is actually self-editing and scrupulous enough to even understand or apply the injunction in that direction, without combining it with a warning that success in this direction will lead to altruistic self-harm unless you make the demands of Goodness "compact" in some way.

Zvi mentions ethics explicitly, so I'm pretty sure readings of this sort are "intended". So consider (IF you've decided to try to worship an ethical entity) that one should eventually get ready to follow Zvi's advice in "Out To Get You" for formalized/externalized ethics itself, so you can enforce some boundaries on whatever angel you summon (and remember, demons usually claim to be angels (and in the current zeitgeist it is SO WEIRD that so many "scientific rationalists" believe in demons without believing in angels as well)).

Anyway. Compactification (which is possibly the same thing as "converting dangerous utility functions into safe formulas for satisficing"): for myself, I have so far found it much easier to worship wisdom than pure benevolence. Noticing ways that I am a fool is kinda funny. There are a lot of them! So many that patching each such gap would be an endless exercise! The wise thing, of course, would be to prioritize which foolishnesses are most prudent to patch, at which times. A nice thing here is that wisdom basically assimilates all valid criticism as helpful
cousin_it (9h)
I think for good emotions the feel-it-completely thing happens naturally anyway.

LessOnline

A Festival of Writers Who are Wrong on the Internet

May 31 - Jun 2, Berkeley, CA