This post is a not-so-secret analogy for the AI Alignment problem. Via a fictional dialogue, Eliezer explores and counters common questions about the Rocket Alignment Problem as approached by the Mathematics of Intentional Rocketry Institute.
MIRI researchers will tell you they're worried that "right now, nobody can tell you how to point your rocket’s nose such that it goes to the moon, nor indeed any prespecified celestial destination."
Crosspost from my blog.
If you spend a lot of time in the blogosphere, you’ll find a great many people expressing contrarian views. If you hang out in the circles that I do, you’ll probably have heard Yudkowsky say that dieting doesn’t really work, Guzey say that sleep is overrated, Hanson argue that medicine doesn’t improve health, various people argue for the lab leak, others argue for hereditarianism, Caplan argue that mental illness is mostly just aberrant preferences and that education doesn’t work, and so on. Often, very smart people—like Robin Hanson—will write long posts defending these views, other people will raise criticisms, and it will all be such a tangled mess that you don’t really know what to think about them.
For...
The obsessive autists who have spent 10,000 hours researching the topic and writing boring articles in support of the mainstream position are left ignored.
It seems like you're setting up three different categories of thinkers: academics, public intellectuals, and "obsessive autists".
Notice that the examples you give overlap across those categories: Hanson and Caplan are academics (professors!), while Natália Mendonça is not an academic but is approaching being a public intellectual by now(?). Similarly, Scott Alexander strikes me as being in the "publ...
My understanding is that pilot wave theory (i.e. Bohmian mechanics) explains all of quantum physics with no weirdness like "superposition collapse" or "every particle interaction creates n parallel universes which never physically interfere with each other". It is not fully "local", but who cares?
Is there any reason at all to expect some kind of multiverse? Why is the multiverse idea still heavily referenced (e.g. in acausal trade posts)?
Edit April 11: I challenge the properly physics-brained people here (I am myself just a Q poster) to prove my guess wrong: can you get the Born rule with clean hands this way?
...They also implicitly claim that in order for the Born rule to work [under pilot wave], the particles have to start the sim following the psi^2 distribution.
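For context, here is the standard "equivariance" result behind that claim (a textbook sketch, not something from the quoted comment). The Bohmian velocity field is

$$ v = \frac{\hbar}{m}\,\operatorname{Im}\frac{\nabla \psi}{\psi}, $$

and the Schrödinger equation implies that $|\psi|^2$ satisfies the continuity equation

$$ \partial_t |\psi|^2 + \nabla \cdot \big( |\psi|^2 \, v \big) = 0. $$

So if the configuration distribution $\rho$ equals $|\psi|^2$ at the start, it stays equal to $|\psi|^2$ for all time ("quantum equilibrium"). Getting the Born rule without simply assuming that initial distribution is exactly the contested step.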
The other problem is that MWI is up against various subjective and non-realist interpretations, so it's not the case that you can build an ontological model of every interpretation.
The history of science has tons of examples of the same thing being discovered multiple times independently; Wikipedia has a whole list of examples here. If your goal in studying the history of science is to extract the predictable/overdetermined component of humanity's trajectory, then it makes sense to focus on such examples.
But if your goal is to achieve high counterfactual impact in your own research, then you should probably draw inspiration from the opposite: "singular" discoveries, i.e. discoveries which nobody else was anywhere close to figuring out. After all, if someone else would have figured it out shortly afterwards anyway, then the discovery probably wasn't very counterfactually impactful.
Alas, nobody seems to have made a list of highly counterfactual scientific discoveries to complement Wikipedia's list of multiple discoveries.
To...
Counterfactual means that if something had not happened, something else would have happened. It's a key concept in Judea Pearl's work on causality.
Summary: Evaluations provide crucial information to determine the safety of AI systems which might be deployed or (further) developed. These development and deployment decisions have important safety consequences, and therefore they require trustworthy information. One reason why evaluation results might be untrustworthy is sandbagging, which we define as strategic underperformance on an evaluation. The strategic nature can originate from the developer (developer sandbagging) and the AI system itself (AI system sandbagging). This post is an introduction to the problem of sandbagging.
There are environmental regulations which require the reduction of harmful emissions from diesel vehicles, with the goal of protecting public health and the environment. Volkswagen struggled to meet these emissions standards while maintaining the desired performance and fuel efficiency of their diesel engines (Wikipedia). Consequently, Volkswagen...
Epistemic status: this post is more suitable for LW as it was 10 years ago.
Thought experiment with curing a disease by forgetting
Imagine I have a bad but rare disease X. I may try to escape it in the following way:
1. I enter a blank state of mind and forget that I had X.
2. Now I, in some sense, merge with a very large number of my (semi)copies in parallel worlds who do the same. I will be in the same state of mind as my other copies; some of them have disease X, but most don't.
3. Now I can use the self-sampling assumption for observer-moments (Strong SSA) and think that I am randomly selected from all these exactly identical observer-moments.
4. Based on this, the chances that my next observer-moment after...
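To make step 4 concrete (my own sketch, with a hypothetical prevalence $p$ of disease X among the merged copies): Strong SSA treats the next observer-moment as a uniform draw from all the identical copies, so

$$ P(\text{next moment has } X) = \frac{\#\{\text{copies with } X\}}{\#\{\text{copies total}\}} = p, $$

and for a rare disease ($p \ll 1$) the continuation is almost certainly disease-free.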
True. But for that you need there to exist another mind almost identical to yours except for that one thing.
In the question "how much of my memories can I delete while retaining my thread of subjective experience?" I don't expect there to be an objective answer.
About a year ago I decided to try using one of those apps where you tie your goals to some kind of financial penalty. The specific one I tried is Forfeit, which I liked the look of because it's relatively simple: you set single tasks, which you have to verify you have completed with a photo.
I’m generally pretty sceptical of productivity systems, tools for thought, mindset shifts, life hacks, and so on. But this one I have found to be really shockingly effective; it has been about the biggest positive change to my life that I can remember. I feel like the category of things which benefit from careful planning and execution over time has completely opened up to me, whereas previously things like this would be largely down to the...
I know a child who often has this reaction to negative consequences, natural or imposed. I'd welcome discussion on what works well for that mindset. I don't have any insight, it's not how my mind works.
It seems like very, very small consequences can help a bit. Also trying to address the anxiety with OTC supplements like magnesium glycinate and lavender oil.
My current main cruxes:
If there is reasonable consensus on any one of those, I'd much appreciate knowing about it. Otherwise, I think these should be research priorities.
In short: Training runs of large Machine Learning systems are likely to last less than 14-15 months. This is because longer runs will be outcompeted by runs that start later and therefore use better hardware and better algorithms. [Edited 2022/09/22 to fix an error in the hardware improvements + rising investments calculation]
| Scenario | Longest training run |
|---|---|
| Hardware improvements | 3.55 years |
| Hardware improvements + Software improvements | 1.22 years |
| Hardware improvements + Rising investments | 9.12 months |
| Hardware improvements + Rising investments + Software improvements | 2.52 months |
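One way to reproduce numbers of this shape (a toy model of my own, not necessarily the post's exact calculation): if effective compute doubles every d months, a run that must finish at a fixed date trades a later start against growth, maximizing 2^(t/d) * (T - t), which gives an optimal run length of d / ln 2. The doubling times below are back-solved from the table and are my assumptions.

```python
import math

def optimal_run_length(doubling_time_months: float) -> float:
    """Run length (in months) that maximizes effective compute.

    Effective compute for a run starting at time t with a fixed
    finish date T is proportional to 2**(t/d) * (T - t); setting
    the derivative to zero gives an optimal length of d / ln(2).
    """
    return doubling_time_months / math.log(2)

# Doubling times back-solved from the table above (my assumptions):
scenarios = {
    "Hardware improvements": 29.5,
    "Hardware + Software improvements": 10.1,
    "Hardware + Rising investments": 6.3,
    "Hardware + Rising investments + Software": 1.75,
}
for name, d in scenarios.items():
    print(f"{name}: optimal run = {optimal_run_length(d):.2f} months")
```

Note how the optimal run shrinks as more sources of growth stack: the faster effective compute grows, the more it pays to start later.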
Larger compute budgets and a better understanding of how to effectively use compute (through, for example, using scaling laws) are two major driving forces of progress in recent Machine Learning.
There are many ways to increase your effective compute budget: better hardware, rising investments in AI R&D and improvements in algorithmic efficiency. In this article...
This is actually corrected on the Epoch website but not here (https://epochai.org/blog/the-longest-training-run)