This post is a not a so secret analogy for the AI Alignment problem. Via a fictional dialog, Eliezer explores and counters common questions to the Rocket Alignment Problem as approached by the Mathematics of Intentional Rocketry Institute.
MIRI researchers will tell you they're worried that "right now, nobody can tell you how to point your rocket’s nose such that it goes to the moon, nor indeed any prespecified celestial destination."
Crosspost from my blog.
If you spend a lot of time in the blogosphere, you’ll find a great deal of people expressing contrarian views. If you hang out in the circles that I do, you’ll probably have heard of Yudkowsky say that dieting doesn’t really work, Guzey say that sleep is overrated, Hanson argue that medicine doesn’t improve health, various people argue for the lab leak, others argue for hereditarianism, Caplan argue that mental illness is mostly just aberrant preferences and education doesn’t work, and various other people expressing contrarian views. Often, very smart people—like Robin Hanson—will write long posts defending these views, other people will have criticisms, and it will all be such a tangled mess that you don’t really know what to think about them.
For...
Similarly, now that I’ve read through Scott’s response to Hanson on medicine, I’d bet at upwards of 9 to 1 odds that Hanson is wrong about it.
I'm broadly sympathetic to this post. I think a lot of people adjacent to the LessWrong cluster tend to believe contrarian claims on the basis of flimsy evidence. That said, I am fairly confident that Scott Alexander misrepresented Robin Hanson's position on medicine in that post, as I pointed out in my comment here. So, I'd urge you not to update too far on this particular question, at least until Hanson has respond...
About a year ago I decided to try using one of those apps where you tie your goals to some kind of financial penalty. The specific one I tried is Forfeit, which I liked the look of because it’s relatively simple, you set single tasks which you have to verify you have completed with a photo.
I’m generally pretty sceptical of productivity systems, tools for thought, mindset shifts, life hacks and so on. But this one I have found to be really shockingly effective, it has been about the biggest positive change to my life that I can remember. I feel like the category of things which benefit from careful planning and execution over time has completely opened up to me, whereas previously things like this would be largely down to the...
My depression is currently well-controlled at the moment, and I actually have found various methods to help me get things done, since I don't respond well to the simplest versions of carrot-and-stick methods. The most pleasant is finding someone else to do it with me (or at least act involved while I do the actual work).
On the other hand, there have been times when procrastinating actually gives me a thrill, like I'm getting away with something. Mediocre video games become much more appealing when I have work to avoid.
U.S. Secretary of Commerce Gina Raimondo announced today additional members of the executive leadership team of the U.S. AI Safety Institute (AISI), which is housed at the National Institute of Standards and Technology (NIST). Raimondo named Paul Christiano as Head of AI Safety, Adam Russell as Chief Vision Officer, Mara Campbell as Acting Chief Operating Officer and Chief of Staff, Rob Reich as Senior Advisor, and Mark Latonero as Head of International Engagement. They will join AISI Director Elizabeth Kelly and Chief Technology Officer Elham Tabassi, who were announced in February. The AISI was established within NIST at the direction of President Biden, including to support the responsibilities assigned to the Department of Commerce under the President’s landmark Executive Order.
...Paul Christiano, Head of AI Safety, will design
I agree there other problems the EA biosecurity community focuses on, but surely lab escapes are one of those problems, and part of the reason we need biosecurity measures? In any case, this disagreement seems beside the main point that I took Adam to be making, namely that the track record for defining appropriate units of risk for poorly understood, high attack surface domains is quite bad (as with BSL). This still seems true to me.
The history of science has tons of examples of the same thing being discovered multiple time independently; wikipedia has a whole list of examples here. If your goal in studying the history of science is to extract the predictable/overdetermined component of humanity's trajectory, then it makes sense to focus on such examples.
But if your goal is to achieve high counterfactual impact in your own research, then you should probably draw inspiration from the opposite: "singular" discoveries, i.e. discoveries which nobody else was anywhere close to figuring out. After all, if someone else would have figured it out shortly after anyways, then the discovery probably wasn't very counterfactually impactful.
Alas, nobody seems to have made a list of highly counterfactual scientific discoveries, to complement wikipedia's list of multiple discoveries.
To...
The RLCT = first-order term for in-distribution generalization error
Clarification: The 'derivation' for how the RLCT predicts generalization error IIRC goes through the same flavour of argument as the one the derivation of the vanilla Bayesian Information Criterion uses. I don't like this derivation very much. See e.g. this one on Wikipedia.
So what it's actually showing is just that:
Post for a somewhat more general audience than the modal LessWrong reader, but gets at my actual thoughts on the topic.
In 2018 OpenAI defeated the world champions of Dota 2, a major esports game. This was hot on the heels of DeepMind’s AlphaGo performance against Lee Sedol in 2016, achieving superhuman Go performance way before anyone thought that might happen. AI benchmarks were being cleared at a pace which felt breathtaking at the time, papers were proudly published, and ML tools like Tensorflow (released in 2015) were coming online. To people already interested in AI, it was an exciting era. To everyone else, the world was unchanged.
Now Saturday Night Live sketches use sober discussions of AI risk as the backdrop for their actual jokes, there are hundreds...
I don't believe that data is limiting because the finite data argument only applies to pretraining. Models can do self-critique or be objectively rated on their ability to perform tasks, and trained via RL. This is how humans learn, so it is possible to be very sample-efficient, and currently a small proportion of training compute is RL.
If the majority of training compute and data are outcome-based RL, it is not clear that the "Playing human roles is pretty human" section holds, because the system is not primarily trained to play human roles.
This comes from a podcast called 18Forty, of which the main demographic of Orthodox Jews. Eliezer's sister (Hannah) came on and talked about her Sheva Brachos, which is essentially the marriage ceremony in Orthodox Judaism. People here have likely not seen it, and I thought it was quite funny, so here it is:
https://18forty.org/podcast/channah-cohen-the-crisis-of-experience/
David Bashevkin:
So I want to shift now and I want to talk about something that full disclosure, we recorded this once before and you had major hesitation for obvious reasons. It’s very sensitive what we’re going to talk about right now, but really for something much broader, not just because it’s a sensitive personal subject, but I think your hesitation has to do with what does this have to do with the subject at hand?...
From the title I expected this to be embarrassing for Eliezer, but that was actually extremely sweet, and good advice!
Are people in rich countries happier on average than people in poor countries? (According to GPT-4, the academic consensus is that it does, but I'm not sure it's representing it correctly.) If so, why do suicide rates increase (or is that a false positive)? Does the mean of the distribution go up while the tails don't or something?
The next monthly discussion meetup is Saturday, May 4 @ 2 PM (see below for location etc.). For this meetup, we’ll be discussing the relationship between social media and mental health, with a focus on arguments by psychologist Jonathan Haidt (particularly from his new book) and some criticisms of his interpretation of the evidence. There are readings, a podcast, and a video below, BUT feel free to come whether or not you’ve reviewed any of it, as always. (NOTE: The June topic/discussion meetup will be moved to Saturday, June 8 @ 2 PM)
People have been posting great essays so that they're "fed through the standard LessWrong algorithm." This essay is in the public domain in the UK but not the US.
From a very early age, perhaps the age of five or six, I knew that when I grew up I should be a writer. Between the ages of about seventeen and twenty-four I tried to abandon this idea, but I did so with the consciousness that I was outraging my true nature and that sooner or later I should have to settle down and write books.
I was the middle child of three, but there was a gap of five years on either side, and I barely saw my father before I was eight. For this and other reasons I...
The theories are probably just rationalizations anyway.