This post is a not-so-secret analogy for the AI alignment problem. Via a fictional dialogue, Eliezer explores and answers common questions about the Rocket Alignment Problem as approached by the Mathematics of Intentional Rocketry Institute.
MIRI researchers will tell you they're worried that "right now, nobody can tell you how to point your rocket’s nose such that it goes to the moon, nor indeed any prespecified celestial destination."
TL;DR:
Does buying shorter-term OTM derivatives each year not work here?
Specific examples would be nice. Not sure if I understand correctly, but I imagine something like this:
You always choose A over B. You have been doing it for such a long time that you forgot why. Without reflecting on this directly, it just seems like there probably is a rational reason or something. But recently, either accidentally or by experiment, you chose B... and realized that experiencing B (or expecting to experience B) creates unpleasant emotions. So now you know that the emotions were the real cause of choosing A over B all that time.
(This is p...
About a year ago I decided to try using one of those apps where you tie your goals to some kind of financial penalty. The specific one I tried is Forfeit, which I liked the look of because it’s relatively simple: you set single tasks and then verify with a photo that you have completed them.
I’m generally pretty sceptical of productivity systems, tools for thought, mindset shifts, life hacks and so on. But I have found this one shockingly effective; it has been about the biggest positive change to my life that I can remember. I feel like the category of things which benefit from careful planning and execution over time has completely opened up to me, whereas previously things like this would be largely down to the...
I like comments about other users' experiences for reasons similar to why I like the OP. I think maybe the ideal such comment would identify itself more clearly as an experience report, but I'd rather have the report than not.
We are trying our best to honor mana donations!
If you are inactive, you have until the end of the year to donate at the old rate. If you want to donate all your investments without having to sell each individually, we are offering you a loan to do that.
We removed the charity cap of $10k in donations per month, which goes beyond what we previously communicated.
N.B. This is a chapter in a planned book about epistemology. Chapters are not necessarily released in order. If you read this, the most helpful comments would be on things you found confusing, things you felt were missing, threads that were hard to follow or seemed irrelevant, and otherwise mid to high level feedback about the content. When I publish I'll have an editor help me clean up the text further.
In the previous three chapters we broke apart our notions of truth and knowledge by uncovering the fundamental uncertainty contained within them. We then built back up a new understanding of how we're able to know the truth that accounts for our limited access to certainty. And while it's nice to have this better understanding, you might...
Author's note: This chapter took a really long time to write. Unlike previous chapters in the book, this one covers a lot more stuff in less detail, but I still needed to get the details right. So it took a long time both to figure out what I really wanted to say and to make sure I wasn't saying things I would regret on reflection, either because they rested on claims I don't believe or because I had simply gotten them wrong.
It's likely still not the best version of this chapter it could be, but at this point I think I've made all the key points I wanted to make here, so I'm publishing the draft now and expect this one to need a lot of love from an editor later on.
The history of science has tons of examples of the same thing being discovered multiple times independently; Wikipedia has a whole list of examples here. If your goal in studying the history of science is to extract the predictable/overdetermined component of humanity's trajectory, then it makes sense to focus on such examples.
But if your goal is to achieve high counterfactual impact in your own research, then you should probably draw inspiration from the opposite: "singular" discoveries, i.e. discoveries which nobody else was anywhere close to figuring out. After all, if someone else would have figured it out shortly after anyways, then the discovery probably wasn't very counterfactually impactful.
Alas, nobody seems to have made a list of highly counterfactual scientific discoveries, to complement Wikipedia's list of multiple discoveries.
To...
What you probably mean is "completely unexpected", "surprising" or something similar
I think it means the more specific "a discovery that if it counterfactually hadn't happened, wouldn't have happened for a long time". I think this is roughly the "counterfactual" in "counterfactual impact", but I agree not the more widespread one.
It would be great to have a single word for this that was clearer.
Notably, the mainline approach for catching doesn't involve any internals usage at all, let alone labeling a bunch of things.
This was indeed my impression (except for potentially using steering vectors, which I think are mentioned in one of the sections in 'Catching AIs red-handed'), but I think not using any internals might be overconservative / might increase the monitoring / safety tax too much (I think this is probably true more broadly of the current control agenda framing).
Concerns over AI safety and calls for government control over the technology are highly correlated, but they should not be.
There are two major forms of AI risk: misuse and misalignment. Misuse risks come from humans using AIs as tools in dangerous ways. Misalignment risks arise if AIs take their own actions at the expense of human interests.
Governments are poor stewards for both types of risk. Misuse regulation is like the regulation of any other technology. There are reasonable rules that the government might set, but omission bias and incentives to protect small but well organized groups at the expense of everyone else will lead to lots of costly ones too. Misalignment regulation is not in the Overton window for any government. Governments do not have strong incentives...
So, it sounds like you'd be in favor of a 1-year pause or slowdown then, but not a 10-year?
That depends on the benefits that we get from a 1-year pause. I'd be open to the policy, but I'm not currently convinced that the benefits would be large enough to justify the costs.
Also, I object to your side-swipe at longtermism
I didn't side-swipe at longtermism, or try to dunk on it. I think longtermism is a decent philosophy, and I consider myself a longtermist in the dictionary sense as you quoted. I was simply talking about people who aren't "fully committed" to the (strong) version of the philosophy.
I’m pretty new here so apologies if this is a stupid question or if it has been covered before. I couldn’t find anything on this topic so thought I’d ask the question before writing a full post on the idea.
If we believe that discomfort can be quantified and ‘stacked’ (e.g. X people with specks of dust in their eye = 1 death), is there any reason why this has to scale linearly from all perspectives?
What if the total can be less than the sum of its parts depending on the observer?
Picture a dynamic logarithmic scale of discomfort stacking with a ‘hard cap’ where every new instance contributes less and less to the total to the point of flatlining on a graph.
Each discrete level of discomfort has a...
Picture a dynamic logarithmic scale of discomfort stacking with a ‘hard cap’ where every new instance contributes less and less to the total to the point of flatlining on a graph.
Reality is structured such that there tend to be an endless number of (typically very complicated) ways of increasing a probability by a tiny amount. The problem with putting a hard cap on the desirability of some need or want is that, once the cap is reached, the agent will completely disregard that need or want in order to affect the probability of a need or want that is not capped (e.g., the need to avoid people being tortured), even if that effect is extremely small.
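To make the objection concrete, here is a rough sketch, not anything from the original question. It assumes a saturating exponential as a stand-in for the "logarithmic scale with a hard cap", and the constants `CAP`, `SCALE`, and `TORTURE_COST` are made-up numbers chosen only for illustration.

```python
import math

CAP = 100.0          # hypothetical hard cap on total discomfort from dust specks
SCALE = 1e6          # hypothetical number of specks at which the curve is ~63% saturated
TORTURE_COST = 1e6   # hypothetical, uncapped disutility of one person being tortured


def capped_total(specks: int) -> float:
    """Total discomfort from `specks` dust specks; approaches CAP but never exceeds it."""
    return CAP * (1.0 - math.exp(-specks / SCALE))


def marginal(specks: int) -> float:
    """Extra discomfort contributed by one more speck on top of `specks`."""
    return capped_total(specks + 1) - capped_total(specks)


def expected_disutility(specks: int, p_torture: float) -> float:
    """Capped speck discomfort plus an uncapped, probability-weighted torture term."""
    return capped_total(specks) + p_torture * TORTURE_COST


# Option A: a billion specks, baseline torture probability.
option_a = expected_disutility(10**9, 1e-3)
# Option B: a billion MORE specks, torture probability lowered by one in a trillion.
option_b = expected_disutility(2 * 10**9, 1e-3 - 1e-12)

print("marginal speck near zero:", marginal(0))
print("marginal speck at saturation:", marginal(10**9))
print("agent prefers B (more specks):", option_b < option_a)
```

Once the speck term has flatlined, a billion additional specks cost this agent nothing, so it accepts them in exchange for an arbitrarily small reduction in the uncapped term. Whether that behavior is a bug or the intended result depends on the view being defended; the sketch only shows that the cap implies it.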