Traditional economic thinking has two strong principles, each based on abundant historical data:
The context was: Principle (A) makes a prediction (“…human labor will retain a well-paying niche…”), and Principle (B) makes a contradictory prediction (“…human labor…will become so devalued that we won’t be able to earn enough money to afford to eat…”).
Obviously, at least one of those predictions is wrong. That’s what I said in the post.
So, which one is wrong? I wrote: “I have opinions, but that’s out-of-scope for this little post.” But since you’re asking, I actually agree with you! E.g. footnote here:
...“But what about comparative advantage?” you say. Well
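(As a refresher on the comparative-advantage logic behind Principle (A), here is the standard textbook arithmetic; the numbers are made up for illustration and are not from the original post.)

$$\text{AI: } 100\,X \text{ or } 100\,Y \text{ per hour};\qquad \text{Human: } 2\,X \text{ or } 1\,Y \text{ per hour}.$$

The AI is absolutely better at both goods, but producing one $X$ costs the AI a full $Y$ of forgone output, while it costs the human only half a $Y$, so in the textbook model the human specializes in $X$ and trade benefits both parties. Note, though, that the argument only says trade happens at *some* positive wage; it says nothing about whether that wage is enough to live on, which is exactly where Principle (B) pushes back.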
My median expectation is that AGI[1] will be created 3 years from now. This has implications for how to behave, and I will share some useful thoughts I and others have had on how to orient to short timelines.
I’ve led multiple small workshops on orienting to short AGI timelines and compiled the wisdom of around 50 participants (but mostly my own thoughts) here. I’ve also participated in multiple short-timelines AGI wargames and co-led one.
This post will assume a median AGI timeline of 2027 and will not spend time arguing for that point. Instead, I focus on what the implications of three-year timelines would be.
I didn’t update much on o3 (as my timelines were already short), but I imagine some readers did and might feel disoriented now. I hope...
Nuclear warnings have been somewhat overused by some actors in the past, such that there's a credible risk of someone calling the bluff and continuing research in secret, knowing that they will certainly get another warning first rather than an immediate nuclear response.
If you have intelligence indicating secret ASI research but the other party denies it, at what point do you fire the nukes?
I expect they would be fired too late, after many months of final warnings.
In this post we’ll be looking at the recent paper “Learning to grok: Emergence of in-context learning and skill composition in modular arithmetic tasks” by He et al. This post is partially a sequel to my earlier post on grammars and subgrammars, though it can be read independently. There will be a more technical part II.
I really like this paper. I tend to be pretty picky about papers, and find something to complain about in most of them (this will probably come up in future). I don’t have nitpicks about this paper. Every question that came up as I was reading and understanding this paper (other than questions that would require a significantly different or larger experiment, or a different slant of analysis) turned out to be answered in...
Doomimir: Humanity has made no progress on the alignment problem. Not only do we have no clue how to align a powerful optimizer to our "true" values, we don't even know how to make AI "corrigible"—willing to let us correct it. Meanwhile, capabilities continue to advance by leaps and bounds. All is lost.
Simplicia: Why, Doomimir Doomovitch, you're such a sourpuss! It should be clear by now that advances in "alignment"—getting machines to behave in accordance with human values and intent—aren't cleanly separable from the "capabilities" advances you decry. Indeed, here's an example of GPT-4 being corrigible to me just now in the OpenAI Playground:
Doomimir: Simplicia Optimistovna, you cannot be serious!
Simplicia: Why not?
Doomimir: The alignment problem was never about superintelligence failing to understand human values. The genie knows,...
And yet it behaves remarkably sensibly. Train a one-layer transformer on 80% of possible addition-mod-59 problems, and it learns one of two modular addition algorithms, which perform correctly on the remaining validation set. It's not a priori obvious that it would work that way! There are other possible functions compatible with the training data.
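(For concreteness, here is a minimal sketch of that kind of experiment. It is illustrative only: the model size, optimizer settings, and number of steps are guesses, not the exact setup from any particular paper.)

```python
# Minimal sketch: train a small one-layer transformer on 80% of all
# addition-mod-59 problems and check generalization on the held-out 20%.
# Hyperparameters (width, weight decay, steps) are illustrative guesses.
import torch
import torch.nn as nn

P = 59
pairs = torch.cartesian_prod(torch.arange(P), torch.arange(P))  # all (a, b) pairs
labels = (pairs[:, 0] + pairs[:, 1]) % P                        # targets: (a + b) mod 59
perm = torch.randperm(len(pairs))
n_train = int(0.8 * len(pairs))
train_idx, val_idx = perm[:n_train], perm[n_train:]

class OneLayerTransformer(nn.Module):
    def __init__(self, p=P, d=128, heads=4):
        super().__init__()
        self.embed = nn.Embedding(p, d)
        self.pos = nn.Parameter(0.02 * torch.randn(2, d))  # learned positional embeddings
        self.block = nn.TransformerEncoderLayer(
            d_model=d, nhead=heads, dim_feedforward=4 * d,
            dropout=0.0, batch_first=True)
        self.unembed = nn.Linear(d, p)

    def forward(self, x):                  # x: (batch, 2) token ids
        h = self.embed(x) + self.pos
        h = self.block(h)
        return self.unembed(h[:, -1])      # predict from the last position

model = OneLayerTransformer()
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

for step in range(20_000):                 # grokking-style runs often need many steps
    opt.zero_grad()
    loss = loss_fn(model(pairs[train_idx]), labels[train_idx])
    loss.backward()
    opt.step()
    if step % 1_000 == 0:
        with torch.no_grad():
            val_acc = (model(pairs[val_idx]).argmax(-1) == labels[val_idx]).float().mean()
        print(f"step {step}: train loss {loss.item():.3f}, val acc {val_acc.item():.3f}")
```

In reported runs of setups like this, the network eventually reaches near-perfect accuracy on the held-out pairs (sometimes only after a long "grokking" delay), which is the empirical observation being pointed at here.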
Seems like Simplicia is missing the worrisome part--it's not that the AI will learn a more complex algorithm which is still compatible with the training data; it's that the simple...
In this short post we'll discuss fine-grained variants of the law of large numbers beyond the central limit theorem. In particular we'll introduce cumulants as a crucial (and very nice) invariant of probability distributions to track. We'll also briefly discuss parallels with physics. This post should be interesting on its own, but the reason I'm writing it is that this story contains a central idea behind (one view of) a certain exciting physics-inspired point of view on neural nets. While that point of view has so far been explained in somewhat sophisticated physics language (involving quantum fields and Feynman diagrams), the main points can be explained without any physics background, purely in terms of statistics. Introducing this "more elementary" view on the subject is one...
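(As a quick, standard reminder, not specific to this post: the cumulants $\kappa_m(X)$ of a random variable $X$ are the Taylor coefficients of the log of its moment generating function, and unlike moments they are additive over independent sums.)

$$\log \mathbb{E}\,e^{tX}=\sum_{m\ge 1}\kappa_m(X)\,\frac{t^m}{m!},\qquad \kappa_m(X+Y)=\kappa_m(X)+\kappa_m(Y)\ \text{ for independent } X,Y.$$

Here $\kappa_1$ is the mean and $\kappa_2$ the variance; for a normalized sum $S_n/\sqrt{n}$ of $n$ i.i.d. centered variables, additivity and scaling give $\kappa_m(S_n/\sqrt{n}) = n^{1-m/2}\,\kappa_m(X_1)$, so every cumulant beyond the variance decays as $n\to\infty$. That is one way to see the central limit theorem, and tracking the rate of that decay is the kind of fine-grained refinement in question.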
Q: How can I use LaTeX in these comments? I tried to follow https://www.lesswrong.com/tag/guide-to-the-lesswrong-editor#LaTeX but it does not seem to render.
Here is the simplest case I know, which is a sum of dependent identically distributed variables. In physical terms, it is about the magnetisation of the 1d Curie-Weiss (= mean-field Ising) model. I follow the notation of the paper https://arxiv.org/abs/1409.2849 for ease of reference; this is roughly Theorem 8 + Theorem 10:
Let $M_n=\sum_{i=1}^n \sigma(i)$ be the sum of $n$ dependent Bernoulli rando...
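(For reference, and with the caveat that the cited paper may use a different normalization or an external field: a standard way to write the Curie-Weiss / mean-field Ising Gibbs measure on spin configurations $\sigma \in \{-1,+1\}^n$ at inverse temperature $\beta$ is)

$$\mathbb{P}_{n,\beta}(\sigma)=\frac{1}{Z_{n,\beta}}\exp\!\Big(\frac{\beta}{2n}\,M_n^2\Big),\qquad M_n=\sum_{i=1}^n \sigma(i),$$

so the spins are exchangeable but not independent, and the magnetisation $M_n$ is exactly the kind of dependent sum these limit theorems describe.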
Epistemic status -- sharing rough notes on an important topic because I don't think I'll have a chance to clean them up soon.
Suppose a human used AI to take over the world. Would this be worse than AI taking over? I think plausibly:
I don't think that the current Claude would act badly if it "thought" it controlled the world - it would probably still play the role of the nice character that is defined in the prompt.
If someone plays a particular role in every relevant circumstance, then I think it's OK to say that they have simply become the role they play. That is simply their identity; it's not merely a role if they never take off the mask. The alternative view here doesn't seem to have any empirical consequences: what would it mean to be separate from a role that one reliably plays i...
We have contact details and can send emails to 1500 students and former students who've received hard-cover copies of HPMOR (and possibly Human Compatible and/or The Precipice) because they've won international or Russian olympiads in maths, computer science, physics, biology, or chemistry.
This includes over 60 IMO and IOI medalists.
This is a pool of potentially extremely talented people, many of whom have read HPMOR.
I don't have the time to do anything with them, and people in the Russian-speaking EA community are all busy with other things.
The only thing that ever happened was an email sent to some kids still in high school about the Atlas Fellowship, and a couple of them became fellows.
I think it could be very valuable to alignment-pill these people. I think for most researchers...
This is probably less efficient than other uses, and it's in the direction of spamming people with these books. If they're everywhere, I might be less interested if someone offers to give me one because I won a math competition.
This post is to record the state of my thinking at the start of 2025. I plan to update these reflections in 6-12 months depending on how much changes in the field of AI.
It is best not to pause AI progress until at least one major AI lab achieves a system capable of providing approximately a 10x productivity boost for AI research, including performing almost all tasks of an AI researcher. Extending the time we remain in such a state is critical for ensuring positive outcomes.
If it were possible to stop AI progress sometime before that and focus just on mind uploading, that would be preferable; however, I don’t think that is feasible in the current world. Alignment work before such a state suffers from diminishing...
I think the problem with WBE is that anyone who owns a computer and can decently hide it (or fly off in a spaceship with it) becomes able to own slaves, torture them and whatnot. So after that technology appears, we need some very strong oversight - it becomes almost mandatory to have a friendly AI watching over everything.
This is a low-effort post. I mostly want to get other people’s takes and express concern about the lack of detailed and publicly available plans so far. This post reflects my personal opinion and not necessarily that of other members of Apollo Research. I’d like to thank Ryan Greenblatt, Bronson Schoen, Josh Clymer, Buck Shlegeris, Dan Braun, Mikita Balesni, Jérémy Scheurer, and Cody Rushing for comments and discussion.
I think short timelines, e.g. AIs that can replace a top researcher at an AGI lab without losses in capabilities by 2027, are plausible. Some people have posted ideas on what a reasonable plan to reduce AI risk for such timelines might look like (e.g. Sam Bowman’s checklist, or Holden Karnofsky’s list in his 2022 nearcast), but I find them insufficient for...
If I had more time, I would have written a shorter post ;)