A freshman year during the AI midgame: my approach to the next year

[-]TekhneMakre3y3725

I think there's something wrong with your categories: they're all about social perception. There's some reason for these to be correlated with the reality, but not that strong of a reason. People can be confused in either direction about what sort of AI is coming soon, and confusing people's sense of what sort of AI is coming soon with what actual AI is coming soon would suggest bad plans.

[-]Jonas Hallgren3y1616

I feel like this is trying to say something important but my brain isn't parsing it.

First and foremost, what categorisation are we talking about? Secondly, in what way are the categories framed in terms of social perception? Thirdly, what do you mean by direction and how does Buck confuse the direction?

(Sorry if this is obvious)

[-]TekhneMakre3y1612

Hopefully this isn't too rude to say, but: I am indeed confused how you could be confused. Maybe there's some mental block for you, which would be interesting. Anyway, to answer your questions:

First and foremost, what categorisation are we talking about?

The main categorization in the post, of course. Quoting:

I want to split the AI timeline into the following categories.

The early game, during which interest in AI is not mainstream. I think this ended within the last year.

The midgame, during which interest in AI is mainstream but before AGI is imminent. [....]

The endgame, during which AI companies conceive of themselves as actively building models that will imminently be transformative, and that pose existential takeover risk. [...]

Your Q:

Secondly, in what way are the categories framed in terms of social perception?

AFAICT the only condition here that isn't about the stories people are telling is in the midgame, "but before AGI is imminent". Everything else is "interest in...", "interest in...", "conceive of themselves...".

Thirdly, what do you mean by direction and how does Buck confuse the direction?

People can think AGI will come soon when it doesn't, or think it won't when it will, and this can happen for any value of "AGI". Buck seems to be making plans based on stages centered around social perception / narrative rather than what's actually happening in terms of what actual AI stuff there is (big piles of data and compute, algorithms, etc).

[-]Quadratic Reciprocity3y1412

Hopefully this isn't too rude to say, but: I am indeed confused how you could be confused

Fwiw, I was also confused and your comment makes a lot more sense now. I think it's just difficult to convert text into meaning sometimes.

[-]TekhneMakre3y91

Ok, thanks for the data, updating some.

[-]Buck3y8-21

This is a reasonable point. What I actually care about is reality, but I expect social reality to track reality fairly well on these points.

[-]ryan_greenblatt10mo253Review for 2023 Review

I still like this post overall, but various things have changed that interestingly affect the content of the post:

We'd now be just starting the 3rd year of university in Buck's analogy. Does this seem right? I guess maybe. It feels a bit late to me. (Maybe I feel more like a 2nd year.)
Redwood is doing less blue-sky research and is much more focused on how to make very straightforward strategies work well, particularly in the context of control. We're also spending more time thinking about exactly what will and should be implemented and what overall plans should look like. We're also relatively more excited about currently working on improving society's understanding of risks with relatively-naturalistic model organisms and capability demos (or demonstrating negative results here).
4 years (from the time of this comment being written) is now pretty close to my median for "models have been built which are pretty clearly transformative or at least nearly there". I also put substantially more weight on stuff getting crazy this year or next year. This is partially due to the passage of time and partially due to updating toward shorter timelines. So, I'm less sure that new people joining the field should relate to the situation as a "freshman" (in the analogy Buck proposes).
Redwood is interacting with AI companies substantially more, though mostly from the perspective of advising on policies and pitching research, rather than on helping with implementation.

Feeling the time pressure of short timelines seems more relevant now than ever before, so the advice in this post about taking a more measured approach and feeling less rushed seems quite relevant, at least as long as transformative AI still seems to most likely be more than 4 years away.

[-]Adam Kaufman3y131

I am a literal freshman, and not feeling super optimistic about the future right now. How should I think about how to spend my time?

[-]Greg C3y52

Advocate for a global moratorium on AGI. Try and buy (us all) more time. Learn the basics of AGI safety (e.g. AGI Safety Fundamentals) so you are able to discuss the reasons why we need a moratorium in detail. YMMV, but this is what I'm doing as a financially independent 42 year-old. I feel increasingly like all my other work is basically just rearranging deckchairs on the Titanic.

[-]Matthew_Opitz3y40

In a similar vein, I'm an historian who teaches as an adjunct instructor. While I like my job, I am feeling more and more like I might not be able to count on this profession to make a living over the long term due to LLMs making a lot of the "bottom-rung" work in the social sciences redundant. (There will continue to be demand for top-notch research work for a while longer because LLMs aren't quite up to that yet, but that's not what I do currently).

Would there be any point in someone like me going back to college to get another 4-year degree in computer science at this moment? Or is that field just as at-risk of being made technologically-obsolete (especially the bottom rungs of the ladder)? Perhaps I should remain as an historian where, since I have about 10 years of experience in that field, I'm at least on the middle rungs of the ladder and might escape technological obsolescence if AGI gobbles up the bottom rungs.

And let's say I did get a computer science degree, or even did some sort of more-focused coding boot camp type of thing. By the time I finished my training, would my learning even remain relevant, or are things already moving too quickly to make bottom-rung coding knowledge useful?

Let's say I didn't care about making a living and just wanted to maximize my contributions to AI alignment. Would I be of more use to AI alignment by continuing my "general well-rounded public intellectual education" as an historian (especially one who dabbles in adjacent fields like economics and philosophy probably more than average), or would I be able to make greater contributions to AI alignment by becoming more technically proficient in computer science?

[-]Jayson_Virissimo3y3-3

FWIW, if my kids were freshmen at a top college, I would advise them to continue schooling, but switch to CS and take every AI-related course that was available if they hadn't already done so.

[-]jacquesthibs3y41

Regarding thinking about what to do in the endgame:

Having a bunch of practice at thinking about AI alignment in principle, which might be really useful for answering difficult-to-empirically-resolve questions about the AIs being trained.

Being well-prepared to use AI cognitive labor to do something useful, by knowing a lot about some research topic that we end up wanting to put lots of AI labor into. Maybe you could call this “preparing to be a research lead for a research group made up of AIs”. Or “preparing to be good at consuming AI research labor”.

That nicely puts into words how I’m partially planning my “accelerating alignment with language models” agenda. I hope to come up with something that allows all alignment researchers to do the above with minimal friction and setup, and with obvious benefit.

[-]Quadratic Reciprocity3y32

Thanks for posting this. It's insightful reading other people thinking through career/life planning of this type.

Am curious about how you feel about the general state of the alignment community going into the midgame. Are there things you hoped you/alignment community had more of / achievable things that could have been different by the time the early game ended that would have been nice?

"I have a crazy take that the kind of reasoning that is done in generative modeling has a bunch of things in common with the kind of reasoning that is valuable when developing algorithms for AI alignment"

Cool!!

[-]Review Bot2y*10

The LessWrong Review runs every year to select the posts that have most stood the test of time. This post is not yet eligible for review, but will be at the end of 2024. The top fifty or so posts are featured prominently on the site throughout the year.

Hopefully, the review is better than karma at judging enduring value. If we have accurate prediction markets on the review results, maybe we can have better incentives on LessWrong today. Will this post make the top fifty?

The AI midgame

I want to split the AI timeline into the following categories.

The early game, during which interest in AI is not mainstream. I think this ended within the last year 😢

The midgame, during which interest in AI is mainstream but before AGI is imminent. During the midgame:

The AI companies are building AIs that they don’t expect will be transformative.
The alignment work we do is largely practice for alignment work later, rather than an attempt to build AIs that we can get useful cognitive labor from without them staging coups.
For the purpose of planning my life, I’m going to imagine this as lasting four more years. This is shorter than my median estimate of how long this phase will actually last.

The endgame, during which AI companies conceive of themselves as actively building models that will imminently be transformative, and that pose existential takeover risk.

During the endgame, I think that we shouldn’t count on having time to develop fundamentally new alignment insights or techniques (except maybe if AIs do most of the work? I don't think we should count on this); we should be planning to mostly just execute on alignment techniques that involve ingredients that seem immediately applicable.
For the purpose of planning my life, I’m going to imagine this as lasting three years. This is about as long as I expect this phase to actually take.

I think this division matters because several aspects of my current work seem like they’re optimized for midgame, and I should plausibly do something very differently in the endgame. Features of my current life that should plausibly change in the endgame:

I'm doing blue-sky alignment research into novel alignment techniques; during the endgame, it might be too late to do this.

I'm working at an independent alignment org and not interacting with labs that much. During the endgame, I probably either want to be working at a lab or doing something else that involves interacting with labs a lot. (I feel pretty uncertain about whether Redwood should dissolve during the AI endgame.)

I spend a lot of my time constructing alignment cases that I think are analogous to difficulties that we expect to face later. During the endgame, you probably have access to the strategy “observe/construct alignment cases that are obviously scary in the models you have”, which seems like it partially obsoletes this workflow.

I'm doing research that is practice rather than an actual attempt at aligning models or safely extracting cognitive labor from them. Some of the work I expect to want takeover-concerned people to do during the endgame is probably very practical/empirical. But I expect us to also want to do some difficult-to-empirically-ground work to answer questions like “How could this particular model be scarily misaligned? How might our alignment strategy have failed such that this particular model will try to kill us?”

One core question here is: How is my impact distributed between work I do in the midgame vs the endgame? (As in, how much of my career value do I expect to lose if I suddenly die at the end of the midgame?)

Midgame impact:

The main mechanism here is that I think I (as part of Redwood) have a shot at developing alignment techniques (or other knowledge) that make a serious difference to the alignment plan, that can then be adopted by AI labs with no further actions from me/Redwood.
Secondarily, I think we have a shot at developing an alignment research methodology and/or organizational structure that allows Redwood and maybe the broader alignment community to do much more good work during the midgame.
Other midgame impacts I can have:
- helping other people to gain skills such that they have a really big impact in the endgame.
- setting up an organization which lasts into the endgame and is really productive at doing something crucial.

Endgame impact:

I have been assuming that most of my impact was in the midgame, but after thinking about it more over the last few weeks I’m now less sold on that.
Mechanisms by which Redwood-style backchained alignment researchers might have a lot of impact in the endgame:
- Being an expert on some particular thing which is extremely useful when trying to align AIs or safely extract cognitive labor from them. When I’m thinking about this, I’m mostly trying to imagine the head of the alignment department at an AI lab being ecstatically delighted to be able to hire me, and saying “I am so glad we are able to work with Buck; it’s so incredibly convenient that he has been obsessively focused on this topic for the last four years and he’s already understood a bunch of the ins and outs; if it wasn’t for him we’d be yoloing stuff way more.”
  - This could be a class of techniques, e.g. interpretability or MAD.
  - This could be a certain aspect of the alignment situation, from a more technique-agnostic perspective. For example, it seems pretty plausible to me that when we’re using AI to make lots of autonomous decisions, we’ll have some limited budget to oversee these decisions such that we can afford to oversee like 0.1% of them, and we’d really like to be smart about which 0.1% to review so that we maximize the probability of catching bad actions. There are lots of different approaches you might use here; e.g. you probably want to try some kind of anomaly detection, and you probably also want to concentrate oversight on examples where the AI action was higher stakes. If I/some Redwood people obsessively focused on the version of this problem that we expect to happen with TAI, I can imagine us being incredible hires due to having so much relevant experience and cached thinking.
    - As another example, if I decided to just focus on anti-adversarial training, it’s easy to imagine being way more familiar with that problem than anyone at an AI lab.
- Having a bunch of practice at thinking about AI alignment in principle, which might be really useful for answering difficult-to-empirically-resolve questions about the AIs being trained.
- Being well-prepared to use AI cognitive labor to do something useful, by knowing a lot about some research topic that we end up wanting to put lots of AI labor into. Maybe you could call this “preparing to be a research lead for a research group made up of AIs”. Or “preparing to be good at consuming AI research labor”.
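
To make the oversight-budget example above a bit more concrete, here is a minimal Python sketch of one way to pick which decisions to review when only about 0.1% can be overseen. It is not a description of anything Redwood or any lab actually does. Everything in it is an assumption made for illustration: the `Decision` fields, the idea that anomaly and stakes scores are already available as numbers in [0, 1], the linear blend in `review_priority`, and the `stakes_weight` and `budget_per_mille` parameters.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    """One autonomous AI action that we could choose to oversee (hypothetical)."""
    decision_id: int
    anomaly_score: float  # assumed: higher means more unusual, in [0, 1]
    stakes: float         # assumed: higher means more costly if bad, in [0, 1]


def review_priority(d: Decision, stakes_weight: float = 0.5) -> float:
    """Blend anomaly and stakes into one score (an assumed heuristic)."""
    return (1.0 - stakes_weight) * d.anomaly_score + stakes_weight * d.stakes


def select_for_review(decisions, budget_per_mille: int = 1, stakes_weight: float = 0.5):
    """Return the highest-priority decisions that fit in the review budget.

    budget_per_mille=1 means at most 1 in 1000 decisions (0.1%) is reviewed.
    Integer arithmetic keeps the budget from being exceeded by rounding.
    """
    decisions = list(decisions)
    k = len(decisions) * budget_per_mille // 1000
    return heapq.nlargest(
        k, decisions, key=lambda d: review_priority(d, stakes_weight)
    )
```

With 2,000 logged decisions and the default budget, this returns the two decisions with the highest blended score. A real system would presumably need to worry about an AI that can predict and game the scoring rule, which a fixed top-k rule like this one does not address.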

Pacing: a freshman year

I think I want to treat my next year with the pacing of a freshman year in a US undergrad degree, for someone who wants to go into startups and thinks there’s some chance that they’ll want to graduate college early. I think that people going into their freshman year should be thinking a little bit about what they want to do after college. They should understand things that they need to do during college in order to be set up well for their post-college activities (e.g. they probably want to do some research as an undergrad, and they probably need to eventually learn various math). But meeting those requirements probably isn’t going to be where most of their attention goes.

Similarly, I think that I should be thinking a bit about my AI endgame plans, and make sure that I’m not failing to do fairly cheap things that will set me up for a much better position in the endgame. But I should mostly be focusing on succeeding during the midgame (at some combination of doing valuable research and at becoming an expert in topics that will be extremely valuable during the endgame).

When you’re a freshman, you probably shouldn’t feel like you’re sprinting all the time. You should probably believe that skilling up can pay off over the course of your degree. Every month is about 2% of your degree.

I think that this is how I want to feel. In a certain sense, four years is a really long time. I spent a reasonable amount of the last year feeling kind of exhausted and wrecked and rushed, and my guess is that this was net bad for my productivity. I think I should feel like there is real urgency, but also real amounts of space to learn and grow and play.

I went back and forth a lot on how I wanted to set up this metaphor; in particular, I was pretty tempted to suggest that I should think of this as a sophomore year rather than a freshman year. I think that freshmen should usually mostly ignore questions about career planning, whereas I think I should e.g. spend at least some time talking to labs about the possibility of them working with me/Redwood in various ways. I ended up choosing freshman rather than sophomore because I think that 3 years is less reasonable than 4.

And so, my plan is something like:

Put a bit of work into setting up my AI endgame plans.

E.g. talk to some people who are at labs and make sure they don’t think that my vague aspirations here are insane. I’m interested in more suggestions along these lines.
I think that if I feel more like I’ve deliberated once about this, I’ll find it easier to pursue my short-term plans wholeheartedly.

Mostly (like with 70% of my effort), push hard on succeeding at my midgame plans.

Spend about 20% of my effort on learning things that don’t have immediate benefits.

For example, I’ve spent some time over the last few weeks learning about generative modeling, and I plan to continue studying this. I have a few motivations here:
- Firstly, I think it’s pretty healthy for me to know more about how ML progress tends to happen, and I feel much more excited about this subfield of ML than most subfields of ML. I feel intuitively impressed by, and admiring of, the researchers in this field, and it seems healthy for me to have a research field with researchers who I look up to and who I wholeheartedly believe I can learn a lot from.
- Secondly, I have a crazy take that the kind of reasoning that is done in generative modeling has a bunch of things in common with the kind of reasoning that is valuable when developing algorithms for AI alignment.
