Person-affecting view

I want to see if I’m cut out to do nontrivial independent alignment research, and I want to get an answer as quickly as possible. The best way to do that is to waste everyone’s time (that is, publicly lay down my maps) and hope that someone out there will feel white-hot intransigent rage at someone being wrong on the internet and correct me.

Why alignment research?

That is, why not work on anything else like biorisk? Or some other generic longtermist cause?

The God of Power Laws has decreed that most human endeavors be Pareto-distributed. In a high-dimensional world, that means most of the important work and most of the disproportionate outcomes will come from carving niches instead of just doggedly pursuing areas other people are already exploiting.

My physics + math + CS background might make me seem like I’m just another median LessWrong nerd, but I have the unique experience of having been particularly bad at them. Even so, I would like to think that I tend to be wrong in interesting ways, mostly because I’m a wordcel who keeps clawing at shape-rotating doors he shouldn’t be clawing at[1][2].

(The real reason is that a physics degree basically gives you all the math you need to start reading ML papers, and then some. Switching to biology might take years, even though I performed really well in my bio electives in uni. Plus, LessWrong has been my constant bedside reading since 2009, so like Ben Pace who knows me by another name.)

I hesitate to write all this because it’s deeply embarrassing. I was a pretty good clicker. I took the lesson of That Alien Message seriously and purposely read only a quarter of the Sequences so I could fill in the rest myself. When I was 12 I knew there was a nontrivial chance I wouldn’t get to live in a transhumanist spacefaring hypertechnological utopia because of our unchecked hubris. And yet, I spent the last fifteen years of my life optimising for my own happiness, learning random irrelevant things like music and storytelling, founding a VR company and failing hard, and gaining people-skills and people-experiences far in excess of what anyone working on alignment would ever need.

And yet, I ended up here, staring at the jaws of creation woefully unarmed.

Looking back, the biggest thing that turned me off the whole idea of becoming an AI safety researcher was, to a first order, everyone’s favourite cope, i.e., that I wasn’t smart enough to meaningfully contribute to alignment. In my case, however, that hypothesis remains my leading candidate, and not for lack of trying (to refute its underlying generator, that no one without IMO-level math skills is allowed to try). It’s just that I really, really find short-timeline arguments convincing, to the point that I have stopped trying to optimise for raising genius children even though I have dreamed of doing it since I was in fourth grade.

Taking John Wentworth’s guide seriously means working with what I have, right now, instead of shoring up defenses around my weak spots. My impression is that there is a dangerous lack of wordcels in alignment research and thus the need for programs like the CAIS Philosophy Fellowship and PIBBS, and if that’s not the case then most of the marginal impact I would have had working on conceptual alignment directly will basically vanish. Of course, fleshing out exactly why I expect to be able to carve out a niche for myself in such a hotly-contested area should be done, but more on that later.

Why independent?

Mostly because of my visa situation. I have a particularly weak passport, and some bad decisions I made in my younger years have made it difficult to rectify the situation[3]. In particular, the most viable way for me to get into Berkeley is to spend 2-3 years getting a master’s degree and using that to get residency in, say, Canada, where it would be significantly easier for me to make trips southward. I think that’s time I could just spend working on alignment proper[4].

So this is my main physical constraint: what I must do, I can only do within the confines of Southeast Asia and the internet for the foreseeable future.

Q: It doesn’t sound too bad. I mean, most of the stuff you need is on the internet, right?

Wrong. Conditional on The PhD Grind being an accurate look at academic research in general, and alignment work converging to similar patterns, anyone who isn’t physically present in the right offices is forever relegated to already-distilled, already-cleaned-up versions of arguments and hypotheses. Most research happens behind the scenes, outside the confines of PDFs and .edu webpages. No lunch breaks with colleagues means no Richard Hammings to fearlessly question your entire research agenda on a whim. No watercooler conversations means you lose out on things like MATS.

Which also means you can pretty much avoid information cascades and are thus slightly better positioned to find novel research lines in a pre-paradigmatic field[5]. :P

Okay, I don’t think this is strictly the case. If I am unable to solve this geography problem within the next five years, I think my potential impact will be cut by at least half. No one can singlehandedly breed scenius. I and all the others like me are in a Milanese Leonardo situation, and unfortunate as it is, it’s an inescapable test of agency that we must all pass one way or another. Either that, or figure out a way to escape the hard limit of presence.

Why nontrivial?

If I’m being honest, I would rather not work on this.

I think I speak for a lot of newcomers in this field when I say that, while thinking all day about Big Questions like what really goes on in a mind or how we can guarantee human flourishing sounds like a supremely attractive lifestyle, actually weighing it as an option against all the other supremely attractive things in life is tough. Most of us can probably go on to do great things in other lines of work, and while funding in this space is still growing steadily, there is a real chance that only a small minority of us will make the cut and actually make a living out of this.

A meme that’s currently doing the rounds in my circles is that there are only ~300 people working on AI safety at the moment. Taken at face value, that seems like a horrifyingly low number given the stakes we’re dealing with. But a cursory check tells me there were only 611 scientists involved in both the US and German programs during the Manhattan Project. Sure, our researchers aren’t as highly selected as the 1927 Solvay Conference, but do we really think that adding more people to the mix is the best way to climb the logistic success curve?

Here’s what I think: if you’re like me, then you’re doing this because the alternative is sickening. Decades of reading about smack-talking space detectives or wizards singlehandedly starting Industrial Revolutions are never gonna let us party while the rest of the world careens off a cliff. How can you? How can anyone? The only institution our civilisation has managed to produce that’s even remotely attempting to solve the problem is trying to justify to itself that we should take our time more than we already have. Who but the innocent can hide thine awful countenance?

If I decide against doing this after all, I would, for the rest of my days, stay up at night in a cold sweat trying to convince myself I hadn’t consigned my loved ones to their early deaths. My only consolation is that, on the off-chance we win, no one will die anymore and I could grapple with the shame of not having done anything to help until the last photon in the universe gets redshifted to oblivion.

And by the gods, I wish we’d all be around to see that.

  1. Last time I checked my quantitative ability lags behind my verbal ability by 2.5 σ. ↩︎

  2. I also spent the last three years hanging out with postrats on Twitter where terms like ‘wordcel’ and ‘shape rotator’ just float uncontested in the water supply. FYI they mean “high verbal skill” and “high math skill” respectively. Yes, IQ deltas are more important than subfactor differences, don’t @ me. ↩︎

  3. That is, I didn’t prioritise my GPA. I didn’t optimise for international competitions and/or model UN-type events, which would have given me access to a B1/B2 visa. This is not a justification, but the main reason I only half-assedly tried to fix my situation was that I didn’t know I’d be doing this whole thing after all. Yes, I know I have developed an Ugh Field around this whole thing, and part of working on it is publicly acknowledging it, as I’m doing here. ↩︎

  4. Okay, there are actually other paths but this one’s the surest. I could go join an SF startup and try my hand at the H1B lottery. I could give up on the US and optimise for London (but they have a similarly labyrinthian immigration process). There are several options, but they all take time and money and energy which I’d have to redirect from my actual work. The next-best choice really would be to ignore short timelines and just salarymaxx until I have enough money to bulldoze over the polite requirements of immigration bureaus. ↩︎

  5. We’re not really pre-paradigmatic so much as in a the-promising-paradigms-do-not-agree-with-each-other state, right? ↩︎

FYI they mean “high verbal skill” and “high math skill” respectively

I always parsed it as wordcel=insufficient training data to form connected mental representation, shape rotator=sufficient training data to form connected mental representation.

I would not call PIBBS an unconnected-words group.