The other day I was speaking to one of the most productive people I’d ever met.[1] He was one of the top people in a very competitive field who was currently single-handedly performing the work of a team of brilliant programmers. He needed to find a spot to do some work, so I offered to help him find a desk with a monitor. But he said he generally liked working from his laptop on a couch, and he felt he was “only 10% slower” without a monitor anyway.
I was aghast. I’d been trying to optimize my productivity for years. A 10% productivity boost was a lot! Those things compound! How was this man, one of the most productive people I’d ever met, shrugging it off like it was nothing?
I think this nonchalant attitude towards productivity is fairly common in top researchers (though perhaps less so in top executives?). I have no idea why some people are so much more productive than others. It surprises me that so much variance is even possible.
This guy was smart, but I know plenty of people as smart as him who are far less productive. He was hardworking, but not insanely so. He wasn’t aggressively optimizing his productivity.[2] He wasn't that old so it couldn't just be experience. Probably part of it was luck, but he had enough different claims to fame that that couldn’t be the whole picture.
If I had to chalk it up to something, I guess I'd call it skill and “research taste”: he had a great ability to identify promising research directions and follow them (and he could just execute end-to-end on his ideas without getting lost or daunted, but I know how to train that).
I want to learn this skill, but I have no idea how to do it and I'm still not totally sure it's real. Conducting research obviously helps, but that takes time and is clearly not sufficient. Maybe I should talk to a bunch of researchers and try to predict the results of their work?
Has anyone reading this ever successfully cultivated an uncanny ability to identify great research directions? How did you do it? What sub-skills does it require?
Am I missing some other secret sauce that lets some people produce wildly more valuable research than others?
Measured by more conventional means, not by positive impact on the long-term future; that's dominated by other people. Making sure your work truly steers at solving the world's biggest problems still seems like the best way to increase the value you produce, if you're into that sort of thing. But I think this person's abilities would multiply/complement any benefits from steering towards the most impactful problems.
Or maybe he was but there are so many 2x boosts the 10% ones aren’t worth worrying about?
Fair enough. This doesn't seem central to my point so I don't really want to go down a rabbit-hole here. As I said originally "I’m picking this example not because it’s the best analysis of its kind, but because it’s the sort of analysis I think people should be doing all the time and should be practiced at, and I think it's very reasonable to produce things of this quality fairly regularly." I know this particular analysis surfaced some useful considerations others' hadn't thought of, and I learned things from reading it.
I also suspect you dislike the original analysis for reasons that stem from deep-seated worldview disagreements with Eric, not because the methodology is flawed.
The advice and techniques from the rationality community seem to work well at avoiding a specific type of high-level mistake: they help you notice weird ideas that might otherwise get dismissed and take them seriously. Things like AI being on a trajectory to automate all intellectual labor and perhaps take over the world, animal suffering, longevity, cryonics. The list goes on.
This is a very valuable skill and causes people to do things like pivot their careers to areas that are ten times better. But once you’ve had your ~3-5 revelations, I think the value of these techniques can diminish a lot.[1]
Yet a lot of the rationality community’s techniques and culture seem oriented around this one idea, even on small scales: people pride themselves on being relentlessly truth-seeking and willing to consider possibilities they flinch away from.
On the margin, I think the rationality community should put more empasis on skills like:
Performing simple cost-effectiveness estimates accurately
I think very few people in the community could put together an analysis like this one from Eric Neyman on the value of a particular donation opportunity (see the section “Comparison to non-AI safety opportunities”). I’m picking this example not because it’s the best analysis of its kind, but because it’s the sort of analysis I think people should be doing all the time and should be practiced at, and I think it's very reasonable to produce things of this quality fairly regularly.
When people do practice this kind of analysis, I notice they focus on Fermi estimates where they get good at making extremely simple models and memorizing various numbers. (My friend’s Anki deck includes things like the density of typical continental crust, the dimensions of a city block next to his office, the glide ratio of a hang glider, the amount of time since the last glacial maximum, and the fraction of babies in the US that are twins).
I think being able to produce specific models over the course of a few hours (where you can look up the glide ratio of a hang glider if you need it) is more neglected but very useful (when it really counts, you can toss the back of the napkin and use a whiteboard).
Simply noticing something might be a big deal is only the first step! You need to decide if it’s worth taking action (how big a deal is it exactly?) and what action to take (what are the costs and benefits of each option?). Sometimes it’s obvious, but often it isn’t, and these analyses are the best way I know of to improve at this, other than “have good judgement magically” or “gain life experience”.
Articulating all the assumptions underlying an argument
A lot of the reasoning I see on LessWrong feels “hand-wavy”: it makes many assumptions that it doesn’t spell out. That kind of reasoning can be valuable: often good arguments start as hazy intuitions. Plus many good ideas are never written up at all and I don’t want to make the standards impenetrably high. But I wish people recognized this shortcoming and tried to remedy it more often.
By "articulating assumptions” I mean outlining the core dynamics at play that seem important, the ways you think these dynamics work, and the many other complexities you’re ignoring in your simple model. I don’t mean trying to compress a bunch of Bayesian beliefs into propositional logic.
Contact with reality
It’s really really powerful to look at things directly (read data, talk to users, etc), design and run experiments, and do things in the world to gain experience.
Everyone already knows this, empiricism is literally a virtue of rationality. But I don’t see people employing it as much as they should be. If you’re worried about AI risk, talk to the models! Read raw transcripts!
Scholarship
Another virtue of rationality. It's in the sequences, just not as present in the culture as you might expect. Almost nobody I know reads enough. I started a journal club at my company and after nearly every meeting folks tells me how useful it is. I so often see so much work that would be much better if the authors engaged with the literature a little more. Of course YMMV depending on the field you’re in; some literature isn't worth engaging with.
Being overall skilled and knowledgeable and able to execute on things in the real world
Maybe this doesn’t count as a rationality skill per-se, but I think the meta skill of sitting down and learning stuff and getting good at it is important. In practice the average person reading this short form would probably be more effective if they spent their energy developing whatever specific concrete skills and knowledge were most blocking them.
This list is far from complete.[2] I just wanted to gesture at the general dynamic.
They’re still useful. I could rattle off a half-dozen times this mindset let me notice something the people around me were missing and spring into action.
I especially think there's some skill that separates people with great research taste from people with poor research taste that might be crucial, but I don't really know what it is well enough to capture it here.
Sorry this is what I meant, you're right.
I think the canary string is slightly more likely to work than the robots.txt. And the things you link don't say which labs. I totally believe some labs scrape aggressively and ignore all robots.txt, but it wouldn't surprise me if others don't! In my opinion, if even one frontier lab doesn't train on this data, that's still quite valuable. And it seems so cheap to just try this as a bit of a Hail Mary.
Absolutely! But having models that don't know how to effectively scheme without access to external resources is still very useful. As just one example, we could interrogate the models and give them honeypots without giving them address to the internet.
Labs could also ban some of their models from reading certain webpages or ingesting certain information (as in filter at inference time all inputs to the model) in deployment (especially for models performing especially high-stakes tasks where sabotage would be easier).
I also suspect digesting a LessWrong post the model reads in deployment is more likely to involve explicit reasoning about the post in the CoT (or probes etc) and thus to be caught. By contrast, if a model has seen lots of of transcripts of misalignment in pre-training, the misaligned actions might be intuitive and second-nature, and thus harder to detect.
Probably I should have included a footnote about this. I'm well aware that this is not a foolproof mechanism, but it still seems better than nothing and I think it's very easy to have a disclaimer that makes this clear. As I said in the post, I think that people should only do this for information they would have posted on LessWrong anyway.
I disagree that these things are basically ignored by labs. My guess is many labs put some effort into filtering out data with the canary string, but that this is slightly harder than you might think and so they end up messing it up sometimes. (They might also sometimes ignore it on purpose, I'm not sure.)
Even if labs ignore the canary string now having the canary string in there would make it much easier to filter these things out if labs ever wanted to do that in the future.
I also suggest using better methods like captchas for non-logged-in users. I expect something like this to work somewhat well (though it still wouldn't be foolproof).
LessWrong feature request: make it easy for authors to opt-out of having their posts in the training data.
If most smart people were put in the position of a misaligned AI and tried to take over the world, I think they’d be caught and fail.[1] If I were a misaligned AI, I think I’d have a much better shot at succeeding, largely because I’ve read lots of text about how people evaluate and monitor models, strategies schemers can use to undermine evals and take malicious actions without being detected, and creative paths to taking over the world as an AI.
A lot of that information is from LessWrong.[2] It's unfortunate that this information will probably wind up in the pre-training corpus of new models (though sharing the information is often still worth it overall to share most of this information[3]).
LessWrong could easily change this for specific posts! They could add something to their robots.txt to ask crawlers looking to scrape training data to ignore the pages. They could add canary strings to the page invisibly. (They could even go a step further and add something like copyrighted song lyrics to the page invisibly.) If they really wanted, they could put the content of a post behind a captcha for users who aren’t logged in. This system wouldn't be perfect (edit: please don't rely on these methods. They're harm-reduction for information where you otherwise would have posted without any protections), but I think even reducing the odds or the quantity of this data in the pre-training corpus could help.
I would love to have this as a feature at the bottom of drafts. I imagine a box I could tick in the editor that would enable this feature (and maybe let me decide if I want the captcha part or not). Ideally the LessWrong team could prompt an LLM to read users’ posts before they hit publish. If it seems like the post might be something the user wouldn't want models trained on, the site could could proactively ask the user if they want to have their post be removed from the training corpus if it seems likely the user might want that.
As far as I know, no other social media platform has an easy way to try to avoid having their data up in the training corpus (and many actively sell it for this purpose). So LessWrong would be providing a valuable service.
The actual decisions around what should or shouldn’t be part of the pre-training corpus seem nuanced: if we want to use LLMs to help with AI safety, it might help if those LLMs have some information about AI safety in their pre-training corpus (though adding that information back in during post-training might work almost as well). But I want to at least give users the option to opt out of the current default.
That's not to say all misaligned AIs would fail; I think there will be a period where AIs are roughly as smart as me and thus could at least bide their time and hide their misalignment without being caught if they'd read LessWrong and might fail to do so and get caught if they hadn't. But you can imagine we're purchasing dignity points or micro-dooms depending on your worldview. In either case I think this intervention is relatively cheap and worthwhile.
Of course much of it is reproduced outside LessWrong as well. But I think (1) so much of it is still on LessWrong and nowhere else that it’s worth it, and (2) the more times this information is reported in the pre-training dats the more likely the model is to memorize it or have the information be salient to it.
And the information for which the costs of sharing it aren't worth it probably still shouldn't be posted even if the proposal I outline here is implemented, since there’s still a good chance it might leak out.
Interesting! How did Norquist/Americans for Tax Reform get so much influence? They seem to spend even less money than Intuit on lobbying, but maybe I'm not looking at the right sources or they have influence via means other than money?
I'm also somewhat skeptical of the claims. The agreement between the the IRS and the Free File Alliance feels too favorable to the Free File Alliance for them to have had no hand in it.
As to your confusion, I can see why an advocacy group that wants to lower taxes might want the process of filing taxes to be painful. I'm just speculating, but I bet the fact that taxes are annoying to file and require you to directly confront the sizable sum you may owe the government makes people favor lower taxes and simpler tax codes.
When I was first trying to learn ML for AI safety research, people told me to learn linear algebra. And today lots of people I talk to who are trying to learn ML[1] seem under the impression they need to master linear algebra before they start fiddling with transformers. I find in practice I almost never use 90% of the linear algebra I've learned. I use other kinds of math much more, and overall being good at empiricism and implementation seems more valuable than knowing most math beyond the level of AP calculus.
The one part of linear algebra you do absolutely need is a really, really good intuition for what a dot product is, the fact that you can do them in batches, and the fact that matrix multiplication is associative. Someone smart who can't so much as multiply matrices can learn the basics in an hour or two with a good tutor (I've taken people through it in that amount of time). The introductory linear algebra courses I've seen[2] wouldn't drill this intuition nearly as well as the tutor even if you took them.
In my experience it's not that useful to have good intuitions for things like eigenvectors/eigenvalues or determinants (unless you're doing something like SLT). Understanding bases and change-of-basis is somewhat useful for improving your intuitions, and especially useful for some kinds of interp, I guess? Matrix decompositions are useful if you want to improve cuBLAS. Sparsity sometimes comes up, especially in interp (it's also a very very simple concept).
The same goes for much of vector calculus. (You need to know you can take your derivatives in batches and that this means you write your d/dx as ∂/∂x or an upside-down triangle. You don't need curl or divergence.)
I find it's pretty easy to pick things like this up on the fly if you ever happen to need them.
Inasmuch as I do use math, I find I most often use basic statistics (so I can understand my empirical results!), basic probability theory (variance, expectations, estimators), having good intuitions for high-dimensional probability (which is the only part of math that seems underrated for ML), basic calculus (the chain rule), basic information theory ("what is KL-divergence?"), arithmetic, a bunch of random tidbits like "the log derivative trick", and the ability to look at equations with lots of symbols and digest them.
In general most work and innovation[3] in machine learning these days (and in many domains of AI safety[4]) is not based in formal mathematical theory, it's based on empiricism, fussing with lots of GPUs, and stacking small optimizations. As such, being good at math doesn't seem that useful for doing most ML research. There are notable exceptions: some people do theory-based research. But outside these niches, being good at implementation and empiricism seems much more important; inasmuch as math gives you better intuitions in ML, I think reading more empirical papers or running more experiments or just talking to different models will give you far better intuitions per hour.
By "ML" I mean things involving modern foundation models, especially transformer-based LLMs.
It's pretty plausible to me that I've only been exposed to particularly mediocre math courses. My sample-size is small, and it seems like course quality and content varies a lot.
Please don't do capabilities mindlessly.
The standard counterargument here is these parts of AI safety are ignoring what's actually hard about ML and that empiricism won't work. For example we need to develop techniques that work on the first model we build that can self-improve. I don't want to get into that debate.