Interested in math puzzles, Fermi estimation, strange facts about the world, toy models of weird scenarios, unusual social technologies, and deep dives into the details of random phenomena.
Working on the pretraining team at Anthropic as of October 2024; before that I did independent alignment research of various flavors and worked in quantitative finance.
Thanks for writing this post! I'm curious to hear more about this bit of your beliefs going in:
The existential risk argument is suspiciously aligned with the commercial incentives of AI executives. It simultaneously serves to hype up capabilities and coolness while also directing attention away from the real problems that are already emerging. It’s suspicious that the apparent solution to this problem is to do more AI research as opposed to doing anything that would actually hurt AI companies financially.
Are there arguments or evidence that would have convinced you the existential risk worries in the industry were real / sincere?
For context, I work at a frontier AI lab and from where I sit it's very clear to me that the x-risk worries aren't coming from a place of hype, and people who know more about the technology generally get more worried rather than less. (The executives still could be disingenuous in their expressed concern, but if so they're doing it in order to placate their employees who have real concerns about the risks, not to sound cool to their investors.)
I don't know what sorts of things would make that clearer from the outside, though. Curious if any of the following arguments would have been compelling to you:
You can also look for welfare certifications on products you buy - the Animal Welfare Institute has a nice guide to which labels actually mean something. (Don't settle for random good-sounding words on the package - some of them are basically meaningless or only provide very, very weak guarantees!)
Personally, I feel comfortable buying meat that is certified GAP 4 or higher, and will sometimes buy GAP 3 or Certified Humane in a pinch. Products certified to this level are fairly uncommon but not super hard to find - you can order them from meat delivery services like Butcher Box, and many Whole Foods sell (a subset of) meat at GAP 4, especially beef and lamb (I've only ever seen GAP 3 or lower chicken and pork at my local Whole Foods though). You can use Find Humane to search for products in your area.
I'm starting to feel a bit sneezy and throat-bad-y this evening; I took a zinc lozenge maybe 2h after the first time I noticed anything feeling slightly off. Will keep it up for as long as I feel bad and edit accordingly, but preregistering early to commit myself to updating regardless of outcome.
security during takeoff is crucial (probably, depending on how exactly the nonproliferation works)
I think you're already tracking this, but to spell out a dynamic here a bit more: if you (as the US) maintain control over what runs on your datacenters and have substantially more compute on one project than any other actor, then it might still be OK for adversaries to have total visibility into your model weights and everything else you do. You just work on a mix of AI R&D and defensive security research with your compute (at a faster rate than they can work on RSI+offense with theirs) until you become protected against spying; after that, your greater compute budget means you can do takeoff faster, and they only reap the benefits of your models up to a relatively early point. Obviously this is super risky and contingent on offense/defense balance and takeoff speeds and is a terrible position to be in, but I think there's a good chance it's kinda viable. (Toy numerical sketch of this dynamic below.)
(Also there are some things you can do to differentially advantage yourself even during the regime in which adversaries can see everything you do and steal all your results. Eg your AI does research into a bunch of optimization tricks that are specific to a model of chip the US has almost all of, or studies techniques for making a model that you can't finetune to pursue different goals without wrecking its capabilities and implements them on the next generation.)
You still care enormously about security over things like "the datacenters are not destroyed" and "the datacenters are running what you think they're running" and "the human AI researchers are not secretly saboteurs" and so on, of course.
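To make the shape of that argument concrete, here's a minimal toy sketch in Python. All the numbers (compute shares, the security-research fraction, how much hardening it takes to become spy-proof, the horizon) are made up for illustration and not claims about the real situation - it just shows how a compute lead can survive a period in which the adversary sees everything:

```python
# Toy model of the dynamic above: the leader splits compute between capabilities
# and defensive security research; until the security work crosses a threshold,
# the adversary can see everything and keeps pace by copying the leader's results.
# All constants are invented for illustration.

COMPUTE_A = 1.0           # leader's compute per step (arbitrary units)
COMPUTE_B = 0.3           # adversary's compute per step
SECURITY_FRAC = 0.4       # fraction of the leader's compute spent on hardening
SECURITY_THRESHOLD = 5.0  # cumulative hardening needed before spying stops working
STEPS = 30

cap_a = cap_b = security = 0.0
for _ in range(STEPS):
    # Leader's split (kept constant for simplicity; it could redirect the
    # security compute back to capabilities once hardened).
    cap_a += (1 - SECURITY_FRAC) * COMPUTE_A
    security += SECURITY_FRAC * COMPUTE_A

    if security < SECURITY_THRESHOLD:
        # Spying works: the adversary gets the better of its own research
        # or the leader's stolen results.
        cap_b = max(cap_b + COMPUTE_B, cap_a)
    else:
        # Leader has gone dark: the adversary is on its own compute budget.
        cap_b += COMPUTE_B

print(f"leader capability:    {cap_a:.1f}")
print(f"adversary capability: {cap_b:.1f}")
```

With these particular numbers the adversary keeps pace for about the first dozen steps and then falls behind by (1 - SECURITY_FRAC) * COMPUTE_A - COMPUTE_B per step; crank up COMPUTE_B or SECURITY_THRESHOLD and the conclusion flips, which is the "contingent on offense/defense balance and takeoff speeds" part.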
Yeah, I basically agree with you here - I'm very happy to read LLM-written content, if I know that it has substantive thought put into it and is efficiently communicating useful ideas. Unfortunately right now one of my easiest detectors for identifying which things might have substantial thought put into them is "does this set off my LLM writing heuristics", because most LLM-written content in 2025 has very low useful-info density, so I find the heuristic of "discard LLM prose and read human-written but lazily worded prose" very useful.
Yeah, I'm trying to distill some fuzzy intuitions that I don't have a perfectly legible version of, and I do think it's possible for humans to write text that has these attributes naturally. I am pretty confident that I will have a good AUROC at distinguishing text written by humans from LLM-generated content even when the humans match many of the characteristics here; nothing in the last 10 comments you've written trips my AI detector at all.
(I also use bulleted lists, parentheticals, and em-dashes a lot and think they're often part of good writing – the "excessive" is somewhat load-bearing here.)
noticing the asymmetry in who you feel moved to complain about.
I think I basically complain when I see opinions that feel importantly wrong to me?
When I'm in very LessWrong-shaped spaces, that often looks like arguing in favor of "really shitty low-dignity approaches to getting the AIs to do our homework for us are >>1% to turn out okay, I think there's lots of mileage in getting slightly less incompetent at the current trajectory", and I don't really harp on the "would be nice if everyone just stopped" thing the same way I don't harp on the "2+2=4" thing, except to do virtue signaling to my interlocutor about not being an e/acc so I don't get dismissed as being in the Bad Tribe Outgroup.
When I'm in spaces with people who just think working on AI is cool, I'm arguing about the "holy shit this is an insane dangerous technology and you are not oriented to it with anything like a reasonable amount of caution" thing, and I don't really harp on the "some chance we make it out okay" bit except to signal that I'm not a 99.999% doomer so I don't get dismissed as being in the Bad Tribe Outgroup.
I think the asymmetry complaint is very reasonable for writing that is aimed at a broad audience, TBC, but when people are writing LessWrong posts I think it's basically fine to take the shared points of agreement for granted and spend most of your words on the points of divergence. (Though I do think it's good practice to signpost that agreement at least a little.)
Things that tripped my detector (which was set off before reading kave's comment):
a global moratorium on all aspects of AI capability progress for the next few decades would be a substantial improvement over the status quo
Saw some shrug reacts on this so wanted to elaborate a bit - I'm not super confident about this (maybe 70% now, rising to 80% the later we implement the pause), and I become a lot more pessimistic about it if the moratorium does not cover things like hardware improvements, research into better algorithms, etc. I'm also sort of pricing in the assumption that there's sufficient political will to make this happen; the backlash from a decree like this, if in fact most ordinary voters really hated it, seems likely to be bad in various ways. As such I don't really try to do advocacy for such changes in 2025, though I'm very into preparing for such a push later on if we get warning shots or much more public will to put on the brakes. Happy to hear more about the cruxes of people who think this is of unclear or negative sign.
FWIW, my enthusiasm for "make America more good at AI than China" type policies comes somewhat more from considerations like "a larger US advantage lets the US spend more of a lead on safety without needing international cooperation" than considerations like "a CCP-led corrigible ASI would lead to much worse outcomes than a USG-led corrigible ASI". Though both are substantial factors for me and I'm fairly uncertain; I would not be surprised if my ordering here switched in 6 months.