Drake Thomas

Interested in math puzzles, Fermi estimation, strange facts about the world, toy models of weird scenarios, unusual social technologies, and deep dives into the details of random phenomena.

Working on the pretraining team at Anthropic as of October 2024; before that I did independent alignment research of various flavors and worked in quantitative finance.

Comments

faul_sname's Shortform
Drake Thomas · 11d

FWIW, my enthusiasm for "make America more good at AI than China" type policies comes somewhat more from considerations like "a larger US advantage lets the US spend more of a lead on safety without needing international cooperation" than considerations like "a CCP-led corrigible ASI would lead to much worse outcomes than a USG-led corrigible ASI". Though both are substantial factors for me and I'm fairly uncertain; I would not be surprised if my ordering here switched in 6 months.

If Anyone Builds It Everyone Dies, a semi-outsider review
Drake Thomas · 15d

Thanks for writing this post! I'm curious to hear more about this bit of your beliefs going in:

The existential risk argument is suspiciously aligned with the commercial incentives of AI executives.  It simultaneously serves to hype up capabilities and coolness while also directing attention away from the real problems that are already emerging.  It’s suspicious that the apparent solution to this problem is to do more AI research as opposed to doing anything that would actually hurt AI companies financially.

Are there arguments or evidence that would have convinced you the existential risk worries in the industry were real / sincere? 

For context, I work at a frontier AI lab and from where I sit it's very clear to me that the x-risk worries aren't coming from a place of hype, and people who know more about the technology generally get more worried rather than less. (The executives still could be disingenuous in their expressed concern, but if so they're doing it in order to placate their employees who have real concerns about the risks, not to sound cool to their investors.)

I don't know what sorts of things would make that clearer from the outside, though. Curious if any of the following arguments would have been compelling to you:

  • The AI labs most willing to take costly actions now (like hire lots of safety researchers or support AI regulation that the rest of the industry opposes or make advance commitments about the preparations they'll take before releasing future models) are also the ones talking the most about catastrophic or existential risks.
    • Like if you thought this stuff was an underhanded tactic to drum up hype and get commercial success by lying to the public, then it's strange that Meta AI, not usually known for its tremendous moral integrity, is so principled about telling the truth that they basically never bring these risks up!
  • People often quit their well-paying jobs at AI companies in order to speak out about existential risk, or because they felt their employer paid insufficient attention to catastrophic or existential risks from AI.
  • The standard trajectory is for lab executives to talk about existential risk a moderate amount early on, when they're a small research organization, and then become much quieter about it over time as they become subject to more and more commercial pressure. You actually see much more discussion of existential risk among the lower-level employees whose statements are less scrutinized for being commercially unwise. This is a weird pattern for something whose main purpose is to attract hype and investment!
Why you should eat meat - even if you hate factory farming
Drake Thomas · 1mo

You can also look for welfare certifications on products you buy - Animal Welfare Institute has a nice guide to which labels actually mean things. (Don't settle for random good-sounding words on the package - some of them are basically meaningless or only provide very very weak guarantees!)

Personally, I feel comfortable buying meat that is certified GAP 4 or higher, and will sometimes buy GAP 3 or Certified Humane in a pinch. Products certified to this level are fairly uncommon but not super hard to find - you can order them from meat delivery services like Butcher Box, and many Whole Foods sell (a subset of) meat at GAP 4, especially beef and lamb (I've only ever seen GAP 3 or lower chicken and pork at my local Whole Foods though). You can use Find Humane to search for products in your area.

Drake Thomas's Shortform
Drake Thomas · 1mo

I'm starting to feel a bit sneezy and throat-bad-y this evening; I took a zinc lozenge maybe 2h after the first time I noticed anything feeling slightly off. Will keep it up for as long as I feel bad and edit accordingly, but preregistering early to commit myself to updating regardless of outcome.

Zach Stein-Perlman's Shortform
Drake Thomas · 1mo

security during takeoff is crucial (probably, depending on how exactly the nonproliferation works)

I think you're already tracking this, but to spell out a dynamic here a bit more: if the US maintains control over what runs on its datacenters and has substantially more compute on one project than any other actor, then it might still be OK for adversaries to have total visibility into your model weights and everything else you do. You just work on a mix of AI R&D and defensive security research with your compute (at a faster rate than they can work on RSI+offense with theirs) until you become protected against spying; at that point your greater compute budget means you can do takeoff faster, and they only reap the benefits of your models up to a relatively early point. Obviously this is super risky and contingent on offense/defense balance and takeoff speeds, and it's a terrible position to be in, but I think there's a good chance it's kinda viable.

(Also there are some things you can do to differentially advantage yourself even during the regime in which adversaries can see everything you do and steal all your results. Eg your AI does research into a bunch of optimization tricks that are specific to a model of chip the US has almost all of, or studies techniques for making a model that you can't finetune to pursue different goals without wrecking its capabilities and implements them on the next generation.)

You still care enormously about security over things like "the datacenters are not destroyed" and "the datacenters are running what you think they're running" and "the human AI researchers are not secretly saboteurs" and so on, of course.

The Problem with Defining an "AGI Ban" by Outcome (a lawyer's take).
Drake Thomas · 1mo

Yeah, I basically agree with you here - I'm very happy to read LLM-written content, if I know that it has substantive thought put into it and is efficiently communicating useful ideas. Unfortunately right now one of my easiest detectors for identifying which things might have substantial thought put into them is "does this set off my LLM writing heuristics", because most LLM-written content in 2025 has very low useful-info density, so I find the heuristic of "discard LLM prose and read human-written but lazily worded prose" very useful.

The Problem with Defining an "AGI Ban" by Outcome (a lawyer's take).
Drake Thomas · 1mo

Yeah, I'm trying to distill some fuzzy intuitions that I don't have a perfectly legible version of, and I do think it's possible for humans to write text that has these attributes naturally. I am pretty confident that I'd have a good AUROC at distinguishing human-written text from LLM-generated content even when the humans match many of the characteristics here; nothing in the last 10 comments you've written trips my AI detector at all.
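
(For concreteness: AUROC here is just the probability that a randomly chosen LLM-written text gets a higher "this looks LLM-written" score than a randomly chosen human-written one. A minimal sketch of what that means, with made-up labels and scores and scikit-learn assumed available:)

```python
from sklearn.metrics import roc_auc_score

labels = [1, 0, 1, 0, 1, 0]               # 1 = LLM-written, 0 = human-written (invented)
scores = [0.9, 0.2, 0.7, 0.5, 0.6, 0.1]   # detector's confidence each text is LLM-written (invented)

# AUROC = probability that a random LLM-written text outscores a random human-written one.
print(roc_auc_score(labels, scores))
```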

(I also use bulleted lists, parentheticals, and em-dashes a lot and think they're often part of good writing – the "excessive" is somewhat load-bearing here.)

The title is reasonable
Drake Thomas · 1mo

noticing the asymmetry in who you feel moved to complain about.

I think I basically complain when I see opinions that feel importantly wrong to me? 

When I'm in very LessWrong-shaped spaces, that often looks like arguing in favor of "really shitty low-dignity approaches to getting the AIs to do our homework for us are >>1% to turn out okay, I think there's lots of mileage in getting slightly less incompetent at the current trajectory", and I don't really harp on the "would be nice if everyone just stopped" thing the same way I don't harp on the "2+2=4" thing, except to do virtue signaling to my interlocutor about not being an e/acc so I don't get dismissed as being in the Bad Tribe Outgroup. 

When I'm in spaces with people who just think working on AI is cool, I'm arguing about the "holy shit this is an insane dangerous technology and you are not oriented to it with anything like a reasonable amount of caution" thing, and I don't really harp on the "some chance we make it out okay" bit except to signal that I'm not a 99.999% doomer so I don't get dismissed as being in the Bad Tribe Outgroup.

I think the asymmetry complaint is very reasonable for writing that is aimed at a broad audience, TBC, but when people are writing LessWrong posts I think it's basically fine to take the shared points of agreement for granted and spend most of your words on the points of divergence. (Though I do think it's good practice to signpost that agreement at least a little.)

The Problem with Defining an "AGI Ban" by Outcome (a lawyer's take).
Drake Thomas · 1mo

Things that tripped my detector (which was set off before reading kave's comment):

  • Excessive use of bolded passages throughout.
  • Bulleted lists, especially with bolded headings in each bullet.
  • Very frequent use of short parenthetical asides where it doesn’t make a lot of sense to use a parenthetical instead of a comma or a new sentence. Eg the parenthetical "(compute caps)" reads more naturally as "[comma] like compute caps"; the parenthetical "(actual prison time)" is unnecessary; the phrase "its final outcome (extinction)" should just be "extinction".
    • Obviously humans write unnecessary or strained clauses all the time, but this particular sort of failure mode is like 10x more common in LLMs.
    • I would wildly conjecture that this behavior is in part an artifact of not having a backspace key, and when the LLMs write something that's underspecified or overstated, the only option they have is to modify in a parenthetical rather than rewrite the previous sentence.
  • Rule of three: “the X, Y, or Z”. “Sentence A. Sentence B. And [crucially / even / most importantly], sentence C." Obviously not a dead giveaway in one usage, but LLMs do this at a rate at least twice the human baseline, and the bits add up.
  • I’m not sure I can distill a nice rule here, but there’s a certain sort of punchy language that is a strong tell for me, where it’s like every paragraph is trying to have its own cute rhetorical flourish of the sort a human writer would restrain themselves to doing once at the end. It shows up especially often in the format "[short, punchy phrase]: [short sentence]". Examples in this post:
    • "The principle is clear: regulate by measurable inputs and capabilities, not by catastrophic outcomes."
    • "We don't have this luxury: we cannot afford an AGI ban that is "80% avoided.""
Safety researchers should take a public stance
Drake Thomas · 1mo

a global moratorium on all aspects of AI capability progress for the next few decades would be a substantial improvement over the status quo

Saw some shrug reacts on this so wanted to elaborate a bit - I'm not super confident about this (maybe like 70% now, rising to 80% the later we implement the pause), and become a lot more pessimistic about it if the moratorium does not cover things like hardware improvements, research into better algorithms, etc. I'm also sort of pricing in that there's sufficient political will to make this happen; the backlash from a decree like this, if in fact most ordinary voters really hated it, seems likely to be bad in various ways. As such I don't really try to do advocacy for such changes in 2025, though I'm very into preparing for such a push later on if we get warning shots or much more public will to put on the brakes. Happy to hear more about the cruxes of people who think this is of unclear or negative sign.

Posts

141 · Auditing language models for hidden objectives · Ω · 8mo · 15
5 · Drake Thomas's Shortform · 1y · 47
25 · Catastrophic Regressional Goodhart: Appendix · Ω · 2y · 1
180 · When is Goodhart catastrophic? · Ω · 2y · 30

Wikitag Contributions

Try Things · 3 years ago · (+2/-3)