Here's my attempt at a neutral look at Prop 50, which people in California can vote on this Tuesday (Nov 4th). The measure seems like a case study in high-stakes game theory and when to cooperate or defect.
The measure would allow the CA legislature to redraw the congressional district maps until 2030 (when district-drawing would go back to normal). Currently, the district maps are drawn by an independent commission designed to be politically neutral. In essence, this would allow the CA legislature to gerrymander California. That would probably give Democrats an extra 3-5 seats in Congress. It seems like there's a ~17% chance that it swings the House in the midterms.
Gerrymandering is generally agreed to be a bad thing, since it means elections are determined on the margin more by the map makers and less by the voters. The measure's proponents don't seem to think otherwise. They argue it's a response to Texas passing a redistricting bill that's predicted to give Republicans 5 new House seats (not to mention similar efforts in North Carolina and Missouri that would give Republicans an additional 2 seats).
Trump specifically urged Texas, North Carolina, and Missouri to pass ...
Humanity has only ever eradicated two diseases (and one of those, rinderpest, affects only cattle, not humans). The next disease on the list is probably Guinea worm (though polio is also tantalizingly close).
At its peak, Guinea worm infected ~900k people a year. So far in 2024 we know of only 7 cases. The disease isn't deadly, but it causes significant pain for 1-3 weeks (as a worm burrows out of your skin!), and in ~30% of cases that pain persists for about a year afterwards. In 0.5% of cases the worm burrows through important ligaments and leaves you permanently disabled. Eradication efforts have already averted about 2 million DALYs.[1]
I don't think this outcome was overdetermined; there's no recent medical breakthrough behind this progress. It just took a herculean act of international coordination and logistics. It took distributing millions of water filters, establishing surveillance systems in thousands of villages across multiple countries, and meticulously tracking every single case of Guinea worm in humans or livestock around the world. It took brokering a six-month ceasefire in Sudan (the longest humanitarian ceasefire in history!) to allow healthcare workers to ac...
I don't think you need that footnoted caveat, simply because there isn't $150M/year worth of room for more funding in AMF, Malaria Consortium's SMC program, HKI's vitamin A supplementation program, and New Incentives' cash incentives for routine vaccination combined; those four comprise the full list of GiveWell's top charities.
Another point is that the benefits of eradication keep adding up long after you've stopped paying the costs, because the counterfactual world where people keep suffering and dying of the disease never arrives. That's how smallpox eradication's cost-effectiveness can plausibly be less than a dollar per DALY averted so far, and dropping (Guesstimate model, analysis). Quoting that analysis:
...3.10.) For how many years should you consider benefits?
It is not clear for how long we should continue to consider benefits, since the benefits of vaccines would potentially continue indefinitely for hundreds of years. Perhaps these benefits would eventually be offset by some other future technology, and we could try to model that. Or perhaps we should consider a discount rate into the future, though we don’t find that idea appealing.
Instead, w
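To make the arithmetic behind "dropping" concrete, here's a toy sketch with entirely made-up numbers (not figures from that analysis): with a one-time eradication cost and a roughly constant stream of DALYs averted every year afterwards, cost per DALY averted keeps falling for as long as you keep counting the benefits.

```python
# Toy illustration with hypothetical numbers (not actual smallpox figures):
# a one-time eradication cost and a constant annual stream of DALYs averted.
eradication_cost = 500e6         # hypothetical: $500M, paid once
dalys_averted_per_year = 50e6    # hypothetical: 50M DALYs averted per year post-eradication

for years in (1, 10, 50):
    cost_per_daly = eradication_cost / (dalys_averted_per_year * years)
    print(f"After {years:>2} years: ${cost_per_daly:.2f} per DALY averted")
# The ratio shrinks indefinitely, which is why the choice of "how many years
# of benefits to count" dominates the bottom line.
```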
Notes on living semi-frugally in the Bay Area.
I live in the Bay Area, but my cost of living is pretty low: roughly $30k/year. I think I live an extremely comfortable life. I try to be fairly frugal, both so I don't end up dependent on jobs with high salaries and so that I can donate a lot of my income, but it doesn't feel like much of a sacrifice. Often when I tell people how little I spend, they're shocked. I think people conceive of the Bay as exorbitantly expensive, and it can be, but it doesn't have to be.
Rent: I pay ~$850 a month for my room. It's a small room in a fairly large group house I share with nine friends. It's a nice space with plenty of common areas and a big backyard. I know of a few other places like this (including in even pricier areas like Palo Alto). You just need to know where to look and be willing to live with friends. On top of rent I pay ~$200/month (edit: I was missing one expense, it's more like $300) for things like utilities, house repairs, and keeping the house tidy.
I pool the grocery bill with my housemates so we can optimize where we shop a little. We also often cook for each other (notably most of us, including myself, also get free m...
I also live in the Bay Area, and live similarly.
I currently save (and invest) something like 90% of my income, though my income has varied a lot across years. When I'm working a lot less on paid projects and don't have a salary, I make less money and only save something like 20% to 40%.
However, I'm semi-infamously indifferent to fun (and to most forms of physical pleasure), and I spend almost all my time working or studying. So my situation probably doesn't generalize to most people.
Note that most people either have or want children, which changes the calculus here: you need a larger place (often a whole house if you have many kids or want to live with extended family), and you're more likely to benefit from paying for a cleaner/domestic help (which is surprisingly expensive in the Bay and can't be hired remotely). Furthermore, if you're a meat-eater and want to buy ethically sourced meat or animal products, that increases the cost of food a lot.
I want to push back on the idea of needing a large[1] place if you have a family.
In the US, a four-person family will typically live in a 2,000-2,500 square foot place, but in Europe the same family will typically live in something like 1,000-1,400 square feet. In Asia it's often less, and earlier in the US's history it was also much less than it is today.
If smaller homes work for others across time and space, I believe they are often sufficient for people in the US today.
Well, you just said "larger".
There's a cottage industry that thrives on sneering at, gawking at, and maligning the AI safety community. This isn't new, but it's probably going to become more intense and pointed now that there are two giant super PACs that (allegedly[1]) see safety as a barrier to [innovation/profit, depending on your level of cynicism]. Brace for some nasty, uncharitable articles.
I think the largest cost of this targeted bad press will be the community's overreaction, not the reputational effects outside the AI safety community. I've already seen people shy away from doing things like donating to politicians that support AI safety for fear of provoking the super PACs.
Historically, the safety community often freaked out in the face of this kind of bad press. People got really stressed out, pointed fingers about whose fault it was, and started to let the strong frames in the hit pieces get into their heads.[2] People disavowed AI safety and turned to more popular causes. And the collective consciousness decided that the actions and people who ushered in the mockery were obviously terrible and dumb, so much so that you'd get a strange look if you asked anyone to justify that judgment. In reality...
I hope we continue to hold ourselves to high standards for integrity and honor, and as long as we do, I will be proud to be part of this community no matter what the super PACs say.
I do wish this were the case, but as I have written many times in the past, I just don't think it's an accurate characterization. See e.g.: https://www.lesswrong.com/posts/wn5jTrtKkhspshA4c/michaeldickens-s-shortform?commentId=zoBMvdMAwpjTEY4st
I don't think the AI safety community has particularly much integrity or honor. I would like to make there be something in the space that has those attributes, but please don't claim valor we/you don't have!
For context, how would you rank the AI safety community w.r.t. integrity and honor, compared to the following groups:
1. AGI companies
2. Mainstream political parties (the organizations, not the voters, so e.g. the politicians and their staff)
3. Mainstream political movements, e.g. neoliberalism, wokism, China hawks, BLM
4. A typical university department
5. Elite opinion formers (e.g. the kind of people whose Substacks and op-eds are widely read and highly influential in DC, Silicon Valley, etc.)
6. A typical startup
7. A typical large bloated bureaucracy or corporation
8. A typical religion, e.g. Christianity, Islam, etc.
9. The US military
My current best guess is that, if you interface with the AI safety community, you have a higher likelihood of being actively deceived, of having someone actively plot to mislead you, or of having someone put very substantial optimization pressure into getting you to believe something false or self-serving, than if you interface with almost any of the above.
This is a wild claim. Don't religions sort of centrally try to get you to believe known-to-be-false claims? Don't politicians famously lie all the time?
Are you saying that EAs are better at deceiving people than typical members of those groups?
Are you claiming that members of those groups may regularly spout false claims, but they're actually not that invested in getting others to believe them?
Can you be more specific about the way in which you think AI Safety folk are worse?
Like, members of those groups do not regularly sit down and make extensive plans about how to optimize other people's beliefs in the same way as seems routine around here.
I've been around the community for 10 years. I don't think I've ever seen this?[1]
Am I just blind to this? Am I seeing it all the time, except I have lower standards for what should "count"? Am I just selected out of such conversations somehow?
I currently work for an org that is explicitly focused on communicating the AI situation to the world, and to policymakers in particular. We are definitely attempting to be strategic about that, and we put a hell of a lot of effort into doing it well (e.g. running many, many test sessions where we try to explain what's up to volunteers, see what's confusing, and adjust what we're saying).
(Is this the kind of thing you mean?)
But, importantly, we're clear about trying to frankly communicate our actual beliefs, including our uncertainties, and are strict about adhering to standards of local validity and precise honesty: I'm happy to talk with you about the confusing experimental results that weaken our high-level claims (though admittedly, under normal time constraints,
The advice and techniques from the rationality community seem to work well at avoiding a specific type of high-level mistake: they help you notice weird ideas that might otherwise get dismissed and take them seriously. Things like AI being on a trajectory to automate all intellectual labor and perhaps take over the world, animal suffering, longevity, cryonics. The list goes on.
This is a very valuable skill and causes people to do things like pivot their careers to areas that are ten times better. But once you’ve had your ~3-5 revelations, I think the value of these techniques can diminish a lot.[1]
Yet a lot of the rationality community’s techniques and culture seem oriented around this one idea, even on small scales: people pride themselves on being relentlessly truth-seeking and willing to consider possibilities they flinch away from.
On the margin, I think the rationality community should put more emphasis on skills like:
Performing simple cost-effectiveness estimates accurately
I think very few people in the community could put together an analysis like this one from Eric Neyman on the value of a particular donation opportunity (see the section “Comparison to non-AI safety opportuni...
I think very few people in the community could put together an analysis like this one from Eric Neyman on the value of a particular donation opportunity (see the section “Comparison to non-AI safety opportunities”).
Huh, FWIW, I thought this analysis was quite a classic example of streetlighting. It succeeded at quantifying some things related to the donation opportunity at hand, but it failed to cover the ones I considered most important. This seems like the standard failure mode of this kind of estimate, and I was quite sad to see it here.
Like, the most important thing to estimate when evaluating a political candidate is their trustworthiness and integrity! It's the thing that would flip the sign on whether supporting someone is good or bad for the world. The model is silent on this point, and weirdly, when I talked to many others about it, the model indeed seemed to serve as a semantic stopsign for asking the much more important questions about the candidate.
Like, I am strongly in favor of making quick quantitative models, but I felt like this one missed the target. I mean, like, it's fine, I don't think it was a bad thing, but at least various aspects about how it was prese...
FWIW I wouldn't say "trustworthiness" is the most important thing, more like "can be trusted to take AI risk seriously", and my model is more about the latter.
No. Bad. Really not what I support. Strong disagree. Bad naive consequentialism.
Yes, of course I care about whether someone takes AI risk seriously, but if someone is also untrustworthy, in my opinion that serves as a multiplier on their negative impact on the world. I do not want to create scheming and untrustworthy stakeholders who start doing sketchy stuff around AI risk. That's how a lot of bad stuff has already happened in the past.
I think political donations to trustworthy and reasonable politicians who are open to AI X-risk but don't yet have an opinion on it are much better for the world (indeed, infinitely better, due to the inverted sign) than donations to untrustworthy politicians who do seem interested.
That said, I agree that you could put this in the model! I am not against quantitatively estimating integrity and trustworthiness, and think the model would be a bunch better for considering it.
Yes, of course I care about whether someone takes AI risk seriously, but if someone is also untrustworthy, in my opinion that serves as a multiplier on their negative impact on the world. I do not want to create scheming and untrustworthy stakeholders who start doing sketchy stuff around AI risk. That's how a lot of bad stuff has already happened in the past.
No-true-Scotsman-ish counterargument: no-one who actually gets AI risk would engage in this kind of tomfoolery. This is the behavior of someone who almost got it, but then missed the last turn and stumbled into the den of the legendary Black Beast of Aaargh. In the abstract, I think "we should be willing to consider supporting literal Voldemort if we're sure he has the correct model of AI X-risk" goes through.
The problem is that it just totally doesn't work in practice, not even on pure consequentialist grounds:
When I was first trying to learn ML for AI safety research, people told me to learn linear algebra. And today lots of people I talk to who are trying to learn ML[1] seem under the impression they need to master linear algebra before they start fiddling with transformers. I find in practice I almost never use 90% of the linear algebra I've learned. I use other kinds of math much more, and overall being good at empiricism and implementation seems more valuable than knowing most math beyond the level of AP calculus.
The one part of linear algebra you do absolutely need is a really, really good intuition for what a dot product is, the fact that you can do them in batches, and the fact that matrix multiplication is associative. Someone smart who can't so much as multiply matrices can learn the basics in an hour or two with a good tutor (I've taken people through it in that amount of time). The introductory linear algebra courses I've seen[2] wouldn't drill this intuition nearly as well as the tutor even if you took them.
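Here's a minimal numpy sketch of those three facts (my illustration, not from the post):

```python
import numpy as np

# A dot product: multiply elementwise, then sum.
u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])
assert np.isclose(u @ v, (u * v).sum())  # 32.0

# A matrix multiply is just a batch of dot products:
# entry (i, j) of A @ B is the dot product of row i of A with column j of B.
A = np.random.randn(2, 3)
B = np.random.randn(3, 4)
C = A @ B
assert np.isclose(C[0, 1], A[0, :] @ B[:, 1])

# Matrix multiplication is associative: (A @ B) @ x == A @ (B @ x),
# so a stack of linear maps collapses into one linear map, and the grouping
# you choose changes the compute cost but not the answer.
x = np.random.randn(4)
assert np.allclose((A @ B) @ x, A @ (B @ x))
```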
In my experience it's not that useful to have good intuitions for things like eigenvectors/eigenvalues or determinants (unless you're doing something like SLT)....
This is a great list!
Here's some stuff that isn't in your list that I think comes up often enough that aspiring ML researchers should eventually know it (and most of this is indeed universally known). Everything in this comment is something that I've used multiple times in the last month.
LessWrong feature request: make it easy for authors to opt out of having their posts in the training data.
If most smart people were put in the position of a misaligned AI and tried to take over the world, I think they’d be caught and fail.[1] If I were a misaligned AI, I think I’d have a much better shot at succeeding, largely because I’ve read lots of text about how people evaluate and monitor models, strategies schemers can use to undermine evals and take malicious actions without being detected, and creative paths to taking over the world as an AI.
A lot of that information is from LessWrong.[2] It's unfortunate that this information will probably wind up in the pre-training corpus of new models (though sharing most of this information is often still worth it overall[3]).
LessWrong could easily change this for specific posts! They could add something to their robots.txt to ask crawlers looking to scrape training data to ignore the pages. They could add canary strings to the page invisibly. (They could even go a step further and add something like copyrighted song lyrics to the page invisibly.) If they really wanted, they could put the c...
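As a sketch of what the robots.txt approach could look like (hypothetical entries: GPTBot and CCBot are two crawlers commonly used to gather training data, the relevant list changes over time, and honoring these directives is entirely voluntary):

```
# Hypothetical additions to lesswrong.com/robots.txt for an opted-out post
User-agent: GPTBot
Disallow: /posts/some-opted-out-post

User-agent: CCBot
Disallow: /posts/some-opted-out-post
```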
I worry that canary strings and robots.txt are ~basically ignored by labs, and that offering this option could cause people to share things that, on the margin, they wouldn't share if there were no such option[1]. More reliable methods exist, but they come with a lot of overhead and I expect most users wouldn't want to deal with them.
Especially since, as the post says, canaries often don't even serve the purpose of detection, with publicly accessible models claiming ignorance of them.
TurboTax and H&R Block famously lobby the US government to keep taxes annoying to file in order to drum up demand for their products.[1] But as far as I can tell, they each only spend ~$3-4 million a year on lobbying. That's... not very much money (contrast it with the $60 billion the government gave the IRS to modernize its systems, or the $4.9 billion in revenue Intuit made last fiscal year from TurboTax, or the hundreds of millions of hours[2] of filing time that a return-free tax system could save).
Perhaps it would "just" take a multimillionaire and a few savvy policy folks to make the US tax system wildly better? Maybe TurboTax and H&R Block would simply up their lobbying budgets if they stopped getting their way, but maybe they wouldn't. Even if they did, I think it's not crazy to imagine a fairly modest lobbying effort beating them, since simpler tax filing seems popular across party lines/is rather obviously a good idea, and therefore may have an easier time making its case. Plus, I wonder if pouring more money into lobbying hits diminishing returns at some point, such that even a small amount of funding against TurboTax could go a long way.
Nobody seems to be tryin...
The world seems bottlenecked on people knowing and trusting each other. If you're a trustworthy person who wants good things for the world, one of the best ways to demonstrate your trustworthiness is by interacting with people a lot, so that they can see how you behave in a variety of situations and they can establish how reasonable, smart, and capable you are. You can produce a lot of value for everyone involved by just interacting with people more.
I'm an introvert. My social skills aren't amazing, and my social stamina is even worse. Yet I drag myself to parties and happy hours and one-on-one chats because they pay off.
It's fairly common for me to go to a party and get someone to put hundreds of thousands of dollars towards causes I think are impactful, or to pivot their career, or to tell me a very useful, relevant piece of information I can act on. I think each of those things individually happens more than 15% of the time that I go to a party.
(Though this is only because I know of unusually good cause areas and career opportunities. I don't think I could get people to put money or time towards random opportunities. This is a positive-sum interaction where I'm ...
The other day I was speaking to one of the most productive people I'd ever met.[1] He's one of the top people in a very competitive field, and he was single-handedly performing the work of a team of brilliant programmers. He needed to find a spot to do some work, so I offered to help him find a desk with a monitor. But he said he generally liked working from his laptop on a couch, and he felt he was "only 10% slower" without a monitor anyway.
I was aghast. I’d been trying to optimize my productivity for years. A 10% productivity boost was a lot! Those things compound! How was this man, one of the most productive people I’d ever met, shrugging it off like it was nothing?
I think this nonchalant attitude towards productivity is fairly common in top researchers (though perhaps less so in top executives?). I have no idea why some people are so much more productive than others. It surprises me that so much variance is even possible.
This guy was smart, but I know plenty of people as smart as him who are far less productive. He was hardworking, but not insanely so. He wasn’t aggressively optimizing his productivity.[2] He wasn't that old so it couldn't just be experience. ...
Ways training incentivizes and disincentivizes introspection in LLMs.
Recent work has shown that some LLMs have some ability to introspect. Many people were surprised to learn LLMs had this capability at all. But I found the results somewhat surprising for another reason: models are trained to mimic text, both in pre-training and fine-tuning. Almost every time a model is prompted in training to generate text related to introspection, the answer it's trained to give is whatever the training text says, not what the model being trained actually observes via its own introspection. So I worry that even if models could introspect, they might learn to never introspect in response to prompting.
We do see models act consistently with this hypothesis sometimes: if you ask a model how many tokens it sees in a sentence or instruct it to write a sentence that has a specific number of tokens in it, it won't answer correctly.[1] But the model probably "knows" how many tokens there are; it's an extremely salient property of the input, and the space of possible tokens is a very useful thing for a model to know since it determines what it can output. At the very least models...
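If you want to run that token-counting check yourself, here's a minimal sketch (my example, not from the post; it assumes tiktoken's cl100k_base encoding roughly matches the tokenizer of the model you're querying, which may not hold):

```python
import tiktoken

sentence = "The quick brown fox jumps over the lazy dog."

# Ground-truth token count under one particular (assumed) tokenizer.
enc = tiktoken.get_encoding("cl100k_base")
ground_truth = len(enc.encode(sentence))
print(f"Actual token count: {ground_truth}")

# Then ask the model, e.g. "How many tokens is this sentence: '<sentence>'?",
# through whatever interface you use, and compare its answer to ground_truth.
```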
Ideas for how to spend very large amounts of money to improve AI safety:
If AI companies' valuations continue to skyrocket (or if new very wealthy actors start to become worried about AI risk), there might be a large influx of funding into the AI safety space. Unfortunately, it's not straightforward to magically turn money into valuable AI safety work. Many things in the AI safety ecosystem are bottlenecked more on having a good founder with the right talent and context, or on having good researchers, than on money.
Here's a random incomplete grab-bag of ideas for ways you co...