Thomas Kwa

Was on Vivek Hebbar's team at MIRI, now working with Adrià Garriga-Alonso on various empirical alignment projects.

I'm looking for projects in interpretability, activation engineering, and control/oversight; DM me if you're interested in working with me.

I have signed no contracts or agreements whose existence I cannot mention.

Sequences

Catastrophic Regressional Goodhart


Comments


First: yes, it's clearly a skill issue. If I were a more brilliant engineer or researcher, I'd have found a way to contribute to the field by now.

I am not sure about this; just because someone will pay you to work on AI safety doesn't mean you won't be stuck down some dead end. Stuart Russell is super brilliant, but I don't think AI safety through probabilistic programming will work.


But when I hear Anthropic people (and most AI safety people) talk about AI welfare, the vibe is like it would be unacceptable to incur a [1% or 5% or so] risk of a deployment accidentally creating AI suffering worse than 10^[6 or 9 or so] suffering-filled human lives.

You have to be a pretty committed scope-sensitive consequentialist to disagree with this. What if they actually risked torturing 1M or 1B people? That seems terrible and unacceptable, and by assumption the AI suffering is equivalent to human suffering. I think our societal norms are such that unacceptable things regularly become acceptable when the stakes are clear, so you may not even lose much utility from this emphasis on avoiding suffering.

It seems perfectly compatible with good decision-making that there are criteria A and B, A is much more important and therefore prioritized over B, and 2 of 19 sections focus on B. The real question is whether the organization's leadership can make difficult tradeoffs, reassessing and questioning requirements as new information comes in. For example, in the 1944 Norwegian sabotage of a Nazi German heavy water shipment, stopping the Nazi nuclear program was the first priority. The mission went ahead with reasonable effort to minimize casualties; 14 civilians died anyway, fewer than it could have been. It would not really have alarmed me to see a document discussing 19 efforts of which 2 were avoidance of casualties, nor to know that the planners regularly talked with the vibe that 10-100 civilian casualties should be avoided, as long as someone had their eye on the ball.

I'm a fan of this blog, which mainly translates and comments on Chinese social media posts but also has some history posts.

The numbers for ion propulsion don't seem crazy. To get the impulse to catch a 1,000-ton object every 2 weeks, you would need 10,000 Starlink thrusters massing 21 tons, plus 42 MW of power, which is 14 hectares of solar panels at an average of 300 W/m^2. That's only a couple dozen times the ISS.
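A quick sanity check of those figures (my own back-of-envelope, assuming SpaceX's published per-thruster numbers of roughly 170 mN and 4.2 kW; the implied delta-v per catch is my extrapolation, not from the original comment):

```python
# Back-of-envelope check of the tether reboost numbers above.
THRUST_N = 0.17            # per thruster; SpaceX's published ~170 mN figure
POWER_W = 4200.0           # per thruster; published ~4.2 kW figure
N_THRUSTERS = 10_000
PAYLOAD_KG = 1_000_000     # the 1,000-ton caught object
PERIOD_S = 14 * 24 * 3600  # one catch every two weeks

total_power_mw = N_THRUSTERS * POWER_W / 1e6
panel_area_ha = N_THRUSTERS * POWER_W / 300 / 1e4  # at 300 W/m^2 average
impulse = N_THRUSTERS * THRUST_N * PERIOD_S        # N*s accumulated per period
delta_v = impulse / PAYLOAD_KG                     # m/s restored per catch

print(f"power: {total_power_mw:.0f} MW, panels: {panel_area_ha:.0f} ha")
print(f"impulse: {impulse:.2e} N*s -> {delta_v:.0f} m/s on the payload")
# -> 42 MW, 14 ha, and roughly 2 km/s of delta-v per 1,000-ton catch
```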

Tethers can theoretically use more efficient propulsion because their thrust requirements are lower. The argon Hall-effect thrusters on Starlink satellites have around 7x the specific impulse (fuel efficiency) of Starship's engines, at the cost of roughly 7x the energy per unit of impulse (since KE = mv^2/2) and a tiny fraction of the thrust. This energy could come from a giant solar panel rather than from the fuel itself, and every once in a while the tether could be refueled with a big tanker of liquid argon.
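For why the energy cost scales linearly with specific impulse at fixed thrust, the standard rocket relations (not specific to these engines) are:

$$F = \dot{m}\, v_e, \qquad P = \tfrac{1}{2}\dot{m}\, v_e^{2} = \tfrac{1}{2}\, F\, v_e$$

At fixed thrust $F$, the required jet power grows linearly with exhaust velocity $v_e$ (equivalently, with specific impulse), while propellant flow $\dot{m} = F/v_e$ falls. So roughly 7x the specific impulse costs roughly 7x the power per newton of thrust while using about 1/7 the propellant.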

Back in 2019 GiveWell surveyed 2,000 people in Kenya and Ghana.

The results from this research are now available here. Among other findings, they suggest that survey respondents have higher values for saving lives (relative to reducing poverty) and higher values for averting deaths of children under 5 years old (relative to averting deaths of individuals over 5 years old) than we had previously been using in our decision-making.

He explains why two tweets down the thread.

The idea of a "p(doom)" isn't quite as facially insane as "AGI timelines" as marker of personal identity, but (1) you want action-conditional doom, (2) people with the same numbers may have wildly different models, (3) these are pretty rough log-odds and it may do violence to your own mind to force itself to express its internal intuitions in those terms which is why I don't go around forcing my mind to think in those terms myself, (4) most people haven't had the elementary training in calibration and prediction markets that would be required for them to express this number meaningfully and you're demanding them to do it anyways, (5) the actual social role being played by this number is as some sort of weird astrological sign and that's not going to help people think in an unpressured way about the various underlying factual questions that ought finally and at the very end to sum to a guess about how reality goes.

This seems very reasonable to me, and I think it's a very common opinion among AI safety people that discussing p(doom) numbers without lots of underlying models is not very useful.

The important part of Eliezer's writing on probability, IMO, is to notice that the underlying laws of probability are Bayesian and to do sanity checks, not to always explicitly calculate probabilities. Given that explicit probability estimates are only somewhat useful in everyday life, it's reasonable that (4) and (5) can make attempting them net negative.
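For concreteness on point (3), a small illustration (mine, not from either comment) of how probabilities map to log-odds:

```python
import math

# Probabilities that sound far apart can sit symmetrically in log-odds
# space, and small-sounding probability gaps can span whole "rungs" of
# evidence; this is the "rough log-odds" point in (3) above.
for p in (0.02, 0.1, 0.5, 0.9, 0.98):
    print(f"p = {p:<5} -> log10 odds = {math.log10(p / (1 - p)):+.2f}")
```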

Not quite; there is a finite quantity of fissiles. IIRC their total energy content is only an order of magnitude more than fossil fuel reserves.

The power density of nanotech is extremely high (10 kW/kg), so it would take only 16 kilograms of active nanotech per person, times 10 billion people, to generate enough waste heat to melt the polar ice caps. Literally boiling the oceans would take only a couple more orders of magnitude, so it's well within possible energy demand if the AIs can generate enough energy. But I think it's unlikely they would want to.

Source: http://www.imm.org/Reports/rep054.pdf
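A quick check of that arithmetic (a sketch; the ~10^15 W planetary heating ceiling is of the order discussed in the linked Freitas report):

```python
# Waste-heat arithmetic from the comment above.
power_density = 10_000  # W/kg of active nanotech
kg_per_person = 16
population = 10e9

total_watts = power_density * kg_per_person * population
print(f"total waste heat: {total_watts:.1e} W")  # 1.6e+15 W

# For scale: this is on the order of the ~1e15 W global heating ceiling
# discussed in the linked report, and ~100x today's ~2e13 W human
# primary energy use.
```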

An area can support far fewer people living first world lifestyles (50-100 full-time energy slaves) than ones living like the typical African bushman.

This is wrong. No recorded hunter-gatherer society reached even 10% of California's population density, and California is also one of the richest places in the world, a massive food exporter, and a clean-energy pioneer.
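For rough scale (my own ballpark figures, not from the original exchange; the hunter-gatherer ceiling is a generous reading of the densest recorded cases, such as Pacific Northwest fishing societies):

```python
# Rough population-density comparison.
CA_POPULATION = 39e6   # ~39 million people
CA_AREA_KM2 = 424_000  # ~424,000 km^2

ca_density = CA_POPULATION / CA_AREA_KM2  # ~92 people per km^2
hunter_gatherer_max = 1.0  # generous ceiling for the densest recorded cases

print(f"California: {ca_density:.0f}/km^2; 10% of that: {ca_density/10:.0f}/km^2")
print(f"densest hunter-gatherers: ~{hunter_gatherer_max}/km^2")
```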
