Modeling Transformative AI Risk (MTAIR)

I think what you call grader-optimization is trivially about how a target diverges from the (unmeasured) true goal, which is adversarial Goodhart (as defined in the paper, especially how we defined Campbell’s Law, not the definition in the LW post).

And the second paper's taxonomy, in failure mode 3, lays out how different forms of adversarial optimization in a multi-agent scenario relate to Goodhart's law, in both goal poisoning and optimization theft cases - and both of these seem relevant to the questions you discussed in terms of grader-optimization.

This relates closely to how to "solve" Goodhart problems in general. Multiple metrics / graders make exploitation more complex, but have other drawbacks. I discussed the different approaches in my paper here, albeit in the realm of social dynamics rather than AI safety.

This seems great!

If you are continuing work in this vein, I'd be interested in you looking at how these dynamics relate to different Goodhart failure modes, as we expanded on here. I think that much of the problem relates to specific forms of failure, and that paying attention to those dynamics could be helpful. I also think they accelerate in the presence of multiple agents - and I think the framework I pointed to here might be useful.

Cost-benefit analysis is a very weak tool here, since the costs are long-term, uncertain, and very hard to assess. And in every individual case it looks worth it, because it's a collective action problem and others are already doing it wrong.

There are estimates of the cost of antibiotic resistance: for example, almost $5b/year in the US alone. So from a collective action standpoint, if you assume that all agents are going to follow a policy, you at the very least only want to prescribe specific antibiotics when they are clinically useful. Even if you're not running tests, etc., you need to know a really significant amount to know which antibiotics to use for which set of symptoms, and you should only prescribe them if there's a pretty significant chance of full compliance. Hence the DOTS regime for TB - WHO guidelines require observing the patient taking each dose, not just prescribing it.
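To make the collective action point concrete, here is a minimal toy sketch. Every number and name in it is a hypothetical assumption chosen for illustration, not an estimate from any source: each agent who overuses gets a small private benefit, while the damage from each act of overuse is spread evenly across everyone.

```python
# Toy collective action model. All values are hypothetical illustrations.
N_AGENTS = 1000          # assumed population size
PRIVATE_BENEFIT = 1.0    # assumed gain to an agent from one act of overuse
DAMAGE_PER_ACT = 5.0     # assumed total damage per act, shared by everyone


def individual_payoff(n_overusers: int, overuses: bool) -> float:
    """Payoff to one agent: own benefit (if any) minus their share of all damage."""
    damage_share = DAMAGE_PER_ACT * n_overusers / N_AGENTS
    return (PRIVATE_BENEFIT if overuses else 0.0) - damage_share


def gain_from_switching(others_overusing: int) -> float:
    """How much one agent gains by overusing, holding everyone else fixed."""
    return (individual_payoff(others_overusing + 1, True)
            - individual_payoff(others_overusing, False))


all_restrain = individual_payoff(0, False)        # nobody overuses
all_overuse = individual_payoff(N_AGENTS, True)   # everybody overuses
```

Under these assumptions, `gain_from_switching` is positive no matter how many others overuse, so each individual case looks worth it, yet `all_overuse` is far below `all_restrain`. That is why the individual-level cost-benefit comparison points the wrong way.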

I'd have gone with Doctors Without Borders, which I linked to, which does far more on-the-ground work and would know better about the ability to help, but I think we agree here.

Don't you think you should first talk to the organizations with that experience, instead of trying to learn from your own experience, without even looking at what is already happening?

We can stop speculating about these questions - the answers exist and are relatively easy to check.

https://academic.oup.com/cid/article/27/Supplement_1/S12/459194 (Horizontal transfer is where the resistance is "trafficked" between different pathogens.)

https://academic.oup.com/cid/article-abstract/33/3/364/277722 (Geographic spread is very common, but you need better tracking to see exactly where and what the routes are.)

This isn't my field, but there are tons of people who could give concrete and specific answers to all of these questions, and so it seems silly to continue speculation.

"yet I don't hear about most of the world's human-infecting bacteria becoming antibiotic resistant"

You're not paying attention, or looking for the evidence - it's not covered by the news, because of course they ignore gradual threats that pose minor inconveniences. But there is a lot of academic work on this. Even on short time scales, changing resistance is observable. The problem isn't small. According to that last link, at this point, every year more people are killed due to antibiotic resistance than are killed by malaria. And unlike malaria, the trends are getting worse. I'm sure you can argue about the exact numbers, but unlike pumping gas, it's a classic collective action problem where individual acts harm the individuals themselves too little to be noticed, and erode a global commons that we can't easily replenish or replace.

...but the problems being highlighted are, if anything, much more specifically applicable to the rationalist side of the divide. So this seems wrong.
