evhub

I am a Research Fellow at MIRI working on inner alignment for amplification.

See: "What I'll be doing at MIRI."

Pronouns: he/him/his

evhub's Comments

Three Kinds of Competitiveness

It's interesting how Paul advocates merging cost and performance-competitiveness, and you advocate merging performance and date-competitiveness.

Also I advocated merging cost and date competitiveness (into training competitiveness), so we have every combination covered.

Three Kinds of Competitiveness
evhub · 8d · 13 · Ω 8

In the context of prosaic AI alignment, I've recently taken to splitting up competitiveness into “training competitiveness” and “objective competitiveness,”[1] where training competitiveness refers to the difficulty of training the system to succeed at its objective and objective competitiveness refers to the usefulness of a system that succeeds at that objective. I think my training competitiveness broadly maps onto a combination of your cost and date competitiveness and my objective competitiveness broadly maps onto your performance competitiveness. I think I mildly like my dichotomy better than your trichotomy in terms of thinking about prosaic AI alignment schemes, as I think it provides a better picture of the specific parts of a prosaic AI alignment proposal that are helping or hindering its overall competitiveness—e.g. if it's not very objective competitive, that tells you that you need a stronger objective, and if it's not very training competitive, that tells you that you need a better training process (it's also nice in terms of mirroring the inner/outer alignment distinction). That being said, your trichotomy is certainly more general in terms of applying to things that aren't just prosaic AI alignment.


  1. Objective competitiveness isn't a great term, though, since it can be misread as the opposite of subjective competitiveness—perhaps I'll switch now to using performance competitiveness instead. ↩︎

Zoom In: An Introduction to Circuits

I think for the remaining 5% to be hiding really big important stuff like the presence of optimization (which is to say, mesa-optimization) or deceptive cognition, it has to be the case that there was adversarial obfuscation (e.g. gradient hacking). Of course, I'm only hypothesizing here, but it seems quite unlikely for that sort of stuff to just be randomly obfuscated.

Given that assumption, I think it's possible to translate 95% transparency into a safety guarantee: just use your transparency to produce a consistent gradient away from deception such that your model never becomes deceptive in the first place and thus never does any sort of adversarial obfuscation.[1] I suspect that the right way to do this is to use your transparency tools to enforce some simple condition, such as myopia, that you are confident rules out deception. For more context, see my comment here and the full “Relaxed adversarial training for inner alignment” post.


  1. It is worth noting that this does introduce the possibility of getting obfuscation by overfitting the transparency tools, though I suspect that that sort of overfitting-style obfuscation will be significantly easier to deal with than actively adversarial obfuscation by a deceptive mesa-optimizer. ↩︎

Coronavirus: Justified Practical Advice Thread

Do you have any thoughts on where to buy a BiPAP and a capnometer? Can you get them without a prescription? Are they sold on Amazon? If you or anyone else manages to get this to work (or even just starts buying supplies for it), I'd love to know where they obtained all their supplies and what they ended up needing.

Coronavirus: Justified Practical Advice Thread
Answer by evhub · Mar 05, 2020 · 15

I think you should try to get antibiotics, antivirals, and/or antifungals for secondary infections in case hospitals are full and you need to treat yourself. According to this study, “When populations with low immune function, such as older people, diabetics, people with HIV infection, people with long-term use of immunosuppressive agents, and pregnant women, are infected with 2019-nCoV, prompt administration of antibiotics to prevent infection and strengthening of immune support treatment might reduce complications and mortality.” About what treatment people in Wuhan were given, the study says:

Most patients were given antibiotic treatment (table 2); 25 (25%) patients were treated with a single antibiotic and 45 (45%) patients were given combination therapy. The antibiotics used generally covered common pathogens and some atypical pathogens; when secondary bacterial infection occurred, medication was administered according to the results of bacterial culture and drug sensitivity. The antibiotics used were cephalosporins, quinolones, carbapenems, tigecycline against methicillin-resistant Staphylococcus aureus, linezolid, and antifungal drugs. The duration of antibiotic treatment was 3–17 days (median 5 days [IQR 3–7]). 19 (19%) patients were also treated with methylprednisolone sodium succinate, methylprednisolone, and dexamethasone for 3–15 days (median 5 [3–7]).

I think this sort of treatment might be one of the biggest factors in lower mortality for people with access to hospitals, so I suspect that getting your hands on some prescription antibiotics beforehand could be quite valuable. Some of the pharmacies that Wei Dai recommends here could be good bets, though I'm still currently trying to figure out what the best way is to do this—if anyone has any ideas let me know.

Towards a mechanistic understanding of corrigibility

I don't think there's really a disagreement there—I think what Paul's saying is that he views corrigibility as the right way to get an acceptability guarantee.

Coronavirus: Justified Practical Advice Thread

How did you order this without a prescription? When I went to order from the second link, it asked for a prescription, which I don't have.

What are the merits of signing up for cryonics with Alcor vs. with the Cryonics Institute?

What were the results from this survey? And what conclusion, if any, did you come to?

At what point should CFAR stop holding workshops due to COVID-19?
Answer by evhub · Feb 25, 2020 · 18

The CDC is currently warning that pandemic COVID-19 in the U.S. is likely and is moving its focus from prevention to mitigation. Specifically, the CDC has said that while they are “continuing to hope that we won't see [community] spread,” the current goal is “that our measures give us extra time to prepare.” Once spread within the U.S. is confirmed, the CDC has noted that mitigation measures will likely include “social distancing, school closures, canceling mass gatherings, [...] telemedicine, teleschooling, [and] teleworking.” As CFAR workshops certainly seem like they fall into the “mass gatherings” category, the current guidance from the CDC seems to imply that they should be canceled once U.S. spread is confirmed and mitigation measures such as social distancing and school closures start to be announced.
