Davidmanheim

I know someone who has done lots of reporting on lab leaks, if that helps?

Also, there are some "standard" EA-adjacent journalists who you could contact / someone could introduce you to, if it's relevant to that as well.

Vaccine design is hard, and requires lots of work. Seems strange to assert that someone could just do it on the basis of a theoretical design. Viral design, though, is even harder, and to be clear, we've never seen anyone build one from first principles; the most we've seen is modification of extant viruses in minor ways where extant vaccines for the original virus are likely to work at least reasonably well.

I have a lot more to say about this, and think it's worth responding to in much greater detail, but I think that overall, the post criticizes Omohundro and Tegmark's more extreme claims somewhat reasonably, though very uncharitably, and then assumes that other proposals which seem to be related, especially the Dalrymple et al. approach, are essentially the same, and doesn't engage with the specific proposal at all.

To be very specific about how I think the post is unreasonable: in a number of places, a seemingly steel-manned version of the proposals is presented, and then this steel-manned version, rather than the initial proposal for formal verification, is attacked. But this amounts to a straw-man criticism of the actual proposals being discussed!

For example, this post suggests that arbitrary DNA could be proved safe by essentially impossible modeling ("on-demand physical simulations of entire human bodies (with their estimated 36 trillion cells [9]), along with the interactions between the cells themselves and the external world and then run those simulations for years"). This is true; that would work - but the proposal ostensibly being criticized was to check narrower questions about whether DNA synthesis is being used to produce something harmful. And Dalrymple et al. explained explicitly, elsewhere in the paper, what they actually had in mind ("Examples include machines provably impossible to login to without correct credentials, DNA synthesizers that provably cannot synthesize certain pathogens, and AI hardware that is provably geofenced, time-limited (“mortal”) or equipped with a remote-operated throttle or kill-switch. Provably compliant sensors can be specified to ensure “zeroization”, in which tampering with PCH is guaranteed to cause detection and erasure of private keys.")

But that premise falls apart as soon as a large fraction of those (currently) highly motivated (relatively) smart tech workers can only get jobs in retail or middle management.

Yeah, I think the simplest thing for image generation is for model hosting providers to use a separate tool - and lots of work on that already exists. (see, e.g., this, or this, or this, for different flavors.) And this is explicitly allowed by the bill.
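
To make the "separate tool" idea concrete, here is a toy sketch of post-hoc watermarking of a generated image: hiding a short provenance string in pixel least-significant bits with Pillow. To be clear, this is not what the linked tools or the bill's provenance standards actually do, the function names and message format are invented for the example, and an LSB mark like this is trivially removable; it just illustrates that the watermarking step can live entirely outside the image model.

```python
# Toy post-hoc image watermark: hide a short provenance string in red-channel LSBs.
# Purely illustrative and trivially removable; real provenance tooling is far more robust.
from PIL import Image

def embed_lsb(img: Image.Image, message: str) -> Image.Image:
    """Write the message's bits into the least significant bit of the red channel."""
    bits = [int(b) for byte in message.encode() for b in f"{byte:08b}"]
    out = img.convert("RGB")  # convert() returns a copy, so the original is untouched
    px = out.load()
    width, height = out.size
    assert len(bits) <= width * height, "image too small for message"
    for i, bit in enumerate(bits):
        x, y = i % width, i // width
        r, g, b = px[x, y]
        px[x, y] = ((r & ~1) | bit, g, b)  # clear the LSB, then set it to the message bit
    return out

def extract_lsb(img: Image.Image, n_chars: int) -> str:
    """Read n_chars bytes back out of the red-channel LSBs."""
    rgb = img.convert("RGB")
    px = rgb.load()
    width = rgb.size[0]
    bits = [px[i % width, i // width][0] & 1 for i in range(n_chars * 8)]
    data = bytes(int("".join(map(str, bits[i:i + 8])), 2) for i in range(0, len(bits), 8))
    return data.decode(errors="replace")
```

Anything production-grade would instead use provenance metadata standards or learned, perturbation-robust watermarks, but the division of labor is the same: the mark gets applied after generation, outside the model itself.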

For text, it's harder to do well, and you only get weak probabilistic identification, but it's also easy to implement an Aaronson-like scheme, even if doing it really well is harder. (I say easy because I'm pretty sure I could do it myself, given, say, a month working with one of the LLM providers, and I'm wildly underqualified to do software dev like this.)
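
For concreteness, here is a minimal sketch of the kind of Aaronson-like scheme I mean, assuming access to the model's next-token probabilities. The key handling, context-window size, and function names are all invented for illustration; a real deployment would need far more care, and, as noted, detection is only probabilistic and degrades under paraphrasing.

```python
# Toy sketch of an Aaronson-style watermark via keyed "exponential minimum sampling".
# Everything here (key handling, context window, function names) is invented for
# illustration; it is not any provider's actual implementation.
import hashlib
import math

def keyed_score(secret_key: str, context: tuple, token: int) -> float:
    """Pseudorandom value in (0, 1) derived from the key, recent context, and candidate token."""
    digest = hashlib.sha256(f"{secret_key}|{context}|{token}".encode()).digest()
    return (int.from_bytes(digest[:8], "big") + 1) / (2**64 + 2)

def watermarked_sample(probs: dict, context: tuple, secret_key: str) -> int:
    """Pick the token maximizing r ** (1 / p), where probs maps token id -> probability.
    Averaged over the key this matches ordinary sampling from probs, but the chosen
    token tends to get a high keyed score, which a verifier can later detect.
    `context` should be the same previous-k-tokens tuple that detection_score rebuilds."""
    return max(probs, key=lambda tok: keyed_score(secret_key, context, tok) ** (1.0 / probs[tok]))

def detection_score(tokens: list, secret_key: str, k: int = 4) -> float:
    """Average of -log(1 - r) over the text: about 1.0 for unwatermarked text,
    noticeably higher for text generated with watermarked_sample and the same key."""
    total = 0.0
    for i, tok in enumerate(tokens):
        context = tuple(tokens[max(0, i - k):i])
        total += -math.log(1.0 - keyed_score(secret_key, context, tok))
    return total / max(1, len(tokens))
```

The trick is that sampling is (in expectation over the key) unchanged, but whoever holds the key can score a passage and see that the chosen tokens got suspiciously high pseudorandom values - which is also why you only get probabilistic identification.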

  • Internet hosting platforms are responsible for ensuring indelible watermarks.

The bill requires that "Generative AI hosting platforms shall not make available a generative AI system that does not allow a GenAI provider, to the greatest extent possible and either directly providing functionality or making available the technology of a third-party vendor, to apply provenance data to content created or substantially modified by the system"

This means that sites running GenAI models need to allow the GenAI systems to implement their required watermarking, not that hosting providers (imgur, reddit, etc.) need to do so. Less obviously good, but still important: it also doesn't require the GenAI hosting provider to ensure the watermark is indelible, just that they include watermarking via either the model or a third-party tool, when possible.

LLMs are already moderately-superhuman at the task of predicting next tokens. This isn't sufficient to help solve alignment problems. We would need them to meet the much much higher bar of being moderately-superhuman at the general task of science/engineering. 

We also need the assumption - which is definitely not obvious - that significant intelligence increases are relatively close to achievable. Superhumanly strong math skills presumably don't let AI solve NP problems in P time, and it's similarly plausible - though far from certain - that really good engineering skill tops out somewhere only moderately above human ability due to intrinsic difficulty, and that really good deception skills top out somewhere short of being able to subvert the best systems we could build to do oversight and detect misalignment. (On the other hand, even if these objections are correct, they would only show that control is possible, not that it is likely to occur.)

Live like life might be short. More travel even when it means missing school, more hugs, more things that are fun for them.

Optimizing for both impact and personal fun, I think this is probably directionally good advice for the types of analytic people who think about the long term a lot, regardless of kids. (It's bad advice for people who aren't thinking very long term already, but that's not who is reading this.) 

First, I think this class of work is critical for deconfusion, which is essential if we need a theory for far more powerful AI systems, rather than for very smart but still fundamentally human-level systems.

Second, and more concretely, it seems that very few other approaches to safety have the potential to provide enough fundamental understanding to let us make strong statements about models before they are fully trained. This seems like a critical issue if we are concerned about very strong models that could pose risks during testing, or possibly even during training. And as far as I'm aware, nothing in the interpretability and auditing spaces has a real claim to be able to make clear statements about those risks, other than perhaps to suggest interim testing during model training - which could work, if a huge amount of such work is done, but that seems very unlikely to happen.

Edit to add: Given the votes on this, what specifically do people disagree with?

Ideally, statements should be at least two of true, necessary/useful, and kind. I agree that this didn't breach any sort of confidentiality, and yet I think that people should generally follow a policy of not publicizing random non-notable people's names without explicit permission when discussing them in a negative light - and the story might attempt to be sympathetic, but it certainly isn't complimentary.
