LessWrong developer, rationalist since the Overcoming Bias days. Jargon connoisseur. 


Nutrition is a bad example because it's an area with lots of known unknowns that have close causal connections to human health, and the prior on interventions is bad because the interventions disrupt an existing optimization process that is mostly aligned. That is: random perturbations to human biology are on average bad, because human biology has prior optimization that put it near a local optimum.

The same applies to interventions in economics: they have a high backfire rate because free markets are somewhat-aligned optimization processes.

Perturbations to mosquito biology, on the other hand, are on average good, because mosquito evolution points towards an attractor that's harmful to humans. Humanity is already spending lots of resources trying to eliminate them, and in the locales where elimination has succeeded, the results have been positive.

You say nutrition is just one of a bazillion examples. I disbelieve. I think that once you've accounted for the presence of preexisting optimization processes in the environment, ruled out the possibility that you're dismantling an aligned process, and have a basic understanding of the domain, backfiring becomes rare.

If suitable biologists can be bought, then yes. I'm not certain how hard that is, though; most biologists are stuck in academic institutions and an ideology which tell them they need an IRB to sign off on their plan, and a lot of IRBs seem to be stuck in an ideology where they can't do cost-benefit analysis and veto everything. It's similar to the problem of organizing a COVID-vaccine challenge trial; the default outcome is that you try to find a grantee to do it, and they chicken out and do preliminary studies forever.

I think gene-drive mosquito control is in the same boat as nuclear power. It's perceived as having safety concerns, but the safety concerns can't be addressed by technological advancement because they're actually anxieties and misconceptions, not real problems. Laypeople seem to believe that ecosystems are fragile things that will spontaneously turn into wastelands if they're perturbed, which would mean that removing mosquitoes is risky, and that genomes are dark magic that will trigger zombie invasions if you do anything complicated with them.

The solution to this isn't progress and thoroughness, it's courage. Let the idiots believe what they will and do the right thing anyways.

Thank you for writing this. I plan on working through it over the next couple weeks to fill in gaps in my previous alignment-related knowledge.

A bug was causing event-updated emails to get sent to everyone who RSVPed to this event, whenever an additional person RSVPed. That email type is only supposed to be sent for time and location changes. We've deployed a fix; spurious event-updated emails should stop no later than 15 minutes from now.

How good is the power quality, if you're making best-effort power from solar panels with neither a grid tie nor any associated batteries? If it looks like it's working but frequently provides brownout power, I imagine it could damage downstream electronics enough to not be worth it. OTOH the amount of batteries needed to prevent this would be quite small compared to the amount of batteries needed to power the house for any meaningful duration.
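A rough sanity check of that last claim, using made-up but plausible numbers (every figure here is an assumption for illustration, not a measurement):

```python
# Compare battery energy needed to ride through brief solar dips
# vs. battery energy needed to run the house through a blackout.
# All numbers below are assumptions, not measurements.

house_load_w = 1200      # assumed average household draw, watts
ride_through_s = 30      # assumed buffer to smooth short solar dips, seconds
blackout_hours = 24      # a "meaningful duration" of whole-house backup

# Energy to ride through short dips (watt-hours):
buffer_wh = house_load_w * ride_through_s / 3600

# Energy to power the house through a day-long blackout (watt-hours):
backup_wh = house_load_w * blackout_hours

ratio = backup_wh / buffer_wh
print(f"dip buffer: {buffer_wh:.0f} Wh; blackout backup: {backup_wh:.0f} Wh; "
      f"ratio: {ratio:.0f}x")
```

Under these assumptions the dip-smoothing buffer is around 10 Wh against roughly 29 kWh of whole-house backup, a gap of three-plus orders of magnitude, which is the sense in which the stabilizing battery is "quite small."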

It's worth noting explicitly: the resiliency advantages are larger than the fraction-of-residences-with-panels might suggest, since in the event of an extended blackout, people with island-capable solar can offer battery-recharging services to their friends and neighbors.

The US isn't short on places to live, it's short on places to live that are short drives from the people and businesses you most want to interact with. If you want to found a new city, there are cheaper and more desirable places to do it; the difficulty comes from the fact that very few people want to go somewhere that doesn't already have a large critical mass of people, businesses, and infrastructure in place.

It's totally feasible to make a (narrow, well-defined) computer interface which has zero security vulnerabilities. It's much easier to secure the channel that an interpretability tool passes through than it is to secure the AI itself or its training environment, since the interface has a lot less total complexity. You can't use memory-unsafe programming languages like C, and you can't incorporate certain large pieces of software that you might have wanted to use, but it's not nearly as bad as you seem to think. Tricking the humans into doing dumb things that break the security is still an issue, of course. But no-human-in-the-loop setups, like "shut off the power if a particular plan is detected", don't create an escape channel unless they're badly designed, and there's no good reason for them to be.
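A minimal sketch of what "narrow and well-defined" means here, with an illustrative no-human-in-the-loop handler. The signal names and message format are hypothetical; the point is that the receiver accepts only an exact whitelist and fails closed on everything else, so the channel's attack surface is just a few bytes:

```python
# Sketch of a narrow, well-defined channel (names are illustrative).
# The interpretability tool may emit only one of a fixed set of tokens;
# the receiver has no general-purpose parser, no dynamic dispatch, and
# fails closed (cuts power) on any malformed input.

ALLOWED_SIGNALS = {"OK", "PLAN_DETECTED"}

def parse_signal(raw: bytes) -> str:
    """Accept only an exact, whitelisted ASCII token; reject all else."""
    if len(raw) > 16:
        raise ValueError("message too long")
    token = raw.decode("ascii", errors="strict")  # UnicodeDecodeError is a ValueError
    if token not in ALLOWED_SIGNALS:
        raise ValueError("unknown signal")
    return token

def handle(raw: bytes) -> str:
    """No human in the loop: a detected plan deterministically cuts power."""
    try:
        signal = parse_signal(raw)
    except ValueError:
        return "cut_power"  # fail closed on anything malformed
    return "cut_power" if signal == "PLAN_DETECTED" else "continue"
```

Because the only reachable behaviors are "continue" and "cut_power", there is nothing for a clever payload to exploit; the worst a malformed message can do is trigger the shutdown it was supposed to prevent.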

(Homomorphic encryption is not a real thing outside of thought experiments, and will not become a real thing on any relevant time scale.)

Fyi the link from the jefftk.com version of this post to the LessWrong crosspost is broken -- it has an extra consecutive slash in it, https://lesswrong.com//posts/6jKECJocj8o7nxGmd. It seems to be only this post, not any of the other posts that I checked.

It looks like most AIs are going to include large language models, if not as their entire architecture then at least as a component somewhere. This means an AI will have access to an enormous amount of data about what the real world should look like, in a form that's infeasible to restrict or ablate. So it will be difficult or impossible to make a simulated environment which can fool it.
