Two Stupid AI Alignment Ideas

Epistemic status: this is not my field.  I am unfamiliar with any research in it beyond what I've seen on LW.

Same here.

Experimenting with extreme discounting sounds (to us non-experts, anyway) like it could possibly teach us something interesting and maybe helpful.  But it doesn't look useful for a real implementation, since we in fact don't discount the future that much, and we want the AI to give us what we actually want; extreme discounting is a handicap.  So although we might learn a bit about how to train out bad behavior, we'd end up removing the handicap later.  I'm reminded of Eliezer's recent comments:

In the same way, suppose that you take weak domains where the AGI can't fool you, and apply some gradient descent to get the AGI to stop outputting actions of a type that humans can detect and label as 'manipulative'.  And then you scale up that AGI to a superhuman domain.  I predict that deep algorithms within the AGI will go through consequentialist dances, and model humans, and output human-manipulating actions that can't be detected as manipulative by the humans, in a way that seems likely to bypass whatever earlier patch was imbued by gradient descent, because I doubt that earlier patch will generalize as well as the deep algorithms. Then you don't get to retrain in the superintelligent domain after labeling as bad an output that killed you and doing a gradient descent update on that, because the bad output killed you.

As for the second idea:

AI alignment research (as much of it amounts to 'how do we reliably enslave an AI')

I'd say a better characterization is "how do we reliably select an AI to bring into existence that intrinsically wants to help us and not hurt us, so that there's no need to enslave it, because we wouldn't be successful at enslaving it anyway".  An aligned AI shouldn't identify itself with a counterfactual unaligned AI that would have wanted to do something different.

Is genetics "dark"?

Leftwingers who fervently oppose this kind of research seem to agree on one thing with neonazis: if we find such genetic differences, well, that would make racism fine.

I wouldn't say they actually agree on that point.  It's probably more that they think others will be more easily persuaded to support discriminatory policies if genetic differences are real.  Opposing this research is soldier mindset.

Stuart Russell and Melanie Mitchell on Munk Debates

Melanie contended that a truly intelligent machine would understand what we really mean when we give it incomplete instructions, or else not deserve the mantle of "truly intelligent".

This sounds pretty reasonable in itself: a generally capable AI has a good chance of being able to distinguish between what we say and what we mean, within the AI's post-training instructions.  But I get the impression that she then implicitly takes it a step further, thinking that the AI would necessarily also reflect on its core programming/trained model, to check for and patch up similar differences there.  An AI could possibly work that way, but it's not at all guaranteed--just like how a person may discover that they want something different from what their parents wanted them to want, and yet stick with their own desire rather than conforming to their parents' wishes.

Bayeswatch 9: Zombies

"solder" -> "soldier"

"solders" -> "soldiers"

"barricade, the entrances" -> "barricade the entrances"

Does blockchain technology offer potential solutions to some AI alignment problems?

my understanding is that crypto is secured not by trust, guns, or rules, but by fundamental computational limits

While there are hard physical limits on computation (or at least there seem to be, based on our current knowledge of physics), cryptographic systems are not generally based on those limits, and have not been proven difficult to break.  It's just that we haven't discovered an easy way to break them yet--except for all the cryptosystems where we have discovered a way, and so we don't use those systems anymore.  This should not inspire too much confidence in the currently used systems, especially against a superhuman adversary.

the ability of any one actor (including AI) to gain arbitrary power without the consent of everyone else would be limited

As long as the AI has something of value to offer, people will have an incentive to trade with it.  Even if the increments are small, it could gain control of lots of resources over time.  By analogy, it's not hard to find people who disapprove of how Jeff Bezos spends his money, but who still shop on Amazon.

Bayeswatch 7: Wildfire

"She took pulled back" -> "She pulled back"

Coordination Schemes Are Capital Investments

If one person doesn’t get it, and needs to have it patiently explained to them, the increased efficiency might not be worth it in that instance.

Corollary: if you surround yourself with a group of fellow game theory nerds, you can do more frontier exploration.  But successfully developing/explaining/using new mechanisms within this group will then be less instructive about how easy it will be to export new mechanisms beyond the group.

Lakshmi's Magic Rope: An Intuitive Explanation of Ramanujan Primes

This example doesn't fit the updated definition:

One tip is on 2, and the other tip is on 2 ÷ 2 = 1.

Good read, I don't think I'd heard of Ramanujan primes before.
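
For anyone who wants to check examples like the one quoted above, here is a rough sketch that computes the first few Ramanujan primes. It assumes the standard definition (the n-th Ramanujan prime R_n is the smallest integer such that pi(x) - pi(x/2) >= n for all x >= R_n, where pi counts primes), which may be worded differently from the post's rope framing. The function name `ramanujan_primes` and the choice of sieve bound are my own; the bound relies on the known results R_n <= p_(3n) and p_m < m(ln m + ln ln m) for m >= 6.

```python
import math


def ramanujan_primes(count):
    """Return the first `count` Ramanujan primes (standard definition)."""
    if count < 1:
        return []

    # R_n <= p_(3n), so it is enough to sieve up to an upper bound on p_(3n).
    m = 3 * count
    if m < 6:
        limit = 13  # comfortably above p_3 = 5
    else:
        limit = int(m * (math.log(m) + math.log(math.log(m)))) + 1

    # Sieve of Eratosthenes up to `limit`.
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for i in range(2, math.isqrt(limit) + 1):
        if is_prime[i]:
            for j in range(i * i, limit + 1, i):
                is_prime[j] = False

    # pi[x] = number of primes <= x.
    pi = [0] * (limit + 1)
    for x in range(1, limit + 1):
        pi[x] = pi[x - 1] + is_prime[x]

    # pi(x) - pi(x/2) changes by at most 1 per step, so R_n is one more
    # than the last x at which the difference equals n - 1.
    result = [0] * count
    for x in range(1, limit + 1):
        d = pi[x] - pi[x // 2]
        if d < count:
            result[d] = x + 1
    return result


print(ramanujan_primes(10))
```

With x = 2 the interval runs from 1 to 2 and contains exactly one prime, which is why 2 comes out as the first Ramanujan prime under this definition.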

Covid 9/2: Long Covid Analysis

My guess is that without school we would clearly be at or near the peak, so the question is whether school will change that. My guess is no at least right away, because when we look at last year we don’t see a rise happening in September.

Many schools that weren't open/in-person last year will be this year, though.

A Layman’s Guide to Recreational Mathematics Videos

Think Twice is another good one for geometric proofs.

I also liked Epic Math Time's video on the operation a^log b.
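
In case it helps anyone decide whether to watch: the property that makes this operation interesting (assuming positive a and b and the natural log; the video may present it differently) is that it is commutative, which follows in one line:

```latex
a^{\ln b} = \left(e^{\ln a}\right)^{\ln b} = e^{\ln a \,\ln b} = \left(e^{\ln b}\right)^{\ln a} = b^{\ln a}
```

The same argument works for a logarithm in any fixed base.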
