Martin Randall


I agree it's not worldwide. An alternative read is that Japan's GDP in 2013 was ~5 trillion US dollars, and so there were trillions of dollars "at stake" in monetary policy, but that doesn't mean that any particular good policy decision has an expected value in the trillions. If the total difference between good policy and typical policy is +1% GDP growth then these are billion dollar decisions, not trillion dollar decisions.

By contrast, "trillions of euros of damage" is wrong (or hyperbole). The EU's GDP is about 5x Japan's, but the ECB has stronger constraints on its actions, including its scope for quantitative easing. I expect those also to be billion dollar decisions in general.
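To make the arithmetic explicit, here is a quick back-of-the-envelope sketch. The ~$5 trillion and 5x figures are the ones quoted above; the +1% gap between good and typical policy is an assumption for illustration:

```python
# Back-of-the-envelope using the figures in these comments:
# Japan's GDP ~ $5 trillion (2013), the EU's ~ 5x that, and an
# assumed +1% of GDP gap between good and typical policy.
japan_gdp = 5e12          # USD
eu_gdp = 5 * japan_gdp    # USD, rough multiple from this comment
policy_gap = 0.01         # assumed +1% of GDP

print(f"Japan: ~${japan_gdp * policy_gap / 1e9:.0f} billion per year")
print(f"EU:    ~${eu_gdp * policy_gap / 1e9:.0f} billion per year")
# Tens to hundreds of billions per year, not trillions.
```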

If someone doesn't reliably know what Japan's monetary policy currently is, then they probably also don't reliably know what Japan's monetary policy should be. If your map has "you are here" in the wrong place, then your directions are suspect.

If memory serves, I had a convo with some openai (or maybe anthropic?) folks about this in late 2021 or early 2022ish, where they suggested testing whether language models have trouble answering ethical Qs, and I predicted in advance that that'd be no harder than any other sort of Q. Which makes me feel pretty good about me being like "yep, that's just not much evidence, because it's just not surprising."

This makes sense. Barnett is talking about an update between 2007 and 2023. GPT-3 came out in 2020, so by 2021/2022 you had finished making the update and were not surprised further by GPT-4.

Barnett is talking about what GPT-4 can do. GPT-4 is not a superintelligence. Quotes about what superintelligence can do are not relevant.

Where does Barnett say "AI is good at NLP, therefore alignment is easy"? I don't see that claim.

Evidence that MIRI believed "X is hard" is not relevant when discussing whether MIRI believed "Y is hard". Many things are hard about AI Alignment.

In this post Matthew Barnett notices that we updated our beliefs between ~2007 and ~2023. I say "we" rather than "MIRI" or "Yudkowsky, Soares, and Bensinger" because I think this was a general update, but also to defuse the defensive reactions I observe in the comments.

What did we change our mind about? Well, in 2007 we thought that safely extracting approximate human values into a convenient format would be impossible. We knew that a superintelligence could do this. But a superintelligence would kill us, so this isn't helpful. We knew that human values are more complex than fake utility functions or magical categories. So we can't hard-code human values into a utility function. So we looked for alternatives like corrigibility.

By 2023, we learned that a correctly trained LLM can extract approximate human values without causing human extinction (yet). This post points to GPT-4 as conclusive evidence, which is fair. But GPT-3 was an important update and many people updated then. I imagine that MIRI and other experts figured it out earlier. This update has consequences for plans to avoid extinction or die with more dignity.

Unfortunately, much of the initial commentary was defensive, attacking Barnett for claims he did not make. Yudkowsky placed a disclaimer on The Hidden Complexity of Wishes implausibly denying that it is an AI parable. This is surprising: Yudkowsky's Coming of Age and How to Actually Change Your Mind sequences are excellent. What went wrong?

An underappreciated sub-skill of rationality is noticing that I have, in the past, changed my mind. For me, this is pretty easy when I think back to my teenage years. But I'm in my 40s now, and I find it harder to think of major updates during my 20s and 30s, despite the world changing a lot in that time. Seeing this pattern of defensiveness in other people made me realize that it's probably common, and I probably have it too. I wish I had a guide to middle-aged rationality. In middle age, my experience is supposed to be my value-add, but conveniently forgetting my previous beliefs throws some of that away.

I shall call this the fallacy of magical categories - simple little words that turn out to carry all the desired functionality of the AI.  Why not program a chess-player by running a neural network (that is, a magical category-absorber) over a set of winning and losing sequences of chess moves, so that it can generate "winning" sequences?  Back in the 1950s it was believed that AI might be that simple, but this turned out not to be the case.

And then in the 2020s it turned out to be the case again! E.g. ChessGPT. Today I learned that Stockfish is now a neural network (trained on board positions, not move sequences).

This in no way cuts against the point of this post, but it stood out when I read this 16 years after it was posted.
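For concreteness, here is a toy sketch of the two training setups mentioned above. Nothing here is the real ChessGPT or Stockfish NNUE architecture; the feature encoding, layer sizes, and vocabulary are invented for illustration:

```python
# Toy contrast between the two setups (illustrative only).
import torch
import torch.nn as nn

# Stockfish-NNUE-style: a small network that maps an encoded board
# *position* to a scalar evaluation.
position_eval = nn.Sequential(
    nn.Linear(768, 256),   # 768 = 12 piece types x 64 squares, toy encoding
    nn.ReLU(),
    nn.Linear(256, 32),
    nn.ReLU(),
    nn.Linear(32, 1),      # centipawn-style score
)

board_features = torch.zeros(1, 768)   # a (fake) encoded position
score = position_eval(board_features)  # -> tensor of shape (1, 1)

# ChessGPT-style: a sequence model over *move tokens*, trained to
# predict the next move in a game transcript.
move_vocab_size = 2048                 # assumed token vocabulary
next_move_model = nn.Sequential(
    nn.Embedding(move_vocab_size, 128),
    nn.Flatten(start_dim=1),
    nn.Linear(128 * 16, move_vocab_size),  # toy stand-in for a transformer
)

game_so_far = torch.zeros(1, 16, dtype=torch.long)  # 16 (fake) move tokens
logits = next_move_model(game_so_far)               # -> (1, move_vocab_size)
```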

This is good news because this is more in line with my original understanding of your post. It's a difficult topic because there are multiple closely related problems of varying degrees of lethality and we had updates on many of them between 2007 and 2023. I'm going to try to put the specific update you are pointing at into my own words.

From the perspective of 2007, we don't know if we can lossily extract human values into a convenient format using human intelligence and safe tools. We know that a superintelligence can do it (assuming that "human values" is meaningful), but we also know that if we try to do this with an unaligned superintelligence then we all die.

If this problem is unsolvable then we potentially have to create a seed AI using some more accessible value, such as corrigibility, and try to maintain that corrigibility as we ramp up intelligence. This then leads us to the problem of specifying corrigibility, and we see "Corrigibility is anti-natural to consequentialist reasoning" on List of Lethalities.

If this problem is solvable then we can use human values sooner and this gives us other options. Maybe we can find a basin of attraction around human values, for example.

The update between 2007 and 2023 is that the problem appears solvable. GPT-4 is a safe tool (it exists and we aren't extinct yet) and does a decent job. A more focused AI could do the task better without being riskier.

This does not mean that we are not going to die. Yudkowsky has 43 items on List of Lethalities. This post addresses part of item 24. The remaining items are sufficient to kill us ~42.5 times. It's important to be able to discuss one lethality at a time if we want to die with dignity.

My read of older posts from Yudkowsky is that he anticipated a midrange level of complexity of human values, on your scale from a simple mathematical function to a perfect simulation of human experts.

Yudkowsky argued against very low complexity human values in a few places. There's an explicit argument against Fake Utility Functions that are simple mathematical functions. The Fun Theory Sequence would not need to be so long if human values fit in a 100-line Python program.

But also Yudkowsky's writing is incompatible with extremely complicated human values that require a perfect simulation of human experts to address. This argument is more implicit, I think because that was not a common position. Look at Thou Art Godshatter and how it places the source of human values in the human genome, downstream of the "blind idiot god" of Evolution. If true, human values must be far less complicated than the human genome.

GPT-4 is about 1,000x bigger than the human genome. Therefore, when we see that GPT-4 can represent human values with high fidelity, this is not a surprise to Godshatter Theory. It would be surprising if very small AI models, much smaller than the human genome, could represent human values accurately.
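As a rough check on "about 1,000x": the genome figure below is standard (~3.1 billion base pairs at 2 bits each), but GPT-4's parameter count is not public, so the 10^12 parameters and 16 bits per parameter are assumptions and the result is order-of-magnitude only:

```python
# Rough size comparison behind the "about 1,000x" claim.
# Genome figure is standard; GPT-4's parameter count is NOT public,
# so the 1e12 order of magnitude is an assumption here.
genome_base_pairs = 3.1e9
genome_bits = genome_base_pairs * 2        # 2 bits per base pair, uncompressed

assumed_gpt4_params = 1e12                 # unconfirmed, order-of-magnitude guess
param_bits = assumed_gpt4_params * 16      # assuming 16-bit parameters

print(f"genome: ~{genome_bits / 8 / 1e9:.1f} GB")
print(f"model:  ~{param_bits / 8 / 1e12:.1f} TB (under the above assumptions)")
print(f"ratio:  ~{param_bits / genome_bits:,.0f}x")
# Same order of magnitude (10^3) as the "about 1,000x" in the comment above.
```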

Disclaimers: I'm not replying to the thread about fragility of value, only complexity. I disagree with Godshatter Theory on other grounds. I agree that it is a small positive update that human values are less complex than GPT-4.

The example in the post below is not about an Artificial Intelligence literally at all! If the post were about what AIs supposedly can't do, the central example would have used an AI!

Contra this assertion, Yudkowsky-2007 was very capable of using parables. The "genie" in this article is easily recognized as metaphorically referring to an imagined AI. For example, here is Yudkowsky-2007 in Lost Purposes, linking here:

I have seen many people go astray when they wish to the genie of an imagined AI, dreaming up wish after wish that seems good to them, sometimes with many patches and sometimes without even that pretense of caution.

Similarly, portions of Project Lawful are about AI, That Alien Message is about AI, and so forth.

I'm very sympathetic to the claim that this parable has been misinterpreted. This is a common problem with parables! They are good at provoking thought, they are good at strategic ambiguity, and they are bad at clear communication.

I'm not sympathetic to the claim that this post is not about AI literally at all.
