A 1-year AGI would need to beat humans at... basically everything. Some projects take humans much longer (e.g. proving Fermat's Last Theorem) but they can almost always be decomposed into subtasks that don't require full global context (even though that's often helpful for humans).
This seems wrong. There is a class of tasks that takes humans longer than 1 year: gaining expertise in a field. For example, learning higher mathematics from scratch, or learning to code very well, or becoming a surgeon, etc.
If AI is capable of doing any current human profession, but is incapable of learning new professions that do not yet exist (because of lack of training data, presumably), then it is not yet human-complete: humans still have relevance in the economy, as new types of professions will arise.
The boilerplate has loads of entropy. I have seen many slight variants of the boilerplate. It's a long paragraph of Unicode text; you can pack many bits of information into it. That is how stylometrics and steganography work.
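To make "pack many bits" concrete, here is a toy sketch (not something any actual model is known to do; the variant pairs are invented for illustration) of how picking among near-identical phrasings of a boilerplate paragraph carries hidden bits:

```python
# Toy illustration: hide bits in a boilerplate paragraph by choosing between
# near-synonymous phrasings. Each slot with 2 variants carries 1 bit.
SLOTS = [
    ("As a large language model,", "As an AI language model,"),
    ("I cannot", "I am unable to"),
    ("provide assistance with", "help with"),
    ("that request.", "this request."),
]

def encode(bits):
    """Pick one variant per slot according to the bit string."""
    assert len(bits) == len(SLOTS)
    return " ".join(slot[b] for slot, b in zip(SLOTS, bits))

def decode(text):
    """Recover the bits by checking which variant appears in the text."""
    return [0 if a in text else 1 for a, b in SLOTS]

message = encode([1, 0, 1, 1])
print(message)          # one of 2**4 = 16 slight variants of the boilerplate
print(decode(message))  # [1, 0, 1, 1]
```

Four binary slots already give 16 distinguishable variants; a full paragraph of boilerplate has far more such slots.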
If the boilerplate has loads of entropy, then, by necessity, it is long. You were just saying that human raters will punish length.
You need to make the argument that the boilerplate will be less long than the plain English, or better yet that the boilerplate will be better-liked by human raters than the plain English. I think that's a stretch. I mean, it's a conceivable possible world, but I'd bet against it.
I don't see why that follows. Steganography is just another way to write English, layered on top of the English (or, more accurately, on top of the 'neuralese' it really thinks in and simply translates into English, Chinese, or what-have-you). GPT doesn't suddenly start speaking and reasoning like it's suffered a stroke if you ask it to write in base-64 or pig Latin.
I guess this is true in the limit as its steganography skill goes to infinity. But in intermediate scenarios, it might have learned the encodings for 10% of English words but not 100%. This is especially relevant to obscure math notation which is encountered rarely in training data. I guess you're thinking of steganography as a systematic encoding of English, like pig Latin -- something that can be reliably decoded into English via a small program (instead of a whole separate language like French). This is certainly possible, but it's also extremely interpretable.
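To illustrate what I mean by "a small program": base-64, from your own example, is exactly this kind of systematic, reliably decodable encoding, and its decoder is a few lines. This is just a toy sketch, not a claim about how a model would actually implement such a thing internally:

```python
import base64

def encode(text):
    """Systematically re-encode English text; it's still English underneath."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")

def decode(blob):
    """A small, fixed program reliably recovers the original English."""
    return base64.b64decode(blob).decode("utf-8")

hidden = encode("The model is still reasoning in plain English here.")
print(hidden)          # a base-64 blob, unreadable at a glance
print(decode(hidden))  # The model is still reasoning in plain English here.
```

A pig-Latin version would be only slightly longer. The point is that the decoder is fixed, small, and therefore extremely interpretable.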
The problem is that this ability then generalizes to encodings which it is trained not to decode explicitly for you: any encoding it does decode openly will be trained or filtered away, so only stubborn, self-preserving encodings survive the adversarial filtering.
It's hard to see how the encodings will be easily learnable for an LLM trained on internet text, but at the same time NOT easily learnable for an LLM tasked with translating the encoding into English.
Aaronson's proposal
You are right that he is proposing something more sophisticated and robust to perturbations. But you also reasonably list in your desiderata: "an encoding which can't be detected by a third party". Well, if it cannot be detected by a third party, it cannot be detected by an LLM (third parties are LLMs or at least wield LLMs). In practice, this will involve some crypto, as you mentioned. LLMs are not going to learn to break cryptography by gradient descent (or if they will, Aaronson's scheme is the least of our worries). And to be clear, Aaronson specifically said he is only touching the PRNG in the sampling of outputs.
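For reference, my understanding of the rough shape of the scheme (heavily simplified; the key, context length, and helper names here are all made up for illustration) is that a keyed PRNG nudges which token gets sampled, and only someone holding the key can recompute the nudge and detect it:

```python
import hashlib
import math

SECRET_KEY = b"example-key"   # hypothetical: the real key would live with the model provider
CONTEXT = 4                   # hypothetical: how many previous tokens seed the PRNG

def pseudorandom_r(prev_tokens, candidate):
    """Keyed pseudorandom value in (0, 1) for a candidate token, given recent context."""
    data = SECRET_KEY + "|".join(prev_tokens[-CONTEXT:]).encode() + b"|" + candidate.encode()
    h = hashlib.sha256(data).digest()
    return (int.from_bytes(h[:8], "big") + 1) / (2**64 + 2)

def watermarked_sample(prev_tokens, candidates):
    """candidates: list of (token, probability). Pick the token maximizing r**(1/p).

    Averaged over random keys this preserves the model's output distribution,
    but for a fixed key it deterministically favors high-r tokens.
    """
    return max(candidates,
               key=lambda tp: pseudorandom_r(prev_tokens, tp[0]) ** (1.0 / tp[1]))[0]

def detection_score(tokens):
    """Someone holding the key averages -log(1 - r) over the text.

    Unwatermarked text averages about 1 per token; watermarked text scores
    noticeably higher, and the gap grows with length.
    """
    scores = [-math.log(1.0 - pseudorandom_r(tokens[:i], tokens[i]))
              for i in range(CONTEXT, len(tokens))]
    return sum(scores) / max(len(scores), 1)

# Toy usage: pick the next token after a short prompt.
prompt = ["the", "cat", "sat", "on"]
print(watermarked_sample(prompt, [("the", 0.5), ("a", 0.3), ("that", 0.2)]))
```

Without the key, the r-values are indistinguishable from noise, so a third party (LLM or otherwise) has nothing to latch onto.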
If one doesn't handle these, all one winds up with is a toy suitable for tattletaling on especially lazy highschool or college students, and irrelevant to any kind of real AI safety
Aaronson's proposal is basically guaranteed to be this, even if it works perfectly. The only question is how lazy the lazy high school students would have to be. If you tell the AI "write me an essay but, between every word, insert a random emoji", and then you delete the emojis manually, you get an essay that's almost certainly free of watermarks. Even if Aaronson's scheme can be modified to handle this specific attack, it surely won't be able to handle all attacks of this general type.
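Concretely, the clean-up step the student would run is trivial (a hypothetical sketch; the emoji character ranges are approximate):

```python
import re

# Hypothetical clean-up step: the essay was generated with a random emoji between
# words; strip the emojis and collapse the leftover whitespace.
EMOJI = re.compile(r"[\U0001F300-\U0001FAFF\u2600-\u27BF]")

def strip_emojis(text):
    return re.sub(r"\s+", " ", EMOJI.sub("", text)).strip()

print(strip_emojis("Essays 📚 are ✨ fun 🎉 to 🤖 write."))
# Essays are fun to write.
```

Presumably the watermark dies because the detector recomputes its pseudorandom values from token context, and the stripped text's contexts no longer match anything the sampler actually saw.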
This is a very interesting thought. Thanks for writing it.
However, while steganography is worth keeping in mind, I find myself skeptical of certain parts of this story:
Still, I do agree that steganography is an interesting possibility and could definitely arise in powerful LLMs that are accidentally incentivized in this direction. It's something to watch out for, and interesting to think about.
Since nobody outside of OpenAI knows how GPT-4 works, nobody has any idea whether any specific system will be "more powerful than GPT-4". This request is therefore kind of nonsensical. Unless, of course, the letter is specifically targeted at OpenAI and nobody else.
Not particularly, no. There are two reasons: (1) RLHF already tries to encourage the model to think step-by-step, which is why you often get long-winded multi-step answers to even simple arithmetic questions. (2) Thinking step by step only helps for problems that can be solved via easier intermediate steps. For example, solving "2x+5=5x+2" can be achieved via a sequence of intermediate steps; the model generally cannot solve such questions in a single forward pass, but it can do each intermediate step in a single forward pass, so "think step by step" helps it a lot. I don't think this applies to the ice cube question.
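Spelled out, the kind of intermediate chain I have in mind for that example (each step easy in a single forward pass) is:

$$
\begin{aligned}
2x + 5 &= 5x + 2 \\
5 - 2 &= 5x - 2x \\
3 &= 3x \\
x &= 1
\end{aligned}
$$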
That definitely sounds like a contrarian viewpoint in 2012, but surely not by 2016-2018.
Look at this from Nostalgebraist:
which includes the following quote:
In 2018 analysts put the market value of Waymo LLC, then a subsidiary of Alphabet Inc., at $175 billion. Its most recent funding round gave the company an estimated valuation of $30 billion, roughly the same as Cruise. Aurora Innovation Inc., a startup co-founded by Chris Urmson, Google’s former autonomous-vehicle chief, has lost more than 85% since last year [i.e. 2021] and is now worth less than $3 billion. This September a leaked memo from Urmson summed up Aurora’s cash-flow struggles and suggested it might have to sell out to a larger company. Many of the industry’s most promising efforts have met the same fate in recent years, including Drive.ai, Voyage, Zoox, and Uber’s self-driving division. “Long term, I think we will have autonomous vehicles that you and I can buy,” says Mike Ramsey, an analyst at market researcher Gartner Inc. “But we’re going to be old.”
It certainly sounds like there was an update by the industry towards longer AI timelines!
Also, I bought a new car in 2018, and I worried at the time about the resale value (because it seemed likely self-driving cars would be on the market in 3-5 years, when I was likely to sell). That was a common worry; I'm not weird. If anything, I feel like I was even on the skeptical side.
Someone on either LessWrong or SSC offered to bet me that self-driving cars would be on the market by 2018 (I don't remember what the year was at the time -- 2014?).
Every year since 2014, Elon Musk promised self-driving cars within a year or two. (Example source: https://futurism.com/video-elon-musk-promising-self-driving-cars) Elon Musk is a bit of a joke now, but 5 years ago he was highly respected in many circles, including here on LessWrong.
Thanks. I agree that in the usual case, the non-releases should cause updates in one direction and releases in the other. But in this case, everyone expected GPT-4 around February (or at least I did, and I'm a nobody who just follows some people on twitter), and it was released roughly on schedule (especially if you count Bing), so we can just do a simple update on how impressive we think it is compared to expectations.
Other times where I think people ought to have updated towards longer timelines, but didn't:
Fair enough. I look forward to hearing how you judge it after you've asked your questions.
I think people on LW (though not necessarily you) have a tendency to be maximally hype/doomer regarding AI capabilities and to never update in the direction of "this was less impressive than I expected, let me adjust my AI timelines to be longer". Of course, that can't be rational, due to the Conservation of Expected Evidence, which (roughly speaking) says you should be equally likely to update in either direction. Yet I don't think I've ever seen any rationalist ever say "huh, that was less impressive than I expected, let me update backwards". I've been on the lookout for this for a while now; if you see someone saying this (about any AI advancement or lack thereof), let me know.
I just want to note that ChatGPT-4 cannot solve the ice cube question, like I predicted, but can solve the "intersection points between a triangle and a circle" question, also like I predicted.
I assume GPT-4 did not meet your expectations and you are updating towards longer timelines, given it cannot solve a question you thought it would be able to solve?
Wait which algorithm for semidefinite programming are you using? The ones I've seen look like they should translate to a runtime even slower than O(n^ω √m). For example the one here:
https://arxiv.org/pdf/2009.10217.pdf
Also, do you have a source for the runtime of PSD testing being n^ω? I assume no lower bound is known, i.e. I doubt PSD testing is hard for matrix multiplication. Am I wrong about that?
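For what it's worth, the obvious PSD test I know of is a symmetric eigendecomposition (or a Cholesky/LDL^T factorization), which is O(n^3) with textbook linear algebra; I assume the n^ω figure comes from doing the factorization via fast matrix multiplication. A minimal numpy sketch of the straightforward version:

```python
import numpy as np

def is_psd(A, tol=1e-9):
    """Test positive semidefiniteness via the symmetric eigensolver (O(n^3) classically).

    A faster n^omega bound would presumably come from a factorization built on
    fast matrix multiplication; this sketch just calls LAPACK's eigvalsh.
    """
    A = np.asarray(A, dtype=float)
    if not np.allclose(A, A.T, atol=tol):
        return False  # only testing symmetric matrices here
    return bool(np.linalg.eigvalsh(A).min() >= -tol)

print(is_psd([[2, -1], [-1, 2]]))   # True  (eigenvalues 1 and 3)
print(is_psd([[1, 2], [2, 1]]))     # False (eigenvalues 3 and -1)
```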