# All of ESRogs's Comments + Replies

Regarding all the bottlenecks, I think there is an analogy between gradient descent and economic growth / innovation: when the function is super high-dimensional, it's hard to get stuck in a local optimum.

So even if we stagnate on some dimensions that are currently bottlenecks, we can make progress on everything else (and then eventually the landscape may have changed enough that we can once again make progress on the previously stagnant sectors). This might look like a cost disease, where the stagnant things get more expensive. But that seems like it would go along with high nominal GDP growth rather than low.
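To make the analogy concrete, here's a toy sketch (construction entirely mine): one coordinate is bottlenecked, with no gradient signal in its current region, yet overall loss keeps falling along all the other dimensions.

```python
import numpy as np

# Toy loss: coordinate 0 is "bottlenecked" (flat once x[0]^2 > 1, so gradient
# descent can't improve it from here); the other 49 coordinates are ordinary.
def loss(x):
    bottleneck = np.minimum(x[0] ** 2, 1.0)
    rest = np.sum(x[1:] ** 2)
    return bottleneck + rest

def grad(x):
    g = 2 * x.copy()
    if x[0] ** 2 > 1.0:   # the stuck dimension: no gradient signal
        g[0] = 0.0
    return g

x = np.full(50, 2.0)      # start every coordinate at 2
for _ in range(100):
    x -= 0.1 * grad(x)

# The stuck coordinate never moved, but total loss still fell enormously.
print(round(loss(np.full(50, 2.0)), 1), "->", round(loss(x), 2))
```

The stagnant dimension contributes a fixed floor to the loss (the "cost disease" piece), while progress continues everywhere else.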

I am also not an economist and this might be totally off-base, but it seems to me that if there is real innovation and we can in fact do a bunch of new stuff that we couldn't before, then this will be reflected in the nominal GDP numbers going up. For the simple reason that in general people will be more likely to charge more for new and better goods and services rather than charging less for the same old goods and services (that can now be delivered more cheaply).

3NickyP8d
Yeah, though I think it depends on how many people are able to buy the new goods at a better price. If most well-paid employees (i.e. the employees that companies get the most value from automating) no longer have a job, then the number of people who can buy the more expensive goods and services might go down. It seems possible to me that GDP could fall if the number of people who lost their jobs is high enough. It feels possible that the recent tech developments were barely net positive to nominal GDP despite rapid improvements, and that fast enough technological progress could cause nominal GDP to go in the other direction.

Do you expect learned ML systems to be updateless?

It seems plausible to me that updatelessness of agents is just as "disconnected from reality" of actual systems as EU maximization. Would you disagree?

4Scott Garrabrant1mo
No, at least probably not at the time that we lose all control. However, I expect that systems that are self-transparent and can easily self-modify might quickly converge to reflective stability (and thus updatelessness). They might not, but I think the same arguments that might make you think they would develop a utility function can also be used to argue that they would develop updatelessness (and thus possibly also not develop a utility function).

It's just that nobody will buy all those cars.

Why would this be true?

Teslas are generally the most popular car in whatever segment they're in. And their automotive gross margins are at 25+%, so they've got room to cut prices if demand lightens a bit.

Add to this that a big tax credit is about to hit for EVs in the US and it's hard for me to see why demand would all-of-a-sudden fall off a cliff.

or truck manufacturers

Note that Tesla has (just) started producing a truck: https://www.tesla.com/semi. And electric trucks stand to benefit the most from self-driving tech, because their marginal cost of operation is lower than gas powered, so you get a bigger benefit from the higher utilization that not having a driver enables.

But so much depends on how deeply levered they are and how much is already priced in - TSLA could EASILY already be counting on that in their current valuations.  If so, it'll kill them if it doesn't happen, but only maintain

...

It seems to me that, all else equal, the more bullish you are on short-term AI progress, the more likely you should think vision-only self driving will work soon.

And TSLA seems like probably the biggest beneficiary of that if it works.

5Dagon2mo
I suspect trucking companies (or truck manufacturers, or maybe logistics companies that suck up all the surplus from truckers) are the biggest beneficiaries. But so much depends on how deeply levered they are and how much is already priced in - TSLA could EASILY already be counting on that in their current valuations. If so, it'll kill them if it doesn't happen, but only maintain if it does. A better plan might be to short (or long-term puts on) the companies you think will be hurt by the things you're predicting.

After reading through the Unifying Grokking and Double Descent paper that LawrenceC linked, it sounds like I'm mostly saying the same thing as what's in the paper.

(Not too surprising, since I had just read Lawrence's comment, which summarizes the paper, when I made mine.)

In particular, the paper describes Type 1, Type 2, and Type 3 patterns, which correspond to my easy-to-discover patterns, memorizations, and hard-to-discover patterns:

In our model of grokking and double descent, there are three types of patterns learned at different
speeds. Type 1 patterns

...

So, just don't keep training a powerful AI past overfitting, and it won't grok anything, right? Well, Nanda and Lieberum speculate that the reason it was difficult to figure out that grokking existed isn't because it's rare but because it's omnipresent: smooth loss curves are the result of many new grokkings constantly being built atop the previous ones.

If the grokkings are happening all the time, why do you get double descent? Why wouldn't the test loss just be a smooth curve?

Maybe the answer is something like:

1. The model is learning generalizable patterns
...
2ESRogs2mo
After reading through the Unifying Grokking and Double Descent [https://drive.google.com/file/d/1M0IBM0j8PbwwqQ_JNJqm5Mfms3ENOSqY/view] paper that LawrenceC linked [https://www.lesswrong.com/posts/fytgZ26AgxmrAdyB4/mesa-optimizers-via-grokking?commentId=Eu9nYL6fqB5RkqbAb], it sounds like I'm mostly saying the same thing as what's in the paper. (Not too surprising, since I had just read Lawrence's comment, which summarizes the paper, when I made mine.)

In particular, the paper describes Type 1, Type 2, and Type 3 patterns, which correspond to my easy-to-discover patterns, memorizations, and hard-to-discover patterns.

The one thing I mention above that I don't see in the paper is an explanation for why the Type 2 patterns would be intermediate in learnability between Type 1 and Type 3 patterns or why there would be a regime where they dominate (resulting in overfitting).

My proposed explanation is that, for any given task, the exact mappings from input to output will tend to have a characteristic complexity, which means that they will have a relatively narrow distribution of learnability. And that's why models will often hit a regime where they're mostly finding those patterns rather than Type 1, easy-to-learn heuristics (which they've exhausted) or Type 3, hard-to-learn rules (which they're not discovering yet).

The authors do have an appendix section A.1 in the paper with the heading, "Heuristics, Memorization, and Slow Well-Generalizing", but with "[TODO]"s in the text. Will be curious to see if they end up saying something similar to this point (about input-output memorizations tending to have a characteristic complexity) there.

What makes you think that?

If we just look at the next year, they have two new factories (in Berlin and Austin) that have barely started producing cars. All they have to do to have another 50-ish% growth year is to scale up production at those two factories.

There may be some bumps along the way, but I see no reason to think they'll just utterly fail at scaling production at those factories.

Scaling in future years will eventually require new factories, but my understanding is that they're actively looking for new locations.

Their stated goal is to produce 20 ...

1Bernhard1mo
Oh they'll scale just fine. It's just that nobody will buy all those cars. They are already not selling them all, and we are about to enter the biggest recession of many of our lifetimes

Ah, maybe the way to think about it is that if I think I have a 30% chance of success before the merger, then I need to have a 30%+epsilon chance of my goal being chosen after the merger. And my goal will only be chosen if it is estimated to have the higher chance of success.

And so, if we assume that the chosen goal is def going to succeed post-merger (since there's no destructive war), that means I need to have a 30%+epsilon chance that my goal has a >50% chance of success post-merger. Or in other words "a close to 50% probability of success", just as Wei said.

But if these success probabilities were known before the merger, the AI whose goal has a smaller chance of success would have refused to agree to the merger. That AI should only agree if the merger allows it to have a close to 50% probability of success according to its original utility function.
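Here's a toy Monte Carlo formalization of that acceptance condition (the setup and distributions are assumed by me, not taken from Wei's post): each AI accepts the merger only if its chance of being the goal the merged AI adopts is at least its pre-merger success probability.

```python
import random

def accept_merger(p_war, sample_qA, sample_qB, n=100_000):
    """A accepts iff P(A's goal is judged more likely to succeed, and hence
    chosen post-merger) is at least A's pre-merger success probability p_war."""
    wins = sum(sample_qA() > sample_qB() for _ in range(n))
    p_chosen = wins / n
    return p_chosen >= p_war

random.seed(0)
# With symmetric uncertainty about the two goals' post-merger viability,
# A's goal is chosen ~50% of the time, so a 30% pre-merger chance
# makes the merger worth accepting.
print(accept_merger(0.30, random.random, random.random))
```

The interesting regime is when the success probabilities are already common knowledge: then `p_chosen` collapses to 0 or 1, and the weaker AI only agrees under a deal (e.g. randomization) that gives it close to its pre-merger chance, which is the dynamic Wei describes.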

Why does the probability need to be close to 50% for the AI to agree to the merger? Shouldn't its threshold for agreeing to the merger depend on how likely one or the other AI is to beat the other in a war for the accessible universe?

Is there an assumption that the two AIs are roughly equally powerful, and that a both-lose scenario is relatively unlikely?

4Slider2mo
It is first-past-the-post; minorities get nothing. There might be an implicit assumption that the newly created agent agrees with the old agents about the probabilities. 49% plausible paperclips, 51% plausible staples will act 100% staples and not serve paperclips at all.

Btw, some of the best sources of information on TSLA, in my view, are:

1. the Tesla Daily podcast, with Rob Maurer

Rob is a buy-and-hold retail trader with an optimistic outlook on Tesla. I find him to be remarkably evenhanded and thoughtful. He's especially good at putting daily news stories in the context of the big picture.

Gary comes from a more traditional Wall Street background, but is also a TSLA bull. He tends to be a bit more short-term focused than Rob (I presume because he manages a fund and has to show results each year), but I f...

I continue to like TSLA.

The 50% annual revenue growth that they've averaged over the last 9 years shows no signs of stopping. And their earnings are growing even faster, since turning positive in 2020. (See fun visualization of these phenomena here and here.)

Admittedly, the TTM P/E ratio is currently on the high side, at 50.8. But it's been dropping dramatically every quarter, as Tesla grows into its valuation.
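As a toy illustration of "growing into the valuation" (assuming, purely for arithmetic's sake, a flat share price and earnings compounding at the historical ~50%; not a forecast):

```python
pe, growth = 50.8, 0.5        # current TTM P/E; assumed annual earnings growth
for year in range(1, 4):
    pe /= 1 + growth          # price held flat while earnings compound
    print(year, round(pe, 1))
```

Under those assumptions the multiple falls by a third each year, reaching roughly the market-average P/E within about three years.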

1Bernhard2mo
What makes you think that? I am of the completely opposite opinion, and would be amazed if they are able to repeat that even for a single year longer. All the "creative" bookkeeping only works for so long, and right now seems to be the moment to pop bubbles, no?
2ESRogs2mo
Btw, some of the best sources of information on TSLA, in my view, are:

1. the Tesla Daily podcast [https://www.youtube.com/@TeslaDaily], with Rob Maurer
2. Gary Black [https://twitter.com/garyblack00] on Twitter

Rob is a buy-and-hold retail trader with an optimistic outlook on Tesla. I find him to be remarkably evenhanded and thoughtful. He's especially good at putting daily news stories in the context of the big picture.

Gary comes from a more traditional Wall Street background, but is also a TSLA bull. He tends to be a bit more short-term focused than Rob (I presume because he manages a fund and has to show results each year), but I find his takes helpful for understanding how institutional investors are likely to be perceiving events.

There are also some solutions discussed here and here. Though I'd assume Scott G is familiar with those and finds them unsatisfactory.

a lot of recent LM progress has been figuring out how to prompt engineer and compose LMs to elicit more capabilities out of them

A deliberate nod?

The assumption means the ballot asks for a ranking of candidates, possibly with ties, and no other information.

Note that this is only true for ranked methods, and not scored methods, like Approval Voting, STAR Voting, etc.
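To make the ranked/scored distinction concrete (hypothetical candidates; the illustration is mine):

```python
from itertools import groupby

# Ranked ballot: an ordering, possibly with ties -- all the assumption allows.
ranked_ballot = [["Alice"], ["Bob", "Carol"], ["Dave"]]

# Scored ballot (STAR-style): per-candidate scores carry strength information.
scored_ballot = {"Alice": 5, "Bob": 3, "Carol": 3, "Dave": 0}

# Approval ballot: just a set of approved candidates.
approval_ballot = {"Alice", "Bob"}

def induced_ranking(scores):
    """The ranking a scored ballot collapses to; the score gaps are lost."""
    items = sorted(scores.items(), key=lambda kv: -kv[1])
    return [[name for name, _ in grp]
            for _, grp in groupby(items, key=lambda kv: kv[1])]

print(induced_ranking(scored_ballot))  # [['Alice'], ['Bob', 'Carol'], ['Dave']]
```

Any scored ballot induces a ranking, but not vice versa: a ranked method can't tell whether Alice beats Bob by a hair or by a mile, which is exactly the extra information the assumption throws away.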

4Ben Pace3mo
Checking my understanding here: is this assumption ruling out quadratic voting, which asks for a weighted-ranking of candidates?

There is a brief golden age of science before the newly low-hanging fruit are again plucked and it is only lightning fast in areas where thinking was the main bottleneck, e.g. not in medicine.

Not one of the main points of the post, but FWIW it seems to me that thinking could be considered the main bottleneck for medicine, if we can include simulation and modeling a la AlphaFold as thinking.

My guess is that with sufficient computation you could invent new treatments / drugs that are so overwhelmingly better than what we have now that regulatory or other bot...

Also, was the date in footnote 32 supposed to be 10/6?

The VPT comparison was added in an edit on 12/6, see this comment from Rohin Shah.

2jacob_cannell4mo
Haha yeah wow.

Dumb nitpick on an otherwise great post, but FYI you're using "it's" for "its" throughout the post and comments.

2ESRogs4mo
Also, was the date in footnote 32 supposed to be 10/6?
5jacob_cannell4mo
Oh yeah thanks - apparently my linguistic cortex tends to collapse those two, have to periodically remember to search and fix.

But it sure looks like tractable constant time token predictors already capture a bunch of what we often call intelligence, even when those same systems can't divide!

This is crazy! I'm raising my eyebrows right now to emphasize it! Consider also doing so! This is weird enough to warrant it!

Why is this crazy? Humans can't do integer division in one step either.

And no finite system could, for arbitrary integers. So why should we find this surprising at all?

Of course naively, if you hadn't really considered it, it might be surprising. But in hindsight shouldn't we just be saying, "Oh, yeah that makes sense."?
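To spell out why no fixed-depth pass could divide arbitrary integers (sketch mine): even schoolbook binary long division takes one sequential step per bit of the dividend, so the required depth grows without bound as the inputs grow.

```python
def long_division(a, b):
    """Binary long division: one sequential step per bit of the dividend."""
    q, r, steps = 0, 0, 0
    for bit in bin(a)[2:]:
        r = (r << 1) | int(bit)   # bring down the next bit
        q <<= 1
        if r >= b:                # does the divisor fit?
            r -= b
            q |= 1
        steps += 1
    return q, r, steps

q, r, steps = long_division(10**30, 7)
assert (q, r) == divmod(10**30, 7)
print(steps, "sequential steps for a 100-bit dividend")
```

A constant-depth network gets a fixed budget of such steps regardless of input size, so it can only memorize division for small inputs, never implement it in general.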

4porby4mo
A constant time architecture failing to divide arbitrary integers in one step isn't surprising at all. The surprising part is being able to do all the other things with the same architecture. Those other things are apparently computationally simple.

Even with the benefit of hindsight, I don't look back to my 2015 self and think, "how silly I was being! Of course this was possible!" 2015-me couldn't just look at humans and conclude that constant time algorithms would include a large chunk of human intuition or reasoning.

It's true that humans tend to suck at arbitrary arithmetic, but we can't conclude much from that. Human brains aren't constant time; they're giant messy sometimes-cyclic graphs where neuronal behavior over time is a critical feature of their computation. Even when the brain is working on a problem that could obviously be solved in constant time, the implementation the brain uses isn't the one a maximally simple sequential constant time program would use (even if you could establish a mapping between the two).

And then there's savants. Clearly, the brain's architecture can express various forms of rapid non-constant time calculation. Most of us just don't work that way by default, and most of the rest of us don't practice it.

Even 2005-me did think that intelligence was much easier than the people claiming "AI is impossible!" and so on, but I don't see how I could have strongly believed at that point that it was going to be this easy.

Dumb question — are these the same polytopes as described in Anthropic's recent work here, or different polytopes?

4Lee Sharkey4mo
No, they exist in different spaces: Polytopes in our work are in activation space whereas in their work the polytopes are in the model weights (if I understand their work correctly).

Correct me if I'm wrong, but it struck me while reading this that you can think of a neural network as learning two things at once:

1. a classification of the input into 2^N different classes (where N is the total number of neurons), each of which gets a different function applied to it
2. those functions themselves

(also each function is a linear transformation)

The 2^N classes would be all the different polytopes, and each function would be the linear transformation that the network implements when a polytope's neurons are on and all others are off.

To me this suggest...
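Here's a minimal numerical check of that picture (toy network, construction mine): pick an input, read off which neurons fire, and recover the linear map the network applies inside that activation region.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(5, 3))   # hidden layer, N = 5 neurons (biases omitted)
W2 = rng.normal(size=(2, 5))   # output layer

def relu_net(x):
    return W2 @ np.maximum(W1 @ x, 0.0)

x = rng.normal(size=3)
pattern = W1 @ x > 0                       # which of the 2^N classes x is in
mask = np.diag(pattern.astype(float))
effective = W2 @ mask @ W1                 # the linear map for that class

# Inside this polytope, the network IS this single linear transformation.
assert np.allclose(relu_net(x), effective @ x)
```

With biases the per-region map becomes affine rather than linear, but the same classification-plus-per-class-function decomposition goes through.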

4Lee Sharkey4mo
Thanks for your interest in our post and your questions!

That seems right! It seems possible to come up with other schemes that do this; it just doesn't seem easy to come up with something that is competitive with neural nets. If I recall correctly, there's work in previous decades (which I'm struggling to find right now, although it's easy to find similar more modern work e.g. https://pubmed.ncbi.nlm.nih.gov/23272922/) that builds a nonlinear dynamical system using N linear regions centred on N points. This work models a dynamical system, but there's no reason we can't just use the same principles for purely feedforward networks. The dynamics of the system are defined by whichever point the current state is closest to. The linear regions can have whatever dynamics you want. But then you'd have to store and look up N matrices, which isn't great when N is large!

I guess this depends on what you mean by 'power'.

I'm not sure! On the one hand, we could measure the dissimilarity of the transformations as the Frobenius norm (i.e. distance in matrix-space) of the difference matrix between linearized transformations on both sides of a polytope boundary. On the other hand, this difference can be arbitrarily large if the weights of our model are unbounded, because crossing some polytope boundaries might mean that a neuron with arbitrarily large weights turns on or off.

I pointed out the OOM error on Twitter (citing this comment), and Dan has updated the post with a correction.

In a forthcoming report I will estimate how  might change as  increases. The report will enumerate different sources of text-based data (e.g. publicly-accessible internet text, private social media messages, human conversations, etc), and for each data-source the report will estimate the cost-per-token and the total availability of the data.

The analysis may be tricky to do, but I'd be particularly interested in seeing model-generated data included in this list. I suspect that in practice the way model-builders will get around th...

Nitpick: Google owns DeepMind, so it doesn't seem right to list DM on the disadvantaged side.

2arabaga4mo
To be precise, Alphabet owns DeepMind. Google and DeepMind are sister companies. So it's possible for something to benefit Google without benefiting DeepMind, or vice versa.
3Cleo Nardo4mo
Google owns DeepMind, but it seems that there is little flow of information back and forth. Example 1: Google Brain spent approximately $12M to train PaLM, and $9M was wasted on suboptimal training because DeepMind didn't share the Hoffmann 2022 results with them. Example 2: I'm not a lawyer, but I think it would be illegal for Google to share any of its non-public data with DeepMind.

What's a latent/patent in the context of a large language model? "patent" is ungoogleable if you're not talking about intellectual property law.

My money's on: typo.

Can you get anywhere with synthetic data? What happens if you train a model on its own output?

Glad you liked it! I think the ideas are very interesting too, for I think similar reasons to you.

Will be curious to see how much further they go.

4Kenny8mo
I'm very satisfied at making this post, if only from being pointed at those videos!

Less antagonist.

Should be "less antagonistic", unless you're talking about the bad guy in a story.

3acylhalide8mo
Edited!

As a person who worked on Arbital, I agree with this.

To put it another way, I would agree that Eliezer has made (what seem to me like) world-historically-significant contributions to understanding and advocating for (against) AI risk.

So, if 2007 Eliezer was asking himself, "Why am I the only one really looking into this?", I think that's a very reasonable question.

But here in 2022, I just don't see this particular post as that significant of a contribution compared to what's already out there.

Imagine that this is v0 of a series of documents that need to evolve into humanity's (/ some specific group's) actual business plan for saving the world.

Why is this v0 and not https://arbital.com/explore/ai_alignment/, or the Sequences, or any of the documents that Evan links to here?

That's part of what I meant to be responding to — not that this post is not useful, but that I don't see what makes it so special compared to all the other stuff that Eliezer and others have already written.


If you disagree with the OP... that's pretty important! Share your thoughts.

Wrote a long comment here. (Which you've seen, but linking since your comment started as a response to me.)

I think he means that there are more points that could be made. (If the points in the post are the training set, can you also produce the points in the held-out test set?)

Separately from whether the plans themselves are safe or dangerous, I think the key question is whether the process that generated the plans is trying to deceive you (so it can break out into the real world or whatever).

If it's not trying to deceive you, then it seems like you can just build in various safeguards (like asking, "is this plan safe?", as well as more sophisticated checks), and be okay.

What do you think of a claim like "most of the intelligence comes from the steps where you do most of the optimization"? A corollary of this is that we particularly want to make sure optimization intensive steps of AI creation are safe WRT not producing intelligent programs devoted to killing us.

This seems probably right to me.

Example: most of the "intelligence" of language models comes from the supervised learning step. However, it's in-principle plausible that we could design e.g. some really capable general purpose reinforcement learner where the intell

...

Can you visualize an agent that is not "open-ended" in the relevant ways, but is capable of, say, building nanotech and melting all the GPUs?

FWIW, I'm not sold on the idea of taking a single pivotal act. But, engaging with what I think is the real substance of the question — can we do complex, real-world, superhuman things with non-agent-y systems?

Yes, I think we can! Just as current language models can be prompt-programmed into solving arithmetic word problems, I think a future system could be led to generate a GPU-melting plan, without it needing to be a...

4David Johnston8mo
FWIW, I'd call this "weakly agentic" in the sense that you're searching through some options, but the number of options you're looking through is fairly small. It's plausible that this is enough to get good results and also avoid disasters, but it's actually not obvious to me. The basic reason: if the top 1000 plans are good enough to get superior performance, they might also be "good enough" to be dangerous. While it feels like there's some separation between "useful and safe" and "dangerous" plans and this scheme might yield plans all of the former type, I don't presently see a stronger reason to believe that this is true.
2TekhneMakre8mo
> then rate them by feasibility

I mean, literal GPT is just going to have poor feasibility ratings for novel engineering concepts.

> Do any of those steps really require that you have a utility function or that you're a goal-directed agent?

Yes, obviously. You have to make many scientific and engineering discoveries, which involves goal-directed investigation.

> Are our own thought processes making use of goal-directedness more than I realize?

Yes, you know which ideas make sense by generalizing from ideas more closely tied in with the actions you take directed towards living.

I disagree; I think the agency is necessary to build a really good world-model, one that includes new useful concepts that humans have never thought of.

Without the agency, some of the things that you lose are (and these overlap): Intelligently choosing what to attend to; intelligently choosing what to think about; intelligently choosing what book to re-read and ponder; intelligently choosing what question to ask; ability to learn and use better and better brainstorming strategies and other such metacognitive heuristics.

Why is agency necessary for these thi...

7Steven Byrnes8mo
(Copying from here [https://www.lesswrong.com/posts/SzrmsbkqydpZyPuEh/my-take-on-vanessa-kosoy-s-take-on-agi-safety]:)

(Does that count as "agency"? I don't know, it depends on what you mean by "agency".)

In terms of the "task decomposition" strategy, this might be tricky to discuss because you probably have a more detailed picture in your mind than I do. I'll try anyway. It seems to me that the options are: (1) the subprocess only knows its narrow task ("solve this symplectic geometry homework problem"), and is oblivious to the overall system goal ("design a better microscope"), or (2) the subprocess is aware of the overall system goal and chooses actions in part to advance it.

In Case (2), I'm not sure this really counts as "task decomposition" in the first place, or how this would help with safety. In Case (1), yes I expect systems to hit a hard wall—I'm skeptical that tasks we care about decompose cleanly. For example, at my last job, I would often be part of a team inventing a new gizmo, and it was not at all unusual for me to find myself sketching out the algorithms and sketching out the link budget and scrutinizing laser spec sheets and scrutinizing FPGA spec sheets and nailing down end-user requirements, etc. etc. Not because I'm individually the best person at each of those tasks—or even very good!—but because sometimes a laser-related problem is best solved by switching to a different algorithm, or an FPGA-related problem is best solved by recognizing that the real end-user requirements are not quite what we thought, etc. etc. And that kind of design work is awfully hard unless a giant heap of relevant information and knowledge is all together in a single brain / world-model.

In the case of my current job doing AI alignment research, I sometimes come across small self-contained tasks that could be delegated, but I would have no idea how to decompose most of what I do. (E.g. writing this comment!) Here's John Wentworth making a similar point m

For anyone interested in Wolfram's ideas but put off by his style, I encourage you to check out talks by Jonathan Gorard. He's the main collaborator on the "physics project", and strikes me as being more even-handed and less grandiose.

3Kenny8mo
(Sorry for replying so much to your comment!) More notes about the second video:

* They've used the Wolfram model (and some additional "mathematical technology" they developed) to compute an "entanglement entropy" that agrees ("exactly") with the calculations using "path integrals using standard causal set theoretic techniques" – the latter tho seems to be a quantum field theory mathematical formalism that's still being developed
* The particle physics is still "embryonic" – they have conjectures about particles being 'persistent tangles in graphs/networks', and some suggestive toy models, but no scattering amplitudes that can be calculated yet, and an estimated '5-6' mathematical milestones remain before they reach things like that
* One of the hosts asks about 'emergence' (which seems a little 'cringe' to my old ears; I liked the idea, but it's pretty simple on its own, and was heavily abused as a marketing buzzword) – Gorard's answer is wonderful tho
* The other host mentioned that the computational focus of the theory seemed 'correct', and something that was overdue in physics education in his opinion – I don't think they've read NKS! They'd probably like it.
* The field/gauge theory connections are preliminary but promising; they matched some calculation for electromagnetism (for a "Dirac monopole") but haven't completed others.
* There's some interesting discussion of 'avoiding curve fitting' – "a good model is one where everything that can be emergent is emergent"
* There are some experimental investigations ongoing (or were as of October 2021) in "dimension perturbations" in the early universe and their effect on the propagation of light (for astrophysics); the hope for the latter is to be able to compute/calculate predictions of the effects of "small scale dimension perturbations". [One aspect of the theory is that spacetime is expected to (or just could?) be of
3Kenny8mo
The second video is really interesting! Jonathan Gorard, discussing the math, is very convincing. He said some very intriguing things about practical benefits with the theory for "quantum computation optimization", e.g. "circuit simplification for quantum computers" (for experiments or simulations). His description of quantum computing, using the 'multiway systems', was as 'a statistical ensemble of inputs, on which the multiway system then performs all possible computations, producing a statistical ensemble of outputs'.

In the first video, Gorard stated that the "worst case" outcome of the project, in his view, would be a bunch of really cool math/computation. I think they might be a good bit past that already. (The first two videos were recorded about five months apart.)

Another great quote from the second video: 'multiway systems give you something like a path integral approach to computation'. That's something. (I don't really know what, but it seems cool!)

(My math creds are a BA (and one graduate seminar class) and being generally interested. I've done some very amateurish 'original math' that was almost certainly independent re-discovery, to the extent I finished any of it. I make a living via classical computer programming.)

Before watching what I've now seen of the second video, I didn't think quantum computing would ever 'really work'. (I don't think 'quantum supremacy' has been definitively demonstrated still?) A big part of that was due to intuitions I picked up from reading NKS. It is very interesting that NKS+ is what has now convinced me that it probably will be working and practical. I excuse myself as having been driven mad learning about the continuity of the real numbers! (I just didn't think our universe could be made of real numbers!)

So, the Wolfram Physics theory is: discrete ("quantized"), computational, multiway ('many worlds', "path integral", 'statistical'), and Jonathan Gorard seems like a legit mathematician/computer-scientist I like th
4Kenny8mo
The first video (Eigenbros episode 117) is great – Jonathan Gorard shares a lot of interesting details! He does in fact seem like a much more 'standard degree' of grandiose :)
1Kenny8mo
Thanks! I'm going to add them to my 'read later' list now.

Case 3 is not safe, because controlling the physical world is a useful way to control the simulation you're in. (E.g., killing all agents in base reality ensures that they'll never shut down your simulation.)

In my mind, this is still making the mistake of not distinguishing the true domain of the agent's utility function from ours.

Whether the simulation continues to be instantiated in some computer in our world is a fact about our world, not about the simulated world.

AlphaGo doesn't care about being unplugged in the middle of a game (unless that dynamic...

6Rob Bensinger8mo
What if the programmers intervene mid-game to give the other side an advantage? Does a Go AGI, as you're thinking of it, care about that?

I'm not following why a Go AGI (with the ability to think about the physical world, but a utility function that only cares about states of the simulation) wouldn't want to seize more hardware, so that it can think better and thereby win more often in the simulation; or gain control of its hardware and directly edit the simulation so that it wins as many games as possible as quickly as possible.

Why would having a utility function that only assigns utility based on X make you indifferent to non-X things that causally affect X? If I only terminally cared about things that happened a year from now, I would still try to shape the intervening time because doing so will change what happens a year from now.

(This is maybe less clear in the case of shutdown, because it's not clear how an agent should think about shutdown if its utility is defined over states of its simulation. So I'll set that particular case aside.)
ESRogs8moΩ15557

-3.  I'm assuming you are already familiar with some basics, and already know what 'orthogonality' and 'instrumental convergence' are and why they're true.

I think this is actually the part that I most "disagree" with. (I put "disagree" in quotes, because there are forms of these theses that I'm persuaded by. However, I'm not so confident that they'll be relevant for the kinds of AIs we'll actually build.)

1. The smart part is not the agent-y part

It seems to me that what's powerful about modern ML systems is their ability to do data compression / patter...

GPT-3 does unsupervised learning on text data. Our brains do predictive processing on sensory inputs. My guess (which I'd love to hear arguments against!) is that there's a true and deep analogy between the two, and that they lead to impressive abilities for fundamentally the same reason.

Agree that self-supervised learning powers both GPT-3 updates and human brain world-model updates (details & caveats). (Which isn’t to say that GPT-3 is exactly the same as the human brain world-model—there are infinitely many different possible ML algorithms that all ...

7James Payor8mo
Can you visualize an agent that is not "open-ended" in the relevant ways, but is capable of, say, building nanotech and melting all the GPUs? In my picture most of the extra sauce you'd need on top of GPT-3 looks very agenty. It seems tricky to name "virtual worlds" in which AIs manipulate just "virtual resources" and still manage to do something like melting the GPUs.

For example, I claim that while AlphaGo could be said to be agent-y, it does not care about atoms. And I think that we could make it fantastically more superhuman at Go, and it would still not care about atoms. Atoms are just not in the domain of its utility function.

In particular, I don't think it has an incentive to break out into the real world to somehow get itself more compute, so that it can think more about its next move. It's just not modeling the real world at all. It's not even trying to rack up a bunch of wins over time. It's just playing the si

...
4David Johnston8mo
What do you think of a claim like "most of the intelligence comes from the steps where you do most of the optimization"? A corollary of this is that we particularly want to make sure optimization-intensive steps of AI creation are safe WRT not producing intelligent programs devoted to killing us.

Example: most of the "intelligence" of language models comes from the supervised learning step. However, it's in-principle plausible that we could design e.g. some really capable general-purpose reinforcement learner where the intelligence comes from the reinforcement, and the latter could (but wouldn't necessarily) internalise "agenty" behaviour.

I have a vague impression that this is already something other people are thinking about, though maybe I read too much into some tangential remarks in this direction. E.g. I figured the concern about mesa-optimizers was partly motivated by the idea that we can't always tell when an optimization-intensive step is taking place.

I can easily imagine people blundering into performing unsafe optimization-intensive AI creation processes. Gain-of-function pathogen research would seem to be a relevant case study here, except we currently have less idea about what kind of optimization makes deadly AIs vs what kind of optimization makes deadly pathogens. One of the worries (again, maybe I'm reading too far into comments that don't say this explicitly) is that the likelihood of such a blunder approaches 1 over long enough times, and the "pivotal act" framing is supposed to be about doing something that could change this (??) That said, it seems that there's a lot that could be done to make it less likely in short time frames.

They are not strong evidence that X works.

I agree that Zvi's story is not "strong evidence", but I don't think that means it "doesn't count" — a data point is a data point, even if inconclusive on its own.

(And I think it's inappropriate to tell someone that a data point "doesn't count" in response to a request for "any empirical evidence". In other words, I agree with your assessment that you were being a little bit of an asshole in that response ;-) )

3PoignardAzur8mo
Alright, sorry. I should have asked "is there any non-weak empirical evidence that...". Sorry if I was condescending.

Yeah, for sure — a great technique for avoiding the Double Illusion of Transparency.

Nitpick: wouldn't this graph be much more natural with the x and y axes reversed? I'd want to input the reduction in log-error over a cheaper compute regime to predict the reduction in log-error over a more expensive one.

Ah, thanks for the clarification!

fully general tech company is a technology company with the ability to become a world-leader in essentially any industry sector...

Notice here that I’m focusing on a company’s ability to do anything another company can do

To clarify, is this meant to refer to a fixed definition of sectors and what other companies can do as they existed prior to the FGTC?

Or is it meant to include FGTCs being able to copy the output of other FGTCs?

I'd assume you mean something like the former, but I think it's worth being explicit about the fact that what sectors exist a...

8Andrew_Critch9mo
Yep, you got it! The definition is meant to be non-recursive and grounded in 2022-level industrial capabilities. This definition is a bit unsatisfying insofar as 2022 is a bit arbitrary, except that I don't think the definition would change much if we replaced 2022 with 2010. I decided not to get into these details to avoid bogging down the post with definitions, but if a lot of people upvote you on this I will change the OP. Thanks for raising this!

The main theory we'll end up at, based on the accounting data, is that college costs are driven mainly by a large increase in diversity of courses available, which results in much lower student/faculty ratios, and correspondingly higher costs per student.

The "driven by" wording in the above suggests cause. It makes it sound to me like the increase in course diversity (and decrease in student-faculty ratios) comes first, and the increased cost is the result.

Is that what you meant?

If so, I think that case has not been demonstrated in the post. I'm with Eliez...

4johnswentworth9mo
Fair point. I don't think the increase in course diversity was causal.

[TFP] isn’t measured directly, but calculated as a residual by taking capital and labor increases out of GDP growth.

What does it mean to take out capital increases?

I assume taking out labor increases just means adjusting for a growing population. But isn't the reason we get economic growth per capita at all that we build new stuff (including intangible stuff like processes and inventions) that enables us to build more stuff more efficiently? And can't all that new stuff be thought of as capital?
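For reference, the "residual" calculation being asked about is the standard Solow growth-accounting decomposition. A minimal sketch, assuming a Cobb-Douglas production function and an assumed capital share `alpha` (often taken to be around 0.3; the function name and example numbers here are illustrative, not from the post):

```python
# Growth accounting (Solow residual): TFP growth is whatever is left of
# GDP growth after subtracting the weighted contributions of capital and
# labor. Assumes Y = A * K^alpha * L^(1-alpha), so in growth rates:
#   gY = gA + alpha * gK + (1 - alpha) * gL

def tfp_growth(gdp_growth, capital_growth, labor_growth, alpha=0.3):
    """Residual TFP growth; alpha is capital's output share (an assumption)."""
    return gdp_growth - alpha * capital_growth - (1 - alpha) * labor_growth

# Illustrative (made-up) numbers: 3% GDP growth, 4% capital growth, 1% labor growth
print(f"{tfp_growth(0.03, 0.04, 0.01):.4f}")
```

So "taking out capital increases" means subtracting `alpha * gK`: growth attributable to simply having accumulated more capital stock, at its estimated output share. The question in this comment is essentially whether new inventions and processes belong in that subtracted `K` term or in the residual `A`.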

Or is what's considered "capital" only a subset of tha...