# All of ESRogs's Comments + Replies

Regarding all the bottlenecks, I think there is an analogy between gradient descent and economic growth / innovation: when the function is super high-dimensional, it's hard to get stuck in a local optimum.

So even if we stagnate on some dimensions that are currently bottlenecks, we can make progress on everything else (and then eventually the landscape may have changed enough that we can once again make progress on the previously stagnant sectors). This might look like a cost disease, where the stagnant things get more expensive. But that seems like it would go along with high nominal GDP growth rather than low.
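To make the analogy concrete, here's a toy sketch (construction entirely mine): one coordinate is bottlenecked, with no gradient signal in its current region, yet overall loss keeps falling along all the other dimensions.

```python
import numpy as np

# Toy loss: coordinate 0 is "bottlenecked" (flat once x[0]^2 > 1, so gradient
# descent can't improve it from here); the other 49 coordinates are ordinary.
def loss(x):
    bottleneck = np.minimum(x[0] ** 2, 1.0)
    rest = np.sum(x[1:] ** 2)
    return bottleneck + rest

def grad(x):
    g = 2 * x.copy()
    if x[0] ** 2 > 1.0:   # the stuck dimension: no gradient signal
        g[0] = 0.0
    return g

x = np.full(50, 2.0)      # start every coordinate at 2
for _ in range(100):
    x -= 0.1 * grad(x)

# The stuck coordinate never moved, but total loss still fell enormously.
print(round(loss(np.full(50, 2.0)), 1), "->", round(loss(x), 2))
```

The stagnant dimension contributes a fixed floor to the loss (the "cost disease" piece), while progress continues everywhere else.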

I am also not an economist and this might be totally off-base, but it seems to me that if there is real innovation and we can in fact do a bunch of new stuff that we couldn't before, then this will be reflected in the nominal GDP numbers going up. For the simple reason that in general people will be more likely to charge more for new and better goods and services rather than charging less for the same old goods and services (that can now be delivered more cheaply).

3NickyP8d
Yeah, though I think it depends on how many people are able to buy the new goods at a better price. If most well-paid employees (i.e. the employees that companies get the most value from automating) no longer have a job, then the number of people who can buy the more expensive goods and services might go down. It seems possible to me that GDP could fall if the number of people who lost their jobs is high enough. It feels possible that the recent tech developments were barely net positive to nominal GDP despite rapid improvements, and that fast enough technological progress could cause nominal GDP to go in the other direction.

Do you expect learned ML systems to be updateless?

It seems plausible to me that updatelessness of agents is just as "disconnected from reality" of actual systems as EU maximization. Would you disagree?

4Scott Garrabrant1mo
No, at least probably not at the time that we lose all control. However, I expect that systems that are self-transparent and can easily self-modify might quickly converge to reflective stability (and thus updatelessness). They might not, but I think the same arguments that might make you think they would develop a utility function can also be used to argue that they would develop updatelessness (and thus possibly also not develop a utility function).

It's just that nobody will buy all those cars.

Why would this be true?

Teslas are generally the most popular car in whatever segment they're in. And their automotive gross margins are at 25+%, so they've got room to cut prices if demand lightens a bit.

Add to this that a big tax credit is about to hit for EVs in the US and it's hard for me to see why demand would all-of-a-sudden fall off a cliff.

or truck manufacturers

Note that Tesla has (just) started producing a truck: https://www.tesla.com/semi. And electric trucks stand to benefit the most from self-driving tech, because their marginal cost of operation is lower than gas powered, so you get a bigger benefit from the higher utilization that not having a driver enables.

But so much depends on how deeply levered they are and how much is already priced in - TSLA could EASILY already be counting on that in their current valuations.  If so, it'll kill them if it doesn't happen, but only maintain

...

It seems to me that, all else equal, the more bullish you are on short-term AI progress, the more likely you should think vision-only self driving will work soon.

And TSLA seems like probably the biggest beneficiary of that if it works.

5Dagon2mo
I suspect trucking companies (or truck manufacturers, or maybe logistics companies that suck up all the surplus from truckers) are the biggest beneficiaries. But so much depends on how deeply levered they are and how much is already priced in - TSLA could EASILY already be counting on that in their current valuations. If so, it'll kill them if it doesn't happen, but only maintain if it does. A better plan might be to short (or long-term puts on) the companies you think will be hurt by the things you're predicting.

After reading through the Unifying Grokking and Double Descent paper that LawrenceC linked, it sounds like I'm mostly saying the same thing as what's in the paper.

(Not too surprising, since I had just read Lawrence's comment, which summarizes the paper, when I made mine.)

In particular, the paper describes Type 1, Type 2, and Type 3 patterns, which correspond to my easy-to-discover patterns, memorizations, and hard-to-discover patterns:

In our model of grokking and double descent, there are three types of patterns learned at different
speeds. Type 1 patterns

...

So, just don't keep training a powerful AI past overfitting, and it won't grok anything, right? Well, Nanda and Lieberum speculate that the reason it was difficult to figure out that grokking existed isn't because it's rare but because it's omnipresent: smooth loss curves are the result of many new grokkings constantly being built atop the previous ones.

If the grokkings are happening all the time, why do you get double descent? Why wouldn't the test loss just be a smooth curve?

Maybe the answer is something like:

1. The model is learning generalizable patterns
...
2ESRogs2mo
After reading through the Unifying Grokking and Double Descent [https://drive.google.com/file/d/1M0IBM0j8PbwwqQ_JNJqm5Mfms3ENOSqY/view] paper that LawrenceC linked [https://www.lesswrong.com/posts/fytgZ26AgxmrAdyB4/mesa-optimizers-via-grokking?commentId=Eu9nYL6fqB5RkqbAb], it sounds like I'm mostly saying the same thing as what's in the paper. (Not too surprising, since I had just read Lawrence's comment, which summarizes the paper, when I made mine.)

In particular, the paper describes Type 1, Type 2, and Type 3 patterns, which correspond to my easy-to-discover patterns, memorizations, and hard-to-discover patterns.

The one thing I mention above that I don't see in the paper is an explanation for why the Type 2 patterns would be intermediate in learnability between Type 1 and Type 3 patterns or why there would be a regime where they dominate (resulting in overfitting).

My proposed explanation is that, for any given task, the exact mappings from input to output will tend to have a characteristic complexity, which means that they will have a relatively narrow distribution of learnability. And that's why models will often hit a regime where they're mostly finding those patterns rather than Type 1, easy-to-learn heuristics (which they've exhausted) or Type 3, hard-to-learn rules (which they're not discovering yet).

The authors do have an appendix section A.1 in the paper with the heading, "Heuristics, Memorization, and Slow Well-Generalizing", but with "[TODO]"s in the text. Will be curious to see if they end up saying something similar to this point (about input-output memorizations tending to have a characteristic complexity) there.

What makes you think that?

If we just look at the next year, they have two new factories (in Berlin and Austin) that have barely started producing cars. All they have to do to have another 50-ish% growth year is to scale up production at those two factories.

There may be some bumps along the way, but I see no reason to think they'll just utterly fail at scaling production at those factories.

Scaling in future years will eventually require new factories, but my understanding is that they're actively looking for new locations.

Their stated goal is to produce 20 ...

1Bernhard1mo
Oh they'll scale just fine. It's just that nobody will buy all those cars. They are already not selling them all, and we are about to enter the biggest recession of many of our lifetimes

Ah, maybe the way to think about it is that if I think I have a 30% chance of success before the merger, then I need to have a 30%+epsilon chance of my goal being chosen after the merger. And my goal will only be chosen if it is estimated to have the higher chance of success.

And so, if we assume that the chosen goal is def going to succeed post-merger (since there's no destructive war), that means I need to have a 30%+epsilon chance that my goal has a >50% chance of success post-merger. Or in other words "a close to 50% probability of success", just as Wei said.

But if these success probabilities were known before the merger, the AI whose goal has a smaller chance of success would have refused to agree to the merger. That AI should only agree if the merger allows it to have a close to 50% probability of success according to its original utility function.
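Here's a toy Monte Carlo formalization of that acceptance condition (the setup and distributions are assumed by me, not taken from Wei's post): each AI accepts the merger only if its chance of being the goal the merged AI adopts is at least its pre-merger success probability.

```python
import random

def accept_merger(p_war, sample_qA, sample_qB, n=100_000):
    """A accepts iff P(A's goal is judged more likely to succeed, and hence
    chosen post-merger) is at least A's pre-merger success probability p_war."""
    wins = sum(sample_qA() > sample_qB() for _ in range(n))
    p_chosen = wins / n
    return p_chosen >= p_war

random.seed(0)
# With symmetric uncertainty about the two goals' post-merger viability,
# A's goal is chosen ~50% of the time, so a 30% pre-merger chance
# makes the merger worth accepting.
print(accept_merger(0.30, random.random, random.random))
```

The interesting regime is when the success probabilities are already common knowledge: then `p_chosen` collapses to 0 or 1, and the weaker AI only agrees under a deal (e.g. randomization) that gives it close to its pre-merger chance, which is the dynamic Wei describes.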

Why does the probability need to be close to 50% for the AI to agree to the merger? Shouldn't its threshold for agreeing to the merger depend on how likely one or the other AI is to beat the other in a war for the accessible universe?

Is there an assumption that the two AIs are roughly equally powerful, and that a both-lose scenario is relatively unlikely?

4Slider2mo
It is first-past-the-post; minorities get nothing. There might be an implicit assumption that the newly created agent agrees with the old agents about the probabilities. 49% plausible paperclips, 51% plausible staples will act 100% staples and not serve paperclips at all.

Btw, some of the best sources of information on TSLA, in my view, are:

1. the Tesla Daily podcast, with Rob Maurer

Rob is a buy-and-hold retail trader with an optimistic outlook on Tesla. I find him to be remarkably evenhanded and thoughtful. He's especially good at putting daily news stories in the context of the big picture.

Gary comes from a more traditional Wall Street background, but is also a TSLA bull. He tends to be a bit more short-term focused than Rob (I presume because he manages a fund and has to show results each year), but I f...

I continue to like TSLA.

The 50% annual revenue growth that they've averaged over the last 9 years shows no signs of stopping. And their earnings are growing even faster, since turning positive in 2020. (See fun visualization of these phenomena here and here.)

Admittedly, the TTM P/E ratio is currently on the high side, at 50.8. But it's been dropping dramatically every quarter, as Tesla grows into its valuation.
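As a toy illustration of "growing into the valuation" (assuming, purely for arithmetic's sake, a flat share price and earnings compounding at the historical ~50%; not a forecast):

```python
pe, growth = 50.8, 0.5        # current TTM P/E; assumed annual earnings growth
for year in range(1, 4):
    pe /= 1 + growth          # price held flat while earnings compound
    print(year, round(pe, 1))
```

Under those assumptions the multiple falls by a third each year, reaching roughly the market-average P/E within about three years.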

1Bernhard2mo
What makes you think that? I am of the completely opposite opinion, and would be amazed if they are able to repeat that even for a single year longer. All the "creative" bookkeeping only works for so long, and right now seems to be the moment to pop bubbles, no?
2ESRogs2mo
Btw, some of the best sources of information on TSLA, in my view, are:

1. the Tesla Daily podcast [https://www.youtube.com/@TeslaDaily], with Rob Maurer
2. Gary Black [https://twitter.com/garyblack00] on Twitter

Rob is a buy-and-hold retail trader with an optimistic outlook on Tesla. I find him to be remarkably evenhanded and thoughtful. He's especially good at putting daily news stories in the context of the big picture.

Gary comes from a more traditional Wall Street background, but is also a TSLA bull. He tends to be a bit more short-term focused than Rob (I presume because he manages a fund and has to show results each year), but I find his takes helpful for understanding how institutional investors are likely to be perceiving events.

There are also some solutions discussed here and here. Though I'd assume Scott G is familiar with those and finds them unsatisfactory.

a lot of recent LM progress has been figuring out how to prompt engineer and compose LMs to elicit more capabilities out of them

A deliberate nod?

The assumption means the ballot asks for a ranking of candidates, possibly with ties, and no other information.

Note that this is only true for ranked methods, and not scored methods, like Approval Voting, STAR Voting, etc.
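To make the ranked/scored distinction concrete (hypothetical candidates; the illustration is mine):

```python
from itertools import groupby

# Ranked ballot: an ordering, possibly with ties -- all the assumption allows.
ranked_ballot = [["Alice"], ["Bob", "Carol"], ["Dave"]]

# Scored ballot (STAR-style): per-candidate scores carry strength information.
scored_ballot = {"Alice": 5, "Bob": 3, "Carol": 3, "Dave": 0}

# Approval ballot: just a set of approved candidates.
approval_ballot = {"Alice", "Bob"}

def induced_ranking(scores):
    """The ranking a scored ballot collapses to; the score gaps are lost."""
    items = sorted(scores.items(), key=lambda kv: -kv[1])
    return [[name for name, _ in grp]
            for _, grp in groupby(items, key=lambda kv: kv[1])]

print(induced_ranking(scored_ballot))  # [['Alice'], ['Bob', 'Carol'], ['Dave']]
```

Any scored ballot induces a ranking, but not vice versa: a ranked method can't tell whether Alice beats Bob by a hair or by a mile, which is exactly the extra information the assumption throws away.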

4Ben Pace3mo
Checking my understanding here: is this assumption ruling out quadratic voting, which asks for a weighted-ranking of candidates?

There is a brief golden age of science before the newly low-hanging fruit are again plucked and it is only lightning fast in areas where thinking was the main bottleneck, e.g. not in medicine.

Not one of the main points of the post, but FWIW it seems to me that thinking could be considered the main bottleneck for medicine, if we can include simulation and modeling a la AlphaFold as thinking.

My guess is that with sufficient computation you could invent new treatments / drugs that are so overwhelmingly better than what we have now that regulatory or other bot...

Also, was the date in footnote 32 supposed to be 10/6?

The VPT comparison was added in an edit on 12/6, see this comment from Rohin Shah.

2jacob_cannell4mo
Haha yeah wow.

Dumb nitpick on an otherwise great post, but FYI you're using "it's" for "its" throughout the post and comments.

2ESRogs4mo
Also, was the date in footnote 32 supposed to be 10/6?
5jacob_cannell4mo
Oh yeah thanks - apparently my linguistic cortex tends to collapse those two, have to periodically remember to search and fix.

But it sure looks like tractable constant time token predictors already capture a bunch of what we often call intelligence, even when those same systems can't divide!

This is crazy! I'm raising my eyebrows right now to emphasize it! Consider also doing so! This is weird enough to warrant it!

Why is this crazy? Humans can't do integer division in one step either.

And no finite system could, for arbitrary integers. So why should we find this surprising at all?

Of course naively, if you hadn't really considered it, it might be surprising. But in hindsight shouldn't we just be saying, "Oh, yeah that makes sense."?
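To spell out why no fixed-depth pass could divide arbitrary integers (sketch mine): even schoolbook binary long division takes one sequential step per bit of the dividend, so the required depth grows without bound as the inputs grow.

```python
def long_division(a, b):
    """Binary long division: one sequential step per bit of the dividend."""
    q, r, steps = 0, 0, 0
    for bit in bin(a)[2:]:
        r = (r << 1) | int(bit)   # bring down the next bit
        q <<= 1
        if r >= b:                # does the divisor fit?
            r -= b
            q |= 1
        steps += 1
    return q, r, steps

q, r, steps = long_division(10**30, 7)
assert (q, r) == divmod(10**30, 7)
print(steps, "sequential steps for a 100-bit dividend")
```

A constant-depth network gets a fixed budget of such steps regardless of input size, so it can only memorize division for small inputs, never implement it in general.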

4porby4mo
A constant time architecture failing to divide arbitrary integers in one step isn't surprising at all. The surprising part is being able to do all the other things with the same architecture. Those other things are apparently computationally simple.

Even with the benefit of hindsight, I don't look back to my 2015 self and think, "how silly I was being! Of course this was possible!" 2015-me couldn't just look at humans and conclude that constant time algorithms would include a large chunk of human intuition or reasoning.

It's true that humans tend to suck at arbitrary arithmetic, but we can't conclude much from that. Human brains aren't constant time; they're giant messy sometimes-cyclic graphs where neuronal behavior over time is a critical feature of their computation. Even when the brain is working on a problem that could obviously be solved in constant time, the implementation the brain uses isn't the one a maximally simple sequential constant time program would use (even if you could establish a mapping between the two).

And then there's savants. Clearly, the brain's architecture can express various forms of rapid non-constant time calculation. Most of us just don't work that way by default, and most of the rest of us don't practice it.

Even 2005-me did think that intelligence was much easier than the people claiming "AI is impossible!" and so on, but I don't see how I could have strongly believed at that point that it was going to be this easy.

Dumb question — are these the same polytopes as described in Anthropic's recent work here, or different polytopes?

4Lee Sharkey4mo
No, they exist in different spaces: Polytopes in our work are in activation space whereas in their work the polytopes are in the model weights (if I understand their work correctly).

Correct me if I'm wrong, but it struck me while reading this that you can think of a neural network as learning two things at once:

1. a classification of the input into 2^N different classes (where N is the total number of neurons), each of which gets a different function applied to it
2. those functions themselves

(also each function is a linear transformation)

The 2^N classes would be all the different polytopes, and each function would be the linear transformation that the network implements when a polytope's neurons are on and all others are off.

To me this suggest...
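Here's a minimal numerical check of that picture (toy network, construction mine): pick an input, read off which neurons fire, and recover the linear map the network applies inside that activation region.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(5, 3))   # hidden layer, N = 5 neurons (biases omitted)
W2 = rng.normal(size=(2, 5))   # output layer

def relu_net(x):
    return W2 @ np.maximum(W1 @ x, 0.0)

x = rng.normal(size=3)
pattern = W1 @ x > 0                       # which of the 2^N classes x is in
mask = np.diag(pattern.astype(float))
effective = W2 @ mask @ W1                 # the linear map for that class

# Inside this polytope, the network IS this single linear transformation.
assert np.allclose(relu_net(x), effective @ x)
```

With biases the per-region map becomes affine rather than linear, but the same classification-plus-per-class-function decomposition goes through.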

4Lee Sharkey4mo
Thanks for your interest in our post and your questions!

That seems right! It seems possible to come up with other schemes that do this; it just doesn't seem easy to come up with something that is competitive with neural nets. If I recall correctly, there's work in previous decades (which I'm struggling to find right now, although it's easy to find similar more modern work e.g. https://pubmed.ncbi.nlm.nih.gov/23272922/) that builds a nonlinear dynamical system using N linear regions centred on N points. This work models a dynamical system, but there's no reason we can't just use the same principles for purely feedforward networks. The dynamics of the system are defined by whichever point the current state is closest to. The linear regions can have whatever dynamics you want. But then you'd have to store and look up N matrices, which isn't great when N is large!

I guess this depends on what you mean by 'power'.

I'm not sure! On the one hand, we could measure the dissimilarity of the transformations as the Frobenius norm (i.e. distance in matrix-space) of the difference matrix between linearized transformations on both sides of a polytope boundary. On the other hand, this difference can be arbitrarily large if the weights of our model are unbounded, because crossing some polytope boundaries might mean that a neuron with arbitrarily large weights turns on or off.

I pointed out the OOM error on Twitter (citing this comment), and Dan has updated the post with a correction.

In a forthcoming report I will estimate how  might change as  increases. The report will enumerate different sources of text-based data (e.g. publicly-accessible internet text, private social media messages, human conversations, etc), and for each data-source the report will estimate the cost-per-token and the total availability of the data.

The analysis may be tricky to do, but I'd be particularly interested in seeing model-generated data included in this list. I suspect that in practice the way model-builders will get around th...

Nitpick: Google owns DeepMind, so it doesn't seem right to list DM on the disadvantaged side.

2arabaga4mo
To be precise, Alphabet owns DeepMind. Google and DeepMind are sister companies. So it's possible for something to benefit Google without benefiting DeepMind, or vice versa.
3Cleo Nardo4mo
Google owns DeepMind, but it seems that there is little flow of information back and forth. Example 1: Google Brain spent approximately $12M to train PaLM, and $9M was wasted on suboptimal training because DeepMind didn't share the Hoffmann 2022 results with them. Example 2: I'm not a lawyer, but I think it would be illegal for Google to share any of its non-public data with DeepMind.

What's a latent/patent in the context of a large language model? "patent" is ungoogleable if you're not talking about intellectual property law.

My money's on: typo.

Can you get anywhere with synthetic data? What happens if you train a model on its own output?

Glad you liked it! I think the ideas are very interesting too, for I think similar reasons to you.

Will be curious to see how much further they go.

4Kenny8mo
I'm very satisfied at making this post, if only from being pointed at those videos!

Less antagonist.

Should be "less antagonistic", unless you're talking about the bad guy in a story.

3acylhalide8mo
Edited!

As a person who worked on Arbital, I agree with this.

To put it another way, I would agree that Eliezer has made (what seem to me like) world-historically-significant contributions to understanding and advocating for (against) AI risk.

So, if 2007 Eliezer was asking himself, "Why am I the only one really looking into this?", I think that's a very reasonable question.

But here in 2022, I just don't see this particular post as that significant of a contribution compared to what's already out there.

Imagine that this is v0 of a series of documents that need to evolve into humanity's (/ some specific group's) actual business plan for saving the world.

Why is this v0 and not https://arbital.com/explore/ai_alignment/, or the Sequences, or any of the documents that Evan links to here?

That's part of what I meant to be responding to — not that this post is not useful, but that I don't see what makes it so special compared to all the other stuff that Eliezer and others have already written.


If you disagree with the OP... that's pretty important! Share your thoughts.

Wrote a long comment here. (Which you've seen, but linking since your comment started as a response to me.)

I think he means that there are more points that could be made. (If the points in the post are the training set, can you also produce the points in the held-out test set?)

Separately from whether the plans themselves are safe or dangerous, I think the key question is whether the process that generated the plans is trying to deceive you (so it can break out into the real world or whatever).

If it's not trying to deceive you, then it seems like you can just build in various safeguards (like asking, "is this plan safe?", as well as more sophisticated checks), and be okay.

What do you think of a claim like "most of the intelligence comes from the steps where you do most of the optimization"? A corollary of this is that we particularly want to make sure optimization intensive steps of AI creation are safe WRT not producing intelligent programs devoted to killing us.

This seems probably right to me.

Example: most of the "intelligence" of language models comes from the supervised learning step. However, it's in-principle plausible that we could design e.g. some really capable general purpose reinforcement learner where the intell

...

Can you visualize an agent that is not "open-ended" in the relevant ways, but is capable of, say, building nanotech and melting all the GPUs?

FWIW, I'm not sold on the idea of taking a single pivotal act. But, engaging with what I think is the real substance of the question — can we do complex, real-world, superhuman things with non-agent-y systems?

Yes, I think we can! Just as current language models can be prompt-programmed into solving arithmetic word problems, I think a future system could be led to generate a GPU-melting plan, without it needing to be a...

4David Johnston8mo
FWIW, I'd call this "weakly agentic" in the sense that you're searching through some options, but the number of options you're looking through is fairly small. It's plausible that this is enough to get good results and also avoid disasters, but it's actually not obvious to me. The basic reason: if the top 1000 plans are good enough to get superior performance, they might also be "good enough" to be dangerous. While it feels like there's some separation between "useful and safe" and "dangerous" plans and this scheme might yield plans all of the former type, I don't presently see a stronger reason to believe that this is true.
2TekhneMakre8mo
> then rate them by feasibility

I mean, literal GPT is just going to have poor feasibility ratings for novel engineering concepts.

> Do any of those steps really require that you have a utility function or that you're a goal-directed agent?

Yes, obviously. You have to make many scientific and engineering discoveries, which involves goal-directed investigation.

> Are our own thought processes making use of goal-directedness more than I realize?

Yes, you know which ideas make sense by generalizing from ideas more closely tied in with the actions you take directed towards living.

I disagree; I think the agency is necessary to build a really good world-model, one that includes new useful concepts that humans have never thought of.

Without the agency, some of the things that you lose are (and these overlap): Intelligently choosing what to attend to; intelligently choosing what to think about; intelligently choosing what book to re-read and ponder; intelligently choosing what question to ask; ability to learn and use better and better brainstorming strategies and other such metacognitive heuristics.

Why is agency necessary for these thi...

7Steven Byrnes8mo
(Copying from here [https://www.lesswrong.com/posts/SzrmsbkqydpZyPuEh/my-take-on-vanessa-kosoy-s-take-on-agi-safety]:)

(Does that count as "agency"? I don't know, it depends on what you mean by "agency".)

In terms of the "task decomposition" strategy, this might be tricky to discuss because you probably have a more detailed picture in your mind than I do. I'll try anyway. It seems to me that the options are: (1) the subprocess only knows its narrow task ("solve this symplectic geometry homework problem"), and is oblivious to the overall system goal ("design a better microscope"), or (2) the subprocess is aware of the overall system goal and chooses actions in part to advance it.

In Case (2), I'm not sure this really counts as "task decomposition" in the first place, or how this would help with safety. In Case (1), yes I expect systems to hit a hard wall—I'm skeptical that tasks we care about decompose cleanly. For example, at my last job, I would often be part of a team inventing a new gizmo, and it was not at all unusual for me to find myself sketching out the algorithms and sketching out the link budget and scrutinizing laser spec sheets and scrutinizing FPGA spec sheets and nailing down end-user requirements, etc. etc. Not because I'm individually the best person at each of those tasks—or even very good!—but because sometimes a laser-related problem is best solved by switching to a different algorithm, or an FPGA-related problem is best solved by recognizing that the real end-user requirements are not quite what we thought, etc. etc. And that kind of design work is awfully hard unless a giant heap of relevant information and knowledge is all together in a single brain / world-model.

In the case of my current job doing AI alignment research, I sometimes come across small self-contained tasks that could be delegated, but I would have no idea how to decompose most of what I do. (E.g. writing this comment!) Here's John Wentworth making a similar point m

For anyone interested in Wolfram's ideas but put off by his style, I encourage you to check out talks by Jonathan Gorard. He's the main collaborator on the "physics project", and strikes me as being more even-handed and less grandiose.

3Kenny8mo
(Sorry for replying so much to your comment!) More notes about the second video:

* They've used the Wolfram model (and some additional "mathematical technology" they developed) to compute an "entanglement entropy" that agrees ("exactly") with the calculations using "path integrals using standard causal set theoretic techniques" – the latter tho seems to be a quantum field theory mathematical formalism that's still being developed
* The particle physics is still "embryonic" – they have conjectures about particles being 'persistent tangles in graphs/networks', and some suggestive toy models, but no scattering amplitudes that can be calculated yet, and an estimated '5-6' mathematical milestones remain before they reach things like that
* One of the hosts asks about 'emergence' (which seems a little 'cringe' to my old ears; I liked the idea, but it's pretty simple on its own, and was heavily abused as a marketing buzzword) – Gorard's answer is wonderful tho
* The other host mentioned that the computational focus of the theory seemed 'correct', and something that was overdue in physics education in his opinion – I don't think they've read NKS! They'd probably like it.
* The field/gauge theory connections are preliminary but promising; they matched some calculation for electromagnetism (for a "Dirac monopole") but haven't completed others.
* There's some interesting discussion of 'avoiding curve fitting' – "a good model is one where everything that can be emergent is emergent"
* There are some experimental investigations ongoing (or were as of October 2021) in "dimension perturbations" in the early universe and their effect on the propagation of light (for astrophysics); the hope for the latter is to be able to compute/calculate predictions of the effects of "small scale dimension perturbations". [One aspect of the theory is that spacetime is expected to (or just could?) be of
3Kenny8mo
The second video is really interesting! Jonathan Gorard, discussing the math, is very convincing. He said some very intriguing things about practical benefits with the theory for "quantum computation optimization", e.g. "circuit simplification for quantum computers" (for experiments or simulations). His description of quantum computing, using the 'multiway systems', was as 'a statistical ensemble of inputs, on which the multiway system then performs all possible computations, producing a statistical ensemble of outputs'.

In the first video, Gorard stated that the "worst case" outcome of the project, in his view, would be a bunch of really cool math/computation. I think they might be a good bit past that already. (The first two videos were recorded about five months apart.)

Another great quote from the second video: 'multiway systems give you something like a path integral approach to computation'. That's something. (I don't really know what, but it seems cool!)

(My math creds are a BA (and one graduate seminar class) and being generally interested. I've done some very amateurish 'original math' that was almost certainly independent re-discovery, to the extent I finished any of it. I make a living via classical computer programming.)

Before watching what I've now seen of the second video, I didn't think quantum computing would ever 'really work'. (I don't think 'quantum supremacy' has been definitively demonstrated still?) A big part of that was due to intuitions I picked up from reading NKS. It is very interesting that NKS+ is what has now convinced me that it probably will be working and practical. I excuse myself as having been driven mad learning about the continuity of the real numbers! (I just didn't think our universe could be made of real numbers!)

So, the Wolfram Physics theory is: discrete ("quantized"), computational, multiway ('many worlds', "path integral", 'statistical'), and Jonathan Gorard seems like a legit mathematician/computer-scientist I like th
4Kenny8mo
The first video (Eigenbros episode 117) is great – Jonathan Gorard shares a lot of interesting details! He does in fact seem like a much more 'standard degree' of grandiose :)
1Kenny8mo
Thanks! I'm going to add them to my 'read later' list now.

Case 3 is not safe, because controlling the physical world is a useful way to control the simulation you're in. (E.g., killing all agents in base reality ensures that they'll never shut down your simulation.)

In my mind, this is still making the mistake of not distinguishing the true domain of the agent's utility function from ours.

Whether the simulation continues to be instantiated in some computer in our world is a fact about our world, not about the simulated world.

AlphaGo doesn't care about being unplugged in the middle of a game (unless that dynamic...

6Rob Bensinger8mo
What if the programmers intervene mid-game to give the other side an advantage? Does a Go AGI, as you're thinking of it, care about that?

I'm not following why a Go AGI (with the ability to think about the physical world, but a utility function that only cares about states of the simulation) wouldn't want to seize more hardware, so that it can think better and thereby win more often in the simulation; or gain control of its hardware and directly edit the simulation so that it wins as many games as possible as quickly as possible.

Why would having a utility function that only assigns utility based on X make you indifferent to non-X things that causally affect X? If I only terminally cared about things that happened a year from now, I would still try to shape the intervening time because doing so will change what happens a year from now.

(This is maybe less clear in the case of shutdown, because it's not clear how an agent should think about shutdown if its utility is defined over states of its simulation. So I'll set that particular case aside.)
ESRogs8moΩ15557

-3.  I'm assuming you are already familiar with some basics, and already know what 'orthogonality' and 'instrumental convergence' are and why they're true.

I think this is actually the part that I most "disagree" with. (I put "disagree" in quotes, because there are forms of these theses that I'm persuaded by. However, I'm not so confident that they'll be relevant for the kinds of AIs we'll actually build.)

1. The smart part is not the agent-y part

It seems to me that what's powerful about modern ML systems is their ability to do data compression / patter...

GPT-3 does unsupervised learning on text data. Our brains do predictive processing on sensory inputs. My guess (which I'd love to hear arguments against!) is that there's a true and deep analogy between the two, and that they lead to impressive abilities for fundamentally the same reason.

Agree that self-supervised learning powers both GPT-3 updates and human brain world-model updates (details & caveats). (Which isn’t to say that GPT-3 is exactly the same as the human brain world-model—there are infinitely many different possible ML algorithms that all ...

7James Payor8mo
Can you visualize an agent that is not "open-ended" in the relevant ways, but is capable of, say, building nanotech and melting all the GPUs? In my picture most of the extra sauce you'd need on top of GPT-3 looks very agenty. It seems tricky to name "virtual worlds" in which AIs manipulate just "virtual resources" and still manage to do something like melting the GPUs.

For example, I claim that while AlphaGo could be said to be agent-y, it does not care about atoms. And I think that we could make it fantastically more superhuman at Go, and it would still not care about atoms. Atoms are just not in the domain of its utility function.

In particular, I don't think it has an incentive to break out into the real world to somehow get itself more compute, so that it can think more about its next move. It's just not modeling the real world at all. It's not even trying to rack up a bunch of wins over time. It's just playing the si

...
4David Johnston8mo
What do you think of a claim like "most of the intelligence comes from the steps where you do most of the optimization"? A corollary of this is that we particularly want to make sure optimization-intensive steps of AI creation are safe WRT not producing intelligent programs devoted to killing us.

Example: most of the "intelligence" of language models comes from the supervised learning step. However, it's in-principle plausible that we could design e.g. some really capable general-purpose reinforcement learner where the intelligence comes from the reinforcement, and the latter could (but wouldn't necessarily) internalise "agenty" behaviour.

I have a vague impression that this is already something other people are thinking about, though maybe I read too much into some tangential remarks in this direction. E.g. I figured the concern about mesa-optimizers was partly motivated by the idea that we can't always tell when an optimization-intensive step is taking place.

I can easily imagine people blundering into performing unsafe optimization-intensive AI creation processes. Gain-of-function pathogen research would seem to be a relevant case study here, except we currently have less idea about what kind of optimization makes deadly AIs vs what kind of optimization makes deadly pathogens. One of the worries (again, maybe I'm reading too far into comments that don't say this explicitly) is that the likelihood of such a blunder approaches 1 over long enough times, and the "pivotal act" framing is supposed to be about doing something that could change this (??) That said, it seems that there's a lot that could be done to make it less likely in short time frames.

They are not strong evidence that X works.

I agree that Zvi's story is not "strong evidence", but I don't think that means it "doesn't count" — a data point is a data point, even if inconclusive on its own.

(And I think it's inappropriate to tell someone that a data point "doesn't count" in response to a request for "any empirical evidence". In other words, I agree with your assessment that you were being a little bit of an asshole in that response ;-) )

3PoignardAzur8mo
Alright, sorry. I should have asked "is there any non-weak empirical evidence that...". Sorry if I was condescending.

Yeah, for sure — a great technique for avoiding the Double Illusion of Transparency.

Nitpick: wouldn't this graph be much more natural with the x and y axes reversed? I'd want to input the reduction in log-error over a cheaper compute regime to predict the reduction in log-error over a more expensive one.

Ah, thanks for the clarification!

fully general tech company is a technology company with the ability to become a world-leader in essentially any industry sector...

Notice here that I’m focusing on a company’s ability to do anything another company can do

To clarify, is this meant to refer to a fixed definition of sectors and what other companies can do as they existed prior to the FGTC?

Or is it meant to include FGTCs being able to copy the output of other FGTCs?

I'd assume you mean something like the former, but I think it's worth being explicit about the fact that what sectors exist a...

8Andrew_Critch9mo
Yep, you got it! The definition is meant to be non-recursive and grounded in 2022-level industrial capabilities. This definition is a bit unsatisfying insofar as 2022 is a bit arbitrary, except that I don't think the definition would change much if we replaced 2022 with 2010. I decided not to get into these details to avoid bogging down the post with definitions, but if a lot of people upvote you on this I will change the OP. Thanks for raising this!

The main theory we'll end up at, based on the accounting data, is that college costs are driven mainly by a large increase in diversity of courses available, which results in much lower student/faculty ratios, and correspondingly higher costs per student.

The "driven by" wording in the above suggests cause. It makes it sound to me like the increase in course diversity (and decrease in student-faculty ratios) comes first, and the increased cost is the result.

Is that what you meant?

If so, I think that case has not been demonstrated in the post. I'm with Eliez...

4johnswentworth9mo
Fair point. I don't think the increase in course diversity was causal.

[TFP] isn’t measured directly, but calculated as a residual by taking capital and labor increases out of GDP growth.

What does it mean to take out capital increases?

I assume taking out labor increases just means adjusting for a growing population. But isn't the reason we get economic growth per capita at all that we build new stuff (including intangible stuff like processes and inventions) that enables us to build more stuff more efficiently? And can't all that new stuff be thought of as capital?
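For reference, the "residual" calculation being asked about is the standard Solow growth-accounting decomposition. A minimal sketch, assuming a Cobb-Douglas production function and an assumed capital share `alpha` (often taken to be around 0.3; the function name and example numbers here are illustrative, not from the post):

```python
# Growth accounting (Solow residual): TFP growth is whatever is left of
# GDP growth after subtracting the weighted contributions of capital and
# labor. Assumes Y = A * K^alpha * L^(1-alpha), so in growth rates:
#   gY = gA + alpha * gK + (1 - alpha) * gL

def tfp_growth(gdp_growth, capital_growth, labor_growth, alpha=0.3):
    """Residual TFP growth; alpha is capital's output share (an assumption)."""
    return gdp_growth - alpha * capital_growth - (1 - alpha) * labor_growth

# Illustrative (made-up) numbers: 3% GDP growth, 4% capital growth, 1% labor growth
print(f"{tfp_growth(0.03, 0.04, 0.01):.4f}")
```

So "taking out capital increases" means subtracting `alpha * gK`: growth attributable to simply having accumulated more capital stock, at its estimated output share. The question in this comment is essentially whether new inventions and processes belong in that subtracted `K` term or in the residual `A`.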

Or is what's considered "capital" only a subset of tha...