All of Marius Hobbhahn's Comments + Replies

Copying from EAF

TL;DR: At least in my experience, AISC was pretty positive for most participants I know and it's incredibly cheap. It also serves a clear niche that other programs are not filling and it feels reasonable to me to continue the program.

I've been a participant in the 2021/22 edition. Some thoughts that might make it easier to decide for funders/donors.
1. Impact-per-dollar is probably pretty good for the AISC. It's incredibly cheap compared to most other AI field-building efforts and scalable.
2. I learned a bunch during AISC and I did enjoy it.... (read more)

I feel like both of your points are slightly wrong, so maybe we didn't do a good job of explaining what we mean. Sorry for that. 

1a) Evals both aim to show existence proofs, e.g. demos, as well as inform some notion of an upper bound. We did not intend to put one of them higher with the post. Both matter and both should be subject to more rigorous understanding and processes. I'd be surprised if the way we currently do demonstrations could not be improved by better science.
1b) Even if you claim you just did a demo or an existence proof and explicitly ... (read more)

1L Rudolf L1mo
1a) I got the impression that the post emphasises upper bounds more than existing proofs from the introduction, which has a long paragraph on the upper bound problem, and from reading the other comments. The rest of the post doesn't really bear this emphasis out though, so I think this is a misunderstanding on my part. 1b) I agree we should try to be able to make claims like "the model will never X". But if models are genuinely dangerous, by default I expect a good chance that teams of smart red-teamers and eval people (e.g. Apollo) to be able to unearth scary demos. And the main thing we care about is that danger leads to an appropriate response. So it's not clear to me that effective policy (or science) requires being able to say "the model will never X". 1c) The basic point is that a lot of the safety cases we have for existing products rely less on the product not doing bad things across a huge range of conditions, but on us being able to bound the set of environments where we need the product to do well. E.g. you never put the airplane wing outside its temperature range, or submerge it in water, or whatever. Analogously, for AI systems, if we can't guarantee they won't do bad things if X, we can work to not put them in situation X. 2a) Partly I was expecting the post to be more about the science and less about the field-building. But field-building is important to talk about and I think the post does a good job of talking about it (and the things you say about science are good too, just that I'd emphasise slightly different parts and mention prediction as the fundamental goal). 2b) I said the post could be read in a way that produces this feeling; I know this is not your intention. This is related to my slight hesitation around not emphasising the science over the field-building. What standards etc. are possible in a field is downstream of what the objects of study turn out to be like. I think comparing to engineering safety practices in other fields is a u

Nice work. Looking forward to that!

Not quite sure tbh.
1. I guess there is a difference between capability evaluations with prompting and with fine-tuning, e.g. you might be able to use an API for prompting but not fine-tuning. Getting some intuition for how hard users will find it to elicit some behavior through the API seems relevant. 
2. I'm not sure how true your suggestion is but I haven't tried it a lot empirically. But this is exactly the kind of stuff I'd like to have some sort of scaling law or rule for. It points exactly at the kind of stuff I feel like we don't have enough confidence in. Or at least it hasn't been established as a standard in evals.

I somewhat agree with the sentiment. We found it a bit hard to scope the idea correctly. Defining subcategories as you suggest and then diving into each of them is definitely on the list of things that I think are necessary to make progress on them. 

I'm not sure the post would have been better if we used a more narrow title, e.g. "We need a science of capability evaluations" because the natural question then would be "But why not for propensity tests or for this other type of eval. I think the broader point of "when we do evals, we need some reason to be confident in the results no matter which kind of eval" seems to be true across all of them. 

I think this post was a good exercise to clarify my internal model of how I expect the world to look like with strong AI. Obviously, most of the very specific predictions I make are too precise (which was clear at the time of writing) and won't play out exactly like that but the underlying trends still seem plausible to me. For example, I expect some major misuse of powerful AI systems, rampant automation of labor that will displace many people and rob them of a sense of meaning, AI taking over the digital world years before taking over the physical world ... (read more)

I still stand behind most of the disagreements that I presented in this post. There was one prediction that would make timelines longer because I thought compute hardware progress was slower than Moore's law. I now mostly think this argument is wrong because it relies on FP32 precision. However, lower precision formats and tensor cores are the norm in ML, and if you take them into account, compute hardware improvements are faster than Moore's law. We wrote a piece with Epoch on this:

If anything, ... (read more)

I think I still mostly stand behind the claims in the post, i.e. nuclear is undervalued in most parts of society but it's not as much of a silver bullet as many people in the rationalist / new liberal bubble would make it seem. It's quite expensive and even with a lot of research and de-regulation, you may not get it cheaper than alternative forms of energy, e.g. renewables. 

One thing that bothered me after the post is that Johannes Ackva (who's arguably a world-leading expert in this field) and Samuel + me just didn't seem to be able to communicate w... (read more)

In a narrow technical sense, this post still seems accurate but in a more general sense, it might have been slightly wrong / misleading. 

In the post, we investigated different measures of FP32 compute growth and found that many of them were slower than Moore's law would predict. This made me personally believe that compute might be growing slower than people thought and most of the progress comes from throwing more money at larger and larger training runs. While most progress comes from investment scaling, I now think the true effective compute growth... (read more)

I haven't talked to that many academics about AI safety over the last year but I talked to more and more lawmakers, journalists, and members of civil society. In general, it feels like people are much more receptive to the arguments about AI safety. Turns out "we're building an entity that is smarter than us but we don't know how to control it" is quite intuitively scary. As you would expect, most people still don't update their actions but more people than anticipated start spreading the message or actually meaningfully update their actions (probably still less than 1 in 10 but better than nothing).

At Apollo, we have spent some time weighing the pros and cons of the for-profit vs. non-profit approach so it might be helpful to share some thoughts. 

In short, I think you need to make really sure that your business model is aligned with what increases safety. I think there are plausible cases where people start with good intentions but insufficient alignment between the business model and the safety research that would be the most impactful use of their time where these two goals diverge over time. 

For example, one could start as an organizatio... (read more)

4Seth Herd2mo
It seems like all of those points are of the form "you could do better alignment work if you didn't worry about profits". Which is definitely true. But only if you have some other source of funding. Since alignment work is funding-constrained, that mostly isn't true. So, what's the alternative? Work a day job and work nights on alignment?
4Roman Leventov2mo
An important factor that should go into this calculation (not just for you or your org but for anyone) is the following: given that AI safety is currently quite severely funding-constrained (just look at the examples of projects that are not getting funded right now), I think people should assess their own scientific calibre relative to other people in technical AI safety who will seek for funding. It's not a black-and-white choice between doing technical AI safety research, or AI governance/policy/advocacy, or not contributing to reducing the AI risk at all. The relevant 80000 hours page perpetuates this view and therefore is not serving the cause well in this regard. For people with more engineering, product, and business dispositions I believe there are many ways to help some to reduce the AI risk, many of which I referred to in other comments on this page, and here. And we should do a better job at laying out these paths for people, a-la "Work on Climate for AI risks".
This is an interesting point. I also feel like the governance model of the org and culture of mission alignment with increasing safety is important, in addition to the exact nature of the business and business model at the time the startup is founded. Looking at your examples, perhaps by “business model” you are referring both to what brings money in but also the overall governance/decision-making model of the organization?
1Eric Ho2mo
Thanks Marius, definitely agreed that business model alignment is critical here, and that culture and investors matter a bunch in determining the amount of impact an org has.

Thx. updated:

"You might not be there yet" (though as Neel points out in the comments, CV screening can be a noisy process)“You clearly aren’t there yet”

2Neel Nanda3mo

Fully agree that this is a problem. My intuition that the self-deception part is much easier to solve than the "how do we make AIs honest in the first place" part. 

If we had honest AIs that are convinced bad goals are justified, we would likely find ways to give them less power or deselect them early. The problem mostly arises when we can't rely on the selection mechanisms because the AI games them. 

We considered alternative definitions of DA in Appendix C.

We felt like being deceptive about alignment / goals was worse than what we ended up with (copied below):

“An AI is deceptively aligned when it is strategically deceptive about its misalignment”

Problem 1: The definition is not clear about cases where the model is strategically deceptive about its capabilities. 

For example, when the model pretends to not have a dangerous capability in order to pass the shaping & oversight process, we think it should be considered deceptively aligned, but it’s... (read more)

Sounds like an interesting direction. I expect there are lots of other explanations for this behavior, so I'd not count it as strong evidence to disentangle these hypotheses. It sounds like something we may do in a year or so but it's far away from the top of our priority list. There is a good chance, we will never run it. If someone else wants to pick this up, feel free to take it on.

(personal opinion; might differ from other authors of the post)

Thanks for both questions. I think they are very important. 

1. Regarding sycophancy: For me it mostly depends on whether it is strategic or not. If the model has the goal of being sycophantic and then reasons through that in a strategic way, I'd say this counts as strategic deception and deceptive alignment. If the model is sycophantic but doesn't reason through that, I'd probably not classify it as such. I think it's fine to use different terms for the different phenomena and have sycopha... (read more)

Thanks! First response makes sense, there's a lot of different ways you could cut it.  On the question of non-strategic, non-intentional deception, I agree that deceptive alignment is much more concerning in the medium term. But suppose that we develop techniques for making models honest. If mechanistic interpretability, unsupervised knowledge detection, or another approach to ELK pans out, we'll have models which reliably do what they believe is best according to their designer's goals. What major risks might emerge at that point? Like an honest AI, humans will often only do what they consciously believe is morally right. Yet the CEOs of tobacco and oil companies believe that their work is morally justified. Soldiers on both sides of a battlefield will believe they're on the side of justice. Scientists often advance dangerous technologies in the names of truth and progress. Sometimes, these people are cynical, pursuing their self-interest even if they believe it's immoral. But many believe they are doing the right thing. How do we explain that? These are not cases of deception, but rather self-deception. These individuals operate in an environment where certain beliefs are advantageous. You will not become the CEO of a tobacco company or a leading military commander if you don't believe your cause is justified. Even if everyone is perfectly honest about their own beliefs and only pursues what they believe is normatively right, the selection pressure from the environment is so strong that many powerful people will end up with harmful false beliefs.  Even if we build honest AI systems, they could be vulnerable to self-deception encouraged by environmental selection pressure. This is a longer term concern, and the first goal should be to build honest AI systems. But it's important to keep in mind the problems that would not be solved by honesty alone. 

Seems like one of multiple plausible hypotheses. I think the fact that models generalize their HHH really well to very OOD settings and their generalization abilities in general could also mean that they actually "understood" that they are supposed to be HHH, e.g. because they were pre-prompted with this information during fine-tuning. 

I think your hypothesis of seeking positive ratings is just as likely but I don't feel like we have the evidence to clearly say so wth is going on inside LLMs or what their "goals" are.

3Jay Bailey5mo
Interesting. That does give me an idea for a potentially useful experiment! We could finetune GPT-4 (or RLHF an open source LLM that isn't finetuned, if there's one capable enough and not a huge infra pain to get running, but this seems a lot harder) on a "helpful, harmless, honest" directive, but change the data so that one particular topic or area contains clearly false information. For instance, Canada is located in Asia. Does the model then: * Deeply internalise this new information? (I suspect not, but if it does, this would be a good sign for scalable oversight and the HHH generalisation hypothesis) * Score worse on honesty in general even in unrelated topics? (I also suspect not, but I could see this going either way - this would be a bad sign for scalable oversight. It would be a good sign for the HHH generalisation hypothesis, but not a good sign that this will continue to hold with smarter AI's) One hard part is that it's difficult to disentangle "Competently lies about the location of Canada" and "Actually believes, insomuch as a language model believes anything, that Canada is in Asia now", but if the model is very robustly confident about Canada being in Asia in this experiment, trying to catch it out feels like the kind of thing Apollo may want to get good at anyway.

I'm not going to crosspost our entire discussion from the EAF. 

I just want to quickly mention that Rohin and I were able to understand where we have different opinions and he changed my mind about an important fact. Rohin convinced me that anti-recommendations should not have a higher bar than pro-recommendations even if they are conventionally treated this way. This felt like an important update for me and how I view the post. 

All of the above but in a specific order. 
1. Test if the model has components of deceptive capabilities with lots of handholding with behavioral evals and fine-tuning. 
2. Test if the model has more general deceptive capabilities (i.e. not just components) with lots of handholding with behavioral evals and fine-tuning. 
3. Do less and less handholding for 1 and 2. See if the model still shows deception. 
4. Try to understand the inductive biases for deception, i.e. which training methods lead to more strategic deception. Try to answer ques... (read more)

(cross-posted from EAG)

Meta: Thanks for taking the time to respond. I think your questions are in good faith and address my concerns, I do not understand why the comment is downvoted so much by other people. 

1. Obviously output is a relevant factor to judge an organization among others. However, especially in hits-based approaches, the ultimate thing we want to judge is the process that generates the outputs to make an estimate about the chance of finding a hit. For example, a cynic might say "what has ARC-theory achieve so far? They wrote some nice f... (read more)

(cross-posted from EAF)  appreciate you sharing your impression of the post. It’s definitely valuable for us to understand how the post was received, and we’ll be reflecting on it for future write-ups. 1) We agree it's worth taking into account aspects of an organization other than their output. Part of our skepticism towards Conjecture – and we should have made this more explicit in our original post (and will be updating it) – is the limited research track record of their staff, including their leadership. By contrast, even if we accept for the sake of argument that ARC has produced limited output, Paul Christiano has a clear track record of producing useful conceptual insights (e.g. Iterated Distillation and Amplification) as well as practical advances (e.g. Deep RL From Human Preferences) prior to starting work at ARC. We're not aware of any equally significant advances from Connor or other key staff members at Conjecture; we'd be interested to hear if you have examples of their pre-Conjecture output you find impressive. We're not particularly impressed by Conjecture's process, although it's possible we'd change our mind if we knew more about it. Maintaining high velocity in research is certainly a useful component, but hardly sufficient. The Builder/Breaker method proposed by ARC feels closer to a complete methodology. But this doesn't feel like the crux for us: if Conjecture copied ARC's process entirely, we'd still be much more excited about ARC (per-capita). Research productivity is a product of a large number of factors, and explicit process is an important but far from decisive one. In terms of the explicit comparison with ARC, we would like to note that ARC Theory's team size is an order of magnitude smaller than Conjecture. Based on ARC's recent hiring post, our understanding is the theory team consists of just three individuals: Paul Christiano, Mark Xu and Jacob Hilton. If ARC had a team ten times larger and had spent close to $10 mn, then we would

(cross-posted from EAF)

Some clarifications on the comment:
1. I strongly endorse critique of organisations in general and especially within the EA space. I think it's good that we as a community have the norm to embrace critiques.
2. I personally have my criticisms for Conjecture and my comment should not be seen as "everything's great at Conjecture, nothing to see here!". In fact, my main criticism of leadership style and CoEm not being the most effective thing they could do, are also represented prominently in this post. 
3. I'd also be fine with the a... (read more)

(cross-commented from EA forum)

I personally have no stake in defending Conjecture (In fact, I have some questions about the CoEm agenda) but I do think there are a couple of points that feel misleading or wrong to me in your critique. 

1. Confidence (meta point): I do not understand where the confidence with which you write the post (or at least how I read it) comes from. I've never worked at Conjecture (and presumably you didn't either) but even I can see that some of your critique is outdated or feels like a misrepresentation of their work to me (see... (read more)

(cross-posted from EAF, thanks Richard for suggesting. There's more back-and-forth later.)

I'm not very compelled by this response.

It seems to me you have two points on the content of this critique. The first point:

I think it's bad to criticize labs that do hits-based research approaches for their early output (I also think this applies to your critique of Redwood) because the entire point is that you don't find a lot until you hit.

I'm pretty confused here. How exactly do you propose that funding decisions get made? If some random person says they are pursu... (read more)

(crossposted from the EA Forum)

We appreciate your detailed reply outlining your concerns with the post. 

Our understanding is that your key concern is that we are judging Conjecture based on their current output, whereas since they are pursuing a hits-based strategy we should expect in the median case for them to not have impressive output. In general, we are excited by hits-based approaches, but we echo Rohin's point: how are we meant to evaluate organizations if not by their output? It seems healthy to give promising researchers sufficient ... (read more)

(cross-posted from EAF)

Some clarifications on the comment:
1. I strongly endorse critique of organisations in general and especially within the EA space. I think it's good that we as a community have the norm to embrace critiques.
2. I personally have my criticisms for Conjecture and my comment should not be seen as "everything's great at Conjecture, nothing to see here!". In fact, my main criticism of leadership style and CoEm not being the most effective thing they could do, are also represented prominently in this post. 
3. I'd also be fine with the a... (read more)

Clarified the text: 

Update (early April 2023): I now think the timelines in this post are too long and expect the world to get crazy faster than described here. For example, I expect many of the things suggested for 2030-2040 to already happen before 2030. Concretely, in my median world, the CEO of a large multinational company like Google is an AI. This might not be the case legally but effectively an AI makes most major decisions.

Not sure if this is "Nice!" xD. In fact, it seems pretty worrying. 

2Daniel Kokotajlo10mo
Well nice that you updated at least! :) But yeah I'm pretty scared.

So far, I haven't looked into it in detail and I'm only reciting other people's testimonials. I intend to dive deeper into these fields soon. I'll let you know when I have a better understanding.  

I agree with the overall conclusion that the burden of proof should be on the side of the AGI companies. 

However, using the FDA as a reference or example might not be so great because it has historically gotten the cost-benefit trade-offs wrong many times and e.g. not permitting medicine that was comparatively safe and highly effective. 

So if the association of AIS evals or is similar to the FDA, we might not make too many friends. Overall, I think it would be fine if the AIS auditing community is seen as generally cautious but it should not give... (read more)

That seems like an excellent angle to the issue - I agree that reference models and stakeholders' different attitudes towards them likely have a huge impact.  As such, the criticisms the FDA faces might indeed be an issue! (at least that's how I understand your comment);  However, I'd carefully offer a bit of pushback on the aviation industry as an example, keeping in mind the difficult tradeoffs and diverging interests regulators will face in designing an approval process for AI systems. I think the problems that regulators will face are more similar to those of the FDA & policymakers (if you assume they are your audience) might be more comfortable with a model that can somewhat withstand these problems.  Below my reasoning (with a bit of an overstatement/ political rhetoric e.g., "risking peoples live") As you highlighted, FDA is facing substantial criticism for being too cautious, e.g., with the Covid Vaccine taking longer to approve than the UK. Not permitting a medicine that would have been comparatively safe and highly effective, i.e., a false negative, can mean that medicine could have had a profound positive impact on someone's life. And beyond the public interest, industry has quite some financial interests in getting these through too. In a similar vein, I expect that regulators will face quite some pushback when "slowing" innovation down, i.e. not approving a model. On the other side, being too fast in pushing drugs through the pipeline is also commonly criticized (e.g., the recent Alzheimer's drug approval as a false positive example). Even more so, losing its reputation as a trustworthy regulator has a lot of knock-on effects. (i.e., will people trust an FDA-approved vaccine in the future?).  As such, both being too cautious and being too aggressive have both potentially high costs to people's lives, striking the right balance is incredibly difficult. The aviation industry also faces a tradeoff, but I would argue, one side is inherently "weaker" tha
This makes sense. Can you say more about how aviation regulation differs from the FDA? In other words, are there meaningful differences in how the regulatory processes are set up? Or does it just happen to be the case that the FDA has historically been worse at responding to evidence compared to the Federal Aviation Administration?  (I think it's plausible that we would want a structure similar to the FDA even if the particular individuals at the FDA were bad at cost-benefit analysis, unless there are arguments that the structure of the FDA caused the bad cost-benefit analyses).

People could choose how they want to publish their opinion. In this case, Richard chose the first name basis. To be fair though, there aren't that many Richards in the alignment community and it probably won't be very hard for you to find out who Richard is. 

Just to get some intuitions. 

Assume you had a tool that basically allows to you explain the entire network, every circuit and mechanism, etc. The tool spits out explanations that are easy to understand and easy to connect to specific parts of the network, e.g. attention head x is doing y. Would you publish this tool to the entire world or keep it private or semi-private for a while? 

8Mark Xu10mo
I think this case is unclear, but also not central because I'm imagining the primary benefit of publishing interp research as being making interp research go faster, and this seems like you've basically "solved interp", so the benefits no longer really apply?

Thank you!

I also agree that toy models are better than nothing and we should start with them but I moved away from "if we understand how toy models do optimization, we understand much more about how GPT-4 does optimization". 

I have a bunch of project ideas on how small models do optimization. I even trained the networks already. I just haven't found the time to interpret them yet. I'm happy for someone to take over the project if they want to. I'm mainly looking for evidence against the outlined hypothesis, i.e. maybe small toy models actually do fair... (read more)

How confident are you that the model is literally doing gradient descent from these papers? My understanding was that the evidence in these papers is not very conclusive and I treated it more as an initial hypothesis than an actual finding. 

Even if you have the redundancy at every layer, you are still running copies of the same layer, right? Intuitively I would say this is not likely to be more space-efficient than not copying a layer and doing something else but I'm very uncertain about this argument. 

I intend to look into the Knapsack + DP algorithm problem at some point. If I were to find that the model implements the DP algorithm, it would change my view on mesa optimization quite a bit. 

I think that these papers do provide sufficient behavioral evidence that transformers are implementing something close to gradient descent in their weights.  Garg et al. 2022 examine the performance of 12-layer GPT-style transformers trained to do few-shot learning and show that they can in-context learn 2-layer MLPs.  The performance of their model closely matches an MLP with GD for 5000 steps on those same few-shot examples, and it cannot be explained by heuristics like averaging the K-nearest neighbors from the few-shot examples.  Since the inputs are fairly high-dimensional, I don't think they can be performing this well by only memorizing the weights they've seen during training.  The model is also fairly robust to distribution shifts in the inputs at test time, so the heuristic they must be learning should be pretty similar to a general-purpose learning algorithm.   I think that there also is some amount of mechanistic evidence that transformers implement some sort of iterative optimization algorithm over some quantity stored in the residual stream.   In one of the papers mentioned above (Akyurek et al. 2022), the authors trained a probe to extract the ground-truth weights of the linear model from the residual stream and it appears to somewhat work.  The diagrams seem to show that it gets better when trained on activations from later layers, so it seems likely that the transformer is iteratively refining its prediction of the weights.

No plans so far. I'm a little unhappy with the experimental design from last time. If I ever come back to this, I'll change the experiments up anyways.

Could you elaborate a bit more about the strategic assumptions of the agenda? For example,
1. Do you think your system is competitive with end-to-end Deep Learning approaches?
1.1. Assuming the answer is yes, do you expect CoEm to be preferable to users?
1.2. Assuming the answer is now, how do you expect it to get traction? Is the path through lawmakers understanding the alignment problem and banning everything that is end-to-end and doesn't have the benefits of CoEm? 
2. Do you think this is clearly the best possible path for everyone to take right now o... (read more)

fair. You convinced me that the effect is more determined by layer-norm than cross-entropy.

I agree that the layer norm does some work here but I think some parts of the explanation can be attributed to the inductive bias of the cross-entropy loss. I have been playing around with small toy transformers without layer norm and they show roughly similar behavior as described in this post (I ran different experiments, so I'm not confident in this claim). 

My intuition was roughly:
- the softmax doesn't care about absolute size, only about the relative differences of the logits.
- thus, the network merely has to make the correct logits really big an... (read more)

I agree that there's many reasons that directions do matter, but clearly distance would matter too in the softmax case!  Also, without layernorm, intermediate components of the network could "care more' about the magnitude of the residual stream (whereas it only matters for the unembed here), while for networks w/ layernorm the intermediate components literally do not have access to magnitude data!

I don't think there is a general answer here. But here are a couple of considerations:
- networks can get stuck in local optima, so if you initialize it to memorize, it might never find a general solution.
- grokking has shown that with high weight regularization, networks can transition from memorized to general solutions, so it is possible to move from one to the other.
- it probably depends a bit on how exactly you initialize the memorized solution. You can represent lookup tables in different ways and some are much more liked by NNs than others. For examp... (read more)

I agree with everything you're saying. I just want to note that as soon as someone starts training networks in a way where not all weights are updated simultaneously, e.g. because the weights are updated only for specific parts of the network, or when the network has an external memory that is not changed every training step, gradient hacking seems immediately much more likely and much scarier. 

And there are probably hundreds of researchers out there working on modular networks with memory, so it probably won't take that long until we have models that... (read more)

This criticism has been made for the last 40 years and people have usually had new ideas and were able to execute them. Thus, on priors, we think this trend will continue even if we don't know exactly which kind of ideas they will be. 

In fact, due to our post, we were made aware of a couple of interesting ideas about chip improvements that we hadn't considered before that might change the outcome of our predictions (towards later limits) but we haven't included them in the model yet. 

Hmmm interesting. 

Can you provide some of your reasons or intuitions for this fast FOOM?

My intuition against it is mostly like "intelligence just seems to be compute bound and thus extremely fast takeoffs (hours to weeks) are unlikely". But I feel very uncertain about this take and would like to refine it. So just understanding your intuitions better would probably already help a lot. 

4Daniel Kokotajlo1y
Every time I sit down to make a model of takeoff, or read someone else's model & input my own values for the parameters, it ends up being pretty fast. Much faster than your story. (In fact, I'm not sure I've ever seen a model that results in a takeoff as slow as your story, even with other people's values to the parameters.) That's the main reason why I have faster-takeoff views. There's a big gap between "hours to weeks" and "10+ years!" I don't think intelligence is compute bounded in the relevant sense, but even if it was (see my "Main alternative" in response to Richard elsewhere in this thread) it would maybe get us to a 3 year post-AGI takeoff at most, I'd say. If you have time to elaborate more on your model -- e.g. what you mean by intelligence being compute bounded, and how that translates into numbers for post-AGI takeoff speed -- I'd be interested to hear it!  

I think it's mostly my skepticism about extremely fast economic transformations. 

Like GPT-3 could probably automate more parts of the economy today but somehow it just takes a while for people to understand that and get it to work in practice. I also expect that it will take a couple of years between showing the capabilities of new AI systems in the lab and widespread economic impact just because humans take a while to adapt (at least with narrow systems). 

At some point (maybe in 2030) we will reach a level where AI is as capable as humans in man... (read more)

4Daniel Kokotajlo1y
For pre-AGI systems, I agree that it's going to take a substantial amount of time for them to be deployed as products and transform the economy. Which is why my What 2026 Looks Like story doesn't have GDP even begin to accelerate by 2026. For post-AGI systems, I forecast a fairly quick FOOM period (less than a year probably? Weeks or days, possibly) during which we keep basically the same hardware, but the "software" improves dramatically. We end up with something like von Neumann's brain, but qualitatively better in every way, and also cheaper and faster, and with a huge population size (millions? Billions?). They do more quality-adjusted intellectual labor in a few hours than all of human science could do in decades. They figure out a plan for dramatically better hardware, dramatically better robots, etc. and then from then on the human economy works feverishly to follow The Plan as fast as possible, instructed by superintelligent AI overseers. I think maybe it'll take only a few years at most to get to crazy nanobot tech (or some other kind of fully automated industry). Could take only a few days potentially.  

Well maybe. I still think it's easier to build AGI than to understand the brain, so even the smartest narrow AIs might not be able to build a consistent theory before someone else builds AGI. 

I'm not very bullish on HMI. I think the progress humanity makes in understanding the brain is extremely slow and because it's so hard to do research on the brain, I don't expect us to get much faster. 

Basically, I expect humanity to build AGI way before we are even close to understanding the brain. 

Well, narrow superintelligent AIs might help us understand the brain before then.

I know your reasoning and I think it's a plausible possibility. I'd be interested in how the disruption of AI into society looks like in your scenario. 

Is it more like one or a few companies have AGIs but the rest of the world is still kinda normal or is it roughly like my story just 2x as fast?

Thanks for pointing this out. I made a clarification in the text. 

Similar to Daniel, I'd also be interested in what public opinion was at the time or what the consensus view among experts was if there was one. 

Also, it seems like the timeframe for mobile phones is 1993 to 2020 if you can trust this statistic.

The timeframe for mobile phones is actually 1993 to 2013, mobile phone ownership went from 20% to 50% from 2010 to 2013.

Definitely could be but don't have to be. We looked a bit into cooling and heat and did not find any clear consensus on the issue. 

We did consider modeling it explicitly. However, most estimates on the Landauer limit give very similar predictions as size limits. So we decided against making an explicit addition to the model and it is "implicitly" modeled in the physical size. We intend to look into Landauer's limit at some point but it's not a high priority right now. 

We originally wanted to forecast FLOP/s/$ instead of just FLOP/s but we found it hard to make estimates about price developments. We might look into this in the future. 

Thanks. Another naive question: how do power and cooling requirements scale with transistor and GPU sizes? Could these be barriers to how large supercomputers can be built in practice?

Well, depending on who you ask, you'll get numbers between 1e13 and 1e18 for the human brain FLOP/s equivalent. So I think there is lots of uncertainty about it. 

However, I do agree that if it was at 1e16, your reasoning sounds plausible to me. What a wild imagination. 

Jacob Cannell thinks brains are doing 10^14 10^15 FLOP/s.

Yeah, I also expect that there are some ways of compensating for the lack of miniaturization with other tech. I don't think progress will literally come to a halt. 

We looked more into this because we wanted to get a better understanding of Ajeya's estimate of price-performance doubling every 2.5 years. Originally, some people I talked to were skeptical and thought that 2.5 years is too conservative. I now think that 2.5 years is probably insufficiently conservative in the long run. 

However, I also want to note that there are still reasons to believe a doubling time of 2 years or less could be realistic due to progress in specialization or other breakthroughs. I still have large uncertainty about the doubling tim... (read more)

By uncertainty I mean, I really don't know, i.e. I could imagine both very high and very low gains. I didn't want to express that I'm skeptical. 

For the third paragraph, I guess it depends on what you think of as specialized hardware. If you think GPUs are specialized hardware than a gain of 1000x from CPUs to GPUs sounds very plausible to me. If you think GPUs are the baseline and specialized hardware are e.g. TPUs, then a 1000x gain sounds implausible to me. 

My original answers wasn't that clear. Does this make more sense to you?

It does, thanks! (I had interpreted the claim in the paper as comparing e.g. TPUs to CPUs, since the quote mentions CPUs as the baseline.)
Load More