All of Vladimir_Nesov's Comments + Replies

Building a powerful AI such that doing so is a good thing rather than a bad thing. Perhaps even the question of whether there are survivors shouldn't insist on the definite article, on being the question, as there are many questions of varying severity that are not mutually exclusive.

1Shankar Sivarajan1h
Do you believe this answers the question "… for whom?" or are you helpfully illustrating how it typically gets hand-waved away? The usage of the definite article does not imply there are no other questions, just that they are all subordinate to this one.

When boundaries leak, it's important to distinguish commitment to rectify them from credence that they didn't.

These are all failures to acknowledge the natural boundaries that exist between individuals.

You shouldn't worry yet, the models need to be far more capable.

The right time to start worrying is too early, otherwise it will be too late.

(I agree in the sense that current models very likely can't be made existentially dangerous, and in that sense "worrying" is incorrect, but the proper use of worrying is planning for the uncertain future, a different sense of "worrying".)

It's not entirely clear how and why GPT-4 (possibly a 2e25 FLOPs model) or Gemini Ultra 1.0 (possibly a 1e26 FLOPs model) don't work as autonomous agents, but it seems that they can't. So it's not clear that the next generation of LLMs built in a similar way will enable significant agency either. There are millions of AI GPUs currently being produced each year, and millions of GPUs can only support a 1e28-1e30 FLOPs training run (that doesn't individually take years to complete). There's (barely) enough text data for that.

GPT-2 would take about 1e20 FLOPs ... (read more)
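The compute figures above can be sketched with the common C ≈ 6·N·D training-FLOPs approximation. A minimal sketch, where the parameter counts, token counts, per-GPU throughput, and utilization are ballpark assumptions rather than figures from the thread:

```python
# Rough training-compute estimates using the common C ~ 6 * N * D
# approximation (N = parameters, D = training tokens).
# All model sizes, token counts, and hardware numbers below are
# ballpark assumptions for illustration.

def train_flops(params: float, tokens: float) -> float:
    """Approximate training compute in FLOPs."""
    return 6 * params * tokens

# GPT-2 (~1.5e9 params, ~1e10 tokens): on the order of 1e20 FLOPs.
gpt2 = train_flops(1.5e9, 1e10)
print(f"GPT-2 training run: ~{gpt2:.0e} FLOPs")

# A hypothetical cluster of ~1e6 modern GPUs running for a year at
# ~40% utilization, assuming ~1e15 FLOP/s of dense BF16 per GPU.
gpus, flops_per_gpu, util = 1e6, 1e15, 0.4
seconds_per_year = 365 * 86400
cluster = gpus * flops_per_gpu * util * seconds_per_year
print(f"1M-GPU year: ~{cluster:.0e} FLOPs")  # low end of the 1e28-1e30 range
```

Pushing toward 1e30 under these assumptions would require several million GPUs, higher per-GPU throughput, or multi-year runs, which is where the "doesn't individually take years to complete" caveat bites.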

The premise is autonomous agents at near-human level with propensity and opportunity to establish global lines of communication with each other. Being served via API doesn't in itself control what agents do, especially if users can ask the agents to do all sorts of things and so there are no predefined airtight guardrails on what they end up doing and why. Large context and possibly custom tuning also makes activities of instances very dissimilar, so being based on the same base model is not obviously crucial.

The agents only need to act autonomously the wa... (read more)

For it to make sense to say that the math is wrong, there needs to be some sort of ground truth, making it possible for math to also be right, in principle. Even doing the math poorly is an exercise that contributes to eventually making the math less wrong.

If allowed to operate in the wild and globally interact with each other (as seems almost inevitable), agents won't exist strictly within well-defined centralized bureaucracies. The thinking speed that enables impactful research also enables growing elaborate systems of social roles that drive collective decision making, in a way distinct from individual decision making. Agent-operated firms might be an example where economy drives decisions, but nudges of all kinds can add up at scale, becoming trends that are impossible to steer.

3Daniel Kokotajlo2d
But all of the agents will be housed in one or three big companies. Probably one. And they'll basically all be copies of one to ten base models. And the prompts and RLHF the companies use will be pretty similar. And the smartest agents will at any given time be only deployed internally, at least until ASI. 

The thing that seems more likely to first get out of hand is activity of autonomous non-ASI agents, so that the shape of loss of control is given by how they organize into a society. Alignment of individuals doesn't easily translate into alignment of societies. Development of ASI might then result in another change, if AGIs are as careless and uncoordinated as humanity.

2Daniel Kokotajlo2d
Can you elaborate? I agree that there will be e.g. many copies of e.g. AutoGPT6 living on OpenAI's servers in 2027 or whatever, and that they'll be organized into some sort of "society" (I'd prefer the term "bureaucracy" because it correctly connotes centralized hierarchical structure). But I don't think they'll have escaped the labs and be running free on the internet.

A model is like compiled binaries, except compilation is extremely expensive. Distributing a model alone and claiming it's "open source" is like calling a binary distribution without source code "open source".

The term that's catching on is open weight models as distinct from open source models. The latter would need to come with datasets and open source training code that enables reproducing the model.

I think the compiled binary analogy isn't quite right. For instance, the vast majority of modifications and experiments people want to run are possible (and easiest) with just access to the weights in the LLM case.

As in, if you want to modify an LLM to be slightly different, access to the original training code or dataset is mostly unimportant.

(Edit: unlike the software case where modifying compiled binaries to have different behavior isn't really doable without the source code.)

Yes, agreed - as I said in the post, "Open Source AI simply means that the models have the model weights released - the equivalent of software which makes the compiled code available. (This is otherwise known as software.)"

My impression is that one point Hanson was making in the spring-summer 2023 podcasts is that some major issues with AI risk don't seem different in kind from cultural value drift that's already familiar to us. There are obvious disanalogies, but my understanding of this point is that there is still a strong analogy that people avoid acknowledging.

If human value drift was already understood as a serious issue, the analogy would seem reasonable, since AI risk wouldn't need to involve more than the normal kind of cultural value drift compressed into short tim... (read more)

You are directing a lot of effort at debating details of particular proxies for an optimization target, pointing out flaws. My point is that strong optimization for any proxy that can be debated in this way is not a good idea, so improving such proxies doesn't actually help. A sensible process for optimizing something has to involve continually improving formulations of the target as part of the process. It shouldn't be just given any target that's already formulated, since if it's something that would seem to be useful to do, then the process is already f... (read more)

If your favoured alignment target suffers from a critical flaw, that is inherent in the core concept, then surely it must be useful for you to discover this. So I assume that you agree that, conditioned on me being right about CEV suffering from such a flaw, you want me to tell you about this flaw. In other words, I think that I have demonstrated, that CEV suffers from a flaw, that is not related to any detail, of any specific version, or any specific description, or any specific proxy, or any specific attempt to describe what CEV is, or anything else along those lines. Instead, this flaw is inherent in the core concept, of building an AI that is describable as ``doing what a Group wants''. The Suffering Reducing AI (SRAI) alignment target is known to suffer from this type of a core flaw. The SRAI flaw is not related to any specific detail, of any specific version, or proxy, or attempt to describe what SRAI is, etc. And the flaw is not connected to any specific definition of ``Suffering''. Instead, the tendency to kill everyone, is inherent in the core concept of SRAI. It must surely be possible for you to update the probability that CEV also suffers from a critical flaw of this type (a flaw inherent in the core concept). SRAI sounds good on the surface, but it is known to suffer from such a core flaw. Thus, the fact that CEV sounds good on the surface, does not rule out the existence of such a core flaw in CEV. I do not think, that it is possible to justify making no update, when discovering that the version of CEV, that you linked to, implies an outcome that would be far, far worse than extinction. I think that the probability must go up, that CEV contains a critical flaw, inherent in the core concept. Outcomes massively worse than extinction, are not an inherent feature, of any conceivable detailed description, of any conceivable alignment target. To take a trivial example, such an outcome is not implied by any given specific description of SRAI. The only way

The blast radius of AGIs is unbounded in the same way as that of humanity, there is potential for taking over all of the future. There are many ways of containing it, and alignment is a way of making the blast a good thing. The point is that a sufficiently catastrophic failure that doesn't involve containing the blast is unusually impactful. Arguments about ease of containing the blast are separate from this point in the way I intended it.

If you don't expect AGIs to become overwhelmingly powerful faster than they are made robustly aligned, containing the b... (read more)

Stronger versions of seemingly-aligned AIs are probably effectively misaligned in the sense that optimization targets they formulate on long reflection (or superintelligent reflection) might be sufficiently different from what humanity should formulate. These targets don't concretely exist before they are formulated, which is very hard to do (and so won't yet be done by the time there are first AGIs), and strongly optimizing for anything that does initially exist is optimizing for a faulty proxy.

The arguments about dangers of this kind of misalignment seem... (read more)

8Wei Dai9d
Interesting connection you draw here, but I don't see how "AIs don’t change that" can be justified (unless interpreted loosely to mean "there is risk either way"). From my perspective, AIs can easily make this problem better (stop the complacent value drift as you suggest, although so far I'm not seeing much evidence of urgency), or worse (differentially decelerate philosophical progress by being philosophically incompetent). What's your view on Robin's position?

Basic science and pure mathematics enable their own subsequent iterations without having them as explicit targets or even without being able to imagine these developments, while doing the work crucial in making them possible.

Extensive preparation never happened with a thing that is ready to be attempted experimentally, because in those cases we just do the experiments; there is no reason not to. With AGI, the reason not to do this is the unbounded blast radius of a failure, an unprecedented problem. Unprecedented things are less plausible, but unfortunatel... (read more)

2Gerald Monroe17d
Is it true or not true that there is no evidence for an "unbounded" blast radius for any AI model someone has trained.  I am not aware of any evidence. What would constitute evidence that the situation was now in the "unbounded" failure case?  How would you prove it?   So we don't end up in a loop, assume someone has demonstrated a major danger with current AI models.  Assume there is a really obvious method of control that will contain the problem.  Now what?  It seems to me like the next step would be to restrict AI development in a similar way to how cobalt-60 sources are restricted, where only institutions with licenses, inspections, and methods of control can handle the stuff, but that's still not a pause... When could you ever reach a situation where a stronger control mechanism won't work? Like I try to imagine it, and I can imagine more and more layers of defense - "don't read anything the model wrote", "more firewalls, more isolation, servers in a salt mine" - but never a point where you couldn't agree it was under control.  Like if you make a radioactive source more radioactive you just add more inches of shielding until the dose is acceptable.

Consider an indefinite moratorium on AGI that awaits better tools that make building it a good idea rather than a bad idea. If there was a magic button that rewrote laws of nature to make this happen, would it be a good idea to press it? My point is that we both endorse pressing this button, the only difference is that your model says that building an AGI immediately is a good idea, and so the moratorium should end immediately. My model disagrees. This particular disagreement is not about the generations of people who forgo access to potential technology (... (read more)

2Gerald Monroe17d
Do any examples of preparation over an extended length of time exist in human history? I would suspect they do not, for the simple reason that preparation in advance of a need you don't have has no ROI.

Hypotheticals disentangle models from values. A pause is not a policy, not an attempt at a pause that might fail, it's the actual pause, the hypothetical. We can then look at the various hypotheticals and ask what happens there, which one is better. Hopefully our values can handle the strain of out-of-distribution evaluation and don't collapse into incoherence of goodharting, unable to say anything relevant about situations that our models consider impossible in actual reality.

In the hypothetical of a 100-year pause, the pause actually happens, even if th... (read more)

2Gerald Monroe17d
How would you know any method of alignment works without AGI of increasing capabilities/child AGI that are supposed to inherit the aligned property to test this? One of the reasons I gave current cybersecurity as an example is that pub/private key signing is correct. Nobody has broken the longer keys. Yet if you spent 20 years or 100 years proving it correct, then deployed it in software using present techniques, you would get hacked immediately. Implementation is hard and is the majority of the difficulty. Assuming AI alignment can be paper-solved in this way, I see it as the same situation. It will fail in ways you won't know until you try it for real.

"AI pause" talk [...] dooms [...] to more of the same

This depends on the model of risks. If risks without a pause are low, and they don't significantly reduce with a pause, then a pause makes things worse. If risks without a pause are high, but risks after a 20-year pause are much lower, then a pause is an improvement even for personal risk for sufficiently young people.

If risks without pause are high, risks after a 50-year pause remain moderately high, but risks after a 100-year pause become low, then not pausing trades significant measure of the futur... (read more)

2Gerald Monroe18d
Yes. Although you have 2 problems: 1. Why do you think a 20 year pause, or any pause, will change anything? Like for example you may know that cybersecurity on game consoles and iPhones keeps getting cracked. AI control is similar in many ways to cybersecurity in that you are trying to limit the AI's access to functions that let it do bad things, and prevent the AI from seeing information that will allow it to fail. (Betrayal is control failure, the model cannot betray in a coordinated way if it doesn't somehow receive a message from other models that now is the time) Are you going to secure your iPhones and consoles by researching cybersecurity for 20 years and then deploying the next generation? Or do you do the best you can with information from the last failures and try again? Each time you try to limit the damage, for example with game consoles there are various strategies that have grown more sophisticated to encourage users to purchase access to games rather than pirate them. With AI you probably won't learn during a pause anything that will help you. We know this from experience because on paper, securing the products I mentioned is trivially easy. Sign everything with a key server, don't run unsigned code, check the key is valid on really well verified and privileged code. Note that no one who breaks game consoles or iPhones does it by cracking the encryption directly, pub/private key crypto is still unbroken. Similarly I would expect very early on human engineers will develop an impervious method of AI control. You can write one yourself it's not difficult. But like everything it will fail on implementation.... 2. You know the cost of a 20 year pause. Just add up the body bags, or 20 years of deaths worldwide to aging. More than a billion people. You don't necessarily have a good case that the benefit of the pause will save the lives of all humans because even guessing the problem will benefit during a pause is speculation. It's not speculation to

The most likely way to get to extremely safe AGI or ASI systems is not by humans creating them, it's by other less-safe AGI systems creating them.

This does seem more likely, but managing to sidestep the less-safe AGI part would be safer. In particular, it might be possible to construct a safe AGI by using safe-if-wielded-responsibly tool AIs (that are not AGIs), if humanity takes enough time to figure out how to actually do that.

The current paradigm of AI research makes it hard to make really pure tool AIs. We have software tools, like Wolfram Alpha, and we have LLM-derived systems. This is probably the set of tools we will either win or die with.

the view that there’s probably no persisting identity over time anyway and in some sense I probably die and get reborn all the time in any case

In the long run, this is probably true for humans in a strong sense that doesn't depend on litigation of "personal identity" and "all the time". A related phenomenon is value drift. Neural nets are not a safe medium for keeping a person alive for a very long time without losing themselves, physical immortality is insufficient to solve the problem.

That doesn't mean that the problem isn't worth solving, or that it ... (read more)

1Q Home18d
What if endorsed long term instability leads to negation of personal identity too? (That's something I thought about.)

Metaphorically, there is a question CEV tries to answer, and by "something like CEV" I meant any provisional answer to the appropriate question (so that CEV-as-currently-stated is an example of such an answer). Formulating an actionable answer is not a project humans would be ready to work on directly any time soon. So CEV is something to aim at by intention that defines CEV. If it's not something to aim at, then it's not a properly constructed CEV.

This lack of a concrete formulation is the reason goodharting and corrigibility seem salient in operationaliz... (read more)

The version of CEV, that is described on the page that your CEV link leads to, is PCEV. The acronym PCEV was introduced by me. So this acronym does not appear on that page. But that's PCEV that you link to. (in other words: the proposed design, that would lead to the LP outcome, can not be dismissed as some obscure version of CEV. It is the version that your own CEV link leads to. I am aware of the fact, that you are viewing PCEV as: ``a proxy for something else'' / ``a provisional attempt to describe what CEV is''. But this fact still seemed noteworthy) On terminology: If you are in fact using ``CEV'' as a shorthand, for ``an AI that implements the CEV of a single human designer'', then I think that you should be explicit about this. After thinking about this, I have decided that without explicit confirmation that this is in fact your intended usage, I will proceed as if you are using CEV as a shorthand, for ``an AI that implements the Coherent Extrapolated Volition of Humanity'' (but I would be perfectly happy to switch terminology, if I get such confirmation). (another reading of your text, is that: ``CEV'' (or: ``something like CEV'') is simply a label that you attach, to any good answer, to the correct phrasing of the ``what alignment target should be aimed at?'' question. That might actually be a sort of useful shorthand. In that case I would, somewhat oddly, have to phrase my claim as: under no reasonable set of definitions, does the Coherent Extrapolated Volition of Humanity, deserve the label ``CEV'' / ``something like CEV''. Due to the chosen label(s), the statement looks odd. But there is no more logical tension in the above statement, than there is logical tension in the following statement: ``under no reasonable set of definitions, does the Coherent Extrapolated Volition of Steve, result in the survival of any of Steve's cells'' (which is presumably a true statement for at least some human individuals). Until I hear otherwise, I will however stay with
I think that ``CEV'' is usually used as shorthand for ``an AI that implements the CEV of Humanity''. This is what I am referring to, when I say ``CEV''. So, what I mean when I say that ``CEV is a bad alignment target'', is that, for any reasonable set of definitions, it is a bad idea, to build an AI, that does what ``a Group'' wants it to do (in expectation, from the perspective of essentially any human individual, compared to extinction). Since groups and individuals, are completely different types of things, it should not be surprising to learn, that doing what one type of thing wants (such as ``a Group''), is bad for a completely different type of thing (such as a human individual). In other words, I think that ``an AI that implements the CEV of Humanity'', is a bad alignment target, in the same sense, as I think that SRAI is a bad alignment target. But I don't think your comment uses ``CEV'' in this sense. I assume that we can agree, that aiming for ``the CEV of a chimp'', can be discovered to be a bad idea (for example by referring to facts about chimps, and using thought experiments, to see what these facts about chimps imply about likely outcomes). Similarly, it must be possible to discover, that aiming for ``the CEV of Humanity'', is also a bad idea (for human individuals). Surely, discovering this, cannot be, by definition, impossible. Thus, I think that you are in fact, not, using ``CEV'' as shorthand for ``an AI that implements the CEV of Humanity''. (I am referring to your sentence: ``If it's not something to aim at, then it's not a properly constructed CEV.'') Your comment makes perfect sense, if I read ``CEV'' as shorthand for ``an AI that implements the CEV of a single human designer''. I was not expecting this terminology. But it is a perfectly reasonable terminology, and I am happy to make my argument, using this terminology. If we are using this terminology, then I think that you are completely right, about the problem that I am trying to desc

The issue with proxies for an objective is that they are similar to it. So an attempt to approximately describe the objective (such as an attempt to say what CEV is) can easily arrive at a proxy that has glaring goodharting issues. Corrigibility is one way of articulating a process that fixes this, optimization shouldn't outpace accuracy of the proxy, which could be improving over time.

Volition of humanity doesn't obviously put the values of the group before values of each individual, as we might put boundaries between individuals and between smaller group... (read more)

I think that my other comment to this, will hopefully be sufficient, to outline what my position actually is. But perhaps a more constructive way forwards, would be to ask how certain you are, that CEV is in fact, the right thing to aim at? That is, how certain are you, that this situation is not symmetrical, to the case where Bob thinks that: ``a Suffering Reducing AI (SRAI), is the objectively correct thing to aim at''? Bob will diagnose any problem, with any specific SRAI proposal, as arising from proxy issues, related to the fact that Bob is not able to perfectly define ``Suffering'', and must always rely on a proxy (those proxy issues exists. But they are not the most serious issue, with Bob's SRAI project). I don't think that we should let Bob proceed with an AI project, that aims to find the correct description of ``what SRAI is'', even if he is being very careful, and is trying to implement a safety measure (that will, while it continues to work as intended, prevent SRAI from killing everyone). Because those safety features might fail, regardless of whether or not someone has pointed out a critical flaw in them, before the project reaches the point of no return (this conclusion is not related to Corrigibility. I would reach the exact same conclusion, if Bob's SRAI project, was using any other safety measure). For the exact same reason, I simply do not think, that it is a good idea, to proceed with your proposed CEV project (as I understand that project). I think that doing so, would represent a very serious s-risk. At best, it will fail in a safe way, for predictable reasons. How confident are you, that I am completely wrong about this? Finally, I should note, that I still don't understand your terminology. And I don't think that I will, until you specify what you mean with ``something like CEV''. My current comments, are responding to my best guess, of what you mean (which is, that MPCEV, from my linked to post, would not count as ``something like CEV'',
It is getting late here, so I will stop after this comment, and look at this again tomorrow (I'm in Germany). Please treat the comment below as not fully thought through. The problem from my perspective, is that I don't think that the objective, that you are trying to approximate, is a good objective (in other words, I am not referring to problems, related to optimising a proxy. They also exist, but they are not the focus of my current comments). I don't think that it is a good idea, to do what an abstract entity, called ``humanity'', wants (and I think that this is true, from the perspective of essentially any human individual). I think that it would be rational, for essentially any human individual, to strongly oppose the launch of any such ``Group AI''. Human individuals, and groups, are completely different types of things. So, I don't think that it should be surprising, to learn that doing what a group wants, is bad for the individuals, in that group. This is a separate issue, from problems related to optimising for a proxy. I give one example, of how things can go wrong, in the post: A problem with the most recently published version of CEV. This is of course just one specific example, and it is meant as an introduction, to the dangers, involved in building an AI, that is describable as ``doing what a group wants''. Showing that a specific version of CEV, would lead to an outcome, that is far, far, worse than extinction, does not, on its own, prove that all versions of CEV are dangerous. I do however think that all versions of CEV, are, very, very, dangerous. And I do think, that this specific thought experiment, can be used to hint at a more general problem. I also hope, that this thought experiment will at least be sufficient, for convincing most readers that there, might, exist a deeper problem, with the core concept. In other words, I hope that it will be sufficient, to convince most readers that you, might, be going after the wrong objective, when

This seems mostly goodharting, how the tails come apart when optimizing or selecting for a proxy rather than for what you actually want. And people don't all want the same thing without disagreement or value drift. Near term practical solution is not optimizing too hard and building an archipelago with membranes between people and between communities that bound the scope of stronger optimization. Being corrigible about everything might also be crucial. Longer term idealized solution is something like CEV, saying in a more principled and precise way what th... (read more)

I'm not sure that I agree with this. I think it mostly depends on what you mean by: ``something like CEV''. All versions of CEV are describable as ``doing what a Group wants''. It is inherent in the core concept of building an AI, that is ``Implementing the Coherent Extrapolated Volition of Humanity''. This rules out proposals, where each individual, is given meaningful influence, regarding the adoption, of those preferences, that refer to her. For example as in MPCEV (described in the post that I linked to above). I don't see how an AI can be safe, for individuals, without such influence. Would you say that MPCEV counts as ``something like CEV''? If so, then I would say that it is possible, that ``something like CEV'', might be a good, long term solution. But I don't see how one can be certain about this. How certain are you, that this is in fact a good idea, for a long term solution? Also, how certain are you, that the full plan that you describe (including short term solutions, etc), is actually a good idea?

Right, a probable way of doing continued pretraining could as well be called "full-tuning", or just "tuning" (which is what you said, not "fine-tuning"), as opposed to "fine-tuning" that trains fewer weights. Though people seem unsure about "fine-tuning" implying that it's not full-tuning, resulting in terms like dense fine-tuning to mean full-tuning.

good terms to distinguish full-tuning the model in line with the original method of pretraining, and full layer LoRA adaptations that 'effectively' continue pretraining but are done in a different manner

Yo... (read more)

If you wanted a term which would be less confusing than calling continued pretraining 'full-tuning' or 'fine-tuning', I would suggest either 'warmstarting' or 'continual learning'. 'Warmstarting' is the closest term, I think: you take a 'fully' trained model, and then you train it again to the extent of a 'fully' trained model, possibly on the same dataset, but just as often, on a new but similarish dataset.

Some things are best avoided entirely when you take their risks into account, some become worthwhile only if you manage their risks instead of denying their existence even to yourself. But even when denying risks gives positive outcomes in expectation, adequately managing those risks is even better. Unless society harms the project for acknowledging some risks, which it occasionally does. In which case managing them without acknowledgement (which might require magic cognitive powers) is in tension with acknowledging them despite the expected damage from doing so.

being tuned on a Llama 70B

Based on Mensch's response, Miqu is probably continued pretraining starting at Llama2-70B, a process similar to how CodeLlama or Llemma were trained. (Training on large datasets comparable with the original pretraining dataset is usually not called fine-tuning.)

less capable model trained on the same dataset

If Miqu underwent continued pretraining from Llama2-70B, the dataset won't be quite the same, unless mistral-medium is also pretrained after Llama2-70B (in which case it won't be released under Apache 2).

Hmm. Not sure how relevant here, but do we currently have any good terms to distinguish full-tuning the model in line with the original method of pretraining, and full layer LoRA adaptations that 'effectively' continue pretraining but are done in a different manner? I've seen it can be used for continued pretraining as well as finetuning, but I don't know if I'd actually call it a full tune, and I don't think it has the same expenses. I'm honestly unsure what distinguishes a pretraining LoRA from a fine-tuning LoRA. Even if the dataset is a bit different between miqu and mistral-medium, they apparently have quite similar policies, and continued pretraining would push it even more to the new dataset than fine-tuning to my understanding.
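One concrete axis on which full-tuning and LoRA differ, whatever you call the regimes, is trainable-parameter count: full-tuning updates every entry of each weight matrix, while a rank-r LoRA adapter trains only two small factor matrices per adapted weight. A minimal sketch, with purely illustrative dimensions (not from Miqu, Llama, or any particular model):

```python
# Trainable-parameter comparison: full-tuning vs a rank-r LoRA adapter
# on a single d_out x d_in weight matrix. Full-tuning trains the whole
# matrix; LoRA trains a low-rank update W + B @ A, where A is r x d_in
# and B is d_out x r. Dimensions below are illustrative assumptions.

def full_tune_params(d_out: int, d_in: int) -> int:
    return d_out * d_in

def lora_params(d_out: int, d_in: int, rank: int) -> int:
    return rank * d_in + d_out * rank

d, r = 8192, 16  # hypothetical hidden size and LoRA rank
full = full_tune_params(d, d)
lora = lora_params(d, d, r)
print(f"full-tune: {full:,} params; LoRA r={r}: {lora:,} "
      f"({100 * lora / full:.2f}% of full)")
```

This is why a "full layer LoRA" over a large dataset can effectively continue pretraining at a fraction of the optimizer-state and memory cost, even though the update is constrained to low rank; whether that constraint matters for a given run is an empirical question.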

Bard Gemini Pro (as it's called in lmsys arena) has access to the web and an unusual finetuning with a hyper-analytical character, it often explicitly formulates multiple subtopics in a reply and looks into each of them separately. In contrast the earlier Gemini Pro entries that are not Bard have a finetuning or prompt not suitable for the arena, often giving a single sentence or even a single word as a first response. Thus like Claude 2 (with its unlikable character) they operate at a handicap relative to base model capabilities. GPT-4 on lmsys arena does... (read more)

This notion of thinking speed makes sense for large classes of tasks, not just specific tasks. And a natural class of tasks to focus on is the harder tasks among all the tasks both systems can solve.

So in this sense a calculator is indeed much faster than GPT-4, and GPT-4 is 2 OOMs faster than humans. An autonomous research AGI is capable of autonomous research, so its speed can be compared to humans at that class of tasks.

AI accelerates the pace of history only when it's capable of making the same kind of progress as humans in advancing history, at which ... (read more)
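The "2 OOMs faster" figure can be sanity-checked with a back-of-envelope throughput comparison. The numbers below (human writing speed, LLM serving throughput, words per token) are illustrative assumptions, not measurements:

```python
import math

# Illustrative throughput assumptions (not measurements):
human_wpm = 40                            # sustained human writing speed, words/min
llm_tokens_per_s = 50                     # typical LLM serving throughput
llm_wpm = llm_tokens_per_s * 60 * 0.75    # assuming ~0.75 words per token

ratio = llm_wpm / human_wpm
print(f"LLM vs human: ~{ratio:.0f}x, ~{math.log10(ratio):.1f} OOMs")
```

With these assumptions the ratio lands between 1 and 2 orders of magnitude, consistent with the range claimed above; different throughput assumptions shift it within that range.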

current AIs are not thinking faster than humans [...] GPT-4 has higher token latency than GPT-3.5, but I think it's fair to say that GPT-4 is the model that "thinks faster"

This notion of thinking speed depends on the difficulty of a task. If one of the systems can't solve a problem at all, it's neither faster nor slower. If both systems can solve a problem, we can compare the time they take. In that sense, current LLMs are 1-2 OOMs faster than humans at the tasks both can solve, and much cheaper.

Old chess AIs were slower than humans good at chess. If fu... (read more)

2Ege Erdil24d
Sure, but in that case I would not say the AI thinks faster than humans; I would say the AI is faster than humans at a specific range of tasks, where the AI can do those tasks in a "reasonable" amount of time.

As I've said elsewhere, there is a quality or breadth vs. serial speed tradeoff in ML systems: a system that only does one narrow and simple task can do that task at a high serial speed, but as you make systems more general and get them to handle more complex tasks, serial speed tends to fall. The same logic that people are using to claim GPT-4 thinks faster than humans should also lead them to think a calculator thinks faster than GPT-4, which is an unproductive way to use the one-dimensional abstraction of "thinking faster vs. slower".

You might ask "Well, why use that abstraction at all? Why not talk about how fast the AIs can do specific tasks instead of trying to come up with some general notion of whether their thinking is faster or slower?" I think a big reason is that people typically claim the faster "cognitive speed" of AIs can have impacts such as "accelerating the pace of history", and I'm trying to argue that the case for such an effect is not as trivial to make as some people seem to think.

Projects that involve interplanetary transit are not part of the development I discuss, so they can't slow it down. You don't need to wait for paint to dry if you don't use paint.

There are no additional pieces of infrastructure that need to be in place to make programmable cells, only their design and what modern biotech already has to manufacture some initial cells. It's a question of sample efficiency in developing simulation tools, how many observations does it take for simulation tools to get good enough, if you had centuries to design the process of d... (read more)

Machining equipment takes time to cut an engine, nano lathe a part, or if we are growing human organs to treat VIPs it takes months for them to grow.

That's why you don't do any of the slower things at all (in a blocking way), and instead focus on the critical path of controllable cells for macroscopic biotech or something like that, together with the experiments needed to train simulators good enough to design them. This enables exponentially scaling physical infrastructure once completed, which can be used to do all the other things. Simulation is not ... (read more)

2Gerald Monroe25d
You can speed things up. The main takeaway is there's 4 orders of magnitude here. Some projects that involve things like interplanetary transits to set up are going to be even slower than that. And you will most assuredly start out at 4 OOM slower bootstrapping from today's infrastructure. Yes, maybe you can eventually develop all the things you mentioned, but there are upfront costs to develop them. You don't have programmable cells or self-replicating nanotechnology when you start, and you can't develop them immediately just by thinking about it for thousands of years. This specifically is an argument against sudden and unexpected "foom" the moment AGI exists. If 20-50 years later, in a world full of robots and rapid nanotechnology and programmable biology, you start to see exponential progress, that's a different situation.

there's significant weight on logarithmically diminishing returns such that the things that are stronger than us never get so much stronger that we have no hope of understanding what they're doing

If autonomous research level AGIs are still 2 OOMs faster than humans, that leads to massive scaling of hardware within years even if they are not smarter, at which point it's minds the size of cities. So the probable path to weak takeoff is a slow AGI that doesn't get faster on hardware of the near future, and being slow it won't soon help scale hardware.

When you design a thing, you can intentionally make it more predictable and faster to test, in particular with modularity. If the goal is designing cells that grow and change in controllable ways, all experiments are tiny. Like with machine learning, new observations from the experiments generalize by improving the simulation tools, not just object level designs. And much more advanced theory of learning should enable much better sample efficiency with respect to external data.

If millionfold speedup is currently already feasible, it doesn't take hardware a... (read more)

2Gerald Monroe25d
Absolutely. This happens today, where there is only time in silicon release cycles for a few revisions. My main point with the illustrative numbers was to show how the time complexity works.

You have this million-times-faster AI: it can do 10 years of work in about 2.24 minutes (assuming a human is working 996, i.e. 72 hours a week). Even if we take the most generous possible assumptions about how long it takes to build something real and test it, then fix your mistakes, the limiting factors are 43,000 times slower than we can think.

Say we reduce our serial steps for testing and only need 2 prototypes and then the final version instead of 10. So we made it 3 times faster! The real world is still slowing us down by a factor of 14,400. Machining equipment takes time to cut an engine or nano-lathe a part, and if we are growing human organs to treat VIPs, it takes months for them to grow. Same for anything else you think of: the real world just has all these slow steps, from time for concrete to cure, paint to dry, molten metal in castings to cool, etc.

This is succinctly why FOOM is unlikely. But 5-50 years of research in 1 year is still absolutely game-changing.

Any testing can be done in simulation, as long as you have a simulator and it's good enough. A few hundreds times speedup in thinking allows very quickly writing very good specialized software for learning and simulation of all relevant things, based on theory that's substantially better. The speed of simulation might be a problem, and there's probably a need for physical experiments to train the simulation models (but not to directly debug object level engineering artifacts).

Still, in the physical world activity of an unfettered 300x speed human level AGI... (read more)

3Gerald Monroe25d
Hardware: no, this hardware would not make simulations faster. Different hardware could speed them up some, but since simulations are already done on supercomputers running at multiple GHz, the speedup would be about 1 OOM, as this is typical for going from general-purpose processors to ASICs. It would still be the bottleneck.

This lesswrong post argues pretty convincingly that simulations cannot model everything, especially for behavior relevant to nanotechnology and medicine. Assuming the physicist who wrote that lesswrong post is correct, cycles of trial and error and prototyping and experiments are unavoidable.

I also agree with the post for a different reason: real experimental data, such as human-written papers on biology or nanoscale chemistry, leave enough uncertainty to fit trucks through. The issue is that you have hand-copied fields of data, large withheld datasets because they had negative findings, needlessly vague language to describe what was done, different labs at different places with different staff and equipment, different subjects (current tech cannot simulate or build a living mockup of a human body, and there is insufficient data due to the above to do either), and so on. You have to try things, even if it's just to collect data you will use in your simulation, and 'trying stuff' is slow (mammalian cells take hours to weeks to grow complex structures, electron-beam nanolathes take hours to carve a new structure, etc.).

Throughput doesn't straightforwardly accelerate history, serial speedup does. At a serial speedup of 10x-100x, decades pass in a year. If an autonomous researcher AGI develops better speculative decoding and other improvements during this time, the speedup quickly increases once the process starts, though it might still remain modest without changing hardware or positing superintelligence, only centuries a year or something.

For neurons-to-transistors comparison, probably both hardware and algorithms would need to change to make this useful, but then the cr... (read more)

4Gerald Monroe25d
Unless the project is completely simulable (for example, go or chess), you're rate-limited by the slowest serial steps. This is just Amdahl's law. Absolutely this will help, and you also can build your prototypes in parallel or test the resulting new product in parallel, but the serial time for a test becomes limiting for the entire system.

For example, if the product is a better engine, you evaluate all data humans have ever recorded on engines, build an engine sim, and test many possibilities. But there is residual uncertainty in any sim; to resolve this you need thousands of experiments. If you can do all experiments in parallel, you still must wait the length of time for a single experiment. Accelerated lifecycle testing is where you compress several years of use into a few weeks of testing, by elevating the operating temperature and running the device under extreme load for the test period.

So say you do "1 decade" of engine design in 1 day, come up with 1000 candidate designs, then manufacture all 1000 in parallel over 1 month (casting and machining have slow steps), then spend 1 more month on accelerated lifecycle testing. Suppose you need to do this iteration loop 10 times to get to "rock solid" designs better than currently used ones. Then it took you 20 months, vs 100 years for humans to do it. A roughly 50-times speedup is enormous, but it's not millions.

I think a similar argument applies for most practical tasks. Note that tasks like "design a better aircraft" have the same requirement for testing; designing better medicine crucially does. One task that seems like an obvious one for AI R&D, designing a better AI processor, is notable because current silicon fabrication processes take months of machine time, and so are a pretty extreme example where you must wait months between iteration cycles. Also note you needed enough robotic equipment to do the above. Since robots can build robots that's not going to take long, but you have several years or
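The iteration-loop arithmetic above is essentially Amdahl's law applied to R&D: cognition speeds up a million-fold, but the physical build/test steps do not. A minimal sketch with those illustrative numbers (all of them assumptions from the comment, not measurements):

```python
# Amdahl-style estimate: design thinking is sped up a million-fold,
# but physical build/test steps run at ordinary wall-clock speed.
cognitive_speedup = 1_000_000
human_design_years_per_iter = 10   # human design effort per iteration
build_months = 1                   # manufacturing candidates in parallel
test_months = 1                    # accelerated lifecycle testing
iterations = 10                    # design/build/test cycles to "rock solid"

ai_design_months = human_design_years_per_iter * 12 / cognitive_speedup
ai_total_months = iterations * (ai_design_months + build_months + test_months)
human_total_months = iterations * human_design_years_per_iter * 12

effective_speedup = human_total_months / ai_total_months
print(f"AI wall-clock: {ai_total_months:.1f} months; "
      f"effective speedup: ~{effective_speedup:.0f}x")
```

The million-fold cognitive speedup collapses to an effective speedup on the order of tens, because the serial physical steps dominate; that is the whole point of the argument.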

The important part isn't assertions (which honestly I don't see here), it's asking the question. Like with advice, it's useless when taken as a command without argument, but as framing it's asking whether you should be doing a thing more or less than you normally do it, and that can be valuable by drawing attention to that question, even when the original advice is the opposite of what makes sense.

With discussion of potential issues of any kind, having norms that call for avoiding such discussion or for burdening it with rigor requirements makes it go away... (read more)

I think it would be proper to provide a specific prediction, so here is one: Assuming that we could somehow quantify "good done" and "cutting corners", I expect a negative correlation between these two among the organizations in EA environment.

It spits out much scarier information than a google search supplies. Much.

I see a sense in which GPT-4 is completely useless for serious programming in the hands of a non-programmer who wouldn't be capable/inclined to become a programmer without LLMs, even as it's somewhat useful for programming (especially with unfamiliar but popular libraries/tools). So the way in which a chatbot helps needs qualification.

One possible measure is how much a chatbot increases the fraction of some demographic that's capable of some achievement within some amount of time.... (read more)

2Nathan Helm-Burger1mo
Yes, I quite agree. Do you have suggestions for what a credible objective eval might consist of? What sort of test would seem convincing to you, if administered by a neutral party?
I was trying to say "cost in time/money goes down by that factor for some group".

I'd like to at least see some numbers before you declare something immoral and dangerous to discuss!

Discussing hypothetical dangers shouldn't require numbers. Discussing hypothetical dangers is probably not itself so dangerous that it must be avoided whenever there are no numbers.

2Garrett Baker1mo
This is correct in general. For this particular discussion? It may be right. Numbers may be too strong a requirement for changing my mind; at least a Fermi estimate would be nice, and any kind of evidence, even personal, supporting Viliam’s assertions will definitely be required.

A bad map that expresses the territory with great uncertainty can be confidently called a bad map; calling it a good map is clearly wrong. In that sense the shoggoth imagery reflects the quality of the map, and as it's clearly a bad map, better imagery would be misleading about the map's quality. Even if the underlying territory is lovely, this isn't known, unlike the disastrous quality of the map of the territory, whose lack of quality is known with much more confidence and in much greater detail. Here be dragons.

(This is one aspect of the meme where it ... (read more)

My point is that algorithmic improvements (in the way I defined them) are very limited, even in the long term, and that there hasn't been a lot of algorithmic improvement in this sense in the past as well. The issue is that the details of this definition matter, if you start relaxing them and start interpreting "algorithmic improvement" more informally, you become able to see more algorithmic improvement in the past (and potentially in the future).

One take is how in the past, data was often limited and carefully prepared, models didn't scale beyond all rea... (read more)

Before Chinchilla scaling, nobody was solving the relevant optimization problem. Namely, given a perplexity target, adjust all parameters including model size and geometry, sparsity, and amount of data (sampled from a fixed exceedingly large dataset) to hit the perplexity target with as few FLOPs as possible. Do this for multiple perplexities, make a perplexity-FLOP plot of optimized training runs to be able to interpolate. Given a different architecture with its own different plot, estimated improvement in these FLOPs for each fixed perplexity within some ... (read more)
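A minimal numerical sketch of this optimization problem, using the parametric loss form fitted in the Chinchilla paper (Hoffmann et al. 2022). The constants below are the published fits as I recall them; treat the whole thing as an illustrative assumption, not a definitive procedure:

```python
# Chinchilla-style parametric loss: L(N, D) = E + A/N^alpha + B/D^beta,
# with training compute approximated as C ~ 6*N*D.
# Constants are the (assumed) fitted values from Hoffmann et al. 2022.
E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

def required_data(N, target_loss):
    """Tokens D needed to hit target_loss at model size N (None if infeasible)."""
    gap = target_loss - E - A / N**alpha
    if gap <= 0:
        return None  # even unlimited data can't reach the target at this N
    return (B / gap) ** (1 / beta)

def optimal_run(target_loss):
    """Log-spaced grid search over N, minimizing training FLOPs C = 6*N*D."""
    best = None
    for i in range(200, 1400):
        N = 10 ** (i / 100)  # N from 1e2 to ~1e14 parameters
        D = required_data(N, target_loss)
        if D is None:
            continue
        C = 6 * N * D
        if best is None or C < best[0]:
            best = (C, N, D)
    return best

C, N, D = optimal_run(2.1)  # hypothetical perplexity/loss target
print(f"N={N:.2e} params, D={D:.2e} tokens, C={C:.2e} FLOPs")
```

Repeating this for several loss targets yields the loss-FLOP plot of optimized runs described above; a different architecture would get its own fitted constants and its own plot, and the horizontal gap between the plots at fixed loss is the algorithmic improvement.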

Thanks for the comment! I agree that would be a good way to more systematically measure algorithmic efficiency improvements. You won't be able to infer the effects of differences in data quality though - or are you suggesting you think those are very limited anyway?

This is another example of how matching specialized human reasoning skill seems routinely feasible with search guided by 100M scale networks trained for a task a human would spend years mastering. These tasks seem specialized, but it's plausible all breadth of human activity can be covered with a reasonable number of such areas of specialization. What's currently missing is automation of formulation and training of systems specialized in any given skill.

The often touted surprisingly good human sample efficiency might just mean that when training is set up ... (read more)

Philosophy and to some extent even decision theory are more like aspects of value content. AGIs and ASIs have the capability to explore them, if only they had the motive. Not taking away this option and not disempowering its influence doesn't seem very value-laden, so it's not pivotal to explore it in advance, even though it would help. Avoiding disempowerment is sufficient to eventually get around to industrial production of high quality philosophy. This is similar to how the first generations of powerful AIs shouldn't pursue CEV, and more to the point don't need to pursue CEV.

I think human level AGIs being pivotal in shaping ASIs is very likely if AGIs get developed in the next few years as largely the outcome of scaling, and still moderately likely overall. If that is the case, what matters is alignment of human level AGIs and the social dynamics of their deployment and their own activity. So control despite only being aligned as well as humans are (or somewhat better) might be sufficient, as one of the things AGIs might work on is improving alignment.

The point about deceptive alignment being a special case of trustworthiness ... (read more)

Agreed, and obviously that would be a lot more practicable if you knew what its trigger and secret goal were. Preventing deceptive alignment entirely would be ideal, but failing that we need reliable ways to detect it and diagnose its details: tricky to research when so far we only have model organisms of it, but doing interpretability work on those seems like an obvious first step.

It seems very weird to ascribe a generic "bad takes overall" summary to that group, given that you yourself are directly part of it.

This sentence channels influence of an evaporative cooling norm (upon observing bad takes, either leave the group or conspicuously ignore the bad takes), also places weight on acting on the basis of one's identity. (I'm guessing this is not in tune with your overall stance, but it's evidence of presence of a generator for the idea.)

The question of whether a human level AGI safety plan is workable is separate from the question of presence of ASI risk. Many AGI safety plans, not being impossibly watertight, rely on the AGI not being superintelligent, hence the distinction is crucial for the purpose of considering such plans. There is also some skepticism of it being possible to suddenly get an ASI, in which case the assumption of AGIs being approximately human level becomes implicit without getting imposed by necessity.

The plans for dealing with ASI risk are separate, they go through t... (read more)

Are AIs really so similar to aliens — something we have literally no actual experience with — but aren't similar to real physical objects that we are familiar with like LLMs and domesticated animals?

Being real or familiar has nothing to do with being similar to a given thing.

Current open source models are not themselves any kind of problem. Their availability accelerates timelines, helps with alignment along the way. If there is no moratorium, this might be net positive. If there is a moratorium, it's certainly net positive, as it's the kind of research that the moratorium is buying time for, and it doesn't shorten timelines because they are guarded by the moratorium.

It's still irreversible proliferation even when the impact is positive. The main issue is open source as an ideology that unconditionally calls for publishing all the things, and refuses to acknowledge the very unusual situations where not publishing things is better than publishing things.

On being able to predictably publish papers as a malign goal, one point is standards of publishability in existing research communities not matching what's useful to publish for this particular problem (which used to be the case more strongly a few years ago). Aiming to publish for example on LessWrong fixes the issue in that case, though you mostly won't get research grants for that. (The other point is that some things shouldn't be published at all.)

In either case, I don't see discouragement from building on existing work, it's not building arguments out... (read more)

A general point, unrelated to the use of analogies for AI risk specifically, is that a demand for only using particular forms of argument damages the ability to think or communicate, by making the excluded forms of argument less available, even in cases where they would turn out to be useful.

An observation that certain forms of argument tend to be useless or misleading should take great care to guard against turning into a norm. If using a form of argument requires justification or paying a social cost of doing something that's normally just not done, that makes it at least sl... (read more)

Even if the only acceptable forms are the only valid forms?
-3Joshua Porter1mo
You write about using forms of arguments and hampering the ability to communicate, and how that can hamper understanding. I think there are many ways of getting at a truth, but getting attached to one form of attaining it just makes it harder to attain. In this case, analogies are an example of a form of argument, I believe, so I'd disagree with what you say at the beginning about this being unrelated to AI risk analogies.

I think analogies are a great way to introduce new ideas to people who are hearing an idea for the first time. Analogies help you learn from what you already know; where they become a problem, I think, is when you get attached to the analogy and try to make it fit your argument in a way that obscures the truth. Ultimately, we are aiming to seek out truth, so it's important to see what an analogy may be trying to portray, and as you learn more about a topic, you can let go of the analogy. I think learning about anything follows this formula of emptying your cup so that it can become full once again: being able to let go of previous understandings in exchange for a more accurate one.

The blog post you link for inconveniences also makes sense, since if I am learning about a new topic, I am much more likely to continue learning if it is made initially easy, with the difficulty and complexity of the topic scaling with my understanding. If we are not to use analogies as a convenient way to get introduced to a new topic, what would be a good alternative that is somewhat simple for a novice to understand?

taking your ASI-level opponents seriously

The distinction between human level AGIs and ASIs is often crucial when discussing risks and control/alignment methods. Yet moratorium advocates often object to plans aimed at human level AGIs by pointing at ASIs, while AGI-building advocates often object to risks of ASI by pointing at plans aimed at human-level AGIs.

So more clarity on this distinction should help. The real cruxes are mostly in whether sufficient safety work can be extracted from human level AGIs before there are also ASIs to deal with, and how well the plans aimed at human level AGIs actually work for human level AGIs.

I don't see this distinction as mattering much: how many ASI paths are there which somehow never go through human-level AGI? On the flip side, every human-level AGI is an ASI risk.