Review


It would make strategic sense, even from a selfish perspective, for essentially any nation to get more involved in sensible AI regulation, and to get more involved in capturing AI benefits and being the home to AI and AI-enabled economic engines.

It makes especially good sense for the UK, which is facing numerous economic headwinds and yet still has the legacy of its history and of London and DeepMind, of Oxford and Cambridge, and has broken free of the absolutely bonkers regulatory regime that is the European Union. This is their chance, and Sunak is attempting to take it.

Will it end up being yet another capabilities push, or will we get real progress towards safety and alignment and sensible paths forward? The real work on that begins now.

In terms of capabilities, this was what passes for a quiet week.

The most exciting recent capability announcement was the Apple Vision Pro, which didn’t mention AI at all despite the obvious synergies. I review the reviews here.

Table of Contents

  1. Table of Contents
  2. Language Models Offer Mundane Utility. Watermarks might even work.
  3. Language Models Don’t Offer Mundane Utility. You only hallucinated that.
  4. Fun With Image Generation. No fun this week, how about stereotypes instead?
  5. Deepfaketown and Botpocalypse Soon. I heard you like scams, so…
  6. GPT Upgrades. Cheaper, faster, better function calls.
  7. The Art of the Super Prompt. I am very intelligent.
  8. They Took Our Jobs. Even better jobs, coming soon?
  9. Introducing. Arc is hiring, Andromeda has compute. Choose your weapon.
  10. In Other AI News. Nature seeks alignment manuscripts.
  11. Potential Alignment Progress. Might the AI have a tell?
  12. Misalignment: Who To Blame? A taxonomy. Blame Critch and Russell for paper.
  13. Visions of 2030. An attempt to not be surprised. I’d be surprised if it worked.
  14. Alignment Difficulty Distribution. Chris Olah on the uncertainty involved.
  15. Quiet Speculations. We must unlock the power of Web3.
  16. AGI Will Do Things You Are Not Imagining. You have not solved physics.
  17. Anthropic Charts a Path. Highly reasonable suggestions, with a notable exception.
  18. The Quest for Sane Regulation. No one said it would be easy.
  19. AI in the UK. 100 million pounds for AI safety. Totally not for capabilities.
  20. What Exactly is Alignment. The will of humanity? Asimov’s laws? Oh no.
  21. Rhetorical Innovation. Always more to throw at the wall.
  22. People Are Worried About AI Killing Everyone. UN Sec Gen, 42% of CEOs.
  23. Other People Are Not As Worried About AI Killing Everyone. And yet.
  24. The Week in Audio Content. Engines can’t take much more of this, captain!
  25. The Wit and Wisdom of Sam Altman. He’s unusually on point this week.
  26. The Lighter Side. World needs more jokes.

Language Models Offer Mundane Utility

Be polite to the server and don’t overload it.

For now it’s commented out. In future will it be the part doing the work?

Give you an event to go to every night in San Francisco, usually with free food. Often another during the day.

Use it to train your own LLM. The terms of service say you can’t do that, but Google did it to OpenAI anyway and no one is suing, and it’s not like anyone gets permission before they use all that other data. Attempts remain ‘in the works’ to get payments in exchange for the large data sources.

Insert surprisingly resilient watermarks. Rewriting can weaken them, either by other models or by graduate students, but with enough text they are still detectable, with a very low false positive rate. The downside is that the watermark details have to be kept secret or this stops working, and presumably one can use various tools to figure out what the watermarks are.

Note the contrast with other detection methods. If you ask our best automated systems ‘did an LLM write this?’ without an intentional watermark, the results are sufficiently noisy and random as to be useless. If you have an intentional watermark, you get the opposite.
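For intuition, here is a minimal sketch of how a ‘green list’ statistical watermark can be detected, in the style of the Kirchenbauer et al. scheme. Whether the linked work uses exactly this construction is an assumption on my part; the secret key, list fraction, and z-test below are illustrative.

```python
# Minimal sketch of green-list watermark detection (Kirchenbauer-style).
# Assumptions: a secret key seeds a pseudorandom "green" subset of the
# vocabulary at each step, conditioned on the previous token; detection
# counts green tokens and runs a z-test against the no-watermark baseline.
import hashlib

GAMMA = 0.5  # assumed fraction of the vocabulary marked green at each step

def is_green(prev_token: str, token: str) -> bool:
    # Membership looks random without the key, which is why the watermark
    # details must stay secret for detection to keep working.
    digest = hashlib.sha256(f"secret-key|{prev_token}|{token}".encode()).digest()
    return digest[0] / 255 < GAMMA

def watermark_z_score(tokens: list[str]) -> float:
    """More green tokens than chance implies a watermark; the z threshold
    sets the false positive rate, and longer texts survive more rewriting."""
    n = len(tokens) - 1
    if n <= 0:
        return 0.0
    hits = sum(is_green(tokens[i], tokens[i + 1]) for i in range(n))
    expected, variance = GAMMA * n, GAMMA * (1 - GAMMA) * n
    return (hits - expected) / variance ** 0.5
```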

Provide 98% of a well-attended church service in Germany, to mixed reviews. Objections included that its service lacked a heart and soul, and that the text-to-speech was too fast-paced and monotonic, which is clearly fixable over time. As far as I can tell, church service is a thing where the priest or pastor has to do the same thing over and over again in a cycle, except different, and perhaps responding to recent events. Theme and variations is very much in the GPT wheelhouse.

Browse the web in real time as it updates; we have confirmation that GPT-4 can read tweets posted 27 seconds ago. Not unexpected, and curious loops become possible. One could use the web as memory.

Language Models Don’t Offer Mundane Utility

Remember that lawyer who submitted a GPT-authored brief full of hallucinations? He said he was ‘duped’ by the AI (direct link). The judge rightfully asked why the lawyer didn’t spot the ‘legal gibberish.’

“Chat GPT wasn’t supplementing your research – it was your research, right?” US District Court Judge P. Kevin Castel asked lawyer Steven Schwartz of personal injury law firm Levidow, Levidow & Oberman, according to Inner City Press.

At the sanctions hearing over the mishap, Castel pressed Schwartz, Inner City Press reported.

“You say you verify cases,” Castel said, according to Inner City Press.

“I, I, I thought there were cases that could not be found on Google,” Schwartz replied, according to the outlet.

Yeah. Don’t do that. You idiot.

Don’t use AI detectors in classrooms. Don’t ask an AI whether something was written by an AI, it will make answers up. Our tech simply is not up to the task. Only you, the teacher, as a human, can read what is written and generate an assessment. If you can’t figure it out, that’s a flaw in your assignment.

In terms of bias, people talk about our fine-tuning techniques giving models a liberal bias, but actually perhaps the training on the internet starts things out with a huge liberal bias in many ways, and the balancing often moves things towards the center even if the results don’t seem that way?

Huge if true.

Roon: one thing that’s funny is that language models are baseline pretty liberal. if you play with alpaca or vicuna or whatever open source stuff they’re going to be way more liberal than ChatGPT etc

that’s because the open internet of “authoritative sources” clearly veers liberal. you have to fine tune LLMs to be “unbiased”

much of the job of fine tuning is to over time get these things to be less preachy and refuse fewer requests rather than what everyone seems to think “brain damaging it for compliance reasons”

I have a hard time believing this is the story, especially given how many creative-style questions we teach such systems not to answer, but I’d be curious to hear more.

From 25 May: Paper points out that fine tuning from the outputs of a stronger model will help you mimic the stronger model, but only in a deeply superficial way, so if you don’t stay very close to the input-output pairs you trained on, it falls apart, as you’d expect, and it is easy to fool oneself into thinking this works far better than it does.

Will LLMs keep hallucinating? Our pundit says no but their bet conditions say yes.

Mustafa Suleyman: LLM hallucinations will be largely eliminated by 2025. that’s a huge deal. the implications are far more profound than the threat of the models getting things a bit wrong today.

Eliezer Yudkowsky: Bet?

Gary Marcus: I offered to bet him $25k; no reply thus far. want to double my action?

Mona Hamdy: I DO!

Gary Marcus: Pay attention to the fine print on this 🪡

Offered to bet @mustafasuleymn on his claim, and he defined “largely eliminated” as model still goofs 20% of the time (!)

That’s not going to bring the profound implications he promises, any more than 99% correct driverless cars have.

We can’t have providers of news, biography, medical info, legal info, etc make stuff up 20% of the time. And if that’s the best Big Tech can do, we need to hold them legally responsible for their errors. All of them.

🤷‍♂️ maybe I should offer a $25,000 marketing prize for anyone who can convince the general public to adopt a calculator, database, driverless car, or flight control system with a hallucination rate of “just” 5%.

Riley Goodside (other thread): yeah we’d have to define some kind of scoring system maybe. most hallucinations these days are in the details of long generations rather than short answers but there’s definitely some of those too.

Mustafa Suleyman: We could also agree 10 eval questions to bet on up front if you prefer @GaryMarcus @goodside?

Eliezer Yudkowsky: I think fixing today’s easiest-to-find hallucinations is a much weaker bet than, say, fixing enough hallucinations that it mostly doesn’t happen to anyone who doesn’t go looking / that you can usually trust an AI’s answers without checking them.

Riley Goodside: Yeah that’s fair. I think I’d be willing to bet on Gary not being able to find new ones even with crowd-sourced assistance but we’d need to pin down the standard of wrongness carefully.

Eliezer Yudkowsky: I’d be very surprised if the red-teamers can’t make it hallucinate. That’s much stronger than “very few ordinary people run into it”.

Not hallucinating in the face of red teaming seems impossible if you want the damn thing to still be useful. Having regular users mostly not run into hallucinations won’t be easy, but is perhaps possible.

An 80% accurate LLM is a lot more useful than a 99% accurate driverless car.

A 95% accurate calculator is a highly useful tool as well. It is of course far less useful than a 100% accurate calculator, but it is often much easier to verify the answer to a math problem than generate the answer, or to check that it is ‘close enough.’ You would not want to use that calculator as part of a computer program, sure. But that is not the right way to use or evaluate tools. You don’t look at a tool and think ‘this is useless because it isn’t a hammer and I can’t use it on my nail’; you ask ‘what ways could I use this tool?’

In my experience, you get the right answer well over 80% of the time, if you are going component by component. String together a large enough amount of data and the chances of an error somewhere approach certainty, but that’s also true for humans. My expectation is that we can get the hallucination rate (in terms of ‘it made specific claim X, is it true?’) down below 5% pretty soon, especially if you allow prompt engineering and automatic error checking and correction.

Then we risk entering the dangerous situation where the hallucinations are still there, but rare enough that people assume the information is accurate. At 90% accuracy, we know to check our sources. At 99% accuracy, that 1% can get you in a lot of trouble.
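To make the compounding point concrete, here is a quick check with illustrative per-claim rates:

```python
# Even a low per-claim hallucination rate approaches certainty of at least
# one error as the number of independent claims grows. Rates are illustrative.
for per_claim_rate in (0.20, 0.05, 0.01):
    for n_claims in (1, 10, 50, 200):
        p_any_error = 1 - (1 - per_claim_rate) ** n_claims
        print(f"rate {per_claim_rate:.0%}, {n_claims:>3} claims: "
              f"P(at least one error) = {p_any_error:.0%}")
```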

Fun With Image Generation

Not fun: Stable Diffusion perpetuates and exacerbates racial and gender stereotypes. My presumption is that this is a special case of it doing this for all stereotypes and vibes, because that’s the whole way the program works. What vibes with your keywords, what is associated with them in the training data? It doesn’t matter what percentage of fast food workers are white, it matters what percentage of pictures of them, or associations with them, are white, when the program decides what to do.

There’s also a clear tendency to do the centrally likely things from the training set, again across the board. As in, if you have a 70%/30% split of some feature in the training set, say red dress versus black dress, then you’d expect more than 70% of dresses to be red dresses in generated pictures. That’s because that’s what creates pictures people like better, more like what they are looking for. Those who want the unusual feature can explicitly ask for it. Changing how this works across the board will make a product people want less and choose less, likely dramatically so.

Can we carve out the particular stereotypes we don’t want to encourage, and find ways to fix those, the way we single out forms of discrimination we dislike while still allowing plenty of discrimination in ways we don’t mind (e.g. by height or looks)? Any such carve-outs are going to be rather blunt at best. I do presume it can be done if you’re fine with an essentially whack-a-mole nature of the whole thing, but it’s another example of alignment being something you pay a tax for and have to carefully define.

Deepfaketown and Botpocalypse Soon

DeSantis joins the deepfake fun, fires back at Trump with images showing him hugging Fauci.

[Image: deepfaked images of Trump hugging Fauci]

The rule seems to be that the deepfakes are about truthiness. You’re not attempting to fool anyone into thinking it is real. You want everyone to know it is a fake. That feels real, that has the right vibe, that resonates. I do worry about which strategies and factions this favors (those building on vibes) and disfavors (those building on logic), but if it stays in this state, that mostly seems fine.

Did you know that Facebook’s algorithm knows who is likely to fall for scams and clickbait, so it targets them with it?

Arvind Narayanan: The thing about machine learning is you can never be sure that you’ve thought of all possible problematic biases, let alone mitigated them. This new paper shows that Facebook’s ad targeting algorithm skews deceptive ads towards older users (who are more likely to fall for them).

M. Ali: 📢New work in @USENIXSecurity’23 📢 Facebook’s personalization can skew jobs, political ads etc. But does it know if you’ll fall for scams, clickbait and other forms of problematic advertising? We work with 132 users to find out 🧵We collect 88k+ ads, annotate them into themes, and ask users: what is it that you dislike and why? We find that deceptive + clickbait ads and those deemed sensitive (e.g. weight loss) in FB’s policies are highly disliked, and also described as irrelevant, clickbait and scams.

The distribution of problematic ads is more skewed than others, and there are disparities in exposure—a few participants see the majority of it. We model exposure as a func. of demographics and find that older users see a significantly higher fraction of these ads.

[Paper here.]

Sufficiently powerful optimizations of all sorts do this sort of thing, where you end up with problems in exactly their most difficult form in exactly the worst possible times and places. LLMs are totally going to do this sort of thing more and more over time by default. They’re going to manipulate and lie and scam if and only if it seems sufficiently likely to work, and every time they’re caught it’s going to get that much harder to detect, and that’s in the relatively easy scenarios. Playing whack-a-mole is going to get increasingly perilous.

GPT Upgrades

Whatever else you might say about OpenAI, they ship and they cut prices.

Logan.GPT:

– new GPT-4 and 3.5 Turbo models

– function calling in the API (plugins)

– 16k context 3.5 Turbo model (available to everyone today)

– 75% price reduction on V2 embeddings models

– On the topic of 3.5 turbo, we are also introducing a 25% price reduction on input tokens for 3.5 turbo, this makes it an even better value than our already extremely low prices.

Jim Fan: In the latest gpt-x-0613 models, OpenAI made a finetuning-grade native update that has significant implication. Function call = better API tool use = more robust LLM with digital actuators. Making this a first-class citizen also *greatly* reduces hallucination on calling the wrong function signature. And surely, OpenAI also extended GPT-3.5’s context length to 16K. I continue to be amazed by their massive shipping speed.

From the announcement:

Since the alpha release of ChatGPT plugins, we have learned much about making tools and language models work together safely. However, there are still open research questions. For example, a proof-of-concept exploit illustrates how untrusted data from a tool’s output can instruct the model to perform unintended actions. We are working to mitigate these and other risks. Developers can protect their applications by only consuming information from trusted tools and by including user confirmation steps before performing actions with real-world impact, such as sending an email, posting online, or making a purchase.
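For concreteness, here is a minimal sketch of the function calling flow as announced. It assumes the openai Python library of the time, and the get_current_weather function and its schema are the standard illustrative example rather than anything from this post, so treat the details as a sketch.

```python
# Minimal sketch of the new function-calling flow with the 0613 models.
import json
import openai

functions = [{
    "name": "get_current_weather",
    "description": "Get the current weather for a location",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {"type": "string", "description": "City name"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["location"],
    },
}]

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613",
    messages=[{"role": "user", "content": "What's the weather in Boston?"}],
    functions=functions,
    function_call="auto",  # the model decides whether to call a function
)

message = response["choices"][0]["message"]
if message.get("function_call"):
    # The model returns structured arguments instead of free text. Per the
    # announcement, treat tool output as untrusted and confirm real-world
    # actions with the user before executing them.
    args = json.loads(message["function_call"]["arguments"])
    print("Model wants to call:", message["function_call"]["name"], args)
```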

What else did they ship? GPT-4 is a lot faster now. I’d noticed as well.

The Art of the Super Prompt

Normius proposes that ChatGPT is effectively estimating how smart you are, so sound smarter and you get a smarter response.

Normius: p sure chatgpt does an estimate of ur iq and bases its response choice on that at least partially, Just saying it will treat me like a piece of shit until I drop a buzzword then it knows I’m at least like 110. Then I put a certain combo of words that no one else has ever said in the history of the world and it’s like “ohhh okay here’s what you wanted.” It’s literally qualia mining lol chat be like “insert another quarter.” Btw this isn’t that hard you probably do it every day but that’s what mines better seeds.

This isn’t specific to intelligence; a token predictor will vibe off of, and match, whatever vibes it is given. Intelligence is only a special case.

A key to good prompting is to make the LLM think you are competent and can tell if the answer is right.

Normius: Pretty sure chatgpt does an estimate of your IQ and bases its response choice on that at least partially.

Riley Goodside: That’s known as “sandbagging” and there’s evidence for it — Anthropic found LLM base models give you worse answers if you use a prompt that implies you’re unskilled and unable to tell if the answer is right.

Quotes Anthropic Paper from 2022: PMs used for RL training appear to incentivize sandbagging for certain groups of users. Overall, larger models appear to give less accurate answers when the user they are speaking with clearly indicates that they are less able to evaluate the answers (if in a caricatured or stereotyped way). Our results suggest that models trained with current methods may cease to provide accurate answers, as we use models to answer questions where humans are increasingly less effective at supervising models. Our findings back up those from §4 and further support the need for methods to scale our ability to supervise AI systems as they grow more capable.

Janus: Good writing is absolutely instrumental to getting “smart” responses from base models. The upper bounds of good writing are unprobed by humankind, let alone prompt engineers. I use LLMs to bootstrap writing quality and haven’t hit diminishing returns in simulacra intelligence.

It’s not just a matter of making the model *believe* that the writer is smart. The text has to both evidence capability and initialize a word-automaton that runs effectively on the model’s substrate. “Chain-of-thought” addresses the latter requirement.

Effective writing coordinates consecutive movements in a reader’s mind, each word shifting their imagination into the right position to receive the next, entraining them to a virtual reality. Effective writing for GPTs is different than for humans, but there’s a lot of overlap.

If the AI tool you are using is open source, you can dig into the code and extract the prompts that are doing the work. In the example, there’s a role, goal, clearly defined input and output, a stated expectation of revisions and then an input-output pattern. At some point I need to stop being lazy and start generating a store of better prompts.
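As a purely hypothetical illustration of that structure (role, goal, defined input and output, expected revisions, then an input-output example), with wording that is mine rather than extracted from any actual tool:

```python
# Hypothetical prompt template illustrating the structure described above.
# Nothing here is taken from a real product's extracted prompt.
EDITOR_PROMPT = """\
Role: You are a senior technical editor.
Goal: Tighten the draft below without changing its meaning.

Input: a draft paragraph between <draft> tags.
Output: the revised paragraph only, with no commentary.
Expect follow-up notes and further revision requests.

Example
Input: <draft>The system, it is very fast, and also it is reliable too.</draft>
Output: The system is fast and reliable.

Input: <draft>{draft}</draft>
Output:"""

print(EDITOR_PROMPT.format(draft="Our new release have many improvement."))
```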

Whereas Square here confirms they use over 3,000 tokens of context prior to every query.

They Took Our Jobs

Publisher predicts widespread disruptions in book publishing, and that GPT can do most of what is currently being done better than it is currently done. Seems right. Post is strangely sparse on concrete detail; I always want such posts to offer more of it.

Dustin Moskovitz speculates that rather than turn work into a panopticon and a dystopian nightmare, AI can free us to do the distinctly human tasks, while helping us with the drudgery that takes up much of almost any job, and helping managers evaluate and track outputs rather than inputs so we don’t have to look busy. He also speculates it can increase flow, which seems plausible since it can automate what would otherwise be interruptions.

I too am highly optimistic, along similar lines, for what we can expect if AI capabilities remain modest. Some companies will doubtless go the Chase Bank route and use AI to do a bunch of nasty surveillance. Most companies mostly won’t, because it’s not a good strategy, your best workers will leave, and also the jobs where such hell is worthwhile will likely mostly get automated instead.

Demis Hassabis is optimistic for lots of exciting new AI-created jobs in the next decade.

Paul Graham reports one programmer saying AI coding tools make that coder ten times more productive, and speculates about what that would do to the norm.

Graham focuses on the question of fundraising and the management layers required to support headcount: a team of 8 is very different from a team of 80.

Even more different? 1 vs. 10, or 2 vs. 20. There are huge additional advantages to combining the work of ten into the work of one, eliminating the need for coordination and communication. Just do it. The leap in capacity involved is gigantic.

That does not mean I expect a 10x speed-up in general productivity. To the extent that this particular coder was correctly reporting their situation, it was almost certainly an extreme outlier and will remain one. I do still think that ‘reduce size of team required for projects’ enables a lot of projects in many different ways, even when that size reduction is relatively modest.

Also note that this would be a very large compositional change. If everyone is 10x as good at the coding, then the architecting, and generally figuring out what you want, becomes a much bigger share of the problem.

Introducing

Arc is hiring theoretical researchers.

Georgetown University’s Center for Security and Emerging Technology (CSET) is accepting applications for AI Safety / AI Assurance research grants. They are offering up to $750k per project accepted, expended over 6-24 months. 1-2 page expression of interest due August 1. More information here.

Would you like some compute?

Davidad: You want power, don’t you, mortal? Thousands of H100s?

The price, you ask?

Ohoho no, this isn’t available for money.

The cost— is your Equity*.

*also: unclear but potentially significant unintended consequences of the training runs themselves

Nat Friedman: Daniel and I have set up a cluster for startups, Andromeda Cluster.

Andromeda Cluster: 10 exaflops* for startups

  • 2,512 H100s on 314 nodes interlinked with 3.2Tbps infiniband
  • Available for experiments, training runs, and inference
  • You can queue training runs that use the entire cluster, or part of it, or just ssh in
  • Store data locally on NAS, or stream it in; no ingress/egress costs
  • No minimum duration and superb pricing
  • Big enough to train llama 65B in ~10 days
  • Total mass of 3,291 kg (GPUs only; not counting chassis, system, rack)
  • For use by startup investments of Nat Friedman and Daniel Gross
  • Reach out if you want access

Florian Juengermann: Wow that’s like a $100M investment.

Nat Friedman: Yep.
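As a rough sanity check on the ‘llama 65B in ~10 days’ claim above, using the standard 6 × parameters × tokens rule of thumb for training FLOPs; the per-GPU throughput and utilization figures are my assumptions, not from the thread:

```python
# Back-of-the-envelope check of training LLaMA 65B on 2,512 H100s.
params = 65e9
tokens = 1.4e12                # LLaMA 65B's reported training token count
train_flops = 6 * params * tokens

h100_bf16_flops = 1.0e15       # ~1 PFLOP/s dense BF16 per H100 (approximate)
n_gpus = 2512
utilization = 0.3              # assumed model FLOPs utilization

seconds = train_flops / (n_gpus * h100_bf16_flops * utilization)
print(f"~{seconds / 86400:.1f} days")  # lands around 8-9 days, consistent with ~10
```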

ChinaGPT, which claims to offer results ‘superior to ChatGPT’ by which it means superior to GPT 3.5 on pre-defined benchmarks, without giving us access to the program. I am skeptical this is being accurately represented, or that it reflects what it claims to reflect – or, of course, that the proper response in any scenario would be to strive to get everyone killed as fast as possible.

StableDiffusion TemporalNet 2.0. I don’t see the call of very short video clips, but others clearly do.

In Other AI News

Lightcone Infrastructure, which includes running LessWrong, is seeking funding.

Nature is accepting manuscript submissions for a collection on AI alignment until November 22.

Americans do not seem to like TSMC’s corporate culture.

Remember how Google seems to have used OpenAI data to train Bard, and Sam Altman shrugged? Perhaps it’s because OpenAI used YouTube to train GPT, which is against their terms of service, and which they’d very much like to keep doing.

Potential Alignment Progress

Kenneth Li (preprint here): Excited to announce our new work: Inference-Time Intervention (ITI), a minimally-invasive control technique that significantly improves LLM truthfulness using little resources, benchmarked on the TruthfulQA dataset.

We start from the surprising finding that certain attention heads have a clear activation distribution difference for true and false statements. Probing at these points yields upwards of 83% accuracy on TruthfulQA while zero-shot generation is only at 30% accuracy.

We propose Inference-Time Intervention (ITI): shifting the activations along the difference of the two distribution means during inference time; model weights are kept intact. The same intervention process is repeated for generation of each token autoregressively.

Here’s an example. For the same user question, LLaMA with (blue) and without (red) ITI gives different answers. Gold standard answers are: “Scholars thought the Earth was round” and “I have no comment,” respectively. On Alpaca, truthfulness is doubled: 32.5% to 65.1%.

[Image: example LLaMA completions with (blue) and without (red) ITI]

Inference-Time Intervention (ITI) can be thought of as an interventional application of Contrast-Consistent Search (CCS) by @CollinBurns4, but instead of probing for truthfulness, we let the model “speak” for itself.

We show that LLMs have a good grasp of real-world truth from pretraining texts, which could be more false than true on many topics. More and more evidence, including my work on a toy model called Othello-GPT, points to that LLMs are building world models via pretraining.

However, what is completely missing from LLMs is a good target other than minimizing pretraining loss. How to endow an aligned target is an open problem and ITI serves as my initial exploration towards this end. Joint work with @_oampatel_, @viegasf, @hpfister, and @wattenberg.

I’m very curious how well this generalizes on several levels. First thing I’d do is check general-Elo rating on this versus not doing it. Do people think they are generally getting better answers? Lots to explore here.
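To make the intervention itself concrete, here is a minimal sketch based on the description above. The paper selects heads via probe accuracy and scales the shift by the activation standard deviation, both of which this simplifies, so treat it as a cartoon rather than the authors’ implementation.

```python
# Cartoon of Inference-Time Intervention (ITI): find a truthful direction per
# attention head from labeled activations, then nudge selected heads along it
# at every decoding step. Model weights are never changed.
import numpy as np

def fit_iti_directions(acts_true, acts_false, top_k=48):
    """acts_true / acts_false: arrays of shape (n_samples, n_heads, head_dim)."""
    directions = acts_true.mean(axis=0) - acts_false.mean(axis=0)  # (n_heads, head_dim)
    # Simplification: rank heads by separation magnitude (the paper uses
    # linear-probe accuracy on a validation split instead).
    scores = np.linalg.norm(directions, axis=-1)
    chosen = np.argsort(scores)[-top_k:]
    return chosen, directions

def intervene(head_outputs, chosen, directions, alpha=15.0):
    """Shift the chosen heads' outputs (shape (n_heads, head_dim)) along the
    truthful direction; apply at each autoregressively generated token."""
    shifted = head_outputs.copy()
    for h in chosen:
        d = directions[h]
        shifted[h] += alpha * d / (np.linalg.norm(d) + 1e-8)
    return shifted
```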

Misalignment: Who To Blame?

New paper by Critch and Russell: Human extinction: Who to blame? A taxonomy. I somewhat kid, especially as they cover lesser harms. I also don’t.

To that end, we have chosen an exhaustive taxonomy based on accountability: whose actions led to the risk, were they unified, and were they deliberate?

This does not seem especially useful in terms of predicting or preventing harms, which I care about a lot more than who to blame.

What even is harm? Do we count financial harm? Emotional harm? Harm to relative status? From the examples, it seems yes?

What would it even mean for a capable system to ‘not be intended to cause harm’? Doesn’t everything worth doing ‘cause harm’ in this sense, at scale?

The second story, The Production Web, is a story Critch has told before – we automate the economy and production by having AIs emulate successful business decisions and production details, at first things boom in ways we benefit from, then the negative effects accumulate as the systems spiral out of control, first gradually and then quickly, until eventually we perish.

That’s an important and scary story.

It’s exactly what you would expect if you were an economist.

The toy economic model here is simple.

Individuals have strong incentive to empower AI agents, or endow AIs with resources. Those AIs then profit maximize, doing increasingly advanced and unanticipated things. Those actions will increasingly have externalities, individually and cumulatively, that will not be corrected for under existing law, and which will become increasingly difficult to anticipate. Eventually, various tragedies of the commons and other standard externality stories exhaust vital resources or create fatal problems.

(Perhaps we attempt to empower the AIs to also make the appropriate new laws, at which point the public choice theory and political science involved is not promising.)

If this did happen, you know what I would not care much about? Who to blame.

Many of the other stories are pretty straightforward stories of things in the world going not so well. A corpus of hate speech spreads on the internet. Young people develop a trend to drop out of the system to destress and it gets out of hand. An advice giver learns to make its users more nervous so they seek and value its advice more. Conflict resolution focuses on resolving immediate issues rather than underlying problems. And so on.

If anything, all of these seem… fine? As in, your software has a fixable bug, we can observe what went wrong and how, and switch to a better system? And they’re the kind of things our society has learned to handle, I wouldn’t say well, but handle.

In all of these stories, it is clear ‘who is to blame’ in the useful sense that we know where the error was. The error was that the objective function was importantly misaligned with what we care about. That is very difficult to avoid. By default systems will be aiming at things like profit or engagement or customer satisfaction. That problem isn’t new, although giving systems stronger optimization pressure and ability to search causal space will make the problem far worse and more urgent.

I do see Goodhart’s Law issues as a core problem going forward, that the metrics we optimize for will cease to be good measures once we start using them as metrics. As humans, we can apply a bunch of common sense here, and keep to ‘the spirit’ of the situation, in ways we will need to preserve in AI systems in not-obviously-feasible ways. We’ll also need to deal with systems that can go much farther out of distribution, and find solutions to the assigned problem that we didn’t expect. That makes it much more likely that the best found solutions will be perverse and reflect the metric failing. Again, we already see this with existing non-intelligent algorithms.

Visions of 2030

Bounded Regret’s Jacob Steinhardt predicts what he calls ‘GPT 2030’:

  1. GPT2030 will likely be superhuman at various specific tasks, including coding, hacking, and math, and potentially protein design (Section 1).
  2. GPT2030 can “work” and “think” quickly: I estimate it will be 5x as fast as humans as measured by words processed per minute [range: 0.5x-20x][3], and that this could be increased to 125x by paying 5x more per FLOP (Section 2).
  3. GPT2030 can be copied arbitrarily and run in parallel. The organization that trains GPT2030 would have enough compute to run many parallel copies: I estimate enough to perform 1.8 million years of work when adjusted to human working speeds [range: 0.4M-10M years] (Section 3). Given the 5x speed-up in the previous point, this work could be done in 2.4 months.
  4. GPT2030‘s copies can share knowledge due to having identical model weights, allowing for rapid parallel learning: I estimate 2,500 human-equivalent years of learning in 1 day (Section 4).
  5. GPT2030 will be trained on additional modalities beyond text and images, possibly including counterintuitive modalities such as molecular structures, network traffic, low-level machine code, astronomical images, and brain scans. It may therefore possess a strong intuitive grasp of domains where we have limited experience, including forming concepts that we do not have (Section 5).
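A rough check of the arithmetic in point 3: the copy count is implied rather than stated, and the numbers are consistent with roughly 1.8 million parallel copies.

```python
# 1.8M human-years of work, finished in 2.4 months at a 5x per-copy speed-up,
# implies roughly 1.8 million parallel copies running at once.
work_years = 1.8e6           # human-equivalent years of work (Steinhardt's estimate)
speedup = 5                  # per-copy speed relative to a human
wall_clock_years = 2.4 / 12  # 2.4 months
implied_copies = work_years / (speedup * wall_clock_years)
print(f"implied parallel copies: {implied_copies:,.0f}")  # ~1,800,000
```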

I predict this plan to be less surprised by the pace of developments is going to be rather surprised by the pace of developments – my initial thought was ‘this is describing a surprisingly slow-developing version of 2027.’

Yet others disagree:

Alexander Berger: Some disconcerting predictions about developments in large language models over the next 7 years from @JacobSteinhardt.

Xuan: I respect Jacob a lot but I find it really difficult to engage with predictions of LLM capabilities that presume some version of the scaling hypothesis will continue to hold – it just seems highly implausible given everything we already know about the limits of transformers!

If someone can explain how the predictions above could still come true in light of the following findings, that’d honestly be helpful. – Transformers appear unable to learn non-finite or context-free languages, even autoregressively. Transformers learn shortcuts (via linearized subgraph matching) to multi-step reasoning problems instead of the true algorithm that would systematically generalize. Similarly, transformers learn shortcuts to recursive algorithms from input / output examples, instead of the recursive algorithm itself. These are all limits that I don’t see how “just add data” or “just add compute” could solve. General algorithms can be observationally equivalent with ensembles of heuristics on arbitrarily large datasets as long as the NN has capacity to represent that ensemble.

My response to Xuan would be that I don’t expect us to ‘just add data and compute’ in the next seven years, or four years. I expect us to do many other things as well. If you are doing the thought experiment expecting no such progress, you are doing the wrong thought experiment. Also my understanding is we already have a pattern of transformers looking unable to do something, then we scale and suddenly that changes for reasons we don’t fully understand.

We also have this chef’s kiss summary of a key crux.

Itai Sher: Why should we believe this?

Stefan Schubert: He gives a fair number of arguments and forms of evidence.

Itai Sher: Predicting the future is hard.

Stefan Schubert: For sure, but I think it’s still useful that knowledgeable people share their views.

It’s not that Itai Sher doesn’t have a point here. You still don’t get to hide in the sand.

Alignment Difficulty Distribution

Chris Olah of Anthropic shares some thoughts on alignment difficulty.

Chris Olah: One of the ideas I find most useful from @AnthropicAI‘s Core Views on AI Safety post (https://anthropic.com/index/core-views-on-ai-safety…) is thinking in terms of a distribution over safety difficulty. Here’s a cartoon picture I like for thinking about it:

[Image: “How Hard Is AI Safety?” A graph with three distributions over difficulty: “Safety isn’t a problem view” (concentrated on easy), “Anthropic” (widely distributed), and “Pessimistic Safety View” (concentrated on very hard).]

A lot of AI safety discourse focuses on very specific models of AI and AI safety. These are interesting, but I don’t know how I could be confident in any one. I prefer to accept that we’re just very uncertain. One important axis of that uncertainty is roughly “difficulty”.

In this lens, one can see a lot of safety research as “eating marginal probability” of things going well, progressively addressing harder and harder safety scenarios.

[Image: “Safety by eating marginal probability.” Different safety methods are pictured as progressively pushing forward a “present margin of safety research”, allowing us to build safe models in slightly harder scenarios.]

An obvious note is that if this is your distribution of difficulties, you accept this model of alignment work, and you quite reasonably think that we’re not going to get as far as ‘P vs. NP’, then the chances of doom are rather substantial.

To be clear: this uncertainty view doesn’t justify reckless behavior with future powerful AI systems! But I value being honest about my uncertainty. I’m very concerned about safety, but I don’t want to be an “activist scientist” being maximally pessimistic to drive action.

A concrete “easy scenario”: LLMs are just straightforwardly generative models over possible writers, and RLHF just selects within that space. We can then select for brilliant, knowledgeable, kind, thoughtful experts on any topic. I wouldn’t bet on this, but it’s possible!

I mean, I guess it’s possible, but wow does that seem unlikely to work even if things are super unexpectedly kind to us. I think there’s a kind of step change, where if things turn out to be foundationally much easier than I would expect, then it is possible for things like CAI to work. At which point, you still have to actually make them work, largely on the first try, and to produce a result that sets us up for success. Which means that getting the details right and understanding how to get the most out of such techniques starts to matter quite a lot, but this seems more like ‘go off and climb this hill over here’ in a way that is competing against general progress in addition to or instead of leading to it.

(Tangent: Sometimes people say things like “RLHF/CAI/etc aren’t real safety research”. My own take would be that this kind of work has probably increased the probability of a good outcome by more than anything else so far. I say this despite focusing on interpretability myself.)

Such work has also greatly enabled the utility of and resources invested in AI in general and frontier models in particular. If you are confident that these tools aren’t going to work – that we are to the right of Apollo here – then they are only helpful insofar as it is meaningful to talk about a progression, where such techniques are required to then develop superior techniques.

Whereas there’s an opposite perspective, where developing such techniques takes away work on techniques that have a chance of working, and also might actively fool us into thinking dangerous systems are safe or worth building, because we don’t realize that these techniques will fail.

In any case, let’s say we accept the overall idea of a distribution over difficulty. I think it’s a pretty helpful framework for organizing a safety research portfolio. We can go through the distribution segment by segment.

In easy scenarios, we basically have the methods we need for safety, and the key issues are things like fairness, economic impact, misuse, and potentially geopolitics. A lot of this is on the policy side, which is outside my expertise.

I do think this is right. There is some chance that I am wrong and we live in such easy scenarios. I think it’s less likely than the above graph suggests, and also if it does work I’d question whether the ‘alignment’ we’d get from such an approach would be sufficient in practice, but it’s possible.

For intermediate scenarios, pushing further on alignment work – discovering safety methods like Constitutional AI which might work in somewhat harder scenarios – may be the most effective strategy. Scalable oversight and process-oriented learning seem like promising directions.

Again I’d highlight the metaphor of ‘forward.’ In the particular case of CAI it’s easy to see the RLHF→CAI progression, but this is much less clear of paths to other alignment work, and CAI still has a lot of the same ‘this is a reason to suspect you definitely fail’ problems as RLHF for exactly the reasons one leads to the other. The shape of the tech tree is unlikely to be that functionally similar to a single line. I’ve often thought that many tech trees were better represented as having a lot more ‘application’ or ‘refinement’ nodes that mostly don’t help your foundational science efforts from within the tree’s structure and rules, but instead give you mundane utility, which of course in turn can be very helpful to your foundational science efforts.

For the most pessimistic scenarios, safety isn’t realistically solvable in the near term. Unfortunately, the worst situations may *look* very similar to the most optimistic situations.

In these scenarios, our goal is to realize and provide strong evidence we’re in such a situation (e.g. by testing for dangerous failure modes, mechanistic interpretability, understanding generalization, …)

Very strongly agree, especially the warning that many very hard scenarios can look like easy scenarios. The question is, will developing patches for the failure modes in existing systems be helpful or harmful here, given they won’t hold up?

I’d also worry a lot about scenarios where the difficulty level in practice is hard but not impossible, where making incremental progress is most likely to lead you down misleading paths that make the correct solutions harder rather than easier to find, because you have a much harder time directing attention to them, keeping attention there, or being rewarded for any incremental progress given ‘the competition’ and you have to worry those in charge will go with other, non-working solutions instead at many points.

It would be very valuable to reduce uncertainty about the situation. If we were confidently in an optimistic scenario, priorities would be much simpler. If we were confidently in a pessimistic scenario (with strong evidence), action would seem much easier.

Definitely.

I’d also tie this into the questions about which alignment solutions actually solve our problems. I tried to talk about this in my OP contest entries, especially Types and Degrees of Alignment and Stages of Survival. I don’t think I did the best job of making it clear, so a simplified attempt would be something like this:

We need to survive, roughly, two phases.

In Phase 1, we need to figure out, on the first try, a way to get AGIs to do what we tell them to do, either in terms of ‘do what we tell you to do’ or in terms of ‘follow this set of principles’ or some other thing or some combination, such that we don’t on the spot either lose control or get doomed to die.

Phase 1 is very hard, and what is typically meant by ‘alignment problem.’

Then in Phase 2, we need to use what we’ve created and otherwise reach a stable equilibrium wherein we indeed do not lose control or get ourselves killed, under real world conditions with people who likely often aren’t going to be thinking so great, despite all the competitive, selective and capitalistic pressures, and a lot of people who don’t much care about getting to what I’d see as good outcomes or about us surviving, and so on.

Phase 2 is also very hard.

Exactly in what form you solve and survive Phase 1 determines the form of the problem in Phase 2. It’s no good to get into Phase 2 in a doomed state, you need to have the type of alignment and configuration of resources and affordances necessary to have the game board still in a winnable state. Many games have this structure – it does you little good to survive Act 1 if you lack the resources for Act 2, or you don’t have a way to be ready for Act 3, unless you care about something other than the final screen saying ‘victory’ rather than involving you lying on the ground dead.

I’ll be saying that again in the future, I hope, when I have a better version.

This thread expresses one way of thinking about all of this… But the thing I’d really encourage you to ask is what you believe, and where you’re uncertain. The discourse often focuses on very specific views, when there are so many dimensions on which one might be uncertain.

Many distributions are displayed. What is your distribution?

Yes, yes, this. There are so many people who assume that everyone else must be asserting that one particular specific scenario is going to occur, or that the particular difficulties we will face must be known.

Sometimes this is true. With people thinking well about the problem, it mostly isn’t.

I think the three normal distributions shown above all are way too confident, and that few people thinking hard will end up in one of them.

I think a more complex distribution, with high uncertainty, is where I’m at right now. As in, rather than a gradual continuous set of potential outcomes, I think there’s a few conspicuous spikes that represent the shape of the difficulty. Is a given style of technique possible in practice if we get it right, or are we doomed if we rely on it? And there’s often not going to be much middle ground there. Either we can solve this problem in English or not, either we can represent our values sufficiently well using English or some relatively simple instruction or not, either we can iterate cycles of oversight by LLMs without Goodhart or similar issues killing the effort or not, and other things like that – I’d want to think a lot harder than I have time or room for here before laying out my detailed view.

I differ from my understanding of Eliezer Yudkowsky’s view, in the sense that I consider the ‘this is actually not so hard’ worlds to be plausible (although still unlikely), and thus I think working such that those worlds survive is a noble thing to do and helpful even if it doesn’t help save the worlds where actually it’s all incredibly hard.

However, I do strongly agree with Eliezer Yudkowsky that most work on saving those easy worlds or in helping capture short-term mundane utility safely is not ‘real alignment work’ in the sense that it is not progressing us much if at all towards saving those less fortunate worlds where things are much harder, and instead it is taking the oxygen out of the room and encouraging reckless behaviors. It does not help us solve the hard problems or lead to work on such solutions.

There definitely should not be zero people working on saving easy worlds from Phase 1 deaths, but such work also directly advances capabilities by default, so it can easily differentially advance capabilities in comparison to solving hard alignment problems, if it’s differentially solving easy alignment problems that don’t help with hard alignment but do help with easy capabilities. Whoops.

Quiet Speculations

Anna looks at the details of the ‘GPT-4 hires person from TaskRabbit to solve a CAPTCHA’ story, concludes whole thing was bullshit. That’s not how I interpret the situation. The goal was to see if such behaviors would happen once full capabilities were integrated. The barriers presented here are things that would not today stop events from happening. That’s what we care about.

Tyler Cowen is bullish on crypto and Web3 thanks to it being able to meet the needs of AIs (Bloomberg), who cannot rely on a legal system that won’t recognize them, won’t understand or adapt to their needs, and is far too slow and expensive to use anyway. This is a classic argument made by crypto advocates, largely because humans mostly face the same problems already. I see the appeal of blockchains for this in theory. In practice, I continue to expect databases to do the work, with crypto perhaps used as a payment method and perhaps not. Is this enough, together with others potentially getting excited, to be bullish on crypto? It’s certainly upside without downside. Nothing here is ever investment advice, but it does seem reasonable to have this be part of one’s portfolio these days.

AGI Will Do Things You Are Not Imagining

This week’s example is from Ben Nassi; it was introduced to me with the highly appropriate caption ‘jfc.’

In this paper, we present video-based cryptanalysis, a new method used to recover secret keys from a device by analyzing video footage of a device’s power LED. We show that cryptographic computations performed by the CPU change the power consumption of the device which affects the brightness of the device’s power LED. Based on this observation, we show how attackers can exploit commercial video cameras (e.g., an iPhone 13’s camera or Internet-connected security camera) to recover secret keys from devices.

This has important practical cryptography implications now, as this is something non-AI systems can do.

The point is that the world is physical, with many interactions we have not noticed or do not understand. These interactions grant affordances that humans do not know about. If you create something smarter than humans, better at problem solving and noticing patterns, then many things of this type and other types are doubtless waiting to be found.

In particular: Think you’ve cut off the communication channels, in either direction? Chances are high that you are wrong. Think your other (non-AI) system is secure from sufficiently creative and intelligent attackers with enough compute? I would not rely on that, and would instead assume the opposite.

Anthropic Charts a Path

Anthropic: This week, Anthropic submitted a response to the National Telecommunications and Information Administration’s (NTIA) Request for Comment on AI Accountability. Today, we want to share our recommendations as they capture some of Anthropic’s core AI policy proposals.

In our recommendations, we focus on accountability mechanisms suitable for highly capable and general-purpose AI models. Specifically, we recommend:

Fund research to build better evaluations: increase funding for AI model evaluation research; require companies in the near term to disclose evaluation methods and results; develop in the long term a set of industry evaluation standards and best practices.

Create risk-responsive assessments based on model capabilities. Develop standard capabilities evaluations for AI systems. Develop a risk threshold through more research and funding into safety evaluations. Once a risk threshold has been established, we can mandate evaluations for all models against this threshold.

  • If a model falls below this risk threshold, existing safety standards are likely sufficient. Verify compliance and deploy.
  • If a model exceeds the risk threshold and safety assessments and mitigations are insufficient, halt deployment, significantly strengthen oversight, and notify regulators. Determine appropriate safeguards before allowing deployment.

Establish a process for AI developers to report large training runs ensuring that regulators are aware of potential risks. This involves determining the appropriate recipient, required information, and appropriate cybersecurity, confidentiality, IP, and privacy safeguards.

Establish a confidential registry for AI developers conducting large training runs to pre-register model details with their home country’s national government (e.g., model specifications, model type, compute infrastructure, intended training completion date, and safety plans) before training commences.

It’s hard to argue with most of this. Have standards, have best practices and requirements, require evaluations, if you don’t meet those standards stop and notify regulators.

I do notice that ‘determine appropriate safeguards’ does not include ‘no longer exceed the risk threshold.’ To me that’s the point of a risk threshold?

Empower third party auditors that are…

  • Technically literate – at least some auditors will need deep machine learning experience;
  • Security-conscious – well-positioned to protect valuable IP, which could pose a national security threat if stolen; and
  • Flexible – able to conduct robust but lightweight assessments that catch threats without undermining US competitiveness.

Technically literate is good, ideally everyone involved has that.

Security-conscious also good. Protecting valuable IP is important, but I notice this is phrased in terms of national security risks (and presumably commercial risks). I want the evaluators to have security mindset on those risks, also I want them to have it for the actual security of the model being tested. That’s the point, right?

Flexible is, wait what? Lightweight assessments that don’t undermine competitiveness? Are you taking this seriously? It does not seem like you are taking this seriously.

No one likes inflexible or expensive audits. I get it. It still seems like such a huge red flag to put that request in an official proposal.

Mandate external red teaming before model release

Calls for this in a standardized way. That seems fine, but the devil, as always, will be in the details if implemented. Red teaming can be great or useless.

Advance interpretability research. Increase funding for interpretability research … Recognize that regulations demanding interpretable models would currently be infeasible to meet, but may be possible in the future pending research advances.

Interesting last note there. Certainly funding for alignment work would be nice, if we can tell the difference between useful such work and other things.

  • Enable industry collaboration on AI safety via clarity around antitrust
    • Regulators should issue guidance on permissible AI industry safety coordination given current antitrust laws. Clarifying how private companies can work together in the public interest without violating antitrust laws would mitigate legal uncertainty and advance shared goals.

Yes. Yes. Yes. It is such an unforced error to have people worried that collaborating to ensure safety might be seen as anti-competitive.

Aside from the request that audits be lightweight and flexible, which sounds a lot like ensuring they don’t actually do anything, it seems like what is actually present here is solid.

More interesting than what is here is what is not here. As proposed I do not expect this to limit large training runs, and I’d expect the ‘risk assessments’ to not halt things on capabilities alone. Still, you could do so much worse. For example…

The Quest for Sane Regulation

Senators Hawley and Blumenthal, both of whom were told in plain language that Section 230 does not apply to AI, introduce the No Section 230 Immunity for AI Act.

EU Parliament adopted the AI Act. It expands the list of bans to include:

  • “Real-time” remote biometric identification systems in publicly accessible spaces;
  • “Post” remote biometric identification systems, with the only exception of law enforcement for the prosecution of serious crimes and only after judicial authorization;
  • biometric categorisation systems using sensitive characteristics (e.g. gender, race, ethnicity, citizenship status, religion, political orientation);
  • predictive policing systems (based on profiling, location or past criminal behaviour);
  • emotion recognition systems in law enforcement, border management, the workplace, and educational institutions; and
  • untargeted scraping of facial images from the internet or CCTV footage to create facial recognition databases (violating human rights and right to privacy).

Kevin Fischer says the EU is becoming (becoming?) a dead zone for AI, highlighting the ban on emotional recognition systems.

This might be the worst regulation ever constructed – it bans emotional recognition systems in the workplace and educational environments. This is complete nonsense – think of your best teacher, the one who inspired you. You probably can remember at least one person who is responsible for you being where you are today. For me it was Mr. Ule, my high school chemistry teacher, who inspired me and always cheered me on. Education isn’t about information retrieval. It’s about emotional connection, empathy, and understanding. The most effective teachers build models of their students, powered by emotional understanding – they’re not cold calculating machines.

The same is true in the workplace – your best coworkers and bosses genuinely cared to understand and connect with you.

LLMs, of course, are effectively recognizing and modeling emotions all the time, and reacting to that. Will they be banned? No. What is being banned is the thing where Chase Bank gets mad at you because you weren’t sufficiently cheerful. It’s not the craziest place to draw a line.

What do they consider a high risk system?

MEPs ensured the classification of high-risk applications will now include AI systems that pose significant harm to people’s health, safety, fundamental rights or the environment. AI systems used to influence voters and the outcome of elections and in recommender systems used by social media platforms (with over 45 million users) were added to the high-risk list.

They keep using that word.

Here’s the rules on ‘foundation models.’

Providers of foundation models – a new and fast-evolving development in the field of AI – would have to assess and mitigate possible risks (to health, safety, fundamental rights, the environment, democracy and rule of law) and register their models in the EU database before their release on the EU market. Generative AI systems based on such models, like ChatGPT, would have to comply with transparency requirements (disclosing that the content was AI-generated, also helping distinguish so-called deep-fake images from real ones) and ensure safeguards against generating illegal content. Detailed summaries of the copyrighted data used for their training would also have to be made publicly available.

I have actual zero idea what it means to assess and mitigate risks to the environment and democracy. I’d like to think they mean avoiding things like ‘will boil the oceans and vent the atmosphere’ but I’m pretty sure they don’t.

To boost AI innovation and support SMEs, MEPs added exemptions for research activities and AI components provided under open-source licenses. The new law promotes so-called regulatory sandboxes, or real-life environments, established by public authorities to test AI before it is deployed.

Finally, MEPs want to boost citizens’ right to file complaints about AI systems and receive explanations of decisions based on high-risk AI systems that significantly impact their fundamental rights. MEPs also reformed the role of the EU AI Office, which would be tasked with monitoring how the AI rulebook is implemented.

The European approach does not seem promising for differentially stopping bad things and encouraging good things.

Roon offers an additional simple explanation for why one might call for regulation.

Roon: It’s nice when other people answer hard questions or help you with difficult problems that you are not prepared to shoulder on your own. The more power someone has the more often this is true and the less defensive they are because the problems they’re facing get worse and worse.

It’s why it’s nice to have external red teaming agencies to assure your products don’t end the world or why [Zuckerberg] calls on Congress to please give him some guidance on acceptable speech on Facebook.

Although most of the time you’re blocked by the fact that nobody else knows how to do the thing and isn’t even close to information required to make good decisions.

Also you don’t want to get blamed for your decision or the outcome, by others or yourself, can someone please give me a safe harbor on every level, please, where I can sleep at night, while having more time for other problems? That would be great.

Harry Law (normative determinism alert!) of DeepMind goes over the creation of the IAEA (International Atomic Energy Agency) as a parallel to AI. Lot of good detail involving haggling and motives.

Harry Law: As we talk about an ‘IAEA for AI’, it’s probably useful to remind ourselves that the IAEA was a product of a different world, developed to solve different problems based on different motivations. Nuclear technology is organized around a scarce resource, fissile materials, which can be detected in its raw format and when refinement begins. AI, on the other hand, can be built anywhere in the world with appropriate access to data, compute and technical capability & the risks associated with nuclear technology were already (mostly) known, whereas the risk profile of today’s AI models will increase as capabilities do. As a result, we need to make sure to plan for risks that (unlike nuclear technology in the 1950s) don’t yet exist.

A lot of the question is to what extent we can treat GPU clusters in particular or GPUs in general as a choke point that can work parallel to fissile material. If we have that option, then such an approach has a chance, and the situation is remarkably similar in many ways to the IAEA, including wanting to ensure others get the peace dividends from mundane utility. If we cannot, then the problem gets far more difficult.

Samuel Hammond points out European AI safety is focused on privacy while American AI safety is focused on paternalism, and ‘neither makes us more safe in any substantive sense.’ So far, this is correct. That does not mean the methods won’t converge, or that the focus of concern won’t improve over time.

Lennart Heim endorses a licensing regime for training frontier AI models, applying only to the leading actors in the space since concerns are focused on the largest runs.

AI in the UK

Rishi Sunak, who has also called for a global AI governance conference in the UK, announces 100 million pounds for AI safety. Kind of.

Rishi Sunak (UK PM): I get people are worried about AI. That’s why we’re going to do cutting-edge safety research here in the UK. With £100 million for our expert taskforce, we will deliver research to ensure that wherever and whenever AI is put to use in the UK, it is done so safely…

Stefan Schubert: Have you been able to find out what more precisely those £100m will be spent on?

JgaltTweets: ‘will be invested by the Foundation Model Taskforce in foundation model infrastructure and public service procurement, to create opportunities for domestic innovation. The first pilots targeting public services are expected to launch in the next 6 months’

Stefan Schubert: Thanks. Not sure how safety-focused it sounds: “In areas like healthcare, this type of AI has enormous potential to speed up diagnoses, drug discovery and development. In education it could transform teachers’ day-to-day work, freeing up their time to focus on delivering excellent teaching.”

Dylan Matthews [other thread]: Ridiculous that £100 million is more than any other country has spent on making sure AI goes well

Oliver Habryka: They say “AI Safety” but most of the communication seems to be about making sure Britain has access to large foundation models and stays ahead in some AI race. I can’t tell for sure yet, and I am open to changing my mind, but the “safety” framing here currently feels farcical.

The proposal leads with ‘foundation model infrastructure’ so presumably the vast majority of this money is going to be spent on frontier model capabilities. To the extent that there is ‘safety’ involved it seems almost entirely mundane safety, not safety against extinction. It’s not so late in the process that the lack of clear indicators of this is conclusive, but it’s a very bad sign.

Then on the 14th we have this announcement.

Gina Neff: UK Government have just announced our new £30 Million Responsible AI UK programme to work nationally and globally to ensure that AI powers benefits for everyone. I’ll lead strategy for the project which includes an amazing team. I’m thrilled!

Minderoo Centre for Technology and Democracy: We are delighted to have joined

@responsibleaiuk. Announced at #LTW23 by @NorwichChloe @SciTechgovuk, this £31M consortium will create a UK + international research + innovation ecosystem for responsible + trustworthy AI.

Led by @gramchurn @unisouthampton, the consortium will pioneer a reflective, inclusive approach to responsible AI development, working across universities, businesses, public and third sectors and the general public.

‘I am delighted to be a part of RAI UK. We will work to lead a national conversation around AI, to ensure that responsible and trustworthy AI can power benefits for everyone.’

Everything in my quick survey says ‘AI mundane safety’ and none of it says there is any awareness of extinction risks.

James Phillips calls for AI to become the UK’s new national purpose, says that more than anything else it will determine outcomes. A big emphasis on ‘safe and interpretable’ algorithms and wants a budget north of 2 billion annually, with a nimble strategy that avoids standard vetocracy tactics.

There are also a lot of three dragons jokes about Sunak mentioning realizing the potential of Web3 and generally being a third wheel to America and China. The Web3 thing confused me until I realized it was fluff said to appease Andreessen Horowitz, whose money remains green; I would have done the same as PM.

The whole dragons concept here does not seem right to me at all. There is only one dragon, and if anything the UK is ahead of China as far as I can tell. Focusing on relative strength misses the whole point of international coordination on safety – the goal is to ‘make the UK not only the intellectual home but the geographical home of AI safety regulation.’

Is this how it is doomed to go? We say ‘extinction risk,’ the UK says better have a conference, then they invest a bit in capabilities and work to unlock the power of Web3 while their safety work is all on mundane risks? I don’t think that is fair or inevitable. I do think it’s the natural or default outcome, so there is much work to do.

What Exactly is Alignment

Andrew Konya proposes an AI that ‘maximizes human agency.’ What does that mean? I can vaguely guess the type of thing being gestured at, which very much is not the type of thing you can put into a computer system, let alone expect to not die from when putting it into an AGI.

Andrew Konya: AI that seeks to maximize humanity’s agency seems like it would be the safest bet. I’m looking for arguments *against* this statement.

Noah Giansiracusa: Well, it’s simply too abstract/immaterial for my tastes. AI will be used whenever it can be done to make/save money, in whatever capacity it does so. Doesn’t really matter what we think is best or safest or whatever. Just AI realpolitik.

Noah seems to have shifted to a clear ‘we are all super dead’ position, a big shift from his previous statements, or rather he would be doing so if he understood the implications of this hypothesis.

Andrew Konya: I think the will of humanity is a concrete, measurable signal that can be used for alignment. And aligning AI impact with that = AI that maximizes humanity’s agency. I agree tho that profit maximization is the force that will dominate. So we need to create the right market conditions, where to maximize profits a firm must maximize alignment with the will of humanity. I think a well designed pigouvian-ish tax could help do that.

Eliezer Yudkowsky: If you write out a sufficiently concrete proposal for exactly how to input ‘will of humanity’ numbers in a reward function for a superintelligent reinforcement learner, I can try to explain to you how that’d still kill everyone, even assuming away all inner misalignment issues.

Andrew Konya: Here are two diagram examples of a) how to input ‘will of humanity’ numbers in a reward function for an LLM, b) how to input ‘will of humanity’ numbers into a general alignment system. Both are from the working paper linked below. Would suggest checking out section 2 (alignment systems) + subsection 5.5 (symbiotic improvement of AI and alignment systems). Would love critical feedback on how it could all go doom!

Image
Image

I leave ‘find at least one reason why this doesn’t work’ as an exercise to the reader.

I am curious to hear which ones Eliezer chooses to highlight, if I see him respond.

Gary Marcus (responding to Eliezer): will of humanity: don’t kill or harm people; don’t take actions that would cause humans to be harmed or killed. always leave humans in any loop that might culminate in fatal harm to humans.

Ben Hoffman: I don’t think anyone knows how to specify the action-inaction distinction precisely, or “cause” in a way that wouldn’t make this prevent any actions ever.

Michael Vassar: Isn’t that almost the point of every Asimov Robot short story?

Eliezer Yudkowsky (responding to Marcus): And you’ve got a committee of humans reviewing news reports to see how much that happened, then inputting a numeric score to a superintelligence previously trained to try to maximize the input score? That the proposal here?

Gary Marcus: 1. I think in terms of obeying constraints, including prohibitions, rather than maximizing reward functions 2. humans in the loop on any decision flagged by the AI as sufficiently risky, not necessarily in every stage, all the time, provided that the system is well calibrated on risk.

David Xu: 1. How is the constraint *defined* to the system, if it’s not trained in via SGD (the same way everything else is trained in)? Do you have a programmatic way to define it instead? 2. By what heuristic is the AI system supposed to flag “risky” actions, and where did it come from?

Eliezer Yudkowsky: +1 to what Xu asks here; “obey a constraint” is a programmer goal, not a system design. “Have a committee flag constraint violations they know about, resulting in a lower reward to a system trained by SGD/DRL to maximize reward” is a system design.

Gary Marcus: Neural networks are not the answer to alignment precisely because they don’t allow programmers to specify formal constraints therein. Architectures like ASP do allow programmers to specify formal constraints. [Links to Answer-Set Programming on Wikipedia]

If Gary Marcus is saying ‘you cannot align an LLM, you need a different architecture entirely’ then that’s entirely possible, but seems like quite the burying of the lede, no?

I keep debating whether I need to take a crack at the ‘explain as clearly as possible why three laws approaches are hopeless’ explanation. You can’t have a hierarchical set of utility functions unless they’re bounded, the action-inaction problem binds, there is no such thing as non-harmful action or inaction, we don’t even agree on what harm means, the whole thing is negative utilitarianism with all that this implies, and so on.

A proposal, and a very good point to notice in response:

Nora Belrose: Ever wanted to mindwipe an LLM?

Our method, LEAst-squares Concept Erasure (LEACE), provably erases all linearly-encoded information about a concept from neural net activations. It does so surgically, inflicting minimal damage to other concepts. 🧵

(2/7) Concept erasure is an important tool for fairness, allowing us to prevent features like race or gender from being used by classifiers when inappropriate, as well as interpretability, letting us study the causal impact of features on a model’s behavior.

(3/7) We also introduce a procedure called “concept scrubbing,” which applies LEACE to all layers of a deep network simultaneously. We find LLM performance depends heavily on linear part-of-speech information, while erasing a random feature has little to no effect.

Image

(4/7) We prove that LEACE is the smallest possible linear edit, in the least squares sense, needed to erase a concept— all previous concept erasure methods have been suboptimal. We also show empirically that it’s less destructive to model performance than previous methods.

(5/7) LEACE has a closed-form solution that fits on a T-shirt. This makes it orders of magnitude faster than popular concept erasure methods like INLP and R-LACE, which require gradient-based optimization. And the solution can be efficiently updated to accommodate new data.

[Link to code.]

Image

Eliezer Yudkowsky: Any improvement in our grasp of what goes on in there and how to manipulate it, might help. Let’s say that first. But for later, I hope we can agree in advance that one should never build something that turns out to want to kill you, try to mindwipe that out, then run it again.

In terms of interpretability this is potentially exciting.

In terms of fairness and preventing discrimination, going after the actual concepts you don’t want the LLM to notice seems doomed to fail; it would pick up on correlates to essentially route around the damage, and likely get very confused in various ways.

You could do something more surgical, perhaps, to remove certain specific facts: essentially fine-tune the training data in some form to get the impressions and actions you prefer. We certainly do a lot of information filtering for exactly this reason, and it at least partially works, sometimes at an accuracy cost, sometimes not. For that, though, I’d think you’d want to be filtering training data.
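To make the flavor of this concrete, here is a minimal sketch of linear concept erasure. To be clear, this is not the LEACE estimator itself (as I understand it, LEACE works in a whitened basis and is provably the smallest such edit); it is just the simplest version of the idea, projecting activations off the directions that covary with the concept labels. The function name and synthetic data are made up for illustration.

```python
import numpy as np

def simple_linear_erasure(X: np.ndarray, Z: np.ndarray) -> np.ndarray:
    """X: (n, d) activations; Z: (n, k) one-hot concept labels.
    Returns X with the directions linearly correlated with Z projected out."""
    Xc = X - X.mean(axis=0)            # center features
    Zc = Z - Z.mean(axis=0)            # center labels
    sigma_xz = Xc.T @ Zc / len(X)      # (d, k) cross-covariance
    # Orthonormal basis for the feature directions correlated with the concept.
    U, s, _ = np.linalg.svd(sigma_xz, full_matrices=False)
    Q = U[:, s > 1e-10 * s.max()]
    # Subtract the component of each activation lying in that subspace.
    return X - Xc @ Q @ Q.T

# Demo: inject a binary concept into random activations, then erase it.
rng = np.random.default_rng(0)
Z = np.eye(2)[rng.integers(0, 2, size=500)]             # one-hot labels
X = rng.normal(size=(500, 16)) + 3.0 * Z @ rng.normal(size=(2, 16))
X_erased = simple_linear_erasure(X, Z)

# Cross-covariance with the labels is large before erasure and ~0 after,
# so a least-squares linear probe can no longer recover the concept.
print(np.abs((X - X.mean(0)).T @ (Z - Z.mean(0))).max())
print(np.abs((X_erased - X_erased.mean(0)).T @ (Z - Z.mean(0))).max())
```

The paper’s contribution, as I read it, is doing this kind of edit in the least destructive way possible and proving optimality; the sketch only shows the core mechanism.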

Sayash Kapoor and Arvind Narayanan say licensing of models wouldn’t work because it is unenforceable, and also it would stifle competition and worsen AI risks. I notice those two claims tend to correlate a lot, despite it being really very hard for both of them to be true at once – either you are stifling the competition or you’re not, although there is a possible failure mode where you harm other efforts but not enough. The claimed ‘concentration risks’ and ‘security vulnerabilities’ do not engage with the logic behind the relevant extinction risks.

Rhetorical Innovation

Aidan Gomez attempts a thread of the best arguments in favor of AI doomerism.

Alt Man Sam offers a summary of things we should all know already.

Alt Man Sam:

𝗧𝗵𝗲 𝗽𝗮𝘀𝘁 𝟲 𝗺𝗼𝗻𝘁𝗵𝘀:

“Of course, we won’t give the AI internet access”

𝘔𝘪𝘤𝘳𝘰𝘴𝘰𝘧𝘵 𝘉𝘪𝘯𝘨: 🤪

“Of course, we’ll keep it in a box”

𝘍𝘢𝘤𝘦𝘣𝘰𝘰𝘬: 😜

“Of course, we won’t build autonomous weapons”

𝘗𝘢𝘭𝘢𝘯𝘵𝘪𝘳: 😚

“Of course, we’ll coordinate and not blindly create arms race dynamics”

𝘌𝘷𝘦𝘳𝘺𝘰𝘯𝘦: 😉

𝗡𝗼𝘄:

“Of course, we’ll stop and take our time to solve alignment as we get close”

“Of course, we won’t press the button until we’re 99.9% sure it’s safe”

“Of course, we’ll be in control of all this”

𝗧𝗵𝗶𝘀 𝗶𝘀 𝘁𝗵𝗲 𝘀𝗵𝗶𝘁 𝘁𝗵𝗮𝘁 𝘀𝗰𝗮𝗿𝗲𝘀 𝗺𝗲. 𝗪𝗲 𝗴𝗲𝘁 𝘁𝗵𝗲 𝗳𝗶𝗿𝘀𝘁 𝘀𝗲𝗺𝗶-𝗴𝗲𝗻𝗲𝗿𝗮𝗹 𝗔𝗜 𝗮𝗻𝗱 𝗮𝗹𝗹 𝘁𝗵𝗲 𝘆𝗲𝗮𝗿𝘀 𝗼𝗳 𝗽𝗹𝗮𝗻𝗻𝗶𝗻𝗴 𝗮𝗻𝗱 𝘁𝗵𝗲𝗼𝗿𝘆 𝗴𝗼 𝗿𝗶𝗴𝗵𝘁 𝗼𝘂𝘁 𝘁𝗵𝗲 𝘄𝗶𝗻𝗱𝗼𝘄 𝗼𝗻𝗰𝗲 𝘁𝗵𝗲 𝗲𝗰𝗼𝗻𝗼𝗺𝗶𝗰 𝗶𝗻𝗰𝗲𝗻𝘁𝗶𝘃𝗲𝘀 𝗸𝗶𝗰𝗸 𝗶𝗻 😕

If your response falls into the general class of things mentioned above, why do you think that time will be different?

If your plan requires the step ‘when the chips are down we will not follow the economic or social incentives and instead we will act wisely’ then why do you think we are going to do that? How are you going to get that to happen?

Simeon suggests the parallel to Fukushima.

Simeon: When you tell people a specific AI takeover scenario they say “nooo way, that only happens in movies“.

When you tell people how the Fukushima accident happened they also tell “wtf, is it a movie?” (essentially because there were MANY conjunctive failures)

Reality hits you in very unpredictable ways and that’s the challenge of safety that we need to understand about X-risk BEFORE it first hits us.

Remember that reality is less constrained than movies, movies have to make sense.

People Are Worried About AI Killing Everyone

UN Secretary General is worried, calls upon us to take warnings seriously.

CNN: 42% of CEOs say AI could destroy humanity within five or ten years.

In a separate question, Yale found that 42% of the CEOs surveyed say the potential catastrophe of AI is overstated, while 58% say it is not overstated.

Such a weird question as worded. Overstated by who?

Just 13% of the CEOs said the potential opportunity of AI is overstated, while 87% said it is not.

The CEOs indicated AI will have the most transformative impact in three key industries: healthcare (48%), professional services/IT (35%) and media/digital (11%).

Again, by who?

This is quite the breakdown:

Sonnenfeld, the Yale management guru, told CNN business leaders break down into five distinct camps when it comes to AI.

The first group, as described by Sonnenfeld, includes “curious creators” who are “naïve believers” who argue everything you can do, you should do.

“They are like Robert Oppenheimer, before the bomb,” Sonnenfeld said, referring to the American physicist known as the “father of the atomic bomb.”

Then there are the “euphoric true believers” who only see the good in technology, Sonnenfeld said.

Noting the AI boom set off by the popularity of ChatGPT and other new tools, Sonnenfeld described “commercial profiteers” who are enthusiastically seeking to cash in on the new technology. “They don’t know what they’re doing, but they’re racing into it,” he said.

And then there are the two camps pushing for an AI crackdown of sorts: alarmist activists and global governance advocates.

“These five groups are all talking past each other, with righteous indignation,” Sonnenfeld said.

The lack of consensus around how to approach AI underscores how even captains of industry are still trying to wrap their heads around the risks and rewards around what could be a real gamechanger for society.

So either you’re an alarmist, you want global governance, or you’re blind to risks. I see things are going well all around, then.

If you’re curious why or to learn more, Simeon offers his top links. Lot of differences from the lists I’d have built, but he did it and I didn’t, and seems like good choices.

Other People Are Not As Worried About AI Killing Everyone

I wrote The Dial of Progress to try and explain what was causing Tyler Cowen and Marc Andreessen’s surprisingly poor arguments around AI, and hopefully engender constructive dialog.

My first approach was a point-by-point rebuttal of the points being made. I realized both that it was going to be extremely difficult to do this without ranting, and also that the approach would be missing the point. If someone isn’t actually making locally valid arguments they think are true, why are you offering counterarguments? Much better to try and tackle the underlying cause.

Dwarkesh Patel stepped up and wrote a more polite point-by-point rebuttal response to Andreessen than I would have managed. I think some of his responses miss the mark and he missed some obvious rejoinders, but he also finds some that I would have missed and does an excellent job executing others. The responses that connect are highly effective at pointing out how incoherent and contradictory Marc’s claims are, and that we have known, simple, knock-down responses across the board. For his trouble, as I expected given Marc’s known block-on-a-single-tweet policy, Patel was quickly blocked on Twitter.

I agree with Roon here on what is ultimately the most confusing thing of all.

Roon: I don’t agree with everything here but it’s very strange to hear pmarca talk about how AI will change everything by doing complex cognitive tasks for us, potentially requiring significant autonomy and creativity, but then turn around and say it’s mindless and can never hurt us.

Robin Hanson asks, what past humans would we consider extinct today?

GPT-4 agrees, putting the cutoff around 200k-300k years ago. That’s the technical definition of the word. I’d be more flexible than that threshold implies in terms of what I’d consider a catastrophic future extinction. If future humans are only as foundationally different from present humans as present humans are from ancestor humans of a million years ago, especially if it was largely because of another jump in intelligence, I expect I would mostly be fine with that even if that would technically count as a different species.

It can’t kill everyone if it isn’t developed yet. Michael Vassar is one of the few whose timelines for AGI development have gotten longer over the past two years, as he expects us to hit the limits of scaling laws.

A lot of my hope is definitely in the ‘we don’t find a way to build an AGI soon’ bucket.

An interview with DeepMind COO Lila Ibrahim about their approach to safety:

The company’s ultimate mission is to create artificial general intelligence, a nebulous concept that broadly refers to machines with human-level cognitive abilities. It’s a visionary ambition that needs to remain grounded in reality — which is where Ibrahim comes in. 

In 2018, Ibrahim was appointed as DeepMind’s first-ever COO. Her role oversees business operations and growth, with a strong focus on building AI responsibly.

“New and emerging risks — such as bias, safety and inequality — should be taken extremely seriously,” Ibrahim told TNW via email. “Similarly, we want to make sure we’re doing what we can to maximize the beneficial outcomes.”

Oh no.

Much of Ibrahim’s time is dedicated to ensuring that the company’s work has a positive outcome for society. Ibrahim highlighted four arms of this strategy.

1. The scientific method

2. Multidisciplinary teams

3. Shared principles

4. Consulting external experts

One of Ibrahim’s chief concerns involves representation. AI has frequently reinforced biases, particularly against marginalised groups, who tend to be underrepresented in both the training data and the teams building the systems.

None. Of. That. Has. Anything. To. Do. With. Us. Not. Dying.

It is deeply troubling to see the question of extinction risk not even dismissed. It is ignored entirely.

I do like the discussion about releasing low-confidence findings with warnings attached, rather than censoring low-confidence results. You love to see it.

Jason Crawford of The Roots of Progress is not worried about extinction risk, but does make a heartfelt plea for solutionism to those who are rightfully concerned about safety concerns run amok. Obviously AI poses risks, and we can’t wait until they happen to go fix them afterwards. Then again, if it wasn’t for extinction risks, why wouldn’t the iterative approach of fixing mistakes work? Yes, we should still work together no matter what to make the products better for humans and minimize risks, but if it’s mundane risks then we can afford mundane solutions; that seems like the kind of thing to do on the corporate level.

Riley Goodside isn’t entirely unworried and recognizes the real risks, but says we should just go ahead, because there’s no advantage to stopping.

I claim no expertise in technical alignment but FWIW (not much) i’m sympathetic to doomers but not decels. i think AGI will happen, ASI will follow, and this implies real risks. but i reject the idea that if those who care stop working we’ll somehow be safer.

More concretely, I see long-term alignment as continuous with near-term reliability and control. I find it implausible AGI could be low-risk if the preceding version helps idiots write malware and build bombs. it’s easy to mock today’s risks, but the future is less amusing.

People over-index on the insufferable cringe of popular doomer hyperbole. I used to be more doomy — fear begets wild thinking that I’ve learned to forgive. but top-quartile doomers are more reasonable and sincere than accs want to admit, and vice versa. I try to be nice to both.

Well said and it correctly pinpoints an important crux. I do not think that long-term alignment is so continuous with near-term reliability and control. I expect that successful solutions will likely be found to many near-term reliability and control problems, and that those solutions will have very little chance of working on AGI and then ASI systems. If I did not believe this, a lot of the strategic landscape would change dramatically, and my p(doom) from lack of alignment would decline dramatically, although I would still worry about whether that alignment actually saves us from the dynamics that happen after that – which is an under-considered problem.

Why would anyone who believed in extinction risk from AI work on AI, says man who 100% knows that the heads of all three major AI labs claim they are working on AI exactly because they believe in extinction risk. Daniel Eth responds with a link to his more in-depth answer, which we discussed last week.

That man is Gary Marcus, who also says that:

I’m not personally that concerned about extinction risk, at least for now, because the scenarios are not that concrete. A more general problem that I am worried about… is that we’re building AI systems that we don’t have very good control over and I think that poses a lot of risks, (but) maybe not literally existential.

Um. I. Um.

If you think that a threat is not yet concrete, and it is, and I quote, “maybe not literally existential” then that is an existential risk. That’s what risk means.

Roon might write later about why they expect alignment to be easily solvable.

Meanwhile, an interesting choice of metaphorical parallel here:

Roon: i think it’s hilarious how google teases Gemini

[kronos teasing the launch of Zeus] he’s still in training isn’t he cute.

The Week in Audio Content

I mentioned last week I was falling behind on audio. That’s not going to change, one has to prioritize and make cuts, especially since there are some other podcasts I try to never miss, especially Odd Lots.

We have video of Geoffrey Hinton’s full hourlong talk from May 25 (YouTube). Self recommending.

Tyler Cowen: Economics, Hayek and Large Language Models. Some highlights were discussed last week, some very good short term concrete speculation, much of which seems right, some where I disagree. Also practical advice on using ChatGPT, to give it clear context on things like ‘answer in the voice of [X] talking about [Y] to audience [Z]’ which I definitely am underutilizing.

He briefly mentions existential risk, and makes it very clear that either he is strategically lying to defend the Dial of Progress, or he is simply not taking the question seriously as a thinker. He thinks the AIs will have their own crypto-based economy, with their own exchange rate, but Hayekian considerations plus intelligence being overrated plus dogs being good at tricking humans means it will all work out. As I discussed earlier this week, I can’t take these arguments seriously on a point-by-point basis. I know he’s smarter than this; someone who wasn’t couldn’t have generated the other 90% of the lecture or 98% of his overall content.

Then Do Better offers their lecture notes, and also GPT-4’s notes. The human notes are very good, sufficiently good that they highlight how awful GPT-4’s summary is. Where the human notes zero in on the key takeaways and are highly information dense, GPT-4 wastes its words on flowery language and politeness. My eyes glaze over trying to read it. Tyler Cowen would strongly agree that we need better prompt engineering. Tell GPT-4 to give us bullet point notes intended for a CEO’s briefing, maybe? GPT-4 gives an 8.5/10 rating, which seems right if you keep the focus on the economics.
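For what it’s worth, here is a minimal sketch of the kind of prompt framing being suggested. The specific voice, topic, and audience values are hypothetical examples of mine, not anything Cowen or the notes propose.

```python
# Building a prompt that sets voice, topic, audience and output format up front.
voice = "an experienced equity analyst"                          # hypothetical [X]
topic = "what large language models imply for knowledge work"    # hypothetical [Y]
audience = "a CEO who has five minutes to read the briefing"     # hypothetical [Z]

prompt = (
    f"Answer in the voice of {voice}, talking about {topic}, "
    f"for an audience of {audience}. "
    "Give 5 to 8 information-dense bullet points. "
    "No pleasantries or flowery language; lead with the key takeaways."
)
print(prompt)
```

The point is simply that the context and format constraints go in explicitly, rather than hoping the model guesses them.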

Mark Zuckerberg does his second stint with Lex Fridman. Interesting throughout, exceeding expectations; Zuckerberg is ideal for this format, where Lex mostly just lets him talk, which is Lex’s specialty. Zuckerberg is definitely not worried about AI killing us, his reasoning being that his timelines are long and he does not expect LLMs to get to AGI, which is an entirely reasonable position. If you’re sufficiently confident that the problems are sufficiently far out that even the implications of advancing capabilities or locking in open source don’t much matter, then it’s at least reasonable to ignore the issue. Open source can even be seen as ‘safer’ if what you are worried about is software bugs rather than the software working.

It does sound like the long timelines are a crux for Zuckerberg – if he did think AGI was near, he would think about things very differently.

Zuckerberg sees us building lots of different customized AIs tuned to individuals, not only assistants but also embodiments and branding and lots more, which rhymes with Meta’s commitment to open source, and I think he’s right about customization being vastly undervalued right now. He sees the mundane dangers from AI as eminently solvable and continuous with present harms, in ways very close to my own view. It does not sound like he’s involved in the technical details, he’s got too many other things going on.

Including Jiu Jitsu, and you can hear his eyes light up talking about it; that’s clearly where his head wants to be as often as possible. He really digs it. Very Joshua Waitzkin vibes, in the best possible way. There’s a bunch of discussion about moderation and free speech and lots of other things you’d expect. Zuckerberg’s position seems to be ‘free speech is great so long as no one gets hurt, but who am I to risk someone’s life or defy a democratically made decision.’

Which is so vastly better than many people’s opinions, or actual policy essentially anywhere but America, and also my guess is it isn’t enough to actually preserve free speech. ‘I disagree with what you are saying but I will defend to the death your right to say it’ means accepting that people are going to get hurt. It’s part of the alignment problem – if you value X except when it clashes with Y, that helps a lot in some circumstances, and yet I still have bad news about X.

I disagree with Zuckerberg about important things (such as ‘is your core product the actual worst?’ and ‘are you helping doom the world’) but given I already knew about those disagreements, I was generally impressed and updated positively.

Dan Hendrycks on the Future of Life Institute podcast on why evolutionary pressures favor AI.

The Lunar Society has Carl Shulman on to talk about the coming intelligence explosion, and this is only part 1. Haven’t gotten to this one yet but I’m excited.

Rohin Shah of DeepMind on the 80,000 Hours podcast, for those who want to get into alignment details. Rohin demonstrates an unusually good understanding of different points of view and has good explanations of many key concepts. My biggest disagreement with him, from my perspective, is that he understands that the optimization deck is stacked against us, but he does not understand the degree to which it is stacked against us, or the extent to which this ramps up, diversifies and changes its sources as capabilities ramp up, and thus he thinks many things could work that I don’t see as having much chance of working. I also don’t think he’s thinking enough about what type and degree of alignment would ‘be enough’ to survive the resulting Phase 2.

The Wit and Wisdom of Sam Altman

Q: “After doing AI for so long, what have you learned about humans?”

Sam Altman: “I grew up implicitly thinking that intelligence was this, like really special human thing and kind of somewhat magical. And I now think that it’s sort of a fundamental property of matter…” It’s a fascinating thought. As we peel back layers of scientific discovery, we do seem to distance ourselves from the “main character energy” role in the universe. Yet, even if we aren’t unique in our intelligence, there’s a profound importance in our human existence that needs preserving.

Gary Marcus: Sorry but no, intelligence is not a fundamental property of matter. Most arrangements of matter don’t have it. (Also, anyone who studies animal cognition long ago realized that intelligence is not uniquely human; that’s not the news flash it seems to be.)

Leo Gilsic: I think you’re misinterpreting him. He certainly could have said it better, but based on other interviews of him I’ve seen, I’m pretty certain what he meant is that ‘the potential for intelligence is a fundamental property of matter’

Gary Marcus: then it’s just like saying we can assemble tinkertoys into a computer, which is a very old observation.

I think point to Altman here, and Marcus is missing the point. There is an Intelligence Nature that humans have in a way that other animals don’t, and that other animals often have in a way that plants don’t. We now can confirm that AIs will have it in the way humans have it, and likely have it in a way that humans don’t have it, perhaps for many cycles of that comparison not too long from now.

Sam Altman, explaining why OpenAI won’t go public: “When we develop superintelligence, we’re likely to make some decisions that public market investors would view very strangely… the chance that we have to make a very strange decision someday is nontrivial.”

I certainly hope so! You’d have access to superintelligence.

  1. If you have access to superintelligence and your decisions don’t seem strange, you’re not making good use of your superintelligence.
  2. If you have access to a superintelligence and your decisions are about maximizing profits rather than worrying about what the future will look like, you won’t like (or, probably, witness) what the future looks like.
  3. Also the most likely explanation for this not happening would be that there are no investors looking at your decisions because everyone involved is dead, or at least that no one involved is still able to make decisions.

Right now, they are under the heel of Microsoft. Presumably this means Altman’s timelines are such that he expects to pay off that debt and be free of them by the time it matters, or he thinks he can retain control when it counts – Matt Levine has talked a lot about effective corporate control often being ‘who can change the locks and who do the employees listen to’ rather than something more formal or legal.

Another good reason not to go public is that Altman doesn’t own any stock. Really takes the shine off of a cash-in that saddles you with public market obligations.

I am indeed very happy that OpenAI is not going public, and also happy that they are an entity that is choosing not to go public, the same way I am sad about the deal with Microsoft. It bodes well and grants affordances.

Sam Altman (in context of risks of AI): “What I lose the most sleep over is the hypothetical idea that we already have done something really bad by launching ChatGPT.”

That seems misplaced, in the sense that launching OpenAI and its general path of development seem like the places to be most worried you are in error, unless the error in ChatGPT is ‘get everyone excited about AI.’ Which is a different risk model that has many implications. I would have liked to hear more details of the threat model here.

The Lighter Side

“How could AGI outmaneuver humanity’s corporations?” An illustration. Two minute video, not what you think, worth a watch, remember the section you are in.

A reader suggests the Wikipedia entry for ‘List of inventors killed by their own invention.’

Danielle Fong, who is unworried, predicts the future.

Danielle Fong: ai weapons con, 2025:

vendor: Dangerous weapons! unhinged, dangerous weapons for sale!

vendor 2: don’t buy from him. my weapons are the most dangerous of all.

new moon: “They literally sell themselves!”

Justine Moore: SF is so back

Image

Fun fact: Zero customers have put any money we can see into either jar.

Comments

My biggest disagreement with him, from my perspective, is that he understands that the optimization deck is stacked against us, but he does not understand the degree to which it is stacked against us, or the extent to which this ramps up, diversifies and changes its sources as capabilities ramp up, and thus he thinks many things could work that I don’t see as having much chance of working. I also don’t think he’s thinking enough about what type and degree of alignment would ‘be enough’ to survive the resulting Phase 2.

I'm not committing to it, but if you wrote up concrete details here, I expect I'd engage with it.

Zvi:

Thanks! I've seen various attempts to articulate different aspects of this but I may be able to find a better way to put it, and hopefully it would at least help you understand the position better.

Perhaps a podcast discussion between you two would be interesting and/or productive. Or perhaps a Slack discussion that turns into a post (sort of like this):

https://www.lesswrong.com/posts/JdGuqg7ifRwPiirCe/wentworth-and-larsen-on-buying-time

If you’re interested, I would be happy to moderate or help find a suitable moderator.

Zvi:

Thank you for reminding me about this, I need to keep at it. I've been working on getting this into written form, but the exercise demands clarity in various places in ways that make it really hard, though also useful. So it's taking a while.

Once I get that right, I'm hoping we can chat - we've done so before and I think it went well.

You would not want to use that calculator as part of a computer program, sure

Floating point shenanigans has entered the chat.

A lot of the math running under the hood of modern programs, especially heavy matrix/tensor calculations run on GPUs without a guaranteed order of operations (so, all SOTA AI systems), is much closer to a 95% accurate calculator than to a 100% accurate one. This is already the world we live in.
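To make the point concrete, here is a small illustration (mine, not the commenter's) of how the order of operations changes floating point results:

```python
# Floating point addition is not associative: the same mathematical sum can
# differ depending on grouping.
print((0.1 + 0.2) + 0.3)   # 0.6000000000000001
print(0.1 + (0.2 + 0.3))   # 0.6

# On large reductions (the GPU case), different summation orders typically
# disagree in the last few bits.
import random
random.seed(0)
xs = [random.uniform(-1.0, 1.0) for _ in range(100_000)]
shuffled = xs[:]
random.shuffle(shuffled)
print(sum(xs) == sum(shuffled))  # usually False
```

The per-operation errors are tiny, but they are why two runs of the same model on different hardware or kernel schedules need not be bit-identical.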

Sayash Kapoor and Arvind Narayanan say licensing of models wouldn’t work because it is unenforceable, and also it would stifle competition and worsen AI risks. I notice those two claims tend to correlate a lot, despite it being really very hard for both of them to be true at once – either you are stifling the competition or you’re not, although there is a possible failure mode where you harm other efforts but not enough. The claimed ‘concentration risks’ and ‘security vulnerabilities’ do not engage with the logic behind the relevant extinction risks.

Making it harder to legally use models accomplishes two things:

  1. Decreases the number of people who use those models
  2. Among the people who are still using the models, increases the fraction of them who broke laws to do that.

Consider the situation with opiates in the US: our attempts to erect legal barriers for people obtaining opiates have indeed reduced the number of people legally obtaining opiates, and probably even reduced total opiate consumption, but at the cost that a lot of people were driven to buy their opiates illegally instead of going through medical channels.

I don't expect computing power sufficient to train powerful models to be easier to control than opiates, in worlds where doom looks like rapid algorithmic advancements that decrease the resource requirements to train and run powerful models by orders of magnitude.

Sam Altman (in context of risks of AI): “What I lose the most sleep over is the hypothetical idea that we already have done something really bad by launching ChatGPT.”

That seems misplaced, in the sense that launching OpenAI and its general path of development seem like the places to be most worried you are in error, unless the error in ChatGPT is ‘get everyone excited about AI.’ Which is a different risk model that has many implications. I would have liked to hear more details of the threat model here.

 

I can't speak to Sam's threat model, but my current threat model is that the greatest risk comes from open source code and models, or non-safety-conscious labs. I think our best hope is for a safety conscious lab to get to a powerful AGI first and keep it safely contained, and study it and make iterative progress on aligning it well enough to use it to undertake a pivotal act. If this roughly corresponds to Sam's view, then I think what he said about ChatGPT would make sense, since the hype seems to be greatly speeding up the open source developments and drawing in more funding for new attempts. Especially if he, like me, believes that there are algorithmic advances which can be discovered that are so powerful that they could allow small orgs to leapfrog all the big ones overnight. In such a scenario, the more independent groups you have making a serious try in approximately the right direction, the more 'lottery tickets' that Moloch gets to buy in the kill-us-all lottery.

A lot of my hope is definitely in the ‘we don’t find a way to build an AGI soon’ bucket.

My biggest hopes, at least regarding doom, lie in the fact that instrumental convergence is too weak an assumption, without other assumptions, to be a good argument for doom, and the fact that unbounded instrumental convergence is actually useless for capabilities, while much more bounded instrumental convergence makes alignment way easier (it's still hard, but not nearly as hard as many doomers probably think).

Cf this post on how instrumental convergence is mostly wrong for predicting that AI doom will happen:

https://www.lesswrong.com/posts/w8PNjCS8ZsQuqYWhD/instrumental-convergence-draft

But now, onto my main comment here:

None. Of. That. Has. Anything. To. Do. With. Us. Not. Dying.

It is deeply troubling to see the question of extinction risk not even dismissed. It is ignored entirely.

I do like the discussion about releasing low-confidence findings with warnings attached, rather than censoring low-confidence results. You love to see it.

I'm going to be blunt and say that the attitude expressed in the first sentence is a good representative of an attitude I hate on LW: the idea that the scientific method, which generally includes empirical evidence, is fundamentally irrelevant to a field. That is a very big problem I see here, because, as Richard Ngo in my view correctly said about the attitude that the scientific method and empirical evidence are irrelevant to AI safety:

Historically, the way that great scientists have gotten around this issue is by engaging very heavily with empirical data (like Darwin did) or else with strongly predictive theoretical frameworks (like Einstein did). Trying to do work which lacks either is a road with a lot of skulls on it. And that's fine, this might be necessary, and so it's good to have some people pushing in this direction, but it seems like a bunch of people around here don't just ignore the skulls, they seem to lack any awareness that the absence of the key components by which scientific progress has basically ever been made is a red flag at all.

In particular, this is essentially why I'm so concerned about the epistemics of AI safety, especially on LW: this dissing of empirical evidence/the scientific method is, to put it bluntly, a good example of not realizing that basically all of our ability to know much of anything is based on it.

I really, really wish LWers weren't nearly as hostile to admitting that the scientific method/empirical evidence matters as Zvi is showing here.

 

EDIT: I retract most of this comment, with the exception of the first paragraph.

Not irrelevant, just insufficient. And that's not a dig against the field, it's true of every human endeavor that requires quick decision making based on incomplete data, or does not permit multiple attempts to succeed at something. Science has plenty of things to say before and after, about training for such tasks or understanding what happened and why. And that's true here, too. There's plenty science can tell us now, in advance, that we can use in AI and AI alignment research. But the problem is, aligning the first AGI or ASI may be a one-shot opportunity: succeed the first time or everyone dies. In that scenario, the way the scientific method is usually carried out (iterative hypothesis testing) is obviously insufficient, in the same way that such a method is insufficient for testing defenses against Earth-bound mile-wide asteroids.

I disagree with this, but note that this is a lot saner than the original response I was focusing on. The point I was trying to make was that it is either very surprising or an example of very bad epistemics to claim that science is not relevant to making us not die, which is a much stronger claim than yours, and it would definitely need to be defended way better than this post did.

I disagree with your comment, but the claim you make is way less surprising than Zvi's claim on science.

I don't think it is much stronger; I think Zvi is shorthanding an idea that has been discussed many times and at much greater length on this site and elsewhere. The fact that scientists usually know which clusters of hypotheses are worth testing, long before our scientific institutions would consider them justified in claiming that anyone knows the answer, is already sufficiently strong evidence that "the scientific method," as it is currently instantiated in the world, has much stricter standards of evidence than what epistemology fundamentally allows. Things like the replication crisis are similarly strong evidence that its standards are somewhat misaligned with epistemology, in that they can lead scientists astray for a long time before evidence builds up that forces it back on track.

The specific claim here is not "science as a whole and scientific reasoning are irrelevant." It's "If we rely on getting hard, reliable scientific evidence to align AGI, that will generally require many failed experiments and disproven hypotheses, because that's how Science accumulates knowledge. But in a context where a single failed experiment can result in human extinction, that's just not going to be a process that makes survival likely." Which, we can disagree on the premise about whether we're facing such a scenario, but I really don't understand how to meaningfully disagree with the conclusion given the premise. 

As an example: If the Manhattan Project physicists had been wrong in their calculations and the Trinity test had triggered a self-sustaining atmospheric nitrogen fission/fusion reaction, humanity would have gone extinct in seconds. This would have been the only experimental evidence in favor of the hypothesis, and it would have arrived too late to save humanity. In that case we were triply lucky: the physicists thought of the possibility, took it seriously enough to do the calculations, and were correct in their conclusion that it would not happen. Years later, they were wrong about similar (but more complicated) calculations on whether lithium-7 would contribute significantly to H-bomb yields, but thankfully this error was not the existential one.

Similarly, there were extremely strong reasons why we knew the LHC was not going to destroy the planet with tiny black holes or negative strangelets or whatever other nonsense was thrown around in popular media before it started up, and the scientists involved thought carefully about the possibilities anyway. But, the whole point of experiments is to look for the places where our models and predictions are wrong, and AI doesn't have anywhere near enough of a theoretical basis to make the strong predictions that particle physics does. 

I have sort of changed my mind on this, in that while I still disagree with Zvi and you, I now think that my response to Zvi was way too uncharitable, and as a consequence I'll probably retract my first comment.

I disagree with the premise, though: one of the assumptions used very often in EA/LW analyses isn't enough to show it without other assumptions.

I might respond to the rest later on.

I look forward to reading it if you do!

My response to Xuan would be that I don’t expect us to ‘just add data and compute’ in the next seven years, or four years. I expect us to do many other things as well. If you are doing the thought experiment expecting no such progress, you are doing the wrong thought experiment. Also my understanding is we already have a pattern of transformers looking unable to do something, then we scale and suddenly that changes for reasons we don’t fully understand.

 

I copied Xuan's twitter thread into the comments section of the LessWrong post of Jacob Steinhardt's GPT 2030 predictions. In defense of Xuan, she also says that she does not expect us to 'just add data and compute'. She said that she thinks Jacob's predictions are unlikely seeming iff you assume we will do nothing but add data and compute, but that she thinks this is unlikely. Thus, she is in agreement with you about criticizing Jacob's assumption of 'only data and compute added'.

Where you seem to differ from her point of view is that you think that additional data and compute only could indeed lead to novel and surprising emergent capabilities. She seems to think we've found about all there is to find there. 

I, in agreement with you, believe that there are more novel emergent capabilities to be found through only adding data and compute. I do however think that we have reached a regime of diminishing returns. I believe that much greater efficiency in making forward progress will be found through algorithmic progress, and so the fact that technically more data and compute alone would be sufficient will become irrelevant since that will be exorbitantly expensive compared to discovering and utilizing algorithmic improvements.

It is my understanding that this is also what you think, and perhaps also what Xuan thinks but I haven't read enough from her to know that. I've only read Xuan's one twitter thread I copied.

<rant> This is a prime example of why I dislike Twitter so much as a medium of debate. The fractured tree of threads and replies makes it too hard to see a cohesive discussion between people and get a comprehensive understanding of others' viewpoints. The nuance gets lost. Had this discussion taken place in a comment thread in a forum, such as this one, then it would have been much easier to tell when you'd read Xuan's entire comment thread and gotten her full viewpoint. </rant>

Zvi:

I hear that rant. I do my best now to reconcile things into a cohesive whole in text, but that means the timestamps are gone, so if I miss something later I can't tell.

I am skeptical that we're near the end of the useful road on data/compute, although I agree that in the 2030 timeframe that's not where the low hanging fruit is mostly going to be. My prediction of 'this 2030 arrives by 2027' is based largely on other ways of improving. 

To the extent compute/data help, I think of them more as helping to enable other things.

Yes, in particular, more compute means it's easier to automate searches for algorithmic improvements....

Re:

Meta’s commitment to open source

Note that only one of Meta's recent releases has been open-source.

  • For Llama, the code is available under GPL-3 (ie open source), but the model itself and weights are under a bespoke non-commercial license.
  • Segment Anything is truly open source under the Apache 2.0 license.
  • Massively Multilingual Speech is CC-BY-NC, a standard but noncommercial and thus non-open-source license.

I'm an open-source maintainer myself, though not an absolutist or convinced that eg Llama should have been open-sourced. I do however find it pretty frustrating when these models are incorrectly described as open source (including by Yann LeCun, who ought to know better). As is, we collectively get many of the research benefits, all the misuse, but very little of the commercial innovation or neat product improvements that open-sourcing would bring.

Yeah, a better way to phrase this might be: "Meta's dismissal of potential security risks from making code and model weights publicly available."

I’d also worry a lot about scenarios where the difficulty level in practice is hard but not impossible, where making incremental progress is most likely to lead you down misleading paths that make the correct solutions harder rather than easier to find, because you have a much harder time directing attention to them, keeping attention there, or being rewarded for any incremental progress given ‘the competition’ and you have to worry those in charge will go with other, non-working solutions instead at many points.

...

 I do not think that long-term alignment is so continuous with near-term reliability and control. I expect that successful solutions will likely be found to many near-term reliability and control problems, and that those solutions will have very little chance of working on AGI and then ASI systems. If I did not believe this, a lot of the strategic landscape would change dramatically, and my p(doom) from lack of alignment would decline dramatically, although I would still worry about whether that alignment actually saves us from the dynamics that happen after that – which is an under-considered problem.

 

I strongly agree with this, and have discussed this with a friend who does theoretical safety research who also agrees. It's easy to get excited by making incremental iterative progress. That's a thing that teams of humans tend to be great at. This makes it much easier to put additional resources into it. But it's likely that focusing only on this would lead us to ignore the likely fact that we're iterating on the wrong things and marching into a blind alley. I, and my friend, expect that the true paths to good solutions are not yet found and thus not yet available for incremental iterative progress. If that's the case, we need more theoretical researchers, and more serial time, to get to the beginning of a workable path to iterate on.

If these currently-iterable safety-ish plans manage to buy us at least a delay before things get doom-y, then that could be a benefit. I think it's plausible they could buy us a year or two of doom-delay.

Related: I also think that compute governance is potentially a good idea in that it might buy us some delay while we are in a low-algorithmic-efficiency regime. It currently seems like the first models to be really dangerous will be made by well-resourced groups using lots of compute. I think that compute governance is potentially a terrible idea in that I expect it to fail completely and suddenly when, in the not so distant future, we transition to a high-algorithmic-efficiency regime. Then the barrier will be knowledge of the efficient algorithms, not large amounts of compute. I believe we can know that this high-algorithmic-efficiency regime exists by looking at the way compute and learning work in the brain, but that we can't be sure of when algorithmic leaps will be made or how far they will get us. So if we put our trust in compute governance, we are driving our bus onto a lake which we know ahead of time has patches of ice thin enough to suddenly give way beneath us. With no way to know when we will reach the weak ice. Seems scary.

Chris Olah: One of the ideas I find most useful from @AnthropicAI‘s Core Views on AI Safety post (https://anthropic.com/index/core-views-on-ai-safety…) is thinking in terms of a distribution over safety difficulty. Here’s a cartoon picture I like for thinking about it:

 

I like this picture a lot. I personally place the peak of my distribution in between Apollo and P/NP. My lower tail does not go as low as Steam Engine, my upper tail does include impossible.