
Comments

zoop · 1mo · 30

I hear what you're saying. I probably should have made the following distinction:

  1. A technology in the abstract (e.g. nuclear fission, LLMs)
  2. A technology deployed to do a thing (e.g. nuclear in a power plant, LLM used for customer service)

The question I understand you to be asking is essentially: how do we make safety cases for AI agents in general? I would argue that's more situation 1 than situation 2, and as I understand it, safety cases are basically only ever applied to situation 2. The nuclear facilities document you linked is definitely situation 2.

So yeah, admittedly the document you were looking for doesn't exist, but that doesn't really surprise me. If you start looking for narrowly scoped safety principles for AI systems, you'll find them everywhere. For example, a search for "artificial intelligence" on the ISO website returns 73 standards.

Just a few relevant standards, though I admit standards are exceptionally boring (also, many aren't public, which is dumb):

  • UL 4600, a standard for autonomous vehicles
  • ISO/IEC TR 5469, a standard for AI safety generally (this one is decently interesting)
  • ISO/IEC 42001, which covers what to do if you set up a system that uses AI

You also might find this paper a good read: https://ieeexplore.ieee.org/document/9269875 

zoop · 1mo · 30

I've published in this area so I have some meta comments about this work.

First the positive: 

1. Assurance cases are the state of the art for making sure things don't kill people in a regulated environment. Ever wonder why planes are so safe? Safety cases. Because the actual process of making one is so unsexy (GSNs make me want to cry), people tend to ignore them, so you deserve lots of credit for somehow getting x-risk people to upvote this. More LessWronger types should be thinking about safety cases.

2. I do think you have good / defensible arguments overall, minus minor quibbles that don't matter much.

Some bothers:

1. Since I used to be a little involved, I am perhaps a bit too aware of the absolutely insane amount of relevant literature that was not mentioned. To me, the introduction made it sound a little bit like the specifics of applying safety cases to AI systems have not been studied. That is very, very, very not true.

That's not to say you don't have a contribution! Just that I don't think it was placed well in the relevant literature. Many have done safety cases for AI, but they usually do it as part of concrete applied work on drones or autonomous vehicles, not x-risk pie-in-the-sky stuff. I think your arguments would be greatly improved by referencing back to this work.

I was extremely surprised to see so few of the (to me) obvious suspects referenced, particularly more from York. Some labs whose people publish a lot in this area:

  • University of York Institute for Safe Autonomy
  • NASA Intelligent Systems Division
  • Waterloo Intelligent Systems Engineering Lab
  • Anything funded by the DARPA Assured Autonomy program

2. My second issue is a little more specific, and relates to this paragraph:

To mitigate these dangers, researchers have called on developers to provide evidence that their systems are safe (Koessler & Schuett, 2023; Schuett et al., 2023); however, the details of what this evidence should look like have not been spelled out. For example, Anderljung et al vaguely state that this evidence should be “informed by evaluations of dangerous capabilities and controllability” (Anderljung et al., 2023). Similarly, a recently proposed California bill asserts that developers should provide a “positive safety determination” that “excludes hazardous capabilities” (California State Legislature, 2024). These nebulous requirements raise questions: what are the core assumptions behind these evaluations? How might developers integrate other kinds of evidence?

The reason the "nebulous requirements" aren't explicitly stated is that when you make a safety case, you assure the system against the specific hazards relevant to that system. These are usually identified by performing a HAZOP analysis or similar. Not all AI systems have the same list of hazards, so it's obviously dubious to expect you can list requirements a priori. This should have been stated, imo.

zoop · 2mo · 1-2

I don't think it works if there isn't a correct answer, e.g. predicting the future, but I'm positive this is a good way to improve how convincing your claims are to others.

If there isn't ground truth about a claim to refer to, any disagreement around the claim is going to be about how convincing and internally/externally consistent it is. As we keep learning from prediction markets, rationales don't always lead to correctness. There are many cases of good heuristics (priors) doing extremely well.

If you want to be correct, good reasoning is often a nice-to-have, not a need-to-have. 

zoop · 4mo · 3-2

I very strongly disagree. In my opinion, this argument appears fatally confused about the concept of "software." 

As others have pointed out, this post seems to be getting at a distinction between code and data, but many of the examples of software given by OP contain both code and data, as most software does. Perhaps the title should have been "AI is Not Code," but since it wasn't I think mine is a legitimate rebuttal. 

I'm not trying to make an argument by definition. My comment is about properties of software that I think we would likely agree on. I think OP both ignores some properties software can have and assumes all software shares certain other properties, to the detriment of the argument.

I think the post is correct in pointing out that traditional software is not similar to AI in many ways, but that's where my agreement ends.

 

1: Software, I/O, and such

Most agree on the following basic definition: software is a set of both instructions and data, hosted on hardware, that governs how input data is transformed to some sort of output. As you point out, inputs and outputs are not software.

For example, photos of a wedding or a vacation aren’t software, even if they are created, edited, and stored using software.

Yes.

Second, when we run the model, it takes the input we give it and performs “inference” with the model. This is certainly run on the computer, but the program isn’t executing code that produces the output, it’s using the complicated probability model which grew, and was stored as a bunch of numbers. 

No! It is quite literally executing code to produce the output! Just because this specific code and the data it interacts with specify a complicated probability model does not mean it is not software.

Every component of the model is software. Even the pseudorandomness of the model outputs is software (torch.randn(), often). There is no part of this inference process that generates outputs that is not software. To run inference is only to run software.
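To illustrate, here is a minimal, hypothetical sketch (assuming PyTorch, with random scores standing in for a real model's logits) of what the sampling step of inference boils down to: ordinary instructions operating on stored numbers, with the stochasticity supplied by a seeded pseudorandom generator.

```python
# A minimal, hypothetical sketch of what the sampling step of "inference"
# amounts to: ordinary instructions operating on stored numbers.
import torch

torch.manual_seed(0)  # the "randomness" is itself just software (a seeded PRNG)

vocab_size = 8
logits = torch.randn(vocab_size)                      # stand-in for a model's output scores
temperature = 0.7
probs = torch.softmax(logits / temperature, dim=-1)   # the "complicated probability model" over tokens
next_token = torch.multinomial(probs, num_samples=1)  # the stochastic output is drawn by code
print(next_token.item())
```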

 

2: Stochasticity

The model responds to input by using the probability model to estimate the probability of different responses, in order to output something akin to what the input data did - but it does so in often unexpected or unanticipated ways.

Software is often, but not necessarily, deterministic. Software can have stochastic or pseudorandom outputs. For example, software that generates pseudorandom numbers is still software. The fact that AI generates stochastic outputs humans don't expect does not make it not software.

Also, software is not necessarily interpretable, and its outputs are not necessarily expected or even predictable.

 

3: Made on Earth by Humans

First, we can talk about how it is created. Developers choose a model structure and data, and then a mathematical algorithm uses that structure and the training data to “grow” a very complicated probability model of different responses... The AI model itself, the probability model which was grown, is generating output based on a huge set of numbers that no human has directly chosen, or even seen. It’s not instructions written by a human.

Neither a software's code nor its data is necessarily generated by humans.

 

4: I have bad news for you about software engineering

Does software work? Not always, but if not, it fails in ways that are entirely determined by the human’s instructions.

This is just not true. Many bugs are caused by specific interactions between inputs and the code + data; some are caused by interactions among inputs, code, data, and hardware (buffer overflows being the canonical example). You could get an error due to a cosmic-ray bit flip, which has nothing to do with humans or instructions at all! Data corruption... I could go on and on.

For example, unit tests are written to verify that the software does what it is expected to do in different cases. The set of cases are specified in advance, based on what the programmer expected the software to do. 

... or the test is incorrect. Or both the test and the software are incorrect. Of course this assumes you wrote tests, which you probably didn't. Also, who said you can't write unit tests for AI? You can, and people do. All you have to do is fix the temperature parameter and random seed. One could argue benchmarks are just stochastic tests...
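For what it's worth, a deterministic test for an AI component might look something like the sketch below. The names are all hypothetical stand-ins rather than any library's real API; the only substantive claim is the fixed-temperature, fixed-seed trick mentioned above.

```python
# A hypothetical sketch of a unit test for an LLM-backed function. `generate`
# is a made-up stand-in for a real model call, not any specific library's API;
# the point is only that pinning temperature and seed makes outputs reproducible
# enough to assert on.
import random

def generate(prompt: str, *, temperature: float, seed: int) -> str:
    rng = random.Random(seed)            # in a real system: the sampler's seed
    candidates = ["yes", "no", "maybe"]
    if temperature == 0.0:
        return candidates[0]             # greedy decoding: always the same answer
    return rng.choice(candidates)        # sampling: still reproducible given the seed

def test_generate_is_reproducible():
    a = generate("Is this software?", temperature=0.0, seed=7)
    b = generate("Is this software?", temperature=0.0, seed=7)
    assert a == b and len(a) > 0         # fixed temperature + seed -> identical output
```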

If it fails a single unit test, the software is incorrect, and should be fixed.

Oh dear. I wish the world worked like this. 

Badly written, buggy software is still software. Not all software works, and it isn't always software's fault. Not all software is fixable or easy to fix.

 

5: Implications

What we call AI in 2024 is not software. It's kind of natural to put it in the same category as other things that run on a computer, but thinking about LLMs, or image generation, or deepfakes as software is misleading, and confuses most of the ethical, political, and technological discussions.

In my experience, thinking of AI as software leads to higher-quality conversations about the issues. Everyone understands at some level that software can break, be misused, or be otherwise suboptimal for any number of reasons.

I have found that when people begin to think AI is not software, they often devolve into dorm room philosophy debates instead of dealing with its many concrete, logical, potentially fixable issues. 

zoop · 5mo · 5-2

I think this post is probably correct, but I think most of the discourse over-complicates what I interpret to be the two core observations:

  1. People condition their posteriors on how and how much things are discussed.
  2. Societal norms affect how and how often things are discussed.

All else follows. The key takeaway for me is that you should also condition your posteriors on societal norms. 

zoop · 8mo · 176

Here be cynical opinions with little data to back them.

It's important to point out that "AI Safety" in an academic context usually means something slightly different from typical LW fare. For starters, since most AI work descended from computer science, it's pretty hard [1] to get anything published in a serious AI venue (conference/journal) unless you

  1. Demonstrate a thing works
  2. Use theory to explain a preexisting phenomenon

Both PhD students and their advisors want to publish things in established venues, so by default one should expect academic AI Safety research to have a near-term prioritization and be less focused on AGI/x-risk. That isn't to say research can't accomplish both things at once, but it's worth noting.

Because AI Safety in the academic sense hasn't traditionally meant safety from AGI ruin, there is a long history of EA-aligned people not really being aware of or caring about safety research. Safety has been getting funding for a long time, but it looked less like MIRI and more like the University of York's safe autonomy lab [2] or the DARPA Assured Autonomy program [3]. With these dynamics in mind, I fully expect the majority of new AI safety funding to go to one of the following areas:

  • Aligning current gen AI with the explicit intentions of its trainers in adversarial environments, e.g. make my chatbot not tell users how to make bombs when users ask, reduce the risk of my car hitting pedestrians.
  • Blurring the line between "responsible use" and "safety" (which is a sort of alignment problem), e.g. make my chatbot less xyz-ist, protecting training data privacy, ethics of AI use.
  • Old school hazard analysis and mitigation. This is like the hazard analysis a plane goes through before the FAA lets it fly, but now the planes have AI components. 

The thing that probably won't get funding is aligning a fully autonomous agent with the implicit interests of all humans (not just trainers), which generalizes to the x-risk problem. Perhaps I lack imagination, but with the way things are, I can't really imagine how you get enough published in the usual venues about this to build a dissertation out of it.

 

[1] Yeah, of course you can get it published, but I think most would agree that it's harder to get a pure-theory x-risk paper published in a traditional CS/AI venue than other types of papers. Perhaps this will change as new tracks open up, but I'm not sure.

[2] https://www.york.ac.uk/safe-autonomy/research/assurance/

[3] https://www.darpa.mil/program/assured-autonomy 

zoop · 1y · 92

The core B/E dichotomy rang true, but the post also seemed to imply a correlated separation between autonomous and joint success/failure modes: building couples succeed/fail on one thing together, entertaining couples succeed/fail on two things separately. 

I have not observed this to be true. Experientially, it seems a little like a quadrant, where the building / entertaining distinction is about the type of interaction you crave in a relationship, and the autonomous / joint distinction is about how you focus your productive energies.

Examples:

  • Building / Joint: (as above) two individuals building a home / business / family together
  • Building / Autonomous: two individuals with distinct careers and interests, who both derive great meaning from helping the other achieve their goals. 
  • Entertaining / Joint: two individuals who enjoy entertainment and focus on that pursuit together. A canonical example might be childless couples who frequently travel, host parties, etc, or the "best friends who do everything together" couple everyone knows.  
  • Entertaining / Autonomous: (as above) individuals with separate lives who come together for conversation, sex, etc. 

I might be extra sensitive to this: my last relationship failed because my partner wanted an "EJ" relationship while I wanted a "BA" relationship, neither of which followed cleanly from the post.

zoop · 2y · 10

"What is intelligence?" is a question you can spend an entire productive academic career failing to answer. Intentionally ignoring the nerd bait, I do think this post highlights how important it is for AGI worriers to better articulate which specific qualities of "intelligent" agents are the most worrisome and why. 

For example, there has been a lot of handwringing over the scaling properties of language models, especially in the GPT family. But as Gary Marcus continues to point out in his inimitable and slightly controversial way, scaling these models fails to fix some extremely simple logical mistakes - logical mistakes that might need to be fixed by a non-scaling innovation before an intelligent agent poses an x-risk. On forums like these it has long been popular to say something along the lines of "holy shit look how much better these models got when you add __ amount of compute! If we extrapolate that out we are so boned." But this line of thinking seems to miss the "intelligence" part of AGI completely; it seemingly has no sense at all of the nature of the gap between the models that exist today and the spooky models they worry about.

It seems to me that we need a better specification for describing what exactly intelligent agents can do and how they get there.

zoop · 2y · 20

I'm seeking some clarification; my reading of your post is that you see the following concepts as intertwined:

  1. Efficient representation of learned information
  2. Efficient learning of information

You point out (and I agree) that transformer parameters live in a small space, and the realities of human biology seem to imply that we can do #1 better, that is, use a "lighter" algorithm with fewer free parameters to store our learned information.

If I understand you correctly, you believe that this "far more efficient architecture trying to get out" would also be better at #2 (require less data to reach this efficient representation). While I agree that an algorithm to do this better must exist, it is not obvious to me that a better compressed/sparse storage format for language models would necessarily require less data to train. 

So, questions: Did I misunderstand you, and if so, where? Are there additional reasons you believe the two concepts to be correlated?