I think a takeaway here is that organizational maze-fulness is like entropy: you can keep it low with constant effort, but it's always going to increase by default.

I feel like there's a better name to be found for this. Like, some name that is very obviously a metaphor for the concept of Sazen, in a way that helps you guess the concept if you've been exposed to it before but have never had a name for it.

Something like "subway map" or "treasure map", to convey that it's a compression of information meant to help you navigate; except the name also needs to express that it's deceptive and may lead to the illusion of transparency, where you think you understood but you didn't really.

Maybe "composite sketch" or "photofit"? It's a bit of a stretch, though.

Reading Worth the Candle with a friend gave us a few weird words that are sazen in and of themselves.

I'd be super interested in specifics, if you can think of them.

One big obstacle you didn't mention: you can make porn with that thing. It's too late to stop it.

More seriously, I think this cat may already be out of the bag. Even if the scientific community and the American military-industrial complex and the Chinese military-industrial complex agreed to stop AI research, existing models and techniques are already widely available on the internet.

Even if there is no official AI lab anywhere doing AI research, you will still have internet communities pooling compute together for their own research projects (especially if crypto collapses and everybody suddenly has a lot of extra compute on their hands).

And these online communities are not going to be open-minded about AI safety concerns. We've seen that already with the release of Stable Diffusion 2.0: the internet was absolutely furious that the model was restricted in (admittedly very limited) ways that impacted performance. People wanted their porn machine to be as good as it could possibly be and had no sympathy whatsoever for the developers' PR / safety / not-wanting-to-be-complicit-with-nonconsensual-porn-fakes concerns.

Of course, if we do get to the point where only decentralized communities do AI research, that will be a pretty big win for the "slowing down" strategy. I get your general point about "we should really exhaust all available options even if we think it's nigh impossible". I just think you're underestimating a bit how nigh-impossible it is. We can barely stop people from using fossil fuels, and that's with a vastly higher level of buy-in from decision-makers.

Good article.

I think a good follow-up article could be one that continues the analogy by examining software development concepts that have evolved to address the "nobody cares about security enough to do it right" problem.

I'm thinking of two things in particular: the Rust programming language, and capability-oriented programming.

The Rust language is designed to remove entire classes of bugs and exploits (with some caveats that don't matter too much in practice). This does add some constraints to how you can build your program; for some developers, this is a dealbreaker, so Rust adoption isn't an automatic win. But many (I don't really have the numbers to quantify better) developers thrive within those limitations, and even find them helpful to better structure their program.
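As a minimal, self-contained illustration of one removed bug class (my own toy example, not anything from a specific codebase): safe Rust has no null pointers, so "maybe absent" values go through Option<T>, and the compiler rejects any code path that forgets to handle the None case.

```rust
// Toy example: absence is a value (Option), not a null pointer.
// The compiler forces every caller to deal with the None case.
fn first_word(s: &str) -> Option<&str> {
    s.split_whitespace().next()
}

fn main() {
    // The caller cannot forget the empty-input case: pattern matching
    // (or a combinator like unwrap_or) is required to reach the &str.
    match first_word("hello world") {
        Some(w) => println!("first word: {}", w),
        None => println!("no words"),
    }

    // An empty input yields None rather than a crash or null dereference.
    assert_eq!(first_word(""), None);
}
```

The "null pointer dereference" class of bugs simply cannot be written in safe Rust; the type system makes the failure mode unrepresentable rather than merely discouraged.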

This selection effect has also led to the Rust ecosystem having a culture of security by design. E.g., a pentest team auditing the rustls crate "considered the general code quality to be exceptional and can attest to a solid impression left consistently by all scope items".

Capability-oriented programming is a more general idea. The concept is pretty old, but still sound: you only give your system as many resources as it plausibly needs to perform its job. If your program needs to take some text and, e.g., count the number of words in that text, you only give the program access to an input channel and an output channel; if the program tries to open a network socket or some file you didn't give it access to, it automatically fails.
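A hedged sketch of that word-count example, using Rust traits as stand-ins for capabilities (this is my own illustration, not the API of any real capability system): the function below can only touch the reader and writer it is explicitly handed, so it has no ambient authority to open files or sockets on its own.

```rust
use std::io::{BufRead, Write};

// Capability-style sketch: the word counter receives only the two
// resources it needs -- something to read from and something to write
// to. Everything else is simply out of reach for this code.
fn count_words<R: BufRead, W: Write>(input: R, mut output: W) -> std::io::Result<usize> {
    let mut total = 0;
    for line in input.lines() {
        total += line?.split_whitespace().count();
    }
    // Report the count on the single output channel we were granted.
    writeln!(output, "{}", total)?;
    Ok(total)
}

fn main() -> std::io::Result<()> {
    // The caller decides which resources to grant; here, in-memory
    // buffers stand in for stdin/stdout.
    let input = "one two three\nfour five".as_bytes();
    let mut output = Vec::new();
    let n = count_words(input, &mut output)?;
    assert_eq!(n, 5);
    Ok(())
}
```

In a real capability OS the enforcement happens at the kernel boundary rather than the type system, but the design principle is the same: authority is passed in explicitly instead of being ambiently available.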

Capability-oriented programming has the potential to greatly reduce the vulnerability of a system, because now, to leverage a remote execution exploit, you also need a capability escalation / sandbox escape exploit. That means the capability system must be sound (with all the testing and red-teaming that implies), but "the capability system" is a much smaller attack surface than "every program on your computer".

There hasn't really been a popular OS that was capability-oriented from the ground up. Similar concepts have been used in containers, WebAssembly, app permissions on mobile OSes, and some package formats like Flatpak. The in-development Google OS "Fuchsia" (or more precisely, its kernel Zircon) is the most interesting project I know of on that front.

I'm not sure what the equivalent would be for AI. I think there was an LW article mentioning a project the author had of building a standard "AI sandbox"? As AI develops, toolboxes that identify a "safe" subset of AIs, ones that can be used without risking side effects while still capturing the economic benefits of "free" AIs, might also be promising.

I read the title plus two lines of the article before I thought "This is going to be a Duncan Sabien essay, isn't it?". Quick author check aaaand, yup.

Good article. I agree with your uncertainty in the end, in that I'm not sure it's actually better at conveying its message than "In Defense of Punch Bug" was.

I'm a bit disappointed by this article. From the title, I thought it would be something like "A list of strategies AI might use to kill all humanity", not "A list of reasons AIs are incredibly dangerous, and people who disagree are wrong". Arguably, it's not very good at being the second.

But "ways AI could be lethal on an extinction level" is a pretty interesting subject, and (from what I've read on LW) somewhat under-explored. Like... what's our threat model?

For instance, the basic Terminator scenario of "the AI triggers a nuclear war" seems unlikely to me. A nuclear war would produce a lot of EMPs, shut down a lot of power plants, and blow up a lot of data centers. Even if the AI is backed up on individual laptops or in Starlink satellites, it would lose any way of interacting with the outside world. Boston Dynamics robots would shut down because there are no more miners producing coal for the coal plant that produced the electricity these robots need to run (and, you know, because of the million other parts of the supply chain being lost).

In fact, even if an unfriendly AI escaped its sandbox, it might not want to kill us immediately. It would want to wait until we've developed some technologies in the right directions: more automation in data-centers and power plants, higher numbers of drones and versatile androids, better nanotechnology, etc.

That's not meant to be reassuring. The AI would still kill us eventually, and it wouldn't sit tight in the meantime. It would influence political and economic processes to make sure no other AI can compete with it. This could take many forms, from the covert (e.g. manipulating elections and flooding social networks with targeted disinformation) to the overt (e.g. assassinating AI researchers or bombing OpenAI datacenters). The point is that its interventions would look "soft" at first compared to the "flood the planet with nanobots and kill everyone at the same time" scenario, because it would be putting its pieces in place for that scenario to happen.

Again, that doesn't mean the AI would lose. If you're Afghanistan and you're fighting against the US, you're not going to win just because the US is unwilling to immediately jump to nukes. In fact, if the US is determined to win at all costs and will prefer using nukes over losing, you're still fucked. But the war will look like you have a fighting chance during the initial phases, because the enemy will be going easy on you in preparation for the final phase.

All that is just uninformed speculation, of course. Again, my main point is that I haven't really seen discussions of these scenarios and what the probable limits of an unfriendly AI would be. The question probably deserves to be explored more.

Alright, sorry. I should have asked "is there any non-weak empirical evidence that...". Sorry if I was condescending.

This seems like a major case study for interpretability.

What you'd really want is to be able to ask the network "In what ways is this woman similar to the prompt?" and have it output a causality chain or something.

Fascinating. Dall-E seems to have a pretty good understanding of "things that should be straight lines", at least in this case.
