lc — LessWrong

Solving adversarial attacks in computer vision as a baby version of general AI alignment

lc1mo*20

This is a cool paper. Quoting for visibility:

I spent the last few months trying to tackle the problem of adversarial attacks in computer vision from the ground up. The results of this effort are written up in our new paper Ensemble everything everywhere: Multi-scale aggregation for adversarial robustness (explainer on X/Twitter). Taking inspiration from biology, we reached state-of-the-art or above state-of-the-art robustness at 100x – 1000x less compute, got human-understandable interpretability for free, turned classifiers into generators, and designed transferable adversarial attacks on closed-source (v)LLMs such as GPT-4 or Claude 3. I strongly believe that there is a compelling case for devoting serious attention to solving the problem of adversarial robustness in computer vision, and I try to draw an analogy to the alignment of general AI systems here.

An Opinionated Guide to Privacy Despite Authoritarianism

lc1mo*150

The best privacy/security guide I am aware of is Michael Bazzells book. Michael Bazzell is a former computer crimes investigator, and his methods are red teamed at least in the sense that he works with e.g. people with extremely determined stalkers. Some things that book goes over that this doesn't:

How to buy your home/car/P.O. box with LLCs and keep them out of your name, how to get a SIM card not tied to you personally.
The who/what/where of how your personal information (incl. address, phone number, etc.) gets collected in the first place, and ends up in public databases (which of course the government also leverages). What you can do about data already there, to the extent that you can do something.
(For people in really advanced situations) How to disinform in a way that actually works & ends up poisoning records, for example by taking up an electricity bill inside a building you don't own.

Obviously the government has capabilities that private individuals don't, so maybe the threat model here is different. For the most part though I would say that peoples' biggest privacy/security risk is that there are infinity public databases with all of their personal information, and anybody with a credit card can pull up their address. Stopping the inflow to those should be priority #1 and the solution isn't even really that digital, it's just arcane legal procedures.

AISLE discovered three new OpenSSL vulnerabilities

lc1mo*240

Two of the bugs AISLE highlighted are memory corruption primitives. They could be used in certain situations to crash a program that was running OpenSSL (like a web server), which is a denial of service risk. Because of modern compiler safety techniques, they can't on their own be used to access data or run code, but they're still concerning because it sometimes turns out to be possible to chain primitives like these into more dangerous exploits.

The third bug is a "timing side-channel bug" with a particular opt-in certificate algorithm that OpenSSL provides, when used on ARM architectures. It's a pretty niche circumstance but it does look legitimate to me. The only way to know if it's exploitable would be to try to build some kind of a PoC.

OpenSSL is a very hardened target, and lots of security researchers look at it. Any security-relevant bugs found on OpenSSL are pretty impressive.

AISLE discovered three new OpenSSL vulnerabilities

lc1mo*190

I don't know if OpenSSL actually goes through the process of minting CVEs for a lot of the security problems they patch, so this may sound more impressive than it is. My company has reported several similar memory corruption primitives to OpenSSL in the last month found by our scanner and I'm not sure if we ever got any CVEs for it.

Because AI security startups are trying to attract media attention, they have a habit of crediting findings to an AI when they actually involved a bunch of human effort - especially when their tools are not publicly available. You should be healthily skeptical of anything startups report on their own. For a practitioner's perspective on the state of security scanning, there was a blog post posted last month that provided a good independent overview at the time: https://joshua.hu/llm-engineer-review-sast-security-ai-tools-pentesters^[1]

^{^}
Full disclosure: we've since hired this guy, but we only reached out to him after he posted this blog.

1a3orn's Shortform

lc1mo2941

As a rationalist I also strongly dislike subtweeting

Cheap Labour Everywhere

lc1mo30

One small thing I noticed when living in India is how the escalators would stop moving when people got off them, just to save a little power.

Shortform

lc1mo92

As soon as you convincingly argue that there is an underestimation, it goes away

It's not a belief. It's an entire cognitive profile that affects how they relate to and interact with other people, and the wrong beliefs are adaptive. For nice people, treating other people you know as nice-until-proven-evil opens up a much wider spectrum of cooperative interactions. For evil people, genuinely believing the people around you are just as self-interested gives you a bit more cover to be self-interested too.

Shortform