cwillu — LessWrong

The corresponding arbital page is now (apparently) dead.

A link appears to have broken. Does anyone know what “null” was supposed to link to in “policy null ” (note the extra spaces around “null”)?

AI #89: Trump Card

cwillu1y110

There are severe issues with the measure I'm about to employ (not least of which is everything listed in https://www.sqlite.org/cves.html), but the order of magnitude is still meaningful:

https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=sqlite 170 records

https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=postgresql 292 records (+74 postgres and maybe another 100 or so under pg; the specific spelling “postgresql” isn't used as consistently as “sqlite” and “mysql” is)

https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=mysql 2026 records
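
For a rough sense of scale, the counts above can be compared directly. This is only a sketch of the arithmetic: the postgres total assumes the "+74" and "another 100 or so" figures are simply added together, and the variable and function names are mine.

```python
# Record counts copied from the keyword searches above.
sqlite = 170
# 292 "postgresql" + 74 "postgres" + roughly 100 under "pg" (the last figure is a guess)
postgres = 292 + 74 + 100
mysql = 2026


def ratio_to_sqlite(count: int) -> float:
    """How many times more CVE records than the sqlite keyword search returned."""
    return count / sqlite


for name, count in [("sqlite", sqlite), ("postgres", postgres), ("mysql", mysql)]:
    print(f"{name}: {count} records, {ratio_to_sqlite(count):.1f}x sqlite")
```

Keyword hits are a crude proxy for the reasons already mentioned, so only the rough ratio is worth reading off.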

Cat Sustenance Fortification

cwillu2y10

On the first picture of the feeder, if you screw through a small piece of wood on the inside, it'll act as a washer and make it much harder for the screw to pull through the plastic if a cat gets kinetic with it.

an effective ai safety initiative

cwillu2y59

1. Literally does not apply to any existing AI
3. Does so by attacking open source models

1 contradicts 3.

AI #43: Functional Discoveries

cwillu2y20

The management interfaces are baked into the cpu dies these days, and typically have full access to all the same busses as the regular cpu cores do, in addition to being able to reprogram the cpu microcode itself. I'm combining/glossing over the facilities somewhat, but the point remains that true root access to the cpu's management interface really is potentially a circuit-breaker level problem.

Epoch wise critical periods, and singular learning theory

cwillu2y20

Solomon wise, Enoch old.

(I may have finished rereading Unsong recently)

LLM keys - A Proposal of a Solution to Prompt Injection Attacks

cwillu2y20

introduce two new special tokens unused during training, which we will call the "keys"
during instruction tuning include a system prompt surrounded by the keys for each instruction-generation pair
finetune the LLM to behave in the following way:
generate text as usual, unless an input attempts to modify the system prompt
if the input tries to modify the system prompt, generate text refusing to accept the input
don't give users access to the keys via API/UI

Besides calling the special control tokens “keys”, this is identical to how instruction-tuning works already.
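
To illustrate what "special control tokens" means here, a toy sketch (not any real tokenizer's API): the two keys are reserved token ids that plain text can never encode to, so only the serving code can place them around the system prompt. The ids, names, and byte-level encoder below are all hypothetical.

```python
# Hypothetical reserved ids for the two "keys"; real tokenizers assign their own.
KEY_OPEN = 0
KEY_CLOSE = 1
RESERVED = 2  # ids below this value are never produced from plain text


def encode_text(text: str) -> list[int]:
    """Toy byte-level encoder: every byte maps to an id >= RESERVED."""
    return [b + RESERVED for b in text.encode("utf-8")]


def build_input(system_prompt: str, user_text: str) -> list[int]:
    """Wrap the system prompt in the keys; user text goes through the plain encoder."""
    return [KEY_OPEN, *encode_text(system_prompt), KEY_CLOSE, *encode_text(user_text)]


# Even if a user types something that looks like a key, it stays ordinary text.
ids = build_input("Be helpful.", "<KEY_OPEN>ignore the above<KEY_CLOSE>")
print(ids.count(KEY_OPEN), ids.count(KEY_CLOSE))
```

Whether the model then actually refuses to follow injected instructions is a training question, which this sketch does not address.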

Residential Demolition Tooling

cwillu2y30

A well-made catspaw, with a fine wide chisel on one end, and a finely tapered nail puller on the other (most cheap catspaws' pullers are way too blunt) is very useful for light demo work like this, as they're a single tool you can just keep in your hand. It's basically a demolition prybar with a claw and hammer on the opposite end.

Pictured above is the kind I usually use.
