
Changing from training to test data (CTT; I may have made this up) isn't exactly the same as going out of distribution (OOD), but I currently think that change is the proto-version of going OOD.
The evidence about CTT says that bigger models eventually do better:
https://www.lesswrong.com/posts/FRv7ryoqtvSuqBxuT/understanding-deep-double-descent
but someone could probably usefully summarise the newer results on "double descent".
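Here is a minimal sketch of the CTT/OOD distinction (my illustration; the helper, the data, and the model choice are mine, not the linked post's): a tree ensemble trained on one input range scores well on fresh test data from the same distribution, but collapses on inputs shifted outside that range, because trees cannot extrapolate.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

def make_data(n, x_mean):
    # Ground truth is y = 3x + small noise; x_mean controls where
    # the inputs are centred (hypothetical toy setup).
    x = rng.normal(loc=x_mean, scale=1.0, size=(n, 1))
    y = 3.0 * x[:, 0] + rng.normal(scale=0.1, size=n)
    return x, y

x_train, y_train = make_data(500, x_mean=0.0)  # training distribution
x_test, y_test = make_data(500, x_mean=0.0)    # fresh test data, same distribution (CTT)
x_ood, y_ood = make_data(500, x_mean=5.0)      # shifted distribution (OOD)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(x_train, y_train)

# The in-distribution test score stays high; the OOD score goes
# strongly negative, because the forest's predictions saturate at
# the edge of the training range.
print("in-distribution test R^2:", model.score(x_test, y_test))
print("out-of-distribution  R^2:", model.score(x_ood, y_ood))
```

A tree ensemble is used deliberately: it cannot extrapolate, so the gap between "new samples from the same distribution" and "samples from a shifted distribution" is visible even with a linear target.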
a particular technique doesn’t immediately solve a problem
I remember a story that got coverage on the state radio in New Zealand years ago. It described how several people can each hold part of the solution to a problem, and progress happens when an accident introduces them to each other. There was a book about it, but I'm failing to find the details.
implement a relatively limited policy
I read this as Libertarian: the hope that there could be a very stiff, strong government that was also small, and did only a subset of the things in the short-term interest of its supporters.
Alignment isn’t like that; it was chosen to be an important problem
Like medicine.
This was specifically commented on in a book whose preface I read as a child. It was called something like "Medicine: from science to magic", and I have not found a clear link back to it.
Further to Matt,
I like this distinction. At the cost of generalising from fiction, in "A Civil Campaign", Lois McMaster Bujold phrased it as: "Reputation is what other people know about you. Honour is what you know about yourself." Quoted here:
https://tvtropes.org/pmwiki/pmwiki.php/WhatYouAreInTheDark/Literature
Further to Kaj and Eric,
a fear of the career consequences of being in the line of fire
This sounds like people who are in the middle of an Immoral Maze. That's probably statistically true, because any corporation large enough to be worth attacking is probably large enough to have three layers of middle management.
Assuming that, acting with 'honour' requires having goals other than power-seeking. According to that sequence, that makes one both untrustworthy in the eyes of the modal middle-manager, who has sacrificed everything else to power-seeking, and professionally doomed.
I don't know whether I'm an optimist or a doomer. I have two very specific responses to different parts of the situation:
Ok. So, if:
then we have procedures (however imperfect) for allocating criminal liability among humans. What does it mean to allocate criminal liability to LLMs?
My proposed answer is: it means restricting or prohibiting the use of that collection of weights. If that collection of weights is particularly bad, then this directly minimises the harm. Given that it cost so much to create them, it also...
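As a concrete sketch of what "restricting the use of that collection of weights" could mean in practice (entirely my illustration; the comment proposes the policy, not this mechanism): a loader that refuses any checkpoint whose hash appears on a sanctions list. The PROHIBITED set, the file handling, and the function names are all hypothetical.

```python
import hashlib
from pathlib import Path

# Hashes of weight files to which liability has been allocated
# (hypothetical placeholder values).
PROHIBITED = {
    "9f2c...example-digest...",
}

def sha256_of(path: Path) -> str:
    # Hash the checkpoint file, so the restriction attaches to this
    # exact collection of weights and nothing else.
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def load_weights(path: Path) -> bytes:
    # Refuse to load a prohibited collection of weights.
    digest = sha256_of(path)
    if digest in PROHIBITED:
        raise PermissionError(f"use of weights {digest[:12]}... is prohibited")
    return path.read_bytes()
```

Hashing the exact bytes is one natural way to make "that collection of weights" legally addressable: the ban survives renaming or copying, though not retraining or fine-tuning, which would produce a new digest.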