I think “read the docs” is fine advice. But docs rot, and most docs are bad. If possible, read the code?
Reading the code doesn’t mean grokking the entire cpp code base while reading by candlelight in 1000-day monk mode. Reading the code means understanding the code path - descending from your first contact point down to the stateless functions that do the heavy lifting. This can be done with an actual debugger, stepping through each stage to see how the sausage is made. Reading the code is slow, sometimes painful. The alternatives are worse.
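If a full debugger session feels heavy, a rough sketch of the same idea is to record which functions get called on the way down. Everything here is made up for illustration: `handle_request`, `parse` and `normalize` are hypothetical stand-ins for a real code path, and `trace_calls` is a small helper built on Python's standard `sys.settrace`, not a replacement for actually stepping through with `pdb`.

```python
import sys


def normalize(text):
    # Hypothetical stateless leaf: the function doing the heavy lifting.
    return text.strip().lower()


def parse(text):
    # Hypothetical middle layer between the entry point and the leaf.
    return normalize(text).split()


def handle_request(text):
    # Hypothetical first contact point.
    return parse(text)


def trace_calls(func, *args):
    """Run func(*args); return (result, names of Python functions entered, in order)."""
    calls = []

    def tracer(frame, event, arg):
        # The global trace hook fires with "call" for each new Python frame.
        if event == "call":
            calls.append(frame.f_code.co_name)
        return None  # no line-by-line tracing inside the frame

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)  # restore whatever tracer was active before
    return result, calls


if __name__ == "__main__":
    result, calls = trace_calls(handle_request, "  Hello World ")
    print(" -> ".join(calls))  # handle_request -> parse -> normalize
    print(result)              # ['hello', 'world']
```

Built-in methods like `str.strip` don't show up because they are implemented in C; for those you are back to reading the source or using a native debugger.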
A piece of bad advice is to use a “better” language. I guess that Raemon uses weakly typed languages beyond the base rate of JS/Python users. Many of the complaints here don’t really make sense if you write in APL, Haskell or even Rust. When you create the right abstraction, or better yet, have no abstraction, there is no meta state of the program to keep in your mind.
You probably don’t like the term LLM because it doesn’t describe capability. And most models are multimodal these days, so it is not just natural language.
You also wouldn’t like the term Autoregressive/Next-token predictor, again because it says what it does, not what it is capable of.
AI is a pretty good term, overloaded as it is.
go-away is my personal choice.
Doesn’t require weird JS and text mode browsing like Anubis. Widely(ish) used. Not nuclear like Anubis.
Not a downvoter, but I am put off by things like:
| Runs 93x faster than Zephyr 7B
On a… what? A potato? A consumer GPU that doesn’t fit all of the 7B model so it is mem-moribund? Things with “patent pending” (nothing wrong with patents!) and permitting grad students to use it “for their degrees”. Just enough little vibe nudges that I feel confused and unmotivated to actually read the code/paper.
FYI there is branching on Claude desktop.
Fair. In Sarah Constantin's terminology, it seems you aspire to "potentially take a stand on the controversy, but only when a conclusion emerges from an impartial process that a priori could have come out either way". I... really don't know if I'd call that neutrality in the sense of the normal daily usage of neutrality. But I think it is a worthy and good goal.
I don't think Cole is wrong.
Lesswrong is not neutral because it is built on the principle that a walled garden ought to be defended from pests and uncharitable principles. Where politics can kill minds. Out of all possible distributions of human interactions we could have on the internet, we pick this narrow band because that's what makes for high-quality interaction. It makes us well calibrated (relative to baseline). It makes us more willing to ignore status plays and disagree with our idols.
All these things I love are not neutrality. They are deliberate policies for a less wrong discourse. Lesswrong is all the better because it is not neutral. And just because neutrality is a high-status word, one where an impartial judge may seem to be, doesn't mean we should lay claim to it.
The claim about “no systematic attempt at making a good [prompt]” is just not true?
See:
https://gwern.net/style-guide
Is this earth-shattering as an observation? No.
But it is a good retelling of a phenomenon that most of us experience personally (density of time lived corresponding to novelty), tied in with an interesting personal backstory. This is a good post that I enjoyed reading.
I am sympathetic to the revolutionary vanguard approach. But MAPLE seems to have even less success with the approach than the SRs of the Russian Revolution. At least the SRs tried going to the people first - telling them “peasants, you are being exploited and you could change your condition!”.
From the outside, I don’t see any saintly accomplishments that look like “trying to apprehend alignment” at a robust mechanical level.