Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.
This is a special post for short-form writing by So8res. Only they can create top-level comments. Comments here also appear on the Shortform Page and All Posts page.
A number of years ago, when LessWrong was being revived from its old form to its new form, I did not expect the revival to work. I said as much at the time. For a year or two in the middle there, the results looked pretty ambiguous to me. But by now it's clear that I was just completely wrong--I did not expect the revival to work as well as it has to date.
Oliver Habryka in particular wins Bayes points off of me. Hooray for being right while I was wrong, and for building something cool!
Aww, thank you! ^-^
LW is more of an AI Alignment forum now, but without the mad-science agent-foundations spirit of the earlier post-Sequences era. It's probably able to stay alive because the field grew. So it's not revived; it's something new. Development of rationality mostly remains in the past, with little significant discussion in recent years.
Yeah, I think some of this is true. While there is a lot of AI content, I actually think a lot of the same people would probably write non-AI content, and engage with non-AI content, if AI were less urgent, or if the site had less existing AI content.
That counterfactual is hard to evaluate, but a lot of people who used to be core contributors to LW 1.0 are now also posting to LW 2.0, though they now post primarily about AI. I think that's evidence of a broader shift among LW users toward seeing AI as really urgent and important, rather than of a very different new user base having been discovered.
I kind of agree on development of rationality feeling kind of stagnant right now. I think there are still good posts being written, but a lot of cognitive energy is definitely going into AI stuff, more so than rationality stuff.
I would love to be able to stop worrying about AI and go back to improving rationality. Yet another thing to look forward to once we leap this hurdle.
Totally agree. Oliver & co. won tons of Bayes points off me.
Same! LW is an outstanding counterexample to my belief that resurrections are impossible. But I haven't incorporated it into my gears-level model yet, and I'm unsure how to. What did LW do differently, or which gear in my head caused me to fail to predict this?
The original LW was a clone of Reddit. The Reddit source code was quite complex. I am a software developer; I have looked at that code myself, tried to figure out some things, then gave up.
I do not remember whether I made any predictions at that time. But ignoring what I know now, I probably would have said the following:
If I understand it correctly, what happened is that some people got paid to work on this full-time, and they turned out to be very good at their job. They rewrote everything from scratch, which was probably the easier way, but it required a lot of time, and a lot of trust, because it was "either complete success, or nothing" (as opposed to gradually adding new features to the Reddit code).
This is about what I was going to say in response, before reading your comment.
I think the key factor that makes it different from other examples is that it was a competent person's full time job.
There are some other things that need to go right in addition to that, but I suspect that there are lots of things that people are correctly outside-view gloomy about, which can nevertheless just be done if someone makes it their first priority.
Things that need to go right:
Other than the unexpected disasters, it seems like something that a competent civilization should easily do. Once you have competent people, allow them to demonstrate their competence, look for intersection between what they want to do and what you need (or if you are sufficiently rich, just an intersection between what they want to do and what you believe is a good thing), give them money, and let them work.
In real life, having the right skills and sending the right signals are not the same thing; the people who do things are not the same as the people who decide things; and time is wasted on meetings and paperwork.
That anyone with any agency and competence was working on it as their primary goal, as opposed to nobody doing so.
Just to be clear: there were people working on it who had both agency and competence, but they were working on it as a side project. I think having something be someone's sole priority and full-time job makes a large difference in how much agency they can bring to bear on a project.
I was recently part of a group-chat where some people I largely respect were musing about this paper and this post and some of Scott Aaronson's recent "maybe intelligence makes things more good" type reasoning.
Here's my replies, which seemed worth putting somewhere public:
See also instrumental convergence.
And then in reply to someone pointing out that the paper was perhaps trying to argue that most minds tend to wind up with similar values because of the fact that all minds are (in some sense) rewarded in training for developing similar drives:
Someone recently privately asked me for my current state on my 'Dark Arts of Rationality' post. Here's some of my reply (lightly edited for punctuation and conversation flow), which seemed worth reproducing publicly:
I've also gone ahead and added a short retraction-ish paragraph to the top of the dark arts post, and might edit it later to link it to the aforementioned update-posts, if they ever make it out of the editing queue.
A few people recently have asked me for my take on ARC evals, and so I've aggregated some of my responses here:
- I don't have strong takes on ARC Evals, mostly on account of not thinking about it deeply.
- Part of my read is that they're trying to get a small, dumb, minimal version of the thing up and running so they can scale it into something real. This seems good to me.
- I am wary of people in our community inventing metrics that Really Should Not Be Optimized and handing them to a field that loves optimizing metrics.
- I expect there are all sorts of issues that would slip past them, and I'm skeptical that the orgs-considering-deployments would actually address those issues meaningfully if issues were detected ([cf](https://www.lesswrong.com/posts/thkAtqoQwN6DtaiGT/carefully-bootstrapped-alignment-is-organizationally-hard)).
- Nevertheless, I think that some issues can be caught, and attempting to catch them (and to integrate with leading labs, and make "do some basic checks for danger" part of their deployment process) is a step up from doing nothing.
- I have not tried to come up with better ideas myself.
Overall, I'm generally enthusiastic about the project of getting people who understand some of the dangers into the deployment-decision loop, looking for advance warning signs.
Reproduced from a twitter thread:
I've encountered some confusion about which direction "geocentrism was false" generalizes. Correct use: "Earth probably isn't at the center of the universe". Incorrect use: "All aliens probably have two arms with five fingers."
The generalized lesson from geocentrism being false is that the laws of physics don't particularly care about us. It's not that everywhere must be similar to here along the axes that are particularly salient to us.
I see this in the form of people saying "But isn't it sheer hubris to believe that humans are rare with the property that they become more kind and compassionate as they become more intelligent and mature? Isn't that akin to believing we're at the center of the universe?"
I answer: no; the symmetry is that other minds have other ends that their intelligence reinforces; kindness is not privileged in cognition any more than Earth was privileged as the center of the universe; imagining all minds as kind is like imagining all aliens as 10-fingered.
(Some aliens might be 10-fingered! AIs are less likely to be 10-fingered, or to even have fingers in the relevant sense! See also some of Eliezer's related thoughts)