Seems worthwhile as a way to simplify conversations with people who seem to be confused, but I think this isn't a reality-mapping exercise and probably makes it harder to see the structure of reality, which is kinda sad even if it's useful for talking with some people.
I would guess the type signature of human beliefs and goals and desires is at least fairly often closer to the LLM quasi-x than to the crisp mathematical idealizations of those concepts.
Humans are kinda a world model with a self-character, I think; distancing LLMs from this by implying that LLMs' beliefs, goals, and desires are super different brings people's beliefs further from tracking reality.
I'd guess this terminology is fairly applicable to humans too?
This is pretty close, but a more central crux is something like: does a system fractally slip towards power-seeking across all parameters left free?[1]
Even if it's very hard to screw the world over at a given power level, if each AI/system/Kami has a ratcheting internal selection pressure towards being more dominated by power-seeking subsystems, eventually the world gets screwed.
The crux you listed is important for how fast the world is destroyed without a singleton, but not really relevant for whether it is destroyed without a singleton.
Non-free parameters are ones pinned down by formal/well-defined things held in place by optimization, or by stronger systems or meta-systems effectively enforcing properties that a system must maintain.
Yes, current agents are not great at value handshakes/merging, so we're only being eaten by Moloch at a moderate pace.
In Jan 2023[1] I ran through the bio anchors notebook[2], which lets you set a wide variety of variables that go into the model, and changed a bunch of things to what seemed like first-pass reasonable values (e.g. setting the "evolution anchor", i.e. the assumption that it will take as much compute to train AGI as was used across all brains in all of evolutionary time, to zero weight, along with dozens of other minor parameters). Almost all of the parameters that seemed off were off in the direction of making timelines look longer. When everything settled, I got:
But, I noted that
this model does not take into account increases in the speed of algorithmic or hardware improvements due to AI, which is already starting to kick in (https://www.lesswrong.com/posts/camG6t6SxzfasF42i/a-year-of-ai-increasing-ai-progress), so I expect timelines will actually be notably shorter than that.
I don't remember exactly what timelines I had in mind, but the mean was probably something like 1-3 years sooner.
I think bio anchors is a kinda interesting and at least vaguely informative framework[3], and mostly think the wildly long timelines look like they were the result of picking many free variables in individually mildly biased ways in a complex model?
Discord message link, if you're on Rob Miles's discord.
https://docs.google.com/spreadsheets/d/1gbcJjSN1_E7UTSngIqcvueGDAULjjOajzzft6wX5vBA/edit#gid=505210495 if anyone wants to see how I got to that, though most of the work was going through https://colab.research.google.com/drive/1YRf0AA6x57rk3xwcMCE1SfkvWZ11YbJ_?authuser=1#scrollTo=5qd4DJ2y-X60
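To illustrate the mechanics being described (not the actual notebook's code), here's a toy sketch of how a bio-anchors-style estimate works as a weighted mixture over compute "anchors", and how zeroing one anchor's weight shifts the result. All numbers and names here are made up for illustration:

```python
# Toy sketch of a bio-anchors-style weighted mixture over compute anchors.
# Anchor values are illustrative log10(FLOP) figures, NOT the report's numbers.
anchors = {
    "lifetime": 27.0,    # compute ~ one human lifetime of experience
    "neural_net": 32.0,  # compute ~ scaled-up neural net extrapolation
    "genome": 33.0,      # compute ~ genome-sized parameter count
    "evolution": 41.0,   # compute ~ all brains across evolutionary time
}
weights = {
    "lifetime": 0.2,
    "neural_net": 0.5,
    "genome": 0.2,
    "evolution": 0.1,
}

def mean_log_flop(weights, anchors):
    """Weight-averaged log10 training compute across anchors."""
    total = sum(weights.values())
    return sum(weights[k] * anchors[k] for k in anchors) / total

baseline = mean_log_flop(weights, anchors)

# Setting the evolution anchor's weight to zero, as described above,
# drops the most compute-hungry hypothesis and renormalizes the rest,
# pulling the mixture's compute requirement down (shorter timelines).
weights["evolution"] = 0.0
adjusted = mean_log_flop(weights, anchors)

print(baseline, adjusted)  # adjusted is lower than baseline
```

The point of the sketch is just that many individually small weight choices compound: each parameter nudged in one direction moves the mixture the same way, which is the "individually mildly biased free variables" effect described above.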
as a good rationalist, i must earn the 'disagree with yudkowsky on at least one thing' badge and this is how i'm earning it, damnit. my old one of 'emergence is a real concept with any use at all but people often misuse it as a curiosity stopper' has expired as he hasn't emphasised that one since like 2007.
This post was an experiment in trimming down to a very core point, and making it cleanly, rather than covering lots of arguments for the thesis. I think it succeeded, and I mostly stand behind the main claim (interp is insufficient for saving the world and has strong potential to boost capabilities). On the downside, commenters raised other lines of reasoning for the dominance and harms of interp, such as: interp helps train people for normal ML jobs, or interp is easy for labs to evaluate with their core competency.
I think I endorse making one clean point and letting the other angles bubble up in the comments over doing an extensive complicated article as is often seen.
I'm also pretty happy with the approach of making the straight readthrough as short as possible and dumping lots of bonus info into footnotes.
I broadly intend to use a similar style, though maybe to a lesser extent, going forwards.
Once you get a sense of how annealing feels, you can do it imo much more safely without the psychedelics using forms of meditation practice centered on noticing what causes clean versions of the qualia associated with annealing. Non-goal-directed-ness seems central.
glad people are noticing. it won't be enough to stop all leaks though, realistically.
it's fun how all the safety worries and intricate plans to prevent failure modes tend to get invalidated by "humans do the thing that bypasses the guardrails". e.g. for years people would say things like "of course we won't connect it to the internet / let it design novel viruses or proteins / make lethal autonomous weapons".
my guess is the law of less dignified failure has a lot of truth to it.
[set 200 years after a positive singularity at a Storyteller's convention]
If We Win Then...
My friends, my friends, good news I say
The anniversary’s today
A challenge faced, a future won
When almost came our world undone
We thought for years, with hopeful hearts
Past every one of the false starts
We found a way to make aligned
With us, the seed of wondrous mind
They say at first our child-god grew
It learned and spread and sought anew
To build itself both vast and true
For so much work there was to do
Once it had learned enough to act
With the desired care and tact
It sent a call to all the people
On this fair Earth, both poor and regal
To let them know that it was here
And nevermore need they to fear
Not every wish was it to grant
For higher values might supplant
But it would help in many ways:
Technologies it built and raised
The smallest bots it could design
Made more and more in ways benign
And as they multiplied untold
It planned ahead, a move so bold
One planet and 6 hours of sun
Eternity it was to run
Countless probes to void disperse
Seed far reaches of universe
With thriving life, and beauty's play
Through endless night to endless day
Now back on Earth the plan continues
Of course, we shared with it our values
So it could learn from everyone
What to create, what we want done
We chose, at first, to end the worst
Diseases, War, Starvation, Thirst
And climate change and fusion bomb
And once these things it did transform
We thought upon what we hold dear
And settled our most ancient fear
No more would any lives be stolen
Nor minds themselves forever broken
Now back to those far speeding probes
What should we make be their payloads?
Well, we are still considering
What to send them; that is our thing.
The sacred task of many aeons
What kinds of joy will fill the heavens?
And now we are at story's end
So come, be us, and let's ascend