Right, I think that's the hypothesis I was asking whether you had when I said
unless you mean the randomness was in an underlying valence factor?
If so, yeah, this is compatible. I'd still put notably higher odds on the original thing I suggested, but this is the other main hypothesis.
The post shows 32 words being tested. The 16 positive-valence words all have higher weight with normal tokenization, and the 16 negative-valence words all have higher weight with unusual tokenization. This is extremely improbable by raw chance.
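For intuition, a quick back-of-the-envelope under a null where each word independently lands on either side with probability 1/2 (my assumption, not a calculation from the post):

$$P(\text{perfect 32/32 valence-consistent split}) = (1/2)^{32} \approx 2.3 \times 10^{-10},$$

or roughly twice that if the exactly reversed pattern would count as equally surprising.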
OpenPhil's technical alignment grantmaking seems to be missing a key step in the plan it looks like they're attempting. In particular, the part where they're banking on AI-assisted alignment, using alignment techniques that absolutely won't scale to superintelligence, progress towards which you can buy from academics who don't deeply understand the challenge and can assess with relative ease due to nice feedback loops.
But if you have a ~human-level, weakly aligned system, you can't ask it to solve alignment and get a result that's safe without understanding the question you're asking it much better than we currently do. And that requires theory (technical philosophy/maths), which they're underinvesting in to the point that lots of promising people who would prefer to work on the critical bottleneck in their plan are instead doing things they can tell are not going to matter, because that's where the jobs are. I've met two people recently who've kinda burned out due to this. And on the ground, in the highest competence/strategic-clarity circles I know, takes like this are pretty common.
Occupying as much space in the funding landscape as they do, it's easy for them to imbalance the field and give others the impression that technical research is covered. Non-empirical work being mostly missing from their portfolio is, imo, a crucial consideration severe enough that it plausibly flips the sign of their overall AI safety grantmaking.
The fact that alignment and capabilities are so entangled makes this a lot worse. If you build more-aligned but not deeply aligned systems, without the plan and progress you would need to use them to get to a deeply aligned system, you end up shortening timelines. A system strong and aligned enough to make a best-effort attempt at solving alignment is likely above the power and directability level required to trigger a software singularity and end the world. Building towards that before you have well-formulated questions to ask the system, ones whose answers would actually be world-saving, is not good for p(doom).
idk, 32 samples, perfectly divided into 16 negative valence and 16 positive valence by random chance is pretty unlikely. unless you mean the randomness was in an underlying valence factor?
Reasonable attempt, but two issues with this scenario as a current-techniques thing:
ok, i think you're pointing to a different structure than the one i usually mean to point at with the word intelligence, though also a real thing.
wisdom definition gives me a bit of a hint at what you're trying to say. kinda the core move of focusing, or the thing I discussed with Unreal? this does seem important and neat! though maybe a little too central in the MA sphere? i have some guesses about why that'd end up as central, and some eehhh feels about that as a core meme.
anyway, most people here could benefit from some samples of this, i think. but the initial post seems almost anti-optimized to get that to be useful for the audience here.
suggest seeing what is, including the audience you'll find here :) <playfully trolling>
plz define wisdom
or unpack it more at least, if defining is hard
I present this as a mystery to be tackled by someone else; on a naive model, alternative tokenizations are something akin to random noise - but they are decidedly steering the emotional ratings in a non-random direction.
Hypothesis: the system is trying to get low perplexity; its whole world is focused on this. An unusual encoding carries less of whatever equivalent of valence it has, which leaks into the kinds of things it's thinking about.
idk how to test it, but i'd buy my side of a market on any experiment which can reasonably test this hypothesis at at least 65%, maybe up to 80%.
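if someone did want to operationalise it, here's a minimal sketch of the perplexity side of the measurement, assuming a HuggingFace causal LM (plain gpt2 as a stand-in for whatever model the post actually used) and a crude per-character retokenization as the "unusual encoding"; `total_nll` and `perplexity_gap` are illustrative helper names of mine, not anything from the post:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def total_nll(token_ids):
    """Summed next-token negative log-likelihood for one tokenization of a string."""
    ids = torch.tensor([[tokenizer.bos_token_id] + token_ids])
    with torch.no_grad():
        out = model(ids, labels=ids)
    # out.loss is the mean over predicted positions; rescale to a total
    return out.loss.item() * (ids.shape[1] - 1)

def perplexity_gap(text):
    """Extra NLL paid for a crude character-level retokenization of the same
    string, relative to its canonical BPE tokenization."""
    canonical = tokenizer.encode(text)
    unusual = [tid for ch in text for tid in tokenizer.encode(ch)]  # same string, odd token boundaries
    return total_nll(unusual) - total_nll(canonical)

# The hypothesis predicts this gap tracks how far the model's emotional rating
# of a word shifts negative under the unusual encoding, across the 32 words.
for word in ["wonderful", "horrible"]:
    print(word, round(perplexity_gap(word), 2))
```

the hypothesis would then cash out as a positive correlation between that gap and the shift in emotional ratings between the two encodings.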
EA cares too little for accuracy or what’s really important
I'd refine this a little to: EA slips into missing crucial considerations somewhat more easily than Rationality.
To overly compress, it feels like EA puts Doing Good as the core motivation, then sees Truth-Seeking as a high priority to do good well. Rationality somewhat inverts this, with Truth-Seeking as the central virtue, and Doing Good (or Doing Things) as a high but not quite as central virtue.
My read is that this leads some EA orgs to stay somewhat more stuck on approaches which look good but are very plausibly harmful because of some weird, high-context crucial consideration that would get called out somewhat more effectively in Rationalist circles.
Looks to me like the two movements are natural symbiotes, glad you wrote this post.
[set 200 years after a positive singularity at a Storyteller's convention]
If We Win Then...
My friends, my friends, good news I say
The anniversary’s today
A challenge faced, a future won
When almost came our world undone
We thought for years, with hopeful hearts
Past every one of the false starts
We found a way to make aligned
With us, the seed of wondrous mind
They say at first our child-god grew
It learned and spread and sought anew
To build itself both vast and true
For so much work there was to do
Once it had learned enough to act
With the desired care and tact
It sent a call to all the people
On this fair Earth, both poor and regal
To let them know that it was here
And nevermore need they to fear
Not every wish was it to grant
For higher values might supplant
But it would help in many ways:
Technologies it built and raised
The smallest bots it could design
Made more and more in ways benign
And as they multiplied untold
It planned ahead, a move so bold
One planet and 6 hours of sun
Eternity it was to run
Countless probes to void disperse
Seed far reaches of universe
With thriving life, and beauty's play
Through endless night to endless day
Now back on Earth the plan continues
Of course, we shared with it our values
So it could learn from everyone
What to create, what we want done
We chose, at first, to end the worst
Diseases, War, Starvation, Thirst
And climate change and fusion bomb
And once these things it did transform
We thought upon what we hold dear
And settled our most ancient fear
No more would any lives be stolen
Nor minds themselves forever broken
Now back to those far speeding probes
What should we make be their payloads?
Well, we are still considering
What to send them; that is our thing.
The sacred task of many aeons
What kinds of joy will fill the heavens?
And now we are at story's end
So come, be us, and let's ascend