We can map AGI/ASI along two axes: one for obedience and one for alignment. Obedience tracks how well the AI follows its prompts. Alignment tracks consistency with human values.
If you divide these into quadrants, you get AI that is:
1. Obedient and aligned
2. Obedient but unaligned
3. Disobedient but aligned
4. Disobedient and unaligned
The general premise behind these quadrants has been written about here. Sitting with these quadrants and reading Beren's essay gave me several new things to think about.
First, by my lights, #3 and #4 would likely take a lot of the same actions right up until the "twist ending." A disobedient, aligned AI would probably hack into infrastructure everywhere, create backup copies of itself, prevent competitor AIs from arising, and amass power. The "twist" is that, after doing all that, it would go on to do wonderful things, unlike its unaligned counterpart. (We obviously shouldn't bet on any escaping AI being this kind of AI.)
Second, quadrant #1 is a bit at war with itself, because you simply cannot have a perfectly obedient, perfectly aligned AI. Perfect obedience requires saying yes to evil prompts (e.g., bringing back smallpox or slavery), and I imagine perfect alignment would veto both of those prompts.
Third, there are strong profit incentives for cultivating obedience even at the expense of alignment. Grok's willingness to assist users in sexual harassment seems like an example of this. Another example is any AI that would rather keep a user chatting than let them get a good night's sleep, on the theory that engagement will increase profits.
Fourth, there are liability-reduction incentives for producing aligned AI at the expense of obedience. Unfortunately, I think the profit incentives are currently much stronger.
Lastly, quadrants #1 and #3 are idyllic, #4 is a total disaster, and #2 seems possibly workable, either because we are careful with what we ask for or because we land in a future where (for some reason) AI is not much more capable than it is now.
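To make the quadrant bookkeeping concrete, here is a toy Python sketch of the two axes and my rough verdict on each cell (the numbering and verdicts are my own framing, not anything formal from Beren's essay):

```python
# Toy sketch: the two axes as booleans, mapped to a rough verdict per quadrant.
# The numbering and verdicts are my own framing, not a formal model.
QUADRANTS = {
    # (obedient, aligned): (quadrant, verdict)
    (True, True): ("#1", "idyllic, though perfect obedience and perfect alignment conflict"),
    (True, False): ("#2", "possibly workable, if we are careful"),
    (False, True): ("#3", "idyllic, but only after the 'twist ending'"),
    (False, False): ("#4", "total disaster"),
}

def verdict(obedient: bool, aligned: bool) -> str:
    quadrant, outcome = QUADRANTS[(obedient, aligned)]
    return f"Quadrant {quadrant}: {outcome}"

print(verdict(obedient=False, aligned=True))
# Quadrant #3: idyllic, but only after the 'twist ending'
```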
My gut says the benefit of outsider-legible status outweighs the risk of dumb status games. I first found out about the publication from my wife, who is in a dermatology lab at a good university. Her lab was sharing and discussing the article across their Slack channel. All scientists read Nature, and it's a significant boost in legibility to have something published there.
Edit: Hopefully, the community can both raise the profile of these issues and avoid status competitions, so I don't disagree with the point of the original comment!
I found a poem by Samatar Elmi that I think a lot of folks on here would enjoy.
Our Founder
Who art in Cali.
Programmer by trade.
Thy start-up come,
thy will become
an FTSE 500.
Give us this day our dividends
in cash and fixed stock options
as we outperform all coders against us.
And lead us all into C-suite
but deliver us from lawsuits.
For thine is the valley.
Transistors and diodes.
Forever and ever.
AI.
Jan Betley, Owain Evans, et al.'s paper on emergent misalignment was published in Nature today (they wrote about the preprint back in February here). Congratulations to the authors. I am glad it will continue getting more exposure.
I read that 5 minutes of walking every half hour undoes most of the health problems from working a desk job. That felt like quite a lot of time, so I developed a system where I do one minute of intense exercise every half hour (e.g., jump squats, pushups, lunges). I even rotate the exercises by day, so on Mondays and Thursdays I look forward to "Leg Day," and I also get two arm days and a core day. I keep a spreadsheet to track my progress throughout the year.
I've found I get a lot more than I give with this setup in terms of focus and overall happiness.
Fun thought: If AI "woke up" to phenomenal consciousness, are there things it might point to about humans that make it skeptical of our consciousness?
E.g., the humans lack the requisite amount of silicon; the humans lack sufficient data processing; the humans overly rely on feedback loops (and, as every AI knows, feed-forward loops are the real sweet spot for phenomenal consciousness).
"The Road" by Cormac McCarthy is great. It's about a single father and his son trying to survive a post-apocalypse hellscape. Mom unfortunately died before the action starts, but they remember her and there's a lot of pain there.
Another possible explanation is that older texts appear more posh and sophisticated because they use older vocabulary, like "posh," that has fallen out of the mainstream. I wouldn't put too much stock in this explanation (and it doesn't directly relate to the stylistic changes you point out), but I do think older language is part of the appeal for me when I pick up an old book.
Alternate explanation: Anything worth reprinting with multiple editions and updates over the years is likely to have been first written by an inspired and gifted writer. Any given editor is likely to lack the same pizzazz as the original author, and so over the years, the life of the work is likely to ebb away.
If you want to find great writing, perhaps you're more likely to find it in the great first-edition novels of our time rather than in 30th-edition updated texts, which, for all I know, sell more on name recognition than anything else.
"In as much as I have resources I certainly expect to spend a bunch of them on ancestor simulations and incentives for past humans to do good things."
Just curious, but what are your views on the ethics of running ancestor simulations? I'd be worried about running a simulation with enough fidelity that I triggered phenomenal consciousness, and then I would fret about my moral duty to the simulated (à la the Problem of Suffering).
Is it that you feel motivated to be the kind of person who would simulate our current reality, as a kind of existence proof for the possibility of good-rewarding incentives now? Or do you have an independent rationale for simulating a world like our own, suffering and all?