There's been a lot of discussion about how Less Wrong is mostly just AI these days.
If that's something that folk want to address, I suspect that the best way to do this would be to run something like the Roots of Progress Blog-Building Intensive. My admittedly vague impression is that it seems to have been fairly successful.
Between Less Wrong for distribution, Lighthaven for a writing retreat and Less Online for networking, a lot of the key infrastructure is already there to run a really strong program if the Lightcone team ever decided to pursue this.
Ther...
I guess orgs need to be more careful about who they hire as forecasting/evals researchers.
Sometimes things happen, but three people at the same org...
This is also a massive burning of the commons. It is valuable for forecasting/evals orgs to be able to hire people with a diversity of viewpoints in order to counter bias. It is valuable for folks to be able to share information freely with folks at such orgs without having to worry about them going off and doing something like this.
But this only works if those less worried about AI risks who join such a collaboration don't use the knowledge they gain to cash in on the AI boom in an acceleratory way. Doing so undermines the very point of such a project, namely, to try to make AI go well. Doing so is incredibly damaging to trust within the community.
Now let's suppose you're an x-risk funder considering whether to fund their previous org. This org does really high-quality work, but the argument for them being net-positive is now significantly weaker. This is quite likely to make finding future funding harder for them.
This is less about attacking those three folks and more just noting that we need to strive to avoid situations where things like this happen in the first place. This requires us to be more careful in terms of who gets hired.
I think the conclusion is not Epoch shouldn't have hired Matthew, Tamay, and Ege but rather [Epoch / its director] should have better avoided negative-EV projects (e.g. computer use evals) (and shouldn't have given Tamay leadership-y power such that he could cause Epoch to do negative-EV projects — idk if that's what happened but seems likely).
It is valuable for forecasting/evals orgs to be able to hire people with a diversity of viewpoints in order to counter bias.
This requires us to be more careful in terms of who gets hired in the first place.
I mean, good luck hiring people with a diversity of viewpoints who you're also 100% sure will never do anything that you believe to be net negative. Like what does "diversity of viewpoints" even mean apart from that?
But this only works if those less worried about AI risks who join such a collaboration don't use the knowledge they gain to cash in on the AI boom in an acceleratory way. Doing so undermines the very point of such a project, namely, to try to make AI go well. It is incredibly damaging to trust within the community.
...This is less about attacking those three folks and more just noting that we need to strive to avoid situations where things like this happen in the first place.
(note: I work at Epoch) This attitude feels like a recipe for creating an intellectual bubble. Of course people will use the knowledge they gain in collaboration with you for the purposes that they think are best. I think it would be pretty bad for the AI safety community if it just relied on forecasting work from card-carrying AI safety advocates.
Thanks for weighing in.
This attitude feels like a recipe for creating an intellectual bubble
Oh, additional screening could very easily have unwanted side-effects. That's why I wrote: "It is valuable for forecasting/evals orgs to be able to hire people with a diversity of viewpoints in order to counter bias" and why it would be better for this issue to never have arisen in the first place. Actions like this can create situations with no good trade-offs.
I think it would be pretty bad for the AI safety community if it just relied on forecasting work from card-carrying AI safety advocates.
I was definitely not suggesting that the AI safety community should decide which forecasts to listen to based on the views of the forecasters. That's irrelevant; we should pay attention to the best forecasters.
I was talking about funding decisions. This is a separate matter.
If someone else decides to fund a forecaster even though we're worried they're net-negative, or if they do the work voluntarily, then we should pay attention to their forecasts if they're good at their job.
Of course people will use the knowledge they gain in collaboration with you for the purposes that they think are best
Seems like several...
Random thought: We should expect LLMs trained on user responses to have much more situational knowledge than early LLMs trained on the pre-chatbot internet, because users will occasionally make reference to the meta-context.
It may be possible to get some of this information from pre-training on chatlogs/excerpts that make their way onto the internet, but the information won't be quite as accessible because of differences in the context.
If this were a story, there'd be some kind of academy taking in humanity's top talent and skilling them up in alignment.
Most of the summer fellowships seem focused on finding talent that is immediately useful. And I can see how this is tempting given the vast numbers of experienced and talented folks seeking to enter the space. I'd even go so far as to suggest that the majority of our efforts should probably be focused on finding people who will be useful fairly quickly.
Nonetheless, it does seem as though there should be at least one program that aims to find the best talent (even if they aren't immediately useful) and which provides them with the freedom to explore and the intellectual environment in which to do so.
I wish I could articulate my intuition behind this more clearly, but the best I can say for now is this: continuing to scale existing fellowships would likely provide diminishing marginal returns, while such an academy wouldn't be subject to this because it would provide a different kind of talent.
Collapsible boxes are amazing. You should consider using them in your posts.
They are a particularly nice way of providing a skippable aside: for example, filling in background information, answering an FAQ or including evidence to support an assertion.
Compared to footnotes, collapsible boxes are more prominent and are better suited to containing paragraphs or formatted text.
Less Wrong might want to consider looking for VC funding for their forum software in order to deal with the funding crunch. It's great software. It wouldn't surprise me if there were businesses who would pay for it, and it could allow an increase in the rate of development. There are several ways this could go wrong, but it at least seems worth considering.
For the record, I see the new field of "economics of transformative AI" as overrated.
Economics has some useful frames, but it also tilts people towards being too "normy" on the impacts of AI and it doesn't have a very good track record on advanced AI so far.
I'd much rather see multidisciplinary programs/conferences/research projects, including economics as just one of the perspectives represented, than economics of transformative AI qua economics of transformative AI.
(I'd be more enthusiastic about building economics of transformative AI as a field if we w...
I just created a new Discord server for generated AI safety reports (i.e. using Deep Research or other tools). Would be excited to see you join (PS: OpenAI now provides users on the Plus plan with 10 Deep Research queries per month).
https://discord.gg/bSR2hRhA
How about "Please summarise Eliezer Yudkowsky's views on decision theory and its relevance to the alignment problem".
Sharing this resource doc on AI Safety & Entrepreneurship that I created in case anyone finds this helpful:
https://docs.google.com/document/d/1m_5UUGf7do-H1yyl1uhcQ-O3EkWTwsHIxIQ1ooaxvEE/edit?usp=sharing
There is a world that needs to be saved. Saving the world is a team sport. All we can do is to contribute our part of the puzzle, whatever that may be and no matter how small, and trust in our companions to handle the rest. There is honor in that, no matter how things turn out in the end.
I'll post some extracts from the Seoul Summit. I can't promise that this will be a particularly good summary; I was originally just writing this for myself, but maybe it's helpful until someone publishes something more polished:
Frontier AI Safety Commitments, AI Seoul Summit 2024
The major AI companies have agreed to Frontier AI Safety Commitments. In particular, they will publish a safety framework focused on severe risks: "internal and external red-teaming of frontier AI models and systems for severe and novel threats; to work toward information sharing; to invest in cybersecurity and insider threat safeguards to protect proprietary and unreleased model weights; to incentivize third-party discovery and reporting of issues and vulnerabilities; to develop and deploy mechanisms that enable users to understand if audio or visual content is AI-generated; to publicly report model or system capabilities, limitations, and domains of appropriate and inappropriate use; to prioritize research on societal risks posed by frontier AI models and systems; and to develop and deploy frontier AI models and systems to help address the world’s greatest challenges"
"Risk assessments should consid...
Was thinking about entropy and the Waluigi effect (in a very broad, metaphorical sense).
The universe trends towards increasing entropy; in such an environment it is evolutionarily advantageous to have the ability to resist it. Notice, though, that life seems to have overshot, resulting in far more complex ordered systems (both biological and manmade) than what exists elsewhere.
It's not entirely clear to me, but it seems at least somewhat plausible that if entropy were weaker, the evolutionary pressure would be weaker, and the resulting life, and systems produced by such life, would ultimately be less complex than they are in our world.
I think that there's good reasons why the discussion on Less Wrong has turned increasingly towards AI Alignment, but I am also somewhat disappointed that there's no longer a space focusing on rationality per se.
Just as the Alignment Forum exists as a separate space that automatically cross-posts to LW, I'm starting to wonder if we need a rationality forum that exists as a separate space and cross-posts to LW. If I were just interested in improving my rationality, I don't know whether I'd come to Less Wrong.
(To clarify, unlike the Alignment Forum, I'd expect such a forum to be open-invite, because the challenge would be gaining any content at all.)
On free will: I don't endorse the claim that "we could have acted differently" as an unqualified statement.
However, I do believe that in order to talk about decisions, we do need to grant validity to a counterfactual view where we could have acted differently as a pragmatically useful fiction.
What's the difference? Well, you can't use the second to claim determinism is false.
It seems as though it should be possible to remove the Waluigi effect[1] by appropriately training a model.
Particularly, some combination of:
However, removing this effect might be problematic for certain situations where we want the ability to generate such content, for example, if we want it to write a story.
In this case, it might pay to add back the ability to generate such content within certain tags (ie. <stor...
Speculation from The Nature of Counterfactuals
I decided to split out some content from the end of my post The Nature of Counterfactuals because upon reflection I don't feel it is as high quality as the core of the post.
I finished The Nature of Counterfactuals by noting that I was incredibly unsure of how we should handle circular epistemology. That said, there are a few ideas I want to offer up on how to approach this. The big challenge with counterfactuals is not imagining other states the universe could be in or how we could apply our "laws" of physics t...
My position on Newcomb's Problem in a sentence: Newcomb's paradox results from attempting to model an agent as having access to multiple possible choices, whilst insisting it has a single pre-decision brain state.
Here's a crazy[1] idea that I had. But I think it's an interesting thought experiment.
What if we programmed an AGI that had the goal of simulating the Earth, but with one minor modification? In the simulation, we would have access to some kind of unfair advantage, like an early Eliezer Yudkowsky getting a mysterious message dropped on his desk containing a bunch of the progress we've made in AI Alignment.
So we'd all die in real life when the AGI broke out of its box and turned the Earth into compute to better simulate us, but we might survive in virtual re...
Random idea: A lot of people seem discouraged from doing anything about AI Safety because it seems like such a big overwhelming problem.
What if there was a competition to encourage people to engage in low-effort actions towards AI safety, such as hosting a dinner for people who are interested, volunteering to run a session on AI safety for their local EA group, answering a couple of questions on the stampy wiki, offering to proof-read a few people’s posts or offering a few free tutorial sessions to aspiring AI Safety Researchers.
I think there’s a dec...
Thoughts on the introduction of Goodhart's Law. Currently, I'm more motivated by trying to make the leaderboard, so maybe that suggests that merely introducing a leaderboard, without actually paying people, would have had much the same effect. Then again, that might just be because I'm not that far off. And if there hadn't been the payment, maybe I wouldn't have ended up in the position where I'm not that far off.
I guess I feel incentivised to post a lot more than I would otherwise, but especially in the comments rather than the posts, since posting a lot of posts likely suppresses the number of people reading your other posts. This probably isn't a worthwhile tradeoff, given that one post that does really well can easily outweigh 4 or 5 posts that only do okay, or ten posts that are meh.
Another thing: downvotes feel a lot more personal when it means that you miss out on landing on the leaderboard. This leads me to think that having a leaderboard for the long term would likely be negative and create division.
If anyone was planning on submitting something to this competition, I'll give you another 48 hours to get it in - https://www.lesswrong.com/posts/Gzw6FwPD9FeL4GTWC/usd1000-usd-prize-circular-dependency-of-counterfactuals.
Thick and Thin Concepts
Take, for example, concepts like courage, diligence and laziness. These are considered thick concepts because they have both a descriptive component and a moral component. To call someone courageous is most often not only to claim that they undertook a great risk, but also that doing so was morally praiseworthy. So a thick concept is often naturally modelled as a conjunction of a descriptive claim and a moral claim.
However, this isn't the only way to understand these concepts. An alternate would be along the following lines: Im...
I don't want to comment on the whole Leverage Controversy as I'm far away enough from the action that other people are probably better positioned to sensemake here.
On the other hand, I have been watching some of Geoff Anders' streams, and he does seem pretty good at theorising, as evidenced by his ability to live-stream it. I expect this to be a lot harder than it looks: when I'm trying to figure out my position on an issue, I often find myself going over the same ground again and again and again, until eventually I figure out a way of putting what I want to express into words.
That said, I've occasionally debated with some high-level debaters, and given almost any topic they're able to pretty much effortlessly generate a case and predict how the debate is likely to play out. I guess his ability seems on par with that.
So I think his ability to livestream demonstrates a certain level of skill, but I almost view it as speed-chess vs. chess, in that there's only so much you can tell about a person's ability in normal chess from how good they are at speed chess.
I think I've improved my own ability to theorise by watching the streams, but I wouldn't be surprised if I improved similarly from watching Eliezer, A...
I'm beginning to warm to the idea that the reason we have evolved to think in terms of counterfactuals and probabilities is that these are fundamental at the quantum level. Normally I'm suspicious of rooting macro-level claims in quantum-level effects, because at such a high level of abstraction it would be very easy for these effects to wash out, but the many-worlds hypothesis is something that wouldn't wash out. Otherwise it would all seem a bit too much of a coincidence.
("Oh, so you believe that counterfactuals and probability are at least...
I was talking with Rupert McCallum about the simulation hypothesis yesterday. Rupert suggested that this argument is self-defeating; that is, it pulls the rug from under its own feet. It assumes the universe has particular properties, then it tries to estimate the probability of being in a simulation from these properties, and if the probability is sufficiently high, then we conclude that we are in a simulation. But if we are likely to be in a simulation, then our initial assumptions about the universe are likely to be false, so we've disproved the assu...
I really dislike the fiction that we're all rational beings. We really need to accept that sometimes people can't share things with us. Stronger: not just accept, but appreciate people who make this choice for their wisdom and tact. ALL of us have ideas that will strongly trigger us, and if we're honest and open-minded, we'll be able to recall situations where we unfairly judged someone because of a view that they held. I certainly can, way too many times to list.
I say this as someone who has a really strong sense of curiosity, knowing that I...
I've always found the concept of belief in belief slightly hard to parse cognitively. Here's what finally satisfied my brain: whether you will be rewarded or punished in heaven is tied to whether or not God exists; whether or not you feel a push to go to church is tied to whether or not you believe in God. If you do go to church and want to go, your brain will say, "See, I really do believe", and it'll do the reverse if you don't go. However, it'll only affect your belief in God indirectly through your "I believe in God" node. Putting it another way, going to ch
Pet theory about meditation: Lots of people say that if you do enough meditation you will eventually realise that there isn't a self. Having not experienced this myself, I am intensely curious about what people observe that persuades them to conclude this. I guess I get a sense that many people are being insufficiently skeptical. There's a difference between there not appearing to be such a thing as a self and a self not existing. Indeed, how do we know meditation doesn't just temporarily silence whatever part of our mind is responsible...
I've recently been reading about ordinary language philosophy and I noticed that some of their views align quite significantly with LW. They believed that many traditional philosophical questions only seemed troubling because of the philosophical tendency to assume words like "time" or "free will" necessarily referred to some kind of abstract entity when this wasn't necessary at all. Instead they argued that by paying attention to how we used these words in ordinary, everyday situations we could see that the way people used the...
Three levels of forgiveness: emotions, drives and obligations. The emotional level consists of your instinctual anger, rage, disappointment, betrayal, confusion or fear. This is about raw feelings. The drives consist of your "need" for them to say sorry, make amends, regret their actions, have a conversation or empathise with you. In other words, it's about needing the situation to turn out a particular way. The obligations are very similar to the drives, except they are about their duty to perform these actions rather than your desire to make it...
There appears to be something of a Sensemaking community developing on the internet, which could roughly be described as a spirituality-inspired attempt at epistemology. This includes Rebel Wisdom, Future Thinkers, Emerge and maybe you could even count post-rationality. While there are undoubtedly lots of critiques that could be made of their epistemics, I'd suggest watching this space as I think some interesting ideas will emerge out of it.
EDT agents handle Newcomb's problem as follows: they observe that agents who encounter the problem and one-box do better on average than those who encounter the problem and two-box, so they one-box.
That's the high-level description, but let's break it down further. Unlike CDT, EDT doesn't worry about the fact that there may be a correlation between your decision and hidden state. It assumes that if the visible state before you made your decision is the same, then the counterfactuals generated by considering your possible decisions are c...
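To make the comparison concrete, here's a minimal sketch (my own illustration, not from the original post) using the standard payoffs and an assumed 99% predictor accuracy:

```python
# EDT rates each action by conditioning on the news that it was taken.
# Assumed setup: box A always holds $1,000; box B holds $1,000,000 iff
# Omega predicted one-boxing; Omega's prediction matches the agent's
# actual choice with probability p.

p = 0.99  # assumed predictor accuracy

# One-boxing: conditional on one-boxing, Omega predicted it w.p. p.
ev_one_box = p * 1_000_000 + (1 - p) * 0

# Two-boxing: conditional on two-boxing, Omega predicted it w.p. p,
# so box B is usually empty and you usually get only box A's $1,000.
ev_two_box = p * 1_000 + (1 - p) * (1_000_000 + 1_000)

print(ev_one_box)  # 990000.0
print(ev_two_box)  # 11000.0 -> the EDT agent one-boxes
```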
Hegel - A Very Short Introduction by Peter Singer - Book Review Part 1: Freedom
Hegel is a philosopher who is notorious for being incomprehensible. In fact, for one of his books he signed a contract that assigned a massive financial penalty for missing the publishing deadline, so the book ended up being a little rushed. While there was a time when he was dominant in German philosophy, he now seems to be held in relatively poor regard and his main importance is seen to be historical. So he's not a philosopher that I was really planning to spend much time on.
Given this, I was quite pleased to discover this book promising to give me A Very Short Introduction, especially since it is written by Peter Singer, a philosopher who writes and thinks rather clearly. After reading this book, I still believe that most of what Hegel wrote was pretentious nonsense, but the one idea that struck me as the most interesting was his conception of freedom.
A rough definition of freedom might be ensuring that people are able to pursue whatever it is that they prefer. Hegel is not a fan of abstract definitions of freedom, which treat all preferences the same and don't enquire into where they come from.
In hi...
Book Review: Waking Up by Sam Harris
This book aims to convince everyone, even skeptics and atheists, that there is value in some spiritual practices, particularly those related to meditation. Sam Harris argues that meditation doesn't just help with concentration, but can also help us reach transcendental states that reveal the dissolution of the self. It mostly does a good job of what it sets out to do, but unfortunately I didn't gain very much benefit from this book because it focused almost exclusively on persuading you that there is value here,...
Anti-induction and Self-Reinforcement
Induction is the belief that the more often a pattern happens, the more likely it is to continue. Anti-induction is the opposite claim: the more often a pattern happens, the less likely future events are to follow it.
Somehow I seem to have gotten the idea in my head that anti-induction is self-reinforcing. The argument for it is as follows: Suppose we have a game where at each step a screen flashes an A or a B and we try to predict what it will show. Suppose that the screen always flashes A, but the agent initially think...
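A minimal sketch of that game (my own illustration; the update rule and all numbers are assumptions, since the original text is cut off):

```python
# The screen always flashes 'A'; the agent starts out leaning towards
# 'A' but updates anti-inductively, so each observed 'A' makes 'A'
# seem LESS likely next time.

p_a = 0.9  # assumed initial credence that the next flash is 'A'
for step in range(6):
    prediction = 'A' if p_a > 0.5 else 'B'
    print(f"step {step}: predict {prediction}, P(A) = {p_a:.3f}")
    # The screen flashes 'A'; the anti-inductive update halves the
    # odds of 'A' (an assumed rule - any decreasing update would do).
    odds_a = p_a / (1 - p_a) / 2
    p_a = odds_a / (1 + odds_a)
```

After a few steps the agent predicts 'B' forever and is always wrong, yet by its own anti-inductive lights each failed 'B' prediction makes 'B' seem even more overdue, which is the self-reinforcement being gestured at.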
I've been thinking about Rousseau and his conception of freedom again because I'm not sure I hit the nail on the head last time. The most typical definition of freedom and that championed by libertarians focuses on an individual's ability to make choices in their daily life. On the more libertarian end, the government is seen as an oppressor and a force of external compulsion.
On the other hand, Rousseau's view focuses on "the people" and their freedom to choose the kind of society that they want to live in. Instead of being se...
Review: Human-Compatible by Stuart Russell
I wasn't a fan of this book, but maybe that's just because I'm not in the target audience. As a first introduction to AI safety I recommend The AI Does Not Hate You by Tom Chivers (facebook.com/casebash/posts/10100403295741091) and for those who are interested in going deeper I'd recommend Superintelligence by Nick Bostrom. The strongest chapter was his assault on the arguments of those who think we shouldn't worry about superintelligence, but you can just read it here: https://spectrum.ie...
Book Review: Communist Manifesto
“The history of all hitherto existing society is the history of class struggles. Freeman and slave, patrician and plebeian, lord and serf, guild-master and journeyman, in a word, oppressor and oppressed, stood in constant opposition to one another, carried on an uninterrupted, now hidden, now open fight, that each time ended, either in the revolutionary reconstitution of society at large, or in the common ruin of the contending classes”
Overall summary: Given the rise of socialism in recent years, now seemed like an appropriate time to review the Communist Manifesto. At times I felt that Marx’s writing was keenly insightful, at other times I felt he was in ignorance of basic facts, and at other times I felt that he held views that were reasonable at the time, but whose flaws are now obvious. In particular, I found the first half much more engaging than I expected because, say what you like about Marx, he’s an engaged and poetic writer. Towards the end, the focus shifted to particular time-bounded political disputes which I had neither the knowledge to understand nor the interest to acquire. At the start, I fe...
I once talked about this with a guy who identified as a Marxist, though I can't say how representative his opinions are of the rest of his tribe. Anyway... he told me that in the trichotomy of Capital / Land / Labor, human talent is economically most similar to the Land category. This is counter-intuitive if you take the three labels literally, but if you consider their supposed properties... well, it's been a few decades since I studied economics, but roughly:
The defining property of Capital is fungibility. You can use money to buy a tech company, or an airplane factory, or a farm with cows. You can use it to start a company in USA, or in India. There is nothing that locks money to a specific industry or a specific place. Therefore, in a hypothetical perfectly free global market, the risk-adjusted profit rates would become the same globally. (Because if investing the money in cows gives you 5% per annum, but investing money in airplanes gives you 10%, people will start selling cow farms and buying airplane factories. This will reduce the number of cow farms, thus increasing their profit, and increase the competition in the airplane market, thus reducing their profi...
What does it mean to define a word? There's a sense in which definitions are entirely arbitrary and the question of which word is assigned to which meaning lacks any importance. So it's very easy to miss what a definition actually does: emphasise a particular aspect and provide a particular lens through which to see the world.
For example, if we define goodness as the ability to respond well to others, it emphasizes that different people have different needs. One person may want advice, while another may want simple encouragement. Or if we define love as acceptance of the other, it suggests that one of the most important aspects of love is the idea that true love should be somewhat resilient and not excessively conditional.
Here's one way of explaining this: it's a contradiction to have a provable statement that is unprovable, but it's not a contradiction for it to be provable that a statement is unprovable. Similarly, we can't have a scenario that is simultaneously imagined and not imagined, but we can coherently imagine a scenario where things exist without being imagined by beings within that scenario.
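In symbols (my own gloss, writing $\mathrm{Prov}$ for "provable" and $S$ for a statement), the difference is between:

$$\underbrace{\mathrm{Prov}(S) \land \lnot\,\mathrm{Prov}(S)}_{\text{a contradiction}} \qquad \text{and} \qquad \underbrace{\mathrm{Prov}\big(\ulcorner \lnot\,\mathrm{Prov}(S) \urcorner\big)}_{\text{perfectly consistent}}$$

The same scope distinction applies to imagining an unimagined tree: the contradiction only arises if the imagining is placed inside the scenario rather than outside it.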
If I can imagine a tree that exists outside of any mind, then I can imagine a tree that is not being imagined. But "an imagined X that i...
As I wrote before, evidential decision theory can be critiqued for failing to deal properly with situations where hidden state is correlated with decisions. EDT includes differences in hidden state as part of the impact of the decision, when in the case of the smoking lesion, we typically want to say that it is not.
However, Newcomb's problem also has hidden state that is correlated with your decision. And if we don't want to count this when evaluating decisions in the case of the Smoking Lesion, perhaps we shouldn't count it in the case of Newc...
Writing has been one of the best things for improving my thinking as it has forced me to solidify my ideas into a form that I've been able to come back to later and critique when I'm less enraptured by them. On the other hand, for some people it might be the worst thing for their thinking as it could force them to solidify their ideas into a form that they'll later feel compelled to defend.
Despite having read dozens of articles discussing Evidential Decision Theory (EDT), I've only just figured out a clear and concise explanation of what it is. Taking a step back, let's look at how this is normally explained and one potential issue with this explanation. All major decision theories (EDT, CDT, FDT) rate potential decisions using expected value calculations where:
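The components after the colon appear to have been cut off; the standard shared form (my reconstruction, not necessarily the author's exact wording) is:

$$V(a) = \sum_{o} P(o \mid a)\, U(o)$$

with the theories differing in how the probability term is computed: EDT conditions on the action being taken, whereas CDT replaces $P(o \mid a)$ with the causal $P(o \mid \mathrm{do}(a))$.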
So it should be just a simple matter...
Book Review: Awaken the Giant Within Audiobook by Tony Robbins
First things first, the audiobook isn't the full book or anything close to it. The standard book is 544 pages, while the audiobook is a little over an hour and a half. The fact that it was abridged really wasn't made obvious.
We can split what he offers into two main categories: motivational speaking and his system itself. The motivational aspect of his speaking is very subjective, so I'll leave it to you to evaluate yourself. You can find videos of his on YouTube and you should know wi...
Book Review: The Rosie Project:
Plot summary: After a disastrous series of dates, autistic genetics professor Don Tilman decides that it’d be easier to just create a survey to eliminate all of the women who would be unsuitable for him. Soon after, he meets a barmaid called Rosie who is looking for help with finding out who her father is. Don agrees to help her, but over the course of the project Don finds himself increasingly attracted to her, even though the survey suggests that he is completely unsuitable. The story is narrated in Don’s voice. He tells us...
Book Review: So Good They Can't Ignore You by Cal Newport:
This book makes an interesting contrast to The 4 Hour Workweek. Tim Ferris seems to believe that the purpose of work should be to make as much money as possible in the least amount of time and that meaning can then be pursued during your newly available free time. Tim gives you some productivity tips in the hope that it will make you valuable enough to negotiate flexibility in terms of how, when and where you complete your work, plus some dirty tricks as well.
Cal Newport's book is similar in that it focuses on becoming valuable enough to negotiate a job that you'll love and downplays the importance of pursuing your passions in your career. However, while Tim extolls the virtues of being a digital nomad, Cal Newport emphasises self-determination theory: autonomy, competence and relatedness. That is, the freedom to decide how you pursue your work, the satisfaction of doing a good job and the pleasure of working with people who you feel connected to. He argues that these traits are rare and valuable, and so if you want such a job you'll need skills that are rare and valuable to offer in return.
That's...
Book Review: The 4 Hour Workweek
This is the kind of book that you either love or hate. I found value in it, but I can definitely understand the perspective of the haters. First off: the title. It's probably one of the most blatant cases of over-promising that I've ever seen. Secondly, he's kind of a jerk. A number of his tips involve lying, and in school he had a strategy of interrogating his lecturers in detail when they gave him a bad mark so that they'd think very carefully before assigning him a bad grade. And of course, while drop-shipping might have been an underexploited strategy at the time when he wrote the book, it's now something of a saturated market.
On the plus side, Tim is very good at giving you specific advice. To give you the flavour, he advises the following policies for running an online store: avoid international orders, no expedited or overnight shipping, two options only - standard and premium; no cheque or Western Union, no phone number if possible, minimum wholesale order with tax ID and faxed-in order form, etc. Tim is extremely process oriented and it's clear that he has deep expertise here and is able to share it unusually well. I fo...
Book Review: Civilization and its discontents
Freud is the most famous psychologist of all time and although many of his theories are now discredited or seem wildly implausible, I thought it'd be interesting to listen to him to try and understand why it sounded plausible in the first place.
At times Freud is insightful and engaging; at other times, he falls into psychoanalytic lingo in such a way that I couldn't follow what he was trying to say. I suppose I can see why people might have assumed that the fault was with their failure to understand.
It's a short read, so if you're curious, there isn't that much cost to going ahead and reading it, but this is one of those rare cases where you can really understand the core of what he was getting at from the summary on Wikipedia (https://en.m.wikipedia.org/wiki/Civilization_and_Its_Discontents)
Since Wikipedia has a summary, I'll just add a few small remarks. This book focuses on a key paradox: our utter dependence on civilisation for anything more than the most basic survival, yet the way it requires us to repress our own wants and desires so as to fit in with an ordered society. I find this to be an interesting answer to t...
I think I spent more time writing this than reading the book, as I find reviewing fiction much more difficult. I strongly recommend this book: it doesn't take very long to read, but you may spend much longer trying to figure out what to make of it.
Book Review: The Stranger by Camus (Contains spoilers)
I've been wanting to read some existentialist writing for a while and it seemed reasonable to start with a short book like this one. The story is about a man who kills a man for what seems to be no real reason at all and who is then subsequently arrested and m
I really like the short-form feature because after I have articulated a thought my head feels much clearer. I suppose that I could have tried just writing it down in a journal or something; but for some reason I don't feel quite the same effect unless I post it publicly.
The sad thing about philosophy is that as your answers become clearer, the questions become less mysterious and awe-inspiring. It's easy to assume that an imposing question must have an impressive answer, but sometimes the truth is just simple and unimpressive and we miss this because we didn't evolve for this kind of abstract reasoning.
This is the first classic that I’m reviewing. One of the challenges with figuring out which classics to read is that there are always people speaking very highly of them, in a manner vague enough that it's hard to decide whether the book is worth reading. Hopefully I can avoid this trap.
Book Review: Animal Farm
You probably already know the story. In a thinly veiled critique of the Russian Revolution, the animals in a farm decide to revolt against the farmer and run the farm themselves. At the start, the seven principles of Animalism are idealistically declared, but as time goes on, things increasingly seem to head downhill…
Why is this a classic?: This book was released at a time when the intellectual class was firmly sympathetic to the Soviets, ensuring controversy and then immortality when history proved it right.
Why you might want to read this: Short (only 112 pages or 3:11 on Audible), the story always moves along at a brisk pace, the writing is engaging and a few very emotionally impactful moments. The broader message of being wary of the promises made by idealistic movements still holds (especially "all animals are equal, but some animals are more equal than others"...
Wow, I've really been flying through books recently. Just thought I should mention that I'm looking for recommendations for audio books; bonus points for books that are short. Anyway....
Book Review: Zero to One
Peter Thiel is the most famous contrarian in Silicon Valley. I really enjoyed hearing someone argue against the common wisdom of the valley. Most people think in terms of beating the competition; Thiel thinks in terms of establishing a monopoly so that there is no competition. Agile methodology and the lean startup are all the rage, but Thiel argues that this only leads to incremental improvements and that truly changing the world requires you to commit to a vision. Most companies want to disrupt their competitors, but for Thiel this means that you've fallen into competition, instead of forging your own unique path. Most venture funds aim to diversify, but Thiel is more selective, only investing in companies that have billion dollar potential. Many startups spurn marketing, but Thiel argues that this is dishonest and that PR is also a form of marketing, even if that isn't anyone's job title. Everyone is betting on AI replacing humans, while Thiel is mor...
As I said before, I'll be posting book reviews. Please let me know if you have any questions and I'll answer them to the best of my ability.
Book Review: The AI does not hate you by Tom Chivers
The title of this book comes from a quote by Eliezer Yudkowsky which reads in full: "The AI does not hate you, nor does it love you, but you are made of atoms which it can use for something else". This book covers not only potential risks from AI, but the rationalist community from which this evolved, and also touches on the effective altruism movement.
This book fills something of a gap in the book market; when people are first learning about existential risks from AI I usually recommend the two-part Wait But Why post (https://waitbutwhy.com/2015/01/artificial-intelligence-revolution-1.html) and then I'm not really sure what to recommend next. The Sequences are ridiculously long and Bostrom's Superintelligence is a challenging read for those not steeped in philosophy and computer science. In contrast, this book is much more accessible and provides the right level of detail for a first introduction, rather than someone who has already decided to try entering the field.
I mostly listened to this boo
I'm going to start writing up short book reviews as I know from past experience that it's very easy to read a book and then come out a few years later with absolutely no knowledge of what was learned.
Book Review: Everything is F*cked: A Book About Hope
To be honest, the main reason why I read this book was because I had enjoyed his first and second books (Models and The Subtle Art of Not Giving A F*ck) and so I was willing to take a risk. There were definitely some interesting ideas here, but I'd already received many of these through other s...
One thing I'm finding quite surprising about shortform is how long some of these posts are. It seems that many people are using this feature to indicate that they've just written up these ideas quickly in the hope that the feedback is less harsh. This seems valuable; the feedback here can be incredibly harsh at times and I don't doubt that this has discouraged many people from posting.
About this Post - 🔗: 🕺wiseaiadvisors.com , 🕴️Formal Edition (coming soon)
This post is still in draft, so any feedback would be greatly appreciated 🙏. It'll be posted as a full, proper Less Wrong/EA Forum/Alignment forum post, as opposed to just a short-form, when it's ready 🌱🌿🌳.
📲 QR Code
This post is a collaboration between Chris Leong (primary author) and Christopher Clay (editor), written in the voice of Chris Leong.
We have worked very hard on this[3] and we hope you find it to be of some use. Despite this work, it will most likely contain enough flaws that we will be at least somewhat embarrassed at having written at least parts of it, and yet at some point a post has to go out into the world to fend for itself.
There's no guarantee that this post will have the kind of impact that we deeply wish for, or even achieve anything at all... And yet - regardless[4] - it is an honour to serve[5], especially at this dark hour when AGI looms on the horizon and doubt has begun to creep into the hearts of men[6] 🫡. The sheer scale of the problem feels overwhelming at times, and yet... we do what we can[7].
This post[8] doubles[9] as both a serious attempt to produce an original "alignment proposal"[10] and a passion project[11]. The AI safety community has spent nearly two and a half decades trying to convince people to take AI risks seriously. Whilst there have been significant successes, these have also fallen short[12]. Undoubtedly this is incredibly naive[13], but perhaps passion and authenticity[14] can succeed where arguments and logic have failed ✨🌠.
The initial draft of this post was produced during the 10th edition of AI Safety Camp. Thanks to my team: Matthew Hampton, Richard Kroon, and Chris Cooper, for their feedback. Unlike most other AI Safety Camp projects, which focus on a single, unitary project, we were more of a research collective with each person pursuing their own individual projects.
I am continuing development during the 2025 Summer Edition of the Cambridge ERA: AI Fellowship, with an eye to eventually producing a more formal output. Thanks to my research manager, Peter Gebauer, and my mentor, Prof. David Manley.
I also greatly appreciate feedback provided by many others[15], including: Jonathan Kummerfeld.
✒️ Selected Quotes: 🕵️
| But the moral considerations, Doctor... Did you and the other scientists not stop to consider the implications of what you were creating? — Roger Robb
When you see something that is technically sweet, you go ahead and do it and you argue about what to do about it only after you have had your technical success. That is the way it was with the atomic bomb[16] — Oppenheimer |
| ❦ |
| There are moments in the history of science, where you have a group of scientists look at their creation and just say, you know: ‘What have we done?... Maybe it's great, maybe it's bad, but what have we done? — Sam Altman[17] |
| ❦ |
| Urgent: get collectively wiser - Yoshua Bengio, AI "Godfather"[18][19] |
We stand at a crucial moment in the history of our species. Fueled by technological progress, our power has grown so great that for the first time in humanity’s long history, we have the capacity to destroy ourselves—severing our entire future and everything we could become.
Yet humanity’s wisdom has grown only falteringly, if at all, and lags dangerously behind. Humanity lacks the maturity, coordination and foresight necessary to avoid making mistakes from which we could never recover. As the gap between our power and our wisdom grows, our future is subject to an ever-increasing level of risk. This situation is unsustainable. So over the next few centuries[20], humanity will be tested: it will either act decisively to protect itself and its long-term potential, or, in all likelihood, this will be lost forever — Toby Ord, The Precipice[21]
We have created a Star Wars civilization, with Stone Age emotions, medieval institutions, and godlike technology — Edward O. Wilson, The Social Conquest of Earth[22]
| ❦ |
Before the prospect of an intelligence explosion, we humans are like small children playing with a bomb. Such is the mismatch between the power of our plaything and the immaturity of our conduct — Nick Bostrom, Founder of the Future of Humanity Institute, Superintelligence[23]
| ❦ |
If we continue to accumulate only power and not wisdom, we will surely destroy ourselves — Carl Sagan, Pale Blue Dot[24]
Never has humanity had such power over itself, yet nothing ensures that it will be used wisely, particularly when we consider how it is currently being used…There is a tendency to believe that every increase in power means “an increase of ‘progress’ itself ”, an advance in “security, usefulness, welfare and vigour; …an assimilation of new values into the stream of culture”, as if reality, goodness and truth automatically flow from technological and economic power as such. — Pope Francis, Laudato si'[25]
| ❦ |
The fundamental test is how wisely we will guide this transformation – how we minimize the risks and maximize the potential for good — António Guterres, Secretary-General of the United Nations[26]
| ❦ |
Our future is a race between the growing power of our technology and the wisdom with which we use it. Let’s make sure that wisdom wins — Stephen Hawking, Brief Answers to the Big Questions[27]
| ❦ |
🎁 Additional quotes (from Life Itself)
| ❦ |
| 🧨 The Defining Challenge: Given the rapid speed of AI progress, massive strategic uncertainty and wide range of potential societal-scale vulnerabilities[30]... we face a growing gap between our capabilities and our wisdom[31]... |
| 📌 𝙿𝚁𝙸𝙼𝙰𝚁𝚈 𝙲𝙻𝙰𝙸𝙼: In light of this, wise AI advisors aren't just an urgent research priority, but absolutely critical (⁉️)[32] |
📖 𝙱𝙰𝚂𝙸𝙲 𝚃𝙴𝚁𝙼𝙸𝙽𝙾𝙻𝙾𝙶𝚈 - Wise AI advisors? Wisdom? (I recommend skipping initially) 🙏
⬇️ I suggest initially skipping this section and focusing on the core argument for now[33] ⬇️
Wise AI Advisors?
Do you mean?:
• AIs trained to provide advice to humans ✅
• AIs trained to act wisely in the world ❌
• Humans trained to provide wise advice about AI ❌
Most work on Wise AI is focused on the question of how AI could learn to act wisely in the world[34]; however, I'm more interested in the first option, as it allows humans to compensate for the weaknesses of the AIs[35].
Even though Wise AI Advisors doesn't refer to humans, I am primarily interested in how Wise AI Advisors could be deployed as part of a cybernetic system.
Training humans to be wiser would help with this project:
• Wiser humans can train wiser AI
• When we combine AI and humans into a cybernetic system, wiser humans will be better able both to elicit capabilities from the AI and to plug any gaps in the AI's wisdom.
What do you mean by wisdom?
For the purposes of the following argument, I'd encourage you to first consider this in relation to how you conceive of wisdom rather than worrying too much about how I conceive of wisdom. Two main reasons:
• I suspect this reduces the chance of losing someone partway through the argument because they conceive of wisdom slightly differently than I do[36].
• I believe that there are many different types of wisdom that are useful for steering the world in a positive direction and many perspectives on wisdom that are worth investigating. I'm encouraging readers to first consider this argument in relation to their own understanding of wisdom in order to increase the diversity of approaches pursued.
Even though I'd encourage you to read the core argument first, if you really want to hear more about how I conceive of wisdom right now, you can scroll down to the Clarifications section to find out more about what I believe. 🔜🧘 or ⬇️⬇️⬇️😔

⭐ My core argument... in a single sentence ☝️⭐
😮 Absolutely critical? That's a strong claim! - 🛡️[37]
☞ Three Key Advantages:
(See the main post for more detail.)
Five additional benefits
1) This approach scales with increases in capabilities. 🚨💪🦾
2) Forget marginal improvements: wisdom tech could provide the missing piece for another strategy, by allowing us to pursue it in a wiser manner.
3) Wisdom technology is likely favourable from a differential technology perspective. Many actors are only reckless because they're unwise.
4) Even a small coalition using these advisors to refine strategy, improve coordination and engage in non-manipulative persuasion could significantly shift the course of civilisation over time.
5) Suppose all else fails: training up a bunch of folks with both a deep understanding of the alignment problem and a strong understanding of wisdom seems incredibly useful.
Please note: This post is still a work in progress. Some sections have undergone more editing and refinement than others. Some bits may be inconsistent, such as if I'm in the middle of making a change. As this is just a draft, I may not end up endorsing all the claims made. Feedback is greatly appreciated 🙏.
Unaided human wisdom is vastly insufficient for the task before us... of navigating an entire series of highly uncertain and deeply contested decisions, where a single mistake could prove ruinous, on a greatly compressed timeline.
| ❦ |
| ...if you use the bold text to skim until the amber lantern section divider 😉 |
Two useful framings:
| 𝚃 𝙷 𝙴 🅂🅄🅅 𝚃 𝚁 𝙸 𝙰 𝙳 – speed, uncertainty, vulnerability🃏[46][47] |
Going through them in reverse order: 🅅 𝚄 𝙻 𝙽 𝙴 𝚁 𝙰 𝙱 𝙸 𝙻 𝙸 𝚃 𝚈 – 🌊🚣: The development of advanced AI technologies will have a massive impact on society given the essentially infinite ways to deploy such a general technology. There are lots of ways this could go well, and lots of ways w
Reflections - Why the SUV Triad is Fucking[56] Scary
Big if true: It may even present a reason (draft) to expect Disaster-By-Default ‼️
| ⚘ |
| 𝚃 𝙷 𝙴 𝚆 𝙸 𝚂 𝙳 𝙾 𝙼 — [𝙲 𝙰 𝙿 𝙰 𝙱 𝙸 𝙻 𝙸 𝚃 𝙸 𝙴 𝚂] 𝙶 𝙰 𝙿[57] – 📉📈 |
<TODO: Feels like there's some more implicit subclaims here I need to address> Subclaim 1: As humanity gains access to more power, we need more wisdom in order to navigate it
I think that it is, at least in the case of AI:
| ⚘ |
| How do we address the SUV Triad and the Wisdom Gap? |
I propose that the most straightforward way to address this is to train wise AI advisors. But what about the alternatives?:
| ⚘ |
In light of this:
📌 I believe that AI is much more likely to go well for humanity if we develop wise AI advisors.
I am skeptical of the main alternatives. I am serious about making this happen... If you are serious about this too, please:
☞ Or for a More Formal Analysis (Using Three Different Frameworks) — TODO
Importance-Tractability-Neglectedness
This is a standard EA framework for considering cause areas. Wise AI is broad enough that I consider it reasonable to analyse it as a cause area.
|  | Definition | Score | Reason |
|---|---|---|---|
| Importance |  |  |  |
| Tractability |  |  |  |
| Neglectedness |  |  |  |
| OVERALL |  |  |  |
Safety-Freedom-Value
This framework is designed to appeal more to startup/open-source folks. These folks are more likely to put significant weight on the social value a technology provides and on freedom beyond mere utility.
|  | Definition | Score | Reason |
|---|---|---|---|
| Safety |  |  |  |
| Positive Use Cases |  |  |  |
| * Freedom (as intrinsic good) |  |  |  |
| OVERALL |  |  |  |
Searching for Solutions
This framework is designed for folks who think current techniques are unlikely to work.
|  | Definition | Score | Reason |
|---|---|---|---|
| Survival value | How much does the technique directly contribute to our chance of survival? |  |  |
| Information Value |  |  |  |
| Unlocked Optionality |  |  |  |
| Acceleration |  |  |  |
| * Malicious use cases |  |  |  |
| OVERALL |  |  |  |
☞ I'm sold! How can I get involved? 🥳🎁
As I said, if you're serious, please ✉️ PM me. If you think you might be serious, but need to talk it through, please reach out as well.
It'd be useful for me to know your background and how you think you could contribute. Maybe tell me a few facts about yourself, what interests you, and drop a link to your LinkedIn profile?
If you scroll down, you'll see that I've answered some more questions about getting involved, but I'll include some useful links here as well:
• List of potentially useful projects: for those who want to know what could be done concretely. Scroll down further for a list of project lists ⬇️.
• Resources for founding an AI safety startup or non-profit: since I believe there should be multiple organisations pursuing this agenda
• Resources for getting started in AI Safety more broadly: since some of these resources might be useful here as well
Reminder: If you're looking for a definition of wise AI advisors, it's at the 🔝 of the page.
I've read through your argument and substituted my own understanding of wisdom. Now that ~~you've wasted my time~~ I've done this, perhaps you could clarify how you think about wisdom? ✅🙏
Sure. I've written at the top of this post ⬆️ why I often try to dodge this question initially. But given that you've made it this far, I'll share some thoughts. Just don't over-update on my views 😂.
Given how rapidly AI is developing, I suspect that we're unlikely to resolve the millennia-long philosophical debate about the true nature of wisdom before AGI is built. Therefore, I suggest that we instead sidestep this question by identifying specific capabilities related to wisdom that might be useful for steering the world in a positive direction.
I'd suggest that examples might include: the ability to make wise strategic decisions, non-manipulative persuasion and the ability to find win-wins. I'll try to write up a longer list in the future.
Some of these capabilities will be more or less useful for steering the world in a positive direction. On the other hand, some will come with negative externalities, such as accelerating timelines or enabling malicious actors.
My goal is to figure out which capabilities to prioritise by balancing the benefits against the costs and then incorporating feasibility.
It may be the case that some people decide that wisdom requires different capabilities than those I end up deciding are important. As long as they pick a capability that isn't net-negative, I don't see that as bad. In fact, I see different people pursuing different understandings of wisdom as adding robustness to different world models.
If wisdom ultimately breaks down into specific capabilities, why not simply talk about these capabilities and avoid using a vague concept like wisdom? 🙅♂️🌐
So the question is: "Why do I want to break wisdom down into separate capabilities instead of choosing a unitary definition of wisdom and attempting to train that into an AI system?"
Firstly, I think the chance of us being able to steer the world towards a positive direction is much higher if we're able to combine multiple capabilities together, so it makes sense to have a handle for the broader project, in addition to handles for individual sub-projects. I believe that techniques designed for one capability will often carry over to other capabilities, as will the challenges, and having a larger handle makes it easier to make these connections. I also think there's a chance that these capabilities amplify each other (as per the final few paragraphs of Imagining and Building Wise Machines [60] by Johnson, Bengio, Grossmann, et al).
Secondly, I believe we should be aiming to increase both human wisdom and AI wisdom simultaneously. In particular, I believe it's important to increase the wisdom of folks creating AI systems and that this will then prove useful for a wide variety of specific capabilities that we might wish to train.
Finally, I'm interested in investigating this frame as part of a more ambitious plan to solve alignment on a principled level. Instead of limiting the win condition to building an AI that always (competently) acts in line with human values, the wise AI Advisors frame broadens it such that the AI only needs to inspire humans to make the right decision. It's hard to know in advance whether this reframing will be any easier, but even if it doesn't help, I have a strong intuition that understanding why it doesn't help would shed light on the barriers to solving the core alignment problem.
Weren't you focused on wise AI advisors via Imitation Learning before? 🎯
Yep, I was focused on it before. I now see that goal as overly narrow. The goal is to produce wise AI Advisors via any means. I think that Imitation Learning is underrated, but there are lots of other approaches that are worth exploring as well.
Isn't this argument a bit pessimistic? 🙅♂️⚖️. I prefer to be optimistic. 👌
Optimism isn't about burying your head in the sand and ignoring the massive challenges facing us. That's denialism. Optimism is about rolling up your sleeves and doing what needs to be done.
The nice thing about this proposal from an optimistic standpoint is that, assuming there is a way for AI to go well for humanity, then it seems natural to expect that there is some way to leverage AI to help us find it[61].
Additionally, the argument for developing wise AI advisors isn't in any way contingent on a pessimistic view of the world. Even if you think AI is likely to go well by default, wise AI advisors could still be of great assistance for making things go even better. For example, facilitating negotiations between powers, navigating the safety-openness tradeoff and minimising any transitional issues.
But I don't think timelines are short. They could be long. Like more than a decade. 👌🤷♂️
Short timelines add force to the argument I've made above, but they aren't at all a necessary component[62].
Even if AI will be developed over decades rather than years, there are still enough different challenges and key decisions that unaugmented human wisdom is unlikely to be sufficient.
In fact, my proposal might even work better over long timelines, as it provides more time for AI advisors to help steer the world in a positive direction[63].
Don't companies have a commercial incentive to train wise AI by default? 🤏
I'm extremely worried about the incentives created by a general chatbot product. The average user is low-context and this creates an incentive towards sycophancy 🤥.
I believe that a product aimed at providing advice for critically important decisions would provide better incentives, even if it were created by the same company.
Furthermore, given the potential for short timelines, it seems extremely risky 🎰 to rely purely on the profit motive, especially since there is a much stronger profit motive to pursue capabilities 💰💰💰. A few months' delay could easily mean that a wise AI advisor isn't available for a crucial decision 🚌💨. Humanity has probably already missed having such advisors available to assist us during a number of key decision points 😭.
Won't this just be used by malicious actors? Doesn't this just accelerate capabilities? 🤔❌
I expect both the benefits and the externalities to vary hugely by capability. I expect some to be positive, some to be negative and some to be extremely hard to determine. More work is required to figure out which capabilities are best from a differential technology development perspective.
I understand that this answer might be frustrating, but I think it's worth sharing these ideas even though I haven't yet had time to run this analysis. I have a list of projects that I hope will prove fairly robust, despite all the uncertainty.
Is there value in wisdom given that wisdom is often illegible and this makes it non-verifiable? ✔️💯
Oscar Delany makes this argument in Tentatively against making AIs 'wise' (runner-up in the AI Impacts Essay Competition).
This will depend on your definition of wisdom.
I admit that this tends to be a problem with how I conceive of wisdom. However, I'll note that Imagining and Building Wise Machines ( summary ) takes the opposite stance: wisdom, conceived of as metacognition, can actually assist with explainability.
But let's assume that this is a problem. How much of a problem is it?
I suspect that this varies significantly by actor. There's a reasonable argument that public institutions shouldn't be using such tools for reasons of justice. However, these arguments have much less force when it comes to private actors.
Even for private actors, it makes sense to use more legible techniques as much as possible, but I don't think this will be sufficient for all decisions. In particular, I don't think objective reasoning is sufficient for navigating the key decisions facing society in the transition to advanced AI.
But I also want to push back on the claim of non-verifiability. You can do things like run a pool of advisors and only take dramatic action if more than a certain proportion agree, plus you can do testing, even advanced things like latent adversarial testing. It's not as verifiable as we'd like, but it's not as though we're completely helpless here.
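To make the first of these ideas concrete, here's a minimal sketch in Python of threshold-gated pooling. Everything here is illustrative: `pooled_recommendation` and the toy `advisors` are hypothetical stand-ins rather than an existing library, and in practice each advisor would be a separately trained model.

```python
from collections import Counter

def pooled_recommendation(advisors, question, threshold=0.8):
    """Query each advisor; return a recommendation only if at least
    `threshold` of them agree, otherwise abstain from dramatic action."""
    votes = [advisor(question) for advisor in advisors]
    answer, count = Counter(votes).most_common(1)[0]
    if count / len(votes) >= threshold:
        return answer
    return None  # no supermajority, so don't take dramatic action

# Hypothetical advisors; in practice, separately trained models.
advisors = [lambda q: "defer", lambda q: "defer", lambda q: "act now"]
print(pooled_recommendation(advisors, "Deploy the new model?"))  # None: 2/3 < 0.8
```

The design choice worth noting is that disagreement produces abstention rather than a majority answer: the pool is being used as a verification gate, not as a way of papering over uncertainty.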
There will also be ways to combine wise AI advisors with more legible systems.
I'm mostly worried about inner alignment. Does this proposal address this? 🤞✔️
Inner alignment is an extremely challenging issue. If only we had some... wise AI advisors to help us navigate this problem.
"But this doesn't solve the problem, this assumes that these advisors are aligned themselves": Indeed, that is a concern. However, I suspect that the wise AI advisors approach has less exposure to these kinds of risks as it allows us to achieve certain goals at a lower base model capability level.
• Firstly, wise people don't always have huge amounts of intellectual firepower. So I believe that we will be able to achieve a lot without necessarily using the most powerful models.
• Secondly, the approach of combined human-AI teams allows the humans to compensate for any weaknesses present in the AIs.
In summary, this approach might help in two ways: by reducing exposure and advising us on how to navigate the issue.
Thanks for asking 🥳. Strangely enough I was looking for an excuse to hit a bunch of my key talking points 😂. Here are my top three:
🎁 Not persuaded yet? Here are three more bonus points:
I believe that this proposal is particularly promising because it has so many different plausible theories of change. It's hard to know in advance which assumptions will or won't pan out.
You might also be interested in my draft post N Stories of Impact for Wise AI Advisors 🏗️ which attempts to cleanly separate out the various possible theories of impact.
☞ I wasn't sold before, but I am now. How can I get involved? 🥳🎁
If you're serious, please ✉️ PM me. If you think you might be serious, but need to talk it through, please reach out as well.
It'd be useful for me to know your background and how you think you could contribute. Maybe tell me a few facts about yourself, mention what interests you and drop a link to your LinkedIn profile?
If you scroll down, you'll see that I've answered some more questions about getting involved, but I'll include some useful links here as well:
• List of potentially useful projects : for those who want to know what could be done concretely. Scroll down further for a list of project lists ⬇️.
• Resources for founding an AI safety startup or non-profit : since I believe there should be multiple organisations pursuing this agenda
• Resources for getting started in AI Safety more broadly : since some of these resources might be useful here as well
✋ But this is still all so abstract? 🧘👩🍼
Now that we've clarified some of the possible theories of impact, the next section will delve more into specific approaches and projects, including listing some (hopefully) robustly beneficial projects that could be pursued in this space. I also intend for some of my future work to be focused on making things more concrete. As an example, I'm hoping to spend some time during my ERA fellowship attempting to clarify what kinds of wisdom are most important for steering the world in positive directions. So whilst I think it is important to make this proposal more concrete, I'm not going to rush. Doing things well is often better than doing them as fast as possible. It took a long time for AI safety to move from abstract theoretical discussions to concrete empirical research and I expect it'll also take some time for these ideas to mature[77].
☞ Do you have specific projects that might be useful? - 📚Main List or expand for other lists
Yes, I created a list: Potentially Useful Projects in Wise AI. It contains a variety of projects, ranging from marginally helpful improvements to incredibly ambitious moonshots.
What other project lists exist?
Here are some project lists (or larger resources containing a project list) for wise AI or related areas:
• AI Impacts Essay Competition : Covers the automation of philosophy and wisdom.
• Fellowship on AI for Human Reasoning - Future of Life Foundation : AI tools for coordination and epistemics
• AI for Epistemics - Benjamin Todd - He writes: "The ideal founding team would cover the bases of: (i) forecasting / decision-making expertise (ii) AI expertise (iii) product and entrepreneurial skills and (iv) knowledge of an initial user-type. Though bear in mind that if you have a gap in one of these areas now, you could probably fill it within a year" and then provides a list of projects.
• Project ideas: Epistemics - Forethought - focuses on improving epistemics in general, not just AI solutions.
Do you have any advice for creating a startup in this space? - 📚Resources
See the AI Safety & Entrepreneurship wiki page for resources including articles, incubation programs, fiscal sponsorship and funding.
Is it really useful to make AI incrementally wiser? ✔️💯
AI being incrementally wiser might still be the difference between making a correct or incorrect decision at a key point.
Often several incremental improvements stack on top of each other, leading to a major advance.
Further, we can ask AI advisors to advise us on more ambitious projects to train wise AI. And the wiser our initial AI advisors are, the more likely this is to go well. That is, improvements that are initially incremental might be leveraged to gain further improvements.
In the best case, this might kick off a (strong) positive feedback cycle (aka wisdom explosion).
Sure, you can make incremental progress. But is it really possible to train an AI to be wise in any deep way? 🤞✔️🤷
Possibly 🤷.
I'm not convinced it's harder than any of the other ambitious alignment agendas and we won't know how far we can go without giving it a serious effort. Is training an AI to be wise really harder than aligning it? If anything, it seems like a less stringent requirement.
Compare:
• Ambitious mechanistic interpretability aims to perfectly understand how a neural network works at the level of individual weights
• Agent foundations attempts to truly understand what concepts like agency, optimisation, decisions and values are at a fundamental level
• Davidad's Open Agency Architecture attempts to train AIs that come with proof certificates establishing that an AI has less than a certain probability of causing unwanted side-effects
Is it obvious that any of these are easier than training a truly wise AI advisor? I can't answer for you, but it isn't obvious to me.
Given the stakes, I think it is worth pursuing ambitious agendas anyway. Even if you think timelines are short, it's hard to justify holding a probability approaching 100%, so it makes sense for folk to be pursuing plans on different timelines.
I understand your point about all ambitious alignment proposals being extremely challenging, but do you have any specific ideas for training AI to be deeply wise (even if speculative)? ✅
This section provides a high-level overview of some of my more speculative ideas. Whilst these ideas are very far from fully developed, I nonetheless thought it was worthwhile sharing an early version of them. That said, I'm very much of the "let a hundred flowers bloom" school. Whilst I think the approach I propose is promising, I also think it's much more likely than not that someone else comes up with an even better idea.
I believe that imitation learning is greatly underrated. I have a strong intuition that using the standard ML approach for training wisdom will fail because of the traditional Goodhart's law reasons, except that it'll be worse because wisdom is such a fuzzy thing (Who the hell knows what wisdom really is?).
I feel that training a deeply wise AI system requires an objective function that we can optimise hard on. This naturally draws me towards imitation learning. It has its flaws, but it certainly seems like we could optimise much harder on an imitation objective than when attempting to train wisdom directly.
Now, imitation learning is often thought of as weak and there's certainly some truth in this when we're just talking about an initial imitation model. However, this isn't a cap on the power of imitation learning, only a floor. There's nothing stopping us from training a bunch of such models and using amplification techniques such as debate, trees of agents or even iterated distillation and amplification. I expect there to be many other such techniques for amplifying your initial imitation models.
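For illustration, here's a minimal sketch in Python of one such amplification scheme: a simple tree of agents built on top of an imitation model. The `imitation_advisor` function is a hypothetical placeholder for whatever imitation-trained model you actually have, and the decomposition step is faked so the sketch stays runnable; nothing here is tied to a specific API.

```python
def imitation_advisor(prompt: str) -> str:
    """Hypothetical placeholder for an imitation-trained model."""
    return f"<imitation model's answer to: {prompt}>"

def decompose(question: str, n: int = 2) -> list[str]:
    """Ask the model for sub-questions. A real version would parse the
    model's output; here we fabricate sub-questions to stay runnable."""
    _ = imitation_advisor(f"Break '{question}' into {n} sub-questions.")
    return [f"{question} (aspect {i + 1})" for i in range(n)]

def amplify(question: str, depth: int = 2) -> str:
    """Answer a question by recursively consulting amplified sub-answers,
    then asking the base model to synthesise them."""
    if depth == 0:
        return imitation_advisor(question)  # base case: the raw imitation model
    sub_answers = [amplify(sq, depth - 1) for sq in decompose(question)]
    context = "\n".join(sub_answers)
    return imitation_advisor(f"Given these sub-answers:\n{context}\nAnswer: {question}")

print(amplify("What would a wise response to this capability jump look like?"))
```

Debate or iterated distillation and amplification would replace the synthesis step with adversarial comparison or with retraining on the amplified answers, but the shape is the same: the initial imitation model is a floor that composition can build on.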
See An Overview of "Obvious" Approaches to Training Wise AI Advisors for more information. This post was the first half of my entry into the AI Impacts Essay Competition on the Automation of Wisdom and Philosophy which placed third.
Why mightn't I want to work on this?
I think it's important to consider whether you might not be the right person to make progress on this.
I mean — wisdom? Who the hell knows what that is?
I expect most people who try to gain traction on this to just end up confusing themselves and others. So, you probably shouldn't work on this unless you're a particularly good fit.
On the other hand, there are very strong selection effects where those who are fools think they are wise and those who are wise doubt their own wisdom.
I wish I had a good answer here. All I can say is to avoid both arrogance and performative modesty when reflecting on this question.
Many of the methods you've proposed are non-embodied. Doesn't wisdom require embodiment? 🤔❌🤷
There's a very simplistic version of this argument that proves too much: people have used this claim to argue that LLMs were never going to be able to learn to reason, whilst o3 seems to have conclusively disproven this. So I don't think that the standard version of this argument works.
It's also worth noting that in robotics, there are examples of zero-shot transfer to the real world. Even if wisdom isn't directly analogous, this suggests that large amounts of real-world experience might be less crucial than it appears at first glance.
All this said, I find it plausible that a more sophisticated version of this argument might pose a greater challenge. Even if this were the case, I see no reason why we couldn't combine both embodied and non-embodied methods. So even if a more sophisticated version ultimately turned out to be true, this still wouldn't demonstrate that research into non-embodied methods was pointless.
Coming soon 😉.
☞ Okay, you've convinced me now. Any chance you could repeat how to get involved so I don't have to scroll up? ✅📚
Sure 😊. Round 1️⃣2️⃣ 🔔, here we go 🪂!
As I said, if you're serious, please ✉️ PM me. If you think you might be serious, but need to talk it through, please reach out as well.
It'd be useful for me to know your background and how you think you could contribute. Maybe tell me a few facts about yourself, mention what interests you and drop a link to your LinkedIn profile?
Useful links:
• List of potentially useful projects : for those who want to know what could be done concretely. Scroll down further for a list of project lists ⬇️.
• Resources for founding an AI safety startup or non-profit : since I believe there should be multiple organisations pursuing this agenda
• Resources for getting started in AI Safety more broadly : since some of these resources might be useful here as well
Who are you? 👋
Hi, I'm Chris (Leong). I've studied maths, computer science, philosophy and psychology. I've been interested in AI safety for nearly a decade and I've participated in quite a large number of related opportunities.
With regards to the intersection of AI safety and wisdom:
I won third prize in the AI Impacts Competition on the Automation of Wisdom and Philosophy. My entry is divided into two parts:
• An Overview of “Obvious” Approaches to Training Wise AI Advisors
• Some Preliminary Notes on the Promise of a Wisdom Explosion
I recently ran an AI Safety Camp project on Wise AI Advisors.
More recently, I was invited to continue my research as an ERA Cambridge Fellow in Technical AI Governance.
Feel free to connect with me on LinkedIn (ideally mentioning your interest in wise AI advisors so I know to accept) or ✉️ PM me.
Hi, I'm Christopher Clay. I was previously a Non-Trivial Fellow and I participated in the Global Challenges Project.
Does anyone else think wise AI or wise AI advisors are important? ✔️💯
Yes, here are a few:
Imagining and Building Wise Machines: The centrality of AI metacognition
Authors: Yoshua Bengio, Igor Grossmann, Melanie Mitchell and Samuel Johnson[78] et al
Wise metacognition can lead to a virtuous cycle in AI, just as it does in humans. We may not know precisely what form wise AI will take—but it must surely be preferable to folly.
The paper mostly focuses on wise AI agents, however, not exclusively so:
Prof. Grossmann (personal correspondence): "I like the idea of wise advisors. I don't think the argument in our paper is against it - it all depends on how humans will use the technology (and there are several papers on the role of metacognition for discerning when to rely on decision-aids/AI advisors, too)."
AI Impacts Automation of Wisdom and Philosophy Competition
Organised by: Owen Cotton-Barratt
Judges: Andreas Stuhlmüller, Linh Chi Nguyen, Bradford Saad and David Manley[79]
Links: competition announcement, Wise AI Wednesdays post, winners and judges' comments[80]
AI is likely to automate more and more categories of thinking with time.
By default, the direction the world goes in will be a result of the choices people make, and these choices will be informed by the best thinking available to them. People systematically make better, wiser choices when they understand more about issues, and when they are advised by deep and wise thinking.
Advanced AI will reshape the world, and create many new situations with potentially high-stakes decisions for people to make. To what degree people will understand these situations well enough to make wise choices remains to be seen. To some extent this will depend on how much good human thinking is devoted to these questions; but at some point it will probably depend crucially on how advanced, reliable, and widespread the automation of high-quality thinking about novel situations is[81]
Beyond Artificial Intelligence (AI): Exploring Artificial Wisdom (AW)
Authors: Dilip V Jest, Sarah A Graham, Tanya T Nguyen, Colin A Depp, Ellen E Lee, Ho-Cheol Kim
The ultimate goal of artificial intelligence (AI) is the development of technology that is best able to serve humanity and this will require advancements that go beyond the basic components of general intelligence. The term “intelligence” does not best represent the technological needs of advancing society, because intelligence alone does not guarantee well-being either for individuals or societies. It is not intelligence, but wisdom, that is associated with greater well-being, happiness, health, and perhaps even longevity of the individual and the society. Thus, the future need in technology is for artificial wisdom (AW)
What do you see as the strongest counterarguments against this general direction? 🔜🙏
I've discussed several possible counter-arguments above, but I'm going to hold off publicly posting about which ones seem strongest to me for now. I'm hoping this will increase diversity of thought and nerdsnipe people into helping red-team my proposal for me 🧑🔬👩🔬🎯. Sometimes with feedback there's a risk where if you say, "I'm particularly worried about these particular counter-arguments" you can redirect too much of the discussion onto those particular points, at the expense of other, perhaps stronger criticisms.
☞ Do you have any recommendations for further reading? ✅📚
I've created a weekly post series on Less Wrong called Wise AI Wednesdays.
Motivation
‘AI for societal uplift’ as a path to victory - LW Post: Examines the conditions in which a "societal uplift" - epistemics + co-ordination + institutional steering - might or might not lead to positive outcomes
N Stories of Impact for Wise AI Advisors - Draft 🏗️ : Different stories about how wise AI advisors could be useful for having a positive impact on the world.
Artificial Wisdom
Imagining and building wise machines: The centrality of AI metacognition by Johnson, Karimi, Bengio, et al. - paper , summary : This paper argues that wisdom involves two kinds of strategies (task-level strategies & metacognitive strategies). Since current AI is pretty good at the former, they argue that we should pursue the latter as a path to increasing AI wisdom.
Finding the Wisdom to Build Safe AI by Gordon Seidoh Worley - LW post : Seidoh talks about his own journey in becoming wiser through Zen and outlines a plan for building wise AI. In particular, he argues that it will be hard to produce wise AI without having a wise person to evaluate it.
Designing Artificial Wisdom: The Wise Workflow Research Organisation by Jordan Arel - EA forum post : Jordan proposes mapping the workflows within an organisation that is researching a topic like AI safety or existential risk. AI could be used to automate or augment parts of their work. The proportion automated would increase over time, with the hope being that this would eventually allow us to fully bootstrap an artificially wise system.
Should we just be building more datasets? by Gabriel Recchia - Substack : Argues that an underrated way of increasing the wisdom of AI systems would be building more datasets (whilst also acknowledging the risks).
Tentatively Against Making AIs 'wise' by Oscar Delany - EA forum post : This article argues that, insofar as wisdom is conceived of as being more intuitive than carefully reasoned, pursuing AI wisdom would be a mistake, as we need AI reasoning to be transparent. I've included this because it seems valuable to have at least one critical article.
Neighbouring Areas of Research
What's Important In "AI for Epistemics"? by Lukas Finnveden - Forethought : AI for Epistemics is a subtly different but overlapping area. It is close enough that this article is worth reading. It provides an overview of why you might want to work on this, heuristics for good interventions and concrete projects.
AI for AI Safety by Joe Carlsmith ( LW post ): Provides a strategic analysis of why AI for AI safety is important whether it's for making direct safety progress, evaluating risks, restraining capabilities or improving "backdrop capacity". Great diagrams.
AI Tools for Existential Security by Lizka Vaintrob and Owen Cotton-Barratt -- Forethought : Discusses how applications of AI can be used to reduce existential risks and suggests strategic implications.
Not Superintelligence: Supercoordination - forum post [82]: This article suggests that software-mediated supercoordination could be beneficial for steering the world in positive directions, but also identifies the possibility of this ending up as a "horrorshow".
Human Wisdom
Stanford Encyclopedia of Philosophy Article on Wisdom by Sharon Ryan - SEP article : SEP articles tend to be excellent, but also long and complicated. In contrast, this article maintains the excellence while being short and accessible.
Thirty Years of Psychology Wisdom Research: What We Know About the Correlates of an Ancient Concept by Dong, Weststrate and Fournier - paper : Provides an excellent overview of how different groups within psychology view wisdom.
The Quest for Artificial Wisdom by Sevilla - paper : This article outlines how wisdom is viewed in the Contemplative Sciences discipline. It has some discussion of how to apply this to AI, but much of this discussion seems outdated in light of the deep learning paradigm.
Applications to Governance
🏆 Wise AI support for government decision-making by Ashwin - Substack (Prize winning entry in the AI Impacts Automation of Wisdom and Philosophy Competition ): This article convinced me that it isn't too early to start trying to engage the government on wise AI. In particular, Ashwin considers the example of automating the Delphi process. He argues that even though you might begin by automating parts of the process, over time you could expand beyond this, for example, by helping the organisers figure out what questions they should be asking.
Some of my own work:
🏆 My third prize-winning entry in the AI Impacts Automation of Wisdom and Philosophy Competition (split into two parts):
Some Preliminary Notes on the Promise of a Wisdom Explosion : Defines a wisdom explosion as a recursive self-improvement feedback loop that enhances wisdom, rather than intelligence as in the more traditional intelligence explosion. Argues that wisdom tech is safer from a differential technology development perspective.
An Overview of "Obvious" Approaches to Training Wise AI Advisors : Compares four different high-level approaches to training wise AI: direct training, imitation learning, attempting to understand what wisdom is at a deep principled level, the scattergun approach. One of the competition judges wrote: "I can imagine this being a handy resource to look at when thinking about how to train wisdom, both as a starting point, a refresher, and to double-check that one hasn’t forgotten anything important".
☞ What are some projects that exist in this space? — TODO
Automating the Delphi Method for Safety Cases — Philip Fox, Ketana Krishna, Tuneer Mondal, Ben Smith, Alejandro Tlaie — Ashwin and Michaelah Gertz-Billingsley proposed automating the Delphi Method as a foot-in-the-door for getting the government to use wise AI. Well, it turns out these folk at Arcadia Impact have done it!
⇢ "What's an metapost?" — Oh, I just made that term up. Essentially, it's just a megapost, but instead of being one long piece of text, it uses collapsible sections to keep the core post lightweight 🪶. In other words, it's both a megapost and a minipost at the same time 😉.
⇢ "But people will confuse this with a meta post?" — No space. Simple 🤷♂️.
⇢ This term happens to also be appropriate in a second sense; some aspects are inspired by metamodernism.
⇢ I jokingly refer to this as the "Party Edition" as it's designed to be shared at informal social gatherings 🕺.
⇢ "Party edition? Why the hell would you make a party edition!?" — I could you tell a story about how this will actually be impactful (HPMoR; Siliconversations; spreading ideas by going from person to person being massively underrated), but at the end of the day, I wanted to make it, so I made it 🤷♂️.
⇢ "But what is it?" — I've handcrafted this version for more casual contexts. I've tried to be more authentic and I hope you find it more engaging. That said, these are serious issues, so I'm also creating a "serious edition" for situations for contexts where sharing a post like this might come off as disrespectful 🎩. Otherwise: 🥳.
⇢ "That makes sense, but are the lame jokes really necessary?" — Yes, they are most necessary. "When the novice attained the rank of grad student, he took the name Bouzo and would only discuss rationality while wearing a clown suit" 🤡🎩
"Why put so much effort into a Less Wrong post?" — 🟥🍉
"Regardless" — Yes. One must imagine Sisyphus happy... ❤️🦉.
⇢ "I see in your eyes the same fear that would take the heart of me. A day may come when the courage of men fails, when we forsake our friends and break all bonds of fellowship, but it is not this day. An hour of wolves and shattered shields, when the age of men comes crashing down, but it is not this day! This day we fight!! By all that you hold dear on this good Earth, I bid you stand" 💍🌋🛡️ — ▶️
⇢ "If we as humans ever face extinction via alien invasion, I nominate Aragorn to come to life and lead mankind" — @jlop6822 (Youtube comment)
"If I see a situation pointed south, I can't ignore it. Sometimes I wish I could" — Steve Rogers 🛡️🇺🇸
It's really just this version that is the passion project. The "Formal Version" is necessary and the right version for certain contexts, but that format also loses something 💔.
"We will call this discourse, oscillating between a modern enthusiasm and a postmodern irony, metamodernism... New generations of artists increasingly abandon the aesthetic precepts of deconstruction, parataxis, and pastiche in favor of aesth-ethical notions of reconstruction, myth, and metaxis... History, it seems, is moving rapidly beyond its all too hastily proclaimed end" — Notes on metamodernism
I'm using the term "alignment proposal" in an extremely general sense, particularly "how we could save the world", rather than the narrow sense of achieving good outcomes by specifically aligning models. That said, wise AI advisors could assist with aligning models and they also provide us with a generalised version of the alignment problem (discussed later in this post) 🌏 → 🌞.
"It is only as an aesthetic phenomenon that existence and the world are eternally justified" — ❤️🦉🔨.
Is naivety always bad? Is there some kind of universal rule? Or are there exceptions? Perhaps there are situations where a certain form of naivety can form a self-fulfilling prophecy (or hyperstition) 🔮🪞.
"Of all that is written, I love only what a man has written with his blood. Write with blood, and you will experience that blood is spirit" — ❤️🦉🔨.
Unfortunately, I've forgotten the names of many people who provided me with useful feedback 😔.
Commenting on the parallels between the Manhattan project and AI on This Past Weekend with Theo Von ☢️🤖.
Source: NeurIPS’2019 workshop on Fairness and Ethics, Vancouver, BC, On the Wisdom Race, December 13th, 2019 🦉🏎️🏁
One reason why this arrangement of quotes amuses me is because the Sam Altman quote was actually in response to the host specifically asking Sam about what he thought about Yoshua Bengio's concerns 😂.
"Centuries" — I wish 😭.
🧗♀️
🗿👑☢️
👦💣💥
📺: 🪐🔭
⛪: 🤲🥣 → 🦁🌱
🇺🇳: 👨💻, ⭐⭐⭐⭐⭐
👨🔬👨🦼: 🌬️🕳️
Salvia Ego Ipse, Philosopher of AI, Liber magnus, sapiens et praetiosus philosophiae ÷
"But what if an equivalent increase in wisdom turns out to be impossible?" — This is left as an exercise to the reader 😱.
I refer to this as the SUV Triad 🫰
⇢ I refer to this the Wisdom-Capabilities Gap or simply the Wisdom Gap.
⇢ "Does it make sense to assume that greater capabilities require greater wisdom to handle?" — Yes, I believe it does. I'll argue for this in the Core Argument 🔜.
I may still decide to weaken this claim. This is a draft post after all 🙏.
⇢ "Then why is this section first?": You might find it hard to believe, but there's actually folks who'd freak out if I didn't do this 🔱🫣. Yep... 🤷♂️
⇢ I recommend diving straight into the core argument, and seeing if you agree with it when you substitute your own understanding of what wisdom is. I promise you 💍, I'm not doing this arbitrarily. However, if you really want to read the definitions first, then you should feel free to just go for it 🧘👌.
I also believe that this increases our chances of being able to set up a positive feedback loop 🤞🤞🤞 of increasing wisdom (aka "wisdom explosion" 🦉🎆) compared to strategies that don't leverage humans. But I don't want to derail the discussion by delving into this right now 🐇🕳️.
I'd like to think of myself as being the kind of person who's sensible enough to avoid getting derailed by small or irrelevant details 🤞🤞. Nonetheless, I can recall countless... okay, a few times when I've failed at this in the past 🚇💥. Honestly, I wouldn't be surprised if many people were in the same boat 🧑🤝🧑🧑🤝🧑🧑🤝🧑🌏🤞.
I'm using emojis to provide a visual indication of my answer without needing to expand the section 🦸🩻.
This post is being written collaboratively by Chris Leong (primary author) and Christopher Clay (second author) in the voice of Chris Leong 🤔🤝✍️. The blame for any cringe in the footnotes can be pinned on Chris Leong 🎯.
⇢ I believe that good communication involves being as open about the language game ❤️🦉 being played as possible 🫶.
Given this, I want to acknowledge that this post is not intended to deliver a perfectly balanced analysis, but rather to lay out an optimistic case for wise AI advisors 🌅. In fact, feel free to think of it as a kind of manifesto 📖🚪📜📌.
I suspect that many, if not most, readers of Less Wrong will be worried that this must necessarily come at the expense of truth-seeking 🧘. In fact, I probably would have agreed with this not too long ago. However, recently I've been resonating a lot more with the idea that there's value in all kinds of language games, so long as they are deployed in the right context and pursued in the right way 🤔[83]. Most communication shouldn't be a manifesto, but the optimal number of manifestos isn't zero (❗).
⇢ Stone-cold rationality is important, but so is the ability to dream 😴💭🎑. It's important to be able to see clearly, but also to lay out a vision 🔭✨. These are in tension, sure, but not in contradiction ⚖️. Synthesis is hard, but not impossible 🧗.
⇢ If you're interested in further discussion, see the next footnote...
⇢ "Two footnotes on the exact same point? This is madness!": No, this is Sparta! 🌍⛰️🐰🕳️
Spartans don't quit and neither do we 💪. So here it is: Why it makes sense to lay out an optimistic case, Round 🔔🔔.
Who doesn't love a second bite at the apple? 👸🏻🍏💀😴
⇢ Adopting a lens of optimism unlocks creativity by preventing you from discarding ideas too early (conversely, a lens of pessimism aids with red-teaming by preventing you from discarding objections too easily). It increases the chance that bad ideas make it through your initial filter, but these can always be filtered further down the line. I believe that such a lens is the right choice for now, given that this area of thoughtspace is quite neglected. Exploration is a higher priority than filtration 🌬️⛵🌱.
⇢ "But you're biasing people toward optimism": I take this seriously 😔🤔. At the same time, I don't believe that readers need to be constantly handheld lest they form the wrong inference. Surely, it's worth risking some small amount of bias to avoid coming off as overly paternalistic?
Should texts be primarily written for the benefit of those who can't apply critical thinking? To be honest, that idea feels rather strange to me 🙃. I think it's important to primarily write for the benefit of your target audience and I don't think that the readers I care about most are just going to absorb the text wholesale, without applying any further skepticism or cognitive processing 🤔. Trusting the reader has risks, sure, but I believe there are many people for whom this is both warranted and deserved 🧭⛵🌞.
"Why all the emojis?": All blame belongs to Chris Leong 🎯.
"Not who deserves to be shot, but why are you using them?": A few different reasons: they let me bring back some of the expressiveness that's possible in verbal conversation, but which is typically absent in written communication 🥹😱, and striking a more casual tone helps differentiate it from any more academic or rigorous treatment (😉) that might hypothetically come later 🤠🛤️🎓 and they help keep the reader interested 🎁.
But honestly: I mostly just feel drawn to the challenge. The frame "both/and" has been resonating a lot with me recently and one such way is that I've been feeling drawn towards trying to produce content that manages to be both fun and serious at the same time 🥳⚖️🎩.
I'm starting to wonder if this point is actually true. This is a NOTE TO MYSELF to review this claim.
"Why can't we afford to leave anything on the table?" - The task we face is extremely challenging, the stakes sky high, it's extremely unclear what would have to happen for AI to go well, we have no easy way on gaining clarity on what would have to happen and we're kind of in an AI arms race which basically precludes us from fully thinking everything through 😅⏳⏰.
"But what about prioritisation?" — Think Wittgenstein. I'm asserting the need for a mindset, a way of being 🧘🏃♂️🏋️.
"But I don't understand you" — If a lion could speak, we could not understand him ❤️🦉🕵️.
This section is currently being edited, so it may take more than 3 minutes. Sorry 🙏!
I selected this name because it's easy to say and remember, but in my heart it'll always be the MST/MSU/MSV trifecta (minimal spare time, massive strategic uncertainty, many scary vulnerabilities) 💔🚢🌊🎶. I greatly appreciate Will MacAskill's recommendation that I simplify the name 🙏.
I don't think I'm exactly arguing anything super original or unprecedented here 😭😱, but I don't know if I've ever really seen anyone really go hard on this shape of argument before 🌊🌊🌊.
Some of these risks could directly cause catastrophe. Others might indirectly lead to this via undermining vital societal infrastructure such that we are then exposed to other threats 🕷️🕸️.
One of the biggest challenges here is that AI throws so many civilisational-scale challenges at us at once. Humanity can deal with civilisational-scale challenges, but global co-ordination is hard, and when there are so many different issues to deal with, it's hard to give each problem the proper attention it deserves 😅🌫️💣💣💣.
I'm intentionally painting these questions as more of a binary than they actually are to help illustrate the range of opinions.
Some of this is deserved, imho 🤷♂️.
In many ways, the "figuring out how to act" part of the problem is the much harder component 🧭🌫️🪨. Given perfect co-ordination between nation states, these issues could almost certainly be resolved quite quickly, but given that there's no agreement on what needs to be done and politics being the mind-killer, I wouldn't hold your breath... 😯😶🐌🚀🚀🚀
The exponential increase in task-length actually generalises beyond coding.
What's more impressive is how it was solved: "Why am I excited about IMO results we just published: - we did very little IMO-specific work, we just keep training general models - all natural language proofs - no evaluation harness We needed a new research breakthrough and @alexwei_ and team delivered"
https://x.com/MillionInt/status/1946551400365994077 🤯
Some people have complained that the students used in this experiment were underqualified as judges, but I'm sure there'll be a follow-up next year that addresses this issue 🤷♂️.
⇢ Clark: "Sir, this is a Less Wrong. You can't swear here" — In order for a film to get a 12A or PG-13 rating, it cannot contain gratuitous use of profanity. One 'f#@k' is allowed; any more and it automatically becomes 15- or R-rated — See, PG-13.
⇢ Jax: "Censorship is terrible and the belief that swearing hurts kids is utterly absurd. Do you think they've forgotten what it was like as a kid?" — Not taking sides, but restrictions aren't always negative. Sometimes limiting a resource encourages people to use that resource to maximum effect 🎯.
See Life Itself's article on the "wisdom gap", which is the terminology I prefer in informal contexts. 10 Billion uses the term capabilities-wisdom gap, which I've reversed in order to emphasise wisdom. I prefer this version for more formal contexts 📖😕.
I don't want to dismiss the value of addressing specific issues in terms of buying us time 🔄.
I recommend finishing the post first (don't feel obligated to expand all the sections) 🙏.
This paper argues that robustness, explainability and cooperation all facilitate each other 🤞💪. In case the link to wisdom is confusing (🦉❓), the authors argue that metacognition is both key to wisdom and key to these capabilities ☝️.
This argument could definitely be more rigorous; however, this response is addressed to optimists and it is especially likely to be true if we adopt an optimistic stance on technical and social alignment. So I feel fine leaving it as is 🤞🙏.
⚠️ If timelines are extremely short, then many of my proposals might become irrelevant because there won't be time to develop them 😭⏳. We might be limited to simple things like tweaking the system prompt or developing a new user interface, as opposed to some of the more ambitious approaches that might require changing how we train our models. However, there's a lot of uncertainty with timelines, so more ambitious projects might make sense anyway. 🤔🌫️🤞🤷.
"If your proposal might even work better over long timelines, why does your core argument focus on short timelines 🤔⁉️": There are multiple arguments for developing wise AI advisors and I wanted to focus on one that was particularly legible 🔤🙏.
Andrew Critch has labelled this - "pivoting" civilisation across multiple acts across multiple persons and institutions - a pivotal process. He compares this favourably to the concept of a pivotal act - "pivoting" civilisation in a single action - on the basis of it being easier, safer and more legitimate 🗺️🌬️⛵️⛵️⛵️🌎.
I say "forget X" partly in jest. My point is that if there's a realistic chance of realising stronger possibility, then it would likely be much more significant than the weaker possibility 👁️⚽.
I think there's a strong case that if these advisors can help you gain allies at all, they can help you gain many allies 🎺🪂🪂🪂.
In So You Want To Make Marginal Progress..., John Wentworth argues that "when we don't already know how to solve the main bottleneck(s)... a solution must generalize to work well with the whole wide class of possible solutions to other subproblems... (else) most likely, your contribution will not be small; it will be completely worthless" 🗑️‼️.
My default assumption is that people will be making further use of AI advice going forwards. But that doesn't contradict my key point, that certain strategies may become viable if we "go hard on" producing wise AI advisors 🗺️🔓🌄.
"But "reckless and unwise" and "responsible & wise" aren't the only possible combinations!" - True 👍, but they co-occur sufficiently often that I think this is okay for a high-level analysis 🤔🤞🙏.
They could also speed up the ability of malicious and "reckless & unwise" actors to engage in destructive actions; however, I still think such advisors would most likely improve the comparative situation, since many defences and precautions can be put in place before they are needed 🤔🤞💪.
This argument is developed further in Some Preliminary Notes on the Promise of a Wisdom Explosion 🦉🎆📄.
I agree it's highly unlikely that we could convince all the major labs to pursue a wisdom explosion instead of an intelligence explosion 🙇. Nonetheless, I don't think this renders the question irrelevant ☝️. Let's suppose (😴💭) it really were the case that the transition to AGI would most likely go well if society decided to pursue a wisdom explosion rather than an intelligence explosion ‼️. I don't know, but that seems like it'd be the kind of thing that would be pretty darn useful to know 🤷♂️. My experience with a lot of things is that it's a lot easier to find additional solutions than it is to find the first. In other words, solutions aren't just valuable for their potential to be implemented, but because of what they tell us about the shape of the problem 🗝️🚪🌎.
How can we tell which framings are likely to be fruitful and which are not 👩🦰🍎🐍🌳? This isn't an easy question, but my intuition is that a framing is more likely to be worth pursuing if it leads to questions that feel like they should (❓) be asked, but have been neglected due to unconscious framing effects 🙈🖼️. In contrast, a framing is less likely to be fruitful if it asks questions that seem interesting prima facie, but for which the best researchers within the existing paradigm have good answers, even if most researchers do not 🏜️✨.
I acknowledge that "intelligence" is only rather loosely connected with the capabilities AI systems have in practice. Factors like prestige, economic demand and tractability play a larger role in determining what capabilities are developed than the name used to describe the area of research. Nonetheless, I think it'd be hard to argue that the framing effect hasn't exerted any influence. So I think it is worthwhile understanding how this may have shaped the field and whether this influence has been for the best 🕵️🤔.
A common way that an incorrect paradigm can persist is if existing researchers don't have good answers to particular questions, but see them as unimportant 🙄👉.
I have an intuition that it's much more valuable for the AI safety and governance community to be generating talent with a distinct skillset/strengths than to just be recruiting more folk like those we already have 🧩. As we attract more people with the same skillset we likely experience decreasing marginal returns due to the highest impact roles already being filled 🙅♂️📉.
That said, given how fast AI development is going, I'm hoping the process of concretisation can be significantly sped up. Fortunately, I think it'll be easier to do it this time as I've already seen what the process looked like for alignment 🔭🚀🤞.
First author 🥇.
Wei Dai who was originally announced as a judge withdrew 🕳️.
Including me 🏆🍀😊.
They write: "The precise opinions expressed in this post should not be taken as institutional views of AI Impacts, but as approximate views of the competition organizers" ☝️👌.
Linking to this article does not constitute an endorsement of Sofiechan or any other views shared there. Unfortunately, I am not aware of other discussions of this concept 😭🤷, so I will keep this link in this post temporarily 🙏🔜.
You could even say: with the right person, and to the right degree, and at the right time, and for the right purpose, and in the right way ❤️🦉.
Thanks for sharing your thoughts.
I agree that humans with wise AI advisors is a more promising approach, at least at first, than attempting to directly program wisdom into an autonomously acting agent.
Beyond that, I personally haven't made up my mind yet about the best way to use wisdom tech.