***Epistemic status: personal experience***
In a number of prior posts, and in ARCHES, I’ve argued that more existential safety consideration is needed on the topic of multi-principal/multi-agent (multi/multi) dynamics among powerful AI systems.
In general, I have found it much more difficult to convince thinkers within and around LessWrong’s readership base to attend to multi/multi dynamics, as opposed to, say, convincing generally morally conscious AI researchers who are not (yet) closely associated with the effective altruism or rationality communities.
Because EA/rationality discourse is particularly concerned with maintaining good epistemic processes, I think it would be easy to conclude from this state of affairs that...
[Epistemic status: slightly rambly, mostly personal intuition and opinion that will probably be experimentally proven wrong within a year considering how fast stuff moves in this field]
This post is also available on my personal blog.
Thanks to Gwern Branwen, Steven Byrnes, Dan Hendrycks, Connor Leahy, Adam Shimi, Kyle and Laria for the insightful discussions and feedback.
By now, most of you have probably heard about GPT-3 and what it does. There have been a bunch of different opinions on what it means for alignment, and this post is yet another, from a slightly different perspective.
Some background: I'm a part of EleutherAI, a decentralized research collective (read: glorified discord server - come join us on Discord for ML, alignment, and dank memes). We're best known for our ongoing effort to...
I think a major crux is that the things you couldn't impart to Mary through language (assuming that such things do exist) would be wishy-washy stuff like qualia, whose existence, for a nonhuman system modelling humans, essentially doesn't matter for predictive accuracy. In other words, a universe where Mary does learn something new and a universe where she doesn't are essentially indistinguishable from the outside, so whether it shows up in world models is irrelevant.
Note: The information provided below is not medical advice, and should not be treated as such. Please seek the advice of your physician with any questions you may have regarding a medical condition.
I believe a new form of psychotherapy has been found that is significantly more effective than more conventional therapies such as CBT. Despite this, it is unlikely to replace these in the near term. One of the many reasons for this is that the most complete source is found in a relatively obscure podcast, and it takes listening through hundreds of hour-long episodes in order to fully grasp how radically different it is from more conventional schools of therapy. However, the time investment is worth it: my own moods have improved drastically, and my life...
Awesome, really glad that you've found the episodes helpful! I have also found the live sessions focused on relationship issues to be some of the most enlightening ones.
If you haven't already found them, there are several more episodes on the same theme. For example, you might be interested in listening to the ones with Mark:
Live Session (Mark) — Introduction & Testing (Part 1)
and the session with Brian:
Anger in Marriage: The Five Secrets Revisited
Suppose we’re working on some delightfully Hard problem - genetically engineering a manticore, or terraforming Mars, or aligning random ML models. We need very top tier collaborators - people who are very good at a whole bunch of different things. The more they’re good at, and the better they are, the better the chances of success for the whole project.
There’s two main ways to end up with collaborators with outstanding skill/knowledge/talent in many things: selection or training. Selection is how most job recruitment works: test people to see if they already have (some of) the skills we’re looking for. Training instead starts with people who don’t have (all of) the skills, and installs them de novo.
Key point of this post: selection does not scale well with the...
I disagree with the premise. The vast majority of selection is extremely parallelizable. In terms of humans, self-selection does most of the work - we don't even have to consider 99.999% of people for most of our collaboration. Or if we want (and can afford/attract) the best in the world, considering everyone, we set it up so they select among themselves for the first dozen levels of filter.
Training is almost always individual, and non-scalable by its nature.
In truth, the mechanisms work together - a few layers of selection to get the most promising interested in training, then additional mixes of training and selection until the greatest at something are pretty damn great.
(This is not (quite) just a re-hashing of the homunculus fallacy.)
I'm contemplating what it would mean for machine learning models such as GPT-3 to be honest with us. Honesty involves conveying your subjective experience... but what does it mean for a machine learning model to accurately convey its subjective experience to us?
You've probably seen an optical illusion like this:
*[image: optical illusion]*
You've probably also heard an explanation something like this:
"We don't see the actual colors of objects. Instead, the brain adjusts colors for us, based on surrounding lighting cues, to approximate the surface pigmentation. In this example, it leads us astray, because what...
Somewhat along the lines of what TAG said, I would respond that this does seem pretty related to what is going on, but it's not clear that all models with room for an experiencer make that experiencer out to be a homunculus in a problematic way.
If we make "experience" something like the output of our world-model, then it would seem necessarily non-physical, as it never interacts.
But we might find that we can give it other roles.
This is the script of the Rational Animations video linked above, with a few minor edits and additions. I really like how the animations came out in this one, so if you are curious, follow the link. If you only care about the arguments, you can just read the script. Most of the images here are taken from the video.
If you honestly seek truth, and if you decide to tell the truth, at some point you will have to accept appearing cringe in the eyes of most people. Why is that? Simply because truth can be cringe: at some point, you will encounter a truth that other people are disgusted by, and if you decide to tell it, you will be associated with cringe.
Is it a necessity for...
Yeah, they seem similar, but "ugh fields" are more individual, while "cringe" is more social.
In Why don't long running conversations happen on LessWrong? adamzerner writes:
Here is how things currently work:
- Someone writes a post.
- It lingers around the front page for a few days. During this time, conversations emerge in the comments section.
- After a few days, the post no longer persists on the front page and conversations largely fizzle out.
I'd like to try to have a longer polymath-project-style collaboration focused on answering a question together. Instead of each person working to give their individual answer to the question, we'd come up with an answer together through an extended discussion.
When/if we've reached some sort of milestone, either by answering the question or by making some interesting progress towards answering it, we show the results of our collaboration to LessWrong in the form...
Thanks for writing this up! It's a good idea and a thing worth experimenting with.
Since August 2020 I've been recording conversations with brilliant and insightful rationalists, effective altruists (and people adjacent to or otherwise connected somehow to those communities). If you're an avid reader of this site, I suspect you will recognize many of the names of those I've spoken to.
Since I suspect some LessWrong readers will appreciate these conversations, here is a curated list with links, organized by the LessWrong relevant topics we cover in each conversation. All of these conversations can also be found by searching for "Clearer Thinking" in just about any podcast app. If there are other people you'd like to see me record conversations with, please nominate them in the comments! The format is that I invite each guest to bring 4 or 5 "ideas that...
Also, one Spencer recorded with me: "Lines of Retreat and Incomplete Maps". Not sure why it isn't listed above; maybe it was from earlier than the ones listed.
I would like to ask you to share in the comments what you usually do when not working that provides good-quality rest. Personally, I have found that if I enjoy something a lot, it is hard to stop (like a very interesting book). And if I don't enjoy it, then there seems to be little point in doing it for rest. I will appreciate both short comments (like "Hiking") and long ones. Thank you!
Trying to get the gist of this post... There's the broad sweep of AI research across the decades, up to our contemporary era of deep learning, AlphaGo, GPT-3. In the 2000s, the school of thought associated with Eliezer, MIRI, and LessWrong came into being. It was a pioneer in AI safety, but its core philosophy, preparing for the first comprehensively superhuman AI, remains idiosyncratic in a world focused on more specialized forms of AI.
There is a quote from Eliezer talking about "AI alignment" research, which would be that part of AI safety conce...