Recent Discussion

Where I’m coming from

***Epistemic status: personal experience***

In a number of prior posts, and in ARCHES, I’ve argued that more existential safety consideration is needed on the topic of multi-principal/multi-agent (multi/multi) dynamics among powerful AI systems.  

In general, I have found it much more difficult to convince thinkers within and around LessWrong’s readership base to attend to multi/multi dynamics, as opposed to, say, convincing generally morally conscious AI researchers who are not (yet) closely associated with the effective altruism or rationality communities.  

Because EA/rationality discourse is particularly concerned with maintaining good epistemic processes, I think it would be easy to conclude from this state of affairs that

  • multi/multi dynamics are not important (because communities with great concern for epistemic process do not care about them much), and
  • AI researchers who do care
...

Trying to get the gist of this post... There's the broad sweep of AI research across the decades, up to our contemporary era of deep learning, AlphaGo, GPT-3. In the 2000s, the school of thought associated with Eliezer, MIRI, and Less Wrong came into being. It was a pioneer in AI safety, but its core philosophy of preparing for the first comprehensively superhuman AI remains idiosyncratic in a world focused on more specialized forms of AI.

There is a quote from Eliezer talking about "AI alignment" research, which would be that part of AI safety conce...

paulfchristiano (5h, 10 points): Speaking for myself, I'm definitely excited about improving cooperation/bargaining/etc., and I think that working on technical problems could be a cost-effective way to help with that. I don't think it's obvious without really getting into the details whether this is more or less leveraged than technical alignment research. To the extent we disagree it's about particular claims/arguments and I don't think disagreements can be easily explained by a high-level aversion to political thinking. (Clarifying in case I represent a meaningful part of the LW pushback, or in case other people are in a similar boat to me.)
Vanessa Kosoy (7h, 2 points): You meant to say, biased against that possibility?
Kaj_Sotala (6h, 2 points): Oops, yeah. Edited.

[Epistemic status: slightly rambly, mostly personal intuition and opinion that will probably be experimentally proven wrong within a year considering how fast stuff moves in this field]

This post is also available on my personal blog.

Thanks to Gwern Branwen, Steven Byrnes, Dan Hendrycks, Connor Leahy, Adam Shimi, Kyle and Laria for the insightful discussions and feedback.

Background

By now, most of you have probably heard about GPT-3 and what it does. There’s been a bunch of different opinions on what it means for alignment, and this post is yet another opinion from a slightly different perspective.

Some background: I'm a part of EleutherAI, a decentralized research collective (read: glorified discord server - come join us on Discord for ML, alignment, and dank memes). We're best known for our ongoing effort to...

Dustin (1h, 2 points): Tangential, but the following sentence caused me to think of something... This nicely interleaves with something I've recently been discussing with my 11 year old daughter. She's discovered Mary's Room. On her own she came to this conclusion: (One of?) The reason(s) Mary learns something new when she sees the color red for the first time is because language is bad at conveying mental states. I think this idea is based upon the same foundation as many of the best rebuttals of Mary having learned something new. So, I found myself agreeing with the sentence I quoted above, but I'll have to think some more about how to reconcile that with my daughter's thoughts about language being bad.

I think a major crux is that the things you couldn't impart on Mary through language (assuming that such things do exist) would be wishy-washy stuff like qualia whose existence, for a nonhuman system modelling humans, essentially doesn't matter for predictive accuracy. In other words, a universe where Mary does learn something new and a universe where she doesn't are essentially indistinguishable from the outside, so whether it shows up in world models is irrelevant.

Note: The information provided below is not medical advice, and should not be treated as such. Please seek the advice of your physician with any questions you may have regarding a medical condition.

I believe a new form of psychotherapy has been found that is significantly more effective than more conventional therapies such as CBT. Despite this, it is unlikely to replace these in the near term. One of the many reasons for this is that the most complete source is found in a relatively obscure podcast, and it takes listening through hundreds of hour-long episodes in order to fully grasp how radically different it is from more conventional schools of therapy. However, the time investment is worth it: my own moods have improved drastically, and my life...

Ilemauzar (4h, 3 points): Thank you, I had bought David Burns's "Feeling Good" book years ago and it was helpful at the time. Unfortunately, I seemingly lost the skills I had learned (or possibly I never truly learned them?) in the years since. I listened to a few episodes (including the Live Sessions with Lee) and I am really enjoying it. My main focus is improving my relationship, so the effective communication and focus on empathy are especially interesting to me.

Awesome, really glad that you've found the episodes helpful! I have also found that the live sessions focused on relationship issues to be some of the most enlightening ones.  

If you haven't already found them, there are several more episodes on the same theme. For example, you might be interested in listening to the ones with Mark:
Live Session (Mark) — Introduction & Testing (Part 1)

and the session with Brian:
Anger in Marriage: The Five Secrets Revisited
 

Suppose we’re working on some delightfully Hard problem - genetically engineering a manticore, or terraforming Mars, or aligning random ML models. We need very top tier collaborators - people who are very good at a whole bunch of different things. The more they’re good at, and the better they are, the better the chances of success for the whole project.

There’s two main ways to end up with collaborators with outstanding skill/knowledge/talent in many things: selection or training. Selection is how most job recruitment works: test people to see if they already have (some of) the skills we’re looking for. Training instead starts with people who don’t have (all of) the skills, and installs them de novo.

Key point of this post: selection does not scale well with the...

I disagree with the premise.  The vast majority of selection is extremely parallelizable.  In terms of humans, self-selection does most of the work - we don't even have to consider 99.999% of people for most of our collaboration.  Or if we want (and can afford/attract) the best in the world, considering everyone, we set it up so they select among themselves for the first dozen levels of filter.   

Training is almost always individual, and non-scalable by its nature.

In truth, the mechanisms work together - a few layers of selection to get the most promising interested in training, then additional mixes of training and selection until the greatest at something are pretty damn great.

romeostevensit (3h, 4 points): I have the sense that training happens out in the tails via the mechanism of lineage. Lineage holders get some selection power and might be doing something inscrutable with it, but it's not like they can cast a net for PhD candidates arbitrarily wide, so they must be doing some training or we wouldn't see the concentration of results we do.

The main issue with this seems to be that it is very expensive. If I have only 10 people I think can do top tier work, it is very costly to test hypotheses that involve them spending time doing things other than top tier work. Suggestion: find ways for candidates to work closely with top tier people such that it doesn't distract those people too much. Look at how intellectual lineages do this and assume that some of it looks dumb on the surface.

johnswentworth (2h, 2 points): In particular, I currently think an apprenticeship-like model is the best starting point for experiments along these lines. Eli [https://www.lesswrong.com/users/elityre] also recently pointed out to me that this lines up well with Bloom's two-sigma problem [https://en.wikipedia.org/wiki/Bloom%27s_2_sigma_problem]: one-on-one tutoring works ~two standard deviations better than basically anything else in education.

johnswentworth (5h, 7 points): Strongly agree with this. Good explanation, too.

(This is not (quite) just a re-hashing of the homunculus fallacy.)

I'm contemplating what it would mean for machine learning models such as GPT-3 to be honest with us. Honesty involves conveying your subjective experience... but what does it mean for a machine learning model to accurately convey its subjective experience to us?

You've probably seen an optical illusion like this:

The checker shadow illusion. Although square A appears a darker shade of gray than square B, in the image the two have exactly the same luminance. Source: Wikipedia

You've probably also heard an explanation something like this:

"We don't see the actual colors of objects. Instead, the brain adjusts colors for us, based on surrounding lighting cues, to approximate the surface pigmentation. In this example, it leads us astray, because what...

Somewhat along the lines of what TAG said, I would respond that this does seem pretty related to what is going on, but it's not clear that all models with room for an experiencer make that experiencer out to be a homunculus in a problematic way.

If we make "experience" something like the output of our world-model, then it would seem necessarily non-physical, as it never interacts.

But we might find that we can give it other roles.

abramdemski (1h, 2 points): In your model, do you think there's some sort of confused query-substitution going on, where we (at some level) confuse "is the color patch darker" with "is the square of the checkerboard darker"? Because for me, the actual color patch (usually) seems darker, and I perceive myself as being able to distinguish that query. Do the credences simply lack that distinction or something? More generally, my correction to your credences/assertions model would be to point out that (in very specific ways) the assertions can end up "smarter". Specifically, I think assertions are better at making crisp distinctions and better at logical reasoning. This puts assertions in a weird position.

abramdemski (1h, 2 points): Well, I'm saying "we perceive" here evokes a mental model where "we" (a homunculus, or more charitably, a part of our brain) get a corrected image. But I don't think this is what is happening. Instead, "we" get a more sophisticated data-stream which interprets the image.

With color perception, we typically settle for a simple story where rods and cones translate light into a specific color-space. But the real story is far more complicated. The brain translates things into many different color spaces, at different stages of processing. At some point it probably doesn't make sense to speak in terms of color spaces any more (when we're knee deep in higher-level features). The challenge is to speak about this in a sensible way without saying wrong-headed things. We don't want to just reduce to what's physically going on; we also want to recover whatever is worth recovering of our talk of "experiencing".

Another example: perspective warps 3D space into 2D space in a particular way. But actually, the retina is curved, which changes the perspective mapping somewhat. Naively, we might think this will change which lines we perceive to be straight near the periphery of our vision. But should it? We have lived all our lives with a curved retina. Should we not have learned what lines are straight through experience? I'm not predicting positively that we will be correct about what's curved/straight in our peripheral vision; I'm trying to point out that getting it wrong in the way the naive math of curved retinas suggests would require some very specific wrong machinery in the brain, such that it's not obvious evolution would put it there.

Moreover, like how we transform through many different color spaces in visual processing, we might transform through many different space-spaces too (if that makes sense!). Just because an image is projected on the retina in a slightly wonky way doesn't mean that's final.
I'm trying to point in the direction of thinking about this kind of

abramdemski (1h, 2 points): Why is a query represented as an overconfident false belief? How would you query low-level details from a high-level node? Don't the hierarchically high-up nodes represent things which range over longer distances in space/time, eliding low-level details like lines?
This is a linkpost for https://youtu.be/yhQreMUO7LE

This is the script of the Rational Animation video linked above, with a few minor edits and additions. I really like how the animations came out in this one, so if you are curious follow the link. If you only care about the arguments you can just read. Most of the images here are taken from the video.

If you honestly seek truth, and if you decide to tell the truth, at some point you will have to accept appearing cringe in the eyes of most people. Why is that? Simply because truth can be cringe: at some point, you will encounter a truth that other people are disgusted by, and if you decide to tell it, you will be associated with cringe.

Is it a necessity for...

Yeah, they seem similar, but "ugh fields" are more individual, while "cringe" is more social.

Rana Dexsin (6h, 3 points): Why would this happen rather than "you would be pushed into experiences and thought patterns that resolve the highly path-dependent determination of which things you enjoy in a way that makes it smoother for you to bond with the social groups around you"? Here I'm not asserting that the "deny" form is (generally) false, only that it doesn't seem clearly true. Experiential evidence suggests that there's a wide gradient, and indeed that alief-preconceptions about how much of one's identity is social themselves skew the gradient a lot.

Lukas_Gloor (13h, 5 points): According to my intuitions about cringiness, it's more about how people say things than what they say. E.g., discussions on inter-group differences in IQ are frequently really cringy when they happen on some culture war subreddit, but they can be fine (to my ears) when it's Sam Harris talking to a guest on his podcast.

I guess you might reply that this effect is just: Sam Harris has a professional podcast and is already established, whereas redditors will seem like social outcasts when they discuss the same ideas? But I don't think that's what's going on. I feel like it's mostly the way a topic is addressed (framed, put into appropriate context, interpreted), and if I took the time I could point out various reasons why I think the reddit discussions are cringy. (Here's a list of things to get started [https://slatestarcodex.com/2019/07/04/style-guide-not-sounding-like-an-evil-robot/].) I'd say you can always say true and important things without sounding cringy! (According to my cringiness intuitions, that is.)

In Why don't long running conversations happen on LessWrong? adamzerner writes:

 Here is how things currently work:

  • Someone writes a post.
  • It lingers around the front page for a few days. During this time, conversations emerge in the comments section.
  • After a few days, the post no longer persists on the front page and conversations largely fizzle out.

 

I'd like to try and have a longer polymath project style collaboration focused on answering a question together. Instead of each person working to give their individual answers to the question, we'd come up with an answer together through an extended discussion.

When/if we've reached some sort of milestone by either answering the question, or making some interesting progress towards answering the question, we show the results of our collaboration to lesswrong in the form...

Thanks for writing this up! It's a good idea and a thing worth experimenting with.

weathersystems (4h, 1 point): I'm a bit worried that my question will be picked and then I'll be the only one working on it. So to give this thing a better chance of at least two people collaborating, I'm not submitting a question.

Since August 2020 I've been recording conversations with brilliant and insightful rationalists, effective altruists (and people adjacent to or otherwise connected somehow to those communities). If you're an avid reader of this site, I suspect you will recognize many of the names of those I've spoken to.

Since I suspect some LessWrong readers will appreciate these conversations, here is a curated list with links, organized by the LessWrong relevant topics we cover in each conversation. All of these conversations can also be found by searching for "Clearer Thinking" in just about any podcast app. If there are other people you'd like to see me record conversations with, please nominate them in the comments! The format is that I invite each guest to bring 4 or 5 "ideas that...

Also one Spencer recorded with me: "Lines of Retreat and Incomplete Maps". Not sure why it isn't above; maybe it was from earlier than the ones listed.

I would like to ask you to share in the comments what you usually do, when not working, that provides good-quality rest. Personally, I have found that if I enjoy something a lot it is hard to stop (like a very interesting book). And if I don't enjoy it, then there is not much point in doing it for rest. I will appreciate both short comments (like "Hiking") and long ones. Thank you!

  • going for a walk
  • taking a long bath or shower
  • going to the gym
  • taking a nap if I'm tired