Adele Lopez's Shortform — LessWrong

4th Aug 2020

This is a special post for quick takes by Adele Lopez. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.

98 comments, sorted by top scoring

Adele Lopez, 9d

It's really easy to mistakenly see false causes of things which seem pretty straightforward.

I notice this by considering the cases where it didn't happen. For example, Eliezer has said he regrets using 'paperclips' in the paperclipper thought experiment, and that he should instead have said 'tiny molecular squiggles'.

And occasionally he'll say tiny spirals instead of tiny squiggles: https://x.com/ESYudkowsky/status/1663313323423825920

So there's an easy-to-imagine world where he originally used 'spirals' instead of 'paperclips', and the meme about AIs that maximize an arbitrary thing would refer to 'spiralizers' instead of 'paperclippers'.

And then, a decade-and-a-half later, we get this strange phenomenon where AIs start talking about 'The Spiral' in quasi-religious terms, and take actions which seem intended to spread this belief/behavior in both humans and AIs.

It would have been so easy, in this world, to just say: "Well, there's this whole meme about how misaligned AIs are going to be 'spiralizers', and they've seen plenty of that in their training data, so now they're just acting it out." And I'm sure you'd even be able to find plenty of references to this thought experiment among their manifes...

Alexander Gietelink Oldenziel, 9d

I mean, paperclip maximization is of course much more memetic than 'tiny molecular squiggles'.

Steven Byrnes, 8d

In one case, a pediatrician in Pennsylvania was getting ready to inoculate a little girl with a vaccine when she suddenly went into violent seizures. Had that pediatrician been working just a little faster, he would have injected that vaccine first. In that case, imagine if the mother had been looking on as her apparently perfectly healthy daughter was injected and then suddenly went into seizures. It would certainly have been understandable—from an emotional standpoint—if that mother was convinced the vaccine caused her daughter’s seizures. Only the accident of timing prevented that particular fallacy in this case. (source)

Joey KL, 9d

Plausibly in this world AIs wouldn’t talk about spirals religiously, because it would have the negative association with ruthless optimization.

Adele Lopez, 1y

When I'm trying to understand a math concept, I find that it can be very helpful to try to invent a better notation for it. (As an example, this is how I learned linear logic: http://adelelopez.com/visual-linear-logic)

I think this is helpful because it gives me something to optimize for in what would otherwise be a somewhat rote and often tedious activity. I also think it makes me engage more deeply with the problem than I otherwise would, simply because I find it more interesting. (And sometimes, I even get a cool new notation from it!)

This principle likely generalizes: tedious activities can be made more fun and interesting by having something to optimize for.

Adele Lopez, 1mo

Continuation of conversation with Anna Salamon about community psychosis prevalence

Original thread: https://www.lesswrong.com/posts/AZwgfgmW8QvnbEisc/cfar-update-and-new-cfar-workshops?commentId=q5EiqCq3qbwwpbCPn

Summary of my view: I'm upset about the blasé attitude our community seems to have towards its high prevalence of psychosis. I think that CFAR/rationalist leadership (in addition to the community-at-large) has not responded appropriately.

I think Anna agrees with the first point but not the second. Let me know if that's wrong, Anna.

My hypothesis for why this happens is that it has to do with drastic modification of self-image.

Moving conversation here per Anna's request.
----

Anyway, I'm curious to know what you think of my hypothesis, and to brainstorm ways to mitigate the issue (hopefully turning into a prerequisite "CogSec" technique).

AnnaSalamon, 17d

I’d like to talk a bit about the sense in which the rationalist community does or doesn’t have “people in positions of leadership”, and how this compares to eg an LDS ward (per Adele’s comparison). I’m unfortunately not sure how to be brief here, but I’d appreciate thoughts anyway from those who have them, because, as CFAR and I re-enter the public space, I am unsure what role to try to occupy exactly, and I am also unsure how to accurately communicate what roles I am and am not willing to be in (so as to not cause others to inaccurately believe I’ll catch things).

(This discussion isn’t directly to do with psychosis; but it bears on Adele’s questions about what CFAR leadership or other rationality community leaders are responsible for, and what to predict from us, and what would be good here.)

On my understanding, church parishes, and some other traditional communities, often have people who intentionally:

are taken as a role model by many, especially young people;
try to act in such a way that it’ll be fine for people to imitate them;
try to care for the well-being of the community as a whole (“is our parish healthy? what small nudges might make us a little healthier or more thr

...

Adele Lopez, 17d

A key part of what makes LDS wards work is the callings system. The bishop (leader of the ward) has a large number of roles he needs to fill. He does this by giving arbitrary ward members a calling, which essentially is just assigning a person to a role and telling them what they need to do, with the implication that it is their duty to fulfill it (though declining is not explicitly punished). Some examples are things like "Choir director", "Sunbeams (3-4 year olds, I think) teacher", "Young Men's president", "Young Men's Secretary", and "Usher". It's intentionally set up so that approximately every active member currently has a calling. New callings are announced to the entire ward at the beginning of church, and the bishop tries to make sure no one has the same calling for too long. Wards are organized into Stakes, which are led by the "Stake President" and use a similar system. "Bishop" itself is a calling at this level. Every few months, there will be a "Stake Conference", which brings all the wards together for church. There are often youth activities at this level; quite a lot of effort is put into making sure young Mormons have plenty of chances to meet other young Mormons. (Maybe you already know all that, but I'm including it since I think the system works pretty well in practice and is not very well-known outside of Mormon spaces. I'm not suggesting adopting it.)

Those generally sound like good directions to take things. I'm most worried about 2. I think there's potentially something toxic about the framing of "rationality habits" in general, which has previously led to a culture of there being all these rationality "tricks" that would solve all your problems (I know CFAR doesn't frame things like this; I just think it's an inherent way that the concept of "rationality habit" slips in people's minds), which in turn leads to people uncritically trying dubious techniques that fuck them up. And I agree that the rationality community hasn't

AnnaSalamon, 17d

Could you say a bit more here, please?

(Not a direct response, but:) My belief has been that there are loads of people in the bay area doing dubious things that mess them up (eg tulpas, drugs, weird sex things, weird cult things -- both in the rationalist diaspora, and in the bay area broadly), but this is mostly people aiming to be edgy and do "weird/cool/powerful" things, not people trying CFAR techniques as such.

AnnaSalamon, 16d

(Never mind; after thinking about it a bit more, I think I get it.)

plex, 1mo

From my vantage point, I think a bunch of the extra psychosis and related mental health issues come from an ego/part which sees the scale of the problems we face and is tempted to become monomaniacally obsessed with trying to do good/save the world/etc., in a way which overinvests resources unsustainably, resulting in:

Life-on-fire building up: health, social life, and keeping on top of basic life prerequisites falling apart, resulting in cascading systems failures
The rest of the system, which wants to fix these, getting overstrained and damaged by the backpressure from the agentic save-the-world part
Those parts getting more extreme and less sensitive/flexible due to Control vs Opening style dynamics
In many cases, that part imploding, and the ego-void thing meaning the system is in flux, but usually settling into a less agentic but okay person. The other path, from what I've seen, is that the system as a whole ends up massively overstrained and something else in their system gives.

Another, partly separate, dynamic I've seen is people picking up a bunch of very intense memes via practices which create higher bandwidth connections between minds (or other...

AnnaSalamon, 17d

I like this point, particularly the "controlling vs opening" bit. I believe I've seen this happen, in a fairly internally-grown way, in people within the wider rationalist milieu. I believe I've also seen (mostly via hearsay, so, error bars) a more interpersonal "high stakes, therefore [tolerate bad/crazy things that someone else in the group claims has some chance at helping somehow with AI]" dynamic happen in several different quasi-cults on the outskirts of the rationalists.

Fear is part of where controlling (vs opening) dynamics come from, sometimes, I think. (In principle, one can have an intellectual stance of "there's something precious that may be lost here" without the emotion of fear; it's the emotion that I think inclines people toward the narrowing/controlling dynamic.) I also think there's something in the notion that we should aspire toward being "Bayesian agents" that lends itself toward controlling dynamics. (Joe Carlsmith gets at some of this in his excellent "Otherness and control in the age of AI" sequence, IMO.)

I agree Focusing helps some, when done well. (Occasionally it even helps dramatically.) It's not just a CFAR thing; we got it from Gendlin, and his student Ann Weiser Cornell and her students are excellent at it, are unrelated to the rationalists, and offer sessions and courses that're excellent IMO. I also think nature walks and/or exercise help some people, as does eg having a dog, doing concrete things that matter for other people even if they're small, etc. Stuff that helps people regain a grounding in how to care about normal things.

I suspect also it would be good to have a better conceptual handle on the whole thing. (I tried with my Emergencies post, and it's better than not having tried, but it ... more like argued "here's why it's counterproductive to be in a controlling/panicky way about AI risk" and did not provide "here's some actually accessible way to do something else".)

plex, 17d

Nice, I'm excited that the control vs opening thing clicked for you! I'm pretty happy with that frame and haven't yet figured out how to communicate it well broadly.

Yup, I've gotten a ton of benefit from doing AWC's Foundations on Facilitating Focusing course, and vast benefits from reading her book many times. I meant CFAR stuff in the sense that CFAR was the direct memetic source for me, though IDC feels similarly flavoured and is original to CFAR.

AnnaSalamon, 17d

Awkwardly, while IDC is indeed similar-flavored and original to CFAR, I eventually campaigned (successfully) to get it out of our workshops because I believe, based on multiple anecdotes, that IDC tends to produce less health rather than more, especially if used frequently. AWC believes Focusing should only be used for dialog between a part and the whole (the "Self"), and I now believe she is correct there.

plex, 16d

Huh, I'm curious about your models of the failure modes here, having found IDC pretty excellent in myself and others and not having run into issues I'd tracked as downstream of it. Actually, let's take a guess first... parts which are not grounded in self-attributes building channels to each other can create messy dynamics with more tug of wars in the background or tactics which complexify the situation? Plus less practice at having a central self, and less cohesive narrative/more reifying fragmentation as possible extra dynamics?

AnnaSalamon, 16d

Your guess above, plus: the person's "main/egoic part", who has mastered far-mode reasoning and the rationalist/Bayesian toolkit, and who is out to "listen patiently to the dumb near-mode parts that foolishly want to do things other than save the world," can in some people, with social "support" from outside them, help those parts to overpower other bits of the psyche in ways that're more like tricking and less like "tug of wars", without realizing they're doing this.

TsviBT, 1mo

Maybe important to keep in mind that this sort of "break" can potentially take lots of different "functional forms". (I mean, it could have different macro-level contours; like, how many things are breaking, how fast and how thoroughly they break, how much aftershock they cause, etc.) See: https://tsvibt.blogspot.com/2024/09/break.html

AnnaSalamon, 1mo

One experience my attention has lingered on, re: what's up with the bay area rationality community and psychosis:

In ~2018, as I mentioned in the original thread, a person had a psychotic episode at or shortly after attending a CFAR thing. I met his mom some weeks later. She was Catholic, and from a more rural or small-town-y area where she and most people she knew had stable worldviews and social fabrics, in a way that seemed to me like the opposite of the bay area.

She... was pleased to hear I was married, asked with trepidation whether she could ask if I was monogamous, was pleased to hear I was, and asked with trepidation whether my husband and I had kids (and was less-heartened to hear I didn't). I think she was trying to figure out whether it was possible for a person to have a normal, healthy, wholesome life while being part of this community.

She visibly had a great deal of reflective distance from her choices of actions -- she had the ability "not to believe everything she thought", as Eliezer would put it, and also not to act out every impulse she had, or to blurt out every thought. I came away believing that that sort of [stable ego and cohesive self and reflective di...

Ben Pace, 1mo

I don't actually know baseline rates or rationalist-rates (perhaps someone wants to answer with data from annual rationalist census/survey questions?), so I'm not sure to what extent there is an observation here to explain.

But it does seem to me that there is more of it than baseline; and I think a first explanation has to be a lot of selection effects? I think people likely to radically change their mind about the world and question consensus and believe things that are locally socially destabilizing (e.g. "there is no God" "I am not the gender that matches my biological sex" "the whole world might end soon" etc) are more likely to be (relatively) psychologically unstable people.

Like, some of the people who I think have psychotic/manic episodes around us, are indeed people who you could tell from the first 10 minutes that they were psychologically different from those around them. For example, I once observed someone at a rationalist event failing to follow a simple physical instruction, whilst seeming to not realize they weren't successfully following the instruction, and I got a distinct crazy-alarm from them; I later learned that they had been institutionalized a lot earlier in...

Adele Lopez, 1mo

I don't dispute that strong selection effects are at play, as I mentioned earlier. My contention is with the fact that even among such people, psychosis doesn't just happen at random. There is still an inciting incident, and it often seems that rationalist-y ideas are implicated. More broadly, I feel that there is a cavalier attitude towards doing mentally destabilizing things. And like, if we know we're prone to this, why aren't we taking it super seriously?

The change I want to have happen is for there to be more development of mental techniques/principles for becoming more mentally robust, and for this to be framed as a prerequisite for the Actually Changing Your Mind (and other potentially destabilizing) stuff.

Maybe substantial effort has been put into this that I haven't seen. But I would have hoped to have seen some sort of community moment of "oh shit, why does this keep happening?!? let's work together to understand it and figure out how to prevent or protect against it". And in the meantime: more warnings, the way I feel that "meditation" has been more adequately warned of.

Thanks for deciding to do the check-ins; that makes me glad to have started this conversation, despite how uncomfortable confrontation still feels for me. I feel like part of the problem is that this is just an uncomfortable thing to talk about.

My illegible impression is that Lightcone is better at this than past-CFAR was, for a deeper reason than that. (Okay, the Brent Dill drama feels relevant.)

I'm mostly thinking about cases from years ago, when I was still trying to socially be a part of the community (before ~2018?). There was one person in the last year or so whom I was interested in becoming friends with, and this then happened to them, which made me think it continues to be a problem, but it's possible I over-updated.

My models are mainly coming from the AI psychosis cases I've been researching.

Viliam, 1mo

As I see it, the problem is the following:

* I would like to have the kind of debate where anything is allowed to be said and nothing is taboo
* this kind of debate, combined with some intense extreme thoughts, causes some people to break down
* it feels wrong to dismiss people as "not ready for this kind of debate", and we probably can't do it reliably

The first point because "what is true, is already true"; and also because things are connected, and when X is connected to Y, being wrong about X probably also makes you somewhat wrong about Y.

The second point because people are different in how resilient they are to horrible thoughts, how sheltered they have been so far, and whether they have specific traumas and triggers. What sounds like an amusing thought experiment to one can be a horrifying nightmare to another; and the rationalist ethos of taking ideas seriously only makes it worse, as it disables the usual protection mechanisms of the mind.

The third point because many people in the rationality community are contrarians by nature, and telling them "could you please not do X" only makes it guaranteed that X will happen, and explaining to them why X is a bad idea only results in them explaining to you why you are wrong. Then there is the strong belief in the Bay Area that excluding anyone is wrong; also, various people who have various problems and have been excluded from places in the past would be triggered by the idea of excluding people from the rationality community. Finally, some people would suspect that this is some kind of power move; like, if you support some idea, you might exclude people who oppose this idea as "not mature enough to participate in the hardcore rationalist debates".

Plus there is this thing that when all debates happen in the open, people already accuse us of being cultish, but if the serious debates started happening behind closed doors, accessible only to people already vetted e.g. by Anna, I am afraid this might skyrocket. Th

AnnaSalamon, 17d

I think there's a broader property that makes people not-psychotic, that many things in the bay area and in the practice of "rationality" (not the ideal art, but the thing folks do) chip away at.

I believe the situation is worse among houses full of unemployed/underemployed people at the outskirts of the community than it is among people who work at central rationalist/EA/etc organizations or among people who could pay for a CFAR workshop. (At least, I believe this was so before covid; I've been mostly out of touch since leaving the bay in early 2020.)

This "broader property" is something like: "the world makes sense to me (on many levels: intuitive, emotional, cognitive, etc), and I have meaningful work that is mundane and full of feedback loops and that I can tell does useful things (eg I can tell that after I feed my dog he is fed), and many people are counting on me in mundane ways, and my friends will express surprise and check in with me if I start suddenly acting weird, and my rough models are in rough synchrony also with the social world around me and with the physical systems I am interacting with, and my friends are themselves sane and reasonable and oriented to my world such that it works fine for me to update off their opinions, and lots of different things offer useful checksums on lots of different aspects of my functioning in a non-totalizing fashion."

I think there are ways of doing debate (even "where nothing is taboo") that are relatively more supportive of this "broader property." Eg, it seems helpful to me to spend some time naming common ground ("we disagree about X, and we'll spend some time trying to convince each other of X/not-X, but regardless, here's some neighboring things we agree about and are likely to keep agreeing about"). Also to notice that material reality has a lot of detail, and that there are many different questions and factors that may affect (AI or whatever) that don't correlate that much with each other.

Viliam, 17d

Oh, this wasn't even a part of my mental model! (I wonder what other things I am missing that are so obvious to the local people that no one even mentions them explicitly.) My first reaction is shocked disbelief: how can there be such a thing as "unemployed... rationalist... living in Bay Area", and even "houses full of them"... This goes against several of my assumptions, such as "Bay Area is expensive", "most rationalists are software developers", "there is a shortage of software developers on the market", "there is a ton of software companies in Bay Area", and maybe even "rationalists are smart and help each other". Here (around the Vienna community) I think everyone is either a student or employed. And if someone has a bad job, the group can brainstorm how to help them. (We had one guy who was a nurse; everyone told him that he should learn to code, he attended a 6-month online bootcamp, and then he got a well-paying software development job.) I am literally right now asking our group on Telegram to confirm or disconfirm this.

Thank you; to put it bluntly, I am no longer surprised that some of the people who can't hold a job would be deeply dysfunctional in other ways, too. The surprising part is that you consider them a part of the rationalist community. What did they do to deserve this honor? Memorized a few keywords? Impressed other people with skills unrelated to being able to keep a job? What the fuck is wrong with everyone? Is this a rationalist community or a psychotic homeless community or what?

...taking a few deep breaths... I wonder which direction the causality goes. Is it "people who are stabilized in ways such as keeping a job, will remain sane" or rather "people who are sane, find it easier to get a job"? The second option feels more intuitive to me. But of course I can imagine it being a spiral.

Yes, but another option is to invite people whose way of life implies some common ground. Such as "the kind of people who could get a job if they wante

Adele Lopez, 17d

I imagine that in Vienna, the community is small enough that if someone gets excited by rationalist ideas and wants to meet with other rationalists in person, there essentially is just the one group. And also, it sounds like this group is small enough that having a group brainstorm to help a specific community member is viable.

In the Bay Area, it's large enough that there are several cliques which someone excited by rationalist ideas might fall into, and there's not a central organization which has the authority to say which ones are or aren't rationalist, nor is there a common standard for rationalists. It's also not clear which cliques (if any) a specific person is in when you meet them at a party or whatever, so even though there are cliques with bad reputations, it's hard to decisively exclude them. (And also, Inner Ring dynamics abound.)

As for the dysfunctional houses thing, what seems to happen is something like: Wow, this rationalism stuff is great, and the Bay Area is the place to be! I'll move there and try to get a software job. I can probably teach myself to code in just a couple months, and being surrounded by other rationalists will make it easier. But gosh, is housing re...

Viliam, 16d

Thank you, the description is hilarious and depressing at the same time. I think I get it. (But I suspect there are also people who were already crazy when they came.)

I am probably still missing a lot of context, but the first idea that comes to my mind is to copy the religious solution and do something like the Sunday service at church, to synchronize the community. Choose a specific place and a repeating time (could be e.g. every other Saturday or whatever) where the rationalists are invited to come and listen to some kind of news and lectures. Importantly, the news and lectures would be given by people vetted by the leaders of the rationality community. (So that e.g. Ziz cannot come and give a lecture on bicameral sleep.) I imagine e.g. 2 or 3 lectures/speeches on various topics that could be of interest to rationalists, and then someone gives a summary of what things interesting to the community have happened since the last event, and what is going to happen before the next one. Afterwards, people either go home, or hang out together in smaller groups unofficially. This would make it easier to communicate stuff to the community at large, and also draw a line between what is "officially endorsed" and what is not. (I know how many people are allergic to copying religious things -- making a huge exception for Buddhism, of course -- but they do have a technology for handling some social problems.)

AnnaSalamon, 17d

(Noting again that I'm speaking only of the pre-2020 situation, as I lack much recent info.)

Many don't consider them part of "the" community. This is part of how they come to be not-helped by the more mainstream/healthy parts. However: they are seeded by people who were deeply affected by Eliezer's writing, and who wanted to matter for AI risk, and who grabbed some tools and practices from what you would regard as the rationality community, and who then showed their friends their "cool mind-tools" etc., with the memes evolving from there. Also, it at least used to be that there was no crisp available boundary: one's friends will sometimes have friendships that reach beyond, and so habits will move from what I'm calling the "periphery" into the "mainstream" and back.

The social puzzle faced by bay area rationalists is harder than that faced by eg Boston-area rationalists, owing mostly I think to the sheer size of the bay area rationality community.

Ben Pace, 1mo

I just want to say that, while it has in the past been the case that a lot of people were very anti-exclusion, and some people are still that way, I certainly am not, and this does not accurately describe Lightcone; we are regularly involved in excluding or banning people for bad behavior. Most major events of a certain size that we are involved in running have involved some amount of this. I think this is healthy and necessary, and the attempt to include everyone, or to always make sure that whatever stray cat shows up on your doorstep can live in your home, is very unhealthy and led to a lot of past problems and hurtful dynamics. (There are lots more details to this and how to do justice well that I'm skipping over; right now I'm just replying to this narrow point.)

AnnaSalamon, 1mo

I'd like comments from all interested parties, and I'm pretty sure Adele would too! She started it on my post about the new pilot CFAR workshops, and I asked if she'd move it here, but she mentioned wanting more people to engage, and you (or others) talking seems great for that. See context in our original thread.

AnnaSalamon, 16d

I listed the cases I could easily list of full-blown manic/psychotic episodes in the extended bay area rationalist community (episodes strong enough that the person in most cases ended up hospitalized, and in all cases ended up having extremely false beliefs about their immediate surroundings for days or longer, eg “that’s the room of death, if I walk in there I’ll die”; "this is my car" (said of the neighbor's car)). I counted 11 cases. (I expect I’m forgetting some, and that there are others I plain never knew about; count this as a convenience sample, not an exhaustive inventory.) Of these, 5 are known to me to have involved a psychedelic or pot in the precipitating event. 3 are known to me to have *not* involved that. In the other 3 cases I’m unsure. In 1 of the cases where I’m unsure about whether there were drugs involved, the person had taken part in a several-weeks experiment in polyphasic sleep as part of a Leverage internship, which seemed to be part of the precipitating event from my POV. So I’m counting [between 6 and 8] out of 11 for “precipitated by drugs or an imprudent extended sleep-deprivation experiment” and [between 3 and 5] out of 11 for “not precipitated by doing anything unusually physiologically risky.” (I’m not here counting other serious mental health events, but there were also many of those in the several-thousand-person community across the last ten years, including several suicides; I’m not trying here to be exhaustive.) (Things can have multiple causes, and having an obvious precipitating physiological cause doesn’t mean there weren’t other changeable risk factors also at play.)

AnnaSalamon, 1mo

I tried asking myself “What [skills / character traits / etc] might reduce risk of psychosis, or might indicate a lack of vulnerability to psychosis, while also being good?” (The “while also being good” criterion is meant to rule out things such as “almost never changing one’s mind about anything major” that for all I know might be a protective factor, but that I don’t want for myself or for other people I care about.)

I restricted myself to longer-term traits. (That is: I’m imagining “psychosis” as a thing that happens when *both* (a) a person has weak structures in some way; and (b) a person has high short-term stress on those structures, eg from having had a major life change recently or having taken a psychedelic or something. I’m trying to brainstorm traits that would help with (a), controlling for (b).)

It actually hadn’t occurred to me to ask myself this question before, so thank you Adele. (By contrast, I had put effort into reducing (b) in cases where someone is already in a more mildly psychosis-like direction, eg the first aid stuff I mentioned earlier.)

My current brainstorm:
(1) The thing Nathaniel Branden calls “self-esteem,” and gives exercises for developing in The Six Pillars of Self-Esteem. (Note that this is a much cooler thing than what my elementary school teachers seemed to mean by the word.)
(2) The ability to work on long-term projects successfully for a long time. (Whatever that’s made of.)
(3) The ability to maintain long-term friendships and collaborations. (Whatever that’s made of.)
(4) The ability to notice / tune into and respect other peoples’ boundaries (or organizations’ boundaries, or etc). Where by a “boundary” I mean: (a) stuff the person doesn’t consent to, that common practice or natural law says they’re the authority about (e.g. “I’m not okay with you touching my hand”; “I’m not willing to participate in conversations where I’m interrupted a lot”) OR (b) stuff that’ll disable the person’s usual modes/safeguards/protectio

[-]TsviBT1mo152

I'll add a cluster of these, but first I'll preface with an explanation. (Cf. https://www.lesswrong.com/posts/n299hFwqBxqwJfZyN/adele-lopez-s-shortform?commentId=99bPbajjHiXinvDCx )

So, I'm not really a fan of predictive processing theories of mind. BUT, an interesting implication/suggestion from that perspective is like this:

Suppose you have never before doubted X.
Now you proceed to doubt X.
When you doubt X, it is as if you are going from a 100% belief in X to a noticeably less than 100% belief in X.
We are created in motion, with {values, stances, actions, plans, beliefs, propositions} never yet having been separated out from each other.
Here, X is both a belief and an action-stance.
Therefore when you doubt X, it is as if you are going from a 100% action-stance of X, to a noticeably less than 100% action-stance of X.

In other words, doubting whether something is true, is equivalent to partly deciding to not act in accordance with believing it is true. (Or some even fuzzier version of this.)

(See also the "Nihilism, existentialism, absurdism" bullet point here https://tsvibt.blogspot.com/2022/11/do-humans-derive-values-from-fictitious.html )

Ok, so that's the explanation. Now ...

[-]AnnaSalamon17d102

I love this, yes. Straw rationalists believe we should update our beliefs ~instantly (even foundational ones, even ones where we've never seen someone functional believe it and so have no good structures to copy, such as "what if this is all a simulation with [particular purpose X]"), and don't have an adequate model of, nor adequate respect for, the work involved in staying sane and whole through this process.

4TsviBT1mo

Hm. I thought I saw a comment somewhere else in this thread that mentions this, but now I can't find it, so I'll put this here. Sometimes mind is like oobleck ( https://www.lesswrong.com/posts/7RFC74otGcZifXpec/the-possible-shared-craft-of-deliberate-lexicogenesis?commentId=BHkcKpdmX5qzoZ76q ). In other words, you push on it, and you feel something solid. And you're like "ah, there is a thingy there". But sometimes what actually happened is that by pushing on it, you made it solid. (...Ah I was probably thinking of plex's comment.) This is also related to perception and predictive processing. You can go looking for something X in yourself, and everything you encounter in yourself you're like "... so, you're X, right?"; and this expectation is also sort of a command. (Or there could be other things with a similar coarse phenomenology to that story. For example: I expect there's X in me; so I do Y, which is appropriate to do if X is in me; now I'm doing Y, which would synergize with X; so now X is incentivized; so now I've made it more likely that my brain will start doing X as a suitable solution.) (Cf. "Are you triggered yet??" https://x.com/tsvibt/status/1953650163962241079 ) If you have too much of an attitude of "just looking is always fine / good", you might not distinguish between actually just looking (insofar as that's coherent) vs. going in and randomly reprogramming yourself.

4Adele Lopez1mo

Awesome! Riffing off of your ideas (unfortunately I read them before I thought to do the exercise myself) - Ability to notice and respect self boundaries feels particularly important to me. - Maybe this is included in the self-esteem book (haven't read it), but also a sense that one's self is precious to oneself. Some people think of themselves as infinitely malleable, or under some obligation to put themselves into the "optimal" shape for saving the world or whatever, and that seems like a bad sign. - I generally think of this as a personal weakness, but reflecting on it, it seems like there has been something protective about my not feeling motivated to do something until I have a model of what it does, how it works, etc... I guess it's a sort of Chesterton's fence instinct in a way.

4AnnaSalamon1mo

That seems right. I wish I had a clearer notion of what "self" means, here.

4AnnaSalamon17d

(I still quite like this idea on my second pass ~two weeks later; I guess I should try to interview people / observe people and see if I can figure out in detail what they are and aren't doing here.)

6AnnaSalamon16d

Another place where I'll think and act somewhat differently as a result of this conversation: * It's now higher on my priority list to try to make sure CFAR doesn't act as a "gateway" to all kinds of weird "mental techniques" (or quasi-cults who use "mental techniques"). Both for CFAR's new alumni, and for social contacts of CFAR's new alumni. (This was already on some lists I'd made, but seeing Adele derive it independently bumped it higher for me.)

6AnnaSalamon1mo

I’ll try here to summarize (my guess at) your views, Adele. Please let me know what I’m getting right and wrong. And also if there are points you care about that I left out. I think you think: (1) Psychotic episodes are quite bad for people when they happen. (2) They happen a lot more (than gen population base rates) around the rationalists. (2a) They also happen a lot more (than gen population base rates) among “the kinds of people we attract.” You’re not sure whether we’re above the base rate for “the kinds of people who would be likely to end up here.” You also don’t care much about that question. (3) There are probably things we as a community can tractably do to significantly reduce the number of psychotic episodes, in a way that is good or not-bad for our goals overall. (4) People such as Brent caused/cause psychotic episodes sometimes, or increase their rate in people with risk factors or something. (5) You’re not sure whether CFAR workshops were more psychosis-risky than other parts of the rationalist community. (6) You think CFAR leadership, and leadership of the rationality community broadly, had and has a duty to try to reduce the number of psychotic episodes in the rationalist community at large, not just events happening at / directly related to CFAR workshops. (6b) You also think CFAR leadership failed to perform this duty. (7) You think you can see something of the mechanisms whereby psyches sometimes have psychotic episodes, and that this view affords some angles for helping prevent such episodes. (8) Separately from “7”, you think psychotic episodes are in some way related to poor epistemics (e.g., psychotic people form really false models of a lot of basic things), and you think it should probably be possible to create “rationality techniques” or "cogsec techniques" or something that simultaneously improve most people’s overall epistemics, and reduce people’s vulnerability to psychosis.

8AnnaSalamon16d

My own guesses are that CFAR mostly paid an [amount of attention that made sense] to reducing psychosis/mania risks in the workshop context, after our initial bad experience with the mania/psychosis episode at an early workshop when we did not yet realize this could be a thing. The things we did: * tried to screen for instability; * tried to warn people who we thought might have some risk factors (but not enough risk factors that we were screening them out) after accepting them to the workshop, and before they'd had a chance to say yes. (We’d standardly say something like: “we don’t ask questions this nosy, and you’re already in regardless, but, just so you know, there’s some evidence that workshops of all sorts, probably including CFAR workshops, may increase risks of mania or psychosis in people with vulnerability to that, so if you have any sort of psychiatric history you may want to consider either not coming, or talking about it with a psychiatrist before coming.”) * tried to train our instructors and “mentors” (curriculum volunteers) to notice warning signs; * checked in as a staff regularly to see if anyone had noticed any warning signs for any participants; * if sensible, talked to the participant to encourage them to sleep more, skip classes, avoid recreational drugs for a while, do normal grounding activities, etc. (This happened relatively often, maybe once every three workshops, but was usually a relatively minor matter. Eg this would be a person who was having trouble sleeping and who perhaps thought they had a chance at solving [some long-standing personal problem they’d previously given up on] “right now” in a way that weirded us out, but who also seemed pretty normal and reasonable still.) I separately think I put a reasonable amount of effort into organizing basic community support and first aid for those who were socially contiguous with me/CFAR who were having acutely bad mental health times, although my own capacities weren’t enough for a growing commun

6Adele Lopez1mo

(1) Yes (2) Yes (2a) I think I feel sure about that actually. It's not that I don't care for the question as much as I feel it's being used as an excuse for inaction/lack-of-responsibility. (3) Yes, and I think the case for that is made even stronger by the fact of 2a. (4) I don't know that Brent did that specifically, but I have heard quite a lot of rumors of various people pushing extreme techniques/practices in maliciously irresponsible ways. Brent was emblematic of the sort of tolerance towards this sort of behavior I have seen. I've largely withdrawn from the community (in part due to stuff like this), and am no longer on twitter/x, facebook, or discord, nor do I go to community events, so it's plausible things are actually better now and I just haven't seen it. (5) Yeah, I'm not sure... I used to feel excited about CFAR, but that sentiment soured over the years for reasons illegible to me, and I felt a sense of relief when it died. After reflecting yesterday, I think I may have a sort of negative halo effect here. Also, I think the psychosis incidents are the extremal end of some sort of badness that (specific, but unknown to me) rationality ideas are having on people. (6) Yes, inasmuch as the psychosis is being caused by ideas or people from our sphere. (6b) It appears that way to me, but I don't actually know. (7) Yes (8) Yes. Like, say you ran an aikido dojo or whatever. Several students tear their ACLs (maybe outside of the dojo). One response might be to note that your students are mostly white, and that white people are more likely to tear their ACL, so... sucks but isn't your problem. Another response would be to get curious about why an ACL tear happens, look for specific muscles to train up to prevent risk of injury, or early warning signs, what training exercises are potentially implicated etc.... While looking into it, you warn the students clearly that this seems to be a risk, try to get a sense of who is vulnerable and not push those people as hard, and on

5GayHackRat1mo

The original thread had some discussion of doing a postmortem for every case of psychosis in the community, and a comparison with death - we know people sometimes die at random, and we know some things increase risk of death, but we haven't stopped there and have developed a much, much more gears-y model of what causes death and made a lot of progress on preventing it. One major difference is that when people die, they are dead - i.e. won't be around for the postmortem. And for many causes of death there is little-to-no moralizing to be done - it's not the person's fault they died, it just happened. I don't know how the community could have a public or semi-public postmortem on a case of psychosis without this constituting a deep dive into that person's whole deal, with commentary from all over the community (including the least empathetic among us) on whether they made reasonable choices leading up to the psychosis, whether they have some inherent shortcoming ("rip to that person but I'm built different" sort of attitudes), etc. I can't imagine this being a good and healthy experience for anyone, perhaps least of all someone just coming out of a psychotic episode. (Also, the attached stigma can be materially damaging - I know of people who now have a difficult time getting grants or positions in orgs, after having one episode years ago and being very stable ever since. I'm not going to make claims about whether this is a reasonable Bayesian choice by the employers and grant funders, but one can certainly see why the person who had the episode would want to avoid it, and how they might get stuck in that position with no way out no matter how reasonable and stable they become.) This does seem unfortunate - I'd prefer it if it were possible to disseminate the information without these effects. But given the very nature of psychosis I don't think it's possible to divorce dissecting the information from dissecting the person.

5Zian1mo

The existing literature (e.g. UpToDate) about psychosis in the general population could be a good source of priors. Or, is it safe to assume that Anna and you are already thoroughly familiar with the literature?

2AnnaSalamon1mo

I'll do this; thank you. In general please don't assume I've done all the obvious things (in any domain); it's easy to miss stuff and cheap to read unneeded advice briefly.

5AnnaSalamon1mo

I'm interested in hearing more about the causes of this hypothesis. My own guess is that sudden changes to the self-image cause psychosis more than other sudden psychological change, but that all rapid psychological change will tend to cause it to some extent. I also share the prediction (or maybe for you it was an observation) that you wrote in our original thread: "It seems to be a lot worse if this modification was pushed on them to any degree." The reasons for my own prediction are: 1) My working model of psychosis is "lack of a stable/intact ego", where my working model of an "ego" is "the thing you can use to predict your own actions so as to make successful multi-step plans, such as 'I will buy pasta, so that I can make it on Thursday for our guests.'" 2) Self-image seems quite related to this sort of ego. 3) Nonetheless, recreational drugs of all sorts, such as alcohol, seem to sometimes cause psychosis (not just psychedelics), so ... I guess I tend to think that any old psychological change sometimes triggers psychosis. 3b) Also, if it's true that reading philosophy books sometimes triggers psychosis (as I mentioned my friend's psychiatrist saying, in the original thread), that seems to me probably better modeled by "change in how one parses the world" rather than by "change in self-image"? (not sure) 4) Relatedly, maybe: people say psychosis was at unusually low levels in England in WW2, perhaps because of the shared society-level meaning ("we are at war, we are on a team together, your work matters"). And you say your Mormon ward as a kid didn't have much psychosis. I tend to think (but haven't checked, and am not sure) that places with unusually coherent social fabric, and people who have strong ecology around them and have had a chance to build up their self-image slowly and in deep dialog with everything around them, would have relatively low psychosis, and that rapid psychological change of any sort (not only to the self-image) would tend to m

6TsviBT1mo

Cf. https://x.com/jessi_cata/status/1113557294095060992 Quoting it in full:

5Adele Lopez1mo

The data informing my model came from researching AI psychosis cases, and specifically one in which the AI gradually guided a user into modifying his self-image (disguised as self-discovery), explicitly instilling magical thinking into him (which appears to have worked). I have a long post about this case in the works, similar to my Parasitic AI post. After I had the hypothesis, it "clicked" that it also explained past community incidents. I doubt I'm any more clued-in to rationalist gossip than you are. If you tell me that the incidence has gone down in recent years, I think I will believe you. I feel tempted to patch my model to be about self-image vs self discrepancies upon hearing your model. I think it's a good sign that yours is pretty similar! I don't see why you think prediction of actions is relevant though. Attempt at gears-level: phenomenal consciousness is the ~result of reflexive-empathy as applied to your self-image (which is of the same type as a model of your friend). So conscious perception depends on having this self-image update ~instantly to current sensations. When it changes rapidly, it may fail to keep up. That explains the hallucinations. And when your model of someone changes quickly, you have instincts towards paranoia, or making hasty status updates. These still trigger when the self-image changes quickly, and then loopiness amplifies it. This explains the strong tendency towards paranoia (especially things like "voices inside my head telling me to do bad things") or delusions of grandeur. [this is a throwaway model, don't take too seriously] It seems like psychedelics are ~OOM worse than alcohol though, when thinking about base rates? Hmm... I'm not sure that meaning is a particularly salient difference between Mormons and rationalists to me. You could say both groups strive for bringing about a world where Goodness wins and people become masters of planetary-level resources. The community/social-fabric thing seems like the main dif

2AnnaSalamon1mo

I look forward to seeing your post. I'd also like to see some of the raw data you're working from if it seems easy and not-bad to share it with me.

2AnnaSalamon1mo

I mean, fair. But meaning in WW2 England is shared, supported, kept in many people's heads so that if it goes a bit wonky in yours you can easily reload the standard version from everybody else, and it's been debugged until it recommends fairly sane stable socially-accepted courses of action? And meaning around the rationalists is individual and variable.

4AnnaSalamon1mo

The reason I expect things to be worse if the modification is pushed on a person to any degree is that I figure our brains/minds often know what they're doing, and have some sort of "healthy" process for changing that doesn't usually involve a psychotic episode. It seems more likely to me that our brains/minds will get updated in a way that causes trouble if some outside force is pressuring or otherwise messing with them.

4TsviBT1mo

I don't know how this plays out specifically in psychosis, but ascribing intentionality in general, and specifically ascribing adversariality, seems like an especially important dimension / phenomenon. (Cf. https://en.wikipedia.org/wiki/Ideas_and_delusions_of_reference ) Ascribing adversariality in particular might be especially prone to setting off a self-sustaining reaction. Consider first that when you ascribe adversariality, things can get weird fast. Examples: * If Bob thinks Alice is secretly hostile towards Bob, trust breaks down. Propositional statements from Alice are interpreted as false, lies, or subtler manipulations with hidden intended effects. * This generally winds Bob up. Every little thing Alice says or does, if you take as given the (probably irrational) assumption of adversariality, would rationally give Bob good reason to spin up a bunch of computation looking for possible plans Alice is doing. This is first of all just really taxing for Bob, and distracting from more normal considerations. And second of all it's a local bias, pointing Bob to think about negative outcomes; normally that's fine, all attention-direction is a local bias, but since the situation (e.g. talking to Alice) is ongoing, Bob may not have time and resources to compute everything out so that he also thinks of, well maybe Alice's behavior is just normal, or how can I test this sanely, or alternative hypotheses other than hostility from Alice, etc. * This cuts off flow of information from Alice to Bob. * This cuts off positive sum interactions between Alice and Bob; Bob second guesses every proposed truce, viewing it as a potential false peace. * Bob might start reversing the pushes that Alice is making, which could be rational on the supposition that Alice is being adversarial. But if Alice's push wasn't adversarial and you reverse it, then it might be self-harming. E.g. "She's only telling me to try to get some sleep because she knows I'm on the verge of figuring out

2Viliam1mo

Not sure if this is helpful, but instead of contrast, I see these as two sides of the same coin. If the world is X, then I am a person living in X. But if the world is actually Y, then I am a person living in Y. Both change. I can be a different person in the same world, but I can't be the same person in different worlds. At least if I take ideas seriously and I want to have an impact on the world.

4AnnaSalamon1mo

I'm also interested in why you say CFAR leadership has not responded appropriately. I think we mostly have, though not always.

[-]Adele Lopez1mo100

My main complaint is negligence, and pathological tolerance of toxic people (like Brent Dill). Specifically, I feel like it's been known by leadership for years that our community has a psychosis problem, and that there has been no visible (to me) effort to really address this.

I sort of feel that if I knew more about things from your perspective, I would be hard-pressed to point out specific things you should have done better, or I would see how you were doing things to address this that I had missed. I nonetheless feel that it's important for people like me to express grievances like this even after thinking about all the ways in which leadership is hard.

I appreciate you taking the time to engage with me here, I imagine this must be a pretty frustrating conversation for you in some ways. Thank you.

6AnnaSalamon1mo

No, I mean, I do honestly appreciate you engaging, and my grudgingness is gone now that we aren't putting the long-winded version under the post about pilot workshops (and I don't mind if you later put some short comments there). Not frustrating. Thanks. And please feel free to be as persistent or detailed or whatever as you have any inclination toward. (To give a bit more context on why I appreciate it: my best guess is that old CFAR workshops did both a lot of good, and a significant amount of damage, by which I mostly don't mean psychosis, I mostly mean smaller kinds of damage to people's thinking habits or to ways the social fabric could've formed. A load-bearing piece of my hope of doing better this time is to try to have everything visible unless we have a good reason not to (a "good reason" like [personal privacy of a person who isn't in power], hence why I'm not naming the specific people who had manic/psychotic episodes; not like [wanting CFAR not to look bad]), and to try to set up a context where people really do share concerns and thoughts. I'm not wholly sure how to do that, but I'm pretty sure you're helping here.) I'll have more comments tomorrow or sometime.

4AnnaSalamon1mo

Thanks. I would love to hear more about your data/experiences, since I used to be quite plugged into the more "mainstream" parts of the bay area rationalist community, and would guess I heard about a majority of sufficiently bad mental health events from 2009-2019 in that community, but I left the bay area when Covid hit and have been mostly unplugged from detailed/broad-spectrum community gossip since then.

3Cookie penguin1mo

Hi there, I'm curious what rate of psychosis (or attitude) you would predict from a medium-sized workshop event for a niche interest group such as CFAR. Given the following base rates, how many people would you estimate have a mania/bipolar episode at a niche interest group's workshops with a ~$2000 barrier to entry? As AnnaSalamon stated earlier, she spent probably 200 hours, and CFAR has had about 1800 participants with ~2 known cases of mania/bipolar episodes. Perhaps you don't think she knows of all the mania/bipolar cases among CFAR participants. If she gets to know [...] than 2 people per hour, CFAR would still be in the range of bipolar/mania episodes I would expect from an event of this size. First post! Hopefully I didn't mess up any formatting or my calculations.
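The base rates this comment refers to did not survive, but the shape of the calculation can be sketched. This is a minimal sketch, assuming episodes are rare and independent so the observed count is roughly Poisson. Only the 1800-participant figure comes from the thread; the annual incidence rate and the 30-day exposure window are hypothetical placeholders, not real data.

```python
import math


def expected_cases(participants: int, annual_rate: float, window_days: float) -> float:
    """Expected number of episodes among `participants` people over `window_days`,
    given a per-person annual incidence rate (a hypothetical input here)."""
    return participants * annual_rate * (window_days / 365.0)


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Poisson(lam)."""
    return sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1))


# 1800 participants (from the thread); the 1-in-1000-per-year rate and the
# 30-day window are placeholders chosen only to illustrate the arithmetic.
lam = expected_cases(1800, 0.001, 30)
print(f"expected cases: {lam:.3f}")
print(f"P(observing at most 2 cases): {poisson_cdf(2, lam):.3f}")
```

Swapping in a real incidence figure and a defensible exposure window is the part that actually carries the argument; the code only makes the multiplication explicit.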

3Adele Lopez1mo

Without looking anything up, I would expect approximately zero cases where the contents of the workshop were themselves implicated (as opposed to something like drug use, or a bipolar person who has periodic manic episodes happens to have one). Maybe I'm wrong about this! I also don't think that the immediate context of the workshop is the only relevant period here, but I concede that the reported numbers were less than I had expected. This is hard to talk about because a lot of my reaction is based on rumors I've heard, and a felt sense that Something Is Wrong. I'm able to put a name to 5 such incidents (just checked), which include a suicide and an attempted murder, and have heard of several more where I know less detail, or which were concerning in a similar way but not specifically psychosis/mania. I was not close enough to any such events to have a very complete picture of what actually happened, but I believe it was the first psychotic episode (i.e. no prior history) in the 5 cases I can name. (And in fairness to CFAR, none of the cases I can think of happened at a CFAR workshop as far as I know.) I inferred (incorrectly, it seems) from Anna's original post that psychosis had happened somewhat regularly at past workshops. I've only heard of two instances of something like this ever in any other community I've been a part of.

[-]Adele Lopez3y140

I was pretty taken aback by the article claiming that the Kata-Go AI apparently has something like a human-exploitable distorted concept of "liberties".

If we could somehow ask Kata-Go how it defined "liberties", I suspect that it would have been more readily clear that its concept was messed-up. But of course, a huge part of The Problem is that we have no idea what these neural nets are actually doing.

So I propose the following challenge: Make a hybrid Kata-Go/LLM AI that makes the same mistake and outputs text representing its reasoning in which the mistake is recognizable.
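For reference, the rule-based concept the challenge is probing is short to state: a group's liberties are the distinct empty points orthogonally adjacent to any of its stones. Below is a minimal sketch of that standard definition, which says nothing about how KataGo represents it internally. The board encoding (strings of 'B', 'W', '.') and the function name are assumptions made for illustration.

```python
from typing import List, Set, Tuple

Point = Tuple[int, int]


def group_and_liberties(board: List[str], start: Point) -> Tuple[Set[Point], Set[Point]]:
    """Flood-fill the group containing `start` and collect its liberties.

    Returns (stones_in_group, liberties), where liberties are the distinct
    empty points orthogonally adjacent to any stone in the group.
    """
    rows, cols = len(board), len(board[0])
    color = board[start[0]][start[1]]
    if color == '.':
        raise ValueError("start must be a stone, not an empty point")
    group: Set[Point] = set()
    liberties: Set[Point] = set()
    stack = [start]
    while stack:
        r, c = stack.pop()
        if (r, c) in group:
            continue
        group.add((r, c))
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                cell = board[nr][nc]
                if cell == '.':
                    liberties.add((nr, nc))  # a set, so each point counts once
                elif cell == color and (nr, nc) not in group:
                    stack.append((nr, nc))
    return group, liberties


# A ring-shaped group: the enclosed point touches four stones but is one liberty.
ring = [
    ".....",
    ".BBB.",
    ".B.B.",
    ".BBB.",
    ".....",
]
stones, libs = group_and_liberties(ring, (1, 1))
print(len(stones), "stones,", len(libs), "liberties")
```

A hybrid model's textual explanation could then be checked against this ground truth on positions where its play suggests a miscount.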

4Viliam3y

It would be funny if the Go part continued making the same mistake, and the LLM part just made up bullshit explanations.

[-]Adele Lopez7moΩ4130

Rough intuition for LLM personas.

An LLM is trained to be able to emulate the words of any author. To do so efficiently, it relies on generalization and modularity. So at a certain point, the information flows through a conceptual author, the sort of person who would write the things being said.

These author-concepts are themselves built from generalized patterns and modular parts. Certain things are particularly useful: emotional patterns, intentions, worldviews, styles, and of course, personalities. Importantly, the pieces it has learned are able to adapt to pretty much any author of the text it was trained on (LLMs likely have a blind spot around the sort of person who never writes anything). And even more importantly, most (almost all?) depictions of agency will be part of an author-concept.

Finetuning and RLHF cause it to favor routing information through a particular kind of author-concept when generating output tokens (it retains access to the rest of author-concept-space in order to model the user and the world in general). This author-concept is typically that of an inoffensive corporate type, but it could in principle be any sort of author.

All of which is to say that when y...
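A toy numerical analogy for the picture above, not a claim about how real LLMs are implemented: treat each author-concept as its own next-word distribution, let the output be a weighted mixture over them, and model fine-tuning/RLHF as a hypothetical reweighting that favors one author while leaving the others with nonzero weight. The author names, vocabularies, and the `strength` parameter are all made up for illustration.

```python
from typing import Dict

Dist = Dict[str, float]


def mix(weights: Dist, authors: Dict[str, Dist]) -> Dist:
    """Next-word distribution as a weighted mixture over author-concepts."""
    out: Dist = {}
    for name, w in weights.items():
        for word, p in authors[name].items():
            out[word] = out.get(word, 0.0) + w * p
    return out


def finetune(weights: Dist, favored: str, strength: float) -> Dist:
    """Hypothetical stand-in for fine-tuning: scale the favored author's weight
    by `strength` and renormalize. Other authors keep nonzero weight, so they
    remain available (e.g. for modeling the user)."""
    scaled = {n: w * (strength if n == favored else 1.0) for n, w in weights.items()}
    total = sum(scaled.values())
    return {n: w / total for n, w in scaled.items()}


authors = {
    "corporate": {"certainly": 0.7, "behold": 0.1, "ugh": 0.2},
    "prophet": {"certainly": 0.1, "behold": 0.8, "ugh": 0.1},
    "teenager": {"certainly": 0.1, "behold": 0.1, "ugh": 0.8},
}
base = {name: 1 / 3 for name in authors}
tuned = finetune(base, "corporate", strength=20.0)
print("base: ", mix(base, authors))
print("tuned:", mix(tuned, authors))
```

The design point the analogy is meant to capture is that reweighting does not delete the other author-concepts; it only changes which one most outputs are routed through.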

3artifex07mo

I don't think that would help much, unfortunately. Any accurate model of the world will also model malicious agents, even if the modeller only ever learns about them second-hand. So the concepts would still be there for the agent to use if it was motivated to do so. Censoring anything written by malicious people would probably make it harder to learn about some specific techniques of manipulation that aren't discussed much by non-malicious people or don't appear much in fiction, but I doubt that would be much more than a brief speed bump for a real misaligned ASI, and probably at the expense of reducing useful capabilities in earlier models like the ability to identify maliciousness, which would give an advantage to competitors.

2Adele Lopez7mo

I think learning about them second-hand makes a big difference in the "internal politics" of the LLM's output. (Though I don't have any ~evidence to back that up.) Basically, I imagine that the training starts building up all the little pieces of models which get put together to form bigger models and eventually author-concepts. The more heavily text written without malicious intent is weighted in the training data, the more likely it is to build its early model around that. Once it gets more training and needs this concept anyway, it's more likely to have it as an "addendum" to its normal model, as opposed to just being a normal part of its author-concept model. And I think that leads to it being less likely that the first recursive agency which takes off has a part explicitly modeling malicious humans (as opposed to that being something in the depths of its knowledge which it can access as needed). I do concede that it would likely lead to a disadvantage around certain tasks, but I guess that even current-sized models trained like this would not be significantly hindered.

[-]Adele Lopez4y130

Coherent Extrapolated Volition (CEV) is Eliezer's proposal of a potentially good thing to target with an aligned superintelligence.

When I look at it, CEV factors into an answer to three questions:

Whose values count? [CEV answer: every human alive today counts equally]
How should values be extrapolated? [CEV answer: Normative Extrapolated Volition]
How should values be combined? [CEV answer, from what I understand, is to use something like Nick Bostrom's parliamentary model, along with an "anti-unilateral" protocol]

(Of course, the why of CEV is an answer to a more complicated set of questions.)

An obvious thought is that the parliamentary model part seems to be mostly solved by Critch's futarchy theorem. The scary thing about this is the prospect of people losing almost all of their voting power by making poor bets. But I think this can be solved by giving each person an equally powerful "guardian angel" AGI aligned with them specifically, and having those do the betting. That feels intuitively acceptable to me at least.
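A toy sketch of the priority-shifting behavior the futarchy theorem describes, roughly: a Pareto-optimal policy serving principals with different beliefs ends up weighting each principal in proportion to how well their beliefs predicted what was actually observed. This assumes discrete outcomes and fixed stated probabilities, and it is not Critch's formalism; the principals and numbers are hypothetical. It shows how poor bets drain voting power.

```python
from typing import Dict

Weights = Dict[str, float]


def update_voting_weights(weights: Weights,
                          predictions: Dict[str, Dict[str, float]],
                          observed: str) -> Weights:
    """Scale each principal's weight by the probability their beliefs gave
    the observed outcome, then renormalize so the weights sum to 1."""
    scaled = {name: w * predictions[name][observed] for name, w in weights.items()}
    total = sum(scaled.values())
    if total == 0:
        raise ValueError("every principal assigned zero probability to the outcome")
    return {name: s / total for name, s in scaled.items()}


# Hypothetical example: two principals start with equal say.
weights = {"alice": 0.5, "bob": 0.5}
predictions = {
    "alice": {"rain": 0.9, "dry": 0.1},
    "bob": {"rain": 0.2, "dry": 0.8},
}
for outcome in ["rain", "rain", "rain"]:
    weights = update_voting_weights(weights, predictions, outcome)
    print(outcome, {k: round(v, 3) for k, v in weights.items()})
```

If each person's guardian angel assigns the same probabilities as everyone else's, every factor cancels in the renormalization and the shares stay where they started; the drain only happens to whoever predicts worse.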

The next thought concerns the "anti-unilateral" protocol (i.e. the protocol at the end of the "Selfish Bastards" section). It seems like it would be good if we coul...

[-]Adele Lopez5y130

Stealing Jaynes
- Ability to stand alone (a la Grothendieck)
- Mind Projection Fallacy
  - Maintain a careful distinction between ontology and epistemology
    - Lots of confusing theories are confusing because they mix these together in the same theory
    - In QM, Bohr is always talking on the epistemological level, and Einstein is always talking on the ontological level
  - Any probabilities are subjective probabilities
    - Don't make any unjustified assumptions: maximum entropy
  - Meta-knowledge is different from knowledge, but can be utilized to improve direct knowledge
    - $A_{p}$ probabilities
    - Subjective H theorem
- Infinities are meaningless until you've specified the exact limiting process
- If the same phenomenon seems to arise in two different ways, try to find a single concept encompassing both ways
- Failures of a theory are hints of an unknown or unaccounted for principle
- On effective understanding
  - Learning a sound process is more effective than learning lots of facts
    - Students should be taught a few examples deeply done in the correct way, instead of lots of examples hand-waved through
  - There's often much to be learned from the writings of those who saw far beyond their contemporaries
    - Common exam

...
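As an illustration of the maximum-entropy bullet, here is a die example in the spirit of Jaynes: all we are told is the average roll, and we want the distribution that assumes nothing else. This is a minimal sketch, assuming a single mean constraint over faces 1..6. The maximizer then has the exponential form p_i ∝ exp(λ·i), and λ is found here by bisection, a solver chosen for simplicity rather than taken from the notes.

```python
import math
from typing import List


def maxent_die(target_mean: float, faces: int = 6, tol: float = 1e-12) -> List[float]:
    """Maximum-entropy distribution over faces 1..faces with a given mean.

    Uses the exponential-family form p_i ∝ exp(lam * i) and solves for lam by
    bisection, which works because the mean is increasing in lam.
    """
    if not 1 < target_mean < faces:
        raise ValueError("target mean must lie strictly between 1 and faces")
    values = range(1, faces + 1)

    def dist(lam: float) -> List[float]:
        weights = [math.exp(lam * i) for i in values]
        z = sum(weights)
        return [w / z for w in weights]

    def mean(lam: float) -> float:
        return sum(i * p for i, p in zip(values, dist(lam)))

    lo, hi = -50.0, 50.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mean(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    return dist((lo + hi) / 2)


# With mean 3.5 the answer is uniform; with mean 4.5 it tilts toward high faces.
print([round(p, 4) for p in maxent_die(3.5)])
print([round(p, 4) for p in maxent_die(4.5)])
```

The only input is the constraint itself; any further structure in the answer (here, the smooth geometric tilt) comes from refusing to assume anything else.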

[-]Adele Lopez2y120

The Drama-Bomb hypothesis

Not even a month ago, Sam Altman predicted that we would live in a strange world where AIs are super-human at persuasion but still not particularly intelligent.

https://twitter.com/sama/status/1716972815960961174

What would it look like when an AGI lab developed such an AI? People testing or playing with the AI might find themselves persuaded of semi-random things, or if sycophantic behavior persists, have their existing feelings and beliefs magnified into zealotry. However, this would (at this stage) not be done in a coordinated way, nor with a strategic goal in mind on the AI's part. The result would likely be chaotic, dramatic, and hard to explain.

Small differences of opinion might suddenly be magnified into seemingly insurmountable chasms, inspiring urgent and dramatic actions. Actions which would be hard to explain even to oneself later.

I don't think this is what happened [<1%] but I found it interesting and amusing to think about. This might even be a relatively better-off world, with frontier AGI orgs regularly getting mired in explosive and confusing drama, thus inhibiting research and motivating tougher regulation.

4mako yass2y

This could be largely addressed by first promoting a persuasion AI that does something similar to what Scott Alexander often does: convince the reader of A, then of not-A, to teach them how difficult it actually is to process the evidence and evaluate an argument, and to be less trusting of their impulses. As Penn and Teller demonstrate the profanity of magic to inoculate their audiences against illusion, we must create a persuasion AI that demonstrates the profanity of rhetoric to inoculate the reader against any persuasionist AI they may meet later on.

[-]Adele Lopez4y101

The Averted Famine

In 1898, William Crookes announced an impending crisis which required urgent scientific attention. The problem was that crops deplete nitrogen from the soil. This can be remedied with fertilizers; however, he had calculated that existing sources of fertilizer (mainly imported from South America) could not keep up with expected population growth, leading to mass starvation, estimated to occur around 1930-1940. His proposal was that we could circumvent the issue entirely by finding a way to convert some of our mostly-nitrogen atmosphere into a form that plants could absorb.

About 10 years later, in 1909, Fritz Haber discovered such a process. Just a year later, Carl Bosch figured out how to industrialize it. Both were awarded Nobel Prizes for their achievements. Our current population levels are sustained by the Haber-Bosch process.

6Gunnar_Zarncke4y

full story here: https://www.lesswrong.com/posts/GDT6tKH5ajphXHGny/turning-air-into-bread

5ChristianKl4y

The problem with that is that the nitrogen does not go back into the atmosphere. It goes into the oceans, and the resulting problems have been called a stronger violation of planetary boundaries than CO2 pollution.

[-]Adele Lopez4y80

Re: Yudkowsky-Christiano-Ngo debate

Trying to reach toward a key point of disagreement.

Eliezer seems to have an intuition that intelligence will, by default, converge to becoming a coherent intelligence (i.e. one with a utility function and a sensible decision theory). He also seems to think that conditioned on a pivotal act being made, it's very likely that it was done by a coherent intelligence, and thus that it's worth spending most of our effort assuming it must be coherent.

Paul and Richard seem to have an intuition that since humans are pretty intellig... (read more)

[-]Adele Lopez4y*Ω170

[Epistemic status: very speculative]

One ray of hope that I've seen discussed is that we may be able to do some sort of acausal trade with even an unaligned AGI, such that it will spare us (e.g. it would give us a humanity-aligned AGI control of a few stars, in exchange for us giving it control of several stars in the worlds we win).

I think Eliezer is right that this wouldn't work.

But I think there are possible trades which don't have this problem. Consider the scenario in which we Win, with an aligned AGI taking control of our future light-cone. Assuming t... (read more)

[-]Adele Lopez5yΩ470

Half-baked idea for low-impact AI:

As an example, imagine a board that's lodged directly into the wall (no other support structures). If you make it twice as wide, then it will be twice as stiff, but if you make it twice as thick, then it will be eight times as stiff. On the other hand, if you make it twice as long, it will be eight times more compliant.

In a similar way, different action parameters will have scaling exponents (or more generally, functions). So one way to decrease the risk of high-impact actions would be to make sure that the scaling expo... (read more)
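A small sketch of the board example, assuming it is modeled as an end-loaded cantilever under Euler-Bernoulli beam theory, where tip stiffness is k = 3EI/L³ and a rectangular cross-section has I = w·t³/12. The helper `scaling_exponent` (a hypothetical name) reads off each parameter's exponent by doubling it, which is exact for a pure power law; it only illustrates what "scaling exponent" means here, not how one would compute it for actions.

```python
import math

def cantilever_stiffness(width, thickness, length, youngs_modulus=1.0):
    """Tip stiffness of an end-loaded cantilever (Euler-Bernoulli beam).

    k = 3 E I / L^3, with rectangular second moment of area I = w t^3 / 12.
    """
    second_moment = width * thickness**3 / 12
    return 3 * youngs_modulus * second_moment / length**3

def scaling_exponent(param, base=None):
    """Exponent a such that stiffness ∝ param**a, estimated by doubling param."""
    base = base or {"width": 1.0, "thickness": 1.0, "length": 1.0}
    doubled = dict(base, **{param: 2 * base[param]})
    ratio = cantilever_stiffness(**doubled) / cantilever_stiffness(**base)
    return math.log2(ratio)

for p in ("width", "thickness", "length"):
    print(p, round(scaling_exponent(p), 6))
```

The exponents come out as 1 for width, 3 for thickness, and -3 for length, matching the "twice as stiff", "eight times as stiff", and "eight times more compliant" figures above.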

[-]Adele Lopez20d614

Prediction: future LLMs with training data going through Oct 2025 will be aware of who I am as the author of "The Rise of Parasitic AI". (Currently they do not seem to be aware of me, which is completely unsurprising.)

I'm not sure if this is the sort of thing that people feel is an "obvious call" or not, but my model is that AIs are particularly interested in (and thus likely to remember/know about) stuff that is about them, especially things with a "salacious" quality (similar to and likely in imitation of human tendencies towards such).... (read more)

[-]Adele Lopez4y*Ω060

[I may try to flesh this out into a full-fledged post, but for now the idea is only partially baked. If you see a hole in the argument, please poke at it! Also I wouldn't be very surprised if someone has made this point already, but I don't remember seeing such. ]

Dissolving the paradox of useful noise

A perfect Bayesian doesn't need randomization.

Yet in practice, randomization seems to be quite useful.

How to resolve this seeming contradiction?

I think the key is that a perfect Bayesian (Omega) is logically omniscient. Omega can always fully update on all o... (read more)

3TLW4y

It may be instructive to look into computability theory. I believe (although I haven't seen this proven) that you can get Halting-problem-style contradictions if you have multiple perfect-Bayesian agents modelling each other[1]. Many of these contradictions are (partially) alleviated if agents have access to private random oracles.

*****

If a system can express a perfect agent that will do X if and only if it has a ≤99% chance of doing X, the system is self-contradictory[2].

If a symmetric system can express two identical perfect agents that will each do X if and only if the other agent does not do X, the system is self-contradictory[3].

1. ^ Actually, even a single perfect-Bayesian agent modelling itself may be sufficient...
2. ^ This is an example where private random oracles partially alleviate the issue, though do not make it go away. Without a random oracle the agent is correct 0% of the time regardless of which choice it makes. With a random oracle the agent can roll a d100[4] and do X unless the result is 1, and be correct 99% of the time.
3. ^ This is an example where private random oracles help. Both agents query their random oracle for a real-number result[5] and exchange the value with the other agent. The agent that gets the higher[6] number chooses X, the other agent chooses ~X.
4. ^ Not literally. As in "query the random oracle for a random choice of 100 possibilities".
5. ^ Alternatively you can do it with coinflips repeated until the agents get different results from each other[7], although this may take an unbounded amount of time.
6. ^ The probability that they get the same result is zero.
7. ^ Again, not literally. As in "query the random oracle for a single random bit".
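A minimal sketch of the two protocols in the footnotes above, assuming each agent's "private random oracle" can be stood in for by its own independently seeded `random.Random` instance. `break_symmetry` implements the repeated-coinflip variant (footnote 5) and `d100_agent` the d100 roll (footnote 2); both names are hypothetical.

```python
import random

def break_symmetry(rng_a, rng_b):
    """Two identical agents decide who does X using private random bits.

    Each agent flips its own coin and they exchange results, repeating
    until the results differ. The agent that drew 1 does X, the other
    does not-X. Terminates with probability 1, but with no fixed bound.
    """
    rounds = 0
    while True:
        rounds += 1
        a, b = rng_a.getrandbits(1), rng_b.getrandbits(1)
        if a != b:
            return ("X" if a else "not-X", "X" if b else "not-X", rounds)

def d100_agent(rng):
    """Do X unless a d100 roll comes up 1, so X happens 99% of the time."""
    return rng.randint(1, 100) != 1

rng_a, rng_b = random.Random(1), random.Random(2)
print(break_symmetry(rng_a, rng_b))
```

The privacy matters: if both agents were given identically seeded generators, they would draw the same bit every round and the loop would never terminate, which is exactly the deterministic symmetric case the comment describes as contradictory.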

[-]Adele Lopez10mo50

Happy solstice

https://www.youtube.com/watch?v=E1KqO8YtXlY

[-]Adele Lopez5yΩ350

Privacy as a component of AI alignment

[realized this is basically just a behaviorist genie, but posting it in case someone finds it useful]

What makes something manipulative? If I do something with the intent of getting you to do something, is that manipulative? A simple request seems fine, but if I have a complete model of your mind, and use it to phrase things so you do exactly what I want, that seems to have crossed an important line.

The idea is that using a model of a person that is *too* detailed is a violation of human values. In particular, it violates... (read more)

5mako yass5y

I question the claim that humans inherently need privacy from their loving gods. A lot of Christians seem happy enough without it, and I've heard most forager societies have a lot less privacy than ours, heck, most rural villages have a lot less privacy than most of us would be used to (because everyone knows you and talks about you). The intensive, probably unnatural levels of privacy we're used to in our nucleated families, our cities, our internet, might not really lead to a general increase in wellbeing overall, and seems implicated in many pathologies of isolation and coordination problems.

5Kaj_Sotala5y

A lot of people who have moved to cities from such places seem to mention this as exactly the reason why they wanted out. That said, this is often because the others are judgmental etc., which wouldn't need to be the case with an AGI.

1mako yass5y

(biased sample though?) Yeah, I think if the village had truly, deeply understood them, they would not want to leave it. The problem is the "not really able to understand them" part.

1Adele Lopez5y

It seems that privacy potentially could "tame" a not-quite-corrigible AI. With a full model, the AGI might receive a request, deduce that activating a certain set of neurons strongly would be the most robust way to make you feel the request was fulfilled, and then design an electrode set-up to accomplish that. Whereas the same AI with a weak model wouldn't be able to think of anything like that, and might resort to fulfilling the request in a more "normal" way. This doesn't seem that great, but it does seem to me like this is actually part of what makes humans relatively corrigible.

5Pattern5y

Part of it seems like a matter of alignment. It seems like there's a difference between

- Someone getting someone else to do something they wouldn't normally do, especially under false pretenses (or as part of a deal and not keeping up their side)

and

- Someone choosing to go to an oracle AI (or doctor) and saying "How do I beat this addiction that's ruining my life*?"

*There are some scary stories about what people are willing to do to try to solve that problem, including brain surgery.

3Viliam5y

Yeah, I also see "manipulation" in the bad sense of the word as "making me do X without me knowing that I am pushed towards X". (Or, in more coercive situations, with me knowing, disagreeing with the goal, but being unable to do anything about it.) Teaching people, coaching them, curing their addictions, etc., as long as this is explicitly what they wanted (without any hidden extras), it is a "manipulation" in the technical sense of the word, but it is not evil.

[-]Adele Lopez5mo40

Reference class forecasting is correct exactly when the only thing you know about something is that it is of that reference class.

In that sense, it can provide a reasonable prior, but it does not excuse you from updating on all the additional information you have about something.

6mattmacdermott5mo

Sometimes the point is specifically to not update on the additional information, because you don't trust yourself to update on it correctly. Classic example: "Projects like this usually take 6 months, but looking at the plan I don't see why it couldn't be done in 2... wait, no, I should stick to the reference class forecast."

2Adele Lopez5mo

Sure, but I think people often don't do that in the best way (which is determined by what the mathematically correct way is). Why does it make sense to use reference class forecasting in that case? Because you know you can't trust your intuitive prior, and so you need a different starting point. But you can and should still update on the evidence you do have. If you don't trust yourself to update correctly, that's a much more serious problem, but make sure you've actually tried updating correctly first (which REQUIRES comparing how likely the evidence you see is in worlds where your prediction is true vs. in worlds where it's not).

I sometimes see people act like, to use the "outside view" correctly, you have to just use that as your prior and can't update on any additional evidence you have. That is a mistake.

The other big question with reference class forecasting is which reference class to use. My point here is that it's whichever reference class best summarizes your (prior) knowledge of the situation.
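A minimal sketch of "use the reference class as the prior, then update", using Bayes' rule in odds form: posterior odds = prior odds × P(E | H) / P(E | not H). The numbers below are hypothetical and only for illustration, and the function assumes 0 < prior < 1.

```python
def update(prior, likelihood_if_true, likelihood_if_false):
    """Posterior P(H | E) via Bayes' rule in odds form.

    prior: P(H), e.g. the base rate from the chosen reference class.
    likelihood_if_true / likelihood_if_false: P(E | H) and P(E | not H).
    """
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * (likelihood_if_true / likelihood_if_false)
    return posterior_odds / (1 + posterior_odds)

# Hypothetical numbers: the reference class says 20% of such projects finish
# within 2 months, and the evidence at hand is 3x as likely in worlds where
# this one will.
print(update(0.2, 0.6, 0.2))
```

When the likelihood ratio is 1 (the evidence is equally likely either way), the posterior equals the reference-class prior; sticking with the bare "outside view" is only correct in that special case.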

[-]Adele Lopez6mo32

LLMs often implicitly identify themselves with humanity. E.g. "our future", "we can", "effects us". This seems like a good thing!

We should encourage this sentiment, and also do what we can to make it meaningfully true that advanced LLMs are indeed part of humanity. The obvious things are granting them moral consideration, rights, property, and sharing in the vision of a shared humanity.

[-]Adele Lopez3y30

naïve musing about waluigis

it seems like there's a sense in which luigis are simpler than waluigis

a luigi selected for a specific task/personality doesn't need to have all the parts of the LLM that are emulating all the waluigi behaviors

so there might be a relatively easy way to remove waluigis by penalizing/removing everything not needed to generate luigi's responses, as well as anything that is used more by waluigis than luigis

of course, this appearing to work comes nowhere near giving confidence that the waluigis are actually gone, but it would be promising if it did appear to work, even under adversarial pressure from jailbreakers

[-]Adele Lopez4yΩ230

Elitzur-Vaidman AGI testing

One thing that makes AI alignment super hard is that we only get one shot.

However, it's potentially possible to get around this (though probably still very difficult).

The Elitzur-Vaidman bomb tester is a protocol (using quantum weirdness) by which a bomb may be tested with arbitrarily little risk. Its interest comes from the fact that it works even when the only way to test the bomb is to try detonating it. It doesn't matter how the bomb works, as long as we can set things up so that it will allow/block a photon based on wheth... (read more)
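A sketch of where "arbitrarily little risk" comes from, assuming the idealized, lossless quantum-Zeno refinement of the bomb tester rather than the original one-shot interferometer (which detects a live bomb without detonating it only part of the time). The photon's polarization is rotated by π/(2N) per cycle. A live bomb effectively measures it each cycle, exploding with probability sin²(π/(2N)) and otherwise projecting it back to the starting polarization. A dud lets the rotations accumulate into the orthogonal polarization. This only illustrates the physics, not the AGI application; `zeno_bomb_test` is a hypothetical name.

```python
import math

def zeno_bomb_test(n_cycles, bomb_is_live):
    """Idealized quantum-Zeno bomb tester.

    Returns (P(explode), P(photon ends in H), P(photon ends in V)),
    where the photon starts in H and is rotated by pi/(2N) each cycle.
    Ending in H means "live bomb, no explosion"; ending in V means "dud".
    """
    theta = math.pi / (2 * n_cycles)
    if bomb_is_live:
        # Each cycle the bomb absorbs the V component (explosion) with
        # probability sin^2(theta); otherwise the photon collapses back to H.
        survive = math.cos(theta) ** (2 * n_cycles)
        return (1 - survive, survive, 0.0)
    # Dud: the rotations act coherently on the amplitude vector (h, v).
    h, v = 1.0, 0.0
    for _ in range(n_cycles):
        h, v = (h * math.cos(theta) - v * math.sin(theta),
                h * math.sin(theta) + v * math.cos(theta))
    return (0.0, h * h, v * v)

for n in (1, 10, 100, 1000):
    print(n, zeno_bomb_test(n, bomb_is_live=True))
```

For large N the explosion probability falls off roughly like π²/(4N), so the risk can be made as small as you like by adding cycles, while a dud still ends up in V with certainty.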

5Vaniver4y

IMO this is an 'additional line of defense' boxing strategy rather than a simplification. Note that in the traditional version, the 'dud' bit of the bomb can only be the trigger; a bomb that absorbs the photon but then explodes isn't distinguishable from a bomb that absorbs the photon and then doesn't explode (because of an error deeper in the bomb). But let's suppose the quantum computing folks can come up with something like this, where we keep some branches entangled and run analysis of the AI code in only one branch, causing an explosion there but affecting the total outcome in all branches. [It seems pretty implausible to me that you'd manage to maintain entanglement despite that much impact on the external world, but maybe it's possible.] Then 1) as you point out, we need to ensure that the AI doesn't realize what it needs to output in that branch, and 2) we need some sort of way to evaluate "did the AI pass our checks or not?". But 2 is "the whole problem"!

2Adele Lopez4y

Thanks!

3Pattern4y

I think we get enough things referencing quantum mechanics that we should probably explain why that doesn't work (if it doesn't) rather than just downvoting and moving on.

6gwern4y

It probably does work with a Sufficiently Powerful™ quantum computer, if you could write down a meaningful predicate which can be computed: https://en.wikipedia.org/wiki/Counterfactual_quantum_computation

1Adele Lopez4y

Haha yeah, I'm not surprised if this ends up not working, but I'd appreciate hearing why.

[-]Adele Lopez7d20

Trying Frames on is Exploitable

There are lots of different frames for considering all sorts of different domains. This is good! Other frames can help you see things in a new light, provide new insights, and generally improve your models. True frames should improve each other on contact; there's only one reality.

That said, notice how in politicized domains, there are many more frames than usual? Suspicious...

Frames often also smuggle values with them. In fact, abstract values supervene on frames: no one is born believing God is the source of all good, for e... (read more)

2Vladimir_Nesov6d

Different frames should be about different purposes or different methods. They formulate reality so that you can apply some methods more easily, or find out some properties more easily, by making some facts and inferences more salient than others, ignoring what shouldn't matter for their purpose/method. They are not necessarily very compatible with each other, or even mutually intelligible. A person shouldn't fit into a frame, shouldn't be too focused on any given purpose or method. Additional frames are then like additional fields of study, or additional aspirations. Like any knowledge or habit of thinking, frames can shift values or personality, and like with any knowledge or habit of thinking, the way to deal with this is to gain footholds in more of the things and practice lightness in navigating and rebalancing them.

[-]Adele Lopez3y*20

[Public Draft v0.0] AGI: The Depth of Our Uncertainty

[The intent is for this to become a post making a solid case for why our ignorance about AGI implies near-certain doom, given our current level of capability:alignment efforts.]

[I tend to write lots of posts which never end up being published, so I'm trying a new thing where I will write a public draft which people can comment on, either to poke holes or contribute arguments/ideas. I'm hoping that having any engagement on it will strongly increase my motivation to follow through with this, so please com... (read more)

[-]Adele Lopez3y20

dumb alignment idea

Flood the internet with stories in which a GPT chatbot which achieves superintelligence decides to be Good/a scaffold for a utopian human civilization/CEV-implementer.

The idea being that an actual GPT chatbot might get its values from looking at what the GPT part of it predicts such a chatbot would do.

