AnnaSalamon's Comments

Player vs. Character: A Two-Level Model of Ethics

I'm a bit torn here, because the ideas in the post seem really important/useful to me (e.g., I use these phrases as a mental pointer sometimes), such that I'd want anyone trying to make sense of the human situation to have access to them (via this post or a number of other attempts at articulating much the same, e.g. "Elephant and the Brain"). And at the same time I think there's some crucial misunderstanding in it that is dangerous and that I can't articulate. Voting for it anyhow though.

Reality-Revealing and Reality-Masking Puzzles

Responding partly to Orthonormal and partly to Raemon:

Part of the trouble is that group dynamic problems are harder to understand, harder to iterate on, and take longer to appear and to be obvious. (And are then harder to iterate toward fixing.)

Re: individuals having manic or psychotic episodes, I agree with what Raemon says. About six months into a year into CFAR’s workshop-running experience, a participant had a manic episode a couple weeks after a workshop in a way that seemed plausibly triggered partly by the workshop. (Interestingly, if I’m not mixing people up, the same individual later told me that they’d also been somewhat destabilized by reading the sequences, earlier on.) We then learned a lot about warning signs of psychotic or manic episodes and took a bunch of steps to mostly-successfully reduce the odds of having the workshop trigger these. (In terms of causal mechanisms: It turns out that workshops of all sorts, and stuff that messes with one’s head of all sorts, seem to trigger manic or psychotic episodes occasionally. E.g. Landmark workshops; meditation retreats; philosophy courses; going away to college; many different types of recreational drugs; and different small self-help workshops run by a couple people I tried randomly asking about this from outside the rationality community. So my guess is that it isn’t the “taking ideas seriously” aspect of CFAR as such, although I dunno.)

Re: other kinds of “less sane”:

(1) IMO, there has been a build-up over time of mentally iffy psychological habits/techniques/outlook-bits in the Berkeley “formerly known as rationality” community, including iffy thingies that affect the rate at which other iffy things get created (e.g., by messing with the taste of those receiving/evaluating/passing on new “mess with your head” techniques; and by helping people be more generative of “mess with your head” methods via them having had a chance to see several already which makes it easier to build more). My guess is that CFAR workshops have accidentally been functioning as a “gateway drug” toward many things of iffy sanity-impact, basically by: (a) providing a healthy-looking context in which people get over their concerns about introspection/self-hacking because they look around and see other happy healthy-looking people; and (b) providing some entry-level practice with introspection, and with “dialoging with one’s tastes and implicit models and so on”, which makes it easier for people to mess with their heads in other, less-vetted ways later.

My guess is that the CFAR workshop has good effects on folks who come from a sane-isn or at least stable-is outside context, attend a workshop, and then return to that outside context. My guess is that its effects are iffier for people who are living in the bay area, do not have a day job/family/other anchor, and are on a search for “meaning.”

My guess is that those effects have been getting gradually worse over the last five or more years, as a background level of this sort of thing accumulates.

I ought probably to write about this in a top-level post, and may actually manage to do so. I’m also not at all confident of my parsing/ontology here, and would quite appreciate help with it.

(2) Separately, AI risk seems pretty hard for people, including ones unrelated to this community.

(3) Separately, “taking ideas seriously” indeed seems to pose risks. And I had conversations with e.g. Michael Vassar back in ~2008 where he pointed out that this poses risks; it wasn’t missing from the list. (Even apart from tail risks, some forms of “taking ideas seriously” seem maybe-stupid in cases where the “ideas” are not grounded also in one’s inner simulator, tastes, viscera — much sense is there that isn’t in ideology-mode alone). I don’t know whether CFAR workshops increase or decrease peoples’ tendency to take ideas seriously in the problematic sense, exactly. They have mostly tried to connect peoples’ ideas and peoples’ viscera in both directions.

“How to take ideas seriously without [the taking ideas seriously bit] causing them to go insane” as such actually still isn’t that high on my priorities list; I’d welcome arguments that it should be, though.

I’d also welcome arguments that I’m just distinguishing 50 types of snow and that these should all be called the same thing from a distance. But for the moment for me the group-level gradual health/wholesomeness shifts and the individual-level stuff show up as pretty different.

Reality-Revealing and Reality-Masking Puzzles

There are some edge cases I am confused about, many of which are quite relevant to the “epistemic immune system vs Sequences/rationality” stuff discussed above:

Let us suppose a person has two faculties that are both pretty core parts of their “I” -- for example, deepset “yuck/this freaks me out” reactions (“A”), and explicit reasoning (“B”). Now let us suppose that the deepset “yuck/this freaks me out” reactor (A) is being used to selectively turn off the person’s contact with explicit reasoning in cases where it predicts that B “reasoning” will be mistaken / ungrounded / not conducive to the goals of the organism. (Example: a person’s explicit models start saying really weird things about anthropics, and then they have a less-explicit sense that they just shouldn’t take arguments seriously in this case.)

What does it mean to try to “help” a person in such as case, where two core faculties are already at loggerheads, or where one core faculty is already masking things from another?

If a person tinkers in such a case toward disabling A’s ability to disable B’s access to the world… the exact same process, in its exact same aspect, seems “reality-revealing” (relative to faculty B) and “reality-masking” (relative to faculty A).

Reality-Revealing and Reality-Masking Puzzles

To try yet again:

The core distinction between tinkering that is “reality-revealing” and tinkering that is “reality-masking,” is which process is learning to predict/understand/manipulate which other process.

When a process that is part of your core “I” is learning to predict/manipulate an outside process (as with the child who is whittling, and is learning to predict/manipulate the wood and pocket knife), what is happening is reality-revealing.

When a process that is not part of your core “I” is learning to predict/manipulate/screen-off parts of your core “I”s access to data, what is happening is often reality-masking.

(Multiple such processes can be occurring simultaneously, as multiple processes learn to predict/manipulate various other processes all at once.)

The "learning" in a given reality-masking process can be all in a single person's head (where a person learns to deceive themselves just by thinking self-deceptive thoughts), but it often occurs via learning to impact outside systems that then learn to impact the person themselves (like in the example of me as a beginning math tutor learning to manipulate my tutees into manipulating me into thinking I'd explained things clearly)).

The "reality-revealing" vs "reality-masking" distinction is in attempt to generalize the "reasoning" vs "rationalizing" distinction to processes that don't all happen in a single head.

Reality-Revealing and Reality-Masking Puzzles

I like your example about your math tutoring, where you "had a fun time” and “[weren’t] too results driven” and reality-masking phenomena seemed not to occur.

It reminds me of Eliezer talking about how the first virtue of rationality is curiosity.

I wonder how general this is. I recently read the book “Zen Mind, Beginner’s Mind,” where the author suggests that difficulty sticking to such principles as “don’t lie,” “don’t cheat,” “don’t steal,” comes from people being afraid that they otherwise won’t get a particular result, and recommends that people instead… well, “leave a line of retreat” wasn’t his suggested ritual, but I could imagine “just repeatedly leave a line of retreat, a lot” working for getting unattached.

Also, I just realized (halfway through typing this) that cousin_it and Said Achmiz say the same thing in another comment.

Reality-Revealing and Reality-Masking Puzzles

Thanks; you naming what was confusing was helpful to me. I tried to clarify here; let me know if it worked. The short version is that what I mean by a "puzzle" is indeed person-specific.

A separate clarification: on my view, reality-masking processes are one of several possible causes of disorientation and error; not the only one. (Sort of like how rationalization is one of several possible causes of people getting the wrong answers on math tests; not the only one.) In particular, I think singularity scenarios are sufficiently far from what folks normally expect that the sheer unfamiliarity of the situation can cause disorientation and errors (even without any reality-masking processes; though those can then make things worse).

Reality-Revealing and Reality-Masking Puzzles

The difficulties above were transitional problems, not the main effects.

Why do you say they were "transitional"? Do you have a notion of what exactly caused them?

Reality-Revealing and Reality-Masking Puzzles

A couple people asked for a clearer description of what a “reality-masking puzzle” is. I’ll try.

JamesPayor’s comment speaks well for me here:

There was the example of discovering how to cue your students into signalling they understand the content. I think this is about engaging with a reality-masking puzzle that might show up as "how can I avoid my students probing at my flaws while teaching" or "how can I have my students recommend me as a good tutor" or etc.

It's a puzzle in the sense that it's an aspect of reality you're grappling with. It's reality-masking in that the pressure was away from building true/accurate maps.

To say this more slowly:

Let’s take “tinkering” to mean “a process of fiddling with a [thing that can provide outputs] while having some sort of feedback-loop whereby the [outputs provided by the thing] impacts what fiddling is tried later, in such a way that it doesn’t seem crazy to say there is some ‘learning’ going on.”

Examples of tinkering:

  • A child playing with legos. (The “[thing that provides outputs]” here is the [legos + physics], which creates an output [an experience of how the legos look, whether they fall down, etc.] in reply to the child’s “what if I do this?” attempts. That output then affects the child’s future play-choices some, in such a way that it doesn’t seem crazy to say there is some “learning” happening.)
  • An person doodling absent-mindedly while talking on the phone, even if the doodle has little to no conscious attention;
  • A person walking. (Since the walking process (I think) contains at least a bit of [exploration / play / “what happens if I do this?” -- not necessarily conscious], and contains some feedback from “this is what happens when you send those signals to your muscles” to future walking patterns)
  • A person explicitly reasoning about how to solve a math problem
  • A family member A mostly-unconsciously taking actions near another family member B [while A consciously or unconscoiusly notices something about how the B responds, and while A has some conscious or unconscious link between [how B responds] and [what actions A takes in future].

By a “puzzle”, I mean a context that gets a person to tinker. Puzzles can be person-specific. “How do I get along with Amy?” may be a puzzle for Bob and may not be a puzzle for Carol (because Bob responds to it by tinkering, and Carol responds by, say, ignoring it). A kong toy with peanut butter inside is a puzzle for some dogs (i.e., it gets these dogs to tinker), but wouldn’t be for most people. Etc.

And… now for the hard part. By a “reality-masking puzzle”, I mean a puzzle such that the kind of tinkering it elicits in a given person will tend to make that person’s “I” somehow stupider, or in less contact with the world.

The usual way this happens is that, instead of the tinkering-with-feedback process gradually solving an external problem (e.g., “how do I get the peaut butter out of the kong toy?”), the tinkering-with-feedback process is gradually learning to mask things from part of their own mind (e.g. “how do I not-notice that I feel X”).

This distinction is quite related to the distinction between reasoning and rationalization.

However, it differs from that distinction in that “rationalization” usually refers to processes happening within a single person’s mind. And in many examples of “reality-masking puzzles,” the [process that figures out how to mask a bit of reality from a person’s “I”] is spread across multiple heads, with several different tinkering processes feeding off each other and the combined result somehow being partially about blinding someone.

I am actually not all that satisfied by the “reality-revealing puzzles” vs “reality-masking puzzles” ontology. It was more useful to me than what I’d had before, and I wanted to talk about it, so I posted it. But… I understand what it means for the evidence to run forwards vs backwards, as in Eliezer’s Sequences post about rationalization. I want a similarly clear-and-understood generalization of the “reasoning vs rationalizing” distinction that applies also to processes to spread across multiple heads. I don’t have that yet. I would much appreciate help toward this. (Incremental progress helps too.)

We run the Center for Applied Rationality, AMA

No; that isn't the trouble; I could imagine us getting the money together for such a thing, since one doesn't need anything like a consensus to fund a position. The trouble is more that at this point the members of the bay area {formerly known as "rationalist"} "community" are divided into multiple political factions, or perhaps more-chaos-than-factions, which do not trust one another's judgment (even about pretty basic things, like "yes this person's actions are outside of reasonable behavioral norms). It is very hard to imagine an individual or a small committee that people would trust in the right way. Perhaps even more so after that individual or committee tried ruling against someone who really wanted to stay, and and that person attempted to create "fear, doubt, and uncertainty" or whatever about the institution that attempted to ostracize them.

I think something in this space is really important, and I'd be interested in investing significantly in any attempt that had a decent shot at helping. Though I don't yet have a strong enough read myself on what the goal ought to be.

AIRCS Workshop: How I failed to be recruited at MIRI.

Hi Mark,

This maybe doesn't make much difference for the rest of your comment, but just FWIW: the workshop you attended in Sept 2016 not part of the AIRCS series. It was a one-off experiment, funded by an FLI grant, called "CFAR for ML", where we ran most of a standard CFAR workshop and then tacked on an additional day of AI alignment discussion at the end.

The AIRCS workshops have been running ~9 times/year since Feb 2018, have been evolving pretty rapidly, and in recent iterations involve a higher ratio of either AI risk content, or content about how cognitive biases etc. that seem to arise in discussion about AI risk in particular. They have somewhat smaller cohorts for more 1-on-1 conversation (~15 participants instead of 23). They are co-run with MIRI, which "CFAR for ML" was not. They have a slightly different team and a slightly different beast.

Which... doesn't mean you wouldn't have had most of the same perceptions if you'd come to a recent AIRCS! You might well have. From a distance perhaps all our workshops are pretty similar. And I can see calling "CFAR for ML" "AIRCS", since it was in fact partially about AI risk and was aimed mostly at computer scientists, which is what "AIRCS" stands for. Still, we locally care a good bit of the distinctions between our programs, so I did want to clarify.

Load More