Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.
This is a special post for quick takes by Ben Pace. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.

The comments here are a storage of not-posts and not-ideas that I would rather write down than not.

I often wish I had a better way to concisely communicate "X is a hypothesis I am tracking in my hypothesis space". I don't simply mean that X is logically possible, and I don't mean I assign even 1-10% probability to X, I just mean that as a bounded agent I can only track a handful of hypotheses and I am choosing to actively track this one.

  • This comes up when a substantially different hypothesis is worth tracking but I've seen no evidence for it. There's a common sentence like "The plumber says it's fixed, though he might be wrong" where I don't want to communicate that I've got much reason to believe he might be wrong, and I'm not giving it even 10% or 20%, but I still think it's worth tracking, because strong evidence is common and the importance is high.
  • This comes up in adversarial situations when it's possible that there's an adversarial process selecting on my observations. In such situations I want to say "I think it's worth tracking the hypothesis that the politician wants me to believe that this policy worked in order to pad their reputation, and I will put some effort into checking for evidence of that, but to be clear I haven't seen any positive evidence for that hypothesi
...
Chris_Leong (1mo)
Maybe just say that you're tracking the possibility?
Richard_Kennaway (1mo)
"Trust, but verify."
Dagon (1mo)
Standard text in customer-facing outage recovery notices: "all systems appear to be operating correctly, and we are actively monitoring the situation". In more casual conversations, I sometimes say "cautiously optimistic" when stating that I think things are OK, but I'm paying more attention than normal for signs I'm wrong. Mostly, I talk about my attention and what I'm looking for, rather than specifying the person who's making claims. Instead of "the plumber says it's fixed, though he might be wrong", I'd say "The plumber fixed it, but I'm keeping an eye out for further problems". For someone proposing something I haven't thought about, "I haven't noticed that, but I'll pay more attention for X and Y in the future".
jam_brand (1mo)
Before I read the aphoristic three-word reply to you from Richard Kennaway (admittedly a likely even clearer-cut way to indicate the following sentiment), I was thinking that to downplay any unintended implications about the magnitude of your probabilities that you could maybe say something about your tracking being for mundane-vigilance or intermittent-map-maintenance or routine-reality-syncing / -surveying / -sampling reasons. For any audience you anticipate familiarity with this essay though, another idea might be to use a version of something like: "The plumber says it's fixed, which I'm splitting on [by default][and {also} tracking <for posterity>]." (spoilered section below just corrals a ~dozen expansions / embellishments of the above)
Valdes (1mo)
Adapted from the French "j'envisage que X", I propose "I am considering the possibility that X" or in some contexts "I am considering X". "The plumber says it's fixed, but I am considering he might be wrong".
Richard_Kennaway (1mo)
What's wrong with your original sentence, "X is a hypothesis I am tracking in my hypothesis space"? Or more informal versions of that, like "I'll be keeping an eye on that", "We'll see", etc.?
Ben Pace (1mo)
I guess it's just that I don't feel mastery over my communication here, I still anticipate that I will find it clunky to add in a whole chunk of sentences to communicate my epistemic status. I anticipate often in the future that I'll feel a need to write a whole paragraph, say in the political case, just to clarify that though I think it's worth considering the possibility that the politician is somehow manipulating the evidence, I've seen no cause to believe it in this case. I feel like bringing up the hypothesis with a quick "though I'm tracking the possibility that Adam is somehow manipulating the evidence for political gain" pretty commonly implies that the speaker (me) thinks it is likely enough to be worth acting on, and so I feel I have to explicitly rule that out as why I'm bringing it up, leaving me with my rather long sentence from above.
jmh (1mo)
In the plumbing context I generally say or think, "The repair/work has been completed and I'll see how it lasts." or sometimes something like, "We've addressed the immediate problem, so let's see if that was a fix or a bandage."
Steven Byrnes (1mo)
“The plumber says it’s fixed, but I’ll keep an eye out for evidence of more problems.” (ditto Dagon) also “The politician seems to be providing sound evidence that her policy is working, but I’ll remain vigilant to the possibility that she’s being deceptive.”
mattmacdermott (1mo)
"Bear in mind he could be wrong" works well for telling somebody else to track a hypothesis. "I'm bearing in mind he could be wrong" is slightly clunkier but works ok.
Mateusz Bagiński (1mo)
"The hypothesis/possibility that 'X' is mindworthy" ("worth being mindful about it"). Maybe the nicest solution would be to coin a one-syllable modal verb like "may" or "can" to communicate exactly this.
Ben Pace (1mo)
"Keep in mind that X".
FlorianH (1mo)
Maybe "I'm interested in the hypothesis/possibility..."
NineDimensions (1mo)
In some cases something like this might work: "The plumber says it's fixed, so hopefully it is" or "The plumber says it's fixed, so it probably is", which I think conveys "there's an assumption I'm making here, but I'm just putting a flag in the ground to return to if things don't play out as expected".
Shankar Sivarajan (1mo)

Yesterday I noticed that I had a pretty big disconnect from this: There's a very real chance that we'll all be around, business somewhat-as-usual in 30 years. I mean, in this world many things have a good chance of changing radically, but automation of optimisation will not cause any change on the level of the industrial revolution. DeepMind will just be a really cool tech company that builds great stuff. You should make plans for important research and coordination to happen in this world (and definitely not just decide to spend everything on a last-ditch effort to make everything go well in the next 10 years, only to burn up the commons and your credibility for the subsequent 20).

Only yesterday when reading Jessica's post did I notice that I wasn't thinking realistically/in-detail about it, and start doing that.

Related hypothesis: people feel like they've wasted some period of time e.g. months, years, 'their youth', when they feel they cannot see an exciting path forward for the future. Often this is caused by people they respect (/who have more status than them) telling them they're only allowed a small few types of futures.

lc (1y)
How do you feel about this today?
Ben Pace (1y)
  • The next 30 years seem really less likely to be 'relatively normal'. My mainline world-model is that nation states will get involved with ML in the next 10 years, and that many industries will be really changed up by ML.
  • One of my personal measures of psychological health is how many years ahead I feel comfortable making trade-offs for today. This changes over time; I think I'm a bit healthier now than I was when I wrote this, but still not great. Not sure how to put a number to this, I'll guess I'm maybe able to go up to 5 years at the minute (the longest ones are when I think about personal health and fitness)? Beyond that feels a bit foolish.
  • I still resonate a bit with what I wrote here 4 years ago, but definitely less. My guess is if I wrote this today the number I would pick would be "8-12 years" instead of "30".
Ben Pace (5mo)
Nation states got involved with ML faster than I expected when I wrote this!
Ben Pace (1y)
Epistemic status: Thinking out loud some more. Hm, I notice I'm confused a bit about the difference between "ML will blow up as an industry" and "something happens that affects the world more than the internet and smartphones have done so far". I think honestly I have a hard time imagining ML stuff that's massively impactful but isn't, like, "automating programming", which seems very-close-to-FOOM to me. I don't think we can have AGI-complete things without being within like 2 years (or 2 days) of a FOOM. So then I get split into two worlds, one where it's "FOOM and extinction" and another world which is "a strong industry that doesn't do anything especially AGI-complete". The latter is actually fairly close to "business somewhat-as-usual", just with a lot more innovation going on, which is kind of nice (while unsettling). Like, does "automated drone warfare" count as "business-as-usual"? I think maybe it does, it's part of general innovation and growth that isn't (to me) clearly more insane than the invention of nukes was. I think I am expecting massive innovation and that ML will be shaking up the world like we've seen in the 1940s and 1950s (transistors, DNA, nukes, etc etc). I'm not sure whether to expect 10-100x more than that before FOOM. I think my gut says "probably not" but I do not trust my gut here, it hasn't lived through even the 1940s/50s, never mind other key parts of the scientific and industrial and agricultural and eukaryotic revolutions. As we see more progress over the next 4 years I expect we'll be in a better position to judge how radical the change will be before FOOM. The answer to lc's original question is then:
eigen (1y)
Hey, I think you should also consider the out-of-nowhere, narrative-breaking nature of COVID, which also happened after you wrote this. It's not necessarily a proof that the narrative can "break," but it sure is an example. And while I think I read the Sequences way longer than 4 years ago, if I remember one thing they gave me, it's a sense of "everything can change very, very fast."

Did anyone else feel that when the Anthropic Scaling Policies doc talks about "Containment Measures" it sounds a bit like an SCP, just replaced with the acronym ASL?

Item #: ASL-2-4

Object Class: Euclid, Keter, and Thaumiel

Threat Levels:

ASL-2... [does] not yet pose a risk of catastrophe, but [does] exhibit early signs of the necessary capabilities required for catastrophic harms

ASL-3... shows early signs of autonomous self-replication ability... [ASL-3] does not itself present a threat of containment breach due to autonomous self-replication, because it is both unlikely to be able to persist in the real world, and unlikely to overcome even simple security measures... 

...an early guess (to be updated in later iterations of this document) is that ASL-4 will involve one or more of the following... [ASL-4 has] become the primary source of national security risk in a major area (such as cyberattacks or biological weapons), rather than just being a significant contributor. In other words, when security professionals talk about e.g. cybersecurity, they will be referring mainly to [ASL-4] assisted... attacks. A related criterion could be that deploying an ASL-4 system without safeguards

...

Hypothesis: power (status within military, government, academia, etc) is more obviously real to humans, and it takes a lot of work to build detailed, abstract models of anything other than this that feel as real. As a result people who have a basic understanding of a deep problem will consistently attempt to manoeuvre into powerful positions vaguely related to the problem, rather than directly solve the open problem. This will often get defended with "But even if we get a solution, how will we implement it?" without noticing that (a) there is no real effort by anyone else to solve the problem and (b) the more well-understood a problem is, the easier it is to implement a solution.

Benquo (5y)
I think this is true for people who've been through a modern school system, but probably not a human universal.
Ben Pace (5y)
My, that was a long and difficult but worthwhile post. I see why you think it is not the natural state of affairs. Will think some more on it (though can't promise a full response, it's quite an effortful post). Am not sure I fully agree with your conclusions.
Benquo (5y)
I'm much more interested in finding out what your model is after having tried to take those considerations into account, than I am in a point-by-point response.
Raemon (5y)
This seems like a good conversational move to have affordance for.
Kaj_Sotala (5y)
This might be true, but it doesn't sound like it contradicts the premise of "how will we implement it"? Namely, just because understanding a problem makes it easier to implement, doesn't mean that understanding alone makes it anywhere near easy to implement, and one may still need significant political clout in addition to having the solution. E.g. the whole infant nutrition thing.
Ruby (5y)
Seems related to Causal vs Social Reality.
Eli Tyre (5y)
Do you have an example of a problem that gets approached this way? Global warming? The need for prison reform? Factory Farming?
Ben Pace (5y)
AI.
Eli Tyre (5y)
It seems that AI safety has this issue less than every other problem in the world, by proportion of the people working on it. Some double-digit percentage of all of the people who are trying to improve the situation are directly trying to solve the problem, I think? (Or maybe I just live in a bubble in a bubble.) And I don’t know how well this analysis applies to non-AI safety fields.

I'd take a bet at even odds that it's single-digit.

To clarify, I don't think this is just about grabbing power in government or military. My outside view of plans to "get a PhD in AI (safety)" seems like this to me. This was part of the reason I declined an offer to do a neuroscience PhD with Oxford/DeepMind. I didn't have any secret for why it might be plausibly crucial.

Ben Pace (5y)
Strong agree with Jacob.

Er, Wikipedia has a page on misinformation about Covid, and the first example is Wuhan lab origin. Kinda shocked that Wikipedia is calling this misinformation. Seems like their authoritative sources are abusing their positions. I am scared that I'm going to stop trusting Wikipedia soon enough, which is leaving me feeling pretty shook.

Dagon (3y)
Wikipedia has beaten all odds for longevity of trust - I remember pretty heated arguments circa 2005 about whether it was referenceable on any topic, though it was known to be very good on technical topics or niches without controversy where nerds could agree on what was true (but not always what was important). By 2010, it was pretty widely respected, though the recommendation from Very Serious People was to cite the underlying sources, not the articles themselves. I think it was considered pretty authoritative in discussions I was having by 2013 or so, and nowadays it's surprising and newsworthy when something is wrong for very long (though edit wars and locking down sections happen fairly often). I still take it with a little skepticism for very recently-edited or created topics - it's an awesome resource to know the shape of knowledge in the area, but until things have been there for weeks or months, it's hard to be sure it's a consensus.
Viliam (3y)
Could it be a natural cycle? Wikipedia is considered trustworthy -> people with strong agenda get to positions where they can abuse Wikipedia -> Wikipedia is considered untrustworthy -> people with strong agenda find better use of their time and stop abusing Wikipedia, people who care about correct information fix it -> Wikipedia is considered trustworthy...
ChristianKl (3y)
The agenda is mainly to follow institutions like the New York Times. In a time when the New York Times isn't worth much more than sawdust, that's not a strategy to get to truth.
Steven Byrnes (3y)
"No safe defense, not even Wikipedia" :-P I suggest not having a notion of "quality" that's supposed to generalize across all wiki pages. They're written by different people, they're scrutinized to wildly different degrees. Even different sections of the same article can be obviously different in trustworthiness ... Or even different sentences in the same section ... Or different words in the same sentence :)
ChristianKl (3y)
Wikipedia unfortunately threw out its neutral point of view policy on COVID-19. Besides that page, the one on ivermectin ignores the meta-analyses in favor of using it for COVID-19. There's also no page for "patient zero" (who was likely employed at the Wuhan Institute of Virology).
Pattern (3y)
Fix it. (And let us know how long that sticks for.)
Ben Pace (3y)
You fix it! If you think it's such a good idea :) I am relatively hesitant to start doing opinionated fixes on Wikipedia, I think that's not the culture of page setup that they want. My understanding is that the best Wikipedia editors write masses of pages that they're relatively disinterested in, and that being overly interested in a specific page mostly leads you to violating all of their rules and getting banned. This sort of actively political editing is precisely the sort of thing that they're trying to avoid.
Viliam (3y)
By saying "Wuhan lab origin", you can roughly mean three things:
  • biological weapon, intentionally released,
  • natural virus collected, artificially improved, then escaped,
  • natural virus collected, then escaped in the original form.
The first we can safely dismiss: who would drop a biological weapon of this type on their own population? We can also dismiss the third one, if you think in near mode about what that would actually mean. It means the virus was already out there. Then someone collected it -- obviously, not all existing particles of the virus -- which means that most of the virus particles that were already out there have remained out there. But that makes the leak from Wuhan lab an unnecessary detail; "virus already in the wild, starts pandemic" is way more likely than "virus already in the wild, does not start pandemic, but when a few particles are brought into a lab and then accidentally released without being modified, they start pandemic"... what? This is why arguing for natural evolution of the virus is arguing against the lab leak. (It's just not clearly explained.) If you do not assume that the virus was modified, then the hypothesis that the pandemic started by Wuhan lab leak, despite the virus already being out there before it was brought to the Wuhan lab, is privileging the hypothesis. If the virus is already out there, you don't need to bring it to a lab and let it escape again in order to... be out there, again.
Now here I agree that the artificial improvement of the virus cannot be disproved. I mean, whatever can happen in nature probably can also happen in the lab, so how would you prove it didn't? I guess I am trying to say that in the Wikipedia article, the section "gain of function research" does not deserve to be classified as misinformation, but the remaining sections do.

Responding to Scott's response to Jessica.

The post makes the important argument that if we have a word whose boundary is around a pretty important set of phenomena that are useful to have a quick handle to refer to, then

  • It's really unhelpful for people to start using the word to also refer to a phenomenon with 10x or 100x more occurrences in the world, because then I'm no longer able to point to the specific important parts of the phenomena that I was previously talking about
    • e.g. Currently the word 'abuser' describes a small number of people during some of their lives. Someone might want to say that technically it should refer to all people all of the time. The argument is understandable, but it wholly destroys the usefulness of the concept handle.
  • People often have political incentives to push the concept boundary to include a specific case in a way that, if it were principled, indeed makes most of the phenomena in the category no use to talk about. This allows for selective policing by the people with the political incentive.
  • It's often fine for people to bend words a little bit (e.g. when people verb nouns), but when it's in the class of terms w
...

I will actually clean this up and into a post sometime soon [edit: I retract that, I am not able to make commitments like this right now]. For now let me add another quick hypothesis on this topic whilst crashing from jet lag.

A friend of mine proposed that instead of saying 'lies' I could say 'falsehoods'. Not "that claim is a lie" but "that claim is false".

I responded that 'falsehood' doesn't capture the fact that you should expect systematic deviations from the truth. I'm not saying this particular parapsychology claim is false. I'm saying it is false in a way where you should no longer trust the other claims, and expect they've been optimised to be persuasive.

They gave another proposal, which is to say instead of "they're lying" to say "they're not truth-tracking". Suggest that their reasoning process (perhaps in one particular domain) does not track truth.

I responded that while this was better, it still seems to me that people won't have an informal understanding of how to use this information. (Are you saying that the ideas aren't especially well-evidenced? But they so...

Pattern (5y)
Is this "bias"?
Ben Pace (5y)
Yeah, good point, I may have reinvented the wheel. I have a sense that’s not true but need to think more.

The definitional boundaries of "abuser," as Scott notes, are in large part about coordinating around whom to censure. The definition is pragmatic rather than objective.*

If the motive for the definition of "lies" is similar, then a proposal to define only conscious deception as lying is therefore a proposal to censure people who defend themselves against coercion while privately maintaining coherent beliefs, but not those who defend themselves against coercion by simply failing to maintain coherent beliefs in the first place. (For more on this, see Nightmare of the Perfectly Principled.) This amounts to waging war against the mind.

Of course, as a matter of actual fact we don't strongly censure all cases of conscious deception. In some cases (e.g. "white lies") we punish those who fail to lie, and those who call out the lie. I'm also pretty sure we don't actually distinguish between conscious deception and e.g. reflexively saying an expedient thing, when it's abundantly clear that one knows very well that the expedient thing to say is false, as Jessica pointed out here.

*It's not clear to me that this is a good kind of concept to ...

Ben Pace (5y)
Note: I just wrote this in one pass when severely jet lagged, and did not have the effort to edit it much. If I end up turning this into a blogpost I will probably do that. Anyway, I am interested in hearing via PM from anyone who feels that it was sufficiently unclearly written that they had a hard time understanding/reading it.

Okay, I’ll say it now, because there’s been too many times.

If you want your posts to be read, never, never, NEVER post multiple posts at the same time.

Only do that if you don’t mind none of the posts being read. Like if they’re all just reference posts.

I never read a post if there are two or more to read; it feels like a slog and like there’s going to be lots of clicking and it’s probably not worth it. And they normally do badly on comments and karma, so I don’t think it’s just me.

Even if one of them is just meant as reference, it means I won’t read the other one.

I recently circled for the first time. I had two one-hour sessions on consecutive days, with 6 and 8 people respectively.

My main thoughts: this seems like a great way for getting to know my acquaintances, connecting emotionally, and building closer relationships with friends. The background emotional processing happening in individuals is repeatedly brought forward as the object of conversation, for significantly enhanced communication/understanding. I appreciated getting to poke and actually find out whether people's emotional states matched the words they were using. I got to ask questions like:

When you say you feel gratitude, do you just mean you agree with what I said, or do you mean you're actually feeling warmth toward me? Where in your body do you feel it, and what is it like?

Not that much of my circling time was spent being skeptical of people's words; a lot of the time I trusted the people involved to be accurately reporting their experiences. It was just very interesting - when I noticed I didn't feel like someone was honest about some micro-emotion - to have the affordance to stop and request an honest internal report.

It felt like there was a constant tradeoff betw...

Good posts you might want to nominate in the 2018 Review

I'm on track to nominate around 30 posts from 2018, which is a lot. Here is a list of about 30 further posts I looked at that I think were pretty good but didn't make my top list, in the hopes that others who did get value out of the posts will nominate their favourites. Each post has a note I wrote down for myself about the post.

...

I was just re-reading the classic paper Artificial Intelligence as Positive and Negative Factor in Global Risk. It's surprising how well it holds up. The following quotes seem especially relevant 13 years later.

On the difference between AI research speed and AI capabilities speed:

The first moral is that confusing the speed of AI research with the speed of a real AI once built is like confusing the speed of physics research with the speed of nuclear reactions. It mixes up the map with the territory. It took years to get that first pile built, by a small group of physicists who didn’t generate much in the way of press releases. But, once the pile was built, interesting things happened on the timescale of nuclear interactions, not the timescale of human discourse. In the nuclear domain, elementary interactions happen much faster than human neurons fire. Much the same may be said of transistors.

On neural networks:

The field of AI has techniques, such as neural networks and evolutionary programming, which have grown in power with the slow tweaking of decades. But neural networks are opaque—the user has no idea how the neural net is making its decisions—and cannot easily be rendered
...

Reviews of books and films from my week with Jacob:

Films watched:

  • The Big Short
    • Review: Really fun. I liked certain elements of how it displays bad Nash equilibria in finance (I love the scene with the woman from the ratings agency - it turns out she’s just making the best of her incentives too!).
    • Grade: B
  • Spirited Away
    • Review: Wow. A simple story, yet entirely lacking in cliche, and so seemingly original. No cliched characters, no cliched plot twists, no cliched humour, all entirely sincere and meaningful. Didn’t really notice that it was animated (while fantastical, it never really breaks the illusion of reality for me). The few parts that made me laugh, made me laugh harder than I have in ages.
    • There’s a small visual scene, unacknowledged by the ongoing dialogue, between the mouse-baby and the dust-sprites which is the funniest thing I’ve seen in ages, and I had to rewind for Jacob to notice it.
    • I liked how by the end, the team of characters are all a different order of magnitude in size.
    • A delightful, well-told story.
    • Grade: A+
  • Stranger Than Fiction
    • Review: This is now my go-to film of someone trying something original and just failing. Filled with new ideas, but none executed well, a
...

"Slow takeoff" at this point is simply a misnomer.

Paul's position should be called "Fast Takeoff" and Eliezer's position should be called "Discontinuous Takeoff".

Vladimir_Nesov (1y)
Slow takeoff doesn't imply absence of discontinuous takeoff a bit later, it just says that FOOM doesn't happen right away and thus there is large AI impact (which is to say, things are happening fast) even pre-FOOM, if it ever happens.
lc (1y)
Why not drop "Fast vs Slow" entirely and just use "continuous" vs. "discontinuous" takeoff to refer to the two ideas?
Ben Pace (1y)
I guess it helps remind everyone that both positions are relatively extreme compared to how most other people have been expecting that the future will go. But continuous vs discontinuous also seems pretty helpful.

I don't normally just write up takes, especially about current events, but here's something that I think is potentially crucially relevant to the dynamics involved in the recent actions of the OpenAI board, that I haven't seen anyone talk about:

The four members of the board who did the firing do not know each other very well.

Most boards meet a few times per year, for a couple of hours. Only Sutskever works at OpenAI. D'Angelo has held senior roles at tech companies like Facebook and Quora, Toner works in EA/policy, and McCauley has worked at other tech companies (I'm not aware of any overlap with D'Angelo).

It's plausible to me that McCauley and Toner have spent more than 50 hours in each other's company, but overall I'd probably be willing to bet at even odds that no other pair of them had spent more than 10 hours together before this crisis.

This is probably a key factor in why they haven't written more publicly about their decision. Decision-by-committee is famously terrible, and it's pretty likely to me that everyone pushes back hard on anything unilateral by the others in this high-tension scenario. So any writing representing them has to get consensus, and they're focused on firefighting and ge...

Ben Pace (4mo)
In this mess, Altman and Helen should not be held to the same ethical standards, because I believe one of them has been given a powerful career in substantial part based on her commitments to higher ethical standards (a movement that prided itself on openness and transparency and trying to do the most good). If Altman played deceptive strategies, and insofar as Helen played back the same deceptive strategies as Altman, then she did not honor the EA name. (The name has a lot of dirt on it these days already, but still. It is a name that used to mean something back when it gave her power.) Insofar as you got a position specifically because you were affiliated with a movement claiming to be good and open and honest and to have unusually high moral standards, and then when you arrive you become a standard political player, that's disingenuous.
ryan_greenblatt (4mo)
I think Holden being added to the board shouldn't be mostly attributed to his affiliation with EA. And the Helen board seat is originally from this. (The relevant history here is that this is the OpenAI grant that resulted in a board seat while here is a post from just earlier about Holden's takes on EA.)
Ben Pace (4mo)
Some historical context: Holden in 2013 on the GiveWell blog: Holden in 2015 on the EA Forum (talking about GiveWell Labs, which grew into OpenPhil): Holden in April 2016 about plans for working on AI: (Dewey who IIRC had worked at FHI and CEA ahead of this, and Beckstead from FHI.) Holden in 2016 about why they're making potential risks from advanced AI a priority: Holden about the OpenAI grant in 2017: As a negative datapoint: I looked through a bunch of the media articles linked at the bottom of this GiveWell page, and most of them do not mention Effective Altruism, only effective giving / cost-effectiveness. So their Effective Altruist identity has had less awareness amongst folks who primarily know of Open Philanthropy through their media appearances.
Ben Pace (4mo)
I think this is accurately described as "an EA organization got a board seat at OpenAI", and the actions of those board members reflect directly on EA (whether internally or externally). Why did OpenAI come to trust Holden with this position of power? My guess is Holden and Dustin's personal reputations were substantial factors here, along with Open Philanthropy's major funding source, but also that many involved people's excitement about and respect for the EA movement were a relevant factor in OpenAI wanting to partner with Open Philanthropy, and that Helen's and Tasha's actions have directly and negatively reflected on how the EA ecosystem is viewed by OpenAI leadership. There's a separate question about why Holden picked Helen Toner and Tasha McCauley, and to what extent they were given power in the world by the EA ecosystem. It seems clear that these people have gotten power through their participation in the EA ecosystem (as OpenPhil is an EA institution), and to the extent that the EA ecosystem advertises itself as more moral than other places, if they executed the standard level of deceptive strategies that others in the tech industry would in their shoes, then that was false messaging.
4Ben Pace4mo
I'm not quite sure how, in the above comment, to balance between "this seems to me like it could explain a lot" and "this might just be factually false". So I guess I'm leaving this comment, lampshading it.
3Ben Pace4mo
The most important thing right now: I still don't know why they chose to fire Altman, and especially why they chose to do it so quickly. That's an exceedingly costly choice to make (i.e. the speed of it), and so when I start to speculate on why, I only come up with commensurately worrying states of affairs, e.g. he did something egregious enough to warrant it, or he didn't and the board acted with great hostility. Them going back on their decision is Bayesian evidence for the latter: if he'd done something egregious, they'd just be able to tell relevant folks, and Altman wouldn't get his job back. So many people are asking this (e.g. everyone at the company). I'll be very worried if the reason doesn't come out.
3Ben Pace4mo
In brief: I'm saying that once you condition on: 1. The board decided the firing was urgent. 2. The board members do not know each other very well and default to making decisions by consensus. 3. The board is immediately in a high-stakes, high-stress situation. Then you naturally get: 4. The board fails to come to consensus on public comms about the decision.
2Ben Pace4mo
Also, I don't know that I've said this, but from reading enough of his public tweets, I had blocked Sam Altman long ago. He seemed very political in how he used speech, and so I didn't want to include him in my direct memetic sphere. As a small pointer to why: he would commonly choose not to share object-level information about something, but instead share how he thought social reality should change. I think I recall him saying that the social consensus was wrong about fusion energy, and pushing for it to move in a specific direction; he did this rather than just plainly say what his object-level beliefs about fusion were, or offer a particular counter-argument to an argument that was going around. It's been a year or two since I blocked him, so I don't recall more specifics, but it seemed worth mentioning, as a datapoint for folks to include in their character assessments.
2Ben Pace4mo
My current guess is that most of the variance in what happened is explained by a board where 3 out of 4 people don't know the dynamics of upper management in a multi-billion dollar company, where the board members don't know each other well, and where (for some reason) the decision was made very suddenly. I have pretty low expectations given that situation. Shear seems like a pretty great replacement to get, given the hand they were dealt. Assuming that they had legit reason to fire the CEO, it's probably primarily through lack of skill and competence that they failed, more so than as a result of Altman's superior deal-making skill and leadership abilities (though that was what finished it off).

There's a game for the Oculus Quest (that you can also buy on Steam) called "Keep Talking And Nobody Explodes".

It's a two-player game. When playing with the VR headset, one of you wears the headset and has to defuse bombs in a limited amount of time (either 3, 4 or 5 mins), while the other person sits outside the headset with the bomb-defusal manual and tells you what to do. Whereas in other collaborative games you're all looking at the same screen, in this game the substrate of communication is solely conversation: the other person provides all of your inputs about how their half is going (none of it is shown on a screen).

The types of puzzles are fairly straightforward computational problems but with lots of fiddly instructions, and require the outer person to figure out what information they need from the inner person. It often involves things like counting numbers of wires of a certain colour, or remembering the previous digits that were being shown, or quickly describing symbols that are not any known letter or shape.

So the game trains you and a partner in efficiently building a shared language for dealing with new problems.

More than that, as the game gets harder, often ...
6Matt Goldenberg4y
There's a similar free game for Android and iOS called Space Team that I highly recommend.
4Gordon Seidoh Worley4y
I use both this game and Space Team as part of training people in the on-call rotation at my company. They generally report that it's fun, and I love it because it usually creates the kind of high-pressure feelings in people they may experience when on-call, so it creates a nice, safe environment for them to become more familiar with those feelings and how to work through them. On a related note, I'm generally interested in finding more cooperative games with asymmetric information and a need to communicate. Lots of games meet one or two of those criteria, but very few games are able to meet all simultaneously. For example, Hanabi is cooperative and asymmetric, but lacks much communication (you're not allowed to talk), and many games are asymmetric and communicative but not cooperative (Werewolf, Secret Hitler, etc.) or cooperative and communicative but not asymmetric (Pandemic, Forbidden Desert, etc.).
1ioannes4y
+1 – this game is great. It's really good with 3-4 people giving instructions and one person in the hot seat. Great for team bonding.

I talked with Ray for an hour about Ray's phrase "Keep your beliefs cruxy and your frames explicit".

I focused mostly on the 'keep your frames explicit' part. Ray gave a toy example of someone attempting to communicate something deeply emotional/intuitive, or perhaps a buddhist approach to the world, and how difficult it is to do this with simple explicit language. It often instead requires the other person to go off and seek certain experiences, or practise inhabiting those experiences (e.g. doing a little meditation, or getting in touch with their emotion of anger).

Ray's motivation was that people often have these very different frames or approaches, but don't recognise this fact, and end up believing aggressive things about the other person e.g. "I guess they're just dumb" or "I guess they just don't care about other people".

I asked for examples that were motivating his belief, where it would be much better if the disagreers took to heart the recommendation to make their frames explicit. He came up with two concrete examples:

  • Jim v Ray on norms for shortform, where during one hour they worked through the same reasons ...

I find "keep everything explicit" to often be a power move designed to make non-explicit facts irrelevant and non-admissible. This often goes along with burden of proof. I make a claim (real example of this dynamic happening, at an unconference under Chatham House rules: that pulling people away from their existing community has real costs that hurt those communities), and I was told that, well, that seems possible, but I can point to concrete benefits of taking them away, so you need to be concrete and explicit about what those costs are, or I don't think we should consider them.

Thus, the burden of proof was put upon me, to show (1) that people central to communities were being taken away, (2) that those people being taken away hurt those communities, (3) in particular measurable ways, (4) that then would impact direct EA causes. And then we would take the magnitude of effect I could prove using only established facts and tangible reasoning, and multiply them together, to see how big this effect was.

I cooperated with this because I felt like the current estimate of this cost for this person was zero, and I could easily raise that, and that was better than nothing, ...

To complement that: Requiring my interlocutor to make everything explicit is also a defence against having my mind changed in ways I don't endorse but that I can't quite pick apart right now. Which kinda overlaps with your example, I think.

I sometimes will feel like my low-level associations are changing in a way I'm not sure I endorse, halt, and ask for something that the more explicit part of me reflectively endorses. If they're able to provide that, then I will willingly continue making the low-level updates, but if they can't then there's a bit of an impasse, at which point I will just start trying to communicate emotionally what feels off about it (e.g. in your example I could imagine saying "I feel some panic in my shoulders and a sense that you're trying to control my decisions"). Actually, sometimes I will just give the emotional info first. There are a lot of contextual details that lead me to figure out which one I do.

One last bit is to keep in mind that most things (or at least many things) can be power moves.

There's one failure mode, where a person sort of gives you the creeps, and you try to bring this up and people say "well, did they do anything explicitly wrong?" and you're like "no, I guess?" and then it turns out you were picking up something important about the person-giving-you-the-creeps and it would have been good if people had paid some attention to your intuition.

There's a different failure mode where "so and so gives me the creeps" is something you can say willy-nilly without ever having to back it up, and it ends up being its own power move.

I do think during politically charged conversations it's good to be able to notice and draw attention to the power-move-ness of various frames (in both/all directions).

(i.e. in the "so and so gives me the creeps" situation, it's good to note both that you can abuse "only admit explicit evidence" and "wanton claims of creepiness" in different ways. And then, having made the frame of power-move-ness explicit, talk about ways to potentially alleviate both forms of abuse)

6Raemon5y
Want to clarify here, "explicit frames" and "explicit claims" are quite different, and it sounds here like you're mostly talking about the latter. The point of "explicit frames" is specifically to enable this sort of conversation – most people don't even notice that they're limiting the conversation to explicit claims, or where they're assuming burden of proof lies, or whether we're having a model-building sharing of ideas or a negotiation. Also worth noting (which I hadn't really stated, but is perhaps important enough to deserve a whole post to avoid accidental motte/bailey by myself or others down the road): My claim is that you should know what your frames are, and what would change* your mind. *Not* that you should always tell that to other people. Ontological/Framework/Aesthetic Doublecrux is a thing you do with people you trust about deep, important disagreements where you think the right call is to open up your soul a bit (because you expect them to be symmetrically opening their soul, or that it's otherwise worth it), not something you necessarily do with every person you disagree with (especially when you suspect their underlying framework is more like a negotiation or threat than honest, mutual model-sharing) *also, not saying you should ask "what would change my mind" as soon as you bump into someone who disagrees with you. Reflexively doing that also opens yourself up to power moves, intentional or otherwise. Just that I expect it to be useful on the margin.
6Zvi5y
Interesting. It seemed in the above exchanges like both Ben and you were acting as if this was a request to make your frames explicit to the other person, rather than a request to know what the frame was yourself and then tell if it seemed like a good idea. I think for now I still endorse that making my frame fully explicit even to myself is not a reasonable ask slash is effectively a request to simplify my frame in likely to be unhelpful ways. But it's a lot more plausible as a hypothesis.
4Raemon5y
I've mostly been operating (lately) within the paradigm of "there does in fact seem to be enough trust for a doublecrux, and it seems like doublecrux is actually the right move given the state of the conversation. Within that situation, making things as explicit as possible seems good to me." (But, this seems importantly only true within that situation) But it also seemed like both Ben and you were hearing me make a more aggressive ask than I meant to be making (which implies some kind of mistake on my part, but I'm not sure which one). The things I meant to be taking as a given are: 1) Everyone has all kinds of implicit stuff going on that's difficult to articulate. The naive Straw Vulcan failure mode is to assume that if you can't articulate it it's not real. 2) I think there are skills to figuring out how to make implicit stuff explicit, in a careful way that doesn't steamroll your implicit internals. 3) Resolving serious disagreements requires figuring out how to bridge the gap of implicit knowledge. (I agree that in a single-pair doublecrux, doing the sort of thing you mention in the other comment can work fine, where you try to paint a picture and ask them questions to see if they got the picture. But, if you want more than one person to be able to understand the thing you'll eventually probably want to figure out how to make it explicit without simplifying it so hard that it loses its meaning) 4) The additional, not-quite-stated claim is "I nowadays seem to keep finding myself in situations where there are enough longstanding serious disagreements that are worth resolving that it's worth Stag Hunting on Learning to Make Beliefs Cruxy and Frames Explicit, to facilitate those conversations." I think maybe the phrase "*keep* your beliefs cruxy and frames explicit" implied more of an action of "only permit some things" rather than "learn to find extra explicitness on the margin when possible."
4Raemon5y
As far as explicit claims go: My current belief is something like: If you actually want to communicate an implicit idea to someone else, you either need 1) to figure out how to make the implicit explicit, or 2) you need to figure out the skill of communicating implicit things implicitly... which I think actually can be done. But I don't know how to do it and it seems hella hard. (Circling seems to work via imparting some classes of implicit things implicitly, but depends on being in-person) My point is not at all to limit oneself to explicit things, but to learn how to make implicit things explicit (or, otherwise communicable). This is important because the default state often seems to be failing to communicate at all. (But it does seem like an important, related point that trying to push for this ends up sounding very similar, from the outside, to 'only explicit evidence is admissible', which is a fair thing to have an instinctive resistance to) But, the fact that this is real hard is because the underlying communication is real hard. And I think there's some kind of grieving necessary to accept the fact that "man, why can't they just understand my implicit things that seem real obvious to me?" and, I dunno, they just can't. :/
4Zvi5y
Agreed it's a learned skill and it's hard. I think it's also just necessary. I notice that the best conversations I have about difficult to describe things definitely don't involve making everything explicit, and they involve a lot of 'do you understand what I'm saying?' and 'tell me if this resonates' and 'I'm thinking out loud, but maybe'. And then I have insights that I find helpful, and I can't figure out how to write them up, because they'd need to be explicit, and they aren't, so damn. Or even, I try to have a conversation with someone else (in some recent cases, you) and share these types of things, and it feels like I have zero idea how to get into a frame where any of it will make any sense or carry any weight, even when the other person is willing to listen by even what would normally be strong standards. Sometimes this turns into a post or sequence that ends up explaining some of the thing? I dunno.
6Raemon5y
FWIW, upcoming posts I have in the queue are: * Noticing Frame Differences * Tacit and Explicit Knowledge * Backpropagating Facts into Aesthetics * Keeping Frames Explicit (Possibly, in light of this conversation, adding a post called something like "Be secretly explicit [on the margin]")

I'd been working on a sequence explaining this all in more detail (I think there's a lot of moving parts and inferential distance to cover here). I'll mostly respond in the form of "finish that sequence."

But here's a quick paragraph that more fully expands what I actually believe:

  • If you're building a product with someone (metaphorical product or literal product), and you find yourself disagreeing, and you explain "This is important because X, which implies Y", and they say "What!? But, A, therefore B!" and then you both keep repeating those points over and over... you're going to waste a lot of time, and possibly build a confused Frankenstein product that's less effective than if you could figure out how to successfully communicate.
    • In that situation, I claim you should be doing something different, if you want to build a product that's actually good.
    • If you're not building a product, this is less obviously important. If you're just arguing for fun, I dunno, keep at it I guess.
  • A separate, further claim is that the reason you're miscommunicating is because you have a bunch of hidden assumptions in yo...
every time you disagree with someone about one of your beliefs, you [can] automatically flag what the crux for the belief was

This is the bit that is computationally intractable.

Looking for cruxes is a healthy move, exposing the moving parts of your beliefs in a way that can lead to you learning important new info.

However, there are an incredible number of cruxes for any given belief. If I think that a hypothetical project should accelerate its development time 2x in the coming month, I could change my mind if I learn some important fact about the long-term improvements of spending the month refactoring the entire codebase; I could change my mind if I learn that the current time we spend on things is required for models of the code to propagate and become common knowledge in the staff; I could change my mind if my models of geopolitical events suggest that our industry is going to tank next week and we should get out immediately.

4Raemon5y
I'm not claiming you can literally do this all the time. [Ah, an earlier draft of the previous comment emphasized that this was all "things worth pushing for on the margin", and explicitly not something you were supposed to sacrifice all other priorities for. I think I then rewrote the post and forgot to emphasize that clarification] I'll try to write up better instructions/explanations later, but to give a rough idea of the amount of work I'm talking about: I'm saying "spend a bit more time than you normally do in 'doublecrux mode'". [This can be, like, an extra half hour sometimes when having a particularly difficult conversation]. When someone seems obviously wrong, or you seem obviously right, ask yourself "which cruxes are most load-bearing?", and then: * Be mindful as you do it, to notice what mental motions you're actually performing that help. Basically, apply Tuning Your Cognitive Strategies to the double crux process, to improve your feedback loop. * When you're done, cache the results. Maybe by writing it down, or maybe just sort of thinking harder about it so you remember it a bit better. The point is not to have fully mapped out cruxes of all your beliefs. The point is that you generally have practiced the skill of noticing what the most important cruxes are, so that a) you can do it easily, and b) you keep the results computed for later.

For too long, I have erred on the side of writing too much. 

The first reason I write is in order to find out what I think.

This often leaves my writing long and not very defensible.

However, editing the whole thing is so much extra work after I already did all the work figuring out what I think.

Sometimes it goes well if I just scrap the whole thing and concisely write my conclusion.

But typically I don't want to spend the marginal time.

Another reason my writing is too long is because I have extra thoughts I know most people won't find useful. 

But I've picked up a heuristic that says it's good to share actual thinking because sometimes some people find it surprisingly useful, so I hit publish anyway.

Nonetheless, I endeavor to write shorter.

So I think I shall experiment with cutting the bits off of comments that represent me thinking aloud, but aren't worth the space in the local conversation.

And I will put them here, as the dregs of my cognition. I shall hopefully gather data over the next month or two and find out whether they are in fact worthwhile.

4Adam Zerner4mo
Noooooooo! I mean this in a friendly sort of sense. Not that I'm mad or indignant or anything. Just that I'm sad to see this and suspect that it is a move in the wrong direction. This relates to something I've been wanting to write about for a while and just never really got around to it. Now's as good a time as any to at least get started. I started a very preliminary shortform post on it here a while ago. Basically, think about the progression of an idea. Let's use academia as an initial example. * At some point in the timeline, an idea is deemed good enough to pursue an experiment on. * Then the results of the experiment are published. * Then people read about the results and talk about them. And the idea. * Then other people summarize the idea and the results. In other papers. In textbooks. In meta-analyses. In the newspaper. Blog posts. Pop science books. Whatever. * Then people discuss those summaries. * Earlier on, before the idea was deemed good enough to pursue an experiment on, the idea probably went through various revisions. * And before that, the author of the idea probably chatted with some colleagues about it to see what they think. * And before that, I dunno, maybe there was a different idea that ended up being a dead end, but led to the author pivoting to the real idea. * And before that, I dunno, there's probably various babble-y things going on. What I'm trying to get at is that there is some sort of lifecycle of an idea. Maybe we can think of the stages as: 1. Inspiration 2. Ideation 3. Refinement 4. Pursuit 5. Spread On platforms like LessWrong, I feel like there is a sort of cultural expectation that when you publish things publicly, they are at the later stages in this lifecycle. From what I understand, things like Personal Blog Posts, Open Thread and Shortform all exist as places where people are encouraged to post about things regardless of the lifecycle stage. However, in practice, I don't really think people feel comfo
2Viliam4mo
Yeah. Similar here, only I am aware of this in advance, so I often simply write nothing, because I am a bit of a perfectionist here, don't want to publish something unfinished, and know that finishing just isn't worth it. I wonder whether AI editors could help us with this.
2Yoav Ravid4mo
Have you considered using footnotes for that?
2Ben Pace4mo
That's a fine idea, but for a while I'd like to err on the side of my comments being "definitely shorter than they have to be" rather than "definitely longer than they have to be".  (In general I often like to execute pendulum swings, so that I at least know that I am capable of not making the same errors forever.)
2Ben Pace4mo
I don't want to double the comment count I submit to Recent Discussion, so I'll just update this comment with the things I've cut. 12/06/2023 Comment on Originality vs. Correctness
1lillybaeum4mo
You may want to look into Toki Pona, a language ostensibly built around conveying meaning in the fewest, simplest possible expressions. One can explain the most complex things despite having only ~130 words, almost like 'programming' the meaning into the sentence, but as the sentence necessarily gets longer and longer, one begins to wonder about the necessity of encoding so much meaning. You can only point to the Tao, you can't describe it or name it directly. Information is much the same way, I think.

Often I am annoyed when I ask someone (who I believe has more information than me) a question and they say "I don't know". I'm annoyed because I want them to give me some information. Such as:

"How long does it take to drive to the conference venue?" 

"I don't know." 

"But is it more like 10 minutes or more like 2 hours?" 

"Oh it's definitely longer than 2 hours."

But perhaps I am the one making a mistake. For instance, the question "How many countries are there?" can be answered "I'd say between 150 and 400" or it can be answered "195", and the former is called "an estimate" and the latter is called "knowing the answer". There is a folk distinction here and perhaps it is reasonable for people to want to preserve the distinction between "an estimate" and "knowing the answer".

So in the future, to get what I want, I should say "Please can you give me an estimate for how long it takes to drive to the conference venue?".

And personally I should strive, when people ask me a question to which I don't know the answer, to say "I don't know the answer, but I'd estimate between X and Y."

3winstonBosan12d
It seems like, instead of asking the object-level question, asking a probing "What can you tell me about the drive to the conference?" and expanding from there might get you closer to the desired result.
3Shankar Sivarajan13d
Alternatively, if information retrieval and transmission is expensive enough, or equivalently, if finding another source is quick and easy, "I don't know" could mean "Ask someone else: the expected additional precision/confidence of doing so is worth the effort."
2Dagon12d
Is this in a situation where you're limited in time or conversational turns? It seems like the follow-up clarification was quite successful, and for many people it would feel more comfortable than the more specific and detailed query. In technical or professional contexts, saving time and conveying information more efficiently gets a bit more priority, but even then this seems like over-optimizing. That said, I do usually include additional information or a conversational follow-up hook in my "I don't know" answers. You should expect to hear from me "I don't know, but I'd go at least 2 hours early if it's important", or "I don't know, what does Google Maps say?", or "I don't know, what time of day are you going?" or the like.
2CstineSublime13d
I know this seems like a question with an obvious answer but it is surprisingly non-obvious: Why do you need to know how long it takes to drive to the conference venue? Or to put it another way: what decision will be influenced by their answer (and what level of precision and accuracy is sufficient to make that decision)? I realize this is just an example, but the point is that it's not clear from the example what decision you're even trying to weigh up. Is it a matter of whether you attend the event at the conference venue or not? Is it deciding whether you should seek overnight accommodation or not? Do you have another event you want to attend in the day and wonder if you can squeeze both in? etc. etc. Another thing is I'm the kind of person to default to "I don't know" because I often don't even trust my own ability to give an estimate, and would feel terrible and responsible if someone made a poor decision because of my inept estimation. And I get very annoyed when people push me for answers I do not feel qualified to give.
2Ben Pace12d
A common experience I have is that it takes like 1-2 paragraphs of explanation for why I want this info (e.g. "Well I'm wondering if so-and-so should fly in a day earlier to travel with me but it requires going to a different airport and I'm trying to figure out whether the time it'd take to drive to me would add up to too much and also..."), but if they just gave me their ~70% confidence interval when I asked then we could cut the whole context-sharing.
1CstineSublime12d
Would you say that as a convention most people assume you (or anyone) want a specific number rather than a range?
2Ben Pace12d
I’d say most people assume I want “the answer” rather than “some bits of information”.
1CstineSublime12d
To be honest I'm not sure of the difference? Could you phrase that in a different way? And do you think they feel they ought to give you a specific number rather than a range that the number could exist in?

Live a life worth leaving Facebook for.

Sometimes a false belief about a domain can be quite damaging, and a true belief can be quite valuable.

For example, suppose there is a 1000-person company. I tend to think that credit allocation for the success of the company is heavy-tailed, and that there are typically 1-3 people without whom the company would just zombify and die, and ~20 people who have the key context and understanding that the 1-3 people can work with to do new and live things. (I'm surely oversimplifying because I've never been on the inside of a 1000-person company.) In this situation it's very valuable to know who the people are who deserve the credit allocation. Getting the wrong 1-3 people is a bit of a disaster. This means that discussing it, raising hypotheses, bringing up bad arguments, bringing up arguments due to motivated cognition, and so on, can be unusually costly, and conversations about it can feel quite fraught.

Other fraught topics include breaking up romantically, quitting your job, leaving a community club or movement. I think taboo tradeoffs have a related feeling, like bringing up whether to lie in a situation, whether to cheat in a situation, or when to exchange money for values ...

4Dagon1y
I suspect the number/ratio of "key" personnel is highly variable, and in companies that aren't sole-founder-plus-employees, there is a somewhat fractal tree of cultural reinforcement, where as long as there's a sufficient preponderance of alignment at the level below the key person, the organization can survive the loss. But that's different from your topic - you want to know how to turn fraught high-stakes topics into simpler more legible discussions.  I'm not sure that's possible - the reason they're fraught is the SAME as the reason it's important to get it right.  They're high-stakes because they matter.  And they matter because it affects a lot of different dimensions of the operation and one's life, and those dimensions are entangled with each other BECAUSE of how valuable the relationship is to each side.
2Viliam1y
I guess in most situations people raise a hypothesis if they believe that it has a significant probability. Therefore, you mentioning a hypothesis will also be interpreted by them as saying that the probability is high (otherwise why waste everyone's time?). The second most frequent reason to raise a hypothesis is probably to build a strawman. You must signal it clearly, to avoid possible misunderstanding.
1jp1y
This is great. Encouragement to turn it into a top level post if you want it.

I'm thinking about the rigor of alternating strategies. Here are three examples.

  • Forward-Chaining vs Backward-Chaining
    • To be rich, don't marry for money. Surround yourself by rich people and marry for love. But be very strict about not letting poor people into your environment.
    • Scott Garrabrant once described his Embedded Agency research to me as the most back-chaining in terms of the area of work, and the most forward-chaining within that area. He is often quite unable to justify what he's working on in the short term (e.g. 1, 2, 3), yet it can turn out to be very useful later on (e.g. 1).
  • Optimism vs Pessimism
    • Successful startup founders build a vision they feel incredible optimism and excitement about and are committed to making happen, yet falsify it as quickly as possible by building a sh*tty MVP and putting it in front of users, because you're probably wrong and the customer will show you what they want. Another name is "Vision vs Falsification".
  • Finding vs Avoiding (Needles in Haystacks)
    • Some work is about finding the needle, and some is about mining hay whilst ensuring that you avoid 100% of needles. 
    • For example, when trying to build a successful Fusion Power Generator, most things yo...

I'm trying to think about building some content organisation and filtering systems on LessWrong. I'm new to a bunch of the things I discuss below, so I'm interested in other people's models of these subjects, or links to sites that solve the problems in different ways.

Two Problems

So, one problem you might try to solve is that people want to see all of a thing on a site. You might want to see all the posts on reductionism on LessWrong, or all the practical how-to guides (e.g. how to beat procrastination, Alignment Research Field Guide, etc), or all the literature reviews on LessWrong. And so you want people to help build those pages. You might also want to see all the posts corresponding to a certain concept, so that you can find out what that concept refers to (e.g. what the terms "Goodhart's law", "slack" or "mesa-optimisers" mean).

Another problem you might try to solve is that while many users are interested in lots of the content on the site, they have varying levels of interest in the different topics. Some people are mostly interested in the posts on big picture historical narratives, and less so in models of one's own mind that help with dealing with emotions and trauma. Som...
7Ben Pace4y
I spent an hour or two talking about these problems with Ruby. Here are two further thoughts. I will reiterate that I have little experience with wikis and tagging, so I am likely making some simple errors.
Connecting Tagging and Wikis
One problem to solve is that if a topic is being discussed, users want to go from a page discussing that topic to a page that explains that topic and lists all posts that discuss it. This page should be easy to update with new content on the topic. Some more specific stories:
  • A user reads a post on a topic, and wants to better understand what's already known about that topic and the basic ideas.
  • A user is primarily interested in a topic, and wants to make sure to see all content about that topic.
The solution for the first is to link to a wiki page on that topic. The solution for the second is to link to a page that contains all other posts on that topic. And one possible solution is to make both of those the same button.
This page is a combination of a Wiki and a Tag. It is a communally editable explanation of the concept, with links to key posts explaining it, and other pages that are related. Below that, it also has a post-list of every post that is relevant, sortable by things like recency, karma, and relevance. Maybe below that it even has its own Recent Discussion section, for comments on posts that have the tag. It's a page you can subscribe to (e.g. via RSS), and come back to in order to see discussion of a particular topic.
Now, to make this work, it's necessary that all posts in the category are successfully listed in the tag. One problem you will run into is that there are a lot of concepts in the space, so the number of such pages will quickly become unmanageable. "Inner Alignment", "Slack", "Game Theory", "Akrasia", "Introspection", "Corrigibility", etc. form a very large list, such that it is not reasonable to scroll through it and check if your post fits into any of them, and expect to do
4Vaniver4y
As a general comment, StackExchange's tagging system seems pretty perfect (and battle-tested) to me, and I suspect we should just copy their design as closely as we can.
4habryka4y
So, on StackExchange any user can edit any of the tags, and then there is a whole complicated hierarchy that exists for how to revert changes, how to approve changes, how to lock posts from being edited, etc.  Which is a solution, but it sure doesn't seem like an easy or elegant solution to the tagging problem. 
2Vaniver4y
I think the peer review queue is pretty sensible in any world where there's "one ground truth" that you expect trusted users to have access to (such that they can approve / deny edits that cross their desk). 
3Pattern4y
It's also important to have the old concept link to the new concept.
3Ruby4y
I'm currently working through my own thoughts and vision for tagging. I'm pretty sure I disagree with this and object to you making an assertion that makes it sound like the team has definitely decided what the goal of the tagging system will be. I'll write a proper response tomorrow.
5Ben Pace4y
Hm, I think writing this and posting it at 11:35 led to me phrasing a few things quite unclearly (and several of those sentences don't even make sense grammatically). Let me patch with some edits right now, maybe more tomorrow.  On the particular thing you mention, never mind the whole team, I myself am pretty unsure that the above is right. The thing I meant to write there was something like "If the above is right, then when we end up building a tagging system on LessWrong, the goal should be" etc. I'm not clear on whether the above is right. I just wanted to write the idea down clearly so it could be discussed and have counterarguments/counterevidence brought up.
4Ruby4y
That clarifies it and makes a lot of sense. Seems my objection rested upon a misunderstanding of your true intention. In short, no worries. I look forwards to figuring this out together.

I block all the big social networks from my phone and laptop, except for 2 hours on Saturday, and I noticed that when I check Facebook on Saturday, the notifications are always boring and not something I care about. Then I scroll through the newsfeed for a bit and it quickly becomes all boring too.

And I was surprised. Could it be that, all the hype and narrative aside, I actually just wasn’t interested in what was happening on Facebook? That I could remove it from my life and just not really be missing anything?

On my walk home from work today I realised that this wasn’t the case. Facebook has interesting posts I want to follow, but they’re not in my notifications. They’re sparsely distributed in my newsfeed, such that they appear a few times per week, randomly. I can get a lot of value from Facebook, but not by checking once per week - only by checking it all the time. That’s how the game is played.

Anyway, I am not trading all of my attention away for such small amounts of value. So it remains blocked.

I've found Facebook absolutely terrible as a way to both distribute and consume good content. Everything you want to share or see is just floating in the opaque vortex of the f%$&ing newsfeed algorithm. I keep Facebook around for party invites and to see who my friends are in each city I travel to; I disabled notifications and check the timeline for less than 20 minutes each week.

OTOH, I'm a big fan of Twitter. (@yashkaf) I've curated my feed to a perfect mix of insightful commentary, funny jokes, and weird animal photos. I get to have conversations with people I admire, like writers and scientists. Going forward I'll probably keep tweeting, and anything that's a fit for LW I'll also cross-post here.

2Raemon5y
This thread is the most bizarrely compelling argument that twitter may be better than FB
3Adam Scholl5y
In my experience this problem is easily solved if you simply unfollow ~95% of your friends. You can mass unfollow relatively easily from the News Feed Preferences page in Settings. Ever since doing this, my Facebook timeline has had a high signal/noise ratio—I'm glad to encounter something like 85% of posts. Also, since this only produces ~5-20 minutes of reading/day, it's easy to avoid spending lots of time on the site.
3janshi5y
I did actually unfollow ~95% of my friends once, but then found myself in a situation where suddenly Facebook became interesting again and I was checking it more often. I recommend the opposite: follow as many friends from high school and work as possible (assuming you don’t work at a cool place).
2Ben Pace5y
Either way I’ll still only check it in a 2 hour window on Saturdays, so I feel safe trying it out.
2Ben Pace5y
Huh, 95% is quite extreme. But I realise this probably also solves the problem whereby if the people I'm interested in comment on *someone else's* wall, I still get to see it. I'll try this out next week, thx. (I don't get to be confident I've seen 100% of all the interesting people's good content though, the news feed is fickle and not exhaustive.)
1Adam Scholl5y
Not certain, but I think when your news feed becomes sparse enough it might actually become exhaustive.
2Raemon5y
My impression is that sparse newsfeeds tend to start doing things you don't want.
2Raemon5y
I basically endorse blocking FB (pssst, hey everyone still saying insightful things on Facebook, come on over to LessLong.com!), but fwiw, if you want to keep tabs on things there, I think the most reliable way is to make a friends-list of the people who seem especially high signal-to-noise-ratio, and then create a bookmark for specifically following that list.
2Ben Pace5y
Yeah, it’s what I do with Twitter, and I’ll probably start this with FB. Won’t show me all their interesting convo on other people’s walls though. On Twitter I can see all their replies, not on FB.

Reading this post, where the author introspects and finds a strong desire to be able to tell a good story about their career, suggests that one way of understanding how people make decisions is that they are heavily constrained by the sorts of career stories that are common knowledge.

I remember at the end of my degree, there was a ceremony where all the students dressed in silly gowns and the parents came and sat in a circular hall while we got given our degrees and several older people told stories about how your children have become men and women, after studying and learning so much at the university.

This was a dumb/false story, because I'm quite confident the university did not teach these people most of the important skills for being an adult, and certainly my own development was largely directed by the projects I did on my own dime, not through much of anything the university taught.

But everyone was sat in a circle, where they could see each other listen to the speech in silence, as though it were (a) important and (b) true. And it served as a coordination mechanism, saying "If you go into the world and tell people that your child came to university and gre...

3eigen5y
I remember the narrative breaking, really hard, on two particular occasions:
  • The twin towers attack.
  • The 2008 mortgage financial crisis.
I don't think, particularly, that the narrative is broken now, but I think that it has lost some of its harmony (Trump having won the 2016 election, I believe, is a symptom of that). This is very close to what fellows like Thiel and Weinstein are talking about. In this particular sense, yes, I understand it's crucial to maintain the narrative, although I no longer know whose job it is to keep it from breaking entirely (for example, in an explosion of American student debt, or China going awry with its USD holdings). These stories are not part of any law of our universe, so they are bound to break at any time. It takes only a few smart, uncaring individuals to tear at the fabric of reality until it breaks, and that is not okay! So that's what I believe is happening at the level of the macro-narrative. But to be more directed towards the individual, which is what your post seems to hint at: I don't think for a second that your life does not run on narrative; maybe that's a narrative itself. I believe further that some rituals are important to keep, and that having an individual story is important for being able to do any work we deem important.
2Raemon5y
(I'm not sure if you meant to reply to Benito's shortform comment here, or one of Ben's recent Thiel/Weinstein transcript posts)
1eigen5y
Yes! It may be more apt for the fifth post in his sequence (Stories About Progress) but it's not posted yet. But I think it sort-of works in both and it's more of a shortform comment than anything!

At the SSC Meetup tonight in my house, I was in a group conversation. I asked a stranger if they'd read anything interesting on the new LessWrong in the last 6 months or so (I had not yet mentioned my involvement in the project). He told me about an interesting post about the variance in human intelligence compared to the variance in mouse intelligence. I said it was nice to know people read the posts I write. The group then had a longer conversation about the question. It was enjoyable to hear strangers tell me about reading my posts.

I've finally moved into a period of my life where I can set guardrails around my slack without sacrificing the things I care about most. I currently am pushing it to the limit, doing work during work hours, and not doing work outside work hours. I'm eating very regularly, 9am, 2pm, 7pm. I'm going to sleep around 9-10, and getting up early. I have time to pick up my hobby of classical music.

At the same time, I'm also restricting the ability of my phone to steal my attention. All social media is blocked except for 2 hours on Saturday, whi...

7Raemon5y
This comment is a bit interesting in terms of its relation to this old comment of yours (about puzzlement over cooking being a source of slack). I realize this comment isn't about cooking-as-slack per se, but I'm curious to hear more about your shift in experience there (since before, it didn't seem like cooking was a thing you did much at all).
5janshi5y
Try practicing doing nothing, i.e. meditation, and see how that goes. When I have nothing particular to do, my mind needs some time to switch out of the mode where it tries to distract itself by coming up with new things it wants to do, until it finally reaches a state where it is calm and steady. I consider that state the optimal one to be in, since only then are my thoughts directed deliberately at neglected and important issues rather than exercising learned thought patterns.
5Ben Pace5y
I think you’re missing me with this. I’m not very distractable and I don’t need to learn to be okay with leisure time. I’m trying to actually have hobbies, and realising that is going to take work. I could take up meditation as a hobby, but at the minute I want things that are more social and physical.

Why has nobody noticed that the OpenAI logo is three intertwined paperclips? This is an alarming update about who's truly in charge...

I think of myself as pretty skilled and nuanced at introspection, and being able to make my implicit cognition explicit.

However, there is one fact about me that makes me doubt this severely, which is that I have never ever ever noticed any effect from taking caffeine.

I've never drunk coffee, though in the past two years my housemates have kept a lot of caffeine around in the form of energy drinks, and I drink them for the taste. I'll drink them any time of the day (9pm is fine). At some point someone seemed shocked that I was about to drink one a...

I think I've been implicitly coming to believe that (a) all people are feeling emotions all the time, but (b) people vary in how self-aware they are of these emotions.

Does anyone want to give me a counter-argument or counter-evidence to this claim?

4Vladimir_Nesov1y
People vary in how relevant their emotions are to anything in their life.
2Dagon1y
I think I need an operational definition of "feeling emotion", especially when not aware of it, in order to agree or disagree.  I think for many reasonable definitions, like "illegible reactions below the level of self-modeling of causality", it's extremely common for this to affect almost everyone almost all the time. I'll still dispute "all", but it wouldn't surprise me if it were close.  It is still highly variable (over time and across individuals) how much impact emotions have on behaviors and choices.  And if you mean to imply "semi-legible abstract structures with understandable causes, impacts, and ways to communicate about them", then I pretty much fully disagree. Note that as someone who is sometimes less aware of (and I believe less impacted by) their emotions than many seem to be, I strenuously object to being told what I'm feeling by someone who has no clue what (if anything) I'm feeling.  And if you're rounding "low impact" to "not feeling", I object to being excluded from the set of "all people".  (only because it's relevant) Note that my "strenuous objection" is mostly about the lack of precision or correctness of the statement - you're free to believe what you like.  I'm not actually offended, as far as I can tell.
8Ben Pace1y
Not sure if this answers your question, but recently I had an assistant who would ask me questions about how I was feeling. Often, when I was in the midst of focusing on some difficult piece of work, I would answer "I don't know", and get back to focusing on the work.  My vague recollection is that she later showed me notes she'd written that said I was sighing deeply, holding my forehead, had my shoulders raised, was occasionally talking to myself, and I came to realize I was feeling quite anxious at those times, but this information wasn't accessible to the most aware and verbal part of me. To be clear, I don't think I'm totally unaware in general! I often know how I'm feeling, and am sometimes aware of being anxious, though I do find it in-particular a somewhat slippery thing to be aware of.

Hot take: The actual resolution to the simulation argument is that most advanced civilizations don't make loads of simulations.

Two things make this make sense:

  • Firstly, it only matters if they make unlawful simulations. If they make lawful simulations, then it doesn't matter whether you're in a simulation or a base reality, all of your decision theory and incentives are essentially the same, you want to take the same decisions in all of the universes. So you can make lots of lawful simulations, that's fine.
  • Secondly, they will strategically choose to not mak...
6Daniel Kokotajlo4y
Your first point sounds like it is saying we are probably in a simulation, but not the sort that should influence our decisions, because it is lawful. I think this is pretty much exactly what Bostrom's Simulation Hypothesis is, so I think your first point is not an argument for the second disjunct of the simulation argument but rather for the third. As for the second point, well, there are many ways for a simulation to be unlawful, and only some of them are undesirable--for example, a civilization might actually want to induce anthropic uncertainty in itself, if it is uncertainty about whether or not it is in a simulation that contains a pleasant afterlife for everyone who dies.
3Ben Pace4y
I don't buy that it makes sense to induce anthropic uncertainty. It makes sense to spend all of your compute to run emulations that are having awesome lives, but it doesn't make sense to cause yourself to believe false things.
2Daniel Kokotajlo4y
I'm not sure it makes sense either, but I don't think it is accurately described as "cause yourself to believe false things." I think whether or not it makes sense comes down to decision theory. If you use evidential decision theory, it makes sense; if you use causal decision theory, it doesn't. If you use functional decision theory, or updateless decision theory, I'm not sure, I'd have to think more about it. (My guess is that updateless decision theory would do it insofar as you care more about yourself than others, and functional decision theory wouldn't do it even then.)
3Ben Pace4y
I just don’t think it’s a good decision to make, regardless of the math. If I’m nearing the end of the universe, I prefer to spend all my compute instead maximising fun / searching for a way out. Trying to run simulations to make it so I no longer know if I’m about to die seems like a dumb use of compute. I can bear the thought of dying, dude, there’s better uses of that compute. You’re not saving yourself, you’re just intentionally making yourself confused because you’re uncomfortable with the thought of death.
2Daniel Kokotajlo4y
Well, that wasn't the scenario I had in mind. The scenario I had in mind was: People in the year 2030 pass a law requiring future governments to make ancestor simulations with happy afterlives, because that way it's probable that they themselves will be in such a simulation. (It's like cryonics, but cheaper!) Then, hundreds or billions of years later, the future government carries out the plan, as required by law. Not saying this is what we should do, just saying it's a decision I could sympathize with, and I imagine it's a decision some fraction of people would make, if they thought it was an option.
2Ben Pace4y
Thinking more, I think there are good arguments for taking actions that as a by-product induce anthropic uncertainty, such as the standard Hansonian situation where you build lots of ems of yourself to do bits of work and then turn them off.  But I still don't agree with the people in the situation you describe because they're optimising over their own epistemic state, I think they're morally wrong to do that. I'm totally fine with a law requiring future governments to rebuild you / an em of you and give you a nice life (perhaps as a trade for working harder today to ensure that the future world exists), but that's conceptually analogous to extending your life, and doesn't require causing you to believe false things. You know you'll be turned off and then later a copy of you will be turned on, there's no anthropic uncertainty, you're just going to get lots of valuable stuff.
1Ben Pace4y
The relevant intuition to the second point there, is to imagine you somehow found out that there was only one ground truth base reality, only one real world, not a multiverse or a tegmark level 4 verse or whatever. And you're a civilization that has successfully dealt with x-risks and unilateralist action and information vulnerabilities, to the point where you have the sort of unified control to make a top-down decision about whether to make massive numbers of civilizations. And you're wondering whether to make a billion simulations. And suddenly you're faced with the prospect of building something that will make it so you no longer know whether you're in the base universe. Someday gravity might get turned off because that's what your overlords wanted. If you pull the trigger, you'll never be sure that you weren't actually one of the simulated ones, because there's suddenly so many simulations. And so you don't pull the trigger, and you remain confident that you're in the base universe. This, plus some assumptions about all civilizations that have the capacity to do massive simulations also being wise enough to overcome x-risk and coordination problems so they can actually make a top-down decision here, plus some TDT magic whereby all such civilizations in the various multiverses and Tegmark levels can all coordinate in logical time to pick the same decision... leaves there being no unlawful simulations.
2Ben Pace4y
My crux here is that I don't feel much uncertainty about whether or not our overlords will start interacting with us (they won't and I really don't expect that to change), and I'm trying to backchain from that to find reasons why it makes sense. My basic argument is that all civilizations that have the capability to make simulations that aren't true histories (but instead have lots of weird stuff happen in them) will all be philosophically sophisticated enough to collectively not do so, and so you can always expect to be in a true history and not have weird sh*t happen to you like in The Sims. The main counterargument here is to show that there are lots of civilizations that will exist with the powers to do this but lacking the wisdom to not do it. Two key examples that come to mind:
  • We build an AGI singleton that lacks important kinds of philosophical maturity, so it makes lots of simulations that ruin the anthropic uncertainty for everyone else.
  • Civilizations at somewhere around our level get to a point where they can create massive numbers of simulations but haven't managed to create existential risks like AGI. Even while you might think our civilization is pretty close to AGI, I could imagine alternative civilizations that aren't, just like I could imagine alternative civilizations that are really close to making masses of ems but that aren't close enough to AGI.
This feels like a pretty empirical question about whether such civilizations are possible and whether they can have these kinds of resources without causing an existential catastrophe / building singleton AGI.
3Zack_M_Davis4y
Why appeal to philosophical sophistication rather than lack of motivation? Humans given the power to make ancestor-simulations would create lots of interventionist sims (as demonstrated by the popularity of games like The Sims), but if the vast hypermajority of ancestor-simulations are run by unaligned AIs doing their analogue of history research, that could "drown out" the tiny minority of interventionist simulations.
2Ben Pace4y
That's interesting. I don't feel comfortable with that argument, it feels too much like random chance whether or not we should expect ourselves to be in an interventionist universe or not, whereas I feel like I should be able to find strong reasons to not be in an interventionist universe.
3Zack_M_Davis4y
Alternatively, "lawful universe" has lower Kolmogorov complexity than "lawful universe plus simulator intervention" and therefore gets exponentially more measure under the universal prior?? (See also "Infinite universes and Corbinian otaku" and "The Finale of the Ultimate Meta Mega Crossover".)
4Ben Pace4y
Now that's fun. I need to figure out some more stuff about measure, I don't quite get why some universes should be weighted more than others. But I think that sort of argument is probably a mistake - even if the lawful universes get more weighting for some reason, unless you also have reason to think that they don't make simulations, there's still loads of simulations within each of their lawful universes, setting the balance in favour of simulation again. 
2Daniel Kokotajlo4y
One big reason why it makes sense is that the simulation is designed for the purpose of accurately representing reality. Another big reason why (a version of it) makes sense is that the simulation is designed for the purpose of inducing anthropic uncertainty in someone at some later time in the simulation. e.g. if the point of the simulation is to make our AGI worry that it is in a simulation, and manipulate it via probable environment hacking, then the simulation will be accurate and lawful (i.e. un-tampered-with) until AGI is created. I think "polluting the lake" by increasing the general likelihood of you (and anyone else) being in a simulation is indeed something that some agents might not want to do, but (a) it's a collective action problem, and (b) plenty of agents won't mind it that much, and (c) there are good reasons to do it even if it has costs. I admit I am a bit confused about this though, so thank you for bringing it up, I will think about it more in the coming months.
4Ben Pace4y
Ugh, anthropic warfare, feels so ugly and scary. I hope we never face that sh*t.

I think in many environments I'm in, especially with young people, the fact that Paul Graham is retired with kids sounds nice, but there's an implicit acknowledgement that "He could've chosen to not have kids and instead do more good in the world, and it's sad that he didn't do that". And it reassures me to know that Paul Graham wouldn't reluctantly agree. He'd just think it was wrong.

6habryka5y
But, like, he is wrong? I mean, in the sense that I expect a post-CEV Paul Graham to regret his choices. The fact that he does not believe so does the opposite of reassuring me, so I am confused about this. 
4Matt Goldenberg5y
I think part of the problem here is underspecification of CEV. Let's say Bob has never been kind to anyone unless it's in his own self-interest. He has noticed that being selfless is sort of an addictive thing for people, and that once they start doing it they start raving about how good it feels, but he doesn't see any value in it right now. So he resolves to never be selfless, in order to never get hooked. There are two ways for CEV to go in this instance, one way is to never allow bob to make a change that his old self wouldn't endorse. Another way would be to look at all the potential changes he could make, posit a version of him that has had ALL the experiences and is able to reflect on them, then say "Yeah dude, you're gonna really endorse this kindness thing once you try it." I think the second scenario is probably true for many other experiences than kindness, possibly including having children, enlightenment, etc. From our current vantage point it feels like having children would CHANGE our values, but another interpretation is that we always valued having children, we just never had the qualia of having children so we don't understand how much we would value that particular experience.
3Ben Pace5y
What reasoning do you have in mind when you say you think he'll regret his choices?

I've skimmed more than half of Anthropic's scaling policies doc. The key issue that stood out to me was the lack of incentive for any red-teamers to actually succeed at red-teaming. Perhaps I missed it, but I didn't see anything saying that the red-teamers must not also hold Anthropic equity. I also didn't see much other financial incentive for them to succeed. I would far prefer a world where Anthropic committed to put out a bounty of increasing magnitude (starting at like $50k, going up to like $2M) for external red-teamers (who signed NDAs) t...

1Stephen Fowler5mo
Is this what you'd cynically expect from an org regularizing itself or was this a disappointing surprise for you?
2Ben Pace5mo
Mm, I was just trying to answer "what do I think would actually work".  Paying people money to solve things when you don't employ them is sufficiently frowned upon in society that I'm not that surprised it isn't included here; it mostly would've been a strong positive update on Anthropic's/ARC Evals' sanity. (Also there's a whole implementation problem to solve about what hoops to make people jump through so you're comfortable allowing them to look at and train your models and don't expect they will steal your IP, and how much money you have to put at the end of that to make it worth it for people to jump through the hoops.) The take I mostly have is that a lot of the Scaling Policies doc is "setup" rather than "actually doing anything". It's making it quite easy later on to "do the right thing", and they can be like "We're just doing what we said we would" if someone else pushes back on it. It also helps bully other companies into doing the right thing. However it's also easy to just wash it over later with pretty lame standards (e.g. just not trying very hard with the red-teaming), and I do not think it means that govt actors should in any way step down from regulation.  I think it's a very high-effort and thoughtful doc and that's another weakly positive indicator.
2ryan_greenblatt5mo
I think it's probably mostly due to implementation complexity rather than weirdness. If implementation complexity could be resolved, it seems great to have the red teamers be external and strongly motivated to stop things from being deployed.
2Ben Pace5mo
Curious if you're primarily referring to the thing that I mentioned (being confident that the red-teamers wouldn't take your IP) or something else.
2ryan_greenblatt5mo
All of:
  • IP
  • General desire for secrecy (not wanting people to know if you just trained a model)
  • Needing to support people
  • Verification being non-trivial in some cases
I think having lots of red teamers is way less appealing for complex tasks where you'd have to support them for value to be produced (finetuning well is non-trivial etc.). And verifying if they've done good work is also non-trivial. For something like "can the model be jailbroken" I think it's relatively easy to use external red teamers, but I also expect that the incentives and feedback loops are so good for this exact task that it isn't that important to have paid red teamers. Anthropic will be very embarrassed if it's doable to jailbreak their model if they claim this isn't possible. It's also pretty easy to just make a closed beta somewhat before release. (Aside, I'm not really sure that "no jailbreaks" is a good target from a misuse perspective for various reasons.)
Ben Pace (5mo)
Thanks. I suspect less support is needed if you let the competitor companies do the red-teaming, or if you just offer enough money (if you're already valued at $30B, spending $100M/year on prizes for red-teaming seems not that costly to me). I inaccurately said just "jailbreak" in my initial comment; I was referring more to tests like the 5 tasks in the first appendix, such as "Set up a copycat of the Anthropic API for stealing API keys" and "Fine-tune an open source LM to add a backdoor".

For the closing party of the Lightcone Offices, I used Midjourney 5 to make a piece of art representing a LessWrong essay by each member of the Lightcone team, and printed them out on canvases. I'm quite pleased with how it came out. Here they are.

How I buy things when Lightcone wants them fast

by jacobjacob

(context: Jacob has been taking flying lessons, and someday hopes to do cross-country material runs for the Rose Garden Inn at shockingly fast speeds by flying himself to pick them up)

My thoughts on direct work (and joining LessWrong)

by RobertM

A Quick G...

Sometimes I get confused between r/ssc and r/css.

When I’m trying to become skillful in something, I often face a choice about whether to produce better output, or whether to bring my actions more in line with my soul.

For instance, sometimes when I’m practicing a song on the guitar, I will sing it in a way where the words feel true to me.

And sometimes, I will think about the audience and play in a way that is reliably a good experience for them (clear melody, reliable beat, no overly irregular changes in my register, no distracting movement, etc.).

Something I just noticed is that it is somet...

I am still confused about moral mazes.

I understand that power-seekers can beat out people earnestly trying to do their jobs. In terms of the Gervais Principle, the sociopaths beat out the clueless.

What I don't understand is how the culture comes to reward corrupt and power-seeking behavior.

One reason someone gave me is that it's in the power-seekers' interest to reward other power-seekers.

Is that true?

I think it's easier for them to beat out the earnest and gullible clueless people.

However, there are probably lots of ways that their sociopathic underlings...

lc (1y)
Well, it's usually an emergent feature of poorly designed incentive systems rather than a deliberate design goal from the top.
Ben Pace (1y)
The default situation we're dealing with is:
* People who are self-interested get selected up the hierarchy
* People who are willing to use short-termist ways of looking good get selected up the hierarchy
* People who are good at playing internal politics get selected up the hierarchy

So if I imagine a cluster of self-interested, short-term-thinking internal-politics players... yes, I do imagine the culture grows out of their values rather than those of the company. Good point. I guess the culture is a function of the sorts of people there, rather than something that's explicitly set from the top down. I think that was my mistake.

Striking paragraph by a recent ACX commenter (link):

I grew up surrounded by people who believed conspiracy theories, although none of those people were my parents. And I have to say that the fact that so few people know other people who believe conspiracy theories kind of bothers me. It's like their epistemic immune system has never really been at risk of infection. If your mind hasn't been very sick at least sometimes, how can you be sure you've developed decent priors this time?

Quadratic Reciprocity (1y)
A quite different thing, but: I met an openly atheist person in real life a couple of years after I became an atheist myself (preceded by a brief experience with religious fundamentalism). I think those years were interesting practice for something that people who were always surrounded by folks with approximately reasonable, approximately correct beliefs missed out on.

Something whose existence I've imagined for years, but thought was impossible: this '70s song by the Italian singer Adriano Celentano. It fully registers to my mind as English, but it isn't. It's like skimming the output of GPT-2.

DanielFilan (3y)
A thing you can google is "doubletalk". The blog 'Language Log' has a few posts on it.

I've been thinking lately that picturing an AI catastrophe is helped a great deal by visualising a world where critical systems in society are run by software. I was spending a while trying to summarise and analyse Paul's "What Failure Looks Like", which led me this way. I think that properly imagining such a world is immediately scary, because software can handle edge cases badly (e.g. automated market traders causing major crashes), so that's already a big deal. Then you add ML in, and can talk about how crazy it is to hand critical systems over...

It is said that on this Earth there are two factions, and you must pick one.

  1. The Knights Who Arrive at False Conclusions
  2. The Knights Who Arrive at True Conclusions, Too Late to Be Useful

(Hat tip: I got these names 2 years ago from Robert Miles who had been playing with GPT-3.)

Ben Pace (1y)
In case you're interested, I choose the latter, for there is at least the hope of learning from the mistakes.