All of orthonormal's Comments + Replies

Elizabeth has invested a lot of work already, and has explicitly requested that people put in some amount of work when trying to argue against her cruxes (including actually reading her cruxes, and supporting one's points with studies whose methodology one has critically checked).

The citation on that sentence is the same as the first paragraph in this post about the website; Elizabeth is aware of that study and did not find it convincing.

In this and your comments below, you recapitulate points Elizabeth made pretty exactly, so it looks like you didn't need to read it after all!

My answer is "work on applications of existing AI, not the frontier". Advancing the frontier is the dangerous part, not using the state-of-the-art to make products.

But also, don't do frontend or infra for a company that's advancing capabilities.

I also had a bunch of thoughts of ‘oh, well, that’s easy, obviously you would just [OH MY LORD IS THIS CENSORED]’

I applaud your Virtue of Silence, and I'm also uncomfortable with the simplicity of some of the ones I'm sitting on.

Thanks for the info about sockpuppeting, will edit my first comment accordingly.

Re: Glassdoor, the most devastating reviews were indeed after 2017, but it's still the case that nobody rated the CEO above average among the ~30 people who worked in the Spartz era.

Thanks for updating! LessWrong at its best :)

I went through and added up all of the reviews from when Emerson was in charge and the org averaged a 3.9 rating. You can check my math if you’d like (5+3+5+4+1+4+5+5+5+5+5+5+5+1+5+5+3+5+5+5+3+1+2+4+5+3+1)/27
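The arithmetic above is easy to check mechanically. A minimal sketch, using the 27 ratings exactly as listed in the comment (the variable names are illustrative, not from the original):

```python
# Glassdoor ratings exactly as listed in the comment above.
ratings = [5, 3, 5, 4, 1, 4, 5, 5, 5, 5, 5, 5, 5, 1,
           5, 5, 3, 5, 5, 5, 3, 1, 2, 4, 5, 3, 1]

count = len(ratings)       # number of reviews
total = sum(ratings)       # sum of all star ratings
average = total / count    # mean rating

print(count, total, round(average, 1))  # prints: 27 105 3.9
```

The list does contain 27 entries, and the mean rounds to the 3.9 stated above.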

For reference, Meta has a 4 star rating on Glassdoor and has won one of their prizes for Best Place to Work for 12 years straight. (2022 (#47), 2021 (#11), 2020 (#23), 2019 (#7), 2018 (#1), 2017 (#2), 2016 (#5), 2015 (#13), 2014 (#5), 2013 (#1), 2... (read more)

This is a good idea; unfortunately, based on discussions on the EA Forum, Nonlinear is not an organization I would trust to handle it. (Note, as external evidence, that the Glassdoor reviews of Emerson's previous company frequently mention a toxic upper management culture of exactly the sort that the commenter alleges at Nonlinear, and have a 0% rating of him as CEO.)

[EDITED TO ADD: The second comment quotes reviews written after the Spartz era (although I'm sure many of them were present during it), which is misleading; moreover, the second commenter was ... (read more)

Hi, thanks for saying you liked the idea, and also appreciate the chance to clear up some things here. As a reminder, we’re not making funding decisions. We’re just helping funders and applicants find each other. 

Some updates on that thread you might not have seen: the EA Forum moderators investigated and banned two users for creating ~8 fake sockpuppet accounts. This has possibly led to information cascades about things “lots of people are saying.”

Another thing you might not be aware of: the Glassdoor CEO rating of 0% was actually not Emers... (read more)

I bet I know your survey answer on one of Aella's questions.

Looking for "elbows" in a noisy time series with relatively few points is a pretty easy way to get spurious results. If the 1960-62 obesity number was overestimated by 2 points and/or the 1976-1980 number was underestimated by 2 points, it wouldn't look like 1976-1980 was a special transition at all.

(And clearly errors of that magnitude happen, unless you think there's a deep reason why obesity rates were nonmonotonic from 2005-06 to 2011-12.)
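To make the point about spurious elbows concrete, here is a small sketch with made-up numbers (these are not the actual obesity survey figures). It assumes a perfectly linear underlying trend, then applies a 2-point overestimate to one survey wave and a 2-point underestimate to a later one. The helper names `slopes` and `elbow_index` are hypothetical.

```python
# Hypothetical, perfectly linear "true" series: +2 points per survey wave.
# Illustrative values only, not real obesity data.
true_rates = [10, 12, 14, 16, 18, 20]

# Assumed measurement error: first wave overestimated by 2,
# third wave underestimated by 2.
errors = [2, 0, -2, 0, 0, 0]
observed = [t + e for t, e in zip(true_rates, errors)]

def slopes(series):
    """Change between consecutive survey waves."""
    return [b - a for a, b in zip(series, series[1:])]

def elbow_index(series):
    """Index of the wave at which the slope increases the most."""
    s = slopes(series)
    jumps = [later - earlier for earlier, later in zip(s, s[1:])]
    return jumps.index(max(jumps)) + 1

print(slopes(true_rates))     # [2, 2, 2, 2, 2]: constant slope, no elbow
print(slopes(observed))       # [0, 0, 4, 2, 2]: flat, then a sudden rise
print(elbow_index(observed))  # 2: an apparent transition at the third wave
```

Two errors of 2 points each are enough to turn a straight line into something that looks like a transition, which is why an elbow in a short, noisy series is weak evidence on its own.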

That's true, I just think that seeing an elbow is more compatible with there being an actual elbow than there not being an elbow. It's not definitive, just interesting and potentially worth further study.

[EDIT: fallenpegasus points out that there's a low bar to entry to this corner of TIME's website. I have to say I should have been confused that even now they let Eliezer write in his own idiom.]

The Eliezer of 2010 had no shot of being directly published (instead of featured in an interview that at best paints him as a curiosity) in TIME of 2010. I'm not sure about 2020.

I wonder at what point the threshold of "admitting it's at least okay to discuss Eliezer's viewpoint at face value" was crossed for the editors of TIME. I fear the answer is "last month".

Public attention is rare and safety measures are even more rare unless there's real-world damage. This is a known pattern in engineering, product design and project planning, so I fear there will be little public attention and even less legislation until someone gets hurt by AI. That could take the form of a hot-coffee-type incident or it could be a Chernobyl-type incident. The threshold won't be discussing Eliezer's point of view (we've been doing that for a long time) but losing sleep over Eliezer's point of view. I appreciate Yudkowsky's use in the article of the think-of-the-children stance, which has a great track record for sparking legislation.

I can confirm that Nate is not backdating memories—he and Eliezer were pretty clear within MIRI at the time that they thought Sam and Elon were making a tremendous mistake and that they were trying to figure out how to use MIRI's small influence within a worsened strategic landscape.

You were paying more attention than me (I don't follow anyone who engages with him a lot, so I maybe saw one of his tweets a week). I knew of him as someone who had been right early about COVID, and I also saw him criticizing the media for some of the correct reasons, so I didn't write him off just because he was obnoxious and a crypto fanatic.

The interest rate thing was therefore my Igon Value moment.

Balaji treating the ratio between 0.1% interest and 4.75% interest as deeply meaningful is so preposterous that I'm going to stop paying attention to anything he says from here on out.

We were (checks notes) a few days early to the party on that.

I can imagine this coming from the equivalent of "adapt someone else's StackOverflow code" level capability, which is still pretty impressive. 

In my opinion, the scariest thing I've seen so far is coding Game Of Life Pong, which doesn't seem to resemble any code GPT-4 would have had in its training data. Stitching those things together means coding for real for real.

Sam's real plan for OpenAI has never changed, and has been clear from the beginning if you knew about his and Elon's deep distrust of DeepMind:

  1. Move fast, making only token efforts at incorporating our safety team's work into our capabilities work, in order to get way ahead of DeepMind. (If that frustration makes our original safety team leave en masse, no worries, we can always hire another one.)
  2. Maybe once we have a big lead, we can figure out safety.

Kudos for talking about learning empathy in a way that seems meaningfully different and less immediately broken than adjacent proposals.

I think what you should expect from this approach, should it in fact succeed, is not nothing, but still something more alien than the way we empathize with lower animals, let alone higher animals. Consider the empathy we have towards cats... and the way it is complicated by their desire to be a predator, and specifically to enjoy causing fear/suffering. Our empathy with cats doesn't lead us to abandon our empathy for their... (read more)


Very cool! How does this affect your quest for bounded analogues of Löbian reasoning?

I'm working on it :) At this point what I think is true is the following: If ShortProof(x ↔ LongProof(ShortProof(x) → x)), then MediumProof(x). Apologies that I haven't written out calculations very precisely yet, but since you asked, that's roughly where I'm at :)

I used to believe, as do many Christians, that an open-hearted truthseeker will become convinced of the existence of the true God once they are exposed. To say otherwise makes missionary work seem rather manipulative (albeit still important for saving souls). More importantly, the principle is well attested in Christian thought and in the New Testament (Jesus with Nicodemus, Paul with the Athenians, etc).

There are and have been world religions that don't evangelize because they don't have the same assumption, but Christianity in particular is greatly wounded if that assumption proves false.

1 Jakub Supeł 5mo
Oh, so then the question should be "What would I think about these arguments if I hadn't already committed myself to faith and I were an open-hearted truthseeker?". Your claim is that: 1) such a person should consider arguments for the Christian faith to be good, on balance (otherwise "Christianity is greatly wounded"), and 2) such a person often would not consider arguments for the Christian faith to be good. Why do you believe (2)? That is, how can you know what a sincere seeker is going to think of any particular argument? Or, even worse, about all the arguments so that they can decide which theory is more probable on the balance?  I met unbelievers who found some arguments convincing and others who found them unconvincing, but there is no way for me to know if any of them were open-hearted truthseekers. If doctrine (1) is true, it's just not an empirically verifiable doctrine, since there is no observation by means of which you could determine even your own sincerity, much less that of others.

I have not read the book but I think this is exactly wrong, in that what happens after the ??? step is that shareholder value is not maximized.

I think you misinterpreted the book review: Caroline was almost surely making an Underpants Gnomes reference, which is used to indicate that the last thing does not follow in any way from the preceding.

This is honestly some of the most significant alignment work I've seen in recent years (for reasons I plan to post on shortly), thank you for going to all this length!

Typo: "Thoughout this process test loss remains low - even a partial memorising solution still performs extremely badly on unseen data!", 'low' should be 'high' (and 'throughout' is misspelled too).

3 Neel Nanda 9mo
Thanks, I really appreciate it.  Though I wouldn't personally consider this among the significant alignment work, and would love to hear about why you do!

So I would argue that all of the main contenders are very training data efficient compared to artificial neural nets. I'm not going to go into detail on that argument, unless people let me know that that seems cruxy to them and they'd like more detail.

I'm not sure I get this enough for it to even be a crux, but what's the intuition behind this?

My guess for your argument is that you see it as analogous to the way a CNN beats out a fully-connected one at image recognition, because it cuts down massively on the number of possible models, compatibly with the k... (read more)

1 Nathan Helm-Burger 1y
I don't think they are better representations of general intelligence. I'm quite confident that much better representations of general intelligence exist and just have yet to be discovered. I'm just saying that these are closer to a proven path, and although they are inefficient and unwise, somebody would likely follow these paths if suddenly given huge amounts of compute this year. And in that imaginary scenario, I predict they'd be pretty effective. My reasoning for saying this for the Blue Brain Project is that I've read a lot of their research papers, and understand their methodology pretty well, and I believe they've got really good coverage of a lot of details. I'm like 97% confident that whatever 'special sauce' allows the human brain to be an effective general intelligence, BBP has already captured that in their model. I think they've captured every detail they could justify as being possibly slightly important, so I think they've also captured a lot of unnecessary detail. I think this is bad for interpretability and compute efficiency. I don't recommend this path, I just believe it fulfills the requisites of the thought experiment on 12 OOMs of compute magically appearing.

Has any serious AI Safety research org thought about situating themselves so that they could continue to function after a nuclear war?

Wait, hear me out.

A global thermonuclear war would set AI timelines back by at least a decade, for all of the obvious reasons. So an AI Safety org that survived would have additional precious years to work on the alignment problem, compared to orgs in the worlds where we avoid that war.

So it seems to me that at least one org with short timelines ought to move to New Zealand or at least move farther away from cities.

(Yes, I k... (read more)

The distinction between your post and Eliezer's is more or less that he doesn't trust anyone to identify or think sanely about [plans that they admit have negative expected value in terms of log odds but believe possess a compensatory advantage in probability of success conditional on some assumption].

Such plans are very likely to hurt the remaining opportunities in the worlds where the assumption doesn't hold, which makes it especially bad if different actors are committing to different plans. And he thinks that even if a plan's assumptions hold, the odds... (read more)

In principle, I was imagining talking about two AIs.

In practice, there are quite a few preferences I feel confident a random person would have, even if the details differ between people and even though there's no canonical way to rectify our preferences into a utility function. I believe that the argument carries through practically with a decent amount of noise; I certainly treat it as some evidence for X when a thinker I respect believes X.

Identifying someone else's beliefs requires you to separate a person's value function from their beliefs, which is impossible.

I think it's unfair to raise this objection here while treating beliefs about probability as fundamental throughout the remainder of the post.

If you instead want to talk about the probability-utility mix that can be extracted from seeing another agent's actions even while treating them as a black box... two Bayesian utility-maximizers with relatively simple utility functions in a rich environment will indeed start inferring Bayesian... (read more)

This seems consistent with the claim: You did word things rather clearly. Though I imagine some might object to 'relatively simple utility functions(/rich environment)' - i.e., people don't have simple utility functions.

This fails to engage with Eli's above comment, which focuses on Elon Musk, and is a counterargument to the very thing you're saying.

I meant to reply to the OP, not Eli.

Probable typos: the Qs switch from Q4 to Q5 before the bolded Q5 question.

and they, I'm afraid, will be PrudentBot, not FairBot.

This shouldn't matter for anyone besides me, but there's something personally heartbreaking about seeing the one bit of research for which I feel comfortable claiming a fraction of a point of dignity, being mentioned validly to argue why decision theory won't save us.

(Modal bargaining agents didn't turn out to be helpful, but given the state of knowledge at that time, it was worth doing.)


It would be dying with a lot less dignity if everyone on Earth - not just the managers of the AGI company making the decision to kill us - thought that all you needed to do was be CooperateBot, and had no words for any sharper concepts than that.  Thank you for that, Patrick.

But sorry anyways.

I see Biden as having cogent things he intends to communicate but sometimes failing to speak them coherently, while Trump is a pure stream of consciousness sometimes, stringing together loosely related concepts like a GPT.

(This isn't the same as cognitive capacity, mind you. Trump is certainly more intelligent than many people who speak more legibly.)

I haven't seen a "word salad" from Biden where I can't go "okay, here's the content he intended to communicate", but there are plenty from Trump where I can't reconstruct anything more than sentiment and gestures at disconnected facts.

Oh huh, and it also pairs well with my later Choosing the Zero Point post!

"How" questions are less amenable to lucky guesses than "what" questions. Especially planning questions, e.g. "how would you make a good hat out of food?"

As Anisha said, GPT can pick something workable from a top-100-most-common menu with just a bit of luck, but engineering a plan for a nonstandard task seems beyond its capacity.

Thanks for drawing distinctions - I mean #1 only.

Is there already a concept handle for the notion of a Problem Where The Intuitive Solution Actually Makes It Worse But Makes You Want To Use Even More Dakka On It?

My most salient example is the way that political progressives in the Bay Area tried using restrictive zoning and rent control in order to prevent displacement... but this made for a housing shortage and made the existing housing stock skyrocket in value... which led to displacement happening by other (often cruel and/or backhanded) methods... which led to progressives concluding that their rules... (read more)

1 Michael Cohn 1y
In terms of naming / identifying this, do you think it would help to distinguish what makes you want to double down on the current solution? I can think of at least 3 reasons:

  1. Not being aware that it's making things worse
  2. Knowing that it made things worse, but feeling like giving up on that tactic would make things get even worse instead of better
  3. Being committed to the tactic more than to the outcome (what pjeby described as "The Principle of the Thing") -- which could itself have multiple reasons, including emotionally-driven responses, duty-based reasoning, or explicitly believing that doubling down somehow leads to better outcomes in the long run.

Do these all fall within the phenomenon you're trying to describe?

"The Human Condition"? ;-) More seriously, though, do you have any examples that aren't based on the instinct-to-punish(reality, facts, people,...) that I ranted about in Curse of the Counterfactual? If they all fall in this category, one could call it an Argument With Reality, which is Byron Katie's term for it. (You could also call it "The Principle of the Thing", an older and more colloquial term for people privileging the idea of a thing over the substance of the thing, usually to an irrational extent.) When people are having an Argument With Reality, they:

  * Go for approaches that impose costs on some target(s), in preference to ones that are of benefit to anyone
  * Refuse to acknowledge other points of view except for how it proves those holding them to be the Bad Wrong Enemies
  * Double down as long as reality refuses to conform or insufficient Punishment has occurred (defined as the Bad Wrong Enemies surrendering and submitting or at least showing sufficiently-costly signals to that effect)

A lot of public policy is driven this way; Wars on Abstract Nouns are always more popular than rehabilitation, prevention, and other benefit-oriented policies, which will be denigrated as being too Soft On Abstract Nouns. (This also applies of course to non-governmental public policies, with much the same incentives for anybody in the public view to avoid becoming considered one of the Bad Wrong Enemies.)

You can see my other reviews from this and past years, and check that I don't generally say this sort of thing:

This was the best post I've written in years. I think it distilled an idea that's perennially sorely needed in the EA community, and presented it well. I fully endorse it word-for-word today.

The only edit I'd consider making is to have the "Denial" reaction explicitly say "that pit over there doesn't really exist".

(Yeah, I know, not an especially informative review - just that the upvote to my past self is an exceptionally strong one.)

Thank you!

Re: your second paragraph, I was (and am) of the opinion that, given the first sentence, readers were in danger of being sucked down into their thoughts on the object-level topic before they would even reach the meta-level point. So I gave a hard disclaimer then and there.

Your mileage varied, of course, but I model more people as having been saved by the warning lights than blinded by them.

There are some posts with perennial value, and some which depend heavily on their surrounding context. This post is of the latter type. I think it was pretty worthwhile in its day (and in particular, the analogy between GPT upgrades and developmental stages is one I still find interesting), but I leave it to you whether the book should include time capsules like this.

It's also worth noting that, in the recent discussions, Eliezer has pointed to the GPT architecture as an example that scaling up has worked better than expected, but he diverges from the thes... (read more)

Fighting is different from trying. To fight harder for X is more externally verifiable than to try harder for X. 

It's one thing to acknowledge that the game appears to be unwinnable. It's another thing to fight any less hard on that account.

One tiny note: I was among the people on AAMLS; I did leave MIRI the next year; and my reasons for so doing are not in any way an indictment of MIRI. (I was having some me-problems.) 

I still endorse MIRI as, in some sense, being the adults in the AI Safety room, which has... disconcerting effects on my own level of optimism.

Ditto - the first half makes it clear that any strategy which isn't at most 2 years slower than an unaligned approach will be useless, and that prosaic AI safety falls into that bucket.

Thanks for asking about the ITT. 

I think that if I put a more measured version of myself back into that comment, it has one key difference from your version.

"Pay attention to me and people like me" is a status claim rather than a useful model.

I'd have said "pay attention to a person who incurred social costs by loudly predicting one later-confirmed bad actor, when they incur social costs by loudly predicting another". 

(My denouncing of Geoff drove a wedge between me and several friends, including my then-best friend; my denouncing of the other on... (read more)

Thanks, supposedlyfun, for pointing me to this thread.

I think it's important to distinguish my behavior in writing the comment (which was emotive rather than optimized - it would even have been in my own case's favor to point out that the 2012 workshop was a weeklong experiment with lots of unstructured time, rather than the weekend that CFAR later settled on, or to explain that his CoZE idea was to recruit teens to meddle with the other participants' CoZE) from the behavior of people upvoting the comment.

I expect that many of the upvotes were not of the form "this is a good comment on the meta level" so much as "SOMEBODY ELSE SAW THE THING ALL ALONG, I WORRIED IT WAS JUST ME".

5 [DEACTIVATED] Duncan Sabien 2y
This seems true to me.  I'm also feeling a little bit insecure or something and wanting to reiterate that I think that particular comment was a net-positive addition and in my vision of LessWrong would have been positively upvoted. Just as it's important to separate the author of a comment from the votes that comment gets (which they have no control over), I want to separate a claim like "this being in positive territory is bad" (which I do not believe) from "the contrast between the total popularity of this and that is bad." I'm curious whether I actually passed your ITT with the rewrite attempt.

Is this meant to be a linkpost? I don't see any content except for the comment above.

Hm, it was. I am quite sure I did paste the link into the link field (I'm not that careless). I'm not sure what happened there... I added 2 tags afterwards; I wonder if there is a bug in the tag-adding code which can erase links? (Or if setting the link field doesn't automatically turn a post into a link post, which if true, would be another reason for my old suggestion that link posts should get the little circle link-icon.) Anyway, I've added the link (back?).

The subconscious mind knows exactly what it's flinching away from considering. :-)

9 Eliezer Yudkowsky 2y
My autobiographical episodic memory is nowhere near good enough to answer this question, alas.

A secondary concern in that it's better to have one org that has some people in different locations, but everyone communicating heavily, than to have two separate organizations.

4 Vanessa Kosoy 2y
This might be the right approach, but notice that no existing AI risk org does that. They all require physical presence.

I think this is much more complex than you're assuming. As a sketch of why, costs of communication scale poorly, and the benefits of being small and coordinating centrally often beat the costs imposed by needing to run everything as one organization. (This is why people advise startups to outsource non-central work.)

Sure - and MIRI/FHI are a decent complement to each other, the latter providing a respectable academic face to weird ideas. 

Generally though, it's far more productive to have ten top researchers in the same org rather than having five orgs each with two top researchers and a couple of others to round them out. Geography is a secondary concern to that.

4 Vanessa Kosoy 2y
A "secondary concern" in the sense that we should work remotely? Or in the sense that everyone should relocate? Because the latter is unrealistic: people have families, friends, communities; not everyone can uproot themselves.

Thank you for writing this, Jessica. First, you've had some miserable experiences in the last several years, and regardless of everything else, those times sound terrifying and awful. You have my deep sympathy.

Regardless of my seeing a large distinction between the Leverage situation and MIRI/CFAR, I agree with Jessica that this is a good time to revisit the safety of various orgs in the rationality/EA space.

I almost perfectly overlapped with Jessica at MIRI from March 2015 to June 2017. (Yes, this uniquely identifies me. Don't use my actual name here anyw... (read more)

I think CFAR would be better off if Anna delegated hiring to someone else.

I think Pete did (most of?) the hiring as soon as he became ED, so I think this has been the state of CFAR for a while (while I think Anna has also been able to hire people she wanted to hire).

People in and adjacent to MIRI/CFAR manifest major mental health problems, significantly more often than the background rate.

I think this is true

My main complaint about this and the Leverage post is the lack of base-rate data. How many people develop mental health problems in a) normal companies, b) startups, c) small non-profits, d) cults/sects? So far, all I have seen are two cases. And in the startups I have worked at, I would also have been able to find mental health cases that could be tied to the company narrative. Humans being human narratives get... (read more)

if one believed somebody else were just as capable of causing AI to be Friendly, clearly one should join their project instead of starting one's own.

Nitpicking: there are reasons to have multiple projects. For example, it's convenient to be in the same geographic location, but not everyone can relocate to any given place.

Additionally, as a canary statement: I was also never asked to sign an NDA.
