All of Rob Bensinger's Comments + Replies

Does "par-human reasoning" mean at the level of an individual human or at the level of all of humanity combined?

If it's the former, what human should we compare it against? 50th percentile? 99.999th percentile?

I partly answered that here, and I'll edit some of this into the post:

By 'matching smart human performance... across all the scientific work humans do in that field' I don't mean to require that there literally be nothing humans can do that the AI can't match. I do expect this kind of AI to quickly (or immediately) blow humans out of the water, but t

... (read more)

Steering towards world states, taken literally, for a realistic agent is impossible, because an embedded agent cannot even contain a representation of a detailed world-state.

I'm not imagining AI steering toward a full specification of a physical universe; I'm imagining it steering toward a set of possible worlds. Sets of possible worlds can often be fully understood by reasoners, because you don't need to model every world in the set in perfect detail in order to understand the set; you just need to understand at least one high-level criterion (or set of c... (read more)

-1TAG1mo
Yes, you can do things approximating steering towards world states...and you still can't literally steer towards detailed world states, as I said.

The definition I give in the post is "AI that has the basic mental machinery required to do par-human reasoning about all the hard sciences". In footnote 3, I suggest the alternative definition "AI that can match smart human performance in a specific hard science field, across all the scientific work humans do in that field".

By 'matching smart human performance... across all the scientific work humans do in that field' I don't mean to require that there literally be nothing humans can do that the AI can't match. I do expect this kind of AI to quickly (or i... (read more)

For starters, you can have goal-directed behavior without steering the world toward particular states. Novelty seeking, for example, doesn't imply any particular world-state to achieve.

If you look from the outside like you're competently trying to steer the world into states that will result in you getting more novel experience, then this is "goal-directed" in the sense I mean, regardless of why you're doing that.

If you (e.g.) look from the outside like you're selecting the local action that's least like the actions you've selected before, regardless o... (read more)

1red75prime1mo
We know that evolution has no preferences (evolution is not an agent), so we generally don't frame our preferences as an approximation of evolution's ones. People who believe that they were created with some goal in mind of the creator do engage in reasoning of what was truly meant for them to do.
1red75prime1mo
The provided link assumes that any preference can be expressed as a utility function over world-states. If you don't assume that (and you shouldn't, as human preferences can't be expressed as such), you cannot maximize a weighted average of potential utility functions. Some actions are preference-wise irreversible. Take for example virtue ethics: wiping out your memory doesn't restore your status as a virtuous person even if the world doesn't contain any information about your unvirtuous acts anymore, so you don't plan to do that. When I asked here earlier why the article "Problem of Fully Updated Deference [https://arbital.com/p/updated_deference/]" uses an incorrect assumption, I got the answer that it's better to have some approximation than none, as it allows us to move forward in exploring the problem of alignment. But I see that it has become an unconditional cornerstone and not a toy example of analysis.

Dustin Moskovitz comments on Twitter:

The deployment problem is part of societal response to me, not separate.

[...] Eg race dynamics, regulation (including ability to cooperate with competitors), societal pressure on leaders, investment in watchdogs (human and machine), safety testing norms, whether things get open sourced, infohazards.

"The deployment problem is hard and weird" comes from a mix of claims about AI (AGI is extremely dangerous, you don't need a planet-sized computer to run it, software and hardware can and will improve and proliferate by defau... (read more)

Note that if it were costless to make the title way longer, I'd change this post's title from "AGI ruin mostly rests on strong claims about alignment and deployment, not about society" to the clearer:

The AGI ruin argument mostly rests on claims that the alignment and deployment problems are difficult and/or weird and novel, not on strong claims about society

4DanielFilan2mo
could be a subtitle (appended with the word "Or,")?

One reason I like "the danger is in the space of action sequences that achieve real-world goals" rather than "the danger is in the space of short programs that achieve real-world goals" is that it makes it clearer why adding humans to the process can still result in the world being destroyed.

If powerful action sequences are dangerous, and humans help execute an action sequence (that wasn't generated by human minds), then it's clear why that is dangerous too.

If the danger instead lies in powerful "short programs", then it's more tempting to say "just don't ... (read more)

Thanks for the replies, Ryan!

I think the exact quantitative details make a big difference between "AGI ruin seems nearly certain in the absence of positive miracles" and "doom seems quite plausible, but we'll most likely make it through" (my probability of takeover is something like 35%)

I don't think that 'the very first STEM-level AGI is smart enough to destroy the world if you relax some precautions' and 'we have 2.5 years to work with STEM-level AGI before any system is smart enough to destroy the world' changes my p(doom) much at all. (Though this is ... (read more)

I think you should probably note where people (who are still sold on AI risk) often disagree.

If I had a list of 5-10 resources that folks like Paul, Holden, Ajeya, Carl, etc. see as the main causes for optimism, I'd be happy to link those resources (either in a footnote or in the main body).

I'd definitely include something like 'survey data on the same population as my 2021 AI risk survey, saying how much people agree/disagree with the ten factors', though I'd guess this isn't the optimal use of those people's time even if we want to use that time to surve... (read more)

9ryan_greenblatt2mo
I think my views on takeoff/timelines are broadly similar to Paul's except that I have somewhat shorter takeoffs and timelines (I think this is due to thinking AI is a bit easier and also due to misc deference). Fair enough on 'this is very soon', but I think the exact quantitative details make a big difference between "AGI ruin seems nearly certain in the absence of positive miracles" and "doom seems quite plausible, but we'll most likely make it through" (my probability of takeover is something like 35%). I agree with 'we won't have decades' (in the absence of large efforts to slow down which seem unlikely). But from the perspective of targeting our work and alignment research, there is a huge difference between steady and quite noticeable takeoff over the course of a few years (which is still insanely fast to humans to be clear) and sudden takeoff within a month. For instance, this disagreement seems to drive a high fraction of the overall disagreement between OpenPhil/Paul/etc views and MIRI-ish views. I don't think this difference should be nearly enough to think the situation is close to ok! Under my views, the government should probably take immediate and drastic action if they could do so competently! That said, the picture for alignment researchers is quite different under these views and it seems important to try and get the exact details right when trying to explain the story for AI risk (I think we actually disagree here on details). Additionally, I'd note that I do have some probability on 'Yudkowsky style takeoff' (but maybe only like 5%). Even if we were fine in all other worlds, this alone should be easily sufficient to justify a huge response from society! [not necessarily endorsed by Paul] My understanding is that Paul has a 20 year median on 'dyson sphere or similarly large technical accomplishment'. He also thinks the probability on 'dyson sphere or similarly large technical accomplishment' by end of the decade (within 7 years) is around 15%.

So, this argument seems mostly circular

I don't think your claim makes the argument circular / question-begging; it just means there's an extra step in explaining why and how a random action sequence destroys the world.

Maybe you mean that I'm putting the emphasis in the wrong place, and it would be more illuminating to highlight some specific feature of random smart short programs as the source of the 'instrumental convergence' danger? If so, what do you think that feature is?

From my current perspective I think the core problem really is that most random sh... (read more)

2Rob Bensinger2mo
One reason I like "the danger is in the space of action sequences that achieve real-world goals" rather than "the danger is in the space of short programs that achieve real-world goals" is that it makes it clearer why adding humans to the process can still result in the world being destroyed. If powerful action sequences are dangerous, and humans help execute an action sequence (that wasn't generated by human minds), then it's clear why that is dangerous too. If the danger instead lies in powerful "short programs", then it's more tempting to say "just don't give the program actuators and we'll be fine". The temptation is to imagine that the program is like a lion, and if you just keep the lion physically caged then it won't harm you. If you're instead thinking about action sequences, then it's less likely to even occur to you that the whole problem might be solved by changing the AI from a plan-executor to a plan-recommender. Which is a step in the right direction in terms of actually grokking the nature of the problem.

It's true that if humans were reliably very ambitious, consequentialist, and power-seeking, then this would be stronger evidence that superintelligent AI tends to be ambitious and power-seeking. So the absence of that evidence has to be evidence against "superintelligent AI tends to be ambitious and power-seeking", even if it's not a big weight in the scales.
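The "absence of that evidence has to be evidence against" step is an instance of conservation of expected evidence. A short sketch of why, with labels that are my own shorthand and not from the thread: let H stand for "superintelligent AI tends to be ambitious and power-seeking" and E for "humans are reliably very ambitious, consequentialist, and power-seeking", and assume 0 < P(E) < 1.

```latex
\begin{align*}
P(H) &= P(H \mid E)\,P(E) + P(H \mid \neg E)\,\bigl(1 - P(E)\bigr) \\
\Rightarrow\quad
P(H) - P(H \mid \neg E)
  &= \frac{P(E)\,\bigl[P(H \mid E) - P(H)\bigr]}{1 - P(E)}
\end{align*}
```

So whenever observing E would raise the probability of H (the bracket is positive), observing not-E must lower it. The size of the drop scales with P(E) / (1 - P(E)), which is why the update can be small if E was never very likely to begin with.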

1rotatingpaguro2mo
Mainly from the second paragraph, I got the impression that "randomly sampled plans" referred to, or at least included, what is the goal, not just how much you optimize it. Anyway, I think I'm losing the thread of the discussion, so whatever.

Also, per footnote 1: "I wrote this post to summarize my own top reasons for being worried, not to try to make a maximally compelling or digestible case for others."

The original reason I wrote this was that Dustin Moskovitz wanted something like this, as an alternative to posts like AGI Ruin:

[H]ave you tried making a layman's explanation of the case? Do you endorse the summary? I'm aware of much longer versions of the argument, but not shorter ones!

From my POV, a lot of the confusion is around the confidence level. Historically EY makes many arguments to e

... (read more)

Thanks for the feedback, John! I've moved the Aryeh/Eliezer exchange to a footnote, and I welcome more ideas for ways to improve the piece. (Folks are also welcome to repurpose anything I wrote above to create something new and more beginner-friendly, if you think there's a germ of a good beginner-friendly piece anywhere in the OP.)

Tagging @Richard_Ngo 

9Rob Bensinger2mo
Also, per footnote 1: "I wrote this post to summarize my own top reasons for being worried, not to try to make a maximally compelling or digestible case for others." The original reason I wrote this was that Dustin Moskovitz wanted [https://twitter.com/moskov/status/1642612082284859393] something like this, as an alternative to posts like AGI Ruin [https://www.lesswrong.com/posts/uMQ3cqWDPHhjtiesc/agi-ruin-a-list-of-lethalities]: This post is speaking for me and not necessarily for Eliezer, but I figure it may be useful anyway. (A MIRI researcher did review an earlier draft and left comments that I incorporated, at least.) And indeed, one of the obvious ways it could be useful is if it ends up evolving into (or inspiring) a good introductory resource, though I don't know how likely that is, I don't know whether it's already a good intro-ish resource paired with something else, etc.

Copying over a Twitter reply from Quintin Pope (which I haven't replied to, and which was responding to the wording of the Twitter draft of this post):

I think your intuition about how SGD works is wildly wrong. E.g., SGD doesn't do anything like "randomly sample from the set of all low loss NN parameter configurations". https://arxiv.org/abs/2110.00683 

Also, your point about human plans not looking like randomly sampled plans is a point against your intuition that multi-level search processes will tend to generate such plans.

Finally, I don't think it'

... (read more)
5rotatingpaguro2mo
I think Mr. Bensinger's argument is "randomly w.r.t. human plans," while I read your answer as interpreting it as an inherent "randomness" property of plans. Humans do not look random to other humans. This is not an argument for anything else than not looking random to humans.

Quintin, in case you are reading this, I just wanna say that the link you give to justify 

I think your intuition about how SGD works is wildly wrong. E.g., SGD doesn't do anything like "randomly sample from the set of all low loss NN parameter configurations". https://arxiv.org/abs/2110.00683 

really doesn't do nearly enough to justify your bold "wildly wrong" claim. First of all, it's common for papers to overclaim; this seems like the sort of paper that could turn out to be basically just flat wrong. (I lack the expertise to decide for myself, i... (read more)

I find Quintin's reply here somewhat unsatisfying, because I think it is too narrowly focused on current DL-paradigm methods and the artifacts they directly produce, without much consideration for how those artifacts might be composed and used in real systems. I attempted to describe my objections to this general kind of argument in a bit more detail here.

This post evolved from a Twitter thread I wrote two weeks ago. Copying over a Twitter reply by Richard Ngo (n.b. Richard was replying to the version on Twitter, which differed in lots of ways):

Rob, I appreciate your efforts, but this is a terrible framing for trying to convey "the basics", and obscures way more than it clarifies.

I'm worried about agents which try to achieve goals. That's the core thing, and you're calling it a misconception?! That's blatantly false.

In my first Alignment Fundamentals class I too tried to convey all the nuances of my thinkin

... (read more)

I definitely agree with Richard that the post would probably benefit from more iteration with intended users, if new people are the audience you want to target. (In particular, I doubt that the section quoted from the Aryeh interview will clarify much for new people.)

That said, I definitely think that it's the right call to emphasize up-front that instrumental convergence is a property of problem-space rather than of agency. More generally: when there's a common misinterpretation, which very often ends up load-bearing, then it makes sense to address that u... (read more)

I think it's more likely that being conservative about impact would generate probabilities much less than 10%.

I don't know what you mean by "conservative about impact". The OP distinguishes three things:

  • conservatism in decision-making and engineering: building in safety buffer, erring on the side of caution.
  • non-conservatism in decision-making and engineering, that at least doesn't shrug at things like "10% risk of killing all humans".
  • non-conservatism that does shrug at medium-probability existential risks.

It separately distinguishes these two things:

  • foreca
... (read more)
0Signer2mo
I mean predicting modest impact for the reasons a futurist maybe should predict modest impacts (like "existential catastrophes never happened before" or "novel technologies always plateau" or a whole cluster of similar heuristics in opposition to "building safety buffer"). Not necessarily "rigorous" - I'm not saying such thinking is definitely correct. I just can't visualize a thought process that arrives at 50% before correction, then applies a conservative adjustment, because it's all crazy, still gets 10% and proceeds to "then it's fine". So if survey respondents have higher probabilities and no complicated plan, then I don't actually believe that the opposite-of-engineering-conservatism mindset applies to them. Yes, maybe you mostly said things about not being a decision-maker, but then what's the point of that quote about bridges?

This is why I said in the post:

Some people do have confident beliefs that imply "things will go well"; I disagree there, but I expect some amount of disagreement like that.

... and focused on the many people who don't have a confident objection to nanotech.

I and others have given lots of clear arguments for why relatively early AGI systems will plausibly be vastly smarter than humans. Eric Drexler has given lots of clear arguments for why nanotechnology is probably fairly easy to build.

None of this constitutes a proof that early AGI systems will be able to ... (read more)

4Signer2mo
To be clear, I very much agree with being careful with technologies that have 10% chance of causing existential catastrophe. But I don't see how the part of OP about conservatism connects to it. I think it's more likely that being conservative about impact would generate probabilities much less than 10%. And if anyone says that their probability is 10%, then maybe it's the case of people only having enough resolution for three kinds of probabilities and they think it's less than 50%. Or they are already trying to not be very certain and explicitly widen their confidence intervals (maybe after getting probability from someone more confident), but they actually believe in being conservative more than they believe in their stated probability. So then it becomes about why it is at least 10% - why being conservative in that direction is wrong in general or what are your clear arguments and how are we supposed to weight them against "it's hard to make impact"?

I figured this would be obvious enough, and both surveys discuss this issue; but phrasing things in a way that encourages keeping selection bias in mind does seem like a good idea to me. I've tweaked the phrasing to say "In a survey, X".

my sense is that most LW uses of "crux" are in the context of "double crux"

I think that's not true, and "crux" is mostly used for single cruxes.

It's often harder to tell whether something is a double crux, and in any case "double crux" mostly only makes sense when there are exactly two people in a conversation. In a ten-person Internet forum conversation where everyone has different views, it will be a lot harder to find a claim that would update everyone about the relevant proposition -- and it doesn't especially make sense to try.

 and that the term

... (read more)

I think it goes without saying that one can disagree with anything in the Sequences and can also be assumed to have read and understood it

This seems false as stated -- some nontrivial content in the Sequences consists of theorems.

More generally, there are some claims in the original Sequences that are false (so agreeing with the claim may be at least some evidence that you didn't understand it), some that I'd say "I think that's true, but reasonable people can definitely disagree", some where it's very easy for disagreement to update me toward "you didn't ... (read more)

2Thoth Hermes2mo
It depends on whether you think what I stated was closer to "completely false" or "technically false, because of the word 'anything'." If I had instead said "I think it goes without saying that one can disagree with nearly anything in the Sequences and can also be assumed to have read and understood it", that might bring it out of "false" territory for you, but I feel we would still have a disagreement. There are theorems in the Sequences that I disagree with Eliezer's characterization of, like Löb's Theorem [https://www.lesswrong.com/posts/GTiFNjYm3SrrG5xfc/why-do-the-sequences-say-that-loeb-s-theorem-shows-that-a], where I feel very confident that I have fully understood both my reading of the theorem as well as Eliezer's interpretation of it to arrive at my conclusions. Also, that this disagreement is fairly substantial, and also may be a key pillar of Eliezer's case for very high AI Risk in general. My worry still stands that disagreement with Eliezer (especially about how high AI Risk actually is) will be conflated with not being up-to-speed on the Sequences, or about misunderstanding key material, or about misunderstanding theorems or things that have allegedly been proven. I think the example I gave is one specific case of something where Eliezer's interpretation of the theorem (which I believe to have been incorrect) was characterized as the theorem itself. My position is that regardless of whether or not you think all of what I just said is preposterous and proof that I don't understand key material, the norm(s) of good-faith assumption and charitability are still highly advisable to have. I generally believe that in most disagreements, it is possible for both parties to assume that the other party understands them well enough, just that they have assigned very different probabilities to the same statements.

(Meta: The TIME piece is paywalled in some countries, and is plastered with ads, so Eliezer wanted the text mirrored on the MIRI Blog. He also assented to my having the LW admins cross-post this here. This version adds some clarifying notes Eliezer wrote on Twitter regarding the article.)

2tricky_labyrinth2mo
mfw you didn't add the final addendum (https://twitter.com/ESYudkowsky/status/1642216007552106496)

Disagree-voted just because of the words "I'm certain that the reason...". I'd be much less skeptical of "I'm pretty dang sure that the reason..." or at the very least "I'm certain that an important contributing factor was..."

(But even the latter seems pretty hard unless you have a lot of insider knowledge from talking to the people who made the decision at DeepMind, along with a lot of trust in them. E.g., if it did turn out that DeepMind was trying to reduce AI hype, then they might have advertised a result less if they thought it were a bigger deal. I don't know this to be so, but it's an example of why I raise an eyebrow at "I'm certain that the reason".)

Or just promising the human some money, with the sequence of actions set up to obscure that anything important is happening. (E.g., you can use misdirection like 'the actually important event that occurred was early in the process, when you opened a test tube to add some saline and thereby allowed the contents of the test tube to start propagating into the air; the later step where you mail the final product to an address you were given, or record an experimental result in a spreadsheet and email the spreadsheet to your funder, doesn't actually matter for the plan'.)
 

3CronoDAS3mo
Getting humans to do things is really easy, if they don't know of a good reason not to do it. It's sometimes called "social engineering", and sometimes it's called "hiring them".

You have to weigh the conjunctive aspects of particular plans against the disjunctiveness of 'there are many different ways to try to do this, including ways we haven't thought of'.

Until this week, all of this was [...] unknown to anyone who could plausibly claim to be a world leader.

I don't think this is known to be true.

In fact they had no idea this debate existed.

That seems too strong. Some data points:

1. There's been lots of AI risk press over the last decade. (E.g., Musk and Bostrom in 2014, Gates in 2015, Kissinger in 2018.)

2. Obama had a conversation with WIRED regarding Bostrom's Superintelligence in 2016, and his administration cited papers by MIRI and FHI in a report on AI the same year. Quoting that report:

General AI (some

... (read more)

in general I think the trend of alignment is positive. We haven't solved the problems, but we're quite a bit closer to the solution than 10 years ago.

I mean, I could agree with those two claims but think the trendlines suggest we'll have alignment solved in 200 years and superintelligent capabilities in 14 years. I guess it depends on what you mean by "quite a bit closer"; I think we've written up some useful semiformal descriptions of some important high-level aspects of the problem (like 'Risks from Learned Optimization'), but this seems very far from 'th... (read more)

-3Noosphere893mo
I disagree, though you're right that my initial arguments weren't enough. To talk about the alignment progress we've achieved so far, here's a list: 1. We finally managed to solve the problem of deceptive alignment while being capabilities competitive. In particular, we figured out a goal that is both more outer aligned than the Maximum Likelihood Estimation goal that LLMs use, and critically it is a myopic goal, meaning we can avoid deceptive alignment even at arbitrarily high capabilities. 2. The more data we give to the AI, the more aligned the AI is, which is huge in the sense that we can reliably get AI to be more aligned as it's more capable, vindicating the scalable alignment agenda. 3. The training method doesn't allow the AI to affect its own distribution, unlike online learning, where the AI selects all the data points to learn, and thus can't shift the distribution nor gradient hack. As far as how much progress? I'd say this is probably 50-70% of the way there, primarily because we finally are figuring out ways to deal with core problems of alignment like deceptive alignment or outer alignment of goals without too much alignment taxes.

Capabilities Researcher: *repeatedly shooting himself in the foot, reloading his gun, shooting again* "Wow, it sure is a shame that my selfish incentives aren't aligned with the collective good!" *reloads gun, shoots again*

Classical prisoners' dilemma, where individuals receive the greatest payoffs if they betray the group rather than cooperate.

In this case, "defecting" gives lower payoffs to the defector -- you're shooting yourself in the foot and increasing the risk that you die an early death.

The situation is being driven mostly by information asymmetries (not everyone appreciates the risks, or is thinking rationally about novel risks as a category), not by deep conflicts of interest. Which makes it doubly important not to propagate the meme that this is a prisoner's dilemma: one of the ways people end up with a false belief about this is exactly that people round this situation off to a PD too often!
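The distinction between a true prisoner's dilemma and the "shooting yourself in the foot" case can be made concrete with a small payoff check. This is only an illustrative sketch: the function names and every number below are hypothetical, chosen to show the structure of the argument and not to estimate anything about real AI development.

```python
def is_prisoners_dilemma(temptation, reward, punishment, sucker):
    """True if the payoffs have the standard PD ordering T > R > P > S."""
    return temptation > reward > punishment > sucker


def defection_dominates(temptation, reward, punishment, sucker):
    """True if defecting pays the individual more, whatever the other side does.

    temptation: my payoff when I defect and the other cooperates
    reward:     my payoff when we both cooperate
    punishment: my payoff when we both defect
    sucker:     my payoff when I cooperate and the other defects
    """
    return temptation > reward and punishment > sucker


# Hypothetical textbook PD payoffs: defecting is individually better.
classic = dict(temptation=5, reward=3, punishment=1, sucker=0)

# Hypothetical payoffs for the case described above: defecting lowers the
# defector's own payoff in both columns, because it raises their own risk.
foot_gun = dict(temptation=2, reward=3, punishment=-10, sucker=-5)

for name, payoffs in [("classic", classic), ("foot_gun", foot_gun)]:
    print(name, is_prisoners_dilemma(**payoffs), defection_dominates(**payoffs))
```

Under the second set of payoffs no conflict of interest is left to resolve: a player who knew these numbers would cooperate for selfish reasons, which is the sense in which the problem is informational and not a PD.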

1wobblz3mo
My point is that, as you said, you take the safest route when not knowing what others will do - do whatever is best for you and, most importantly, guaranteed. You take some years, and yes, you lose the opportunity to walk out of doing any time, but at least you're in complete control of your situation. Just imagine a PD with 500 actors... I know what I'd pick. 
2Noosphere893mo
The issue is the payoffs involved. Even if it's say at 50% risk, it's still individually rational to take the plunge, because the other 50% in expected value terms outweighs everything else. I don't believe this for a multitude of reasons, but it's useful to illustrate. The payoffs are essentially cooperate and reduce X-risk from say 50% to 1%, which gives them a utility of say 50-200, or defect and gain expected utility of say 10^20 or more if we grant the assumption on LW that AI is the most important invention in human history. Meanwhile for others, cooperation has the utility of individual defection in this scenario, which is 10^20+ utility, whereas defection essentially reverses the sign of utility gained, which is -10^20+ utility. The problem is that without a way to enforce cooperation, it's too easy to defect until everyone dies. Now thankfully, I believe that existential risk is a lot lower, but if existential risk were high in my model, then we eventually need to start enforcing cooperation, as the incentives would be dangerous if existential risk is high. I don't believe that, thankfully.

Capabilities Researcher: *repeatedly shooting himself in the foot, reloading his gun, shooting again* "Wow, it sure is a shame that my selfish incentives aren't aligned with the collective good!" *reloads gun, shoots again*

1Gerald Monroe3mo
It's also possible to interpret the risks differently or believe you can handle the dangers, and be correct or not correct.

I agree with this. I find it very weird to imagine that "10% x-risk this century" versus "90% x-risk this century" could be a crux here. (And maybe it's not, and people with those two views in fact mostly agree about governance questions like this.)

Something I wouldn't find weird is if specific causal models of "how do we get out of this mess" predict more vs. less utility for state interference. E.g., maybe you think 10% risk is scarily high and a sane world would respond to large ML training runs way more aggressively than it responds to nascent nuclear programs, but you also note that the world is not sane, and you suspect that government involvement will just make the situation even worse in expectation.

I think that Eliezer (and many others including myself!) may be susceptible to "living in the should-universe"

That's a new one!

More seriously: Yep, it's possible to be making this error on a particular dimension, even if you're a pessimist on some other dimensions. My current guess would be that Eliezer isn't making that mistake here, though.

For one thing, the situation is more like "Eliezer thinks he tried the option you're proposing for a long time and it didn't work, so now he's trying something different" (and he's observed many others trying other thi... (read more)

1Qumeric3mo
I specifically said "I do not necessarily say that this particular TIME article was a bad idea" mainly because I assumed it probably wasn't that naive. Sorry I didn't make it clear enough. I still decided to comment because I think this is pretty important in general, even if somewhat obvious. Looks like one of those biases which show up over and over again even if you try pretty hard to correct it. Also, I think it's pretty hard to judge what works and what doesn't. The vibe has shifted a lot even in the last 6 months. I think it is plausible it shifted more than in a 10-year period 2010-2019.
1Noosphere893mo
I think this is the big disagreement I have. I do think the alignment community is working, and in general I think the trend of alignment is positive. We haven't solved the problems, but we're quite a bit closer to the solution than 10 years ago. The only question was whether LW and the intentional creation of an alignment community was necessary, or was the alignment problem going to be solved without intentionally creating LW and a field of alignment research.

The verbatim statement is:

We have people in crypto who are good at breaking things, and they're the reason why anything is not on fire. And some of them might go into breaking AI systems instead, 'cause that's where you learn anything.

You know, you know, any fool can build a crypto system that they think will work. Breaking existing crypto systems -- cryptographical systems -- is how we learn who the real experts are. So maybe the people finding weird stuff to do with AIs, maybe those people will come up with some truth about these systems that m

... (read more)
1Rana Dexsin3mo
I did in fact go back and listen to that part, but I interpreted that clarifying expansion as referring to the latter part of your quoted segment only, and the former part of your quoted segment to be separate—using cryptocurrency as a bridging topic to get to cryptography afterwards. Anyway, your interpretation is entirely reasonable as well, and you probably have a much better Eliezer-predictor than I do; it just seemed oddly unconservative to interpolate that much into a transcript proper as part of what was otherwise described as an error correction pass.

I'm happy you linkposted this so people could talk about it! The transcript above is extremely error-laden, though, to the extent I'm not sure there's much useful signal here unless you read with extreme care?

I've tried to fix the transcription errors, and posted a revised version at the bottom of this post (minus the first 15 minutes, which are meta/promotion stuff for Bankless). I vote for you copying over the Q&A transcript here so it's available both places.

1vonk3mo
Thanks, Rob. In my defense, it took over 8 hours merely to fix the auto-transcriber's word misinterpretations (Eliezer occasionally speaks fast, there are some new concepts, and the audio has gaps/quality issues), and then I was too numb to pay much attention to more detailed organization (not that I could've done it in such detail in any case, as I'm not a native speaker). I decided to post it anyway because no one else had seemed to.

Do you know of any arguments with a similar style to The Most Important Century that are as pessimistic as EY/MIRI folks (>90% probability of AGI within 15 years)?

Wait, what? Why do you think anyone at MIRI assigns >90% probability to AGI within 15 years? That sounds wildly too confident to me. I know some MIRI people who assign 50% probability to AGI by 2038 or so (similar to Ajeya Cotra's recently updated view), and I believe Eliezer is higher than 50% by 2038, but if you told me that Eliezer told you in a private conversation "90+% within 15 years"... (read more)

Thanks for posting this, Andrea_Miotti and remember! I noticed a lot of substantive errors in the transcript (and even more errors in vonk's Q&A transcript), so I've posted an edited version of both transcripts. I vote that you edit your own post to include the revisions I made.

Here's a small sample of the edits I made, focusing on ones where someone may have come away from your transcript with a wrong interpretation or important missing information (as opposed to, e.g., the sentences that are just very hard to parse in the original transcript because ... (read more)

1remember1mo
Thank you so much for doing this! Andrea and I both missed this when you first posted it; I'm really sorry I missed your response then. But I've updated it now!

Gratitude to Andrea_Miotti, remember, and vonk for posting more-timely transcripts of this so LW could talk about it at the time -- and for providing a v1 transcript to give me a head start.

Here's a small sample of the edits I made to the previous Bankless transcript on LW, focusing on ones where someone may have come away from the original transcript with a wrong interpretation or important missing information (as opposed to, e.g., the sentences that are just very hard to parse in the original transcript because too many filler words and false starts to s... (read more)

2Rana Dexsin3mo
Was there out-of-band clarification that Eliezer meant “cryptography” here (at 01:28:41)? He verbalized “crypto”, and I interpreted it as “cryptocurrency” myself, partly to tie things in with both the overall context of the podcast and the hosts' earlier preemptively-retracted question which was more clearly about cryptocurrency. Certainly I would guess that the first statement there is informally true either way, and there's a lot of overlap. (I don't interpret the “cryptosystem” reference a few sentences later to bias it much, to be clear, due to that overlap.)

But this seems to contradict the element of Non-Deception. If you're not actually on the same side as the people who disagree with you, why would you (as a very strong but defeasible default) role-play otherwise?

This is a good question!! Note that in the original footnote in my post, "on the same side" is a hyperlink going to a comment by Val:

"Some version of civility and/or friendliness and/or a spirit of camaraderie and goodwill seems like a useful ingredient in many discussions. I'm not sure how best to achieve this in ways that are emotionally hon

... (read more)

Note that in the original footnote in my post, "on the same side" is a hyperlink going to a comment by Val

Thanks for pointing this out. (I read Val's comment while writing my post, but unfortunately neglected to add the hyperlink when pasting the text of the footnote into my draft.) I have now edited the link into my post.

the goal isn't to trick people into thinking your disagreements are small, it's to make typical disagreements feel less like battles between warring armies

I think the fact that disagreements often feel like battles between warring ... (read more)

But why should we err at all? Should we not, rather, use as many carrots and sticks as is optimal?

"Err on the side of X" here doesn't mean "prefer erring over optimality"; it means "prefer errors in direction X over errors in the other direction". This is still vague, since it doesn't say how much to care about this difference; but it's not trivial advice (or trivially mistaken).

4Said Achmiz3mo
Yes, I know what the expression means. But that doesn’t answer the objection, which is “why are we concerning ourselves with the direction of the errors, when our objective should be to not have errors?” The actual answer has already been given elsethread (a situation where changing the sign of the error is substantially easier than reducing magnitude of error, plus a payoff matrix that is asymmetric w.r.t. the direction of error).

so when I see the brand name being used to market a particular set of discourse norms without a clear explanation of how these norms are derived from the law, that bothers me enough to quickly write an essay or two about it

Seems great to me! I share your intuition that Goodwill seems a bit odd to include. I think it's right to push back on proposed norms like these and talk about how justified they are, and I hope my list can be the start of a conversation like that rather than the end.

I do have an intuition that Goodwill, or something similar to Goodwill,... (read more)

Basically the fact LW has far more arguments for "alignment will be hard" compared to alignment being easy is the selection effect I'm talking about.

That could either be 'we're selecting for good arguments, and the good arguments point toward alignment being hard', or it could be a non-epistemic selection effect.

Why do you think it's a non-epistemic selection effect? It's easier to find arguments for 'the Earth is round' than 'the Earth is flat', but that doesn't demonstrate a non-epistemic bias.

I was also worried because ML people don't really think that

... (read more)

... By 'an Aumann sense' do you just mean 'if you know nothing about a brain, then knowing it believes P is some Bayesian evidence for the truth of P'? That seems like a very weird way to use "Aumann", but if that's what you mean then sure. It's trivial evidence to anyone who's spent much time poking at the details, but it's evidence.

Basically, it means that the fact that other smart people working in ML/AI don't agree with LW is itself evidence that LW is wrong, since rational reasoners updating towards the same priors should see disagreements lesse... (read more)

I think a more likely thing we'd want to stick around to do in that world is 'try to accelerate humanity to AGI ASAP'. "Sufficiently advanced AGI converges to human-friendly values" is weaker than "AGI will just have human-friendly values by default".

I was surprised at how low the hour estimates were, particularly for the OP people (especially Holden) and even for Paul.

Maybe worth keeping in mind that Nate isn't the only MIRI person who's spent lots of hours on this (e.g., Eliezer and Benya have as well), and the numbers only track Nate-time.

Also maybe worth keeping in mind the full list of things that need doing in the world. This is one of the key important leveraged things that needs doing, so it's easy to say "spend more time on it". But spending a thousand hours (so, like, a good chunk of a year w... (read more)

My worry is something similar may be happening for AI risk.

Why do you think this?

8Noosphere894mo
Basically the fact LW has far more arguments for "alignment will be hard" compared to alignment being easy is the selection effect I'm talking about. I was also worried because ML people don't really think that AGI poses an existential risk, and that's evidence, in an Aumann sense. Now I do think this is explainable, but other issues remain:

Even when these discussions don't produce agreement, do you think they're helpful for the community?

I've spoken to several people who have found the MIRI dialogues useful as they enter the field, understand threat models, understand why people disagree, etc. It seems not-crazy to me that most of the value in these dialogues comes from their effects on the community (as opposed to their effects on the participants). 

IMO having and releasing those dialogues was one of the most obviously useful things MIRI has done to date, and I'm super happy with them.... (read more)

3. Will continue to exist regardless of how well you criticize any one part of it.

Depending on what you mean by "any one part of it", I think 3 is false. E.g., a sufficiently good critique of "AGI won't just have human-friendly values by default" would cause MIRI to throw a party and close up shop.

8Raemon4mo
Huh, roll to disbelieve on 'sufficient to close up shop'? I don't think this is my only crux for AI being really dangerous. Even if sufficiently advanced AGI reliably converges to human-friendly values in a very strong sense (i.e. two rival humans trying to build AGIs for war, or many humans with many AGIs embarking on complex economic goals, will somehow always figure out the best things for humans even if it means disobeying orders by stupid humans)... ...there's still a separate case to be made that multipolar narrow non-fully-superhuman AIs won't kill us before the AGI sovereign fixes everything.

I've rewritten this post for the EA Forum, to help introduce more EAs to rationalist culture and norms. The rewrite goes into more detail about a lot of the points, explaining jargon, motivating some of the less intuitive norms, etc. I expect some folks will prefer that version, and some will prefer the LW version.

(One shortcoming of the EA Forum version is that it's less concise. Another shortcoming is that there's more chance I got stuff wrong, since I erred on the side of "spell things out more in the hope of conveying more of the spirit to people who a... (read more)

How is there no blog post on this website that just introduces/explains cruxes. How. Am I missing something??

2gjm4mo
There's already a double-crux tag which includes a brief definition of "crux".

Basically: whether something is good or bad, enjoyable or unpleasant, desirable or undesirable, interesting or boring, etc. It's the aspect of experience that evaluates some things as better or worse to varying degrees and in various respects.

How about "hurting a person or diminishing their credibility, or the credibility of their argument, without using a rational argument"?

"Hurting a person" still seems too vague to me (sometimes people are "hurt" just because you disagreed with them on a claim of fact), "Diminishing... the credibility of their argument, without using a rational argument" sounds similar to "using symmetric weapons" to me (but the latter strikes me as more precise and general: don't try to persuade people via tools that aren't Bayesian evidence for the truth of the thing you'r... (read more)

1cubefox4mo
As I said, if someone feels upset by mere disagreement, that's not a violation of a rational discourse norm. The focus on physical violence is nice insofar as violence is halfway clear-cut, but is also fairly useless insofar as the badness of violence is obvious to most people (unlike things like bullying, bad-faith mockery, moral grandstanding, etc., which are very common), and mostly irrelevant in internet discussions without physical contact, where most irrational discourse is happening nowadays, very nonviolently. That seems to me an uncharitable interpretation. Social ostracization is prototypically something which happens e.g. when someone gets cancelled by a Twitter mob. "Mob" insofar as those people don't use rational arguments to attack you, even if "attacking you without using arguments" can't be defined perfectly precisely. (Something like the Bostrom witch-hunt on Twitter, which included outright defamation, but hardly any arguments.) If you were to consistently shun vagueness, then you couldn't even discourage violence, because the difference between violence and non-violence is gradual; it likewise admits of borderline cases. But since violence is bad despite borderline cases, the borderline cases and exceptions you cited also don't seem very serious. You never get perfectly precise definitions. And you have to embrace some more vagueness than in the case of violence, unless you want to refer only to a tiny subset of irrational discourse. By the way, I would say banning/blocking is irrational when it is done in response to disagreement (often people on Twitter ban other people who merely disagree with them) and acceptable when the content is off-topic or purely harassment. Sometimes there are borderline cases which lie in between; those are grey areas where blocking may be neither clearly bad nor clearly acceptable, but such grey areas are in no way counterexamples to the clear-cut cases.

You may not feel bad about mockery (I don't generally do so either), but do you think it reflects well on you as a rationalist?

I like this example! I do indeed share the intuition "mocking Time Cube guy on Twitter doesn't reflect well on me as a rationalist". It also just seems mean to me.

I think part of what's driving my intuition here, though, is that "mocking" sounds inherently mean-spirited, and "on Twitter" makes it sound like I'm writing the sort of low-quality viral personal attack that's common on Twitter.

"Make a light-hearted reference to Time Cub... (read more)
