All of Zvi's Comments + Replies

Would be happy to have this done automatically by the mods, but am hesitant to introduce another weekly task I'd have to do by hand.

Now we don't have to read it! Not all heroes wear capes. 

I don't see the reason for this defeatism - not on housing where YIMBY is actively winning some battles and gaining strength, not on aging where there might not be as much research as we'd like but there's definitely research and it will improve over time. As for balancing the budget, we did it as recently as the 1990s and also it's not obvious why we need to care about that. 

So basically on your (1) I'd say yes, we agree there are upsides, but I don't see how that leads to enough to justify the risks, and (2) I disagree strongly with the premise but even i…

I mean, usually not Grimes, in this case the people I monitor were talking about her, and she is doing some interesting things (e.g. the AI voice thing) and it happened to take place four times in a week. She's a person, actually trying and thinking things, in addition to seeking attention and... we don't have as many people as we'd like.

Thank you for replying!

The reason I included that was so I didn't have to get into arguments about it or have people harp on it, not because I thought you actually needed it. The whole idea is to isolate different objections.

I believe I have seen him make multiple statements on Twitter over months that express this point of view, and I see his statements herein as reinforcing that, but in the interests of this not being distracting to the main point I am editing this line.

Also including an additional exchange from yesterday he had with Elon Musk. 

Mods, please reimport. 

EDIT: Also adding his response to Max Tegmark from yesterday, and EY's response to that, at the bottom, but raising the bar for further inclusions substantially. 

Includes this quote: But regardles…

I agree that "the worst that can happen is..." suggests an unreasonably low estimate of risk, and technically implies either zero threat or zero risk of human error.

That said, I think it's worth distinguishing the story "we will be able to see the threat and we will stop," from "there is no threat." The first story makes it clearer that there is actually broad support for measurement to detect risk and institutional structures that can slow down if risk is large.

It also feels like the key disagreement isn't about corporate law or arguments for risk…

Actually a bit stronger than that. My intention is that I will continue to update this post if Yann posts in-thread within the next few days, but to otherwise not cover Yann going forward unless it involves actual things happening, or his views updating. And please do call me out on this if I slip up, which I might.

If a lot of readers do that? Seems fine with me! Hell, if enough others find it sufficiently interesting I'll happily make that its own post.

Please do! I've been thinking a lot the past few weeks about how to build a mechanism for coordinated action; it would be great to hear your take on it.

Editing based on Google confirming and the 9 karma, but can anyone else confirm this for sure?

Matt Goldenberg (+4, 2mo):
Yes, that's correct.

How was this generated, I wonder, given that the article is several times the length of the context window (or at least the one I have available)?

(Note that I didn't find it useful or accurate or anything, but there are other things I'd be curious to try).

It's simply a summary of summaries when the context length is too long. This summary is likely especially bad because of not using the images and the fact that the post is not about a single topic.
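The "summary of summaries" approach described above can be sketched roughly as follows. This is a minimal sketch, not the actual implementation: `summarize` is a stand-in for whatever model call is really used (here it just truncates, to keep the example self-contained), and the chunk sizes and function names are illustrative.

```python
def summarize(text: str, max_len: int = 200) -> str:
    # Placeholder for a language-model summarization call.
    # Truncation preserves the recursive structure for demonstration.
    return text[:max_len]

def chunk(text: str, size: int) -> list[str]:
    # Split the text into consecutive pieces that each fit the context window.
    return [text[i:i + size] for i in range(0, len(text), size)]

def recursive_summary(text: str, context_limit: int = 1000) -> str:
    # If the text fits in one context window, summarize it directly.
    if len(text) <= context_limit:
        return summarize(text)
    # Otherwise summarize each chunk, then summarize the joined summaries,
    # recursing until everything fits in a single window.
    partials = [summarize(c) for c in chunk(text, context_limit)]
    return recursive_summary(" ".join(partials), context_limit)
```

As the reply notes, each recursion level discards detail, which is one reason such summaries degrade on long, multi-topic posts.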
  1. I see why one might think this is a mostly safe assumption, but it doesn't seem like one to me - it's kind of presuming some sort of common sense morality can be used as a check here, even under Goodhart conditions, and I don't think it would be that reliable in most doom cases? I'm trying to operationalize what this would mean in practice in a way a sufficiently strong AI wouldn't find a way around via convincing itself or us, or via indirect action, or similar, and I can't.  
  2. This implies that you think that if you win the grey area you know how to u…
Rohin Shah (+2, 2mo):
Yes, that all seems reasonable. I think I have still failed to communicate on (1). I'm not sure what the relevance of common sense morality is, and if a strong AI is thinking about finding a way to convince itself or us, that's already the situation I want to detect and stop. (Obviously it's not clear that I can detect it, but the claim here is just that the facts of the matter are present to be detected.) But probably it's not that worth going into detail here.

On (2), the theory of change is that you don't get into the red area, which I agree is equivalent to "the grey area solution would be sufficient". I'm not imagining pivotal acts here. The key point is that before you are in the red area, you can't appeal to "but the misaligned superintelligence would just defeat your technique via X" as a reason that the technique would fail.

Personally, I think it's (non-trivially) more likely that you don't get to the red areas the more you catch early examples of deception, which is why I like this theory of change. I expect vehement disagreement there, and I would really like to see arguments for this position that don't go via "but the misaligned superintelligence could do X". I've tried this previously (e.g. this discussion with Eliezer []) and haven't really been convinced. (Tbc, Eliezer did give arguments that don't go through "the misaligned superintelligence could do X"; I'm just not that convinced by them.)

I basically agree with (3), (4) and (5). I do expect I'm more optimistic than you about how useful or tractable each of those things are. As a result, I expect that given your beliefs, my plans would look to you like they are relying more on (3), (4), and (5) than would be ideal (in the sense that I expect you'd want to divert some effort to other things that we probably both agree are very hard and not that likely to work but look better on the margin to you relative to the things that would be…

Fair enough.

A Twitter list is literally: you create it (or use someone else's), and when you load it you get the tweets from the people on the list in reverse chronological order and nothing else (or you can use Tweetdeck). Doesn't seem to have traps.

What's the difference between the Google Doc and a Twitter List with those accounts on it?

I can see the weird border case where the $8 gets you invested in a bad way but $0 makes you consume in a good way, I guess, but it's weird. Mostly sounds like you very much agree on the danger of much worse than -$8.

Also, you say there are still a bunch of mistakes. Even if it's effectively too late to fix them for the post (almost all readers are within the first 48 hours), I'd like to at least fix my understanding. What else did I still get wrong, to the extent you know?

I don't know what a Twitter List is, but I wouldn't be at all surprised to see it containing some kind of trap to steer the user into a news feed. Social media/enforced addiction stuff is not only something that I avoid talking about publicly, but it's also an area where I personally must not change the probability of anyone blogging about it. I will get back to you on this once I've gone over more of your research, but what I was thinking of would have to be some kind of research contracting for Balsa, one that comes with notoriously difficult-to-hash-out assurances of not going public about specific domains of information.

Before GPT-4 I would have said I'd run that experiment and I definitely couldn't.

With it? Maybe.

Huh. It was much higher initially. I will fix.

Yes. His argument is that it holds against any particular risk, and here the risk is particular, or something. Scott Alexander's response is... less polite than mine, and emphasizes this point.

Just read that one this morning. Glad we have a handle for it now. Confusion, I dub thee, Tyler's Weird Uncertainty Argument, the Safe Uncertainty Fallacy! First pithy summarization: Safety =/= SUFty

Thank you. I will update the post once I read Wolfram's post and decide what I think about this new info.

In the future, please simply say 'this is wrong' rather than calling something like this misinformation, saying highly misleading with the bold highly, etc. 


EDIT: Mods this is ready for re-import, the Wordpress and Substack versions are updated.

Gerald Monroe (-2, 2mo):
See the OpenAI blog post. They say in the same post that they have made a custom model, as in they weight-updated GPT-4, so it can use plugins. It's near the bottom of the entry on plugins. Probably they also weight-updated it so it knows to use the browser and local interpreter plugins well, without needing to read the description. Since it disagrees with the authoritative source and is obviously technically wrong, I called it misinformation. I apologize, but you have so far failed to respond to most criticism and this was a pretty glaring error.

I don't see his arguments as being in good enough faith or logic to do that. Hanania I have tried to engage with, I don't see how to do that with Pinker. What would you say are the arguments he makes that are worth an actual response? 

(Note that I get the opposite pushback more often, people saying various forms of 'why are you feeding the trolls?')

I encourage people to use agree-disagree voting on Nathan's comment (I will abstain) on the question of whether I should engage more with Pinker. 

A shame - I see this at an agreement voting -3 a day later, which means I didn't do a good enough job.

Thus, I kindly request some combination of (A) will someone else take a shot and/or (B) what would I have to do to get it where it needs to go?


(Edit, it's now at +3? Hmm. We'll see if that holds.)

Cleo Nardo (+2, 2mo):
My guess is that the people voting "disagree" think that including the distillation in your general write-up is sufficient, and that you don't need to make the distillation its own post.

Thread for suggesting whether anything here should be its own post, either in current form or expanded form.

General request for feedback on my AI posts and how they could be improved, keeping in mind that LW is not the main demographic I am aiming for. 

Many twitter posts get deleted or are not visible due to privacy settings. Some solution for persistently archiving tweets as seen would be great. One possible realisation would be an in-browser script to turn a chunk of twitter into a static HTML file including all text and maybe the images, possibly auto-uploading to a server for hosting and then spitting out the corresponding link. Copyright could be pragmatically ignored via self-hosting: a single author hosting a few thousand tweets+context off a personal Amazon S3 bucket or similar isn't a litigation/takedown target. Storage/hosting costs aren't likely to be that bad given this is essentially static website hosting.
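The static-HTML half of the idea above could be sketched roughly like this, assuming the tweets have already been captured as simple dicts. The field names (`author`, `text`, `url`) are hypothetical conveniences for the sketch, not any real Twitter API schema, and a real version would also need to download and inline the images.

```python
import html

def tweets_to_html(tweets: list[dict], title: str = "Archived tweets") -> str:
    """Render captured tweets as one self-contained static HTML page.

    Each tweet is a dict with hypothetical keys 'author', 'text', 'url'.
    All user-supplied strings are escaped so tweet text can't inject markup.
    """
    parts = [
        "<!DOCTYPE html>",
        "<html><head><meta charset='utf-8'>"
        f"<title>{html.escape(title)}</title></head><body>",
    ]
    for t in tweets:
        parts.append("<blockquote>")
        parts.append(f"<p>{html.escape(t['text'])}</p>")
        parts.append(
            f"<footer>{html.escape(t['author'])} - "
            f"<a href='{html.escape(t['url'])}'>original</a></footer>"
        )
        parts.append("</blockquote>")
    parts.append("</body></html>")
    return "\n".join(parts)
```

The output is a single string suitable for writing to a file and dropping on any static host (an S3 bucket included), which is what keeps the hosting cost near zero.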
Lone Pine (+8, 2mo):
When you link to a Twitter thread, I wish you would link to the first post in the thread. It's already confusing enough to get context on Twitter, please don't make it harder for us.

Yeah, I quickly fixed this in original, I definitely flipped the sign reading the graph initially.

Mods can reimport, since I don't know the right way to fix the <img errors.

We were (checks notes) a few days early to the party on that.

You were paying more attention than me (I don't follow anyone who engages with him a lot, so I maybe saw one of his tweets a week). I knew of him as someone who had been right early about COVID, and I also saw him criticizing the media for some of the correct reasons, so I didn't write him off just because he was obnoxious and a crypto fanatic. The interest rate thing was therefore my Igon Value moment.

You can find my attempt at the Waluigi Effect mini-post at: 

I haven't posted it on its own yet, everyone please vote on whether this passes the quality threshold with agreement voting - if this is in the black I'll make it its own post. If you think it's not ready, appreciated if you explain why. 


Anyone up for writing software that can automate this browser process? Seems like it should be viable to write a program that checks all your autocompletes, you tell it what you want to change and then it fixes it via doing the thing a human would do?

Lone Pine (+3, 3mo):
Seems like we could use a browser/addon that gives the user more direct control over the autocomplete, rather than writing workarounds to hack bad software.
If this doesn't work (Chrome might refuse to cooperate) you can send an email to yourself containing all the links, and then bookmark the URL to that email. After using it for a week or so, you can have that email be second on the list of search suggestions (the correct word for the stuff in the Chrome search bar) whenever you type the first letter that takes you to gmail/other mail. You can also use alt + shift + backspace and ctrl + T, ctrl + V, enter to change that search suggestion to a new updated email with new links. Or, even easier, just make a bookmark to the email to yourself.

For many overdetermined reasons I highly recommend the Mac/Windows versions - the Android experience is worse even if you're using it responsibly. 


This is great. I notice I very much want a version that is aimed at someone with essentially no technical knowledge of AI and no prior experience with LW - and this seems like it's much better at that than par, but still not where I'd want it to be. Whether or not I manage to take a shot, I'm wondering if anyone else is willing to take a crack at that?

Jay Bailey (+1, 2mo):
If anyone writes this up I would love to know about it - my local AI safety group is going to be doing a reading + hackathon of this in three weeks, attempting to use the ideas on language models in practice. It would be nice to have this version for a couple of people who aren't experienced with AI who will be attending, though it's hardly gamebreaking for the event if we don't have this.
Alex K. Chen (+1, 2mo):
Can't GPT4 ELI5 this already?
Jonathan Weil (+2, 3mo):
I am not a million miles from that person. I have admittedly been consuming your posts on the subject rather obsessively over the past month or two, and following the links a lot, but have zero technical background and can’t really follow the mathematical notation. I still found it fascinating and think I “got it.”
Cleo Nardo (+2, 3mo):

(I fixed a few things, probably worth reimporting - confirming 3:55pm on a few new minor fixes)

Mods please reimport/fix: First Kate Hall tweet is supposed to be this one, accidentally went with a different one instead. Also there's some bug with how the questions get listed on the left-hand side.

Fixed the image.  The table of contents relies on some hacky heuristics that include checking for bolded text and I'm not totally sure what's going wrong there.
Note that in a later Tweet [] she said she was psychotic at the time. Edit: and also in this one [].

Note: This is being edited in real time in response to late feedback. You can see the most updated version on Substack while that's happening, I'll have this re-imported when the process is done, but overall levels of change are minor so far.

(Looks like it'll be stable for at least a bit.)

Oh, yeah, guess so. That's how much I don't know how this should be graded!

In fact, I read and enjoyed those - I don't mean to take anything away from them.  But I do note that "ideally X is a matching problem" is a framing that COULD be used for many topics, and I don't have a great explanation for why it'd be correct for universities and not for housing.

Being free to the student (although of course the French overall still pay taxes to fund the real costs) makes it less toxic, but it also means you have that much less excuse if you don't go. So my guess is this makes it maybe 25% less bad?

Thank you. I agree that the phrasing here wasn't as clear as it could be and I'll watch out for similar things in the future (I won't be fixing this instance because I don't generally edit posts after the first few days unless they are intended to be linked to a lot in the future, given how traffic works these days.) 

If it's still confusing, I meant: I would not say that making my parents proud is one of my main goals in life. I would expect [people I know] to mostly also not see this as one of their main goals. I think that the percentage of people a…

Thanks for the clarification! In fact, that opinion wasn't even one of the ones I had considered you might have.

I've been editing the post at the Substack version, but I don't see the option to switch back to the usual non-HTML editor on this one, so mods please reimport and let me know what I'm missing.


There is no requirement that the windows serve this purpose. Mine don't. I don't find this to be a reasonable response.

Many buildings are grandfathered. Egress windows have been mandatory since the eighties afaik.

I can speak a bit to what I have in mind to do. It's too early to speak too much about how I intend to get those particular things passed but am looking into it. 

I am studying the NEPA question, and will hopefully have posts in a month or two after Manchin's reforms are done trying to pass. There are a lot of options, both for marginal reforms and for radical reimagining. Right now, as far as I can tell, no one has designed an outcome-based replacement that someone could even consider (as opposed to process-based) and I am excited to get that papered over t…

Yes, there are lesser goals that I could hit with 90% probability. Note that in that comment, I was saying that 2% would make the project attractive, rather than saying I put our chances of success at 2%. And also that the bar there was set very high - getting a clear attributable major policy win. Which then got someone willing to take the YES side at 5% (Ross).

Our funding sources are not public, but I will say at this time we are not funded in any way by FTX or OP.

I am confident that the container stacking rules caused major damage when compared to better stacking rules. If we had a sensible stacking rule across LB/LA from the start I am confident there would have been far less backlog. 

What is less clear is the extent to which the rules changes that were enacted mitigated the problem. While LB made some changes on the day of the post, LA didn't act and LB's action wasn't complete. Thus there was some increase in permitted stacking but it was far from what one would have hoped for. And Elizabeth is right that we did not see a difference in port backlog that we can definitively link to the partial change that was enacted.

Good stuff. This is going to be the first work of fiction linked in a weekly post of mine.

Somewhat tempted to write the rationalfic version of this story, because these characters are missing all the fun.

Thank you. I would love to see that rationalfic take on the idea!

Naming things in politics is much more about not shooting yourself in the foot than anything else - you can't win that way but you can lose, and [plant] research is a standard option. Can always change later if we find something awesome. I learned from MetaMed that obsessing over finding the right name is not worth it, and this was (1) available (2) short (3) sounds nice and (4) is a good metaphorical plant.

A wood famous for being flimsy seems like a bad choice here.

Amelia Bedelia (-1, 8mo):
I'm sorry, this may come across as very rude, but: MetaMed, a startup both you and Vance were on, failed abjectly and then received precious little coverage or updating from the broader rat community (as far as I've seen). I am happy to believe your skills have improved or that the cause area is better (though this one is so nebulously ambitious that I can't help but feel a cold churn of pessimism). Certainly, demanding that every project a person attempts must meet with success is too high a bar. But this time I would like to see you and your cofounders hold yourselves accountable to keep the communities funding you informed. In practice what I'd want is a legible goalset with predictions on whether each will be met on some future date.

This is a full-energy, top-priority effort.

I will continue the blog as part of that effort, it is the reason I am in position to be able to do this, and I will continue to attend to other commitments because life is complicated, but the effective barrier is 'I can only do so much in a week on this type of stuff no matter what anyway.'

Oh, yeah, I forgot to edit this copy, it's fixed now.

I much prefer to create a post that defines the jargon, you can then link to it as needed. I keep meaning to make a glossary page when I have time, as another tool.

Nathan Helm-Burger (+1, 9mo):
Yeah, I think a short post just with definitions would work great for that. If you just link a post that has the term buried in an even longer article than the one you are linking from, I think that works less well.
LessWrong tags are also a great place to define jargon.

Scott Alexander asked things related to this, but it still seems worth being more explicit about what this perfect 1.69-loss model would be like in practice if we got there.

The correct answer is the annoyingly trivial one: "it would be the best possible model of this type, at the task of language modeling on data sampled from the same distribution as MassiveText."

How good is that, though? Well, it depends entirely on how good you think transformer LMs are capable of being, in principle. If you're Gary Marcus and you think transformer LMs will always suck in some ways, then you think the 1.69 model will also suck in those ways. Whereas, if you think a perfect transformer LM would be an AGI (even if only trained on MassiveText-like data), then you think the 1.69 model would be an AGI. Both of these people are right, conditional on their other beliefs.

The key distinction here is that "1.69 loss" may not be the best achievable loss on this dataset. It's just an estimate of the best loss achievable by this kind of model. The question "what would a model be like, if it got the best achievable loss, period?" is more interesting, but nothing in this post or these papers really touches on it.

Not entirely, but basically yes.

OK, that makes a lot more sense. I parse "utilitarianism" as "total order over lotteries of world states" so your disagreement with "utilitarianism" threw me for a loop.