Leaving an unaligned force (humans, here) in control of 0.001% of resources seems risky. There is a chance that you've underestimated how large the share of resources controlled by the unaligned force is, and probably more importantly, there is a chance that the unaligned force could use its tiny share of resources in some super-effective way that captures a much higher fraction of resources in the future. The actual effect on the economy of the unaligned force, other than the possibility of its being larger than thought or being used as a springboard to gain more control, seems negligible, so one should still expect full extermination unless there's some positive reason for the strong force to leave the weak force intact.

Humans do have such reasons in some cases (we like seeing animals, at least in zoos, and being able to study them, etc.; same thing for the Amish; plus we also at least sometimes place real value on the independence and self-determination of such beings and cultures), but there would need to be an argument made that AI will have such positive reasons (and a further argument why the AIs wouldn't just "put whatever humans they wanted to preserve" in "zoos", if one thinks that being in a zoo isn't a great future). Otherwise, exterminating humans would be trivially easy with a power gap that large. Even if there are multiple ASIs that aren't fully aligned with one another, offense is probably easier than defense; if one AI perceives weak benefits to keeping humans around, but another AI perceives weak benefits to exterminating us, I'd assume we get exterminated and then the 2nd AI pays some trivial amount to the 1st for the inconvenience. Getting AI to strongly care about keeping humans around is, of course, one way to frame the alignment problem. I haven't seen an argument that this will happen by default or that we have any idea how to do it; this seems more like an attempt to say it isn't necessary.


Ah, okay, some of those seem to me like they'd change things quite a lot. In particular, a week's notice is usually possible for major plans (going out of town, a birthday or anniversary, concert that night only, etc.) and being able to skip books that don't interest one also removes a major class of reason not to go. The ones I can still see are (1) competing in-town plans, (2) illness or other personal emergency, and (3) just don't feel like going out tonight. (1) is what you're trying to avoid, of course. On (3) I can see your opinion going either way. It does legitimately happen sometimes that one is too tired for whatever plans one had to seem appealing, but it's legitimate to say that if that happens to you so often that you mind the cost of the extra rounds of drinks you end up buying, maybe you're not a great member for that club. (2) seems like a real problem, and I'm gonna guess that you actually wouldn't make people pay for drinks if they said they missed because they had COVID, there was a death in the family, etc.?


Reads like a ha ha only serious to me anyway.


I started a book club in February 2023 and since the beginning I pushed for the rule that if you don't come, you pay for everyone's drinks next time.

I'm very surprised that in that particular form that worked, because the extremely obvious way to postpone (or, in the end, avoid) the penalty is to not go next time either (or, in the end, ever again). I guess if there's agreement that pretty close to 100% attendance is the norm, as in if you can only show up 60% of the time don't bother showing up at all, then it could work. That would make sense for something like a D&D or other tabletop RPG session, or certain forms of competition like, I dunno, a table tennis league, where someone being absent even one time really does cause quite significant harm to the event. But it eliminates a chunk of the possible attendees entirely right from the start, and I imagine would make the members feel quite constrained by the club, particularly if it doesn't appear to be really required by the event itself. And those don't seem good for getting people to show up, either.

That's not to say the analogy overall doesn't work. I'd imagine requiring people to buy a ticket to go to poker night, with that ticket also covering the night's first ante / blind, does work to increase attendance, and for the reasons you state (and not just people being foolish about "sunk costs"). It's just payment of the penalty after the fact, and presumably with no real enforcement, that I don't get. And if you say it works for your book club, I guess probably it does and I'm wrong somehow. But in any case, I notice that I am confused.


I think this is a very important distinction. I prefer to use "maximizer" for "timelessly" finding the highest value of an objective function, and reserve "optimizer" for the kind of stepwise improvement discussed in this post. As I use the terms, to maximize something is to find the state with the highest value, but to optimize it is to take an initial state and find a new state with a higher value. I recognize that "optimize" and "optimizer" are sometimes used the way you're saying, as basically synonymous with "maximize" / "maximizer", and I could retreat to calling the inherently temporal thing I'm talking about an "improver" (or an "improvement process" if I don't want to reify it), but this actually seems less likely to be quickly understood, and I don't think it's all that useful for "optimize" and "maximize" to mean exactly the same thing.

(As I use the term, and as this post does (although I think the value should be graded rather than binary), there is a subset of optimizers that reach the maximum in the limit, and a subset of those that even reach it in a finite number of steps. But optimizers that e.g. get stuck in local maxima aren't IMO thereby not actually optimizers, even though they aren't maximizers in any useful sense.)
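
To make the distinction concrete, here is a minimal Python sketch, assuming a toy objective over the integers 0 through 10. The names `maximize`, `optimize_step`, `climb`, and the `VALUES` table are all hypothetical, made up for illustration; nothing here comes from the post itself.

```python
from typing import Callable, Iterable, List

# Hypothetical toy objective: a local maximum at state 2, the global maximum at state 8.
VALUES: List[int] = [0, 1, 3, 2, 1, 0, 2, 5, 9, 4, 1]

def objective(state: int) -> int:
    return VALUES[state]

def neighbors(state: int) -> List[int]:
    # States reachable in one step: the adjacent integers that are in range.
    return [s for s in (state - 1, state + 1) if 0 <= s < len(VALUES)]

def maximize(f: Callable[[int], int], states: Iterable[int]) -> int:
    # "Timeless" maximizer: returns the best state, with no notion of a starting point.
    return max(states, key=f)

def optimize_step(f: Callable[[int], int], state: int) -> int:
    # One step of improvement: move to the best neighbor only if it is strictly better.
    best = max(neighbors(state), key=f)
    return best if f(best) > f(state) else state

def climb(f: Callable[[int], int], state: int) -> int:
    # Repeat improvement steps until no neighbor is better (plain hill climbing).
    while True:
        nxt = optimize_step(f, state)
        if nxt == state:
            return state
        state = nxt

global_best = maximize(objective, range(len(VALUES)))
from_left = climb(objective, 0)   # gets stuck at the local maximum
from_right = climb(objective, 6)  # happens to reach the global maximum
print(global_best, from_left, from_right)
```

Starting from state 0, the climber improves at every step it takes and still never reaches the state the maximizer returns, which is the sense in which I'd call it an optimizer but not a maximizer.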


Good post; this has way more value per minute spent reading and understanding it than the first 6 chapters of Jaynes, IMO.

There were 20 destroyed walls and 37 intact walls, leading to 10 − 3×20 − 1×37 = 13db

This appears to have an error; 10 − 3×20 − 1×37 = 10 - 60 - 37 = -87, not 13. I think you meant for the 37 to be positive, in which case 10 - 60 + 37 = -13, and the sign is reversed because of how you phrased which hypothesis the evidence favors (although you could also just reverse all the signs if you want the arithmetic to come out perfectly).

Also, nitpick, but

and every 3 db of evidence increases the odds by a factor of 2

should have an "about" in it, since 10^(3/10) is ~1.99526231497, not 2. (3db ≈ 2× is a very useful approximation, and implied by 10^3 ≈ 2^10, but encountering it indirectly like this would be very confusing to anyone who isn't already familiar with it.)
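
For anyone who wants to check both points, a short Python sketch. The helper name `db_to_odds_factor` is my own, and it assumes the usual convention that 10 db of evidence is a factor of 10 in the odds.

```python
def db_to_odds_factor(db: float) -> float:
    """Convert decibels of evidence into a multiplicative factor on the odds."""
    return 10 ** (db / 10)

# The quoted sum for 20 destroyed walls and 37 intact walls, read two ways.
as_written = 10 - 3 * 20 - 1 * 37  # both terms subtracted, as in the post
sign_fixed = 10 - 3 * 20 + 1 * 37  # intact walls counted as positive evidence

# 3 db is close to, but not exactly, a factor of 2.
factor_3db = db_to_odds_factor(3)

print(as_written, sign_fixed, factor_3db)
print(10 ** 3, 2 ** 10)  # 1000 vs 1024, the source of the 3 db ≈ 2x rule
```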


I re-read this, and wanted to strong-upvote it, and was disappointed that I already had. This is REALLY good. Way better than the thing it parodies (which was already quite good). I wish it were 10x as long.


The way that LLM tokenization represents numbers is all kinds of stupid. It's honestly kind of amazing to me they don't make even more arithmetic errors. Of course, an LLM can use a calculator just fine, and this is an extremely obvious way to enhance its general intelligence. I believe "give the LLM a calculator" is in fact being used, in some cases, but either the LLM or some shell around it has to decide when to use the calculator and how to use the calculator's result. That apparently didn't happen or didn't work properly in this case.


Thanks for your reply. "70% confidence that... we have a shot" is slightly ambiguous - I'd say that most shots one has are missed, but I'm guessing that isn't what you meant, and that you instead meant 70% chance of success.

70% feels way too high to me, but I do find it quite plausible that calling it a rounding error is wrong. However, with a 20-year timeline, a lot of people I care about will almost definitely still die who could have lived if death were Solved, a group that with very much non-negligible probability includes me. And as you note downthread, the brain is a really deep problem with prosaic life extension. Overall I don't see how anything along these lines can be fast enough and certain enough to be a crux on AI for me, but I'm glad people are working on it more than is immediately apparent to the casual observer. (I'm a type 1 diabetic and would have died at 8 years old if I'd lived before insulin was discovered and made medically available, so the value of prosaic life extension is very much not lost on me.)


P.S. Having this set of values and beliefs is very hard on one's epistemics. I think it's a writ-large version of what Eliezer has stated as "thinking about AI timelines is bad for one's epistemics". Here are some examples:

(1) Although I've never been at all tempted by e/acc techno-optimism (on this topic specifically) / alignment isn't a problem at all / alignment by default, boy, it sure would be nice to hear about a strategy for alignment that didn't sound almost definitely doomed for one reason or another. Even though Eliezer can (accurately, IMO) shoot down a couple of new alignment strategies before getting out of bed in the morning. So far I've never found myself actually doing it, but it's impossible not to notice that if I just weren't as good at finding problems or as willing to acknowledge problems found by others, then some alignment strategies I've seen might have looked non-doomed, at least at first...

(2) I don't expect any kind of deliberate slowdown of making AGI to be all that effective even on its own terms, with the single exception of indiscriminate "tear it all down", which I think is unlikely to get within the Overton window, at least in a robust way that would stop development even in countries that don't agree (forcing someone to sabotage / invade / bomb them). Although such actions might buy us a few years, it seems overdetermined to me that they still leave us doomed, and in fact they appear to cut away some of the actually-helpful options that might otherwise be available (the current crop of companies attempting to develop AGI definitely aren't the least concerned with existential risk of all actors who'd develop AGI if they could, for one thing). Compute thresholds of any kind, in particular, I expect to lead to much greater focus on doing more with the same compute resources rather than doing more by using more compute resources, and I expect there's a lot of low-hanging fruit there since that isn't where people have been focusing, and that the thresholds would need to decrease very much very fast to actually prevent AGI, and decreasing the thresholds below the power of a 2023 gaming rig is untenable. I'm not aware of any place in this argument where I'm allowing "if deliberate slowdowns were effective on their own terms, I'd still consider the result very bad" to bias my judgment. But is it? I can't really prove it isn't...

(3) The "pivotal act" framing seems unhelpful to me. It seems strongly impossible to me for humans to make an AI that's able to pass strawberry alignment that has so little understanding of agency that it couldn't, if it wanted to, seize control of the world. (That kind of AI is probably logically possible, but I don't think humans have any real possibility of building one.) An AI that can't even pass strawberry alignment clearly can't be safely handed "melt all the GPUs" or any other task that requires strongly superhuman capabilities (and if "melt all the GPUs" were a good idea, and it didn't require strongly superhuman capabilities, then people should just directly do that). So, it seems to me that the only good result that could come from aiming for a pivotal act would be that the ASI you're using to execute it is actually aligned with humans and "goes rogue" to implement our glorious transhuman future; and it seems to me that if that's what you want, it would be better to aim for that directly rather than trying to fit it through this weirdly-shaped "pivotal act" hole.

But... if this is wrong, and a narrow AGI could safely do a pivotal act, I'd very likely consider the resulting world very bad anyway, because we'd be in a world where unaligned ASI has been reliably prevented from coming into existence, and if the way that was done wasn't by already having aligned ASI, then by far the obvious way for that to happen is to reliably prevent any ASI from coming into existence. But IMO we need aligned ASI to solve death. Does any of that affect how compelling I find the case for narrow pivotal-act AI on its own terms? Who knows...
