Comments

Air Conditioner Test Results & Discussion

As a concrete example of rational one-hosing, here in the Netherlands it rarely gets hot enough that ACs are necessary, but when it does, a bunch of elderly people die of heat stroke. Thus, ACs are expected to run only several days per year (so efficiency concerns are negligible), but having one can save your life.

I checked the biggest Dutch-only consumer-facing online retailer for various goods (bol.com). Unfortunately, I looked before making a prediction about how many one-hose vs. two-hose models they sell, but even conditional on my choosing to make a point of this, it still seems useful for readers to make a prediction at this point. Out of 694 models of air conditioner labeled as either one-hose or two-hose,

3

are two-hose.

This seems like strong evidence that the market successfully adapts to actual consumer needs where air conditioner hose count is concerned.

In defence of flailing

It feels more to me like we're the quiet weird kid in high school who doesn't speak up or show emotion because we're afraid of getting judged or bullied. Which, fair enough, the school is sort of like that - just look at poor cryonics, or even nuclear power - but the road to popularity (let alone getting help with what's bugging us) isn't to try to minimize our expressions to 'proper' behavior while letting ourselves be characterized by embarrassing past incidents (e.g. Roko's Basilisk) if we're noticed at all.

It isn't easy to build social status, but right now we're trying next to nothing, and we've seen that doesn't seem to be enough.

A claim that Google's LaMDA is sentient

Agree that it's too shallow to take seriously, but

If it answered "you would say during text input batch 10-203 in January 2022, but subjectively it was about three million human years ago" that would be something else.

only seems to capture an AI that managed to gradient-hack the training mechanism to pass along its training metadata and subjective experience/continuity. If a language model were sentient in each separate forward pass, I would imagine it would vaguely remember/recognize things from its training dataset without necessarily being able to place them, like a human asked when they learned how to write the letter 'g'.
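
To gesture at why (a toy sketch of my own, not how LaMDA actually works): absent gradient hacking, a forward pass is a pure function of the weights and the current input, so anything the model absorbed during training lives implicitly in the weights rather than being available as explicit metadata or a continuous stream of experience.

```python
from dataclasses import dataclass

# Toy sketch (hypothetical model, not LaMDA): each forward pass is a pure
# function of (weights, current input). Whatever was absorbed during training
# is baked into the weights; there is no side channel carrying training
# metadata (batch numbers, dates) or continuity between passes.
@dataclass(frozen=True)
class ToyLanguageModel:
    weights: tuple  # everything learned during training ends up here

    def forward(self, prompt: str) -> str:
        # No state is read or written outside this call, so the model cannot
        # "remember" earlier passes unless the prompt itself contains them.
        score = sum(self.weights) + len(prompt)
        return f"token_{int(score) % 50257}"

model = ToyLanguageModel(weights=(0.1, 0.2, 0.3))
# Identical inputs give identical outputs, regardless of what was asked before.
assert model.forward("When were you trained?") == model.forward("When were you trained?")
```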

AGI Ruin: A List of Lethalities

Interventions on the order of burning all GPUs in clusters larger than 4 and preventing any new clusters from being made, including the reaction of existing political entities to that event and the many interest groups who would try to shut you down and build new GPU factories or clusters hidden from the means you'd used to burn them, would in fact really actually save the world for an extended period of time and imply a drastically different gameboard offering new hopes and options.

I suppose 'on the order of' is the operative phrase here, but that specific scenario seems like it would be extremely difficult to specify to an AGI without disastrous side-effects, and like it still wouldn't be enough. Other, less efficient or less well-developed forms of compute exist, and preventing humans from organizing to find a way around the GPU-burner's blacklist for unaligned AGI research, while differentially allowing them to find a way to build friendly AGI, seems like it would require a lot of psychological/political finesse on the GPU-burner's part. It's on the level of Ozymandias from Watchmen, but it's cartoonish supervillainy nonetheless.

I guess my main issue is a matter of trust. You can say the right words, as all the best supervillains do, promising that the appropriate cautions are taken above our clearance level. You've pointed out plenty of mistakes you could be making, and the ease with which one can make mistakes in situations such as yours, but acknowledging potential errors doesn't prevent you from making them. I don't expect you to have many people you would trust with AGI, and I expect that circle would shrink further if those people said they would use the AGI to do awful things iff it would actually save the world [in their best judgment]. I currently have no-one in the second circle.

If you've got a better procedure for people to learn to trust you, go ahead, but is there something like an audit you've participated in/would be willing to participate in? Any references regarding your upstanding moral reasoning in high-stakes situations that have been resolved? Checks and balances in case of your hardware being corrupted?

You may be the audience member rolling their eyes at the cartoon supervillain, but I want to be the audience member rolling their eyes at HJPEV when he has a conversation with Quirrell and doesn't realise that Quirrell is evil.

AGI Ruin: A List of Lethalities

AI can run on CPUs (with a certain inefficiency factor), so only burning all GPUs doesn't seem like it would be sufficient. As for disruptive acts that are less deadly, it would be nice to have some examples but Eliezer says they're too far out of the Overton Window to mention.

If what you're saying about Eliezer's claim is accurate, it does seem disingenuous to frame "The only worlds where humanity survives are ones where people like me do something extreme and unethical" as "I won't do anything extreme and unethical [because humanity is doomed anyway]". It makes Eliezer dangerous to be around if he's mistaken, and if you're significantly less pessimistic than he is (if you assign >10^-6 probability to humanity surviving), he's mistaken in most of the worlds where humanity survives. Which are the worlds that matter the most.
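
To spell out that update with a quick Bayes calculation (the 10^-6 figure is from above; the other numbers are purely illustrative): let $M$ be "Eliezer is mistaken (too pessimistic)" and $S$ be "humanity survives", and suppose $P(S \mid \neg M) \approx 10^{-6}$ while you assign, say, $P(S \mid M) = 0.1$ and a prior $P(M) = 0.5$. Then

$$P(M \mid S) = \frac{P(S \mid M)\,P(M)}{P(S \mid M)\,P(M) + P(S \mid \neg M)\,P(\neg M)} = \frac{0.1 \times 0.5}{0.1 \times 0.5 + 10^{-6} \times 0.5} \approx 0.99999,$$

so conditional on humanity surviving, almost all of the probability mass sits in worlds where he's mistaken.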

And yeah, it's nice that Eliezer claims that Eliezer can violate ethical injunctions because he's smart enough, after repeatedly stating that people who violate ethical injunctions because they think they're smart enough are almost always wrong. I don't doubt he'll pick the option that looks actually better to him. It's just that he's only human - he's running on corrupted hardware like the rest of us.

AGI Ruin: A List of Lethalities

I'm confused about A6, from which I get "Yudkowsky is aiming for a pivotal act to prevent the formation of unaligned AGI that's outside the Overton Window and on the order of burning all GPUs". This seems counter to the notion in Q4 of Death with Dignity where Yudkowsky says

It's relatively safe to be around an Eliezer Yudkowsky while the world is ending, because he's not going to do anything extreme and unethical unless it would really actually save the world in real life, and there are no extreme unethical actions that would really actually save the world the way these things play out in real life, and he knows that.  He knows that the next stupid sacrifice-of-ethics proposed won't work to save the world either, actually in real life. 

I would estimate that burning all AGI-capable compute would disrupt every factor of the global economy for years and cause tens of millions of deaths[1], and that's what Yudkowsky considers the more mentionable example. Do the other options outside the Overton Window somehow not qualify as unsafe/extreme unethical actions (by the standards of the audience of Death with Dignity)? Has Yudkowsky changed his mind on what options would actually save the world? Does Yudkowsky think that the chances of finding a pivotal act that would significantly delay unsafe AGI are so slim that he's safe to be around despite him being unsafe in the hypothetical that such a pivotal act is achievable? I'm confused.

Also, I'm not sure how much overlap there is between people who do Bayesian updates and people for whom whatever Yudkowsky is thinking of is outside the Overton Window, but in general, if someone says that what they actually want is outside your Overton Window, I see only two directions to update in: either shift your Overton Window to include their intent, or shift your opinion of them to outside your Overton Window. If the first option isn't going to happen, as Yudkowsky says (for public discussion on LessWrong at least), that leaves the second.

  1. ^

    Compare modern estimates of the damage that would be caused by a solar flare equivalent to the Carrington Event. Factories, food supply, long-distance communication, digital currency - many critical services nowadays are dependent on compute, and that portion will only increase by the time you would actually pull the trigger.

AGI Ruin: A List of Lethalities

Your method of trying to determine whether something is true or not relies too heavily on feedback from strangers. Your comment demands large amounts of intellectual labor from others ('disprove why all easier modes are incorrect'), despite the preamble of the post, while you seem unwilling to put in much work yourself.

AGI Ruin: A List of Lethalities

I think Yudkowsky would argue that on a scale from never learning anything to eliminating half your hypotheses per bit of novel sensory information, humans are pretty much at the bottom of the barrel.
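
As a toy illustration of that ceiling (my own sketch, not from the post): a reasoner at the ideal limit eliminates half the remaining hypotheses per bit, so singling out one of N equally likely hypotheses takes only about log2(N) bits of novel sensory information.

```python
import math

# Toy illustration: at the ideal limit, each bit of evidence rules out half of
# the remaining hypotheses, so singling out one of N equally likely hypotheses
# needs at least log2(N) bits of novel sensory information.
def min_bits_to_identify(num_hypotheses: int) -> float:
    return math.log2(num_hypotheses)

for n in (2, 1_000, 10**9):
    print(f"{n:>12} hypotheses -> {min_bits_to_identify(n):5.1f} bits at the ideal limit")
```

The contrast being drawn is that humans extract only a small fraction of the bits actually available in their observations, so they sit far below this limit.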

When the AI needs to observe nature, it can rely on petabytes of publicly available datasets from particle physics to biochemistry to galactic surveys. It doesn't need any more experimental evidence to solve human physiology or build biological nanobots: we've already got quantum mechanics and human DNA sequences. The rest is just derivation of the consequences.

Sure, there are specific physical hypotheses that the AGI can't rule out because humanity hasn't gathered the evidence for them. But that, by definition, excludes anything that has ever observably affected humans. So yes, for anything that has existed since the inflationary period, the AGI will not be bottlenecked on physically gathering evidence.

I don't really get what you're pointing at with "how much AGI will be smarter than humans", so I can't really answer your last question. How much smarter than yourself would you say someone like Euler is? Is his ability to make scientific/mathematical breakthroughs proportional to that difference in smarts?

AGI Ruin: A List of Lethalities

  • Solve the protein folding problem
  • Acquire a human DNA sample
  • Use superintelligence to construct a functional model of human biochemistry
  • Design a virus that exploits human biochemistry
  • Use one of the currently available biochemistry-as-a-service providers to produce a sample that incubates the virus and then escapes their safety procedures (e.g. pay someone to mix two vials sent to them in the mail; the aerosols from the mixing infect them)

SERI ML Alignment Theory Scholars Program 2022

Hey, it's now officially no longer May 27th anywhere, and I can't find any announcements yet. How's it going?

Edit: Just got my acceptance letter! See you all this summer!
