Garrett Baker

I have signed no contracts or agreements whose existence I cannot mention.

They thought they found in numbers, more than in fire, earth, or water, many resemblances to things which are and become; thus such and such an attribute of numbers is justice, another is soul and mind, another is opportunity, and so on; and again they saw in numbers the attributes and ratios of the musical scales. Since, then, all other things seemed in their whole nature to be assimilated to numbers, while numbers seemed to be the first things in the whole of nature, they supposed the elements of numbers to be the elements of all things, and the whole heaven to be a musical scale and a number.

Metaph. A. 5, 985 b 27–986 a 2.

Comments

JDP Reviews IABIED
Garrett Baker · 8h

It's the difference between outer and inner alignment. The former makes the argument that it is possible for some intelligent optimizer to be misaligned with humans, and likely for "alien gods" such as evolution or your proposed AGI. It's an argument about outer alignment not being trivial, and it analogizes evolution to the AGI itself. Here is a typical example:

Why is Nature cruel? You, a human, can look at an Ichneumon wasp, and decide that it's cruel to eat your prey alive. You can decide that if you're going to eat your prey alive, you can at least have the decency to stop it from hurting. It would scarcely cost the wasp anything to anesthetize its prey as well as paralyze it. Or what about old elephants, who die of starvation when their last set of teeth fall out? These elephants aren't going to reproduce anyway. What would it cost evolution—the evolution of elephants, rather—to ensure that the elephant dies right away, instead of slowly and in agony? What would it cost evolution to anesthetize the elephant, or give it pleasant dreams before it dies? Nothing; that elephant won't reproduce more or less either way.

If you were talking to a fellow human, trying to resolve a conflict of interest, you would be in a good negotiating position—would have an easy job of persuasion. It would cost so little to anesthetize the prey, to let the elephant die without agony! Oh please, won't you do it, kindly... um...

There's no one to argue with.

Human beings fake their justifications, figure out what they want using one method, and then justify it using another method. There's no Evolution of Elephants Fairy that's trying to (a) figure out what's best for elephants, and then (b) figure out how to justify it to the Evolutionary Overseer, who (c) doesn't want to see reproductive fitness decreased, but is (d) willing to go along with the painless-death idea, so long as it doesn't actually harm any genes.

There's no advocate for the elephants anywhere in the system.

The latter analogizes evolution to the training process of your AGI. It doesn't focus on the perfectly reasonable (for evolution) & optimal decisions your optimization criteria will make; it focuses on the staggering weirdness that happens to the organisms evolution creates outside their ancestral environment. Like humans' taste for ice cream over "salted and honeyed raw bear fat". This is not evolution coldly finding the most optimal genes for self-propagation; this is evolution going with the first "idea" it has which is marginally more fit in the ancestral environment, and then ultimately, for no reason justified by inclusive genetic fitness, creating AGIs which don't care a lick about inclusive genetic fitness.

That is, an iterative process which selects based on some criterion, and arrives at an AGI, need not also produce an AGI which itself optimizes that criterion outside the training/ancestral environment.
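As a toy sketch of that point (an illustration of my own, not anything from the book or the review): a selection process that only ever scores candidates on a narrow training distribution cannot distinguish a policy that genuinely optimizes the criterion from one that merely coincides with it there, so whichever the process happens to find first wins, and its off-distribution behavior is left unconstrained.

```python
# Toy illustration only: "training" selection on a narrow distribution does not
# pin down what the selected policy does outside that distribution.

def train_score(policy):
    # "Ancestral environment": inputs 0..9. The intended criterion is to
    # output the input itself.
    return sum(1 for x in range(10) if policy(x) == x)

candidates = [
    lambda x: x % 10,   # proxy: coincides with the criterion only on 0..9
    lambda x: x,        # genuinely computes the criterion everywhere
    lambda x: 0,        # clearly unfit
]

# Selection sees only training scores; the first two are indistinguishable,
# and whichever the process stumbles on first is the one it keeps.
best = max(candidates, key=train_score)

print(train_score(best))  # 10: perfect on the training distribution
print(best(37))           # 7, not 37: the criterion does not transfer
```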

JDP Reviews IABIED
Garrett Baker · 16h

don't reflect the authors' tendency to update, seeing as they've been using evolutionary metaphors since the beginning

This seems locally invalid. Eliezer at least has definitely used evolution in different ways and to make different points over the years. Originally he used the "alien god" analogy to show that optimization processes do not lead to niceness in general (in particular, no chaos or unpredictability required); now they use evolution for an "inner alignment is hard" analogy, mainly arguing that a big problem is that objective functions do not constrain generalization behavior enough to be useful for AGI alignment, and that therefore the goals of your system will be very chaotic.

I think this definitely constitutes an update; "inner alignment" concerns were not a thing in 2008.

Jacob_Hilton's Shortform
Garrett Baker · 19h

Noticeable speed-up of applied sciences: it's not clear that such a dramatic speed-up in the formal sciences would have that dramatic consequences for the rest of the world, given how abstract much of it is. Cryptography, formal verification and programming languages might be the most consequential areas, followed by areas like experimental physics and computational chemistry. However, in most of the experimental sciences, formal results are not the main bottleneck, so speed-ups would be more dependent on progress on coding, fuzzier tasks, robotics, and so on. Math-heavy theoretical AI alignment research would be significantly sped up, but may still face philosophical hurdles.

It's not clear that the primary bottleneck in the formal sciences is proofs either. I get the impression from some mathematicians that the big bottlenecks are in new ideas, i.e. in the "what do you want to prove?" question rather than the "can you prove x?" question. That seems much less formally verifiable.

Shortform
Garrett Baker · 1d

the primary purpose is awareness among policymakers and the public

If the success or failure of current techniques provides no evidence about future AI, then isn't this dishonest? Maybe we are ok with dishonesty here, but if you are right, then this is bound to backfire.

the evals are useful information to the safety community

What use do the evals have for the safety community, from a research perspective? If they are mostly junk, then publishing them would seem more misleading than anything, given the number of people who think they ought to be trusted.

Vivek Hebbar's Shortform
Garrett Baker · 2d

This seems less likely the harder the problem is, and therefore the more the AI needs to use its general intelligence or agency to pursue it; those are often the sorts of tasks we're most scared about the AI doing surprisingly well on.

I agree this argument suggests we will have a good understanding of the simpler capabilities the model has, like which facts about biology it knows, and that may end up being useful anyway.

More Was Possible: A Review of IABIED
Garrett Baker · 2d

Note that this is discussed in their supplemental materials. In particular, in line with your last paragraph:

Thresholds don’t matter all that much, in the end, to the argument that if anyone builds artificial superintelligence then everyone dies. Our arguments don’t require that some AI figures out how to recursively self-improve and then becomes superintelligent with unprecedented speed. That could happen, and we think it’s decently likely that it will happen, but it doesn’t matter to the claim that AI is on track to kill us all.

All that our arguments require is that AIs will keep on getting better and better at predicting and steering the world, until they surpass us. It doesn’t matter much whether that happens quickly or slowly.

The relevance of threshold effects is that they increase the importance of humanity reacting to the threat soon. We don’t have the luxury of waiting until the AI is a little better than every human at every mental task, because by that point, there might not be very much time left at all. That would be like looking at early hominids making fire, yawning, and saying, “Wake me up when they’re halfway to the moon.”

It took hominids millions of years to travel halfway to the moon, and two days to complete the rest of the journey. When there might be thresholds involved, you have to pay attention before things get visibly out of hand, because by that point, it may well be too late.

Shortform
Garrett Baker · 2d

Doesn't having multiple layers of protection seem better to you? Having it be so that the AI would more likely naturally conclude we won't read its scratchpad, and modifying its beliefs in this way, seems better than not.

You have also recently argued modern safety research is "shooting with rubber bullets", so what are we getting in return by breaking such promises now? If it's just practice, there's no reason to put the results online.

Shortform
Garrett Baker · 2d

Who (besides yourself) has this position? I feel like believing the safety research we do now is bullshit is highly correlated with thinking it's also useless and that we should do something else.

MakoYass's Shortform
Garrett Baker · 2d

the property electrons have that you observe within yourself and want to call "conscious"-as-in-hard-problem-why-is-there-any-perspective is, imo, simply "exists". existence is perspective-bearing. in other words, in my view, the hard problem is just the localitypilled version of "why is there something rather than nothing?"

This actually leads into why I feel drawn to Tegmark's mathematical universe. It seems that regardless of whether or not my electrons are tagged with the "exists" XML tag, I would have no way of knowing that fact, and would think the same thoughts regardless. So I suspect this word gets dissolved as we learn more philosophy, such that we end up saying stuff like "yeah actually everything exists" or "well no, nothing exists", and then derive our UDASSA without reference to "existence" as a primitive.

MakoYass's Shortform
Garrett Baker · 3d

How does electrons having the property "conscious", but otherwise continuing to obey Maxwell's equations, translate into me saying "I am conscious"?

Or more generally, how does any lump of matter, having the property “conscious” but otherwise continuing to obey unchanged physical laws, end up uttering the words “I am conscious”?

Posts

What and Why: Developmental Interpretability of Reinforcement Learning (1y)
On Complexity Science (1y)
So You Created a Sociopath - New Book Announcement! (1y)
Announcing Suffering For Good (1y)
Neuroscience and Alignment (2y)
Epoch wise critical periods, and singular learning theory (2y)
A bet on critical periods in neural networks (2y)
When and why should you use the Kelly criterion? (2y)
Singular learning theory and bridging from ML to brain emulations (2y)
My hopes for alignment: Singular learning theory and whole brain emulation (2y)