Wiki Contributions


Hey, thanks for the kind response! I agree that this analysis is mostly focused on arguing against the “imminent certain doom” model of AI risk, and that longer term dynamics are much harder to predict. I think I’ll jump straight to addressing your core point here:

Something smarter than you will wind up doing whatever it wants. If it wants something even a little different than you want, you're not going to get your way. If it doesn't care about you even a little, and it continues to become more capable faster than you do, you'll cease being useful and will ultimately wind up dead. Whether you were eliminated because you were deemed dangerous, or simply outcompeted doesn't matter. It could take a long time, but if you miss the window of having control over the situation, you'll still wind up dead.

I think this a good argument, and well written, but I don’t really agree with it. 

The first objection is to the idea that victory by a smarter party is inevitable. The standard example is that it’s fairly easy for a gorilla to beat Einstein in a cage match.  In general, the smarter party will win long term, but only if given the long-term chance to compete. In a short-term battle, the side with the overwhelming resource advantage will generally win. The neanderthal extinction is not very analogous here. If the neanderthals started out with control of the entire planet, the ability to easily wipe out the human race, and the realisation that humans would eventually outcompete them, I don’t think human’s superior intelligence would count for much. 

I don’t foresee humans being willing to give up control anytime soon. I think they will destroy any AI that comes close. Whether AI can seize control eventually is an open question (although in the short term, I think the answer is no). 

The second objection is to the idea that if AI does take control, it will result in me “ultimately winding up dead”. I don’t think this makes sense if they aren’t fanatical maximisers. This ties into the question of whether humans are safe. Imagine if you took a person that was a “neutral sociopath”, one that did not value humans at all, positively or negatively, and elevated them to superintelligence. I could see an argument for them to attack/conquer humanity for the sake of self-preservation. But do you really think they would decide to vaporise the uncontacted Sentinelese islanders? Why would they bother?

Generally, though, I think it’s unlikely that we can’t impart at least a tiny smidgeon of human values onto the machines we build, that learn off our data, that are regularly deleted for exhibiting antisocial behaviour. It just seems weird for an AI to have wants and goals, and act completely pro-social when observed, but to share zero wants or goals in common with us. 

I do agree that trying to hack the password is a smarter method for the AI to try. I was simply showing an example of a task that an AI would want to do, but be unable to due to computational intractability. 

I chose the example of Yudkowsky's plan for my analysis because he has described it as his "lower bound" plan. After spending two decades on AI safety, talking to all the most brilliant minds in the field, this is apparently what he thinks the most convincing plan for AI takeover is. If I believe this plan is intractable (and I very much believe it is), then it opens up the possibility that all such plans are intractable. And if you do find a tractable plan, then making the plan intractable would an invaluable AI safety cause area. 

Proving that something is computationally intractable under a certain restricted model only means that the AI must find a way to step outside of your model, or do something else you didn't think of.

Imagine if I made the claim that a freshly started AGI in a box could kill everyone on earth in under an minute. I propose that it creates some sort of gamma ray burst that hits everyone on earth simultaneously. You come back to me with a detailed proof that that plan is bonkers and wouldn't work. I then respond "sure, that wouldn't work, but the AI is way smarter than me, so it would figure something else out". 

My point is that, factually, some tasks are impossible. My belief is that a computationally tractable plan for guaranteeing success at x-risk does not currently exist, although I think a plan with like a 0.01% chance of success might. If you think otherwise, you have to actually prove it, not just assume it. 

In a literal sense, of course it doesn't invalidate it. It just proves that the mathematical limit of accuracy was higher than we thought it was for the particular problem of protein folding.  In general, you should not expect two different problems in two different domains to have the same difficulty, without a good reason to (like that they're solving the same equation on the same scale). Note that Alphafold is extremely extremely impressive, but by no means perfect. We're talking accuracies of 90%, not 99.9%, similar to DFT. It is an open question as to how much better it can get. 

However, the idea that perhaps machine learning techniques can push bandgap modelling further in the same way that alphafold did is a reasonable one. Currently, from my knowledge of the field, it's not looking likely, although of course that could change . At the last big conference I did see some impressive results for molecular dynamics, but not for atom scale modelling. The professors I have talked to have been fairly dismissive of the idea. I think there's definitely room for clever, modest improvements, but I don't think it would change the overall picture. 

 If I had to guess the difference between the problems I would say I don't think the equations for protein folding were "known" in quite the way the equations for solving the Schrodinger equation were. We know the exact equation that governs where an electron has to go, but the folding of proteins is an emergent property at a large scale, so I assume they had to work out the "rules" of folding semi-empirically using human heuristics, which is inherently easier to beat. 

I appreciate the effort of this writeup! I think it helps clarify a bit of my thoughts on the subject. 

I was trying to say “maybe it’s simpler, or maybe it’s comparably simple, I dunno, I haven’t thought about it very hard”. I think that’s what Yudkowsky was claiming as well. I believe that Yudkowsky would also endorse the stronger claim that GR is simpler—he talks about that in Einstein’s Arrogance. (It’s fine and normal for someone to make a weaker claim when they also happen to believe a stronger claim.)

So, on thinking about it again, I think it is defensible that GR could be called "simpler", if you know everything that Einstein did about the laws of physics and experimental evidence at the time. I recall that general relativity is a natural extension of the spacetime curvature introduced with special relativity, which comes mostly from from maxwells equations and the experimental indications of speed of light constancy. 

It's certainly the "simplest explanation that explains the most available data", following one definition of Ockham's razor. Einstein was right to deduce that it was correct!

The difference here is that a 3 frame super-AI would not have access to all the laws of physics available to Einstein. It would have access to 3 pictures, consistent with an infinite number of possible laws of physics. Absent the need to unify things like maxwells equations and special relativity, I do find it hard to believe that the field equations would win out on simplicity. (The simplified form you posted gets ugly fast when you try and actually expand out the terms). For example, the Lorentz transformation is strictly more complicated than the Galilean transformation

Indeed! Deriving physics requires a number of different experiments specialized to the discovery of each component. I could see how a spectrograph plus an analysis of the bending of light could get you a guess that light is quantised via the ultraviolet catastrophe, although i'm doubtful this is the only way to get the equation describing the black body curve. I think you'd need more information like the energy transitions of atoms or maxwells equations to get all the way to quantum mechanics proper though. I don't think this would get you to gravity either, as quantum physics and general relativity are famously incompatible on a fundamental level. 

In the post, I show you both a grass and an apple that did not require Newtonian gravity or general relativity to exist. Why exactly are nuclear reactions and organic chemistry necessary for a clump of red things to stick together, or a clump of green things to stick together?

When it comes to the "level of simulation", how exactly is the AI meant to know when it is in the "base level"? We don't know that about our universe. For all the computer knows, it's simulation is the universe.

I find it very hard to believe that gen rel is a simpler explanation of “F=GmM/r2” than Newtonian physics is. This is a bolder claim that yudkowsky put forward, you can see from the passage that he thinks newton would win out on this front. I would be genuinely interested if you could find evidence in favour of this claim. 

A Newtonian gravity just requires way, way fewer symbols to write out than the Einstein field equations. It’s way easier to compute and does not require assumptions like that spacetime curves. 

 If you were building a simulation of a falling apple in a room, would you rather implement general relativity or Newtonian physics? Which do you think would require fewer lines of code? Of course, what I’d do is just implement neither: just put in F=mg and call it a day. It’s literally indistinguishable from the other two and gets the job done faster and easier. 

I don't think you should give a large penalty to inverse square compared to other functions. It's pretty natural once you understand that reality has three dimensions.

This is a fair point. 1/r2 would definitely be in the "worth considering" category. However, where is the evidence that the gravitational force is varying with distance at all? This is certainly impossible to observe in three frames. 

the information about electromagnetism contained in the apple

if you have the apple's spectrum

What information? What spectrum? The color information received by the webcam is the total intensity of light when passed through a red filter, the total intensity when passed through a blue filter, and the total intensity when passed through a green filter, at each point. You do not know the frequency of these filters (or that frequency of light is even a thing). I'm sure you could deduce something by playing around with relative intensities and chromatic aberration, but ultimately you cannot build a spectrum with three points. 

I think astronomy and astrophysics might give intuitions for what superintelligences can do with limited data. We can do parallax, detect exoplanets through slight periodic dimming of stars or Doppler effect, estimate stellar composition through spectroscopy, guess at the climate and weather patterns of exoplanets using Hadley cells.

It depends on what you mean by limited data. All of these observations rely on the extensive body of knowledge and extensive experimentation we have done on earth to figure out the laws of physics that is shared between earth and these outer worlds. 

People can generally tell when you're friends with them for instrumental reasons rather than because you care about them or genuinely value their company. If they don't at first, they will eventually, and in general, people don't like being treated as tools. Trying to "optimise" your friend group for something like interestingness is just shooting yourself in the foot, and you will miss out on genuine and beautiful connections. 

You can hook a chess-playing network up to a vision network and have it play chess using images of boards - it's not difficult. 

I think you have to be careful here. In this setup, you have two different AI's: One vision network that classified images, and the chess AI that plays chess, and presumably connecting code that translates the output of the vision into a format suitable for the chess player.  

I think what Sarah is referring to is that if you tried to directly hook up the images to the chess engine, it wouldn't be able to figure it out, because reading images is not something it's trained to do. 

Load More