Comments

I can think of two interpretations of consciousness being "causally connected" to physical systems:

1. consciousness is the result of physical phenomena like brain states, but does not cause any physical phenomena itself. So it has an in-edge coming from the physical world, but no out-edge back to it. Again, this implies that consciousness cannot be what causes me to think about consciousness.

2. consciousness causes things in the physical world, which, again, I believe necessitates a consciousness variable in the laws of the universe.

Note that I am not trying to get at what Eliezer was arguing, I am asking about the consequences of his arguments, even ones that he may not have intended.

So does this suppose that there is some “consciousness” variable in the laws of the universe? If consciousness causes me to think X, and thinking X can be traced back to a set of physical laws that govern the neurons in my brain, then there must be some consciousness variable somewhere in those physical laws, no? Otherwise it has to be that consciousness corresponds to some physical phenomenon, and it is that phenomenon - not the consciousness - that caused me to think about it. If there were no consciousness attached to that physical phenomenon, I would go along just the same, thinking the exact same thing.

I was reading this old Eliezer piece arguing against the conceivability of p-zombies. https://www.lesswrong.com/posts/7DmA3yWwa6AT5jFXt/zombies-redacted

And to me this feels like a more general argument against the existence of consciousness itself, in any form similar to how we normally think about it, not just against p-zombies.

Eliezer says that consciousness itself cannot be what causes me to think about consciousness, or what causes philosophers to write papers about consciousness; it must instead be the physical system that such consciousness corresponds to that causes these things. But then… that seems to discredit the ability to use your own consciousness as evidence for anything. If my own conscious experience cannot be what causes me to think about consciousness, why should we think consciousness exists?

Am I confused about something here?

That's not a simple problem. First you have to specify "not killing everyone" robustly (outer alignment), and then you have to train the AI to have this goal and not an approximation of it (inner alignment).

See my other comment for the response.

Anyway, the rest of your response is spent talking about the case where AI cares about its perception of the paperclips rather than the paperclips themselves. I'm not sure how severity level 1 would come about, given that the AI should only care about its reward score. Once you admit that the AI cares about worldly things like "am I turned on", it seems pretty natural that the AI would care about the paperclips themselves rather than its perception of the paperclips. Nevertheless, even in severity level 1, there is still no incentive for the AI to care about future AIs, which contradicts concerns that non-superintelligent AIs would fake alignment during training so that future superintelligent AIs would be unaligned.

We don't know how to represent "do not kill everyone"

I think this goes to Matthew Barnett’s recent article arguing that actually, yes, we do. And regardless, I don’t think this point is a big part of Eliezer’s argument. https://www.lesswrong.com/posts/i5kijcjFJD6bn7dwq/evaluating-the-historical-value-misspecification-argument

We don't know how to pick which quantity would be maximized by a would-be strong consequentialist maximizer

Yeah, so I think this is the crux of it. My point is that if we find some training approach that leads to a model that cares about the world itself rather than hacking some reward function, that’s a sign that we can in fact guide the model in important ways, and there’s a good chance this includes being able to tell it not to kill everyone.

We don't know what a strong consequentialist maximizer would look like, if we had one around, because we don't have one around (because if we did, we'd be dead)

This is just a way of saying “we don’t know what AGI would do”. I don’t think this point pushes us toward x-risk any more than it pushes us toward not-x-risk.

I also do not think the responses to this question are satisfying enough to be a refutation. I don’t even think they are satisfying enough to make me confident I haven’t just found a hole in AI risk arguments. This is not a simple case of “you just misunderstand something simple”.

I don’t care that much, but if LessWrong is going to downvote sincere questions because it finds them dumb or whatever, this will make for a site that is very unwelcoming to newcomers.

I do indeed agree this is a major problem, even if I'm not sure I agree with the main claim. The rise of fascism over the last decade, and the expectation that it will continue, is extremely evident; its consequences for democracy are a lot less clear.

The major wrinkle in all of this is in assessing anti-democratic behavior. Democracy indices are not a great way of assessing democracy, for much the same reason that the Doomsday Clock is a bad way of assessing nuclear risk: they're subjective metrics produced by (probably increasingly) left-leaning academics, and they tend to measure a lot of things that I wouldn't classify as democracy (e.g. rights of women/LGBT people/minorities). This paper found that, using re-election rates, there has been no evidence of global democratic backsliding. That started quite the controversy in political science; my read on the subsequent discussion is that there is evidence of backsliding, but that such backsliding has been fairly modest.

I expect things to get worse as more countries get far-right leaders, and as those that already have them see their democratic institutions increasingly captured. And yet... a lot of places with far-right leaders continue to have close elections. See Poland, Turkey, and Israel if you count it. In Brazil the far right even lost the election. One plausible theory here is that the more anti-democratic behavior a party engages in, the more resistance it faces - either because voters are turned off or because its opponents increasingly become center or center-right parties seeking to create broad pro-democracy coalitions - and that this roughly balances out. What does this mean for how one evaluates democracy?

Finally, some comments specifically on more Western countries. I think the future of these countries is really uncertain.

For the next decade, it's really dependent on a lot of short-term events. Will Italy's PM Meloni engage in anti-democratic behavior? Will Le Pen win in France, and if so, will she engage in anti-democratic behavior? Will Trump win in 2024? How quickly and how far will the upward trend in polling for Germany's and Spain's far right continue?

I know the piece specifies the next decade, but taking a longer view, the rise of fascism has come quite suddenly, in the span of the last 8 years. If it continues for a few decades (and AI doesn't kill us all), then we are probably destined for fascist governments almost everywhere and the deterioration of democratic institutions. But how long this global trend will last is really the big question in global politics. Maybe debates over AI issues will become the big issue that supplants fascism? IDK. I'd love to see some analysis of historical trends in public approval to get a sense of what a prior for this question would look like; I've never gotten around to doing it myself and am really not very well informed about the history here.

I'm going to quote this from an EA Forum post I just made, on why mere repeated exposure to AI Safety (through e.g. media coverage) will probably do a lot to persuade people:

[T]he more people hear about AI Safety, the more seriously people will take the issue. This seems to be true even if the coverage is purporting to debunk the issue (which, as I will discuss later, I think will be fairly rare) - a phenomenon called the illusory truth effect. I also think this effect will be especially strong for AI Safety. Right now, in EA-adjacent circles, the argument over AI Safety is mostly a war of vibes. There is very little object-level discussion - it's all just "these people are relying way too much on their obsession with tech/rationality" or "oh my god, these really smart people think the world could end within my lifetime". The way we (AI Safety) win this war of vibes, which will hopefully bleed out beyond the EA-adjacent sphere, is just by giving people more exposure to our side.
